draft-ietf-eai-rfc5335bis-10.txt   draft-ietf-eai-rfc5335bis-11.txt 
Email Address Internationalization Y. Abel Email Address Internationalization A. Yang
(EAI) TWNIC (EAI) TWNIC
Internet-Draft S. Steele Internet-Draft S. Steele
Obsoletes: 5335 (if approved) Microsoft Obsoletes: 5335 (if approved) Microsoft
Updates: 2045,5321,5322 March 15, 2011 Updates: 2045,5322 (if approved) N. Freed
(if approved) Intended status: Standards Track Oracle
Intended status: Standards Track Expires: January 11, 2012 July 10, 2011
Expires: September 16, 2011
Internationalized Email Headers Internationalized Email Headers
draft-ietf-eai-rfc5335bis-09 draft-ietf-eai-rfc5335bis-11
Abstract Abstract
Internet mail was originally limited to 7-bit ASCII. Recent Internet mail was originally limited to 7-bit ASCII. MIME added
enhancements support Unicode's UTF-8 encoding in portions of a support for the use of 8-bit character sets in body parts, and also
message. Full internationalization of electronic mail requires defined an encoded-word construct so other character sets could be
additional enhancement, including support for UTF-8 in user-oriented used in certain header field values. But full internationalization
header fields, such as in the To, From, and Subject fields. This of electronic mail requires additional enhancements to allow the use
document specifies an enhancement to Internet mail that permits of Unicode, including characters outside the ASCII repertoire, in
native UTF-8 support in the header and body of a message. mail addresses as well as direct use of Unicode in header fields like
From:, To:, and Subject:, without requiring the use of complex
encoded-word constructs. This document specifies an enhancement to
the Internet Message Format that allows use of Unicode in mail
addresses and most header field content.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 16, 2011. This Internet-Draft will expire on January 11, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Brief Overview of Changes . . . . . . . . . . . . . . . . 3 2. Terminology Used In This Specification . . . . . . . . . . . . 3
2. Relation to Other Standards . . . . . . . . . . . . . . . . . 3 3. Changes to Message Header Fields . . . . . . . . . . . . . . . 4
3. Background and History . . . . . . . . . . . . . . . . . . . . 4 3.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 4
4. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 5
5. Changes on Message Header Fields . . . . . . . . . . . . . . . 5 3.3. Changes to MIME Message Type Encoding Restrictions . . . . 6
5.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 5 3.4. The Message/global Media Type . . . . . . . . . . . . . . 6
5.2. Changes on MIME Headers . . . . . . . . . . . . . . . . . 6 4. Security Considerations . . . . . . . . . . . . . . . . . . . 8
5.3. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 6 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
5.4. Change on addr-spec Syntax . . . . . . . . . . . . . . . . 8 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9
5.5. Trace Field Syntax . . . . . . . . . . . . . . . . . . . . 9 7. Edit history . . . . . . . . . . . . . . . . . . . . . . . . . 9
5.6. message/global . . . . . . . . . . . . . . . . . . . . . . 9 7.1. draft-ietf-eai-rfc5335bis-00 . . . . . . . . . . . . . . . 9
6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 7.2. draft-ietf-eai-rfc5335bis-01 . . . . . . . . . . . . . . . 10
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 7.3. draft-ietf-eai-rfc5335bis-02 . . . . . . . . . . . . . . . 10
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 12 7.4. draft-ietf-eai-rfc5335bis-03 . . . . . . . . . . . . . . . 10
9. Edit history . . . . . . . . . . . . . . . . . . . . . . . . . 12 7.5. draft-ietf-eai-rfc5335bis-04 . . . . . . . . . . . . . . . 10
9.1. draft-ietf-eai-rfc5335bis-00 . . . . . . . . . . . . . . . 12 7.6. draft-ietf-eai-rfc5335bis-05 . . . . . . . . . . . . . . . 10
9.2. draft-ietf-eai-rfc5335bis-01 . . . . . . . . . . . . . . . 12 7.7. draft-ietf-eai-rfc5335bis-06 . . . . . . . . . . . . . . . 10
9.3. draft-ietf-eai-rfc5335bis-02 . . . . . . . . . . . . . . . 12 7.8. draft-ietf-eai-rfc5335bis-07 . . . . . . . . . . . . . . . 10
9.4. draft-ietf-eai-rfc5335bis-03 . . . . . . . . . . . . . . . 13 7.9. draft-ietf-eai-rfc5335bis-09 . . . . . . . . . . . . . . . 10
9.5. draft-ietf-eai-rfc5335bis-04 . . . . . . . . . . . . . . . 13 7.10. draft-ietf-eai-rfc5335bis-10 . . . . . . . . . . . . . . . 10
9.6. draft-ietf-eai-rfc5335bis-05 . . . . . . . . . . . . . . . 13 7.11. draft-ietf-eai-rfc5335bis-11 . . . . . . . . . . . . . . . 11
9.7. draft-ietf-eai-rfc5335bis-06 . . . . . . . . . . . . . . . 13 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.8. draft-ietf-eai-rfc5335bis-07 . . . . . . . . . . . . . . . 13 8.1. Normative References . . . . . . . . . . . . . . . . . . . 11
9.9. draft-ietf-eai-rfc5335bis-09 . . . . . . . . . . . . . . . 13 8.2. Informative References . . . . . . . . . . . . . . . . . . 12
9.10. draft-ietf-eai-rfc5335bis-10 . . . . . . . . . . . . . . . 13
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.1. Normative References . . . . . . . . . . . . . . . . . . . 13
10.2. Informative References . . . . . . . . . . . . . . . . . . 15
Appendix A. Changes to support UTF-8 . . . . . . . . . . . . . . 15
1. Introduction 1. Introduction
Internet mail distinguishes a message from its transport and further Internet mail distinguishes a message from its transport and further
divides a message between a header and a body [RFC5598]. Internet divides a message between a header and a body [RFC5598]. Internet
mail header fields contain a variety of strings that are intended to mail header field values contain a variety of strings that are
be user-visible. The range of supported characters for these strings intended to be user-visible. The range of supported characters for
was originally limited to a subset of [ASCII]; globalization of the these strings was originally limited to 7-bit [ASCII]. MIME
Internet requires support of the much larger set contained in UTF-8 [RFC2045] [RFC2046] [RFC2047] provides the ability to use additional
[RFC5198]. Complex encoding alternatives to UTF-8, as an overlay to character sets, but this support is limited to body part data and to
the existing ASCII base, would introduce inefficiencies as well as special encoded-word constructs that were only allowed in a limited
opportunities for processing errors. Native support for UTF-8 number of places in header field values.
encoding [RFC3629]. is widely available among systems now used over
the Internet. Hence supporting this encoding directly within email
is desired. This document specifies an enhancement to Internet mail
that permits the use of UTF-8 encoding, rather than only ASCII, as
the base form for header fields.
This specification is based on a model of native, end-to-end support
for UTF-8, which uses an "8-bit clean" environment . Support for
carriage across legacy, 7-bit infrastructure and for processing by
7-bit receivers requires additional mechanisms that are not provided
by this specification.
1.1. Brief Overview of Changes
This document updates email header [RFC5322] and MIME header
[RFC2045]. Email header value extends beyond ASCII, allowing UTF-8
encoding. MIME header lifts the prohibition against using content-
transfer-encoding on all subtypes under composite-type "message/".
Appendix A documents the derived ABNF rules that inherit support
UTF-8, due to the update of ABNF introduces from this document.
[Editor notes will be removed before Last Call]
[Editor Note 1: This revision (-08 and up) changed the ABNF approach
compared to previous revision (-07 and before). ]
[Editor Note 2: pending -- ABF to support IDN]
[Editor Note 3: Discuss with WG whether some fields, like Trace
header, have issues if UTF-8 encoding allowed in values]
2. Relation to Other Standards
This document updates Section 6.4 of [RFC2045]. It removes the
blanket ban on applying a content-transfer-encoding to all subtypes
of message/ and instead specifies that a composite subtype MAY
specify whether or not a content-transfer-encoding can be used for
that subtype, with "cannot be used" as the default.
This document also updates Section 3.4 of [RFC5322]. It Extended
mailbox address syntax to permit UTF-8 character in Section 5.3.
Allowing use of a content-transfer-encoding on subtypes of messages
is not limited to transmissions that are authorized by the SMTP
extension specified in [I-D.ietf-eai-rfc5336bis]. message/global (see
Section 5.6) of this document permits use of a content-transfer-
encoding.
3. Background and History
Mailbox names often represent the names of human users. Many of
these users throughout the world have names that are not normally
expressed with just the ASCII repertoire of characters, and would
like to use more or less their real names in their mailbox names.
These users are also likely to use non-ASCII text in their display
names and subjects of email messages, both received and sent. This
protocol specifies UTF-8 as the encoding to represent email header
field bodies.
The traditional format of email messages [RFC5322] allows only ASCII
characters in the header fields of messages. This prevents users
from having email addresses that contain non-ASCII characters. It
further forces non-ASCII text in display names, comments, and in free
text (such as in the "Subject:" field) to be encoded (as required by
MIME format [RFC2047]). This specification describes a change to the
email message format that is related to the SMTP message transport
change described in the associated documents
[I-D.ietf-eai-frmwrk-4952bis] and [I-D.ietf-eai-rfc5336bis], and that
allows non-ASCII characters in most email header fields. These
changes affect SMTP clients, SMTP servers, mail user agents (MUAs),
list expanders, gateways to other media, and all other processes that
parse or handle email messages.
As specified in [I-D.ietf-eai-rfc5336bis], an SMTP protocol extension
"UTF8SMTP" is used to prevent the transmission of messages with UTF-8
header fields to systems that cannot handle such messages.
Use of this SMTP extension helps prevent the introduction of such Globalization of the Internet requires support of the much larger set
messages into message stores that might misinterpret, improperly of characters provided by Unicode [RFC5198] in both mail addresses
display, or mangle such messages. It should be noted that using an and most header field values. Additionally, complex encoding schemes
ESMTP extension does not prevent transferring email messages with like encoded-words introduce inefficiencies as well as significant
UTF-8 header fields to other systems that use the email format for opportunities for processing errors. And finally, native support for
messages and that may not be upgraded, such as unextended POP and the UTF-8 charset is now available on most systems. Hence it is
IMAP servers. Changes to these protocols to handle UTF-8 header strongly desirable for Internet mail to support UTF-8 [RFC3629]
fields are addressed in [I-D.ietf-eai-rfc5721bis] and directly.
[I-D.ietf-eai-5378bis]. This document specifies an enhancement to the Internet Message Format
[RFC5322] and to MIME that permits the direct use of UTF-8, rather
than only ASCII, in header field values, including mail addresses. A
new media type, message/global, is defined for messages that use this
extended format. This specification also lifts the MIME restriction
on having non-identity content-transfer-encodings on any subtype of
the message top-level type so that message/global parts can be safely
transmitted across existing mail infrastructure.
The objective for this protocol is to allow UTF-8 in email header This specification is based on a model of native, end-to-end support
fields. for UTF-8, which depends on having an "8-bit clean" environment
assured by the transport system. Support for carriage across legacy,
7-bit infrastructure and for processing by 7-bit receivers requires
additional mechanisms that are not provided by these specifications.
4. Terminology 2. Terminology Used In This Specification
A plain ASCII string is fully compatible with [RFC5321] and A plain ASCII string is fully compatible with [RFC5321] and
[RFC5322]. In this document, non-ASCII strings are UTF-8 strings if [RFC5322]. In this document, non-ASCII strings are UTF-8 strings if
they are in header which contain at least one <UTF8-non-ascii>. they are in header field values which contain at least one <UTF8-non-
ascii> (see Section 3.1).
Unless otherwise noted, all terms used here are defined in [RFC5321], Unless otherwise noted, all terms used here are defined in [RFC5321],
[RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or [RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or
[I-D.ietf-eai-rfc5336bis]. [I-D.ietf-eai-rfc5336bis].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
5. Changes on Message Header Fields The term "8-bit" means octets are present in the data with values
above 0x7F.
SMTP clients can send header fields in UTF-8 format, if the UTF8SMTP 3. Changes to Message Header Fields
extension is advertised by the SMTP server or is permitted by other
transport mechanisms.
This protocol does NOT change the [RFC5322] rules for defining header To permit Unicode characters in field values, the header definition
field names. The bodies of header fields are allowed to contain in [RFC5322] is extended to support the new format. The following
UTF-8 characters, but the header field names themselves must contain sections specify the necessary changes to RFC 5322's ABNF.
only ASCII characters.
To permit UTF-8 characters in field values, the header definition in The syntax rules not mentioned below remain defined as in [RFC5322].
[RFC5322] is extended to support the new format. The following ABNF
is defined to substitute those definitions in [RFC5322].
The syntax rules not covered in this section remain as defined in Note that this protocol does not change RFC 5322 rules for defining
[RFC5322]. header field names. The bodies of header fields are allowed to
contain Unicode characters, but the header field names themselves
must contain only ASCII characters.
5.1. UTF-8 Syntax and Normalization Also note that messages in this format require the use of the
&UTF8SMTPbis; extension [I-D.ietf-eai-rfc5336bis] to be transferred
via SMTP.
3.1. UTF-8 Syntax and Normalization
UTF-8 characters can be defined in terms of octets using the UTF-8 characters can be defined in terms of octets using the
following ABNF [RFC5234], taken from [RFC3629]: following ABNF [RFC5234], taken from [RFC3629]:
UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4 UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
UTF8-2 = <Defined in Section 4 of RFC3629> UTF8-2 = <Defined in Section 4 of RFC3629>
UTF8-3 = <Defined in Section 4 of RFC3629> UTF8-3 = <Defined in Section 4 of RFC3629>
UTF8-4 = <Defined in Section 4 of RFC3629> UTF8-4 = <Defined in Section 4 of RFC3629>
See [RFC5198] for a discussion of normalization; the use of See [RFC5198] for a discussion of Unicode normalization;
normalization form [NFC] is RECOMMENDED. Actually, if one is going normalization form [NFC] SHOULD be used. Actually, if one is going
to do internationalization properly, one of the most often-cited to do internationalization properly, one of the most often-cited
goals is to permit people to spell their names correctly. Since many goals is to permit people to spell their names correctly. Since many
mailbox local parts reflect personal names, that principle applies as mailbox local parts reflect personal names, that principle applies to
well. And NFKC is not recommended because it may lose information mailboxes as well. The NFKC normalization form SHOULD NOT be used
that is needed to correctly spell some names in unusual because it may lose information that is needed to correctly spell
circumstances. some names in some unusual circumstances.
5.2. Changes on MIME Headers
This specification updates Section 6.4 of [RFC2045]. [RFC2045]
prohibits applying a content-transfer-encoding to any subtypes of
"message/". This specification relaxes the rule -- it allows newly
defined MIME types to permit content-transfer-encoding, and it allows
content-transfer-encoding for message/global (see Section 5.6).
Background: Normally, transfer of message/global will be done in
8-bit-clean channels, and body parts will have "identity" encodings,
that is, no decoding is necessary. In the case where a message
containing a message/global is downgraded from 8-bit to 7-bit as
described in [RFC1652], an encoding may be applied to the message; if
the message travels multiple times between a 7-bit environment and an
environment implementing UTF8SMTP, multiple levels of encoding may
occur. This is expected to be rarely seen in practice, and the
potential complexity of other ways of dealing with the issue are
thought to be larger than the complexity of allowing nested encodings
where necessary.
5.3. Syntax Extensions to RFC 5322
The following rules are intended to extend the corresponding rules in 3.2. Syntax Extensions to RFC 5322
[RFC5322] in order to allow UTF-8 characters.
FWS = <Defined in Section 3.2.2 of RFC 5322> The following rules extend the ABNF syntax defined in [RFC5322] and
[RFC5234] in order to allow UTF-8 content.
CFWS = <Defined in Section 3.2.2 of RFC 5322> VCHAR =/ UTF8-non-ascii
ctext =/ UTF8-non-ascii ctext =/ UTF8-non-ascii
; Extending ctext in RFC 5322, Section 3.2.2
comment = "(" *([FWS] uCcontent) [FWS] ")"
word = uAtom / uQuoted-String
This means that all the [RFC5322] constructs that build upon these atext =/ UTF8-non-ascii
will permit UTF-8 characters, including comments and quoted strings.
We do not change the syntax of <atext> in order to allow UTF-8
characters in <addr-spec>. This would also allow UTF-8 characters in
<message-id>, which is not allowed due to the limitation described in
Section 5.5. Instead, <uAtext> is added to meet this requirement.
uText = %d1-9 / ; all UTF-8 characters except
%d11-12 / ; US-ASCII NUL, CR, and LF
%d14-127 /
UTF8-non-ascii
uQuoted-Pair = ("\" (VCHAR / WSP / UTF8-non-ascii )) / obs-qp
VCHAR = <Defined in appendix B.1 of RFC 5234> qtext =/ UTF8-non-ascii
WSP = <Defined in appendix B.1 of RFC 5234> text =/ UTF8-non-ascii
; note that this upgrades the body to UTF-8
obs-qp = <Defined in Section 4.1 of RFC 5322> dtext =/ UTF8-non-ascii
uQcontent = uQtext / uQuoted-Pair A consequence of the change to the dtext rule is that UTF-8 would
then be allowed in the domain parts of message-ids as well as
addresses. This is unnecessary and undesirable, so three additional
RFC 5322 rules are redefined and a new itext rule is added:
DQUOTE = <Defined in appendix B.1 of RFC 5234> id-left = dot-id-text
uCcontent = ctext / uQuoted-Pair / comment id-right = dot-id-text / no-fold-literal
uQtext = qtext / UTF8-non-ascii dot-id-text = 1*itext *("." 1*itext)
uAtext = ALPHA / DIGIT / itext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; Any character except "!" / "#" / ; characters not including
"$" / "%" / ; controls, SP, and specials. "$" / "%" / ; specials. Used for msg-ids.
"&" / "'" / ; Used for atoms. "&" / "'" /
"*" / "+" / "*" / "+" /
"-" / "/" / "-" / "/" /
"=" / "?" / "=" / "?" /
"^" / "_" / "^" / "_" /
"`" / "{" / "`" / "{" /
"|" / "}" / "|" / "}" /
"~" / "~"
UTF8-non-ascii
uAtom = [CFWS] 1*uAtext [CFWS]
uDot-Atom = [CFWS] uDot-Atom-text [CFWS]
uDot-Atom-text = 1*uAtext *("." 1*uAtext)
To allow the use of UTF-8 in a Content-Description header field
[RFC2045], the following syntax is used:
description = "Content-Description" ":" *uText
; Replace description in RFC 2045, Section 8
The <uText> syntax is extended above to allow UTF-8 in all
<description> header fields.
Note, however, this does not remove any constraint on the character
set of protocol elements; for instance, all the allowed values for
timezone in the "Date:" header fields are still expressed in ASCII.
And also, none of this revised syntax changes what is allowed in a
<msg-id>, which will still remain in pure ASCII.
5.4. Change on addr-spec Syntax
Internationalized email addresses are represented in UTF-8. Thus,
all header fields containing <mailbox>es are updated from [RFC5321]
Section 4.1.2 to permit UTF-8 addresses.
mailbox = name-addr / addr-spec / uAddr-Spec
; Replace mailbox in RFC 5322, Section 3.4
angle-addr =/ [CFWS] "<" uAddr-Spec ">" [CFWS]
; Extending angle-addr in RFC 5322, Section 3.4
uAddr-Spec = uLocal-Part "@" uDomain
uLocal-Part = uDot-Atom / uQuoted-String / obs-local-part
; Replace Local-Part in RFC 5322, Section 3.4.1
uQuoted-String = [CFWS] DQUOTE *([FWS] uQcontent) [FWS] DQUOTE [CFWS] This change also specifically disallows obsolete forms of message-ids
that RFC 5322 allows.
obs-local-part = <Defined in Section 4.4 of RFC 5322> The preceding changes mean that the following constructs now allow
uDomain = uDot-Atom / domain-literal / obs-domain UTF-8:
domain-literal = <Defined in Section 3.4.1 of RFC 5322> 1. Unstructured text, used in header fields like Subject: or
Content-description:.
Below are a few examples of possible <mailbox> representations. 2. Any construct that uses atoms, including but not limited to the
local parts of addresses. This includes addresses in the "for"
clauses of Received: header fields.
"DISPLAY_NAME" <ASCII@ASCII> 3. Quoted strings.
; traditional mailbox format
"DISPLAY_NAME" <non-ASCII@non-ASCII> 4. Domains. (But not in message-ids.)
; message will be rejected if UTF8SMTP extension is not supported
<non-ASCII@non-ASCII> Note that header field names are not on this list; these are still
; without DISPLAY_NAME and quoted string restricted to ASCII.
; message will be rejected if UTF8SMTP extension is not supported
5.5. Trace Field Syntax 3.3. Changes to MIME Message Type Encoding Restrictions
The 'uFor' clause in "Received:" fields has been allowed the use of This specification updates Section 6.4 of [RFC2045]. [RFC2045]
internationalized addresses in "For" fields. It is described in prohibits applying a content-transfer-encoding to any subtypes of
[I-D.ietf-eai-rfc5336bis], Section 3.6.3. "message/". This specification relaxes that rule -- it allows newly
defined MIME types to permit content-transfer-encoding, and it allows
content-transfer-encoding for message/global (see Section 3.4).
The "Return-path" designates the address to which messages indicating Background: Normally, transfer of message/global will be done in
non-delivery or other mail system failures are to be sent. Thus, the 8-bit-clean channels, and body parts will have "identity" encodings,
header is augmented to carry UTF-8 addresses (see the revised syntax that is, no decoding is necessary.
of <angle-addr> in Section 5.4 of this document). This will not
break the rule of trace field integrity, because the header field is
added at the last MTA and described in [RFC5321].
The <received-token> on "Received:" field ( described in Section But in the case where a message containing a message/global is
3.6.7 of [RFC5322]) syntax is augmented to allow UTF-8 email address downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding
in the "For" field. <angle-addr> is augmented to include UTF-8 email might have to be applied to the message; if the message travels
address. In order to allow UTF-8 email addresses in an <addr-spec>, multiple times between a 7-bit environment and an environment
<uAddr-Spec> is added to <received-token>. implementing these extensions, multiple levels of encoding may occur.
This is expected to be rarely seen in practice, and the potential
complexity of other ways of dealing with the issue are thought to be
larger than the complexity of allowing nested encodings where
necessary.
received-token =/ uAddr-Spec 3.4. The Message/global Media Type
5.6. message/global Internationalized messages in this format MUST only be transmitted as
authorized by [I-D.ietf-eai-rfc5336bis] or within a non-SMTP
environment that supports these messages. A message is a "message/
global message" if:
Internationalized messages MUST only be transmitted as authorized by o it contains 8-bit UTF-8 header values as specified in this
[I-D.ietf-eai-rfc5336bis] or within a non-SMTP environment that document, or
supports these messages. A message is a "message/global message", if
o it contains UTF-8 header values as specified in this document, or o it contains 8-bit UTF-8 values in the header fields of body parts.
o it contains UTF-8 values in the headers fields of body parts. The content of a message/global part is otherwise identical to that
of a message/rfc822 part.
The type message/global is similar to message/rfc822, except that it If this type is sent to a 7-bit-only system, it has to have an
specifies that a message can contain UTF-8 characters in the headers appropriate content-transfer-encoding applied. (Note that a system
of the message or body parts. If this type is sent to a 7-bit-only compliant with MIME that doesn't recognize message/global is supposed
system, it has to be encoded in MIME [RFC2045]. (Note that a system to treat it as "application/octet-stream" as described in Section
compliant with MIME that doesn't recognize message/global SHOULD 5.2.4 of [RFC2046].)
treat it as "application/octet-stream" as described in Section 5.2.4
of [RFC2046].)
Type name: message Type name: message
Subtype name: global Subtype name: global
Required parameters: none Required parameters: none
Optional parameters: none Optional parameters: none
Encoding considerations: Any content-transfer-encoding is permitted. Encoding considerations: Any content-transfer-encoding is permitted.
The 8-bit or binary content-transfer-encodings are recommended The 8-bit or binary content-transfer-encodings are recommended
where permitted. where permitted.
Security considerations: See Section 6. Security considerations: See Section 4.
Interoperability considerations: The media type provides Interoperability considerations: This media type provides
functionality similar to the message/rfc822 content type for email functionality similar to the message/rfc822 content type for email
messages with international email headers. When there is a need messages with international email headers. When there is a need
to embed or return such content in another message, there is to embed or return such content in another message, there is
generally an option to use this media type and leave the content generally an option to use this media type and leave the content
unchanged or down-convert the content to message/rfc822. Both of unchanged or down-convert the content to message/rfc822. Both of
these choices will interoperate with the installed base, but with these choices will interoperate with the installed base, but with
different properties. Systems unaware of internationalized different properties. Systems unaware of internationalized
headers will typically treat a message/global body part as an headers will typically treat a message/global body part as an
unknown attachment, while they will understand the structure of a unknown attachment, while they will understand the structure of a
message/rfc822. However, systems that understand message/global message/rfc822. However, systems that understand message/global
skipping to change at page 11, line 24 skipping to change at page 8, line 30
Restrictions on usage: This is a structured media type that embeds Restrictions on usage: This is a structured media type that embeds
other MIME media types. The 8-bit or binary content-transfer- other MIME media types. The 8-bit or binary content-transfer-
encoding SHOULD be used unless this media type is sent over a encoding SHOULD be used unless this media type is sent over a
7-bit-only transport. 7-bit-only transport.
Author: See the Author's Address section of this document. Author: See the Author's Address section of this document.
Change controller: IETF Standards Process Change controller: IETF Standards Process
6. Security Considerations 4. Security Considerations
If a user has a non-ASCII mailbox address and an ASCII mailbox
address, a digital certificate that identifies that user may have
both addresses in the identity. Having multiple email addresses as
identities in a single certificate is already supported in PKIX
(Public Key Infrastructure for X.509 Certificates) [RFC5280] and
OpenPGP [RFC3156].
Because UTF-8 often requires several octets to encode a single Because UTF-8 often requires several octets to encode a single
character, internationalized local parts and header value may cause character, internationalization may cause header field values in
mail addresses to become longer. As specified in [RFC5322], each general and mail addresses in particular to become longer. As
line of characters MUST be no more than 998 octets, excluding the specified in [RFC5322], each line of characters MUST be no more than
CRLF. On the other hand, MDA (Mail Delivery Agent) processes that 998 octets, excluding the CRLF. On the other hand, MDA (Mail
parse, store, or handle email addresses or local parts must take Delivery Agent) processes that parse, store, or handle email
extra care not to overflow buffers, truncate addresses, or exceed addresses or local parts must take extra care not to overflow
storage allotments. Also, they must take care, when comparing, to buffers, truncate addresses, or exceed storage allotments. Also,
use the entire lengths of the addresses. they must take care, when comparing, to use the entire lengths of the
addresses.
There are lots of ways of using UTF-8 to represent something There are lots of ways of using UTF-8 to represent something
equivalent or similar to a particular displayed character or group of equivalent or similar to a particular displayed character or group of
characters, then filtering systems can be bypassed by using one of characters. This may allow filtering systems to be bypassed by using
the variants to avoid detection while still reaching the end user a slightly different character to avoid detection while still
with largely the same original effect. This normalization is reaching the end user with largely the same intended deleterious
described in Section 5.1. effect. The normalization process is described in Section 3.1 is
recommended to minimize this problem.
The security impact of UTF-8 headers on email signature systems such The security impact of UTF-8 headers on email signature systems such
as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14. discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14.
7. IANA Considerations If a user has a non-ASCII mailbox address and an ASCII mailbox
address, a digital certificate that identifies that user might have
both addresses in the identity. Having multiple email addresses as
identities in a single certificate is already supported in PKIX
(Public Key Infrastructure for X.509 Certificates) [RFC5280] and
OpenPGP [RFC3156], but there may be user interface issues associated
with the introduction of UTF-8 into addresses in this context.
5. IANA Considerations
IANA is requested to update the registration of the message/global IANA is requested to update the registration of the message/global
MIME type using the registration form contained in Section 5.6. MIME type using the registration form contained in Section 3.4.
8. Acknowledgements 6. Acknowledgements
This document incorporates many ideas first described in Internet- This document incorporates many ideas first described in Internet-
Draft form by Paul Hoffman, although many details have changed from Draft form by Paul Hoffman, although many details have changed from
that earlier work. that earlier work.
The author especially thanks Jeff Yeh for his efforts and The author especially thanks Jeff Yeh for his efforts and
contributions on editing previous versions. contributions on editing previous versions.
Most of the content of this document is provided by John C Klensin. Most of the content of this document was provided by John C Klensin
Also, some significant comments and suggestions were received from and Dave Crocker. Significant comments and suggestions were received
Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris from Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov,
Newman, Yangwoo Ko, Yoshiro Yoneya, and other members of the JET team Chris Newman, Kristin Hubner, Yangwoo Ko, Yoshiro Yoneya, and other
(Joint Engineering Team) and were incorporated into the document. members of the JET team (Joint Engineering Team) and were
The editor sincerely thanks them for their contributions. incorporated into the document. The editors wish to sincerely thank
them all for their contributions.
9. Edit history 7. Edit history
[[RFC Editor: please remove this section before publishing.]] [[RFC Editor: please remove this section before publishing.]]
9.1. draft-ietf-eai-rfc5335bis-00 7.1. draft-ietf-eai-rfc5335bis-00
1. Applied Errata suggested by Alfred Hoenes. 1. Applied Errata suggested by Alfred Hoenes.
2. Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322]. 2. Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322].
3. Abrogate <alt-address> in ABNF of <angle-addr>. 3. Abrogate <alt-address> in ABNF of <angle-addr>.
4. Revoke [RFC5504] from this document. 4. Revoke [RFC5504] from this document.
5. Upgrade some references from I-Ds to RFC. 5. Upgrade some references from I-Ds to RFC.
9.2. draft-ietf-eai-rfc5335bis-01 7.2. draft-ietf-eai-rfc5335bis-01
1. Author name revised. 1. Author name revised.
9.3. draft-ietf-eai-rfc5335bis-02 7.3. draft-ietf-eai-rfc5335bis-02
1. ABNF revised. 1. ABNF revised.
9.4. draft-ietf-eai-rfc5335bis-03 7.4. draft-ietf-eai-rfc5335bis-03
1. Fix typos 1. Fix typos
2. ABNF revised 2. ABNF revised
3. Improve sentence 3. Improve sentence
9.5. draft-ietf-eai-rfc5335bis-04 7.5. draft-ietf-eai-rfc5335bis-04
1. improve sentences and ABNF revised based on AD and Co-chairs 1. improve sentences and ABNF revised based on AD and Co-chairs
9.6. draft-ietf-eai-rfc5335bis-05 7.6. draft-ietf-eai-rfc5335bis-05
1. ABNF revised in Section 5.4 based on AD comments 1. ABNF revised based on AD comments
9.7. draft-ietf-eai-rfc5335bis-06 7.7. draft-ietf-eai-rfc5335bis-06
1. ABNF revised 1. ABNF revised
2. improve Section 7 2. improve Section 5
9.8. draft-ietf-eai-rfc5335bis-07 7.8. draft-ietf-eai-rfc5335bis-07
1. Minor ABNF revised in Section 5.3 1. Minor ABNF revised in Section 3.2
2. improve Section 7 2. improve Section 5
9.9. draft-ietf-eai-rfc5335bis-09 7.9. draft-ietf-eai-rfc5335bis-09
Version -08 was posted in error and withdrawn. Version 09 is is Version -08 was posted in error and withdrawn. Version 09 is is
identical to version 07 except for a date change, addition of this identical to version 07 except for a date change, addition of this
note, and some vertical spacing compression on this page. note, and some vertical spacing compression on this page.
9.10. draft-ietf-eai-rfc5335bis-10 7.10. draft-ietf-eai-rfc5335bis-10
1. Add Appendix A and Section 1.1 1. Add appendix and overview of changes
2. Replace polls result in Abstract and Section 1 2. Replace polls result in Abstract and Section 1
3. Minor Sentence modification 3. Minor Sentence modification
10. References 7.11. draft-ietf-eai-rfc5335bis-11
10.1. Normative References 1. Major rewrite of entire document to incorporate Dave Crocker's
simplified ABNF.
2. The document has intentionally been refocused on implementors
wishing to adapt their software to support EAI, so much of the
explanatory and historical text has been removed. (Some of it
may be reintroduced later as an appendix.
8. References
8.1. Normative References
[ASCII] "Coded Character Set -- 7-bit American [ASCII] "Coded Character Set -- 7-bit American
Standard Code for Information Standard Code for Information
Interchange", ANSI X3.4, 1986. Interchange", ANSI X3.4, 1986.
[I-D.ietf-eai-5378bis] Resnick, P., Newman, C., and S. Shen,
"IMAP Support for UTF-8",
draft-ietf-eai-5378bis-00 (work in
progress), November 2010.
[I-D.ietf-eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and [I-D.ietf-eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and
Framework for Internationalized Framework for Internationalized
Email", Email",
draft-ietf-eai-frmwrk-4952bis-10 (work draft-ietf-eai-frmwrk-4952bis-10 (work
in progress), September 2010. in progress), September 2010.
[I-D.ietf-eai-rfc5336bis] Yao, J. and W. MAO, "SMTP Extension [I-D.ietf-eai-rfc5336bis] Yao, J. and W. MAO, "SMTP Extension
for Internationalized Email Address", for Internationalized Email Address",
draft-ietf-eai-rfc5336bis-07 (work in draft-ietf-eai-rfc5336bis-07 (work in
progress), December 2010. progress), December 2010.
[I-D.ietf-eai-rfc5721bis] Gellens, R., Newman, C., Yao, J., and
K. Fujiwara, "POP3 Support for UTF-8",
draft-ietf-eai-rfc5721bis-00 (work in
progress), September 2010.
[NFC] Davis, M. and K. Whistler, "Unicode [NFC] Davis, M. and K. Whistler, "Unicode
Standard Annex #15: Unicode Standard Annex #15: Unicode
Normalization Forms", September 2010, Normalization Forms", September 2010,
<http://www.unicode.org/reports/ <http://www.unicode.org/reports/
tr15/>. tr15/>.
[RFC2119] Bradner, S., "Key words for use in [RFC2119] Bradner, S., "Key words for use in
RFCs to Indicate Requirement Levels", RFCs to Indicate Requirement Levels",
BCP 14, RFC 2119, March 1997. BCP 14, RFC 2119, March 1997.
skipping to change at page 15, line 11 skipping to change at page 12, line 18
[RFC5321] Klensin, J., "Simple Mail Transfer [RFC5321] Klensin, J., "Simple Mail Transfer
Protocol", RFC 5321, October 2008. Protocol", RFC 5321, October 2008.
[RFC5322] Resnick, P., Ed., "Internet Message [RFC5322] Resnick, P., Ed., "Internet Message
Format", RFC 5322, October 2008. Format", RFC 5322, October 2008.
[RFC5598] Crocker, D., "Internet Mail [RFC5598] Crocker, D., "Internet Mail
Architecture", RFC 5598, July 2009. Architecture", RFC 5598, July 2009.
10.2. Informative References 8.2. Informative References
[RFC1652] Klensin, J., Freed, N., Rose, M.,
Stefferud, E., and D. Crocker, "SMTP
Service Extension for 8bit-
MIMEtransport", RFC 1652, July 1994.
[RFC2045] Freed, N. and N. Borenstein, [RFC2045] Freed, N. and N. Borenstein,
"Multipurpose Internet Mail Extensions "Multipurpose Internet Mail Extensions
(MIME) Part One: Format of Internet (MIME) Part One: Format of Internet
Message Bodies", RFC 2045, Message Bodies", RFC 2045,
November 1996. November 1996.
[RFC2046] Freed, N. and N. Borenstein, [RFC2046] Freed, N. and N. Borenstein,
"Multipurpose Internet Mail Extensions "Multipurpose Internet Mail Extensions
(MIME) Part Two: Media Types", (MIME) Part Two: Media Types",
skipping to change at page 15, line 45 skipping to change at page 12, line 47
and T. Roessler, "MIME Security with and T. Roessler, "MIME Security with
OpenPGP", RFC 3156, August 2001. OpenPGP", RFC 3156, August 2001.
[RFC5280] Cooper, D., Santesson, S., Farrell, [RFC5280] Cooper, D., Santesson, S., Farrell,
S., Boeyen, S., Housley, R., and W. S., Boeyen, S., Housley, R., and W.
Polk, "Internet X.509 Public Key Polk, "Internet X.509 Public Key
Infrastructure Certificate and Infrastructure Certificate and
Certificate Revocation List (CRL) Certificate Revocation List (CRL)
Profile", RFC 5280, May 2008. Profile", RFC 5280, May 2008.
Appendix A. Changes to support UTF-8 [RFC6152] Klensin, J., Freed, N., Rose, M., and
D. Crocker, "SMTP Service Extension
This section provides a basic audit of the places in a message that for 8-bit MIME Transport", STD 71,
now can permit UTF-8 rather than being restricted to ASCII, based on RFC 6152, March 2011.
the changes to underlying ABNF. The audit ignores rule for
"obsolete" constructs in [RFC5322]. (This is a first cut and the
list is likely incomplete):
VCHAR: quoted-pair, unstructured
> ccontent, qcontent
> comment, quoted-string
> word, local-part
> phrase
> display-name, keywords
ctext: ccontent > comment
atext: atom, dot-atom-text
qtext: qcontent > quoted-string
Authors' Addresses Authors' Addresses
Abel Yang Abel Yang
TWNIC TWNIC
4F-2, No. 9, Sec 2, Roosevelt Rd. 4F-2, No. 9, Sec 2, Roosevelt Rd.
Taipei, 100 Taipei, 100
Taiwan Taiwan
Phone: +886 2 23411313 ext 505 Phone: +886 2 23411313 ext 505
EMail: abelyang@twnic.net.tw EMail: abelyang@twnic.net.tw
Shawn Steele Shawn Steele
Microsoft Microsoft
EMail: Shawn.Steele@microsoft.com EMail: Shawn.Steele@microsoft.com
Ned Freed
Oracle
800 Royal Oaks
Monrovia, CA 91016-6347
USA
EMail: ned+ietf@mrochek.com
 End of changes. 85 change blocks. 
386 lines changed or deleted 230 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/