draft-ietf-eai-rfc5335bis-11.txt   draft-ietf-eai-rfc5335bis-12.txt 
Email Address Internationalization A. Yang Email Address Internationalization A. Yang
(EAI) TWNIC (EAI) TWNIC
Internet-Draft S. Steele Internet-Draft S. Steele
Obsoletes: 5335 (if approved) Microsoft Obsoletes: 5335 (if approved) Microsoft
Updates: 2045,5322 (if approved) N. Freed Updates: 2045,5322 (if approved) N. Freed
Intended status: Standards Track Oracle Intended status: Standards Track Oracle
Expires: January 11, 2012 July 10, 2011 Expires: March 21, 2012 September 18, 2011
Internationalized Email Headers Internationalized Email Headers
draft-ietf-eai-rfc5335bis-11 draft-ietf-eai-rfc5335bis-12
Abstract Abstract
Internet mail was originally limited to 7-bit ASCII. MIME added Internet mail was originally limited to 7-bit ASCII. MIME added
support for the use of 8-bit character sets in body parts, and also support for the use of 8-bit character sets in body parts, and also
defined an encoded-word construct so other character sets could be defined an encoded-word construct so other character sets could be
used in certain header field values. But full internationalization used in certain header field values. But full internationalization
of electronic mail requires additional enhancements to allow the use of electronic mail requires additional enhancements to allow the use
of Unicode, including characters outside the ASCII repertoire, in of Unicode, including characters outside the ASCII repertoire, in
mail addresses as well as direct use of Unicode in header fields like mail addresses as well as direct use of Unicode in header fields like
skipping to change at page 1, line 43 skipping to change at page 1, line 43
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 11, 2012. This Internet-Draft will expire on March 21, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 22 skipping to change at page 2, line 22
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology Used In This Specification . . . . . . . . . . . . 3 2. Terminology Used In This Specification . . . . . . . . . . . . 3
3. Changes to Message Header Fields . . . . . . . . . . . . . . . 4 3. Changes to Message Header Fields . . . . . . . . . . . . . . . 4
3.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 4 3.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 4
3.2. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 5 3.2. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 5
3.3. Changes to MIME Message Type Encoding Restrictions . . . . 6 3.3. Use of 8-bit UTF-8 in Message-Ids . . . . . . . . . . . . 5
3.4. The Message/global Media Type . . . . . . . . . . . . . . 6 3.4. Effects on Line Length Limits . . . . . . . . . . . . . . 5
3.5. Changes to MIME Message Type Encoding Restrictions . . . . 6
3.6. Use of MIME Encoded-Words . . . . . . . . . . . . . . . . 6
3.7. The Message/global Media Type . . . . . . . . . . . . . . 6
4. Security Considerations . . . . . . . . . . . . . . . . . . . 8 4. Security Considerations . . . . . . . . . . . . . . . . . . . 8
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9
7. Edit history . . . . . . . . . . . . . . . . . . . . . . . . . 9 7. Edit history . . . . . . . . . . . . . . . . . . . . . . . . . 9
7.1. draft-ietf-eai-rfc5335bis-00 . . . . . . . . . . . . . . . 9 7.1. draft-ietf-eai-rfc5335bis-00 . . . . . . . . . . . . . . . 9
7.2. draft-ietf-eai-rfc5335bis-01 . . . . . . . . . . . . . . . 10 7.2. draft-ietf-eai-rfc5335bis-01 . . . . . . . . . . . . . . . 10
7.3. draft-ietf-eai-rfc5335bis-02 . . . . . . . . . . . . . . . 10 7.3. draft-ietf-eai-rfc5335bis-02 . . . . . . . . . . . . . . . 10
7.4. draft-ietf-eai-rfc5335bis-03 . . . . . . . . . . . . . . . 10 7.4. draft-ietf-eai-rfc5335bis-03 . . . . . . . . . . . . . . . 10
7.5. draft-ietf-eai-rfc5335bis-04 . . . . . . . . . . . . . . . 10 7.5. draft-ietf-eai-rfc5335bis-04 . . . . . . . . . . . . . . . 10
7.6. draft-ietf-eai-rfc5335bis-05 . . . . . . . . . . . . . . . 10 7.6. draft-ietf-eai-rfc5335bis-05 . . . . . . . . . . . . . . . 10
7.7. draft-ietf-eai-rfc5335bis-06 . . . . . . . . . . . . . . . 10 7.7. draft-ietf-eai-rfc5335bis-06 . . . . . . . . . . . . . . . 10
7.8. draft-ietf-eai-rfc5335bis-07 . . . . . . . . . . . . . . . 10 7.8. draft-ietf-eai-rfc5335bis-07 . . . . . . . . . . . . . . . 10
7.9. draft-ietf-eai-rfc5335bis-09 . . . . . . . . . . . . . . . 10 7.9. draft-ietf-eai-rfc5335bis-09 . . . . . . . . . . . . . . . 10
7.10. draft-ietf-eai-rfc5335bis-10 . . . . . . . . . . . . . . . 10 7.10. draft-ietf-eai-rfc5335bis-10 . . . . . . . . . . . . . . . 11
7.11. draft-ietf-eai-rfc5335bis-11 . . . . . . . . . . . . . . . 11 7.11. draft-ietf-eai-rfc5335bis-11 . . . . . . . . . . . . . . . 11
7.12. draft-ietf-eai-rfc5335bis-12 . . . . . . . . . . . . . . . 11
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
8.1. Normative References . . . . . . . . . . . . . . . . . . . 11 8.1. Normative References . . . . . . . . . . . . . . . . . . . 11
8.2. Informative References . . . . . . . . . . . . . . . . . . 12 8.2. Informative References . . . . . . . . . . . . . . . . . . 12
1. Introduction 1. Introduction
Internet mail distinguishes a message from its transport and further Internet mail distinguishes a message from its transport and further
divides a message between a header and a body [RFC5598]. Internet divides a message between a header and a body [RFC5598]. Internet
mail header field values contain a variety of strings that are mail header field values contain a variety of strings that are
intended to be user-visible. The range of supported characters for intended to be user-visible. The range of supported characters for
skipping to change at page 5, line 23 skipping to change at page 5, line 23
atext =/ UTF8-non-ascii atext =/ UTF8-non-ascii
qtext =/ UTF8-non-ascii qtext =/ UTF8-non-ascii
text =/ UTF8-non-ascii text =/ UTF8-non-ascii
; note that this upgrades the body to UTF-8 ; note that this upgrades the body to UTF-8
dtext =/ UTF8-non-ascii dtext =/ UTF8-non-ascii
A consequence of the change to the dtext rule is that UTF-8 would
then be allowed in the domain parts of message-ids as well as
addresses. This is unnecessary and undesirable, so three additional
RFC 5322 rules are redefined and a new itext rule is added:
id-left = dot-id-text
id-right = dot-id-text / no-fold-literal
dot-id-text = 1*itext *("." 1*itext)
itext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; characters not including
"$" / "%" / ; specials. Used for msg-ids.
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
This change also specifically disallows obsolete forms of message-ids
that RFC 5322 allows.
The preceding changes mean that the following constructs now allow The preceding changes mean that the following constructs now allow
UTF-8: UTF-8:
1. Unstructured text, used in header fields like Subject: or 1. Unstructured text, used in header fields like Subject: or
Content-description:. Content-description:.
2. Any construct that uses atoms, including but not limited to the 2. Any construct that uses atoms, including but not limited to the
local parts of addresses. This includes addresses in the "for" local parts of addresses and message-ids. This includes
clauses of Received: header fields. addresses in the "for" clauses of Received: header fields.
3. Quoted strings. 3. Quoted strings.
4. Domains. (But not in message-ids.) 4. Domains.
Note that header field names are not on this list; these are still Note that header field names are not on this list; these are still
restricted to ASCII. restricted to ASCII.
3.3. Changes to MIME Message Type Encoding Restrictions 3.3. Use of 8-bit UTF-8 in Message-Ids
Implementers of message-id generation algorithms MAY prefer to
restrict their output to ASCII since that has some advantages, such
as when constructing References fields in mailing-list threads where
some senders use EAI and others do not.
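As an illustration of the ASCII-only preference above, the following is a minimal sketch of a generator whose output is always pure ASCII. The function name make_ascii_msgid, the use of a random UUID for the left-hand side, and the conversion of a non-ASCII domain to its IDNA form are all assumptions made for this example; none of them is prescribed by this specification.

```python
import uuid


def make_ascii_msgid(domain: str) -> str:
    """Return a message-id string that contains only ASCII characters.

    Hypothetical helper: the left-hand side is a random hex token and the
    right-hand side is the domain run through Python's built-in "idna"
    codec, so a non-ASCII domain is emitted in its ASCII-compatible form.
    """
    left = uuid.uuid4().hex                        # hex digits only
    right = domain.encode("idna").decode("ascii")  # ASCII form of the domain
    return f"<{left}@{right}>"


msgid = make_ascii_msgid("bücher.example")
print(msgid.isascii())
```

A generator along these lines yields identifiers that can be copied unchanged into References fields by agents that do not implement these extensions.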
3.4. Effects on Line Length Limits
Section 2.1.1 of [RFC5322] limits lines to 998 characters and
recommends that the lines be restricted to only 78 characters. This
specification changes the former limit to 998 octets. (Note that in
ASCII octets and characters are effectively the same but this is not
true in UTF-8.) The 78 character limit remains defined in terms of
characters, not octets, since it is intended to address display width
issues, not line length issues.
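The difference between the two limits can be sketched as follows. This is a hedged example, not a normative check: it assumes the hard limit is the 998 value of Section 2.1.1 of [RFC5322] counted in octets of the UTF-8 encoding, and it approximates "characters" for the 78 limit as Unicode code points, which is an assumption since display width is not the same thing. The names check_line, MAX_OCTETS, and RECOMMENDED_CHARS are illustrative.

```python
from typing import Tuple

MAX_OCTETS = 998         # hard limit, counted in octets, excluding CRLF
RECOMMENDED_CHARS = 78   # display-oriented limit, counted in characters


def check_line(line: str) -> Tuple[bool, bool]:
    """Return (within_hard_limit, within_recommended_limit) for one line.

    The line is given without its trailing CRLF. Octets are measured on
    the UTF-8 encoding; characters are approximated as code points.
    """
    octets = len(line.encode("utf-8"))
    chars = len(line)
    return octets <= MAX_OCTETS, chars <= RECOMMENDED_CHARS


# 70 two-octet characters: 140 octets, 70 characters.
print(check_line("é" * 70))
# 500 two-octet characters: 1000 octets, 500 characters.
print(check_line("é" * 500))
```

The second call shows why the distinction matters: a line of 500 characters already exceeds the octet limit once each character takes two octets.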
3.5. Changes to MIME Message Type Encoding Restrictions
This specification updates Section 6.4 of [RFC2045]. [RFC2045] This specification updates Section 6.4 of [RFC2045]. [RFC2045]
prohibits applying a content-transfer-encoding to any subtypes of prohibits applying a content-transfer-encoding to any subtypes of
"message/". This specification relaxes that rule -- it allows newly "message/". This specification relaxes that rule -- it allows newly
defined MIME types to permit content-transfer-encoding, and it allows defined MIME types to permit content-transfer-encoding, and it allows
content-transfer-encoding for message/global (see Section 3.4). content-transfer-encoding for message/global (see Section 3.7).
Background: Normally, transfer of message/global will be done in Background: Normally, transfer of message/global will be done in
8-bit-clean channels, and body parts will have "identity" encodings, 8-bit-clean channels, and body parts will have "identity" encodings,
that is, no decoding is necessary. that is, no decoding is necessary.
But in the case where a message containing a message/global is But in the case where a message containing a message/global is
downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding
might have to be applied to the message; if the message travels might have to be applied to the message; if the message travels
multiple times between a 7-bit environment and an environment multiple times between a 7-bit environment and an environment
implementing these extensions, multiple levels of encoding may occur. implementing these extensions, multiple levels of encoding may occur.
This is expected to be rarely seen in practice, and the potential This is expected to be rarely seen in practice, and the potential
complexity of other ways of dealing with the issue is thought to be complexity of other ways of dealing with the issue is thought to be
larger than the complexity of allowing nested encodings where larger than the complexity of allowing nested encodings where
necessary. necessary.
3.4. The Message/global Media Type 3.6. Use of MIME Encoded-Words
The MIME encoded-words facility [RFC2047] provides the ability to
place non-ASCII text in header fields, but only in a subset of the
places allowed by this extension. Additionally, encoded-words are
substantially more complex since they allow the use of arbitrary
charsets. Accordingly,
encoded-words SHOULD NOT be used when generating header fields for
messages employing this extension. Agents MAY, when incorporating
material from another message, convert encoded-word use to direct use
of UTF-8.
Note that care must be taken when decoding encoded-words because the
results after replacing an encoded-word with its decoded equivalent
in UTF-8 may be syntactically invalid. Processors that elect to
decode encoded-words MUST NOT generate syntactically invalid fields.
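The optional conversion of encoded-words to direct UTF-8 might look like the sketch below, which uses decode_header and make_header from Python's standard email.header module. The function name encoded_words_to_utf8 is an assumption for this example. The sketch is only appropriate for unstructured fields such as Subject:; it performs no syntax check, so applying it to a structured field could produce exactly the invalid result the preceding paragraph warns about.

```python
from email.header import decode_header, make_header


def encoded_words_to_utf8(value: str) -> str:
    """Replace RFC 2047 encoded-words in an unstructured field value
    with the equivalent Unicode text.

    Illustrative helper only: no validation of the result is done, so
    callers handling structured fields need their own syntax checks.
    """
    return str(make_header(decode_header(value)))


print(encoded_words_to_utf8("=?iso-8859-1?q?p=F6stal?="))
```

The returned string can then be serialized as UTF-8 when the field is written out in a message that employs this extension.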
3.7. The Message/global Media Type
Internationalized messages in this format MUST only be transmitted as Internationalized messages in this format MUST only be transmitted as
authorized by [I-D.ietf-eai-rfc5336bis] or within a non-SMTP authorized by [I-D.ietf-eai-rfc5336bis] or within a non-SMTP
environment that supports these messages. A message is a "message/ environment that supports these messages. A message is a "message/
global message" if: global message" if:
o it contains 8-bit UTF-8 header values as specified in this o it contains 8-bit UTF-8 header values as specified in this
document, or document, or
o it contains 8-bit UTF-8 values in the header fields of body parts. o it contains 8-bit UTF-8 values in the header fields of body parts.
skipping to change at page 8, line 43 skipping to change at page 8, line 50
character, internationalization may cause header field values in character, internationalization may cause header field values in
general and mail addresses in particular to become longer. As general and mail addresses in particular to become longer. As
specified in [RFC5322], each line of characters MUST be no more than specified in [RFC5322], each line of characters MUST be no more than
998 octets, excluding the CRLF. On the other hand, MDA (Mail 998 octets, excluding the CRLF. On the other hand, MDA (Mail
Delivery Agent) processes that parse, store, or handle email Delivery Agent) processes that parse, store, or handle email
addresses or local parts must take extra care not to overflow addresses or local parts must take extra care not to overflow
buffers, truncate addresses, or exceed storage allotments. Also, buffers, truncate addresses, or exceed storage allotments. Also,
they must take care, when comparing, to use the entire lengths of the they must take care, when comparing, to use the entire lengths of the
addresses. addresses.
There are lots of ways of using UTF-8 to represent something There are lots of ways to use UTF-8 to represent something equivalent
equivalent or similar to a particular displayed character or group of or similar to a particular displayed character or group of
characters. This may allow filtering systems to be bypassed by using characters; see the security considerations in [RFC3629] for details
a slightly different character to avoid detection while still on the problems this can cause. The normalization process
reaching the end user with largely the same intended deleterious described in Section 3.1 is recommended to minimize these issues.
effect. The normalization process described in Section 3.1 is
recommended to minimize this problem.
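The following sketch shows how two different code point sequences can render as the same displayed character, and how normalizing both sides before comparison removes the difference. It assumes that the normalization referred to in Section 3.1 is Unicode Normalization Form C (NFC); the function name nfc_equal is illustrative.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (assumed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # raw comparison of code points
print(nfc_equal(precomposed, decomposed))  # comparison after normalization
```

Normalization addresses only canonically equivalent sequences; visually similar but distinct characters are outside what this comparison can detect.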
The security impact of UTF-8 headers on email signature systems such The security impact of UTF-8 headers on email signature systems such
as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14. discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14.
If a user has a non-ASCII mailbox address and an ASCII mailbox If a user has a non-ASCII mailbox address and an ASCII mailbox
address, a digital certificate that identifies that user might have address, a digital certificate that identifies that user might have
both addresses in the identity. Having multiple email addresses as both addresses in the identity. Having multiple email addresses as
identities in a single certificate is already supported in PKIX identities in a single certificate is already supported in PKIX
(Public Key Infrastructure for X.509 Certificates) [RFC5280] and (Public Key Infrastructure for X.509 Certificates) [RFC5280] and
OpenPGP [RFC3156], but there may be user interface issues associated OpenPGP [RFC3156], but there may be user interface issues associated
with the introduction of UTF-8 into addresses in this context. with the introduction of UTF-8 into addresses in this context.
5. IANA Considerations 5. IANA Considerations
IANA is requested to update the registration of the message/global IANA is requested to update the registration of the message/global
MIME type using the registration form contained in Section 3.4. MIME type using the registration form contained in Section 3.7.
6. Acknowledgements 6. Acknowledgements
This document incorporates many ideas first described in Internet- This document incorporates many ideas first described in Internet-
Draft form by Paul Hoffman, although many details have changed from Draft form by Paul Hoffman, although many details have changed from
that earlier work. that earlier work.
The author especially thanks Jeff Yeh for his efforts and The author especially thanks Jeff Yeh for his efforts and
contributions on editing previous versions. contributions on editing previous versions.
skipping to change at page 11, line 16 skipping to change at page 11, line 23
7.11. draft-ietf-eai-rfc5335bis-11 7.11. draft-ietf-eai-rfc5335bis-11
1. Major rewrite of entire document to incorporate Dave Crocker's 1. Major rewrite of entire document to incorporate Dave Crocker's
simplified ABNF. simplified ABNF.
2. The document has intentionally been refocused on implementors 2. The document has intentionally been refocused on implementors
wishing to adapt their software to support EAI, so much of the wishing to adapt their software to support EAI, so much of the
explanatory and historical text has been removed. (Some of it explanatory and historical text has been removed. (Some of it
may be reintroduced later as an appendix.) may be reintroduced later as an appendix.)
7.12. draft-ietf-eai-rfc5335bis-12
1. Added a section on the handling of MIME encoded-words.
2. Updated the security considerations to refer to the more complete
discussion in RFC 3629.
3. Added a section on the effects on line length limits.
4. Removed the syntax restriction on the use of 8-bit UTF-8 in
message-ids.
5. Added text recommending that 8-bit UTF-8 be avoided in message-
ids.
8. References 8. References
8.1. Normative References 8.1. Normative References
[ASCII] "Coded Character Set -- 7-bit American [ASCII] "Coded Character Set -- 7-bit American
Standard Code for Information Standard Code for Information
Interchange", ANSI X3.4, 1986. Interchange", ANSI X3.4, 1986.
[I-D.ietf-eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and [I-D.ietf-eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and
Framework for Internationalized Framework for Internationalized
 End of changes. 15 change blocks. 
46 lines changed or deleted 70 lines changed or added
