draft-ietf-eai-rfc5335bis-08.txt   draft-ietf-eai-rfc5335bis-09.txt 
EAI A. Yang Email Address Internationalization Y. Abel
Internet-Draft TWNIC (EAI) TWNIC
Obsoletes: 5335 (if approved) S. Steele Internet-Draft S. Steele
Updates: 2045,5322 Microsoft Obsoletes: 5335 (if approved) Microsoft
(if approved) D. Crocker Updates: 2045,5321,5322 March 2, 2011
Intended status: Standards Track Brandenburg InternetWorking (if approved)
Expires: July 29, 2011 N. Freed Intended status: Standards Track
Oracle Expires: September 3, 2011
January 25, 2011
Internationalized Email Headers Internationalized Email Headers
draft-ietf-eai-rfc5335bis-08 draft-ietf-eai-rfc5335bis-09
Abstract Abstract
Internet mail was originally limited to 7-bit ASCII. Recent Full internationalization of electronic mail requires not only the
enhancements support Unicode's UTF-8 encoding in portions of a capabilities to transmit non-ASCII content, to encode selected
message. Full internationalization of electronic mail requires information in specific header fields, and to use non-ASCII
additional enhancement, including support for UTF-8 in user-oriented characters in envelope addresses. It also requires being able to
header fields, such as in the To, From, and Subject fields. This express those addresses and the information based on them in mail
document specifies an enhancement to Internet mail that permits header fields. This document specifies a variant of Internet mail
native UTF-8 support in the header and body of a message. that permits the use of Unicode encoded in UTF-8, rather than ASCII,
as the base form for Internet email header fields. This form is
permitted in transmission only if authorized by an SMTP extension, as
specified in an associated specification. This specification updates
Section 6.4 of [RFC2045] to conform with the requirements.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 29, 2011. This Internet-Draft will expire on September 3, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Changes to the 8-bit clean Model . . 3 1.1. Role of This Specification . . . . . . . . . . . . . . . . 3
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 1.2. Relation to Other Standards . . . . . . . . . . . . . . . 3
2. Support for UTF-8 Encoding . . . . . . . . . . . . . . . . . . 4 2. Background and History . . . . . . . . . . . . . . . . . . . . 3
2.1. Message Object ABNF Changes . . . . . . . . . . . . . . . 4 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Normalization . . . . . . . . . . . . . . . . . . . . . . 5 4. Changes on Message Header Fields . . . . . . . . . . . . . . . 5
2.3. Content-Transfer-Encoding . . . . . . . . . . . . . . . . 5 4.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 5
3. Internet Message Format Enhancement . . . . . . . . . . . . . 5 4.2. Changes on MIME Headers . . . . . . . . . . . . . . . . . 5
4. Message Labeling . . . . . . . . . . . . . . . . . . . . . . . 6 4.3. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 6
5. MIME Enhancement . . . . . . . . . . . . . . . . . . . . . . . 6 4.4. Change on addr-spec Syntax . . . . . . . . . . . . . . . . 8
5.1. Content-Transfer-Encoding . . . . . . . . . . . . . . . . 6 4.5. Trace Field Syntax . . . . . . . . . . . . . . . . . . . . 9
5.2. MIME Header Field . . . . . . . . . . . . . . . . . . . . 6 4.6. message/global . . . . . . . . . . . . . . . . . . . . . . 9
5.3. Content-Type: message/utf8-rfc822 . . . . . . . . . . . . 7 5. Security Considerations . . . . . . . . . . . . . . . . . . . 11
6. Security Considerations . . . . . . . . . . . . . . . . . . . 8 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 8. Edit history . . . . . . . . . . . . . . . . . . . . . . . . . 12
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 8.1. draft-ietf-eai-rfc5335bis-00 . . . . . . . . . . . . . . . 12
9.1. Normative References . . . . . . . . . . . . . . . . . . . 9 8.2. draft-ietf-eai-rfc5335bis-01 . . . . . . . . . . . . . . . 12
9.2. Informative References . . . . . . . . . . . . . . . . . . 10 8.3. draft-ietf-eai-rfc5335bis-02 . . . . . . . . . . . . . . . 12
Appendix A. Changes to support UTF-8 . . . . . . . . . . . . . . 10 8.4. draft-ietf-eai-rfc5335bis-03 . . . . . . . . . . . . . . . 12
8.5. draft-ietf-eai-rfc5335bis-04 . . . . . . . . . . . . . . . 12
8.6. draft-ietf-eai-rfc5335bis-05 . . . . . . . . . . . . . . . 12
8.7. draft-ietf-eai-rfc5335bis-06 . . . . . . . . . . . . . . . 13
8.8. draft-ietf-eai-rfc5335bis-07 . . . . . . . . . . . . . . . 13
8.9. draft-ietf-eai-rfc5335bis-09 . . . . . . . . . . . . . . . 13
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
9.1. Normative References . . . . . . . . . . . . . . . . . . . 13
9.2. Informative References . . . . . . . . . . . . . . . . . . 14
1. Introduction 1. Introduction
Internet mail distinguishes a message from its transport and further 1.1. Role of This Specification
divides a message between a header and a body [RFC5598]. Internet
mail header fields contain a variety of strings that are intended to
be user-visible. The range of supported characters for these strings
was originally limited to a subset of [ASCII]; globalization of the
Internet requires support of the much larger set contained in UTF-8
[RFC5198]. Complex encoding alternatives to UTF-8, as an overlay to
the existing ASCII base, would introduce inefficiencies as well as
opportunities for processing errors. Native support for UTF-8
encoding [RFC3629] is widely available among systems now used over
the Internet. Hence supporting this encoding directly within email
is desired. This document specifies an enhancement to Internet mail
that permits the use of UTF-8 encoding, rather than only ASCII, as
the base form for header fields.
This specification is based on a model of native, end-to-end support Full internationalization of electronic mail requires several
for UTF-8, which uses an "8-bit clean" environment . Support for capabilities:
carriage across legacy, 7-bit infrastructure and for processing by
7-bit receivers requires additional mechanisms that are not provided
by this specification.
1.1. Changes to the 8-bit clean Model o The capability to transmit non-ASCII content, provided for as part
of the basic MIME specification [RFC2045], [RFC2046].
This is an extensive revision to the draft. Changes include: o The capability to use international characters in envelope
addresses, discussed in [I-D.ietf-eai-frmwrk-4952bis] and
specified in [I-D.ietf-eai-rfc5336bis].
o Greatly simplified ABNF that is much more basic and integrated. o The capability to express those addresses, and information related
to them and based on them, in mail header fields, defined in this
document.
o Clean separation of the changes in an email header [RFC5322] from This document specifies a variant of Internet mail that permits the
those in a MIME header [RFC2045] use of Unicode encoded in UTF-8 [RFC3629], rather than ASCII, as the
base form for Internet email header fields. This form is permitted
in transmission, if authorized by the SMTP extension specified in
[I-D.ietf-eai-rfc5336bis] or by other transport mechanisms capable of
processing it.
o Change to the default MIME content-transfer-encoding to be 8bit 1.2. Relation to Other Standards
o Elimination of all discussion of transport This document updates Section 6.4 of [RFC2045]. It removes the
blanket ban on applying a content-transfer-encoding to all subtypes
of message/ and instead specifies that a composite subtype MAY
specify whether or not a content-transfer-encoding can be used for
that subtype, with "cannot be used" as the default.
o An appendix that lists the derived ABNF rules that inherit support This document also updates Section 3.4 of [RFC5322]. It extends the
UTF-8, due to the changed basic rules mailbox address syntax to permit UTF-8 characters in Section 4.3.
Still Pending: Allowing use of a content-transfer-encoding on subtypes of messages
is not limited to transmissions that are authorized by the SMTP
extension specified in [I-D.ietf-eai-rfc5336bis]. message/global (see
Section 4.6) permits use of a content-transfer-encoding.
o ABF to support IDN 2. Background and History
o Fix "Normalization" section; I could not figure out what it needs Mailbox names often represent the names of human users. Many of
to say. I wasn't trying to change the existing spec, but simply these users throughout the world have names that are not normally
fix the writing. expressed with just the ASCII repertoire of characters, and would
like to use more or less their real names in their mailbox names.
These users are also likely to use non-ASCII text in their display
names and subjects of email messages, both received and sent. This
protocol specifies UTF-8 as the encoding to represent email header
field bodies.
o Review/fix MIME C-T-E details The traditional format of email messages [RFC5322] allows only ASCII
characters in the header fields of messages. This prevents users
from having email addresses that contain non-ASCII characters. It
further forces non-ASCII text in display names, comments, and in free
text (such as in the "Subject:" field) to be encoded (as required by
MIME format [RFC2047]). This specification describes a change to the
email message format that is related to the SMTP message transport
change described in the associated documents
[I-D.ietf-eai-frmwrk-4952bis] and [I-D.ietf-eai-rfc5336bis], and that
allows non-ASCII characters in most email header fields. These
changes affect SMTP clients, SMTP servers, mail user agents (MUAs),
list expanders, gateways to other media, and all other processes that
parse or handle email messages.
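The contrast between [RFC2047] encoded-words and native UTF-8 header
fields can be sketched with Python's standard "email" package. This
is only an illustration, not part of the specification: the helper
name serialize_both_ways and the sample subject are assumptions made
for the example, and the library's SMTPUTF8 policy is used here as a
stand-in for a transport that has authorized UTF-8 headers.

```python
from email import policy
from email.message import EmailMessage


def serialize_both_ways(subject):
    """Return (legacy, native) wire forms of one message.

    legacy: header fields restricted to ASCII (non-ASCII text is
            carried as RFC 2047 encoded-words).
    native: header fields carried directly in UTF-8.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content("hello")
    legacy = msg.as_bytes(policy=policy.SMTP)
    native = msg.as_bytes(policy=policy.SMTPUTF8)
    return legacy, native


legacy, native = serialize_both_ways("Grüße")
```

In the legacy form the subject is wrapped in an encoded-word, while
the native form carries the UTF-8 octets of the subject unchanged.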
The goal of the changes is to dramatically simplify the specification As specified in [I-D.ietf-eai-rfc5336bis], an SMTP protocol extension
and the software needed to support a message with UTF8 encoding. "UTF8SMTP" is used to prevent the transmission of messages with UTF-8
Rather than specify a wide range of UTF8-specific changes to the header fields to systems that cannot handle such messages.
existing ABNF rules, it focuses on the few, underlying ABNF rules
that are the basis for user-visible ASCII text. The premise for this
is simple: If the message is to be in UTF-8, then it is in UTF-8.
Subtle or complex rules that selectively add UTF-8 are not worth the
effort, once the message has already entered into the realm of UTF-8.
The question, then, is whether this change has planted some Use of this SMTP extension helps prevent the introduction of such
landmines, such as in Trace header fields? messages into message stores that might misinterpret, improperly
display, or mangle such messages. It should be noted that using an
ESMTP extension does not prevent transferring email messages with
UTF-8 header fields to other systems that use the email format for
messages and that may not be upgraded, such as unextended POP and
IMAP servers. Changes to these protocols to handle UTF-8 header
fields are addressed in [I-D.ietf-eai-rfc5721bis] and
[I-D.ietf-eai-5378bis].
1.2. Terminology The objective for this protocol is to allow UTF-8 in email header
fields.
3. Terminology
A plain ASCII string is fully compatible with [RFC5321] and [RFC5322].
In this document, non-ASCII strings in headers are UTF-8 strings that
contain at least one <UTF8-non-ascii>.
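As a rough sketch of this distinction, the following Python
transcribes the UTF8-2, UTF8-3, and UTF8-4 productions from Section 4
of [RFC3629] into a byte-level regular expression. The function name
is_non_ascii_utf8 is hypothetical, and the code assumes the input is
the raw octets of a header string.

```python
import re

# UTF8-2 / UTF8-3 / UTF8-4 as given in RFC 3629, Section 4.
_NON_ASCII_SRC = rb"""
    [\xC2-\xDF][\x80-\xBF]             # UTF8-2
  | \xE0[\xA0-\xBF][\x80-\xBF]         # UTF8-3
  | [\xE1-\xEC][\x80-\xBF]{2}
  | \xED[\x80-\x9F][\x80-\xBF]
  | [\xEE-\xEF][\x80-\xBF]{2}
  | \xF0[\x90-\xBF][\x80-\xBF]{2}      # UTF8-4
  | [\xF1-\xF3][\x80-\xBF]{3}
  | \xF4[\x80-\x8F][\x80-\xBF]{2}
"""
_NON_ASCII = re.compile(_NON_ASCII_SRC, re.VERBOSE)
# A whole string made only of ASCII octets and UTF8-non-ascii sequences.
_UTF8_STRING = re.compile(
    rb"(?:[\x00-\x7F]|" + _NON_ASCII_SRC + rb")*", re.VERBOSE
)


def is_non_ascii_utf8(raw):
    """True if raw is well-formed UTF-8 with at least one non-ASCII character."""
    return (
        _UTF8_STRING.fullmatch(raw) is not None
        and _NON_ASCII.search(raw) is not None
    )
```

A plain ASCII string is rejected because it has no <UTF8-non-ascii>
sequence, and octets in a legacy 8-bit charset are rejected because
they are not well-formed UTF-8.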
Unless otherwise noted, all terms used here are defined in [RFC5321],
[RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or
[I-D.ietf-eai-rfc5336bis].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
Syntax descriptions use Augmented BNF (ABNF) [RFC5234]. 4. Changes on Message Header Fields
Basic terms for this specification include: SMTP clients can send header fields in UTF-8 format, if the UTF8SMTP
extension is advertised by the SMTP server or is permitted by other
transport mechanisms.
ASCII: An encoding of Control Characters and Basic Latin that This protocol does NOT change the [RFC5322] rules for defining header
occupies 7-bits, per [ASCII]. Such a string is fully field names. The bodies of header fields are allowed to contain
compatible with email as specified in [RFC5322]. UTF-8 characters, but the header field names themselves must contain
only ASCII characters.
UTF-8: An encoding of Unicode in 8-bit bytes, per [RFC3629]. To permit UTF-8 characters in field values, the header definition in
[RFC5322] is extended to support the new format. The following ABNF
replaces the corresponding definitions in [RFC5322].
2. Support for UTF-8 Encoding The syntax rules not covered in this section remain as defined in
[RFC5322].
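The split between ASCII-only field names and UTF-8 field bodies can
be illustrated with a short Python check. This is a minimal sketch
under stated assumptions: the function name check_header_field is
hypothetical, the input is one unfolded header line without its CRLF,
and the structured syntax of individual fields is not examined.

```python
def check_header_field(raw):
    """Check one unfolded header line given as bytes.

    The field name must consist of printable US-ASCII other than ":"
    (the <ftext> range in RFC 5322); the field body only has to be
    well-formed UTF-8.
    """
    name, sep, body = raw.partition(b":")
    if not sep or not name:
        return False
    # Octets 33-126 are printable US-ASCII; ":" cannot occur in name
    # because the line was split at the first colon.
    if any(b < 33 or b > 126 for b in name):
        return False
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```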
2.1. Message Object ABNF Changes 4.1. UTF-8 Syntax and Normalization
Internet Mail that conforms to this specification is classed as UTF-8 characters can be defined in terms of octets using the
supporting UTF-8. However, UTF-8 characters within the ASCII range following ABNF [RFC5234], taken from [RFC3629]:
retain the restrictions defined for original, legacy, Latin-only
email. Therefore, ABNF enhancements to include UTF-8 incrementally
add the non-ASCII portions of UTF-8 to that established base of
ASCII.
UTF-8 characters are defined by using the following ABNF taken from UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
[RFC3629]:
UTF8-enhancement = UTF8-2 / UTF8-3 / UTF8-4 UTF8-2 = <See Section 4 of RFC3629>
UTF8-2 = <See Section 4 of RFC3629> UTF8-3 = <See Section 4 of RFC3629>
UTF8-3 = <See Section 4 of RFC3629> UTF8-4 = <See Section 4 of RFC3629>
UTF8-4 = <See Section 4 of RFC3629> See [RFC5198] for a discussion of normalization; the use of
normalization form [NFC] is RECOMMENDED. One of the most often-cited
goals of internationalization is to permit people to spell their
names correctly. Since many mailbox local parts reflect personal
names, that principle applies to them as well. NFKC is not
recommended because it may lose information that is needed to
correctly spell some names in unusual circumstances.
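The difference between the two normalization forms can be seen with
Python's standard unicodedata module. The sample strings below are
assumptions chosen for illustration: NFC only composes a base letter
with its combining mark, whereas NFKC also folds compatibility
characters such as the "fi" ligature and so discards a distinction
that was present in the original text.

```python
import unicodedata


def nfc(s):
    """Canonical composition; the form this document recommends."""
    return unicodedata.normalize("NFC", s)


def nfkc(s):
    """Compatibility composition; may discard distinctions."""
    return unicodedata.normalize("NFKC", s)


decomposed = "Jose\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed = "Jos\u00e9"      # precomposed U+00E9
ligature = "\ufb01"         # U+FB01 LATIN SMALL LIGATURE FI
```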
2.2. Normalization 4.2. Changes on MIME Headers
See [RFC5198] for a discussion of normalization. A normalized form This specification updates Section 6.4 of [RFC2045]. [RFC2045]
[NFC] MAY be used. However [NFC] can lose information that is needed prohibits applying a content-transfer-encoding to any subtypes of
to correctly spell some names in unusual circumstances. "message/". This specification relaxes the rule -- it allows newly
defined MIME types to permit content-transfer-encoding, and it allows
content-transfer-encoding for message/global (see Section 4.6).
2.3. Content-Transfer-Encoding Background: Normally, transfer of message/global will be done in
8-bit-clean channels, and body parts will have "identity" encodings,
that is, no decoding is necessary. In the case where a message
containing a message/global is downgraded from 8-bit to 7-bit as
described in [RFC1652], an encoding may be applied to the message; if
the message travels multiple times between a 7-bit environment and an
environment implementing UTF8SMTP, multiple levels of encoding may
occur. This is expected to be rare in practice, and the potential
complexity of other ways of dealing with the issue is thought to be
larger than the complexity of allowing nested encodings where
necessary.
This specification is based on a requirement for an "8-bit clean" 4.3. Syntax Extensions to RFC 5322
infrastructure. Support for UTF-8 semantics within a 7-bit
environment requires translation conventions that are not specified
here. Consequently a Content-Transfer-Encoding value of 7-bit is not
useful for a message that is labeled as containing UTF-8.
3. Internet Message Format Enhancement The following rules are intended to extend the corresponding rules in
[RFC5322] in order to allow UTF-8 characters.
This section specifies UTF-8 enhancements for the header of an FWS = <see Section 3.2.2 of RFC 5322>
Internet Mail message, as defined in [RFC5322].
ABNF used in this section is taken from that specification and the CFWS = <see Section 3.2.2 of RFC 5322>
ABNF specification.
This specification retains the [RFC5322] rules for defining header ctext =/ UTF8-non-ascii
field names. The bodies of header fields are allowed to contain ; Extending ctext in RFC 5322, Section 3.2.2
UTF-8 characters, but the header field names themselves must contain comment = "(" *([FWS] uCcontent) [FWS] ")"
only ASCII characters.
The following rules extend the corresponding rules in [RFC5322] and word = uAtom / uQuoted-String
[RFC5234] in order to allow additional UTF-8 characters.
VCHAR =/ UTF8-non-ascii This means that all the [RFC5322] constructs that build upon these
will permit UTF-8 characters, including comments and quoted strings.
We do not change the syntax of <atext> in order to allow UTF-8
characters in <addr-spec>. This would also allow UTF-8 characters in
<message-id>, which is not allowed due to the limitation described in
Section 4.5. Instead, <uAtext> is added to meet this requirement.
ctext =/ UTF8-enhancement uText = %d1-9 / ; all UTF-8 characters except
%d11-12 / ; US-ASCII NUL, CR, and LF
%d14-127 /
UTF8-non-ascii
atext =/ UTF8-enhancement uQuoted-Pair = ("\" (VCHAR / WSP / UTF8-non-ascii )) / obs-qp
qtext =/ UTF8-enhancement VCHAR = <See appendix B.1 of RFC 5234>
TENTATIVE (DCrocker): WSP = <See appendix B.1 of RFC 5234>
text =/ UTF8-enhancement
; note that this upgrades the body to UTF-8
{{ how to add IDN to this? }} obs-qp = <See Section 4.1 of RFC 5322>
domain = dot-atom / domain-literal / obs-domain
This means that all the [RFC5322] constructs that build upon these uQcontent = uQtext / uQuoted-Pair
will permit UTF-8 characters, including comments and quoted strings.
<field-name> [RFC5322] has the rule <field-name> which specifies DQUOTE = <See appendix B.1 of RFC 5234>
permissible names for user-defined header fields. The current
specification defines no changes to that rule.
<msg-id> This ABNF enables Message-ID strings to be full UTF-8. uCcontent = ctext / uQuoted-Pair / comment
However the specification directs that Message-ID strings
SHOULD be restricted to ASCII.
4. Message Labeling uQtext = qtext / UTF8-non-ascii
For clarity and convenience, a message SHOULD contain an explicit uAtext = ALPHA / DIGIT /
label indicating the character base it uses. This section defines a "!" / "#" / ; Any character except
new header field for this label: "$" / "%" / ; controls, SP, and specials.
fields =/ msg-character "&" / "'" / ; Used for atoms.
msg-character = "MSG-Char:" ( "ASCII" / "UTF-8" ) CRLF "*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~" /
UTF8-non-ascii
5. MIME Enhancement uAtom = [CFWS] 1*uAtext [CFWS]
5.1. Content-Transfer-Encoding uDot-Atom = [CFWS] uDot-Atom-text [CFWS]
The default "Content-Transfer-Encoding: is 8BIT" and is assumed if uDot-Atom-text = 1*uAtext *("." 1*uAtext)
the Content-Transfer-Encoding header field is not present.
5.2. MIME Header Field To allow the use of UTF-8 in a Content-Description header field
[RFC2045], the following syntax is used:
MIME contains at least one header field that is intended for user description = "Content-Description" ":" *uText
display, namely Content-Description. This section specifies UTF-8 ; Replace description in RFC 2045, Section 8
enhancements to MIME header fields, as defined in [RFC2045]. ABNF
rules used in this section is taken from that specification and the
ABNF specification.
The enhanced ABNF rules are: The <uText> syntax is extended above to allow UTF-8 in all
text =/ UTF8-non-ascii <description> header fields.
5.3. Content-Type: message/utf8-rfc822 Note, however, this does not remove any constraint on the character
set of protocol elements; for instance, all the allowed values for
timezone in the "Date:" header fields are still expressed in ASCII.
Likewise, none of this revised syntax changes what is allowed in a
<msg-id>, which remains pure ASCII.
The type message/utf-rfc822 is similar to message/rfc822. However it 4.4. Change on addr-spec Syntax
specifies that characters are interpreted as UTF-8 rather than being
limited to ASCII. Internationalized email addresses are represented in UTF-8. Thus,
all header fields containing <mailbox>es are updated from [RFC5321]
Section 4.1.2 to permit UTF-8 addresses.
mailbox = name-addr / addr-spec / uAddr-Spec
; Replace mailbox in RFC 5322, Section 3.4
angle-addr =/ [CFWS] "<" uAddr-Spec ">" [CFWS]
; Extending angle-addr in RFC 5322, Section 3.4
uAddr-Spec = uLocal-Part "@" uDomain
uLocal-Part = uDot-Atom / uQuoted-String / obs-local-part
; Replace Local-Part in RFC 5322, Section 3.4.1
uQuoted-String = [CFWS] DQUOTE *([FWS] uQcontent) [FWS] DQUOTE [CFWS]
obs-local-part = <See Section 4.4 of RFC 5322>
uDomain = uDot-Atom / domain-literal / obs-domain
domain-literal = <See Section 3.4.1 of RFC 5322>
Below are a few examples of possible <mailbox> representations.
"DISPLAY_NAME" <ASCII@ASCII>
; traditional mailbox format
"DISPLAY_NAME" <non-ASCII@non-ASCII>
; message will be rejected if UTF8SMTP extension is not supported
<non-ASCII@non-ASCII>
; without DISPLAY_NAME and quoted string
; message will be rejected if UTF8SMTP extension is not supported
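A deliberately simplified recognizer for <uAddr-Spec> can be sketched
in Python as follows. It is not a full implementation of the grammar:
it assumes the address has already been decoded from UTF-8 to a text
string, it accepts only the <uDot-Atom-text> form on both sides of
the "@", and it ignores <CFWS>, quoted strings, domain literals, and
the obsolete forms. The name is_simple_uaddr_spec is hypothetical.

```python
import re

# uAtext: the ASCII atext characters plus any non-ASCII code point
# (standing in for <UTF8-non-ascii> once the octets are decoded).
_UATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~\u0080-\U0010FFFF]"
# uDot-Atom-text = 1*uAtext *("." 1*uAtext)
_UDOT_ATOM_TEXT = rf"{_UATEXT}+(?:\.{_UATEXT}+)*"
# Simplified uAddr-Spec = uDot-Atom-text "@" uDot-Atom-text
_UADDR_SPEC = re.compile(rf"{_UDOT_ATOM_TEXT}@{_UDOT_ATOM_TEXT}")


def is_simple_uaddr_spec(addr):
    """True if addr matches the simplified uAddr-Spec form above."""
    return _UADDR_SPEC.fullmatch(addr) is not None
```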
4.5. Trace Field Syntax
The 'uFor' clause in "Received:" fields allows the use of
internationalized addresses in "For" fields. It is described in
[I-D.ietf-eai-rfc5336bis], Section 3.6.3.
The "Return-path" designates the address to which messages indicating
non-delivery or other mail system failures are to be sent. Thus, the
header is augmented to carry UTF-8 addresses (see the revised syntax
of <angle-addr> in Section 4.4 of this document). This will not
break the rule of trace field integrity, because the header field is
added by the last MTA, as described in [RFC5321].
The <received-token> syntax of the "Received:" field (described in
Section 3.6.7 of [RFC5322]) is augmented to allow UTF-8 email
addresses in the "For" field. <angle-addr> is augmented to include
UTF-8 email addresses. In order to allow UTF-8 email addresses in an
<addr-spec>, <uAddr-Spec> is added to <received-token>.
received-token =/ uAddr-Spec
4.6. message/global
Internationalized messages MUST only be transmitted as authorized by
[I-D.ietf-eai-rfc5336bis] or within a non-SMTP environment which
supports these messages. A message is a "message/global message" if
o it contains UTF-8 header values as specified in this document, or
o it contains UTF-8 values in the header fields of body parts.
The type message/global is similar to message/rfc822, except that it
specifies that a message can contain UTF-8 characters in the headers
of the message or body parts. If this type is sent to a 7-bit-only
system, it has to be encoded in MIME [RFC2045]. (Note that a system
compliant with MIME that doesn't recognize message/global SHOULD
treat it as "application/octet-stream" as described in Section 5.2.4
of [RFC2046].)
Type name: message Type name: message
Subtype name: utf8-rfc822 Subtype name: global
Required parameters: none Required parameters: none
Optional parameters: none Optional parameters: none
Encoding considerations: Any content-transfer-encoding is permitted. Encoding considerations: Any content-transfer-encoding is permitted.
The 8-bit or binary content-transfer-encodings are recommended The 8-bit or binary content-transfer-encodings are recommended
where permitted. where permitted.
Security considerations: See Section 6. Security considerations: See Section 5.
Interoperability considerations: The media type provides Interoperability considerations: The media type provides
functionality similar to the message/rfc822 content type for email functionality similar to the message/rfc822 content type for email
messages with international email headers. When there is a need messages with international email headers. When there is a need
to embed or return such content in another message, there is to embed or return such content in another message, there is
generally an option to use this media type and leave the content generally an option to use this media type and leave the content
unchanged or down-convert the content to message/rfc822. Both of unchanged or down-convert the content to message/rfc822. Both of
these choices will interoperate with the installed base, but with these choices will interoperate with the installed base, but with
different properties. Systems unaware of internationalized different properties. Systems unaware of internationalized
headers will typically treat a message/utf8-rfc822 body part as an headers will typically treat a message/global body part as an
unknown attachment, while they will understand the structure of a unknown attachment, while they will understand the structure of a
message/rfc822. However, systems that understand message/ message/rfc822. However, systems that understand message/global
utf8-rfc822 will provide functionality superior to the result of a will provide functionality superior to the result of a down-
down-conversion to message/rfc822. The most interoperable choice conversion to message/rfc822. The most interoperable choice
depends on the deployed software. depends on the deployed software.
Published specification: RFC XXXX Published specification: RFC XXXX
Applications that use this media type: SMTP servers and email Applications that use this media type: SMTP servers and email
clients that support multipart/report generation or parsing. clients that support multipart/report generation or parsing.
Email clients which forward messages with international headers as Email clients which forward messages with international headers as
attachments. attachments.
Additional information: Additional information:
Macintosh file type code(s): A uniform type identifier (UTI) of Macintosh file type code(s): A uniform type identifier (UTI) of
"public.utf8-email-message" is suggested. This conforms to "public.utf8-email-message" is suggested. This conforms to
"public.message" and "public.composite-content", but does not "public.message" and "public.composite-content", but does not
necessarily conform to "public.utf8-plain-text". necessarily conform to "public.utf8-plain-text".
Person & email address to contact for further information: See the Person & email address to contact for further information: See the
Author's Address section of this document. Author's Address section of this document.
Intended usage: COMMON Intended usage: COMMON
Restrictions on usage: This is a structured media type which embeds Restrictions on usage: This is a structured media type which embeds
other MIME media types. The 8-bit or binary content-transfer- other MIME media types. The 8-bit or binary content-transfer-
encoding SHOULD be used unless this media type is sent over a encoding SHOULD be used unless this media type is sent over a
7-bit-only transport. 7-bit-only transport.
Author: See the Author's Address section of this document. Author: See the Author's Address section of this document.
Change controller: IETF Standards Process Change controller: IETF Standards Process
6. Security Considerations 5. Security Considerations
If a user has a mailbox address in UTF-8 and a mailbox address in If a user has a non-ASCII mailbox address and an ASCII mailbox
ASCII, a digital certificate that identifies that user might have address, a digital certificate that identifies that user may have
both addresses in the identity. Having multiple email addresses as both addresses in the identity. Having multiple email addresses as
identities in a single certificate is already supported in PKIX identities in a single certificate is already supported in PKIX
(Public Key Infrastructure for X.509 Certificates) [RFC5280] and (Public Key Infrastructure for X.509 Certificates) [RFC5280] and
OpenPGP [RFC3156]. OpenPGP [RFC3156].
Because UTF-8 often requires several octets to encode a single Because UTF-8 often requires several octets to encode a single
character, internationalized local parts and header values may cause character, internationalized local parts and header values may cause
mail addresses to become longer. As specified in [RFC5322], each mail addresses to become longer. As specified in [RFC5322], each
line of characters MUST be no more than 998 octets, excluding the CRLF. line of characters MUST be no more than 998 octets, excluding the CRLF.
On the other hand, MDA (Mail Delivery Agent) processes that parse, On the other hand, MDA (Mail Delivery Agent) processes that parse,
store, or handle email addresses or local parts must take extra care store, or handle email addresses or local parts must take extra care
not to overflow buffers, truncate addresses, or exceed storage not to overflow buffers, truncate addresses, or exceed storage
allotments. Also, they must take care, when comparing, to use the allotments. Also, they must take care, when comparing, to use the
entire lengths of the addresses. entire lengths of the addresses.
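The length concerns above can be illustrated with a minimal Python sketch. It is not part of either draft version; the names octet_length, overlong_lines, and same_address are hypothetical, and the comparison assumes that no normalization or case folding is wanted, only that the full encoded length of both addresses takes part in the comparison.

```python
MAX_LINE_OCTETS = 998  # RFC 5322 line limit, excluding the trailing CRLF


def octet_length(text: str) -> int:
    """Length of text in octets once encoded as UTF-8 (not in characters)."""
    return len(text.encode("utf-8"))


def overlong_lines(raw_header: bytes, limit: int = MAX_LINE_OCTETS) -> list:
    """Return 1-based indexes of CRLF-delimited lines longer than limit octets."""
    lines = raw_header.split(b"\r\n")
    return [i for i, line in enumerate(lines, start=1) if len(line) > limit]


def same_address(a: str, b: str) -> bool:
    """Compare two addresses over their entire encoded length.

    Assumption: a plain octet-for-octet comparison with no normalization,
    so a truncated copy of an address never matches the full address.
    """
    return a.encode("utf-8") == b.encode("utf-8")
```

A buffer sized by character count can be too small for the same string measured in octets: octet_length("josé") is 5 although the string has 4 characters, so storage and line-length checks need to be made on the encoded form.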
The security impact of UTF-8 headers on email signature systems such The security impact of UTF-8 headers on email signature systems such
as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
discussed in [I-D.eai-frmwrk-4952bis], Section 14. discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14.
7. IANA Considerations 6. IANA Considerations
IANA is requested to update the registration of the message/ IANA is requested to update the registration of the message/global
utf8-rfc822 MIME type using the registration form contained in MIME type using the registration form contained in Section 4.6.
Section 5.3.
8. Acknowledgements 7. Acknowledgements
This document incorporates many ideas first described in Internet- This document incorporates many ideas first described in Internet-
Draft form by Paul Hoffman, although many details have changed from Draft form by Paul Hoffman, although many details have changed from
that earlier work. that earlier work.
The author especially thanks Jeff Yeh for his efforts and The author especially thanks Jeff Yeh for his efforts and
contributions on editing previous versions. contributions on editing previous versions.
Most of the content of this document is provided by John C Klensin. Most of the content of this document is provided by John C Klensin.
Also, some significant comments and suggestions were received from Also, some significant comments and suggestions were received from
Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris
Newman, Yangwoo Ko, Yoshiro Yoneya, and other members of the JET team Newman, Yangwoo Ko, Yoshiro Yoneya, and other members of the JET team
(Joint Engineering Team) and were incorporated into the document. (Joint Engineering Team) and were incorporated into the document.
The editor sincerely thanks them for their contributions. The editor sincerely thanks them for their contributions.
9. References 8. Edit history
9.1. Normative References [[RFC Editor: please remove this section before publishing.]]
[ASCII] "Coded Character Set -- 7-bit American 8.1. draft-ietf-eai-rfc5335bis-00
Standard Code for Information Interchange",
ANSI X3.4, 1986.
[I-D.eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and 1. Applied Errata suggested by Alfred Hoenes.
Framework for Internationalized Email",
draft-ietf-eai-frmwrk-4952bis-10 (work in
progress), September 2010.
[Latin] Unicode Consortium, "C0 Controls and Basic 2. Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322].
Latin",
http://unicode.org/charts/PDF/U0000.pdf,
2010.
[NFC] Davis, M. and K. Whistler, "Unicode 3. Abrogate <alt-address> in ABNF of <angle-addr>.
Standard Annex #15: Unicode Normalization
Forms", September 2010,
<http://www.unicode.org/reports/tr15/>.
[RFC2119] Bradner, S., "Key words for use in RFCs to 4. Revoke [RFC5504] from this document.
Indicate Requirement Levels", BCP 14,
RFC 2119, March 1997.
[RFC3629] Yergeau, F., "UTF-8, a transformation 5. Upgrade some references from I-Ds to RFC.
format of ISO 10646", STD 63, RFC 3629,
November 2003.
[RFC5198] Klensin, J. and M. Padlipsky, "Unicode 8.2. draft-ietf-eai-rfc5335bis-01
Format for Network Interchange", RFC 5198,
March 2008.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF 1. Author name revised.
for Syntax Specifications: ABNF", STD 68,
RFC 5234, January 2008.
[RFC5322] Resnick, P., Ed., "Internet Message 8.3. draft-ietf-eai-rfc5335bis-02
Format", RFC 5322, October 2008.
[RFC5598] Crocker, D., "Internet Mail Architecture", 1. ABNF revised.
RFC 5598, July 2009.
[Unicode] Unicode Consortium, "Unicode 6.0 Character 8.4. draft-ietf-eai-rfc5335bis-03
Code Charts", http://unicode.org/charts/,
2010.
9.2. Informative References 1. Fix typos
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose 2. ABNF revised
Internet Mail Extensions (MIME) Part One:
Format of Internet Message Bodies",
RFC 2045, November 1996.
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose 3. Improve sentence
Internet Mail Extensions (MIME) Part Two:
Media Types", RFC 2046, November 1996.
[RFC3156] Elkins, M., Del Torto, D., Levien, R., and 8.5. draft-ietf-eai-rfc5335bis-04
T. Roessler, "MIME Security with OpenPGP",
RFC 3156, August 2001.
[RFC5280] Cooper, D., Santesson, S., Farrell, S., 1. improve sentences and ABNF revised based on AD and Co-chairs
Boeyen, S., Housley, R., and W. Polk,
"Internet X.509 Public Key Infrastructure
Certificate and Certificate Revocation List
(CRL) Profile", RFC 5280, May 2008.
Appendix A. Changes to support UTF-8 8.6. draft-ietf-eai-rfc5335bis-05
This section provides a basic audit of the places in a message that 1. ABNF revised in Section 4.4 based on AD comments
now can permit UTF-8 rather than being restricted to ASCII, based on
the changes to underlying ABNF. The audit ignores rule for
"obsolete" constructs in RFC 5322. (This is a first cut and the list
is likely incomplete):
VCHAR: quoted-pair, unstructured 8.7. draft-ietf-eai-rfc5335bis-06
> ccontent, qcontent 1. ABNF revised
> comment, quoted-string 2. improve Section 6
> word, local-part 8.8. draft-ietf-eai-rfc5335bis-07
> phrase 1. Minor ABNF revised in Section 4.3
> display-name, keywords 2. improve Section 6
ctext: ccontent > comment 8.9. draft-ietf-eai-rfc5335bis-09
atext: atom, dot-atom-text Version -08 was posted in error and withdrawn. Version 09 is
identical to version 07 except for a date change, addition of this
note, and some vertical spacing compression on this page.
qtext: qcontent > quoted-string 9. References
9.1. Normative References
[I-D.ietf-eai-5378bis] Resnick, P., Newman, C., and S. Shen,
"IMAP Support for UTF-8",
draft-ietf-eai-5378bis-00 (work in
progress), November 2010.
[I-D.ietf-eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and
Framework for Internationalized
Email",
draft-ietf-eai-frmwrk-4952bis-10 (work
in progress), September 2010.
[I-D.ietf-eai-rfc5336bis] Yao, J. and W. MAO, "SMTP Extension
for Internationalized Email Address",
draft-ietf-eai-rfc5336bis-07 (work in
progress), December 2010.
[I-D.ietf-eai-rfc5721bis] Gellens, R., Newman, C., Yao, J., and
K. Fujiwara, "POP3 Support for UTF-8",
draft-ietf-eai-rfc5721bis-00 (work in
progress), September 2010.
[NFC] Davis, M. and K. Whistler, "Unicode
Standard Annex #15: Unicode
Normalization Forms", September 2010,
<http://www.unicode.org/reports/
tr15/>.
[RFC2119] Bradner, S., "Key words for use in
RFCs to Indicate Requirement Levels",
BCP 14, RFC 2119, March 1997.
[RFC3629] Yergeau, F., "UTF-8, a transformation
format of ISO 10646", STD 63,
RFC 3629, November 2003.
[RFC5198] Klensin, J. and M. Padlipsky, "Unicode
Format for Network Interchange",
RFC 5198, March 2008.
[RFC5234] Crocker, D. and P. Overell, "Augmented
BNF for Syntax Specifications: ABNF",
STD 68, RFC 5234, January 2008.
[RFC5321] Klensin, J., "Simple Mail Transfer
Protocol", RFC 5321, October 2008.
[RFC5322] Resnick, P., Ed., "Internet Message
Format", RFC 5322, October 2008.
9.2. Informative References
[RFC1652] Klensin, J., Freed, N., Rose, M.,
Stefferud, E., and D. Crocker, "SMTP
Service Extension for 8bit-
MIMEtransport", RFC 1652, July 1994.
[RFC2045] Freed, N. and N. Borenstein,
"Multipurpose Internet Mail Extensions
(MIME) Part One: Format of Internet
Message Bodies", RFC 2045,
November 1996.
[RFC2046] Freed, N. and N. Borenstein,
"Multipurpose Internet Mail Extensions
(MIME) Part Two: Media Types",
RFC 2046, November 1996.
[RFC2047] Moore, K., "MIME (Multipurpose
Internet Mail Extensions) Part Three:
Message Header Extensions for Non-
ASCII Text", RFC 2047, November 1996.
[RFC3156] Elkins, M., Del Torto, D., Levien, R.,
and T. Roessler, "MIME Security with
OpenPGP", RFC 3156, August 2001.
[RFC5280] Cooper, D., Santesson, S., Farrell,
S., Boeyen, S., Housley, R., and W.
Polk, "Internet X.509 Public Key
Infrastructure Certificate and
Certificate Revocation List (CRL)
Profile", RFC 5280, May 2008.
Authors' Addresses Authors' Addresses
Abel Yang Abel Yang
TWNIC TWNIC
4F-2, No. 9, Sec 2, Roosevelt Rd. 4F-2, No. 9, Sec 2, Roosevelt Rd.
Taipei, 100 Taipei, 100
Taiwan Taiwan
Phone: +886 2 23411313 ext 505 Phone: +886 2 23411313 ext 505
EMail: abelyang@twnic.net.tw EMail: abelyang@twnic.net.tw
Shawn Steele Shawn Steele
Microsoft Microsoft
EMail: Shawn.Steele@microsoft.com EMail: Shawn.Steele@microsoft.com
D. Crocker
Brandenburg InternetWorking
675 Spruce Dr.
Sunnyvale
USA
Phone: +1.408.246.8253
EMail: dcrocker@bbiw.net
URI: http://bbiw.net
Ned Freed
Oracle
800 Royal Oaks
Monrovia, CA 91016-6347
USA
EMail: ned.freed@mrochek.com
 End of changes. 103 change blocks. 
246 lines changed or deleted 425 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/