Email Address Internationalization                               A. Yang
(EAI)                                                              TWNIC
Internet-Draft                                                 S. Steele
Obsoletes: 5335 (if approved)                                  Microsoft
Updates: 2045,5322 (if approved)                                N. Freed
Intended status: Standards Track                                  Oracle
Expires: January 11, 2012                                  July 10, 2011

                    Internationalized Email Headers
                      draft-ietf-eai-rfc5335bis-11

Abstract

   Internet mail was originally limited to 7-bit ASCII.  MIME added
   support for the use of 8-bit character sets in body parts, and also
   defined an encoded-word construct so other character sets could be
   used in certain header field values.  But full internationalization
   of electronic mail requires additional enhancements to allow the use
   of Unicode, including characters outside the ASCII repertoire, in
   mail addresses as well as direct use of Unicode in header fields like
   From:, To:, and Subject:, without requiring the use of complex
   encoded-word constructs.  This document specifies an enhancement to
   the Internet Message Format that allows use of Unicode in mail
   addresses and most header field content.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 11, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology Used In This Specification
   3.  Changes to Message Header Fields
     3.1.  UTF-8 Syntax and Normalization
     3.2.  Syntax Extensions to RFC 5322
     3.3.  Changes to MIME Message Type Encoding Restrictions
     3.4.  The Message/global Media Type
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Edit history
     7.1.  draft-ietf-eai-rfc5335bis-00
     7.2.  draft-ietf-eai-rfc5335bis-01
     7.3.  draft-ietf-eai-rfc5335bis-02
     7.4.  draft-ietf-eai-rfc5335bis-03
     7.5.  draft-ietf-eai-rfc5335bis-04
     7.6.  draft-ietf-eai-rfc5335bis-05
     7.7.  draft-ietf-eai-rfc5335bis-06
     7.8.  draft-ietf-eai-rfc5335bis-07
     7.9.  draft-ietf-eai-rfc5335bis-09
     7.10. draft-ietf-eai-rfc5335bis-10
     7.11. draft-ietf-eai-rfc5335bis-11
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Changes to support UTF-8

1.  Introduction

   Internet mail distinguishes a message from its transport and further
   divides a message between a header and a body [RFC5598].  Internet
   mail header field values contain a variety of strings that are
   intended to be user-visible.  The range of supported characters for
   these strings was originally limited to 7-bit [ASCII].  MIME
   [RFC2045] [RFC2046] [RFC2047] provides the ability to use additional
   character sets, but this support is limited to body part data and to
   special encoded-word constructs that were only allowed in a limited
   number of places in header field values.

   Globalization of the Internet requires support of the much larger set
   of characters provided by Unicode [RFC5198] in both mail addresses
   and most header field values.  Additionally, complex encoding schemes
   like encoded-words introduce inefficiencies as well as significant
   opportunities for processing errors.  And finally, native support for
   the UTF-8 charset is now available on most systems.  Hence it is
   strongly desirable for Internet mail to support UTF-8 [RFC3629]
   directly.

   This document specifies an enhancement to the Internet Message Format
   [RFC5322] and to MIME that permits the direct use of UTF-8, rather
   than only ASCII, in header field values, including mail addresses.  A
   new media type, message/global, is defined for messages that use this
   extended format.  This specification also lifts the MIME restriction
   on having non-identity content-transfer-encodings on any subtype of
   the message top-level type so that message/global parts can be safely
   transmitted across existing mail infrastructure.

   This specification is based on a model of native, end-to-end support
   for UTF-8, which depends on having an "8-bit clean" environment
   assured by the transport system.  Support for carriage across legacy,
   7-bit infrastructure and for processing by 7-bit receivers requires
   additional mechanisms that are not provided by these specifications.

2.  Terminology Used In This Specification

   A plain ASCII string is fully compatible with [RFC5321] and
   [RFC5322].  In this document, non-ASCII strings are UTF-8 strings if
   they are in header field values which contain at least one
   <UTF8-non-ascii> (see Section 3.1).

   Unless otherwise noted, all terms used here are defined in [RFC5321],
   [RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or
   [I-D.ietf-eai-rfc5336bis].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The term "8-bit" means octets are present in the data with values
   above 0x7F.

3.  Changes to Message Header Fields

   To permit Unicode characters in field values, the header definition
   in [RFC5322] is extended to support the new format.  The following
   sections specify the necessary changes to RFC 5322's ABNF.

   The syntax rules not mentioned below remain defined as in [RFC5322].

   Note that this protocol does not change the RFC 5322 rules for
   defining header field names.  The bodies of header fields are
   allowed to contain Unicode characters, but the header field names
   themselves must contain only ASCII characters.

   Also note that messages in this format require the use of the
   UTF8SMTPbis extension [I-D.ietf-eai-rfc5336bis] to be transferred
   via SMTP.

3.1.  UTF-8 Syntax and Normalization

   UTF-8 characters can be defined in terms of octets using the
   following ABNF [RFC5234], taken from [RFC3629]:

   UTF8-non-ascii  =   UTF8-2 / UTF8-3 / UTF8-4

   UTF8-2          =   <Defined in Section 4 of RFC3629>

   UTF8-3          =   <Defined in Section 4 of RFC3629>

   UTF8-4          =   <Defined in Section 4 of RFC3629>
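
   As an illustration of the rule above, the following Python sketch
   tests whether a header field value contains at least one
   <UTF8-non-ascii> sequence.  The function name is hypothetical and not
   part of this specification; the sketch relies on Python's strict
   UTF-8 decoder to reject octet sequences that RFC 3629 does not allow.

```python
def contains_utf8_non_ascii(field_value: bytes) -> bool:
    """Return True if the value holds at least one UTF8-non-ascii sequence.

    Raises UnicodeDecodeError if the octets are not well-formed UTF-8.
    (Hypothetical helper, shown for illustration only.)
    """
    # Python's strict decoder rejects overlong forms and surrogates,
    # in line with the UTF8-2 / UTF8-3 / UTF8-4 productions of RFC 3629.
    field_value.decode("utf-8", errors="strict")
    # In well-formed UTF-8 every octet above 0x7F belongs to a
    # multi-octet sequence, so finding one such octet is sufficient.
    return any(octet > 0x7F for octet in field_value)
```

   A value for which this returns False is a plain ASCII string and
   needs no special handling.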

   See [RFC5198] for a discussion of Unicode normalization;
   normalization form [NFC] SHOULD be used.  One of the most often-cited
   goals of internationalization is to permit people to spell their
   names correctly.  Since many mailbox local parts reflect personal
   names, that principle applies to mailboxes as well.  The NFKC
   normalization form SHOULD NOT be used because it may lose information
   that is needed to correctly spell some names in some unusual
   circumstances.
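
   The difference between the two normalization forms can be seen in a
   short Python sketch using the standard unicodedata module.  The
   helper name and the sample strings are illustrative assumptions, not
   part of this specification.

```python
import unicodedata


def normalize_header_text(text: str) -> str:
    """Apply NFC, the normalization form this document says to use."""
    return unicodedata.normalize("NFC", text)


# 'e' followed by COMBINING ACUTE ACCENT composes to a single code point.
decomposed = "Jose\u0301"
composed = "Jos\u00e9"
nfc_result = normalize_header_text(decomposed)

# NFKC also folds compatibility characters, which discards information:
# LATIN SMALL LIGATURE FI (U+FB01) becomes the two letters "fi".
ligature = "\ufb01"
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
```

   NFC leaves the ligature intact while NFKC replaces it, which is the
   kind of information loss the text above warns about.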

3.2.  Syntax Extensions to RFC 5322

   The following rules extend the ABNF syntax defined in [RFC5322] and
   [RFC5234] in order to allow UTF-8 content.

   VCHAR   =/  UTF8-non-ascii

   ctext   =/  UTF8-non-ascii

   atext   =/  UTF8-non-ascii

   qtext   =/  UTF8-non-ascii

   text    =/  UTF8-non-ascii
                  ; note that this upgrades the body to UTF-8

   dtext   =/  UTF8-non-ascii

   A consequence of the change to the dtext rule is that UTF-8
   characters would then be allowed in the domain parts of message-ids
   as well as addresses.  This is unnecessary and undesirable, so three
   additional RFC 5322 rules are redefined and a new itext rule is
   added:

   id-left      =   dot-id-text

   id-right     =   dot-id-text / no-fold-literal

   dot-id-text  =   1*itext *("." 1*itext)

   itext        =   ALPHA / DIGIT /  ; Printable US-ASCII
                    "!" / "#" /      ;  characters not including
                    "$" / "%" /      ;  specials.  Used for msg-ids.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"

   This change also specifically disallows obsolete forms of message-ids
   that RFC 5322 allows.

   The preceding changes mean that the following constructs now allow
   UTF-8:

   1.  Unstructured text, used in header fields like Subject: or
       Content-description:.

   2.  Any construct that uses atoms, including but not limited to the
       local parts of addresses.  This includes addresses in the "for"
       clauses of Received: header fields.

   3.  Quoted strings.

   4.  Domains.  (But not in message-ids.)

   Note that header field names are not on this list; these are still
   restricted to ASCII.
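
   The contrast between the extended atext rule and the ASCII-only
   itext rule can be sketched with two regular expressions in Python.
   This is a simplified illustration under stated assumptions: the
   input is an already decoded string, any code point above U+007F
   stands in for <UTF8-non-ascii>, CFWS and the no-fold-literal
   alternative of id-right are not handled, and all names are
   hypothetical.

```python
import re

# Printable US-ASCII characters excluding specials (RFC 5322 atext).
_ASCII_ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"
# Stand-in for UTF8-non-ascii once the octets have been decoded.
_NON_ASCII = r"\u0080-\U0010FFFF"

# dot-id-text = 1*itext *("." 1*itext), with itext limited to ASCII.
_DOT_ID_TEXT = re.compile(
    r"[" + _ASCII_ATEXT + r"]+(?:\.[" + _ASCII_ATEXT + r"]+)*"
)
# dot-atom-text built from atext after "atext =/ UTF8-non-ascii".
_DOT_ATOM_TEXT = re.compile(
    r"[" + _NON_ASCII + _ASCII_ATEXT + r"]+"
    r"(?:\.[" + _NON_ASCII + _ASCII_ATEXT + r"]+)*"
)


def is_dot_id_text(value: str) -> bool:
    """True if value fits dot-id-text (usable in id-left / id-right)."""
    return _DOT_ID_TEXT.fullmatch(value) is not None


def is_dot_atom_text(value: str) -> bool:
    """True if value fits dot-atom-text with the extended atext."""
    return _DOT_ATOM_TEXT.fullmatch(value) is not None
```

   With these helpers a local part such as "josé.garcía" is accepted by
   is_dot_atom_text() but rejected by is_dot_id_text(), matching the
   intent that message-ids stay in ASCII.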

3.3.  Changes to MIME Message Type Encoding Restrictions

   This specification updates Section 6.4 of [RFC2045].  [RFC2045]
   prohibits applying a content-transfer-encoding to any subtypes of
   "message/".  This specification relaxes that rule -- it allows newly
   defined MIME types to permit content-transfer-encoding, and it allows
   content-transfer-encoding for message/global (see Section 3.4).

   Background: Normally, transfer of message/global will be done in
   8-bit-clean channels, and body parts will have "identity" encodings,
   that is, no decoding is necessary.

   But in the case where a message containing a message/global is
   downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding
   might have to be applied to the message; if the message travels
   multiple times between a 7-bit environment and an environment
   implementing these extensions, multiple levels of encoding may occur.
   This is expected to be rarely seen in practice, and the potential
   complexity of other ways of dealing with the issue is thought to be
   larger than the complexity of allowing nested encodings where
   necessary.
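
   The nested encoding case can be pictured with a deliberately
   simplified Python sketch.  It assumes base64 as the content-transfer-
   encoding, models only a bare MIME entity rather than a complete
   enclosing message, and uses hypothetical helper names; it is not a
   definitive implementation of any downgrade procedure.

```python
import base64


def wrap_for_7bit(entity: bytes) -> bytes:
    """Wrap octets as a base64-encoded message/global entity."""
    # encodebytes() emits lines ending in LF; mail uses CRLF.
    encoded = base64.encodebytes(entity).replace(b"\n", b"\r\n")
    return (
        b"Content-Type: message/global\r\n"
        b"Content-Transfer-Encoding: base64\r\n"
        b"\r\n" + encoded
    )


def unwrap(entity: bytes) -> bytes:
    """Undo one level of wrap_for_7bit()."""
    _headers, _separator, body = entity.partition(b"\r\n\r\n")
    # b64decode() ignores the CRLF line breaks by default.
    return base64.b64decode(body)


original = "Subject: Gr\u00fc\u00dfe\r\n\r\nbody\r\n".encode("utf-8")
once = wrap_for_7bit(original)   # first passage through a 7-bit hop
twice = wrap_for_7bit(once)      # second passage: nested encoding
```

   A receiver recovers the original octets by removing one encoding
   level per wrapper, here by calling unwrap() twice.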

3.4.  The Message/global Media Type

   Internationalized messages in this format MUST only be transmitted as
   authorized by [I-D.ietf-eai-rfc5336bis] or within a non-SMTP
   environment that supports these messages.  A message is a
   "message/global message" if:

   o  it contains 8-bit UTF-8 header values as specified in this
      document, or

   o  it contains 8-bit UTF-8 values in the header fields of body parts.

   The content of a message/global part is otherwise identical to that
   of a message/rfc822 part.

   If this type is sent to a 7-bit-only system, it has to have an
   appropriate content-transfer-encoding applied.  (Note that a system
   compliant with MIME that doesn't recognize message/global is supposed
   to treat it as "application/octet-stream" as described in Section
   5.2.4 of [RFC2046].)

   Type name:  message

   Subtype name:  global

   Required parameters:  none

   Optional parameters:  none

   Encoding considerations:  Any content-transfer-encoding is permitted.
      The 8-bit or binary content-transfer-encodings are recommended
      where permitted.

   Security considerations:  See Section 4.

   Interoperability considerations:  This media type provides
      functionality similar to the message/rfc822 content type for email
      messages with international email headers.  When there is a need
      to embed or return such content in another message, there is
      generally an option to use this media type and leave the content
      unchanged or down-convert the content to message/rfc822.  Both of
      these choices will interoperate with the installed base, but with
      different properties.  Systems unaware of internationalized
      headers will typically treat a message/global body part as an
      unknown attachment, while they will understand the structure of a
      message/rfc822.  However, systems that understand message/global
      will provide functionality superior to the result of a down-
      conversion to message/rfc822.  The most interoperable choice
      depends on the deployed software.

   Published specification:  RFC XXXX

   Applications that use this media type:  SMTP servers and email
      clients that support multipart/report generation or parsing.
      Email clients that forward messages with international headers as
      attachments.

   Additional information:

   Magic number(s):  none

   File extension(s):  The extension ".u8msg" is suggested.

   Macintosh file type code(s):  A uniform type identifier (UTI) of
      "public.utf8-email-message" is suggested.  This conforms to
      "public.message" and "public.composite-content", but does not
      necessarily conform to "public.utf8-plain-text".

   Person & email address to contact for further information:  See the
      Author's Address section of this document.

   Intended usage:  COMMON

   Restrictions on usage:  This is a structured media type that embeds
      other MIME media types.  The 8-bit or binary content-transfer-
      encoding SHOULD be used unless this media type is sent over a
      7-bit-only transport.

   Author:  See the Author's Address section of this document.

   Change controller:  IETF Standards Process

4.  Security Considerations

   Because UTF-8 often requires several octets to encode a single
   character, internationalization may cause header field values in
   general and mail addresses in particular to become longer.  As
   specified in [RFC5322], each line of characters MUST be no more than
   998 octets, excluding the CRLF.  In addition, MDA (Mail Delivery
   Agent) processes that parse, store, or handle email addresses or
   local parts must take extra care not to overflow buffers, truncate
   addresses, or exceed storage allotments.  Also, they must take care,
   when comparing, to use the entire lengths of the addresses.

   There are lots of ways of using UTF-8 to represent something
   equivalent or similar to a particular displayed character or group of
   characters.  This may allow filtering systems to be bypassed by using
   a slightly different character to avoid detection while still
   reaching the end user with largely the same intended deleterious
   effect.  The normalization process described in Section 3.1 is
   recommended to minimize this problem.
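
   Because the 998-octet limit counts octets rather than characters, a
   length check has to be made on the UTF-8 encoded form.  The Python
   sketch below illustrates this; the function name and the sample
   header are assumptions made for the example.

```python
MAX_LINE_OCTETS = 998  # RFC 5322 limit, excluding the CRLF


def overlong_lines(header_block: str) -> list:
    """Return the lines whose UTF-8 encoding exceeds 998 octets."""
    return [
        line
        for line in header_block.split("\r\n")
        # Measure the encoded form: one character may need several octets.
        if len(line.encode("utf-8")) > MAX_LINE_OCTETS
    ]


# 509 characters, but 1009 octets once encoded as UTF-8.
long_subject = "Subject: " + "\u00e9" * 500
short_to = "To: user@example.com"
flagged = overlong_lines(long_subject + "\r\n" + short_to)
```

   A check based on character counts would accept the Subject line in
   this example even though it exceeds the limit on the wire.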

   The security impact of UTF-8 headers on email signature systems such
   as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
   discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14.

   If a user has a non-ASCII mailbox address and an ASCII mailbox
   address, a digital certificate that identifies that user might have
   both addresses in the identity.  Having multiple email addresses as
   identities in a single certificate is already supported in PKIX
   (Public Key Infrastructure for X.509 Certificates) [RFC5280] and
   OpenPGP [RFC3156], but there may be user interface issues associated
   with the introduction of UTF-8 into addresses in this context.

5.  IANA Considerations

   IANA is requested to update the registration of the message/global
   MIME type using the registration form contained in Section 3.4.

6.  Acknowledgements

   This document incorporates many ideas first described in Internet-
   Draft form by Paul Hoffman, although many details have changed from
   that earlier work.

   The author especially thanks Jeff Yeh for his efforts and
   contributions on editing previous versions.

   Most of the content of this document was provided by John C Klensin
   and Dave Crocker.  Significant comments and suggestions were received
   from Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov,
   Chris Newman, Kristin Hubner, Yangwoo Ko, Yoshiro Yoneya, and other
   members of the JET team (Joint Engineering Team) and were
   incorporated into the document.  The editors wish to sincerely thank
   them all for their contributions.

9.

7.  Edit history

   [[RFC Editor: please remove this section before publishing.]]

9.1.

7.1.  draft-ietf-eai-rfc5335bis-00

   1.  Applied Errata suggested by Alfred Hoenes.

   2.  Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322].

   3.  Abrogate <alt-address> in ABNF of <angle-addr>.

   4.  Revoke [RFC5504] from this document.

   5.  Upgrade some references from I-Ds to RFC.

9.2.

7.2.  draft-ietf-eai-rfc5335bis-01

   1.  Author name revised.

9.3.

7.3.  draft-ietf-eai-rfc5335bis-02

   1.  ABNF revised.

9.4.

7.4.  draft-ietf-eai-rfc5335bis-03

   1.  Fix typos

   2.  ABNF revised

   3.  Improve sentence

9.5.

7.5.  draft-ietf-eai-rfc5335bis-04

   1.  improve sentences and ABNF revised based on AD and Co-chairs

9.6.

7.6.  draft-ietf-eai-rfc5335bis-05

   1.  ABNF revised in Section 5.4 based on AD comments

9.7.

7.7.  draft-ietf-eai-rfc5335bis-06

   1.  ABNF revised

   2.  improve Section 7

9.8. 5

7.8.  draft-ietf-eai-rfc5335bis-07

   1.  Minor ABNF revised in Section 5.3 3.2

   2.  improve Section 7

9.9. 5

7.9.  draft-ietf-eai-rfc5335bis-09

   Version -08 was posted in error and withdrawn.  Version 09 is is
   identical to version 07 except for a date change, addition of this
   note, and some vertical spacing compression on this page.

7.10.  draft-ietf-eai-rfc5335bis-10

   1.  Add appendix and overview of changes

   2.  Replace polls result in Abstract and Section 1

   3.  Minor Sentence modification

7.11.  draft-ietf-eai-rfc5335bis-11

   1.  Major rewrite of entire document to incorporate Dave Crocker's
       simplified ABNF.

   2.  The document has intentionally been refocused on implementors
       wishing to adapt their software to support EAI, so much of the
       explanatory and historical text has been removed.  (Some of it
       may be reintroduced later as an appendix.)

8.  References

8.1.  Normative References

   [ASCII]                        "Coded Character Set -- 7-bit American
                                  Standard Code for Information
                                  Interchange", ANSI X3.4, 1986.

   [I-D.ietf-eai-5378bis]         Resnick, P., Newman, C., and S. Shen,
                                  "IMAP Support for UTF-8",
                                  draft-ietf-eai-5378bis-00 (work in
                                  progress), November 2010.

   [I-D.ietf-eai-frmwrk-4952bis]  Klensin, J. and Y. Ko, "Overview and
                                  Framework for Internationalized
                                  Email",
                                  draft-ietf-eai-frmwrk-4952bis-10 (work
                                  in progress), September 2010.

   [I-D.ietf-eai-rfc5336bis]      Yao, J. and W. MAO, "SMTP Extension
                                  for Internationalized Email Address",
                                  draft-ietf-eai-rfc5336bis-07 (work in
                                  progress), December 2010.

   [I-D.ietf-eai-rfc5721bis]      Gellens, R., Newman, C., Yao, J., and
                                  K. Fujiwara, "POP3 Support for UTF-8",
                                  draft-ietf-eai-rfc5721bis-00 (work in
                                  progress), September 2010.

   [NFC]                          Davis, M. and K. Whistler, "Unicode
                                  Standard Annex #15: Unicode
                                  Normalization Forms", September 2010,
                                  <http://www.unicode.org/reports/
                                  tr15/>.

   [RFC2119]                      Bradner, S., "Key words for use in
                                  RFCs to Indicate Requirement Levels",
                                  BCP 14, RFC 2119, March 1997.

   [RFC3629]                      Yergeau, F., "UTF-8, a transformation
                                  format of ISO 10646", STD 63,
                                  RFC 3629, November 2003.

   [RFC5198]                      Klensin, J. and M. Padlipsky, "Unicode
                                  Format for Network Interchange",
                                  RFC 5198, March 2008.

   [RFC5234]                      Crocker, D. and P. Overell, "Augmented
                                  BNF for Syntax Specifications: ABNF",
                                  STD 68, RFC 5234, January 2008.

   [RFC5321]                      Klensin, J., "Simple Mail Transfer
                                  Protocol", RFC 5321, October 2008.

   [RFC5322]                      Resnick, P., Ed., "Internet Message
                                  Format", RFC 5322, October 2008.

   [RFC5598]                      Crocker, D., "Internet Mail
                                  Architecture", RFC 5598, July 2009.

8.2.  Informative References

   [RFC1652]                      Klensin, J., Freed, N., Rose, M.,
                                  Stefferud, E., and D. Crocker, "SMTP
                                  Service Extension for 8bit-
                                  MIMEtransport", RFC 1652, July 1994.

   [RFC2045]                      Freed, N. and N. Borenstein,
                                  "Multipurpose Internet Mail Extensions
                                  (MIME) Part One: Format of Internet
                                  Message Bodies", RFC 2045,
                                  November 1996.

   [RFC2046]                      Freed, N. and N. Borenstein,
                                  "Multipurpose Internet Mail Extensions
                                  (MIME) Part Two: Media Types",
                                  RFC 2046, November 1996.

   [RFC2047]                      Moore, K., "MIME (Multipurpose
                                  Internet Mail Extensions) Part Three:
                                  Message Header Extensions for Non-
                                  ASCII Text", RFC 2047, November 1996.

   [RFC3156]                      Elkins, M., Del Torto, D., Levien, R.,
                                  and T. Roessler, "MIME Security with
                                  OpenPGP", RFC 3156, August 2001.

   [RFC5280]                      Cooper, D., Santesson, S., Farrell,
                                  S., Boeyen, S., Housley, R., and W.
                                  Polk, "Internet X.509 Public Key
                                  Infrastructure Certificate and
                                  Certificate Revocation List (CRL)
                                  Profile", RFC 5280, May 2008.

   [RFC6152]                      Klensin, J., Freed, N., Rose, M., and
                                  D. Crocker, "SMTP Service Extension
                                  for 8-bit MIME Transport", STD 71,
                                  RFC 6152, March 2011.

Appendix A.  Changes to support UTF-8

   This section provides a basic audit of the places in a message that
   now can permit UTF-8 rather than being restricted to ASCII, based on
   the changes to underlying ABNF.  The audit ignores rules for
   "obsolete" constructs in [RFC5322].  (This is a first cut and the
   list is likely incomplete):

   VCHAR:   quoted-pair, unstructured

      > ccontent, qcontent

      > comment, quoted-string

      > word, local-part

      > phrase

      > display-name, keywords

   ctext:   ccontent > comment

   atext:   atom, dot-atom-text

   qtext:   qcontent > quoted-string
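
   As an illustration of the atext entry in the audit, the following is
   a minimal Python sketch, not part of this specification, of a check
   for an atom whose atext may also contain non-ASCII characters.  The
   names ASCII_ATEXT, is_extended_atext, and is_extended_atom are
   hypothetical.  Accepting every non-ASCII Unicode scalar value is a
   simplifying assumption; normalization and any further restrictions
   are ignored.

```python
import string

# ASCII atext as listed in RFC 5322, Section 3.2.3.
ASCII_ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_extended_atext(ch: str) -> bool:
    """Return True if ch is ASCII atext or a non-ASCII character.

    Assumption (not taken from the draft text): any non-ASCII Unicode
    scalar value is accepted.  Surrogate code points are rejected
    because they cannot be encoded as UTF-8.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    code = ord(ch)
    if 0xD800 <= code <= 0xDFFF:
        return False
    return ch in ASCII_ATEXT or code > 0x7F


def is_extended_atom(token: str) -> bool:
    """Return True if token is a non-empty run of extended atext."""
    return bool(token) and all(is_extended_atext(c) for c in token)
```

   Under these assumptions, is_extended_atom("user") and
   is_extended_atom("用户") both return True, while tokens containing a
   space, "@", or "." are rejected because those characters are not
   atext.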

Authors' Addresses

   Abel Yang
   TWNIC
   4F-2, No. 9, Sec 2, Roosevelt Rd.
   Taipei,   100
   Taiwan

   Phone: +886 2 23411313 ext 505
   EMail: abelyang@twnic.net.tw

   Shawn Steele
   Microsoft

   EMail: Shawn.Steele@microsoft.com

   Ned Freed
   Oracle
   800 Royal Oaks
   Monrovia, CA  91016-6347
   USA

   EMail: ned+ietf@mrochek.com