Email Address Internationalization                               A. Yang
(EAI)                                                            Y. Abel
Internet-Draft                                                     TWNIC
Obsoletes: 5335 (if approved)                                  S. Steele
Updates: 2045,5321,5322                                        Microsoft
(if approved)                                                 D. Crocker
Intended status: Standards Track             Brandenburg InternetWorking
Expires: September 3, 2011                                      N. Freed
                                                                  Oracle
                                                           March 2, 2011

                    Internationalized Email Headers
                      draft-ietf-eai-rfc5335bis-09

Abstract

   Internet mail was originally limited to 7-bit ASCII.  Recent
   enhancements support Unicode's UTF-8 encoding in portions of a
   message.

   Full internationalization of electronic mail requires more than the
   capabilities to transmit non-ASCII content, to encode selected
   information in specific header fields, and to use non-ASCII
   characters in envelope addresses.  It also requires being able to
   express those addresses, and the information based on them, in mail
   header fields.  This document specifies a variant of Internet mail
   that permits the use of Unicode encoded in UTF-8, rather than ASCII,
   as the base form for Internet email header fields.  This form is
   permitted in transmission only if authorized by an SMTP extension, as
   specified in an associated specification.  This specification updates
   Section 6.4 of [RFC2045] to conform with the requirements.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 3, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Role of This Specification
     1.2.  Relation to Other Standards
   2.  Background and History
   3.  Terminology
   4.  Changes on Message Header Fields
     4.1.  UTF-8 Syntax and Normalization
     4.2.  Changes on MIME Headers
     4.3.  Syntax Extensions to RFC 5322
     4.4.  Change on addr-spec Syntax
     4.5.  Trace Field Syntax
     4.6.  message/global
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  Edit history
     8.1.  draft-ietf-eai-rfc5335bis-00
     8.2.  draft-ietf-eai-rfc5335bis-01
     8.3.  draft-ietf-eai-rfc5335bis-02
     8.4.  draft-ietf-eai-rfc5335bis-03
     8.5.  draft-ietf-eai-rfc5335bis-04
     8.6.  draft-ietf-eai-rfc5335bis-05
     8.7.  draft-ietf-eai-rfc5335bis-06
     8.8.  draft-ietf-eai-rfc5335bis-07
     8.9.  draft-ietf-eai-rfc5335bis-09
   9.  References
     9.1.  Normative References
     9.2.  Informative References

1.  Introduction

   Internet mail distinguishes a message from its transport and further
   divides a message between a header and a body [RFC5598].  Internet
   mail header fields contain a variety of strings that are intended to
   be user-visible.  The range of supported characters for these strings
   was originally limited to a subset of [ASCII]; globalization of the
   Internet requires support of the much larger set contained in UTF-8
   [RFC5198].  Complex encoding alternatives to UTF-8, as an overlay to
   the existing ASCII base, would introduce inefficiencies as well as
   opportunities for processing errors.  Native support for UTF-8
   encoding [RFC3629] is widely available among systems now used over
   the Internet.  Hence supporting this encoding directly within email
   is desired.

1.1.  Role of This Specification

   Full internationalization of electronic mail requires several
   capabilities:

   o  The capability to transmit non-ASCII content, provided for as part
      of the basic MIME specification [RFC2045], [RFC2046].

   o  The capability to use international characters in envelope
      addresses, discussed in [I-D.ietf-eai-frmwrk-4952bis] and
      specified in [I-D.ietf-eai-rfc5336bis].

   o  The capability to express those addresses, and information related
      to them and based on them, in mail header fields, defined in this
      document.

   This document specifies a variant of Internet mail that permits the
   use of Unicode encoded in UTF-8 [RFC3629], rather than ASCII, as the
   base form for Internet email header fields.  This form is permitted
   in transmission, if authorized by the SMTP extension specified in
   [I-D.ietf-eai-rfc5336bis] or by other transport mechanisms capable of
   processing it.

1.2.  Relation to Other Standards

   This document updates Section 6.4 of [RFC2045].  It removes the
   blanket ban on applying a content-transfer-encoding to all subtypes
   of message/ and instead specifies that a composite subtype MAY
   specify whether or not a content-transfer-encoding can be used for
   that subtype, with "cannot be used" as the default.

   This document also updates Section 3.4 of [RFC5322].  It extends the
   mailbox address syntax to permit UTF-8 characters, as described in
   Section 4.3.

   Allowing use of a content-transfer-encoding on subtypes of messages
   is not limited to transmissions that are authorized by the SMTP
   extension specified in [I-D.ietf-eai-rfc5336bis].  message/global
   (see Section 4.6) permits use of a content-transfer-encoding.

2.  Background and History

   Mailbox names often represent the names of human users.  Many of
   these users throughout the world have names that are not normally
   expressed with just the ASCII repertoire of characters, and would
   like to use more or less their real names in their mailbox names.
   These users are also likely to use non-ASCII text in their display
   names and subjects of email messages, both received and sent.  This
   protocol specifies UTF-8 as the encoding to represent email header
   field bodies.

   The traditional format of email messages [RFC5322] allows only ASCII
   characters in the header fields of messages.  This prevents users
   from having email addresses that contain non-ASCII characters.  It
   further forces non-ASCII text in display names, comments, and in free
   text (such as in the "Subject:" field) to be encoded (as required by
   MIME format [RFC2047]).  This specification describes a change to the
   email message format that is related to the SMTP message transport
   change described in the associated documents
   [I-D.ietf-eai-frmwrk-4952bis] and [I-D.ietf-eai-rfc5336bis], and that
   allows non-ASCII characters in most email header fields.  These
   changes affect SMTP clients, SMTP servers, mail user agents (MUAs),
   list expanders, gateways to other media, and all other processes that
   parse or handle email messages.

   As specified in [I-D.ietf-eai-rfc5336bis], an SMTP protocol extension
   "UTF8SMTP" is used to prevent the transmission of messages with UTF-8
   header fields to systems that cannot handle such messages.

   Use of this SMTP extension helps prevent the introduction of such
   messages into message stores that might misinterpret, improperly
   display, or mangle such messages.  It should be noted that using an
   ESMTP extension does not prevent transferring email messages with
   UTF-8 header fields to other systems that use the email format for
   messages and that may not be upgraded, such as unextended POP and
   IMAP servers.  Changes to these protocols to handle UTF-8 header
   fields are addressed in [I-D.ietf-eai-rfc5721bis] and
   [I-D.ietf-eai-5378bis].

   The objective for this protocol is to allow UTF-8 in email header
   fields.

3.  Terminology

   A plain ASCII string is fully compatible with [RFC5321] and
   [RFC5322].  In this document, non-ASCII strings are UTF-8 strings,
   found in header fields, that contain at least one <UTF8-non-ascii>.

   Unless otherwise noted, all terms used here are defined in [RFC5321],
   [RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or
   [I-D.ietf-eai-rfc5336bis].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Syntax descriptions use Augmented BNF (ABNF) [RFC5234].

   Basic terms for this specification include:

      ASCII:   An encoding of Control Characters and Basic Latin that
         occupies 7-bits, per [ASCII].  Such a string is fully
         compatible with email as specified in [RFC5322].

      UTF-8:   An encoding of Unicode in 8-bit bytes, per [RFC3629].

4.  Changes on Message Header Fields

   SMTP clients can send header fields in UTF-8 format, if the UTF8SMTP
   extension is advertised by the SMTP server or is permitted by other
   transport mechanisms.

   This protocol does NOT change the [RFC5322] rules for defining header
   field names.  The bodies of header fields are allowed to contain
   UTF-8 characters, but the header field names themselves must contain
   only ASCII characters.

   To permit UTF-8 characters in field values, the header definition in
   [RFC5322] is extended to support the new format.  The following ABNF
   is defined to substitute those definitions in [RFC5322].

   The syntax rules not covered in this section remain as defined in
   [RFC5322].
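
   As a non-normative illustration of the rule above, the following
   Python sketch checks a single unfolded header line: the field name
   must consist of printable ASCII, while the field body may be any
   valid UTF-8.  The function name and its boolean return convention are
   assumptions made for this example only; they are not defined by this
   specification.

```python
def check_header_line(raw: bytes) -> bool:
    """Return True if an unfolded header line has an ASCII field name
    and a field body that decodes as UTF-8 (hypothetical helper)."""
    name, sep, body = raw.partition(b":")
    if not sep or not name:
        return False
    # RFC 5322 field-name: printable US-ASCII (33-126), excluding ":".
    if any(not (33 <= octet <= 126) for octet in name):
        return False
    try:
        # Python's decoder rejects overlong forms and surrogates,
        # matching the UTF-8 definition in RFC 3629.
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

   A header such as "Subject:" followed by UTF-8 text passes this check,
   whereas a field name containing a non-ASCII octet does not.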

4.1.  UTF-8 Syntax and Normalization

   UTF-8 characters can be defined in terms of octets using the
   following ABNF [RFC5234], taken from [RFC3629]:

   UTF8-non-ascii  =   UTF8-2 / UTF8-3 / UTF8-4

   UTF8-2          =   <See Section 4 of RFC3629>

   UTF8-3          =   <See Section 4 of RFC3629>

   UTF8-4          =   <See Section 4 of RFC3629>

   See [RFC5198] for a discussion of normalization; the use of
   normalization form [NFC] is RECOMMENDED.  One of the most often-cited
   goals of internationalization is to permit people to spell their
   names correctly.  Since many mailbox local parts reflect personal
   names, that principle applies to them as well.  NFKC is not
   recommended because it may lose information that is needed to
   correctly spell some names in unusual circumstances.
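
   The following Python sketch, offered as a non-normative illustration,
   shows one way to detect <UTF8-non-ascii> content and to apply NFC
   using the standard "unicodedata" module.  The helper names are
   hypothetical.  The last lines of the usage show the kind of
   information loss that makes NFKC unsuitable: it folds a ligature into
   two separate letters, while NFC leaves it intact.

```python
import unicodedata


def has_utf8_non_ascii(raw: bytes) -> bool:
    """True if raw is valid UTF-8 and contains at least one
    multi-octet (UTF8-2 / UTF8-3 / UTF8-4) sequence."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if invalid
    return any(ord(ch) > 0x7F for ch in text)


def to_nfc(text: str) -> str:
    """Apply normalization form NFC to a decoded string."""
    return unicodedata.normalize("NFC", text)


# "e" followed by a combining acute accent composes to one code point.
composed = to_nfc("e\u0301")
# NFC keeps the "fi" ligature (U+FB01); NFKC replaces it with "f" + "i".
ligature_nfc = to_nfc("\ufb01")
ligature_nfkc = unicodedata.normalize("NFKC", "\ufb01")
```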

4.2.  Changes on MIME Headers

   This specification updates Section 6.4 of [RFC2045].  [RFC2045]
   prohibits applying a content-transfer-encoding to any subtypes of
   "message/".  This specification relaxes the rule: it allows newly
   defined MIME types to permit content-transfer-encoding, and it allows
   content-transfer-encoding for message/global (see Section 4.6).

   Background: Normally, transfer of message/global will be done in
   8-bit-clean channels, and body parts will have "identity" encodings,
   that is, no decoding is necessary.  In the case where a message
   containing a message/global is downgraded from 8-bit to 7-bit as
   described in [RFC1652], an encoding may be applied to the message; if
   the message travels multiple times between a 7-bit environment and an
   environment implementing UTF8SMTP, multiple levels of encoding may
   occur.  This is expected to be rarely seen in practice, and the
   potential complexity of other ways of dealing with the issue is
   thought to be larger than the complexity of allowing nested encodings
   where necessary.

4.3.  Syntax Extensions to RFC 5322

   The following rules are intended to extend the corresponding rules in
   [RFC5322] in order to allow UTF-8 characters.

   FWS     =  <see Section 3.2.2 of RFC 5322>

   CFWS    =  <see Section 3.2.2 of RFC 5322>

   ctext   =/  UTF8-non-ascii
               ; Extending ctext in RFC 5322, Section 3.2.2

   comment =   "(" *([FWS] uCcontent) [FWS] ")"

   word    =   uAtom / uQuoted-String

   This means that all the [RFC5322] constructs that build upon these
   will permit UTF-8 characters, including comments and quoted strings.
   We do not change the syntax of <atext> to allow UTF-8 characters in
   <addr-spec>, because doing so would also allow UTF-8 characters in
   <message-id>, which is not allowed due to the limitation described in
   Section 4.5.  Instead, <uAtext> is added to meet this requirement.

   uText          = %d1-9 /    ; all UTF-8 characters except
                    %d11-12 /  ; US-ASCII NUL, CR, and LF
                    %d14-127 /
                    UTF8-non-ascii

   uQuoted-Pair   = ("\" (VCHAR / WSP / UTF8-non-ascii )) / obs-qp

   VCHAR          = <See appendix B.1 of RFC 5234>

   WSP            = <See appendix B.1 of RFC 5234>

   obs-qp         = <See Section 4.1 of RFC 5322>

   uQcontent      = uQtext / uQuoted-Pair

   DQUOTE         = <See appendix B.1 of RFC 5234>

   uCcontent      = ctext / uQuoted-Pair / comment

   uQtext         = qtext / UTF8-non-ascii

   uAtext         = ALPHA / DIGIT /
                    "!" / "#" /  ; Any character except
                    "$" / "%" /  ; controls, SP, and specials.
                    "&" / "'" /  ; Used for atoms.
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~" /
                    UTF8-non-ascii

   uAtom          = [CFWS] 1*uAtext [CFWS]

   uDot-Atom      = [CFWS] uDot-Atom-text [CFWS]

   uDot-Atom-text = 1*uAtext *("." 1*uAtext)
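
   As a non-normative aid, the <uAtext> and <uDot-Atom-text> rules above
   can be approximated with a regular expression over a decoded string.
   The sketch below is written in Python; it assumes the octets have
   already been decoded from UTF-8, so that any code point above U+007F
   stands in for <UTF8-non-ascii>.  The identifiers are hypothetical,
   and the surrounding [CFWS] of <uDot-Atom> is deliberately not
   handled.

```python
import re

# ASCII portion of uAtext: ALPHA / DIGIT plus the listed punctuation.
_ASCII_ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
# Code points above U+007F stand in for UTF8-non-ascii after decoding.
_UATEXT = "[" + _ASCII_ATEXT + "\u0080-\U0010FFFF" + "]"
# uDot-Atom-text = 1*uAtext *("." 1*uAtext)
_UDOT_ATOM_TEXT = re.compile(r"{u}+(?:\.{u}+)*".format(u=_UATEXT))


def is_udot_atom_text(text: str) -> bool:
    """True if text matches the uDot-Atom-text rule (sketch only)."""
    return _UDOT_ATOM_TEXT.fullmatch(text) is not None
```

   Under these assumptions, strings with empty components (for example,
   two consecutive dots), spaces, or specials such as "@" are rejected,
   while dot-separated runs of ASCII or non-ASCII atom characters are
   accepted.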

   To allow the use of UTF-8 in a Content-Description header field
   [RFC2045], the following syntax is used:

   description    = "Content-Description" ":" *uText
                    ; Replace description in RFC 2045, Section 8

   The <uText> syntax is extended above to allow UTF-8 in all
   <description> header fields.

   Note, however, that this does not remove any constraint on the
   character set of protocol elements; for instance, all the allowed
   values for timezone in the "Date:" header fields are still expressed
   in ASCII.  Likewise, none of this revised syntax changes what is
   allowed in a <msg-id>, which remains pure ASCII.

4.4.  Change on addr-spec Syntax

   Internationalized email addresses are represented in UTF-8.  Thus,
   all header fields containing <mailbox>es are updated from [RFC5321]
   Section 4.1.2 to permit UTF-8 addresses.

   mailbox        = name-addr / addr-spec / uAddr-Spec
                    ; Replace mailbox in RFC 5322, Section 3.4

   angle-addr     =/ [CFWS] "<" uAddr-Spec ">" [CFWS]
                    ; Extending angle-addr in RFC 5322, Section 3.4

   uAddr-Spec     = uLocal-Part "@" uDomain

   uLocal-Part    = uDot-Atom / uQuoted-String / obs-local-part
                    ; Replace Local-Part in RFC 5322, Section 3.4.1

   uQuoted-String = [CFWS] DQUOTE *([FWS] uQcontent) [FWS] DQUOTE [CFWS]

   obs-local-part = <See Section 4.4 of RFC 5322>

   uDomain        = uDot-Atom / domain-literal / obs-domain

   domain-literal = <See Section 3.4.1 of RFC 5322>

   Below are a few examples of possible <mailbox> representations.

      "DISPLAY_NAME" <ASCII@ASCII>
      ; traditional mailbox format

      "DISPLAY_NAME" <non-ASCII@non-ASCII>
      ; message will be rejected if UTF8SMTP extension is not supported

      <non-ASCII@non-ASCII>
      ; without DISPLAY_NAME and quoted string
      ; message will be rejected if UTF8SMTP extension is not supported
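
   The distinction drawn by the examples above can be sketched in a few
   lines of Python.  This is a non-normative illustration: the function
   name is hypothetical, the input is assumed to be a bare address
   without display name or angle brackets, and the check only looks for
   non-ASCII characters rather than validating the full <uAddr-Spec>
   grammar.

```python
def needs_utf8smtp(addr_spec: str) -> bool:
    """True if the address contains a non-ASCII character and can
    therefore only be carried where UTF8SMTP (or an equivalent
    transport) is available.  Sketch only; not a full parser."""
    local_part, sep, domain = addr_spec.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError("expected local-part@domain")
    return not addr_spec.isascii()
```

   A sender could use such a check to decide, before submission, whether
   a message can only be delivered over a UTF8SMTP-capable path.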

4.5.  Trace Field Syntax

   The 'uFor' clause in "Received:" fields permits the use of
   internationalized addresses in "For" fields.  It is described in
   [I-D.ietf-eai-rfc5336bis], Section 3.6.3.

   The "Return-path" designates the address to which messages indicating
   non-delivery or other mail system failures are to be sent.  Thus, the
   header is augmented to carry UTF-8 addresses (see the revised syntax
   of <angle-addr> in Section 4.4 of this document).  This will not
   break the rule of trace field integrity, because the header field is
   added at the last MTA, as described in [RFC5321].

   The <received-token> syntax of the "Received:" field (described in
   Section 3.6.7 of [RFC5322]) is augmented to allow UTF-8 email
   addresses in the "For" field.  <angle-addr> is augmented to include
   UTF-8 email addresses.  In order to allow UTF-8 email addresses in an
   <addr-spec>, <uAddr-Spec> is added to <received-token>.

   received-token =/ uAddr-Spec

4.6.  message/global

   Internationalized messages MUST only be transmitted as authorized by
   [I-D.ietf-eai-rfc5336bis] or within a non-SMTP environment which
   supports these messages.  A message is a "message/global message", if

   o  it contains UTF-8 header values as specified in this document, or

   o  it contains UTF-8 values in the header fields of body parts.

   The type message/global is similar to message/rfc822, except that it
   specifies that a message can contain UTF-8 characters in the headers
   of the message or body parts.  If this type is sent to a 7-bit-only
   system, it has to be encoded in MIME [RFC2045].  (Note that a system
   compliant with MIME that doesn't recognize message/global SHOULD
   treat it as "application/octet-stream" as described in Section 5.2.4
   of [RFC2046].)

   Type name:  message

   Subtype name:  global

   Required parameters:  none

   Optional parameters:  none

   Encoding considerations:  Any content-transfer-encoding is permitted.
      The 8-bit or binary content-transfer-encodings are recommended
      where permitted.

   Security considerations:  See Section 5.

   Interoperability considerations:  The media type provides
      functionality similar to the message/rfc822 content type for email
      messages with international email headers.  When there is a need
      to embed or return such content in another message, there is
      generally an option to use this media type and leave the content
      unchanged or down-convert the content to message/rfc822.  Both of
      these choices will interoperate with the installed base, but with
      different properties.  Systems unaware of internationalized
      headers will typically treat a message/global body part as an
      unknown attachment, while they will understand the structure of a
      message/rfc822.  However, systems that understand message/global
      will provide functionality superior to the result of a
      down-conversion to message/rfc822.  The most interoperable choice
      depends on the deployed software.

   Published specification:  RFC XXXX

   Applications that use this media type:  SMTP servers and email
      clients that support multipart/report generation or parsing.
      Email clients which forward messages with international headers as
      attachments.

   Additional information:

   Magic number(s):  none

   File extension(s):  The extension ".u8msg" is suggested.

   Macintosh file type code(s):  A uniform type identifier (UTI) of
      "public.utf8-email-message" is suggested.  This conforms to
      "public.message" and "public.composite-content", but does not
      necessarily conform to "public.utf8-plain-text".

   Person & email address to contact for further information:  See the
      Author's Address section of this document.

   Intended usage:  COMMON

   Restrictions on usage:  This is a structured media type which embeds
      other MIME media types.  The 8-bit or binary content-transfer-
      encoding SHOULD be used unless this media type is sent over a
      7-bit-only transport.

   Author:  See the Author's Address section of this document.

   Change controller:  IETF Standards Process

5.  Security Considerations

   If a user has a non-ASCII mailbox address and an ASCII mailbox
   address, a digital certificate that identifies that user may have
   both addresses in the identity.  Having multiple email addresses as
   identities in a single certificate is already supported in PKIX
   (Public Key Infrastructure for X.509 Certificates) [RFC5280] and
   OpenPGP [RFC3156].

   Because UTF-8 often requires several octets to encode a single
   character, internationalized local parts and header values may cause
   mail addresses to become longer.  As specified in [RFC5322], each
   line of characters MUST be no more than 998 octets, excluding the
   CRLF.
   On the other hand, MDA (Mail Delivery Agent) processes that parse,
   store, or handle email addresses or local parts must take extra care
   not to overflow buffers, truncate addresses, or exceed storage
   allotments.  Also, they must take care, when comparing, to use the
   entire lengths of the addresses.
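
   The gap between character count and octet count is easy to overlook.
   The following Python sketch is a non-normative illustration that
   measures lines in octets rather than characters; the constant and
   function names are hypothetical.  In the usage shown, a header line
   of 509 characters occupies 1009 octets once encoded as UTF-8 and
   therefore exceeds the limit.

```python
MAX_LINE_OCTETS = 998  # limit cited from RFC 5322, excluding CRLF


def overlong_lines(message: bytes):
    """Return the 1-based numbers of lines longer than 998 octets."""
    return [
        number
        for number, line in enumerate(message.split(b"\r\n"), start=1)
        if len(line) > MAX_LINE_OCTETS
    ]


# 9 ASCII characters plus 500 two-octet characters.
subject = "Subject: " + "é" * 500
message = subject.encode("utf-8") + b"\r\nTo: a@example.com\r\n"
too_long = overlong_lines(message)
```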

   The security impact of UTF-8 headers on email signature systems such
   as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
   discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14.

6.  IANA Considerations

   IANA is requested to update the registration of the message/global
   MIME type using the registration form contained in Section 4.6.

7.  Acknowledgements

   This document incorporates many ideas first described in Internet-
   Draft form by Paul Hoffman, although many details have changed from
   that earlier work.

   The author especially thanks Jeff Yeh for his efforts and
   contributions on editing previous versions.

   Most of the content of this document is provided by John C Klensin.
   Also, some significant comments and suggestions were received from
   Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris
   Newman, Yangwoo Ko, Yoshiro Yoneya, and other members of the JET team
   (Joint Engineering Team) and were incorporated into the document.
   The editor sincerely thanks them for their contributions.

8.  Edit history

   [[RFC Editor: please remove this section before publishing.]]

8.1.  draft-ietf-eai-rfc5335bis-00

   1.  Applied Errata suggested by Alfred Hoenes.

   2.  Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322].

   3.  Abrogate <alt-address> in ABNF of <angle-addr>.

   4.  Revoke [RFC5504] from this document.

   5.  Upgrade some references from I-Ds to RFC.

8.2.  draft-ietf-eai-rfc5335bis-01

   1.  Author name revised.

8.3.  draft-ietf-eai-rfc5335bis-02

   1.  ABNF revised.

8.4.  draft-ietf-eai-rfc5335bis-03

   1.  Fix typos

   2.  ABNF revised

   3.  Improve sentence

8.5.  draft-ietf-eai-rfc5335bis-04

   1.  improve sentences and ABNF revised based on AD and Co-chairs

8.6.  draft-ietf-eai-rfc5335bis-05

   1.  ABNF revised in Section 4.4 based on AD comments

8.7.  draft-ietf-eai-rfc5335bis-06

   1.  ABNF revised

   2.  improve Section 6

8.8.  draft-ietf-eai-rfc5335bis-07

   1.  Minor ABNF revised in Section 4.3

   2.  improve Section 6

8.9.  draft-ietf-eai-rfc5335bis-09

   Version -08 was posted in error and withdrawn.  Version -09 is
   identical to version -07 except for a date change, the addition of
   this note, and some vertical spacing compression on this page.

9.  References

9.1.  Normative References

   [ASCII]                   "Coded Character Set -- 7-bit American
                             Standard Code

   [I-D.ietf-eai-5378bis]         Resnick, P., Newman, C., and S. Shen,
                                  "IMAP Support for Information Interchange",
                             ANSI X3.4, 1986.

   [I-D.eai-frmwrk-4952bis] UTF-8",
                                  draft-ietf-eai-5378bis-00 (work in
                                  progress), November 2010.

   [I-D.ietf-eai-frmwrk-4952bis]  Klensin, J. and Y. Ko, "Overview and
                                  Framework for Internationalized
                                  Email",
                                  draft-ietf-eai-frmwrk-4952bis-10 (work
                                  in progress), September 2010.

   [Latin]                   Unicode Consortium, "C0 Controls and Basic
                             Latin",
                             http://unicode.org /charts/PDF/U0000.pdf,

   [I-D.ietf-eai-rfc5336bis]      Yao, J. and W. MAO, "SMTP Extension
                                  for Internationalized Email Address",
                                  draft-ietf-eai-rfc5336bis-07 (work in
                                  progress), December 2010.

   [I-D.ietf-eai-rfc5721bis]      Gellens, R., Newman, C., Yao, J., and
                                  K. Fujiwara, "POP3 Support for UTF-8",
                                  draft-ietf-eai-rfc5721bis-00 (work in
                                  progress), September 2010.

   [NFC]                          Davis, M. and K. Whistler, "Unicode
                                  Standard Annex #15: Unicode
                                  Normalization Forms", September 2010,
                                  <http://www.unicode.org/reports/tr15/>.

   [RFC2119]                      Bradner, S., "Key words for use in
                                  RFCs to Indicate Requirement Levels",
                                  BCP 14, RFC 2119, March 1997.

   [RFC3629]                      Yergeau, F., "UTF-8, a transformation
                                  format of ISO 10646", STD 63,
                                  RFC 3629, November 2003.

   [RFC5198]                      Klensin, J. and M. Padlipsky, "Unicode
                                  Format for Network Interchange",
                                  RFC 5198, March 2008.

   [RFC5234]                      Crocker, D. and P. Overell, "Augmented
                                  BNF for Syntax Specifications: ABNF",
                                  STD 68, RFC 5234, January 2008.

   [RFC5321]                      Klensin, J., "Simple Mail Transfer
                                  Protocol", RFC 5321, October 2008.

   [RFC5322]                      Resnick, P., Ed., "Internet Message
                                  Format", RFC 5322, October 2008.

   [RFC5598]                      Crocker, D., "Internet Mail
                                  Architecture", RFC 5598, July 2009.

   [Unicode]                      Unicode Consortium, "Unicode 6.0
                                  Character Code Charts",
                                  http://unicode.org/charts/, 2010.

9.2.  Informative References

   [RFC1652]                      Klensin, J., Freed, N., Rose, M.,
                                  Stefferud, E., and D. Crocker, "SMTP
                                  Service Extension for 8bit-
                                  MIMEtransport", RFC 1652, July 1994.

   [RFC2045]                      Freed, N. and N. Borenstein,
                                  "Multipurpose Internet Mail Extensions
                                  (MIME) Part One: Format of Internet
                                  Message Bodies", RFC 2045,
                                  November 1996.

   [RFC2046]                      Freed, N. and N. Borenstein,
                                  "Multipurpose Internet Mail Extensions
                                  (MIME) Part Two: Media Types",
                                  RFC 2046, November 1996.

   [RFC2047]                      Moore, K., "MIME (Multipurpose
                                  Internet Mail Extensions) Part Three:
                                  Message Header Extensions for Non-
                                  ASCII Text", RFC 2047, November 1996.

   [RFC3156]                      Elkins, M., Del Torto, D., Levien, R.,
                                  and T. Roessler, "MIME Security with
                                  OpenPGP", RFC 3156, August 2001.

   [RFC5280]                      Cooper, D., Santesson, S., Farrell,
                                  S., Boeyen, S., Housley, R., and W.
                                  Polk, "Internet X.509 Public Key
                                  Infrastructure Certificate and
                                  Certificate Revocation List (CRL)
                                  Profile", RFC 5280, May 2008.

Appendix A.  Changes to support UTF-8

   This section provides a basic audit of the places in a message that
   can now permit UTF-8 rather than being restricted to ASCII, based on
   the changes to the underlying ABNF.  The audit ignores the rules for
   "obsolete" constructs in RFC 5322.  This is a first cut, and the list
   is likely incomplete:

   VCHAR:   quoted-pair, unstructured

      > ccontent, qcontent

      > comment, quoted-string

      > word, local-part

      > phrase

      > display-name, keywords

   ctext:   ccontent > comment

   atext:   atom, dot-atom-text

   qtext:   qcontent > quoted-string

Authors' Addresses

   Abel Yang
   TWNIC
   4F-2, No. 9, Sec 2, Roosevelt Rd.
   Taipei,   100
   Taiwan

   Phone: +886 2 23411313 ext 505
   EMail: abelyang@twnic.net.tw

   Shawn Steele
   Microsoft

   EMail: Shawn.Steele@microsoft.com

   D. Crocker
   Brandenburg InternetWorking
   675 Spruce Dr.
   Sunnyvale
   USA

   Phone: +1.408.246.8253
   EMail: dcrocker@bbiw.net
   URI:   http://bbiw.net

   Ned Freed
   Oracle
   800 Royal Oaks
   Monrovia, CA  91016-6347
   USA

   EMail: ned.freed@mrochek.com