Network Working Group                                   A. Phillips, Ed.
Internet-Draft                                                    Lab126
Obsoletes: 4646 (if approved)                              M. Davis, Ed.
Intended status: BCP                                              Google
Expires: January 9, 2009                                    July 8, 2008


                     Tags for Identifying Languages
                       draft-ietf-ltru-4646bis-16

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 9, 2009.

Phillips & Davis         Expires January 9, 2009                [Page 1]


Internet-Draft                language-tags                    July 2008


Abstract

   This document describes the structure, content, construction, and
   semantics of language tags for use in cases where it is desirable to
   indicate the language used in an information object.  It also
   describes how to register values for use in language tags and the
   creation of user-defined extensions for private interchange.


Table of Contents

   1.  Introduction
   2.  The Language Tag
     2.1.  Syntax
       2.1.1.  Formatting of Language Tags
     2.2.  Language Subtag Sources and Interpretation
       2.2.1.  Primary Language Subtag
       2.2.2.  Extended Language Subtags
       2.2.3.  Script Subtag
       2.2.4.  Region Subtag
       2.2.5.  Variant Subtags
       2.2.6.  Extension Subtags
       2.2.7.  Private Use Subtags
       2.2.8.  Grandfathered Registrations
       2.2.9.  Classes of Conformance
   3.  Registry Format and Maintenance
     3.1.  Format of the IANA Language Subtag Registry
       3.1.1.  File Format
       3.1.2.  Record Definitions
       3.1.3.  Subtag and Tag Fields
       3.1.4.  Description Field
       3.1.5.  Deprecated Field
       3.1.6.  Preferred-Value Field
       3.1.7.  Prefix Field
       3.1.8.  Suppress-Script Field
       3.1.9.  Macrolanguage Field
       3.1.10. Scope Field
       3.1.11. Comments Field
     3.2.  Language Subtag Reviewer
     3.3.  Maintenance of the Registry
     3.4.  Stability of IANA Registry Entries
     3.5.  Registration Procedure for Subtags
     3.6.  Possibilities for Registration
     3.7.  Extensions and the Extensions Registry
     3.8.  Update of the Language Subtag Registry
   4.  Formation and Processing of Language Tags
     4.1.  Choice of Language Tag
       4.1.1.  Tagging Encompassed Languages
       4.1.2.  Using Extended Language Subtags
     4.2.  Meaning of the Language Tag
     4.3.  Lists of Languages
     4.4.  Length Considerations
       4.4.1.  Working with Limited Buffer Sizes
       4.4.2.  Truncation of Language Tags
     4.5.  Canonicalization of Language Tags
     4.6.  Considerations for Private Use Subtags
   5.  IANA Considerations
     5.1.  Language Subtag Registry
     5.2.  Extensions Registry
   6.  Security Considerations
   7.  Character Set Considerations
   8.  Changes from RFC 4646
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Examples of Language Tags (Informative)
   Appendix C.  Examples of Registration Forms
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   Human beings on our planet have, past and present, used a number of
   languages.  There are many reasons why one would want to identify the
   language used when presenting or requesting information.

   A user's language preferences often need to be identified so that
   appropriate processing can be applied.  For example, the user's
   language preferences in a Web browser can be used to select Web pages
   appropriately.  Language preferences can also be used to select among
   tools (such as dictionaries) to assist in the processing or
   understanding of content in different languages.

   In addition, knowledge about the particular language used by some
   piece of information content might be useful or even required by some
   types of processing; for example, spell-checking, computer-
   synthesized speech, Braille transcription, or high-quality print
   renderings.

   One means of indicating the language used is by labeling the
   information content with an identifier or "tag".  These tags can be
   used to specify user preferences when selecting information content,
   or for labeling additional attributes of content and associated
   resources.

   Tags can also be used to indicate additional language attributes of
   content.  For example, indicating specific information about the
   dialect, writing system, or orthography used in a document or
   resource may enable the user to obtain information in a form that
   they can understand, or it can be important in processing or
   rendering the given content into an appropriate form or style.

   This document specifies a particular identifier mechanism (the
   language tag) and a registration function for values to be used to
   form tags.  It also defines a mechanism for private use values and
   future extension.

   This document replaces [RFC4646], which replaced [RFC3066] and its
   predecessor [RFC1766].  For a list of changes in this document, see
   Section 8.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  The Language Tag

   Language tags are used to help identify languages, whether spoken,
   written, signed, or otherwise signaled, for the purpose of
   communication.  This includes constructed and artificial languages,
   but excludes languages not intended primarily for human
   communication, such as programming languages.

2.1.  Syntax

   The language tag is composed of one or more parts, known as
   "subtags".  Each subtag consists of a sequence of alphanumeric
   characters.  Subtags are distinguished and separated from one another
   by a hyphen ("-", ABNF [RFC5234] %x2D) and each subtag refines or
   narrows the range of language identified by the overall tag.

   Most subtags are distinguished by length, position in the tag, and
   content: each subtag's type can be recognized solely by these
   features.  This makes it possible to construct a parser that can
   extract and assign some semantic information to the subtags, even if
   the specific subtag values are not recognized.  Thus, a parser need
   not have a list of valid tags or subtags (that is, a copy of some
   version of the IANA Language Subtag Registry) in order to perform
   common searching and matching operations.  The only exceptions to
   this ability to infer meaning from subtag structure are the
   grandfathered tags shown in the productions 'regular' and 'irregular'
   below.  These tags were registered under RFC 3066 [RFC3066] and are a
   fixed list that can never change.

   The syntax of the language tag in ABNF [RFC5234] is:

 Language-Tag  = langtag             ; normal language tags
               / privateuse          ; private use tag
               / grandfathered       ; grandfathered tags

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

 language      = 2*3ALPHA            ; shortest ISO 639 code
                 ["-" extlang]       ; sometimes followed by
                                     ;   extended language subtags
               / 4ALPHA              ; or reserved for future use
               / 5*8ALPHA            ; or registered language subtag

 extlang       = 3ALPHA              ; selected ISO 639 codes
                 *2("-" 3ALPHA)      ; permanently reserved

 script        = 4ALPHA              ; ISO 15924 code

 region        = 2ALPHA              ; ISO 3166-1 code
               / 3DIGIT              ; UN M.49 code

 variant       = 5*8alphanum         ; registered variants
               / (DIGIT 3alphanum)

 extension     = singleton 1*("-" (2*8alphanum))

                                     ; Single alphanumerics
                                     ; "x" reserved for private use
 singleton     = %x41-57             ; A - W
               / %x59-5A             ; Y - Z
               / %x61-77             ; a - w
               / %x79-7A             ; y - z
               / DIGIT               ; 0 - 9

 privateuse    = "x" 1*("-" (1*8alphanum))

 grandfathered = irregular           ; non-redundant tags registered
               / regular             ;   during the RFC 3066 era

 irregular     = "en-GB-oed"         ; irregular tags do not match
               / "i-ami"             ; the 'langtag' production and
               / "i-bnn"             ; would not otherwise be
               / "i-default"         ; considered 'well-formed'
               / "i-enochian"        ; These tags are all valid,
               / "i-hak"             ; but most are deprecated
               / "i-klingon"         ; in favor of more modern
               / "i-lux"             ; subtags or subtag
               / "i-mingo"           ; combination
               / "i-navajo"
               / "i-pwn"
               / "i-tao"
               / "i-tay"
               / "i-tsu"
               / "sgn-BE-FR"
               / "sgn-BE-NL"
               / "sgn-CH-DE"

 regular       = "art-lojban"        ; these tags match the 'langtag'
               / "cel-gaulish"       ; production, but their subtags
               / "no-bok"            ; are not extended language
               / "no-nyn"            ; or variant subtags: their meaning
               / "zh-guoyu"          ; is defined by their registration
               / "zh-hakka"          ; and all of these are deprecated
               / "zh-min"            ; in favor of a more modern
               / "zh-min-nan"        ; subtag or sequence of subtags
               / "zh-xiang"

 alphanum      = (ALPHA / DIGIT)     ; letters and numbers

                        Figure 1: Language Tag ABNF

   All subtags have a maximum length of eight characters, and whitespace
   is not permitted in a language tag.  There is a subtlety in the ABNF
   production 'variant': variants starting with a digit MAY be four
   characters long, while those starting with a letter MUST be at least
   five characters long.  For examples of language tags, see Appendix B.
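
   As a non-normative illustration, the ABNF in Figure 1 can be
   transcribed into a Python regular expression, as sketched below.  The
   names LANGTAG_RE, PRIVATEUSE_RE, GRANDFATHERED, and is_well_formed are
   hypothetical.  The sketch checks only the syntax of Figure 1; it does
   not check subtags against the IANA registry or enforce any other rule
   stated elsewhere in this document.

```python
import re

# Transcription of the 'langtag' production in Figure 1.  Letter ranges are
# written as [A-Za-z] so that only US-ASCII letters can ever match.
LANGTAG_RE = re.compile(r"""
    (?: [A-Za-z]{2,3} (?:-[A-Za-z]{3}){0,3}          # language, optional extlang
      | [A-Za-z]{4}                                  # reserved for future use
      | [A-Za-z]{5,8} )                              # registered language
    (?:-[A-Za-z]{4})?                                # script
    (?:-(?:[A-Za-z]{2}|[0-9]{3}))?                   # region
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*   # variant
    (?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*    # extension
    (?:-[xX](?:-[A-Za-z0-9]{1,8})+)?                 # privateuse
    """, re.VERBOSE)

# A tag made only of private use subtags, such as "x-fr-CH".
PRIVATEUSE_RE = re.compile(r"[xX](?:-[A-Za-z0-9]{1,8})+")

# The 'irregular' and 'regular' productions, compared case-insensitively.
GRANDFATHERED = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
}


def is_well_formed(tag: str) -> bool:
    """Return True if tag matches the Language-Tag production."""
    if not tag.isascii():  # language tags are US-ASCII only
        return False
    if tag.lower() in GRANDFATHERED:
        return True
    return bool(LANGTAG_RE.fullmatch(tag) or PRIVATEUSE_RE.fullmatch(tag))
```

   Letter ranges are spelled out rather than relying on re.IGNORECASE
   because, for str patterns, Python's case-insensitive matching also
   lets a few non-ASCII letters (such as U+0130 and U+212A) match ASCII
   ranges.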

   Although [RFC5234] refers to octets, the language tags described in
   this document are sequences of characters from the US-ASCII [ISO646]
   repertoire.  Language tags MAY be used in documents and applications
   that use other encodings, so long as these encompass the relevant
   part of the US-ASCII repertoire.  An example of this would be an XML
   document that uses the UTF-16LE [RFC2781] encoding of [Unicode].

2.1.1.  Formatting of Language Tags

   Note Well: the ABNF syntax does not distinguish between upper and
   lowercase.  The appearance of upper and lowercase letters in the
   various ABNF productions above does not affect how implementations
   interpret tags.  That is, the tag "I-AMI" is the same tag as the item
   "i-ami" in the 'irregular' production.  At all times, the tags and
   their subtags, including private use and extensions, are to be
   treated as case insensitive: there exist conventions for the
   capitalization of some of the subtags, but these MUST NOT be taken to
   carry meaning.

   Some of the common conventions for capitalization of subtags include:

   o  [ISO639-1] recommends that language codes be written in lowercase
      ('mn' Mongolian).

   o  [ISO15924] recommends that script codes use lowercase with the
      initial letter capitalized ('Cyrl' Cyrillic).

   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
      Mongolia).

   However, in language tags the uppercase US-ASCII letters in the range
   'A' through 'Z' are always considered equivalent and mapped directly
   to their US-ASCII lowercase equivalents in the range 'a' through 'z'.
   Thus, the tag "mn-Cyrl-MN" is not distinct from "MN-cYRL-mn" or
   "mN-cYrL-Mn" (or any other combination), and each of these variations
   conveys the same meaning: Mongolian written in the Cyrillic script as
   used in Mongolia.

   Although case distinctions do not carry meaning in language tags,
   consistent formatting and presentation of the tags will aid users.
   The format of subtags in the registry is RECOMMENDED as the form to
   use in language tags.  This format generally corresponds to the
   common conventions for the various ISO standards from which the
   subtags are derived as listed above.  An implementation can reproduce
   this format without accessing the registry as follows: All subtags,
   including extension and private use subtags, use lowercase letters,
   with two exceptions: two-letter and four-letter subtags that neither
   appear at the start of the tag nor occur after singletons.  Such two-
   letter subtags are all uppercase (as in the tags "en-CA-x-ca" or
   "sgn-BE-FR") and four-letter subtags are titlecase (as in the tag
   "az-Latn-x-latn").

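   The casing convention above can be applied without the registry, as
   in the following Python sketch.  The function name canonical_case is
   hypothetical, the input is assumed to be a well-formed US-ASCII tag,
   and "occur after singletons" is read as "occur anywhere after the
   first singleton", which agrees with the examples above.

```python
def canonical_case(tag: str) -> str:
    """Recase a well-formed tag following the registry's conventions."""
    if not tag.isascii():
        raise ValueError("language tags are US-ASCII only")
    out = []
    after_singleton = False
    for position, sub in enumerate(tag.split("-")):
        if position == 0 or after_singleton:
            # First subtag, and everything after a singleton: lowercase.
            out.append(sub.lower())
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())         # e.g. region 'CA'
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub.capitalize())    # e.g. script 'Latn'
        else:
            out.append(sub.lower())         # e.g. variant '1996'
        if len(sub) == 1:
            after_singleton = True
    return "-".join(out)
```

   For example, canonical_case("EN-ca-X-CA") returns "en-CA-x-ca".
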
   Note: Case folding of ASCII letters in certain locales, unless
   carefully handled, sometimes produces non-ASCII character values.
   The Unicode Character Database file "SpecialCasing.txt" defines the
   specific cases that are known to cause problems with this.  In
   particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
   uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
   Implementers SHOULD specify a locale-neutral casing operation to
   ensure that case folding of subtags does not produce this value,
   which is illegal in language tags.  For example, if one were to
   uppercase the region subtag 'in' using Turkish locale rules, the
   sequence U+0130 U+004E would result instead of the expected 'IN'.
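
   One way to obtain a locale-neutral casing operation is to restrict
   the mapping to the 26 US-ASCII letters, as in this Python sketch (the
   helper names are hypothetical).  Python's str.upper() and str.lower()
   do not depend on the process locale, but they apply full Unicode case
   mappings to non-ASCII input; the explicit tables below leave every
   character outside 'a'-'z' and 'A'-'Z' unchanged.

```python
import string

# Tables that map only the 26 US-ASCII letters; every other character,
# including U+0130 and U+0131, passes through unchanged.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(s: str) -> str:
    """Uppercase 'a'-'z' only, independent of any locale."""
    return s.translate(_ASCII_UPPER)


def ascii_lower(s: str) -> str:
    """Lowercase 'A'-'Z' only, independent of any locale."""
    return s.translate(_ASCII_LOWER)


print(ascii_upper("in"))  # prints IN, never U+0130 U+004E
```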

2.2.  Language Subtag Sources and Interpretation

   The namespace of language tags and their subtags is administered by
   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
   the rules in Section 5 of this document.  The Language Subtag
   Registry maintained by IANA is the source for valid subtags: other
   standards referenced in this section provide the source material for
   that registry.

   Terminology used in this document:

   o  "Tag" refers to a complete language tag, such as "sr-Latn-RS" or
      "az-Arab-IR".  Examples of tags in this document are enclosed in
      double-quotes ("en-US").

   o  "Subtag" refers to a specific section of a tag, delimited by
      hyphens, such as the subtag 'Hant' in "zh-Hant-CN".  Examples of
      subtags in this document are enclosed in single quotes ('Hant').

   o  "Code" refers to values defined in external standards (and which
      are used as subtags in this document).  For example, 'Hant' is an
      [ISO15924] script code that was used to define the 'Hant' script
      subtag for use in a language tag.  Examples of codes in this
      document are enclosed in single quotes ('en', 'Hant').

   The definitions in this section apply to the various subtags within
   the language tags defined by this document, excepting those
   "grandfathered" tags defined in Section 2.2.8.

   Language tags are designed so that each subtag type has unique length
   and content restrictions.  These make identification of the subtag's
   type possible, even if the content of the subtag itself is
   unrecognized.  This allows tags to be parsed and processed without
   reference to the latest version of the underlying standards or the
   IANA registry and makes the associated exception handling when
   parsing tags simpler.
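
   The following Python sketch shows how length, position, and content
   alone identify each subtag's type in a tag that matches the 'langtag'
   production.  The function classify is hypothetical; it assumes the
   tag has already been checked against the ABNF in Section 2.1, does
   not handle grandfathered tags, and never consults the registry.

```python
from typing import List, Tuple


def classify(tag: str) -> List[Tuple[str, str]]:
    """Label each subtag of a 'langtag' with its type (registry not used).

    Assumes the tag already matches the ABNF, so isalpha()/isdigit() only
    ever see US-ASCII characters.
    """
    parts = tag.split("-")
    labels = [(parts[0], "language")]
    i = 1
    # extlang: up to three 3-letter subtags, only after a 2-3 letter language.
    if len(parts[0]) <= 3:
        while (i <= 3 and i < len(parts)
               and len(parts[i]) == 3 and parts[i].isalpha()):
            labels.append((parts[i], "extlang"))
            i += 1
    # script: exactly four letters.
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        labels.append((parts[i], "script"))
        i += 1
    # region: two letters or three digits.
    if i < len(parts) and ((len(parts[i]) == 2 and parts[i].isalpha())
                           or (len(parts[i]) == 3 and parts[i].isdigit())):
        labels.append((parts[i], "region"))
        i += 1
    # variant: every remaining multi-character subtag before a singleton.
    while i < len(parts) and len(parts[i]) > 1:
        labels.append((parts[i], "variant"))
        i += 1
    # From the first singleton on: extension or private use sequences.
    kind = ""
    for sub in parts[i:]:
        if kind != "privateuse" and len(sub) == 1:
            kind = "privateuse" if sub.lower() == "x" else "extension"
            labels.append((sub, "singleton"))
        else:
            labels.append((sub, kind))
    return labels
```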

   Subtags in the IANA registry that do not come from an underlying
   standard can only appear in specific positions in a tag.
   Specifically, they can only occur as primary language subtags or as
   variant subtags.

   Note that sequences of private use and extension subtags MUST occur
   at the end of the sequence of subtags and MUST NOT be interspersed
   with subtags defined elsewhere in this document.

   Single-letter and single-digit subtags are reserved for current or
   future use.  These include the following current uses:

   o  The single-letter subtag 'x' is reserved to introduce a sequence
      of private use subtags.  The interpretation of any private use
      subtags is defined solely by private agreement and is not defined
      by the rules in this section or in any standard or registry
      defined in this document.

   o  All other single-letter subtags are reserved to introduce
      standardized extension subtag sequences as described in
      Section 3.7.

   o  The single-letter subtag 'i' is used by some grandfathered tags,
      such as "i-default", where it always appears in the first position
      and cannot be confused with an extension.
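
   Because extension and private use sequences always come last, they
   can be separated from the rest of a tag by scanning for singletons
   after the first position.  The Python function split_tail below is a
   hypothetical sketch that assumes a well-formed tag; a leading 'i', as
   in "i-default", stays part of the tag and is not read as an
   extension.

```python
from typing import Dict, List, Tuple


def split_tail(tag: str) -> Tuple[str, Dict[str, List[str]], List[str]]:
    """Split a well-formed tag into (head, extensions, private use subtags)."""
    parts = tag.split("-")
    if parts[0].lower() == "x":
        # The whole tag is private use, e.g. "x-fr-CH".
        return "", {}, parts[1:]
    head = [parts[0]]            # position 0 is never an extension singleton
    extensions: Dict[str, List[str]] = {}
    current = None               # subtag list of the extension being read
    for i in range(1, len(parts)):
        sub = parts[i]
        if len(sub) == 1:
            singleton = sub.lower()
            if singleton == "x":
                # 'x' ends the tag: everything after it is private use.
                return "-".join(head), extensions, parts[i + 1:]
            current = extensions.setdefault(singleton, [])
        elif current is None:
            head.append(sub)
        else:
            current.append(sub)
    return "-".join(head), extensions, []
```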

2.2.1.  Primary Language Subtag

   The primary language subtag is the first subtag in a language tag
   (with the exception of private use and certain grandfathered tags)
   and cannot be omitted.  The following rules apply to the primary
   language subtag:

   1.  All two-character primary language subtags were defined in the
       IANA registry according to the assignments found in the standard
       "ISO 639-1:2002, Codes for the representation of names of
       languages -- Part 1: Alpha-2 code" [ISO639-1], or using
       assignments subsequently made by the ISO 639-1 registration
       authority (RA) or governing standardization bodies.

   2.  All three-character primary language subtags in the IANA registry
       were defined according to the assignments found in one of these
       additional ISO 639 parts or assignments subsequently made by the
       relevant ISO 639 registration authorities or governing
       standardization bodies:

       A.  "ISO 639-2:1998 - Codes for the representation of names of
           languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2]

       B.  "ISO 639-3:2007 - Codes for the representation of names of
           languages -- Part 3: Alpha-3 code for comprehensive coverage
           of languages" [ISO639-3]

       C.  "ISO 639-5:2008 - Codes for the representation of names of
           languages -- Part 5: Alpha-3 code for language families and
           groups" [ISO639-5]

   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
       private use in language tags.  These subtags correspond to codes
       reserved by ISO 639-2 for private use.  These codes MAY be used
       for non-registered primary language subtags (instead of using
       private use subtags following 'x-').  Please refer to Section 4.6
       for more information on private use subtags.

   4.  All four-character language subtags are reserved for possible
       future standardization.

   5.  All language subtags of 5 to 8 characters in length in the IANA
       registry were defined via the registration process in Section 3.5
       and MAY be used to form the primary language subtag.  At the time
       this document was created, there were no examples of this kind of
       subtag and future registrations of this type will be discouraged:
       primary languages are strongly RECOMMENDED for registration with
       ISO 639, and proposals rejected by ISO 639/RA-JAC will be closely
       scrutinized before they are registered with IANA.

   6.  The single-character subtag 'x' as the primary subtag indicates
       that the language tag consists solely of subtags whose meaning is
       defined by private agreement.  For example, in the tag "x-fr-CH",
       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
       French language or the country of Switzerland (or any other value
       in the IANA registry) unless there is a private agreement in
       place to do so.  See Section 4.6.

   7.  The single-character subtag 'i' is used by some grandfathered
       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
       grandfathered tags have a primary language subtag in their first
       position.)

   8.  Other values MUST NOT be assigned to the primary subtag except by
       revision or update of this document.
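
   Rules 1 through 8 amount to a decision on the length and content of
   the first subtag, sketched in Python below.  The function name and
   the returned labels are hypothetical; a label such as "ISO 639-1"
   says where a registered subtag of that shape comes from, not whether
   a particular value is actually present in the registry.

```python
def primary_kind(subtag: str) -> str:
    """Describe a primary language subtag using only its length and content."""
    s = subtag.lower()
    if s == "x":
        return "private use tag"               # rule 6
    if s == "i":
        return "grandfathered"                 # rule 7
    if not (s.isascii() and s.isalpha()):
        return "invalid"
    if len(s) == 2:
        return "ISO 639-1"                     # rule 1
    if len(s) == 3:
        if "qaa" <= s <= "qtz":
            return "private use"               # rule 3
        return "ISO 639-2/639-3/639-5"         # rule 2
    if len(s) == 4:
        return "reserved"                      # rule 4
    if 5 <= len(s) <= 8:
        return "registered (5-8 letters)"      # rule 5
    return "invalid"                           # rule 8: nothing else allowed
```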

   Note: For languages that have both an ISO 639-1 two-character code
   and a three-character code (assigned by ISO 639-2, ISO 639-3, or ISO
   639-5), only the ISO 639-1 two-character code is defined in the IANA
   registry.

   Note: For languages that have no ISO 639-1 two-character code and for
   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
   (Bibliographic) codes differ, only the Terminology code is defined in
   the IANA registry.  At the time this document was created, all
   languages that had both kinds of three-character code were also
   assigned a two-character code; it is expected that future assignments
   of this nature will not occur.

   Note: To avoid problems with versioning and subtag choice such as
   those experienced during the transition between RFC 1766 and RFC
   3066, and to preserve the canonical nature of subtags defined by this
   document, the ISO 639 Registration Authority Joint Advisory Committee
   (ISO 639/RA-JAC) has included the following statement in
   [iso639.prin]:

      "A language code already in ISO 639-2 at the point of freezing ISO
      639-1 shall not later be added to ISO 639-1.  This is to ensure
      consistency in usage over time, since users are directed in
      Internet applications to employ the alpha-3 code when an alpha-2
      code for that language is not available."

   In order to avoid instability in the canonical form of tags, if a
   two-character code is added to ISO 639-1 for a language for which a
   three-character code was already included in either ISO 639-2 or ISO
   639-3, the two-character code MUST NOT be registered.  See
   Section 3.4.

   For example, if some content were tagged with 'haw' (Hawaiian), which
   currently has no two-character code, the tag would not be invalidated
   if ISO 639-1 were to assign a two-character code to the Hawaiian
   language at a later date.

   Note: As an example of independent primary language subtag
   registration, consider the grandfathered IANA registration
   "i-enochian".  The subtag 'enochian' could be registered in the IANA
   registry as a primary language subtag (assuming that ISO 639 does not
   register this language first), making tags such as "enochian-AQ" and
   "enochian-Latn" valid.

2.2.2.  Extended Language Subtags

   Extended language subtags are used to identify certain specially-
   selected languages that, for various historical reasons, are closely
   identified with an existing primary language subtag.  Extended
   language subtags are always used with their enclosing primary
   language subtag (indicated with a 'Prefix' field in the registry)
   when used to form the language tag.  All languages that have an
   extended language subtag in the registry also have an identical
   primary language subtag record in the registry.  This primary
   language subtag is RECOMMENDED for forming the language tag.  The
   following rules apply to the extended language subtags:

   1.  Extended language subtags consist solely of three-letter subtags.
       All extended language subtag records in the registry were defined
       according to the assignments found in [ISO639-3].  Language
       collections and groupings, such as those defined in [ISO639-5],
       are specifically excluded from being extended language subtags.

   2.  Extended language subtag records MUST include exactly one
       'Prefix' field indicating an appropriate subtag or sequence of
       subtags for that extended language subtag.

   3.  Extended language subtag records MUST include 'Preferred-Value'
       and 'Deprecated' fields.  The 'Preferred-Value' and 'Subtag'
       fields MUST be identical.

   4.  Although the ABNF production 'extlang' permits up to three
       extended language subtags in the language tag, extended language
       subtags MUST NOT include another extended language subtag in
       their Prefix.  That is, the second and third extended language
       subtag positions in a language tag are permanently reserved and
       tags that include subtags in that position are invalid.

   For example, the macrolanguage Chinese ('zh') encompasses a number of
   languages.  For compatibility reasons, each of these languages has
   both a primary and extended language subtag in the registry.  Some
   examples of these include Gan Chinese ('gan'), Cantonese Chinese
   ('yue') and Mandarin Chinese ('cmn').  Each is encompassed by the
   macrolanguage 'zh' (Chinese).  Therefore, they each have the prefix
   "zh" in their registry records.  Thus Gan Chinese is represented with
   tags beginning "zh-gan" or "gan"; Cantonese with tags beginning
   either "yue" or "zh-yue"; and Mandarin Chinese with "zh-cmn" or
   "cmn".  The language subtag 'zh' can still be used without an
   extended language subtag to label a resource as some unspecified
   variety of Chinese, while the primary language subtag ('gan', 'yue',
   'cmn') is preferred to using the extended language form ("zh-gan",
   "zh-yue", "zh-cmn").

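   Since the primary language subtag is RECOMMENDED over the extended
   language form, a tool might rewrite "zh-yue" as "yue".  The Python
   sketch below does this using a small, hypothetical excerpt of
   extended language records (subtag mapped to its 'Prefix'); a real
   implementation would read these records from the registry.

```python
# Hypothetical excerpt of registry data: extended language subtag -> 'Prefix'.
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary(tag: str) -> str:
    """Drop the prefix when the second subtag is an extended language
    subtag whose 'Prefix' matches the first subtag."""
    parts = tag.split("-")
    if len(parts) >= 2:
        extlang = parts[1].lower()
        if EXTLANG_PREFIX.get(extlang) == parts[0].lower():
            return "-".join([extlang] + parts[2:])
    return tag
```

   With this data, prefer_primary("zh-yue-HK") returns "yue-HK", while
   "zh-Hant" is returned unchanged because 'Hant' is not an extended
   language subtag.
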
2.2.3.  Script Subtag

   Script subtags are used to indicate the script or writing system
   variations that distinguish the written forms of a language or its
   dialects.  The following rules apply to the script subtags:

   1.  Script subtags MUST follow any primary and extended language
       subtags and MUST precede any other type of subtag.

   2.  All four-character subtags were defined according to
       [ISO15924]--"Codes for the representation of the names of
       scripts": alpha-4 script codes, or subsequently assigned by the
       ISO 15924 registration authority or governing standardization
       bodies, denoting the script or writing system used in conjunction
       with this language.

   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
       use in language tags.  These subtags correspond to codes reserved
       by ISO 15924 for private use.  These codes MAY be used for non-
       registered script values.  Please refer to Section 4.6 for more
       information on private use subtags.

   4.  Script subtags MUST NOT be registered using the process in
       Section 3.5 of this document.  Variant subtags MAY be considered
       for registration for that purpose.

   5.  There MUST be at most one script subtag in a language tag, and
       the script subtag SHOULD be omitted when it adds no
       distinguishing value to the tag or when the primary language
       subtag's record includes a Suppress-Script field listing the
       applicable script subtag.

   Example: "sr-Latn" represents Serbian written using the Latin script.
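
   Rules 3 and 5 can be sketched in Python as follows.  SUPPRESS_SCRIPT
   is a hypothetical excerpt standing in for the registry's
   Suppress-Script fields, and drop_suppressed_script only looks for a
   script subtag in the second position, so it does not handle tags
   that contain an extended language subtag.

```python
# Hypothetical excerpt of registry data: language subtag -> Suppress-Script.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl"}


def is_private_use_script(subtag: str) -> bool:
    """True for the private use range 'Qaaa' through 'Qabx' (rule 3)."""
    s = subtag.capitalize()
    return len(s) == 4 and s.isascii() and s.isalpha() and "Qaaa" <= s <= "Qabx"


def drop_suppressed_script(tag: str) -> str:
    """Omit a script subtag that only repeats the Suppress-Script value."""
    parts = tag.split("-")
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(parts[0].lower(), "")
        if suppressed.lower() == parts[1].lower():
            del parts[1]
    return "-".join(parts)
```

   With this data, "en-Latn-US" becomes "en-US", while "sr-Latn" is
   unchanged because the excerpt lists no Suppress-Script value for
   'sr'.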

2.2.4.  Region Subtag

   Region subtags are used to indicate linguistic variations associated
   with or appropriate to a specific country, territory, or region.
   Typically, a region subtag is used to indicate regional dialects or
   usage, or region-specific spelling conventions.  A region subtag can
   also be used to indicate that content is expressed in a way that is
   appropriate for use throughout a region, for instance, Spanish
   content tailored to be useful throughout Latin America.

   The following rules apply to the region subtags:

   1.  Region subtags MUST follow any primary language, extended
       language, or script subtags and MUST precede any other type of
       subtag.

   2.  All two-character subtags following the primary subtag were
       defined in the IANA registry according to the assignments found
       in [ISO3166-1] ("Codes for the representation of names of
       countries and their subdivisions -- Part 1: Country codes") using
       the list of alpha-2 country codes, or using assignments
       subsequently made by the ISO 3166-1 maintenance agency or
       governing standardization bodies.  In addition, the codes that
       are "exceptionally reserved" (as opposed to "assigned") in ISO
       3166-1 were also defined in the registry, with the exception of
       'UK', which is an exact synonym for the assigned code 'GB'.

   3.  All three-character subtags consisting of digit (numeric)
       characters following the primary subtag were defined in the IANA
       registry according to the assignments found in UN Standard
       Country or Area Codes for Statistical Use [UN_M.49] or
       assignments subsequently made by the governing standards body.
       Note that not all of the UN M.49 codes are defined in the IANA