Internet-Draft                                       H. Alvestrand

draft-alvestrand-lang-tags-v2-01.txt                    EDB Maxware

Target Category: Standards Track

                                                        March 2000

Obsoletes: RFC 1766                        Expires: September 2000

Tags for the Identification of Languages

Status of this Memo

     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC 2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as
     Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as "work
     in progress."

This document describes a language tag for use in cases where it is
desired to indicate the language used in an information object.

It also defines a "Content-language:" header, for use in the case where
one desires to indicate the language of something that has RFC-822-like
headers, like MIME body parts or Web documents, and a new parameter to
the Multipart/Alternative type, to aid in the usage of the Content-
Language: header.

1. Introduction

There are a number of languages presently or previously spoken by human
beings in this world.

A great number of these people would prefer to have information
presented in a language which they understand.

In some contexts, it is possible to have information in more than one
language, or it might be possible to provide tools for assisting in the
understanding of a language (such as dictionaries).

A prerequisite for any such function is a means of labelling the
information content with an identifier for the language that is used in
this information content.

This document specifies an identifier mechanism, and one possible use
for it.

2. The Language tag

The language tag is composed of one or more parts: A primary language
tag and a (possibly empty) series of subtags.

The syntax of this tag in RFC-822 EBNF is:

 Language-Tag = Primary-tag *( "-" Subtag )
 Primary-tag = 1*8ALPHA
 Subtag = 1*8ALPHA

Whitespace is not allowed within the tag.
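The following is a non-normative Python sketch of a syntax check for the
Language-Tag production above. It assumes that ALPHA means only the
US-ASCII letters A-Z and a-z, as in RFC 822; the function name is
illustrative and not part of any registry or library.

```python
import re

# Language-Tag = Primary-tag *( "-" Subtag )
# Primary-tag  = 1*8ALPHA
# Subtag       = 1*8ALPHA
# Assumption: ALPHA is restricted to the US-ASCII letters (RFC 822).
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_valid_language_tag(tag: str) -> bool:
    """Return True if tag matches the Language-Tag syntax.

    fullmatch() must consume the whole string, so any whitespace,
    digit, empty subtag or subtag longer than 8 letters is rejected.
    """
    return _LANGUAGE_TAG.fullmatch(tag) is not None
```

For example, "no-nyn" and "i-klingon" are accepted, while "en US",
"en-" and "toolongtag" are rejected. Note that this syntax admits no
digits.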

All tags are to be treated as case insensitive; there exist conventions
for capitalization of some of them, but these should not be taken to
carry meaning. For instance, ISO 3166 recommends that country codes are
capitalized (MN Mongolia), while ISO 639 recommends that language codes
are written in lower case (mn Mongolian).

The namespace of language tags is administered by the IANA according to
the rules in section 4 of this document.

The following registrations are predefined:

In the primary language tag:

- All 2-letter tags are interpreted according to ISO standard 639,
  "Code for the representation of names of languages" [ISO 639].

- All 3-letter tags are interpreted according to ISO 639 part 2, "Codes
  for the representation of names of languages -- Part 2: Alpha-3
  code" [ISO 639-2].

- The value "i" is reserved for IANA-defined registrations.

- The value "x" is reserved for private use. Subtags of "x" will not be
  registered by the IANA.

- Other values shall not be assigned except by revisions of this
  standard.

The reason for reserving all other tags is to be open towards new
revisions of ISO 639; the use of "i" and "x" is the minimum we can do
here to be able to extend the mechanism to meet our immediate
requirements.

In the first subtag:

- All 2-letter codes are interpreted as ISO 3166 alpha-2 country codes
  denoting the area in which the language is used.

- Codes of 3 to 8 letters may be registered with the IANA, according to
  the rules in section 4 of this document.

The information in the subtag may for instance be:

- Country identification, such as en-US (this usage is described in ISO
  639)

- Dialect or variant information, such as no-nyn (nynorsk) or en-scouse

- Languages not listed in ISO 639 that are not variants of any listed
  language, which can be registered with the i-prefix, such as i-

- Script variations, such as az-arabic and az-cyrillic

In the second and subsequent subtags, any value can be registered.
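The predefined registrations above can be summarized by the following
non-normative Python sketch, which reports how the primary tag and the
first subtag of a syntactically valid tag would be interpreted. It
lower-cases before comparing, since tags are case insensitive. Treating
everything after "i" or "x" as a single registered or private value,
rather than interpreting the first subtag separately, is an assumption
of the sketch; the function names and category labels are illustrative
only.

```python
from typing import Optional, Tuple


def classify_primary(primary: str) -> str:
    """Interpret a primary tag according to the predefined rules."""
    p = primary.lower()  # tags are case insensitive
    if p == "i":
        return "IANA-registered"
    if p == "x":
        return "private use"
    if len(p) == 2:
        return "ISO 639"
    if len(p) == 3:
        return "ISO 639-2"
    return "reserved"  # not assignable except by revising this document


def classify_first_subtag(subtag: str) -> str:
    """Interpret the first subtag after an ISO 639 / ISO 639-2 primary tag."""
    n = len(subtag)
    if n == 2:
        return "ISO 3166 country code"
    if 3 <= n <= 8:
        return "IANA-registered"
    return "undefined"  # a 1-letter first subtag is not described here


def classify(tag: str) -> Tuple[str, Optional[str]]:
    """Return (primary interpretation, first-subtag interpretation or None)."""
    parts = tag.split("-")
    primary = classify_primary(parts[0])
    if len(parts) == 1:
        return primary, None
    if primary in ("IANA-registered", "private use"):
        # Assumption: the whole "i-..." or "x-..." tag is one value.
        return primary, "not interpreted separately"
    return primary, classify_first_subtag(parts[1])
```

For instance, "EN-us" and "en-US" classify identically, and "az-cyrillic"
is reported as an ISO 639 language with an IANA-registered subtag.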

ISO 639 defines a registration authority for additions to and changes
in the list of languages in ISO 639. This authority is:

      International Information Centre for Terminology (Infoterm)
      P.O. Box 130
      A-1021 Wien

      Phone: +43 1  26 75 35 Ext. 312
      Fax:   +43 1 216 32 72

The following codes were added in 1989: ug (Uigur), iu (Inuktitut,
also called Eskimo), za (Zhuang), he (Hebrew, replacing iw), yi
(Yiddish, replacing ji), and id (Indonesian, replacing in).

In 1998, the following codes were added: se (Sami), kw (Cornish), gv
(Manx Gaelic) and lb (Luxembourgish).

ISO 639-2 defines a registration authority for additions to and changes
in the list of languages in ISO 639-2. This authority is:

     Library of Congress
     (c/o Network Development and MARC Standards Office).
     Washington, D.C. 20540

     Phone: +1 [to be supplied]
     Fax:   +1 [to be supplied]

The registration agency for ISO 3166 (country codes) is:

      ISO 3166 Maintenance Agency Secretariat
      c/o DIN Deutsches Institut fuer Normung
      Burggrafenstrasse 6
      Postfach 1107
      D-10787 Berlin
      Phone: +49 30 26 01 320
      Fax:   +49 30 26 01 231

The country codes AA, QM-QZ, XA-XZ and ZZ are reserved by ISO 3166 as
user-assigned codes.

2.1 Choice of language tag

One may occasionally be faced with several possible tags for the same
body of text.

Interoperability is best served if all users use the same tag for the
same language in all documents; therefore, the following guidelines are
recommended:

1.   Use the most precise tagging that can be ascertained.

2.   When a language has both an ISO 639-1 2-character tag and an ISO 639-
  2 3-character tag, use the ISO 639-1 2-character tag.

3.   When a language has both an ISO 639-2/T (Terminology) tag and an ISO
  639-2/B (Bibliographic) tag, and these differ, use the Terminology
  tag. (The choice is arbitrary.) (NOTE: At present, all languages for
  which there is a difference have 2-character tags, so this situation
  will hopefully not arise.)

4.   When a language has both an IANA-registered tag (i-something) and an
  ISO registered tag, use the ISO tag.

5.   Do NOT use the UND (Undetermined) tag unless the protocol in use
  forces you to give a value for the language tag, even if the language
  is unknown. Omitting the tag is preferred.

6.   Do NOT use the MUL (Multiple) tag if the protocol allows you to use
  multiple languages, as is the case for the Content-Language: header.
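Guidelines 2 and 3 lend themselves to a mechanical rewrite of the
primary tag. The Python sketch below assumes a lookup table built from
Appendix A; only a handful of rows are copied in for illustration, and
the names ROWS and preferred_tag are hypothetical. Guideline 4 is not
modelled, since it needs data from the IANA registry.

```python
from typing import Dict, Optional, Tuple

# Illustrative subset of Appendix A, keyed by 639-2/T code:
#   T code -> (639-2/B code, 639-1 code or None)
ROWS: Dict[str, Tuple[str, Optional[str]]] = {
    "deu": ("ger", "de"),
    "fra": ("fre", "fr"),
    "nld": ("dut", "nl"),
    "eng": ("eng", "en"),
    "haw": ("haw", None),
}

# Reverse index so that a B code can be mapped back to its T code.
B_TO_T: Dict[str, str] = {b: t for t, (b, _) in ROWS.items()}


def preferred_primary(code: str) -> str:
    """Apply guidelines 2 and 3 to a single primary tag."""
    c = code.lower()
    t_code = B_TO_T.get(c, c)                    # guideline 3: T code over B code
    _, two_letter = ROWS.get(t_code, ("", None))
    return two_letter or t_code                  # guideline 2: 639-1 code if any


def preferred_tag(tag: str) -> str:
    """Rewrite the primary tag of tag; subtags are left untouched."""
    primary, *subtags = tag.split("-")
    return "-".join([preferred_primary(primary)] + subtags)
```

For example, preferred_tag("ger-CH") returns "de-CH", while "haw" is
returned unchanged because Hawaiian has no ISO 639-1 code in Appendix A.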

2.2 Meaning of the language tag

The language tag always defines a language as spoken (or written) by
human beings for communication of information to other human beings.
Computer languages such as programming languages are explicitly
excluded.

There is no guaranteed relationship between languages whose tags begin
with the same series of subtags; specifically, they are NOT guaranteed
to be mutually intelligible, although this will sometimes be the case.

Applications should always treat a language tag as a single token; the
division into main tag and subtags is an administrative mechanism, not
a navigation aid.

The relationship between the tag and the information it relates to is
defined by the standard describing the context in which it appears.
Accordingly, this section can only give possible examples of its usage.

- For a single information object, it should be taken as the set of
  languages that is required for a complete comprehension of the
  complete object.
  Example: Plain text documents.

- For an aggregation of information objects, it should be taken as the
  set of languages used inside components of that aggregation.
  Examples: Document stores and libraries.

- For information objects whose purpose is to provide alternatives, it
  should be regarded as a hint that the material inside is provided in
  several languages, and that one has to inspect each of the
  alternatives in order to find its language or languages.  In this
  case, multiple languages need not mean that one needs to be
  multilingual to get complete understanding of the document.
  Example: MIME multipart/alternative.

- In markup languages, such as HTML, it is possible to define a
  construct embedding a language tag to indicate that contained text is
  written in this language, such that one could write <DIV
  lang="FR">C'est la vie</DIV> inside a Norwegian document; the
  Norwegian-speaking user could then access a French-Norwegian
  dictionary to find out what the marked section meant.

2.3 Language-range

Since the writing of RFC 1766, it has become apparent that there is a
need to define a term for a set of languages that share some common
property. The following definition of language-range is derived from
RFC 2068 (HTTP/1.1).

          language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

A language-range matches a language-tag if it exactly equals the tag,
or if it exactly equals a prefix of the tag such that the first tag
character following the prefix is "-".

The special range "*" matches any tag. A protocol which uses language
ranges may specify additional rules about the semantics of "*"; for
instance, HTTP/1.1 specifies that it only matches languages not matched
by any other range within an "Accept-Language:" header.
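The matching rule can be expressed in a few lines. The Python sketch
below compares case-insensitively, as required by section 2;
matching_ranges() additionally applies the HTTP/1.1 refinement for "*"
described above, and is therefore specific to that protocol. Both
function names are illustrative.

```python
from typing import List


def range_matches(language_range: str, tag: str) -> bool:
    """Basic rule: exact match, or a prefix followed by "-"."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


def matching_ranges(ranges: List[str], tag: str) -> List[str]:
    """HTTP/1.1 style: "*" only matches tags that no other range matches."""
    explicit = [r for r in ranges if r != "*" and range_matches(r, tag)]
    if explicit:
        return explicit
    return ["*"] if "*" in ranges else []
```

Thus range_matches("en", "en-US") is true, but range_matches("en",
"eng") is false, because the character following the prefix must be "-".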

NOTE: This use of a prefix matching rule does not imply that language
tags are assigned to languages in such a way that it is always true
that if a user understands a language with a certain tag, then this
user will also understand all languages with tags for which this tag is
a prefix. The prefix rule simply allows the use of prefix tags if this
is the case.

3. The Content-language header

The "Content-Language" header is intended for use in the case where one
desires to indicate the language(s) of something that has RFC-822-like
headers, such as MIME body parts or Web documents.

The RFC-822 EBNF of the Content-Language header is:

 Content-Language = "Content-Language" ":" 1#Language-tag

Note that the Content-Language header may list several languages in a
comma-separated list.

Whitespace is allowed between the elements of the list, which also
means that parenthesized RFC 822 comments may be placed anywhere in the
language sequence.
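A receiver therefore has to remove comments before splitting the list.
The following Python sketch is one way to do that, assuming RFC 822
comment rules (comments may nest, and a backslash quotes the next
character inside a comment). It does not validate the resulting tags,
and the function names are illustrative.

```python
from typing import List


def strip_comments(value: str) -> str:
    """Replace RFC 822 parenthesized comments with a single space."""
    out = []
    depth = 0        # current comment nesting level
    escaped = False  # previous character was a backslash inside a comment
    for ch in value:
        if depth == 0:
            if ch == "(":
                depth = 1
                out.append(" ")  # a comment separates tokens, like whitespace
            else:
                out.append(ch)
        elif escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return "".join(out)


def parse_content_language(value: str) -> List[str]:
    """Split a Content-Language field body into its language tags."""
    tags = []
    for item in strip_comments(value).split(","):
        item = item.strip()
        if item:  # RFC 822 "#" lists allow empty elements
            tags.append(item)
    return tags
```

Applied to the dictionary example in section 3.1, "en, fr (This is a
dictionary)" yields the two tags "en" and "fr".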

3.1 Examples of Content-language values

Norwegian official document, with parallel text in both official
versions of Norwegian. (Both versions are readable by all Norwegians).

   Content-Type: multipart/alternative
   Content-Language: no-nyn, no-bok

Voice recording from Liverpool downtown

   Content-type: audio/basic
   Content-Language: en-scouse

Document in Mingo, an American Indian language which does not have an
ISO 639 code:

   Content-type: text/plain
   Content-Language: i-mingo

An English-French dictionary

   Content-type: application/dictionary
   Content-Language: en, fr (This is a dictionary)

An official European Commission document (in a few of its official
languages)

   Content-type: multipart/alternative
   Content-Language: da, de, el, en, fr, it

An excerpt from Star Trek

   Content-type: video/mpeg
   Content-Language: i-klingon

(All the tags used in these examples were registered with IANA after
the publication of RFC 1766)

4. IANA registration procedure for language tags

Any language tag shall begin with an existing tag, and extend it.

The registration form given here must be used by anyone who wants to
use a language tag not defined by ISO or IANA.

Name of requester          :
E-mail address of requester:
Tag to be registered       :

English name of language   :

Native name of language (transcribed into ASCII):

Reference to published description of the language (book or article):

Any other relevant information:

The registration form must be sent to <ietf-languages@iana.org> for a
2-week review period before it can be submitted to IANA.  (This is an
open list. Requests to be added should be sent to <ietf-languages-

When the two-week period has passed, the language tag reviewer, who is
appointed by the IETF Applications Area Director, either forwards the
request to IANA@ISI.EDU, or rejects it because of significant
objections raised on the list. Note that the reviewer can raise
objections on the list himself, if he so desires; the important thing
is that the objection must be made in public.

The applicant is free to modify a rejected application with additional
information and submit it again.

Decisions made by the reviewer may be appealed to the IESG.

All registered forms are available online in the directory

Updates of registrations follow the same procedure as registrations.
The language tag reviewer decides whether to allow a new registrant to
update a registration made by someone else; in the normal case,
objections by the original registrant would carry extra weight in such
a decision.

There is no deletion of registrations; when some registered tag should
not be used any more, for instance because a corresponding ISO 639 code
has been registered, the registration should be amended by adding a
remark like "DO NOT USE: use <new code> instead" to the "other relevant
information" section.

5. Security Considerations

The only security issue that has been raised with language tags since
the publication of RFC 1766, which stated that "Security issues are
believed to be irrelevant to this memo", is a concern with language
ranges used in content negotiation - that they may be used to infer the
nationality of the sender, and thus identify potential targets for
surveillance.

This is a special case of the general problem that anything you send is
visible to the receiving party; it is useful to be aware that such
concerns can exist in some cases.

The exact magnitude of the threat, and any possible countermeasures, is
left to each application protocol.

6. Character set considerations

Codes may always be expressed using the US-ASCII character repertoire
(a-z), which is present in most character sets.

The issue of deciding upon the rendering of a character set based on
the language tag is not addressed in this memo; however, it is thought
impossible to make such a decision correctly for all cases unless means
of switching language in the middle of a text are defined (for example,
a rendering engine that decides font based on Japanese or Chinese
language may fail to work when a mixed Japanese-Chinese text is

7. Acknowledgements

This document has benefited from many rounds of review and comments in
various fora of the IETF and the Internet working groups.

Any list of contributors is bound to be incomplete; please regard the
following as only a selection from the group of people who have
contributed to make this document what it is today.

In alphabetical order:

Tim Berners-Lee, Nathaniel Borenstein, Sean Burke, Jim Conklin, John
Cowan, Dave Crocker, Martin Duerst, Michael Everson, Ned Freed, Tim
Goodwin, Dirk-Willem van Gulik, Paul Hoffman, Olle Jarnefors, John
Klensin, Keith Moore, Masataka Ohta, Keld Jorn Simonsen, Rhys
Weatherley, Misha Wolf, Francois Yergeau and many, many others.

Special thanks must go to Michael Everson, who has served as language
tag reviewer for almost the complete period since the publication of
RFC 1766, and has provided a great deal of input to this version.

8. Author's Address

Harald Tveit Alvestrand
EDB Maxware

EMail: Harald.Alvestrand@maxware.no

Phone: +47 73 54 57 97

9. References

[ISO 639]

     ISO 639:1988 (E/F) - Code for the representation of names of
     languages - The International Organization for Standardization,
     1st edition, 1988-04-01 Prepared by ISO/TC 37 - Terminology
     (principles and coordination).

     Note that a new version (ISO 639-1:2000) is in preparation at the
     time of this writing.

[ISO 639-2]

     ISO 639-2:1998 - Codes for the representation of names of
     languages -- Part 2: Alpha-3 code - edition 1, 1998-11-01, 66
     pages, prepared by ISO/TC 37/SC 2.

[ISO 3166]

     ISO 3166:1988 (E/F) - Codes for the representation of names of
     countries - The International Organization for Standardization,
     3rd edition, 1988-08-15.

[RFC 1521]

     Borenstein, N., and N. Freed, "MIME Part One: Mechanisms for
     Specifying and Describing the Format of Internet Message Bodies",
     RFC 1521, Bellcore, Innosoft, September 1993.

[RFC 1327]

     Kille, S., "Mapping between X.400(1988) / ISO 10021 and RFC 822",
     RFC 1327, University College London, May 1992.

[ISO 15924]

     ISO/DIS 15924 - Codes for the representation of names of scripts
     (being actively developed by ISO).

Appendix A: List of language tags

This list is NOT authoritative. It was prepared based on Keld
Simonsen's publicly available lists of codes, which were prepared from
drafts of the standards.

In matching 639-1 names to 639-2 names, a great number of changes in
names of languages were noted; it is expected that these will be
modified also in 639-1 in the forthcoming revision of that standard.

All the cases where the 639-2/T and 639-2/B codes differ have been
marked with an asterisk (*).

639-1     639-2/T   639-2/B   English name

aa   aar  aar  Afar
ab   abk  abk  Abkhazian
     ace  ace  Achinese
     ach  ach  Acoli
     ada  ada  Adangme
     afa  afa  Afro-Asiatic (Other)
     afh  afh  Afrihili
af   afr  afr  Afrikaans
     aka  aka  Akan
     akk  akk  Akkadian
     ale  ale  Aleut
     alg  alg  Algonquian languages
am   amh  amh  Amharic
     ang  ang  English, Old (ca. 450-1100)
     apa  apa  Apache languages
ar   ara  ara  Arabic
     arc  arc  Aramaic
     arn  arn  Araucanian
     arp  arp  Arapaho
     art  art  Artificial (Other)
     arw  arw  Arawak
as   asm  asm  Assamese
     ath  ath  Athapascan languages
     aus  aus  Australian languages
     ava  ava  Avaric
     ave  ave  Avestan
     awa  awa  Awadhi
ay   aym  aym  Aymara
az   aze  aze  Azerbaijani
     bad  bad  Banda
     bai  bai  Bamileke languages
ba   bak  bak  Bashkir
     bal  bal  Baluchi
     bam  bam  Bambara
     ban  ban  Balinese
     bas  bas  Basa
     bat  bat  Baltic (Other)
     bej  bej  Beja
be   bel  bel  Belarussian (ISO 639-1: Byelorussian)
     bem  bem  Bemba
bn   ben  ben  Bengali (ISO 639-1: Bengali; Bangla)
     ber  ber  Berber (Other)
     bho  bho  Bhojpuri
bh   bih  bih  Bihari
     bik  bik  Bikol
     bin  bin  Bini
bi   bis  bis  Bislama
     bla  bla  Siksika (Blackfoot)
     bnt  bnt  Bantu (Other)
bo * bod  tib  Tibetan
     bra  bra  Braj
br   bre  bre  Breton
     btk  btk  Batak (Indonesia)
     bua  bua  Buriat
     bug  bug  Buginese
bg   bul  bul  Bulgarian
     cad  cad  Caddo
     cai  cai  Central American Indian (Other)
     car  car  Carib
ca   cat  cat  Catalan
     cau  cau  Caucasian (Other)
     ceb  ceb  Cebuano
     cel  cel  Celtic (Other)
cs * ces  cze  Czech
     cha  cha  Chamorro
     chb  chb  Chibcha
     che  che  Chechen
     chg  chg  Chagatai
     chk  chk  Chuukese
     chm  chm  Mari
     chn  chn  Chinook jargon
     cho  cho  Choctaw
     chp  chp  Chipewyan
     chr  chr  Cherokee (Jalagi)
     chu  chu  Church Slavic
     chv  chv  Chuvash
     chy  chy  Cheyenne
     cmc  cmc  Chamic languages
     cop  cop  Coptic
kw   cor  cor  Cornish
co   cos  cos  Corsican
     cpe  cpe  Creoles and pidgins, English-based (Other)
     cpf  cpf  Creoles and pidgins, French-based (Other)
     cpp  cpp  Creoles and pidgins, Portuguese-based (Other)
     cre  cre  Cree
     crp  crp  Creoles and pidgins (Other)
     cus  cus  Cushitic (Other)
cy * cym  wel  Welsh
     dak  dak  Dakota
da   dan  dan  Danish
     day  day  Dayak
     del  del  Delaware
     den  den  Slave (Athapascan)
de * deu  ger  German
     dgr  dgr  Dogrib
     din  din  Dinka
     div  div  Divehi
     doi  doi  Dogri
     dra  dra  Dravidian (Other)
     dua  dua  Duala
     dum  dum  Dutch, Middle (ca. 1050-1350)
     dyu  dyu  Dyula
dz   dzo  dzo  Dzongkha (Bhutani in ISO 639-1)
     efi  efi  Efik
     egy  egy  Egyptian (Ancient)
     eka  eka  Ekajuk
el * ell  gre  Greek, Modern (post 1453)
     elx  elx  Elamite
en   eng  eng  English
     enm  enm  English, Middle (1100-1500)
eo   epo  epo  Esperanto
et   est  est  Estonian
eu * eus  baq  Basque
     ewe  ewe  Ewe
     ewo  ewo  Ewondo
     fan  fan  Fang
fo   fao  fao  Faroese
fa * fas  per  Persian
     fat  fat  Fanti
fj   fij  fij  Fijian (ISO 639-1: Fiji)
fi   fin  fin  Finnish
     fiu  fiu  Finno-Ugrian (Other)
     fon  fon  Fon
fr * fra  fre  French
     frm  frm  French, Middle (ca. 1400-1600)
     fro  fro  French, Old (842-ca. 1400)
fy   fry  fry  Frisian
     ful  ful  Fulah
     fur  fur  Friulian
     gaa  gaa  Ga
     gay  gay  Gayo
     gba  gba  Gbaya
     gem  gem  Germanic (Other)
     gez  gez  Geez
     gil  gil  Gilbertese
gd   gla  gla  Gaelic (Scots) (Scottish Gaelic)
ga   gle  gle  Irish (Irish Gaelic)
gl   glg  glg  Gallegan (Galician in ISO 639-1)
gv   glv  glv  Manx (Manx Gaelic)
     gmh  gmh  German, Middle High (ca. 1050-1500)
     goh  goh  German, Old High (ca. 750-1050)
     gon  gon  Gondi
     gor  gor  Gorontalo
     got  got  Gothic
     grb  grb  Grebo
     grc  grc  Greek, Ancient (to 1453)
gn   grn  grn  Guarani
gu   guj  guj  Gujarati
     gwi  gwi  Gwich'in
     hai  hai  Haida
ha   hau  hau  Hausa
     haw  haw  Hawaiian
he   heb  heb  Hebrew (iw in 639-1 first edition)
     her  her  Herero
     hil  hil  Hiligaynon
     him  him  Himachali
hi   hin  hin  Hindi
     hit  hit  Hittite
     hmn  hmn  Hmong
     hmo  hmo  Hiri Motu
hr * hrv  scr  Croatian
hu   hun  hun  Hungarian (Magyar)
     hup  hup  Hupa
hy * hye  arm  Armenian
     iba  iba  Iban
     ibo  ibo  Igbo
     ijo  ijo  Ijo
iu   iku  iku  Inuktitut
ie   ile  ile  Interlingue
     ilo  ilo  Iloko
ia   ina  ina  Interlingua (International Auxiliary Language Association)
     inc  inc  Indic (Other)
id   ind  ind  Indonesian (in in 639-1 first edition)
     ine  ine  Indo-European (Other)
ik   ipk  ipk  Inupiak
     ira  ira  Iranian (Other)
     iro  iro  Iroquoian languages
is * isl  ice  Icelandic
it   ita  ita  Italian
jw * jaw  jav  Javanese
ja   jpn  jpn  Japanese
     jpr  jpr  Judeo-Persian
     jrb  jrb  Judeo-Arabic
     kaa  kaa  Kara-Kalpak
     kab  kab  Kabyle
     kac  kac  Kachin
kl   kal  kal  Kalaallisut (Greenlandic in 639-1)
     kam  kam  Kamba
kn   kan  kan  Kannada
     kar  kar  Karen
ks   kas  kas  Kashmiri
ka * kat  geo  Georgian
     kau  kau  Kanuri
     kaw  kaw  Kawi
kk   kaz  kaz  Kazakh
     kha  kha  Khasi
     khi  khi  Khoisan (Other)
km   khm  khm  Khmer (Cambodian in 639-1)
     kho  kho  Khotanese
     kik  kik  Kikuyu
rw   kin  kin  Kinyarwanda
ky   kir  kir  Kirghiz
     kmb  kmb  Kimbundu
     kok  kok  Konkani
     kom  kom  Komi
     kon  kon  Kongo
ko   kor  kor  Korean
     kos  kos  Kosraean
     kpe  kpe  Kpelle
     kro  kro  Kru
     kru  kru  Kurukh
     kua  kua  Kuanyama
     kum  kum  Kumyk
ku   kur  kur  Kurdish
     kut  kut  Kutenai
     lad  lad  Ladino
     lah  lah  Lahnda
     lam  lam  Lamba
lo   lao  lao  Lao (Laotian in 639-1)
la   lat  lat  Latin
lv   lav  lav  Latvian (Latvian, Lettish in 639-1)
     lez  lez  Lezghian
ln   lin  lin  Lingala
lt   lit  lit  Lithuanian
     lol  lol  Mongo
     loz  loz  Lozi
lb   ltz  ltz  Letzeburgesch
     lua  lua  Luba-Lulua
     lub  lub  Luba-Katanga
     lug  lug  Ganda
     lui  lui  Luiseno
     lun  lun  Lunda
     luo  luo  Luo (Kenya and Tanzania)
     lus  lus  Lushai
     mad  mad  Madurese
     mag  mag  Magahi
     mah  mah  Marshall
     mai  mai  Maithili
     mak  mak  Makasar
ml   mal  mal  Malayalam
     man  man  Mandingo
     map  map  Austronesian (Other)
mr   mar  mar  Marathi
     mas  mas  Masai
     mdr  mdr  Mandar
     men  men  Mende
     mga  mga  Irish, Middle (900-1200)
     mic  mic  Micmac
     min  min  Minangkabau
     mis  mis  Miscellaneous languages
mk * mkd  mac  Macedonian
     mkh  mkh  Mon-Khmer (Other)
mg   mlg  mlg  Malagasy
mt   mlt  mlt  Maltese
     mni  mni  Manipuri
     mno  mno  Manobo languages
     moh  moh  Mohawk
mo   mol  mol  Moldavian
mn   mon  mon  Mongolian
     mos  mos  Mossi
mi * mri  mao  Maori
ms * msa  may  Malay
     mul  mul  Multiple languages
     mun  mun  Munda languages
     mus  mus  Creek
     mwr  mwr  Marwari
my * mya  bur  Burmese
     myn  myn  Mayan languages
     nah  nah  Nahuatl
     nai  nai  North American Indian (Other)
na   nau  nau  Nauru
     nav  nav  Navajo
     nbl  nbl  Ndebele, South
     nde  nde  Ndebele, North
     ndo  ndo  Ndonga
ne   nep  nep  Nepali
     new  new  Newari
     nia  nia  Nias
     nic  nic  Niger-Kordofanian (Other)
     niu  niu  Niuean
nl * nld  dut  Dutch
     non  non  Norse, Old
no   nor  nor  Norwegian
     nso  nso  Sotho, Northern
     nub  nub  Nubian languages
     nya  nya  Nyanja
     nym  nym  Nyamwezi
     nyn  nyn  Nyankole
     nyo  nyo  Nyoro
     nzi  nzi  Nzima
oc   oci  oci  Occitan (post 1500)
     oji  oji  Ojibwa
or   ori  ori  Oriya
om   orm  orm  Oromo
     osa  osa  Osage
     oss  oss  Ossetic
     ota  ota  Turkish, Ottoman (1500-1928)
     oto  oto  Otomian languages
     paa  paa  Papuan (Other)
     pag  pag  Pangasinan
     pal  pal  Pahlavi
     pam  pam  Pampanga
pa   pan  pan  Panjabi (Punjabi in 639-1)
     pap  pap  Papiamento
     pau  pau  Palauan
     peo  peo  Persian, Old (ca. 600-400 B.C.)
     phi  phi  Philippine (Other)
     phn  phn  Phoenician
     pli  pli  Pali
pl   pol  pol  Polish
     pon  pon  Pohnpeian
pt   por  por  Portuguese
     pra  pra  Prakrit languages
     pro  pro  Provençal, Old (to 1500)
ps   pus  pus  Pushto (Pashto, Pushto in 639-1)
     qaa-qtz   qaa-qtz   Reserved for local use
qu   que  que  Quechua
     raj  raj  Rajasthani
     rap  rap  Rapanui
     rar  rar  Rarotongan
     roa  roa  Romance (Other)
rm   roh  roh  Raeto-Romance (Rhaeto-Romance in 639-1)
     rom  rom  Romany
ro * ron  rum  Romanian
rn   run  run  Rundi (Kirundi in 639-1)
ru   rus  rus  Russian
     sad  sad  Sandawe
sg   sag  sag  Sango (Sangho in 639-1)
     sah  sah  Yakut
     sai  sai  South American Indian (Other)
     sal  sal  Salishan languages
     sam  sam  Samaritan Aramaic
sa   san  san  Sanskrit
     sas  sas  Sasak
     sat  sat  Santali
     sco  sco  Scots
     sel  sel  Selkup
     sem  sem  Semitic (Other)
     sga  sga  Irish, Old (to 900)
     shn  shn  Shan
sh   (shr)     (shr)     Serbo-croatian (withdrawn)
     sid  sid  Sidamo
si   sin  sin  Sinhalese
     sio  sio  Siouan languages
     sit  sit  Sino-Tibetan (Other)
     sla  sla  Slavic (Other)
sk * slk  slo  Slovak
sl   slv  slv  Slovenian
se   smi  smi  Sami languages (Northern Sami in 639-1)
sm   smo  smo  Samoan
sn   sna  sna  Shona
sd   snd  snd  Sindhi
     snk  snk  Soninke
     sog  sog  Sogdian
so   som  som  Somali
     son  son  Songhai
st   sot  sot  Sotho, Southern (Sesotho in 639-1)
es   spa  spa  Spanish
sq * sqi  alb  Albanian
     srd  srd  Sardinian
sr * srp  scc  Serbian
     srr  srr  Serer
     ssa  ssa  Nilo-Saharan (Other)
ss   ssw  ssw  Swati (Siswati in 639-1)
     suk  suk  Sukuma
su   sun  sun  Sundanese
     sus  sus  Susu
     sux  sux  Sumerian
sw   swa  swa  Swahili
sv   swe  swe  Swedish
     syr  syr  Syriac
     tah  tah  Tahitian
     tai  tai  Tai (Other)
ta   tam  tam  Tamil
tt   tat  tat  Tatar
te   tel  tel  Telugu
     tem  tem  Timne
     ter  ter  Tereno
     tet  tet  Tetum
tg   tgk  tgk  Tajik
tl   tgl  tgl  Tagalog
th   tha  tha  Thai
     tig  tig  Tigre
ti   tir  tir  Tigrinya
     tiv  tiv  Tiv
     tkl  tkl  Tokelau
     tli  tli  Tlingit
     tmh  tmh  Tamashek
     tog  tog  Tonga (Nyasa)
to   ton  ton  Tonga (Tonga Islands)
     tpi  tpi  Tok Pisin
     tsi  tsi  Tsimshian
tn   tsn  tsn  Tswana (Setswana in 639-1)
ts   tso  tso  Tsonga
tk   tuk  tuk  Turkmen
     tum  tum  Tumbuka
tr   tur  tur  Turkish
     tut  tut  Altaic (Other)
     tvl  tvl  Tuvalu
tw   twi  twi  Twi
     tyv  tyv  Tuvinian
     uga  uga  Ugaritic
ug   uig  uig  Uighur
uk   ukr  ukr  Ukrainian
     umb  umb  Umbundu
     und  und  Undetermined
ur   urd  urd  Urdu
uz   uzb  uzb  Uzbek
     vai  vai  Vai
     ven  ven  Venda
vi   vie  vie  Vietnamese
vo   vol  vol  Volapük
     vot  vot  Votic
     wak  wak  Wakashan languages
     wal  wal  Walamo
     war  war  Waray
     was  was  Washo
     wen  wen  Sorbian languages
wo   wol  wol  Wolof
xh   xho  xho  Xhosa
     yao  yao  Yao
     yap  yap  Yapese
yi   yid  yid  Yiddish (ji in first edition of 639-1)
yo   yor  yor  Yoruba
     ypk  ypk  Yupik languages
     zap  zap  Zapotec
     zen  zen  Zenaga
za   zha  zha  Zhuang
zh * zho  chi  Chinese
     znd  znd  Zande
zu   zul  zul  Zulu
     zun  zun  Zuni
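Because the asterisk column is derived from the two 639-2 columns, it
can be checked mechanically. The Python sketch below assumes the fixed
column layout used above (639-1 code in columns 1-2, the asterisk in
column 4, the 639-2 codes starting in column 6); the sample rows,
including the deliberately unmarked French row, are illustrative input
only.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], bool, str, str, str]


def parse_row(line: str) -> Row:
    """Split a table row into (639-1, starred, 639-2/T, 639-2/B, name)."""
    two_letter = line[0:2].strip() or None
    starred = line[3:4] == "*"
    t_code, b_code, name = line[5:].split(None, 2)
    return two_letter, starred, t_code, b_code, name


def asterisk_errors(lines: List[str]) -> List[str]:
    """Names of rows whose asterisk does not match whether T and B differ."""
    errors = []
    for line in lines:
        _, starred, t_code, b_code, name = parse_row(line)
        if starred != (t_code != b_code):
            errors.append(name)
    return errors


SAMPLE = [
    "bo * bod  tib  Tibetan",
    "aa   aar  aar  Afar",
    "     ace  ace  Achinese",
    "fr   fra  fre  French",  # deliberately missing its asterisk
]
```

Run over SAMPLE, asterisk_errors() reports only "French"; run over the
full table, it would flag any row whose marking disagrees with the
stated convention.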

Appendix B: Changes from RFC 1766

@ Email list address changed from ietf-types@uninett.no to ietf-

@ Updated author's address

@ Added language-range construct from HTTP/1.1

@ Added use of ISO 639-2 language codes

@ Added list of language codes

@ Changed examples to use registered tags

@ Moved Multipart/Alternative-related material to Appendix C

@ Added "Any other relevant information" to registration form

@ Added description of procedure for updating registrations

Appendix C: Use of Content-Language with Multipart/Alternative

NOTE: This appendix details an idea that was proposed in RFC 1766 to
deal with a particular kind of alternative content. However, this has
not found use in practice, and is therefore not suitable for the IETF
standards track. It is being preserved here as a non-normative appendix.

When using the Multipart/Alternative body part of MIME, it is possible
to have the body parts giving the same information content in different
languages. In this case, one should put a Content-Language header on
each of the body parts, and a summary Content-Language header onto the
Multipart/Alternative itself.

The differences parameter to multipart/alternative

As defined in RFC 1521, "Multipart/Alternative" only has one parameter:
"boundary".

The common usage of "Multipart/Alternative" is to have more than one
format of the same message (for example, PostScript and ASCII).

The use of language tags to differentiate between different
alternatives will certainly not lead all MIME UAs to present the most
meaningful, understandable or significant body part as default.

Therefore, a new parameter is defined, to allow the configuration of
MIME readers to handle language differences in a sensible manner.

     Name: Differences
     Value: One or more of
            Content-Type
            Content-Language

Further values can be registered with IANA; these shall refer to the
name of a header for which a definition exists in a published RFC.  If
not present, "Differences=Content-Type" is assumed.

The intent is that the MIME reader can look at these headers of the
message component to make an intelligent choice of what to present to the
user, based on knowledge about the user preferences and capabilities.

(The intent of having registration with IANA of the fields used in this
context is to maintain a list of usages that a mail UA may expect to
encounter, not to reject usages.)

(NOTE: The MIME specification [RFC 1521], section 7.2, states that
headers not beginning with "Content-" are generally to be ignored in
body parts. People defining a header for use with "differences=" should
take note of this.)

The mechanism for deciding which body part to present is outside the
scope of this document.


   Content-Type: multipart/alternative; differences=Content-Language;
        boundary="limit"
   Content-Language: en, fr, de

   --limit
   Content-Language: fr

   Le renard brun et agile saute par dessus le chien paresseux

   --limit
   Content-Language: de
   Content-Type: text/plain; charset=iso-8859-1
   Content-Transfer-encoding: quoted-printable

   Der schnelle braune Fuchs h=FCpft =FCber den faulen Hund

   --limit
   Content-Language: en

   The quick brown fox jumps over the lazy dog

   --limit--

When composing a message, the choice of sequence may be arbitrary.
However, non-MIME mail readers will show the first body part first,
meaning that this should most likely be the language understood by most
of the recipients.
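Although the selection mechanism is out of scope, a sketch may make the
intent of the parameter clearer. The Python code below uses the standard
library email package to parse a trimmed copy of the example above and
pick the first body part whose Content-Language matches the reader's
preferences, falling back to the first part. The preference list, the
prefix-matching choice and the function names are assumptions of the
sketch, not requirements of this appendix.

```python
import email
from email.message import Message
from typing import List, Optional

RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; differences=Content-Language; boundary="limit"\n'
    "Content-Language: en, fr\n"
    "\n"
    "--limit\n"
    "Content-Language: fr\n"
    "\n"
    "Le renard brun et agile saute par dessus le chien paresseux\n"
    "\n"
    "--limit\n"
    "Content-Language: en\n"
    "\n"
    "The quick brown fox jumps over the lazy dog\n"
    "\n"
    "--limit--\n"
)


def part_languages(part: Message) -> List[str]:
    """Lower-cased tags from a part's Content-Language header (no comments)."""
    value = part.get("Content-Language", "")
    return [t.strip().lower() for t in value.split(",") if t.strip()]


def choose_part(msg: Message, preferred: List[str]) -> Optional[Message]:
    """Pick the body part best matching the preferred language ranges."""
    differences = msg.get_param("differences") or "Content-Type"
    if "content-language" not in [d.strip().lower() for d in differences.split(",")]:
        return None  # this sketch only handles language differences
    parts = msg.get_payload()
    for wanted in (w.lower() for w in preferred):
        for part in parts:
            # A preference matches a tag exactly or as a "-"-terminated prefix.
            if any(t == wanted or t.startswith(wanted + "-") for t in part_languages(part)):
                return part
    # Fall back to the first part, which non-MIME readers would show anyway.
    return parts[0] if parts else None


msg = email.message_from_string(RAW)
```

With preferences ["de", "fr"], the French part is chosen; with ["ja"],
no part matches and the first part is returned.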

Appendix X: Changes from draft -00

This appendix is to be deleted by the RFC Editor before publication as
an RFC.

Changes from draft-00:

- Fixed up the language tag table

- Moved multipart/alternative stuff to appendix

- Changed examples to use registered tags

- Added * in language tag table to indicate B/T conflicts

- Considered, but did not adopt, changing from recommending T codes to
  recommending B codes. At the moment, the only argument that appeals
  to the author is that the T codes look more like the 639-1 codes than
  the B codes do.

- Added procedures for updating a registration

Here is the list of changes that need to be done to this doc before
advancing it to Draft or reissuing it.

- Decide whether or not to write anything about use of country codes in
  other places than the first subtag, or region codes, or script codes

- Decide whether it is worth it to try to write down any more
  guidelines for what language tags people should register

draft-alvestrand-lang-tags-v2-01.txt                    [Page 21]

