draft-ietf-idnabis-protocol-01.txt   draft-ietf-idnabis-protocol-02.txt 
Network Working Group J. Klensin, Ed. Network Working Group J. Klensin, Ed.
Obsoletes: 3490 (if approved) Obsoletes: 3490 (if approved)
Intended status: Standards Track Intended status: Standards Track
Expires: November 28, 2008 Expires: January 15, 2009
Internationalized Domain Names in Applications (IDNA): Protocol Internationalized Domain Names in Applications (IDNA): Protocol
draft-ietf-idnabis-protocol-01.txt draft-ietf-idnabis-protocol-02.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 28, 2008. This Internet-Draft will expire on January 15, 2009.
Abstract Abstract
This document supplies the protocol definition for a revised and This document supplies the protocol definition for a revised and
updated specification for internationalized domain names (IDNs). The updated specification for internationalized domain names (IDNs). The
rationale for these changes, the relationship to the older rationale for these changes, the relationship to the older
specification, and important terminology are provided in other specification, and important terminology are provided in other
documents. This document specifies the protocol mechanism, called documents. This document specifies the protocol mechanism, called
Internationalizing Domain Names in Applications (IDNA), for Internationalizing Domain Names in Applications (IDNA), for
registering and looking up IDNs in a way that does not require registering and looking up IDNs in a way that does not require
skipping to change at page 2, line 22 skipping to change at page 2, line 22
3.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 5
3.2.1. DNS Resource Records . . . . . . . . . . . . . . . . . 6 3.2.1. DNS Resource Records . . . . . . . . . . . . . . . . . 6
3.2.2. Non-domain-name Data Types Stored in the DNS . . . . . 6 3.2.2. Non-domain-name Data Types Stored in the DNS . . . . . 6
4. Registration Protocol . . . . . . . . . . . . . . . . . . . . 6 4. Registration Protocol . . . . . . . . . . . . . . . . . . . . 6
4.1. Proposed label . . . . . . . . . . . . . . . . . . . . . . 6 4.1. Proposed label . . . . . . . . . . . . . . . . . . . . . . 6
4.2. Conversion to Unicode and Normalization . . . . . . . . . 7 4.2. Conversion to Unicode and Normalization . . . . . . . . . 7
4.3. Permitted Character and Label Validation . . . . . . . . . 7 4.3. Permitted Character and Label Validation . . . . . . . . . 7
4.3.1. Rejection of Characters that are not Permitted . . . . 7 4.3.1. Rejection of Characters that are not Permitted . . . . 7
4.3.2. Label Validation . . . . . . . . . . . . . . . . . . . 7 4.3.2. Label Validation . . . . . . . . . . . . . . . . . . . 7
4.3.3. Registration Validation Summary . . . . . . . . . . . 8 4.3.3. Registration Validation Summary . . . . . . . . . . . 8
4.4. Registry Restrictions . . . . . . . . . . . . . . . . . . 8 4.4. Registry Restrictions . . . . . . . . . . . . . . . . . . 9
4.5. Punycode Conversion . . . . . . . . . . . . . . . . . . . 9 4.5. Punycode Conversion . . . . . . . . . . . . . . . . . . . 9
4.6. Insertion in the Zone . . . . . . . . . . . . . . . . . . 9 4.6. Insertion in the Zone . . . . . . . . . . . . . . . . . . 9
5. Domain Name Resolution (Lookup) Protocol . . . . . . . . . . . 9 5. Domain Name Resolution (Lookup) Protocol . . . . . . . . . . . 9
5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 9 5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 9
5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 10 5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 10
5.3. Character Changes in Preprocessing or the User 5.3. Character Changes in Preprocessing or the User
Interface . . . . . . . . . . . . . . . . . . . . . . . . 10 Interface . . . . . . . . . . . . . . . . . . . . . . . . 10
5.4. A-label Input . . . . . . . . . . . . . . . . . . . . . . 11 5.4. A-label Input . . . . . . . . . . . . . . . . . . . . . . 11
5.5. Validation and Character List Testing . . . . . . . . . . 11 5.5. Validation and Character List Testing . . . . . . . . . . 11
5.6. Punycode Conversion . . . . . . . . . . . . . . . . . . . 12 5.6. Punycode Conversion . . . . . . . . . . . . . . . . . . . 12
5.7. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 12 5.7. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 12
6. Name server Considerations . . . . . . . . . . . . . . . . . . 12 6. Name server Considerations . . . . . . . . . . . . . . . . . . 12
6.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 12 6.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 13
6.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 13 6.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 13
6.3. Root Server Considerations . . . . . . . . . . . . . . . . 14 6.3. Root and other DNS Server Considerations . . . . . . . . . 14
7. Security Considerations . . . . . . . . . . . . . . . . . . . 14 7. Security Considerations . . . . . . . . . . . . . . . . . . . 14
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
9. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 15 9. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 15
9.1. Version -00 of draft-klensin-idnabis-protocol . . . . . . 15 9.1. Version -00 of draft-klensin-idnabis-protocol . . . . . . 15
9.2. Versions -01 and -02 of draft-klensin-idnabis-protocol . . 15 9.2. Versions -01 and -02 of draft-klensin-idnabis-protocol . . 15
9.3. Version -03 of draft-klensin-idnabis-protocol . . . . . . 15 9.3. Version -03 of draft-klensin-idnabis-protocol . . . . . . 15
9.4. Version -04 of draft-klensin-idnabis-protocol . . . . . . 15 9.4. Version -04 of draft-klensin-idnabis-protocol . . . . . . 15
9.5. Version -00 of draft-ietf-idnabis-protocol . . . . . . . . 16 9.5. Version -00 of draft-ietf-idnabis-protocol . . . . . . . . 16
9.6. Version -01 of draft-ietf-idnabis-protocol . . . . . . . . 16 9.6. Version -01 of draft-ietf-idnabis-protocol . . . . . . . . 16
10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 16 9.7. Version -02 of draft-ietf-idnabis-protocol . . . . . . . . 16
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 17
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
12.1. Normative References . . . . . . . . . . . . . . . . . . . 17 12.1. Normative References . . . . . . . . . . . . . . . . . . . 17
12.2. Informative References . . . . . . . . . . . . . . . . . . 18 12.2. Informative References . . . . . . . . . . . . . . . . . . 18
Appendix A. The Contextual Rules Registry . . . . . . . . . . . . 19 Appendix A. The Contextual Rules Registry . . . . . . . . . . . . 19
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 22 Appendix B. Contextual Rules Registry - Alternate Syntax . . . . 23
Intellectual Property and Copyright Statements . . . . . . . . . . 24 B.1. HYPHEN-MINUS . . . . . . . . . . . . . . . . . . . . . . . 24
B.2. ZERO WIDTH NON-JOINER . . . . . . . . . . . . . . . . . . 24
B.3. ZERO WIDTH JOINER . . . . . . . . . . . . . . . . . . . . 25
B.4. MIDDLE DOT . . . . . . . . . . . . . . . . . . . . . . . . 25
B.5. GREEK LOWER NUMERAL SIGN (KERAIA) . . . . . . . . . . . . 25
B.6. MODIFIER LETTER PRIME . . . . . . . . . . . . . . . . . . 26
B.7. COMBINING CYRILLIC TITLO . . . . . . . . . . . . . . . . . 26
B.8. HEBREW PUNCTUATION GERESH . . . . . . . . . . . . . . . . 27
B.9. HEBREW PUNCTUATION GERSHAYIM . . . . . . . . . . . . . . . 27
B.10. IDEOGRAPHIC ITERATION MARK . . . . . . . . . . . . . . . . 27
B.11. VERTICAL IDEOGRAPHIC ITERATION MARK . . . . . . . . . . . 28
B.12. KATAKANA MIDDLE DOT . . . . . . . . . . . . . . . . . . . 28
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 28
Intellectual Property and Copyright Statements . . . . . . . . . . 29
1. Introduction 1. Introduction
This document supplies the protocol definition for a revised and This document supplies the protocol definition for a revised and
updated specification for internationalized domain names. The updated specification for internationalized domain names. The
rationale for these changes and relationship to the older rationale for these changes and relationship to the older
specification and some new terminology are provided in other specification and some new terminology are provided in other
documents, notably [IDNA2008-Rationale]. documents, notably [IDNA2008-Rationale].
IDNA works by allowing applications to use certain ASCII string IDNA works by allowing applications to use certain ASCII string
skipping to change at page 8, line 14 skipping to change at page 8, line 14
4.3.2.2. Leading Combining Marks 4.3.2.2. Leading Combining Marks
The first character of the string is examined to verify that it is The first character of the string is examined to verify that it is
not a combining mark. If it is a combining mark, the string MUST NOT not a combining mark. If it is a combining mark, the string MUST NOT
be registered. be registered.
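The leading-combining-mark test above can be sketched in a few lines of Python. This is an illustration only, not part of either draft: it assumes that "combining mark" means a code point whose Unicode General Category is Mn, Mc, or Me, and the function name is hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first code point of the label is a combining mark.

    Assumption: "combining mark" is read as General Category M* (Mn, Mc, Me).
    """
    if not label:
        return False
    return unicodedata.category(label[0]).startswith("M")
```

Under this reading, a registry would refuse any proposed label for which the function returns True.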
4.3.2.3. Contextual Rules 4.3.2.3. Contextual Rules
Each code point is checked for its identification as characters Each code point is checked for its identification as characters
requiring contextual processign for registration (the list of requiring contextual processing for registration (the list of
characters appears as the combination of CONTEXTJ and CONTEXTO in characters appears as the combination of CONTEXTJ and CONTEXTO in
[IDNA2008-Tables]). If that indication appears, the table of [IDNA2008-Tables]). If that indication appears, the table of
contextual rules is checked for a rule for that character. If no contextual rules is checked for a rule for that character. If no
rule is found, the proposed label is rejected and MUST NOT be rule is found, the proposed label is rejected and MUST NOT be
installed in a zone file. If one is found, it is applied (typically installed in a zone file. If one is found, it is applied (typically
as a test on the entire label or on adjacent characters). If the as a test on the entire label or on adjacent characters). If the
application of the rule does not conclude that the character is valid application of the rule does not conclude that the character is valid
in context, the proposed label MUST be rejected. (See the IANA in context, the proposed label MUST be rejected. (See the IANA
Considerations: IDNA Context Registry section of [IDNA2008-Rationale] Considerations: IDNA Context Registry section of [IDNA2008-Rationale]
and Appendix A of this document.) and Appendix A of this document.)
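The registration-time procedure in this subsection might be sketched as follows. Everything named here is hypothetical: the real list of code points comes from [IDNA2008-Tables] and the real rules from the contextual rules registry, so the small set and the single MIDDLE DOT rule below are placeholders chosen only to make the sketch runnable.

```python
from typing import Callable, Dict

# Placeholder for the CONTEXTJ/CONTEXTO code points (illustrative subset only).
NEEDS_CONTEXT = {0x00B7, 0x200C, 0x200D}


def middle_dot_rule(label: str, i: int) -> bool:
    """Illustrative rule only: accept MIDDLE DOT between two 'l' characters."""
    return 0 < i < len(label) - 1 and label[i - 1] == "l" and label[i + 1] == "l"


# Placeholder for the table of contextual rules, keyed by code point.
RULES: Dict[int, Callable[[str, int], bool]] = {0x00B7: middle_dot_rule}


def passes_contextual_rules(label: str) -> bool:
    """Apply the three outcomes described in the text to every code point."""
    for i, ch in enumerate(label):
        cp = ord(ch)
        if cp not in NEEDS_CONTEXT:
            continue  # no contextual processing required for this code point
        rule = RULES.get(cp)
        if rule is None:
            return False  # no rule found: the label is rejected
        if not rule(label, i):
            return False  # rule does not conclude "valid in context"
    return True
```

Keeping the rule table separate from the code point list mirrors the text: a code point can be flagged as needing a rule before any rule for it has been registered, and in that case the label is rejected.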
4.3.2.4. Labels Containing Characters Written Right to Left 4.3.2.4. Labels Containing Characters Written Right to Left
Additional special tests for right-to-left strings are applied (see Additional special tests for right-to-left strings are applied (see
[IDNA2008-BIDI]). Strings that contain right to left characters that [IDNA2008-BIDI]). Strings that contain right to left characters that
do not conform to the rule(s) identified there MUST NOT be inserted do not conform to the rule(s) identified there MUST NOT be inserted
in zone files. in zone files.
[[anchor14: If the bidi specification continues to specify checking
more than one label, this subsection will need to be revised and/or
moved to a separate "FQDN validation" section.]]
4.3.3. Registration Validation Summary 4.3.3. Registration Validation Summary
Strings that have been produced by the steps above, and whose Strings that have been produced by the steps above, and whose
contents pass the above tests, are U-labels. contents pass the above tests, are U-labels.
To summarize, tests are made here for invalid characters, invalid To summarize, tests are made here for invalid characters, invalid
combinations of characters, and for labels that are invalid even if combinations of characters, and for labels that are invalid even if
the characters they contain are valid individually. For example, the characters they contain are valid individually. For example,
labels containing invisible ("zero-width") characters may be labels containing invisible ("zero-width") characters may be
skipping to change at page 9, line 45 skipping to change at page 10, line 4
scripts and accommodate new versions of Unicode without introducing scripts and accommodate new versions of Unicode without introducing
ambiguity into domain name processing. ambiguity into domain name processing.
5.1. Label String Input 5.1. Label String Input
The user supplies a string in the local character set, typically by The user supplies a string in the local character set, typically by
typing it or clicking on, or copying and pasting, a resource typing it or clicking on, or copying and pasting, a resource
identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the
domain name is extracted. Or some process not directly involving the domain name is extracted. Or some process not directly involving the
user may read the string from a file or obtain it in some other way. user may read the string from a file or obtain it in some other way.
Processing in this step and the next two are local matters, to be Processing in this step and the next two are local matters, to be
accomplished prior to actual invocation of IDNA, but at least these accomplished prior to actual invocation of IDNA, but at least these
two steps must be accomplished in some way. two steps must be accomplished in some way.
5.2. Conversion to Unicode 5.2. Conversion to Unicode
The local character set, character coding conventions, and, as The string is converted from the local character set into Unicode, if
necessary, display and presentation conventions, are converted to it is not already Unicode. The exact nature of this conversion is
Unicode (without surrogates), paralleling the process described above beyond the scope of this document, but may involve normalization, as
in Section 4.2. described in Section 4.2.
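As a rough illustration of the conversion step, the sketch below decodes a byte string from a local character set and then normalizes it. The function name is hypothetical, and the use of Normalization Form C is an assumption made for the example; Section 4.2 is the authority on which form applies.

```python
import unicodedata


def to_unicode_normalized(raw: bytes, local_encoding: str) -> str:
    """Decode bytes from the local character set, then normalize.

    Assumption: NFC is the normalization form intended by Section 4.2.
    """
    text = raw.decode(local_encoding)
    return unicodedata.normalize("NFC", text)
```

For instance, a decomposed "e" followed by U+0301 COMBINING ACUTE ACCENT is composed into the single code point U+00E9 by this step.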
5.3. Character Changes in Preprocessing or the User Interface 5.3. Character Changes in Preprocessing or the User Interface
The Unicode string MAY then be processed, in a way specific to the The Unicode string MAY then be processed, in a way specific to the
local environment, to make the result of the IDNA processing match local environment, to make the result of the IDNA processing match
user expectations. For instance, it would be reasonable, at this user expectations. For instance, it would be reasonable, at this
step, to convert all upper case characters to lower case, if this step, to convert all upper case characters to lower case, if this
makes sense in the user's environment. makes sense in the user's environment.
Other examples of processing for localization might be applied, if Other examples of processing for localization might be applied, if
skipping to change at page 10, line 35 skipping to change at page 10, line 39
character into the one form permitted in labels, or giving special character into the one form permitted in labels, or giving special
treatment to characters whose presentation forms are dependent only treatment to characters whose presentation forms are dependent only
on placement in the label. Such localization changes are even on placement in the label. Such localization changes are even
further outside the scope of this specification than the ones further outside the scope of this specification than the ones
mentioned above. mentioned above.
Recommendations for preprocessing for global contexts (i.e., when Recommendations for preprocessing for global contexts (i.e., when
local considerations do not apply or cannot be used) and for maximum local considerations do not apply or cannot be used) and for maximum
interoperability with labels that might have been specified under interoperability with labels that might have been specified under
liberal readings of IDNA2003 are given in [IDNA2008-Rationale]. liberal readings of IDNA2003 are given in [IDNA2008-Rationale].
[[anchor16: The question of preprocessing remains controversial in
[[anchor17: The question of preprocessing remains controversial in
the WG. One school of thought is that, for compatibility with the WG. One school of thought is that, for compatibility with
IDNA2003, preprocessing should be standardized and required, with IDNA2003, preprocessing should be standardized and required, with
only one form permitted. Another sees important advantages in having only one form permitted. Another sees important advantages in having
the mappings between U-labels and A-labels be symmetric, unambiguous, the mappings between U-labels and A-labels be symmetric, unambiguous,
and information-preserving. And a third believes that local mappings and information-preserving. And a third believes that local mappings
will occur regardless of what we specify and that it is better to will occur regardless of what we specify and that it is better to
specify the protocol on that basis than to indirectly encourage local specify the protocol on that basis than to indirectly encourage local
inventions. The first group (and perhaps others) believes that local inventions. The first group (and perhaps others) believes that local
mappings will be, to put it mildly, "very bad... for mappings will be, to put it mildly, "very bad... for
interoperability".]] interoperability".]]
skipping to change at page 12, line 8 skipping to change at page 12, line 13
present. present.
o Labels containing other code points that are shown in the o Labels containing other code points that are shown in the
permitted character table as requiring a contextual rule permitted character table as requiring a contextual rule
("CONTEXTO" in the tables), but for which no such rule appears in ("CONTEXTO" in the tables), but for which no such rule appears in
the table of rules. With the exception in the rule immediately the table of rules. With the exception in the rule immediately
above, applications resolving DNS names or carrying out equivalent above, applications resolving DNS names or carrying out equivalent
operations are not required to test contextual rules, only to operations are not required to test contextual rules, only to
verify that a rule exists. verify that a rule exists.
o Labels whose first character is a combining mark. [[anchor18: Note o Labels whose first character is a combining mark. [[anchor19: Note
in Draft: this definition may need to be further tightened.]] in Draft: this definition may need to be further tightened.]]
In addition, the application SHOULD apply the following test. The In addition, the application SHOULD apply the following test. The
test may be omitted in special circumstances, such as when the test may be omitted in special circumstances, such as when the
resolver application knows that the conditions are enforced resolver application knows that the conditions are enforced
elsewhere, because an attempt to resolve such strings will almost elsewhere, because an attempt to resolve such strings will almost
certainly lead to a DNS lookup failure. However, applying the test certainly lead to a DNS lookup failure. However, applying the test
is likely to give much better information about the reason for a is likely to give much better information about the reason for a
lookup failure -- information that may be usefully passed to the user lookup failure -- information that may be usefully passed to the user
when that is feasible -- than DNS resolution failure alone. when that is feasible -- than DNS resolution failure alone.
[[anchor19: Should this be a MUST? Pro: this is the only remaining [[anchor20: Should this be a MUST? Pro: this is the only remaining
SHOULD (true?), the test is relatively straightforward, and it helps SHOULD (true?), the test is relatively straightforward, and it helps
avoid visual ambiguity. Con: the "special circumstances" that might avoid visual ambiguity. Con: the "special circumstances" that might
justify doing something different are explained above.]] justify doing something different are explained above.]]
o Verification that the string is compliant with the requirements o Verification that the string is compliant with the requirements
for right to left characters, specified in [IDNA2008-BIDI]. for right to left characters, specified in [IDNA2008-BIDI].
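The lookup-side checks visible in this excerpt differ from registration in that a resolver only verifies that a contextual rule exists, without applying it. A minimal sketch of that distinction follows; it returns a reason string so that the caller can report something more useful than a bare DNS failure. The two sets are hypothetical placeholders for data from [IDNA2008-Tables] and the rules registry, and the right-to-left test is deliberately left out.

```python
import unicodedata
from typing import Optional, Set

# Hypothetical placeholder data, not taken from the actual tables.
CONTEXTO: Set[int] = {0x00B7, 0x30FB}
RULE_EXISTS: Set[int] = {0x00B7}


def lookup_rejection_reason(
    label: str,
    contexto: Set[int] = CONTEXTO,
    rule_exists: Set[int] = RULE_EXISTS,
) -> Optional[str]:
    """Return why a label is rejected at lookup time, or None if it passes.

    Assumption: "combining mark" means General Category M* (Mn, Mc, Me).
    """
    if label and unicodedata.category(label[0]).startswith("M"):
        return "label starts with a combining mark"
    for ch in label:
        cp = ord(ch)
        # Only the existence of a rule is checked here; the rule is not applied.
        if cp in contexto and cp not in rule_exists:
            return f"U+{cp:04X} requires a contextual rule, but none is registered"
    return None
```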
For all other strings, the resolver MUST rely on the presence or For all other strings, the resolver MUST rely on the presence or
absence of labels in the DNS to determine the validity of those absence of labels in the DNS to determine the validity of those
labels and the validity of the characters they contain. If they are labels and the validity of the characters they contain. If they are
skipping to change at page 14, line 5 skipping to change at page 14, line 8
domain name containing A-labels or conventional LDH-labels, not domain name containing A-labels or conventional LDH-labels, not
U-labels. In the presence of DNSSEC, no form of a zone file or query U-labels. In the presence of DNSSEC, no form of a zone file or query
response that contains a U-label may be signed or the signature response that contains a U-label may be signed or the signature
validated. validated.
One consequence of this for sites deploying IDNA in the presence of One consequence of this for sites deploying IDNA in the presence of
DNSSEC is that any special purpose proxies or forwarders used to DNSSEC is that any special purpose proxies or forwarders used to
transform user input into IDNs must be earlier in the resolution flow transform user input into IDNs must be earlier in the resolution flow
than DNSSEC authenticating nameservers for DNSSEC to work. than DNSSEC authenticating nameservers for DNSSEC to work.
6.3. Root Server Considerations 6.3. Root and other DNS Server Considerations
IDNs in A-label form will generally be somewhat longer than current IDNs in A-label form will generally be somewhat longer than current
domain names, so the bandwidth needed by the root servers is likely domain names, so the bandwidth needed by the root servers is likely
to go up by a small amount. Also, queries and responses for IDNs to go up by a small amount. Also, queries and responses for IDNs
will probably be somewhat longer than typical queries today, so more will probably be somewhat longer than typical queries historically,
queries and responses may be forced to go to TCP instead of UDP. so EDNS0 [RFC2671] support may be more important (otherwise, queries
and responses may be forced to go to TCP instead of UDP).
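The length growth mentioned above is easy to see with Python's built-in "punycode" codec. The helper below is a hypothetical illustration that performs no IDNA validation at all; it only applies the Punycode encoding and prepends the "xn--" prefix.

```python
def to_a_label(u_label: str) -> str:
    """Punycode-encode a label and add the "xn--" prefix (no validation)."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


u_label = "b\u00fccher"  # "bücher": six code points
a_label = to_a_label(u_label)
print(a_label, len(u_label), len(a_label))  # prints: xn--bcher-kva 6 13
```

Here a six-character label becomes a thirteen-octet A-label. Because a DNS message carried over UDP without EDNS0 is limited to 512 octets [RFC1035], longer names leave less room in each response.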
7. Security Considerations 7. Security Considerations
The general security principles and issues for IDNA appear in The general security principles and issues for IDNA appear in
[IDNA2008-Rationale]. The comments below are specific to this pair [IDNA2008-Rationale]. The comments below are specific to this pair
of protocols, but should be read in the context of that material and of protocols, but should be read in the context of that material and
the definitions and specifications, identified there, on which this the definitions and specifications, identified there, on which this
one depends. one depends.
This memo describes procedures for registering and looking up labels This memo describes procedures for registering and looking up labels
skipping to change at page 15, line 7 skipping to change at page 15, line 12
that this has caused any practical problems since RFC 3490 was that this has caused any practical problems since RFC 3490 was
adopted, but the risk still exists in principle. adopted, but the risk still exists in principle.
8. IANA Considerations 8. IANA Considerations
IANA actions for this version of IDNA are specified in IANA actions for this version of IDNA are specified in
[IDNA2008-Rationale]. [IDNA2008-Rationale].
9. Change Log 9. Change Log
[[anchor25: RFC Editor: Please remove this section.]] [[anchor26: RFC Editor: Please remove this section.]]
This document started as a follow-on to the pre-WG
draft-klensin-idnabis-protocol. Subsections that describe changes in
pre-WG drafts will be removed in version -03.
9.1. Version -00 of draft-klensin-idnabis-protocol 9.1. Version -00 of draft-klensin-idnabis-protocol
Version -00 of this draft was produced in November 2007 by moving Version -00 of this draft was produced in November 2007 by moving
text from draft-klensin-idnabis-issues and by copying considerable text text from draft-klensin-idnabis-issues and by copying considerable text
from RFC 3490. The result was then extensively edited. from RFC 3490. The result was then extensively edited.
9.2. Versions -01 and -02 of draft-klensin-idnabis-protocol 9.2. Versions -01 and -02 of draft-klensin-idnabis-protocol
These versions reflected a number of editorial changes, some of them These versions reflected a number of editorial changes, some of them
skipping to change at page 16, line 26 skipping to change at page 16, line 36
o Inserted a test for double-hyphen here. o Inserted a test for double-hyphen here.
9.6. Version -01 of draft-ietf-idnabis-protocol 9.6. Version -01 of draft-ietf-idnabis-protocol
o Corrected discussion of SRV records. o Corrected discussion of SRV records.
o Several small corrections for clarity. o Several small corrections for clarity.
o Inserted more "open issue" placeholders. o Inserted more "open issue" placeholders.
9.7. Version -02 of draft-ietf-idnabis-protocol
o Rewrote the "conversion to Unicode" text in Section 5.2 as
requested on-list.
o Added a comment (and reference) about EDNS0 to the "DNS Server
Considerations" section, which was also retitled.
o Made several editorial corrections and improvements in response to
various comments.
o Added several new discussion placeholder anchors and updated some
older ones.
10. Contributors 10. Contributors
While the listed editor held the pen, the original versions of this While the listed editor held the pen, the original versions of this
document represent the joint work and conclusions of an ad hoc design document represent the joint work and conclusions of an ad hoc design
team consisting of the editor and, in alphabetic order, Harald team consisting of the editor and, in alphabetic order, Harald
Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
draws significantly on the original version of IDNA [RFC3490] both draws significantly on the original version of IDNA [RFC3490] both
conceptually and for specific text. This second-generation version conceptually and for specific text. This second-generation version
would not have been possible without the work that went into that would not have been possible without the work that went into that
first version and its authors, Patrik Faltstrom, Paul Hoffman, and first version and its authors, Patrik Faltstrom, Paul Hoffman, and
skipping to change at page 17, line 5 skipping to change at page 17, line 30
This revision to IDNA would have been impossible without the This revision to IDNA would have been impossible without the
accumulated experience since RFC 3490 was published and resulting accumulated experience since RFC 3490 was published and resulting
comments and complaints of many people in the IETF, ICANN, and other comments and complaints of many people in the IETF, ICANN, and other
communities, too many people to list here. Nor would it have been communities, too many people to list here. Nor would it have been
possible without RFC 3490 itself and the efforts of the Working Group possible without RFC 3490 itself and the efforts of the Working Group
that defined it. Those people whose contributions are acknowledged that defined it. Those people whose contributions are acknowledged
in RFC 3490, [RFC4690], and [IDNA2008-Rationale] were particularly in RFC 3490, [RFC4690], and [IDNA2008-Rationale] were particularly
important. important.
Specific textual changes were incorporated into this document after
suggestions from Stephane Bortzmeyer, Mark Davis, and others.
12. References 12. References
12.1. Normative References 12.1. Normative References
[IDNA2008-BIDI] [IDNA2008-BIDI]
Alvestrand, H. and C. Karp, "An updated IDNA criterion for Alvestrand, H. and C. Karp, "An updated IDNA criterion for
right-to-left scripts", January 2008, <http:// right-to-left scripts", July 2008, <https://
www.ietf.org/internet-drafts/ datatracker.ietf.org/drafts/draft-ietf-idnabis-bidi/>.
draft-alvestrand-idna-bidi-03.txt>.
[IDNA2008-Rationale] [IDNA2008-Rationale]
Klensin, J., Ed., "Internationalizing Domain Names for Klensin, J., Ed., "Internationalizing Domain Names for
Applications (IDNA): Issues, Explanation, and Rationale", Applications (IDNA): Issues, Explanation, and Rationale",
February 2008, <http://www.ietf.org/internet-drafts/ July 2008, <https://datatracker.ietf.org/drafts/
draft-klensin-idnabis-issues-06.txt>. draft-ietf-idnabis-rationale>.
[IDNA2008-Tables] [IDNA2008-Tables]
Faltstrom, P., "The Unicode Codepoints and IDN", Faltstrom, P., "The Unicode Codepoints and IDNA",
February 2008, <http://stupid.domain.name/idnabis/ July 2008, <https://datatracker.ietf.org/drafts/
draft-faltstrom-idnabis-tables-04.txt>. draft-ietf-idnabis-tables/>.
A version of this document, is available in HTML format at A version of this document is available in HTML format at
http://stupid.domain.name/idnabis/ http://stupid.domain.name/idnabis/
draft-faltstrom-idnabis-tables-04.html draft-ietf-idnabis-tables-02.html
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
STD 13, RFC 1034, November 1987. STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain names - implementation and [RFC1035] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, November 1987. specification", STD 13, RFC 1035, November 1987.
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
and Support", STD 3, RFC 1123, October 1989. and Support", STD 3, RFC 1123, October 1989.
skipping to change at page 18, line 38 skipping to change at page 19, line 19
[RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
"Dynamic Updates in the Domain Name System (DNS UPDATE)", "Dynamic Updates in the Domain Name System (DNS UPDATE)",
RFC 2136, April 1997. RFC 2136, April 1997.
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997. Specification", RFC 2181, July 1997.
[RFC2535] Eastlake, D., "Domain Name System Security Extensions", [RFC2535] Eastlake, D., "Domain Name System Security Extensions",
RFC 2535, March 1999. RFC 2535, March 1999.
[RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
RFC 2671, August 1999.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003. RFC 3490, March 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005. RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005. Identifiers (IRIs)", RFC 3987, January 2005.
skipping to change at page 19, line 15 skipping to change at page 19, line 47
[RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
Internationalized Email", RFC 4952, July 2007. Internationalized Email", RFC 4952, July 2007.
[Unicode] The Unicode Consortium, "The Unicode Standard, Version [Unicode] The Unicode Consortium, "The Unicode Standard, Version
5.0", 2007. 5.0", 2007.
Boston, MA, USA: Addison-Wesley. ISBN 0-321-48091-0 Boston, MA, USA: Addison-Wesley. ISBN 0-321-48091-0
Appendix A. The Contextual Rules Registry Appendix A. The Contextual Rules Registry
[[anchor36: Note in Draft: the WG needs to figure out whether this [[anchor38: Note in Draft: The WG seems to be concluding that this
table stays as part of this document, is moved to a separate one, or material should actually be in the Tables document, possibly with
is incorporated into "tables". Regardless of where they are placed, some additional material added from Rationale. Unless there are
the WG will still need to review the specific content of the rules. objections and consensus on some other plan, that move will be made
In this version of the document, the table remains something of an with -03 of this document. Regardless of where they are placed, the
WG will still need to review the specific content of the rules. In
this version of the document, the table remains something of an
illustrative placeholder, not a final specification.]] illustrative placeholder, not a final specification.]]
[[anchor39: The next appendix sketches out an alternate way to
present this information. See the notes there.]]
As discussed in the IANA Considerations section of As discussed in the IANA Considerations section of
[IDNA2008-Rationale], a registry of rules defines the contexts in [IDNA2008-Rationale], a registry of rules defines the contexts in
which particular PROTOCOL-VALID characters, characters associated which particular PROTOCOL-VALID characters, characters associated
with a requirement for Contextual Information, are permitted. These with a requirement for Contextual Information, are permitted. These
rules are expressed as tests on the label in which the characters rules are expressed as tests on the label in which the characters
appear (all, or any part of, the label may be tested). appear (all, or any part of, the label may be tested). [[anchor40:
Probably the IANA registry spec should be moved directly from
Rationale to Tables -- see above.]]
For each character specified as requiring a contextual rule, a rule For each character specified as requiring a contextual rule, a rule
MAY be established with the following data elements: MAY be established with the following data elements:
1. The code point associated with the character. 1. The code point associated with the character.
2. The name of the character. 2. The name of the character.
3. An indication as to whether the code point requires the rule be 3. An indication as to whether the code point requires the rule be
processed at lookup time (this indication is equivalent to the processed at lookup time (this indication is equivalent to the
skipping to change at page 20, line 9 skipping to change at page 20, line 48
and the Unicode Property Value Aliases list and the Unicode Property Value Aliases list
[Unicode-PropertyValueAliases]. Note that in these regular [Unicode-PropertyValueAliases]. Note that in these regular
expressions, the label is taken to be an entire line, i.e., "^" expressions, the label is taken to be an entire line, i.e., "^"
refers to the beginning of the label and "$" refers to the end of refers to the beginning of the label and "$" refers to the end of
the label. the label.
These regular expressions are used as tests. The contextual These regular expressions are used as tests. The contextual
requirement is met if there is a match for the regular expression requirement is met if there is a match for the regular expression
and not met if there is no match. and not met if there is no match.
[[anchor37: Patrik and I (JcK) would like to find a way to state [[anchor41: Patrik and I (JcK) would like to find a way to state
these rules that does not require the reader and implementer to these rules that does not require the reader and implementer to
understand what we believe to be a fairly exotic element of the understand what we believe to be a fairly exotic element of the
Unicode specification. Suggestions welcome.]] Unicode specification. See the second Appendix for a possible
alternative. Suggestions welcome.]]
6. An optional comment preceded by "#" 6. An optional comment preceded by "#"
Should there be any conflict between the two statements of a rule, Should there be any conflict between the two statements of a rule,
the regular expression form MUST be considered normative until the the regular expression form MUST be considered normative until the
registry can be corrected. registry can be corrected.
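As a minimal sketch of the test semantics described above (a rule is met when the regular expression matches somewhere in the label), the following Python uses the standard `re` module. The function name `contextual_rule_met` is illustrative and not part of this specification, and the pattern is the MIDDLE DOT entry from this appendix rewritten in Python escape syntax. Python's `re` module does not support the \p(Script:...) property notation used in other entries, so those rules would need additional tooling.

```python
import re

# Illustrative pattern: MIDDLE DOT (U+00B7) between two 'l' (U+006C).
MIDDLE_DOT_RULE = re.compile("\u006C\u00B7\u006C")


def contextual_rule_met(pattern: re.Pattern, label: str) -> bool:
    """Return True if the pattern matches anywhere in the label."""
    # A match means the contextual requirement is met; no match means
    # it is not met.
    return pattern.search(label) is not None


print(contextual_rule_met(MIDDLE_DOT_RULE, "col\u00B7lecci\u00F3"))  # True
print(contextual_rule_met(MIDDLE_DOT_RULE, "a\u00B7b"))              # False
```

A search-based test of this kind accepts a label as soon as one occurrence of the character is in a valid context. Whether every occurrence must be checked individually is not settled by the text above.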
The rules for the characters listed in the Tables document as The rules for the characters listed in the Tables document as
exception cases or Join_Controls and for which rules are being exception cases or Join_Controls and for which rules are being
defined at this time appear below. defined at this time appear below.
[[anchor38: Note in draft: This table is not complete and the rule [[anchor42: Note in draft: This table is not complete and the rule
entries below are temporarily only examples.]] entries below are temporarily only examples.]]
002D; HYPHEN-MINUS; F; 002D; HYPHEN-MINUS; F;
Must not appear at the beginning or end of a label; Must not appear at the beginning or end of a label;
Regular expression: Regular expression:
[^^]\u002D|\u002D[^$] ; [^^]\u002D|\u002D[^$] ;
# Note that a prohibition on having two hyphens as the third and # Note that there are some additional prohibitions in the
fourth characters of anything but a valid A-label appears in the specification on consecutive hyphens in anything but a valid
specification. A-label.
200C; ZERO WIDTH NON-JOINER; T; 200C; ZERO WIDTH NON-JOINER; T;
Between two characters from the same script only. The script must Between two characters from the same script only. The script must
be one in which the use of this character causes significant be one in which the use of this character causes significant
visual transformation of one or both of the adjacent characters; visual transformation of one or both of the adjacent characters;
Regular expression: Regular expression:
[\p(Script:Deva)\p(Script:Tamil)]\u200C[\p(Script:Deva)\p(Script: [\p(Script:Deva)\p(Script:Tamil)]\u200C[\p(Script:Deva)\p(Script:
Tamil)] ; Tamil)] ;
[[anchor39: That script list is _not_ complete and, in particular, [[anchor43: That script list is _not_ complete and, in particular,
more Indic scripts certainly need to be listed. It also does not more Indic scripts certainly need to be listed. It also does not
correctly express the "same script" restriction mentioned in the correctly express the "same script" restriction mentioned in the
prose, since it only tests adjacent characters. Whether this prose, since it only tests adjacent characters.]] This character
character is required for Arabic script, and with what is also required for Arabic script. The minimal restriction is
restrictions if it is, is under discussion in the WG and in other \p(Joining_Type:L)\p(Joining_Type:T)*\u200C\p(Joining_Type:
forums. It is clear that a Unicode derived property for script T)*\p(Joining_Type:R) ;
groups that would permit testing, e.g., "Indic Script", would be ; more narrow restrictions may be suggested by the Arabic script
very helpful here.]] group.
200D; ZERO WIDTH JOINER; T; 200D; ZERO WIDTH JOINER; T;
Between two characters from the same script only. The script must Between two characters from the same script only. The script must
be one in which the use of this character causes significant be one in which the use of this character causes significant
visual transformation of one or both of the adjacent characters; visual transformation of one or both of the adjacent characters;
Regular expression: Regular expression:
[\p(Script:Deva)\p(Script:Tamil)]+ [\p(Script:Deva)\p(Script:Tamil)]+
\u200D[\p(Script:Deva)\p(Script:Tamil)]+ ; \u200D[\p(Script:Deva)\p(Script:Tamil)]+ ;
[[anchor40: That script list is _not_ complete and, in particular, [[anchor44: That script list is _not_ complete and, in particular,
more Indic scripts certainly need to be listed. It also does not more Indic scripts certainly need to be listed. It also does not
correctly express the "same script" restriction mentioned in the correctly express the "same script" restriction mentioned in the
prose, since it only tests adjacent characters. Whether this prose, since it only tests adjacent characters. This character is
character is required for Arabic script, and with what not required for Arabic script.]]
restrictions if it is, is under discussion in the WG and in other
forums.]]
00B7; MIDDLE DOT; F; 00B7; MIDDLE DOT; F;
Between two 'l' (U+006C) characters only, used to permit the Between two 'l' (U+006C) characters only, used to permit the
Catalan character ela geminada to be expressed; Catalan character ela geminada to be expressed;
Regular expression: Regular expression:
\u006C\u00B7\u006c ; \u006C\u00B7\u006c ;
0375; GREEK LOWER NUMERAL SIGN (KERAIA); F; 0375; GREEK LOWER NUMERAL SIGN (KERAIA); F;
Greek script only. Might be further restricted to specific Greek script only. Might be further restricted to specific
following characters; following characters;
skipping to change at page 21, line 39 skipping to change at page 22, line 29
\0375\(Script:Greek) ; \0375\(Script:Greek) ;
02B9; MODIFIER LETTER PRIME; F;;; 02B9; MODIFIER LETTER PRIME; F;;;
# Permitted only in contexts in which GREEK LOWER NUMERAL SIGN, # Permitted only in contexts in which GREEK LOWER NUMERAL SIGN,
U+0375, is permitted. GREEK NUMERAL SIGN, U+0374, and the Lower U+0375, is permitted. GREEK NUMERAL SIGN, U+0374, and the Lower
Numeral Sign (U+0375) are indicators for numeric use of letters in Numeral Sign (U+0375) are indicators for numeric use of letters in
older Greek writing systems. U+02B9 is relevant because older Greek writing systems. U+02B9 is relevant because
normalization maps U+0374 into it.; normalization maps U+0374 into it.;
Regular expression: Regular expression:
\(Script:Greek)\02B9\(Script:Greek) ; \(Script:Greek)\02B9\(Script:Greek) ;
[[anchor41: The test is that the adjacent characters be in the [[anchor45: The test is that the adjacent characters be in the
Greek script. It is not clear whether this is sufficient. The Greek script. It is not clear whether this is sufficient. The
requirement for a preceding Greek letter may not be necessary. requirement for a preceding Greek letter may not be necessary.
More input needed.]] More input needed.]]
0483; COMBINING CYRILLIC TITLO; F; 0483; COMBINING CYRILLIC TITLO; F;
Cyrillic script only. Might be further restricted to permit only Cyrillic script only. Might be further restricted to permit only
a preceding list of characters. a preceding list of characters.
Regular expression: Regular expression:
\p(Script:Cyrillic)\u0483 ; \p(Script:Cyrillic)\u0483 ;
05F3; HEBREW PUNCTUATION GERESH; F; 05F3; HEBREW PUNCTUATION GERESH; F;
The script of the preceding character and the subsequent The script of the preceding character and the subsequent
character, if any, MUST be Hebrew; character, if any, MUST be Hebrew;
Regular expression: Regular expression:
\p(Script:Hebrew)\u05F3\p(Script:Hebrew)? ; \p(Script:Hebrew)\u05F3\p(Script:Hebrew)? ;
05F4; HEBREW PUNCTUATION GERSHAYIM; F; 05F4; HEBREW PUNCTUATION GERSHAYIM; F;
The script of the preceding character and the subsequent The script of the preceding character and the subsequent
character, if any, MUST be Hebrew; character, if any, MUST be Hebrew;
Regular expression: Regular expression:
skipping to change at page 23, line 5 skipping to change at page 23, line 29
\p(Script:Kana)\u30FB\p(Script:Kana) ; \p(Script:Kana)\u30FB\p(Script:Kana) ;
While the information above is to be used to initialize the registry, While the information above is to be used to initialize the registry,
IANA should treat the table format in this Appendix simply as an IANA should treat the table format in this Appendix simply as an
initial, tentative, suggestion. Subject to review and comment from initial, tentative, suggestion. Subject to review and comment from
the IESG and any Expert Reviewers, IANA is responsible for, and the IESG and any Expert Reviewers, IANA is responsible for, and
should develop, a format for that registry, or a copy of it should develop, a format for that registry, or a copy of it
maintained in parallel, that is convenient for retrieval and machine maintained in parallel, that is convenient for retrieval and machine
processing and publish the location of that version. processing and publish the location of that version.
Appendix B. Contextual Rules Registry - Alternate Syntax
[[anchor46: This Appendix is temporary. It illustrates, for
discussion, a possible way of presenting the Contextual Rules as a
procedural pseudocode rule set rather than as a regular expression or
property list and also shows a bit of the layout suggested by Mark
Davis. Each entry consists of the name for identification, followed
by an informal description, the code point, and the rule set. Note
that the two appendices are alternate forms of the same information;
only one should be moved to Tables; the other will be deleted.]]
[[anchor47: The grammatical rules and operations for the pseudocode
below are left as an exercise for the reader in this draft. Note
however that the "Before" and "After" operations, by themselves,
match anything including null, i.e., BeforeScript would match any
script if the character was the first one in the label. Obviously,
if something satisfies all of the rules, then it is contextually
valid. If any of them yield "False" then it isn't. If we decide to
go in this direction, we should form a small ad hoc committee to
either sort that out or possibly convert it to standard Prolog.]]
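One possible reading of the "Before" and "After" operations described in the note above, sketched in Python. The helper names (`before_char`, `after_char`, `neighbor_matches`, `middle_dot_rule`) are hypothetical, since this draft leaves the pseudocode grammar undefined. The sketch assumes that a missing neighbor (the "null" case) satisfies a test, as the note states, and uses the MIDDLE DOT rule set as its example.

```python
from typing import Optional


def before_char(label: str, index: int) -> Optional[str]:
    """Character preceding label[index], or None at the start of the label."""
    return label[index - 1] if index > 0 else None


def after_char(label: str, index: int) -> Optional[str]:
    """Character following label[index], or None at the end of the label."""
    return label[index + 1] if index + 1 < len(label) else None


def neighbor_matches(neighbor: Optional[str], expected: str) -> bool:
    """Assumed semantics: a missing neighbor (None) matches anything."""
    return neighbor is None or neighbor == expected


def middle_dot_rule(label: str, index: int) -> bool:
    """MIDDLE DOT rule set: both neighbors must match 'l' (U+006C)."""
    return (neighbor_matches(before_char(label, index), "\u006C")
            and neighbor_matches(after_char(label, index), "\u006C"))
```

Under this reading, a MIDDLE DOT at the start of a label that is followed by 'l' passes the test, which may be why some entries add an explicit FirstChar test.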
B.1. HYPHEN-MINUS
Code point: 002D
Overview: Must not appear at the beginning or end of a label.
Lookup: False
Rule Set:
If FirstChar .eq. True Then False;
If LastChar .eq. True Then False;
Else True;
Comment: Note that there are some additional prohibitions in the
specification on consecutive hyphens in anything but a valid
A-label.
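The rule set above rejects a hyphen that is the first or last character of the label. A minimal Python sketch follows, using the hypothetical function name `hyphen_minus_rule` and assuming that FirstChar and LastChar refer to the position of the character under test.

```python
def hyphen_minus_rule(label: str, index: int) -> bool:
    """Contextual test for the HYPHEN-MINUS (U+002D) at label[index]."""
    if label[index] != "\u002D":
        raise ValueError("character under test is not HYPHEN-MINUS")
    if index == 0:               # FirstChar
        return False
    if index == len(label) - 1:  # LastChar
        return False
    return True
```

The additional prohibitions on consecutive hyphens mentioned in the comment are outside the scope of this sketch.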
B.2. ZERO WIDTH NON-JOINER
Code point: 200C
Overview: Between two characters from the same script only. The
script must be one in which the use of this character causes
significant visual transformation of one or both of the adjacent
characters.
Lookup: True
Rule Set:
If BeforeScript .eq. ( Deva | Tamil | Arabic ) Then
If AfterScript .eq. ( Deva | Tamil | Arabic ) Then True;
Else False;
[[anchor50: That script list is _not_ complete and, in particular,
more Indic scripts certainly need to be listed. It also does not
correctly express the "same script" restriction mentioned in the
prose, since it only tests adjacent characters.]]
This character is also required for Arabic script. The minimal
restriction (in regex form) is
\p(Joining_Type:L)\p(Joining_Type:T)*\u200C\p(Joining_Type:
T)*\p(Joining_Type:R) ;
; more narrow restrictions may be suggested by the Arabic script
group.
B.3. ZERO WIDTH JOINER
Code point: 200D
Overview: Between two characters from the same script only. The
script must be one in which the use of this character causes
significant visual transformation of one or both of the adjacent
characters.
Lookup: True
Rule Set:
If BeforeScript .eq. ( Deva | Tamil | Arabic ) Then
If AfterScript .eq. ( Deva | Tamil | Arabic ) Then True;
Else False;
[[anchor52: The script list for this character is _not_ complete
and, in particular, more Indic scripts certainly need to be
listed. It also does not correctly express the "same script"
restriction mentioned in the prose, since it only tests adjacent
characters. This character is not required for Arabic script.]]
B.4. MIDDLE DOT
Code point: 00B7
Overview: Between 'l' (U+006C) characters only, used to permit the
Catalan character ela geminada to be expressed
Lookup: False
Rule Set:
If BeforeChar .eq. \006C Then
If AfterChar .eq. \006C Then True;
Else False;
B.5. GREEK LOWER NUMERAL SIGN (KERAIA)
Code point: 0375
Overview: Greek script only. Might be further restricted to
specific following characters
Lookup: False
Rule Set:
If AfterScript .eq. Greek Then True;
Else False;
B.6. MODIFIER LETTER PRIME
Code point: 02B9
Overview: Permitted only in contexts in which GREEK LOWER NUMERAL
SIGN, U+0375, is permitted. GREEK NUMERAL SIGN, U+0374, and the
Lower Numeral Sign (U+0375) are indicators for numeric use of
letters in older Greek writing systems. U+02B9 is relevant
because normalization maps U+0374 into it.
Lookup: False
Rule Set:
If BeforeScript .eq. Greek Then
If AfterScript .eq. Greek Then True;
Else False;
Comment: [[anchor56: The test is that the adjacent characters be in
the Greek script. It is not clear whether this is sufficient.
The requirement for a preceding Greek letter may not be necessary.
More input needed.]]
B.7. COMBINING CYRILLIC TITLO
Code point: 0483
Overview: Cyrillic script only. Might be further restricted to
permit only a preceding list of characters.
Lookup: False
Rule Set:
If BeforeScript .eq. Cyrillic Then
If AfterScript .eq. Cyrillic Then True;
Else False;
B.8. HEBREW PUNCTUATION GERESH
Code point: 05F3
Overview: The script of the preceding character and the subsequent
character, if any, MUST be Hebrew.
Lookup: False
Rule Set:
If FirstChar .eq. True Then False;
Else If BeforeScript .eq. Hebrew Then
If AfterScript .eq. Hebrew Then True;
Else False;
B.9. HEBREW PUNCTUATION GERSHAYIM
Code point: 05F4
Overview: The script of the preceding character and the subsequent
character, if any, MUST be Hebrew.
Lookup: False
Rule Set:
If FirstChar .eq. True Then False;
Else If BeforeScript .eq. Hebrew Then
If AfterScript .eq. Hebrew Then True;
Else False;
B.10. IDEOGRAPHIC ITERATION MARK
Code point: 3005
Overview: MUST NOT be at the beginning of the label, and the
previous character MUST be in Han Script.
Lookup: False
Rule Set:
If FirstChar .eq. True Then False;
Else If BeforeScript .eq. Han Then True;
Else False;
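The rule set above can be sketched in Python as follows. The standard library does not expose the Unicode Script property, so this sketch approximates "Han script" by testing whether the character name begins with "CJK UNIFIED IDEOGRAPH". That approximation is an assumption of the example and is narrower than the real property. The names `is_han_approx` and `iteration_mark_rule` are hypothetical.

```python
import unicodedata


def is_han_approx(ch: str) -> bool:
    """Rough stand-in for Script=Han; narrower than the real property."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def iteration_mark_rule(label: str, index: int) -> bool:
    """IDEOGRAPHIC ITERATION MARK (U+3005) needs a preceding Han character."""
    if index == 0:  # FirstChar: the mark may not begin the label
        return False
    return is_han_approx(label[index - 1])
```

A production implementation would presumably take the Script value from the Unicode Character Database rather than from character names.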
B.11. VERTICAL IDEOGRAPHIC ITERATION MARK
Code point: 303B
Overview: MUST NOT be at the beginning of the label, and the
previous character MUST be in Han Script.
Lookup: False
Rule Set:
If FirstChar .eq. True Then False;
Else If BeforeScript .eq. Han Then True;
Else False;
B.12. KATAKANA MIDDLE DOT
Code point: 30FB
Overview: Adjacent characters MUST be Katakana.
Lookup: False
Rule Set:
If FirstChar .eq. True Then False;
Else If BeforeScript .eq. Kana Then
If AfterScript .eq. Kana Then True;
Else False;
Author's Address Author's Address
John C Klensin (editor) John C Klensin
(editor)
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Fax:
Email: john+ietf@jck.com Email: john+ietf@jck.com
URI:
Full Copyright Statement Full Copyright Statement
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors contained in BCP 78, and except as set forth therein, the authors
retain all their rights. retain all their rights.
This document and the information contained herein are provided on an This document and the information contained herein are provided on an
 End of changes. 44 change blocks. 
61 lines changed or deleted 345 lines changed or added
