draft-ietf-idnabis-protocol-02.txt   draft-ietf-idnabis-protocol-03.txt 
Network Working Group J. Klensin, Ed. Network Working Group J. Klensin
Obsoletes: 3490 (if approved) Obsoletes: 3490 (if approved)
Intended status: Standards Track Intended status: Standards Track
Expires: January 15, 2009 Expires: January 28, 2009
Internationalized Domain Names in Applications (IDNA): Protocol Internationalized Domain Names in Applications (IDNA): Protocol
draft-ietf-idnabis-protocol-02.txt draft-ietf-idnabis-protocol-03.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 15, 2009. This Internet-Draft will expire on January 28, 2009.
Abstract Abstract
This document supplies the protocol definition for a revised and This document supplies the protocol definition for a revised and
updated specification for internationalized domain names (IDNs). The updated specification for internationalized domain names (IDNs). The
rationale for these changes, the relationship to the older rationale for these changes, the relationship to the older
specification, and important terminology are provided in other specification, and important terminology are provided in other
documents. This document specifies the protocol mechanism, called documents. This document specifies the protocol mechanism, called
Internationalizing Domain Names in Applications (IDNA), for Internationalizing Domain Names in Applications (IDNA), for
registering and looking up IDNs in a way that does not require registering and looking up IDNs in a way that does not require
skipping to change at page 2, line 26 skipping to change at page 2, line 26
4.1. Proposed label . . . . . . . . . . . . . . . . . . . . . . 6 4.1. Proposed label . . . . . . . . . . . . . . . . . . . . . . 6
4.2. Conversion to Unicode and Normalization . . . . . . . . . 7 4.2. Conversion to Unicode and Normalization . . . . . . . . . 7
4.3. Permitted Character and Label Validation . . . . . . . . . 7 4.3. Permitted Character and Label Validation . . . . . . . . . 7
4.3.1. Rejection of Characters that are not Permitted . . . . 7 4.3.1. Rejection of Characters that are not Permitted . . . . 7
4.3.2. Label Validation . . . . . . . . . . . . . . . . . . . 7 4.3.2. Label Validation . . . . . . . . . . . . . . . . . . . 7
4.3.3. Registration Validation Summary . . . . . . . . . . . 8 4.3.3. Registration Validation Summary . . . . . . . . . . . 8
4.4. Registry Restrictions . . . . . . . . . . . . . . . . . . 9 4.4. Registry Restrictions . . . . . . . . . . . . . . . . . . 9
4.5. Punycode Conversion . . . . . . . . . . . . . . . . . . . 9 4.5. Punycode Conversion . . . . . . . . . . . . . . . . . . . 9
4.6. Insertion in the Zone . . . . . . . . . . . . . . . . . . 9 4.6. Insertion in the Zone . . . . . . . . . . . . . . . . . . 9
5. Domain Name Resolution (Lookup) Protocol . . . . . . . . . . . 9 5. Domain Name Resolution (Lookup) Protocol . . . . . . . . . . . 9
5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 9 5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 10
5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 10 5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 10
5.3. Character Changes in Preprocessing or the User 5.3. Character Changes in Preprocessing or the User
Interface . . . . . . . . . . . . . . . . . . . . . . . . 10 Interface . . . . . . . . . . . . . . . . . . . . . . . . 10
5.4. A-label Input . . . . . . . . . . . . . . . . . . . . . . 11 5.4. A-label Input . . . . . . . . . . . . . . . . . . . . . . 11
5.5. Validation and Character List Testing . . . . . . . . . . 11 5.5. Validation and Character List Testing . . . . . . . . . . 11
5.6. Punycode Conversion . . . . . . . . . . . . . . . . . . . 12 5.6. Punycode Conversion . . . . . . . . . . . . . . . . . . . 13
5.7. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 12 5.7. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 13
6. Name server Considerations . . . . . . . . . . . . . . . . . . 12 6. Name Server Considerations . . . . . . . . . . . . . . . . . . 13
6.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 13 6.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 13
6.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 13 6.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 13
6.3. Root and other DNS Server Considerations . . . . . . . . . 14 6.3. Root and other DNS Server Considerations . . . . . . . . . 14
7. Security Considerations . . . . . . . . . . . . . . . . . . . 14 7. Security Considerations . . . . . . . . . . . . . . . . . . . 14
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
9. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 15 9. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 15
9.1. Version -00 of draft-klensin-idnabis-protocol . . . . . . 15 9.1. Changes between Version -00 and -01 of
9.2. Versions -01 and -02 of draft-klensin-idnabis-protocol . . 15 draft-ietf-idnabis-protocol . . . . . . . . . . . . . . . 15
9.3. Version -03 of draft-klensin-idnabis-protocol . . . . . . 15 9.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 15
9.4. Version -04 of draft-klensin-idnabis-protocol . . . . . . 15 9.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 16
9.5. Version -00 of draft-ietf-idnabis-protocol . . . . . . . . 16 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 16
9.6. Version -01 of draft-ietf-idnabis-protocol . . . . . . . . 16 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16
9.7. Version -02 of draft-ietf-idnabis-protocol . . . . . . . . 16
10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 17
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
12.1. Normative References . . . . . . . . . . . . . . . . . . . 17 12.1. Normative References . . . . . . . . . . . . . . . . . . . 17
12.2. Informative References . . . . . . . . . . . . . . . . . . 18 12.2. Informative References . . . . . . . . . . . . . . . . . . 18
Appendix A. The Contextual Rules Registry . . . . . . . . . . . . 19 Appendix A. The Contextual Rules Registry . . . . . . . . . . . . 19
Appendix B. Contextual Rules Registry - Alternate Syntax . . . . 23 Appendix B. Contextual Rules Registry - Alternate Syntax . . . . 22
B.1. HYPHEN-MINUS . . . . . . . . . . . . . . . . . . . . . . . 24 B.1. HYPHEN-MINUS . . . . . . . . . . . . . . . . . . . . . . . 23
B.2. ZERO WIDTH NON-JOINER . . . . . . . . . . . . . . . . . . 24 B.2. ZERO WIDTH NON-JOINER . . . . . . . . . . . . . . . . . . 23
B.3. ZERO WIDTH JOINER . . . . . . . . . . . . . . . . . . . . 25 B.3. ZERO WIDTH JOINER . . . . . . . . . . . . . . . . . . . . 24
B.4. MIDDLE DOT . . . . . . . . . . . . . . . . . . . . . . . . 25 B.4. MIDDLE DOT . . . . . . . . . . . . . . . . . . . . . . . . 24
B.5. GREEK LOWER NUMERAL SIGN (KERAIA) . . . . . . . . . . . . 25 B.5. GREEK LOWER NUMERAL SIGN (KERAIA) . . . . . . . . . . . . 25
B.6. MODIFIER LETTER PRIME . . . . . . . . . . . . . . . . . . 26 B.6. MODIFIER LETTER PRIME . . . . . . . . . . . . . . . . . . 25
B.7. COMBINING CYRILLIC TITLO . . . . . . . . . . . . . . . . . 26 B.7. COMBINING CYRILLIC TITLO . . . . . . . . . . . . . . . . . 26
B.8. HEBREW PUNCTUATION GERESH . . . . . . . . . . . . . . . . 27 B.8. HEBREW PUNCTUATION GERESH . . . . . . . . . . . . . . . . 26
B.9. HEBREW PUNCTUATION GERSHAYIM . . . . . . . . . . . . . . . 27 B.9. HEBREW PUNCTUATION GERSHAYIM . . . . . . . . . . . . . . . 26
B.10. IDEOGRAPHIC ITERATION MARK; . . . . . . . . . . . . . . . 27 B.10. IDEOGRAPHIC ITERATION MARK; . . . . . . . . . . . . . . . 27
B.11. VERTICAL IDEOGRAPHIC ITERATION MARK . . . . . . . . . . . 28 B.11. VERTICAL IDEOGRAPHIC ITERATION MARK . . . . . . . . . . . 27
B.12. KATAKANA MIDDLE DOT . . . . . . . . . . . . . . . . . . . 28 B.12. KATAKANA MIDDLE DOT . . . . . . . . . . . . . . . . . . . 27
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 28 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 28
Intellectual Property and Copyright Statements . . . . . . . . . . 29 Intellectual Property and Copyright Statements . . . . . . . . . . 29
1. Introduction 1. Introduction
This document supplies the protocol definition for a revised and This document supplies the protocol definition for a revised and
updated specification for internationalized domain names. The updated specification for internationalized domain names. The
rationale for these changes and relationship to the older rationale for these changes and relationship to the older
specification and some new terminology is provided in other specification and some new terminology is provided in other
documents, notably [IDNA2008-Rationale]. documents, notably [IDNA2008-Rationale].
skipping to change at page 6, line 20 skipping to change at page 6, line 20
There are currently no other exclusions on the applicability of IDNA There are currently no other exclusions on the applicability of IDNA
to DNS resource records. Applicability depends entirely on the to DNS resource records. Applicability depends entirely on the
CLASS, and not on the TYPE except as noted below. This will remain CLASS, and not on the TYPE except as noted below. This will remain
true, even as new types are defined, unless there is a compelling true, even as new types are defined, unless there is a compelling
reason for a new type that requires type-specific rules. The special reason for a new type that requires type-specific rules. The special
naming conventions applicable to SRV records are examples of type- naming conventions applicable to SRV records are examples of type-
specific rules that are incompatible with IDNA coding. Hence the specific rules that are incompatible with IDNA coding. Hence the
first two labels (the ones required to start in "_") on a record with first two labels (the ones required to start in "_") on a record with
TYPE SRV MUST NOT be A-labels or U-labels (while it would be possible TYPE SRV MUST NOT be A-labels or U-labels (while it would be possible
to write a non-ASCII string with a leading underscore, conversion to to write a non-ASCII string with a leading underscore, conversion to
an A-label would be impossible without loss of information and an A-label would be impossible without loss of information because
because the underscore is not a letter, digit, or hyphen. Of course, the underscore is not a letter, digit, or hyphen). Of course, those
those labels may be part of a domain that uses IDN labels at higher labels may be part of a domain that uses IDN labels at higher levels
levels in the tree. in the tree.
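As a rough illustration of why an underscore-prefixed label cannot become an A-label, the following Python sketch uses the "punycode" codec from the standard library. The helper names (is_ldh, naive_a_label) and the sample label are hypothetical and appear in neither draft.

```python
import string

# Letters, digits, and hyphen: the only characters an A-label may contain.
LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh(label: str) -> bool:
    """Return True if every character is an ASCII letter, digit, or hyphen."""
    return bool(label) and all(ch in LDH for ch in label)


def naive_a_label(label: str) -> str:
    """Punycode-encode a label and add the ACE prefix, with no validation."""
    return "xn--" + label.encode("punycode").decode("ascii")


candidate = "_s\u00e9rvice"  # hypothetical non-ASCII SRV-style label
encoded = naive_a_label(candidate)

# Punycode copies ASCII code points through unchanged, so the underscore
# survives encoding and the result is not a letter-digit-hyphen label.
print(encoded, is_ldh(encoded))
```

Dropping or replacing the underscore before encoding would produce an LDH string, but that string would no longer decode to the original label, which is the loss of information the text refers to.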
3.2.2. Non-domain-name Data Types Stored in the DNS 3.2.2. Non-domain-name Data Types Stored in the DNS
Although IDNA enables the representation of non-ASCII characters in Although IDNA enables the representation of non-ASCII characters in
domain names, that does not imply that IDNA enables the domain names, that does not imply that IDNA enables the
representation of non-ASCII characters in other data types that are representation of non-ASCII characters in other data types that are
stored in domain names, specifically in the RDATA field for types stored in domain names, specifically in the RDATA field for types
that have structured RDATA format. For example, an email address that have structured RDATA format. For example, an email address
local part is stored in a domain name in the RNAME field as part of local part is stored in a domain name in the RNAME field as part of
the RDATA of an SOA record (hostmaster@example.com would be the RDATA of an SOA record (hostmaster@example.com would be
skipping to change at page 7, line 4 skipping to change at page 7, line 4
This section defines the procedure for registering an IDN. The This section defines the procedure for registering an IDN. The
procedure is implementation independent; any sequence of steps that procedure is implementation independent; any sequence of steps that
produces exactly the same result for all labels is considered a valid produces exactly the same result for all labels is considered a valid
implementation. implementation.
4.1. Proposed label 4.1. Proposed label
The registrant submits a request for an IDN. The user typically The registrant submits a request for an IDN. The user typically
produces the request string by the keyboard entry of a character produces the request string by the keyboard entry of a character
sequence in the local native character set. The registry MAY permit sequence in the local native character set (which might, of course,
submission of labels in A-label form. If it does so, it SHOULD be Unicode). The registry MAY permit submission of labels in A-label
perform a conversion to a U-label, perform the steps and tests form. If it does so, it SHOULD perform a conversion to a U-label,
described below, and verify that the A-label produced by the step in perform the steps and tests described below, and verify that the
Section 4.5 matches the one provided as input. If, for some reason, A-label produced by the step in Section 4.5 matches the one provided
it does not, the registration MUST be rejected. as input. If, for some reason, it does not, the registration MUST be
rejected.
[[anchor9: Editorial: Should the sentences starting with "The
registry" be moved to 4.3? I.e., would they be more in sequence
there?]]
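The check on labels submitted in A-label form can be sketched in Python as follows. This is a minimal sketch, not the registry procedure itself: the function names are hypothetical, the standard library "punycode" codec stands in for [RFC3492], and the only "step and test" applied to the U-label is the NFC requirement, so a real registry would add the remaining tests of Sections 4.3 and 4.4.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a U-label with Punycode and add the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Strip the ACE prefix and decode the rest with Punycode."""
    if not a_label.startswith(ACE_PREFIX):
        raise ValueError("not in A-label form")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def accept_submitted_a_label(a_label: str) -> bool:
    """Return True only if the submitted A-label survives the round trip."""
    try:
        u_label = to_u_label(a_label)
    except ValueError:  # also covers UnicodeError from the codec
        return False
    # Stand-in for the registration tests: only the NFC rule is checked here.
    if unicodedata.normalize("NFC", u_label) != u_label:
        return False
    # The regenerated A-label must match the one provided as input.
    return to_a_label(u_label) == a_label
```

For example, accept_submitted_a_label("xn--bcher-kva") succeeds, while an A-label that encodes a decomposed (non-NFC) string is rejected.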
4.2. Conversion to Unicode and Normalization 4.2. Conversion to Unicode and Normalization
Some system routine, or a localized front-end to the IDNA process, Some system routine, or a localized front-end to the IDNA process,
ensures that the proposed label is a Unicode string. That string ensures that the proposed label is a Unicode string or converts it to
MUST be in Unicode Normalization Form C (NFC [Unicode-UAX15]). one as appropriate. That string MUST be in Unicode Normalization
Form C (NFC [Unicode-UAX15]).
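A front end of the kind described here might look like the following Python sketch. The function name and the choice to reject, rather than silently normalize, a string that is not in NFC are assumptions made for illustration; the drafts only require that the resulting string be in NFC.

```python
import unicodedata


def to_unicode_label(raw: bytes, local_charset: str) -> str:
    """Convert a proposed label from a local character set to Unicode.

    Raises ValueError if the converted string is not in Normalization
    Form C (a UnicodeDecodeError from a bad input is also a ValueError).
    """
    text = raw.decode(local_charset)
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("proposed label is not in NFC")
    return text


# A label typed in ISO 8859-1 converts cleanly to a precomposed string.
label = to_unicode_label("b\u00fccher".encode("latin-1"), "latin-1")
print(label)
```

Rejecting instead of normalizing keeps the string shown to the registrant identical to the string that is eventually encoded.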
As a local implementation choice, the implementation MAY choose to As a local implementation choice, the implementation MAY choose to
map some forbidden characters to permitted characters (for instance map some forbidden characters to permitted characters (for instance
mapping uppercase characters to lowercase ones), displaying the mapping uppercase characters to lowercase ones), displaying the
result to the user, and allowing processing to continue. However, it result to the user, and allowing processing to continue. However, it
is strongly recommended that, to avoid any possible ambiguity, is strongly recommended that, to avoid any possible ambiguity,
entities responsible for zone files ("registries") accept entities responsible for zone files ("registries") accept
registrations only for A-labels (to be converted to U-labels by the registrations only for A-labels (to be converted to U-labels by the
registry) or U-labels actually produced from A-labels, not forms registry) or U-labels actually produced from A-labels, not forms
expected to be converted by some other process. expected to be converted by some other process.
skipping to change at page 8, line 32 skipping to change at page 8, line 37
in context, the proposed label MUST be rejected. (See the IANA in context, the proposed label MUST be rejected. (See the IANA
Considerations: IDNA Context Registry section of [IDNA2008-Rationale] Considerations: IDNA Context Registry section of [IDNA2008-Rationale]
and Appendix A of this document.) and Appendix A of this document.)
4.3.2.4. Labels Containing Characters Written Right to Left 4.3.2.4. Labels Containing Characters Written Right to Left
Additional special tests for right-to-left strings are applied (see Additional special tests for right-to-left strings are applied (see
[IDNA2008-BIDI]). Strings that contain right-to-left characters and [IDNA2008-BIDI]). Strings that contain right-to-left characters and
do not conform to the rule(s) identified there MUST NOT be inserted do not conform to the rule(s) identified there MUST NOT be inserted
in zone files. in zone files.
[[anchor14: If the bidi specification continues to specify checking [[anchor15: If the bidi specification continues to specify checking
more than one label, this subsection will need to be revised and/or more than one label, this subsection will need to be revised and/or
moved to a separate "FQDN validation" section.]] moved to a separate "FQDN validation" section.]]
4.3.3. Registration Validation Summary 4.3.3. Registration Validation Summary
Strings that have been produced by the steps above, and whose Strings that have been produced by the steps above, and whose
contents pass the above tests, are U-labels. contents pass the above tests, are U-labels.
To summarize, tests are made here for invalid characters, invalid To summarize, tests are made here for invalid characters, invalid
combinations of characters, and for labels that are invalid even if combinations of characters, and for labels that are invalid even if
the characters they contain are valid individually. For example, the characters they contain are valid individually. For example,
labels containing invisible ("zero-width") characters may be labels containing invisible ("zero-width") characters may be
permitted in context with characters whose presentation forms are permitted in context with characters whose presentation forms are
significantly changed by the presence or absence of the zero-width significantly changed by the presence or absence of the zero-width
characters, while other labels in which zero-width characters appear characters, while other labels in which zero-width characters appear
may be rejected. may be rejected.
[[anchor17: Should the example text be removed or moved? Note that
I've been strongly encouraged to supply specific examples to reduce
abstraction and questions about the appropriateness of the text.
-JcK]]
4.4. Registry Restrictions 4.4. Registry Restrictions
Registries at all levels of the DNS, not just the top level, are Registries at all levels of the DNS, not just the top level, are
expected to establish policies about the labels that may be expected to establish policies about the labels that may be
registered, and for the processes associated with that action. While registered, and for the processes associated with that action. While
exact policies are not specified as part of IDNA2008 and it is exact policies are not specified as part of IDNA2008 and it is
expected that different registries may specify different policies, expected that different registries may specify different policies,
there SHOULD be policies. These per-registry policies and there SHOULD be policies. These per-registry policies and
restrictions are an essential element of the IDNA registration restrictions are an essential element of the IDNA registration
skipping to change at page 9, line 27 skipping to change at page 9, line 32
The string produced by the above steps is checked and processed as The string produced by the above steps is checked and processed as
appropriate to local registry restrictions. Application of those appropriate to local registry restrictions. Application of those
registry restrictions may result in the rejection of some labels or registry restrictions may result in the rejection of some labels or
the application of special restrictions to others. the application of special restrictions to others.
4.5. Punycode Conversion 4.5. Punycode Conversion
The resulting U-label is converted to an A-label (i.e., the encoding The resulting U-label is converted to an A-label (i.e., the encoding
of that label according to the Punycode algorithm [RFC3492] with the of that label according to the Punycode algorithm [RFC3492] with the
prefix included, i.e., the "xn--..." form). ACE prefix added, i.e., the "xn--..." form).
[[anchor18: Explain why 3492 failures cannot occur or explain what to
do if they do.]]
4.6. Insertion in the Zone 4.6. Insertion in the Zone
The A-label is registered in the DNS by insertion into a zone. The A-label is registered in the DNS by insertion into a zone.
5. Domain Name Resolution (Lookup) Protocol 5. Domain Name Resolution (Lookup) Protocol
Resolution is conceptually different from registration and different Resolution is conceptually different from registration and different
tests are applied on the client. Although some validity checks are tests are applied on the client. Although some validity checks are
necessary to avoid serious problems with the protocol (see necessary to avoid serious problems with the protocol (see
Section 5.5 ff.), the resolution-side tests are more permissive and Section 5.5 ff.), the resolution-side tests are more permissive and
rely heavily on the assumption that names that are present in the DNS rely heavily on the assumption that names that are present in the DNS
are valid. Among other things, this distinction, applied carefully, are valid.
facilitates expansion of the permitted character lists to include new
scripts and accommodate new versions of Unicode without introducing
ambiguity into domain name processing.
5.1. Label String Input 5.1. Label String Input
The user supplies a string in the local character set, typically by The user supplies a string in the local character set, typically by
typing it or clicking on, or copying and pasting, a resource typing it or clicking on, or copying and pasting, a resource
identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the
domain name is extracted. Or some process not directly involving the domain name is extracted. Or some process not directly involving the
user may read the string from a file or obtain it in some other way. user may read the string from a file or obtain it in some other way.
Processing in this step and the next two are local matters, to be Processing in this step and the next two are local matters, to be
accomplished prior to actual invocation of IDNA, but at least these accomplished prior to actual invocation of IDNA, but at least these
two steps must be accomplished in some way. two steps must be accomplished in some way.
5.2. Conversion to Unicode 5.2. Conversion to Unicode
The string is converted from the local character set into Unicode, if The string is converted from the local character set into Unicode, if
it is not already Unicode. The exact nature of this conversion is it is not already Unicode. The exact nature of this conversion is
beyond the scope of this document, but may involve normalization, as beyond the scope of this document, but may involve normalization, as
described in Section 4.2. described in Section 4.2.
skipping to change at page 10, line 29 skipping to change at page 10, line 36
local environment, to make the result of the IDNA processing match local environment, to make the result of the IDNA processing match
user expectations. For instance, it would be reasonable, at this user expectations. For instance, it would be reasonable, at this
step, to convert all upper case characters to lower case, if this step, to convert all upper case characters to lower case, if this
makes sense in the user's environment. makes sense in the user's environment.
Other examples of processing for localization might be applied, if Other examples of processing for localization might be applied, if
appropriate, at this point. They include interpreting various appropriate, at this point. They include interpreting various
characters as separating domain name components from each other characters as separating domain name components from each other
(label separators) because they either look like periods or are used (label separators) because they either look like periods or are used
to separate sentences, mapping different "width" forms of the same to separate sentences, mapping different "width" forms of the same
character into the one form permitted in labels, or giving special character into the one form permitted in labels[[anchor20: This needs
treatment to characters whose presentation forms are dependent only clarification]], or giving special treatment to characters whose
on placement in the label. Such localization changes are even presentation forms are dependent only on placement in the label.
further outside the scope of this specification than the ones Such localization changes are also outside the scope of this
mentioned above. specification.
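One possible local preprocessing step, of the kind discussed in this section, is sketched below in Python. It is an assumption-laden illustration rather than anything either draft specifies: the function name is hypothetical, the set of dot-like characters is the one IDNA2003 treated as label separators, NFKC is used as a blunt stand-in for width mapping (it changes more than width), and str.lower() is not appropriate for every locale.

```python
import unicodedata

# Characters that IDNA2003 treated as equivalent to the ASCII full stop.
DOT_LIKE = {"\u3002", "\uff0e", "\uff61"}


def preprocess(name: str) -> list:
    """Split a user-entered name into lower-cased, width-folded labels."""
    # Treat dot-like characters as label separators.
    name = "".join("." if ch in DOT_LIKE else ch for ch in name)
    # Fold fullwidth and halfwidth forms (and other compatibility forms).
    name = unicodedata.normalize("NFKC", name)
    # Map upper case to lower case, then split into labels.
    return name.lower().split(".")


# Fullwidth "E", an ideographic full stop, and upper-case ASCII.
print(preprocess("\uff25xample\u3002COM"))
```

Because such mappings are not reversible, the string shown back to the user after lookup may differ from what was typed.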
Recommendations for preprocessing for global contexts (i.e., when Recommendations for preprocessing for global contexts (i.e., when
local considerations do not apply or cannot be used) and for maximum local considerations do not apply or cannot be used) and for maximum
interoperability with labels that might have been specified under interoperability with labels that might have been specified under
liberal readings of IDNA2003 are given in [IDNA2008-Rationale]. liberal readings of IDNA2003 are given in [IDNA2008-Rationale].
[[anchor17: The question of preprocessing remains controversial in [[anchor21: The question of preprocessing remains controversial in
the WG. One school of thought is that, for compatibility with the WG. One school of thought is that, for compatibility with
IDNA2003, preprocessing should be standardized and required, with IDNA2003, preprocessing should be standardized and required, with
only one form permitted. Another sees important advantages in having only one form permitted. Another sees important advantages in having
the mappings between U-labels and A-labels be symmetric, unambiguous, the mappings between U-labels and A-labels be symmetric, unambiguous,
and information-preserving. And a third believes that local mappings and information-preserving. And a third believes that local mappings
will occur regardless of what we specify and that it is better to will occur regardless of what we specify and that it is better to
specify the protocol on that basis than to indirectly encourage local specify the protocol on that basis than to indirectly encourage local
inventions. The first group (and perhaps others) believes that local inventions. The first group (and perhaps others) believes that local
mappings will be, to put it mildly, "very bad... for mappings will be, to put it mildly, "very bad... for
interoperability.]] interoperability.]]
skipping to change at page 11, line 29 skipping to change at page 11, line 36
In general, that conversion and testing should be performed if the In general, that conversion and testing should be performed if the
domain name will later be presented to the user in native character domain name will later be presented to the user in native character
form (this requires that the lookup application be IDNA-aware). form (this requires that the lookup application be IDNA-aware).
Applications that are not IDNA-aware will obviously omit that Applications that are not IDNA-aware will obviously omit that
testing; others may treat the string as opaque to avoid the testing; others may treat the string as opaque to avoid the
additional processing at the expense of providing less protection and additional processing at the expense of providing less protection and
information to users. information to users.
5.5. Validation and Character List Testing 5.5. Validation and Character List Testing
In parallel with the registration procedure, the Unicode string is As with the registration procedure, the Unicode string is checked to
checked to verify that all characters that appear in it are valid for verify that all characters that appear in it are valid for IDNA
IDNA resolution input. As discussed above and in resolution input. As discussed above and in [IDNA2008-Rationale],
[IDNA2008-Rationale], the resolution check is more liberal than the the resolution check is more liberal than the registration one.
registration one. Putative labels with any of the following Putative labels with any of the following characteristics MUST be
characteristics MUST be rejected prior to DNS lookup: rejected prior to DNS lookup:
o Labels containing code points that are unassigned in the version o Labels containing code points that are unassigned in the version
of Unicode being used by the application, i.e., in the of Unicode being used by the application, i.e., in the
"Unassigned" Unicode category or the UNASSIGNED category of "Unassigned" Unicode category or the UNASSIGNED category of
[IDNA2008-Tables]. [IDNA2008-Tables].
o Labels that are not in NFC form. o Labels that are not in NFC form.
o Labels containing prohibited code points, i.e., those that are o Labels containing prohibited code points, i.e., those that are
assigned to the "DISALLOWED" category in the permitted character assigned to the "DISALLOWED" category in the permitted character
skipping to change at page 12, line 13 skipping to change at page 12, line 20
present. present.
o Labels containing other code points that are shown in the o Labels containing other code points that are shown in the
permitted character table as requiring a contextual rule permitted character table as requiring a contextual rule
("CONTEXTO" in the tables), but for which no such rule appears in ("CONTEXTO" in the tables), but for which no such rule appears in
the table of rules. With the exception in the rule immediately the table of rules. With the exception in the rule immediately
above, applications resolving DNS names or carrying out equivalent above, applications resolving DNS names or carrying out equivalent
operations are not required to test contextual rules, only to operations are not required to test contextual rules, only to
verify that a rule exists. verify that a rule exists.
o Labels whose first character is a combining mark. [[anchor19: Note o Labels whose first character is a combining mark. [[anchor23: Note
in Draft: this definition may need to be further tightened.]] in Draft: this definition may need to be further tightened.]]
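Some of the lookup-time rejections listed above can be approximated with the Python standard library, as in the sketch below. The function name and the "disallowed" parameter are hypothetical. The real DISALLOWED and contextual-rule data come from [IDNA2008-Tables] and are not reproduced here, and the "unicodedata" module reflects whichever Unicode version the Python build ships with.

```python
import unicodedata


def lookup_precheck(label: str, disallowed=frozenset()) -> list:
    """Return a list of reasons to reject a putative label before lookup."""
    problems = []
    # Code points unassigned in this Unicode version (general category Cn).
    if any(unicodedata.category(ch) == "Cn" for ch in label):
        problems.append("unassigned code point")
    # Labels must be in Normalization Form C.
    if unicodedata.normalize("NFC", label) != label:
        problems.append("not in NFC")
    # Caller-supplied stand-in for the DISALLOWED category of the tables.
    if any(ch in disallowed for ch in label):
        problems.append("disallowed code point")
    # Combining marks have general categories Mn, Mc, or Me.
    if label and unicodedata.category(label[0]).startswith("M"):
        problems.append("leading combining mark")
    return problems


print(lookup_precheck("b\u00fccher"))  # precomposed, assigned: no problems
print(lookup_precheck("\u0301abc"))    # starts with COMBINING ACUTE ACCENT
```

Returning the reasons, rather than a bare boolean, lets an application tell the user why a name was not looked up.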
In addition, the application SHOULD apply the following test. The In addition, the application SHOULD apply the following test. The
test may be omitted in special circumstances, such as when the test may be omitted in special circumstances, such as when the
resolver application knows that the conditions are enforced resolver application knows that the conditions are enforced
elsewhere, because an attempt to resolve such strings will almost elsewhere, because an attempt to resolve such strings will almost
certainly lead to a DNS lookup failure. However, applying the test certainly lead to a DNS lookup failure. However, applying the test
is likely to give much better information about the reason for a is likely to give much better information about the reason for a
lookup failure -- information that may be usefully passed to the user lookup failure -- information that may be usefully passed to the user
when that is feasible -- then DNS resolution failure alone. when that is feasible -- then DNS resolution failure alone. In any
[[anchor20: Should this be a MUST? Pro: this is the only remaining event, resolvers should avoid looking up labels that are invalid
under that test.
[[anchor24: Should this be a MUST? Pro: this is the only remaining
SHOULD (true?), the test is relatively straightforward, and it helps SHOULD (true?), the test is relatively straightforward, and it helps
avoid visual ambiguity. Con: the "special circumstances" that might avoid visual ambiguity. Con: the "special circumstances" that might
justify doing something different are explained above.]] justify doing something different are explained above.]]
o Verification that the string is compliant with the requirements o Verification that the string is compliant with the requirements
for right to left characters, specified in [IDNA2008-BIDI]. for right to left characters, specified in [IDNA2008-BIDI].
For all other strings, the resolver MUST rely on the presence or For all other strings, the resolver MUST rely on the presence or
absence of labels in the DNS to determine the validity of those absence of labels in the DNS to determine the validity of those
labels and the validity of the characters they contain. If they are labels and the validity of the characters they contain. If they are
registered, they are presumed to be valid; if they are not, their registered, they are presumed to be valid; if they are not, their
possible validity is not relevant. A resolver that declines to look possible validity is not relevant. A resolver that declines to look
up a string that conforms to the above rules is not in conformance up a string that conforms to the above rules is not in conformance
with this protocol. with this protocol.
5.6. Punycode Conversion 5.6. Punycode Conversion
The validated string, a U-label, is converted to an A-label using the The validated string, a U-label, is converted to an A-label using the
punycode algorithm. Punycode algorithm with the ACE prefix added.
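As a minimal sketch of this step, assuming the label has already passed validation and contains at least one non-ASCII character, Python's built-in "punycode" codec can produce the encoded form. The helper name to_a_label is hypothetical. The codec implements the algorithm of [RFC3492] but does not add the ACE prefix, so the sketch prepends it explicitly.

```python
ACE_PREFIX = "xn--"  # the ACE prefix established by RFC 3490


def to_a_label(u_label: str) -> str:
    """Convert an already-validated U-label to its A-label form."""
    # The "punycode" codec encodes per RFC 3492 and returns ASCII bytes
    # without the prefix, so the prefix is added here.
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


print(to_a_label("b\u00fccher"))  # xn--bcher-kva
```

A label consisting only of ASCII characters must not be passed to this helper, since such a label has no A-label form.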
5.7. DNS Name Resolution 5.7. DNS Name Resolution
The A-label is looked up in the DNS, using normal DNS procedures. The A-label is looked up in the DNS, using normal DNS procedures.
6. Name server Considerations 6. Name Server Considerations
6.1. Processing Non-ASCII Strings 6.1. Processing Non-ASCII Strings
Existing DNS servers do not know the IDNA rules for handling non- Existing DNS servers do not know the IDNA rules for handling non-
ASCII forms of IDNs, and therefore need to be shielded from them. ASCII forms of IDNs, and therefore need to be shielded from them.
All existing channels through which names can enter a DNS server All existing channels through which names can enter a DNS server
database (for example, master files (as described in RFC 1034) and database (for example, master files (as described in RFC 1034) and
DNS update messages [RFC2136]) are IDN-unaware because they predate DNS update messages [RFC2136]) are IDN-unaware because they predate
IDNA. Other sections of this document provide the needed shielding IDNA. Other sections of this document provide the needed shielding
by ensuring that internationalized domain names entering DNS server by ensuring that internationalized domain names entering DNS server
databases through such channels have already been converted to their databases through such channels have already been converted to their
equivalent ASCII A-label forms. equivalent ASCII A-label forms.
Because of the design of the algorithms in Section 4 and Section 5 (a Because of the design of the algorithms in Section 4 and Section 5 (a
domain name containing only ASCII codepoints can not be converted to domain name containing only ASCII codepoints can not be converted to
an A-label), there can not be more than one A-label form for each an A-label), there can not be more than one A-label form for any
U-label. given U-label.
The current update to the definition of the DNS protocol [RFC2181] The current update to the definition of the DNS protocol [RFC2181]
explicitly allows domain labels to contain octets beyond the ASCII explicitly allows domain labels to contain octets beyond the ASCII
range (0000..007F), and this document does not change that. Note, range (0000..007F), and this document does not change that. Note,
however, that there is no defined interpretation of octets 0080..00FF however, that there is no defined interpretation of octets 0080..00FF
as characters. If labels containing these octets are returned to as characters. If labels containing these octets are returned to
applications, unpredictable behavior could result. The A-label form, applications, unpredictable behavior could result. The A-label form,
which cannot contain those characters, is the only standard which cannot contain those characters, is the only standard
representation for internationalized labels in the current DNS representation for internationalized labels in the current DNS
protocol. protocol.
skipping to change at page 14, line 26 skipping to change at page 14, line 38
7. Security Considerations 7. Security Considerations
The general security principles and issues for IDNA appear in The general security principles and issues for IDNA appear in
[IDNA2008-Rationale]. The comments below are specific to this pair [IDNA2008-Rationale]. The comments below are specific to this pair
of protocols, but should be read in the context of that material and of protocols, but should be read in the context of that material and
the definitions and specifications, identified there, on which this the definitions and specifications, identified there, on which this
one depends. one depends.
This memo describes procedures for registering and looking up labels This memo describes procedures for registering and looking up labels
that are not valid according to the base DNS specifications (STD13 that are not compatible with the preferred syntax described in the
[RFC1034] [RFC1035] and Host Requirements [RFC1123]) because they base DNS specifications (STD13 [RFC1034] [RFC1035] and Host
contain non-ASCII characters. These procedures depend on the use of Requirements [RFC1123]) because they contain non-ASCII characters.
a special ASCII-compatable encoded form that contains only characters These procedures depend on the use of a special ASCII-compatible
permitted in host names by those earlier specifications. The encoding form that contains only characters permitted in host names
encoding is specified in [RFC3492]. No security issues such as by those earlier specifications. The encoding is specified in
string length increases or new allowed values are introduced by the [RFC3492]. No security issues such as string length increases or new
encoding process or the use of these encoded values, apart from those allowed values are introduced by the encoding process or the use of
introduced by the ACE encoding itself. these encoded values, apart from those introduced by the ACE encoding
itself.
Domain names (or portions of them) are sometimes compared against a Domain names (or portions of them) are sometimes compared against a
set of privileged or anti-privileged domains. In such situations it set domains to be given special treatment if a match occurs, e.g.,
is especially important that the comparisons be done properly, as treated as more privileged than others or blocked in some way. In
specified in requirement 2 of Section 3.1. For labels already in such situations it is especially important that the comparisons be
ASCII form (i.e., are LDH-labels or A-labels), the proper comparison done properly, as specified in requirement 2 of Section 3.1. For
reduces to the same case-insensitive ASCII comparison that has always labels already in ASCII form (i.e., are LDH-labels or A-labels), the
been used for ASCII labels. proper comparison reduces to the same case-insensitive ASCII
comparison that has always been used for ASCII labels.
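For labels already in ASCII form, the comparison described above might be sketched as follows. The function name ascii_labels_equal is an assumption made for illustration. The sketch deliberately rejects non-ASCII input rather than attempting any Unicode case folding.

```python
def ascii_labels_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison for labels already in ASCII form."""
    # encode() raises UnicodeEncodeError if either label is not pure ASCII,
    # so non-ASCII input is rejected instead of being compared incorrectly.
    # bytes.lower() folds only the ASCII letters A-Z.
    return a.encode("ascii").lower() == b.encode("ascii").lower()
```

Working on bytes avoids str.lower(), which applies Unicode case mappings that go beyond the ASCII-only folding used in the DNS.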
The introduction of IDNA means that any existing labels that start The introduction of IDNA means that any existing labels that start
with the ACE prefix would be construed as A-labels, at least until with the ACE prefix would be construed as A-labels, at least until
they failed one of the relevant tests, whether or not that was the they failed one of the relevant tests, whether or not that was the
intent of the zone administrator or registrant. There is no evidence intent of the zone administrator or registrant. There is no evidence
that this has caused any practical problems since RFC 3490 was that this has caused any practical problems since RFC 3490 was
adopted, but the risk still exists in principle. adopted, but the risk still exists in principle.
8. IANA Considerations 8. IANA Considerations
IANA actions for this version of IDNA are specified in IANA actions for this version of IDNA are specified in
[IDNA2008-Rationale]. [IDNA2008-Rationale].
9. Change Log 9. Change Log
[[anchor26: RFC Editor: Please remove this section.]] [[anchor30: RFC Editor: Please remove this section.]]
This document started as a follow-on to the pre-WG
draft-klensih-idnabis-protocol. Subsections that describe changes in
pre-WG drafts will be removed in version -03.
9.1. Version -00 of draft-klensin-idnabis-protocol
Version -00 of this draft was produced in November 2007 by moving
text from draft-klensin-idnabis-issues and by copy considerable text
from RFC 3490. The result was then extensively edited.
9.2. Versions -01 and -02 of draft-klensin-idnabis-protocol
These versions reflected a number of editorial changes, some of them
significant, and alignment of terminology with
draft-faltstrom-idnabis-tables.
9.3. Version -03 of draft-klensin-idnabis-protocol
o Abstract rewritten to bring its length within RFC Editor
guidelines.
o Corrections and revisions in response to extensive comments by
Mark Davis and others.
o Small modifications to several operations, including moving the
Normalization steps to a different place in the sequence.
o Many editorial changes.
9.4. Version -04 of draft-klensin-idnabis-protocol
o Revised terminology and removed the MAYBE category as a
consequence of design discussions on 30 January 2003 and followup
conversations. Also restructured the various operations to treat
CONTEXTUAL RULE REQUIRED as a validation step (paralleling bidi),
rather than a category. Those changes required changes elsewhere
in the document for consistency.
o Changed the requirements for normalization, making this a
requirement on the calling application rather than an action of
this protocol. This is consistent with the general "mappings
belong somewhere else" principle.
o Updated references.
o More editorial work, some independent of the changes, described
immediately above.
9.5. Version -00 of draft-ietf-idnabis-protocol
o Clarified actions to be taken if an A-label is supplied as input.
o Moved the contextual rules appendix into this document from
draft-klensin-idnabis-issues and made an initial attempt at
defining the actual rules. Synchronized the list of characters in
that appendix with tables-01.
o Added an explicit discussion of A-label input.
o Inserted a test for double-hyphen here.
9.6. Version -01 of draft-ietf-idnabis-protocol 9.1. Changes between Version -00 and -01 of draft-ietf-idnabis-protocol
o Corrected discussion of SRV records. o Corrected discussion of SRV records.
o Several small corrections for clarity. o Several small corrections for clarity.
o Inserted more "open issue" placeholders. o Inserted more "open issue" placeholders.
9.7. Version -02 of draft-ietf-idnabis-protocol 9.2. Version -02
o Rewrote the "conversion to Unicode" text in Section 5.2 as o Rewrote the "conversion to Unicode" text in Section 5.2 as
requested on-list. requested on-list.
o Added a comment (and reference) about EDNS0 to the "DNS Server o Added a comment (and reference) about EDNS0 to the "DNS Server
Conventions" section, which was also retitled. Conventions" section, which was also retitled.
o Made several editorial corrections and improvements in response to o Made several editorial corrections and improvements in response to
various comments. various comments.
o Added several new discussion placeholder anchors and updated some o Added several new discussion placeholder anchors and updated some
older ones. older ones.
9.3. Version -03
o Trimmed change log, removing information about pre-WG drafts.
o Incorporated a number of changes suggested by Marcos Sanz in his
note of 2008.07.17 and added several more placeholder anchors.
o Several minor editorial corrections and improvements.
o "Editor" designation temporarily removed because the automatic
posting machinery does not accept it.
10. Contributors 10. Contributors
While the listed editor held the pen, the original versions of this While the listed editor held the pen, the original versions of this
document represent the joint work and conclusions of an ad hoc design document represent the joint work and conclusions of an ad hoc design
team consisting of the editor and, in alphabetic order, Harald team consisting of the editor and, in alphabetic order, Harald
Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
draws significantly on the original version of IDNA [RFC3490] both draws significantly on the original version of IDNA [RFC3490] both
conceptually and for specific text. This second-generation version conceptually and for specific text. This second-generation version
would not have been possible without the work that went into that would not have been possible without the work that went into that
first version and its authors, Patrik Faltstrom, Paul Hoffman, and first version and its authors, Patrik Faltstrom, Paul Hoffman, and
skipping to change at page 21, line 22 skipping to change at page 20, line 38
The rules for the characters listed in the Tables document as The rules for the characters listed in the Tables document as
exception cases or Join_Controls and for which rules are being exception cases or Join_Controls and for which rules are being
defined at this time appear below. defined at this time appear below.
[[anchor42: Note in draft: This table is not complete and the rule [[anchor42: Note in draft: This table is not complete and the rule
entries below are temporarily only examples.]] entries below are temporarily only examples.]]
002D; HYPHEN-MINUS; F; 002D; HYPHEN-MINUS; F;
Must not appear at the beginning or end of a label; Must not appear at the beginning or end of a label;
Regular expression: Regular expression:
[^^]\u002D|\u00SD[^$] ; [^^]\u002D|\u002D[^$] ;
# Note that there are some additional prohibitions in the # Note that there are some additional prohibitions in the
specification on consecutive hyphens in anything but a valid specification on consecutive hyphens in anything but a valid
A-label. A-label.
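The prose form of the HYPHEN-MINUS rule (no hyphen at the beginning or end of a label) can be checked without a regular expression. The sketch below is an illustration only; the function name hyphen_position_ok is hypothetical, and the additional prohibition on consecutive hyphens is not covered.

```python
def hyphen_position_ok(label: str) -> bool:
    """True if no HYPHEN-MINUS (U+002D) begins or ends the label."""
    return not (label.startswith("-") or label.endswith("-"))
```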
200C; ZERO WIDTH NON-JOINER; T; 200C; ZERO WIDTH NON-JOINER; T;
Between two characters from the same script only. The script must Between two characters from the same script only. The script must
be one in which the use of this character causes significant be one in which the use of this character causes significant
visual transformation of one or both of the adjacent characters; visual transformation of one or both of the adjacent characters;
Regular expression: Regular expression:
[\p(Script:Deva)\p(Script:Tamil)]\u200C[\p(Script:Deva)\p(Script: [\p(Script:Deva)\p(Script:Tamil)]\u200C[\p(Script:Deva)\p(Script:
skipping to change at page 22, line 13 skipping to change at page 21, line 29
[[anchor44: That script list is _not_ complete and, in particular, [[anchor44: That script list is _not_ complete and, in particular,
more Indic scripts certainly need to be listed. It also does not more Indic scripts certainly need to be listed. It also does not
correctly express the "same script" restriction mentioned in the correctly express the "same script" restriction mentioned in the
prose, since it only tests adjacent characters. This character is prose, since it only tests adjacent characters. This character is
not required for Arabic script.]] not required for Arabic script.]]
00B7; MIDDLE DOT; F; 00B7; MIDDLE DOT; F;
Between two 'l' (U+006C) characters only, used to permit the Between two 'l' (U+006C) characters only, used to permit the
Catalan character ela geminada to be expressed; Catalan character ela geminada to be expressed;
Regular expression: Regular expression:
\u006C\u00B7\u006c ; \u006C\u00B7\u006C ;
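The MIDDLE DOT rule can be sketched in Python by searching for any occurrence that lacks an 'l' on either side. This is one possible reading of the prose rule, not the draft's own expression; the names _BAD_MIDDLE_DOT and middle_dot_ok are hypothetical.

```python
import re

# Matches a MIDDLE DOT (U+00B7) that lacks 'l' (U+006C) on either side.
_BAD_MIDDLE_DOT = re.compile("(?<!l)\u00b7|\u00b7(?!l)")


def middle_dot_ok(label: str) -> bool:
    """True if every U+00B7 in the label sits between two 'l' characters."""
    return _BAD_MIDDLE_DOT.search(label) is None
```

Lookbehind and lookahead are used so that the neighbouring characters are tested without being consumed, which keeps the check correct when a label contains more than one middle dot.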
0375; GREEK LOWER NUMERAL SIGN (KERAIA); F; 0375; GREEK LOWER NUMERAL SIGN (KERAIA); F;
Greek script only. Might be further restricted to specific Greek script only. Might be further restricted to specific
following characters; following characters;
Regular expression: Regular expression:
\0375\(Script:Greek) ; \u0375\p(Script:Greek) ;
02B9; MODIFIER LETTER PRIME; F;;; 02B9; MODIFIER LETTER PRIME; F;;;
# Permitted only in contexts in which GREEK LOWER NUMERAL SIGN, # Permitted only in contexts in which GREEK LOWER NUMERAL SIGN,
U+0375, is permitted. GREEK NUMERAL SIGN, U+0374, and the Lower U+0375, is permitted. GREEK NUMERAL SIGN, U+0374, and the Lower
Numeral Sign (U+0375) are indicators for numeric use of letters in Numeral Sign (U+0375) are indicators for numeric use of letters in
older Greek writing systems. U+02B9 is relevant because older Greek writing systems. U+02B9 is relevant because
normalization maps U+0374 into it.; normalization maps U+0374 into it.;
Regular expression: Regular expression:
\(Script:Greek)\02B9\(Script:Greek) ; \p(Script:Greek)\u02B9\p(Script:Greek) ;
[[anchor45: The test is that the adjacent characters be in the [[anchor45: The test is that the adjacent characters be in the
Greek script. It is not clear whether this is sufficient. The Greek script. It is not clear whether this is sufficient. The
requirement for a preceding Greek letter may not be necessary. requirement for a preceding Greek letter may not be necessary.
More input needed.]] More input needed.]]
0483; COMBINING CYRILLIC TITLO; F; 0483; COMBINING CYRILLIC TITLO; F;
Cyrillic script only. Might be further restricted to permit only Cyrillic script only. Might be further restricted to permit only
a preceding list of characters. a preceding list of characters.
Regular expression: Regular expression:
\p(Script:Cyrillic)\u0483 ; \p(Script:Cyrillic)\u0483 ;
05F3; HEBREW PUNCTUATION GERESH; F; 05F3; HEBREW PUNCTUATION GERESH; F;
The script of the preceding character and the subsequent The script of the preceding character and the subsequent
character, if any, MUST be Hebrew; character, if any, MUST be Hebrew;
Regular expression: Regular expression:
skipping to change at page 22, line 50 skipping to change at page 22, line 20
05F3; HEBREW PUNCTUATION GERESH; F; 05F3; HEBREW PUNCTUATION GERESH; F;
The script of the preceding character and the subsequent The script of the preceding character and the subsequent
character, if any, MUST be Hebrew; character, if any, MUST be Hebrew;
Regular expression: Regular expression:
\p(Script:Hebrew)\u05F3\p(Script:Hebrew)? ; \p(Script:Hebrew)\u05F3\p(Script:Hebrew)? ;
05F4; HEBREW PUNCTUATION GERSHAYIM; F 05F4; HEBREW PUNCTUATION GERSHAYIM; F
The script of the preceding character and the subsequent The script of the preceding character and the subsequent
character, if any, MUST be Hebrew; character, if any, MUST be Hebrew;
Regular expression: Regular expression:
\p(Script:Hebrew)\u05F3\p(Script:Hebrew)? ; \p(Script:Hebrew)\u05F4\p(Script:Hebrew)? ;
3005; IDEOGRAPHIC ITERATION MARK; F; 3005; IDEOGRAPHIC ITERATION MARK; F;
MUST NOT be at the beginning of the label, and the previous MUST NOT be at the beginning of the label, and the previous
character MUST be in Han Script; character MUST be in Han Script;
Regular expression: Regular expression:
\p(Script:Hani)\u30FB ; \p(Script:Hani)\u3005 ;
303B; VERTICAL IDEOGRAPHIC ITERATION MARK; F; 303B; VERTICAL IDEOGRAPHIC ITERATION MARK; F;
MUST NOT be at the beginning of the label, and the previous MUST NOT be at the beginning of the label, and the previous
character MUST be in Han Script; character MUST be in Han Script;
Regular expression: Regular expression:
\p(Script:Hani)\u303B ; \p(Script:Hani)\u303B ;
30FB; KATAKANA MIDDLE DOT; F; 30FB; KATAKANA MIDDLE DOT; F;
Adjacent characters MUST be Katakana; Adjacent characters MUST be Katakana;
Regular expression: Regular expression:
skipping to change at page 28, line 38 skipping to change at page 28, line 8
Rule Set: Rule Set:
If FirstChar .eq. True Then False; If FirstChar .eq. True Then False;
Else If BeforeScript .eq. Kana Then Else If BeforeScript .eq. Kana Then
If AfterScript .eq. Kana Then True; If AfterScript .eq. Kana Then True;
Else False; Else False;
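The rule set above for KATAKANA MIDDLE DOT might be expressed in Python roughly as follows. Several points are assumptions: the names is_katakana and katakana_middle_dot_ok are hypothetical, the Katakana script test is approximated by the Unicode character name because the standard library exposes no script property, and the cases the rule set as shown does not spell out (a preceding character that is not Kana, or no following character) are treated as failures.

```python
import unicodedata


def is_katakana(ch: str) -> bool:
    # Assumption: approximate Script=Katakana by the character name.
    return unicodedata.name(ch, "").startswith("KATAKANA LETTER")


def katakana_middle_dot_ok(label: str, i: int) -> bool:
    """Apply the rule set to the U+30FB at index i of label."""
    if i == 0:
        # FirstChar is True, so the rule yields False.
        return False
    if not is_katakana(label[i - 1]):
        # BeforeScript is not Kana (treated as False in this sketch).
        return False
    # AfterScript must be Kana; a missing following character fails.
    return i + 1 < len(label) and is_katakana(label[i + 1])
```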
Author's Address Author's Address
John C Klensin John C Klensin
(editor)
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Email: john+ietf@jck.com Email: john+ietf@jck.com
Full Copyright Statement Full Copyright Statement
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
 End of changes. 41 change blocks. 
152 lines changed or deleted 110 lines changed or added
