draft-ietf-idn-nameprep-02.txt   draft-ietf-idn-nameprep-03.txt 
Internet Draft Paul Hoffman Internet Draft Paul Hoffman
draft-ietf-idn-nameprep-02.txt IMC & VPNC draft-ietf-idn-nameprep-03.txt IMC & VPNC
January 17, 2001 Marc Blanchet February 24, 2001 Marc Blanchet
Expires in six months ViaGenie Expires in six months ViaGenie
Preparation of Internationalized Host Names Preparation of Internationalized Host Names
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with all This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Internet-Drafts are working documents of the Internet Engineering Task
skipping to change at line 28 skipping to change at line 28
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress." or to cite them other than as "work in progress."
To view the list of Internet-Draft Shadow Directories, see To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
This document describes how to prepare internationalized host names for This document describes how to prepare internationalized host names for
for use in the DNS. The steps include: use in the DNS. The steps include:
- mapping characters to other characters, such as to change their case - mapping characters to other characters, such as to change their case
- normalizing the characters - normalizing the characters
- excluding characters that are prohibited from appearing in - excluding characters that are prohibited from appearing in
internationalized host names internationalized host names
This document does not specify a wire protocol. This preparation should
be done before the DNS request.
1. Introduction 1. Introduction
When expanding today's DNS to include internationalized host names, When expanding today's DNS to include internationalized host names,
those new names will be handled in many parts of the DNS. The IDN those new names will be handled in many parts of the DNS. The
Working Group's requirements document [IDNReq] describes a framework for Internationalized Domain Name (IDN) Working Group's requirements
domain name handling as well as requirements for the new names. The IDN document [IDNReq] describes a framework for domain name handling as well
Working Group's comparison document [IDNComp] gives a framework for how as requirements for the new names.
various parts of the IDN solution work together.
A user can enter a domain name into an application program in a myriad A user can enter a domain name into an application program in a myriad
of fashions. Depending on the input method, the characters entered in of fashions. Depending on the input method, the characters entered in
the domain name may or may not be those that are allowed in the domain name may or may not be those that are allowed in
internationalized host names. Thus, there must be a way to normalize internationalized host names. Thus, there must be a way to normalize
the user's input before the name is resolved in the DNS. the user's input before the name is resolved in the DNS.
It is a design goal of this document to allow users to enter host names It is a design goal of this document to allow users to enter host names
in applications and have the highest chance of getting the name correct. in applications and have the highest chance of getting the name correct.
This means that the user should not be limited to only entering exactly Another, often conflicting, design goal is to allow as wide of a range
the characters that might have been used, but to instead be able to of characters as possible to be allowed in host names. The user should
enter characters that unambiguously normalize to characters in the not be limited to only entering exactly the characters that might have
desired host name. At the same time, this process must not introduce any been used, but to instead be able to enter characters that unambiguously
chance that two host names could be represented by two distinct strings normalize to characters in the desired host name. Although it would be
of characters that look identical to typical users. It is also a design easy to use the process in this step to "correct" perceived mis-features
goal to have all preprocessing of IDN done before going on the wire, so or bugs in the current character standards, this document expressly does
that no transformation is done in the DNS server space. Name preparation not do so.
can be done in other places, such as in the registration process.
This document describes the steps needed to convert a name part from one This document describes the steps needed to convert a name part from one
that is entered by the user to one that can be used in the DNS. that is entered by the user to one that can be used in the DNS.
Within a fully-qualified domain name, some labels may be
internationalized, while others are not. This specification should be
applied to all internationalized labels. An application must be able to
recognize which part is internationalized; the method for such
recognition is outside of the scope of this document. Note that this
specification is harmless to the non-internationalized labels: when the
steps described here are applied to non-internationalized labels, the
label will not change.
1.1 Terminology 1.1 Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119 "MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119]. [RFC2119].
Examples in this document use the notation for code points and names Examples in this document use the notation for code points and names
from the Unicode Standard [Unicode3] and ISO 10646 [ISO10646]. For from the Unicode Standard [Unicode3] and ISO/IEC 10646 [ISO10646]. For
example, the letter "a" may be represented as either "U+0061" or "LATIN example, the letter "a" may be represented as either "U+0061" or "LATIN
SMALL LETTER A". In the lists of prohibited characters, the "U+" is left SMALL LETTER A". In the lists of prohibited characters, the "U+" is left
off to make the lists easier to read. The names of character ranges are off to make the lists easier to read. The names of character ranges are
shown in square brackets (such as "[SYMBOLS]") and do not come from the shown in square brackets (such as "[SYMBOLS]") and do not come from the
standards. standards.
Note: A glossary of terms used in Unicode and ISO 10646 can be found in Note: A glossary of terms used in Unicode and ISO/IEC 10646 can be found
[Glossary]. Information on the 10646/Unicode character model can be in [Glossary]. Information on the 10646/Unicode character model can be
found in [CharModel]. found in [CharModel].
2. Preparation Overview 2. Preparation Overview
The steps for preparing names are: The steps for preparing names are:
1) Input from the application service interface -- This can be done in 1) Input from the application service interface -- This can be done in
many ways and is not specified in this document many ways and is not specified in this document
2) Map -- For each character in the input, check if it has a mapping 2) Map -- For each character in the input, check if it has a mapping
and, if so, replace it with its mapping. The mappings are a combination and, if so, replace it with its mapping. The mappings are a combination
of folding uppercase characters to lowercase and hyphen mapping. This of folding uppercase characters to lowercase and hyphen mapping. This is
is described in Section 3. described in Section 3.
3) Normalize -- Normalize the characters. This is described in Section 4. 3) Normalize -- Normalize the characters. This is described in Section
4.
4) Look for prohibited output -- Check for any characters that are not 4) Look for prohibited output -- Check for any characters that are not
allowed in the output. If any are found, return an error to the allowed in the output. If any are found, return an error to the
application service interface. This is described in Section 5. application service interface. This is described in Section 5.
5) Resolution of the prepared name -- This must be specified in a 5) Resolution of the prepared name -- This must be specified in a
different IDN document. different IDN document.
The above steps MUST be performed in the order given in order to comply The above steps MUST be performed in the order given in order to comply
with this specification. with this specification.
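The map, normalize, and prohibit steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a conforming implementation:

- The standard `stringprep` module's tables B.1 and B.2 (from the later RFC 3454) stand in for Appendix E.
- `unicodedata.ucd_3_2_0` (Unicode 3.2, not the 3.0 data cited here) stands in for the pinned normalization data.
- The hypothetical `PROHIBITED_RANGES` list holds only a few of the Section 5 ranges rather than all of Appendix F.

Steps 1 and 5 are left out, because this document does not specify them.

```python
import stringprep
from unicodedata import ucd_3_2_0  # frozen Unicode 3.2 data, not Unicode 3.0

# Hypothetical, partial stand-in for Appendix F: a few inclusive ranges
# quoted in Section 5 of this draft. A real implementation needs them all.
PROHIBITED_RANGES = [
    (0x0000, 0x002C), (0x002E, 0x002F), (0x003A, 0x0040),
    (0x005B, 0x0060), (0x007B, 0x009F), (0x2028, 0x2029),
    (0xE000, 0xF8FF), (0xFFFD, 0xFFFF), (0x3002, 0x3002),
]

def is_prohibited(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PROHIBITED_RANGES)

def nameprep_sketch(label: str) -> str:
    # Step 2, map: drop "mapped to nothing" characters (table B.1),
    # then apply case folding for use with NFKC (table B.2).
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # Step 3, normalize with form KC.
    normalized = ucd_3_2_0.normalize("NFKC", mapped)
    # Step 4, return an error on prohibited output instead of dropping it.
    for ch in normalized:
        if is_prohibited(ch):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized
```

Under these assumptions, fullwidth "Ｅｘａｍｐｌｅ" comes out as "example". A label containing a space raises ValueError, because U+0020 falls in 0000-002C.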
skipping to change at line 125 skipping to change at line 135
3. Mapping 3. Mapping
Each character in the input stream is checked against the mapping table. Each character in the input stream is checked against the mapping table.
The mapping table can be found in Appendix E of this document. That The mapping table can be found in Appendix E of this document. That
table includes all the steps described in the subsections below. table includes all the steps described in the subsections below.
The mappings can be one-to-none, one-to-one, or one-to-many. That is, The mappings can be one-to-none, one-to-one, or one-to-many. That is,
some characters may be eliminated or replaced by more than one some characters may be eliminated or replaced by more than one
character, and the output of this step might be shorter or longer than character, and the output of this step might be shorter or longer than
the input. the input. Because of this, an application MUST be prepared to receive a
longer or shorter string than the one input in the nameprep algorithm.
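A small Python check makes the length change concrete. It uses the same stand-in assumption as elsewhere: the `stringprep` tables B.1 and B.2 from RFC 3454 replace Appendix E, so exact results may differ from this draft's table. The function name `map_step` is illustrative only.

```python
import stringprep

def map_step(label: str) -> str:
    """Mapping step only, with RFC 3454 tables B.1/B.2 as a stand-in."""
    return "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )

longer = map_step("Stra\u00dfe")   # U+00DF folds to two characters
shorter = map_step("ab\u200dc")    # ZERO WIDTH JOINER maps to nothing
print(len("Stra\u00dfe"), len(longer), longer)
print(len("ab\u200dc"), len(shorter), shorter)
```

A caller that reuses the input buffer size for the output would therefore be wrong in both directions.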
Design note: Characters that are not wanted in internationalized name Rationale: Characters that are not wanted in internationalized name
parts can either be mapped to nothing in the mapping step, or cause an parts can either be mapped to nothing in the mapping step, or cause an
error in the prohibition step. The general guideline used to pick error in the prohibition step. The general guideline used to pick
between the two outcomes was that removing alphabetic, non-protocol between the two outcomes was that removing alphabetic, non-protocol
characters be done in the mapping step, but all other removals be done characters be done in the mapping step, but all other removals be done
in the prohibition step. This allows for simple linguistic errors on the in the prohibition step. This allows for simple linguistic errors on the
part of an input mechanism to be caught in the mapping step, but to not part of an input mechanism to be caught in the mapping step, but to not
hide serious errors such as entering protocol characters or invisible hide serious errors such as entering protocol characters or invisible
characters from the user. characters from the user.
3.1 Case mapping 3.1 Case mapping
For each character in the input, if there is a lowercase mapping for The input string is case folded according to [UTR21]. For most
that character, the input character is changed to the mapped lowercase characters, this is the same thing as changing the input character to a
character(s). The entries in the mapping table are derived from [UTR21]. lowercase character. For some characters, however, more complex
transformations occur. The mapping table in Appendix E is derived by
applying the rules for equivalence classes from [UTR21].
Design note: this step could have been "change all lowercase characters Rationale: This step could have been "change all lowercase characters
into uppercase characters". However, the upper-to-lower folding was into uppercase characters". However, the upper-to-lower folding was
chosen because most users of the Internet today enter host names in chosen because most users of the Internet today enter host names in
lowercase. lowercase.
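Python's built-in `str.lower()` and `str.casefold()` show the difference between lowercasing and case folding. `casefold()` implements full Unicode case folding from the interpreter's own Unicode version, not the [UTR21] data this draft cites, so treat this as an illustration only.

```python
def compare_fold(text: str) -> tuple[str, str]:
    """Return (plain lowercase, full case folding) for text."""
    return text.lower(), text.casefold()

# ASCII, German sharp s, and a Greek word ending in final sigma (U+03C2).
for sample in ["EXAMPLE", "stra\u00dfe", "\u03bb\u03bf\u03b3\u03bf\u03c2"]:
    lowered, folded = compare_fold(sample)
    print(sample, lowered, folded, lowered == folded)
```

Here "straße" folds to "strasse", and the final sigma folds to ordinary sigma (U+03C3). `lower()` leaves both unchanged, which is why a plain lowercase rule would not be enough.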
3.2 Additional folding mappings 3.2 Additional folding mappings
There are some characters that do not have mappings in [UTR21] but still There are some characters that do not have mappings in [UTR21] but still
need processing. These characters include a few Greek characters and need processing. These characters include a few Greek characters and
many symbols that contain Latin characters. The list of characters to many symbols that contain Latin characters. The list of characters to
add to the mapping table was determined by the following algorithm: add to the mapping table was determined by the following algorithm:
b = Normalize(Fold(a)); b = NormalizeWithKC(Fold(a));
c = Normalize(Fold(b)); c = NormalizeWithKC(Fold(b));
if c is not the same as b, add a mapping for "a to c". if c is not the same as b, add a mapping for "a to c".
Because Normalize(Fold(c)) always equals c, the table is stable from Because NormalizeWithKC(Fold(c)) always equals c, the table is stable
that point on. from that point on.
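The table-building rule above can be written directly as a loop. In this sketch:

- `fold()` is assumed to be Python's `str.casefold()`.
- NormalizeWithKC() is `unicodedata.normalize("NFKC", ...)`.
- Both use the running interpreter's Unicode data rather than the versions this draft pins.
- `additional_mappings` is a hypothetical helper name.

```python
import unicodedata

def fold(s: str) -> str:
    # Stand-in for the draft's Fold(): Python's full case folding.
    return s.casefold()

def normalize_kc(s: str) -> str:
    # Stand-in for NormalizeWithKC().
    return unicodedata.normalize("NFKC", s)

def additional_mappings(code_points):
    """Return {code point: replacement} for every a where a second
    fold-and-normalize round changes the result (Section 3.2)."""
    extra = {}
    for cp in code_points:
        a = chr(cp)
        b = normalize_kc(fold(a))
        c = normalize_kc(fold(b))
        if c != b:
            extra[cp] = c
    return extra

table = additional_mappings([0x0041, 0x037A, 0x2121])
for cp, target in table.items():
    print(f"U+{cp:04X} -> {target!r}")
```

Under these assumptions, the sample finds two entries:

- U+037A GREEK YPOGEGRAMMENI, mapped to a space followed by U+03B9.
- U+2121 TELEPHONE SIGN, mapped to "tel".

It adds no entry for U+0041, because a single pass is already stable for it. A full table build would pass every assigned code point instead of this short sample.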
3.3 Mapped out 3.3 Mapped out
The following characters are simply deleted from the input (that is, The following characters are simply deleted from the input (that is,
they are mapped to nothing) because their presence or absence should not they are mapped to nothing) because their presence or absence should not
make two domain names different. make two domain names different.
Some characters are only useful in line-based text, and are otherwise Some characters are only useful in line-based text, and are otherwise
invisible and ignored. invisible and ignored.
skipping to change at line 190 skipping to change at line 203
180C; MONGOLIAN FREE VARIATION SELECTOR TWO 180C; MONGOLIAN FREE VARIATION SELECTOR TWO
180D; MONGOLIAN FREE VARIATION SELECTOR THREE 180D; MONGOLIAN FREE VARIATION SELECTOR THREE
200C; ZERO WIDTH NON-JOINER 200C; ZERO WIDTH NON-JOINER
200D; ZERO WIDTH JOINER 200D; ZERO WIDTH JOINER
4. Normalization 4. Normalization
The output of the mapping step is normalized using form KC, as described The output of the mapping step is normalized using form KC, as described
in [UTR15]. Using form KC instead of form C causes many characters that in [UTR15]. Using form KC instead of form C causes many characters that
are identical or near-identical to be converted into a single character. are identical or near-identical to be converted into a single character.
Note that this specification refers to a specific vesion of [UTR15]. Note that this specification refers to a specific version of [UTR15]. If
If a later version of [UTR15] changes the algorithm used for normalizing, a later version of [UTR15] changes the algorithm used for normalizing,
that later version MUST NOT be used with this specification. Note that that later version MUST NOT be used with this specification. Note that
it is likely that this specification will be revised if UTR15 is changed, it is likely that this specification will be revised if UTR15 is
but until that happens, only the specified version of [UTR15] must changed, but until that happens, only the specified version of [UTR15]
be used. must be used.
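Python shows both the effect of choosing form KC over form C and the version-pinning concern.

- `unicodedata` uses the interpreter's current Unicode version, which it reports as `unicodedata.unidata_version`.
- `unicodedata.ucd_3_2_0` is a frozen Unicode 3.2 database that Python ships for IDNA.

Neither matches the [UTR15] version this draft pins, so the sketch is only an approximation.

```python
import unicodedata
from unicodedata import ucd_3_2_0

def compare_forms(text: str) -> dict:
    return {
        "NFC": unicodedata.normalize("NFC", text),
        "NFKC": unicodedata.normalize("NFKC", text),
        # Same operation against the frozen Unicode 3.2 database.
        "NFKC-3.2": ucd_3_2_0.normalize("NFKC", text),
    }

print(unicodedata.unidata_version)  # Unicode version of this interpreter
# Ligature "fi" + "le", fullwidth "AB", and "e" + COMBINING ACUTE ACCENT.
for sample in ["\ufb01le", "\uff21\uff22", "e\u0301"]:
    print(repr(sample), compare_forms(sample))
```

The ligature U+FB01 and the fullwidth letters pass through NFC unchanged but become plain ASCII under NFKC. The decomposed "e" plus U+0301 is composed under both forms.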
5. Prohibited Output 5. Prohibited Output
Before the text can be emitted, it must be checked for prohibited code Before the text can be emitted, it must be checked for prohibited code
points. There is a variety of prohibited code points, as described in points. There is a variety of prohibited code points, as described in
this section. this section.
One of the goals of IDN is to allow the widest possible set of host One of the goals of IDN is to allow the widest possible set of host
names as long as those host names do not cause other problems, such as names as long as those host names do not cause other problems, such as
conflict with other standards. Specifically, experience with current DNS conflict with other standards. Specifically, experience with current DNS
names has shown that there is a desire for host names that include names has shown that there is a desire for host names that include
personal names, company names, and spoken phrases. A goal of this personal names, company names, and spoken phrases. A goal of this
section is to prohibit as few characters that might be used in these section is to prohibit as few characters that might be used in these
contexts as possible. contexts as possible.
Note that every code point listed in this section MUST NOT be transmitted
on the DNS service interface. If a DNS server receives a request
containing a prohibited code point, then the DNS server MUST NOT resolve
that name.
The collected list of prohibited code points can be found in Appendix F The collected list of prohibited code points can be found in Appendix F
of this document. The list in Appendix F MUST be used by implementations of this document. The list in Appendix F MUST be used by implementations
of this specification. If there are any discrepancies between the list of this specification. If there are any discrepancies between the list
in Appendix F and subsections below, the list in Appendix F always takes in Appendix F and subsections below, the list in Appendix F always takes
precedence. precedence.
Some code points listed in one section would also appear in other Some code points listed in one section would also appear in other
sections. Each code point is only listed once in the table in Appendix sections. Each code point is only listed once in the table in Appendix
F. F.
5.1 Currently-prohibited ASCII characters 5.1 Currently-prohibited ASCII characters
Some of the ASCII characters that are currently prohibited in host names Some of the ASCII characters that are currently prohibited in host names
by [STD13] are also used in protocol elements such as URIs. The other by [STD13] are also used in protocol elements such as URIs [URIs]. The other
characters in the range U+0000 to U+007F that are not currently allowed characters in the range U+0000 to U+007F that are not currently allowed
are also prohibited in host name parts to reserve them for future use in are also prohibited in host name parts to reserve them for future use in
protocol elements. protocol elements.
0000-002C; [ASCII] 0000-002C; [ASCII]
002E-002F; [ASCII] 002E-002F; [ASCII]
003A-0040; [ASCII] 003A-0040; [ASCII]
005B-0060; [ASCII] 005B-0060; [ASCII]
007B-007F; [ASCII] 007B-007F; [ASCII]
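Appendix F is a sorted list of inclusive ranges, so a binary search over range starts is one natural lookup structure. The sketch below uses only the ASCII ranges listed above; a real implementation would load the full Appendix F list. The names `ASCII_PROHIBITED` and `in_ranges` are hypothetical.

```python
import bisect

# Inclusive (start, end) ranges from Section 5.1; sorted, non-overlapping.
ASCII_PROHIBITED = [
    (0x0000, 0x002C),
    (0x002E, 0x002F),
    (0x003A, 0x0040),
    (0x005B, 0x0060),
    (0x007B, 0x007F),
]
_STARTS = [start for start, _ in ASCII_PROHIBITED]

def in_ranges(cp: int) -> bool:
    """Binary-search the range list for the range that could hold cp."""
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= ASCII_PROHIBITED[i][1]

allowed = "".join(chr(cp) for cp in range(0x80) if not in_ranges(cp))
print(allowed)  # only "-", digits, and ASCII letters remain
```

Scanning U+0000 to U+007F with this table leaves only letters, digits, and hyphen, the host name character set already permitted by [STD13].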
skipping to change at line 272 skipping to change at line 280
5.3 Control characters 5.3 Control characters
Control characters cannot be seen and can cause unpredictable results Control characters cannot be seen and can cause unpredictable results
when displayed. when displayed.
0000-001F; [CONTROL CHARACTERS] 0000-001F; [CONTROL CHARACTERS]
007F; DELETE 007F; DELETE
0080-009F; [CONTROL CHARACTERS] 0080-009F; [CONTROL CHARACTERS]
2028; LINE SEPARATOR 2028; LINE SEPARATOR
2029; PARAGRAPH SEPARATORS 2029; PARAGRAPH SEPARATOR
5.4 Private use and replacement characters 5.4 Private use and replacement characters
Because private-use characters do not have defined meanings, they are Because private-use characters do not have defined meanings, they are
prohibited. The private-use characters are: prohibited. The private-use characters are:
E000-F8FF; [PRIVATE USE, PLANE 0] E000-F8FF; [PRIVATE USE, PLANE 0]
F0000-FFFFD; [PRIVATE USE, PLANE 15] F0000-FFFFD; [PRIVATE USE, PLANE 15]
100000-10FFFD; [PRIVATE USE, PLANE 16] 100000-10FFFD; [PRIVATE USE, PLANE 16]
The replacement character (U+FFFD) has no known semantic definition in a The replacement character (U+FFFD) has no known semantic definition in a
name, and is often used in renderers to say "there would be some name, and is often displayed by renderers to indicate "there would be some
character here, but it cannot be rendered". For example, on a computer character here, but it cannot be rendered". For example, on a computer
with no Asian fonts, a name with three katakana characters might be with no Asian fonts, a name with three katakana characters might be
rendered with three replacement characters. rendered with three replacement characters.
FFFD; REPLACEMENT CHARACTER FFFD; REPLACEMENT CHARACTER
5.5 Non-character codepoints 5.5 Non-character codepoints
Non-character code points are code points that have been assigned in Non-character code points are code points that have been assigned in
ISO 10646 but are not characters. Because they are already assigned, ISO/IEC 10646 but are not characters. Because they are already assigned,
they are guaranteed not to later change into characters. they are guaranteed not to later change into characters.
FFFE-FFFF; [NONCHARACTER CODE POINTS] FFFE-FFFF; [NONCHARACTER CODE POINTS]
1FFFE-1FFFF; [NONCHARACTER CODE POINTS] 1FFFE-1FFFF; [NONCHARACTER CODE POINTS]
2FFFE-2FFFF; [NONCHARACTER CODE POINTS] 2FFFE-2FFFF; [NONCHARACTER CODE POINTS]
3FFFE-3FFFF; [NONCHARACTER CODE POINTS] 3FFFE-3FFFF; [NONCHARACTER CODE POINTS]
4FFFE-4FFFF; [NONCHARACTER CODE POINTS] 4FFFE-4FFFF; [NONCHARACTER CODE POINTS]
5FFFE-5FFFF; [NONCHARACTER CODE POINTS] 5FFFE-5FFFF; [NONCHARACTER CODE POINTS]
6FFFE-6FFFF; [NONCHARACTER CODE POINTS] 6FFFE-6FFFF; [NONCHARACTER CODE POINTS]
7FFFE-7FFFF; [NONCHARACTER CODE POINTS] 7FFFE-7FFFF; [NONCHARACTER CODE POINTS]
skipping to change at line 342 skipping to change at line 350
5.8 Inappropriate for domain names 5.8 Inappropriate for domain names
The ideographic description characters allow different sequences of The ideographic description characters allow different sequences of
characters to be rendered the same way, which makes them inappropriate characters to be rendered the same way, which makes them inappropriate
for host names that must have a single canonical order. for host names that must have a single canonical order.
2FF0-2FFF; [IDEOGRAPHIC DESCRIPTION CHARACTERS] 2FF0-2FFF; [IDEOGRAPHIC DESCRIPTION CHARACTERS]
5.9 Change display properties 5.9 Change display properties
The following characters, some of which are deprecated in ISO 10646, The following characters, some of which are deprecated in ISO/IEC 10646,
can cause changes in display or the order in which characters appear can cause changes in display or the order in which characters appear
when rendered. when rendered.
200E; LEFT-TO-RIGHT MARK 200E; LEFT-TO-RIGHT MARK
200F; RIGHT-TO-LEFT MARK 200F; RIGHT-TO-LEFT MARK
202A; LEFT-TO-RIGHT EMBEDDING 202A; LEFT-TO-RIGHT EMBEDDING
202B; RIGHT-TO-LEFT EMBEDDING 202B; RIGHT-TO-LEFT EMBEDDING
202C; POP DIRECTIONAL FORMATTING 202C; POP DIRECTIONAL FORMATTING
202D; LEFT-TO-RIGHT OVERRIDE 202D; LEFT-TO-RIGHT OVERRIDE
202E; RIGHT-TO-LEFT OVERRIDE 202E; RIGHT-TO-LEFT OVERRIDE
206A; INHIBIT SYMMETRIC SWAPPING 206A; INHIBIT SYMMETRIC SWAPPING
206B; ACTIVATE SYMMETRIC SWAPPING 206B; ACTIVATE SYMMETRIC SWAPPING
206C; INHIBIT ARABIC FORM SHAPING 206C; INHIBIT ARABIC FORM SHAPING
206D; ACTIVATE ARABIC FORM SHAPING 206D; ACTIVATE ARABIC FORM SHAPING
206E; NATIONAL DIGIT SHAPES 206E; NATIONAL DIGIT SHAPES
206F; NOMINAL DIGIT SHAPES 206F; NOMINAL DIGIT SHAPES
5.10 Inappropriate characters from common input mechanisms
U+3002 is used as if it were U+002E in many input mechanisms,
particularly in Asia. This prohibition allows input mechanisms to safely
map U+3002 to U+002E before doing nameprep without worrying about
preventing users from accessing legitimate host name parts.
3002; IDEOGRAPHIC FULL STOP
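An input mechanism following this section might treat U+3002 as a label separator before running nameprep on each label. The helper below is hypothetical and shows only that pre-processing step.

```python
IDEOGRAPHIC_FULL_STOP = "\u3002"

def split_labels(user_input: str) -> list[str]:
    """Hypothetical input-side step: map U+3002 to U+002E, then split on
    full stops so that nameprep can be run on each label separately."""
    return user_input.replace(IDEOGRAPHIC_FULL_STOP, ".").split(".")

print(split_labels("www\u3002example.com"))
```

Because U+3002 can never appear inside a valid label, this mapping cannot turn a legitimate label into a different one.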
6. Unassigned Code Points 6. Unassigned Code Points
All code points not yet assigned in ISO 10646 are called "unassigned All code points not assigned in ISO/IEC 10646 are called "unassigned
code points". Authoritative name servers MUST NOT have internationalized code points". Authoritative name servers MUST NOT have internationalized
name parts that contain any unassigned code points. DNS requests MAY name parts that contain any unassigned code points. DNS requests MAY
contain name parts that contain unassigned code points. Note that this contain name parts that contain unassigned code points. Note that this
is the only part of this document where the requirements for queries is the only part of this document where the requirements for queries
differ from the requirements for names in DNS zones. differ from the requirements for names in DNS zones.
Using two different policies for where unassigned code points can appear Using two different policies for where unassigned code points can appear
in the DNS prevents the need for versioning the IDN protocol [IDNRev]. in the DNS prevents the need for versioning the IDN protocol [IDNRev].
This is very useful since it makes the overall processing simpler and does This is very useful since it makes the overall processing simpler and does
not impose a "protocol" to handle versioning. It is expected that ISO not impose a "protocol" to handle versioning. It is expected that ISO/IEC
10646 will be updated fairly frequently; recently, it has happened 10646 will be updated fairly frequently; recently, it has happened
approximately once a year. Each time a new version of ISO 10646 appears, approximately once a year. Each time a new version of ISO/IEC 10646 appears,
a new version of this document can be created. Some end users will want a new version of this document can be created. Some end users will want
to use the new code points as soon as they are defined. to use the new code points as soon as they are defined.
The list of unassigned code points can be found in Appendix G of this The list of unassigned code points can be found in Appendix G of this
document. The list in Appendix G MUST be used by implementations of this document. The list in Appendix G MUST be used by implementations of this
specification. If there are any discrepancies between the list in specification. If there are any discrepancies between the list in
Appendix G and the ISO 10646 specification, the list in Appendix G always Appendix G and the ISO/IEC 10646 specification, the list in Appendix G
takes precedence. always takes precedence.
Due to the way that versioning is handled in this section, host names Due to the way that versioning is handled in this section, host names
that are embedded in structures that cannot be changed (such as the that are embedded in structures that cannot be changed (such as the
signed parts of digital certificates) MUST NOT have internationalized signed parts of digital certificates) MUST NOT have internationalized
name parts that contain any unassigned code points. name parts that contain any unassigned code points.
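The split policy for unassigned code points can be sketched as a single check with a role flag. As an assumption, "unassigned" is approximated by general category Cn in Python's frozen Unicode 3.2 database (`unicodedata.ucd_3_2_0`). That is not Appendix G, and Cn also covers the noncharacter code points that Section 5.5 handles separately. The names `has_unassigned` and `check_label` are hypothetical.

```python
from unicodedata import ucd_3_2_0

def has_unassigned(label: str) -> bool:
    # Approximation of Appendix G: general category "Cn" in Unicode 3.2.
    return any(ucd_3_2_0.category(ch) == "Cn" for ch in label)

def check_label(label: str, *, for_zone: bool) -> None:
    """Raise ValueError if label breaks the unassigned-code-point policy.

    Queries MAY contain unassigned code points; names in authoritative
    zones, and in unchangeable structures such as signed certificates,
    MUST NOT.
    """
    if for_zone and has_unassigned(label):
        raise ValueError("unassigned code point in zone or signed data")
```

U+0378 is unassigned, so a query label "ab\u0378" is accepted with `for_zone=False`. The same label is rejected with `for_zone=True`.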
6.1 Categories of code points 6.1 Categories of code points
Each code point in ISO 10646 can be categorized by how it acts in the Each code point in ISO/IEC 10646 can be categorized by how it acts in the
process described in earlier sections of this document: process described in earlier sections of this document:
AO Code points that may be in the output AO Code points that may be in the output
MN Code points that cannot be in the output because they are MN Code points that cannot be in the output because they are
mapped to nothing or never appear as output from mapped to nothing or never appear as output from
normalization normalization
D Code points that cannot be in the output because they are D Code points that cannot be in the output because they are
disallowed in the prohibition step disallowed in the prohibition step
U Unassigned code points U Unassigned code points
A subsequent version of this document that references a newer version of A subsequent version of this document that references a newer version of
ISO 10646 with new code points will inherently have some code points ISO/IEC 10646 with new code points will inherently have some code points
move from category U to either D, MN, or AO. For backwards move from category U to either D, MN, or AO. For backwards
compatibility, no future version of this document will move code points compatibility, no future version of this document will move code points
from any other category. That is, no current AO, MN, or D code points from any other category. That is, no current AO, MN, or D code points
will ever change to a different category. will ever change to a different category.
Authoritative name servers MUST NOT contain any name that has code Authoritative name servers MUST NOT contain any name that has code
points outside of AO for the latest version of this document. That is, points outside of AO for the latest version of this document. That is,
they are forbidden to contain any IDN names containing code points from they are forbidden to contain any IDN names containing code points from
the MN, D, or U categories. the MN, D, or U categories.
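The backwards-compatibility rule for the four categories can be expressed as a check between two revisions of a category table. The tables here are hypothetical dictionaries keyed by code point, and code points absent from a table are treated as U. This is a sketch of the rule as stated, not a published tool.

```python
from enum import Enum

class Category(Enum):
    AO = "may be in the output"
    MN = "mapped to nothing or never output by normalization"
    D = "disallowed in the prohibition step"
    U = "unassigned"

def allowed_transition(old: Category, new: Category) -> bool:
    # Only code points in category U may change in a later revision.
    return old is new or old is Category.U

def bad_transitions(old_table: dict[int, Category],
                    new_table: dict[int, Category]) -> list[int]:
    """Return code points whose category change breaks the rule."""
    return [
        cp for cp, old in old_table.items()
        if not allowed_transition(old, new_table.get(cp, Category.U))
    ]

def zone_label_ok(label: str, table: dict[int, Category]) -> bool:
    # Authoritative servers may hold only names made entirely of AO.
    return all(table.get(ord(ch), Category.U) is Category.AO for ch in label)
```

For example, moving a code point from U to AO passes `bad_transitions`. Moving one from AO to D is reported, which is exactly what the rule forbids.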
skipping to change at line 534 skipping to change at line 551
may be vulnerable to attack based on the new characters allowed by this may be vulnerable to attack based on the new characters allowed by this
specification. specification.
8. References 8. References
[CharModel] Unicode Technical Report #17, Character Model. [CharModel] Unicode Technical Report #17, Character Model.
<http://www.unicode.org/unicode/reports/tr17/>. <http://www.unicode.org/unicode/reports/tr17/>.
[Glossary] Unicode Glossary, <http://www.unicode.org/glossary/>. [Glossary] Unicode Glossary, <http://www.unicode.org/glossary/>.
[IDNComp] Paul Hoffman, "Comparison of Internationalized Domain Name [IDNReq] Zita Wenzel and James Seng, "Requirements of Internationalized
Proposals", draft-ietf-idn-compare Domain Names", draft-ietf-idn-requirements
[IDNReq] James Seng, "Requirements of Internationalized Domain Names",
draft-ietf-idn-requirement
[IDNRev] Marc Blanchet, "Handling versions of internationalized domain [IDNRev] Marc Blanchet, "Handling versions of internationalized domain
names protocols", draft-ietf-idn-version names protocols", draft-ietf-idn-version
[ISO10646] ISO/IEC 10646-1:2000. International Standard -- Information [ISO10646] ISO/IEC 10646-1:2000. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
1: Architecture and Basic Multilingual Plane. 1: Architecture and Basic Multilingual Plane.
[Normalize] Character Normalization in IETF Protocols, [Normalize] Character Normalization in IETF Protocols,
draft-duerst-i18n-norm-03 draft-duerst-i18n-norm-03
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119. Requirement Levels", March 1997, RFC 2119.
[RFC2396] Tim Berners-Lee, et al., "Uniform Resource Identifiers (URI): [RFC2396] Tim Berners-Lee, et al., "Uniform Resource Identifiers (URI):
Generic Syntax", August 1998, RFC 2396. Generic Syntax", August 1998, RFC 2396.
[RFC2732] Robert Hinden, et al., Format for Literal IPv6 Addresses in [RFC2732] Robert Hinden, et al., Format for Literal IPv6 Addresses in
URL's, December 1999, RFC 2732. URL's, December 1999, RFC 2732.
[STD13] Paul Mockapetris, "Domain names - implementation and [STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC
specification", November 1987, STD 13 (RFC 1034 and 1035). 1034) and "Domain names - implementation and specification" (RFC 1035,
STD 13, November 1987.
[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version [Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at 3.0", ISBN 0-201-61633-5. Described at
<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>. <http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.
[URIs] For example: Roy Fielding et al., "Uniform Resource Identifiers:
Generic Syntax", August 1998, RFC 2396; Robert Hinden et al., "IPv6
Literal Addresses in URL's", December 1999, RFC 2732.
[UTR15] Mark Davis and Martin Duerst. Unicode Normalization Forms. [UTR15] Mark Davis and Martin Duerst. Unicode Normalization Forms.
Unicode Technical Report #15. Unicode Technical Report #15.
<http://www.unicode.org/unicode/reports/tr15/>. <http://www.unicode.org/unicode/reports/tr15/>.
[UTR21] Mark Davis. Case Mappings. Unicode Technical Report #21. [UTR21] Mark Davis. Case Mappings. Unicode Technical Report #21.
<http://www.unicode.org/unicode/reports/tr21/>. <http://www.unicode.org/unicode/reports/tr21/>.
A. Acknowledgements A. Acknowledgements
Many people from the IETF IDN Working Group and the Unicode Technical Many people from the IETF IDN Working Group and the Unicode Technical
skipping to change at line 599 skipping to change at line 618
Martin Duerst Martin Duerst
Patrik Faltstrom Patrik Faltstrom
Paul Hoffman Paul Hoffman
Additional significant improvements were proposed by: Additional significant improvements were proposed by:
Jonathan Rosenne Jonathan Rosenne
Kent Karlsson Kent Karlsson
Scott Hollenbeck Scott Hollenbeck
B. Differences Between -01 and -02 Drafts B. Differences Between -02 and -03 Drafts
Throughout: changed the format of lines with character names to make Throughout: Changed "ISO 10646" to "ISO/IEC 10646". Changed "codepoint"
the document easier to review. to "code point".
1.1: Added non-normative reference to [ISO10646]. Also added note about Abstract: Added last sentence.
range names.
3.2: Changed "CaseFold" to "Fold" in last sentence. 1: Removed the sentence about [IDNComp] in the first paragraph.
Clarified the design goals in the third paragraph. Added new last
paragraph about processing name parts.
4: Corrected spelling in title. 3: Added sentence at the end of the second paragraph about accepting
shorter or longer responses. Changed "Design note" to "Rationale".
5: Changed "character" to "code point" in many places because some of 3.1: Revised the first paragraph to make it clearer that the mapping is
the things that are prohibited are not characters. Changed the last not simple lowercasing. Changed "Design note" to "Rationale".
sentence in the fifth paragraph.
6: Changed "character" to "code point" in many places, including the 3.2: Made it clearer that the normalization is with form KC.
title of the section.
A: Added Kent Karlsson and Scott Hollenbeck to the commenters list. 5: Removed the previous third paragraph, which discussed the DNS service
interface.
F: Corrected an error in the table (hyphen was called prohibited 5.1: Added references for URIs.
when it obviously is not). Changed title.
G: Fixed the table to use the proper format for the code points. 5.4: Changed the sentence about the replacement character to read
Changed title. "...and is often displayed by renderers to indicate...".
5.10: Added this section, which prohibits U+3002.
6: Removed "yet" from the first sentence.
8: Fixed the reference for [IDNReq] and [STD13]. Removed the reference
to [IDNComp]. Added the reference for [URIs].
C: Changed wording of the section.
E, F, G: Added tags to the beginning and end of the tables.
F: Added 3002 (from section 5.10). Added FDD0-FDEF, which were omitted
in error.
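The changes above touch each of the three preparation steps (mapping in 3.1, normalization with form KC in 3.2, and the new prohibition of U+3002 in 5.10). The following is a minimal sketch of how those steps compose, assuming heavily truncated, hypothetical stand-ins for the real tables; the names `TOY_MAP`, `TOY_PROHIBITED`, `prepare`, and `ProhibitedCodePoint` are illustrative only, and the real mapping is not simple lowercasing.

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical, heavily truncated stand-ins for the draft's tables (Appendices E and F).
# The real mapping table is much larger and is not simple lowercasing.
TOY_MAP: Dict[int, List[int]] = {0x0041 + i: [0x0061 + i] for i in range(26)}  # A-Z -> a-z
TOY_PROHIBITED: List[Tuple[int, int]] = [
    (0x0000, 0x002C),
    (0x002E, 0x002F),
    (0x3000, 0x3000),
    (0x3002, 0x3002),  # IDEOGRAPHIC FULL STOP, prohibited by section 5.10 in -03
]


class ProhibitedCodePoint(ValueError):
    """Raised when a prepared label still contains a prohibited code point."""


def prepare(label: str) -> str:
    # Step 1: map every code point to zero or more code points; unmapped ones pass through.
    mapped = "".join(
        "".join(chr(cp) for cp in TOY_MAP.get(ord(ch), [ord(ch)])) for ch in label
    )
    # Step 2: apply Unicode normalization form KC.
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: reject the result if any code point falls in a prohibited range.
    for ch in normalized:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in TOY_PROHIBITED):
            raise ProhibitedCodePoint(f"U+{cp:04X} is prohibited")
    return normalized
```

For example, `prepare("ExAmple")` yields `"example"`, and `prepare("\ufb01le")` yields `"file"` because form KC decomposes the U+FB01 ligature. Checking prohibition after normalization matters: U+FF61 (HALFWIDTH IDEOGRAPHIC FULL STOP) normalizes to U+3002 and is therefore rejected as well.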
C. IANA Considerations C. IANA Considerations
[[[ We probably won't have any. ]]] None.
D. Author Contact Information D. Author Contact Information
Paul Hoffman Paul Hoffman
Internet Mail Consortium and VPN Consortium Internet Mail Consortium and VPN Consortium
127 Segre Place 127 Segre Place
Santa Cruz, CA 95060 USA Santa Cruz, CA 95060 USA
paul.hoffman@imc.org and paul.hoffman@vpnc.org paul.hoffman@imc.org and paul.hoffman@vpnc.org
Marc Blanchet Marc Blanchet
skipping to change at line 655 skipping to change at line 688
The following is the mapping table from Section 3. The table has three The following is the mapping table from Section 3. The table has three
columns: columns:
- the character that is mapped from - the character that is mapped from
- the zero or more characters that it is mapped to - the zero or more characters that it is mapped to
- the reason for the mapping - the reason for the mapping
The columns are separated by semicolons. Note that the second column may The columns are separated by semicolons. Note that the second column may
be empty, or it may have one character, or it may have more than one be empty, or it may have one character, or it may have more than one
character, with each character separated by a space. character, with each character separated by a space.
----- Start Mapping Table -----
0041; 0061; Case map 0041; 0061; Case map
0042; 0062; Case map 0042; 0062; Case map
0043; 0063; Case map 0043; 0063; Case map
0044; 0064; Case map 0044; 0064; Case map
0045; 0065; Case map 0045; 0065; Case map
0046; 0066; Case map 0046; 0066; Case map
0047; 0067; Case map 0047; 0067; Case map
0048; 0068; Case map 0048; 0068; Case map
0049; 0069; Case map 0049; 0069; Case map
004A; 006A; Case map 004A; 006A; Case map
skipping to change at line 1532 skipping to change at line 1566
FF31; FF51; Case map FF31; FF51; Case map
FF32; FF52; Case map FF32; FF52; Case map
FF33; FF53; Case map FF33; FF53; Case map
FF34; FF54; Case map FF34; FF54; Case map
FF35; FF55; Case map FF35; FF55; Case map
FF36; FF56; Case map FF36; FF56; Case map
FF37; FF57; Case map FF37; FF57; Case map
FF38; FF58; Case map FF38; FF58; Case map
FF39; FF59; Case map FF39; FF59; Case map
FF3A; FF5A; Case map FF3A; FF5A; Case map
----- End Mapping Table -----
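The table format described above (three semicolon-separated columns, with a second column holding zero or more space-separated code points) is simple to load. Below is a minimal parsing sketch, assuming the start and end marker lines added in -03 delimit the table; the function name `parse_mapping_table` is hypothetical.

```python
from typing import Dict, List

START = "----- Start Mapping Table -----"
END = "----- End Mapping Table -----"


def parse_mapping_table(text: str) -> Dict[int, List[int]]:
    """Map each source code point to its (possibly empty) list of target code points."""
    table: Dict[int, List[int]] = {}
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == START:
            in_table = True
        elif line == END:
            break
        elif in_table and line:
            # Three semicolon-separated columns: source; targets; reason.
            source, targets, _reason = (f.strip() for f in line.split(";", 2))
            # The second column holds zero or more space-separated hex code points.
            table[int(source, 16)] = [int(t, 16) for t in targets.split()]
    return table
```

An empty second column parses to an empty list, which models a code point that is mapped to nothing, and a multi-character second column parses to a list of several code points.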
F. Prohibited Code Point List F. Prohibited Code Point List
----- Start Prohibited Table -----
0000-002C 0000-002C
002E-002F 002E-002F
003A-0040 003A-0040
005B-0060 005B-0060
007B-007F 007B-007F
0080-009F 0080-009F
00A0 00A0
1680 1680
2000 2000
2001 2001
skipping to change at line 1573 skipping to change at line 1609
202E 202E
202F 202F
206A 206A
206B 206B
206C 206C
206D 206D
206E 206E
206F 206F
2FF0-2FFF 2FF0-2FFF
3000 3000
3002
D800-DFFF D800-DFFF
E000-F8FF E000-F8FF
FFF9 FFF9
FFFA FFFA
FFFB FFFB
FFFC FFFC
FFFD FFFD
FFFE-FFFF FFFE-FFFF
1FFFE-1FFFF 1FFFE-1FFFF
2FFFE-2FFFF 2FFFE-2FFFF
skipping to change at line 1599 skipping to change at line 1636
9FFFE-9FFFF 9FFFE-9FFFF
AFFFE-AFFFF AFFFE-AFFFF
BFFFE-BFFFF BFFFE-BFFFF
CFFFE-CFFFF CFFFE-CFFFF
DFFFE-DFFFF DFFFE-DFFFF
EFFFE-EFFFF EFFFE-EFFFF
F0000-FFFFD F0000-FFFFD
FFFFE-FFFFF FFFFE-FFFFF
100000-10FFFD 100000-10FFFD
10FFFE-10FFFF 10FFFE-10FFFF
----- End Prohibited Table -----
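Each line of the prohibited table is either a single hexadecimal code point or an inclusive "start-end" range. The following is a minimal sketch of loading such a table and testing membership, assuming the marker lines shown above; the helper names `parse_range_table` and `is_in_ranges` are hypothetical, and the same parser would also accept the unassigned table by passing `"Unassigned"`.

```python
from typing import List, Tuple

Range = Tuple[int, int]


def parse_range_table(text: str, name: str) -> List[Range]:
    """Parse lines like '0000-002C' or '3002' between the named start/end markers."""
    start = f"----- Start {name} Table -----"
    end = f"----- End {name} Table -----"
    ranges: List[Range] = []
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == start:
            in_table = True
        elif line == end:
            break
        elif in_table and line:
            lo, _, hi = line.partition("-")
            # A single code point is treated as a range of length one.
            ranges.append((int(lo, 16), int(hi or lo, 16)))
    return ranges


def is_in_ranges(cp: int, ranges: List[Range]) -> bool:
    """Linear scan over inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)
```

With the first two rows above, U+002D (HYPHEN-MINUS) falls in the gap between 0000-002C and 002E-002F, so it is not prohibited.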
NOTE WELL: Software that follows this specification that will be used to NOTE WELL: Software that follows this specification that will be used to
check names before they are put in authoritative name servers MUST add check names before they are put in authoritative name servers MUST add
all unassigned code points to the list of characters that are prohibited. all unassigned code points to the list of characters that are prohibited.
See Section 6 for more details. See Section 6 for more details.
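The NOTE WELL above means that a checker for names entering authoritative name servers uses a larger rejection set than the prohibited table alone. Below is a minimal sketch, assuming hypothetical excerpts of Appendices F and G; the function `offending_code_points` and its `for_registration` flag are illustrative names, and any other behavior described in Section 6 is not modelled here.

```python
from typing import List, Tuple

Range = Tuple[int, int]

# Hypothetical excerpts; a real checker would load the full tables from Appendices F and G.
PROHIBITED: List[Range] = [(0x0000, 0x002C), (0x002E, 0x002F), (0x3002, 0x3002)]
UNASSIGNED: List[Range] = [(0x0220, 0x0221), (0x0234, 0x024F), (0xFDC8, 0xFDEF)]


def in_ranges(cp: int, ranges: List[Range]) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def offending_code_points(label: str, for_registration: bool) -> List[int]:
    """Return the code points that make `label` unacceptable (empty list = acceptable).

    for_registration=True models software that checks names before they are put in
    authoritative name servers, which must also reject unassigned code points.
    """
    tables = PROHIBITED + UNASSIGNED if for_registration else PROHIBITED
    return [ord(ch) for ch in label if in_ranges(ord(ch), tables)]
```

Here U+0220 passes the prohibited-only check but is rejected when `for_registration=True`, because it appears in the unassigned table.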
G. Unassigned Code Point List G. Unassigned Code Point List
----- Start Unassigned Table -----
0220-0221 0220-0221
0234-024F 0234-024F
02AE-02AF 02AE-02AF
02EF-02FF 02EF-02FF
034F-035F 034F-035F
0363-0373 0363-0373
0376-0379 0376-0379
037B-037D 037B-037D
037F-0383 037F-0383
038B 038B
skipping to change at line 1942 skipping to change at line 1981
FB07-FB12 FB07-FB12
FB18-FB1C FB18-FB1C
FB37 FB37
FB3D FB3D
FB3F FB3F
FB42 FB42
FB45 FB45
FBB2-FBD2 FBB2-FBD2
FD40-FD4F FD40-FD4F
FD90-FD91 FD90-FD91
FDC8-FDCF FDC8-FDEF
FDFC-FE1F FDFC-FE1F
FE24-FE2F FE24-FE2F
FE45-FE48 FE45-FE48
FE53 FE53
FE67 FE67
FE6C-FE6F FE6C-FE6F
FE73 FE73
FE75 FE75
FEFD-FEFE FEFD-FEFE
FF00 FF00
skipping to change at line 1975 skipping to change at line 2014
50000-5FFFD 50000-5FFFD
60000-6FFFD 60000-6FFFD
70000-7FFFD 70000-7FFFD
80000-8FFFD 80000-8FFFD
90000-9FFFD 90000-9FFFD
A0000-AFFFD A0000-AFFFD
B0000-BFFFD B0000-BFFFD
C0000-CFFFD C0000-CFFFD
D0000-DFFFD D0000-DFFFD
E0000-EFFFD E0000-EFFFD
----- End Unassigned Table -----
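The unassigned table runs to many lines, so a linear scan over it for every code point can be slow. One common alternative, sketched here under the assumption that ranges are inclusive and may be given unsorted, is to merge them once and then binary-search the sorted start points; the class name `RangeSet` is hypothetical and not part of the draft.

```python
import bisect
from typing import Iterable, List, Tuple


class RangeSet:
    """Inclusive code point ranges with O(log n) membership tests."""

    def __init__(self, ranges: Iterable[Tuple[int, int]]) -> None:
        merged: List[List[int]] = []
        for lo, hi in sorted(ranges):
            if merged and lo <= merged[-1][1] + 1:
                # Overlapping or adjacent: extend the previous range.
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        self._starts = [lo for lo, _ in merged]
        self._ends = [hi for _, hi in merged]

    def __contains__(self, cp: int) -> bool:
        # Find the last range whose start is <= cp, then check its end.
        i = bisect.bisect_right(self._starts, cp) - 1
        return i >= 0 and cp <= self._ends[i]
```

For instance, `RangeSet([(0x0220, 0x0221), (0x0234, 0x024F), (0xFDC8, 0xFDEF), (0xE0000, 0xEFFFD)])` contains U+FDD0 (covered by the -03 range FDC8-FDEF) but not U+0222 or U+EFFFE.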
 End of changes. 53 change blocks. 
81 lines changed or deleted 120 lines changed or added

This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/