draft-ietf-idn-nameprep-01.txt   draft-ietf-idn-nameprep-02.txt 
Internet Draft Paul Hoffman Internet Draft Paul Hoffman
draft-ietf-idn-nameprep-01.txt IMC & VPNC draft-ietf-idn-nameprep-02.txt IMC & VPNC
January 15, 2001 Marc Blanchet January 17, 2001 Marc Blanchet
Expires in six months ViaGenie Expires in six months ViaGenie
Preparation of Internationalized Host Names Preparation of Internationalized Host Names
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with all This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts. may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress." or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
This document describes how to prepare internationalized host names This document describes how to prepare internationalized host names
for use in the DNS. The steps include: for use in the DNS. The steps include:
- mapping characters to other characters, such as to change their case - mapping characters to other characters, such as to change their case
- normalizing the characters - normalizing the characters
- excluding characters that are prohibited from appearing in - excluding characters that are prohibited from appearing in
internationalized host names internationalized host names
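The three steps above run in a fixed order: mapping, then normalization, then the prohibition check. The Python sketch below shows that ordering only. It is not the draft's procedure. It assumes `str.casefold()` as a stand-in for the draft's case-mapping table. `MAPPED_OUT` and `PROHIBITED` are tiny, hypothetical excerpts of the draft's lists, and `prepare` is a hypothetical helper name.

```python
import unicodedata

# Hypothetical excerpts; the draft's full mapping and prohibition
# tables are much larger.
MAPPED_OUT = {0x00AD, 0x1806, 0x200B, 0xFEFF}   # deleted during mapping
PROHIBITED = {0x0020, 0x00A0, 0x3000}           # rejected after normalization


def prepare(label: str) -> str:
    """Sketch of map -> normalize -> prohibit for one name part."""
    # 1. Mapping: case folding (a stand-in) plus deleting mapped-out characters.
    mapped = "".join(ch for ch in label.casefold() if ord(ch) not in MAPPED_OUT)
    # 2. Normalization with form KC.
    normalized = unicodedata.normalize("NFKC", mapped)
    # 3. Prohibition: refuse to emit any prohibited code point.
    for ch in normalized:
        if ord(ch) in PROHIBITED:
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized


print(prepare("Ex\u00ADample"))  # prints: example
```

The prohibition check runs last. As a result, a character that normalization turns into a prohibited one is still caught. For example, U+3000 IDEOGRAPHIC SPACE becomes U+0020 under form KC.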
skipping to change at line 74 skipping to change at line 71
This document describes the steps needed to convert a name part from one This document describes the steps needed to convert a name part from one
that is entered by the user to one that can be used in the DNS. that is entered by the user to one that can be used in the DNS.
1.1 Terminology 1.1 Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119 "MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119]. [RFC2119].
Examples in this document use the notation for code points and names Examples in this document use the notation for code points and names
from the Unicode Standard [Unicode3] and ISO 10646. For example, the from the Unicode Standard [Unicode3] and ISO 10646 [ISO10646]. For
letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER example, the letter "a" may be represented as either "U+0061" or "LATIN
A". In the lists of prohibited characters, the "U+" is left off to make SMALL LETTER A". In the lists of prohibited characters, the "U+" is left
the lists easier to read. off to make the lists easier to read. The names of character ranges are
shown in square brackets (such as "[SYMBOLS]") and do not come from the
standards.
Note: A glossary of terms used in Unicode and ISO 10646 can be found in Note: A glossary of terms used in Unicode and ISO 10646 can be found in
[Glossary]. Information on the 10646/Unicode character model can be [Glossary]. Information on the 10646/Unicode character model can be
found in [CharModel]. found in [CharModel].
2. Preparation Overview 2. Preparation Overview
The steps for preparing names are: The steps for preparing names are:
1) Input from the application service interface -- This can be done in 1) Input from the application service interface -- This can be done in
skipping to change at line 160 skipping to change at line 159
There are some characters that do not have mappings in [UTR21] but still There are some characters that do not have mappings in [UTR21] but still
need processing. These characters include a few Greek characters and need processing. These characters include a few Greek characters and
many symbols that contain Latin characters. The list of characters to many symbols that contain Latin characters. The list of characters to
add to the mapping table was determined by the following algorithm: add to the mapping table was determined by the following algorithm:
b = Normalize(Fold(a)); b = Normalize(Fold(a));
c = Normalize(Fold(b)); c = Normalize(Fold(b));
if c is not the same as b, add a mapping for "a to c". if c is not the same as b, add a mapping for "a to c".
Because Normalize(CaseFold(c)) always equals c, the table is stable from Because Normalize(Fold(c)) always equals c, the table is stable from
that point on. that point on.
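The table-construction algorithm above can be sketched in Python as follows. This is illustrative only, and its output is not guaranteed to match the draft's table, for two reasons:

- It assumes `str.casefold()` in place of the draft's Fold().
- It uses the interpreter's built-in form KC tables in place of the pinned [UTR15] version.

`extra_mappings` is a hypothetical name.

```python
import unicodedata


def fold(s: str) -> str:
    # Assumption: Python's full case folding stands in for the draft's Fold().
    return s.casefold()


def normalize(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def extra_mappings(code_points):
    """Return {a: c} for each a whose result changes on a second pass."""
    extra = {}
    for cp in code_points:
        a = chr(cp)
        b = normalize(fold(a))   # first pass
        c = normalize(fold(b))   # second pass
        if c != b:
            extra[a] = c         # add a mapping "a to c"
    return extra
```

For example, U+2122 TRADE MARK SIGN has no case mapping, so the first pass yields "TM" through normalization alone. Only the second pass lowercases it to "tm", so the sketch adds a mapping from U+2122 to "tm".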
3.3 Mapped out 3.3 Mapped out
The following characters are simply deleted from the input (that is, The following characters are simply deleted from the input (that is,
they are mapped to nothing) because their presence or absence should not they are mapped to nothing) because their presence or absence should not
make two domain names different. make two domain names different.
Some characters are only useful in line-based text, and are otherwise Some characters are only useful in line-based text, and are otherwise
invisible and ignored. invisible and ignored.
00AD SOFT HYPHEN 00AD; SOFT HYPHEN
1806 MONGOLIAN TODO SOFT HYPHEN 1806; MONGOLIAN TODO SOFT HYPHEN
200B ZERO WIDTH SPACE 200B; ZERO WIDTH SPACE
FEFF ZERO WIDTH NO-BREAK SPACE FEFF; ZERO WIDTH NO-BREAK SPACE
Variation selectors and cursive connectors select different glyphs, but Variation selectors and cursive connectors select different glyphs, but
do not bear semantics. do not bear semantics.
180B MONGOLIAN FREE VARIATION SELECTOR ONE 180B; MONGOLIAN FREE VARIATION SELECTOR ONE
180C MONGOLIAN FREE VARIATION SELECTOR TWO 180C; MONGOLIAN FREE VARIATION SELECTOR TWO
180D MONGOLIAN FREE VARIATION SELECTOR THREE 180D; MONGOLIAN FREE VARIATION SELECTOR THREE
200C ZERO WIDTH NON-JOINER 200C; ZERO WIDTH NON-JOINER
200D ZERO WIDTH JOINER 200D; ZERO WIDTH JOINER
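Because these code points are simply deleted, the mapped-out step reduces to a character filter. A minimal Python sketch covering exactly the nine code points listed in this section follows. `map_out` is a hypothetical name.

```python
# Code points that this section maps to nothing.
MAPPED_TO_NOTHING = (
    0x00AD, 0x1806, 0x200B, 0xFEFF,  # invisible, line-based-text characters
    0x180B, 0x180C, 0x180D,          # Mongolian free variation selectors
    0x200C, 0x200D,                  # zero width non-joiner and joiner
)

# str.translate deletes every code point whose table value is None.
_DELETE_TABLE = dict.fromkeys(MAPPED_TO_NOTHING)


def map_out(label: str) -> str:
    return label.translate(_DELETE_TABLE)
```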
4. Normalizaiton 4. Normalization
The output of the mapping step is normalized using form KC, as described The output of the mapping step is normalized using form KC, as described
in [UTR15]. Using form KC instead of form C causes many characters that in [UTR15]. Using form KC instead of form C causes many characters that
are identical or near-identical to be converted into a single character. are identical or near-identical to be converted into a single character.
Note that this specification refers to a specific version of [UTR15]. Note that this specification refers to a specific version of [UTR15].
If a later version of [UTR15] changes the algorithm used for normalizing, If a later version of [UTR15] changes the algorithm used for normalizing,
that later version MUST NOT be used with this specification. Note that that later version MUST NOT be used with this specification. Note that
it is likely that this specification will be revised if UTR15 is changed, it is likely that this specification will be revised if UTR15 is changed,
but until that happens, only the specified version of [UTR15] must but until that happens, only the specified version of [UTR15] must
be used. be used.
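Python's standard `unicodedata` module implements form KC. However, its tables track the Unicode version the interpreter ships with, not the specific [UTR15] version this section requires. The sketch below makes that mismatch explicit with an optional strict check. `PINNED_UNICODE_VERSION` is an assumed placeholder value, not a version taken from this document, and `normalize_kc` is a hypothetical name.

```python
import unicodedata

# Assumption for illustration only: a placeholder for the Unicode version
# that the pinned [UTR15] tables correspond to.
PINNED_UNICODE_VERSION = "3.0.0"


def normalize_kc(s: str, strict: bool = False) -> str:
    """Apply form KC, optionally refusing tables from another version."""
    if strict and unicodedata.unidata_version != PINNED_UNICODE_VERSION:
        raise RuntimeError(
            f"Unicode data {unicodedata.unidata_version} is not the pinned "
            f"{PINNED_UNICODE_VERSION}"
        )
    return unicodedata.normalize("NFKC", s)


# Form C keeps the "fi" ligature; form KC folds it to two letters.
print(unicodedata.normalize("NFC", "\uFB01") == "\uFB01")  # prints: True
print(normalize_kc("\uFB01"))        # prints: fi
print(normalize_kc("\uFF21\uFF22"))  # fullwidth letters; prints: AB
```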
5. Prohibited Output 5. Prohibited Output
Before the text can be emitted, it must be checked for prohibited Before the text can be emitted, it must be checked for prohibited code
characters. There is a variety of prohibited characters, as described in points. There is a variety of prohibited code points, as described in
this section. this section.
One of the goals of IDN is to allow the widest possible set of host One of the goals of IDN is to allow the widest possible set of host
names as long as those host names do not cause other problems, such as names as long as those host names do not cause other problems, such as
conflict with other standards. Specifically, experience with current DNS conflict with other standards. Specifically, experience with current DNS
names has shown that there is a desire for host names that include names has shown that there is a desire for host names that include
personal names, company names, and spoken phrases. A goal of this personal names, company names, and spoken phrases. A goal of this
section is to prohibit as few characters that might be used in these section is to prohibit as few characters that might be used in these
contexts as possible. contexts as possible.
Note that every character listed in this section MUST NOT be transmitted Note that every code point listed in this section MUST NOT be transmitted
on the DNS service interface. If a DNS server receives a request on the DNS service interface. If a DNS server receives a request
containing a prohibited character, then the DNS server MUST NOT containing a prohibited code point, then the DNS server MUST NOT resolve
resolve that name. that name.
Some characters listed in one section would also appear in other
sections. Each character is only listed once.
The collected list of prohibited characters can be found in Appendix F The collected list of prohibited code points can be found in Appendix F
of this document. The list in Appendix F MUST be used by implementations of this document. The list in Appendix F MUST be used by implementations
of this specification. If there are any discrepancies between the list of this specification. If there are any discrepancies between the list
in Appendix F and the subsections below, the list in Appendix F always takes in Appendix F and the subsections below, the list in Appendix F always takes
precedence. precedence.
Some code points listed in one section would also appear in other
sections. Each code point is only listed once in the table in Appendix
F.
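The collected list is a set of code point ranges, so one straightforward way to consult it is a binary search over the range start points. The Python sketch below assumes a sorted table of merged ranges. The table is a hypothetical excerpt assembled from the subsections below, not Appendix F itself, and `is_prohibited` is an illustrative name.

```python
import bisect

# Hypothetical excerpt: sorted, non-overlapping, inclusive (first, last)
# ranges, with adjacent ranges from different subsections merged.
# This is NOT the complete Appendix F list.
PROHIBITED_RANGES = [
    (0x0000, 0x002C),
    (0x002E, 0x002F),
    (0x003A, 0x0040),
    (0x005B, 0x0060),
    (0x007B, 0x00A0),  # ASCII 007B-007F, controls 0080-009F, NO-BREAK SPACE
    (0xD800, 0xF8FF),  # surrogates D800-DFFF, private use E000-F8FF
    (0xFFF9, 0xFFFF),  # FFF9-FFFC, REPLACEMENT CHARACTER, FFFE-FFFF
]
_STARTS = [first for first, _ in PROHIBITED_RANGES]


def is_prohibited(cp: int) -> bool:
    """Find the last range starting at or before cp, then check its end."""
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= PROHIBITED_RANGES[i][1]
```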
5.1 Currently-prohibited ASCII characters 5.1 Currently-prohibited ASCII characters
Some of the ASCII characters that are currently prohibited in host names Some of the ASCII characters that are currently prohibited in host names
by [STD13] are also used in protocol elements such as URIs. The other by [STD13] are also used in protocol elements such as URIs. The other
characters in the range U+0000 to U+007F that are not currently allowed characters in the range U+0000 to U+007F that are not currently allowed
are also prohibited in host name parts to reserve them for future use in are also prohibited in host name parts to reserve them for future use in
protocol elements. protocol elements.
0000-002C 0000-002C; [ASCII]
002E-002F 002E-002F; [ASCII]
003A-0040 003A-0040; [ASCII]
005B-0060 005B-0060; [ASCII]
007B-007F 007B-007F; [ASCII]
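A quick way to check the five ranges above is to compute what they leave unprohibited: exactly the letters, digits, and hyphen already used in host names. The Python sketch below does that. `permitted_ascii` is a hypothetical helper.

```python
# Inclusive ranges prohibited by this section.
PROHIBITED_ASCII = [
    (0x0000, 0x002C),
    (0x002E, 0x002F),
    (0x003A, 0x0040),
    (0x005B, 0x0060),
    (0x007B, 0x007F),
]


def permitted_ascii() -> str:
    """Return the ASCII characters that none of the ranges cover."""
    prohibited = {cp for lo, hi in PROHIBITED_ASCII for cp in range(lo, hi + 1)}
    return "".join(chr(cp) for cp in range(0x80) if cp not in prohibited)


print(permitted_ascii())
# prints: -0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```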
5.2 Space characters 5.2 Space characters
Space characters would make visual transcription of URLs nearly Space characters would make visual transcription of URLs nearly
impossible and could lead to user entry errors in many ways. impossible and could lead to user entry errors in many ways.
0020 SPACE 0020; SPACE
00A0 NO-BREAK SPACE 00A0; NO-BREAK SPACE
2000 EN QUAD 2000; EN QUAD
2001 EM QUAD 2001; EM QUAD
2002 EN SPACE 2002; EN SPACE
2003 EM SPACE 2003; EM SPACE
2004 THREE-PER-EM SPACE 2004; THREE-PER-EM SPACE
2005 FOUR-PER-EM SPACE 2005; FOUR-PER-EM SPACE
2006 SIX-PER-EM SPACE 2006; SIX-PER-EM SPACE
2007 FIGURE SPACE 2007; FIGURE SPACE
2008 PUNCTUATION SPACE 2008; PUNCTUATION SPACE
2009 THIN SPACE 2009; THIN SPACE
200A HAIR SPACE 200A; HAIR SPACE
202F NARROW NO-BREAK SPACE 202F; NARROW NO-BREAK SPACE
3000 IDEOGRAPHIC SPACE 3000; IDEOGRAPHIC SPACE
1680 OGHAM SPACE MARK 1680; OGHAM SPACE MARK
200B ZERO WIDTH SPACE 200B; ZERO WIDTH SPACE
5.3 Control characters 5.3 Control characters
Control characters cannot be seen and can cause unpredictable results Control characters cannot be seen and can cause unpredictable results
when displayed. when displayed.
0000-001F [CONTROL CHARACTERS] 0000-001F; [CONTROL CHARACTERS]
007F DELETE 007F; DELETE
0080-009F [CONTROL CHARACTERS] 0080-009F; [CONTROL CHARACTERS]
2028 LINE SEPARATOR 2028; LINE SEPARATOR
2029 PARAGRAPH SEPARATOR 2029; PARAGRAPH SEPARATOR
5.4 Private use and replacement characters 5.4 Private use and replacement characters
Because private-use characters do not have defined meanings, they are Because private-use characters do not have defined meanings, they are
prohibited. The private-use characters are: prohibited. The private-use characters are:
E000-F8FF [PRIVATE USE, PLANE 0] E000-F8FF; [PRIVATE USE, PLANE 0]
F0000-FFFFD [PRIVATE USE, PLANE 15] F0000-FFFFD; [PRIVATE USE, PLANE 15]
100000-10FFFD [PRIVATE USE, PLANE 16] 100000-10FFFD; [PRIVATE USE, PLANE 16]
The replacement character (U+FFFD) has no known semantic definition in a The replacement character (U+FFFD) has no known semantic definition in a
name, and is often used in renderers to say "there would be some name, and is often used in renderers to say "there would be some
character here, but it cannot be rendered". For example, on a computer character here, but it cannot be rendered". For example, on a computer
with no Asian fonts, a name with three katakana characters might be with no Asian fonts, a name with three katakana characters might be
rendered with three replacement characters. rendered with three replacement characters.
FFFD REPLACEMENT CHARACTER FFFD; REPLACEMENT CHARACTER
5.5 Non-character codepoints 5.5 Non-character codepoints
Non-character code points are code points that have been assigned in Non-character code points are code points that have been assigned in
ISO 10646 but are not characters. Because they are already assigned, ISO 10646 but are not characters. Because they are already assigned,
they are guaranteed not to later change into characters. they are guaranteed not to later change into characters.
FFFE-FFFF [NONCHARACTER CODE POINTS] FFFE-FFFF; [NONCHARACTER CODE POINTS]
1FFFE-1FFFF [NONCHARACTER CODE POINTS] 1FFFE-1FFFF; [NONCHARACTER CODE POINTS]
2FFFE-2FFFF [NONCHARACTER CODE POINTS] 2FFFE-2FFFF; [NONCHARACTER CODE POINTS]
3FFFE-3FFFF [NONCHARACTER CODE POINTS] 3FFFE-3FFFF; [NONCHARACTER CODE POINTS]
4FFFE-4FFFF [NONCHARACTER CODE POINTS] 4FFFE-4FFFF; [NONCHARACTER CODE POINTS]
5FFFE-5FFFF [NONCHARACTER CODE POINTS] 5FFFE-5FFFF; [NONCHARACTER CODE POINTS]
6FFFE-6FFFF [NONCHARACTER CODE POINTS] 6FFFE-6FFFF; [NONCHARACTER CODE POINTS]
7FFFE-7FFFF [NONCHARACTER CODE POINTS] 7FFFE-7FFFF; [NONCHARACTER CODE POINTS]
8FFFE-8FFFF [NONCHARACTER CODE POINTS] 8FFFE-8FFFF; [NONCHARACTER CODE POINTS]
9FFFE-9FFFF [NONCHARACTER CODE POINTS] 9FFFE-9FFFF; [NONCHARACTER CODE POINTS]
AFFFE-AFFFF [NONCHARACTER CODE POINTS] AFFFE-AFFFF; [NONCHARACTER CODE POINTS]
BFFFE-BFFFF [NONCHARACTER CODE POINTS] BFFFE-BFFFF; [NONCHARACTER CODE POINTS]
CFFFE-CFFFF [NONCHARACTER CODE POINTS] CFFFE-CFFFF; [NONCHARACTER CODE POINTS]
DFFFE-DFFFF [NONCHARACTER CODE POINTS] DFFFE-DFFFF; [NONCHARACTER CODE POINTS]
EFFFE-EFFFF [NONCHARACTER CODE POINTS] EFFFE-EFFFF; [NONCHARACTER CODE POINTS]
FFFFE-FFFFF [NONCHARACTER CODE POINTS] FFFFE-FFFFF; [NONCHARACTER CODE POINTS]
10FFFE-10FFFF [NONCHARACTER CODE POINTS] 10FFFE-10FFFF; [NONCHARACTER CODE POINTS]
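The seventeen ranges above follow a single pattern: the last two code points of each plane. An implementation could therefore test them arithmetically rather than by table lookup. A minimal Python sketch follows, with a hypothetical function name. Appendix F, not this shortcut, remains normative.

```python
def is_listed_noncharacter(cp: int) -> bool:
    """True for xxFFFE and xxFFFF in each of planes 0 through 16."""
    # Masking with 0xFFFE keeps the low-16-bit values FFFE and FFFF together.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
```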
5.6 Surrogate codes 5.6 Surrogate codes
The following are permanently reserved for use as surrogate code values The following code points are permanently reserved for use as surrogate
in the UTF-16 encoding, will never be assigned to characters and are code values in the UTF-16 encoding, will never be assigned to
therefore prohibited: characters, and are therefore prohibited:
D800-DFFF [SURROGATE CODES] D800-DFFF; [SURROGATE CODES]
5.7 Inappropriate for plain text 5.7 Inappropriate for plain text
The following characters should not appear in regular text. The following characters should not appear in regular text.
FFF9 INTERLINEAR ANNOTATION ANCHOR FFF9; INTERLINEAR ANNOTATION ANCHOR
FFFA INTERLINEAR ANNOTATION SEPARATOR FFFA; INTERLINEAR ANNOTATION SEPARATOR
FFFB INTERLINEAR ANNOTATION TERMINATOR FFFB; INTERLINEAR ANNOTATION TERMINATOR
FFFC OBJECT REPLACEMENT CHARACTER FFFC; OBJECT REPLACEMENT CHARACTER
5.8 Inappropriate for domain names 5.8 Inappropriate for domain names
The ideographic description characters allow different sequences of The ideographic description characters allow different sequences of
characters to be rendered the same way, which makes them inappropriate characters to be rendered the same way, which makes them inappropriate
for host names that must have a single canonical order. for host names that must have a single canonical order.
2FF0-2FFF IDEOGRAPHIC DESCRIPTION CHARACTERS 2FF0-2FFF; [IDEOGRAPHIC DESCRIPTION CHARACTERS]
5.9 Change display properties 5.9 Change display properties
The following characters, some of which are deprecated in ISO 10646, The following characters, some of which are deprecated in ISO 10646,
can cause changes in display or the order in which characters appear can cause changes in display or the order in which characters appear
when rendered. when rendered.
200E LEFT-TO-RIGHT MARK 200E; LEFT-TO-RIGHT MARK
200F RIGHT-TO-LEFT MARK 200F; RIGHT-TO-LEFT MARK
202A LEFT-TO-RIGHT EMBEDDING 202A; LEFT-TO-RIGHT EMBEDDING
202B RIGHT-TO-LEFT EMBEDDING 202B; RIGHT-TO-LEFT EMBEDDING
202C POP DIRECTIONAL FORMATTING 202C; POP DIRECTIONAL FORMATTING
202D LEFT-TO-RIGHT OVERRIDE 202D; LEFT-TO-RIGHT OVERRIDE
202E RIGHT-TO-LEFT OVERRIDE 202E; RIGHT-TO-LEFT OVERRIDE
206A INHIBIT SYMMETRIC SWAPPING 206A; INHIBIT SYMMETRIC SWAPPING
206B ACTIVATE SYMMETRIC SWAPPING 206B; ACTIVATE SYMMETRIC SWAPPING
206C INHIBIT ARABIC FORM SHAPING 206C; INHIBIT ARABIC FORM SHAPING
206D ACTIVATE ARABIC FORM SHAPING 206D; ACTIVATE ARABIC FORM SHAPING
206E NATIONAL DIGIT SHAPES 206E; NATIONAL DIGIT SHAPES
206F NOMINAL DIGIT SHAPES 206F; NOMINAL DIGIT SHAPES
6. Unassigned Characters 6. Unassigned Code Points
All characters not yet assigned in [ISO10646] are called "unassigned All code points not yet assigned in ISO 10646 are called "unassigned
characters". Authoritative name servers MUST NOT have internationalized code points". Authoritative name servers MUST NOT have internationalized
name parts that contain any unassigned characters. DNS requests MAY name parts that contain any unassigned code points. DNS requests MAY
contain name parts that contain unassigned characters. Note that this is contain name parts that contain unassigned code points. Note that this
the only part of this document where the requirements for queries is the only part of this document where the requirements for queries
differ from the requirements for names in DNS zones. differ from the requirements for names in DNS zones.
Using two different policies for where unassigned characters can appear Using two different policies for where unassigned code points can appear
in the DNS prevents the need for versioning the IDN protocol [IDNrev]. in the DNS prevents the need for versioning the IDN protocol [IDNrev].
This is very useful since it makes the overall processing simpler and does This is very useful since it makes the overall processing simpler and does
not impose a "protocol" to handle versioning. It is expected that ISO not impose a "protocol" to handle versioning. It is expected that ISO
10646 will be updated fairly frequently; recently, it has happened 10646 will be updated fairly frequently; recently, it has happened
approximately once a year. Each time a new version of ISO 10646 appears, approximately once a year. Each time a new version of ISO 10646 appears,
a new version of this document can be created. Some end users will want a new version of this document can be created. Some end users will want
to use the new characters as soon as they are defined. to use the new code points as soon as they are defined.
The list of unassigned characters can be found in Appendix G of this The list of unassigned code points can be found in Appendix G of this
document. The list in Appendix G MUST be used by implementations of this document. The list in Appendix G MUST be used by implementations of this
specification. If there are any discrepancies between the list in specification. If there are any discrepancies between the list in
Appendix G and the ISO 10646 specification, the list in Appendix G always Appendix G and the ISO 10646 specification, the list in Appendix G always
takes precedence. takes precedence.
Due to the way that versioning is handled in this section, host names Due to the way that versioning is handled in this section, host names
that are embedded in structures that cannot be changed (such as the that are embedded in structures that cannot be changed (such as the
signed parts of digital certificates) MUST NOT have internationalized signed parts of digital certificates) MUST NOT have internationalized
name parts that contain any unassigned characters. name parts that contain any unassigned code points.
6.1 Categories of characters 6.1 Categories of code points
Each character in ISO 10646 can be categorized by how it acts in Each code point in ISO 10646 can be categorized by how it acts in the
the process described in earlier sections of this document: process described in earlier sections of this document:
AO Characters that may be in the output AO Code points that may be in the output
MN Characters that cannot be in the output because they are MN Code points that cannot be in the output because they are
mapped to nothing or never appear as output from mapped to nothing or never appear as output from
normalization normalization
D Characters that cannot be in the output because they are D Code points that cannot be in the output because they are
disallowed in the prohibition step disallowed in the prohibition step
U Unassigned characters U Unassigned code points
A subsequent version of this document that references a newer version of A subsequent version of this document that references a newer version of
ISO 10646 with new characters will inherently have some characters move ISO 10646 with new code points will inherently have some code points
from category U to either D, MN, or AO. For backwards compatibility, no move from category U to either D, MN, or AO. For backwards
future version of this document will move characters from any other compatibility, no future version of this document will move code points
category. That is, no current AO, MN, or D characters will ever change from any other category. That is, no current AO, MN, or D code points
to a different category. will ever change to a different category.
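The category scheme and the rule that only category U may change between versions can be modeled with a small Python sketch. The per-version sets passed to `classify`, and the function names themselves, are hypothetical. In practice the sets would come from the appendices of a given version of this document.

```python
from enum import Enum


class Category(Enum):
    AO = "may be in the output"
    MN = "mapped to nothing, or never output by normalization"
    D = "disallowed in the prohibition step"
    U = "unassigned"


def classify(cp, *, unassigned, never_output, prohibited):
    """Categorize one code point from hypothetical per-version sets.

    MN is tested before D: mapping and normalization run before the
    prohibition step, so a code point that never survives them is
    never checked against the prohibited list.
    """
    if cp in unassigned:
        return Category.U
    if cp in never_output:
        return Category.MN
    if cp in prohibited:
        return Category.D
    return Category.AO


def upgrade_is_compatible(old, new):
    """True if, between two {code point: Category} tables, only code
    points that were U in the old table changed category."""
    return all(old[cp] is Category.U or new.get(cp) is old[cp] for cp in old)
```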
Authoritative name servers MUST NOT contain any name that has characters Authoritative name servers MUST NOT contain any name that has code
outside of AO for the latest version of this document. That is, they are points outside of AO for the latest version of this document. That is,
forbidden to contain any IDN names containing characters from the MN, D, they are forbidden to contain any IDN names containing code points from
or U categories. the MN, D, or U categories.
Applications creating name queries MUST treat U code points as if they Applications creating name queries MUST treat U code points as if they
were AO when preparing the name parts according to this document. Those were AO when preparing the name parts according to this document. Those
applications MAY optionally have a preprocess that provides stricter applications MAY optionally have a preprocess that provides stricter
checks: treating unassigned characters in the input as errors, or checks: treating unassigned code points in the input as errors, or
warning the user about the fact that the character is unassigned in the warning the user about the fact that the code point is unassigned in the
version of this document that the software is based on; such a choice is version of this document that the software is based on; such a choice is
a local matter for the software. a local matter for the software.
Non-authoritative DNS servers MAY reject names that contain characters Non-authoritative DNS servers MAY reject names that contain code points
that are in categories MN or D for the version of this document that that are in categories MN or D for the version of this document that
they implement, but MUST NOT reject names because they contain name they implement, but MUST NOT reject names because they contain name
parts with characters from category U. parts with code points from category U.
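The different obligations of each party can be summarized as three small predicates. This Python sketch assumes each name part has already been reduced to a list of categories for the version the software implements. All names are hypothetical.

```python
from enum import Enum


class Category(Enum):
    AO = "AO"
    MN = "MN"
    D = "D"
    U = "U"


def zone_may_contain(categories) -> bool:
    """Authoritative servers: every code point must be AO."""
    return all(c is Category.AO for c in categories)


def resolver_may_reject(categories) -> bool:
    """Non-authoritative servers MAY reject MN or D, never U alone."""
    return any(c in (Category.MN, Category.D) for c in categories)


def query_preparation_category(c):
    """Applications building queries treat U as if it were AO."""
    return Category.AO if c is Category.U else c
```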
6.2 Reasons for difference between authoritative servers and requests 6.2 Reasons for difference between authoritative servers and requests
Different software using different versions of this document needs to Different software using different versions of this document needs to
interoperate with maximal compatibility. The scheme described in this interoperate with maximal compatibility. The scheme described in this
section (authoritative name servers MUST NOT use unassigned characters, section (authoritative name servers MUST NOT use unassigned code points,
requests MAY include unassigned characters) allows that compatibility requests MAY include unassigned code points) allows that compatibility
without introducing any known security or interoperability issues. without introducing any known security or interoperability issues.
The list below shows what happens if a request contains a character from The list below shows what happens if a request contains a code point
category U that is allowed in a newer version of this document. The from category U that is allowed in a newer version of this document. The
request either resolves to the domain name that was intended, or request either resolves to the domain name that was intended, or
resolves to no domain at all. In this list, the request comes from an resolves to no domain at all. In this list, the request comes from an
application using version "oldVersion" of this document, the application using version "oldVersion" of this document, the
authoritative name server is using version "newVersion" of this authoritative name server is using version "newVersion" of this
document, and the character X was in category U on oldVersion, and has document, and the code point X was in category U on oldVersion, and has
changed category to AO, MN, or D. There are 3 possible scenarios: changed category to AO, MN, or D. There are 3 possible scenarios:
1. X becomes AO -- In newVersion, X is in category AO. Because the 1. X becomes AO -- In newVersion, X is in category AO. Because the
application passed X through, it gets back correct data from the application passed X through, it gets back correct data from the
authoritative name server. There is one exceptional case, where X is a authoritative name server. There is one exceptional case, where X is a
combining mark. combining mark.
The order of combining marks is normalized, so if another combining mark The order of combining marks is normalized, so if another combining mark
Y has a lower combining class than X then XY will be put in the Y has a lower combining class than X then XY will be put in the
canonical order YX. (Unassigned characters are never reordered, so this canonical order YX. (Unassigned code points are never reordered, so this
doesn't happen in oldVersion). If the request contains YX, the request doesn't happen in oldVersion). If the request contains YX, the request
will get correct data from the authoritative name server. However, no will get correct data from the authoritative name server. However, no
domain name can be registered with XY, so a request with XY will get a domain name can be registered with XY, so a request with XY will get a
"no such host" error. "no such host" error.
2. X becomes MN -- In newVersion, X is normalized to character "nX" and 2. X becomes MN -- In newVersion, X is normalized to code point "nX" and
therefore X is now put in category MN. This cannot exist in any domain therefore X is now put in category MN. This cannot exist in any domain
name, so any request containing X will get back a "no such host" error. name, so any request containing X will get back a "no such host" error.
Note, however, if the request had contained the letter nX, it would have Note, however, if the request had contained the letter nX, it would have
gotten back correct data. gotten back correct data.
3. X becomes D -- In newVersion, X is in category D. This cannot exist 3. X becomes D -- In newVersion, X is in category D. This cannot exist
in any domain name, so any request containing X will get back a "no such in any domain name, so any request containing X will get back a "no such
host" error. host" error.
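The reordering behind the combining-mark exception in scenario 1 can be observed with Python's `unicodedata` module. This sketch uses two already-assigned marks as stand-ins for X and Y. In the draft's scenario, X was unassigned in oldVersion, which is why the old software left XY unreordered.

```python
import unicodedata

# Stand-ins: X has a higher combining class than Y.
X = "\u0301"  # COMBINING ACUTE ACCENT, combining class 230
Y = "\u0323"  # COMBINING DOT BELOW, combining class 220

print(unicodedata.combining(X), unicodedata.combining(Y))  # prints: 230 220

# Canonical ordering puts the lower-class mark first, so "aXY" becomes
# "aYX", and both orders normalize to the same string.
print(unicodedata.normalize("NFD", "a" + X + Y) == "a" + Y + X)  # prints: True
```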
In none of the cases does the request get data for a host name other In none of the cases does the request get data for a host name other
skipping to change at line 501 skipping to change at line 501
document. document.
Newer application -- Suppose that an application or intermediary DNS Newer application -- Suppose that an application or intermediary DNS
server is using version newVersion and the authoritative name server is server is using version newVersion and the authoritative name server is
using version oldVersion. This case is simple: there will be no names on using version oldVersion. This case is simple: there will be no names on
the server that cannot be accessed by the application because the the server that cannot be accessed by the application because the
resolver uses a superset of the code points accepted by the server. resolver uses a superset of the code points accepted by the server.
Newer server -- Suppose that an application or intermediary DNS server Newer server -- Suppose that an application or intermediary DNS server
is using oldVersion and the authoritative name server is using is using oldVersion and the authoritative name server is using
newVersion. Because the application passed through any unassigned newVersion. Because the application passed through any unassigned code
characters, the user can access names on the server that use characters points, the user can access names on the server that use code points in
in newVersion. No names on the site can have characters that are newVersion. No names on the site can have code points that are
unassigned in newVersion, since that is illegal. In this case, the unassigned in newVersion, since that is illegal. In this case, the
application has to enter the unassigned characters in the correct order, application has to enter the unassigned code points in the correct
and has to use unassigned characters that would make it through both the order, and has to use unassigned code points that would make it through
mapping and the normalization steps. both the mapping and the normalization steps.
7. Security Considerations 7. Security Considerations
Much of the security of the Internet relies on the DNS. Thus, any change Much of the security of the Internet relies on the DNS. Thus, any change
to the characteristics of the DNS can change the security of much of the to the characteristics of the DNS can change the security of much of the
Internet. Internet.
Host names are used by users to connect to Internet servers. The Host names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers single internationalized name could be connected to different servers
skipping to change at line 596 skipping to change at line 596
James Seng James Seng
Marc Blanchet Marc Blanchet
Mark Davis Mark Davis
Martin Duerst Martin Duerst
Patrik Faltstrom Patrik Faltstrom
Paul Hoffman Paul Hoffman
Additional significant improvements were proposed by: Additional significant improvements were proposed by:
Jonathan Rosenne Jonathan Rosenne
Kent Karlsson
Scott Hollenbeck
B. Differences Between -00 and -01 Drafts
Throughout: Changed "canonicalize" to "normalize". Removed the
normative references to ISO 10646.
1.1: Clarified the second paragraph and added the third.
1.2: Removed the IDN summary because we have diverged from the
comparison draft significantly.
1.3: Removed the open issues list.
2: Removed the references to the parts of IDNComp.
2.1: Removed the section on where preparation happens.
2.2: Reversed the order of the middle three steps.
3, 4, and 5: Changed the order to match the new ordering.
4: Added the description of the design goals for one-to-none vs.
prohibition. Changed the table on which case mapping is based. Pretty
much changed the whole section.
5: Removed many characters. The two reasons were to remove the ones that
are now corrected by NFKC and to remove the ones that "looked like"
other forbidden characters.
5.2: Added and removed various characters.
5.3: Added higher-plane private use characters.
5.5: Added non-character code points.
5.6: Changed "surrogate characters" to "surrogate codes" and corrected
the description of why they are prohibited.
6: Replaced future IANA description with new versioning proposal.
7: Added third paragraph.
8: Added [CharModel] and [Glossary]. Updated the non-normative
reference for ISO 10646.
A: Added names of commenters.
C: Removed the IANA Considerations because we are not sure we will
have any.
E, F, G: Added the long appendices at the end of the document.
B. Differences Between -01 and -02 Drafts
Throughout: changed the format of lines with character names to make
the document easier to review.
1.1: Added non-normative reference to [ISO10646]. Also added note about
range names.
3.2: Changed "CaseFold" to "Fold" in last sentence.
4: Corrected spelling in title.
5: Changed "character" to "code point" in many places because some of
the things that are prohibited are not characters. Changed the last
sentence in the fifth paragraph.
6: Changed "character" to "code point" in many places, including the
title of the section.
A: Added Kent Karlsson and Scott Hollenbeck to the commenters list.
F: Corrected an error in the table (hyphen was called prohibited
when it obviously is not). Changed title.
G: Fixed the table to use the proper format for the code points.
Changed title.
C. IANA Considerations C. IANA Considerations
[[[ We probably won't have any. ]]] [[[ We probably won't have any. ]]]
D. Author Contact Information D. Author Contact Information
Paul Hoffman Paul Hoffman
Internet Mail Consortium and VPN Consortium Internet Mail Consortium and VPN Consortium
127 Segre Place 127 Segre Place
skipping to change at line 1555 skipping to change at line 1533
FF32; FF52; Case map FF32; FF52; Case map
FF33; FF53; Case map FF33; FF53; Case map
FF34; FF54; Case map FF34; FF54; Case map
FF35; FF55; Case map FF35; FF55; Case map
FF36; FF56; Case map FF36; FF56; Case map
FF37; FF57; Case map FF37; FF57; Case map
FF38; FF58; Case map FF38; FF58; Case map
FF39; FF59; Case map FF39; FF59; Case map
FF3A; FF5A; Case map FF3A; FF5A; Case map
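In the draft itself, each row of the case-mapping table has the form `source; target; Case map`, with code points written in hexadecimal. The Python sketch below is illustrative only and is not part of the specification. It parses such rows into a dictionary and applies them with `str.translate`.

Three things in it are assumptions:

- the helper names `parse_case_map` and `apply_case_map` are hypothetical;
- the sample data is just three of the rows shown above, where fullwidth capital letters map to their fullwidth lowercase forms;
- the target field is allowed to hold several space-separated code points, in case some rows not shown here map one code point to more than one.

```python
# Hypothetical sketch: parse rows in the "source; target; Case map" format
# and apply them to a label. Only three sample rows are included.
SAMPLE_ROWS = """\
FF32; FF52; Case map
FF33; FF53; Case map
FF3A; FF5A; Case map
"""


def parse_case_map(text: str) -> dict[int, str]:
    """Build a table from a source code point to its replacement string."""
    table: dict[int, str] = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 3 or fields[2] != "Case map":
            continue  # skip blank or unrelated lines
        source = int(fields[0], 16)
        # Assumption: a target may list several space-separated code points.
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def apply_case_map(label: str, table: dict[int, str]) -> str:
    """Replace each mapped code point; unmapped code points pass through."""
    return label.translate(table)


if __name__ == "__main__":
    table = parse_case_map(SAMPLE_ROWS)
    # U+FF32 FULLWIDTH LATIN CAPITAL LETTER R maps to U+FF52 (small r).
    print(apply_case_map("\uFF32x\uFF3A", table) == "\uFF52x\uFF5A")
```

A dictionary keyed by code point is used because `str.translate` accepts exactly that shape.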
F. Prohibited Character List F. Prohibited Code Point List
0000-002C 0000-002C
002E-002F 002E-002F
003A-0040 003A-0040
005B-0060 005B-0060
007B-007F 007B-007F
0080-009F 0080-009F
00A0 00A0
1680 1680
2000 2000
skipping to change at line 1624 skipping to change at line 1602
CFFFE-CFFFF CFFFE-CFFFF
DFFFE-DFFFF DFFFE-DFFFF
EFFFE-EFFFF EFFFE-EFFFF
F0000-FFFFD F0000-FFFFD
FFFFE-FFFFF FFFFE-FFFFF
100000-10FFFD 100000-10FFFD
10FFFE-10FFFF 10FFFE-10FFFF
NOTE WELL: Software that follows this specification that will be used to NOTE WELL: Software that follows this specification that will be used to
check names before they are put in authoritative name servers MUST add check names before they are put in authoritative name servers MUST add
all unassigned characters to the list of characters that are prohibited. all unassigned code points to the list of code points that are prohibited.
See Section 6 for more details. See Section 6 for more details.
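The rows above appear to list prohibited code points as ascending, non-overlapping ranges. Given that, a checker can keep the range starts and ends in two sorted lists and test each code point with a binary search, instead of expanding every range into a set.

The Python sketch below shows one way to do this with the standard `bisect` module. The class name `RangeSet` is hypothetical, and only a handful of the rows above are included. The lookup relies on the ranges not overlapping.

```python
import bisect


class RangeSet:
    """Inclusive code point ranges with binary-search membership tests.

    Assumes the ranges do not overlap, as the rows in this appendix appear not to.
    """

    def __init__(self, rows: list[str]) -> None:
        ranges = []
        for row in rows:
            # A row is either a single code point ("00A0") or a range ("0000-002C").
            start_hex, _, end_hex = row.strip().partition("-")
            start = int(start_hex, 16)
            end = int(end_hex, 16) if end_hex else start
            ranges.append((start, end))
        ranges.sort()
        self._starts = [start for start, _ in ranges]
        self._ends = [end for _, end in ranges]

    def __contains__(self, cp: int) -> bool:
        # Find the last range whose start is <= cp, then check its end.
        i = bisect.bisect_right(self._starts, cp) - 1
        return i >= 0 and cp <= self._ends[i]


PROHIBITED = RangeSet(["0000-002C", "002E-002F", "003A-0040", "005B-0060",
                       "007B-007F", "0080-009F", "00A0", "1680", "2000",
                       "10FFFE-10FFFF"])

if __name__ == "__main__":
    print(ord("-") in PROHIBITED)  # hyphen (U+002D) is not prohibited
    print(ord("@") in PROHIBITED)  # U+0040 falls in 003A-0040
```

The hyphen check matches the correction noted for Appendix F: U+002D sits in the gap between 0000-002C and 002E-002F.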
G. Unassigned Character List G. Unassigned Code Point List
000220-000221 0220-0221
000234-00024F 0234-024F
0002AE-0002AF 02AE-02AF
0002EF-0002FF 02EF-02FF
00034F-00035F 034F-035F
000363-000373 0363-0373
000376-000379 0376-0379
00037B-00037D 037B-037D
00037F-000383 037F-0383
00038B 038B
00038D 038D
0003A2 03A2
0003CF 03CF
0003D8-0003D9 03D8-03D9
0003F6-0003FF 03F4-03FF
000487 0487
00048A-00048B 048A-048B
0004C5-0004C6 04C5-04C6
0004C9-0004CA 04C9-04CA
0004CD-0004CF 04CD-04CF
0004F6-0004F7 04F6-04F7
0004FA-000530 04FA-0530
000557-000558 0557-0558
000560 0560
000588 0588
00058B-000590 058B-0590
0005A2 05A2
0005BA 05BA
0005C5-0005CF 05C5-05CF
0005EB-0005EF 05EB-05EF
0005F5-00060B 05F5-060B
00060D-00061A 060D-061A
00061C-00061E 061C-061E
000620 0620
00063B-00063F 063B-063F
000656-00065F 0656-065F
00066E-00066F 066E-066F
0006EE-0006EF 06EE-06EF
0006FF 06FF
00070E 070E
00072D-00072F 072D-072F
00074B-00077F 074B-077F
0007B1-000900 07B1-0900
000904 0904
00093A-00093B 093A-093B
00094E-00094F 094E-094F
000955-000957 0955-0957
000971-000980 0971-0980
000984 0984
00098D-00098E 098D-098E
000991-000992 0991-0992
0009A9 09A9
0009B1 09B1
0009B3-0009B5 09B3-09B5
0009BA-0009BB 09BA-09BB
0009BD 09BD
0009C5-0009C6 09C5-09C6
0009C9-0009CA 09C9-09CA
0009CE-0009D6 09CE-09D6
0009D8-0009DB 09D8-09DB
0009DE 09DE
0009E4-0009E5 09E4-09E5
0009FB-000A01 09FB-0A01
000A03-000A04 0A03-0A04
000A0B-000A0E 0A0B-0A0E
000A11-000A12 0A11-0A12
000A29 0A29
000A31 0A31
000A34 0A34
000A37 0A37
000A3A-000A3B 0A3A-0A3B
000A3D 0A3D
000A43-000A46 0A43-0A46
000A49-000A4A 0A49-0A4A
000A4E-000A58 0A4E-0A58
000A5D 0A5D
000A5F-000A65 0A5F-0A65
000A75-000A80 0A75-0A80
000A84 0A84
000A8C 0A8C
000A8E 0A8E
000A92 0A92
000AA9 0AA9
000AB1 0AB1
000AB4 0AB4
000ABA-000ABB 0ABA-0ABB
000AC6 0AC6
000ACA 0ACA
000ACE-000ACF 0ACE-0ACF
000AD1-000ADF 0AD1-0ADF
000AE1-000AE5 0AE1-0AE5
000AF0-000B00 0AF0-0B00
000B04 0B04
000B0D-000B0E 0B0D-0B0E
000B11-000B12 0B11-0B12
000B29 0B29
000B31 0B31
000B34-000B35 0B34-0B35
000B3A-000B3B 0B3A-0B3B
000B44-000B46 0B44-0B46
000B49-000B4A 0B49-0B4A
000B4E-000B55 0B4E-0B55
000B58-000B5B 0B58-0B5B
000B5E 0B5E
000B62-000B65 0B62-0B65
000B71-000B81 0B71-0B81
000B84 0B84
000B8B-000B8D 0B8B-0B8D
000B91 0B91
000B96-000B98 0B96-0B98
000B9B 0B9B
000B9D 0B9D
000BA0-000BA2 0BA0-0BA2
000BA5-000BA7 0BA5-0BA7
000BAB-000BAD 0BAB-0BAD
000BB6 0BB6
000BBA-000BBD 0BBA-0BBD
000BC3-000BC5 0BC3-0BC5
000BC9 0BC9
000BCE-000BD6 0BCE-0BD6
000BD8-000BE6 0BD8-0BE6
000BF3-000C00 0BF3-0C00
000C04 0C04
000C0D 0C0D
000C11 0C11
000C29 0C29
000C34 0C34
000C3A-000C3D 0C3A-0C3D
000C45 0C45
000C49 0C49
000C4E-000C54 0C4E-0C54
000C57-000C5F 0C57-0C5F
000C62-000C65 0C62-0C65
000C70-000C81 0C70-0C81
000C84 0C84
000C8D 0C8D
000C91 0C91
000CA9 0CA9
000CB4 0CB4
000CBA-000CBD 0CBA-0CBD
000CC5 0CC5
000CC9 0CC9
000CCE-000CD4 0CCE-0CD4
000CD7-000CDD 0CD7-0CDD
000CDF 0CDF
000CE2-000CE5 0CE2-0CE5
000CF0-000D01 0CF0-0D01
000D04 0D04
000D0D 0D0D
000D11 0D11
000D29 0D29
000D3A-000D3D 0D3A-0D3D
000D44-000D45 0D44-0D45
000D49 0D49
000D4E-000D56 0D4E-0D56
000D58-000D5F 0D58-0D5F
000D62-000D65 0D62-0D65
000D70-000D81 0D70-0D81
000D84 0D84
000D97-000D99 0D97-0D99
000DB2 0DB2
000DBC 0DBC
000DBE-000DBF 0DBE-0DBF
000DC7-000DC9 0DC7-0DC9
000DCB-000DCE 0DCB-0DCE
000DD5 0DD5
000DD7 0DD7
000DE0-000DF1 0DE0-0DF1
000DF5-000E00 0DF5-0E00
000E3B-000E3E 0E3B-0E3E
000E5C-000E80 0E5C-0E80
000E83 0E83
000E85-000E86 0E85-0E86
000E89 0E89
000E8B-000E8C 0E8B-0E8C
000E8E-000E93 0E8E-0E93
000E98 0E98
000EA0 0EA0
000EA4 0EA4
000EA6 0EA6
000EA8-000EA9 0EA8-0EA9
000EAC 0EAC
000EBA 0EBA
000EBE-000EBF 0EBE-0EBF
000EC5 0EC5
000EC7 0EC7
000ECE-000ECF 0ECE-0ECF
000EDA-000EDB 0EDA-0EDB
000EDE-000EFF 0EDE-0EFF
000F48 0F48
000F6B-000F70 0F6B-0F70
000F8C-000F8F 0F8C-0F8F
000F98 0F98
000FBD 0FBD
000FCD-000FCE 0FCD-0FCE
000FD0-000FFF 0FD0-0FFF
001022 1022
001028 1028
00102B 102B
001033-001035 1033-1035
00103A-00103F 103A-103F
00105A-00109F 105A-109F
0010C6-0010CF 10C6-10CF
0010F7-0010FA 10F7-10FA
0010FC-0010FF 10FC-10FF
00115A-00115E 115A-115E
0011A3-0011A7 11A3-11A7
0011FA-0011FF 11FA-11FF
001207 1207
001247 1247
001249 1249
00124E-00124F 124E-124F
001257 1257
001259 1259
00125E-00125F 125E-125F
001287 1287
001289 1289
00128E-00128F 128E-128F
0012AF 12AF
0012B1 12B1
0012B6-0012B7 12B6-12B7
0012BF 12BF
0012C1 12C1
0012C6-0012C7 12C6-12C7
0012CF 12CF
0012D7 12D7
0012EF 12EF
00130F 130F
001311 1311
001316-001317 1316-1317
00131F 131F
001347 1347
00135B-001360 135B-1360
00137D-00139F 137D-139F
0013F5-001400 13F5-1400
001677-00167F 1677-167F
00169D-00169F 169D-169F
0016F1-00177F 16F1-177F
0017DD-0017DF 17DD-17DF
0017EA-0017FF 17EA-17FF
00180F 180F
00181A-00181F 181A-181F
001878-00187F 1878-187F
0018AA-001DFF 18AA-1DFF
001E9C-001E9F 1E9C-1E9F
001EFA-001EFF 1EFA-1EFF
001F16-001F17 1F16-1F17
001F1E-001F1F 1F1E-1F1F
001F46-001F47 1F46-1F47
001F4E-001F4F 1F4E-1F4F
001F58 1F58
001F5A 1F5A
001F5C 1F5C
001F5E 1F5E
001F7E-001F7F 1F7E-1F7F
001FB5 1FB5
001FC5 1FC5
001FD4-001FD5 1FD4-1FD5
001FDC 1FDC
001FF0-001FF1 1FF0-1FF1
001FF5 1FF5
001FFF 1FFF
002047 2047
00204E-002069 204E-2069
002071-002073 2071-2073
00208F-00209F 208F-209F
0020B0-0020CF 20B0-20CF
0020E4-0020FF 20E4-20FF
00213B-002152 213B-2152
002184-00218F 2184-218F
0021F4-0021FF 21F4-21FF
0022F2-0022FF 22F2-22FF
00237C 237C
00239B-0023FF 239B-23FF
002427-00243F 2427-243F
00244B-00245F 244B-245F
0024EB-0024FF 24EB-24FF
002596-00259F 2596-259F
0025F8-0025FF 25F8-25FF
002614-002618 2614-2618
002672-002700 2672-2700
002705 2705
00270A-00270B 270A-270B
002728 2728
00274C 274C
00274E 274E
002753-002755 2753-2755
002757 2757
00275F-002760 275F-2760
002768-002775 2768-2775
002795-002797 2795-2797
0027B0 27B0
0027BF-0027FF 27BF-27FF
002900-002E7F 2900-2E7F
002E9A 2E9A
002EF4-002EFF 2EF4-2EFF
002FD6-002FEF 2FD6-2FEF
002FFC-002FFF 2FFC-2FFF
00303B-00303D 303B-303D
003040 3040
003095-003098 3095-3098
00309F-0030A0 309F-30A0
0030FF-003104 30FF-3104
00312D-003130 312D-3130
00318F 318F
0031B8-0031FF 31B8-31FF
00321D-00321F 321D-321F
003244-00325F 3244-325F
00327C-00327E 327C-327E
0032B1-0032BF 32B1-32BF
0032CC-0032CF 32CC-32CF
0032FF 32FF
003377-00337A 3377-337A
0033DE-0033DF 33DE-33DF
0033FF 33FF
004DB6-004DFF 4DB6-4DFF
009FA6-009FFF 9FA6-9FFF
00A48D-00A48F A48D-A48F
00A4A2-00A4A3 A4A2-A4A3
00A4B4 A4B4
00A4C1 A4C1
00A4C5 A4C5
00A4C7-00ABFF A4C7-ABFF
00D7A4-00D7FF D7A4-D7FF
00FA2E-00FAFF FA2E-FAFF
00FB07-00FB12 FB07-FB12
00FB18-00FB1C FB18-FB1C
00FB37 FB37
00FB3D FB3D
00FB3F FB3F
00FB42 FB42
00FB45 FB45
00FBB2-00FBD2 FBB2-FBD2
00FD40-00FD4F FD40-FD4F
00FD90-00FD91 FD90-FD91
00FDC8-00FDCF FDC8-FDCF
00FDFC-00FE1F FDFC-FE1F
00FE24-00FE2F FE24-FE2F
00FE45-00FE48 FE45-FE48
00FE53 FE53
00FE67 FE67
00FE6C-00FE6F FE6C-FE6F
00FE73 FE73
00FE75 FE75
00FEFD-00FEFE FEFD-FEFE
00FF00 FF00
00FF5F-00FF60 FF5F-FF60
00FFBF-00FFC1 FFBF-FFC1
00FFC8-00FFC9 FFC8-FFC9
00FFD0-00FFD1 FFD0-FFD1
00FFD8-00FFD9 FFD8-FFD9
00FFDD-00FFDF FFDD-FFDF
00FFE7 FFE7
00FFEF-00FFF8 FFEF-FFF8
10000-1FFFD
20000-2FFFD
30000-3FFFD
40000-4FFFD
50000-5FFFD
60000-6FFFD
70000-7FFFD
80000-8FFFD
90000-9FFFD
A0000-AFFFD
B0000-BFFFD
C0000-CFFFD
D0000-DFFFD
E0000-EFFFD
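The NOTE WELL in Appendix F says software that checks names for authoritative name servers must treat every unassigned code point as prohibited. One way to build that combined list is a sort-and-merge of inclusive intervals, sketched below in Python.

The sketch makes these assumptions:

- both appendices are read row by row;
- the rows are parsed regardless of zero padding, so "000220-000221" and "0220-0221" yield the same range;
- ranges that overlap or touch are merged.

The function names `parse_row` and `merge_ranges` are hypothetical, and the sample data uses only a few rows from each list.

```python
# Hypothetical sketch: combine prohibited and unassigned ranges into one list
# for checking names before they are put in authoritative name servers.


def parse_row(row: str) -> tuple[int, int]:
    """Parse "0220-0221", "000220-000221", or "038B" into an inclusive range."""
    start_hex, _, end_hex = row.strip().partition("-")
    start = int(start_hex, 16)
    end = int(end_hex, 16) if end_hex else start
    return start, end


def merge_ranges(*lists: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Combine range lists, merging ranges that overlap or are adjacent."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(r for lst in lists for r in lst):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


PROHIBITED_ROWS = ["0000-002C", "002E-002F", "003A-0040", "00A0"]
UNASSIGNED_ROWS = ["0220-0221", "0234-024F", "038B"]

if __name__ == "__main__":
    prohibited = [parse_row(r) for r in PROHIBITED_ROWS]
    unassigned = [parse_row(r) for r in UNASSIGNED_ROWS]
    for start, end in merge_ranges(prohibited, unassigned):
        print(f"{start:04X}-{end:04X}")
```

Merging adjacent ranges as well as overlapping ones keeps the combined list short. It also keeps the list ascending and non-overlapping, which is the property a binary-search lookup over it would rely on.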
 End of changes. 62 change blocks. 
187 lines changed or deleted 165 lines changed or added
