draft-ietf-idn-nameprep-05.txt   draft-ietf-idn-nameprep-06.txt 
draft -05:
   Internet Draft                                        Paul Hoffman
   draft-ietf-idn-nameprep-05.txt                          IMC & VPNC
   July 19, 2001                                        Marc Blanchet
   Expires in six months                                     ViaGenie

   Preparation of Internationalized Host Names

draft -06:
   Internet Draft                                        Paul Hoffman
   draft-ietf-idn-nameprep-06.txt                          IMC & VPNC
   September 27, 2001                                   Marc Blanchet
   Expires in six months                                     ViaGenie

   Stringprep Profile for Internationalized Host Names
Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.
Abstract

draft -05:
   This document describes how to prepare internationalized host names
   for use in the DNS. The steps are:

   - mapping characters to other characters, such as to change their
     case
   - normalizing the characters
   - excluding characters that are prohibited from appearing in
     internationalized host names

   This document does not specify a wire protocol. This preparation
   should be done before the DNS request.

draft -06:
   This document describes how to prepare internationalized host name
   parts in order to increase the likelihood that name input and name
   comparison work in ways that make sense for typical users throughout
   the world. This profile of the stringprep protocol is used as part of
   a suite of on-the-wire protocols for internationalizing the DNS.
1. Introduction

draft -05:
   When expanding today's DNS to include internationalized host names,
   those new names will be handled in many parts of the DNS. The
   Internationalized Domain Name (IDN) Working Group's requirements
   document [IDNReq] describes a framework for domain name handling as
   well as requirements for the new names.

   A user can enter a domain name into an application program in a
   myriad of fashions. Depending on the input method, the characters
   entered in the domain name may or may not be those that are allowed
   in internationalized host names. Thus, there must be a way to
   normalize the user's input before the name is resolved in the DNS.

   It is a design goal of this document to allow users to enter host
   names in applications and have the highest chance of getting the name
   correct. Another, often conflicting, design goal is to allow as wide
   a range of characters as possible in host names. The user should not
   be limited to entering exactly the characters that might have been
   used, but should instead be able to enter characters that
   unambiguously normalize to characters in the desired host name.
   Although it would be easy to use the process in this step to
   "correct" perceived mis-features or bugs in the current character
   standards, this document expressly does not do so. A difference
   between a character standard and this specification does not imply
   that the character standard is wrong, simply that the character
   standard and this specification have different purposes.

   This document describes the steps needed to convert a name part from
   one that is entered by the user to one that can be used in the DNS.

   Within a fully-qualified domain name, some labels may be
   internationalized, while others are not. This specification should be
   applied to all internationalized labels. An application must be able
   to recognize which part is internationalized; the method for such
   recognition is outside of the scope of this document. Note that this
   specification is harmless to the non-internationalized labels: when
   the steps described here are applied to non-internationalized labels,
   the label will not change.

draft -06:
   This document specifies processing rules that will allow users to
   enter internationalized host name parts in applications and have the
   highest chance of getting the content of the strings correct. It is a
   profile of stringprep [STRINGPREP].

   This document was previously called "nameprep" before splitting the
   structure of the protocol off into the stringprep document.

   This profile defines the following, as required by [STRINGPREP]:

   - The intended applicability of the profile: internationalized
     host name parts.
   - The character repertoire that is the input and output to
     stringprep: defined in Section 2.
   - The list of unassigned code points for the repertoire: defined
     in Appendix F.
   - The mappings used: defined in Section 3.
   - The Unicode normalization used: defined in Section 4.
   - The characters that are prohibited as output: defined in Section 5.

1.1 Terminology (draft -05) / 1.2 Terminology (draft -06)
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

Examples in this document use the notation for code points and names
from the Unicode Standard [Unicode3.1] and ISO/IEC 10646 [ISO10646]. For
example, the letter "a" may be represented as either "U+0061" or "LATIN
SMALL LETTER A". In the lists of prohibited characters, the "U+" is left
off to make the lists easier to read. The comments for character ranges
are shown in square brackets (such as "[SYMBOLS]") and do not come from
the standards.
draft -05:
   Note: A glossary of terms used in Unicode and ISO/IEC 10646 can be
   found in [Glossary]. Information on the 10646/Unicode character
   encoding model can be found in [CharModel].

   2. Preparation Overview

   The steps for preparing names are:

   1) Input from the application service interface -- This can be done
   in many ways and is not specified in this document.

   2) Map -- For each character in the input, check if it has a mapping
   and, if so, replace it with its mapping. The mappings are a
   combination of folding uppercase characters to lowercase and mapping
   out characters. This is described in Section 3.

   3) Normalize -- Normalize the characters. This is described in
   Section 4.

   4) Look for prohibited output -- Check for any characters that are
   not allowed in the output. If any are found, return an error to the
   application service interface. This is described in Section 5.

   5) Resolution of the prepared name -- This must be specified in a
   different IDN document.

   The above steps MUST be performed in the order given to comply with
   this specification.

   The steps in this document have associated tables in the document.
   The tables are derived from outside sources, and the derivation is
   briefly described in the document. Although a great deal of effort
   has gone into preparing the tables, there is a chance that the tables
   do not correctly reflect the outside sources. Regardless of whether
   or not the tables differ from the sources, implementations MUST use
   the tables in this document for their processing. That is, if there
   is an error in the tables, the tables must still be used. Future
   versions of this document may include corrections and additions to
   the tables.

   The mappings in Section 3 can be one-to-none, one-to-one, or
   one-to-many. That is, some characters may be eliminated or replaced
   by more than one character, and the output of this step might be
   shorter or longer than the input. The normalization in Section 4 can
   be one-to-one or many-to-one. Because of this, the system using
   nameprep MUST be prepared to receive a longer or shorter string than
   the one input in the nameprep algorithm.

   3. Mapping

   Each character in the input stream is checked against the mapping
   table. The mapping table can be found in Appendix E of this document.
   That table includes all the steps described in the subsections below.

   Note that the subsections below describe how Appendix E was formed.
   They are there for people who want to understand more, but they
   should be ignored by implementors. Nameprep implementations MUST map
   based on Appendix E, not based on the descriptions in this section of
   how Appendix E was created.

   Rationale: Characters that are not wanted in internationalized name
   parts are either mapped to nothing in the mapping step, or cause an
   error in the prohibition step. The general guideline used to pick
   between the two outcomes was that removing alphabetic, non-protocol
   characters be done in the mapping step, but all other characters
   cause errors in the prohibition step. This allows for minor
   linguistic errors on the part of an input mechanism to be caught in
   the mapping step, but to not hide serious errors such as entering
   protocol characters or invisible characters from the user.

draft -06:
   2. Character Repertoire

   Unicode 3.1 [Unicode3.1] is the repertoire used in this profile. The
   reason Unicode 3.1 was chosen instead of a version of ISO/IEC 10646
   is that ISO/IEC 10646 is expected to be updated soon after this
   document becomes an RFC. Unicode 3.1 has the exact repertoire that is
   expected in the next version of ISO/IEC 10646, and is therefore used
   here.

   3. Mapping

   This profile specifies stringprep mapping using the mapping table in
   Appendix D. That table includes all the steps described in this
   section.

   Note that the text in this section describes how Appendix D was
   formed. It is there for people who want to understand more, but it
   should be ignored by implementors. Implementations of this profile
   MUST map based on Appendix D, not based on the descriptions in this
   section of how Appendix D was created.

   3.1 Mapped out

   The following characters are simply deleted from the input (that is,
   they are mapped to nothing) because their presence or absence should
   not make two strings different.

   Some characters are only useful in line-based text, and are otherwise
   invisible and ignored.

   00AD; SOFT HYPHEN
   1806; MONGOLIAN TODO SOFT HYPHEN
   200B; ZERO WIDTH SPACE
   FEFF; ZERO WIDTH NO-BREAK SPACE

   Variation selectors and cursive connectors select different glyphs,
   but do not bear semantics.

   180B; MONGOLIAN FREE VARIATION SELECTOR ONE
   180C; MONGOLIAN FREE VARIATION SELECTOR TWO
   180D; MONGOLIAN FREE VARIATION SELECTOR THREE
   200C; ZERO WIDTH NON-JOINER
   200D; ZERO WIDTH JOINER
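
The map, normalize, and prohibit steps, applied in that order, can be
sketched in Python as follows. This is only an illustration under stated
assumptions, not the tables of either draft: str.casefold() and
unicodedata.normalize() follow the Unicode version shipped with the
interpreter rather than Unicode 3.1, and the names prepare_label,
MAPPED_TO_NOTHING, and is_prohibited are hypothetical. Only the
mapped-out code points and the ASCII range quoted in this document are
encoded.

```python
import unicodedata

# Code points listed under "Mapped out": deleted from the input.
MAPPED_TO_NOTHING = frozenset({
    0x00AD, 0x1806, 0x200B, 0xFEFF,          # only useful in line-based text
    0x180B, 0x180C, 0x180D, 0x200C, 0x200D,  # variation selectors, joiners
})


def is_prohibited(ch: str) -> bool:
    """Stand-in for the prohibition table: only U+0000-002C is checked."""
    return ord(ch) <= 0x002C


def prepare_label(label: str) -> str:
    # Step 1: map. Delete mapped-out characters, then case fold.
    # Assumption: str.casefold() approximates the [UTR21] folding.
    mapped = "".join(ch for ch in label if ord(ch) not in MAPPED_TO_NOTHING)
    mapped = mapped.casefold()
    # Step 2: normalize with form KC.
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: report prohibited output as an error rather than dropping it.
    bad = [f"U+{ord(ch):04X}" for ch in normalized if is_prohibited(ch)]
    if bad:
        raise ValueError("prohibited code points: " + ", ".join(bad))
    return normalized
```

The output length can differ from the input length in either direction:
a label containing U+00DF grows because that character folds to "ss",
while a label containing a soft hyphen shrinks. Callers therefore should
not assume a fixed-size buffer.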
3.1 Case mapping (draft -05) / 3.2 Case mapping (draft -06)

The input string is case folded according to [UTR21]. For most
characters, this is the same as changing the input character to a
lowercase character. For some characters, however, more complex
transformations occur.

draft -05:
   The "CaseFolding.txt" file from the Unicode database was used to
   prepare Appendix E.

   Rationale: This could have been "change all lowercase characters
   into uppercase characters". However, the upper-to-lower folding was
   chosen because most users of the Internet today enter host names in
   lowercase.

   3.2 Additional folding mappings

draft -06:
   The "CaseFolding.txt" file from the Unicode database was used to
   prepare the mapping table.

There are some characters that do not have mappings in [UTR21] but still
need processing. These characters include a few Greek characters and
many symbols that contain Latin characters. The list of characters to
add to the mapping table was determined by the following algorithm:

b = NormalizeWithKC(Fold(a));
c = NormalizeWithKC(Fold(b));
if c is not the same as b, add a mapping for "a to c".

Because NormalizeWithKC(Fold(c)) always equals c, the table is stable
from that point on. The "DerivedNormalizationProperties.txt" file from
the Unicode database was used to prepare Appendix E (draft -05) or
Appendix D (draft -06). This mapping was added to reduce the number of
processing steps, that is, to avoid doing case mapping and normalization
twice.
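
The derivation of the additional folding entries can be tried out with a
short Python sketch. It assumes that str.casefold() stands in for Fold
and unicodedata.normalize("NFKC", ...) for NormalizeWithKC; both use the
interpreter's Unicode data, not the Unicode 3.1 files named above, so the
results are illustrative only. The function names are hypothetical.

```python
import unicodedata


def fold_kc(s: str) -> str:
    """NormalizeWithKC(Fold(s)), using the interpreter's Unicode data."""
    return unicodedata.normalize("NFKC", s.casefold())


def additional_mapping(a: str):
    """Return c if 'a' needs an extra "a to c" table entry, else None."""
    b = fold_kc(a)
    c = fold_kc(b)
    return c if c != b else None
```

For example, U+2121 TELEPHONE SIGN has no case folding of its own, but
form KC expands it to the uppercase letters "TEL", which a second pass
folds to "tel". The sketch therefore reports a mapping from U+2121 to
"tel", while a plain letter such as "A" needs no additional entry.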
draft -05:
   3.3 Mapped out

   The following characters are simply deleted from the input (that is,
   they are mapped to nothing) because their presence or absence should
   not make two domain names different.

   Some characters are only useful in line-based text, and are otherwise
   invisible and ignored.

   00AD; SOFT HYPHEN
   1806; MONGOLIAN TODO SOFT HYPHEN
   200B; ZERO WIDTH SPACE
   FEFF; ZERO WIDTH NO-BREAK SPACE

   Variation selectors and cursive connectors select different glyphs,
   but do not bear semantics.

   180B; MONGOLIAN FREE VARIATION SELECTOR ONE
   180C; MONGOLIAN FREE VARIATION SELECTOR TWO
   180D; MONGOLIAN FREE VARIATION SELECTOR THREE
   200C; ZERO WIDTH NON-JOINER
   200D; ZERO WIDTH JOINER
4. Normalization

draft -05:
   The output of the mapping step is normalized using form KC, as
   described in [UAX15]. Using form KC instead of form C causes many
   characters that are identical or near-identical to be converted into
   a single character.

   Note that this specification refers to a specific version of [UAX15].
   If a later version of [UAX15] changes the algorithm used for
   normalization, that later version MUST NOT be used with this
   specification. Note that it is likely that this specification will be
   revised if UAX15 is changed, but until that happens, systems
   compliant with this specification MUST use only the specified version
   of [UAX15].

draft -06:
   This profile specifies using Unicode normalization form KC, as
   described in [UAX15].
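
The difference between form C and form KC can be seen with Python's
unicodedata module. This is a sketch for illustration only: the module
implements the normalization of the Unicode version bundled with the
interpreter, which is newer than the version of [UAX15] that this
specification pins, and the helper names to_c and to_kc are
hypothetical.

```python
import unicodedata


def to_c(s: str) -> str:
    """Normalization form C: canonical composition only."""
    return unicodedata.normalize("NFC", s)


def to_kc(s: str) -> str:
    """Normalization form KC: compatibility mapping, then composition."""
    return unicodedata.normalize("NFKC", s)


FULLWIDTH_A = "\uff41"    # FULLWIDTH LATIN SMALL LETTER A
DECOMPOSED_E = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
```

Form C leaves FULLWIDTH_A unchanged, while form KC converts it to the
ordinary letter "a", so the two near-identical characters compare equal
after form KC. Both forms compose DECOMPOSED_E into the single code
point U+00E9.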
5. Prohibited Output

draft -05:
   Before the text can be emitted, it must be checked for prohibited
   code points. There is a variety of prohibited code points, as
   described in this section.

   Note that the subsections below describe how Appendix F was formed.
   They are there for people who want to understand more, but they
   should be ignored by implementors. Nameprep implementations MUST map
   based on Appendix F, not based on the descriptions in this section of
   how Appendix F was created.

   One of the goals of IDN is to allow the widest possible set of host
   names as long as those host names do not cause other problems, such
   as conflict with other standards. Specifically, experience with
   current DNS names has shown that there is a desire for host names
   that include personal names, company names, and spoken phrases. A
   goal of this section is to prohibit as few characters that might be
   used in these contexts as possible.

   The collected list of prohibited code points can be found in Appendix
   F of this document. The list in Appendix F MUST be used by
   implementations of this specification. If there are any discrepancies
   between the list in Appendix F and the subsections below, the list in
   Appendix F always takes precedence.

   Some code points listed in one section would also appear in other
   sections. Each code point is only listed once in the table in
   Appendix F.

draft -06:
   This profile specifies using the prohibition table in Appendix E.

   Note that the subsections below describe how Appendix E was formed.
   They are there for people who want to understand more, but they
   should be ignored by implementors. Implementations of this profile
   MUST map based on Appendix E, not based on the descriptions in this
   section of how Appendix E was created.

   The collected lists of prohibited code points can be found in
   Appendix E of this document. The lists in Appendix E MUST be used by
   implementations of this specification. If there are any discrepancies
   between the lists in Appendix E and the subsections below, the lists
   in Appendix E always take precedence.

   Some code points listed in one section would also appear in other
   sections. Each code point is only listed once in the tables in
   Appendix E.
5.1 Currently-prohibited ASCII characters

Some of the ASCII characters that are currently prohibited in host names
by [STD13] are also used in protocol elements such as URIs [URI]. The other
characters in the range U+0000 to U+007F that are not currently allowed
are also prohibited in host name parts to reserve them for future use in
protocol elements.

0000-002C; [ASCII CONTROL CHARACTERS and SPACE through ,]
skipping to change at line 367 skipping to change at line 280
5.7 Inappropriate for plain text

The following characters should not appear in regular text.

FFF9; INTERLINEAR ANNOTATION ANCHOR
FFFA; INTERLINEAR ANNOTATION SEPARATOR
FFFB; INTERLINEAR ANNOTATION TERMINATOR
FFFC; OBJECT REPLACEMENT CHARACTER

5.8 Inappropriate for domain names (draft -05) /
5.8 Inappropriate for canonical representation (draft -06)

The ideographic description characters allow different sequences of
characters to be rendered the same way, which makes them inappropriate
for host names that must have a single canonical representation.

2FF0-2FFB; [IDEOGRAPHIC DESCRIPTION CHARACTERS]

5.9 Change display properties

The following characters, some of which are deprecated in ISO/IEC 10646,
skipping to change at line 399 skipping to change at line 312
206B; ACTIVATE SYMMETRIC SWAPPING
206C; INHIBIT ARABIC FORM SHAPING
206D; ACTIVATE ARABIC FORM SHAPING
206E; NATIONAL DIGIT SHAPES
206F; NOMINAL DIGIT SHAPES

5.10 Inappropriate characters from common input mechanisms

draft -05:
   U+3002 is used as if it were U+002E in many input mechanisms,
   particularly in Asia. This prohibition allows input mechanisms to
   safely map U+3002 to U+002E before doing nameprep without worrying
   about preventing users from accessing legitimate host name parts.

draft -06:
   U+3002 is used as if it were U+002E in many input mechanisms,
   particularly in Asia. This prohibition allows input mechanisms to
   safely map U+3002 to U+002E before doing stringprep without worrying
   about preventing users from accessing legitimate host name parts.

3002; IDEOGRAPHIC FULL STOP

5.11 Tagging characters

The following characters are used for tagging text and are invisible.

E0001; LANGUAGE TAG
E0020-E007F; [TAGGING CHARACTERS]
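
A prohibited-output check over ranges like those listed in this section
might look like the following Python sketch. The range list is partial:
it contains only the entries visible in this comparison, not the full
appendix table that implementations MUST use, and the names
PROHIBITED_RANGES and find_prohibited are hypothetical.

```python
from typing import List

# Inclusive (first, last) ranges copied from the subsections shown here.
# Assumption: this is a partial list, not the complete appendix table.
PROHIBITED_RANGES = [
    (0x0000, 0x002C),    # ASCII control characters and SPACE through ","
    (0x206B, 0x206F),    # display-property characters shown in 5.9
    (0x2FF0, 0x2FFB),    # ideographic description characters
    (0x3002, 0x3002),    # IDEOGRAPHIC FULL STOP
    (0xFFF9, 0xFFFC),    # interlinear annotation, object replacement
    (0xE0001, 0xE0001),  # LANGUAGE TAG
    (0xE0020, 0xE007F),  # tagging characters
]


def find_prohibited(label: str) -> List[int]:
    """Return the prohibited code points in 'label', in input order."""
    return [
        ord(ch)
        for ch in label
        if any(first <= ord(ch) <= last for first, last in PROHIBITED_RANGES)
    ]
```

An empty result means the label passed this partial check. A non-empty
result should be reported to the caller as an error; the offending
characters are not silently removed.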
6. Unassigned Code Points (draft -05) /
6. Unassigned Code Points in Internationalized Host Names (draft -06)

draft -05:
   All code points not assigned in [Unicode3.1] are called "unassigned
   code points". Authoritative name servers MUST NOT have
   internationalized name parts that contain any unassigned code points.
   DNS requests MAY contain name parts that contain unassigned code
   points. Note that this is the only part of this document where the
   requirements for queries differ from the requirements for names in
   DNS zones.

   Note: For this section, Unicode 3.1 is the base repertoire of
   unassigned code points. The reason Unicode 3.1 was chosen instead of
   a version of ISO/IEC 10646 is that ISO/IEC 10646 is expected to be
   updated soon after this document becomes an RFC. Unicode 3.1 has the
   exact repertoire that is expected in the next version of ISO/IEC
   10646, and is therefore used here.

   Using two different policies for where unassigned code points can
   appear in the DNS prevents the need for versioning the IDN protocol
   [IDNRev]. This is very useful since it makes the overall processing
   simpler and does not impose a "protocol" to handle versioning. It is
   expected that ISO/IEC 10646 will be updated fairly frequently;
   recently, it has happened approximately once a year. Each time a new
   version of ISO/IEC 10646 appears, a new version of this document can
   be created. Some end users will want to use the new code points as
   soon as they are defined.

   The list of unassigned code points can be found in Appendix G of this
   document. The list in Appendix G MUST be used by implementations of
   this specification. If there are any discrepancies between the list
   in Appendix G and the Unicode 3.1 specification, the list in Appendix
   G always takes precedence.

   Due to the way that versioning is handled in this section, host names
   that are embedded in structures that cannot be changed (such as the
   signed parts of digital certificates) MUST NOT have internationalized
   name parts that contain any unassigned code points.

   6.1 Categories of code points

   Each code point in ISO/IEC 10646 can be categorized by how it acts in
   the process described in earlier sections of this document:

   AO  Code points that may be in the output
   MN  Code points that cannot be in the output because they never
       appear as output from mapping or normalization
   D   Code points that cannot be in the output because they are
       disallowed in the prohibition step
   U   Unassigned code points

   A subsequent version of this document that references a newer version
   of ISO/IEC 10646 with new code points will inherently have some code
   points move from category U to either D, MN, or AO. For backwards
   compatibility, no future version of this document will move code
   points from any other category. That is, no current AO, MN, or D code
   points will ever change to a different category.

   Authoritative name servers MUST NOT contain any name that has code
   points outside of AO for the latest version of this document. That
   is, they are forbidden to contain any IDN names containing code
   points from the MN, D, or U categories.

   Applications creating name queries MUST treat U code points as if
   they were AO when preparing the name parts according to this
   document. Those applications MAY optionally have a preprocessor that
   provides stricter checks: treating unassigned code points in the
   input as errors, or warning the user about the fact that the code
   point is unassigned in the version of this document that the software
   is based on; such a choice is a local matter for the software.

   Non-authoritative DNS servers MAY reject queries that include name
   parts containing code points that are in categories MN or D for the
   version of this document that they implement, but MUST NOT reject
   queries that include name parts only for the reason that those parts
   contain code points from category U.
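
The two policies of draft -05, one for stored names and one for queries,
can be expressed as a small Python sketch. It assumes the caller has
already classified each code point of a name part into one of the four
categories; how that classification is computed depends on the tables
and is not shown. The names Category, zone_may_contain, and
resolver_may_reject are hypothetical.

```python
from enum import Enum
from typing import Iterable


class Category(Enum):
    AO = "may be in the output"
    MN = "never produced by mapping or normalization"
    D = "disallowed in the prohibition step"
    U = "unassigned"


def zone_may_contain(categories: Iterable[Category]) -> bool:
    """Authoritative data: every code point must be in category AO."""
    return all(c is Category.AO for c in categories)


def resolver_may_reject(categories: Iterable[Category]) -> bool:
    """Queries: MN or D may be rejected; U alone is never a reason."""
    return any(c in (Category.MN, Category.D) for c in categories)
```

A name part whose code points are AO and U is thus not allowed in a
zone, yet a query carrying it must not be rejected for that reason.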
draft -05 (continued):
   6.2 Reasons for difference between authoritative servers and requests

   Different software implementations using different versions of this
   document need to interoperate with maximal compatibility. The scheme
   described in this section (authoritative name servers MUST NOT use
   unassigned code points, requests MAY include unassigned code points)
   allows that compatibility without introducing any known security or
   interoperability issues.

   The list below shows what happens if a request contains a code point
   from category U that is allowed in a newer version of this document.
   The request either resolves to the domain name that was intended, or
   resolves to no domain at all. In this list, the request comes from an
   application using version "oldVersion" of this document, the
   authoritative name server is using version "newVersion" of this
   document, and the code point X was in category U on oldVersion, and
   has changed category to AO, MN, or D. There are 3 possible scenarios:

   1. X is assigned to AO -- In newVersion, X is in category AO. Because
   the application passed X through, it gets back correct data from the
   authoritative name server. There is one exceptional case, where X is
   a combining mark.

   The order of combining marks is normalized, so if another combining
   mark Y has a lower combining class than X then XY will be put in the
   canonical order YX. (Unassigned code points are never reordered, so
   this doesn't happen in oldVersion.) If the request contains YX, the
   request will get correct data from the authoritative name server.
   However, no domain name can be registered with XY, so a request with
   XY will get a "no such host" error.

   2. X is assigned to MN -- In newVersion, X is normalized to code
   point "nX" and therefore X is now put in category MN. This cannot
   exist in any domain name, so any request containing X will get back a
   "no such host" error. Note, however, if the request had contained the
   letter nX, it would have gotten back correct data.

   3. X is assigned to D -- In newVersion, X is in category D. This
   cannot exist in any domain name, so any request containing X will get
   back a "no such host" error.

   In none of the cases does the request get data for a host name other
   than the one it actually wanted.
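
The combining-mark exception in scenario 1 relies on canonical ordering,
which can be observed with two marks that are already assigned. In the
Python sketch below, COMBINING ACUTE ACCENT plays the role of X and
COMBINING DOT BELOW the role of Y; both are stand-ins chosen for
illustration, and the sketch does not model how an unassigned code point
behaves. The helper name canonical_order is hypothetical.

```python
import unicodedata

ACUTE = "\u0301"      # stand-in for X: combining class 230
DOT_BELOW = "\u0323"  # stand-in for Y: combining class 220


def canonical_order(s: str) -> str:
    """Decompose and apply canonical ordering without recomposing."""
    return unicodedata.normalize("NFD", s)


xy = "q" + ACUTE + DOT_BELOW  # marks entered as XY
yx = "q" + DOT_BELOW + ACUTE  # marks entered as YX
```

Because Y has the lower combining class, canonical_order(xy) equals yx.
Once both marks are assigned, only the YX sequence can appear in stored
names, which is why a request that still carries XY unreordered finds no
match.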
draft -05 (continued):
   The processing in this document is always stable. If a string S is
   the result of processing on newVersion, then it will remain the same
   when processed on oldVersion.

   There is always a way for the application to get the correct data
   from the authoritative name server. For example, suppose that <ALPHA>
   was unassigned in oldVersion, and that it is assigned in newVersion,
   but case-folded to <alpha>. As long as the application supplies
   strings containing <alpha> instead of <ALPHA>, the correct data will
   be returned. Because the processing is stable, a different
   application running newVersion can pass a processed host name to the
   application running oldVersion. It will only contain <alpha>, and
   will return the correct results from the authoritative name server.

   6.3 Versions of applications and authoritative name servers

   Another way to see that this versioning system works is to compare
   what happens when an application uses a newer or older version of
   this document.

   Newer application -- Suppose that an application or intermediary DNS
   server is using version newVersion and the authoritative name server
   is using version oldVersion. This case is simple: there will be no
   names on the server that cannot be accessed by the application
   because the resolver uses a superset of the code points accepted by
   the server.

   Newer server -- Suppose that an application or intermediary DNS
   server is using oldVersion and the authoritative name server is using
   newVersion. Because the application passed through any unassigned
   code points, the user can access names on the server that use code
   points in newVersion. No names on the site can have code points that
   are unassigned in newVersion, since that is illegal. In this case,
   the application has to enter the unassigned code points in the
   correct order, and has to use unassigned code points that would make
   it through both the mapping and the normalization steps.

draft -06:
   This profile lists the unassigned code points for Unicode 3.1 in
   Appendix F. The list in Appendix F MUST be used by implementations of
   this specification. If there are any discrepancies between the list
   in Appendix F and the Unicode 3.1 specification, the list in Appendix
   F always takes precedence.
7. Security Considerations

draft -06:
   ISO/IEC 10646 has many characters that look similar. In many cases,
   users of security protocols might do visual matching, such as when
   comparing the names of trusted third parties. This profile does
   nothing to map similar-looking characters together.

Much of the security of the Internet relies on the DNS. Thus, any change
to the characteristics of the DNS can change the security of much of the
Internet.

Host names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized host name.

Current applications may assume that the characters allowed in host
skipping to change at line 592 skipping to change at line 362
may be vulnerable to attack based on the new characters allowed by this
specification.
8. References 8. References
[CharModel] Unicode Technical Report;17, Character Encoding Model. [CharModel] Unicode Technical Report;17, Character Encoding Model.
<http://www.unicode.org/unicode/reports/tr17/>. <http://www.unicode.org/unicode/reports/tr17/>.
[Glossary] Unicode Glossary, <http://www.unicode.org/glossary/>. [Glossary] Unicode Glossary, <http://www.unicode.org/glossary/>.
[IDNReq] Zita Wenzel and James Seng, "Requirements of Internationalized
Domain Names", draft-ietf-idn-requirements
[IDNRev] Marc Blanchet, "Handling versions of internationalized domain
names protocols", draft-ietf-idn-version
[ISO10646] ISO/IEC 10646-1:2000. International Standard -- Information [ISO10646] ISO/IEC 10646-1:2000. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
1: Architecture and Basic Multilingual Plane. 1: Architecture and Basic Multilingual Plane.
[Normalize] Character Normalization in IETF Protocols,
draft-duerst-i18n-norm-03
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119. Requirement Levels", March 1997, RFC 2119.
[RFC2396] Tim Berners-Lee, et. al., "Uniform Resource Identifiers (URI):
Generic Syntax", August 1998, RFC 2396.
[RFC2732] Robert Hinden, et. al., Format for Literal IPv6 Addresses in
URL's, December 1999, RFC 2732.
[STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC [STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC
1034) and "Domain names - implementation and specification" (RFC 1035), 1034) and "Domain names - implementation and specification" (RFC 1035),
STD 13, November 1987. STD 13, November 1987.
[STRINGPREP] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Strings ("stringprep")", draft-hoffman-stringprep,
work in progress
[Unicode3.1] The Unicode Standard, Version 3.1.0: The Unicode [Unicode3.1] The Unicode Standard, Version 3.1.0: The Unicode
Consortium. The Unicode Standard, Version 3.0. Reading, MA, Consortium. The Unicode Standard, Version 3.0. Reading, MA,
Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5, as amended Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5, as amended
by: Unicode Standard Annex #27: Unicode 3.1 by: Unicode Standard Annex #27: Unicode 3.1
<http://www.unicode.org/unicode/reports/tr27/tr27-4.html>. <http://www.unicode.org/unicode/reports/tr27/tr27-4.html>.
[URI] For example: Roy Fielding et al., "Uniform Resource Identifiers: [URI] For example: Roy Fielding et al., "Uniform Resource Identifiers:
Generic Syntax", August 1998, RFC 2396; Robert Hinden et. al, "IPv6 Generic Syntax", August 1998, RFC 2396; Robert Hinden et. al, "IPv6
Literal Addresses in URL's", December 1999, RFC 2732. Note that Literal Addresses in URL's", December 1999, RFC 2732. Note that
there are many other RFCs that define additional URI schemes. there are many other RFCs that define additional URI schemes.
[UAX15] Mark Davis and Martin Duerst. Unicode Standard Annex #15: [UAX15] Mark Davis and Martin Duerst. Unicode Standard Annex #15:
Unicode Normalization Forms, Version 3.1.0. Unicode Normalization Forms, Version 3.1.0.
<http://www.unicode.org/unicode/reports/tr15/tr15-21.html> <http://www.unicode.org/unicode/reports/tr15/tr15-21.html>
[UTR21] Mark Davis. Case Mappings. Unicode Technical Report #21. [UTR21] Mark Davis. Case Mappings. Unicode Technical Report #21.
<http://www.unicode.org/unicode/reports/tr21/>. <http://www.unicode.org/unicode/reports/tr21/>.
9. Differences Between -05 and -06 Drafts
Throughout: became a profile of stringprep.
A: Added Dave Crocker
B: Made this section 9 to ease later renumbering. Relettered other
appendices.
A. Acknowledgements A. Acknowledgements
Many people from the IETF IDN Working Group and the Unicode Technical Many people from the IETF IDN Working Group and the Unicode Technical
Committee contributed ideas that went into the first draft of this Committee contributed ideas that went into the first draft of this
document. Mark Davis and Patrik Faltstrom were particularly helpful in document.
some of the ideas, such as the versioning description.
The IDN nameprep design team made many useful changes to the first The IDN nameprep design team made many useful changes to the first
draft. That team and its advisors include: draft. That team and its advisors include:
Asmus Freytag Asmus Freytag
Cathy Wissink Cathy Wissink
Francois Yergeau Francois Yergeau
James Seng James Seng
Marc Blanchet Marc Blanchet
Mark Davis Mark Davis
Martin Duerst Martin Duerst
Patrik Faltstrom Patrik Faltstrom
Paul Hoffman Paul Hoffman
Additional significant improvements were proposed by: Additional significant improvements were proposed by:
Jonathan Rosenne Jonathan Rosenne
Kent Karlsson Kent Karlsson
Scott Hollenbeck Scott Hollenbeck
Dave Crocker
B. Differences Between -04 and -05 Drafts B. IANA Considerations
Small editorial changes throughout.
1: Added sentence at end of third paragraph.
2: Took paragraph from section 3 and moved it to the end of section 2
with a few changes.
3.2: Added sentence at end of last paragraph.
4: Clarified last sentence.
5.1: Added detail to ASCII ranges.
5.3: Added "(or characters with control function)"
Added:
070F
180E
206A-206F
FFF9-FFFC
1D173-1D17A
Removed the "future" control characters.
5.5: Added note about properties list at the end of the section.
5.8: Narrowed the range to 2FF0-2FFB because that is all that is
currently assigned.
5.11: Changed the range to E0001 and E0020-E007F.
6.1: Clarified definition of MN. Clarified last paragraph.
6.2, step 3: Corrected first sentence to say "D" instead of "MN".
8: Fixed title of [CharModel]. Added sentence at the end of [URI].
F: Updated the table from the changes in 5 above.
C. IANA Considerations
None. This is a profile of stringprep. When it becomes an RFC, it
should be registered in the stringprep profile registry.
D. Author Contact Information C. Author Contact Information
Paul Hoffman Paul Hoffman
Internet Mail Consortium and VPN Consortium Internet Mail Consortium and VPN Consortium
127 Segre Place 127 Segre Place
Santa Cruz, CA 95060 USA Santa Cruz, CA 95060 USA
paul.hoffman@imc.org and paul.hoffman@vpnc.org paul.hoffman@imc.org and paul.hoffman@vpnc.org
Marc Blanchet Marc Blanchet
Viagenie inc. Viagenie inc.
2875 boul. Laurier, bur. 300 2875 boul. Laurier, bur. 300
Ste-Foy, Quebec, Canada, G1V 2M2 Ste-Foy, Quebec, Canada, G1V 2M2
Marc.Blanchet@viagenie.qc.ca Marc.Blanchet@viagenie.qc.ca
E. Mapping Table D. Mapping Tables
The following is the mapping table from Section 3. The table has three The following is the mapping table from Section 3. The table has three
columns: columns:
- the character that is mapped from - the character that is mapped from
- the zero or more characters that it is mapped to - the zero or more characters that it is mapped to
- the reason for the mapping - the reason for the mapping
The columns are separated by semicolons. Note that the second column may The columns are separated by semicolons. Note that the second column may
be empty, or it may have one character, or it may have more than one be empty, or it may have one character, or it may have more than one
character, with each character separated by a space. character, with each character separated by a space.
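
As an illustration of the three-column format described above, the following is a minimal Python sketch of how an implementation might read mapping table lines and apply them to a string. The function names parse_mapping_line and apply_mapping are hypothetical and are not defined by this draft; the sample lines are taken from the table excerpt shown in this comparison.

```python
def parse_mapping_line(line):
    """Split 'FROM; TO TO ...; reason' into (source, targets, reason).

    The second column may be empty, hold one code point, or hold
    several code points separated by spaces.
    """
    source, targets, reason = (field.strip() for field in line.split(";", 2))
    return int(source, 16), [int(cp, 16) for cp in targets.split()], reason


def apply_mapping(text, table):
    """Replace each character found in `table`; leave the others unchanged."""
    out = []
    for ch in text:
        if ord(ch) in table:
            out.extend(chr(cp) for cp in table[ord(ch)])
        else:
            out.append(ch)
    return "".join(out)


# Sample lines copied from the excerpt of the mapping table.
SAMPLE_LINES = [
    "1D7A2; 03C3; Additional folding",
    "1D7A8; 03C9; Additional folding",
]

# Hypothetical in-memory form: source code point -> list of target code points.
TABLE = {}
for sample in SAMPLE_LINES:
    src, dst, _reason = parse_mapping_line(sample)
    TABLE[src] = dst
```

With this sketch, apply_mapping(chr(0x1D7A2), TABLE) yields U+03C3, and a line whose second column is empty maps its character to nothing.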
skipping to change at line 2097 skipping to change at line 1827
1D7A2; 03C3; Additional folding 1D7A2; 03C3; Additional folding
1D7A3; 03C4; Additional folding 1D7A3; 03C4; Additional folding
1D7A4; 03C5; Additional folding 1D7A4; 03C5; Additional folding
1D7A5; 03C6; Additional folding 1D7A5; 03C6; Additional folding
1D7A6; 03C7; Additional folding 1D7A6; 03C7; Additional folding
1D7A7; 03C8; Additional folding 1D7A7; 03C8; Additional folding
1D7A8; 03C9; Additional folding 1D7A8; 03C9; Additional folding
1D7BB; 03C3; Additional folding 1D7BB; 03C3; Additional folding
----- End Mapping Table ----- ----- End Mapping Table -----
F. Prohibited Code Point List E. Prohibited Code Point List
----- Start Prohibited Table ----- ----- Start Prohibited Table -----
0000-002C 0000-002C
002E-002F 002E-002F
003A-0040 003A-0040
005B-0060 005B-0060
007B-007F 007B-007F
0080-009F 0080-009F
00A0 00A0
070F 070F
skipping to change at line 2162 skipping to change at line 1892
BFFFE-BFFFF BFFFE-BFFFF
CFFFE-CFFFF CFFFE-CFFFF
DFFFE-DFFFF DFFFE-DFFFF
E0001 E0001
E0020-E007F E0020-E007F
EFFFE-EFFFF EFFFE-EFFFF
F0000-FFFFD F0000-FFFFD
FFFFE-FFFFF FFFFE-FFFFF
100000-10FFFD 100000-10FFFD
10FFFE-10FFFF 10FFFE-10FFFF
----- End Prohibited Table ----- ----- End Prohibited Table -----
NOTE WELL: Software that follows this specification that will be used to NOTE WELL: Software that follows this specification that will be used to
check names before they are put in authoritative name servers MUST add check names before they are put in authoritative name servers MUST add
all unassigned code points to the list of characters that are prohibited. all unassigned code points to the list of characters that are prohibited.
See Section 6 for more details. See Section 6 of [STRINGPREP] for more details.
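
The prohibited table lists either a single code point or an inclusive range per line. The following Python sketch shows one way software might load such a table and check a name part against it. The names parse_ranges and find_prohibited are hypothetical, and the sample covers only the first entries visible in this excerpt, not the full table. Software that checks names for authoritative servers would, per the note above, also load the unassigned table into the same list.

```python
def parse_ranges(lines):
    """Turn 'XXXX' and 'XXXX-YYYY' lines into inclusive (low, high) pairs."""
    ranges = []
    for line in lines:
        entry = line.strip()
        # Skip blank lines and the '----- Start/End ... -----' markers.
        if not entry or entry.startswith("-----"):
            continue
        low, _, high = entry.partition("-")
        ranges.append((int(low, 16), int(high or low, 16)))
    return ranges


def find_prohibited(text, ranges):
    """Return the characters of `text` that fall inside any range."""
    return [ch for ch in text
            if any(low <= ord(ch) <= high for low, high in ranges)]


# First entries of the prohibited table as shown in this excerpt.
SAMPLE_TABLE = """----- Start Prohibited Table -----
0000-002C
002E-002F
003A-0040
005B-0060
007B-007F
0080-009F
00A0
070F
----- End Prohibited Table -----"""

RANGES = parse_ranges(SAMPLE_TABLE.splitlines())
```

A linear scan over the ranges is enough for a sketch; an implementation handling the full table might sort the ranges and use binary search instead.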
G. Unassigned Code Point List F. Unassigned Code Point List
----- Start Unassigned Table ----- ----- Start Unassigned Table -----
0220-0221 0220-0221
0234-024F 0234-024F
02AE-02AF 02AE-02AF
02EF-02FF 02EF-02FF
034F-035F 034F-035F
0363-0373 0363-0373
0376-0379 0376-0379
037B-037D 037B-037D
 End of changes. 49 change blocks. 
386 lines changed or deleted 115 lines changed or added
