draft-ietf-idn-uri-02.txt   draft-ietf-idn-uri-03.txt 
Network Working Group M. Duerst Network Working Group M. Duerst
Internet-Draft W3C/Keio University Internet-Draft W3C
Expires: December 30, 2002 July 1, 2002 Expires: May 4, 2003 November 3, 2002
Internationalized Domain Names in URIs Internationalized Domain Names in URIs
draft-ietf-idn-uri-02 draft-ietf-idn-uri-03
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 31 skipping to change at page 1, line 31
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at http:// The list of current Internet-Drafts can be accessed at http://
www.ietf.org/ietf/1id-abstracts.txt. www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 30, 2002. This Internet-Draft will expire on May 4, 2003.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved. Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract Abstract
This document proposes to upgrade the definition of URIs (RFC 2396) This document proposes to upgrade the definition of URIs (RFC 2396)
[RFC2396] to work consistently with internationalized domain names. [RFC2396] to work consistently with internationalized domain names.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. URI syntax changes . . . . . . . . . . . . . . . . . . . . . . 3 2. URI syntax changes . . . . . . . . . . . . . . . . . . . . . . 3
3. Security considerations . . . . . . . . . . . . . . . . . . . 5 3. Security considerations . . . . . . . . . . . . . . . . . . . 5
4. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 5 4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 5
4.1 Changes from draft-ietf-idn-uri--01 to draft-ietf-idn-uri-02 . 5 5. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.2 Changes from draft-ietf-idn-uri--00 to draft-ietf-idn-uri-01 . 5 5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03 . 5
References . . . . . . . . . . . . . . . . . . . . . . . . . . 5 5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02 . 5
5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01 . 5
References . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 7 Author's Address . . . . . . . . . . . . . . . . . . . . . . . 7
Full Copyright Statement . . . . . . . . . . . . . . . . . . . 8 Full Copyright Statement . . . . . . . . . . . . . . . . . . . 8
1. Introduction 1. Introduction
Internet domain names serve to identify hosts and services on the Internet domain names serve to identify hosts and services on the
Internet in a convenient way. The IETF IDN working group [IDNWG] has Internet in a convenient way. The IETF IDN working group [IDNWG] has
been working on extending the character repertoire usable in domain been working on extending the character repertoire usable in domain
names beyond a subset of US-ASCII. names beyond a subset of US-ASCII.
One of the most important places where domain names appear are One of the most important places where domain names appear are
Uniform Resource Identifiers (URIs, [RFC2396], as modified by Uniform Resource Identifiers (URIs, [RFC2396], as modified by
[RFC2732]). However, in the current definition of the generic URI [RFC2732]). However, in the current definition of the generic URI
syntax, the restrictions on domain names are 'hard-coded'. In syntax, the restrictions on domain names are 'hard-coded'. In
Section 2, this document relaxes these restrictions by updating the Section 2, this document relaxes these restrictions by updating the
syntax, and defines how internationalized domain names are encoded in syntax, and defines how internationalized domain names are encoded in
URIs. URIs.
The syntax in this document has been choosen to further increase the The syntax in this document has been chosen to further increase the
uniformity of URI syntax, which is a very important principle of uniformity of URI syntax, which is a very important principle of
URIs. URIs.
In practice, escaped domanin names should be used as rarely as In practice, escaped domain names should be used as rarely as
possible. Wherever possible, the actual characters in possible. Wherever possible, the actual characters in
Internationalized Domain Names should be preserved as long as Internationalized Domain Names should be preserved as long as
possible by using IRIs [IRI] rather than URIs, and only converting to possible by using IRIs [IRI] rather than URIs, and only converting to
URIs and then to ACE-encoded [IDNA] domain names (or ideally directly URIs and then to ACE-encoded [IDNA] domain names (or ideally directly
to ACE-encoding without even using URIs) when resolving the IRI. to ACE-encoding without even using URIs) when resolving the IRI.
Also, this document does in no way exclude the use of ACE encoding Also, this document does not exclude the use of ACE encoding directly
directly in an URI domain name part. ACE encoding may be used in an URI domain name part. ACE encoding may be used directly in an
directly in an URI domain name part if this is considered necessary URI domain name part if this is considered necessary for
for interoperability. interoperability.
Please note that even with the definition of URIs in [RFC2396], some Please note that even with the definition of URIs in [RFC2396], some
URIs can already contain host names with escaped characters. For URIs can already contain host names with escaped characters. For
example, mailto:example@w%33.org is legal per [RFC2396] because the example, mailto:example@w%33.org is legal per [RFC2396] because the
mailto: URI scheme does not follow the generic syntax of [RFC2396]. mailto: URI scheme does not follow the generic syntax of [RFC2396].
2. URI syntax changes 2. URI syntax changes
The syntax of URIs [RFC2396] currently contains the following rules The syntax of URIs [RFC2396] currently contains the following rules
relevant to domain names: relevant to domain names:
skipping to change at page 4, line 28 skipping to change at page 4, line 28
Using UTF-8 assures that this encoding interoperates with IRIs [IRI]. Using UTF-8 assures that this encoding interoperates with IRIs [IRI].
It is also aligned with the recommendations in [RFC2277] and It is also aligned with the recommendations in [RFC2277] and
[RFC2718], and is consistent with the URN syntax [RFC2141] as well as [RFC2718], and is consistent with the URN syntax [RFC2141] as well as
recent URL scheme definitions that define encodings of non-ASCII recent URL scheme definitions that define encodings of non-ASCII
characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs
[RFC2384]). [RFC2384]).
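As an illustration of the UTF-8 based escaping discussed above, the following Python sketch escapes each non-ASCII character of a host name as %HH sequences over its UTF-8 octets and leaves US-ASCII characters untouched. The function name escape_host and the host name used below are assumptions made for this example only; neither appears in the drafts.

```python
def escape_host(host):
    """Escape non-ASCII characters of a host name as %HH over UTF-8.

    Illustrative helper (hypothetical name); US-ASCII characters are
    copied through unchanged.
    """
    out = []
    for ch in host:
        if ord(ch) < 0x80:
            # US-ASCII: keep as is.
            out.append(ch)
        else:
            # Non-ASCII: one %HH escape per UTF-8 octet.
            out.extend("%%%02X" % octet for octet in ch.encode("utf-8"))
    return "".join(out)


# "d\u00fcrst.example" is a made-up host name containing U+00FC.
escaped = escape_host("d\u00fcrst.example")
print(escaped)
```

The escaped form has the same shape as the path component in the author's URI at the end of the draft (D%C3%BCrst), where U+00FC is written as the two octets C3 BC.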
The above syntax rules permit for domain names that are neither The above syntax rules permit for domain names that are neither
permitted as US-ASCII only domain names nor as internationalized permitted as US-ASCII only domain names nor as internationalized
domain names. However, such syntax should never be used, and will domain names. However, such domain names should never be used, and
always be rejected by resolvers. For US-ASCII only domain names, the will never be resolved because no such domains will be registered.
syntax rules in [RFC2396] are relevant. For example, http:// For US-ASCII only domain names, the syntax rules in [RFC2396] are
www.w%33.org is legal, because the corresponding 'w3' is a legal relevant. For example, http://www.w%33.org is legal, because the
'domainlabel' according to [RFC2396]. However, http:// corresponding 'w3' is a legal 'domainlabel' according to [RFC2396].
%2a.example.org is illegal because the corresponding '*' is not a However, http://%2a.example.org is illegal because the corresponding
legal 'domainlabel' according to [RFC2396]. For domain names '*' is not a legal 'domainlabel' according to [RFC2396].
containing non-ASCII characters, the legal domain names are those for
which the ToASCII operation ([IDNA], [Nameprep]; using the unescaped For domain names containing non-ASCII characters, the legal domain
UTF-8 values as input) is successful. names are those for which the ToASCII operation ([IDNA], [Nameprep];
using the unescaped UTF-8 values as input), with the flags
"UseSTD3ASCIIRules" and "AllowUnassigned" set, is successful. The
URI resolver MUST apply any steps required as part of domain name
resolution by [IDNA], in particular the ToASCII operation, with the
above-mentioned flags set. URIs where the ToASCII operation results
in an error should be treated as unresolvable.
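The resolution steps described above might be sketched in Python as follows. This is a rough approximation under stated assumptions, not the procedure of the draft itself: Python's built-in "idna" codec stands in for the ToASCII operation, it follows the published IDNA specification (RFC 3490) rather than the draft cited here, and it does not expose the "UseSTD3ASCIIRules" and "AllowUnassigned" flags. The STD3 check is therefore approximated by hand on the ACE output. The function name host_to_ace and the example host names (other than those quoted from the draft) are hypothetical.

```python
import re
from urllib.parse import unquote

# Letters, digits and hyphen only, with no hyphen at either end of a label.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def host_to_ace(escaped_host):
    """Return the ACE form of an escaped host name, or None if unresolvable."""
    try:
        # Undo the %HH escaping; the octets must be valid UTF-8.
        host = unquote(escaped_host, encoding="utf-8", errors="strict")
        # The "idna" codec applies Nameprep and ToASCII label by label.
        ace = host.encode("idna").decode("ascii")
    except UnicodeError:
        # Invalid UTF-8 or a failing ToASCII: treat as unresolvable.
        return None
    labels = ace.split(".")
    if labels[-1] == "":
        labels.pop()  # tolerate a single trailing dot
    # Approximation of UseSTD3ASCIIRules, applied to the ACE output.
    # Hyphens at the ends of labels that also contain non-ASCII
    # characters are not detected by this check.
    if not labels or not all(_LDH_LABEL.fullmatch(label) for label in labels):
        return None
    return ace


print(host_to_ace("www.w%33.org"))        # escaped US-ASCII host from the draft
print(host_to_ace("%2a.example.org"))     # '*' is not a legal domainlabel
print(host_to_ace("d%C3%BCrst.example"))  # made-up non-ASCII host
```

Returning None rather than raising an exception is a design choice of this sketch; it mirrors the statement that URIs for which ToASCII fails should be treated as unresolvable.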
For domain names containing non-ASCII characters, the Nameprep
specification ([Nameprep]) defines some mappings, which mainly
include normalization to NFKC and folding to lower case. When
encoding an internationalized domain name in an URI, these mappings
SHOULD NOT be applied. It should be assumed that the domain name is
already normalized as far as appropriate.
For consistency in comparison operations and for interoperability For consistency in comparison operations and for interoperability
with older software, the following should be noted: 1) US-ASCII with older software, the following should be noted: 1) US-ASCII
characters in domain names should not be escaped. 2) Because of the characters in domain names should not be escaped. 2) Because of the
principle of syntax uniformity for URIs, it is always more prudent to principle of syntax uniformity for URIs, it is always more prudent to
take into account the possibility that US-ASCII characters are take into account the possibility that US-ASCII characters are
escaped. escaped.
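The two points above can be illustrated with a small Python sketch that brings a host name into a comparable form by undoing escapes of US-ASCII characters only, so that escaped and unescaped spellings of the same host compare equal. The function name unescape_ascii is hypothetical, and upper-casing the hex digits of the remaining escapes is an additional assumption of this sketch, not something the draft requires.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_ascii(host):
    """Undo %HH escapes that stand for US-ASCII characters.

    Escapes of octets 0x80 and above (parts of UTF-8 sequences) are
    kept, with their hex digits upper-cased for easier comparison.
    """
    def replace(match):
        value = int(match.group(1), 16)
        if value < 0x80:
            return chr(value)
        return match.group(0).upper()

    return _ESCAPE.sub(replace, host)


# The escaped and unescaped spellings from the draft compare equal.
print(unescape_ascii("www.w%33.org") == "www.w3.org")
```

Software that compares host names without such a step may treat www.w%33.org and www.w3.org as different, which is why escaping US-ASCII characters is discouraged in the first place.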
The work of the IDN WG includes some procedures for name preparation
[Nameprep]. Before encoding an internationalized domain name in an
URI, this preparation step SHOULD be applied. However, the URI
resolver MUST also apply any steps required as part of domain name
resolution by [IDNA].
3. Security considerations 3. Security considerations
The security considerations of [RFC2396] and those applying to The security considerations of [RFC2396] and those applying to
internationalized domain names apply. There may be an increased internationalized domain names apply. There may be an increased
potential to smuggle escaped US-ASCII-based domain names across potential to smuggle escaped US-ASCII-based domain names across
firewalls, although because of the uniform syntax principle for URIs, firewalls, although because of the uniform syntax principle for URIs,
such a potential is already existing. such a potential is already existing.
4. Change Log 4. Acknowledgements
4.1 Changes from draft-ietf-idn-uri--01 to draft-ietf-idn-uri-02 Erik Nordmark
5. Change Log
5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03
Clarified expectations on name checking.
5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02
Moved change log to back Moved change log to back
Changed to only change URIs; IRI syntax updated directly in IRI Changed to only change URIs; IRI syntax updated directly in IRI
draft. draft.
Removed syntax restriction on %hh in the US-ASCII part, but made Removed syntax restriction on %hh in the US-ASCII part, but made
clear that restrictions to domain names apply. clear that restrictions to domain names apply.
Made clear that escaped domain names in URIs should only be an Made clear that escaped domain names in URIs should only be an
intermediate representation. intermediate representation.
Gave example of mailto: as already allowing escaped host names. Gave example of mailto: as already allowing escaped host names.
4.2 Changes from draft-ietf-idn-uri--00 to draft-ietf-idn-uri-01 Corrected some typos.
5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01
Changed requirement for URI/IRI resolvers from MUST to SHOULD Changed requirement for URI/IRI resolvers from MUST to SHOULD
Changed IRI syntax slightly (ichar -> idchar, based on changes in Changed IRI syntax slightly (ichar -> idchar, based on changes in
[IRI]) [IRI])
Various wording changes Various wording changes
References References
[IDNA] Faltstrom, P., Hoffman, P. and A. Costello, [IDNA] Faltstrom, P., Hoffman, P. and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
draft-ietf-idn-idna-09.txt (work in progress), May 2002, draft-ietf-idn-idna-14.txt (work in progress), October
<http://www.ietf.org/internet-drafts/draft-ietf-idn-idna- 2002, <http://www.ietf.org/internet-drafts/draft-ietf-
09.txt>. idn-idna-14.txt>.
[IDNWG] "IETF Internationalized Domain Name (idn) Working Group". [IDNWG] "IETF Internationalized Domain Name (idn) Working Group".
[IRI] Duerst, M. and M. Suignard, "Internationalized Resource [IRI] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRI)", draft-duerst-iri-01 (work in Identifiers (IRI)", draft-duerst-iri-02.txt (work in
progress), July 2002. progress), November 2002, <http://www.ietf.org/internet-
drafts/draft-duerst-iri-02.txt>.
[ISO10646] International Organization for Standardization, [ISO10646] International Organization for Standardization,
"Information Technology - Universal Multiple-Octet Coded "Information Technology - Universal Multiple-Octet Coded
Character Set (UCS) - Part 1: Architecture and Basic Character Set (UCS) - Part 1: Architecture and Basic
Multilingual Plane", ISO Standard 10646-1, October 2000. Multilingual Plane", ISO Standard 10646-1, October 2000.
[Nameprep] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep [Nameprep] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names", draft-ietf- Profile for Internationalized Domain Names", draft-ietf-
idn-nameprep-10.txt (work in progress), May 2002, <http:/ idn-nameprep-11.txt (work in progress), June 2002,
/www.ietf.org/internet-drafts/draft-ietf-idn-nameprep- <http://www.ietf.org/internet-drafts/draft-ietf-idn-
10.txt>. nameprep-11.txt>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.
[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998. Languages", BCP 18, RFC 2277, January 1998.
skipping to change at page 7, line 8 skipping to change at page 7, line 17
"Guidelines for new URL Schemes", RFC 2718, November "Guidelines for new URL Schemes", RFC 2718, November
1999. 1999.
[RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for [RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for
Literal IPv6 Addresses in URL's", RFC 2732, December Literal IPv6 Addresses in URL's", RFC 2732, December
1999. 1999.
Author's Address Author's Address
Martin Duerst Martin Duerst
W3C/Keio University World Wide Web Consortium
5322 Endo 200 Technology Square
Fujisawa 252-8520 Cambridge, MA 02139
Japan U.S.A.
Phone: +81 466 49 1170 Phone: +1 617 253 5509
Fax: +81 466 49 1171 Fax: +1 617 258 5999
EMail: duerst@w3.org EMail: duerst@w3.org
URI: http://www.w3.org/People/D%C3%BCrst/ URI: http://www.w3.org/People/D%C3%BCrst/
Full Copyright Statement Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved. Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published or assist in its implementation may be prepared, copied, published
 End of changes. 18 change blocks. 
48 lines changed or deleted 67 lines changed or added
