Network Working Group                                          M. Duerst
Internet-Draft                                       W3C/Keio University
Expires: December 30, 2002                                  July 1, 2002

             Internationalized Domain Names in URIs
                     draft-ietf-idn-uri-02

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 30, 2002.

Copyright Notice

   Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

   This document proposes to upgrade the definition of URIs (RFC 2396)
   [RFC2396] to work consistently with internationalized domain names.

Table of Contents

   1.  Introduction
   2.  URI syntax changes
   3.  Security considerations
   4.  Change Log
   4.1 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02
   4.2 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01
       References
       Author's Address
       Full Copyright Statement

1. Introduction

   Internet domain names serve to identify hosts and services on the
   Internet in a convenient way. The IETF IDN working group [IDNWG] has
   been working on extending the character repertoire usable in domain
   names beyond a subset of US-ASCII.

   One of the most important places where domain names appear is in
   Uniform Resource Identifiers (URIs, [RFC2396], as modified by
   [RFC2732]). However, in the current definition of the generic URI
   syntax, the restrictions on domain names are 'hard-coded'. In
   Section 2, this document relaxes these restrictions by updating the
   syntax, and defines how internationalized domain names are encoded
   in URIs.

   The syntax in this document has been chosen to further increase the
   uniformity of URI syntax, which is a very important principle of
   URIs.

   In practice, escaped domain names should be used as rarely as
   possible. The actual characters in Internationalized Domain Names
   should be preserved as long as possible by using IRIs [IRI] rather
   than URIs, converting to URIs and then to ACE-encoded [IDNA] domain
   names (or ideally directly to ACE encoding without even using URIs)
   only when resolving the IRI.

   Also, this document in no way excludes the use of ACE encoding
   directly in a URI domain name part. ACE encoding may be used
   directly in a URI domain name part if this is considered necessary
   for interoperability.

   Please note that even with the definition of URIs in [RFC2396], some
   URIs can already contain host names with escaped characters. For
   example, mailto:example@w%33.org is legal per [RFC2396] because the
   mailto: URI scheme does not follow the generic syntax of [RFC2396].

2. URI syntax changes

   The syntax of URIs [RFC2396] currently contains the following rules
   relevant to domain names:

      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

   The latter two rules are changed as follows:

      domainlabel   = anchar | anchar *( anchar | "-" ) anchar
      toplabel      = achar | achar *( anchar | "-" ) anchar

   and the following rules are added:

      anchar        = alphanum | escaped
      achar         = alpha | escaped

   Characters outside the repertoire (alphanum) are encoded by first
   encoding the characters in UTF-8 [RFC2279], resulting in a sequence
   of octets, and then escaping these octets according to the rules
   defined in [RFC2396]. Using UTF-8 assures that this encoding
   interoperates with [IRI].
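
   The two-step encoding described above (UTF-8 first, then %HH
   escaping of the resulting octets) can be sketched as follows. This
   is a minimal illustration, not part of the specification: the
   function name escape_host is hypothetical, it relies on Python's
   urllib.parse.quote, and it assumes that name preparation has
   already been applied to the input.

```python
from urllib.parse import quote


def escape_host(host: str) -> str:
    """Escape a host name for use in a URI (hypothetical helper).

    quote() encodes the string as UTF-8 by default and writes every
    octet that is not an unreserved ASCII character as %HH.
    Letters, digits, "-" and "." are left untouched, so purely
    US-ASCII host names pass through unchanged.
    """
    return quote(host, safe="")


if __name__ == "__main__":
    # U+00FC (u with diaeresis) becomes the two UTF-8 octets C3 BC.
    print(escape_host("d\u00fcrst.example.org"))
```

   Note that US-ASCII letters and digits are never escaped by this
   sketch, which matches the advice that US-ASCII characters in domain
   names should not be escaped.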

   The use of UTF-8 is also aligned with the recommendations in
   [RFC2277] and [RFC2718], and is consistent with the URN syntax
   [RFC2141] as well as recent URL scheme definitions that define
   encodings of non-ASCII characters based on UTF-8 (e.g., IMAP URLs
   [RFC2192] and POP URLs [RFC2384]).

   The above syntax rules permit domain names that are permitted
   neither as US-ASCII only domain names nor as internationalized
   domain names. However, such syntax should never be used, and will
   always be rejected by resolvers. For US-ASCII only domain names, the
   syntax rules in [RFC2396] are relevant. For example,
   http://www.w%33.org is legal, because the corresponding 'w3' is a
   legal 'domainlabel' according to [RFC2396]. However,
   http://%2a.example.org is illegal because the corresponding '*' is
   not a legal 'domainlabel' according to [RFC2396].

   For domain names containing non-ASCII characters, the legal domain
   names are those for which the ToASCII operation ([IDNA], [Nameprep];
   using the unescaped UTF-8 values as input) is successful.
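
   The legality check described above might be sketched as follows.
   This is an illustration under stated assumptions rather than a
   normative algorithm: the names host_is_legal and label_is_legal are
   hypothetical, Python's built-in "idna" codec is used as a stand-in
   for the ToASCII operation, and the check is applied label by label,
   which is a slightly stricter reading than the text requires.

```python
import re
from urllib.parse import unquote

# 'domainlabel' and 'toplabel' of RFC 2396 as regular expressions.
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
TOPLABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def label_is_legal(label: str, top: bool) -> bool:
    """Check one unescaped label (hypothetical helper)."""
    if label.isascii():
        # US-ASCII only: the RFC 2396 rules are relevant.
        rule = TOPLABEL if top else DOMAINLABEL
        return rule.fullmatch(label) is not None
    try:
        # Assumption: the "idna" codec approximates ToASCII.
        label.encode("idna")
    except UnicodeError:
        return False
    return True


def host_is_legal(escaped_host: str) -> bool:
    """Unescape a URI host name and check every label."""
    try:
        # Reject %HH sequences that are not valid UTF-8.
        host = unquote(escaped_host, errors="strict")
    except UnicodeDecodeError:
        return False
    labels = host.split(".")
    if labels[-1] == "":  # optional trailing dot in 'hostname'
        labels.pop()
    if not labels:
        return False
    last = len(labels) - 1
    return all(label_is_legal(lab, i == last)
               for i, lab in enumerate(labels))
```

   With this sketch, www.w%33.org is accepted because it unescapes to
   the legal label 'w3', while %2a.example.org is rejected because '*'
   is not a legal 'domainlabel'.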

   For consistency in comparison operations and for interoperability
   with older software, the following should be noted:

   1) US-ASCII characters in domain names should not be escaped.

   2) Because of the principle of syntax uniformity for URIs, it is
      always more prudent to take into account the possibility that
      US-ASCII characters are escaped.

   The work of the IDN WG includes some procedures for name preparation
   [Nameprep]. Before encoding an internationalized domain name in a
   URI, this preparation step SHOULD be applied. However, the URI
   resolver MUST also apply any steps required as part of domain name
   resolution by [IDNA].

3. Security considerations

   The security considerations of [RFC2396] and those applying to
   internationalized domain names apply.
   There may be an increased potential to smuggle escaped
   US-ASCII-based domain names across firewalls, although because of
   the uniform syntax principle for URIs, such a potential already
   exists.

4. Change Log

4.1 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02

   Moved change log to back.

   Changed to only change URIs; IRI syntax updated directly in IRI
   draft.

   Removed syntax restriction on %hh in the US-ASCII part, but made
   clear that restrictions on domain names apply.

   Made clear that escaped domain names in URIs should only be an
   intermediate representation.

   Gave example of mailto: as already allowing escaped host names.

4.2 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01

   Changed requirement for URI/IRI resolvers from MUST to SHOULD.

   Changed IRI syntax slightly (ichar -> idchar, based on changes in
   [IRI]).

   Various wording changes.

References

   [IDNA]      Faltstrom, P., Hoffman, P. and A. Costello,
               "Internationalizing Domain Names in Applications
               (IDNA)", draft-ietf-idn-idna-09.txt (work in progress),
               May 2002, <http://www.ietf.org/internet-drafts/
               draft-ietf-idn-idna-09.txt>.

   [IDNWG]     "IETF Internationalized Domain Name (idn) Working
               Group".

   [IRI]       Duerst, M. and M. Suignard, "Internationalized Resource
               Identifiers (IRI)", draft-duerst-iri-01 (work in
               progress), July 2002.

   [ISO10646]  International Organization for Standardization,
               "Information Technology - Universal Multiple-Octet
               Coded Character Set (UCS) - Part 1: Architecture and
               Basic Multilingual Plane", ISO Standard 10646-1,
               October 2000.

   [Nameprep]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
               Profile for Internationalized Domain Names",
               draft-ietf-idn-nameprep-10.txt (work in progress), May
               2002, <http://www.ietf.org/internet-drafts/
               draft-ietf-idn-nameprep-10.txt>.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2141]   Moats, R., "URN Syntax", RFC 2141, May 1997.

   [RFC2192]   Newman, C., "IMAP URL Scheme", RFC 2192, September
               1997.

   [RFC2277]   Alvestrand, H., "IETF Policy on Character Sets and
               Languages", BCP 18, RFC 2277, January 1998.

   [RFC2279]   Yergeau, F., "UTF-8, a transformation format of ISO
               10646", RFC 2279, January 1998.

   [RFC2384]   Gellens, R., "POP URL Scheme", RFC 2384, August 1998.

   [RFC2396]   Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
               Resource Identifiers (URI): Generic Syntax", RFC 2396,
               August 1998.

   [RFC2640]   Curtin, B., "Internationalization of the File Transfer
               Protocol", RFC 2640, July 1999.

   [RFC2718]   Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
               "Guidelines for new URL Schemes", RFC 2718, November
               1999.

   [RFC2732]   Hinden, R., Carpenter, B. and L. Masinter, "Format for
               Literal IPv6 Addresses in URL's", RFC 2732, December
               1999.

Author's Address

   Martin Duerst
   W3C/Keio University
   5322 Endo
   Fujisawa  252-8520
   Japan

   Phone: +81 466 49 1170
   Fax:   +81 466 49 1171
   EMail: firstname.lastname@example.org
   URI:   http://www.w3.org/People/D%C3%BCrst/

   Note: Please write "Duerst" with u-umlaut wherever possible, e.g. as
   "Dürst" in XML and HTML.

Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.