draft-ietf-iri-3987bis-02.txt   draft-ietf-iri-3987bis-03.txt 
Internationalized Resource M. Duerst Internationalized Resource M. Duerst
Identifiers (iri) Aoyama Gakuin University Identifiers (iri) Aoyama Gakuin University
Internet-Draft M. Suignard Internet-Draft M. Suignard
Obsoletes: 3987 (if approved) Unicode Consortium Obsoletes: 3987 (if approved) Unicode Consortium
Intended status: Standards Track L. Masinter Intended status: Standards Track L. Masinter
Expires: April 20, 2011 Adobe Expires: April 28, 2011 Adobe
October 17, 2010 October 25, 2010
Internationalized Resource Identifiers (IRIs) Internationalized Resource Identifiers (IRIs)
draft-ietf-iri-3987bis-02 draft-ietf-iri-3987bis-03
Abstract Abstract
This document defines the Internationalized Resource Identifier (IRI) This document defines the Internationalized Resource Identifier (IRI)
protocol element, as an extension of the Uniform Resource Identifier protocol element, as an extension of the Uniform Resource Identifier
(URI). An IRI is a sequence of characters from the Universal (URI). An IRI is a sequence of characters from the Universal
Character Set (Unicode/ISO 10646). Grammar and processing rules are Character Set (Unicode/ISO 10646). Grammar and processing rules are
given for IRIs and related syntactic forms. given for IRIs and related syntactic forms.
In addition, this document provides named additional rule sets for In addition, this document provides named additional rule sets for
skipping to change at page 2, line 21 skipping to change at page 2, line 21
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 20, 2011. This Internet-Draft will expire on April 28, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 19 skipping to change at page 3, line 19
1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6
1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 6 1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 6
1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10 2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10
2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10
3. Processing IRIs and related protocol elements . . . . . . . . 13 3. Processing IRIs and related protocol elements . . . . . . . . 13
3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14 3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14
3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14 3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14
3.3. General percent-encoding of IRI components . . . . . . . 15 3.3. General percent-encoding of IRI components . . . . . . . 15
3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 15 3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 16
3.5. Mapping query components . . . . . . . . . . . . . . . . 17 3.5. Mapping query components . . . . . . . . . . . . . . . . 17
3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 17 3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 17
3.7. Converting URIs to IRIs . . . . . . . . . . . . . . . . . 17 3.7. Converting URIs to IRIs . . . . . . . . . . . . . . . . . 17
3.7.1. Examples . . . . . . . . . . . . . . . . . . . . . . . 19 3.7.1. Examples . . . . . . . . . . . . . . . . . . . . . . . 19
4. Bidirectional IRIs for Right-to-Left Languages . . . . . . . . 20 4. Bidirectional IRIs for Right-to-Left Languages . . . . . . . . 20
4.1. Logical Storage and Visual Presentation . . . . . . . . . 21 4.1. Logical Storage and Visual Presentation . . . . . . . . . 21
4.2. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . 22 4.2. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . 22
4.3. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . 23 4.3. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . 23
4.4. Examples . . . . . . . . . . . . . . . . . . . . . . . . 23 4.4. Examples . . . . . . . . . . . . . . . . . . . . . . . . 23
5. Normalization and Comparison . . . . . . . . . . . . . . . . . 25 5. Normalization and Comparison . . . . . . . . . . . . . . . . . 25
5.1. Equivalence . . . . . . . . . . . . . . . . . . . . . . . 25 5.1. Equivalence . . . . . . . . . . . . . . . . . . . . . . . 26
5.2. Preparation for Comparison . . . . . . . . . . . . . . . 26 5.2. Preparation for Comparison . . . . . . . . . . . . . . . 26
5.3. Comparison Ladder . . . . . . . . . . . . . . . . . . . . 27 5.3. Comparison Ladder . . . . . . . . . . . . . . . . . . . . 27
5.3.1. Simple String Comparison . . . . . . . . . . . . . . . 27 5.3.1. Simple String Comparison . . . . . . . . . . . . . . . 27
5.3.2. Syntax-Based Normalization . . . . . . . . . . . . . . 28 5.3.2. Syntax-Based Normalization . . . . . . . . . . . . . . 28
5.3.3. Scheme-Based Normalization . . . . . . . . . . . . . . 31 5.3.3. Scheme-Based Normalization . . . . . . . . . . . . . . 31
5.3.4. Protocol-Based Normalization . . . . . . . . . . . . . 32 5.3.4. Protocol-Based Normalization . . . . . . . . . . . . . 32
6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 32 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 33 6.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 33
6.2. Software Interfaces and Protocols . . . . . . . . . . . . 33 6.2. Software Interfaces and Protocols . . . . . . . . . . . . 33
6.3. Format of URIs and IRIs in Documents and Protocols . . . 33 6.3. Format of URIs and IRIs in Documents and Protocols . . . 34
6.4. Use of UTF-8 for Encoding Original Characters . . . . . . 34 6.4. Use of UTF-8 for Encoding Original Characters . . . . . . 34
6.5. Relative IRI References . . . . . . . . . . . . . . . . . 36 6.5. Relative IRI References . . . . . . . . . . . . . . . . . 36
7. Liberal handling of otherwise invalid IRIs . . . . . . . . . . 36 7. Liberal handling of otherwise invalid IRIs . . . . . . . . . . 36
7.1. LEIRI processing . . . . . . . . . . . . . . . . . . . . 36 7.1. LEIRI processing . . . . . . . . . . . . . . . . . . . . 36
7.2. Web Address processing . . . . . . . . . . . . . . . . . 36 7.2. Web Address processing . . . . . . . . . . . . . . . . . 37
7.3. Characters not allowed in IRIs . . . . . . . . . . . . . 38 7.3. Characters not allowed in IRIs . . . . . . . . . . . . . 38
8. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 40 8. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 40
8.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 40 8.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 40
8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41 8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41
8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42 8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42
8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42 8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42
8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43 8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43
8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43 8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43
8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44 8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44
8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44 8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44
skipping to change at page 6, line 10 skipping to change at page 6, line 10
discusses various forms of equivalence between IRIs. Section 6 discusses various forms of equivalence between IRIs. Section 6
discusses the use of IRIs in different situations. Section 8 gives discusses the use of IRIs in different situations. Section 8 gives
additional informative guidelines. Section 10 discusses IRI-specific additional informative guidelines. Section 10 discusses IRI-specific
security considerations. security considerations.
1.2. Applicability 1.2. Applicability
IRIs are designed to allow protocols and software that deal with URIs IRIs are designed to allow protocols and software that deal with URIs
to be updated to handle IRIs. A "URI scheme" (as defined by to be updated to handle IRIs. A "URI scheme" (as defined by
[RFC3986] and registered through the IANA process defined in [RFC3986] and registered through the IANA process defined in
[RFC4395]) also serves as an "IRI scheme". Processing of IRIs is [RFC4395bis]) also serves as an "IRI scheme". Processing of IRIs is
accomplished by extending the URI syntax while retaining (and not accomplished by extending the URI syntax while retaining (and not
expanding) the set of "reserved" characters, such that the syntax for expanding) the set of "reserved" characters, such that the syntax for
any URI scheme may be uniformly extended to allow non-ASCII any URI scheme may be uniformly extended to allow non-ASCII
characters. In addition, following parsing of an IRI, it is possible characters. In addition, following parsing of an IRI, it is possible
to construct a corresponding URI by first encoding characters outside to construct a corresponding URI by first encoding characters outside
of the allowed URI range and then reassembling the components. of the allowed URI range and then reassembling the components.
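The mapping just described (encode the characters outside the URI range, then reassemble the components) can be sketched in Python with the standard urllib.parse module. This is an illustrative sketch, not the procedure defined by either draft. The function name iri_to_uri is hypothetical. Non-ASCII characters are percent-encoded as UTF-8 octets, and existing "%XX" sequences pass through unchanged. The host is assumed to be ASCII already; a non-ASCII ireg-name needs separate handling (see Section 3.4).

```python
from urllib.parse import quote, urlsplit, urlunsplit

# RFC 3986 reserved characters plus "%", so that reserved delimiters and
# existing percent-encodings pass through unchanged. quote() never
# encodes ASCII letters, digits, or "_.-~" (the unreserved set).
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI.

    Percent-encodes, as UTF-8 octets, every character of the path, query
    and fragment that is not allowed in a URI, then reassembles the
    components. Assumes the scheme and authority are already ASCII.
    """
    scheme, netloc, path, query, fragment = urlsplit(iri)
    path, query, fragment = (
        quote(part, safe=_KEEP, encoding="utf-8")
        for part in (path, query, fragment)
    )
    return urlunsplit((scheme, netloc, path, query, fragment))


# "r\u00e9sum\u00e9" is "résumé" written with precomposed U+00E9.
print(iri_to_uri("http://www.example.org/r\u00e9sum\u00e9.html"))
# http://www.example.org/r%C3%A9sum%C3%A9.html
```

Keeping "%" in the safe set means a URI that is already fully ASCII comes back unchanged. The trade-off is that a stray "%" not followed by two hex digits is also left alone rather than encoded as "%25".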
Practical use of IRI forms in place of URI forms depends on the Practical use of IRI forms in place of URI forms depends on the
following conditions being met: following conditions being met:
skipping to change at page 6, line 40 skipping to change at page 6, line 40
b. The protocol or format carrying the IRIs MUST have a mechanism to b. The protocol or format carrying the IRIs MUST have a mechanism to
represent the wide range of characters used in IRIs, either represent the wide range of characters used in IRIs, either
natively or by some protocol- or format-specific escaping natively or by some protocol- or format-specific escaping
mechanism (for example, numeric character references in [XML1]). mechanism (for example, numeric character references in [XML1]).
c. The URI scheme definition, if it explicitly allows a percent sign c. The URI scheme definition, if it explicitly allows a percent sign
("%") in any syntactic component, SHOULD define the interpretation ("%") in any syntactic component, SHOULD define the interpretation
of sequences of percent-encoded octets (using "%XX" hex octets) as of sequences of percent-encoded octets (using "%XX" hex octets) as
octets from sequences of UTF-8 encoded strings; this is recommended octets from sequences of UTF-8 encoded strings; this is recommended
in the guidelines for registering new schemes, [RFC4395]. For in the guidelines for registering new schemes, [RFC4395bis]. For
example, this is the practice for IMAP URLs [RFC2192], POP URLs example, this is the practice for IMAP URLs [RFC2192], POP URLs
[RFC2384] and the URN syntax [RFC2141]. Note that use of [RFC2384] and the URN syntax [RFC2141]. Note that use of
percent-encoding may also be restricted in some situations, for percent-encoding may also be restricted in some situations, for
example, URI schemes that disallow percent-encoding might still be example, URI schemes that disallow percent-encoding might still be
used with a fragment identifier which is percent-encoded (e.g., used with a fragment identifier which is percent-encoded (e.g.,
[XPointer]). See Section 6.4 for further discussion. [XPointer]). See Section 6.4 for further discussion.
1.3. Definitions 1.3. Definitions
The following definitions are used in this document; they follow the The following definitions are used in this document; they follow the
skipping to change at page 14, line 21 skipping to change at page 14, line 21
and IRI references (i.e., absolute or relative forms); for IRIs, some and IRI references (i.e., absolute or relative forms); for IRIs, some
steps are scheme specific. steps are scheme specific.
3.1. Converting to UCS 3.1. Converting to UCS
Input that is already in a Unicode form (i.e., a sequence of Unicode Input that is already in a Unicode form (i.e., a sequence of Unicode
characters or an octet-stream representing a Unicode-based character characters or an octet-stream representing a Unicode-based character
encoding such as UTF-8 or UTF-16) should be left as is and not encoding such as UTF-8 or UTF-16) should be left as is and not
normalized (see Section 5.3.2.2). normalized (see Section 5.3.2.2).
draft-ietf-iri-3987bis-02:
If the IRI or IRI reference is an octet stream in some known non-Unicode
character encoding, convert the IRI to a sequence of characters from the
UCS; this sequence SHOULD also be normalized according to Unicode
Normalization Form C (NFC, [UTR15]). In this case, retain the original
character encoding as the "document character encoding". (DESIGN
QUESTION: NOT WHAT MOST IMPLEMENTATIONS DO, CHANGE? )
draft-ietf-iri-3987bis-03:
An IRI or IRI reference is a sequence of characters from the UCS. For
IRIs that are not already in a Unicode form (as when written on paper,
read aloud, or represented in a text stream using a legacy character
encoding), convert the IRI to Unicode. Note that some character
encodings or transcriptions can be converted to or represented by more
than one sequence of Unicode characters. Ideally the resulting IRI
would use a normalized form, such as Unicode Normalization Form C
[UTR15] (see Section 5, Normalization and Comparison), since that
ensures a stable, consistent representation that is most likely to
produce the intended results. Implementers and users are cautioned
that, while denormalized character sequences are valid, they might be
difficult for other users or processes to reproduce and might lead to
unexpected results.
In other cases (written on paper, read aloud, or otherwise In other cases (written on paper, read aloud, or otherwise
represented independent of any character encoding) represent the IRI represented independent of any character encoding) represent the IRI
as a sequence of characters from the UCS normalized according to as a sequence of characters from the UCS normalized according to
Unicode Normalization Form C (NFC, [UTR15]). Unicode Normalization Form C (NFC, [UTR15]).
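The conversion of legacy-encoded input described in this section can be sketched with Python's built-in codecs and the standard unicodedata module. The function name legacy_to_ucs is hypothetical, and the caller is assumed to know the input's encoding. The example uses Windows-1258 (Python codec "cp1258"), a legacy encoding in which "é" can be written either as one precomposed octet or as "e" followed by a combining acute accent. Decoding alone therefore yields two different Unicode sequences, and NFC maps both to the same one. Input that is already in a Unicode form is deliberately not passed through this function, in line with the first paragraph of this section.

```python
import unicodedata


def legacy_to_ucs(octets: bytes, encoding: str) -> str:
    """Hypothetical helper: convert an IRI held in a known non-Unicode
    character encoding to a sequence of UCS characters in NFC."""
    text = octets.decode(encoding)  # e.g. encoding="cp1258"
    return unicodedata.normalize("NFC", text)


precomposed = b"r\xe9sum\xe9"  # cp1258 0xE9 = U+00E9 LATIN SMALL LETTER E WITH ACUTE
combining = b"re\xecsume\xec"  # cp1258 0xEC = U+0301 COMBINING ACUTE ACCENT

# Plain decoding gives two different code point sequences ...
print(precomposed.decode("cp1258") == combining.decode("cp1258"))  # False
# ... while NFC gives one stable representation.
print(legacy_to_ucs(precomposed, "cp1258") == legacy_to_ucs(combining, "cp1258"))  # True
```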
3.2. Parse the IRI into IRI components 3.2. Parse the IRI into IRI components
Parse the IRI, either as a relative reference (no scheme) or using Parse the IRI, either as a relative reference (no scheme) or using
scheme specific processing (according to the scheme given); the scheme specific processing (according to the scheme given); the
skipping to change at page 34, line 31 skipping to change at page 34, line 41
This section discusses details and gives examples for point c) in This section discusses details and gives examples for point c) in
Section 1.2. To be able to use IRIs, the URI corresponding to the Section 1.2. To be able to use IRIs, the URI corresponding to the
IRI in question has to encode original characters into octets by IRI in question has to encode original characters into octets by
using UTF-8. This can be specified for all URIs of a URI scheme or using UTF-8. This can be specified for all URIs of a URI scheme or
can apply to individual URIs for schemes that do not specify how to can apply to individual URIs for schemes that do not specify how to
encode original characters. It can apply to the whole URI, or only encode original characters. It can apply to the whole URI, or only
to some part. For background information on encoding characters into to some part. For background information on encoding characters into
URIs, see also Section 2.5 of [RFC3986]. URIs, see also Section 2.5 of [RFC3986].
For new URI schemes, using UTF-8 is recommended in [RFC4395]. For new URI schemes, using UTF-8 is recommended in [RFC4395bis].
Examples where UTF-8 is already used are the URN syntax [RFC2141], Examples where UTF-8 is already used are the URN syntax [RFC2141],
IMAP URLs [RFC2192], and POP URLs [RFC2384]. On the other hand, IMAP URLs [RFC2192], and POP URLs [RFC2384]. On the other hand,
because the HTTP URI scheme does not specify how to encode original because the HTTP URI scheme does not specify how to encode original
characters, only some HTTP URLs can have corresponding but different characters, only some HTTP URLs can have corresponding but different
IRIs. IRIs.
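Whether a particular HTTP URL has such a corresponding IRI can be checked, roughly, by testing whether its percent-encoded non-ASCII octets form valid UTF-8. The following Python sketch of that URI-to-IRI direction (compare Section 3.7) is a simplification under stated assumptions. The name uri_to_iri is hypothetical. Percent-encoded ASCII octets such as "%2F" are deliberately left encoded so that reserved characters keep their meaning. The sketch does not check whether the decoded characters are actually allowed in IRIs.

```python
import re
from typing import Optional

# One or more consecutive "%XX" escapes.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match: "re.Match[str]") -> str:
    escapes = match.group(0)
    octets = bytes.fromhex(escapes.replace("%", ""))
    out, pending = [], bytearray()
    for i, octet in enumerate(octets):
        if octet >= 0x80:
            pending.append(octet)  # part of a non-ASCII character
            continue
        if pending:
            out.append(pending.decode("utf-8"))  # raises if not UTF-8
            pending.clear()
        out.append(escapes[3 * i : 3 * i + 3])  # keep ASCII escape as written
    if pending:
        out.append(pending.decode("utf-8"))
    return "".join(out)


def uri_to_iri(uri: str) -> Optional[str]:
    """Hypothetical helper: decode percent-encoded UTF-8 octets to characters.

    Returns None when some non-ASCII octets are not valid UTF-8, i.e. when
    the URI has no different corresponding IRI under this rule.
    """
    try:
        return _PCT_RUN.sub(_decode_run, uri)
    except UnicodeDecodeError:
        return None


print(uri_to_iri("http://www.example.org/r%C3%A9sum%C3%A9.html"))
# http://www.example.org/résumé.html
print(uri_to_iri("http://www.example.org/r%E9sum%E9.html"))  # Latin-1 octets
# None
```

The second call shows why only some HTTP URLs qualify: "%E9" is "é" in ISO-8859-1 but is not a complete UTF-8 sequence, so no corresponding IRI is produced.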
For example, for a document with a URI of For example, for a document with a URI of
"http://www.example.org/r%C3%A9sum%C3%A9.html", it is possible to "http://www.example.org/r%C3%A9sum%C3%A9.html", it is possible to
construct a corresponding IRI (in XML notation, see Section 1.4): construct a corresponding IRI (in XML notation, see Section 1.4):
"http://www.example.org/résumé.html" ("é" stands for "http://www.example.org/résumé.html" ("é" stands for
skipping to change at page 53, line 45 skipping to change at page 53, line 45
[RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, [RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397,
August 1998. August 1998.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC2640] Curtin, B., "Internationalization of the File Transfer [RFC2640] Curtin, B., "Internationalization of the File Transfer
Protocol", RFC 2640, July 1999. Protocol", RFC 2640, July 1999.
draft-ietf-iri-3987bis-02:
[RFC4395] Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
Registration Procedures for New URI Schemes", BCP 35, RFC 4395,
February 2006.
draft-ietf-iri-3987bis-03:
[RFC4395bis]
Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
Registration Procedures for New URI/IRI Schemes",
draft-hansen-iri-4395bis-irireg-00 (work in progress),
September 2010.
[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other
Markup Languages", Unicode Technical Report #20, World Markup Languages", Unicode Technical Report #20, World
Wide Web Consortium Note, June 2003, Wide Web Consortium Note, June 2003,
<http://www.w3.org/TR/unicode-xml/>. <http://www.w3.org/TR/unicode-xml/>.
[UTR36] Davis, M. and M. Suignard, "Unicode Security [UTR36] Davis, M. and M. Suignard, "Unicode Security
Considerations", Unicode Technical Report #36, Considerations", Unicode Technical Report #36,
August 2010, <http://unicode.org/reports/tr36/>. August 2010, <http://unicode.org/reports/tr36/>.
 End of changes. 13 change blocks. 
22 lines changed or deleted 30 lines changed or added

This html diff was produced by rfcdiff 1.40. The latest version is available from http://tools.ietf.org/tools/rfcdiff/