draft-ietf-iri-3987bis-02.txt | draft-ietf-iri-3987bis-03.txt | |||
---|---|---|---|---|
Internationalized Resource M. Duerst | Internationalized Resource M. Duerst | |||
Identifiers (iri) Aoyama Gakuin University | Identifiers (iri) Aoyama Gakuin University | |||
Internet-Draft M. Suignard | Internet-Draft M. Suignard | |||
Obsoletes: 3987 (if approved) Unicode Consortium | Obsoletes: 3987 (if approved) Unicode Consortium | |||
Intended status: Standards Track L. Masinter | Intended status: Standards Track L. Masinter | |||
Expires: April 20, 2011 Adobe | Expires: April 28, 2011 Adobe | |||
October 17, 2010 | October 25, 2010 | |||
Internationalized Resource Identifiers (IRIs) | Internationalized Resource Identifiers (IRIs) | |||
draft-ietf-iri-3987bis-02 | draft-ietf-iri-3987bis-03 | |||
Abstract | Abstract | |||
This document defines the Internationalized Resource Identifier (IRI) | This document defines the Internationalized Resource Identifier (IRI) | |||
protocol element, as an extension of the Uniform Resource Identifier | protocol element, as an extension of the Uniform Resource Identifier | |||
(URI). An IRI is a sequence of characters from the Universal | (URI). An IRI is a sequence of characters from the Universal | |||
Character Set (Unicode/ISO 10646). Grammar and processing rules are | Character Set (Unicode/ISO 10646). Grammar and processing rules are | |||
given for IRIs and related syntactic forms. | given for IRIs and related syntactic forms. | |||
In addition, this document provides named additional rule sets for | In addition, this document provides named additional rule sets for | |||
skipping to change at page 2, line 21 | skipping to change at page 2, line 21 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on April 20, 2011. | This Internet-Draft will expire on April 28, 2011. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2010 IETF Trust and the persons identified as the | Copyright (c) 2010 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 3, line 19 | skipping to change at page 3, line 19 | |||
1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 | |||
1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10 | 2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10 | |||
2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | 2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | |||
3. Processing IRIs and related protocol elements . . . . . . . . 13 | 3. Processing IRIs and related protocol elements . . . . . . . . 13 | |||
3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14 | 3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14 | |||
3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14 | 3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14 | |||
3.3. General percent-encoding of IRI components . . . . . . . 15 | 3.3. General percent-encoding of IRI components . . . . . . . 15 | |||
3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 15 | 3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 16 | |||
3.5. Mapping query components . . . . . . . . . . . . . . . . 17 | 3.5. Mapping query components . . . . . . . . . . . . . . . . 17 | |||
3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 17 | 3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 17 | |||
3.7. Converting URIs to IRIs . . . . . . . . . . . . . . . . . 17 | 3.7. Converting URIs to IRIs . . . . . . . . . . . . . . . . . 17 | |||
3.7.1. Examples . . . . . . . . . . . . . . . . . . . . . . . 19 | 3.7.1. Examples . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
4. Bidirectional IRIs for Right-to-Left Languages . . . . . . . . 20 | 4. Bidirectional IRIs for Right-to-Left Languages . . . . . . . . 20 | |||
4.1. Logical Storage and Visual Presentation . . . . . . . . . 21 | 4.1. Logical Storage and Visual Presentation . . . . . . . . . 21 | |||
4.2. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . 22 | 4.2. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . 22 | |||
4.3. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . 23 | 4.3. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . 23 | |||
4.4. Examples . . . . . . . . . . . . . . . . . . . . . . . . 23 | 4.4. Examples . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
5. Normalization and Comparison . . . . . . . . . . . . . . . . . 25 | 5. Normalization and Comparison . . . . . . . . . . . . . . . . . 25 | |||
5.1. Equivalence . . . . . . . . . . . . . . . . . . . . . . . 25 | 5.1. Equivalence . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
5.2. Preparation for Comparison . . . . . . . . . . . . . . . 26 | 5.2. Preparation for Comparison . . . . . . . . . . . . . . . 26 | |||
5.3. Comparison Ladder . . . . . . . . . . . . . . . . . . . . 27 | 5.3. Comparison Ladder . . . . . . . . . . . . . . . . . . . . 27 | |||
5.3.1. Simple String Comparison . . . . . . . . . . . . . . . 27 | 5.3.1. Simple String Comparison . . . . . . . . . . . . . . . 27 | |||
5.3.2. Syntax-Based Normalization . . . . . . . . . . . . . . 28 | 5.3.2. Syntax-Based Normalization . . . . . . . . . . . . . . 28 | |||
5.3.3. Scheme-Based Normalization . . . . . . . . . . . . . . 31 | 5.3.3. Scheme-Based Normalization . . . . . . . . . . . . . . 31 | |||
5.3.4. Protocol-Based Normalization . . . . . . . . . . . . . 32 | 5.3.4. Protocol-Based Normalization . . . . . . . . . . . . . 32 | |||
6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 32 | 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
6.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 33 | 6.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 33 | |||
6.2. Software Interfaces and Protocols . . . . . . . . . . . . 33 | 6.2. Software Interfaces and Protocols . . . . . . . . . . . . 33 | |||
6.3. Format of URIs and IRIs in Documents and Protocols . . . 33 | 6.3. Format of URIs and IRIs in Documents and Protocols . . . 34 | |||
6.4. Use of UTF-8 for Encoding Original Characters . . . . . . 34 | 6.4. Use of UTF-8 for Encoding Original Characters . . . . . . 34 | |||
6.5. Relative IRI References . . . . . . . . . . . . . . . . . 36 | 6.5. Relative IRI References . . . . . . . . . . . . . . . . . 36 | |||
7. Liberal handling of otherwise invalid IRIs . . . . . . . . . . 36 | 7. Liberal handling of otherwise invalid IRIs . . . . . . . . . . 36 | |||
7.1. LEIRI processing . . . . . . . . . . . . . . . . . . . . 36 | 7.1. LEIRI processing . . . . . . . . . . . . . . . . . . . . 36 | |||
7.2. Web Address processing . . . . . . . . . . . . . . . . . 36 | 7.2. Web Address processing . . . . . . . . . . . . . . . . . 37 | |||
7.3. Characters not allowed in IRIs . . . . . . . . . . . . . 38 | 7.3. Characters not allowed in IRIs . . . . . . . . . . . . . 38 | |||
8. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 40 | 8. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 40 | |||
8.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 40 | 8.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 40 | |||
8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41 | 8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41 | |||
8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42 | 8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42 | |||
8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42 | 8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42 | |||
8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43 | 8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43 | |||
8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43 | 8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43 | |||
8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44 | 8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44 | |||
8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44 | 8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44 | |||
skipping to change at page 6, line 10 | skipping to change at page 6, line 10 | |||
discusses various forms of equivalence between IRIs. Section 6 | discusses various forms of equivalence between IRIs. Section 6 | |||
discusses the use of IRIs in different situations. Section 8 gives | discusses the use of IRIs in different situations. Section 8 gives | |||
additional informative guidelines. Section 10 discusses IRI-specific | additional informative guidelines. Section 10 discusses IRI-specific | |||
security considerations. | security considerations. | |||
1.2. Applicability | 1.2. Applicability | |||
IRIs are designed to allow protocols and software that deal with URIs | IRIs are designed to allow protocols and software that deal with URIs | |||
to be updated to handle IRIs. A "URI scheme" (as defined by | to be updated to handle IRIs. A "URI scheme" (as defined by | |||
[RFC3986] and registered through the IANA process defined in | [RFC3986] and registered through the IANA process defined in | |||
[RFC4395]) also serves as an "IRI scheme". Processing of IRIs is | [RFC4395bis]) also serves as an "IRI scheme". Processing of IRIs is | |||
accomplished by extending the URI syntax while retaining (and not | accomplished by extending the URI syntax while retaining (and not | |||
expanding) the set of "reserved" characters, such that the syntax for | expanding) the set of "reserved" characters, such that the syntax for | |||
any URI scheme may be uniformly extended to allow non-ASCII | any URI scheme may be uniformly extended to allow non-ASCII | |||
characters. In addition, following parsing of an IRI, it is possible | characters. In addition, following parsing of an IRI, it is possible | |||
to construct a corresponding URI by first encoding characters outside | to construct a corresponding URI by first encoding characters outside | |||
of the allowed URI range and then reassembling the components. | of the allowed URI range and then reassembling the components. | |||
Practical use of IRI forms in place of URI forms depends on the | Practical use of IRI forms in place of URI forms depends on the | |||
following conditions being met: | following conditions being met: | |||
skipping to change at page 6, line 40 | skipping to change at page 6, line 40 | |||
b. The protocol or format carrying the IRIs MUST have a mechanism to | b. The protocol or format carrying the IRIs MUST have a mechanism to | |||
represent the wide range of characters used in IRIs, either | represent the wide range of characters used in IRIs, either | |||
natively or by some protocol- or format-specific escaping | natively or by some protocol- or format-specific escaping | |||
mechanism (for example, numeric character references in [XML1]). | mechanism (for example, numeric character references in [XML1]). | |||
c. The URI scheme definition, if it explicitly allows a percent sign | c. The URI scheme definition, if it explicitly allows a percent sign | |||
("%") in any syntactic component, SHOULD define the interpretation | ("%") in any syntactic component, SHOULD define the interpretation | |||
of sequences of percent-encoded octets (using "%XX" hex octets) as | of sequences of percent-encoded octets (using "%XX" hex octets) as | |||
octets from sequences of UTF-8 encoded strings; this is recommended | octets from sequences of UTF-8 encoded strings; this is recommended | |||
in the guidelines for registering new schemes, [RFC4395]. For | in the guidelines for registering new schemes, [RFC4395bis]. For | |||
example, this is the practice for IMAP URLs [RFC2192], POP URLs | example, this is the practice for IMAP URLs [RFC2192], POP URLs | |||
[RFC2384] and the URN syntax [RFC2141]. Note that use of | [RFC2384] and the URN syntax [RFC2141]. Note that use of | |||
percent-encoding may also be restricted in some situations, for | percent-encoding may also be restricted in some situations, for | |||
example, URI schemes that disallow percent-encoding might still be | example, URI schemes that disallow percent-encoding might still be | |||
used with a fragment identifier which is percent-encoded (e.g., | used with a fragment identifier which is percent-encoded (e.g., | |||
[XPointer]). See Section 6.4 for further discussion. | [XPointer]). See Section 6.4 for further discussion. | |||
1.3. Definitions | 1.3. Definitions | |||
The following definitions are used in this document; they follow the | The following definitions are used in this document; they follow the | |||
skipping to change at page 14, line 21 | skipping to change at page 14, line 21 | |||
and IRI references (i.e., absolute or relative forms); for IRIs, some | and IRI references (i.e., absolute or relative forms); for IRIs, some | |||
steps are scheme specific. | steps are scheme specific. | |||
3.1. Converting to UCS | 3.1. Converting to UCS | |||
Input that is already in a Unicode form (i.e., a sequence of Unicode | Input that is already in a Unicode form (i.e., a sequence of Unicode | |||
characters or an octet-stream representing a Unicode-based character | characters or an octet-stream representing a Unicode-based character | |||
encoding such as UTF-8 or UTF-16) should be left as is and not | encoding such as UTF-8 or UTF-16) should be left as is and not | |||
normalized (see Section 5.3.2.2). | normalized (see Section 5.3.2.2). | |||
If the IRI or IRI reference is an octet stream in some known non- | An IRI or IRI reference is a sequence of characters from the UCS. | |||
Unicode character encoding, convert the IRI to a sequence of | For IRIs that are not already in a Unicode form (as when written on | |||
characters from the UCS; this sequence SHOULD also be normalized | paper, read aloud, or represented in a text stream using a legacy | |||
according to Unicode Normalization Form C (NFC, [UTR15]). In this | character encoding), convert the IRI to Unicode. Note that some | |||
case, retain the original character encoding as the "document | character encodings or transcriptions can be converted to or | |||
character encoding". (DESIGN QUESTION: NOT WHAT MOST IMPLEMENTATIONS | represented by more than one sequence of Unicode characters. Ideally | |||
DO, CHANGE? ) | the resulting IRI would use a normalized form, such as Unicode | |||
Normalization Form C [UTR15] (see Section 5.3 Normalization and | ||||
Comparison), since that ensures a stable, consistent representation | ||||
that is most likely to produce the intended results. Implementers | ||||
and users are cautioned that, while denormalized character sequences | ||||
are valid, they might be difficult for other users or processes to | ||||
reproduce and might lead to unexpected results. | ||||
In other cases (written on paper, read aloud, or otherwise | In other cases (written on paper, read aloud, or otherwise | |||
represented independent of any character encoding) represent the IRI | represented independent of any character encoding) represent the IRI | |||
as a sequence of characters from the UCS normalized according to | as a sequence of characters from the UCS normalized according to | |||
Unicode Normalization Form C (NFC, [UTR15]). | Unicode Normalization Form C (NFC, [UTR15]). | |||
3.2. Parse the IRI into IRI components | 3.2. Parse the IRI into IRI components | |||
Parse the IRI, either as a relative reference (no scheme) or using | Parse the IRI, either as a relative reference (no scheme) or using | |||
scheme specific processing (according to the scheme given); the | scheme specific processing (according to the scheme given); the | |||
skipping to change at page 34, line 31 | skipping to change at page 34, line 41 | |||
This section discusses details and gives examples for point c) in | This section discusses details and gives examples for point c) in | |||
Section 1.2. To be able to use IRIs, the URI corresponding to the | Section 1.2. To be able to use IRIs, the URI corresponding to the | |||
IRI in question has to encode original characters into octets by | IRI in question has to encode original characters into octets by | |||
using UTF-8. This can be specified for all URIs of a URI scheme or | using UTF-8. This can be specified for all URIs of a URI scheme or | |||
can apply to individual URIs for schemes that do not specify how to | can apply to individual URIs for schemes that do not specify how to | |||
encode original characters. It can apply to the whole URI, or only | encode original characters. It can apply to the whole URI, or only | |||
to some part. For background information on encoding characters into | to some part. For background information on encoding characters into | |||
URIs, see also Section 2.5 of [RFC3986]. | URIs, see also Section 2.5 of [RFC3986]. | |||
For new URI schemes, using UTF-8 is recommended in [RFC4395]. | For new URI schemes, using UTF-8 is recommended in [RFC4395bis]. | |||
Examples where UTF-8 is already used are the URN syntax [RFC2141], | Examples where UTF-8 is already used are the URN syntax [RFC2141], | |||
IMAP URLs [RFC2192], and POP URLs [RFC2384]. On the other hand, | IMAP URLs [RFC2192], and POP URLs [RFC2384]. On the other hand, | |||
because the HTTP URI scheme does not specify how to encode original | because the HTTP URI scheme does not specify how to encode original | |||
characters, only some HTTP URLs can have corresponding but different | characters, only some HTTP URLs can have corresponding but different | |||
IRIs. | IRIs. | |||
For example, for a document with a URI of | For example, for a document with a URI of | |||
"http://www.example.org/r%C3%A9sum%C3%A9.html", it is possible to | "http://www.example.org/r%C3%A9sum%C3%A9.html", it is possible to | |||
construct a corresponding IRI (in XML notation, see Section 1.4): | construct a corresponding IRI (in XML notation, see Section 1.4): | |||
"http://www.example.org/r&#xE9;sum&#xE9;.html" ("&#xE9;" stands for | "http://www.example.org/r&#xE9;sum&#xE9;.html" ("&#xE9;" stands for | |||
skipping to change at page 53, line 45 | skipping to change at page 53, line 45 | |||
[RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, | [RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, | |||
August 1998. | August 1998. | |||
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., | [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., | |||
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext | Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext | |||
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. | Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. | |||
[RFC2640] Curtin, B., "Internationalization of the File Transfer | [RFC2640] Curtin, B., "Internationalization of the File Transfer | |||
Protocol", RFC 2640, July 1999. | Protocol", RFC 2640, July 1999. | |||
[RFC4395] Hansen, T., Hardie, T., and L. Masinter, "Guidelines and | [RFC4395bis] | |||
Registration Procedures for New URI Schemes", BCP 35, | Hansen, T., Hardie, T., and L. Masinter, "Guidelines and | |||
RFC 4395, February 2006. | Registration Procedures for New URI/IRI Schemes", | |||
draft-hansen-iri-4395bis-irireg-00 (work in progress), | ||||
September 2010. | ||||
[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | |||
Markup Languages", Unicode Technical Report #20, World | Markup Languages", Unicode Technical Report #20, World | |||
Wide Web Consortium Note, June 2003, | Wide Web Consortium Note, June 2003, | |||
<http://www.w3.org/TR/unicode-xml/>. | <http://www.w3.org/TR/unicode-xml/>. | |||
[UTR36] Davis, M. and M. Suignard, "Unicode Security | [UTR36] Davis, M. and M. Suignard, "Unicode Security | |||
Considerations", Unicode Technical Report #36, | Considerations", Unicode Technical Report #36, | |||
August 2010, <http://unicode.org/reports/tr36/>. | August 2010, <http://unicode.org/reports/tr36/>. | |||
End of changes. 13 change blocks. | ||||
22 lines changed or deleted | 30 lines changed or added | |||
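
The conversions discussed in Sections 3.1 and 6.4 of the diffed text (bringing input into Normalization Form C, and relating the URI "http://www.example.org/r%C3%A9sum%C3%A9.html" to its IRI) can be illustrated with a minimal Python sketch. The function names `to_nfc`, `percent_encode_non_ascii`, and `decode_utf8_escapes` are hypothetical and do not come from the draft. The sketch assumes a simplified model: it percent-encodes every non-ASCII character as UTF-8, and it decodes only runs of percent-encoded octets that form valid UTF-8. It does not implement the ireg-name mapping of Section 3.4 or the additional restrictions of Section 3.7.

```python
import re
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (NFC)."""
    return unicodedata.normalize("NFC", text)


def percent_encode_non_ascii(component: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 octets.

    ASCII characters are passed through unchanged; this is a
    simplification of the general percent-encoding step.
    """
    out = []
    for ch in component:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


# A run of percent-encoded octets whose values are all >= 0x80.
_HIGH_OCTET_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def decode_utf8_escapes(uri: str) -> str:
    """Replace percent-encoded runs that are valid UTF-8 with characters.

    Assumption: a run that is not valid UTF-8 as a whole is left
    untouched, which is stricter than strictly necessary.
    """
    def replace(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)

    return _HIGH_OCTET_RUN.sub(replace, uri)


if __name__ == "__main__":
    uri = "http://www.example.org/r%C3%A9sum%C3%A9.html"
    iri = decode_utf8_escapes(uri)
    print(iri)
    print(percent_encode_non_ascii(to_nfc(iri)))
```

A URI such as "http://www.example.org/r%E9sum%E9.html", whose escapes are not valid UTF-8, is returned unchanged by `decode_utf8_escapes`. This matches the statement in the diffed text that only some HTTP URLs have corresponding but different IRIs.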