draft-ietf-iri-3987bis-03.txt | draft-ietf-iri-3987bis-04.txt | |||
---|---|---|---|---|
Internationalized Resource M. Duerst | Internationalized Resource M. Duerst | |||
Identifiers (iri) Aoyama Gakuin University | Identifiers (iri) Aoyama Gakuin University | |||
Internet-Draft M. Suignard | Internet-Draft M. Suignard | |||
Obsoletes: 3987 (if approved) Unicode Consortium | Obsoletes: 3987 (if approved) Unicode Consortium | |||
Intended status: Standards Track L. Masinter | Intended status: Standards Track L. Masinter | |||
Expires: April 28, 2011 Adobe | Expires: September 15, 2011 Adobe | |||
October 25, 2010 | March 14, 2011 | |||
Internationalized Resource Identifiers (IRIs) | Internationalized Resource Identifiers (IRIs) | |||
draft-ietf-iri-3987bis-03 | draft-ietf-iri-3987bis-04 | |||
Abstract | Abstract | |||
This document defines the Internationalized Resource Identifier (IRI) | This document defines the Internationalized Resource Identifier (IRI) | |||
protocol element, as an extension of the Uniform Resource Identifier | protocol element, as an extension of the Uniform Resource Identifier | |||
(URI). An IRI is a sequence of characters from the Universal | (URI). An IRI is a sequence of characters from the Universal | |||
Character Set (Unicode/ISO 10646). Grammar and processing rules are | Character Set (Unicode/ISO 10646). Grammar and processing rules are | |||
given for IRIs and related syntactic forms. | given for IRIs and related syntactic forms. | |||
In addition, this document provides named additional rule sets for | In addition, this document provides named additional rule sets for | |||
skipping to change at page 2, line 21 | skipping to change at page 2, line 21 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on April 28, 2011. | This Internet-Draft will expire on September 15, 2011. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2010 IETF Trust and the persons identified as the | Copyright (c) 2011 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
skipping to change at page 3, line 10 | skipping to change at page 3, line 10 | |||
outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
than English. | than English. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
1.1. Overview and Motivation . . . . . . . . . . . . . . . . . 5 | 1.1. Overview and Motivation . . . . . . . . . . . . . . . . . 5 | |||
1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 | |||
1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10 | 2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10 | |||
2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | 2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | |||
3. Processing IRIs and related protocol elements . . . . . . . . 13 | 3. Processing IRIs and related protocol elements . . . . . . . . 13 | |||
3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14 | 3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14 | |||
3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14 | 3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14 | |||
3.3. General percent-encoding of IRI components . . . . . . . 15 | 3.3. General percent-encoding of IRI components . . . . . . . 15 | |||
3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 16 | 3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 16 | |||
3.5. Mapping query components . . . . . . . . . . . . . . . . 17 | 3.5. Mapping query components . . . . . . . . . . . . . . . . 17 | |||
skipping to change at page 4, line 10 | skipping to change at page 4, line 10 | |||
8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41 | 8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41 | |||
8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42 | 8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42 | |||
8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42 | 8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42 | |||
8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43 | 8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43 | |||
8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43 | 8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43 | |||
8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44 | 8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44 | |||
8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44 | 8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 | |||
10. Security Considerations . . . . . . . . . . . . . . . . . . . 46 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 46 | |||
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
12. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 48 | 12. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 48 | |||
12.1. Changes from draft-duerst-iri-bis-07 to | 12.1. Major restructuring of IRI processing model . . . . . . . 48 | |||
draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . . . 48 | 12.1.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 48 | |||
12.2. Changes from -06 to -07 of draft-duerst-iri-bis . . . . . 48 | 12.1.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 49 | |||
12.2.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 49 | 12.1.3. Extension of Syntax . . . . . . . . . . . . . . . . . 49 | |||
12.2.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 49 | 12.1.4. More to be added . . . . . . . . . . . . . . . . . . . 49 | |||
12.3. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 49 | 12.2. Change Log . . . . . . . . . . . . . . . . . . . . . . . 49 | |||
12.4. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 49 | 12.2.1. Changes after draft-ietf-iri-3987bis-01 . . . . . . . 49 | |||
12.2.2. Changes from draft-duerst-iri-bis-07 to | ||||
draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . 49 | ||||
12.2.3. Changes from -06 to -07 of draft-duerst-iri-bis . . . 49 | ||||
12.3. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 50 | ||||
12.4. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 50 | ||||
12.5. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 50 | 12.5. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 50 | |||
12.6. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 50 | 12.6. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 50 | |||
12.7. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 50 | 12.7. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 50 | |||
12.8. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 50 | 12.8. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 50 | |||
12.9. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 50 | 12.9. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 51 | |||
12.10. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 51 | 12.10. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 51 | |||
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 51 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 51 | |||
13.1. Normative References . . . . . . . . . . . . . . . . . . 51 | 13.1. Normative References . . . . . . . . . . . . . . . . . . 51 | |||
13.2. Informative References . . . . . . . . . . . . . . . . . 52 | 13.2. Informative References . . . . . . . . . . . . . . . . . 52 | |||
Appendix A. Design Alternatives . . . . . . . . . . . . . . . . . 54 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 55 | |||
A.1. New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 54 | ||||
A.2. Character Encodings Other Than UTF-8 . . . . . . . . . . 55 | ||||
A.3. New Encoding Convention . . . . . . . . . . . . . . . . . 55 | ||||
A.4. Indicating Character Encodings in the URI/IRI . . . . . . 55 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 56 | ||||
1. Introduction | 1. Introduction | |||
1.1. Overview and Motivation | 1.1. Overview and Motivation | |||
A Uniform Resource Identifier (URI) is defined in [RFC3986] as a | A Uniform Resource Identifier (URI) is defined in [RFC3986] as a | |||
sequence of characters chosen from a limited subset of the repertoire | sequence of characters chosen from a limited subset of the repertoire | |||
of US-ASCII [ASCII] characters. | of US-ASCII [ASCII] characters. | |||
The characters in URIs are frequently used for representing words of | The characters in URIs are frequently used for representing words of | |||
skipping to change at page 6, line 5 | skipping to change at page 6, line 5 | |||
as URI references. The syntax of IRIs is defined in Section 2. | as URI references. The syntax of IRIs is defined in Section 2. | |||
Using characters outside of A - Z in IRIs adds a number of | Using characters outside of A - Z in IRIs adds a number of | |||
difficulties. Section 4 discusses the special case of bidirectional | difficulties. Section 4 discusses the special case of bidirectional | |||
IRIs using characters from scripts written right-to-left. Section 5 | IRIs using characters from scripts written right-to-left. Section 5 | |||
discusses various forms of equivalence between IRIs. Section 6 | discusses various forms of equivalence between IRIs. Section 6 | |||
discusses the use of IRIs in different situations. Section 8 gives | discusses the use of IRIs in different situations. Section 8 gives | |||
additional informative guidelines. Section 10 discusses IRI-specific | additional informative guidelines. Section 10 discusses IRI-specific | |||
security considerations. | security considerations. | |||
When originally defining IRIs, several design alternatives were | ||||
considered. Historically interested readers can find an overview in | ||||
Appendix A of [RFC3987]. For some additional background on the | ||||
design of URIs and IRIs, please also see [Gettys]. | ||||
1.2. Applicability | 1.2. Applicability | |||
IRIs are designed to allow protocols and software that deal with URIs | IRIs are designed to allow protocols and software that deal with URIs | |||
to be updated to handle IRIs. A "URI scheme" (as defined by | to be updated to handle IRIs. A "URI scheme" (as defined by | |||
[RFC3986] and registered through the IANA process defined in | [RFC3986] and registered through the IANA process defined in | |||
[RFC4395bis]) also serves as an "IRI scheme". Processing of IRIs is | [RFC4395bis]) also serves as an "IRI scheme". Processing of IRIs is | |||
accomplished by extending the URI syntax while retaining (and not | accomplished by extending the URI syntax while retaining (and not | |||
expanding) the set of "reserved" characters, such that the syntax for | expanding) the set of "reserved" characters, such that the syntax for | |||
any URI scheme may be uniformly extended to allow non-ASCII | any URI scheme may be uniformly extended to allow non-ASCII | |||
characters. In addition, following parsing of an IRI, it is possible | characters. In addition, following parsing of an IRI, it is possible | |||
skipping to change at page 7, line 28 | skipping to change at page 7, line 33 | |||
character encoding: A method of representing a sequence of | character encoding: A method of representing a sequence of | |||
characters as a sequence of octets (maybe with variants). Also, a | characters as a sequence of octets (maybe with variants). Also, a | |||
method of (unambiguously) converting a sequence of octets into a | method of (unambiguously) converting a sequence of octets into a | |||
sequence of characters. | sequence of characters. | |||
charset: The name of a parameter or attribute used to identify a | charset: The name of a parameter or attribute used to identify a | |||
character encoding. | character encoding. | |||
UCS: Universal Character Set. The coded character set defined by | UCS: Universal Character Set. The coded character set defined by | |||
ISO/IEC 10646 [ISO10646] and the Unicode Standard [UNIV4]. | ISO/IEC 10646 [ISO10646] and the Unicode Standard [UNIV6]. | |||
IRI reference: Denotes the common usage of an Internationalized | IRI reference: Denotes the common usage of an Internationalized | |||
Resource Identifier. An IRI reference may be absolute or | Resource Identifier. An IRI reference may be absolute or | |||
relative. However, the "IRI" that results from such a reference | relative. However, the "IRI" that results from such a reference | |||
only includes absolute IRIs; any relative IRI references are | only includes absolute IRIs; any relative IRI references are | |||
resolved to their absolute form. Note that in [RFC2396] URIs did | resolved to their absolute form. Note that in [RFC2396] URIs did | |||
not include fragment identifiers, but in [RFC3986] fragment | not include fragment identifiers, but in [RFC3986] fragment | |||
identifiers are part of URIs. | identifiers are part of URIs. | |||
URL: The term "URL" was originally used [RFC1738] for roughly what | URL: The term "URL" was originally used [RFC1738] for roughly what | |||
skipping to change at page 21, line 21 | skipping to change at page 21, line 21 | |||
4.1. Logical Storage and Visual Presentation | 4.1. Logical Storage and Visual Presentation | |||
When stored or transmitted in digital representation, bidirectional | When stored or transmitted in digital representation, bidirectional | |||
IRIs MUST be in full logical order and MUST conform to the IRI syntax | IRIs MUST be in full logical order and MUST conform to the IRI syntax | |||
rules (which include the rules relevant to their scheme). This | rules (which include the rules relevant to their scheme). This | |||
ensures that bidirectional IRIs can be processed in the same way as | ensures that bidirectional IRIs can be processed in the same way as | |||
other IRIs. | other IRIs. | |||
Bidirectional IRIs MUST be rendered by using the Unicode | Bidirectional IRIs MUST be rendered by using the Unicode | |||
Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be | Bidirectional Algorithm [UNIV6], [UNI9]. Bidirectional IRIs MUST be | |||
rendered in the same way as they would be if they were in a left-to- | rendered in the same way as they would be if they were in a left-to- | |||
right embedding; i.e., as if they were preceded by U+202A, LEFT-TO- | right embedding; i.e., as if they were preceded by U+202A, LEFT-TO- | |||
RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL | RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL | |||
FORMATTING (PDF). Setting the embedding direction can also be done | FORMATTING (PDF). Setting the embedding direction can also be done | |||
in a higher-level protocol (e.g., the dir='ltr' attribute in HTML). | in a higher-level protocol (e.g., the dir='ltr' attribute in HTML). | |||
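The left-to-right embedding described above can be sketched in Python as follows. This is a minimal illustration, assuming a plain-text display context with no higher-level protocol available; the function name wrap_ltr_embedding is hypothetical and does not come from the draft. The wrapping applies to display only; the stored IRI stays in logical order without these characters.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_ltr_embedding(iri: str) -> str:
    """Return a display string that forces a left-to-right embedding.

    Hypothetical helper: the result is meant for rendering only and
    must not be stored or transmitted as the IRI itself.
    """
    return f"{LRE}{iri}{PDF}"


# An IRI with two Hebrew letters in its path, wrapped for display.
stored_iri = "http://example.org/\u05D0\u05D1"
display_text = wrap_ltr_embedding(stored_iri)
```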
There is no requirement to use the above embedding if the display is | There is no requirement to use the above embedding if the display is | |||
still the same without the embedding. For example, a bidirectional | still the same without the embedding. For example, a bidirectional | |||
IRI in a text with left-to-right base directionality (such as used | IRI in a text with left-to-right base directionality (such as used | |||
for English or Cyrillic) that is preceded and followed by whitespace | for English or Cyrillic) that is preceded and followed by whitespace | |||
skipping to change at page 29, line 30 | skipping to change at page 29, line 30 | |||
Creating schemes that allow case-insensitive syntax components | Creating schemes that allow case-insensitive syntax components | |||
containing non-ASCII characters should be avoided. Case | containing non-ASCII characters should be avoided. Case | |||
normalization of non-ASCII characters can be culturally dependent and | normalization of non-ASCII characters can be culturally dependent and | |||
is always a complex operation. The only exception concerns non-ASCII | is always a complex operation. The only exception concerns non-ASCII | |||
host names for which the character normalization includes a mapping | host names for which the character normalization includes a mapping | |||
step derived from case folding. | step derived from case folding. | |||
5.3.2.2. Character Normalization | 5.3.2.2. Character Normalization | |||
The Unicode Standard [UNIV4] defines various equivalences between | The Unicode Standard [UNIV6] defines various equivalences between | |||
sequences of characters for various purposes. Unicode Standard Annex | sequences of characters for various purposes. Unicode Standard Annex | |||
#15 [UTR15] defines various Normalization Forms for these | #15 [UTR15] defines various Normalization Forms for these | |||
equivalences, in particular Normalization Form C (NFC, Canonical | equivalences, in particular Normalization Form C (NFC, Canonical | |||
Decomposition, followed by Canonical Composition) and Normalization | Decomposition, followed by Canonical Composition) and Normalization | |||
Form KC (NFKC, Compatibility Decomposition, followed by Canonical | Form KC (NFKC, Compatibility Decomposition, followed by Canonical | |||
Composition). | Composition). | |||
IRIs already in Unicode MUST NOT be normalized before parsing or | IRIs already in Unicode MUST NOT be normalized before parsing or | |||
interpreting. In many non-Unicode character encodings, some text | interpreting. In many non-Unicode character encodings, some text | |||
cannot be represented directly. For example, the word "Vietnam" is | cannot be represented directly. For example, the word "Vietnam" is | |||
skipping to change at page 30, line 17 | skipping to change at page 30, line 17 | |||
avoid even more problems; for example, by choosing half-width Latin | avoid even more problems; for example, by choosing half-width Latin | |||
letters instead of full-width ones, and full-width instead of half- | letters instead of full-width ones, and full-width instead of half- | |||
width Katakana. | width Katakana. | |||
As an example, "http://www.example.org/r&#xE9;sum&#xE9;.html" (in XML | As an example, "http://www.example.org/r&#xE9;sum&#xE9;.html" (in XML | |||
Notation) is in NFC. On the other hand, | Notation) is in NFC. On the other hand, | |||
"http://www.example.org/re&#x301;sume&#x301;.html" is not in NFC. | "http://www.example.org/re&#x301;sume&#x301;.html" is not in NFC. | |||
The former uses precombined e-acute characters, and the latter uses | The former uses precombined e-acute characters, and the latter uses | |||
"e" characters followed by combining acute accents. Both usages are | "e" characters followed by combining acute accents. Both usages are | |||
defined as canonically equivalent in [UNIV4]. | defined as canonically equivalent in [UNIV6]. | |||
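The difference between the two forms can be checked with Python's standard unicodedata module. This is a small sketch for inspection only; the helper name is_nfc is hypothetical, and the check is not meant as a step to apply before parsing, since IRIs already in Unicode are not to be normalized at that point.

```python
import unicodedata

# Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "http://www.example.org/r\u00E9sum\u00E9.html"
# Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "http://www.example.org/re\u0301sume\u0301.html"


def is_nfc(text: str) -> bool:
    """Report whether text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


print(is_nfc(precomposed))
print(is_nfc(decomposed))
# The two strings are canonically equivalent: NFC maps one onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```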
Note: Because it is unknown how a particular sequence of characters | Note: Because it is unknown how a particular sequence of characters | |||
is being treated with respect to character normalization, it would | is being treated with respect to character normalization, it would | |||
be inappropriate to allow third parties to normalize an IRI | be inappropriate to allow third parties to normalize an IRI | |||
arbitrarily. This does not contradict the recommendation that | arbitrarily. This does not contradict the recommendation that | |||
when a resource is created, its IRI should be as character | when a resource is created, its IRI should be as character | |||
normalized as possible (i.e., NFC or even NFKC). This is similar | normalized as possible (i.e., NFC or even NFKC). This is similar | |||
to the uppercase/lowercase problems. Some parts of a URI are case | to the uppercase/lowercase problems. Some parts of a URI are case | |||
insensitive (for example, the domain name). For others, it is | insensitive (for example, the domain name). For others, it is | |||
unclear whether they are case sensitive, case insensitive, or | unclear whether they are case sensitive, case insensitive, or | |||
skipping to change at page 36, line 40 | skipping to change at page 36, line 40 | |||
known as Legacy Extended IRI or LEIRI [LEIRI], and Web Address | known as Legacy Extended IRI or LEIRI [LEIRI], and Web Address | |||
[HTML5]). | [HTML5]). | |||
Future technical specifications SHOULD NOT allow conforming producers | Future technical specifications SHOULD NOT allow conforming producers | |||
to produce, or conforming content to contain, such forms, as they are | to produce, or conforming content to contain, such forms, as they are | |||
not interoperable with other IRI consuming software. | not interoperable with other IRI consuming software. | |||
7.1. LEIRI processing | 7.1. LEIRI processing | |||
This section defines Legacy Extended IRIs (LEIRIs). The syntax of | This section defines Legacy Extended IRIs (LEIRIs). The syntax of | |||
Legacy Extended IRIs is the same as that for IRIs, except that the | Legacy Extended IRIs is the same as that for <IRI-reference>, except | |||
ucschar production is replaced by the leiri-ucschar production: | that the ucschar production is replaced by the leiri-ucschar | |||
production: | ||||
leiri-ucschar = " " / "<" / ">" / '"' / "{" / "}" / "|" | leiri-ucschar = " " / "<" / ">" / '"' / "{" / "}" / "|" | |||
/ "\" / "^" / "`" / %x0-1F / %x7F-D7FF | / "\" / "^" / "`" / %x0-1F / %x7F-D7FF | |||
/ %xE000-FFFD / %x10000-10FFFF | / %xE000-FFFD / %x10000-10FFFF | |||
Among other extensions, processors based on this specification also | Among other extensions, processors based on this specification also | |||
did not enforce the restriction on bidirectional formatting | did not enforce the restriction on bidirectional formatting | |||
characters in Section 4.1, and the iprivate production becomes | characters in Section 4.1, and the iprivate production becomes | |||
redundant. | redundant. | |||
skipping to change at page 47, line 37 | skipping to change at page 47, line 37 | |||
Section 4.2 are not followed. The same visual representation may be | Section 4.2 are not followed. The same visual representation may be | |||
interpreted as different logical representations, and vice versa. It | interpreted as different logical representations, and vice versa. It | |||
is also very important that a correct Unicode bidirectional | is also very important that a correct Unicode bidirectional | |||
implementation be used. | implementation be used. | |||
The use of Legacy Extended IRIs introduces additional security | The use of Legacy Extended IRIs introduces additional security | |||
issues. | issues. | |||
11. Acknowledgements | 11. Acknowledgements | |||
For contributions to this update, we would like to thank Ian Hickson, | This document was derived from [RFC3987]; the acknowledgments from | |||
Michael Sperberg-McQueen, Dan Connolly, Norman Walsh, Richard Tobin, | that specification still apply. | |||
Henry S. Thomson, and the XML Core Working Group of the W3C. | ||||
The discussion on the issue addressed here started a long time ago. | ||||
There was a thread in the HTML working group in August 1995 (under | ||||
the topic of "Globalizing URIs") and in the www-international mailing | ||||
list in July 1996 (under the topic of "Internationalization and | ||||
URLs"), and there were ad-hoc meetings at the Unicode conferences in | ||||
September 1995 and September 1997. | ||||
For contributions to the previous version of this document, RFC 3987, | ||||
many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding, | ||||
Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim | ||||
Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie | ||||
Daigle, Ted Hardie, Bill Fenner, Margaret Wasserman, Russ Housley, | ||||
Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex Texin, Graham Klyne, | ||||
Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam Costello, Dan | ||||
Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy Badami, Jonathan | ||||
Rosenne, Asmus Freytag, Simon Josefsson, Carlos Viegas Damasio, Chris | ||||
Haynes, Walter Underwood, and many others. | ||||
A definition of HyperText Reference was initially produced by Ian | ||||
Hixson, and further edited by Dan Connolly and C. M. Spergerg- | ||||
McQueen. | ||||
Thanks to the Internationalization Working Group (I18N WG) of the | ||||
World Wide Web Consortium (W3C), and the members of the W3C I18N | ||||
Working Group and Interest Group for their contributions and their | ||||
work on [CharMod]. Thanks also go to the members of many other W3C | ||||
Working Groups for adopting IRIs, and to the members of the Montreal | ||||
IAB Workshop on Internationalization and Localization for their | ||||
review. | ||||
12. Change Log | We would like to thank Ian Hickson, Michael Sperberg-McQueen, and Dan | |||
Connolly for their work on HyperText References, and Norman Walsh, | ||||
Richard Tobin, Henry S. Thomson, John Cowan, Paul Grosso, and the XML | ||||
Core Working Group of the W3C for their work on LEIRIs. | ||||
Note to RFC Editor: Please completely remove this section before | In addition, this document was influenced by contributions from (in | |||
publication. | no particular order) Chris Lilley, Bjoern Hoehrmann, Felix Sasaki, | |||
Jeremy Carroll, Frank Ellermann, Michael Everson, Cary Karp, | ||||
Matitiahu Allouche, Richard Ishida, Addison Phillips, Jonathan | ||||
Rosenne, Najib Tounsi, Debbie Garside, Mark Davis, Sarmad Hussain, | ||||
Ted Hardie, Konrad Lanz, Thomas Roessler, Lisa Dusseault, Julian | ||||
Reschke, Giovanni Campagna, Anne van Kesteren, Mark Nottingham, Erik | ||||
van der Poel, Marcin Hanclik, Marcos Caceres, Roy Fielding, Greg | ||||
Wilkins, Pieter Hintjens, Daniel R. Tobias, Marko Martin, Maciej | ||||
Stanchowiak, Wil Tan, Yui Naruse, Michael A. Puls II, Dave Thaler, | ||||
Tom Perch, John Klensin, Shawn Steele, Peter Saint-Andre, Geoffrey | ||||
Sneddon, Chris Weber, Alex Melnikov, Slim Amamou, SM, Tim Berners- | ||||
Lee, Yaron Goland, Sam Ruby, Adam Barth, Abdulrahman I. ALGhadir, | ||||
Aharon Lanin, Thomas Milo, Murray Sargent, Marc Blanchet, and Mykyta | ||||
Yevstifeyev. | ||||
12.1. Changes from draft-duerst-iri-bis-07 to draft-ietf-iri-3987bis-00 | 12. Main Changes Since RFC 3987 | |||
Changed draft name, date, last paragraph of abstract, and titles in | This section describes the main changes since [RFC3987]. | |||
change log, and added this section in moving from | ||||
draft-duerst-iri-bis-07 (personal submission) to | ||||
draft-ietf-iri-3987bis-00 (WG document). | ||||
12.2. Changes from -06 to -07 of draft-duerst-iri-bis | 12.1. Major restructuring of IRI processing model | |||
Major restructuring of IRI processing model to make scheme-specific | Major restructuring of IRI processing model to make scheme-specific | |||
translation necessary to handle IDNA requirements and for consistency | translation necessary to handle IDNA requirements and for consistency | |||
with web implementations. | with web implementations. | |||
Starting with IRI, you want one of: | Starting with IRI, you want one of: | |||
a IRI components (IRI parsed into UTF8 pieces) | a IRI components (IRI parsed into UTF8 pieces) | |||
b URI components (URI parsed into ASCII pieces, encoded correctly) | b URI components (URI parsed into ASCII pieces, encoded correctly) | |||
c whole URI (for passing on to some other system that wants whole | c whole URI (for passing on to some other system that wants whole | |||
URIs) | URIs) | |||
12.2.1. OLD WAY | 12.1.1. OLD WAY | |||
1. Pct-encoding on the whole thing to a URI. (c1) If you want a | 1. Pct-encoding on the whole thing to a URI. (c1) If you want a | |||
(maybe broken) whole URI, you might stop here. | (maybe broken) whole URI, you might stop here. | |||
2. Parsing the URI into URI components. (b1) If you want (maybe | 2. Parsing the URI into URI components. (b1) If you want (maybe | |||
broken) URI components, stop here. | broken) URI components, stop here. | |||
3. Decode the components (undoing the pct-encoding). (a) if you want | 3. Decode the components (undoing the pct-encoding). (a) if you want | |||
IRI components, stop here. | IRI components, stop here. | |||
4. Reencode: either using a different encoding for some components (for | 4. Reencode: either using a different encoding for some components (for | |||
domain names, and query components in web pages, which depends on | domain names, and query components in web pages, which depends on | |||
the component, scheme and context), and otherwise using pct- | the component, scheme and context), and otherwise using pct- | |||
encoding. (b2) if you want (good) URI components, stop here. | encoding. (b2) if you want (good) URI components, stop here. | |||
5. reassemble the reencoded components. (c2) if you want a (*good*) | 5. reassemble the reencoded components. (c2) if you want a (*good*) | |||
whole URI stop here. | whole URI stop here. | |||
12.2.2. NEW WAY | 12.1.2. NEW WAY | |||
1. Parse the IRI into IRI components using the generic syntax. (a) | 1. Parse the IRI into IRI components using the generic syntax. (a) | |||
if you want IRI components, stop here. | if you want IRI components, stop here. | |||
2. Encode each component, using pct-encoding, IDN encoding, or | 2. Encode each component, using pct-encoding, IDN encoding, or | |||
special query part encoding depending on the component, scheme or | special query part encoding depending on the component, scheme or | |||
context. (b) If you want URI components, stop here. | context. (b) If you want URI components, stop here. | |||
3. Reassemble a whole URI from the URI components. (c) If you want a | 3. Reassemble a whole URI from the URI components. (c) If you want a | |||
whole URI, stop here. | whole URI, stop here. | |||
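The three steps of the new processing model can be sketched roughly as follows. This is a non-normative illustration in Python, not text from either draft: the names iri_to_uri and encode_host, the safe-character sets, and the query_encoding parameter are assumptions made for the example. Python's built-in "idna" codec follows IDNA 2003 and only stands in for whatever IDN encoding a scheme actually requires; userinfo handling is left out for brevity.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left untouched by pct-encoding (illustrative choice only,
# not the draft's normative character sets). "%" is kept so that
# existing pct-escapes are not encoded a second time.
SAFE_PATH = "/:@!$&'()*+,;=-._~%"
SAFE_QUERY = SAFE_PATH + "?"


def encode_host(host):
    """IDN-encode a host name (stand-in: Python's IDNA 2003 codec)."""
    if not host:
        return host
    return host.encode("idna").decode("ascii")


def iri_to_uri(iri, query_encoding="utf-8"):
    # Step 1: parse the IRI into IRI components (generic syntax).
    parts = urlsplit(iri)

    # Step 2: encode each component according to its kind.
    netloc = encode_host(parts.hostname or "")  # userinfo omitted here
    if parts.port is not None:
        netloc += ":%d" % parts.port
    path = quote(parts.path, safe=SAFE_PATH)
    # The query may use a context-dependent encoding (e.g. in web pages).
    query = quote(parts.query, safe=SAFE_QUERY, encoding=query_encoding)
    fragment = quote(parts.fragment, safe=SAFE_QUERY)

    # Step 3: reassemble a whole URI from the URI components.
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

For instance, iri_to_uri("http://www.sw.it.aoyama.ac.jp/Dürst/") yields the percent-encoded form listed under Authors' Addresses, http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/. Stopping after step 1 or step 2 corresponds to outcomes (a) and (b) respectively.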
12.1.3. Extension of Syntax | ||||
Added the tag range (U+E0000-E0FFF) to the iprivate production. Some | ||||
IRIs generated with the new syntax may fail to pass very strict | ||||
checks relying on the old syntax. But characters in this range | ||||
should be extremely infrequent anyway. | ||||
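As a rough illustration of this syntax extension, the check below tests whether a character falls in the iprivate ranges. It is a sketch only: the name is_iprivate is hypothetical, the first, third and fourth ranges are taken from the iprivate production of RFC 3987, and the second is the tag range described above.

```python
# Code point ranges of the iprivate production. The tag range is the
# addition described in the text; the others are from RFC 3987.
IPRIVATE_RANGES = [
    (0xE000, 0xF8FF),
    (0xE0000, 0xE0FFF),    # tag range added by this draft
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def is_iprivate(ch):
    """Return True if the single character ch matches iprivate."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IPRIVATE_RANGES)
```

A strict checker built on the old syntax would lack the second entry, which is why an IRI containing, say, U+E0001 may be rejected by it.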
12.1.4. More to be added | ||||
TODO: There are more main changes that need to be documented in this | ||||
section. | ||||
12.2. Change Log | ||||
Note to RFC Editor: Please completely remove this section before | ||||
publication. | ||||
12.2.1. Changes after draft-ietf-iri-3987bis-01 | ||||
Changes from draft-ietf-iri-3987bis-01 onwards are available as | ||||
changesets in the IETF tools subversion repository at http:// | ||||
trac.tools.ietf.org/wg/iri/trac/log/draft-ietf-iri-3987bis/ | ||||
draft-ietf-iri-3987bis.xml. | ||||
12.2.2. Changes from draft-duerst-iri-bis-07 to | ||||
draft-ietf-iri-3987bis-00 | ||||
Changed draft name, date, last paragraph of abstract, and titles in | ||||
change log, and added this section in moving from | ||||
draft-duerst-iri-bis-07 (personal submission) to | ||||
draft-ietf-iri-3987bis-00 (WG document). | ||||
12.2.3. Changes from -06 to -07 of draft-duerst-iri-bis | ||||
Major restructuring of the processing model, see Section 12.1. | ||||
12.3. Changes from -00 to -01 | 12.3. Changes from -00 to -01 | |||
o Removed 'mailto:' before mail addresses of authors. | o Removed 'mailto:' before mail addresses of authors. | |||
o Added "<to be done>" as right side of 'href-strip' rule. Fixed | o Added "<to be done>" as right side of 'href-strip' rule. Fixed | |||
'|' to '/' for alternatives. | '|' to '/' for alternatives. | |||
12.4. Changes from -05 to -06 of draft-duerst-iri-bis-00 | 12.4. Changes from -05 to -06 of draft-duerst-iri-bis-00 | |||
o Add HyperText Reference, change abstract, acks and references for | o Add HyperText Reference, change abstract, acks and references for | |||
skipping to change at page 52, line 9 | skipping to change at page 52, line 23 | |||
[RFC5891] Klensin, J., "Internationalized Domain Names in | [RFC5891] Klensin, J., "Internationalized Domain Names in | |||
Applications (IDNA): Protocol", RFC 5891, August 2010. | Applications (IDNA): Protocol", RFC 5891, August 2010. | |||
[STD68] Crocker, D. and P. Overell, "Augmented BNF for Syntax | [STD68] Crocker, D. and P. Overell, "Augmented BNF for Syntax | |||
Specifications: ABNF", STD 68, RFC 5234, January 2008. | Specifications: ABNF", STD 68, RFC 5234, January 2008. | |||
[UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | [UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | |||
Annex #9, March 2004, | Annex #9, March 2004, | |||
<http://www.unicode.org/reports/tr9/tr9-13.html>. | <http://www.unicode.org/reports/tr9/tr9-13.html>. | |||
[UNIV4] The Unicode Consortium, "The Unicode Standard, Version | [UNIV6] The Unicode Consortium, "The Unicode Standard, Version | |||
5.1.0, defined by: The Unicode Standard, Version 5.0 | 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, | |||
(Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0), as | ISBN 978-1-936213-01-6)", October 2010. | |||
amended by Unicode 4.1.0 | ||||
(http://www.unicode.org/versions/Unicode5.1.0/)", | ||||
April 2008. | ||||
[UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | |||
Unicode Standard Annex #15, March 2008, | Unicode Standard Annex #15, March 2008, | |||
<http://www.unicode.org/unicode/reports/tr15/ | <http://www.unicode.org/unicode/reports/tr15/ | |||
tr15-23.html>. | tr15-23.html>. | |||
13.2. Informative References | 13.2. Informative References | |||
[BidiEx] "Examples of bidirectional IRIs", | [BidiEx] "Examples of bidirectional IRIs", | |||
<http://www.w3.org/International/iri-edit/BidiExamples>. | <http://www.w3.org/International/iri-edit/BidiExamples>. | |||
skipping to change at page 53, line 45 | skipping to change at page 54, line 8 | |||
[RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, | [RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, | |||
August 1998. | August 1998. | |||
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., | [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., | |||
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext | Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext | |||
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. | Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. | |||
[RFC2640] Curtin, B., "Internationalization of the File Transfer | [RFC2640] Curtin, B., "Internationalization of the File Transfer | |||
Protocol", RFC 2640, July 1999. | Protocol", RFC 2640, July 1999. | |||
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource | ||||
Identifiers (IRIs)", RFC 3987, January 2005. | ||||
[RFC4395bis] | [RFC4395bis] | |||
Hansen, T., Hardie, T., and L. Masinter, "Guidelines and | Hansen, T., Hardie, T., and L. Masinter, "Guidelines and | |||
Registration Procedures for New URI/IRI Schemes", | Registration Procedures for New URI/IRI Schemes", | |||
draft-hansen-iri-4395bis-irireg-00 (work in progress), | draft-hansen-iri-4395bis-irireg-00 (work in progress), | |||
September 2010. | September 2010. | |||
[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | |||
Markup Languages", Unicode Technical Report #20, World | Markup Languages", Unicode Technical Report #20, World | |||
Wide Web Consortium Note, June 2003, | Wide Web Consortium Note, June 2003, | |||
<http://www.w3.org/TR/unicode-xml/>. | <http://www.w3.org/TR/unicode-xml/>. | |||
skipping to change at page 54, line 39 | skipping to change at page 55, line 5 | |||
Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes", | Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes", | |||
World Wide Web Consortium Recommendation, May 2001, | World Wide Web Consortium Recommendation, May 2001, | |||
<http://www.w3.org/TR/xmlschema-2/#anyURI>. | <http://www.w3.org/TR/xmlschema-2/#anyURI>. | |||
[XPointer] | [XPointer] | |||
Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer | Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer | |||
Framework", World Wide Web Consortium Recommendation, | Framework", World Wide Web Consortium Recommendation, | |||
March 2003, | March 2003, | |||
<http://www.w3.org/TR/xptr-framework/#escaping>. | <http://www.w3.org/TR/xptr-framework/#escaping>. | |||
Appendix A. Design Alternatives | ||||
This section briefly summarizes some design alternatives considered | ||||
earlier and the reasons why they were not chosen. | ||||
A.1. New Scheme(s) | ||||
Introducing new schemes (for example, httpi:, ftpi:,...) or a new | ||||
metascheme (e.g., i:, leading to URI/IRI prefixes such as i:http:, | ||||
i:ftp:,...) was proposed to make IRI-to-URI conversion scheme | ||||
dependent or to distinguish between percent-encodings resulting from | ||||
IRI-to-URI conversion and percent-encodings from legacy character | ||||
encodings. | ||||
New schemes are not needed to distinguish URIs from true IRIs (i.e., | ||||
IRIs that contain non-ASCII characters). The benefit of being able | ||||
to detect the origin of percent-encodings is marginal, as UTF-8 can | ||||
be detected with very high reliability. Deploying new schemes is | ||||
extremely hard, so not requiring new schemes for IRIs makes | ||||
deployment of IRIs vastly easier. Making conversion scheme dependent | ||||
is highly inadvisable and would be encouraged by separate schemes for | ||||
IRIs. Using a uniform convention for conversion from IRIs to URIs | ||||
makes IRI implementation orthogonal to the introduction of actual new | ||||
schemes. | ||||
A.2. Character Encodings Other Than UTF-8 | ||||
At an early stage, UTF-7 was considered as an alternative to UTF-8 | ||||
when IRIs are converted to URIs. UTF-7 would not have needed | ||||
percent-encoding and in most cases would have been shorter than | ||||
percent-encoded UTF-8. | ||||
Using UTF-8 avoids a double layering and overloading of the use of | ||||
the "+" character. UTF-8 is fully compatible with US-ASCII and has | ||||
therefore been recommended by the IETF, and is being used widely. | ||||
UTF-7 has never been used much and is now clearly being discouraged. | ||||
Requiring implementations to convert from UTF-8 to UTF-7 and back | ||||
would be an additional implementation burden. | ||||
A.3. New Encoding Convention | ||||
Instead of using the existing percent-encoding convention of URIs, | ||||
which is based on octets, the idea was to create a new encoding | ||||
convention; for example, to use "%u" to introduce UCS code points. | ||||
Using the existing octet-based percent-encoding mechanism does not | ||||
need an upgrade of the URI syntax and does not need corresponding | ||||
server upgrades. | ||||
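The difference between the two conventions can be illustrated with a short sketch. The function names below are hypothetical, and the "%u" form is shown only as the rejected alternative; it is not part of the URI syntax.

```python
def pct_encode_utf8(ch):
    """Existing convention: one %XX triplet per UTF-8 octet."""
    return "".join("%%%02X" % octet for octet in ch.encode("utf-8"))


def pct_u_form(ch):
    """Rejected alternative: "%u" followed by the UCS code point."""
    return "%%u%04X" % ord(ch)
```

For "ü" (U+00FC), the octet-based form is %C3%BC, which any existing URI processor already accepts, whereas the code point form would be %u00FC and would require changes to the URI syntax and to servers.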
A.4. Indicating Character Encodings in the URI/IRI | ||||
Some proposals suggested indicating the character encodings used in | ||||
an URI or IRI with some new syntactic convention in the URI itself, | ||||
similar to the "charset" parameter for e-mails and Web pages. As an | ||||
example, the label in square brackets in | ||||
"http://www.example.org/ros[iso-8859-1]é" indicated that the | ||||
following "é" had to be interpreted as iso-8859-1. | ||||
If UTF-8 is used exclusively, an upgrade to the URI syntax is not | ||||
needed. It avoids potentially multiple labels that have to be copied | ||||
correctly in all cases, even on the side of a bus or on a napkin, | ||||
leading to usability problems (and being prohibitively annoying). | ||||
Exclusively using UTF-8 also reduces transcoding errors and | ||||
confusion. | ||||
Authors' Addresses | Authors' Addresses | |||
Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever | Martin Duerst | |||
possible, for example as "Dürst" in XML and HTML) | ||||
Aoyama Gakuin University | Aoyama Gakuin University | |||
5-10-1 Fuchinobe | 5-10-1 Fuchinobe | |||
Sagamihara, Kanagawa 229-8558 | Sagamihara, Kanagawa 229-8558 | |||
Japan | Japan | |||
Phone: +81 42 759 6329 | Phone: +81 42 759 6329 | |||
Fax: +81 42 759 6495 | Fax: +81 42 759 6495 | |||
Email: duerst@it.aoyama.ac.jp | Email: duerst@it.aoyama.ac.jp | |||
URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ | URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ | |||
(Note: This is the percent-encoded form of an IRI) | ||||
Michel Suignard | Michel Suignard | |||
Unicode Consortium | Unicode Consortium | |||
P.O. Box 391476 | P.O. Box 391476 | |||
Mountain View, CA 94039-1476 | Mountain View, CA 94039-1476 | |||
U.S.A. | U.S.A. | |||
Phone: +1-650-693-3921 | Phone: +1-650-693-3921 | |||
Email: michel@unicode.org | Email: michel@unicode.org | |||
URI: http://www.suignard.com | URI: http://www.suignard.com | |||
End of changes. 28 change blocks. | ||||
146 lines changed or deleted | 102 lines changed or added | |||