draft-ietf-iri-3987bis-12.txt | draft-ietf-iri-3987bis-13.txt | |||
---|---|---|---|---|
Internationalized Resource Identifiers M. Duerst | Internationalized Resource Identifiers M. Duerst | |||
(iri) Aoyama Gakuin University | (iri) Aoyama Gakuin University | |||
Internet-Draft M. Suignard | Internet-Draft M. Suignard | |||
Obsoletes: 3987 (if approved) Unicode Consortium | Obsoletes: 3987 (if approved) Unicode Consortium | |||
Intended status: Standards Track L. Masinter | Intended status: Standards Track L. Masinter | |||
Expires: January 17, 2013 Adobe | Expires: April 23, 2013 Adobe | |||
July 16, 2012 | October 20, 2012 | |||
Internationalized Resource Identifiers (IRIs) | Internationalized Resource Identifiers (IRIs) | |||
draft-ietf-iri-3987bis-12 | draft-ietf-iri-3987bis-13 | |||
Abstract | Abstract | |||
This document defines the Internationalized Resource Identifier (IRI) | This document defines the Internationalized Resource Identifier (IRI) | |||
protocol element, as an extension of the Uniform Resource Identifier | protocol element, as an extension of the Uniform Resource Identifier | |||
(URI). An IRI is a sequence of characters from the Universal | (URI). An IRI is a sequence of characters from the Universal | |||
Character Set (Unicode/ISO 10646). Grammar and processing rules are | Character Set (Unicode/ISO 10646). Grammar and processing rules are | |||
given for IRIs and related syntactic forms. | given for IRIs and related syntactic forms. | |||
Defining IRI as a new protocol element (rather than updating or | Defining IRI as a new protocol element (rather than updating or | |||
skipping to change at page 1, line 49 | skipping to change at page 1, line 49 | |||
public-iri@w3.org, archives at | public-iri@w3.org, archives at | |||
http://lists.w3.org/archives/public/public-iri/. For a list of open | http://lists.w3.org/archives/public/public-iri/. For a list of open | |||
issues, please see the issue tracker of the WG at | issues, please see the issue tracker of the WG at | |||
http://trac.tools.ietf.org/wg/iri/trac/report/1. For a list of | http://trac.tools.ietf.org/wg/iri/trac/report/1. For a list of | |||
individual edits, please see the change history at | individual edits, please see the change history at | |||
http://trac.tools.ietf.org/wg/iri/trac/log/draft-ietf-iri-3987bis. | http://trac.tools.ietf.org/wg/iri/trac/log/draft-ietf-iri-3987bis. | |||
This document is available in (line-printer ready) plaintext ASCII | This document is available in (line-printer ready) plaintext ASCII | |||
and PDF. It is also available in HTML from | and PDF. It is also available in HTML from | |||
http://www.sw.it.aoyama.ac.jp/2012/pub/ | http://www.sw.it.aoyama.ac.jp/2012/pub/ | |||
draft-ietf-iri-3987bis-12.html, and in UTF-8 plaintext from http:// | draft-ietf-iri-3987bis-13.html, and in UTF-8 plaintext from http:// | |||
www.sw.it.aoyama.ac.jp/2012/pub/draft-ietf-iri-3987bis-12.utf8.txt. | www.sw.it.aoyama.ac.jp/2012/pub/draft-ietf-iri-3987bis-13.utf8.txt. | |||
While all these versions are identical in their technical content, | While all these versions are identical in their technical content, | |||
the HTML, PDF, and UTF-8 plaintext versions show non-ASCII | the HTML, PDF, and UTF-8 plaintext versions show non-ASCII | |||
characters directly. This often makes it easier to understand | characters directly. This often makes it easier to understand | |||
examples, and readers are therefore advised to consult these versions | examples, and readers are therefore advised to consult these versions | |||
in preference or as a supplement to the ASCII version. | in preference or as a supplement to the ASCII version. | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
skipping to change at page 2, line 26 | skipping to change at page 2, line 26 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 17, 2013. | This Internet-Draft will expire on April 23, 2013. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 3, line 17 | skipping to change at page 3, line 17 | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
1.1. Overview and Motivation . . . . . . . . . . . . . . . . . 5 | 1.1. Overview and Motivation . . . . . . . . . . . . . . . . . 5 | |||
1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 | |||
1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 7 | 1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 8 | 1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 9 | 2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 9 | |||
2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | 2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | |||
3. Processing IRIs and related protocol elements . . . . . . . . 13 | 3. Processing IRIs and related protocol elements . . . . . . . . 12 | |||
3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 13 | 3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 13 | |||
3.2. Parse the IRI into IRI components . . . . . . . . . . . . 13 | 3.2. Parse the IRI into IRI components . . . . . . . . . . . . 13 | |||
3.3. General percent-encoding of IRI components . . . . . . . 14 | 3.3. General percent-encoding of IRI components . . . . . . . 13 | |||
3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 14 | 3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 14 | |||
3.4.1. Mapping using Percent-Encoding . . . . . . . . . . . . 14 | 3.4.1. Mapping using Percent-Encoding . . . . . . . . . . . . 14 | |||
3.4.2. Mapping using Punycode . . . . . . . . . . . . . . . . 15 | 3.4.2. Mapping using Punycode . . . . . . . . . . . . . . . . 14 | |||
3.4.3. Additional Considerations . . . . . . . . . . . . . . 15 | 3.4.3. Additional Considerations . . . . . . . . . . . . . . 15 | |||
3.5. Mapping query components . . . . . . . . . . . . . . . . 16 | 3.5. Mapping query components . . . . . . . . . . . . . . . . 16 | |||
3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 16 | 3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 16 | |||
4. Converting URIs to IRIs . . . . . . . . . . . . . . . . . . . 16 | 4. Converting URIs to IRIs . . . . . . . . . . . . . . . . . . . 16 | |||
4.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . 18 | 4.1. Limitations . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
4.2. Conversion . . . . . . . . . . . . . . . . . . . . . . . 17 | ||||
4.3. Examples . . . . . . . . . . . . . . . . . . . . . . . . 18 | ||||
5. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 5. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
5.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 19 | 5.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 19 | |||
5.2. Software Interfaces and Protocols . . . . . . . . . . . . 20 | 5.2. Software Interfaces and Protocols . . . . . . . . . . . . 20 | |||
5.3. Format of URIs and IRIs in Documents and Protocols . . . 21 | 5.3. Format of URIs and IRIs in Documents and Protocols . . . 20 | |||
5.4. Use of UTF-8 for Encoding Original Characters . . . . . . 21 | 5.4. Use of UTF-8 for Encoding Original Characters . . . . . . 21 | |||
5.5. Relative IRI References . . . . . . . . . . . . . . . . . 23 | 5.5. Relative IRI References . . . . . . . . . . . . . . . . . 22 | |||
6. Legacy Extended IRIs (LEIRIs) . . . . . . . . . . . . . . . . 23 | 6. Legacy Extended IRIs (LEIRIs) . . . . . . . . . . . . . . . . 23 | |||
6.1. Legacy Extended IRI Syntax . . . . . . . . . . . . . . . 23 | 6.1. Legacy Extended IRI Syntax . . . . . . . . . . . . . . . 23 | |||
6.2. Conversion of Legacy Extended IRIs to IRIs . . . . . . . 24 | 6.2. Conversion of Legacy Extended IRIs to IRIs . . . . . . . 23 | |||
6.3. Characters Allowed in Legacy Extended IRIs but not in | 6.3. Characters Allowed in Legacy Extended IRIs but not in | |||
IRIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | IRIs . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
7. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 26 | 7. Processing of URIs/IRIs/URLs by Web Browsers . . . . . . . . . 25 | |||
7.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 26 | 8. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 26 | |||
7.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 26 | 8.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 26 | |||
7.3. URI/IRI Transfer between Applications . . . . . . . . . . 27 | 8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 26 | |||
7.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 27 | 8.3. URI/IRI Transfer between Applications . . . . . . . . . . 27 | |||
7.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 28 | 8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 27 | |||
7.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 29 | 8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 28 | |||
7.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 29 | 8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 29 | |||
7.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 30 | 8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 29 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | 8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 30 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 32 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | |||
11. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 33 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
11.1. Split out Bidi, processing guidelines, comparison | 12. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 32 | |||
sections . . . . . . . . . . . . . . . . . . . . . . . . 33 | 12.1. Split out Bidi, processing guidelines, comparison | |||
11.2. Major restructuring of IRI processing model . . . . . . . 33 | sections . . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
11.2.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 33 | 12.2. Major restructuring of IRI processing model . . . . . . . 32 | |||
11.2.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 34 | 12.2.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
11.2.3. Extension of Syntax . . . . . . . . . . . . . . . . . 34 | 12.2.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
11.2.4. More to be added . . . . . . . . . . . . . . . . . . . 34 | 12.2.3. Extension of Syntax . . . . . . . . . . . . . . . . . 33 | |||
11.3. Change Log . . . . . . . . . . . . . . . . . . . . . . . 34 | 12.2.4. More to be added . . . . . . . . . . . . . . . . . . . 34 | |||
11.3.1. Changes after draft-ietf-iri-3987bis-01 . . . . . . . 34 | 12.3. Change Log . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
11.3.2. Changes from draft-duerst-iri-bis-07 to | 12.3.1. Changes after draft-ietf-iri-3987bis-01 . . . . . . . 34 | |||
12.3.2. Changes from draft-duerst-iri-bis-07 to | ||||
draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . 34 | draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . 34 | |||
11.3.3. Changes from -06 to -07 of draft-duerst-iri-bis . . . 34 | 12.3.3. Changes from -06 to -07 of draft-duerst-iri-bis . . . 34 | |||
11.4. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 35 | 12.4. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 34 | |||
11.5. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 35 | 12.5. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 34 | |||
11.6. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 35 | 12.6. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 35 | |||
11.7. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 35 | 12.7. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 35 | |||
11.8. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 35 | 12.8. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 35 | |||
11.9. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 35 | 12.9. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 35 | |||
11.10. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 36 | 12.10. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 35 | |||
11.11. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 36 | 12.11. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 35 | |||
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 36 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
12.1. Normative References . . . . . . . . . . . . . . . . . . 36 | 13.1. Normative References . . . . . . . . . . . . . . . . . . 36 | |||
12.2. Informative References . . . . . . . . . . . . . . . . . 37 | 13.2. Informative References . . . . . . . . . . . . . . . . . 37 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 39 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
1. Introduction | 1. Introduction | |||
1.1. Overview and Motivation | 1.1. Overview and Motivation | |||
A Uniform Resource Identifier (URI) is defined in [RFC3986] as a | A Uniform Resource Identifier (URI) is defined in [RFC3986] as a | |||
sequence of characters chosen from a limited subset of the repertoire | sequence of characters chosen from a limited subset of the repertoire | |||
of US-ASCII [ASCII] characters. | of US-ASCII [ASCII] characters. | |||
skipping to change at page 6, line 6 | skipping to change at page 6, line 6 | |||
character sequences that can result in the same presentation. | character sequences that can result in the same presentation. | |||
This document defines the protocol element called Internationalized | This document defines the protocol element called Internationalized | |||
Resource Identifier (IRI), which allows applications of URIs to be | Resource Identifier (IRI), which allows applications of URIs to be | |||
extended to use resource identifiers that have a much wider | extended to use resource identifiers that have a much wider | |||
repertoire of characters. It also provides corresponding | repertoire of characters. It also provides corresponding | |||
"internationalized" versions of other constructs from [RFC3986], such | "internationalized" versions of other constructs from [RFC3986], such | |||
as URI references. The syntax of IRIs is defined in Section 2. | as URI references. The syntax of IRIs is defined in Section 2. | |||
Within this document, Section 5 discusses the use of IRIs in | Within this document, Section 5 discusses the use of IRIs in | |||
different situations. Section 7 gives additional informative | different situations. Section 8 gives additional informative | |||
guidelines. Section 9 discusses IRI-specific security | guidelines. Section 10 discusses IRI-specific security | |||
considerations. | considerations. | |||
This specification is part of a collection of specifications intended | This specification is part of a collection of specifications intended | |||
to replace [RFC3987]. [Bidi] discusses the special case of | to replace [RFC3987]. [Bidi] discusses the special case of | |||
bidirectional IRIs, IRIs using characters from scripts written right- | bidirectional IRIs, IRIs using characters from scripts written right- | |||
to-left. [Equivalence] gives guidelines for applications wishing to | to-left. [Equivalence] gives guidelines for applications wishing to | |||
determine if two IRIs are equivalent, as well as defining some | determine if two IRIs are equivalent, as well as defining some | |||
equivalence methods. [RFC4395bis] updates the URI scheme | equivalence methods. [RFC4395bis] updates the URI scheme | |||
registration guidelines and procedures to note that every URI scheme | registration guidelines and procedures to note that every URI scheme | |||
is also automatically an IRI scheme and to allow scheme definitions | is also automatically an IRI scheme and to allow scheme definitions | |||
skipping to change at page 7, line 15 | skipping to change at page 7, line 15 | |||
[RFC4395bis]. For example, this is the practice for IMAP URLs | [RFC4395bis]. For example, this is the practice for IMAP URLs | |||
[RFC2192], POP URLs [RFC2384] and the URN syntax [RFC2141]). Note | [RFC2192], POP URLs [RFC2384] and the URN syntax [RFC2141]). Note | |||
that use of percent-encoding may also be restricted in some | that use of percent-encoding may also be restricted in some | |||
situations, for example, URI schemes that disallow percent- | situations, for example, URI schemes that disallow percent- | |||
encoding might still be used with a fragment identifier which is | encoding might still be used with a fragment identifier which is | |||
percent-encoded (e.g., [XPointer]). See Section 5.4 for further | percent-encoded (e.g., [XPointer]). See Section 5.4 for further | |||
discussion. | discussion. | |||
1.3. Definitions | 1.3. Definitions | |||
The following definitions are used in this document; they follow the | Various terms used in this document are defined in [RFC6365] and | |||
terms in [RFC2130], [RFC2277], and [ISO10646]. | [RFC3986]. In addition, we define the following terms for use in | |||
this document. | ||||
character: A member of a set of elements used for the organization, | ||||
control, or representation of data. For example, "LATIN CAPITAL | ||||
LETTER A" names a character. | ||||
octet: An ordered sequence of eight bits considered as a unit. | octet: An ordered sequence of eight bits considered as a unit. | |||
character repertoire: A set of characters (set in the mathematical | ||||
sense). | ||||
sequence of characters: A sequence of characters (one after | sequence of characters: A sequence of characters (one after | |||
another). | another). | |||
sequence of octets: A sequence of octets (one after another). | sequence of octets: A sequence of octets (one after another). | |||
character encoding: A method of representing a sequence of | character encoding: A method of representing a sequence of | |||
characters as a sequence of octets (maybe with variants). Also, a | characters as a sequence of octets (maybe with variants). Also, a | |||
method of (unambiguously) converting a sequence of octets into a | method of (unambiguously) converting a sequence of octets into a | |||
sequence of characters. | sequence of characters. | |||
skipping to change at page 8, line 9 | skipping to change at page 8, line 5 | |||
relative. However, the "IRI" that results from such a reference | relative. However, the "IRI" that results from such a reference | |||
only includes absolute IRIs; any relative IRI references are | only includes absolute IRIs; any relative IRI references are | |||
resolved to their absolute form. Note that in [RFC2396] URIs did | resolved to their absolute form. Note that in [RFC2396] URIs did | |||
not include fragment identifiers, but in [RFC3986] fragment | not include fragment identifiers, but in [RFC3986] fragment | |||
identifiers are part of URIs. | identifiers are part of URIs. | |||
LEIRI (Legacy Extended IRI): This term is used in various XML | LEIRI (Legacy Extended IRI): This term is used in various XML | |||
specifications to refer to strings that, although not valid IRIs, | specifications to refer to strings that, although not valid IRIs, | |||
are acceptable input to the processing rules in Section 6.2. | are acceptable input to the processing rules in Section 6.2. | |||
running text: Human text (paragraphs, sentences, phrases) with | ||||
syntax according to orthographic conventions of a natural | ||||
language, as opposed to syntax defined for ease of processing by | ||||
machines (e.g., markup, programming languages). | ||||
protocol element: Any portion of a message that affects processing | protocol element: Any portion of a message that affects processing | |||
of that message by the protocol in question. | of that message by the protocol in question. | |||
create (a URI or IRI): With respect to URIs and IRIs, the term is | create (a URI or IRI): With respect to URIs and IRIs, the term is | |||
used for the initial creation. This may be the initial creation | used for the initial creation. This may be the initial creation | |||
of a resource with a certain identifier, or the initial exposition | of a resource with a certain identifier, or the initial exposition | |||
of a resource under a particular identifier. | of a resource under a particular identifier. | |||
generate (a URI or IRI): With respect to URIs and IRIs, the term is | generate (a URI or IRI): With respect to URIs and IRIs, the term is | |||
used when the identifier is generated by derivation from other | used when the identifier is generated by derivation from other | |||
skipping to change at page 15, line 12 | skipping to change at page 14, line 49 | |||
[RFC3986], which does not mandate a particular registered name lookup | [RFC3986], which does not mandate a particular registered name lookup | |||
technology. For further background, see [RFC6055] and [Gettys]. | technology. For further background, see [RFC6055] and [Gettys]. | |||
3.4.2. Mapping using Punycode | 3.4.2. Mapping using Punycode | |||
In situations where it is certain that <ireg-name> is intended to be | In situations where it is certain that <ireg-name> is intended to be | |||
used as a domain name to be processed by Domain Name Lookup (as per | used as a domain name to be processed by Domain Name Lookup (as per | |||
[RFC5891]), an alternative method MAY be used, converting <ireg-name> | [RFC5891]), an alternative method MAY be used, converting <ireg-name> | |||
as follows: | as follows: | |||
If there are any sequences of <pct-encoded>, and their corresponding | If there is any percent-encoding, and the corresponding octets all | |||
octets all represent valid UTF-8 octet sequences, then convert these | represent valid UTF-8 octet sequences, then convert these back to | |||
back to Unicode character sequences. (If any <pct-encoded> sequences | Unicode character sequences. (If any percent-encodings are not valid | |||
are not valid UTF-8 octet sequences, then leave the entire field as | UTF-8 octet sequences, then leave the entire field as is without any | |||
is without any change, since punycode encoding would not succeed.) | change, since punycode encoding would not succeed.) | |||
Replace the ireg-name part of the IRI by the part converted using the | Replace the ireg-name part of the IRI by the part converted using the | |||
Domain Name Lookup procedure (Subsections 5.3 to 5.5) of [RFC5891] | Domain Name Lookup procedure (Subsections 5.3 to 5.5) of [RFC5891] | |||
on each dot-separated label, and by using U+002E (FULL STOP) as a | on each dot-separated label, and by using U+002E (FULL STOP) as a | |||
label separator. This procedure may fail, but this would mean that | label separator. This procedure may fail, but this would mean that | |||
the IRI cannot be resolved. In such cases, if the domain name | the IRI cannot be resolved. In such cases, if the domain name | |||
conversion fails, then the entire IRI conversion fails. Processors | conversion fails, then the entire IRI conversion fails. Processors | |||
that have no mechanism for signalling a failure MAY instead | that have no mechanism for signalling a failure MAY instead | |||
substitute an otherwise invalid host name, although such processing | substitute an otherwise invalid host name, although such processing | |||
SHOULD be avoided. | SHOULD be avoided. | |||
skipping to change at page 15, line 38 | skipping to change at page 15, line 27 | |||
For example, the IRI | For example, the IRI | |||
"http://résumé.example.org" | "http://résumé.example.org" | |||
is converted to | is converted to | |||
"http://xn--rsum-bpad.example.org". | "http://xn--rsum-bpad.example.org". | |||
This conversion for ireg-name will be better able to deal with legacy | This conversion for ireg-name will be better able to deal with legacy | |||
infrastructure that cannot handle percent-encoding in domain names. | infrastructure that cannot handle percent-encoding in domain names. | |||
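The Punycode mapping described above can be sketched in Python. This is a minimal illustration and not part of either draft revision: the helper name map_ireg_name is hypothetical, and Python's built-in "punycode" codec performs only the raw Punycode transformation, so the validation and mapping steps required by the Domain Name Lookup procedure of [RFC5891] are assumed to have been applied elsewhere.

```python
from urllib.parse import unquote_to_bytes


def map_ireg_name(ireg_name: str) -> str:
    """Hypothetical helper: map an ireg-name to an ASCII-only host name.

    Assumption: the name has already passed the IDNA checks of RFC 5891;
    this sketch only undoes percent-encoding and applies raw Punycode.
    """
    if "%" in ireg_name:
        try:
            # Percent-encoded octets are converted back to characters only
            # if the resulting octets form valid UTF-8.
            ireg_name = unquote_to_bytes(ireg_name).decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: leave the entire field unchanged.
            return ireg_name

    labels = []
    for label in ireg_name.split("."):  # U+002E is the label separator
        if label.isascii():
            labels.append(label)
        else:
            # Non-ASCII label: Punycode-encode it and add the ACE prefix.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)
```

Under these assumptions, applying map_ireg_name to the host of the example IRI above converts only the first label to its "xn--" form and leaves "example.org" untouched. A host such as "r%E9sum%E9.example.org", whose percent-encoded octets are not valid UTF-8, is returned unchanged.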
3.4.3. Additional Considerations | 3.4.3. Additional Considerations | |||
Note: Domain Names may appear in parts of an IRI other than the | Domain Names can appear in parts of an IRI other than the ireg-name | |||
ireg-name part. It is the responsibility of scheme-specific | part. It is the responsibility of scheme-specific implementations | |||
implementations (if the Internationalized Domain Name is part of | (if the Internationalized Domain Name is part of the scheme syntax) | |||
the scheme syntax) or of server-side implementations (if the | or of server-side implementations (if the Internationalized Domain | |||
Internationalized Domain Name is part of 'iquery') to apply the | Name is part of 'iquery') to apply the necessary conversions at the | |||
necessary conversions at the appropriate point. Example: Trying | appropriate point. Example: Trying to validate the Web page at | |||
to validate the Web page at | http://résumé.example.org would lead to an IRI of | |||
http://résumé.example.org would lead to an IRI of | http://validator.w3.org/check?uri=http%3A%2F%2Frésumé | |||
http://validator.w3.org/check?uri=http%3A%2F%2Frésumé | .example.org, which would convert to a URI of | |||
.example.org, which would convert to a URI of | http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9. | |||
http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9. | example.org. The server-side implementation is responsible for | |||
example.org. The server-side implementation is responsible for | making the necessary conversions to be able to retrieve the Web page. | |||
making the necessary conversions to be able to retrieve the Web | ||||
page. | ||||
Note: In this process, characters allowed in URI references and | In this process, characters allowed in URI references and existing | |||
existing percent-encoded sequences are not encoded further. (This | percent-encoded sequences are not encoded further. (This mapping is | |||
mapping is similar to, but different from, the encoding applied | similar to, but different from, the encoding applied when arbitrary | |||
when arbitrary content is included in some part of a URI.) For | content is included in some part of a URI.) For example, an IRI of | |||
example, an IRI of | "http://www.example.org/red%09rosé#red" (in XML notation) is | |||
"http://www.example.org/red%09rosé#red" (in XML notation) is | converted to | |||
converted to | "http://www.example.org/red%09ros%C3%A9#red", not to something like | |||
"http://www.example.org/red%09ros%C3%A9#red", not to something | "http%3A%2F%2Fwww.example.org%2Fred%2509ros%C3%A9%23red". | |||
like | ||||
"http%3A%2F%2Fwww.example.org%2Fred%2509ros%C3%A9%23red". | ||||
3.5. Mapping query components | 3.5. Mapping query components | |||
For compatibility with existing deployed HTTP infrastructure, the | For compatibility with existing deployed HTTP infrastructure, the | |||
following special case applies for the schemes "http" and "https" | following special case applies for the schemes "http" and "https" | |||
when an IRI is found in a document whose charset is not based on UCS | when an IRI is found in a document whose charset is not based on UCS | |||
(e.g., not UTF-8 or UTF-16). In such a case, the "query" component | (e.g., not UTF-8 or UTF-16). In such a case, the "query" component | |||
of an IRI is mapped into a URI by using the document charset rather | of an IRI is mapped into a URI by using the document charset rather | |||
than UTF-8 as the binary representation before pct-encoding. This | than UTF-8 as the binary representation before percent-encoding. | |||
mapping is not applied for any other scheme or component. | This mapping is not applied for any other schemes or components. | |||
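The difference between the default mapping and the query special case can be sketched in Python. This is a simplified illustration and not part of either draft revision: the helper name encode_component and its charset parameter are hypothetical, only non-ASCII characters are percent-encoded (ASCII characters and existing percent-encodings pass through unchanged), and ISO-8859-1 is an assumed document charset chosen for the example.

```python
def encode_component(component: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: percent-encode the non-ASCII characters of one
    IRI component, using `charset` as the binary representation."""
    out = []
    for ch in component:
        if ord(ch) < 0x80:
            # ASCII characters, including "%" of existing percent-encoded
            # sequences, are not encoded further.
            out.append(ch)
        else:
            # Raises UnicodeEncodeError if the charset cannot represent ch.
            out.extend(f"%{octet:02X}" for octet in ch.encode(charset))
    return "".join(out)


# Default case: UTF-8 is the binary representation.
print(encode_component("/red%09ros\u00e9"))  # /red%09ros%C3%A9

# Special case for the query of an "http" or "https" IRI found in a
# document whose charset is (by assumption here) ISO-8859-1.
print(encode_component("q=r\u00e9sum\u00e9", charset="iso-8859-1"))  # q=r%E9sum%E9
```

The same query encoded with the default UTF-8 representation would instead be "q=r%C3%A9sum%C3%A9", which is why a server needs to know which mapping a client applied.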
3.6. Mapping IRIs to URIs | 3.6. Mapping IRIs to URIs | |||
The mapping from an IRI to URI is accomplished by applying the | The mapping from an IRI to URI is accomplished by applying the | |||
mapping above (from IRI to URI components) and then reassembling a | mapping above (from IRI to URI components) and then reassembling a | |||
URI from the parsed URI components using the original punctuation | URI from the parsed URI components using the original punctuation | |||
that delimited the IRI components. | that delimited the IRI components. | |||
4. Converting URIs to IRIs | 4. Converting URIs to IRIs | |||
In some situations, for presentation and further processing, it is | In some situations, for presentation and further processing, it is | |||
desirable to convert a URI into an equivalent IRI without unnecessary | desirable to convert a URI into an equivalent IRI without unnecessary | |||
percent-encoding. Of course, every URI is already an IRI in its own | percent-encoding. Of course, every URI is already an IRI in its own | |||
right without any conversion. This section gives one possible | right without any conversion. This section gives one possible | |||
procedure for URI to IRI mapping. | procedure for converting a URI to an IRI. | |||
4.1. Limitations | ||||
The conversion described in this section, if given a valid URI, will | The conversion described in this section, if given a valid URI, will | |||
result in an IRI that maps back to the URI used as an input for the | result in an IRI that maps back to the URI used as an input for the | |||
conversion (except for potential case differences in percent-encoding | conversion (except for potential case differences in percent-encoding | |||
and for potential percent-encoded unreserved characters). However, | and for potential percent-encoded unreserved characters). However, | |||
the IRI resulting from this conversion may differ from the original | the IRI resulting from this conversion may differ from the original | |||
IRI (if there ever was one). | IRI (if there ever was one). | |||
URI-to-IRI conversion removes percent-encodings, but not all percent- | URI-to-IRI conversion removes percent-encodings, but not all percent- | |||
encodings can be eliminated. There are several reasons for this: | encodings can be eliminated. There are several reasons for this: | |||
skipping to change at page 17, line 15 | skipping to change at page 17, line 4 | |||
1. Some percent-encodings are necessary to distinguish percent- | 1. Some percent-encodings are necessary to distinguish percent- | |||
encoded and unencoded uses of reserved characters. | encoded and unencoded uses of reserved characters. | |||
2. Some percent-encodings cannot be interpreted as sequences of UTF-8 | 2. Some percent-encodings cannot be interpreted as sequences of UTF-8 | |||
octets. | octets. | |||
(Note: The octet patterns of UTF-8 are highly regular. Therefore, | (Note: The octet patterns of UTF-8 are highly regular. Therefore, | |||
there is a very high probability, but no guarantee, that percent- | there is a very high probability, but no guarantee, that percent- | |||
encodings that can be interpreted as sequences of UTF-8 octets | encodings that can be interpreted as sequences of UTF-8 octets | |||
actually originated from UTF-8. For a detailed discussion, see | actually originated from UTF-8. For a detailed discussion, see | |||
[Duerst97].) | [Duerst97].) | |||
3. The conversion may result in a character that is not appropriate | 3. The conversion may result in a character that is not appropriate | |||
in an IRI. See Section 2.2, and Section 5.1 for further details. | in an IRI. See Section 2.2, and Section 5.1 for further details. | |||
4. IRI to URI conversion has different rules for dealing with domain | 4. As described in Section 3.5, IRI to URI conversion may work | |||
names and query parameters. | somewhat differently for query components. | |||
4.2. Conversion | ||||
Conversion from a URI to an IRI MAY be done by using the following | Conversion from a URI to an IRI MAY be done by using the following | |||
steps: | steps: | |||
1. Represent the URI as a sequence of octets in US-ASCII. | 1. Represent the URI as a sequence of octets in US-ASCII. | |||
2. Convert all percent-encodings ("%" followed by two hexadecimal | 2. Convert all percent-encodings ("%" followed by two hexadecimal | |||
digits) to the corresponding octets, except those corresponding to | digits) to the corresponding octets, except those corresponding to | |||
"%", characters in "reserved", and characters in US-ASCII not | "%", characters in "reserved", and characters in US-ASCII not | |||
allowed in URIs. | allowed in URIs. | |||
3. Re-percent-encode any octet produced in step 2 that is not part of | 3. Re-percent-encode any octet produced in step 2 that is not part of | |||
a strictly legal UTF-8 octet sequence. | a strictly legal UTF-8 octet sequence. | |||
4. Re-percent-encode all octets produced in step 3 that in UTF-8 | 4. Re-percent-encode all octets produced in step 3 that in UTF-8 | |||
represent characters that are not appropriate according to | represent characters that are not appropriate according to | |||
Section 2.2 and Section 5.1. | Section 2.2 and Section 5.1. | |||
5. Interpret the resulting octet sequence as a sequence of characters | 5. Optionally, re-percent-encode octets in the query component if the | |||
scheme is one of those mentioned in Section 3.5. | ||||
6. Interpret the resulting octet sequence as a sequence of characters | ||||
encoded in UTF-8. | encoded in UTF-8. | |||
6. URIs known to contain domain names in the reg-name component | 7. URIs known to contain domain names in the reg-name component | |||
SHOULD convert punycode-encoded domain name labels to the | SHOULD convert punycode-encoded domain name labels to the | |||
corresponding characters using the ToUnicode procedure. | corresponding characters using the ToUnicode procedure. | |||
This procedure will convert as many percent-encoded characters as | This procedure will convert as many percent-encoded characters as | |||
possible to characters in an IRI. Because there are some choices | possible to characters in an IRI. Because there are some choices in | |||
when step 4 is applied (see Section 5.1), results may vary. | steps 4 (see also Section 5.1) and 5, results may vary. | |||
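The core of the conversion steps listed above (representing the URI as octets, decoding percent-encodings, re-encoding illegal UTF-8 octets and inappropriate characters, and interpreting the result as UTF-8) can be sketched as follows. This is a minimal illustration, not a normative implementation. The function names are hypothetical, and is_appropriate() uses Python's str.isprintable() purely as a stand-in for the rules of Section 2.2 and Section 5.1. The optional query step and the ToUnicode step for domain names are omitted.

```python
import re

# Octets of the "unreserved" characters of RFC 3986.
UNRESERVED = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
              b"0123456789-._~")


def is_appropriate(ch: str) -> bool:
    # Assumption: stand-in predicate, not the actual rules of the draft.
    return ch.isprintable()


def uri_to_iri(uri: str) -> str:
    octets = uri.encode("ascii")  # represent the URI as US-ASCII octets

    def decode_escape(match):
        value = int(match.group(1), 16)
        # Decode non-ASCII octets and unreserved characters; keep "%",
        # reserved characters, and ASCII not allowed in URIs encoded.
        if value >= 0x80 or value in UNRESERVED:
            return bytes([value])
        return match.group(0)

    raw = re.sub(rb"%([0-9A-Fa-f]{2})", decode_escape, octets)

    # Octets that are not part of a legal UTF-8 sequence come back as
    # lone surrogates U+DC80..U+DCFF and are re-percent-encoded below.
    text = raw.decode("utf-8", errors="surrogateescape")
    pieces = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            pieces.append("%%%02X" % (cp - 0xDC00))
        elif cp >= 0x80 and not is_appropriate(ch):
            pieces.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        else:
            pieces.append(ch)
    return "".join(pieces)


print(uri_to_iri("http://www.example.org/D%C3%BCrst"))
print(uri_to_iri("http://www.example.org/D%FCrst"))
```

Re-encoded octets are written with uppercase hexadecimal digits, which is one source of the case differences in percent-encoding mentioned earlier in this section.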
Conversions from URIs to IRIs MUST NOT use any character encoding | Conversions from URIs to IRIs MUST NOT use any character encoding | |||
other than UTF-8 in steps 3 and 4, even if it might be possible to | other than UTF-8 in steps 3 and 4, even if it might be possible to | |||
guess from the context that another character encoding than UTF-8 was | guess from the context that another character encoding than UTF-8 was | |||
used in the URI. For example, the URI | used in the URI. For example, the URI | |||
"http://www.example.org/r%E9sum%E9.html" might with some guessing be | "http://www.example.org/r%E9sum%E9.html" might with some guessing be | |||
interpreted to contain two e-acute characters encoded as iso-8859-1. | interpreted to contain two e-acute characters encoded as iso-8859-1. | |||
It must not be converted to an IRI containing these e-acute | It must not be converted to an IRI containing these e-acute | |||
characters. Otherwise, in the future the IRI will be mapped to | characters. Otherwise, in the future the IRI will be mapped to | |||
"http://www.example.org/r%C3%A9sum%C3%A9.html", which is a different | "http://www.example.org/r%C3%A9sum%C3%A9.html", which is a different | |||
URI from "http://www.example.org/r%E9sum%E9.html". | URI from "http://www.example.org/r%E9sum%E9.html". | |||
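The mismatch described in this example can be reproduced with a few lines of code. The sketch below is illustrative only; the helper name remap_after_guess is hypothetical, and it deliberately performs the conversion that this section prohibits in order to show the consequence.

```python
from urllib.parse import quote, unquote


def remap_after_guess(path: str, guessed_encoding: str) -> str:
    """Decode a URI path with a guessed encoding, then map it back via UTF-8."""
    guessed_iri_path = unquote(path, encoding=guessed_encoding)
    return quote(guessed_iri_path)  # quote() encodes as UTF-8 by default


original = "/r%E9sum%E9.html"
remapped = remap_after_guess(original, "iso-8859-1")
print(original)
print(remapped)
print(original == remapped)
```

The remapped path uses the two-octet UTF-8 form of e-acute, so it no longer identifies the same URI as the original.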
4.1. Examples | 4.3. Examples | |||
This section shows various examples of converting URIs to IRIs. Each | This section shows various examples of converting URIs to IRIs. Each | |||
example shows the result after each of the steps 1 through 6 is | example shows the result after each of the steps 1 through 6 is | |||
applied. XML Notation is used for the final result. Octets are | applied. XML Notation is used for the final result. Octets are | |||
denoted by "<" followed by two hexadecimal digits followed by ">". | denoted by "<" followed by two hexadecimal digits followed by ">". | |||
The following example contains the sequence "%C3%BC", which is a | The following example contains the sequence "%C3%BC", which is a | |||
strictly legal UTF-8 sequence, and which is converted into the actual | strictly legal UTF-8 sequence, and which is converted into the actual | |||
character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (also known as | character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (also known as | |||
u-umlaut). | u-umlaut). | |||
skipping to change at page 20, line 36 | skipping to change at page 20, line 28 | |||
parse and process the '/' separately. | parse and process the '/' separately. | |||
d. The ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D) | d. The ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D) | |||
are invisible in most contexts, but are crucial in some very | are invisible in most contexts, but are crucial in some very | |||
limited contexts. Appendix A of [RFC5892] contains contextual | limited contexts. Appendix A of [RFC5892] contains contextual | |||
restrictions for these and some other characters. The use of | restrictions for these and some other characters. The use of | |||
these characters is strongly discouraged except in the relevant | these characters is strongly discouraged except in the relevant | |||
contexts. | contexts. | |||
Additional information is available from [UNIXML]. [UNIXML] is | Additional information is available from [UNIXML]. [UNIXML] is | |||
written in the context of running text rather than in that of | written in the context of general purpose text rather than in that of | |||
identifiers. Nevertheless, it discusses many of the categories of | identifiers. Nevertheless, it discusses many of the categories of | |||
characters not appropriate for IRIs. | characters not appropriate for IRIs. | |||
5.2. Software Interfaces and Protocols | 5.2. Software Interfaces and Protocols | |||
Although an IRI is defined as a sequence of characters, software | Although an IRI is defined as a sequence of characters, software | |||
interfaces for URIs typically function on sequences of octets or | interfaces for URIs typically function on sequences of octets or | |||
other kinds of code units. Thus, software interfaces and protocols | other kinds of code units. Thus, software interfaces and protocols | |||
MUST define which character encoding is used. | MUST define which character encoding is used. | |||
skipping to change at page 21, line 39 | skipping to change at page 21, line 30 | |||
This section discusses details and gives examples for point c) in | This section discusses details and gives examples for point c) in | |||
Section 1.2. To be able to use IRIs, the URI corresponding to the | Section 1.2. To be able to use IRIs, the URI corresponding to the | |||
IRI in question has to encode original characters into octets by | IRI in question has to encode original characters into octets by | |||
using UTF-8. This can be specified for all URIs of a URI scheme or | using UTF-8. This can be specified for all URIs of a URI scheme or | |||
can apply to individual URIs for schemes that do not specify how to | can apply to individual URIs for schemes that do not specify how to | |||
encode original characters. It can apply to the whole URI, or only | encode original characters. It can apply to the whole URI, or only | |||
to some part. For background information on encoding characters into | to some part. For background information on encoding characters into | |||
URIs, see also Section 2.5 of [RFC3986]. | URIs, see also Section 2.5 of [RFC3986]. | |||
For new URI schemes, using UTF-8 is recommended in [RFC4395bis]. | For new URI/IRI schemes, using UTF-8 is recommended in [RFC4395bis]. | |||
Examples where UTF-8 is already used are the URN syntax [RFC2141], | Examples where UTF-8 is already used are the URN syntax [RFC2141], | |||
IMAP URLs [RFC2192], POP URLs [RFC2384], XMPP URLs [RFC5122], and the | IMAP URLs [RFC2192], POP URLs [RFC2384], XMPP URLs [RFC5122], and the | |||
'mailto:' scheme [RFC6068]. On the other hand, because the HTTP URI | 'mailto:' scheme [RFC6068]. On the other hand, because the HTTP URI | |||
scheme does not specify how to encode original characters, only some | scheme does not specify how to encode original characters, only some | |||
HTTP URLs can have corresponding but different IRIs. | HTTP URLs can have corresponding but different IRIs. | |||
For example, for a document with a URI of | For example, for a document with a URI of | |||
"http://www.example.org/r%C3%A9sum%C3%A9.html", it is possible to | "http://www.example.org/r%C3%A9sum%C3%A9.html", it is possible to | |||
construct a corresponding IRI (in XML notation, see Section 1.4): | construct a corresponding IRI (in XML notation, see Section 1.4): | |||
"http://www.example.org/résumé.html" ("é" stands for | "http://www.example.org/résumé.html" ("é" stands for | |||
skipping to change at page 22, line 15 | skipping to change at page 22, line 7 | |||
IRI, as the percent-encoding is not based on UTF-8. | IRI, as the percent-encoding is not based on UTF-8. | |||
For most URI schemes, there is no need to upgrade their scheme | For most URI schemes, there is no need to upgrade their scheme | |||
definition in order for them to work with IRIs. The main case where | definition in order for them to work with IRIs. The main case where | |||
upgrading makes sense is when a scheme definition, or a particular | upgrading makes sense is when a scheme definition, or a particular | |||
component of a scheme, is strictly limited to the use of US-ASCII | component of a scheme, is strictly limited to the use of US-ASCII | |||
characters with no provision to include non-ASCII characters/octets | characters with no provision to include non-ASCII characters/octets | |||
via percent-encoding, or if a scheme definition currently uses highly | via percent-encoding, or if a scheme definition currently uses highly | |||
scheme-specific provisions for the encoding of non-ASCII characters. | scheme-specific provisions for the encoding of non-ASCII characters. | |||
This specification updates the IANA registry of URI schemes to note | ||||
their applicability to IRIs, see Section 8. All IRIs use URI | ||||
schemes, and all URIs with URI schemes can be used as IRIs, even | ||||
though in some cases only by using URIs directly as IRIs, without any | ||||
conversion. | ||||
Scheme definitions can impose restrictions on the syntax of scheme- | Scheme definitions can impose restrictions on the syntax of scheme- | |||
specific URIs; i.e., URIs that are admissible under the generic URI | specific URIs; i.e., URIs that are admissible under the generic URI | |||
syntax [RFC3986] may not be admissible due to narrower syntactic | syntax [RFC3986] may not be admissible due to narrower syntactic | |||
constraints imposed by a URI scheme specification. URI scheme | constraints imposed by a URI scheme specification. URI scheme | |||
definitions cannot broaden the syntactic restrictions of the generic | definitions cannot broaden the syntactic restrictions of the generic | |||
URI syntax; otherwise, it would be possible to generate URIs that | URI syntax; otherwise, it would be possible to generate URIs that | |||
satisfied the scheme-specific syntactic constraints without | satisfied the scheme-specific syntactic constraints without | |||
satisfying the syntactic constraints of the generic URI syntax. | satisfying the syntactic constraints of the generic URI syntax. | |||
However, additional syntactic constraints imposed by URI scheme | However, additional syntactic constraints imposed by URI scheme | |||
specifications are applicable to IRI, as the corresponding URI | specifications are applicable to IRI, as the corresponding URI | |||
skipping to change at page 26, line 5 | skipping to change at page 25, line 37 | |||
For reference, we here also list the code points and code units not | For reference, we here also list the code points and code units not | |||
even allowed in Legacy Extended IRIs: | even allowed in Legacy Extended IRIs: | |||
Surrogate code units (D800-DFFF): These do not represent Unicode | Surrogate code units (D800-DFFF): These do not represent Unicode | |||
codepoints. | codepoints. | |||
Non-characters (U+FFFE-FFFF): These are not allowed in XML or | Non-characters (U+FFFE-FFFF): These are not allowed in XML or | |||
LEIRIs. | LEIRIs. | |||
7. URI/IRI Processing Guidelines (Informative) | 7. Processing of URIs/IRIs/URLs by Web Browsers | |||
For legacy reasons, many web browsers exhibit some irregularities | ||||
when processing URIs, IRIs, and URLs. This is being documented in | ||||
[HTMLURL], in the hope that it will lead to more uniform | ||||
implementations of these irregularities across web browsers. | ||||
As far as currently known, creators of content for web browsers (such | ||||
as HTML) can use all URIs without problems. They can also use all | ||||
IRIs without problems except that they should be aware of the fact | ||||
that query parts for HTTP/HTTPS IRIs should be percent-escaped. | ||||
8. URI/IRI Processing Guidelines (Informative) | ||||
This informative section provides guidelines for supporting IRIs in | This informative section provides guidelines for supporting IRIs in | |||
the same software components and operations that currently process | the same software components and operations that currently process | |||
URIs: Software interfaces that handle URIs, software that allows | URIs: Software interfaces that handle URIs, software that allows | |||
users to enter URIs, software that creates or generates URIs, | users to enter URIs, software that creates or generates URIs, | |||
software that displays URIs, formats and protocols that transport | software that displays URIs, formats and protocols that transport | |||
URIs, and software that interprets URIs. These may all require | URIs, and software that interprets URIs. These may all require | |||
modification before functioning properly with IRIs. The | modification before functioning properly with IRIs. The | |||
considerations in this section also apply to URI references and IRI | considerations in this section also apply to URI references and IRI | |||
references. | references. | |||
7.1. URI/IRI Software Interfaces | 8.1. URI/IRI Software Interfaces | |||
Software interfaces that handle URIs, such as URI-handling APIs and | Software interfaces that handle URIs, such as URI-handling APIs and | |||
protocols transferring URIs, need interfaces and protocol elements | protocols transferring URIs, need interfaces and protocol elements | |||
that are designed to carry IRIs. | that are designed to carry IRIs. | |||
In case the current handling in an API or protocol is based on US- | In case the current handling in an API or protocol is based on US- | |||
ASCII, UTF-8 is recommended as the character encoding for IRIs, as it | ASCII, UTF-8 is recommended as the character encoding for IRIs, as it | |||
is compatible with US-ASCII, is in accordance with the | is compatible with US-ASCII, is in accordance with the | |||
recommendations of [RFC2277], and makes converting to URIs easy. In | recommendations of [RFC2277], and makes converting to URIs easy. In | |||
any case, the API or protocol definition must clearly define the | any case, the API or protocol definition must clearly define the | |||
character encoding to be used. | character encoding to be used. | |||
The transfer from URI-only to IRI-capable components requires no | The transfer from URI-only to IRI-capable components requires no | |||
mapping, although the conversion described in Section 4 above may be | mapping, although the conversion described in Section 4 above may be | |||
performed. It is preferable not to perform this inverse conversion | performed. It is preferable not to perform this inverse conversion | |||
unless it is certain this can be done correctly. | unless it is certain this can be done correctly. | |||
7.2. URI/IRI Entry | 8.2. URI/IRI Entry | |||
Some components allow users to enter URIs into the system by typing | Some components allow users to enter URIs into the system by typing | |||
or dictation, for example. This software must be updated to allow | or dictation, for example. This software must be updated to allow | |||
for IRI entry. | for IRI entry. | |||
A person viewing a visual presentation of an IRI (as a sequence of | A person viewing a visual presentation of an IRI (as a sequence of | |||
glyphs, in some order, in some visual display) will use an entry | glyphs, in some order, in some visual display) will use an entry | |||
method for characters in the user's language to input the IRI. | method for characters in the user's language to input the IRI. | |||
Depending on the script and the input method used, this may be a more | Depending on the script and the input method used, this may be a more | |||
or less complicated process. | or less complicated process. | |||
skipping to change at page 27, line 24 | skipping to change at page 27, line 24 | |||
viewing an IRI as mapped to a URI. This will help users when some of | viewing an IRI as mapped to a URI. This will help users when some of | |||
the software they use does not yet accept IRIs. | the software they use does not yet accept IRIs. | |||
An IRI input component interfacing to components that handle URIs, | An IRI input component interfacing to components that handle URIs, | |||
but not IRIs, must map the IRI to a URI before passing it to these | but not IRIs, must map the IRI to a URI before passing it to these | |||
components. | components. | |||
For the input of IRIs with right-to-left characters, please see | For the input of IRIs with right-to-left characters, please see | |||
[Bidi]. | [Bidi]. | |||
7.3. URI/IRI Transfer between Applications | 8.3. URI/IRI Transfer between Applications | |||
Many applications (for example, mail user agents) try to detect URIs | Many applications (for example, mail user agents) try to detect URIs | |||
appearing in plain text. For this, they use some heuristics based on | appearing in plain text. For this, they use some heuristics based on | |||
URI syntax. They then allow the user to click on such URIs and | URI syntax. They then allow the user to click on such URIs and | |||
retrieve the corresponding resource in an appropriate (usually | retrieve the corresponding resource in an appropriate (usually | |||
scheme-dependent) application. | scheme-dependent) application. | |||
Such applications would need to be upgraded, in order to use the IRI | Such applications would need to be upgraded, in order to use the IRI | |||
syntax as a base for heuristics. In particular, a non-ASCII | syntax as a base for heuristics. In particular, a non-ASCII | |||
character should not be taken as the indication of the end of an IRI. | character should not be taken as the indication of the end of an IRI. | |||
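As an illustration of this point, the following sketch contrasts an ASCII-only detection heuristic with one that lets a candidate identifier continue across non-ASCII characters. Both regular expressions are simplified assumptions made for the example, not the heuristics of any particular mail user agent, and neither validates IRI syntax.

```python
import re

# Assumption: a candidate ends at the first character outside printable ASCII.
ASCII_ONLY = re.compile(r'[a-zA-Z][a-zA-Z0-9+.-]*://[!-~]+')

# Assumption: a candidate ends only at whitespace or common text delimiters.
IRI_AWARE = re.compile(r'[a-zA-Z][a-zA-Z0-9+.-]*://[^\s<>"]+')

text = "See http://www.example.org/r\u00e9sum\u00e9.html for details."

print(ASCII_ONLY.search(text).group(0))
print(IRI_AWARE.search(text).group(0))
```

The ASCII-only pattern cuts the identifier off at the first e-acute, whereas the second pattern returns it whole.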
skipping to change at page 27, line 48 | skipping to change at page 27, line 48 | |||
by the system-wide IRI invocation mechanism, or to a URI (according | by the system-wide IRI invocation mechanism, or to a URI (according | |||
to Section 3.6) if the system-wide invocation mechanism only accepts | to Section 3.6) if the system-wide invocation mechanism only accepts | |||
URIs. | URIs. | |||
The clipboard is another frequently used way to transfer URIs and | The clipboard is another frequently used way to transfer URIs and | |||
IRIs from one application to another. On most platforms, the | IRIs from one application to another. On most platforms, the | |||
clipboard is able to store and transfer text in many languages and | clipboard is able to store and transfer text in many languages and | |||
scripts. Correctly used, the clipboard transfers characters, not | scripts. Correctly used, the clipboard transfers characters, not | |||
octets, which will do the right thing with IRIs. | octets, which will do the right thing with IRIs. | |||
7.4. URI/IRI Generation | 8.4. URI/IRI Generation | |||
Systems that offer resources through the Internet, where those | Systems that offer resources through the Internet, where those | |||
resources have logical names, sometimes automatically generate URIs | resources have logical names, sometimes automatically generate URIs | |||
for the resources they offer. For example, some HTTP servers can | for the resources they offer. For example, some HTTP servers can | |||
generate a directory listing for a file directory and then respond to | generate a directory listing for a file directory and then respond to | |||
the generated URIs with the files. | the generated URIs with the files. | |||
Many legacy character encodings are in use in various file systems. | Many legacy character encodings are in use in various file systems. | |||
Many currently deployed systems do not transform the local character | Many currently deployed systems do not transform the local character | |||
representation of the underlying system before generating URIs. | representation of the underlying system before generating URIs. | |||
skipping to change at page 28, line 22 | skipping to change at page 28, line 22 | |||
identifiers should make the appropriate transformations. For | identifiers should make the appropriate transformations. For | |||
example, if a file system contains a file named "résumé.html", a | example, if a file system contains a file named "résumé.html", a | |||
server should expose this as "r%C3%A9sum%C3%A9.html" in | server should expose this as "r%C3%A9sum%C3%A9.html" in | |||
a URI, which allows use of "résumé.html" in an IRI, even if | a URI, which allows use of "résumé.html" in an IRI, even if | |||
locally the file name is kept in a character encoding other than | locally the file name is kept in a character encoding other than | |||
UTF-8. | UTF-8. | |||
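A server-side transformation of this kind might look like the following sketch. The function name and the idea of passing the file system encoding as a parameter are assumptions made for illustration; real servers obtain file names and their encoding in platform-specific ways.

```python
from urllib.parse import quote


def filename_to_uri_segment(raw_name: bytes, fs_encoding: str) -> str:
    """Expose a locally encoded file name as a UTF-8 based URI path segment."""
    name = raw_name.decode(fs_encoding)  # local octets -> characters
    return quote(name, safe="")          # characters -> UTF-8 -> percent-encoding


local_name = "r\u00e9sum\u00e9.html".encode("iso-8859-1")
print(filename_to_uri_segment(local_name, "iso-8859-1"))
```

Decoding to characters first is what makes the result independent of the local encoding: the same file name stored as ISO-8859-1 or as UTF-8 yields the same URI segment.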
This recommendation particularly applies to HTTP servers. For FTP | This recommendation particularly applies to HTTP servers. For FTP | |||
servers, similar considerations apply; see [RFC2640]. | servers, similar considerations apply; see [RFC2640]. | |||
7.5. URI/IRI Selection | 8.5. URI/IRI Selection | |||
In some cases, resource owners and publishers have control over the | In some cases, resource owners and publishers have control over the | |||
IRIs used to identify their resources. This control is mostly | IRIs used to identify their resources. This control is mostly | |||
exercised by controlling the resource names, such as file names, | exercised by controlling the resource names, such as file names, | |||
directly. | directly. | |||
In these cases, it is recommended to avoid choosing IRIs that are | In these cases, it is recommended to avoid choosing IRIs that are | |||
easily confused. For example, for US-ASCII, the lower-case ell ("l") | easily confused. For example, for US-ASCII, the lower-case ell ("l") | |||
is easily confused with the digit one ("1"), and the upper-case oh | is easily confused with the digit one ("1"), and the upper-case oh | |||
("O") is easily confused with the digit zero ("0"). Publishers | ("O") is easily confused with the digit zero ("0"). Publishers | |||
skipping to change at page 29, line 17 | skipping to change at page 29, line 17 | |||
the Latin "A", the Greek "Alpha", and the Cyrillic "A". To avoid | the Latin "A", the Greek "Alpha", and the Cyrillic "A". To avoid | |||
such cases, IRIs should only be created where all the characters in a | such cases, IRIs should only be created where all the characters in a | |||
single component are used together in a given language. This usually | single component are used together in a given language. This usually | |||
means that all of these characters will be from the same script, but | means that all of these characters will be from the same script, but | |||
there are languages that mix characters from different scripts (such | there are languages that mix characters from different scripts (such | |||
as Japanese). This is similar to the heuristics used to distinguish | as Japanese). This is similar to the heuristics used to distinguish | |||
between letters and numbers in the examples above. Also, for Latin, | between letters and numbers in the examples above. Also, for Latin, | |||
Greek, and Cyrillic, using lowercase letters results in fewer | Greek, and Cyrillic, using lowercase letters results in fewer | |||
ambiguities than using uppercase letters would. | ambiguities than using uppercase letters would. | |||
7.6. Display of URIs/IRIs | 8.6. Display of URIs/IRIs | |||
In situations where the rendering software is not expected to display | In situations where the rendering software is not expected to display | |||
non-ASCII parts of the IRI correctly using the available layout and | non-ASCII parts of the IRI correctly using the available layout and | |||
font resources, these parts should be percent-encoded before being | font resources, these parts should be percent-encoded before being | |||
displayed. | displayed. | |||
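One way to realize this fallback is sketched below. The function name and the can_render callback are hypothetical; how a renderer actually decides whether it can display a character depends on the available fonts and layout engine and is outside the scope of this example.

```python
from urllib.parse import quote


def displayable(iri: str, can_render) -> str:
    """Percent-encode the non-ASCII characters that cannot be rendered."""
    return "".join(
        ch if ord(ch) < 0x80 or can_render(ch) else quote(ch, safe="")
        for ch in iri
    )


iri = "http://www.example.org/r\u00e9sum\u00e9.html"
print(displayable(iri, lambda ch: False))  # nothing beyond ASCII can be shown
print(displayable(iri, lambda ch: True))   # everything can be shown
```

ASCII characters, including existing percent-encodings, are passed through unchanged, so the displayed string still maps to the same URI.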
For display of Bidi IRIs, please see [Bidi]. | For display of Bidi IRIs, please see [Bidi]. | |||
7.7. Interpretation of URIs and IRIs | 8.7. Interpretation of URIs and IRIs | |||
Software that interprets IRIs as the names of local resources should | Software that interprets IRIs as the names of local resources should | |||
accept IRIs in multiple forms and convert and match them with the | accept IRIs in multiple forms and convert and match them with the | |||
appropriate local resource names. | appropriate local resource names. | |||
First, multiple representations include both IRIs in the native | First, multiple representations include both IRIs in the native | |||
character encoding of the protocol and also their URI counterparts. | character encoding of the protocol and also their URI counterparts. | |||
Second, it may include URIs constructed based on character encodings | Second, it may include URIs constructed based on character encodings | |||
other than UTF-8. These URIs may be produced by user agents that do | other than UTF-8. These URIs may be produced by user agents that do | |||
skipping to change at page 30, line 11 | skipping to change at page 30, line 11 | |||
beyond the US-ASCII repertoire, this may, for example, include | beyond the US-ASCII repertoire, this may, for example, include | |||
ignoring the accents on received IRIs or resource names. Please note | ignoring the accents on received IRIs or resource names. Please note | |||
that such mappings, including case mappings, are language dependent. | that such mappings, including case mappings, are language dependent. | |||
It can be difficult to identify a resource unambiguously if too many | It can be difficult to identify a resource unambiguously if too many | |||
mappings are taken into consideration. However, percent-encoded and | mappings are taken into consideration. However, percent-encoded and | |||
not percent-encoded parts of IRIs can always be clearly | not percent-encoded parts of IRIs can always be clearly | |||
distinguished. Also, the regularity of UTF-8 (see [Duerst97]) makes | distinguished. Also, the regularity of UTF-8 (see [Duerst97]) makes | |||
the potential for collisions lower than it may seem at first. | the potential for collisions lower than it may seem at first. | |||
7.8. Upgrading Strategy | 8.8. Upgrading Strategy | |||
Where this recommendation places further constraints on software for | Where this recommendation places further constraints on software for | |||
which many instances are already deployed, it is important to | which many instances are already deployed, it is important to | |||
introduce upgrades carefully and to be aware of the various | introduce upgrades carefully and to be aware of the various | |||
interdependencies. | interdependencies. | |||
If IRIs cannot be interpreted correctly, they should not be created, | If IRIs cannot be interpreted correctly, they should not be created, | |||
generated, or transported. This suggests that upgrading URI | generated, or transported. This suggests that upgrading URI | |||
interpreting software to accept IRIs should have highest priority. | interpreting software to accept IRIs should have highest priority. | |||
skipping to change at page 31, line 9 | skipping to change at page 31, line 9 | |||
encoding of the form page, the returned query URIs will use UTF-8 as | encoding of the form page, the returned query URIs will use UTF-8 as | |||
the character encoding (unless the user, for whatever reason, changes | the character encoding (unless the user, for whatever reason, changes | |||
the character encoding) and will therefore be compatible with IRIs. | the character encoding) and will therefore be compatible with IRIs. | |||
These recommendations, when taken together, will allow for the | These recommendations, when taken together, will allow for the | |||
extension from URIs to IRIs in order to handle characters other than | extension from URIs to IRIs in order to handle characters other than | |||
US-ASCII while minimizing interoperability problems. For | US-ASCII while minimizing interoperability problems. For | |||
considerations regarding the upgrade of URI scheme definitions, see | considerations regarding the upgrade of URI scheme definitions, see | |||
Section 5.4. | Section 5.4. | |||
8. IANA Considerations | 9. IANA Considerations | |||
NOTE: THIS SECTION NEEDS REVIEW AGAINST HAPPIANA WORK. | ||||
RFC Editor and IANA note: Please Replace RFC XXXX with the number of | ||||
this document when it issues as an RFC, and RFC YYYY with the number | ||||
of the RFC issued for draft-ietf-iri-rfc3987bis. | ||||
IANA maintains a registry of "URI schemes". This document attempts | ||||
to make it clear from the registry that a "URI scheme" also serves as an | ||||
"IRI scheme", and makes several changes to the registry. | ||||
The description of the registry should be changed: "RFC 4395 defined | ||||
an IANA-maintained registry of URI Schemes. RFC XXXX updates this | ||||
registry to make it clear that the registered values also serve as | ||||
IRI schemes, as defined in RFC YYYY." | ||||
The registry includes schemes marked as Permanent or Provisional. | This specification does not affect IANA. For details on how to | |||
Previously, this was accomplished by having two sections, "Permanent" | define a URI/IRI scheme and register it with IANA, see [RFC4395bis]. | |||
and "Provisional". However, in order to allow other status | ||||
("Historical", and possibly a Proposed status for proposals which | ||||
have been received but not accepted), the registry should be changed | ||||
so that the status is indicated in a separate "Status" column, whose | ||||
values may be "Permanent", "Provisional" or "Historical". Changes in | ||||
status as well as updates to the entire registration may be | ||||
accomplished by requests and expert review. | ||||
9. Security Considerations | 10. Security Considerations | |||
The security considerations discussed in [RFC3986] also apply to | The security considerations discussed in [RFC3986] also apply to | |||
IRIs. In addition, the following issues require particular care for | IRIs. In addition, the following issues require particular care for | |||
IRIs. | IRIs. | |||
Incorrect encoding or decoding can lead to security problems. For | Incorrect encoding or decoding can lead to security problems. For | |||
example, some UTF-8 decoders do not check against overlong byte | example, some UTF-8 decoders do not check against overlong byte | |||
sequences. See [UTR36] Section 3 for details. | sequences. See [UTR36] Section 3 for details. | |||
There are serious difficulties with relying on a human to verify that | There are serious difficulties with relying on a human to verify that | |||
skipping to change at page 32, line 27 | skipping to change at page 32, line 5 | |||
normalization expectations, use of percent-encoding with various | normalization expectations, use of percent-encoding with various | |||
legacy encodings, and bidirectionality issues. See also [Bidi]. | legacy encodings, and bidirectionality issues. See also [Bidi]. | |||
Confusion can occur in various IRI components, such as the domain | Confusion can occur in various IRI components, such as the domain | |||
name part or the path part, or between IRI components. For | name part or the path part, or between IRI components. For | |||
considerations specific to the domain name part, see [RFC5890]. For | considerations specific to the domain name part, see [RFC5890]. For | |||
considerations specific to particular protocols or schemes, see the | considerations specific to particular protocols or schemes, see the | |||
security sections of the relevant specifications and registration | security sections of the relevant specifications and registration | |||
templates. Administrators of sites that allow independent users to | templates. Administrators of sites that allow independent users to | |||
create resources in the same sub area have to be careful. Details | create resources in the same sub area have to be careful. Details | |||
are discussed in Section 7.5. | are discussed in Section 8.5. | |||
The characters additionally allowed in Legacy Extended IRIs introduce | The characters additionally allowed in Legacy Extended IRIs introduce | |||
additional security issues. For details, see Section 6.3. | additional security issues. For details, see Section 6.3. | |||
10. Acknowledgements | 11. Acknowledgements | |||
This document was derived from [RFC3987]; the acknowledgments from | This document was derived from [RFC3987]; the acknowledgments from | |||
that specification still apply. | that specification still apply. | |||
In addition, this document was influenced by contributions from (in | In addition, this document was influenced by contributions from (in | |||
no particular order) Norman Walsh, Richard Tobin, Henry S. Thompson, | no particular order) Norman Walsh, Richard Tobin, Henry S. Thompson, | |||
John Cowan, Paul Grosso, the XML Core Working Group of the W3C, Chris | John Cowan, Paul Grosso, the XML Core Working Group of the W3C, Chris | |||
Lilley, Bjoern Hoehrmann, Felix Sasaki, Jeremy Carroll, Frank | Lilley, Bjoern Hoehrmann, Felix Sasaki, Jeremy Carroll, Frank | |||
Ellermann, Michael Everson, Cary Karp, Matitiahu Allouche, Richard | Ellermann, Michael Everson, Cary Karp, Matitiahu Allouche, Richard | |||
Ishida, Addison Phillips, Jonathan Rosenne, Najib Tounsi, Debbie | Ishida, Addison Phillips, Jonathan Rosenne, Najib Tounsi, Debbie | |||
skipping to change at page 33, line 7 | skipping to change at page 32, line 32 | |||
Roessler, Lisa Dusseault, Julian Reschke, Giovanni Campagna, Anne van | Roessler, Lisa Dusseault, Julian Reschke, Giovanni Campagna, Anne van | |||
Kesteren, Mark Nottingham, Erik van der Poel, Marcin Hanclik, Marcos | Kesteren, Mark Nottingham, Erik van der Poel, Marcin Hanclik, Marcos | |||
Caceres, Roy Fielding, Greg Wilkins, Pieter Hintjens, Daniel R. | Caceres, Roy Fielding, Greg Wilkins, Pieter Hintjens, Daniel R. | |||
Tobias, Marko Martin, Maciej Stachowiak, Wil Tan, Yui Naruse, | Tobias, Marko Martin, Maciej Stachowiak, Wil Tan, Yui Naruse, | |||
Michael A. Puls II, Dave Thaler, Tom Petch, John Klensin, Shawn | Michael A. Puls II, Dave Thaler, Tom Petch, John Klensin, Shawn | |||
Steele, Peter Saint-Andre, Geoffrey Sneddon, Chris Weber, Alex | Steele, Peter Saint-Andre, Geoffrey Sneddon, Chris Weber, Alex | |||
Melnikov, Slim Amamou, S. Moonesamy, Tim Berners-Lee, Yaron Goland, | Melnikov, Slim Amamou, S. Moonesamy, Tim Berners-Lee, Yaron Goland, | |||
Sam Ruby, Adam Barth, Abdulrahman I. ALGhadir, Aharon Lanin, Thomas | Sam Ruby, Adam Barth, Abdulrahman I. ALGhadir, Aharon Lanin, Thomas | |||
Milo, Murray Sargent, Marc Blanchet, and Mykyta Yevstifeyev. | Milo, Murray Sargent, Marc Blanchet, and Mykyta Yevstifeyev. | |||
11. Main Changes Since RFC 3987 | Anne van Kesteren is also gratefully acknowledged for his ongoing | |||
work documenting browser behavior with respect to URIs/IRIs/URLs (see | ||||
[HTMLURL]). | ||||
12. Main Changes Since RFC 3987 | ||||
This section describes the main changes since [RFC3987]. | This section describes the main changes since [RFC3987]. | |||
11.1. Split out Bidi, processing guidelines, comparison sections | 12.1. Split out Bidi, processing guidelines, comparison sections | |||
Move some components (comparison, bidi, processing) into separate | Move some components (comparison, bidi, processing) into separate | |||
documents. | documents. | |||
11.2. Major restructuring of IRI processing model | 12.2. Major restructuring of IRI processing model | |||
Major restructuring of IRI processing model to make scheme-specific | Major restructuring of IRI processing model to make scheme-specific | |||
translation necessary to handle IDNA requirements and for consistency | translation necessary to handle IDNA requirements and for consistency | |||
with web implementations. | with web implementations. | |||
Starting with IRI, you want one of: | Starting with IRI, you want one of: | |||
a IRI components (IRI parsed into UTF8 pieces) | a IRI components (IRI parsed into UTF8 pieces) | |||
b URI components (URI parsed into ASCII pieces, encoded correctly) | b URI components (URI parsed into ASCII pieces, encoded correctly) | |||
c whole URI (for passing on to some other system that wants whole | c whole URI (for passing on to some other system that wants whole | |||
URIs) | URIs) | |||
11.2.1. OLD WAY | 12.2.1. OLD WAY | |||
1. Pct-encoding on the whole thing to a URI. (c1) If you want a | 1. Percent-encoding on the whole thing to a URI. (c1) If you want a | |||
(maybe broken) whole URI, you might stop here. | (maybe broken) whole URI, you might stop here. | |||
2. Parsing the URI into URI components. (b1) If you want (maybe | 2. Parsing the URI into URI components. (b1) If you want (maybe | |||
broken) URI components, stop here. | broken) URI components, stop here. | |||
3. Decode the components (undoing the pct-encoding). (a) if you want | 3. Decode the components (undoing the percent-encoding). (a) if you | |||
IRI components, stop here. | want IRI components, stop here. | |||
4. reencode: Either using a different encoding for some components (for | 4. reencode: Either using a different encoding for some components (for | |||
domain names, and query components in web pages, which depends on | domain names, and query components in web pages, which depends on | |||
the component, scheme and context), and otherwise using pct- | the component, scheme and context), and otherwise using percent- | |||
encoding. (b2) if you want (good) URI components, stop here. | encoding. (b2) if you want (good) URI components, stop here. | |||
5. reassemble the reencoded components. (c2) if you want a (*good*) | 5. reassemble the reencoded components. (c2) if you want a (*good*) | |||
whole URI stop here. | whole URI stop here. | |||
11.2.2. NEW WAY | 12.2.2. NEW WAY | |||
1. Parse the IRI into IRI components using the generic syntax. (a) | 1. Parse the IRI into IRI components using the generic syntax. (a) | |||
if you want IRI components, stop here. | if you want IRI components, stop here. | |||
2. Encode each component, using pct-encoding, IDN encoding, or | 2. Encode each component, using percent-encoding, IDN encoding, or | |||
special query part encoding depending on the component scheme or | special query part encoding depending on the component scheme or | |||
context. (b) If you want URI components, stop here. | context. (b) If you want URI components, stop here. | |||
3. Reassemble a whole URI from the URI components. (c) if you want a | 3. Reassemble a whole URI from the URI components. (c) if you want a | |||
whole URI stop here. | whole URI stop here. | |||
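The three steps of the new processing model can be sketched as follows. This is an illustration under stated assumptions, not the procedure defined by the draft: it is written in Python, the function names (parse_iri, encode_components, reassemble, iri_to_uri) are hypothetical, the host is encoded with Python's built-in "idna" codec (which implements IDNA 2003, not the IDNA 2008 rules of [RFC5890]), and the query is always percent-encoded as UTF-8, ignoring the context-dependent query encoding mentioned in step 2. IP literals in brackets are passed through unchanged.

```python
import re
from urllib.parse import quote

# Generic-syntax regular expression adapted from RFC 3986, Appendix B.
_GENERIC = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$",
    re.DOTALL,
)
_SUB_DELIMS = "!$&'()*+,;="


def parse_iri(iri):
    """Step 1: split into (scheme, authority, path, query, fragment)."""
    return _GENERIC.match(iri).groups()


def encode_components(parts):
    """Step 2: encode each component according to its role."""
    scheme, authority, path, query, fragment = parts
    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        if hostport.startswith("["):
            # Assumption: bracketed IP literals are left untouched.
            host, colon, port = hostport, "", ""
        else:
            host, colon, port = hostport.partition(":")
            # Assumption: IDNA 2003 via the standard-library codec.
            host = host.encode("idna").decode("ascii")
        userinfo = quote(userinfo, safe=_SUB_DELIMS + ":%")
        authority = userinfo + at + host + colon + port
    # "%" is kept safe so existing percent-escapes are not re-encoded.
    path = quote(path, safe=_SUB_DELIMS + ":@/%")
    if query is not None:
        query = quote(query, safe=_SUB_DELIMS + ":@/?%")
    if fragment is not None:
        fragment = quote(fragment, safe=_SUB_DELIMS + ":@/?%")
    return scheme, authority, path, query, fragment


def reassemble(parts):
    """Step 3: recompose a whole URI from the encoded components."""
    scheme, authority, path, query, fragment = parts
    out = ""
    if scheme is not None:
        out += scheme + ":"
    if authority is not None:
        out += "//" + authority
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    return out


def iri_to_uri(iri):
    """Run all three steps; callers may stop after any one of them."""
    return reassemble(encode_components(parse_iri(iri)))
```

Because each step is a separate function, a caller that only needs IRI components (a) or URI components (b) can stop after parse_iri or encode_components respectively, which mirrors the stopping points listed above.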
11.2.3. Extension of Syntax | 12.2.3. Extension of Syntax | |||
Added the tag range (U+E0000-E0FFF) to the iprivate production. Some | Added the tag range (U+E0000-E0FFF) to the iprivate production. Some | |||
IRIs generated with the new syntax may fail to pass very strict | IRIs generated with the new syntax may fail to pass very strict | |||
checks relying on the old syntax. But characters in this range | checks relying on the old syntax. But characters in this range | |||
should be extremely infrequent anyway. | should be extremely infrequent anyway. | |||
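The effect of the syntax extension on strict checkers can be shown with a small sketch. It assumes Python; the names OLD_IPRIVATE, NEW_IPRIVATE and in_ranges are hypothetical. The old ranges are those of the iprivate production in [RFC3987], and the new set adds the tag range exactly as stated above.

```python
# iprivate ranges as defined in RFC 3987.
OLD_IPRIVATE = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

# The same ranges plus the tag range (U+E0000-E0FFF) added by this draft.
NEW_IPRIVATE = OLD_IPRIVATE + ((0xE0000, 0xE0FFF),)


def in_ranges(ch, ranges):
    """Return True if the code point of ch falls in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)
```

A tag character such as U+E0041 is accepted against NEW_IPRIVATE but rejected against OLD_IPRIVATE, which is the kind of strict old-syntax check that may fail.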
11.2.4. More to be added | 12.2.4. More to be added | |||
TODO: There are more main changes that need to be documented in this | TODO: There are more main changes that need to be documented in this | |||
section. | section. | |||
11.3. Change Log | 12.3. Change Log | |||
Note to RFC Editor: Please completely remove this section before | Note to RFC Editor: Please completely remove this section before | |||
publication. | publication. | |||
11.3.1. Changes after draft-ietf-iri-3987bis-01 | 12.3.1. Changes after draft-ietf-iri-3987bis-01 | |||
Changes from draft-ietf-iri-3987bis-01 onwards are available as | Changes from draft-ietf-iri-3987bis-01 onwards are available as | |||
changesets in the IETF tools subversion repository at http:// | changesets in the IETF tools subversion repository at http:// | |||
trac.tools.ietf.org/wg/iri/trac/log/draft-ietf-iri-3987bis/ | trac.tools.ietf.org/wg/iri/trac/log/draft-ietf-iri-3987bis/ | |||
draft-ietf-iri-3987bis.xml. | draft-ietf-iri-3987bis.xml. | |||
11.3.2. Changes from draft-duerst-iri-bis-07 to | 12.3.2. Changes from draft-duerst-iri-bis-07 to | |||
draft-ietf-iri-3987bis-00 | draft-ietf-iri-3987bis-00 | |||
Changed draft name, date, last paragraph of abstract, and titles in | Changed draft name, date, last paragraph of abstract, and titles in | |||
change log, and added this section in moving from | change log, and added this section in moving from | |||
draft-duerst-iri-bis-07 (personal submission) to | draft-duerst-iri-bis-07 (personal submission) to | |||
draft-ietf-iri-3987bis-00 (WG document). | draft-ietf-iri-3987bis-00 (WG document). | |||
11.3.3. Changes from -06 to -07 of draft-duerst-iri-bis | 12.3.3. Changes from -06 to -07 of draft-duerst-iri-bis | |||
Major restructuring of the processing model, see Section 11.2. | Major restructuring of the processing model, see Section 12.2. | |||
11.4. Changes from -00 to -01 | 12.4. Changes from -00 to -01 | |||
o Removed 'mailto:' before mail addresses of authors. | o Removed 'mailto:' before mail addresses of authors. | |||
o Added "<to be done>" as right side of 'href-strip' rule. Fixed | o Added "<to be done>" as right side of 'href-strip' rule. Fixed | |||
'|' to '/' for alternatives. | '|' to '/' for alternatives. | |||
11.5. Changes from -05 to -06 of draft-duerst-iri-bis-00 | 12.5. Changes from -05 to -06 of draft-duerst-iri-bis-00 | |||
o Add HyperText Reference, change abstract, acks and references for | o Add HyperText Reference, change abstract, acks and references for | |||
it | it | |||
o Add Masinter back as another editor. | o Add Masinter back as another editor. | |||
o Masinter integrates HRef material from HTML5 spec. | o Masinter integrates HRef material from HTML5 spec. | |||
o Rewrite introduction sections to modernize. | o Rewrite introduction sections to modernize. | |||
11.6. Changes from -04 to -05 of draft-duerst-iri-bis | 12.6. Changes from -04 to -05 of draft-duerst-iri-bis | |||
o Updated references. | o Updated references. | |||
o Changed IPR text to pre5378Trust200902. | o Changed IPR text to pre5378Trust200902. | |||
11.7. Changes from -03 to -04 of draft-duerst-iri-bis | 12.7. Changes from -03 to -04 of draft-duerst-iri-bis | |||
o Added explicit abbreviation for LEIRIs. | o Added explicit abbreviation for LEIRIs. | |||
o Mentioned LEIRI references. | o Mentioned LEIRI references. | |||
o Completed text in LEIRI section about tag characters and about | o Completed text in LEIRI section about tag characters and about | |||
specials. | specials. | |||
11.8. Changes from -02 to -03 of draft-duerst-iri-bis | 12.8. Changes from -02 to -03 of draft-duerst-iri-bis | |||
o Updated some references. | o Updated some references. | |||
o Updated Michel Suignard's coordinates. | o Updated Michel Suignard's coordinates. | |||
11.9. Changes from -01 to -02 of draft-duerst-iri-bis | 12.9. Changes from -01 to -02 of draft-duerst-iri-bis | |||
o Added tag range to iprivate (issue private-include-tags-115). | o Added tag range to iprivate (issue private-include-tags-115). | |||
o Added Specials (U+FFF0-FFFD) to Legacy Extended IRIs. | o Added Specials (U+FFF0-FFFD) to Legacy Extended IRIs. | |||
11.10. Changes from -00 to -01 of draft-duerst-iri-bis | 12.10. Changes from -00 to -01 of draft-duerst-iri-bis | |||
o Changed from "IRIs with Spaces/Controls" to "Legacy Extended IRI" | o Changed from "IRIs with Spaces/Controls" to "Legacy Extended IRI" | |||
based on input from the W3C XML Core WG. Moved the relevant | based on input from the W3C XML Core WG. Moved the relevant | |||
subsections to the back and promoted them to a section. | subsections to the back and promoted them to a section. | |||
o Added some text re. Legacy Extended IRIs to the security section. | o Added some text re. Legacy Extended IRIs to the security section. | |||
o Added an IANA Considerations Section. | o Added an IANA Considerations Section. | |||
o Added this Change Log Section. | o Added this Change Log Section. | |||
o Added a section about "IRIs with Spaces/Controls" (converting from | o Added a section about "IRIs with Spaces/Controls" (converting from | |||
a Note in RFC 3987). | a Note in RFC 3987). | |||
11.11. Changes from RFC 3987 to -00 of draft-duerst-iri-bis | 12.11. Changes from RFC 3987 to -00 of draft-duerst-iri-bis | |||
Fixed errata (see | Fixed errata (see | |||
http://www.rfc-editor.org/cgi-bin/errataSearch.pl?rfc=3987). | http://www.rfc-editor.org/cgi-bin/errataSearch.pl?rfc=3987). | |||
12. References | 13. References | |||
12.1. Normative References | 13.1. Normative References | |||
[ASCII] American National Standards Institute, "Coded Character | [ASCII] American National Standards Institute, "Coded Character | |||
Set -- 7-bit American Standard Code for Information | Set -- 7-bit American Standard Code for Information | |||
Interchange", ANSI X3.4, 1986. | Interchange", ANSI X3.4, 1986. | |||
[ISO10646] | [ISO10646] | |||
International Organization for Standardization, "ISO/IEC | International Organization for Standardization, "ISO/IEC | |||
10646:2011: Information Technology - Universal Multiple- | 10646:2011: Information Technology - Universal Multiple- | |||
Octet Coded Character Set (UCS)", ISO Standard 10646, | Octet Coded Character Set (UCS)", ISO Standard 10646, | |||
March 2011, <http://standards.iso.org/ittf/ | March 2011, <http://standards.iso.org/ittf/ | |||
skipping to change at page 37, line 23 | skipping to change at page 36, line 50 | |||
Internationalized Domain Names for Applications (IDNA)", | Internationalized Domain Names for Applications (IDNA)", | |||
RFC 5892, August 2010. | RFC 5892, August 2010. | |||
[STD63] Yergeau, F., "UTF-8, a transformation format of ISO | [STD63] Yergeau, F., "UTF-8, a transformation format of ISO | |||
10646", STD 63, RFC 3629, November 2003. | 10646", STD 63, RFC 3629, November 2003. | |||
[STD68] Crocker, D. and P. Overell, "Augmented BNF for Syntax | [STD68] Crocker, D. and P. Overell, "Augmented BNF for Syntax | |||
Specifications: ABNF", STD 68, RFC 5234, January 2008. | Specifications: ABNF", STD 68, RFC 5234, January 2008. | |||
[UNIV6] The Unicode Consortium, "The Unicode Standard, Version | [UNIV6] The Unicode Consortium, "The Unicode Standard, Version | |||
6.1.0 (Mountain View, CA, The Unicode Consortium, 2012, | 6.2.0 (Mountain View, CA, The Unicode Consortium, 2012, | |||
ISBN 978-1-936213-02-3)", 2012. | ISBN 978-1-936213-07-8)", October 2012. | |||
[UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | |||
Unicode Standard Annex #15, March 2008, | Unicode Standard Annex #15, March 2008, | |||
<http://www.unicode.org/unicode/reports/tr15/ | <http://www.unicode.org/unicode/reports/tr15/ | |||
tr15-23.html>. | tr15-23.html>. | |||
12.2. Informative References | 13.2. Informative References | |||
[Bidi] Duerst, M., Masinter, L., and A. Allawi, "Guidelines for | [Bidi] Duerst, M., Masinter, L., and A. Allawi, "Guidelines for | |||
Internationalized Resource Identifiers with Bi-directional | Internationalized Resource Identifiers with Bi-directional | |||
Characters (Bidi IRIs)", draft-ietf-iri-bidi-guidelines-02 | Characters (Bidi IRIs)", draft-ietf-iri-bidi-guidelines-02 | |||
(work in progress), March 2012. | (work in progress), March 2012. | |||
[CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M., and T. | [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M., and T. | |||
Texin, "Character Model for the World Wide Web 1.0: | Texin, "Character Model for the World Wide Web 1.0: | |||
Resource Identifiers", W3C Candidate Recommendation CR- | Resource Identifiers", W3C Candidate Recommendation CR- | |||
charmod-resid-20041122, November 2004, | charmod-resid-20041122, November 2004, | |||
<http://www.w3.org/TR/2004/CR-charmod-resid/>. | <http://www.w3.org/TR/2004/CR-charmod-resid/>. | |||
[Duerst97] | [Duerst97] | |||
Duerst, M., "The Properties and Promises of UTF-8", Proc. | Duerst, M., "The Properties and Promises of UTF-8", Proc. | |||
11th International Unicode Conference, San Jose, | 11th International Unicode Conference, San Jose, | |||
September 1997, <http://www.ifi.unizh.ch/mml/mduerst/ | September 1997, | |||
papers/PDF/IUC11-UTF-8.pdf>. | <http://www.sw.it.aoyama.ac.jp/2012/pub/IUC11-UTF-8.pdf>. | |||
[Equivalence] | [Equivalence] | |||
Masinter, L. and M. Duerst, "Equivalence and | Masinter, L. and M. Duerst, "Equivalence and | |||
Canonicalization of Internationalized Resource Identifiers | Canonicalization of Internationalized Resource Identifiers | |||
(IRIs)", draft-ietf-iri-comparison-01 (work in progress), | (IRIs)", draft-ietf-iri-comparison-01 (work in progress), | |||
March 2012. | March 2012. | |||
[Gettys] Gettys, J., "URI Model Consequences", | [Gettys] Gettys, J., "URI Model Consequences", | |||
<http://www.w3.org/DesignIssues/ModelConsequences>. | <http://www.w3.org/DesignIssues/ModelConsequences>. | |||
[HTML4] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 | [HTML4] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 | |||
Specification", W3C Recommendation REC-html401-19991224, | Specification", W3C Recommendation REC-html401-19991224, | |||
December 1999, <http://www.w3.org/TR/1999/REC-html401>. | December 1999, <http://www.w3.org/TR/1999/REC-html401>. | |||
[RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H., | [HTMLURL] van Kesteren, A., "URL", October 2012, | |||
Atkinson, R., Crispin, M., and P. Svanberg, "The Report of | <http://url.spec.whatwg.org/>. | |||
the IAB Character Set Workshop held 29 February - 1 March, | ||||
1996", RFC 2130, April 1997. | ||||
[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | |||
[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. | [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. | |||
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | |||
Languages", BCP 18, RFC 2277, January 1998. | Languages", BCP 18, RFC 2277, January 1998. | |||
[RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998. | [RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998. | |||
skipping to change at page 39, line 15 | skipping to change at page 38, line 40 | |||
Extensible Messaging and Presence Protocol (XMPP)", | Extensible Messaging and Presence Protocol (XMPP)", | |||
RFC 5122, February 2008. | RFC 5122, February 2008. | |||
[RFC6055] Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on | [RFC6055] Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on | |||
Encodings for Internationalized Domain Names", RFC 6055, | Encodings for Internationalized Domain Names", RFC 6055, | |||
February 2011. | February 2011. | |||
[RFC6068] Duerst, M., Masinter, L., and J. Zawinski, "The 'mailto' | [RFC6068] Duerst, M., Masinter, L., and J. Zawinski, "The 'mailto' | |||
URI Scheme", RFC 6068, October 2010. | URI Scheme", RFC 6068, October 2010. | |||
[RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in | ||||
Internationalization in the IETF", BCP 166, RFC 6365, | ||||
September 2011. | ||||
[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | |||
Markup Languages", Unicode Technical Report #20, World | Markup Languages", Unicode Technical Report #20, World | |||
Wide Web Consortium Note, June 2003, | Wide Web Consortium Note, June 2003, | |||
<http://www.w3.org/TR/unicode-xml/>. | <http://www.w3.org/TR/unicode-xml/>. | |||
[UTR36] Davis, M. and M. Suignard, "Unicode Security | [UTR36] Davis, M. and M. Suignard, "Unicode Security | |||
Considerations", Unicode Technical Report #36, | Considerations", Unicode Technical Report #36, | |||
August 2010, <http://unicode.org/reports/tr36/>. | August 2010, <http://unicode.org/reports/tr36/>. | |||
[XLink] DeRose, S., Maler, E., Orchard, D., and N. Walsh, "XML | [XLink] DeRose, S., Maler, E., Orchard, D., and N. Walsh, "XML | |||
End of changes. 76 change blocks. | ||||
185 lines changed or deleted | 171 lines changed or added | |||