draft-ietf-iri-3987bis-09.txt | draft-ietf-iri-3987bis-10.txt | |||
---|---|---|---|---|
Internationalized Resource Identifiers M. Duerst | Internationalized Resource Identifiers M. Duerst | |||
(iri) Aoyama Gakuin University | (iri) Aoyama Gakuin University | |||
Internet-Draft M. Suignard | Internet-Draft M. Suignard | |||
Obsoletes: 3987 (if approved) Unicode Consortium | Obsoletes: 3987 (if approved) Unicode Consortium | |||
Intended status: Standards Track L. Masinter | Intended status: Standards Track L. Masinter | |||
Expires: July 12, 2012 Adobe | Expires: September 3, 2012 Adobe | |||
January 9, 2012 | March 2, 2012 | |||
Internationalized Resource Identifiers (IRIs) | Internationalized Resource Identifiers (IRIs) | |||
draft-ietf-iri-3987bis-09 | draft-ietf-iri-3987bis-10 | |||
Abstract | Abstract | |||
This document defines the Internationalized Resource Identifier (IRI) | This document defines the Internationalized Resource Identifier (IRI) | |||
protocol element, as an extension of the Uniform Resource Identifier | protocol element, as an extension of the Uniform Resource Identifier | |||
(URI). An IRI is a sequence of characters from the Universal | (URI). An IRI is a sequence of characters from the Universal | |||
Character Set (Unicode/ISO 10646). Grammar and processing rules are | Character Set (Unicode/ISO 10646). Grammar and processing rules are | |||
given for IRIs and related syntactic forms. | given for IRIs and related syntactic forms. | |||
Defining IRI as a new protocol element (rather than updating or | Defining IRI as a new protocol element (rather than updating or | |||
skipping to change at page 2, line 15 | skipping to change at page 2, line 15 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on July 12, 2012. | This Internet-Draft will expire on September 3, 2012. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 3, line 21 | skipping to change at page 3, line 21 | |||
1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 8 | 1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 9 | 2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 9 | |||
2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | 2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 | |||
3. Processing IRIs and related protocol elements . . . . . . . . 13 | 3. Processing IRIs and related protocol elements . . . . . . . . 13 | |||
3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 13 | 3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 13 | |||
3.2. Parse the IRI into IRI components . . . . . . . . . . . . 13 | 3.2. Parse the IRI into IRI components . . . . . . . . . . . . 13 | |||
3.3. General percent-encoding of IRI components . . . . . . . 14 | 3.3. General percent-encoding of IRI components . . . . . . . 14 | |||
3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 14 | 3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 14 | |||
3.4.1. Mapping using Percent-Encoding . . . . . . . . . . . . 14 | 3.4.1. Mapping using Percent-Encoding . . . . . . . . . . . . 14 | |||
3.4.2. Mapping using Punycode . . . . . . . . . . . . . . . . 14 | 3.4.2. Mapping using Punycode . . . . . . . . . . . . . . . . 15 | |||
3.4.3. Additional Considerations . . . . . . . . . . . . . . 15 | 3.4.3. Additional Considerations . . . . . . . . . . . . . . 15 | |||
3.5. Mapping query components . . . . . . . . . . . . . . . . 16 | 3.5. Mapping query components . . . . . . . . . . . . . . . . 16 | |||
3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 16 | 3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 16 | |||
4. Converting URIs to IRIs . . . . . . . . . . . . . . . . . . . 16 | 4. Converting URIs to IRIs . . . . . . . . . . . . . . . . . . . 16 | |||
4.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . 18 | 4.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
5. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 5. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
5.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 19 | 5.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 19 | |||
5.2. Software Interfaces and Protocols . . . . . . . . . . . . 20 | 5.2. Software Interfaces and Protocols . . . . . . . . . . . . 20 | |||
5.3. Format of URIs and IRIs in Documents and Protocols . . . 20 | 5.3. Format of URIs and IRIs in Documents and Protocols . . . 21 | |||
5.4. Use of UTF-8 for Encoding Original Characters . . . . . . 20 | 5.4. Use of UTF-8 for Encoding Original Characters . . . . . . 21 | |||
5.5. Relative IRI References . . . . . . . . . . . . . . . . . 22 | 5.5. Relative IRI References . . . . . . . . . . . . . . . . . 23 | |||
6. Legacy Extended IRIs (LEIRIs) . . . . . . . . . . . . . . . . 22 | 6. Legacy Extended IRIs (LEIRIs) . . . . . . . . . . . . . . . . 23 | |||
6.1. Legacy Extended IRI Syntax . . . . . . . . . . . . . . . 23 | 6.1. Legacy Extended IRI Syntax . . . . . . . . . . . . . . . 23 | |||
6.2. Conversion of Legacy Extended IRIs to IRIs . . . . . . . 23 | 6.2. Conversion of Legacy Extended IRIs to IRIs . . . . . . . 24 | |||
6.3. Characters Allowed in Legacy Extended IRIs but not in | 6.3. Characters Allowed in Legacy Extended IRIs but not in | |||
IRIs . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | IRIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
7. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 25 | 7. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 26 | |||
7.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 25 | 7.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 26 | |||
7.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 26 | 7.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 26 | |||
7.3. URI/IRI Transfer between Applications . . . . . . . . . . 26 | 7.3. URI/IRI Transfer between Applications . . . . . . . . . . 27 | |||
7.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 27 | 7.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 27 | |||
7.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 27 | 7.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 28 | |||
7.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 28 | 7.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 29 | |||
7.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 28 | 7.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 29 | |||
7.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 29 | 7.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 30 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 30 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 30 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 31 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
11. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 32 | 11. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 33 | |||
11.1. Split out Bidi, processing guidelines, comparison | 11.1. Split out Bidi, processing guidelines, comparison | |||
sections . . . . . . . . . . . . . . . . . . . . . . . . 32 | sections . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
11.2. Major restructuring of IRI processing model . . . . . . . 32 | 11.2. Major restructuring of IRI processing model . . . . . . . 33 | |||
11.2.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 32 | 11.2.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
11.2.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 33 | 11.2.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
11.2.3. Extension of Syntax . . . . . . . . . . . . . . . . . 33 | 11.2.3. Extension of Syntax . . . . . . . . . . . . . . . . . 34 | |||
11.2.4. More to be added . . . . . . . . . . . . . . . . . . . 33 | 11.2.4. More to be added . . . . . . . . . . . . . . . . . . . 34 | |||
11.3. Change Log . . . . . . . . . . . . . . . . . . . . . . . 33 | 11.3. Change Log . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
11.3.1. Changes after draft-ietf-iri-3987bis-01 . . . . . . . 33 | 11.3.1. Changes after draft-ietf-iri-3987bis-01 . . . . . . . 34 | |||
11.3.2. Changes from draft-duerst-iri-bis-07 to | 11.3.2. Changes from draft-duerst-iri-bis-07 to | |||
draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . 34 | draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . 34 | |||
11.3.3. Changes from -06 to -07 of draft-duerst-iri-bis . . . 34 | 11.3.3. Changes from -06 to -07 of draft-duerst-iri-bis . . . 34 | |||
11.4. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 34 | 11.4. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 35 | |||
11.5. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 34 | 11.5. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 35 | |||
11.6. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 34 | 11.6. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 35 | |||
11.7. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 34 | 11.7. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 35 | |||
11.8. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 35 | 11.8. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 35 | |||
11.9. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 35 | 11.9. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 35 | |||
11.10. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 35 | 11.10. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 36 | |||
11.11. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 35 | 11.11. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 36 | |||
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 35 | 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
12.1. Normative References . . . . . . . . . . . . . . . . . . 35 | 12.1. Normative References . . . . . . . . . . . . . . . . . . 36 | |||
12.2. Informative References . . . . . . . . . . . . . . . . . 36 | 12.2. Informative References . . . . . . . . . . . . . . . . . 37 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 39 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
1. Introduction | 1. Introduction | |||
1.1. Overview and Motivation | 1.1. Overview and Motivation | |||
A Uniform Resource Identifier (URI) is defined in [RFC3986] as a | A Uniform Resource Identifier (URI) is defined in [RFC3986] as a | |||
sequence of characters chosen from a limited subset of the repertoire | sequence of characters chosen from a limited subset of the repertoire | |||
of US-ASCII [ASCII] characters. | of US-ASCII [ASCII] characters. | |||
The characters in URIs are frequently used for representing words of | The characters in URIs are frequently used for representing words of | |||
skipping to change at page 11, line 27 | skipping to change at page 11, line 27 | |||
ipath = ipath-abempty ; begins with "/" or is empty | ipath = ipath-abempty ; begins with "/" or is empty | |||
/ ipath-absolute ; begins with "/" but not "//" | / ipath-absolute ; begins with "/" but not "//" | |||
/ ipath-noscheme ; begins with a non-colon segment | / ipath-noscheme ; begins with a non-colon segment | |||
/ ipath-rootless ; begins with a segment | / ipath-rootless ; begins with a segment | |||
/ ipath-empty ; zero characters | / ipath-empty ; zero characters | |||
ipath-abempty = *( path-sep isegment ) | ipath-abempty = *( path-sep isegment ) | |||
ipath-absolute = path-sep [ isegment-nz *( path-sep isegment ) ] | ipath-absolute = path-sep [ isegment-nz *( path-sep isegment ) ] | |||
ipath-noscheme = isegment-nz-nc *( path-sep isegment ) | ipath-noscheme = isegment-nz-nc *( path-sep isegment ) | |||
ipath-rootless = isegment-nz *( path-sep isegment ) | ipath-rootless = isegment-nz *( path-sep isegment ) | |||
ipath-empty = 0<ipchar> | ipath-empty = "" | |||
path-sep = "/" | path-sep = "/" | |||
isegment = *ipchar | isegment = *ipchar | |||
isegment-nz = 1*ipchar | isegment-nz = 1*ipchar | |||
isegment-nz-nc = 1*( iunreserved / pct-form / sub-delims | isegment-nz-nc = 1*( iunreserved / pct-form / sub-delims | |||
/ "@" ) | / "@" ) | |||
; non-zero-length segment without any colon ":" | ; non-zero-length segment without any colon ":" | |||
ipchar = iunreserved / pct-form / sub-delims / ":" | ipchar = iunreserved / pct-form / sub-delims / ":" | |||
/ "@" | / "@" | |||
skipping to change at page 14, line 33 | skipping to change at page 14, line 33 | |||
is the hexadecimal notation of the octet value. The hexadecimal | is the hexadecimal notation of the octet value. The hexadecimal | |||
notation SHOULD use uppercase letters. (This is the general URI | notation SHOULD use uppercase letters. (This is the general URI | |||
percent-encoding mechanism in Section 2.1 of [RFC3986].) | percent-encoding mechanism in Section 2.1 of [RFC3986].) | |||
Note that the mapping is an identity transformation for parsed URI | Note that the mapping is an identity transformation for parsed URI | |||
components of valid URIs, and is idempotent: applying the mapping a | components of valid URIs, and is idempotent: applying the mapping a | |||
second time will not change anything. | second time will not change anything. | |||
3.4. Mapping ireg-name | 3.4. Mapping ireg-name | |||
The mapping from <ireg-name> to a <reg-name> requires a choice | ||||
between one of the two methods described below. | ||||
3.4.1. Mapping using Percent-Encoding | 3.4.1. Mapping using Percent-Encoding | |||
The ireg-name component SHOULD be converted according to the general | The ireg-name component SHOULD be converted according to the general | |||
procedure for percent-encoding of IRI components described in | procedure for percent-encoding of IRI components described in | |||
Section 3.3. | Section 3.3. | |||
For example, the IRI | For example, the IRI | |||
"http://résumé.example.org" | "http://résumé.example.org" | |||
will be converted to | will be converted to | |||
"http://r%C3%A9sum%C3%A9.example.org". | "http://r%C3%A9sum%C3%A9.example.org". | |||
This conversion for ireg-name is in line with Section 3.2.2 of | This conversion for ireg-name is in line with Section 3.2.2 of | |||
[RFC3986], which does not mandate a particular registered name lookup | [RFC3986], which does not mandate a particular registered name lookup | |||
technology. For further background, see [RFC6055] and [Gettys]. | technology. For further background, see [RFC6055] and [Gettys]. | |||
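The percent-encoding mapping of Section 3.4.1 can be sketched in Python as follows. This is a minimal illustration, not the normative procedure: the function name percent_encode_non_ascii is hypothetical, and the sketch assumes that every non-ASCII character must be encoded while all ASCII characters are already valid and are left untouched.

```python
def percent_encode_non_ascii(component: str) -> str:
    """Percent-encode each non-ASCII character via its UTF-8 octets.

    Assumption for this sketch: ASCII characters pass through unchanged,
    so the mapping is an identity on text that is already a URI.
    """
    out = []
    for ch in component:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # Uppercase hexadecimal digits, as the text recommends.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    iri = "http://r\u00e9sum\u00e9.example.org"
    uri = percent_encode_non_ascii(iri)
    print(uri)
    # Applying the mapping a second time changes nothing.
    print(percent_encode_non_ascii(uri) == uri)
```

Because the output contains only ASCII characters, a second application is an identity, which matches the idempotence noted for the general mapping.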
3.4.2. Mapping using Punycode | 3.4.2. Mapping using Punycode | |||
The ireg-name component MAY also be converted as follows: | In situations where it is certain that <ireg-name> is intended to be | |||
used as a domain name to be processed by Domain Name Lookup (as per | ||||
[RFC5891]), an alternative method MAY be used, converting <ireg-name> | ||||
as follows: | ||||
If there are any sequences of <pct-encoded>, and their corresponding | If there are any sequences of <pct-encoded>, and their corresponding | |||
octets all represent valid UTF-8 octet sequences, then convert these | octets all represent valid UTF-8 octet sequences, then convert these | |||
back to Unicode character sequences. (If any <pct-encoded> sequences | back to Unicode character sequences. (If any <pct-encoded> sequences | |||
are not valid UTF-8 octet sequences, then leave the entire field as | are not valid UTF-8 octet sequences, then leave the entire field as | |||
is without any change, since punycode encoding would not succeed.) | is without any change, since punycode encoding would not succeed.) | |||
Replace the ireg-name part of the IRI by the part converted using the | Replace the ireg-name part of the IRI by the part converted using the | |||
Domain Name Lookup procedure (Subsections 5.3 to 5.5) of [RFC5891] | Domain Name Lookup procedure (Subsections 5.3 to 5.5) of [RFC5891] | |||
on each dot-separated label, and by using U+002E (FULL STOP) as a | on each dot-separated label, and by using U+002E (FULL STOP) as a | |||
skipping to change at page 20, line 5 | skipping to change at page 20, line 21 | |||
cannot and should not check for such limitations.) | cannot and should not check for such limitations.) | |||
b. The UCS contains many areas of characters for which there are | b. The UCS contains many areas of characters for which there are | |||
strong visual look-alikes. Because of the likelihood of | strong visual look-alikes. Because of the likelihood of | |||
transcription errors, these also should be avoided. This includes | transcription errors, these also should be avoided. This includes | |||
the full-width equivalents of Latin characters, half-width | the full-width equivalents of Latin characters, half-width | |||
Katakana characters for Japanese, and many others. It also | Katakana characters for Japanese, and many others. It also | |||
includes many look-alikes of "space", "delims", and "unwise", | includes many look-alikes of "space", "delims", and "unwise", | |||
characters excluded in [RFC3491]. | characters excluded in [RFC3491]. | |||
c. At the start of a component, the use of combining marks is | ||||
strongly discouraged. As an example, a COMBINING TILDE OVERLAY | ||||
(U+0334) would be very confusing at the start of a <isegment>. | ||||
Combined with the preceding '/', it might look like a solidus | |||
with combining tilde overlay, but IRI processing software will | ||||
parse and process the '/' separately. | ||||
d. The ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D) | ||||
are invisible in most contexts, but are crucial in some very | ||||
limited contexts. Appendix A of [RFC5892] contains contextual | ||||
restrictions for these and some other characters. The use of | ||||
these characters is strongly discouraged except in the relevant | |||
contexts. | ||||
Additional information is available from [UNIXML]. [UNIXML] is | Additional information is available from [UNIXML]. [UNIXML] is | |||
written in the context of running text rather than in that of | written in the context of running text rather than in that of | |||
identifiers. Nevertheless, it discusses many of the categories of | identifiers. Nevertheless, it discusses many of the categories of | |||
characters not appropriate for IRIs. | characters not appropriate for IRIs. | |||
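As an illustration of the last two recommendations, the following Python sketch flags a leading combining mark and any occurrence of ZERO WIDTH NON-JOINER or ZERO WIDTH JOINER in a single segment. The function name discouraged_characters is hypothetical, and the sketch makes two simplifying assumptions: a combining mark is any character whose Unicode general category starts with "M", and the contextual rules of [RFC5892] are not evaluated, so every joiner is reported.

```python
import unicodedata

# ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER
ZERO_WIDTH_JOINERS = {"\u200C", "\u200D"}


def discouraged_characters(segment: str) -> list:
    """Return human-readable warnings for one IRI segment (sketch only)."""
    warnings = []
    # Assumption: general categories Mn, Mc and Me count as combining marks.
    if segment and unicodedata.category(segment[0]).startswith("M"):
        warnings.append(f"starts with combining mark U+{ord(segment[0]):04X}")
    for index, ch in enumerate(segment):
        if ch in ZERO_WIDTH_JOINERS:
            # The contextual rules of RFC 5892 are not checked here.
            warnings.append(f"U+{ord(ch):04X} at index {index}")
    return warnings


if __name__ == "__main__":
    print(discouraged_characters("\u0334abc"))
    print(discouraged_characters("a\u200Db"))
    print(discouraged_characters("abc"))
```

A real implementation would accept joiners in the contexts that [RFC5892] permits instead of warning on all of them.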
5.2. Software Interfaces and Protocols | 5.2. Software Interfaces and Protocols | |||
Although an IRI is defined as a sequence of characters, software | Although an IRI is defined as a sequence of characters, software | |||
interfaces for URIs typically function on sequences of octets or | interfaces for URIs typically function on sequences of octets or | |||
other kinds of code units. Thus, software interfaces and protocols | other kinds of code units. Thus, software interfaces and protocols | |||
skipping to change at page 22, line 37 | skipping to change at page 23, line 19 | |||
5.5. Relative IRI References | 5.5. Relative IRI References | |||
Processing of relative IRI references against a base is handled | Processing of relative IRI references against a base is handled | |||
straightforwardly; the algorithms of [RFC3986] can be applied | straightforwardly; the algorithms of [RFC3986] can be applied | |||
directly, treating the characters additionally allowed in IRI | directly, treating the characters additionally allowed in IRI | |||
references in the same way that unreserved characters are in URI | references in the same way that unreserved characters are in URI | |||
references. | references. | |||
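A short Python sketch of this point: the standard library function urllib.parse.urljoin implements reference resolution along the lines of [RFC3986] and does not treat non-ASCII characters specially, so it can be applied to IRI references as they are. The wrapper name resolve_iri_reference and the example IRIs are illustrative only, and urljoin may differ from the [RFC3986] algorithm in some edge cases.

```python
from urllib.parse import urljoin


def resolve_iri_reference(base: str, reference: str) -> str:
    """Resolve a relative IRI reference against a base IRI (sketch).

    Non-ASCII characters are carried through untouched, in the same way
    that unreserved characters are in URI references.
    """
    return urljoin(base, reference)


if __name__ == "__main__":
    base = "http://example.org/r\u00e9sum\u00e9/index.html"
    print(resolve_iri_reference(base, "../caf\u00e9/menu"))
    print(resolve_iri_reference(base, "d\u00e9tails"))
```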
6. Legacy Extended IRIs (LEIRIs) | 6. Legacy Extended IRIs (LEIRIs) | |||
For historic reasons, some formats have allowed variants of IRIs that | In some cases, there have been formats which have used a protocol | |||
are somewhat less restricted in syntax. This section provides a | element which is a variant of the IRI definition; these variants have | |||
definition and a name (Legacy Extended IRI or LEIRI) for these | usually been somewhat less restricted in syntax. This section | |||
variants for easier reference. These variants have to be used with | provides a definition and a name (Legacy Extended IRI or LEIRI) for | |||
care; they require further processing before being fully | one of these variants used widely in XML-based protocols. This | |||
interchangeable as IRIs. New protocols and formats SHOULD NOT use | variant has to be used with care; it requires further processing | |||
Legacy Extended IRIs. Even where Legacy Extended IRIs are allowed, | before being fully interchangeable as IRIs. New protocols and | |||
only IRIs fully conforming to the syntax definition in Section 2.2 | formats SHOULD NOT use Legacy Extended IRIs. Even where Legacy | |||
SHOULD be created, generated, and used. The provisions in this | Extended IRIs are allowed, only IRIs fully conforming to the syntax | |||
section also apply to Legacy Extended IRI references. | definition in Section 2.2 SHOULD be created, generated, and used. | |||
The provisions in this section also apply to Legacy Extended IRI | ||||
references. | ||||
6.1. Legacy Extended IRI Syntax | 6.1. Legacy Extended IRI Syntax | |||
The syntax of Legacy Extended IRIs is the same as that for IRIs, | This section defines Legacy Extended IRIs (LEIRIs). The syntax of | |||
except that ucschar is redefined as follows: | Legacy Extended IRIs is the same as that for <IRI-reference>, except | |||
that the ucschar production is replaced by the leiri-ucschar | ||||
production: | ||||
ucschar = " " / "<" / ">" / '"' / "{" / "}" / "|" | leiri-ucschar = " " / "<" / ">" / '"' / "{" / "}" / "|" | |||
/ "\" / "^" / "`" / %x0-1F / %x7F-D7FF | / "\" / "^" / "`" / %x0-1F / %x7F-D7FF | |||
/ %xE000-FFFD / %x10000-10FFFF | / %xE000-FFFD / %x10000-10FFFF | |||
The restriction on bidirectional formatting characters in [Bidi] is | The restriction on bidirectional formatting characters in [Bidi] is | |||
lifted. The iprivate production becomes redundant. | lifted. The iprivate production becomes redundant. | |||
Likewise, the syntax for Legacy Extended IRI references (LEIRI | Likewise, the syntax for Legacy Extended IRI references (LEIRI | |||
references) is the same as that for IRI references with the above | references) is the same as that for IRI references with the above | |||
redefinition of ucschar applied. | replacement of ucschar with leiri-ucschar. | |||
Formats that use Legacy Extended IRIs or Legacy Extended IRI | ||||
references MAY further restrict the characters allowed therein, | ||||
either implicitly by the fact that the format as such does not allow | ||||
some characters, or explicitly. An example of a character not | ||||
allowed implicitly may be the NUL character (U+0000). However, all | ||||
the characters allowed in IRIs MUST still be allowed. | ||||
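The character production given above for Legacy Extended IRIs can be transcribed into a Python predicate. This is a sketch under the assumption that the input is a single Unicode character; the name is_leiri_ucschar is hypothetical, and the restrictions that a particular format may add (for example, excluding U+0000) are not modelled.

```python
# The ten ASCII characters listed literally in the production.
_LEIRI_ASCII_EXTRAS = set(' <>"{}|\\^`')


def is_leiri_ucschar(ch: str) -> bool:
    """Return True if the single character ch matches the production."""
    cp = ord(ch)
    return (
        ch in _LEIRI_ASCII_EXTRAS
        or 0x00 <= cp <= 0x1F         # %x0-1F
        or 0x7F <= cp <= 0xD7FF       # %x7F-D7FF
        or 0xE000 <= cp <= 0xFFFD     # %xE000-FFFD
        or 0x10000 <= cp <= 0x10FFFF  # %x10000-10FFFF
    )


if __name__ == "__main__":
    for sample in (" ", "a", "\u00e9", "\uFFFE", "\U0010FFFF"):
        print(f"U+{ord(sample):04X}", is_leiri_ucschar(sample))
```

Ordinary ASCII letters such as "a" return False here because they are matched by other productions of the grammar, not by this one.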
6.2. Conversion of Legacy Extended IRIs to IRIs | 6.2. Conversion of Legacy Extended IRIs to IRIs | |||
To convert a Legacy Extended IRI (reference) to an IRI (reference), | To convert a Legacy Extended IRI (reference) to an IRI (reference), | |||
each character allowed in a Legacy Extended IRI (reference) but not | each character allowed in a Legacy Extended IRI (reference) but not | |||
allowed in an IRI (reference) (see Section 6.3) MUST be percent- | allowed in an IRI (reference) (see Section 6.3) MUST be percent- | |||
encoded by applying steps 2.1 to 2.3 of Section 3.6. | encoded by applying the steps in Section 3.3. | |||
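The conversion described above can be sketched in Python for a subset of the affected characters. This sketch is deliberately partial and its name, leiri_to_iri_partial, is hypothetical: it handles only the space, the delimiters, the unwise characters, and the C0, DEL, and C1 controls. The remaining groups listed in Section 6.3 (private use code points, tags, non-characters, and others) would need the same treatment in a complete implementation.

```python
# Space, delimiters and unwise characters allowed in LEIRIs but not in IRIs.
_ASCII_EXTRAS = set(' <>"{}|\\^`')


def _needs_encoding(ch: str) -> bool:
    cp = ord(ch)
    # C0 controls (U+0000-001F), DEL (U+007F) and C1 controls (U+0080-009F).
    return ch in _ASCII_EXTRAS or cp <= 0x1F or 0x7F <= cp <= 0x9F


def leiri_to_iri_partial(leiri: str) -> str:
    """Percent-encode a subset of LEIRI-only characters (partial sketch)."""
    out = []
    for ch in leiri:
        if _needs_encoding(ch):
            # Encode the UTF-8 octets of the character, uppercase hex.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(leiri_to_iri_partial("http://example.org/a b<c>"))
```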
6.3. Characters Allowed in Legacy Extended IRIs but not in IRIs | 6.3. Characters Allowed in Legacy Extended IRIs but not in IRIs | |||
This section provides a list of the groups of characters and code | This section provides a list of the groups of characters and code | |||
points that are allowed in Legacy Extended IRIs, but are not allowed | points that are allowed in Legacy Extended IRIs, but are not allowed | |||
in IRIs or are allowed in IRIs only in the query part. For each | in IRIs or are allowed in IRIs only in the query part. For each | |||
group of characters, advice on the usage of these characters is also | group of characters, advice on the usage of these characters is also | |||
given, concentrating on the reasons not to use them. | given, concentrating on the reasons not to use them. | |||
Space (U+0020): Some formats and applications use space as a | Space (U+0020): Some formats and applications use space as a | |||
delimiter, e.g. for items in a list. Appendix C of [RFC3986] also | delimiter, e.g., for items in a list. Appendix C of [RFC3986] | |||
mentions that white space may have to be added when displaying or | also mentions that white space may have to be added when | |||
printing long URIs; the same applies to long IRIs. This means | displaying or printing long URIs; the same applies to long IRIs. | |||
that spaces can disappear, or can make the Legacy Extended IRI to | Spaces might disappear, or a single Legacy Extended IRI might | |||
be interpreted as two or more separate IRIs. | incorrectly be interpreted as two or more separate ones. | |||
Delimiters "<" (U+003C), ">" (U+003E), and '"' (U+0022): Appendix | Delimiters "<" (U+003C), ">" (U+003E), and '"' (U+0022): Appendix | |||
C of [RFC3986] suggests the use of double-quotes | C of [RFC3986] suggests the use of double-quotes | |||
("http://example.com/") and angle brackets (<http://example.com/>) | ("http://example.com/") and angle brackets (<http://example.com/>) | |||
as delimiters for URIs in plain text. These conventions are often | as delimiters for URIs in plain text. These conventions are often | |||
used, and also apply to IRIs. Legacy Extended IRIs using these | used, and also apply to IRIs. Legacy Extended IRIs using these | |||
characters will be cut off at the wrong place. | characters might be cut off at the wrong place. | |||
Unwise characters "\" (U+005C), "^" (U+005E), "`" (U+0060), "{" | Unwise characters "\" (U+005C), "^" (U+005E), "`" (U+0060), "{" | |||
(U+007B), "|" (U+007C), and "}" (U+007D): These characters | (U+007B), "|" (U+007C), and "}" (U+007D): These characters | |||
originally have been excluded from URIs because the respective | originally were excluded from URIs because the respective | |||
codepoints are assigned to different graphic characters in some | codepoints are assigned to different graphic characters in some | |||
7-bit or 8-bit encoding. Despite the move to Unicode, some of | 7-bit or 8-bit encoding. Despite the move to Unicode, some of | |||
these characters are still occasionally displayed differently on | these characters are still occasionally displayed differently on | |||
some systems, e.g. U+005C as a Japanese Yen symbol. Also, the | some systems, e.g., U+005C as a Japanese Yen symbol. Also, the | |||
fact that these characters are not used in URIs or IRIs has | fact that these characters are not used in URIs or IRIs has | |||
encouraged their use outside URIs or IRIs in contexts that may | encouraged their use outside URIs or IRIs in contexts that may | |||
include URIs or IRIs. In case a Legacy Extended IRI with such a | include URIs or IRIs. In case a Legacy Extended IRI with such a | |||
character is used in such a context, the Legacy Extended IRI will | character is used in such a context, the Legacy Extended IRI will | |||
be interpreted piecemeal. | be interpreted piecemeal. | |||
The controls (C0 controls, DEL, and C1 controls, #x0 - #x1F #x7F - | The controls (C0 controls, DEL, and C1 controls, #x0 - #x1F #x7F - | |||
#x9F): There is no way to transmit these characters reliably | #x9F): There is no way to transmit these characters reliably | |||
except potentially in electronic form. Even when in electronic | except potentially in electronic form. Even when in electronic | |||
form, some software components might silently filter out some of | form, some software components might silently filter out some of | |||
skipping to change at page 25, line 8 | skipping to change at page 25, line 32 | |||
Private use code points (U+E000-F8FF, U+F0000-FFFFD, U+100000- | Private use code points (U+E000-F8FF, U+F0000-FFFFD, U+100000- | |||
10FFFD): Display and interpretation of these code points is by | 10FFFD): Display and interpretation of these code points is by | |||
definition undefined without private agreement. Therefore, these | definition undefined without private agreement. Therefore, these | |||
code points are not suited for use on the Internet. They are not | code points are not suited for use on the Internet. They are not | |||
interoperable and may have unpredictable effects. | interoperable and may have unpredictable effects. | |||
Tags (U+E0000-E0FFF): These characters provide a way to language | Tags (U+E0000-E0FFF): These characters provide a way to language | |||
tag in Unicode plain text. They are not appropriate for Legacy | tag in Unicode plain text. They are not appropriate for Legacy | |||
Extended IRIs because language information in identifiers cannot | Extended IRIs because language information in identifiers cannot | |||
reliably be input, transmitted (e.g. on a visual medium such as | reliably be input, transmitted (e.g., on a visual medium such as | |||
paper), or recognized. | paper), or recognized. | |||
Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF, U+2FFFE-2FFFF, | Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF, U+2FFFE-2FFFF, | |||
U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF, | U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF, | |||
U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF, | U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF, | |||
U+BFFFE-BFFFF, U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF, | U+BFFFE-BFFFF, U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF, | |||
U+FFFFE-FFFFF, U+10FFFE-10FFFF): These code points are defined as | U+FFFFE-FFFFF, U+10FFFE-10FFFF): These code points are defined as | |||
non-characters. Applications may use some of them internally, but | non-characters. Applications may use some of them internally, but | |||
are not prepared to interchange them. | are not prepared to interchange them. | |||
For reference, we here also list the code points and code units not | For reference, we here also list the code points and code units not | |||
even allowed in Legacy Extended IRIs: | even allowed in Legacy Extended IRIs: | |||
Surrogate code units (D800-DFFF): These do not represent Unicode | Surrogate code units (D800-DFFF): These do not represent Unicode | |||
codepoints. | codepoints. | |||
Non-characters (U+FFFE-FFFF): These are not allowed in XML nor | ||||
LEIRIs. | ||||
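The code point ranges listed above can be checked mechanically. The following is a minimal sketch, not part of either draft: the function name `classify` and the category labels are illustrative assumptions, and the ranges are copied from the lists above. Note that U+FFFE-FFFF appears as a separate entry only in the -09 column; the sketch simply groups it with the other non-characters.

```python
# Ranges copied from the lists above; names and labels are illustrative only.
PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]
TAGS = [(0xE0000, 0xE0FFF)]
SURROGATES = [(0xD800, 0xDFFF)]


def _in_ranges(cp, ranges):
    """Return True if cp falls inside any inclusive (low, high) range."""
    return any(low <= cp <= high for low, high in ranges)


def is_noncharacter(cp):
    """U+FDD0-FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp):
    """Return a label for a problematic code point, or None otherwise."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    if _in_ranges(cp, SURROGATES):
        return "surrogate"
    if is_noncharacter(cp):
        return "non-character"
    if _in_ranges(cp, PRIVATE_USE):
        return "private-use"
    if _in_ranges(cp, TAGS):
        return "tag"
    return None


if __name__ == "__main__":
    for cp in (0x61, 0xE000, 0xE0001, 0xFDD0, 0x10FFFF, 0xD800):
        print("U+%04X" % cp, classify(cp))
```

A caller that generates or validates Legacy Extended IRIs could use such a check to flag these code points before interchange; what to do with a flagged code point is left to the application.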
7. URI/IRI Processing Guidelines (Informative) | 7. URI/IRI Processing Guidelines (Informative) | |||
This informative section provides guidelines for supporting IRIs in | This informative section provides guidelines for supporting IRIs in | |||
the same software components and operations that currently process | the same software components and operations that currently process | |||
URIs: Software interfaces that handle URIs, software that allows | URIs: Software interfaces that handle URIs, software that allows | |||
users to enter URIs, software that creates or generates URIs, | users to enter URIs, software that creates or generates URIs, | |||
software that displays URIs, formats and protocols that transport | software that displays URIs, formats and protocols that transport | |||
URIs, and software that interprets URIs. These may all require | URIs, and software that interprets URIs. These may all require | |||
modification before functioning properly with IRIs. The | modification before functioning properly with IRIs. The | |||
considerations in this section also apply to URI references and IRI | considerations in this section also apply to URI references and IRI | |||
skipping to change at page 30, line 33 | skipping to change at page 31, line 11 | |||
the character encoding) and will therefore be compatible with IRIs. | the character encoding) and will therefore be compatible with IRIs. | |||
These recommendations, when taken together, will allow for the | These recommendations, when taken together, will allow for the | |||
extension from URIs to IRIs in order to handle characters other than | extension from URIs to IRIs in order to handle characters other than | |||
US-ASCII while minimizing interoperability problems. For | US-ASCII while minimizing interoperability problems. For | |||
considerations regarding the upgrade of URI scheme definitions, see | considerations regarding the upgrade of URI scheme definitions, see | |||
Section 5.4. | Section 5.4. | |||
8. IANA Considerations | 8. IANA Considerations | |||
NOTE: THIS SECTION NEEDS REVIEW AGAINST HAPPIANA WORK. | ||||
RFC Editor and IANA note: Please Replace RFC XXXX with the number of | RFC Editor and IANA note: Please Replace RFC XXXX with the number of | |||
this document when it issues as an RFC. | this document when it issues as an RFC, and RFC YYYY with the number | |||
of the RFC issued for draft-ietf-iri-rfc3987bis. | ||||
IANA maintains a registry of "URI schemes". A "URI scheme" also | IANA maintains a registry of "URI schemes". This document attempts | |||
serves an "IRI scheme". | to make it clear from the registry that a "URI scheme" also serves an | |||
"IRI scheme", and makes several changes to the registry. | ||||
To clarify that the URI scheme registration process also applies to | The description of the registry should be changed: "RFC 4395 defined | |||
IRIs, change the description of the "URI schemes" registry header to | an IANA-maintained registry of URI Schemes. RFC XXXX updates this | |||
say "[RFC4395] defines an IANA-maintained registry of URI Schemes. | registry to make it clear that the registered values also serve as | |||
These registries include the Permanent and Provisional URI Schemes. | IRI schemes, as defined in RFC YYYY." | |||
RFC XXXX updates this registry to designate that schemes may also | ||||
indicate their usability as IRI schemes. | ||||
Update "per RFC 4395" to "per RFC 4395 and RFC XXXX". | The registry includes schemes marked as Permanent or Provisional. | |||
Previously, this was accomplished by having two sections, "Permanent" | ||||
and "Provisional". However, in order to allow other status | ||||
("Historical", and possibly a Proposed status for proposals which | ||||
have been received but not accepted), the registry should be changed | ||||
so that the status is indicated in a separate "Status" column, whose | ||||
values may be "Permanent", "Provisional" or "Historical". Changes in | ||||
status as well as updates to the entire registration may be | ||||
accomplished by requests and expert review. | ||||
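As a rough illustration of the registry layout proposed in the -10 column, the sketch below models one registry row with a separate status value. The class and field names (`SchemeEntry`, `Status`, `change_status`) and the scheme name are hypothetical and are not taken from the draft or from the IANA registry; the request and expert review step is outside the scope of the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Status values named in the proposed "Status" column."""
    PERMANENT = "Permanent"
    PROVISIONAL = "Provisional"
    HISTORICAL = "Historical"


@dataclass(frozen=True)
class SchemeEntry:
    """One registry row; field names are illustrative assumptions."""
    name: str
    status: Status


def change_status(entry, new_status):
    """Return a copy of entry with a new status.

    Status(new_status) raises ValueError for any value that is not one
    of the three listed statuses.
    """
    return SchemeEntry(entry.name, Status(new_status))


if __name__ == "__main__":
    entry = SchemeEntry("example-scheme", Status.PROVISIONAL)
    print(change_status(entry, "Permanent"))
```

Keeping the status as a field rather than as a section of the registry means a status change touches only one value of an entry, which is the motivation the draft text gives for the separate column.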
9. Security Considerations | 9. Security Considerations | |||
The security considerations discussed in [RFC3986] also apply to | The security considerations discussed in [RFC3986] also apply to | |||
IRIs. In addition, the following issues require particular care for | IRIs. In addition, the following issues require particular care for | |||
IRIs. | IRIs. | |||
Incorrect encoding or decoding can lead to security problems. For | Incorrect encoding or decoding can lead to security problems. For | |||
example, some UTF-8 decoders do not check against overlong byte | example, some UTF-8 decoders do not check against overlong byte | |||
sequences. See [UTR36] Section 3 for details. | sequences. See [UTR36] Section 3 for details. | |||
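To make the overlong-sequence problem concrete, the sketch below (not part of either draft) checks byte strings with Python's built-in strict UTF-8 decoder, which rejects overlong forms. The helper name `is_well_formed_utf8` is an assumption made for illustration. The bytes C0 AF are a two-byte overlong form of U+002F ("/"), and E0 80 AF is a three-byte overlong form of the same character.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if data decodes as strict UTF-8.

    Python's "utf-8" codec rejects overlong sequences and encoded
    surrogates, so a lenient decoder's blind spots are not reproduced.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


OVERLONG_SLASH_2 = b"\xc0\xaf"      # overlong two-byte form of "/"
OVERLONG_SLASH_3 = b"\xe0\x80\xaf"  # overlong three-byte form of "/"

if __name__ == "__main__":
    for sample in (b"/", OVERLONG_SLASH_2, OVERLONG_SLASH_3):
        print(sample, is_well_formed_utf8(sample))
```

A decoder that accepted either overlong form would yield "/" after any check that looked only for the single byte 2F, which is the kind of bypass the paragraph above warns about.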
skipping to change at page 36, line 26 | skipping to change at page 37, line 14 | |||
Resource Identifier (URI): Generic Syntax", STD 66, | Resource Identifier (URI): Generic Syntax", STD 66, | |||
RFC 3986, January 2005. | RFC 3986, January 2005. | |||
[RFC5890] Klensin, J., "Internationalized Domain Names for | [RFC5890] Klensin, J., "Internationalized Domain Names for | |||
Applications (IDNA): Definitions and Document Framework", | Applications (IDNA): Definitions and Document Framework", | |||
RFC 5890, August 2010. | RFC 5890, August 2010. | |||
[RFC5891] Klensin, J., "Internationalized Domain Names in | [RFC5891] Klensin, J., "Internationalized Domain Names in | |||
Applications (IDNA): Protocol", RFC 5891, August 2010. | Applications (IDNA): Protocol", RFC 5891, August 2010. | |||
[RFC5892] Faltstrom, P., "The Unicode Code Points and | ||||
Internationalized Domain Names for Applications (IDNA)", | ||||
RFC 5892, August 2010. | ||||
[STD68] Crocker, D. and P. Overell, "Augmented BNF for Syntax | [STD68] Crocker, D. and P. Overell, "Augmented BNF for Syntax | |||
Specifications: ABNF", STD 68, RFC 5234, January 2008. | Specifications: ABNF", STD 68, RFC 5234, January 2008. | |||
[UNIV6] The Unicode Consortium, "The Unicode Standard, Version | [UNIV6] The Unicode Consortium, "The Unicode Standard, Version | |||
6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, | 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, | |||
ISBN 978-1-936213-01-6)", October 2010. | ISBN 978-1-936213-01-6)", October 2010. | |||
[UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | |||
Unicode Standard Annex #15, March 2008, | Unicode Standard Annex #15, March 2008, | |||
<http://www.unicode.org/unicode/reports/tr15/ | <http://www.unicode.org/unicode/reports/tr15/ | |||
skipping to change at page 39, line 22 | skipping to change at page 40, line 15 | |||
[XPointer] | [XPointer] | |||
Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer | Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer | |||
Framework", World Wide Web Consortium REC-xptr-framework- | Framework", World Wide Web Consortium REC-xptr-framework- | |||
20030325, March 2003, | 20030325, March 2003, | |||
<http://www.w3.org/TR/xptr-framework/#escaping>. | <http://www.w3.org/TR/xptr-framework/#escaping>. | |||
Authors' Addresses | Authors' Addresses | |||
Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever | Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever | |||
possible, for example as "D&#252;rst" in XML and HTML.) | possible, for example as "D&#252;rst" in XML and HTML.) | |||
Aoyama Gakuin University | Aoyama Gakuin University | |||
5-10-1 Fuchinobe | 5-10-1 Fuchinobe | |||
Sagamihara, Kanagawa 229-8558 | Sagamihara, Kanagawa 229-8558 | |||
Japan | Japan | |||
Phone: +81 42 759 6329 | Phone: +81 42 759 6329 | |||
Fax: +81 42 759 6495 | Fax: +81 42 759 6495 | |||
Email: duerst@it.aoyama.ac.jp | Email: duerst@it.aoyama.ac.jp | |||
URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ | URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ | |||
(Note: This is the percent-encoded form of an IRI.) | (Note: This is the percent-encoded form of an IRI.) | |||
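The relationship between the IRI path segment and its percent-encoded form in the URI above can be reproduced with Python's standard library. This is a sketch for illustration only, assuming UTF-8 as the encoding (the default for `urllib.parse.quote`); it covers a single path segment and is not a full IRI-to-URI conversion.

```python
from urllib.parse import quote, unquote

# The path segment from the address above, written with u-umlaut (U+00FC).
segment = "D\u00fcrst"

# Percent-encode each UTF-8 byte of the non-ASCII character.
encoded = quote(segment)

# Decoding the percent-encoded form recovers the original characters.
decoded = unquote(encoded)

if __name__ == "__main__":
    print(encoded)
    print(decoded == segment)
```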
End of changes. 35 change blocks. | ||||
84 lines changed or deleted | 118 lines changed or added | |||