draft-ietf-iri-3987bis-04.txt   draft-ietf-iri-3987bis-05.txt 
Internationalized Resource M. Duerst Internationalized Resource M. Duerst
Identifiers (iri) Aoyama Gakuin University Identifiers (iri) Aoyama Gakuin University
Internet-Draft M. Suignard Internet-Draft M. Suignard
Obsoletes: 3987 (if approved) Unicode Consortium Obsoletes: 3987 (if approved) Unicode Consortium
Intended status: Standards Track L. Masinter Intended status: Standards Track L. Masinter
Expires: September 15, 2011 Adobe Expires: September 30, 2011 Adobe
March 14, 2011 March 29, 2011
Internationalized Resource Identifiers (IRIs) Internationalized Resource Identifiers (IRIs)
draft-ietf-iri-3987bis-04 draft-ietf-iri-3987bis-05
Abstract Abstract
This document defines the Internationalized Resource Identifier (IRI) This document defines the Internationalized Resource Identifier (IRI)
protocol element, as an extension of the Uniform Resource Identifier protocol element, as an extension of the Uniform Resource Identifier
(URI). An IRI is a sequence of characters from the Universal (URI). An IRI is a sequence of characters from the Universal
Character Set (Unicode/ISO 10646). Grammar and processing rules are Character Set (Unicode/ISO 10646). Grammar and processing rules are
given for IRIs and related syntactic forms. given for IRIs and related syntactic forms.
In addition, this document provides named additional rule sets for In addition, this document provides named additional rule sets for
skipping to change at page 1, line 44 skipping to change at page 1, line 44
related protocol elements when revising protocols, formats, and related protocol elements when revising protocols, formats, and
software components that currently deal only with URIs. software components that currently deal only with URIs.
RFC Editor: Please remove the next paragraph before publication. RFC Editor: Please remove the next paragraph before publication.
This document is intended to update RFC 3987 and move towards IETF This document is intended to update RFC 3987 and move towards IETF
Draft Standard. For discussion and comments on this draft, please Draft Standard. For discussion and comments on this draft, please
join the IETF IRI WG by subscribing to the mailing list join the IETF IRI WG by subscribing to the mailing list
public-iri@w3.org. For a list of open issues, please see the issue public-iri@w3.org. For a list of open issues, please see the issue
tracker of the WG at http://trac.tools.ietf.org/wg/iri/trac/report/1. tracker of the WG at http://trac.tools.ietf.org/wg/iri/trac/report/1.
For a list of individual edits, please see the change history at
http://trac.tools.ietf.org/wg/iri/trac/log/draft-ietf-iri-3987bis.
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 2, line 21 skipping to change at page 2, line 22
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 15, 2011. This Internet-Draft will expire on September 30, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 19 skipping to change at page 3, line 19
1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6
1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 7 1.3. Definitions . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4. Notation . . . . . . . . . . . . . . . . . . . . . . . . 9
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10 2.1. Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 10
2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10 2.2. ABNF for IRI References and IRIs . . . . . . . . . . . . 10
3. Processing IRIs and related protocol elements . . . . . . . . 13 3. Processing IRIs and related protocol elements . . . . . . . . 13
3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14 3.1. Converting to UCS . . . . . . . . . . . . . . . . . . . . 14
3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14 3.2. Parse the IRI into IRI components . . . . . . . . . . . . 14
3.3. General percent-encoding of IRI components . . . . . . . 15 3.3. General percent-encoding of IRI components . . . . . . . 15
3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 16 3.4. Mapping ireg-name . . . . . . . . . . . . . . . . . . . . 15
3.4.1. Mapping using Percent-Encoding . . . . . . . . . . . . 15
3.4.2. Mapping using Punycode . . . . . . . . . . . . . . . . 16
3.4.3. Additional Considerations . . . . . . . . . . . . . . 16
3.5. Mapping query components . . . . . . . . . . . . . . . . 17 3.5. Mapping query components . . . . . . . . . . . . . . . . 17
3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 17 3.6. Mapping IRIs to URIs . . . . . . . . . . . . . . . . . . 17
3.7. Converting URIs to IRIs . . . . . . . . . . . . . . . . . 17 3.7. Converting URIs to IRIs . . . . . . . . . . . . . . . . . 17
3.7.1. Examples . . . . . . . . . . . . . . . . . . . . . . . 19 3.7.1. Examples . . . . . . . . . . . . . . . . . . . . . . . 19
4. Bidirectional IRIs for Right-to-Left Languages . . . . . . . . 20 4. Bidirectional IRIs for Right-to-Left Languages . . . . . . . . 20
4.1. Logical Storage and Visual Presentation . . . . . . . . . 21 4.1. Logical Storage and Visual Presentation . . . . . . . . . 21
4.2. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . 22 4.2. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . 22
4.3. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . 23 4.3. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . 23
4.4. Examples . . . . . . . . . . . . . . . . . . . . . . . . 23 4.4. Examples . . . . . . . . . . . . . . . . . . . . . . . . 23
5. Normalization and Comparison . . . . . . . . . . . . . . . . . 25 5. Normalization and Comparison . . . . . . . . . . . . . . . . . 25
skipping to change at page 3, line 43 skipping to change at page 3, line 46
5.3.1. Simple String Comparison . . . . . . . . . . . . . . . 27 5.3.1. Simple String Comparison . . . . . . . . . . . . . . . 27
5.3.2. Syntax-Based Normalization . . . . . . . . . . . . . . 28 5.3.2. Syntax-Based Normalization . . . . . . . . . . . . . . 28
5.3.3. Scheme-Based Normalization . . . . . . . . . . . . . . 31 5.3.3. Scheme-Based Normalization . . . . . . . . . . . . . . 31
5.3.4. Protocol-Based Normalization . . . . . . . . . . . . . 32 5.3.4. Protocol-Based Normalization . . . . . . . . . . . . . 32
6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 33 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 33 6.1. Limitations on UCS Characters Allowed in IRIs . . . . . . 33
6.2. Software Interfaces and Protocols . . . . . . . . . . . . 33 6.2. Software Interfaces and Protocols . . . . . . . . . . . . 33
6.3. Format of URIs and IRIs in Documents and Protocols . . . 34 6.3. Format of URIs and IRIs in Documents and Protocols . . . 34
6.4. Use of UTF-8 for Encoding Original Characters . . . . . . 34 6.4. Use of UTF-8 for Encoding Original Characters . . . . . . 34
6.5. Relative IRI References . . . . . . . . . . . . . . . . . 36 6.5. Relative IRI References . . . . . . . . . . . . . . . . . 36
7. Liberal handling of otherwise invalid IRIs . . . . . . . . . . 36 7. Liberal Handling of Otherwise Invalid IRIs . . . . . . . . . . 36
7.1. LEIRI processing . . . . . . . . . . . . . . . . . . . . 36 7.1. LEIRI Processing . . . . . . . . . . . . . . . . . . . . 36
7.2. Web Address processing . . . . . . . . . . . . . . . . . 37 7.2. Web Address Processing . . . . . . . . . . . . . . . . . 37
7.3. Characters not allowed in IRIs . . . . . . . . . . . . . 38 7.3. Characters Not Allowed in IRIs . . . . . . . . . . . . . 38
8. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 40 8. URI/IRI Processing Guidelines (Informative) . . . . . . . . . 40
8.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 40 8.1. URI/IRI Software Interfaces . . . . . . . . . . . . . . . 40
8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41 8.2. URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 41
8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42 8.3. URI/IRI Transfer between Applications . . . . . . . . . . 42
8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42 8.4. URI/IRI Generation . . . . . . . . . . . . . . . . . . . 42
8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43 8.5. URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 43
8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43 8.6. Display of URIs/IRIs . . . . . . . . . . . . . . . . . . 43
8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44 8.7. Interpretation of URIs and IRIs . . . . . . . . . . . . . 44
8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44 8.8. Upgrading Strategy . . . . . . . . . . . . . . . . . . . 44
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45
10. Security Considerations . . . . . . . . . . . . . . . . . . . 46 10. Security Considerations . . . . . . . . . . . . . . . . . . . 46
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47
12. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 48 12. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 47
12.1. Major restructuring of IRI processing model . . . . . . . 48 12.1. Major restructuring of IRI processing model . . . . . . . 47
12.1.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 48 12.1.1. OLD WAY . . . . . . . . . . . . . . . . . . . . . . . 48
12.1.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 49 12.1.2. NEW WAY . . . . . . . . . . . . . . . . . . . . . . . 48
12.1.3. Extension of Syntax . . . . . . . . . . . . . . . . . 49 12.1.3. Extension of Syntax . . . . . . . . . . . . . . . . . 48
12.1.4. More to be added . . . . . . . . . . . . . . . . . . . 49 12.1.4. More to be added . . . . . . . . . . . . . . . . . . . 49
12.2. Change Log . . . . . . . . . . . . . . . . . . . . . . . 49 12.2. Change Log . . . . . . . . . . . . . . . . . . . . . . . 49
12.2.1. Changes after draft-ietf-iri-3987bis-01 . . . . . . . 49 12.2.1. Changes after draft-ietf-iri-3987bis-01 . . . . . . . 49
12.2.2. Changes from draft-duerst-iri-bis-07 to 12.2.2. Changes from draft-duerst-iri-bis-07 to
draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . 49 draft-ietf-iri-3987bis-00 . . . . . . . . . . . . . . 49
12.2.3. Changes from -06 to -07 of draft-duerst-iri-bis . . . 49 12.2.3. Changes from -06 to -07 of draft-duerst-iri-bis . . . 49
12.3. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 50 12.3. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 49
12.4. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 50 12.4. Changes from -05 to -06 of draft-duerst-iri-bis-00 . . . 49
12.5. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 50 12.5. Changes from -04 to -05 of draft-duerst-iri-bis . . . . . 50
12.6. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 50 12.6. Changes from -03 to -04 of draft-duerst-iri-bis . . . . . 50
12.7. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 50 12.7. Changes from -02 to -03 of draft-duerst-iri-bis . . . . . 50
12.8. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 50 12.8. Changes from -01 to -02 of draft-duerst-iri-bis . . . . . 50
12.9. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 51 12.9. Changes from -00 to -01 of draft-duerst-iri-bis . . . . . 50
12.10. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 51 12.10. Changes from RFC 3987 to -00 of draft-duerst-iri-bis . . 50
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 51 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 51
13.1. Normative References . . . . . . . . . . . . . . . . . . 51 13.1. Normative References . . . . . . . . . . . . . . . . . . 51
13.2. Informative References . . . . . . . . . . . . . . . . . 52 13.2. Informative References . . . . . . . . . . . . . . . . . 52
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 55 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 54
1. Introduction 1. Introduction
1.1. Overview and Motivation 1.1. Overview and Motivation
A Uniform Resource Identifier (URI) is defined in [RFC3986] as a A Uniform Resource Identifier (URI) is defined in [RFC3986] as a
sequence of characters chosen from a limited subset of the repertoire sequence of characters chosen from a limited subset of the repertoire
of US-ASCII [ASCII] characters. of US-ASCII [ASCII] characters.
The characters in URIs are frequently used for representing words of The characters in URIs are frequently used for representing words of
skipping to change at page 11, line 4 skipping to change at page 11, line 4
The following grammar closely follows the URI grammar in [RFC3986], The following grammar closely follows the URI grammar in [RFC3986],
except that the range of unreserved characters is expanded to include except that the range of unreserved characters is expanded to include
UCS characters, with the restriction that private UCS characters can UCS characters, with the restriction that private UCS characters can
occur only in query parts. The grammar is split into two parts: occur only in query parts. The grammar is split into two parts:
Rules that differ from [RFC3986] because of the above-mentioned Rules that differ from [RFC3986] because of the above-mentioned
expansion, and rules that are the same as those in [RFC3986]. For expansion, and rules that are the same as those in [RFC3986]. For
rules that are different than those in [RFC3986], the names of the rules that are different than those in [RFC3986], the names of the
non-terminals have been changed as follows. If the non-terminal non-terminals have been changed as follows. If the non-terminal
contains 'URI', this has been changed to 'IRI'. Otherwise, an 'i' contains 'URI', this has been changed to 'IRI'. Otherwise, an 'i'
has been prefixed. has been prefixed. The rule <pct-form> has been introduced in order
to be able to reference it from other parts of the document.
The following rules are different from those in [RFC3986]: The following rules are different from those in [RFC3986]:
IRI = scheme ":" ihier-part [ "?" iquery ] IRI = scheme ":" ihier-part [ "?" iquery ]
[ "#" ifragment ] [ "#" ifragment ]
ihier-part = "//" iauthority ipath-abempty ihier-part = "//" iauthority ipath-abempty
/ ipath-absolute / ipath-absolute
/ ipath-rootless / ipath-rootless
/ ipath-empty / ipath-empty
skipping to change at page 12, line 4 skipping to change at page 12, line 5
ipath-absolute = path-sep [ isegment-nz *( path-sep isegment ) ] ipath-absolute = path-sep [ isegment-nz *( path-sep isegment ) ]
ipath-noscheme = isegment-nz-nc *( path-sep isegment ) ipath-noscheme = isegment-nz-nc *( path-sep isegment )
ipath-rootless = isegment-nz *( path-sep isegment ) ipath-rootless = isegment-nz *( path-sep isegment )
ipath-empty = 0<ipchar> ipath-empty = 0<ipchar>
path-sep = "/" path-sep = "/"
isegment = *ipchar isegment = *ipchar
isegment-nz = 1*ipchar isegment-nz = 1*ipchar
isegment-nz-nc = 1*( iunreserved / pct-form / sub-delims isegment-nz-nc = 1*( iunreserved / pct-form / sub-delims
/ "@" ) / "@" )
; non-zero-length segment without any colon ":" ; non-zero-length segment without any colon ":"
ipchar = iunreserved / pct-form / sub-delims / ":" ipchar = iunreserved / pct-form / sub-delims / ":"
/ "@" / "@"
iquery = *( ipchar / iprivate / "/" / "?" ) iquery = *( ipchar / iprivate / "/" / "?" )
ifragment = *( ipchar / "/" / "?" / "#" ) ifragment = *( ipchar / "/" / "?" )
iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
/ %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
/ %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
/ %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
/ %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
/ %xD0000-DFFFD / %xE1000-EFFFD / %xD0000-DFFFD / %xE1000-EFFFD
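As an illustration of the ucschar rule shown above, the ranges can be turned into a simple membership test. This is a minimal sketch, not part of either draft; the names UCSCHAR_RANGES and is_ucschar are hypothetical, and the sketch covers only ucschar (not iprivate or any other rule).

```python
# Code point ranges of the ucschar rule, copied from the ABNF above.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13 each contribute the range %xN0000-NFFFD.
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
# Plane 14 starts at E1000, so E0000-E0FFF is excluded.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_ucschar(ch: str) -> bool:
    """Return True if the single character ch matches the ucschar rule."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)
```

Private-use code points such as U+E000 fall outside these ranges, which is consistent with the restriction that private UCS characters can occur only in query parts.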
skipping to change at page 14, line 19 skipping to change at page 14, line 19
which establish the relationship between the string given and the which establish the relationship between the string given and the
interpreted derivatives. These processing steps apply to both IRIs interpreted derivatives. These processing steps apply to both IRIs
and IRI references (i.e., absolute or relative forms); for IRIs, some and IRI references (i.e., absolute or relative forms); for IRIs, some
steps are scheme specific. steps are scheme specific.
3.1. Converting to UCS 3.1. Converting to UCS
Input that is already in a Unicode form (i.e., a sequence of Unicode Input that is already in a Unicode form (i.e., a sequence of Unicode
characters or an octet-stream representing a Unicode-based character characters or an octet-stream representing a Unicode-based character
encoding such as UTF-8 or UTF-16) should be left as is and not encoding such as UTF-8 or UTF-16) should be left as is and not
normalized (see (see Section 5.3.2.2). normalized (see Section 5.3.2.2).
An IRI or IRI reference is a sequence of characters from the UCS. An IRI or IRI reference is a sequence of characters from the UCS.
For IRIs that are not already in a Unicode form (as when written on For IRIs that are not already in a Unicode form (as when written on
paper, read aloud, or represented in a text stream using a legacy paper, read aloud, or represented in a text stream using a legacy
character encoding), convert the IRI to Unicode. Note that some character encoding), convert the IRI to Unicode. Note that some
character encodings or transcriptions can be converted to or character encodings or transcriptions can be converted to or
represented by more than one sequence of Unicode characters. Ideally represented by more than one sequence of Unicode characters. Ideally
the resulting IRI would use a normalized form, such as Unicode the resulting IRI would use a normalized form, such as Unicode
Normalization Form C [UTR15] (see Section 5.3 Normalization and Normalization Form C [UTR15] (see Section 5.3 Normalization and
Comparison), since that ensures a stable, consistent representation Comparison), since that ensures a stable, consistent representation
skipping to change at page 14, line 44 skipping to change at page 14, line 44
In other cases (written on paper, read aloud, or otherwise In other cases (written on paper, read aloud, or otherwise
represented independent of any character encoding) represent the IRI represented independent of any character encoding) represent the IRI
as a sequence of characters from the UCS normalized according to as a sequence of characters from the UCS normalized according to
Unicode Normalization Form C (NFC, [UTR15]). Unicode Normalization Form C (NFC, [UTR15]).
3.2. Parse the IRI into IRI components 3.2. Parse the IRI into IRI components
Parse the IRI, either as a relative reference (no scheme) or using Parse the IRI, either as a relative reference (no scheme) or using
scheme specific processing (according to the scheme given); the scheme specific processing (according to the scheme given); the
result resulting in a set of parsed IRI components. (NOTE: FIX result is a set of parsed IRI components.
BEFORE RELEASE: INTENT IS THAT ALL IRI SCHEMES THAT USE GENERIC
SYNTAX AND ALLOW NON-ASCII AUTHORITY CAN ONLY USE AUTHORITY FOR NAMES
THAT FOLLOW PUNICODE.)
NOTE: The result of parsing into components will correspond result in NOTE: The result of parsing into components will correspond to
a correspondence of substrings of the IRI according to the part substrings of the IRI that may be accessible via an API. For example,
matched. For example, in [HTML5], the protocol components of in [HTML5], the protocol components of interest are SCHEME (scheme),
interest are SCHEME (scheme), HOST (ireg-name), PORT (port), the PATH HOST (ireg-name), PORT (port), the PATH (ipath after the initial
(ipath after the initial "/"), QUERY (iquery), FRAGMENT (ifragment), "/"), QUERY (iquery), FRAGMENT (ifragment), and AUTHORITY
and AUTHORITY (iauthority). (iauthority).
Subsequent processing rules are sometimes used to define other Subsequent processing rules are sometimes used to define other
syntactic components. For example, [HTML5] defines APIs for IRI syntactic components. For example, [HTML5] defines APIs for IRI
processing; in these APIs: processing; in these APIs:
HOSTSPECIFIC the substring that follows the substring matched by the HOSTSPECIFIC the substring that follows the substring matched by the
iauthority production, or the whole string if the iauthority iauthority production, or the whole string if the iauthority
production wasn't matched. production wasn't matched.
HOSTPORT if there is a scheme component and a port component and the HOSTPORT if there is a scheme component and a port component and the
port given by the port component is different than the default port given by the port component is different than the default
port defined for the protocol given by the scheme component, then port defined for the protocol given by the scheme component, then
HOSTPORT is the substring that starts with the substring matched HOSTPORT is the substring that starts with the substring matched
by the host production and ends with the substring matched by the by the host production and ends with the substring matched by the
port production, and includes the colon in between the two. port production, and includes the colon in between the two.
Otherwise, it is the same as the host component. Otherwise, it is the same as the host component.
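The HOSTPORT definition above can be sketched as follows. This is an approximation under stated assumptions, not the [HTML5] algorithm: it relies on Python's urllib.parse.urlsplit for generic parsing (which lowercases the host and strips IPv6 brackets, so the result is not always an exact substring of the input), and the DEFAULT_PORTS table and the function name hostport are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical default-port table; a real implementation would take
# these values from the scheme definitions it supports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def hostport(iri: str) -> str:
    """Approximate HOSTPORT: host plus ':port' only for non-default ports."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    port = parts.port
    # A scheme and a port must both be present, and the port must differ
    # from the default for that scheme; otherwise only the host is returned.
    if parts.scheme and port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        return f"{host}:{port}"
    return host
```

With this sketch, an "http" IRI carrying an explicit port of 80 yields only the host, while the same IRI with port 8080 yields the host, a colon, and the port.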
3.3. General percent-encoding of IRI components 3.3. General percent-encoding of IRI components
For most IRI components, it is possible to map the IRI component to Except as noted in the following subsections, IRI components are
an equivalent URI component by percent-encoding those characters not mapped to the equivalent URI components by percent-encoding those
allowed in URIs. Previous processing steps will have removed some characters not allowed in URIs. Previous processing steps will have
characters, and the interpretation of reserved characters will have removed some characters, and the interpretation of reserved
already been done (with the syntactic reserved characters outside of characters will have already been done (with the syntactic reserved
the IRI component). This mapping is defined for all sequences of characters outside of the IRI component). This mapping is defined
Unicode characters, whether or not they are valid for the component for all sequences of Unicode characters, whether or not they are
in question. valid for the component in question.
For each character which is not allowed in a valid URI (NOTE: WHAT IS For each character which is not allowed anywhere in a valid URI,
THE RIGHT REFERENCE HERE), apply the following steps. apply the following steps.
Convert to UTF-8 Convert the character to a sequence of one or more Convert to UTF-8 Convert the character to a sequence of one or more
octets using UTF-8 [RFC3629]. octets using UTF-8 [RFC3629].
Percent encode Convert each octet of this sequence to %HH, where HH Percent encode Convert each octet of this sequence to %HH, where HH
is the hexadecimal notation of the octet value. The hexadecimal is the hexadecimal notation of the octet value. The hexadecimal
notation SHOULD use uppercase letters. (This is the general URI notation SHOULD use uppercase letters. (This is the general URI
percent-encoding mechanism in Section 2.1 of [RFC3986].) percent-encoding mechanism in Section 2.1 of [RFC3986].)
Note that the mapping is an identity transformation for parsed URI Note that the mapping is an identity transformation for parsed URI
components of valid URIs, and is idempotent: applying the mapping a components of valid URIs, and is idempotent: applying the mapping a
second time will not change anything. second time will not change anything.
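The two steps of Section 3.3 can be sketched as follows. This is a minimal illustration, assuming that "allowed anywhere in a valid URI" means the unreserved and reserved characters of [RFC3986] plus the percent sign; the names URI_ALLOWED and percent_encode_component are hypothetical. Input containing unpaired surrogates is not handled and would raise an error at the UTF-8 step.

```python
import string

# Characters assumed to be allowed somewhere in a valid URI (RFC 3986):
URI_ALLOWED = set(
    string.ascii_letters + string.digits
    + "-._~"          # remaining unreserved characters
    + ":/?#[]@"       # gen-delims
    + "!$&'()*+,;="   # sub-delims
    + "%"             # introduces an existing percent-encoded octet
)


def percent_encode_component(component: str) -> str:
    """Map an IRI component to a URI component by percent-encoding."""
    out = []
    for ch in component:
        if ch in URI_ALLOWED:
            out.append(ch)  # allowed characters pass through unchanged
        else:
            # Step 1: convert the character to UTF-8 octets.
            # Step 2: write each octet as %HH with uppercase hex digits.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

Because "%" is passed through, existing percent-encoded sequences are not encoded again, and applying the function to its own output leaves the string unchanged.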
3.4. Mapping ireg-name 3.4. Mapping ireg-name
Schemes that allow non-ASCII based characters in the reg-name (ireg- 3.4.1. Mapping using Percent-Encoding
name) position MUST convert the ireg-name component of an IRI as
follows: The ireg-name component SHOULD be converted according to the general
procedure for percent-encoding of IRI components described in
Section 3.3.
For example, the IRI
"http://r&#xE9;sum&#xE9;.example.org"
will be converted to
"http://r%C3%A9sum%C3%A9.example.org".
This conversion for ireg-name is in line with Section 3.2.2 of
[RFC3986], which does not mandate a particular registered name lookup
technology. For further background, see [RFC6055] and [Gettys].
3.4.2. Mapping using Punycode
The ireg-name component MAY also be converted as follows:
Replace the ireg-name part of the IRI by the part converted using the Replace the ireg-name part of the IRI by the part converted using the
ToASCII operation specified in Section 4.1 of [RFC3490] on each dot- Domain Name Lookup procedure (Subsections 5.3 to 5.5) of [RFC5891].
separated label, and by using U+002E (FULL STOP) as a label on each dot-separated label, and by using U+002E (FULL STOP) as a
separator, with the flag UseSTD3ASCIIRules set to FALSE, and with the label separator. This procedure may fail, but this would mean that
flag AllowUnassigned set to FALSE. The ToASCII operation may fail, the IRI cannot be resolved. In such cases, if the domain name
but this would mean that the IRI cannot be resolved. In such cases, conversion fails, then the entire IRI conversion fails. Processors
if the domain name conversion fails, then the entire IRI conversion that have no mechanism for signalling a failure MAY instead
fails. Processors that have no mechanism for signalling a failure substitute an otherwise invalid host name, although such processing
MAY instead substitute an otherwise invalid host name, although such SHOULD be avoided.
processing SHOULD be avoided.
For example, the IRI For example, the IRI
"http://r&#xE9;sum&#xE9;.example.org" "http://r&#xE9;sum&#xE9;.example.org"
MAY be converted to MAY be converted to
"http://xn--rsum-bpad.example.org" "http://xn--rsum-bpad.example.org"
; conversion to percent-encoded form, e.g., .
"http://r%C3%A9sum%C3%A9.example.org", MUST NOT be performed.
This conversion for ireg-name will be better able to deal with legacy
infrastructure that cannot handle percent-encoding in domain names.
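The Punycode-based mapping can be sketched with Python's built-in "idna" codec. Note the assumption: that codec implements the ToASCII operation of [RFC3490] (the procedure referenced in the -04 text), not the Domain Name Lookup procedure of [RFC5891], so results can differ for some labels. The function name map_ireg_name_punycode is hypothetical.

```python
def map_ireg_name_punycode(ireg_name: str) -> str:
    """Convert each dot-separated label of ireg_name to its ASCII form.

    Raises ValueError if the conversion fails, in which case the
    entire IRI conversion is considered to have failed.
    """
    try:
        # The "idna" codec splits on label separators, applies ToASCII
        # to each label, and joins the results with U+002E (FULL STOP).
        return ireg_name.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"cannot convert ireg-name {ireg_name!r}") from exc
```

The sketch signals failure with an exception rather than substituting an invalid host name, in line with the recommendation that such substitution be avoided.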
3.4.3. Additional Considerations
Note: Domain Names may appear in parts of an IRI other than the Note: Domain Names may appear in parts of an IRI other than the
ireg-name part. It is the responsibility of scheme-specific ireg-name part. It is the responsibility of scheme-specific
implementations (if the Internationalized Domain Name is part of implementations (if the Internationalized Domain Name is part of
the scheme syntax) or of server-side implementations (if the the scheme syntax) or of server-side implementations (if the
Internationalized Domain Name is part of 'iquery') to apply the Internationalized Domain Name is part of 'iquery') to apply the
necessary conversions at the appropriate point. Example: Trying necessary conversions at the appropriate point. Example: Trying
to validate the Web page at to validate the Web page at
http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of
http://validator.w3.org/check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;. http://validator.w3.org/check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;.
skipping to change at page 17, line 5 skipping to change at page 17, line 17
Note: In this process, characters allowed in URI references and Note: In this process, characters allowed in URI references and
existing percent-encoded sequences are not encoded further. (This existing percent-encoded sequences are not encoded further. (This
mapping is similar to, but different from, the encoding applied mapping is similar to, but different from, the encoding applied
when arbitrary content is included in some part of a URI.) For when arbitrary content is included in some part of a URI.) For
example, an IRI of example, an IRI of
"http://www.example.org/red%09ros&#xE9;#red" (in XML notation) is "http://www.example.org/red%09ros&#xE9;#red" (in XML notation) is
converted to converted to
"http://www.example.org/red%09ros%C3%A9#red", not to something "http://www.example.org/red%09ros%C3%A9#red", not to something
like like
"http%3A%2F%2Fwww.example.org%2Fred%2509ros%C3%A9%23red". "http%3A%2F%2Fwww.example.org%2Fred%2509ros%C3%A9%23red".
((DESIGN QUESTION: What about e.g.
http://r%C3%A9sum%C3%A9.example.org in an IRI? Will that get
converted to punycode, or not?))
3.5. Mapping query components 3.5. Mapping query components
((NOTE: SEE ISSUES LIST)) For compatibility with existing deployed ((NOTE: SEE ISSUES LIST)) For compatibility with existing deployed
HTTP infrastructure, the following special case applies for schemes HTTP infrastructure, the following special case applies for schemes
"http" and "https" and IRIs whose origin has a document charset other "http" and "https" and IRIs whose origin has a document charset other
than one which is UCS-based (e.g., UTF-8 or UTF-16). In such a case, than one which is UCS-based (e.g., UTF-8 or UTF-16). In such a case,
the "query" component of an IRI is mapped into a URI by using the the "query" component of an IRI is mapped into a URI by using the
document charset rather than UTF-8 as the binary representation document charset rather than UTF-8 as the binary representation
before pct-encoding. This mapping is not applied for any other before pct-encoding. This mapping is not applied for any other
skipping to change at page 30, line 35 skipping to change at page 30, line 35
to the uppercase/lowercase problems. Some parts of a URI are case to the uppercase/lowercase problems. Some parts of a URI are case
insensitive (for example, the domain name). For others, it is insensitive (for example, the domain name). For others, it is
unclear whether they are case sensitive, case insensitive, or unclear whether they are case sensitive, case insensitive, or
something in between (e.g., case sensitive, but with a multiple something in between (e.g., case sensitive, but with a multiple
choice selection if the wrong case is used, instead of a direct choice selection if the wrong case is used, instead of a direct
negative result). The best recipe is that the creator use a negative result). The best recipe is that the creator use a
reasonable capitalization and, when transferring the URI, reasonable capitalization and, when transferring the URI,
capitalization never be changed. capitalization never be changed.
Various IRI schemes may allow the usage of Internationalized Domain Various IRI schemes may allow the usage of Internationalized Domain
Names (IDN) [RFC3490] either in the ireg-name part or elsewhere. Names (IDN) [RFC5890] either in the ireg-name part or elsewhere.
Character Normalization also applies to IDNs, as discussed in Character Normalization also applies to IDNs, as discussed in
Section 5.3.3. Section 5.3.3.
5.3.2.3. Percent-Encoding Normalization 5.3.2.3. Percent-Encoding Normalization
The percent-encoding mechanism (Section 2.1 of [RFC3986]) is a The percent-encoding mechanism (Section 2.1 of [RFC3986]) is a
frequent source of variance among otherwise identical IRIs. In frequent source of variance among otherwise identical IRIs. In
addition to the case normalization issue noted above, some IRI addition to the case normalization issue noted above, some IRI
producers percent-encode octets that do not require percent-encoding, producers percent-encode octets that do not require percent-encoding,
resulting in IRIs that are equivalent to their nonencoded resulting in IRIs that are equivalent to their nonencoded
skipping to change at page 36, line 21 skipping to change at page 36, line 21
used if the query part is encoded in UTF-8. used if the query part is encoded in UTF-8.
6.5. Relative IRI References 6.5. Relative IRI References
Processing of relative IRI references against a base is handled Processing of relative IRI references against a base is handled
straightforwardly; the algorithms of [RFC3986] can be applied straightforwardly; the algorithms of [RFC3986] can be applied
directly, treating the characters additionally allowed in IRI directly, treating the characters additionally allowed in IRI
references in the same way that unreserved characters are in URI references in the same way that unreserved characters are in URI
references. references.
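As an illustration only, reference resolution can be tried with an existing URI library. The sketch below assumes Python's urllib.parse.urljoin, which approximates the [RFC3986] resolution algorithm and passes non-ASCII characters through unchanged; it is not a normative implementation, and the example IRIs are made up.

```python
from urllib.parse import urljoin

# Hypothetical base IRI and relative IRI reference.
base = "http://example.org/a/b"
reference = "../r\u00e9sum\u00e9"

# The non-ASCII characters take no part in dot-segment removal;
# they are carried along like unreserved characters.
resolved = urljoin(base, reference)
print(resolved)
```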
7. Liberal handling of otherwise invalid IRIs 7. Liberal Handling of Otherwise Invalid IRIs
(EDITOR NOTE: This Section may move to an appendix.) Some technical (EDITOR NOTE: This Section may move to an appendix.) Some technical
specifications and widely-deployed software have allowed additional specifications and widely-deployed software have allowed additional
variations and extensions of IRIs to be used in syntactic components. variations and extensions of IRIs to be used in syntactic components.
This section describes two widely-used preprocessing agreements. This section describes two widely-used preprocessing agreements.
Other technical specifications may wish to reference a syntactic Other technical specifications may wish to reference a syntactic
component which is "a valid IRI or a string that will map to a valid component which is "a valid IRI or a string that will map to a valid
IRI after this preprocessing algorithm". These two variants are IRI after this preprocessing algorithm". These two variants are
known as Legacy Extended IRI or LEIRI [LEIRI], and Web Address known as Legacy Extended IRI or LEIRI [LEIRI], and Web Address
[HTML5]. [HTML5].
Future technical specifications SHOULD NOT allow conforming producers Future technical specifications SHOULD NOT allow conforming producers
to produce, or conforming content to contain, such forms, as they are to produce, or conforming content to contain, such forms, as they are
not interoperable with other IRI consuming software. not interoperable with other IRI consuming software.
7.1. LEIRI processing 7.1. LEIRI Processing
This section defines Legacy Extended IRIs (LEIRIs). The syntax of This section defines Legacy Extended IRIs (LEIRIs). The syntax of
Legacy Extended IRIs is the same as that for <IRI-reference>, except Legacy Extended IRIs is the same as that for <IRI-reference>, except
that the ucschar production is replaced by the leiri-ucschar that the ucschar production is replaced by the leiri-ucschar
production: production:
leiri-ucschar = " " / "<" / ">" / '"' / "{" / "}" / "|" leiri-ucschar = " " / "<" / ">" / '"' / "{" / "}" / "|"
/ "\" / "^" / "`" / %x0-1F / %x7F-D7FF / "\" / "^" / "`" / %x0-1F / %x7F-D7FF
/ %xE000-FFFD / %x10000-10FFFF / %xE000-FFFD / %x10000-10FFFF
Among other extensions, processors based on this specification also Among other extensions, processors based on this specification also
did not enforce the restriction on bidirectional formatting did not enforce the restriction on bidirectional formatting
characters in Section 4.1, and the iprivate production becomes characters in Section 4.1, and the iprivate production becomes
redundant. redundant.
To convert a string allowed as a LEIRI to an IRI, each character To convert a string allowed as a LEIRI to an IRI, each character
allowed in leiri-ucschar but not in ucschar must be percent-encoded allowed in leiri-ucschar but not in ucschar must be percent-encoded
using Section 3.3. using Section 3.3.
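The conversion above can be sketched as follows. The leiri-ucschar ranges are taken from this section; the ucschar ranges are copied from RFC 3987 and are assumed to be unchanged in this draft. The names leiri_to_iri and _in_ranges are hypothetical, and the percent-encoding is assumed to be UTF-8 based, as in Section 3.3.

```python
# leiri-ucschar as given in the production above.
LEIRI_SINGLES = set(' <>"{}|\\^`')
LEIRI_RANGES = [(0x00, 0x1F), (0x7F, 0xD7FF),
                (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# ucschar as defined in RFC 3987 (assumption: same set in this draft).
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    + [(p << 16, (p << 16) + 0xFFFD) for p in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def leiri_to_iri(leiri: str) -> str:
    """Percent-encode characters in leiri-ucschar but not in ucschar."""
    out = []
    for ch in leiri:
        cp = ord(ch)
        in_leiri = ch in LEIRI_SINGLES or _in_ranges(cp, LEIRI_RANGES)
        if in_leiri and not _in_ranges(cp, UCSCHAR_RANGES):
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)  # already valid in an IRI, or not our concern
    return "".join(out)


print(leiri_to_iri("http://example.org/a b<c>"))
```

Characters that are already in ucschar, such as U+00E9, are left alone, so a string that is already an IRI is returned unchanged.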
7.2. Web Address processing 7.2. Web Address Processing
Many popular web browsers have taken the approach of being quite Many popular web browsers have taken the approach of being quite
liberal in what is accepted as a "URL" or its relative forms. This liberal in what is accepted as a "URL" or its relative forms. This
section describes their behavior in terms of a preprocessor which section describes their behavior in terms of a preprocessor which
maps strings into the IRI space for subsequent parsing and maps strings into the IRI space for subsequent parsing and
interpretation as an IRI. interpretation as an IRI.
In some situations, it might be appropriate to describe the syntax In some situations, it might be appropriate to describe the syntax
that a liberal consumer implementation might accept as a "Web that a liberal consumer implementation might accept as a "Web
Address" or "Hypertext Reference" or "HREF". However, technical Address" or "Hypertext Reference" or "HREF". However, technical
skipping to change at page 38, line 6 skipping to change at page 38, line 6
has a Document, and the HRef-charset is the Document's character has a Document, and the HRef-charset is the Document's character
encoding. encoding.
If the string had a HRef-charset defined when the string was created If the string had a HRef-charset defined when the string was created
or defined, the HRef-charset is as defined. or defined, the HRef-charset is as defined.
If the resulting HRef-charset is a Unicode-based character encoding If the resulting HRef-charset is a Unicode-based character encoding
(e.g., UTF-16), then use UTF-8 instead. (e.g., UTF-16), then use UTF-8 instead.
The syntax for Web Addresses is obtained by replacing the 'ucschar', The syntax for Web Addresses is obtained by replacing the 'ucschar',
pct-form, and path-sep rules with the href-ucschar, href-pct-form, pct-form, path-sep, and ifragment rules with the href-ucschar, href-
and href-path-sep rules below. In addition, some characters are pct-form, href-path-sep, and href-ifragment rules below. In
stripped. addition, some characters are stripped.
href-ucschar = " " / "<" / ">" / DQUOTE / "{" / "}" / "|" href-ucschar = " " / "<" / ">" / DQUOTE / "{" / "}" / "|"
/ "\" / "^" / "`" / %x0-1F / %x7F-D7FF / "\" / "^" / "`" / %x0-1F / %x7F-D7FF
/ %xE000-FFFD / %x10000-10FFFF / %xE000-FFFD / %x10000-10FFFF
href-pct-form = pct-encoded / "%" href-pct-form = pct-encoded / "%"
href-path-sep = "/" / "\" href-path-sep = "/" / "\"
href-strip = <to be done> href-ifragment = *( ipchar / "/" / "?" / "#" ) ; adding "#"
href-strip = <to be done>
(NOTE: NEED TO FIX THESE SETS TO MATCH HTML5; NOT SURE ABOUT NEXT (NOTE: NEED TO FIX THESE SETS TO MATCH HTML5; NOT SURE ABOUT NEXT
SENTENCE) browsers did not enforce the restriction on bidirectional SENTENCE) browsers did not enforce the restriction on bidirectional
formatting characters in Section 4.1, and the iprivate production formatting characters in Section 4.1, and the iprivate production
becomes redundant. becomes redundant.
'Web Address processing' requires the following additional 'Web Address processing' requires the following additional
preprocessing steps: preprocessing steps:
1. Leading and trailing instances of space (U+0020), CR (U+000D), LF 1. Leading and trailing instances of space (U+0020), CR (U+000D), LF
skipping to change at page 38, line 38 skipping to change at page 38, line 39
2. strip all characters in href-strip. 2. strip all characters in href-strip.
3. Percent-encode all characters in href-ucschar not in ucschar. 3. Percent-encode all characters in href-ucschar not in ucschar.
4. Replace occurrences of "%" not followed by two hexadecimal digits 4. Replace occurrences of "%" not followed by two hexadecimal digits
by "%25". by "%25".
5. Convert backslashes ('\') matching href-path-sep to forward 5. Convert backslashes ('\') matching href-path-sep to forward
slashes ('/'). slashes ('/').
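Step 4 above is the only step that can be sketched without further assumptions, since href-strip is not yet defined and the interaction between steps 3 and 5 for "\" is not spelled out. The following is a minimal sketch of step 4 alone; the function name escape_stray_percent is hypothetical.

```python
import re

# A "%" that is not followed by exactly two hexadecimal digits.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_stray_percent(s: str) -> str:
    """Replace each "%" not starting a valid pct-encoded triplet by "%25"."""
    return _STRAY_PERCENT.sub("%25", s)


print(escape_stray_percent("100%/a%20b/%zz"))
```

Valid triplets such as "%20" are left untouched, so applying the step twice gives the same result as applying it once.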
7.3. Characters not allowed in IRIs 7.3. Characters Not Allowed in IRIs
This section provides a list of the groups of characters and code This section provides a list of the groups of characters and code
points that are allowed by LEIRI or HREF but are not allowed in IRIs points that are allowed by LEIRI or HREF but are not allowed in IRIs
or are allowed in IRIs only in the query part. For each group of or are allowed in IRIs only in the query part. For each group of
characters, advice on the usage of these characters is also given, characters, advice on the usage of these characters is also given,
concentrating on the reasons for why they are excluded from IRI use. concentrating on the reasons for why they are excluded from IRI use.
Space (U+0020): Some formats and applications use space as a Space (U+0020): Some formats and applications use space as a
delimiter, e.g. for items in a list. Appendix C of [RFC3986] also delimiter, e.g. for items in a list. Appendix C of [RFC3986] also
mentions that white space may have to be added when displaying or mentions that white space may have to be added when displaying or
skipping to change at page 46, line 17 skipping to change at page 46, line 17
indicate their usability as IRI schemes. indicate their usability as IRI schemes.
Update "per RFC 4395" to "per RFC 4395 and RFC XXXX". Update "per RFC 4395" to "per RFC 4395 and RFC XXXX".
10. Security Considerations 10. Security Considerations
The security considerations discussed in [RFC3986] also apply to The security considerations discussed in [RFC3986] also apply to
IRIs. In addition, the following issues require particular care for IRIs. In addition, the following issues require particular care for
IRIs. IRIs.
Incorrect encoding or decoding can lead to security problems. In Incorrect encoding or decoding can lead to security problems. For
particular, some UTF-8 decoders do not check against overlong byte example, some UTF-8 decoders do not check against overlong byte
sequences. As an example, a "/" is encoded with the byte 0x2F both sequences. See [UTR36] Section 3 for details.
in UTF-8 and in US-ASCII, but some UTF-8 decoders also wrongly
interpret the sequence 0xC0 0xAF as a "/". A sequence such as
"%C0%AF.." may pass some security tests and then be interpreted as
"/.." in a path if UTF-8 decoders are fault-tolerant, if conversion
and checking are not done in the right order, and/or if reserved
characters and unreserved characters are not clearly distinguished.
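The overlong-sequence problem can be illustrated with a short sketch. It assumes that percent-decoding is done first and that the resulting octets are then decoded with a strict UTF-8 decoder, here Python's built-in one, which rejects the octets 0xC0 0xAF. The function name decode_path_strict is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def decode_path_strict(pct_encoded: str) -> str:
    """Percent-decode, then decode as UTF-8, rejecting invalid sequences."""
    raw = unquote_to_bytes(pct_encoded)
    # errors="strict" raises UnicodeDecodeError on overlong forms.
    return raw.decode("utf-8", errors="strict")


try:
    decode_path_strict("%C0%AF..")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Any security check on the path, such as a test for "/..", would then be run on the decoded string rather than on the percent-encoded form.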
There are various ways in which "spoofing" can occur with IRIs.
"Spoofing" means that somebody may add a resource name that looks the
same or similar to the user, but that points to a different resource.
The added resource may pretend to be the real resource by looking
very similar but may contain all kinds of changes that may be
difficult to spot and that can cause all kinds of problems. Most
spoofing possibilities for IRIs are extensions of those for URIs.
Spoofing can occur for various reasons. First, a user's
normalization expectations or actual normalization when entering an
IRI or transcoding an IRI from a legacy character encoding do not
match the normalization used on the server side. Conceptually, this
is no different from the problems surrounding the use of case-
insensitive web servers. For example, a popular web page with a
mixed-case name ("http://big.example.com/PopularPage.html") might be
"spoofed" by someone who is able to create
"http://big.example.com/popularpage.html". However, the use of
unnormalized character sequences, and of additional mappings for user
convenience, may increase the chance for spoofing. Protocols and
servers that allow the creation of resources with names that are not
normalized are particularly vulnerable to such attacks. This is an
inherent security problem of the relevant protocol, server, or
resource and is not specific to IRIs, but it is mentioned here for
completeness.
Spoofing can occur in various IRI components, such as the domain name
part or a path part. For considerations specific to the domain name
part, see [RFC3491]. For the path part, administrators of sites that
allow independent users to create resources in the same sub area may
have to be careful to check for spoofing.
Spoofing can occur because in the UCS many characters look very There are serious difficulties with relying on a human to verify that
similar. Details are discussed in Section 8.5. Again, this is very an IRI (whether presented visually or aurally) is the same as
similar to spoofing possibilities on US-ASCII, e.g., using "br0ken" another IRI or is the one intended. These problems exist with ASCII-
or "1ame" URIs. only URIs (bl00mberg.com vs. bloomberg.com) but are strongly
exacerbated when using the much larger character repertoire of
Unicode. For details, see Section 2 of [UTR36]. Using
administrative and technical means to reduce the availability of such
exploits is possible, but they are difficult to eliminate altogether.
User agents SHOULD NOT rely on visual or perceptual comparison or
verification of IRIs as a means of validating or assuring safety,
correctness or appropriateness of an IRI. Other means of presenting
users with the validity, safety, or appropriateness of visited sites
are being developed in the browser community as an alternative means
of avoiding these difficulties.
Spoofing can occur when URIs with percent-encodings based on various Besides the large character repertoire of Unicode, reasons for
character encodings are accepted to deal with older user agents. In confusion include different forms of normalization and different
some cases, particularly for Latin-based resource names, this is normalization expectations, use of percent-encoding with various
usually easy to detect because UTF-8-encoded names, when interpreted legacy encodings, and bidirectionality issues. See also [UTR36].
and viewed as legacy character encodings, produce mostly garbage.
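The role of normalization in such confusion can be shown with a small sketch. It assumes NFC as the normalization form and uses made-up strings; the helper names pct_utf8 and same_after_nfc are hypothetical.

```python
import unicodedata
from urllib.parse import quote

# Two strings that render identically: precomposed U+00E9 versus
# "e" followed by U+0301 (combining acute accent).
precomposed = "r\u00e9sum\u00e9"
decomposed = "re\u0301sume\u0301"


def pct_utf8(s: str) -> str:
    """Percent-encode the UTF-8 form of s, for comparison only."""
    return quote(s, safe="")


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(pct_utf8(precomposed))
print(pct_utf8(decomposed))
print(same_after_nfc(precomposed, decomposed))
```

The two strings differ as code point sequences and as percent-encoded octets, so a server that does not normalize treats them as names of different resources.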
When concurrently used character encodings have a similar structure Confusion can occur in various IRI components, such as the domain
but there are no characters that have exactly the same encoding, name part or the path part, or between IRI components. For
detection is more difficult. considerations specific to the domain name part, see [RFC5890]. For
considerations specific to particular protocols or schemes, see the
security sections of the relevant specifications and registration
templates. Administrators of sites that allow independent users to
create resources in the same sub area have to be careful. Details
are discussed in Section 8.5.
Spoofing can occur with bidirectional IRIs, if the restrictions in Confusion can occur with bidirectional IRIs, if the restrictions in
Section 4.2 are not followed. The same visual representation may be Section 4.2 are not followed. The same visual representation may be
interpreted as different logical representations, and vice versa. It interpreted as different logical representations, and vice versa. It
is also very important that a correct Unicode bidirectional is also very important that a correct Unicode bidirectional
implementation be used. implementation be used.
The use of Legacy Extended IRIs introduces additional security The characters additionally allowed in Legacy Extended IRIs introduce
issues. additional security issues. For details, see Section 7.3.
11. Acknowledgements 11. Acknowledgements
This document was derived from [RFC3987]; the acknowledgments from This document was derived from [RFC3987]; the acknowledgments from
that specification still apply. that specification still apply.
We would like to thank Ian Hickson, Michael Sperberg-McQueen, and Dan We would like to thank Ian Hickson, Michael Sperberg-McQueen, and Dan
Connolly for their work on HyperText References, and Norman Walsh, Connolly for their work on HyperText References, and Norman Walsh,
Richard Tobin, Henry S. Thompson, John Cowan, Paul Grosso, and the XML Richard Tobin, Henry S. Thompson, John Cowan, Paul Grosso, and the XML
Core Working Group of the W3C for their work on LEIRIs. Core Working Group of the W3C for their work on LEIRIs.
skipping to change at page 48, line 7 skipping to change at page 47, line 31
In addition, this document was influenced by contributions from (in In addition, this document was influenced by contributions from (in
no particular order) Chris Lilley, Bjoern Hoehrmann, Felix Sasaki, no particular order) Chris Lilley, Bjoern Hoehrmann, Felix Sasaki,
Jeremy Carroll, Frank Ellermann, Michael Everson, Cary Karp, Jeremy Carroll, Frank Ellermann, Michael Everson, Cary Karp,
Matitiahu Allouche, Richard Ishida, Addison Phillips, Jonathan Matitiahu Allouche, Richard Ishida, Addison Phillips, Jonathan
Rosenne, Najib Tounsi, Debbie Garside, Mark Davis, Sarmad Hussain, Rosenne, Najib Tounsi, Debbie Garside, Mark Davis, Sarmad Hussain,
Ted Hardie, Konrad Lanz, Thomas Roessler, Lisa Dusseault, Julian Ted Hardie, Konrad Lanz, Thomas Roessler, Lisa Dusseault, Julian
Reschke, Giovanni Campagna, Anne van Kesteren, Mark Nottingham, Erik Reschke, Giovanni Campagna, Anne van Kesteren, Mark Nottingham, Erik
van der Poel, Marcin Hanclik, Marcos Caceres, Roy Fielding, Greg van der Poel, Marcin Hanclik, Marcos Caceres, Roy Fielding, Greg
Wilkins, Pieter Hintjens, Daniel R. Tobias, Marko Martin, Maciej Wilkins, Pieter Hintjens, Daniel R. Tobias, Marko Martin, Maciej
Stachowiak, Wil Tan, Yui Naruse, Michael A. Puls II, Dave Thaler, Stachowiak, Wil Tan, Yui Naruse, Michael A. Puls II, Dave Thaler,
Tom Perch, John Klensin, Shawn Steele, Peter Saint-Andre, Geoffrey Tom Petch, John Klensin, Shawn Steele, Peter Saint-Andre, Geoffrey
Sneddon, Chris Weber, Alex Melnikov, Slim Amamou, SM, Tim Berners- Sneddon, Chris Weber, Alex Melnikov, Slim Amamou, SM, Tim Berners-
Lee, Yaron Goland, Sam Ruby, Adam Barth, Abdulrahman I. ALGhadir, Lee, Yaron Goland, Sam Ruby, Adam Barth, Abdulrahman I. ALGhadir,
Aharon Lanin, Thomas Milo, Murray Sargent, Marc Blanchet, and Mykyta Aharon Lanin, Thomas Milo, Murray Sargent, Marc Blanchet, and Mykyta
Yevstifeyev. Yevstifeyev.
12. Main Changes Since RFC 3987 12. Main Changes Since RFC 3987
This section describes the main changes since [RFC3987]. This section describes the main changes since [RFC3987].
12.1. Major restructuring of IRI processing model 12.1. Major restructuring of IRI processing model
skipping to change at page 54, line 17 skipping to change at page 53, line 45
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005. Identifiers (IRIs)", RFC 3987, January 2005.
[RFC4395bis] [RFC4395bis]
Hansen, T., Hardie, T., and L. Masinter, "Guidelines and Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
Registration Procedures for New URI/IRI Schemes", Registration Procedures for New URI/IRI Schemes",
draft-hansen-iri-4395bis-irireg-00 (work in progress), draft-hansen-iri-4395bis-irireg-00 (work in progress),
September 2010. September 2010.
[RFC6055] Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on
Encodings for Internationalized Domain Names", RFC 6055,
February 2011.
[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other
Markup Languages", Unicode Technical Report #20, World Markup Languages", Unicode Technical Report #20, World
Wide Web Consortium Note, June 2003, Wide Web Consortium Note, June 2003,
<http://www.w3.org/TR/unicode-xml/>. <http://www.w3.org/TR/unicode-xml/>.
[UTR36] Davis, M. and M. Suignard, "Unicode Security [UTR36] Davis, M. and M. Suignard, "Unicode Security
Considerations", Unicode Technical Report #36, Considerations", Unicode Technical Report #36,
August 2010, <http://unicode.org/reports/tr36/>. August 2010, <http://unicode.org/reports/tr36/>.
[XLink] DeRose, S., Maler, E., and D. Orchard, "XML Linking [XLink] DeRose, S., Maler, E., and D. Orchard, "XML Linking
 End of changes. 38 change blocks. 
129 lines changed or deleted 128 lines changed or added
