draft-ietf-iri-bidi-guidelines-02.txt | draft-ietf-iri-bidi-guidelines-03.txt | |||
---|---|---|---|---|
Internationalized Resource Identifiers M. Duerst | Internationalized Resource Identifiers M. Duerst | |||
(iri) Aoyama Gakuin University | (iri) Aoyama Gakuin University | |||
Internet-Draft L. Masinter | Internet-Draft L. Masinter | |||
Intended status: BCP Adobe | Intended status: BCP Adobe | |||
Expires: September 10, 2012 A. Allawi | Expires: April 24, 2013 A. Allawi | |||
Diwan Software Limited | Diwan Software Limited | |||
March 9, 2012 | October 21, 2012 | |||
Guidelines for Internationalized Resource Identifiers with Bi- | Guidelines for Internationalized Resource Identifiers with Bi- | |||
directional Characters (Bidi IRIs) | directional Characters (Bidi IRIs) | |||
draft-ietf-iri-bidi-guidelines-02 | draft-ietf-iri-bidi-guidelines-03 | |||
Abstract | Abstract | |||
This specification gives guidelines for selection, use, and | This specification gives guidelines for selection, use, and | |||
presentation of International Resource Identifiers (IRIs) which | presentation of International Resource Identifiers (IRIs) which | |||
include characters with inherent right-to-left (rtl) writing | include characters with inherent right-to-left (rtl) writing | |||
direction. | direction. | |||
Status of this Memo | Status of this Memo | |||
skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on September 10, 2012. | This Internet-Draft will expire on April 24, 2013. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 22 | skipping to change at page 2, line 22 | |||
modifications of such material outside the IETF Standards Process. | modifications of such material outside the IETF Standards Process. | |||
Without obtaining an adequate license from the person(s) controlling | Without obtaining an adequate license from the person(s) controlling | |||
the copyright in such materials, this document may not be modified | the copyright in such materials, this document may not be modified | |||
outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
than English. | than English. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Logical Storage and Visual Presentation . . . . . . . . . . . . 3 | 1.2. Availability . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . . 5 | 1.3. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
4. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . . 6 | 2. Logical Storage and Visual Presentation . . . . . . . . . . . 4 | |||
5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 3. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . . 5 | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 | 4. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . . 6 | |||
7. Security Considerations . . . . . . . . . . . . . . . . . . . . 8 | 5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | |||
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 9 | |||
9.1. Normative References . . . . . . . . . . . . . . . . . . . 9 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
9.2. Informative References . . . . . . . . . . . . . . . . . . 9 | 9. Main Changes Since RFC 3987 . . . . . . . . . . . . . . . . . 9 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . . 9 | ||||
10.2. Informative References . . . . . . . . . . . . . . . . . . 10 | ||||
Appendix A. List of ASCII Symbols and their Bidirectional | ||||
Character Types . . . . . . . . . . . . . . . . . . . 10 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
1. Introduction | 1. Introduction | |||
1.1. Overview | ||||
Some UCS characters, such as those used in the Arabic and Hebrew | Some UCS characters, such as those used in the Arabic and Hebrew | |||
scripts, have an inherent right-to-left (rtl) writing direction as | scripts, have an inherent right-to-left (rtl) writing direction as | |||
opposed to characters, such as those in Latin scripts, that have an | opposed to characters, such as those in the Latin script, that have | |||
inherent left-to-right (ltr) direction. IRIs containing rtl | an inherent left-to-right (ltr) direction. IRIs containing rtl | |||
characters (called bidirectional IRIs or Bidi IRIs) require | characters (called bidirectional IRIs or Bidi IRIs) require | |||
additional attention because of the non-trivial relation between | additional attention because of the non-trivial relation between | |||
their logical and visual ordering. The logical order represents the | their logical and visual ordering. The logical order represents the | |||
order in which the characters are read and stored on computers. The | order in which characters are stored on computers and read by people. | |||
visual order represents the order the characters are drawn on a | The visual order is the order in which the characters appear (or are | |||
computer display or printout in the way a human expects to read them. | expected to appear) on a computer display or printout. | |||
Generally, alphabetic characters in scripts like Arabic and Hebrew | Generally, alphabetic characters in scripts like Arabic and Hebrew | |||
are drawn rtl while numbers are drawn ltr. Symbols, such as slash | are drawn rtl while numbers are drawn ltr. Symbols such as slash | |||
'/' and period '.' take their visual direction from the surrounding | ('/') and period ('.') take their visual direction from the | |||
chracters. | surrounding characters. A list of all ASCII symbols with their | |||
bidirectional character type and their function in URIs and IRIs is | ||||
given in Appendix A. | ||||
Because of this complex interaction between the logical | Because of this complex interaction between the logical | |||
representation, the visual representation, and the syntax of a Bidi | representation, the visual representation, and the syntax of a Bidi | |||
IRI, a balance is needed between various requirements. The main | IRI, a balance is needed between various requirements. The main | |||
requirements are: | requirements are: | |||
1. user-predictable conversion between visual and logical | 1. user-predictable conversion between visual and logical | |||
representation; | representation; | |||
2. the ability to include a wide range of characters in various parts | 2. the ability to include a wide range of characters in various parts | |||
of the IRI; and | of the IRI; and | |||
3. minor or no changes or restrictions for implementations. | 3. minor or no changes or restrictions for implementations. | |||
1.1. Notation | 1.2. Availability | |||
In this document, "Bidi Notation" is used for the given Bidi IRI | This document is available in (line-printer ready) plaintext ASCII | |||
examples as follows: Lower case letters a-z stand for characters that | and in PDF. It is also available in HTML from | |||
are written with a left to right ordering (such as Latin characters), | http://www.sw.it.aoyama.ac.jp/2012/pub/ | |||
whereas upper case letters A-Z represent characters that are written | draft-ietf-iri-bidi-guidelines-03.html, and in UTF-8 plaintext from | |||
right to left (such as Arqbic or Hebrew characters). Numbers and | http://www.sw.it.aoyama.ac.jp/2012/pub/ | |||
symbols are the same. | draft-ietf-iri-bidi-guidelines-03.utf8.txt. While all these versions | |||
are identical in their technical content, the HTML, PDF, and UTF-8 | ||||
plaintext versions show non-Unicode characters directly. This often | ||||
makes it easier to understand examples, and readers are therefore | ||||
strongly advised to consult one of these versions in preference to or | ||||
as a supplement to the ASCII version. | ||||
1.3. Notation | ||||
In this document, "Bidi Notation", abbreviated "BN" is used for the | ||||
given Bidi IRI examples as follows: Lower case letters a-z stand for | ||||
characters that are written with a left to right ordering (such as | ||||
Latin characters), whereas upper case letters A-Z represent | ||||
characters that are written right to left (such as Arabic or Hebrew | ||||
characters). Numbers and symbols are the same. | ||||
In this document, the key words "MUST", "MUST NOT", "REQUIRED", | In this document, the key words "MUST", "MUST NOT", "REQUIRED", | |||
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", | "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", | |||
and "OPTIONAL" are to be interpreted as described in [RFC2119]. | and "OPTIONAL" are to be interpreted as described in [RFC2119]. | |||
2. Logical Storage and Visual Presentation | 2. Logical Storage and Visual Presentation | |||
When stored or transmitted in digital representation, Bidi IRIs MUST | When stored or transmitted in digital representation, Bidi IRIs MUST | |||
be in full logical order and MUST conform to the IRI syntax rules | be in full logical order and MUST conform to the IRI syntax rules | |||
(which includes the rules relevant to their scheme). This ensures | (which includes the rules relevant to their scheme). This ensures | |||
skipping to change at page 4, line 22 | skipping to change at page 4, line 39 | |||
In conformance with the Unicode Bidirectional Algorithm, embedding | In conformance with the Unicode Bidirectional Algorithm, embedding | |||
MAY be done in one of two ways: | MAY be done in one of two ways: | |||
1. precede the IRI with U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and | 1. precede the IRI with U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and | |||
follow with U+202C, POP DIRECTIONAL FORMATTING (PDF); or | follow with U+202C, POP DIRECTIONAL FORMATTING (PDF); or | |||
2. use a higher-level protocol (e.g., the dir='ltr' attribute in | 2. use a higher-level protocol (e.g., the dir='ltr' attribute in | |||
HTML). | HTML). | |||
Preceding and following the Bidi IRI with U+200E, LEFT-TO-RIGHT MARK | Preceding and following the Bidi IRI with U+200E, LEFT-TO-RIGHT MARK | |||
(LRM). Is NOT RECOMMENDED as, there are cases where this may not be | (LRM) is NOT RECOMMENDED as, there are cases where this may not be | |||
sufficient to match full left to right embedding. | sufficient to match full left to right embedding. | |||
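As an illustration of option 1 above, the following Python sketch wraps a Bidi IRI in LRE ... PDF for display and removes the pair again before the IRI is stored or dereferenced. The function names embed_ltr_for_display and strip_embedding are hypothetical and do not come from the draft; the sketch assumes that the IRI itself contains no formatting characters.

```python
LRE = "\u202A"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # U+202C POP DIRECTIONAL FORMATTING


def embed_ltr_for_display(iri: str) -> str:
    """Return the IRI wrapped in LRE ... PDF, for presentation only."""
    return f"{LRE}{iri}{PDF}"


def strip_embedding(text: str) -> str:
    """Undo embed_ltr_for_display; the wrapper is not part of the IRI."""
    if text.startswith(LRE) and text.endswith(PDF):
        return text[1:-1]
    return text
```

The wrapped string is only meant for rendering; the value that is stored or transmitted stays the unwrapped IRI in logical order.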
There is no requirement to use embedding if the display is still the | There is no requirement to use embedding if the display is still the | |||
same without the embedding. For example, a Bidi IRI in a text with | same without the embedding. For example, a Bidi IRI in a text with | |||
left-to-right base directionality (such as used for English or | left-to-right base directionality (such as used for English or | |||
Cyrillic) that is preceded and followed by whitespace and strong | Cyrillic) that is preceded and followed by whitespace and strong | |||
left-to-right characters does not need an embedding. Also, a | left-to-right characters does not need an embedding. Also, a | |||
bidirectional relative IRI reference that only contains strong right- | bidirectional relative IRI reference that only contains strong right- | |||
to-left characters and weak characters (such as symbols) and that | to-left characters and weak characters (such as symbols) and that | |||
starts and ends with a strong right-to-left character and appears in | starts and ends with a strong right-to-left character and appears in | |||
a text with right-to-left base directionality (such as used for | a text with right-to-left base directionality (such as used for | |||
Arabic or Hebrew) and is preceded and followed by whitespace and | Arabic or Hebrew) and is preceded and followed by whitespace and | |||
strong characters does not need an embedding. | strong characters does not need an embedding. | |||
However, Implementers are, RECOMMENDED to use embedding in all cases | However, implementers are RECOMMENDED to use embedding in all cases | |||
where they are not completely sure that the display behavior is | where they are not completely sure that the display behavior is | |||
unaffected without the embedding. | unaffected without the embedding. | |||
The Unicode Bidirectional Algorithm ([UNI9], section 4.3) permits | The Unicode Bidirectional Algorithm ([UNI9], section 4.3) permits | |||
higher-level protocols to influence bidirectional rendering. Such | higher-level protocols to influence bidirectional rendering. Such | |||
changes by higher-level protocols MUST NOT be used if they change the | changes by higher-level protocols MUST NOT be used if they change the | |||
rendering of IRIs. | rendering of IRIs. | |||
The bidirectional formatting characters that may be used before or | The bidirectional formatting characters that may be used before or | |||
after the IRI to ensure correct display are not themselves part of | after the IRI to ensure correct display are not themselves part of | |||
the IRI. IRIs MUST NOT contain bidirectional formatting characters | the IRI. IRIs MUST NOT contain bidirectional formatting characters | |||
(LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual | (LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual | |||
rendering of the IRI but do not appear themselves. It would | rendering of the IRI but do not appear themselves. It would | |||
therefore not be possible to input an IRI with such characters | therefore not be possible to input an IRI with such characters | |||
correctly. | correctly. | |||
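A minimal Python sketch of a check for the rule above, assuming the IRI is already available as a Unicode string. The name find_bidi_formatting is hypothetical; the code points are those of the seven formatting characters listed in the paragraph.

```python
# The seven bidirectional formatting characters named in the text.
BIDI_FORMATTING = {
    "\u200E": "LRM",
    "\u200F": "RLM",
    "\u202A": "LRE",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u202D": "LRO",
    "\u202E": "RLO",
}


def find_bidi_formatting(iri: str) -> list:
    """Return (index, name) for every formatting character inside the IRI."""
    return [(i, BIDI_FORMATTING[ch]) for i, ch in enumerate(iri)
            if ch in BIDI_FORMATTING]
```

An empty result means the string passes this particular check; it says nothing about the other IRI syntax rules.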
3. Bidi IRI Structure | 3. Bidi IRI Structure | |||
The Unicode Bidirectional Algorithm is designed mainly for plain | The Unicode Bidirectional Algorithm is designed for general purpose | |||
text. To make sure that it does not affect the rendering of Bidi | text. To make sure that it does not affect the rendering of Bidi | |||
IRIs outside of the requirements of this document, some restrictions | IRIs outside of the requirements of this document, some restrictions | |||
on Bidi IRIs are necessary. These restrictions are given in terms of | on Bidi IRIs are necessary. These restrictions are given in terms of | |||
delimiters (structural characters, mostly punctuation such as "@", | delimiters (structural characters, mostly punctuation such as "@", | |||
".", ":", and "/") and components (usually consisting mostly of | ".", ":", and "/") and components (usually consisting mostly of | |||
letters and digits). | letters and digits). | |||
The following syntax rules from the ABNF of [RFC3987bis] correspond | The following syntax rules from the ABNF of [RFC3987bis] correspond | |||
to components for the purpose of Bidi behavior: iuserinfo, ireg-name, | to components for the purpose of Bidi behavior: iuserinfo, ireg-name, | |||
isegment, isegment-nz, isegment-nz-nc, ireg-name, iquery, and | isegment, isegment-nz, isegment-nz-nc, ireg-name, iquery, and | |||
skipping to change at page 5, line 38 | skipping to change at page 6, line 8 | |||
to apply the relevant restrictions. For example, for the usual name/ | to apply the relevant restrictions. For example, for the usual name/ | |||
value syntax in query parts, it is convenient to treat each name and | value syntax in query parts, it is convenient to treat each name and | |||
each value as a component. As another example, the extensions in a | each value as a component. As another example, the extensions in a | |||
resource name can be treated as separate components. | resource name can be treated as separate components. | |||
For each component, the following restrictions apply: | For each component, the following restrictions apply: | |||
1. A component SHOULD NOT use both right-to-left and left-to-right | 1. A component SHOULD NOT use both right-to-left and left-to-right | |||
characters. | characters. | |||
2. A component using right-to-left characters SHOULD start and end | 2. A component using right-to-left characters SHOULD start with a | |||
with right-to-left characters. | right-to-left character, and end with a right-to-left character | |||
potentially followed by one or more nonspacing mark (bidi class | ||||
NSM). | ||||
The above restrictions are given as "SHOULD"s, rather than as | The above restrictions are given as "SHOULD"s, rather than as | |||
"MUST"s. For IRIs that are never presented visually, they are not | "MUST"s. For IRIs that are never presented visually, they are not | |||
relevant. However, for IRIs in general, they are very important to | relevant. However, for IRIs in general, they are very important to | |||
ensure consistent conversion between visual presentation and logical | ensure consistent conversion between visual presentation and logical | |||
representation, in both directions. | representation, in both directions. | |||
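The two restrictions can be sketched as a check on a single component. The following Python example is a sketch under stated assumptions, not a normative test: it follows the -03 wording (trailing nonspacing marks are allowed), treats bidi classes R and AL as right-to-left and class L as left-to-right, and relies on the Unicode data bundled with the Python interpreter. The name check_component is hypothetical.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def check_component(component: str) -> list:
    """Return a list of restriction violations (empty if none)."""
    classes = [unicodedata.bidirectional(ch) for ch in component]
    has_rtl = any(c in RTL_CLASSES for c in classes)
    has_ltr = any(c == "L" for c in classes)
    problems = []
    if has_rtl and has_ltr:
        problems.append("mixes right-to-left and left-to-right characters")
    if has_rtl:
        if classes[0] not in RTL_CLASSES:
            problems.append("does not start with a right-to-left character")
        # Skip trailing nonspacing marks (bidi class NSM) before checking
        # the last character.
        last = len(classes) - 1
        while last >= 0 and classes[last] == "NSM":
            last -= 1
        if last < 0 or classes[last] not in RTL_CLASSES:
            problems.append("does not end with a right-to-left character")
    return problems
```

Because the restrictions are "SHOULD"s, a caller would typically treat a non-empty result as a warning rather than as a hard error.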
Note: In some components, the above restrictions may actually be | Note: In some components, the above restrictions may actually be | |||
strictly enforced. For example, [RFC3490] requires that these | strictly enforced. For example, [RFC3490] requires that these | |||
restrictions apply to the labels of a host name for those schemes | restrictions apply to the labels of a host name for those schemes | |||
skipping to change at page 6, line 36 | skipping to change at page 6, line 50 | |||
user confusion. | user confusion. | |||
5. Examples | 5. Examples | |||
This section gives examples of Bidi IRIs in Bidi Notation. It shows | This section gives examples of Bidi IRIs in Bidi Notation. It shows | |||
legal IRIs with the relationship between their logical and visual | legal IRIs with the relationship between their logical and visual | |||
representation and explains how certain phenomena in this | representation and explains how certain phenomena in this | |||
relationship may look strange to somebody not familiar with | relationship may look strange to somebody not familiar with | |||
bidirectional behavior, but familiar to users of Arabic and Hebrew. | bidirectional behavior, but familiar to users of Arabic and Hebrew. | |||
It also shows what happens if the restrictions given in Section 3 are | It also shows what happens if the restrictions given in Section 3 are | |||
not followed. The examples below can be seen at [BidiEx], in Arabic, | not followed. Please see <Availability> for versions of the examples | |||
Hebrew, and Bidi Notation variants. | in Arabic and Hebrew script. | |||
To read the bidi text in the examples, read the visual representation | To read the bidi text in the examples, read the visual representation | |||
from left to right until you encounter a block of rtl text. Read the | from left to right until you encounter a block of rtl text. Read the | |||
rtl block (including slashes and other special characters) from right | rtl block (including slashes and other special characters) from right | |||
to left, then continue at the next unread ltr character. | to left, then continue at the next unread ltr character. | |||
Please note that "BN" stands for "Bidi Notation", see <Notation>. AR | ||||
stands for Arabic, HE for Hebrew. | ||||
Example 1: A single component with rtl characters is inverted: | Example 1: A single component with rtl characters is inverted: | |||
Logical representation: "http://ab.CDEFGH.ij/kl/mn/op.html" | Logical representation (BN): "http://ab.CDEFGH.ij/kl/mn/op.html" | |||
Visual representation: "http://ab.HGFEDC.ij/kl/mn/op.html" | Visual representation (BN): "http://ab.HGFEDC.ij/kl/mn/op.html" | |||
Components can be read one by one, and each component can be read in | Components can be read one by one, and each component can be read in | |||
its natural direction. | its natural direction. | |||
Example 2: More than one consecutive component with rtl characters is | Example 2: More than one consecutive component with rtl characters is | |||
inverted as a whole: | inverted as a whole: | |||
Logical representation: "http://ab.CDE.FGH/ij/kl/mn/op.html" | Logical representation (BN): "http://ab.CDE.FGH/ij/kl/mn/op.html" | |||
Visual representation: "http://ab.HGF.EDC/ij/kl/mn/op.html" | Visual representation (BN): "http://ab.HGF.EDC/ij/kl/mn/op.html" | |||
A sequence of rtl components is read rtl, in the same way as a | A sequence of rtl components is read rtl, in the same way as a | |||
sequence of rtl words is read rtl in a bidi text. | sequence of rtl words is read rtl in a bidi text. | |||
Example 3: All components of an IRI (except for the scheme) are rtl. | Example 3: All components of an IRI (except for the scheme) are rtl. | |||
All rtl components are inverted overall: | All rtl components are inverted overall: | |||
Logical representation: "http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV" | Logical representation (BN): | |||
Visual representation: "http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA" | "http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV" | |||
Visual representation (BN): "http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA" | ||||
The whole IRI (except the scheme) is read rtl. Delimiters between | The whole IRI (except the scheme) is read rtl. Delimiters between | |||
rtl components stay between the respective components; delimiters | rtl components stay between the respective components; delimiters | |||
between ltr and rtl components don't move. | between ltr and rtl components don't move. | |||
Example 4: Each of several sequences of rtl components is inverted on | Example 4: Each of several sequences of rtl components is inverted on | |||
its own: | its own: | |||
Logical representation: "http://AB.CD.ef/gh/IJ/KL.html" | Logical representation (BN): "http://AB.CD.ef/gh/IJ/KL.html" | |||
Visual representation: "http://DC.BA.ef/gh/LK/JI.html" | Visual representation (BN): "http://DC.BA.ef/gh/LK/JI.html" | |||
Each sequence of rtl components is read rtl, in the same way as each | Each sequence of rtl components is read rtl, in the same way as each | |||
sequence of rtl words in an ltr text is read rtl. | sequence of rtl words in an ltr text is read rtl. | |||
Example 5: Example 2, applied to components of different kinds: | Example 5: Example 2, applied to components of different kinds: | |||
Logical representation: "http://ab.cd.EF/GH/ij/kl.html" | Logical representation (BN): "http://ab.cd.EF/GH/ij/kl.html" | |||
Visual representation: "http://ab.cd.HG/FE/ij/kl.html" | Visual representation (BN): "http://ab.cd.HG/FE/ij/kl.html" | |||
The inversion of the domain name label and the path component may be | The inversion of the domain name label and the path component may be | |||
unexpected, but it is consistent with other bidi behavior. For | unexpected, but it is consistent with other bidi behavior. For | |||
reassurance that the domain component really is "ab.cd.EF", it may be | reassurance that the domain component really is "ab.cd.EF", it may be | |||
helpful to read aloud the visual representation following the Unicode | helpful to read aloud the visual representation following the Unicode | |||
Bidirectional Algorithm. After "http://ab.cd." one reads the RTL | Bidirectional Algorithm. After "http://ab.cd." one reads the RTL | |||
block "E-F-slash-G-H", which corresponds to the logical | block "E-F-slash-G-H", which corresponds to the logical | |||
representation. | representation. | |||
Example 6: Same as Example 5, with more rtl components: | Example 6: Same as Example 5, with more rtl components: | |||
Logical representation: "http://ab.CD.EF/GH/IJ/kl.html" | Logical representation (BN): "http://ab.CD.EF/GH/IJ/kl.html" | |||
Visual representation: "http://ab.JI/HG/FE.DC/kl.html" | Visual representation (BN): "http://ab.JI/HG/FE.DC/kl.html" | |||
The inversion of the domain name labels and the path components may | The inversion of the domain name labels and the path components may | |||
be easier to identify because the delimiters also move. | be easier to identify because the delimiters also move. | |||
Example 7: A single rtl component includes digits: | Example 7: A single rtl component includes digits: | |||
Logical representation: "http://ab.CDE123FGH.ij/kl/mn/op.html" | Logical representation (BN): "http://ab.CDE123FGH.ij/kl/mn/op.html" | |||
Visual representation: "http://ab.HGF123EDC.ij/kl/mn/op.html" | Visual representation (BN): "http://ab.HGF123EDC.ij/kl/mn/op.html" | |||
Numbers are written ltr in all cases but are treated as an additional | Numbers are written ltr in all cases but are treated as an additional | |||
embedding inside a run of rtl characters. This is completely | embedding inside a run of rtl characters. This is completely | |||
consistent with usual bidirectional text. | consistent with usual bidirectional text. | |||
Example 8 (not allowed): Numbers are at the start or end of an rtl | Example 8 (not allowed): Numbers are at the start or end of an rtl | |||
component: | component: | |||
Logical representation: "http://ab.cd.ef/GH1/2IJ/KL.html" | Logical representation (BN): "http://ab.cd.ef/GH1/2IJ/KL.html" | |||
Visual representation: "http://ab.cd.ef/LK/JI1/2HG.html" | Visual representation (BN): "http://ab.cd.ef/LK/JI1/2HG.html" | |||
The sequence "1/2" is interpreted by the Bidirectional Algorithm as a | The sequence "1/2" is interpreted by the Bidirectional Algorithm as a | |||
fraction, fragmenting the components and leading to confusion. There | fraction, fragmenting the components and leading to confusion. There | |||
are other characters that are interpreted in a special way close to | are other characters that are interpreted in a special way close to | |||
numbers; in particular, "+", "-", "#", "$", "%", ",", ".", and ":". | numbers; in particular, "+", "-", "#", "$", "%", ",", ".", and ":". | |||
Example 9 (not allowed): The numbers in the previous example are | Example 9 (not allowed): The numbers in the previous example are | |||
percent-encoded: | percent-encoded: | |||
Logical representation: "http://ab.cd.ef/GH%31/%32IJ/KL.html", | Logical representation (BN): "http://ab.cd.ef/GH%31/%32IJ/KL.html" | |||
Visual representation: "http://ab.cd.ef/LK/JI%32/%31HG.html" | Visual representation (BN): "http://ab.cd.ef/LK/JI%32/%31HG.html" | |||
Example 10 (allowed but not recommended): | Example 10 (allowed but not recommended): | |||
Logical representation: "http://ab.CDEFGH.123/kl/mn/op.html" | Logical representation (BN): "http://ab.CDEFGH.123/kl/mn/op.html" | |||
Visual representation: "http://ab.123.HGFEDC/kl/mn/op.html" | Visual representation (BN): "http://ab.123.HGFEDC/kl/mn/op.html" | |||
Components consisting of only numbers are allowed (it would be rather | Components consisting of only numbers are allowed (it would be rather | |||
difficult to prohibit them), but these may interact with adjacent RTL | difficult to prohibit them), but these may interact with adjacent RTL | |||
components in ways that are not easy to predict. | components in ways that are not easy to predict. | |||
Example 11 (allowed but not recommended): | Example 11 (allowed but not recommended): | |||
Logical representation: "http://ab.CDEFGH.123ij/kl/mn/op.html" | Logical representation (BN): "http://ab.CDEFGH.123ij/kl/mn/op.html" | |||
Visual representation: "http://ab.123.HGFEDCij/kl/mn/op.html" | Visual representation (BN): "http://ab.123.HGFEDCij/kl/mn/op.html" | |||
Components consisting of numbers and left-to-right characters are | Components consisting of numbers and left-to-right characters are | |||
allowed, but these may interact with adjacent RTL components in ways | allowed, but these may interact with adjacent RTL components in ways | |||
that are not easy to predict. | that are not easy to predict. | |||
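The logical-to-visual mappings in Examples 1 to 11 can be reproduced with a heavily simplified version of the Unicode Bidirectional Algorithm applied to Bidi Notation. The Python sketch below is an illustration only and not a conforming implementation: it assumes ASCII Bidi Notation input, a left-to-right embedding, upper case letters of class R (not AL), and no brackets or explicit formatting characters. All function names are hypothetical.

```python
def bn_classes(text):
    """Map each Bidi Notation character to a bidi class (ASCII input only)."""
    classes = []
    for ch in text:
        if "a" <= ch <= "z":
            classes.append("L")       # stands for a left-to-right letter
        elif "A" <= ch <= "Z":
            classes.append("R")       # stands for a right-to-left letter
        elif "0" <= ch <= "9":
            classes.append("EN")      # European number
        elif ch in "+-":
            classes.append("ES")      # European separator
        elif ch in "#$%":
            classes.append("ET")      # European terminator
        elif ch in ",./:":
            classes.append("CS")      # common separator
        else:
            classes.append("ON")      # other neutral
    return classes


def resolve(classes):
    """Resolve weak and neutral classes to L, R, or EN (rules W4-W7, N1, N2)."""
    t, n = list(classes), len(classes)
    for i in range(1, n - 1):         # W4: one separator between two numbers
        if t[i] in ("ES", "CS") and t[i - 1] == "EN" and t[i + 1] == "EN":
            t[i] = "EN"
    i = 0
    while i < n:                      # W5: terminators adjacent to a number
        j = i
        while j < n and t[j] == "ET":
            j += 1
        if j > i and ((i > 0 and t[i - 1] == "EN") or (j < n and t[j] == "EN")):
            t[i:j] = ["EN"] * (j - i)
        i = max(j, i + 1)
    t = ["ON" if c in ("ES", "ET", "CS") else c for c in t]   # W6
    strong = "L"                      # W7: digits after an ltr letter act as L
    for i, c in enumerate(t):
        if c in ("L", "R"):
            strong = c
        elif c == "EN" and strong == "L":
            t[i] = "L"
    i = 0
    while i < n:                      # N1/N2: neutrals follow their neighbours
        j = i
        while j < n and t[j] == "ON":
            j += 1
        if j > i:
            before = "L" if i == 0 or t[i - 1] == "L" else "R"
            after = "L" if j == n or t[j] == "L" else "R"
            t[i:j] = [before if before == after else "L"] * (j - i)
        i = max(j, i + 1)
    return t


def logical_to_visual(text):
    """Reorder a Bidi Notation string from logical to visual order (rule L2)."""
    levels = [{"L": 0, "R": 1, "EN": 2}[c] for c in resolve(bn_classes(text))]
    chars = list(text)
    for level in (2, 1):              # reverse the innermost runs first
        i = 0
        while i < len(chars):
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            i = max(j, i + 1)
    return "".join(chars)
```

For instance, logical_to_visual("http://ab.CDE123FGH.ij/kl/mn/op.html") yields the visual representation given in Example 7. Real Arabic or Hebrew text needs a complete implementation of [UNI9].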
6. IANA Considerations | 6. IANA Considerations | |||
This document makes no changes to IANA registries. | This document makes no changes to IANA registries. | |||
7. Security Considerations | 7. Security Considerations | |||
Confusion can occur with bidirectional IRIs, if the restrictions in | Confusion can occur with bidirectional IRIs, if the restrictions in | |||
Section 3 are not followed. The same visual representation may be | Section 3 are not followed. The same visual representation may be | |||
interpreted as different logical representations, and vice versa. It | interpreted as different logical representations, and vice versa. It | |||
is also very important that a correct Unicode bidirectional | is also very important that a correct Unicode bidirectional | |||
implementation be used. | implementation be used. | |||
8. Acknowledgements | 8. Acknowledgements | |||
This document was derived from [RFC3987] and [RFC3987bis] and the | This document was derived from [RFC3987] and [RFC3987bis] and the | |||
acknowledgments of those documents apply. | acknowledgments of those documents apply. Shunsuke Oshima provided | |||
the data for Appendix A. | ||||
9. References | 9. Main Changes Since RFC 3987 | |||
9.1. Normative References | ||||
[ASCII] American National Standards Institute, "Coded Character | This section describes the main changes since [RFC3987]. | |||
Set -- 7-bit American Standard Code for Information | ||||
Interchange", ANSI X3.4, 1986. | ||||
[ISO10646] | o Separated out the section on bidi in [RFC3987] to this document. | |||
International Organization for Standardization, "ISO/IEC | ||||
10646:2003: Information Technology - Universal Multiple- | o Added examples in Arabic and Hebrew, which can be seen in html/ | |||
Octet Coded Character Set (UCS)", ISO Standard 10646, | pdf/utf8.txt versions. | |||
December 2003. | ||||
o Allowed NSMs at the end of components, for Dhivehi, Yiddish,... | ||||
o TODO: check for major changes between RFC3987 and draft -02. | ||||
Note to RFC Editor: Please remove this paragraph before publication. | ||||
Detailed change logs are available in the IETF tools subversion | ||||
repository at http://trac.tools.ietf.org/wg/iri/trac/log/ | ||||
draft-ietf-iri-3987bis/draft-ietf-iri-bidi-guidelines.xml. | ||||
10. References | ||||
10.1. Normative References | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | |||
"Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
RFC 3490, March 2003. | RFC 3490, March 2003. | |||
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | ||||
Profile for Internationalized Domain Names (IDN)", | ||||
RFC 3491, March 2003. | ||||
[RFC3987bis] | [RFC3987bis] | |||
Duerst, M., Masinter, L., and M. Suignard, | Duerst, M., Masinter, L., and M. Suignard, | |||
"Internationalized Resource Identifiers (IRIs)", | "Internationalized Resource Identifiers (IRIs)", | |||
August 2011, | October 2012, | |||
<http://tools.ietf.org/id/draft-ietf-iri-3987bis>. | <http://tools.ietf.org/id/draft-ietf-iri-3987bis>. | |||
[UNI9] Davis, M., "The Unicode Bidirectional Algorithm", Unicode | [UNI9] Davis, M., "The Unicode Bidirectional Algorithm", Unicode | |||
Standard Annex #9, March 2004, | Standard Annex #9, September 2012, | |||
<http://www.unicode.org/reports/tr9/tr9-13.html>. | <http://www.unicode.org/reports/tr9/tr9-27.html>. | |||
[UNIV6] The Unicode Consortium, "The Unicode Standard, Version | [UNIV6] The Unicode Consortium, "The Unicode Standard, Version | |||
6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, | 6.2.0 (Mountain View, CA, The Unicode Consortium, 2012, | |||
ISBN 978-1-936213-01-6)", October 2010. | ISBN 978-1-936213-07-8)", October 2012. | |||
9.2. Informative References | ||||
[BidiEx] "Examples of Bidi IRIs", | 10.2. Informative References | |||
<http://www.w3.org/International/iri-edit/BidiExamples>. | ||||
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource | [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource | |||
Identifiers (IRIs)", RFC 3987, January 2005. | Identifiers (IRIs)", RFC 3987, January 2005. | |||
Appendix A. List of ASCII Symbols and their Bidirectional Character | ||||
Types | ||||
To help understand the influence of various symbols on IRI display, | ||||
this appendix lists all of them, giving the character itself, the | ||||
Unicode codepoint, the character name, the bidirectional character | ||||
type (BCT) and the rule and relevance in the IRI syntax. | ||||
The most important ones in practice are ":", delimiting scheme and | ||||
port (CS, Common Number Separator), "/" to indicate generic | ||||
(hierarchical) schemes and as a path separator (CS, Common Number | ||||
Separator), "?" to introduce a query part (ON, Other Neutral), "#" to | ||||
introduce a fragment identifier (ET, European Number Terminator), "." | ||||
to separate labels in a domain name (CS, Common Number Separator), | ||||
"&" to separate form parameters (ON, Other Neutral), and "@" to | ||||
separate user information (ON, Other Neutral). | ||||
Char  Codepoint  Character Name        BCT  IRI syntax | ||||
-------------------------------------------------------------------- | ||||
"#"   U+0023     NUMBER SIGN           ET   gen-delims, fragments | ||||
"/"   U+002F     SOLIDUS               CS   gen-delims, paths | ||||
":"   U+003A     COLON                 CS   gen-delims, scheme, port | ||||
"?"   U+003F     QUESTION MARK         ON   gen-delims, query part | ||||
"@"   U+0040     COMMERCIAL AT         ON   gen-delims, user | ||||
"["   U+005B     LEFT SQUARE BRACKET   ON   gen-delims | ||||
"]"   U+005D     RIGHT SQUARE BRACKET  ON   gen-delims | ||||
"%"   U+0025     PERCENT SIGN          ET   pct-encoded | ||||
"!"   U+0021     EXCLAMATION MARK      ON   sub-delims | ||||
","   U+002C     COMMA                 CS   sub-delims | ||||
"+"   U+002B     PLUS SIGN             ES   sub-delims | ||||
"$"   U+0024     DOLLAR SIGN           ET   sub-delims | ||||
"("   U+0028     LEFT PARENTHESIS      ON   sub-delims | ||||
"'"   U+0027     APOSTROPHE            ON   sub-delims | ||||
")"   U+0029     RIGHT PARENTHESIS     ON   sub-delims | ||||
"*"   U+002A     ASTERISK              ON   sub-delims | ||||
";"   U+003B     SEMICOLON             ON   sub-delims | ||||
"="   U+003D     EQUALS SIGN           ON   sub-delims, forms | ||||
"&"   U+0026     AMPERSAND             ON   sub-delims, forms | ||||
"."   U+002E     FULL STOP             CS   unreserved, domain names | ||||
"-"   U+002D     HYPHEN-MINUS          ES   unreserved | ||||
"_"   U+005F     LOW LINE              ON   unreserved | ||||
"~"   U+007E     TILDE                 ON   unreserved | ||||
" "   U+0020     SPACE                 WS   excluded, delim | ||||
'"'   U+0022     QUOTATION MARK        ON   excluded, delim | ||||
"\"   U+005C     REVERSE SOLIDUS       ON   excluded, unwise | ||||
"^"   U+005E     CIRCUMFLEX ACCENT     ON   excluded, unwise | ||||
"<"   U+003C     LESS-THAN SIGN        ON   excluded, delim | ||||
">"   U+003E     GREATER-THAN SIGN     ON   excluded, delim | ||||
"`"   U+0060     GRAVE ACCENT          ON   excluded, unwise | ||||
"|"   U+007C     VERTICAL LINE         ON   excluded, unwise | ||||
"{"   U+007B     LEFT CURLY BRACKET    ON   excluded, delim | ||||
"}"   U+007D     RIGHT CURLY BRACKET   ON   excluded, delim | ||||
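The bidirectional character types in this appendix can be cross-checked against a Unicode character database. The sketch below uses Python's standard unicodedata module, whose data reflects the Unicode version bundled with the interpreter rather than the version cited by the draft. The EXPECTED mapping and the mismatches helper are names introduced for this example only.

```python
import unicodedata

# BCT values transcribed from the appendix table (character -> type).
EXPECTED = {
    "#": "ET", "/": "CS", ":": "CS", "?": "ON", "@": "ON",
    "[": "ON", "]": "ON", "%": "ET", "!": "ON", ",": "CS",
    "+": "ES", "$": "ET", "(": "ON", "'": "ON", ")": "ON",
    "*": "ON", ";": "ON", "=": "ON", "&": "ON", ".": "CS",
    "-": "ES", "_": "ON", "~": "ON", " ": "WS", '"': "ON",
    "\\": "ON", "^": "ON", "<": "ON", ">": "ON", "`": "ON",
    "|": "ON", "{": "ON", "}": "ON",
}


def mismatches(expected):
    """Return {char: (listed, actual)} for entries that disagree with unicodedata."""
    return {
        ch: (bct, unicodedata.bidirectional(ch))
        for ch, bct in expected.items()
        if unicodedata.bidirectional(ch) != bct
    }


if __name__ == "__main__":
    # Print codepoint, character name and bidirectional type for each symbol.
    for ch in EXPECTED:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<22}{unicodedata.bidirectional(ch)}")
    print("mismatches:", mismatches(EXPECTED))
```

The same call, unicodedata.bidirectional, can be applied to non-ASCII characters to find out whether a component contains characters of type R or AL.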
Authors' Addresses | Authors' Addresses | |||
Martin Duerst | Martin J. Duerst (Note: Please write "Duerst" with u-umlaut wherever | |||
possible, for example as "Dürst" in XML and HTML.) | ||||
Aoyama Gakuin University | Aoyama Gakuin University | |||
5-10-1 Fuchinobe | 5-10-1 Fuchinobe | |||
Sagamihara, Kanagawa 229-8558 | Chuo-ku | |||
Sagamihara, Kanagawa 252-5258 | ||||
Japan | Japan | |||
Phone: +81 42 759 6329 | Phone: +81 42 759 6329 | |||
Fax: +81 42 759 6495 | Fax: +81 42 759 6495 | |||
Email: duerst@it.aoyama.ac.jp | Email: duerst@it.aoyama.ac.jp | |||
URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ | URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ | |||
(Note: This is the percent-encoded form of an IRI) | ||||
Larry Masinter | Larry Masinter | |||
Adobe | Adobe | |||
345 Park Ave | 345 Park Ave | |||
San Jose, CA 95110 | San Jose, CA 95110 | |||
U.S.A. | U.S.A. | |||
Phone: +1-408-536-3024 | Phone: +1-408-536-3024 | |||
Email: masinter@adobe.com | Email: masinter@adobe.com | |||
URI: http://larry.masinter.net | URI: http://larry.masinter.net | |||
End of changes. 41 change blocks. | ||||
87 lines changed or deleted | 175 lines changed or added | |||