draft-ietf-iri-bidi-guidelines-01.txt   draft-ietf-iri-bidi-guidelines-02.txt 
Internationalized Resource Identifiers M. Duerst Internationalized Resource Identifiers M. Duerst
(iri) Aoyama Gakuin University (iri) Aoyama Gakuin University
Internet-Draft L. Masinter Internet-Draft L. Masinter
Intended status: BCP Adobe Intended status: BCP Adobe
Expires: September 3, 2012 A. Allawi Expires: September 10, 2012 A. Allawi
Diwan Software Limited Diwan Software Limited
March 2, 2012 March 9, 2012
Guidelines for Internationalized Resource Identifiers with Bi- Guidelines for Internationalized Resource Identifiers with Bi-
directional Characters (Bidi IRIs) directional Characters (Bidi IRIs)
draft-ietf-iri-bidi-guidelines-01 draft-ietf-iri-bidi-guidelines-02
Abstract Abstract
This specification gives guidelines for selection, use, presentation This specification gives guidelines for selection, use, and
of International Resource Identifiers (IRI) which include characters presentation of International Resource Identifiers (IRIs) which
with in inherent right-to-left (rtl) writing direction. include characters with inherent right-to-left (rtl) writing
direction.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 3, 2012. This Internet-Draft will expire on September 10, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 24 skipping to change at page 2, line 25
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Logical Storage and Visual Presentation . . . . . . . . . . . . 3 2. Logical Storage and Visual Presentation . . . . . . . . . . . . 3
3. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . . 4 3. Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . . 5
4. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . . 6 4. Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . . 6
5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8
7. Security Considerations . . . . . . . . . . . . . . . . . . . . 8 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 8
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9
9.1. Normative References . . . . . . . . . . . . . . . . . . . 8 9.1. Normative References . . . . . . . . . . . . . . . . . . . 9
9.2. Informative References . . . . . . . . . . . . . . . . . . 9 9.2. Informative References . . . . . . . . . . . . . . . . . . 9
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9
1. Introduction 1. Introduction
Some UCS characters, such as those used in the Arabic and Hebrew Some UCS characters, such as those used in the Arabic and Hebrew
scripts, have an inherent right-to-left (rtl) writing direction. scripts, have an inherent right-to-left (rtl) writing direction as
IRIs containing these characters (called bidirectional IRIs or Bidi opposed to characters, such as those in Latin scripts, that have an
IRIs) require additional attention because of the non-trivial inherent left-to-right (ltr) direction. IRIs containing rtl
relation between logical representation (used for digital characters (called bidirectional IRIs or Bidi IRIs) require
representation and for reading/spelling) and visual representation additional attention because of the non-trivial relation between
(used for display/printing). their logical and visual ordering. The logical order represents the
order in which the characters are read and stored on computers. The
visual order represents the order the characters are drawn on a
computer display or printout in the way a human expects to read them.
Because of the complex interaction between the logical Generally, alphabetic characters in scripts like Arabic and Hebrew
are drawn rtl while numbers are drawn ltr. Symbols, such as slash
'/' and period '.' take their visual direction from the surrounding
characters.
Because of this complex interaction between the logical
representation, the visual representation, and the syntax of a Bidi representation, the visual representation, and the syntax of a Bidi
IRI, a balance is needed between various requirements. The main IRI, a balance is needed between various requirements. The main
requirements are requirements are:
1. user-predictable conversion between visual and logical 1. user-predictable conversion between visual and logical
representation; representation;
2. the ability to include a wide range of characters in various parts 2. the ability to include a wide range of characters in various parts
of the IRI; and of the IRI; and
3. minor or no changes or restrictions for implementations. 3. minor or no changes or restrictions for implementations.
1.1. Notation 1.1. Notation
In this document, Bidi Notation is used for bidirectional examples: In this document, "Bidi Notation" is used for the given Bidi IRI
Lower case letters stand for Latin letters or other letters that are examples as follows: Lower case letters a-z stand for characters that
written left to right, whereas upper case letters represent Arabic or are written with a left to right ordering (such as Latin characters),
Hebrew letters that are written right to left. whereas upper case letters A-Z represent characters that are written
right to left (such as Arabic or Hebrew characters). Numbers and
symbols are the same.
In this document, the key words "MUST", "MUST NOT", "REQUIRED", In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in [RFC2119]. and "OPTIONAL" are to be interpreted as described in [RFC2119].
2. Logical Storage and Visual Presentation 2. Logical Storage and Visual Presentation
When stored or transmitted in digital representation, bidirectional When stored or transmitted in digital representation, Bidi IRIs MUST
IRIs MUST be in full logical order and MUST conform to the IRI syntax be in full logical order and MUST conform to the IRI syntax rules
rules (which includes the rules relevant to their scheme). This (which includes the rules relevant to their scheme). This ensures
ensures that bidirectional IRIs can be processed in the same way as that Bidi IRIs can be processed in the same way as other IRIs.
other IRIs.
Bidirectional IRIs MUST be rendered by using the Unicode Bidi IRIs MUST be visually ordered by the Unicode Bidirectional
Bidirectional Algorithm [UNIV6], [UNI9]. Bidirectional IRIs MUST be Algorithm [UNIV6], [UNI9]. Bidi IRIs MUST be rendered in the same
rendered in the same way as they would be if they were in a left-to- way as they would be if they were in a left-to-right embedding.
right embedding; i.e., as if they were preceded by U+202A, LEFT-TO-
RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL
FORMATTING (PDF). Setting the embedding direction can also be done
in a higher-level protocol (e.g., the dir='ltr' attribute in HTML).
There is no requirement to use the above embedding if the display is In conformance with the Unicode Bidirectional Algorithm, embedding
still the same without the embedding. For example, a bidirectional MAY be done in one of two ways:
IRI in a text with left-to-right base directionality (such as used
for English or Cyrillic) that is preceded and followed by whitespace
and strong left-to-right characters does not need an embedding.
Also, a bidirectional relative IRI reference that only contains
strong right-to-left characters and weak characters and that starts
and ends with a strong right-to-left character and appears in a text
with right-to-left base directionality (such as used for Arabic or
Hebrew) and is preceded and followed by whitespace and strong
characters does not need an embedding.
In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM), may be 1. precede the IRI with U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and
sufficient to force the correct display behavior. However, the follow with U+202C, POP DIRECTIONAL FORMATTING (PDF); or
details of the Unicode Bidirectional algorithm are not always easy to
understand. Implementers are strongly advised to err on the side of 2. use a higher-level protocol (e.g., the dir='ltr' attribute in
caution and to use embedding in all cases where they are not HTML).
completely sure that the display behavior is unaffected without the
embedding. Preceding and following the Bidi IRI with U+200E, LEFT-TO-RIGHT MARK
(LRM) is NOT RECOMMENDED, as there are cases where this may not be
sufficient to match full left to right embedding.
There is no requirement to use embedding if the display is still the
same without the embedding. For example, a Bidi IRI in a text with
left-to-right base directionality (such as used for English or
Cyrillic) that is preceded and followed by whitespace and strong
left-to-right characters does not need an embedding. Also, a
bidirectional relative IRI reference that only contains strong right-
to-left characters and weak characters (such as symbols) and that
starts and ends with a strong right-to-left character and appears in
a text with right-to-left base directionality (such as used for
Arabic or Hebrew) and is preceded and followed by whitespace and
strong characters does not need an embedding.
However, implementers are RECOMMENDED to use embedding in all cases
where they are not completely sure that the display behavior is
unaffected without the embedding.
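As an illustration of the first embedding option, the following is a
minimal sketch in Python, not a definitive implementation. The
function names embed_for_display and strip_embedding are hypothetical
and do not come from this document. The sketch assumes the IRI is
held in logical order and that the surrounding text is plain text
with no higher-level protocol available.

```python
LRE = "\u202A"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # U+202C POP DIRECTIONAL FORMATTING


def embed_for_display(iri: str) -> str:
    """Wrap a logically ordered IRI in LRE ... PDF for plain-text display.

    Hypothetical helper: the stored IRI itself is left unchanged.
    """
    return f"{LRE}{iri}{PDF}"


def strip_embedding(text: str) -> str:
    """Remove one surrounding LRE ... PDF pair, if present.

    The formatting characters are display aids and are not part of the IRI.
    """
    if text.startswith(LRE) and text.endswith(PDF):
        return text[1:-1]
    return text
```

The wrapping is applied only at presentation time; the value that is
stored or transmitted stays the bare IRI.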
The Unicode Bidirectional Algorithm ([UNI9], section 4.3) permits The Unicode Bidirectional Algorithm ([UNI9], section 4.3) permits
higher-level protocols to influence bidirectional rendering. Such higher-level protocols to influence bidirectional rendering. Such
changes by higher-level protocols MUST NOT be used if they change the changes by higher-level protocols MUST NOT be used if they change the
rendering of IRIs. rendering of IRIs.
The bidirectional formatting characters that may be used before or The bidirectional formatting characters that may be used before or
after the IRI to ensure correct display are not themselves part of after the IRI to ensure correct display are not themselves part of
the IRI. IRIs MUST NOT contain bidirectional formatting characters the IRI. IRIs MUST NOT contain bidirectional formatting characters
(LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual (LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual
rendering of the IRI but do not appear themselves. It would rendering of the IRI but do not appear themselves. It would
therefore not be possible to input an IRI with such characters therefore not be possible to input an IRI with such characters
correctly. correctly.
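A check for the prohibited formatting characters could be sketched as
follows. This is an illustrative Python fragment under the assumption
that the IRI is available as a Unicode string; the name
find_bidi_formatting is hypothetical. The code points are those of
the seven characters listed above.

```python
# The seven bidirectional formatting characters named in the text.
BIDI_FORMATTING = {
    "\u200E": "LRM",
    "\u200F": "RLM",
    "\u202A": "LRE",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u202D": "LRO",
    "\u202E": "RLO",
}


def find_bidi_formatting(iri: str) -> list[tuple[int, str]]:
    """Return (index, name) for each formatting character found in iri.

    An empty list means the IRI contains none of them.
    """
    return [(i, BIDI_FORMATTING[c]) for i, c in enumerate(iri)
            if c in BIDI_FORMATTING]
```

A caller might reject the IRI, or report the offending positions to
the user, whenever the returned list is not empty.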
3. Bidi IRI Structure 3. Bidi IRI Structure
The Unicode Bidirectional Algorithm is designed mainly for running The Unicode Bidirectional Algorithm is designed mainly for plain
text. To make sure that it does not affect the rendering of text. To make sure that it does not affect the rendering of Bidi
bidirectional IRIs too much, some restrictions on bidirectional IRIs IRIs outside of the requirements of this document, some restrictions
are necessary. These restrictions are given in terms of delimiters on Bidi IRIs are necessary. These restrictions are given in terms of
(structural characters, mostly punctuation such as "@", ".", ":", and delimiters (structural characters, mostly punctuation such as "@",
"/") and components (usually consisting mostly of letters and ".", ":", and "/") and components (usually consisting mostly of
digits). letters and digits).
The following syntax rules from the ABNF of [RFC3987bis] correspond The following syntax rules from the ABNF of [RFC3987bis] correspond
to components for the purpose of Bidi behavior: iuserinfo, ireg-name, to components for the purpose of Bidi behavior: iuserinfo, ireg-name,
isegment, isegment-nz, isegment-nz-nc, ireg-name, iquery, and isegment, isegment-nz, isegment-nz-nc, ireg-name, iquery, and
ifragment. ifragment.
Specifications that define the syntax of any of the above components Specifications that define the syntax of any of the above components
MAY divide them further and define smaller parts to be components MAY divide them further and define smaller parts to be components
according to this document. As an example, the restrictions of according to this document. As an example, the restrictions of
[RFC3490] on bidirectional domain names correspond to treating each [RFC3490] on bidirectional domain names correspond to treating each
skipping to change at page 6, line 14 skipping to change at page 6, line 30
4. Input of Bidi IRIs 4. Input of Bidi IRIs
Bidi input methods MUST generate Bidi IRIs in logical order while Bidi input methods MUST generate Bidi IRIs in logical order while
rendering them according to Section 2. During input, rendering rendering them according to Section 2. During input, rendering
SHOULD be updated after every new character is input to avoid end- SHOULD be updated after every new character is input to avoid end-
user confusion. user confusion.
5. Examples 5. Examples
This section gives examples of bidirectional IRIs, in Bidi Notation. This section gives examples of Bidi IRIs in Bidi Notation. It shows
It shows legal IRIs with the relationship between logical and visual legal IRIs with the relationship between their logical and visual
representation and explains how certain phenomena in this representation and explains how certain phenomena in this
relationship may look strange to somebody not familiar with relationship may look strange to somebody not familiar with
bidirectional behavior, but familiar to users of Arabic and Hebrew. bidirectional behavior, but familiar to users of Arabic and Hebrew.
It also shows what happens if the restrictions given in Section 3 are It also shows what happens if the restrictions given in Section 3 are
not followed. The examples below can be seen at [BidiEx], in Arabic, not followed. The examples below can be seen at [BidiEx], in Arabic,
Hebrew, and Bidi Notation variants. Hebrew, and Bidi Notation variants.
To read the bidi text in the examples, read the visual representation To read the bidi text in the examples, read the visual representation
from left to right until you encounter a block of rtl text. Read the from left to right until you encounter a block of rtl text. Read the
rtl block (including slashes and other special characters) from right rtl block (including slashes and other special characters) from right
skipping to change at page 7, line 14 skipping to change at page 7, line 31
Visual representation: "http://DC.BA.ef/gh/LK/JI.html" Visual representation: "http://DC.BA.ef/gh/LK/JI.html"
Each sequence of rtl components is read rtl, in the same way as each Each sequence of rtl components is read rtl, in the same way as each
sequence of rtl words in an ltr text is read rtl. sequence of rtl words in an ltr text is read rtl.
Example 5: Example 2, applied to components of different kinds: Example 5: Example 2, applied to components of different kinds:
Logical representation: "http://ab.cd.EF/GH/ij/kl.html" Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
Visual representation: "http://ab.cd.HG/FE/ij/kl.html" Visual representation: "http://ab.cd.HG/FE/ij/kl.html"
The inversion of the domain name label and the path component may be The inversion of the domain name label and the path component may be
unexpected, but it is consistent with other bidi behavior. For unexpected, but it is consistent with other bidi behavior. For
reassurance that the domain component really is "ab.cd.EF", it may be reassurance that the domain component really is "ab.cd.EF", it may be
helpful to read aloud the visual representation following the bidi helpful to read aloud the visual representation following the Unicode
algorithm. After "http://ab.cd." one reads the RTL block Bidirectional Algorithm. After "http://ab.cd." one reads the RTL
"E-F-slash-G-H", which corresponds to the logical representation. block "E-F-slash-G-H", which corresponds to the logical
representation.
Example 6: Same as Example 5, with more rtl components: Example 6: Same as Example 5, with more rtl components:
Logical representation: "http://ab.CD.EF/GH/IJ/kl.html" Logical representation: "http://ab.CD.EF/GH/IJ/kl.html"
Visual representation: "http://ab.JI/HG/FE.DC/kl.html" Visual representation: "http://ab.JI/HG/FE.DC/kl.html"
The inversion of the domain name labels and the path components may The inversion of the domain name labels and the path components may
be easier to identify because the delimiters also move. be easier to identify because the delimiters also move.
Example 7: A single rtl component includes digits: Example 7: A single rtl component includes digits:
Logical representation: "http://ab.CDE123FGH.ij/kl/mn/op.html" Logical representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"
Visual representation: "http://ab.HGF123EDC.ij/kl/mn/op.html" Visual representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"
Numbers are written ltr in all cases but are treated as an additional Numbers are written ltr in all cases but are treated as an additional
embedding inside a run of rtl characters. This is completely embedding inside a run of rtl characters. This is completely
consistent with usual bidirectional text. consistent with usual bidirectional text.
Example 8 (not allowed): Numbers are at the start or end of an rtl Example 8 (not allowed): Numbers are at the start or end of an rtl
component: component:
Logical representation: "http://ab.cd.ef/GH1/2IJ/KL.html" Logical representation: "http://ab.cd.ef/GH1/2IJ/KL.html"
Visual representation: "http://ab.cd.ef/LK/JI1/2HG.html" Visual representation: "http://ab.cd.ef/LK/JI1/2HG.html"
The sequence "1/2" is interpreted by the bidi algorithm as a The sequence "1/2" is interpreted by the Bidirectional Algorithm as a
fraction, fragmenting the components and leading to confusion. There fraction, fragmenting the components and leading to confusion. There
are other characters that are interpreted in a special way close to are other characters that are interpreted in a special way close to
numbers; in particular, "+", "-", "#", "$", "%", ",", ".", and ":". numbers; in particular, "+", "-", "#", "$", "%", ",", ".", and ":".
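The condition that Example 8 violates can be detected mechanically in
Bidi Notation. The following Python sketch is only an illustration:
the function name is hypothetical, and the delimiter set ("@", ".",
":", "/", plus "?" and "#") is an assumption made for the sketch, not
a normative list. It reports every part that contains rtl (upper
case) letters but does not both start and end with one.

```python
import re


def rtl_parts_with_bad_edges(iri: str) -> list[str]:
    """List parts in Bidi Notation that contain rtl letters (A-Z) but
    start or end with something else, such as a digit.

    Assumption: parts are separated by the delimiters @ . : / ? #
    """
    bad = []
    for part in re.split(r"[@.:/?#]", iri):
        if any(c.isupper() for c in part):
            if not (part[0].isupper() and part[-1].isupper()):
                bad.append(part)
    return bad
```

Applied to the logical representation of Example 8, the sketch flags
the parts "GH1" and "2IJ"; applied to Example 7, it flags nothing.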
Example 9 (not allowed): The numbers in the previous example are Example 9 (not allowed): The numbers in the previous example are
percent-encoded: percent-encoded:
Logical representation: "http://ab.cd.ef/GH%31/%32IJ/KL.html" Logical representation: "http://ab.cd.ef/GH%31/%32IJ/KL.html"
Visual representation: "http://ab.cd.ef/LK/JI%32/%31HG.html" Visual representation: "http://ab.cd.ef/LK/JI%32/%31HG.html"
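The visual representations in Examples 5 through 9 can be reproduced
with a much reduced model of the Unicode Bidirectional Algorithm.
The Python sketch below is an illustration only and is not a
conforming implementation: it works on Bidi Notation (a-z as ltr,
A-Z as rtl), assumes a left-to-right embedding, and covers only the
rules for numbers, separators, and neutrals (W4 to W7, N1, N2, I1,
L2 in [UNI9]). The character classes assigned to the symbols are the
sketch's reading of the symbols listed for Example 8, and the names
bidi_class and visual_order are hypothetical.

```python
def bidi_class(ch: str) -> str:
    """Simplified bidi class of one Bidi Notation character."""
    if ch.islower():
        return "L"
    if ch.isupper():
        return "R"
    if ch.isdigit():
        return "EN"
    if ch in "+-":
        return "ES"
    if ch in "#$%":
        return "ET"
    if ch in ",.:/":
        return "CS"
    return "ON"


def _runs(items, pred):
    """Yield (start, end) of each maximal run whose items satisfy pred."""
    i, n = 0, len(items)
    while i < n:
        if pred(items[i]):
            j = i
            while j < n and pred(items[j]):
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def visual_order(logical: str) -> str:
    """Reorder a Bidi Notation string as in a left-to-right embedding."""
    cls = [bidi_class(c) for c in logical]
    n = len(cls)
    # W4: a single separator between two European numbers joins them.
    for i in range(1, n - 1):
        if cls[i] in ("ES", "CS") and cls[i - 1] == cls[i + 1] == "EN":
            cls[i] = "EN"
    # W5: terminators adjacent to a European number join it.
    for i, j in list(_runs(cls, lambda c: c == "ET")):
        if (i > 0 and cls[i - 1] == "EN") or (j < n and cls[j] == "EN"):
            cls[i:j] = ["EN"] * (j - i)
    # W6: remaining separators and terminators become neutral.
    cls = ["ON" if c in ("ES", "ET", "CS") else c for c in cls]
    # W7: a number whose preceding strong character is L becomes L.
    last_strong = "L"
    for i, c in enumerate(cls):
        if c in ("L", "R"):
            last_strong = c
        elif c == "EN" and last_strong == "L":
            cls[i] = "L"
    # N1/N2: neutrals take the surrounding direction (numbers count as R),
    # otherwise the embedding direction, which is L here.
    for i, j in list(_runs(cls, lambda c: c == "ON")):
        left = "L" if i == 0 or cls[i - 1] == "L" else "R"
        right = "L" if j == n or cls[j] == "L" else "R"
        cls[i:j] = [left if left == right else "L"] * (j - i)
    # I1: levels in an even (ltr) embedding.
    levels = [{"L": 0, "R": 1, "EN": 2}[c] for c in cls]
    # L2: reverse runs from the highest level down to level 1.
    chars = list(logical)
    for level in range(max(levels, default=0), 0, -1):
        for i, j in list(_runs(levels, lambda v: v >= level)):
            chars[i:j] = chars[i:j][::-1]
    return "".join(chars)
```

Because the digits sit one level above the surrounding rtl letters,
they are reversed twice and therefore keep their ltr order, which is
the behavior described for Example 7.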
Example 10 (allowed but not recommended): Example 10 (allowed but not recommended):
skipping to change at page 9, line 15 skipping to change at page 9, line 33
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)", Profile for Internationalized Domain Names (IDN)",
RFC 3491, March 2003. RFC 3491, March 2003.
[RFC3987bis] [RFC3987bis]
Duerst, M., Masinter, L., and M. Suignard, Duerst, M., Masinter, L., and M. Suignard,
"Internationalized Resource Identifiers (IRIs)", "Internationalized Resource Identifiers (IRIs)",
August 2011, August 2011,
<http://tools.ietf.org/id/draft-ietf-iri-3987bis>. <http://tools.ietf.org/id/draft-ietf-iri-3987bis>.
[UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard [UNI9] Davis, M., "The Unicode Bidirectional Algorithm", Unicode
Annex #9, March 2004, Standard Annex #9, March 2004,
<http://www.unicode.org/reports/tr9/tr9-13.html>. <http://www.unicode.org/reports/tr9/tr9-13.html>.
[UNIV6] The Unicode Consortium, "The Unicode Standard, Version [UNIV6] The Unicode Consortium, "The Unicode Standard, Version
6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011,
ISBN 978-1-936213-01-6)", October 2010. ISBN 978-1-936213-01-6)", October 2010.
9.2. Informative References 9.2. Informative References
[BidiEx] "Examples of bidirectional IRIs", [BidiEx] "Examples of Bidi IRIs",
<http://www.w3.org/International/iri-edit/BidiExamples>. <http://www.w3.org/International/iri-edit/BidiExamples>.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005. Identifiers (IRIs)", RFC 3987, January 2005.
Authors' Addresses Authors' Addresses
Martin Duerst Martin Duerst
Aoyama Gakuin University Aoyama Gakuin University
5-10-1 Fuchinobe 5-10-1 Fuchinobe
 End of changes. 21 change blocks. 
68 lines changed or deleted 84 lines changed or added
