< draft-faltstrom-unicode11-07.txt   draft-faltstrom-unicode11-08.txt >
Network Working Group P. Faltstrom Network Working Group P. Faltstrom
Internet-Draft Netnod Internet-Draft Netnod
Intended status: Standards Track January 07, 2019 Intended status: Standards Track March 11, 2019
Expires: July 11, 2019 Expires: September 12, 2019
IDNA2008 and Unicode 11.0.0 IDNA2008 and Unicode 11.0.0
draft-faltstrom-unicode11-07 draft-faltstrom-unicode11-08
Abstract Abstract
This document describes the changes between Unicode 6.3.0 and Unicode This document describes the changes between Unicode 6.3.0 and Unicode
11.0.0 in the context of IDNA2008. It further suggests a path 11.0.0 in the context of IDNA2008. Some additions and changes have
forward for the IETF to ensure IDNA2008 follows the evolution of the been made in the Unicode Standard that affect the values produced by
Unicode Standard. the algorithm IDNA2008 specifies. IDNA2008 allows adding
exceptions to the algorithm for backward compatibility; however, this
Some changes have been made in the Unicode Standard related to the document does not add any such exceptions. This document provides
algorithm IDNA2008 specifies. IDNA2008 allows adding exceptions to the necessary tables to IANA to make its database consistent with
the algorithm for backward compatibility; however, this document Unicode 11.0.0.
makes no such changes. Thus this document requests that IANA update
the tables to Unicode 11.
The document also recommends that all DNS registries continue the To improve understanding, this document describes systems that are
practice of calculating a repertoire using conservatism and inclusion being used as alternatives to those that conform to IDNA2008.
principles.
TO BE REMOVED AT TIME OF PUBLICATION AS AN RFC: TO BE REMOVED AT TIME OF PUBLICATION AS AN RFC:
This document is discussed on the i18nrp@ietf.org mailing list of the This document is discussed on the i18nrp@ietf.org mailing list of the
IETF. IETF.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
skipping to change at page 1, line 48 skipping to change at page 1, line 45
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 11, 2019. This Internet-Draft will expire on September 12, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 4 2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 4
3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 4 3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 4
3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Additional important IDNA2008-related documents . . . . . 5
4. Notable Changes Between Unicode 6.3.0 and 11.0.0 . . . . . . 6 3.3. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5
4.1. Changes in Unicode 7.0.0 . . . . . . . . . . . . . . . . 6 4. Notable Changes Between Unicode 6.2.0 and 11.0.0 . . . . . . 7
4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 6 4.1. Changes between Unicode 6.2.0 and 7.0.0 . . . . . . . . . 7
4.3. Changes in Unicode 11.0.0 . . . . . . . . . . . . . . . . 6 4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 8
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.3. Changes between Unicode 10.0.0 and 11.0.0 . . . . . . . . 8
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 5. U+111C9 SHARADA SANDHI MARK . . . . . . . . . . . . . . . . . 9
7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 10
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10
9.1. Normative References . . . . . . . . . . . . . . . . . . 9 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11
9.2. Non-normative references . . . . . . . . . . . . . . . . 9 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 11
Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 12 10.1. Normative References . . . . . . . . . . . . . . . . . . 11
Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 15 10.2. Non-normative references . . . . . . . . . . . . . . . . 12
Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 16 Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 14
Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 17 Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 17
Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 18 Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 18
Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 20
Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 21
Appendix F. Code points in Unicode Character Database (UCD) Appendix F. Code points in Unicode Character Database (UCD)
format for Unicode 11.0.0 . . . . . . . . . . . . . 20 format for Unicode 11.0.0 . . . . . . . . . . . . . 22
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 79 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 81
1. Introduction 1. Introduction
The current version of Internationalized Domain Names for The current version of Internationalized Domain Names for
Applications (IDNA) was largely completed in 2008, and is thus known Applications (IDNA) was initiated in 2008, and despite not being
as "IDNA2008". It is specified in a series of documents completed until 2010, is widely known as "IDNA2008". It is specified
listed in Section 3.1. The IDNA2008 standard includes an algorithm in the series of documents listed in Section 3.1. The IDNA2008
by which a derived property value is calculated based on the standard includes an algorithm by which a derived property value is
properties defined from the Unicode Standard. calculated based on the properties defined from the Unicode Standard.
When the Unicode Standard is updated, new code points are assigned When the Unicode Standard is updated, new code points are assigned
and already-assigned code points can have their property values and already-assigned code points can have their property values
changed. changed.
o Assigning code points can create problems if the newly-assigned o Assigning code points can create problems if the newly-assigned
code points are compositions of code points changes (or would have code points are compositions of existing code points and because
changed) the normalization functions. These problems can arise if of that the normalization relationships associated with those code
the new code points change the matching algorithms used and this points should have been changed.
in turn creates problems looking up already stored strings.
o Changing properties for already-assigned code points can create o Changing properties for already-assigned code points can create
problems if the property change results in changes to the derived problems if the property change results in changes to the derived
property value. This might make an earlier allowed code point property value. This might make an earlier allowed code point
whose derived property value is PVALID to then not be allowed whose derived property value is PVALID to then not be allowed
anymore if its derived property value changes to DISALLOWED. The anymore if its derived property value changes to DISALLOWED. The
problem can also happen the other way around: a code point that problem can also happen the other way around: a code point that
was not allowed (and thus is blocked in some situations) to was not allowed (and thus is prohibited) can suddenly end up being
suddenly end up being allowed. allowed.
Historically, the IETF has accepted all implications of changes in o Problems can also be created if the properties assigned to those
the Unicode Standard even though the changes have resulted in code points are inconsistent with IDNA2008 assumptions about how
problematic changes in the derived property value. The primary properties are assigned and/or about how code points with those
reason for that choice is that staying with the Unicode Standard has properties are used or behave.
been viewed as important because of the diversity of implementations
already existing in the wild.
As described in Section 4, a few changes have been made regarding There were three incompatible changes in the Unicode standard after
certain attributes to code points in Unicode between version 6.3.0 Unicode 5.2 up to and including Unicode 6.0, as described in RFC 6452
and 11.0.0. Such changes could result in a change in the derived [RFC6452]. The code points U+0CF1 and U+0CF2 had a derived property
property value for the code point in question. If a change occurs, value change from DISALLOWED to PVALID while U+19DA had a change in
and it is between any of the derived property values except derived property value from PVALID to DISALLOWED. They were
DISALLOWED, there is not a problem. This document concludes that no examined in great detail and IETF concluded that the consensus is
exceptions are to be added to IDNA2008 even if changes in the derived that no update was needed to RFC 5892 [RFC5892] based on the changes
property value is a result of the changes made in Unicode between made to the Unicode standard.
version 6.3.0 and 11.0.0.
In 2015, the Internet Architecture Board (IAB) issued a statement As described in Section 4, more changes have been made to code points
[IAB] which requested the IETF to resolve the issues related to the between Unicode version 6.0 and 11.0.0 so that the derived property
code point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) that was values have been changed in an incompatible way. This document
introduced in Unicode 7.0.0 [Unicode-7.0.0]. The current document concludes that no exceptions are to be added to IDNA2008 even though
resolves this issue and suggests that the IDNA2008 standard follows there are changes in the derived property value as a result of the
the Unicode Standard and not update RFC 5892 [RFC5892] or any other changes made in Unicode between version 6.2.0 and 11.0.0.
IDNA2008 RFCs.
Further, in 2015, the Internet Architecture Board (IAB) issued a
statement [IAB] which requested the IETF to resolve the issues
related to the code point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1)
that was introduced in Unicode 7.0.0 [Unicode-7.0.0]. This document
concludes that this code point is not to be added to the exception
list either. It should be noted that the review on U+08A1 indicated
that it is not an isolated case and that a number of PVALID code
points of long standing may have similar issues. The problem is
described in more detail in a document in progress, draft-klensin-
idna-5892upd-unicode70 [I-D.klensin-idna-5892upd-unicode70]. A
fuller resolution of this issue may require future changes to
IDNA2008 or additional specifications, but there is insufficient
understanding yet of what would constitute the best approach.
2. Keywords for Requirement Levels 2. Keywords for Requirement Levels
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they 14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they
appear in all capitals, as shown here. appear in all capitals, as shown here.
3. Background 3. Background
3.1. IDNA2008 Documents 3.1. IDNA2008 Documents
IDNA2008 consists of the following documents. The documents in the IDNA2008 consists of the following documents. The documents in the
set have informal names. set have informal names.
o RFC 5890 [RFC5890], informally called "Defs" or "Definitions", o Internationalized Domain Names for Applications (IDNA):
contains definitions and other material that are needed for Definitions and Document Framework [RFC5890], informally called
understanding other documents in the set. "Defs" or "Definitions", contains definitions and other material
that are needed for understanding other documents in the set.
o RFC 5891 [RFC5891], informally called "Protocol", describes the o Internationalized Domain Names in Applications (IDNA): Protocol
core IDNA2008 protocol and its operations. It needs to be [RFC5891], informally called "Protocol", describes the core
interpreted in combination with the Bidi document (described IDNA2008 protocol and its operations. It needs to be interpreted
below). in combination with the Bidi document (described below).
o RFC 5892 [RFC5892], informally called "Tables", lists the o The Unicode Code Points and Internationalized Domain Names for
categories and rules that identify the code points allowed in a Applications (IDNA) [RFC5892], informally called "Tables", lists
label written in native character form (called a "U-label"), and is the categories and rules that identify the code points allowed in
based originally on Unicode 5.2.0 [Unicode-5.2.0] code point a label written in native character form (called a "U-label"), and
assignments and additional rules unique to IDNA2008. The Unicode- is based on Unicode 5.2.0 [Unicode-5.2.0] code point assignments
based rules in RFC 4892 are expected to be stable across Unicode and additional rules unique to IDNA2008. The Unicode-based rules
updates and hence independent of Unicode versions. RFC 5892 in RFC 4892 are expected to be stable across Unicode updates and
hence independent of Unicode versions. RFC 5892 [RFC5892]
obsoletes RFC 3491 [RFC3491], and in particular the use of the obsoletes RFC 3491 [RFC3491], and in particular the use of the
tables to which it refers. tables to which RFC 3491 [RFC3491] refers.
o RFC 5893 [RFC5893], informally called "Bidi", specifies special o Right-to-Left Scripts for Internationalized Domain Names for
rules for labels that contain characters that are written from Applications (IDNA) [RFC5893], informally called "Bidi", specifies
right to left. special rules for labels that contain characters that are written
from right to left.
o RFC 5894 [RFC5894], informally called "Rationale", provides an o Internationalized Domain Names for Applications (IDNA):
overview of the protocol and associated tables, and gives Background, Explanation, and Rationale [RFC5894], informally
explanatory material and some rationale for the decisions that led called "Rationale", provides an overview of the protocol and
to IDNA2008. It also contains advice for DNS registry operators associated tables, and gives explanatory material and some
and others who use Internationalized Domain Names (IDNs). rationale for the decisions that led to IDNA2008. It also
contains advice for DNS registry operators and others who use
Internationalized Domain Names (IDNs).
o RFC 5895 [RFC5895], informally called "Mapping", discusses the o Mapping Characters for Internationalized Domain Names in
issue of mapping characters into other characters and that Applications (IDNA) 2008 [RFC5895], informally called "Mapping",
provides guidance for doing so when that is appropriate. RFC 5895 discusses the issue of mapping characters into other characters
provides advice and is not a required part of IDNA. and provides guidance for doing so when that is appropriate. RFC
5895 provides advice only and is not a required part of IDNA.
o RFC 6452 [RFC6452] describes some changes made to Unicode 6.0.0 3.2. Additional important IDNA2008-related documents
[Unicode-6.0.0] that resulted in the derived property value change
for the code points U+0CF1, U+0CF2 and U+19DA. U+0CF1 and U+0CF2
changed from DISALLOWED to PVALID, while U+19DA changed from
PVALID to DISALLOWED. The IETF concluded that no update to RFC
5892 [RFC5892] was needed based on the changes made in Unicode
6.0.0 [Unicode-6.0.0]. As a result, the derived property value
remained aligned with the Unicode Standard.
3.2. Deployment There are other documents important for the understanding and
functioning of IDNA2008, for example the following.
The level of deployment of IDNA2008 is unfortunately quite diverse. o The Unicode Code Points and Internationalized Domain Names for
The following lists some of the strategies that existing Applications (IDNA) - Unicode 6.0 [RFC6452] describes some changes
implementations are known to use: made to Unicode 6.0.0 [Unicode-6.0.0] that resulted in derived
property value change for the code points U+0CF1, U+0CF2 and
U+19DA. U+0CF1 and U+0CF2 changed from DISALLOWED to PVALID,
while U+19DA changed from PVALID to DISALLOWED. The IETF
concluded that no update to RFC 5892 [RFC5892] was needed based on
the changes made in Unicode 6.0.0 [Unicode-6.0.0]. As a result,
the derived property value remained aligned with the Unicode
Standard. Specifically, no exception was added.
o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491] 3.3. Deployment
which implies using a table within which it is said whether code
points are allowed to be used or not, after doing the There are many variations on the general IDNA model in use in the
normalization specified in IDNA2003. various parts of the community. The following lists some of the
strategies that implementations that claim to be IDNA compliant are
known to use, but it should be noted the list is not complete:
o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491
[RFC3491]. Those specifications are dependent on case folding and
NFKC normalization and on tables that specify for each code point
whether it is allowed to be used or not, with a distinction made
between use for "stored strings" and "query strings". The tables
themselves are dependent on version 3.2 of The Unicode Standard
[Unicode-3.2.0].
o A number of variations on IDNA2003, sometimes presented as
"updated IDNA2003" or the like, which follow the principles of
IDNA2003 as understood by the implementers but that use tables
that represent how the implementers believe Stringprep [RFC3454]
and Nameprep [RFC3491] would have evolved had the IETF not moved
in the direction of IDNA2008 instead.
o A mix between IDNA2003 and IDNA2008 where code points assigned to o A mix between IDNA2003 and IDNA2008 where code points assigned to
Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property
value calculated according to the algorithm specified in IDNA2008. value calculated according to the algorithm specified in IDNA2008.
o Strict IDNA2008 following the current IANA tables, which implies o A mix between IDNA2003 and IDNA2008 according to the Unicode
staying at Unicode 6.3.0 [Unicode-6.3.0] and treating later Technical Standard #46 [UTS-46]. Because that document specifies
assigned code points as UNASSIGNED. different profiles, there are several different variations that
leave users with no guarantee that two applications claiming
o The IDNA2008 algorithm applied to whatever version of Unicode conformance to UTS#46 will interoperate well with each other much
Standard exists in the operating system and/or libraries used, less with conforming IDNA2008 implementations. UTS#46 is
regardless of whether the version is later than Unicode version ultimately based on a normative table very much like the one used
6.3.0. by Stringprep [RFC3454] but updated for each new version of
Unicode.
o A mix between IDNA2003 and IDNA2008 according to local
interpretation of the Unicode Technical Standard #46 [UTS-46].
The issue is further complicated by having diverse implementations of o The (normative) IDNA2008 algorithm applied to whatever version of
the requirements in RFC 5894 [RFC5894] by DNS registry operators, Unicode Standard exists in the operating system and/or libraries
based on the IDNA2008 specification, but with additional rules for used, independent of whatever version of tables appears in the
the specific code points that are allowed for registration. (non-normative) IANA database.
In practice, the Unicode Consortium creates a maximum set of code In practice, the Unicode Consortium creates a maximum set of code
points by assigning code points in the Unicode Standard. The points by assigning code points in the Unicode Standard. The
IDNA2008 rules based on the Unicode Standard create a subset of these IDNA2008 rules use the Unicode Standard to create a further subset of
by assigning the PVALID derived property value to them. DNS code points and context that are permitted in DNS labels associated
with its PVALID, CONTEXTJ, and CONTEXTO derived property values. DNS
registries and other organizations that deal with IDNs are supposed registries and other organizations that deal with IDNs are supposed
to create their own subsets from IDNA2008 for use by those registries to create their own subsets from IDNA2008 for use by those registries
and organizations. and organizations.
SAC-084 [SAC-084] and RFC 6912 [RFC6912] recommend to DNS registries This progressive subsetting and narrowing of the repertoire of code
and other organizations to be conservative when creating their points that can be used in labels is an implementation of the
subsets, and to use the principle of creating subsets by inclusion. principles of being conservative when deciding what code points to
include in such a subset. SAC-084 [SAC-084] and RFC 6912 [RFC6912]
recommend to DNS registries and other organizations to be
conservative when creating their subsets, and to use the principle of
creating subsets by inclusion.
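The layered narrowing described above (Unicode assignments, then the IDNA2008 derived property values, then a registry's own choices) can be sketched with plain set operations. This Python fragment is an illustration only: the derived property values are a small hand-copied sample rather than the IANA tables, and the function name and the registry's reviewed set are hypothetical.

```python
# Illustrative sample of IDNA2008 derived property values.
# This is NOT the full table; it only supports the example below.
SAMPLE_DERIVED = {
    0x0041: "DISALLOWED",  # LATIN CAPITAL LETTER A
    0x0061: "PVALID",      # LATIN SMALL LETTER A
    0x00E4: "PVALID",      # LATIN SMALL LETTER A WITH DIAERESIS
    0x00F8: "PVALID",      # LATIN SMALL LETTER O WITH STROKE
    0x08A1: "PVALID",      # ARABIC LETTER BEH WITH HAMZA ABOVE
    0x200D: "CONTEXTJ",    # ZERO WIDTH JOINER
}

# Derived property values that can appear in a label at all.
PERMITTED = {"PVALID", "CONTEXTJ", "CONTEXTO"}


def registry_repertoire(derived, reviewed):
    """Build a repertoire by inclusion (hypothetical helper).

    A code point is kept only if IDNA2008 permits it AND the registry
    has explicitly reviewed and listed it. Anything not listed is out.
    """
    idna_subset = {cp for cp, value in derived.items() if value in PERMITTED}
    return idna_subset & set(reviewed)


# Hypothetical registry decision: it lists three code points, one of
# which (U+0041) IDNA2008 does not permit.
reviewed_by_registry = {0x0041, 0x0061, 0x00E4}
repertoire = registry_repertoire(SAMPLE_DERIVED, reviewed_by_registry)
print(sorted(f"U+{cp:04X}" for cp in repertoire))
```

Because the repertoire is built by intersection, a code point that becomes PVALID in a later Unicode version does not enter the registry's repertoire until the registry adds it on purpose.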
4. Notable Changes Between Unicode 6.3.0 and 11.0.0 4. Notable Changes Between Unicode 6.2.0 and 11.0.0
4.1. Changes in Unicode 7.0.0 4.1. Changes between Unicode 6.2.0 and 7.0.0
Change in number of characters in each category:
Code points that changed derived property value: 0
PVALID changed from 97946 to 99867 (+1921)
UNASSIGNED changed from 864348 to 861509 (-2839)
CONTEXTJ did not change, at 2
CONTEXTO did not change, at 25
DISALLOWED changed from 151791 to 152709 (+918)
TOTAL did not change, at 1114112
There are no changes made to Unicode between version 6.2.0 and
7.0.0 that impact IDNA2008 calculation of the derived property
values.
The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was
introduced in Unicode 7.0.0. This was discussed extensively in the introduced in Unicode 7.0.0. This was discussed extensively in the
IETF, and by the IAB in their statement [IAB] requesting the IETF to IETF, and by the IAB in their statement [IAB] requesting the IETF to
investigate the issue. Specifically, the IAB stated: investigate the issue. Specifically, the IAB stated:
On the same precautionary principle, the IAB recommends that the On the same precautionary principle, the IAB recommends that the
Internationalized Domain Names for Applications (IDNA) Parameters Internationalized Domain Names for Applications (IDNA) Parameters
registry (http://www.iana.org/assignments/idna-tables/) not be registry (http://www.iana.org/assignments/idna-tables/) not be
updated to Unicode 7.0.0 until the IETF has consensus on a updated to Unicode 7.0.0 until the IETF has consensus on a
solution to this problem. solution to this problem.
The discussion in the IETF concluded that although it is possible to The discussion in the IETF concluded that although it is possible to
create "the same" character in multiple ways, the issue with U+08A1 create "the same" character in multiple ways, the issue with U+08A1
is not unique. The character U+08A1 can be represented with the is not unique. The character U+08A1 (ARABIC LETTER BEH WITH HAMZA
sequence ARABIC LETTER BEH (U+0628) and ARABIC HAMZA ABOVE (U+0654). ABOVE) can be represented with the sequence ARABIC LETTER BEH
This is identical to LATIN SMALL LETTER A WITH DIAERESIS (U+00E4), that (U+0628) and ARABIC HAMZA ABOVE (U+0654). This is identical to LATIN
can be represented with the sequence LATIN SMALL LETTER A (U+0061) SMALL LETTER O WITH STROKE (U+00F8), which can be represented with
followed by COMBINING DIAERESIS (U+0308). One difference between the sequence LATIN SMALL LETTER O (U+006F) followed by COMBINING
these two sequences is how they are treated in the normalization SHORT SOLIDUS OVERLAY (U+0337).
forms specified by the Unicode Consortium.
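The normalization difference mentioned above can be observed with the Python standard library. The sketch below is illustrative and not part of either draft revision; it assumes only `unicodedata.normalize`, and the helper name `nfc` and the three variable names are made up for the example.

```python
import unicodedata


def nfc(text):
    """Return the NFC (canonical composition) form of text."""
    return unicodedata.normalize("NFC", text)


# LATIN SMALL LETTER A + COMBINING DIAERESIS: U+00E4 has a canonical
# decomposition, so NFC composes the sequence into the single code point.
a_diaeresis_composes = nfc("\u0061\u0308") == "\u00E4"

# ARABIC LETTER BEH + ARABIC HAMZA ABOVE: U+08A1 has no canonical
# decomposition, so NFC leaves the sequence as two code points.
beh_hamza_composes = nfc("\u0628\u0654") == "\u08A1"

# LATIN SMALL LETTER O + COMBINING SHORT SOLIDUS OVERLAY: U+00F8 has no
# canonical decomposition either, so this behaves like the U+08A1 case.
o_stroke_composes = nfc("\u006F\u0337") == "\u00F8"

print(a_diaeresis_composes, beh_hamza_composes, o_stroke_composes)
```

Only the first sequence composes under NFC. The other two stay distinct from their precomposed look-alikes, so normalization alone cannot make the two spellings match.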
U+08A1 is discussed in draft-freytag-troublesome-characters Although the discussion about this specific code point resulted in
[I-D.freytag-troublesome-characters] and other Internet-Drafts. acceptance of the derived property value of PVALID, the underlying
Regardless of whether the discussion of those drafts ends in problem with combining sequences is not understood fully. Therefore
recommendations to include the code point in the repertoire of it cannot be claimed that this case can be extrapolated to other
characters permissible for registration or not, it is still situations and other code points.
acceptable to allow the code point to have a derived property value
of PVALID.
4.2. Changes between Unicode 7.0.0 and 10.0.0 4.2. Changes between Unicode 7.0.0 and 10.0.0
There are no changes made to Unicode between version 7.0.0 and 10.0.0 Change in number of characters in each category:
that impact IDNA2008 calculation of the derived property value.
4.3. Changes in Unicode 11.0.0 Code points that changed derived property value: 0
The Unicode Standard Version 11.0.0 [Unicode-11.0.0] has included a PVALID changed from 99867 to 122411 (+22544)
number of changes [Changes-11.0.0] from version 10.0.0.
o 684 new characters were added, including letters, combining marks, UNASSIGNED changed from 861509 to 837775 (-23734)
digits, symbols, and punctuation marks.
o Georgian letters in the ranges U+10D0..U+10FA and U+10FD..U+10FF CONTEXTJ did not change, at 2
CONTEXTO did not change, at 25
DISALLOWED changed from 152709 to 153899 (+1190)
TOTAL did not change, at 1114112
There are no changes made to Unicode between version 7.0.0 and
10.0.0 that impact IDNA2008 calculation of the derived property
values.
4.3. Changes between Unicode 10.0.0 and 11.0.0
Change in number of characters in each category:
Code points that changed derived property value: 1
PVALID changed from 122411 to 122734 (+323)
UNASSIGNED changed from 837775 to 837091 (-684)
CONTEXTJ did not change, at 2
CONTEXTO did not change, at 25
DISALLOWED changed from 153899 to 154260 (+361)
TOTAL did not change, at 1114112
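The per-category tallies in Sections 4.1 to 4.3 can be checked with simple arithmetic. The following Python sketch copies the counts from those tallies; the structure and names (`STEPS`, `summarize`) are illustrative assumptions and not part of the draft.

```python
# Size of the Unicode code space, U+0000..U+10FFFF.
TOTAL = 0x110000  # 1114112

# (old, new) counts per category, copied from the tallies in Section 4.
STEPS = {
    "6.2.0 -> 7.0.0": {
        "PVALID": (97946, 99867),
        "UNASSIGNED": (864348, 861509),
        "CONTEXTJ": (2, 2),
        "CONTEXTO": (25, 25),
        "DISALLOWED": (151791, 152709),
    },
    "7.0.0 -> 10.0.0": {
        "PVALID": (99867, 122411),
        "UNASSIGNED": (861509, 837775),
        "CONTEXTJ": (2, 2),
        "CONTEXTO": (25, 25),
        "DISALLOWED": (152709, 153899),
    },
    "10.0.0 -> 11.0.0": {
        "PVALID": (122411, 122734),
        "UNASSIGNED": (837775, 837091),
        "CONTEXTJ": (2, 2),
        "CONTEXTO": (25, 25),
        "DISALLOWED": (153899, 154260),
    },
}


def summarize(step):
    """Return (old_total, new_total, per-category deltas) for one step."""
    old_total = sum(old for old, _ in step.values())
    new_total = sum(new for _, new in step.values())
    deltas = {name: new - old for name, (old, new) in step.items()}
    return old_total, new_total, deltas


for label, step in STEPS.items():
    old_total, new_total, deltas = summarize(step)
    # Every code point is in exactly one category, so the deltas sum to 0.
    print(label, old_total == TOTAL, new_total == TOTAL, sum(deltas.values()))
```

In each step the drop in UNASSIGNED equals the combined growth of PVALID and DISALLOWED. For 10.0.0 to 11.0.0 that is 323 + 361 = 684, the number of newly assigned characters.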
Georgian letters in the ranges U+10D0..U+10FA and U+10FD..U+10FF
had their General Category changed from Lo to Ll, to reflect had their General Category changed from Lo to Ll, to reflect
their status as the lowercase of new Georgian case pairs. Case their status as the lowercase of new Georgian case pairs. Case
mappings were also added. mappings were also added.
o SHARADA SANDHI MARK (U+111C9) was changed from Po to Mn, and from SHARADA SANDHI MARK (U+111C9) was changed from Po to Mn, and from
bc=L to bc=NSM. bc=L to bc=NSM.
o The properties for ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and The properties for ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and
ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were corrected from Mc to ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were corrected from Mc to
Mn. Mn.
o SPHERICAL ANGLE OPENING UP (U+29A1) was changed to Bidi_M=N. SPHERICAL ANGLE OPENING UP (U+29A1) was changed to Bidi_M=N.
These changes to the Unicode Standard have the following implications These changes to the Unicode Standard have the following implications
for these code points: for these code points:
o The newly assigned 684 characters are to have a derived property o The newly assigned 684 characters are assigned a derived property
value as a result of applying the IDNA2008 algorithm. value as a result of applying the IDNA2008 algorithm.
o The Georgian letters in the ranges U+10D0..U+10FA and o The Georgian letters in the ranges U+10D0..U+10FA and
U+10FD..U+10FF existed before IDNA2008 was created. Applying the U+10FD..U+10FF existed before IDNA2008 was created. Applying the
IDNA2008 algorithm to the code points assigned the derived IDNA2008 algorithm to the code points assigned the derived
property value PVALID, and that value is unchanged even if the property value PVALID, and that value is unchanged even if the
underlying Unicode properties have changed. underlying Unicode properties have changed. The newly encoded
Mtavruli letters have general category "Lu" and are therefore
DISALLOWED.
o The U+111C9 SHARADA SANDHI MARK was added to Unicode 8.0.0 o The U+111C9 SHARADA SANDHI MARK was added to Unicode 8.0.0
[Unicode-8.0.0]. Applying the IDNA2008 algorithm to the code [Unicode-8.0.0]. Applying the IDNA2008 algorithm to the code
point assigned the derived property value DISALLOWED. The changes point assigned the derived property value DISALLOWED. The changes
in the underlying properties in the Unicode Standard Version in the underlying properties in the Unicode Standard Version
11.0.0 [Unicode-11.0.0] caused the derived property value to 11.0.0 [Unicode-11.0.0] caused the derived property value to
change to PVALID, which is an acceptable change. change to PVALID.
o The characters ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and o The characters ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and
ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were added to Unicode ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were added to Unicode
10.0.0 [Unicode-10.0.0]. Applying the IDNA2008 algorithm to the 10.0.0 [Unicode-10.0.0]. Applying the IDNA2008 algorithm to the
code points assigned the derived property value PVALID, and that code points assigned the derived property value PVALID, and that
value is unchanged even if the underlying Unicode properties have value is unchanged even if the underlying Unicode properties have
changed. changed.
o SPHERICAL ANGLE OPENING UP (U+29A1) existed before IDNA2008 was o SPHERICAL ANGLE OPENING UP (U+29A1) existed before IDNA2008 was
created. Applying the IDNA2008 algorithm to the code point created. Applying the IDNA2008 algorithm to the code point
assigned the derived property value PVALID, and that value is assigned the derived property value DISALLOWED, and that value is
unchanged even if the underlying Unicode properties have changed. unchanged even if the underlying Unicode properties have changed.
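The implications listed above can be explored with a small sketch. It is not the full IDNA2008 algorithm from RFC 5892: it keeps only a simplified Unassigned check, the Unstable rule, and the LetterDigits rule, and it omits exceptions, BackwardCompatible entries, contextual rules, and the ignorable categories. The function name simplified_derived_property is hypothetical, and the results depend on the Unicode version bundled with the Python interpreter (unicodedata.unidata_version), which may not be 11.0.0.

```python
import unicodedata

# General categories that RFC 5892 groups as LetterDigits.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def simplified_derived_property(cp: int) -> str:
    """Approximate the derived property value of one code point.

    Assumption: only three rules are modelled (Unassigned, Unstable,
    LetterDigits). Everything else in RFC 5892 is left out, so the
    result is illustrative and not authoritative.
    """
    ch = chr(cp)
    # Simplified Unassigned rule: general category Cn.
    if unicodedata.category(ch) == "Cn":
        return "UNASSIGNED"
    # Unstable rule: NFKC(casefold(NFKC(cp))) differs from cp.
    folded = unicodedata.normalize("NFKC", ch).casefold()
    if unicodedata.normalize("NFKC", folded) != ch:
        return "DISALLOWED"
    # LetterDigits rule.
    if unicodedata.category(ch) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in (0x10D0, 0x1C90, 0x111C9, 0x11A07, 0x29A1):
        print(f"U+{cp:04X}", simplified_derived_property(cp))
```

Because the sketch reads properties from the interpreter's own Unicode data, running it under interpreters that bundle different Unicode versions is one way to observe the kind of change described for U+111C9.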
5. Conclusion 5. U+111C9 SHARADA SANDHI MARK
As described in Section 4, changes have been made to Unicode between As one can see in Section 4, there is one incompatible change made
version 6.3.0 and 11.0.0. Some changes to specific characters between Unicode 6.2.0 and 11.0.0, the code point U+111C9. It has
changed their derived property value, while other changes did not. changed derived property value from DISALLOWED to PVALID. In
Given the diverse deployment described in Section 3.2 and the changes situations like these, IDNA2008 allows for the addition of rules to RFC
5892 [RFC5892], Section 2.7 (BackwardCompatible (G)). If the code
point is accepted, it might still be rejected by implementations of
IDNA2008 that are based on versions of Unicode older than 11.0.0. As
the character is rarely used outside the group of Sharada specialists,
and is used in some records to indicate sandhi breaks, the conclusion
is that it could either be added as an exception or be allowed to
change property value, because use of the code point is limited
outside a specialist community. As including an exception would
require changes in deployed implementations of IDNA2008, the editor
proposes that such a BackwardCompatible rule NOT be added to IDNA2008.
6. Conclusion
As described in Section 4 and Section 5, changes have been made to
Unicode between version 6.2.0 and 11.0.0. Some changes to specific
characters changed their derived property value, while other changes
did not. Given what is described in Section 3.3 and the changes
described, including implications to normalization, the conclusion of described, including implications to normalization, the conclusion of
this document is to not add any exception rules to IDNA2008. this document is to not add any exception rules to IDNA2008.
To increase overall harmonization in the use of IDNs, this document This does not preclude any such updates to RFC 5892 [RFC5892] or any
recommends that the derived property values MUST be calculated as other IDNA2008 related document in the future when new versions of
specified in the documents listed in Section 3.1 and with the the Unicode Standard are released, and it might also be found that
code points in Unicode Version 11.0.0 [Unicode-11.0.0]. the algorithm specified in IDNA2008 is not suitable for DNS
without additional rules, categories, or tuning.
All DNS registries (and other organizations) SHOULD calculate a
repertoire using the conservatism and inclusion principles, as
described in SAC-084 [SAC-084] and similar documents.
6. IANA Considerations 7. IANA Considerations
IANA is requested to update the IDNA Parameters registry of derived IANA is requested to update the IDNA Parameters registry of derived
property values, after the expert reviewer validates that the derived property values, after the expert reviewer validates that the derived
property values are calculated correctly. property values are calculated correctly.
7. Security Considerations 8. Security Considerations
This document makes recommendations regarding the use of the IDNA2008 This document makes recommendations regarding the use of the IDNA2008
algorithm for calculation of derived property values, based on the algorithm for calculation of derived property values, based on the
current Unicode version. It also recommends that DNS registries (and Unicode version 11.0.0. This recommendation does not say anything
others dealing with Internationalized Domain Names) explicitly select about what recommendations to make for future versions of the Unicode
appropriate subsets of characters with the derived value of PVALID. Standard.
Not following these recommendations can lead to various security Not following these recommendations can lead to various security
issues. Specifically, allowing confusable characters may lead to issues. Specifically, allowing confusable characters may lead to
various phishing attacks, as described in the Security Considerations various phishing attacks, as described in the Security Considerations
sections of the documents listed in Section 3.1. sections of the documents listed in Section 3.1.
8. Acknowledgements 9. Acknowledgements
Thanks to Martin Duerst, Asmus Freytag, Ted Hardie, John Klensin, Thanks to Harald Alvestrand, Marc Blanchet, Martin Duerst, Asmus
Erik Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for Freytag, Ted Hardie, John Klensin, Erik Nordmark, Pete Resnick, Peter
Saint-Andre, Michel Suignard, Andrew Sullivan and Suzanne Woolf for
input to this document. input to this document.
9. References 10. References
9.1. Normative References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)", Profile for Internationalized Domain Names (IDN)",
RFC 3491, DOI 10.17487/RFC3491, March 2003, RFC 3491, DOI 10.17487/RFC3491, March 2003,
<https://www.rfc-editor.org/info/rfc3491>. <https://www.rfc-editor.org/info/rfc3491>.
skipping to change at page 9, line 45 skipping to change at page 12, line 9
[RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
Points and Internationalized Domain Names for Applications Points and Internationalized Domain Names for Applications
(IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
November 2011, <https://www.rfc-editor.org/info/rfc6452>. November 2011, <https://www.rfc-editor.org/info/rfc6452>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
9.2. Non-normative references 10.2. Non-normative references
[Changes-11.0.0] [Changes-11.0.0]
The Unicode Consortium, "Unicode Standard Annex #44", The Unicode Consortium, "Unicode Standard Annex #44",
Unicode Standard Annex #44, UNICODE CHARACTER DATABASE, Unicode Standard Annex #44, UNICODE CHARACTER DATABASE,
Change History https://www.unicode.org/reports/tr44/ Change History https://www.unicode.org/reports/tr44/
tr44-21d4.html#Change_History, May 2018. tr44-21d4.html#Change_History, May 2018.
[I-D.freytag-troublesome-characters] [I-D.freytag-troublesome-characters]
Freytag, A., Klensin, J., and A. Sullivan, "Those Freytag, A., Klensin, J., and A. Sullivan, "Those
Troublesome Characters: A Registry of Unicode Code Points Troublesome Characters: A Registry of Unicode Code Points
Needing Special Consideration When Used in Network Needing Special Consideration When Used in Network
Identifiers", draft-freytag-troublesome-characters-02 Identifiers", draft-freytag-troublesome-characters-02
(work in progress), June 2018. (work in progress), June 2018.
[I-D.klensin-idna-5892upd-unicode70]
Klensin, J. and P. Faltstrom, "IDNA Update for Unicode 7.0
and Later Versions", draft-klensin-idna-5892upd-
unicode70-05 (work in progress), October 2017.
[IAB] Internet Architecture Board, "IAB Statement on Identifiers [IAB] Internet Architecture Board, "IAB Statement on Identifiers
and Unicode 7.0.0", IAB Statement on Identifiers and and Unicode 7.0.0", IAB Statement on Identifiers and
Unicode 7.0.0 Unicode 7.0.0
https://www.iab.org/documents/correspondence-reports- https://www.iab.org/documents/correspondence-reports-
documents/2015-2/iab-statement-on-identifiers-and-unicode- documents/2015-2/iab-statement-on-identifiers-and-unicode-
7-0-0/, January 2015. 7-0-0/, January 2015.
[N4330] Pandey, A., "Proposal to Encode the SANDHI MARK for
Sharada", Proposal to Encode the SANDHI MARK for
Sharada https://www.unicode.org/L2/L2012/12322-n4330-
sharada-sandhi-mark.pdf, September 2012.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454,
DOI 10.17487/RFC3454, December 2002,
<https://www.rfc-editor.org/info/rfc3454>.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, DOI 10.17487/RFC3490, March 2003, RFC 3490, DOI 10.17487/RFC3490, March 2003,
<https://www.rfc-editor.org/info/rfc3490>. <https://www.rfc-editor.org/info/rfc3490>.
[RFC5894] Klensin, J., "Internationalized Domain Names for [RFC5894] Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Background, Explanation, and Applications (IDNA): Background, Explanation, and
Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
<https://www.rfc-editor.org/info/rfc5894>. <https://www.rfc-editor.org/info/rfc5894>.
 End of changes. 57 change blocks. 
187 lines changed or deleted 299 lines changed or added
