draft-ietf-idn-idne-00.txt   draft-ietf-idn-idne-01.txt 
Internet Draft Marc Blanchet Internet Draft Marc Blanchet
draft-ietf-idn-idne-00.txt Viagenie draft-ietf-idn-idne-01.txt Viagenie
July 5, 2000 Paul Hoffman July 8, 2000 Paul Hoffman
Expires in six months IMC & VPNC Expires in six months IMC & VPNC
Internationalized domain names using EDNS (IDNE) Internationalized domain names using EDNS (IDNE)
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with all This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Internet-Drafts are working documents of the Internet Engineering Task
skipping to change at line 46 skipping to change at line 46
1. Introduction 1. Introduction
Various proposals for IDN have tried to integrate IDN into the current Various proposals for IDN have tried to integrate IDN into the current
limited ASCII DNS. However, the compatibility issues make too many limited ASCII DNS. However, the compatibility issues make too many
constraints on the architecture. Many of these proposals require constraints on the architecture. Many of these proposals require
modifications to the applications or to the DNS protocol or to the modifications to the applications or to the DNS protocol or to the
servers. This proposal takes a different approach: it uses the servers. This proposal takes a different approach: it uses the
standardized extension mechanism for DNS (EDNS) and uses UTF-8 as the standardized extension mechanism for DNS (EDNS) and uses UTF-8 as the
mandatory charset. It causes no harm to the current DNS because it uses mandatory charset. It causes no harm to the current DNS because it uses
the ENDS extension mechanism. The major drawback of this proposal is the EDNS extension mechanism. The major drawback of this proposal is
that all protocols, applications and DNS servers will have to be that all protocols, applications and DNS servers will have to be
upgraded to support this proposal. upgraded to support this proposal.
1.1 Terminology 1.1 Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119 "MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119]. [RFC2119].
Hexadecimal values are shown preceded with an "0x". For example, Hexadecimal values are shown preceded with an "0x". For example,
"0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are "0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are
shown preceded with an "0b". For example, a nine-bit value might be shown preceded with an "0b". For example, a nine-bit value might be
shown as "0b101101111". shown as "0b101101111".
Examples in this document use the notation from the Unicode Standard
[UNICODE3] as well as the ISO 10646 [ISO10646] names. For example, the
letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER
A". In the lists of prohibited characters, the "U+" is left off to make
the lists easier to read.
1.2 IDN summary 1.2 IDN summary
Using the terminology in [IDNComp], this protocol specifies an IDN Using the terminology in [IDNCOMP], this protocol specifies an IDN
architecture of arch-2 (send binary or ACE). The binary format is architecture of arch-2 (send binary or ACE). The binary format is
bin-1.1 (UTF-8), and the method for distinguishing binary from current bin-1.1 (UTF-8), and the method for distinguishing binary from current
names is bin-2.4 (mark binary with EDNS0). The transition period is not names is bin-2.4 (mark binary with EDNS0). The transition period is not
specified. specified.
2. Functional Description 2. Functional Description
DNS queries and responses containing IDNE labels have the following DNS queries and responses containing IDNE labels have the following
properties: properties:
skipping to change at line 101 skipping to change at line 107
0 1 2 0 1 2
bits 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 . . . bits 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 . . .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//+-+-+-+-+-+-+
|0 1| ELT | Size | IDN label ... | |0 1| ELT | Size | IDN label ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+-+-+-+-+
ELT: The six-bit extended label type to be assigned by the IANA for an ELT: The six-bit extended label type to be assigned by the IANA for an
IDN label. In this document, the value 0b000010 is used, although that IDN label. In this document, the value 0b000010 is used, although that
might be changed by IANA. might be changed by IANA.
Size: Size (in octets) of the IDN label following. Size: Size (in octets) of the IDN label following. This MUST NOT
be zero.
IDN label: Label, encoded in UTF-8 [RFC2279]. Note that this label might IDN label: Label, encoded in UTF-8 [RFC2279]. Note that this label might
contain all ASCII characters, and thus can be used for host name labels contain all ASCII characters, and thus can be used for host name labels
that are legal in [STD13]. that are legal in [STD13].
IDNE labels can be mixed with STD13 labels in a domain name. IDNE labels can be mixed with STD13 labels in a domain name.
The compression scheme in section 4.1.4 of [STD13] is supported as is. The compression scheme in section 4.1.4 of [STD13] is supported as is.
Pointers can refer to either IDN labels or non-IDN labels. Pointers can refer to either IDN labels or non-IDN labels.
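The label layout above can be sketched in Python as follows. This is a minimal illustration, not part of the draft: the function name encode_idne_label is hypothetical, the input is assumed to have already been processed per [NAMEPREP], and the upper bound of 255 octets is an assumption derived from the one-octet Size field.

```python
# ELT value used in this document; IANA might assign a different one.
IDNE_ELT = 0b000010


def encode_idne_label(label: str) -> bytes:
    """Encode one label as: 0b01 + six-bit ELT, Size octet, UTF-8 octets.

    Assumes `label` has already been prepared per [NAMEPREP].
    """
    data = label.encode("utf-8")
    # Size MUST NOT be zero; 255 is the most a single octet can express
    # (assumed upper bound, not stated in the text above).
    if not 1 <= len(data) <= 255:
        raise ValueError("IDN label must be 1 to 255 octets in UTF-8")
    first_octet = (0b01 << 6) | IDNE_ELT
    return bytes([first_octet, len(data)]) + data


# "m" followed by LATIN SMALL LETTER E WITH ACUTE (U+00E9)
print(encode_idne_label("m\u00e9").hex())
```

An all-ASCII label goes through the same path unchanged, since ASCII is a subset of UTF-8.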
3.1 Examples 3.1 Examples
3.1.1 Basic example 3.1.1 Basic example
The following example shows the label me.com where the "e" in "me" is The following example shows the label me.com where the "e" in "me" is
replaced by a <LATIN CAPITAL LETTER E WITH ACUTE>, which has the replaced by a <LATIN CAPITAL LETTER E WITH ACUTE>, which is U+00C9. The
codepoint 0x00C9. The decomposition and downcasing specified in decomposition and downcasing specified in [NAMEPREP] changes the second
[NAMEPREP] produces the string <LATIN SMALL LETTER E><COMBINING ACUTE character to <LATIN SMALL LETTER E WITH ACUTE>, U+00E9. This string is
ACCENT>, which is 0x00650301. This is then transformed using then transformed using UTF-8 [RFC2279] to 0x6DC3A9.
UTF-8[RFC2279] to: 0x65CC81.
Ignoring the other fields of the message, the domain name portion of the Ignoring the other fields of the message, the domain name portion of the
datagram could look like: datagram could look like:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
20 | 0 1 0 0 0 0 1 0| 0 0 0 0 0 1 0 1| 20 | 0 1 0 0 0 0 1 0| 0 0 0 0 0 0 1 1|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
22 | 0x6D (m) | 0x65 (e) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
24 | 0xCC ('(1)) | 0x81 ('(2)) | 22 | 0x6D (m) | 0xC3 (e'(1)) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
26 | 3 | 0x63 (c) | 24 | 0xA9 (e'(2)) | 3 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
28 | 0x6F (o) | 0x6D (m) | 26 | 0x63 (c) | 0x6F (o) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
30 | 0x00 | | 28 | 0x6D (m) | 0x00 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Octet 20 means EDNS extended label type (0b01) using the IDN label Octet 20 means EDNS extended label type (0b01) using the IDN label
type (0b000010). type (0b000010)
Octet 21 means size of label is 4 octets following. Octet 21 means size of label is 3 octets following
Octet 22-24 are the "m*" label (where the "*" is Octet 22-24 are the "m*" label encoded in UTF-8
<LATIN SMALL LETTER E><COMBINING ACUTE ACCENT>) Octet 25-28 are "com" encoded as a STD13 label
Octet 26-29 are "com" encoded as a STD13 label Octet 29 is the root domain
Octet 30 is the root domain
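The octet walk-through in the -01 column can be checked with a small decoder. This is a sketch under stated assumptions: decode_name is a hypothetical helper, it handles only uncompressed names, and it treats 0b01000010 as the IDNE label marker because that is the ELT value this document uses.

```python
# 0b01 (extended label type) followed by the ELT 0b000010 used in this document.
IDNE_FIRST_OCTET = 0b01000010


def decode_name(wire: bytes, offset: int = 0) -> list:
    """Split an uncompressed domain name into its labels."""
    labels = []
    while True:
        first = wire[offset]
        if first == 0:
            # Zero-length STD13 label: the root domain ends the name.
            return labels
        if first == IDNE_FIRST_OCTET:
            size = wire[offset + 1]
            if size == 0:
                raise ValueError("IDNE label size must not be zero")
            start = offset + 2
            labels.append(wire[start:start + size].decode("utf-8"))
            offset = start + size
        elif first >> 6 == 0b00:
            # Ordinary STD13 label: the first octet is the length.
            start = offset + 1
            labels.append(wire[start:start + first].decode("ascii"))
            offset = start + first
        else:
            raise ValueError("unsupported label type or compression pointer")


# The ten octets shown at offsets 20 through 29 in the -01 example.
example = bytes.fromhex("42036dc3a903636f6d00")
print(decode_name(example))
```

The decoder returns two labels for the example: the three-octet IDNE label and the STD13 label "com".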
3.1.2 Example with compression 3.1.2 Example with compression
Using the previous labels, one datagram might contain "www.m*.com" and Using the previous labels, one datagram might contain "www.m*.com" and
"m*.com" (where the "*" is <LATIN SMALL LETTER E><COMBINING ACUTE "m*.com" (where the "*" is <LATIN CAPITAL LETTER E WITH ACUTE>).
ACCENT>).
Ignoring the other fields of the message, the domain name portions of Ignoring the other fields of the message, the domain name portions of
the datagram could look like: the datagram could look like:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
20 | 0 1 0 0 0 0 1 0| 0 0 0 0 0 1 0 1| 20 | 0 1 0 0 0 0 1 0| 0 0 0 0 0 0 1 1|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
22 | 0x6D (m) | 0x65 (e) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
24 | 0xCC ('(1)) | 0x81 ('(2)) | 22 | 0x6D (m) | 0xC3 (e'(1)) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
26 | 3 | 0x63 (c) | 24 | 0xA9 (e'(2)) | 3 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
28 | 0x6F (o) | 0x6D (m) | 26 | 0x63 (c) | 0x6F (o) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
30 | 0x00 | | 28 | 0x6D (m) | 0x00 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
. . . . . .
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
40 | 3 | 0x77 (w) | 40 | 3 | 0x77 (w) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
42 | 0x77 (w) | 0x77 (w) | 42 | 0x77 (w) | 0x77 (w) |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
44 | 1 1| 20 | 44 | 1 1| 20 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
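The pointer at octets 44 and 45 follows the compression format of section 4.1.4 of [STD13]: two one bits followed by a 14-bit offset from the start of the message. A minimal sketch, in which compression_pointer is a hypothetical helper name:

```python
import struct


def compression_pointer(offset: int) -> bytes:
    """Build a two-octet STD13 compression pointer to `offset`."""
    if not 0 <= offset < 0x4000:
        raise ValueError("offset must fit in 14 bits")
    # The top two bits 0b11 mark a pointer; the rest is the offset.
    return struct.pack("!H", 0xC000 | offset)


# Octets 40 through 45 of the example: the STD13 label "www", then a
# pointer to octet 20, where the IDNE label begins.
www_name = bytes([3]) + b"www" + compression_pointer(20)
print(www_name.hex())
```

The pointer is built the same way whether its target is an IDN label or a non-IDN label.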
skipping to change at line 221 skipping to change at line 221
IPv6 is 1280 [RFC2460]). A sender MUST announce its capability in the IPv6 is 1280 [RFC2460]). A sender MUST announce its capability in the
OPT pseudo-RR described in section 4.3 of [RFC2671] by having the CLASS OPT pseudo-RR described in section 4.3 of [RFC2671] by having the CLASS
sender's UDP payload size be greater than or equal to 1220. sender's UDP payload size be greater than or equal to 1220.
6. Canonicalization, Prohibited Characters, and Case Folding 6. Canonicalization, Prohibited Characters, and Case Folding
The string in the label MUST be pre-processed as described in [NAMEPREP] The string in the label MUST be pre-processed as described in [NAMEPREP]
before the query or response is prepared. A query or response MUST NOT before the query or response is prepared. A query or response MUST NOT
contain a label that does not conform to [NAMEPREP]. contain a label that does not conform to [NAMEPREP].
DNS servers MUST check for prohibited chars in the labels. If any label
in a query is found, a NOTIMPL RCODE MUST be returned.
7. Versions of IDNE 7. Versions of IDNE
The IDN protocol version number MUST be included in the OPT RR RDATA of The IDN protocol version number MUST be included in the OPT RR RDATA of
EDNS (described in Section 4.4 of [RFC2671]). An OPTION-CODE will be EDNS (described in Section 4.4 of [RFC2671]). An OPTION-CODE will be
assigned by IANA for storing the IDNE protocol version number; this assigned by IANA for storing the IDNE protocol version number; this
document uses 0x0001 for the OPTION-CODE. The value (that document uses 0x0001 for the OPTION-CODE. The value (that
is, the OPTION-DATA) is the version number coded in 8 bits. is, the OPTION-DATA) is the version number coded in 8 bits.
All requesters MUST send this information as part of the OPT RR included All requesters MUST send this information as part of the OPT RR included
in the EDNS packet. in the EDNS packet.
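The version option can be sketched as an (OPTION-CODE, OPTION-LENGTH, OPTION-DATA) triple as laid out in Section 4.4 of [RFC2671]. This is an illustration only: encode_idne_version_option is a hypothetical name, 0x0001 is the provisional OPTION-CODE used in this document, and the version value passed in is a placeholder, since no specific version number is given here.

```python
import struct

# Provisional OPTION-CODE used in this document; IANA will assign the real one.
IDNE_VERSION_OPTION_CODE = 0x0001


def encode_idne_version_option(version: int) -> bytes:
    """Encode the IDNE version as one option in the OPT RR RDATA."""
    if not 0 <= version <= 0xFF:
        raise ValueError("version number must fit in 8 bits")
    # 16-bit OPTION-CODE, 16-bit OPTION-LENGTH (1 octet of data),
    # then the 8-bit version number as OPTION-DATA.
    return struct.pack("!HHB", IDNE_VERSION_OPTION_CODE, 1, version)


# Placeholder version value, for illustration only.
print(encode_idne_version_option(1).hex())
```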
skipping to change at line 319 skipping to change at line 316
legally represented in the ACE but not in IDNE. legally represented in the ACE but not in IDNE.
The IETF should periodically evaluate the benefits and problems The IETF should periodically evaluate the benefits and problems
associated with having three different formats for names (STD13, IDNE, associated with having three different formats for names (STD13, IDNE,
and ACE). If at some point it is decided that the problems outweigh the and ACE). If at some point it is decided that the problems outweigh the
benefits, the IETF can state a time when one or more of the services benefits, the IETF can state a time when one or more of the services
should not be used on the Internet. should not be used on the Internet.
10. Root Server Considerations 10. Root Server Considerations
Because this specification uses ENDS, root servers should be prepared to Because this specification uses EDNS, root servers should be prepared to
receive EDNS requests. This specification handles IDN top-level domains receive EDNS requests. This specification handles IDN top-level domains
in exactly the same fashion as it does every other domain. in exactly the same fashion as it does every other domain.
Considerations about IDN top-level domains are outside of this work, but Considerations about IDN top-level domains are outside of this work, but
the first IDN top-level domains would require all root servers to be the first IDN top-level domains would require all root servers to be
ready for IDNE requests. ready for IDNE requests.
11. IANA Considerations 11. IANA Considerations
[[ TBD. This section will have two parts. The first will request an EDNS [[ TBD. This section will have two parts. The first will request an EDNS
option code. The second will specify how IDNE version numbers are option code. The second will specify how IDNE version numbers are
allocated (namely, standards-track RFC only). ]] allocated (namely, standards-track RFC only). ]]
12. Security Considerations 12. Security Considerations
Because IDNE uses ENDS, it inherits the same security considerations as Because IDNE uses EDNS, it inherits the same security considerations as
EDNS. EDNS.
Much of the security of the Internet relies on the DNS. Thus, any change
to the characteristics of the DNS can change the security of much of the
Internet.
Host names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized host name.
Because this document normatively refers to [NAMEPREP] and [RFC2671],
it includes the security considerations from those documents as well.
13. References 13. References
[IDNComp] Paul Hoffman, "Comparison of Internationalized Domain Name [IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare. Proposals", draft-ietf-idn-compare.
[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
1: Architecture and Basic Multilingual Plane. Five amendments and a
technical corrigendum have been published up to now. UTF-16 is described
in Annex Q, published as Amendment 1. 17 other amendments are currently
at various stages of standardization. [[[ THIS REFERENCE NEEDS TO BE
UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]
[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of [NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep. Internationalized Host Names", draft-ietf-idn-nameprep.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119. Requirement Levels", March 1997, RFC 2119.
[RFC2279] Francois Yergeau, "UTF-8, a transformation format of ISO [RFC2279] Francois Yergeau, "UTF-8, a transformation format of ISO
10646", January 1998, RFC 2279. 10646", January 1998, RFC 2279.
[RFC2460] Steve Deering & Bob Hinden, "Internet Protocol, Version 6 (IPv6) [RFC2460] Steve Deering & Bob Hinden, "Internet Protocol, Version 6 (IPv6)
Specification", December 1998, RFC 2460. Specification", December 1998, RFC 2460.
[RFC2671] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)", August [RFC2671] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)", August
1999, RFC 2671. 1999, RFC 2671.
[STD13] Paul Mockapetris, "Domain names - implementation and [STD13] Paul Mockapetris, "Domain names - implementation and
specification", November 1987, STD 13 (RFC 1035). specification", November 1987, STD 13 (RFC 1035).
A. Authors' Addresses [UNICODE3] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.
A. Acknowledgements
This document is the result of the thinking of many people. The following
people made significant comments on the early drafts:
Andre Cormier
Andrew Draper
Bill Sommerfeld
Francois Yergeau
B. Changes from -00 to -01
1.1: Added reference to Unicode names.
3: Clarified that a size of zero is not allowed.
3.1.1 and 3.1.2: Fixed two very serious errors in the examples.
6: Removed second paragraph, which was redundant with 7.3.
12: Beefed up the security considerations.
13: Added [ISO10646] and [UNICODE3].
Added Appendix A.
Added Appendix B.
C. Authors' Addresses
Marc Blanchet Marc Blanchet
Viagenie Viagenie
2875 boul. Laurier, bureau 300 2875 boul. Laurier, bureau 300
Sainte-Foy, QC G1V 2M2 Canada Sainte-Foy, QC G1V 2M2 Canada
Marc.Blanchet@viagenie.qc.ca Marc.Blanchet@viagenie.qc.ca
Paul Hoffman Paul Hoffman
Internet Mail Consortium and VPN Consortium Internet Mail Consortium and VPN Consortium
127 Segre Place 127 Segre Place
 End of changes. 25 change blocks. 
39 lines changed or deleted 88 lines changed or added
