draft-ietf-idn-idna-02.txt   draft-ietf-idn-idna-03.txt 
Internet Draft Patrik Faltstrom Internet Draft Patrik Faltstrom
draft-ietf-idn-idna-02.txt Cisco draft-ietf-idn-idna-03.txt Cisco
June 16, 2001 Paul Hoffman July 20, 2001 Paul Hoffman
Expires in six months IMC & VPNC Expires in six months IMC & VPNC
Internationalizing Host Names In Applications (IDNA) Internationalizing Host Names In Applications (IDNA)
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with all This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Internet-Drafts are working documents of the Internet Engineering Task
skipping to change at line 148 skipping to change at line 148
Applications can accept host names using any character set or sets Applications can accept host names using any character set or sets
desired by the application developer, and can display host names in any desired by the application developer, and can display host names in any
charset. That is, this protocol does not affect the interface between charset. That is, this protocol does not affect the interface between
users and applications. users and applications.
An IDNA-aware application can accept and display internationalized host An IDNA-aware application can accept and display internationalized host
names in two formats: the internationalized character set(s) supported names in two formats: the internationalized character set(s) supported
by the application, and in an ACE. Applications MAY allow ACE input and by the application, and in an ACE. Applications MAY allow ACE input and
output, but are not encouraged to do so except as an interface for output, but are not encouraged to do so except as an interface for
advanced users, possibly for debugging. ACE encoding is opaque and ugly, special purposes, possibly for debugging. ACE encoding is opaque and
and should thus only be exposed to users who absolutely need it. The ugly, and should thus only be exposed to users who absolutely need it.
optional use, especially during a transition period, of ACE encodings in The optional use, especially during a transition period, of ACE
the user interface is described in section 3. Since ACE can be rendered encodings in the user interface is described in section 3. Because name
either as the encoded ASCII glyphs or the proper decoded character parts encoded with ACE can be rendered either as the encoded ASCII
glyphs, the rendering engine for an application SHOULD have an option characters or the proper decoded characters, the application MAY have an
for the user to select the preferred display; if it does, rendering the option for the user to select the preferred method of display; if it
ACE SHOULD NOT be the default. does, rendering the ACE SHOULD NOT be the default.
Host names are often stored and transported in many places. For example, Host names are often stored and transported in many places. For example,
they are part of documents such as mail messages and web pages. They are they are part of documents such as mail messages and web pages. They are
transported in the many parts of many protocols, such as both the transported in the many parts of many protocols, such as both the
control commands and the RFC 2822 body parts of SMTP, and the headers control commands and the RFC 2822 body parts of SMTP, and the headers
and the body content in HTTP. and the body content in HTTP.
In protocols and document formats that define how to handle In protocols and document formats that define how to handle
specification or negotiation of charsets, IDN host name parts can be specification or negotiation of charsets, IDN host name parts can be
given in any charset allowed by the protocol or document format. If a encoded in any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, IDN host name parts protocol or document format only allows one charset, IDN host name parts
must be given in that charset. must be given in that charset. In any place where a protocol or document
format allows transmission of the characters in IDN host name parts, IDN
host name parts SHOULD be transmitted using whatever character encoding
and escape mechanism that the protocol or document format uses at that
place.
All protocols that have host names as protocol elements already have the All protocols that have host names as protocol elements already have the
capacity for handling host names in the ASCII charset. Thus, IDN host capacity for handling host names in the ASCII charset. Thus, IDN host
name parts can be specified in those protocols in the ACE charset, which name parts can be specified in those protocols in the ACE charset, which
is a superset of the ASCII charset that uses the same set of octets. is a superset of the ASCII charset that uses the same set of octets.
2.1.2 Applications and resolvers 2.1.2 Applications and resolvers
Applications communicate with resolver libraries through a programming Applications communicate with resolver libraries through a programming
interface (API). Typically, the IETF does not standardize APIs, although interface (API). Typically, the IETF does not standardize APIs, although
skipping to change at line 190 skipping to change at line 194
formats of the host names to the resolver library. formats of the host names to the resolver library.
Before converting the name parts into ACE, the application MUST prepare Before converting the name parts into ACE, the application MUST prepare
each name part as specified in [NAMEPREP]. The application MUST use ACE each name part as specified in [NAMEPREP]. The application MUST use ACE
for the name parts that are sent to the resolver, and will always get for the name parts that are sent to the resolver, and will always get
name parts encoded in ACE from the resolver. name parts encoded in ACE from the resolver.
IDNA-aware applications MUST be able to work with both IDNA-aware applications MUST be able to work with both
non-internationalized host name parts (those that conform to [STD13] and non-internationalized host name parts (those that conform to [STD13] and
[STD3]) and internationalized host name parts. An IDNA-aware application [STD3]) and internationalized host name parts. An IDNA-aware application
that is resolving a non-internationalized host name parts MUST NOT do that is resolving a non-internationalized host name part MUST NOT do
any preparation or conversion to ACE on any non-internationalized name any preparation or conversion to ACE on any non-internationalized name
part. part.
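
The rule above (leave non-internationalized name parts untouched, convert only the others) can be sketched as follows. This is a minimal illustration, not text from the draft. The helper names are hypothetical, the letter-digit-hyphen pattern is an assumed reading of the [STD13] and [STD3] host name syntax, and `to_ace` stands in for whatever nameprep-plus-ACE conversion an application actually uses.

```python
import re

# Assumed reading of the [STD13]/[STD3] host label syntax:
# letters, digits, and hyphens; no leading or trailing hyphen; 1 to 63 octets.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_plain_label(label):
    """Return True if the label is a non-internationalized host name part."""
    return _LDH_LABEL.fullmatch(label) is not None


def prepare_for_resolver(name, to_ace):
    """Convert only the internationalized parts of a dotted host name.

    `to_ace` is a hypothetical callable that applies nameprep and the
    ACE conversion to one name part and returns the resulting ASCII label.
    """
    parts = name.split(".")
    converted = []
    for part in parts:
        # Empty parts (for example from a trailing dot) and plain
        # letter-digit-hyphen parts are passed through unchanged.
        if part == "" or is_plain_label(part):
            converted.append(part)
        else:
            converted.append(to_ace(part))
    return ".".join(converted)
```

Passing the conversion in as a callable keeps the sketch independent of any particular ACE algorithm, which this excerpt does not fix.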
2.1.3 Resolvers and DNS servers 2.1.3 Resolvers and DNS servers
An operating system might have a set of libraries for converting host An operating system might have a set of libraries for converting host
names to nameprepped ACE. The input to such a library might be in one or names to nameprepped ACE. The input to such a library might be in one or
more charsets that are used in applications (UTF-8 and UTF-16 are likely more charsets that are used in applications (UTF-8 and UTF-16 are likely
candidates for almost any operating system, and script-specific charsets candidates for almost any operating system, and script-specific charsets
are likely for localized operating systems). The output would be either are likely for localized operating systems). The output would be either
skipping to change at line 227 skipping to change at line 231
from a gethostbyaddr or other such lookup SHOULD update as soon as from a gethostbyaddr or other such lookup SHOULD update as soon as
possible in order to prevent users from seeing the ACE. However, this is possible in order to prevent users from seeing the ACE. However, this is
not considered a big problem because so few applications show this type not considered a big problem because so few applications show this type
of resolution to users. of resolution to users.
If an application decodes an ACE name but cannot show all of the If an application decodes an ACE name but cannot show all of the
characters in the decoded name, such as if the name contains characters characters in the decoded name, such as if the name contains characters
that the output system cannot display, the application SHOULD show the that the output system cannot display, the application SHOULD show the
name in ACE format instead of displaying the name with the replacement name in ACE format instead of displaying the name with the replacement
character (U+FFFD). This is to make it easier for the user to transfer character (U+FFFD). This is to make it easier for the user to transfer
the name correctly to other programs using copy-and-paste techniques. the name correctly to other programs. Programs that by default show the
Programs that by default show the ACE form when they cannot show all the ACE form when they cannot show all the characters in a name part SHOULD
characters in a name part SHOULD also have a mechanism to show the name also have a mechanism to show the name with as many characters as
with as many characters as possible and replacement characters in the possible and replacement characters in the positions where characters
positions where characters cannot be displayed. cannot be displayed. In many cases, the application doesn't know exactly
what the underlying rendering engine can or cannot display.
In addition to the condition above, if an application decodes an ACE
name but finds that the decoded name was not properly prepared according
to [NAMEPREP] (for example, if it has illegal characters in it), the
application SHOULD show the name in ACE format and SHOULD NOT display
the name in its decoded form. This is to avoid security issues described
in [NAMEPREP].
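
The two fallback conditions in this section can be sketched as a single display decision. This is only an illustration under stated assumptions: `decode`, `can_display`, and `is_prepared` are hypothetical callables supplied by the application, and, as the text notes, an application may not be able to implement `can_display` precisely.

```python
def choose_display(ace_label, decode, can_display, is_prepared):
    """Pick the form of one name part to show to the user.

    decode      -- hypothetical: ACE label -> decoded Unicode string
    can_display -- hypothetical: single character -> bool
    is_prepared -- hypothetical: decoded string -> bool, True when the
                   string is properly prepared according to [NAMEPREP]
    """
    decoded = decode(ace_label)

    # A name that was not properly prepared is shown in ACE form only.
    if not is_prepared(decoded):
        return ace_label

    # If any character cannot be rendered, fall back to the ACE form
    # rather than showing replacement characters.
    if not all(can_display(ch) for ch in decoded):
        return ace_label

    return decoded
```

The nameprep check comes first because it carries SHOULD NOT weight for the decoded form, whereas the rendering fallback is a usability default that an application may let the user override.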
2.1.5 Automatic detection of ACE 2.1.5 Automatic detection of ACE
An application which receives a host name SHOULD verify whether or not An application which receives a host name SHOULD verify whether or not
the host name is in ACE. This is possible by verifying the prefix in the host name is in ACE. This is possible by verifying the prefix in
each of the labels, and seeing whether or not the label is in ACE. This each of the labels, and seeing whether or not the label is in ACE. This
MUST be done regardless of whether or not the communication channel used MUST be done regardless of whether or not the communication channel used
(such as keyboard input, cut and paste, application protocol, (such as keyboard input, cut and paste, application protocol,
application payload, and so on) has negotiated ACE. application payload, and so on) is encoded with ACE.
The reason for this requirement is that many applications are not The reason for this requirement is that many applications are not
ACE-aware. Applications that are not ACE-aware will send host names in ACE-aware. Applications that are not ACE-aware will send host names in
ACE but mark the charset as being US-ASCII or some other charset which ACE but mark the charset as being US-ASCII or some other charset which
has the characters that are valid in [STD13] as a subset. has the characters that are valid in [STD13] as a subset.
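
Per-label prefix detection of the kind described here might look like the sketch below. The prefix value is an assumption: this excerpt does not name one, and `xn--` is the prefix used by the later published IDNA specification (RFC 3490). The function names are hypothetical.

```python
# Assumed prefix; not specified in this excerpt of the draft.
ACE_PREFIX = "xn--"


def ace_labels(host_name, prefix=ACE_PREFIX):
    """Return one bool per label: True where the label carries the ACE prefix.

    The comparison is case-insensitive because host names in ASCII are
    matched without regard to case.
    """
    return [label.lower().startswith(prefix) for label in host_name.split(".")]


def contains_ace(host_name, prefix=ACE_PREFIX):
    """Return True if at least one label of the host name is in ACE."""
    return any(ace_labels(host_name, prefix))
```

For example, `contains_ace("www.example.com")` is false, while a name with a label such as `xn--bcher-kva` is flagged regardless of the charset label attached by the sending application.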
2.1.6 Bidirectional text 2.1.6 Bidirectional text
In IDNA, bidirectional text is entered and displayed exactly as it is In IDNA, text storage and display follows the rules in the Unicode standard
specified in ISO/IEC 10646. Both ISO/IEC 10646 and the Unicode standard [Unicode3.1]. In particular, all Unicode text is stored in logical order;
have extensive discussion of how to deal with bidirectional text. Any the Unicode standard has an extensive discussion of how to reorder
input mechanism and display mechanism that handles characters from glyphs for display when dealing with bidirectional text such as Arabic or
bidirectional scripts should already conform to those specifications. Hebrew. See [UAX9] for more information.
Note that the formatting characters that manually change the direction
of display are prohibited by nameprep, thus making the task for input
and display mechanisms easier.
3. Name Server Considerations 3. Name Server Considerations
It is imperative that there be only one encoding for a particular host It is imperative that there be only one encoding for a particular host
name. ACE is an encoding for host name parts that use characters outside name. ACE is an encoding for host name parts that use characters outside
those allowed for host names [STD13]. Thus, a primary master name server those allowed for host names [STD13]. Thus, a primary master name server
MUST NOT contain an ACE-encoded name that decodes to a host name that is MUST NOT contain an ACE-encoded name that decodes to a host name that is
allowed in [STD13] and [STD3]. allowed in [STD13] and [STD3].
Name servers MUST NOT have any records with host names that contain Name servers MUST NOT have any records with host names that contain
skipping to change at line 325 skipping to change at line 334
Requirement Levels", March 1997, RFC 2119. Requirement Levels", March 1997, RFC 2119.
[STD3] Bob Braden, "Requirements for Internet Hosts -- Communication [STD3] Bob Braden, "Requirements for Internet Hosts -- Communication
Layers" (RFC 1122) and "Requirements for Internet Hosts -- Application Layers" (RFC 1122) and "Requirements for Internet Hosts -- Application
and Support" (RFC 1123), STD 3, October 1989. and Support" (RFC 1123), STD 3, October 1989.
[STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC [STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC
1034) and "Domain names - implementation and specification" (RFC 1035), 1034) and "Domain names - implementation and specification" (RFC 1035),
STD 13, November 1987. STD 13, November 1987.
B. Changes from the -01 draft [UAX9] Unicode Standard Annex #9, The Bidirectional Algorithm.
http://www.unicode.org/unicode/reports/tr9/
1.1: Revised whole section to deal with more proposals.
2.1: Clarified the ASCII art [Unicode3.1] The Unicode Standard, Version 3.1.0: The Unicode
Consortium. The Unicode Standard, Version 3.0. Reading, MA,
Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5, as amended
by: Unicode Standard Annex #27: Unicode 3.1
<http://www.unicode.org/unicode/reports/tr27/tr27-4.html>.
2.1.1: Changed the section title. Added the last three paragraphs. B. Changes from the -02 draft
2.1.4: Added the second paragraph. Editorial changes throughout
2.1.6: Added this section. 2.1.1: Major changes to the second paragraph. Added major text to fourth
paragraph.
2.1.5: Added this section. 2.1.4: Added to the end of the second paragraph. Added the third
paragraph.
3: Added note in the last sentence of second paragraph. 2.1.6: Complete change.
5: Added second and third paragraphs. 6: Added [Unicode3.1] and [UAX9].
C. Authors' Addresses C. Authors' Addresses
Patrik Faltstrom Patrik Faltstrom
Cisco Systems Cisco Systems
Arstaangsvagen 31 J Arstaangsvagen 31 J
S-117 43 Stockholm S-117 43 Stockholm Sweden
Sweden
paf@cisco.com paf@cisco.com
Paul Hoffman Paul Hoffman
Internet Mail Consortium and VPN Consortium Internet Mail Consortium and VPN Consortium
127 Segre Place 127 Segre Place
Santa Cruz, CA 95060 USA Santa Cruz, CA 95060 USA
phoffman@imc.org phoffman@imc.org
 End of changes. 17 change blocks. 
39 lines changed or deleted 52 lines changed or added
