draft-ietf-idn-punycode-01.txt   draft-ietf-idn-punycode-02.txt 
INTERNET-DRAFT Adam M. Costello INTERNET-DRAFT Adam M. Costello
draft-ietf-idn-punycode-01.txt 2002-Feb-24 draft-ietf-idn-punycode-02.txt 2002-May-23
Expires 2002-Aug-24 Expires 2002-Nov-23
Punycode: An encoding of Unicode for use with IDNA Punycode: An encoding of Unicode for use with IDNA
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note Task Force (IETF), its areas, and its working groups. Note
skipping to change at line 35 skipping to change at line 35
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Distribution of this document is unlimited. Distribution of this document is unlimited.
Abstract Abstract
Punycode is a simple and efficient transfer encoding syntax designed Punycode is a simple and efficient transfer encoding syntax designed
for use with Internationalized Domain Names in Applications [IDNA]. for use with Internationalized Domain Names in Applications [IDNA].
It uniquely and reversibly transforms a Unicode string [UNICODE] It uniquely and reversibly transforms a Unicode string [UNICODE]
into an ASCII string. ASCII characters in the Unicode string are into an ASCII string [ASCII]. ASCII characters in the Unicode
represented literally, and non-ASCII characters are represented by string are represented literally, and non-ASCII characters are
ASCII characters that are allowed in host name labels (letters, represented by ASCII characters that are allowed in host name labels
digits, and hyphens). Bootstring is a general algorithm that allows (letters, digits, and hyphens). This document defines a general
a string of basic code points to uniquely represent any string of algorithm called Bootstring that allows a string of basic code
code points drawn from a larger set. Punycode is an instance of points to uniquely represent any string of code points drawn from
Bootstring that uses particular parameter values appropriate for a larger set. Punycode is an instance of Bootstring that uses
IDNA. This document specifies Bootstring and the parameter values particular parameter values specified by this document, appropriate
for Punycode. for IDNA.
Contents Contents
1. Introduction 1. Introduction
1.1 Features 1.1 Features
1.2 Interaction of protocol parts 1.2 Interaction of protocol parts
2. Terminology 2. Terminology
3. Bootstring description 3. Bootstring description
3.1 Basic code point segregation 3.1 Basic code point segregation
3.2 Insertion unsort coding 3.2 Insertion unsort coding
3.3 Generalized variable-length integers 3.3 Generalized variable-length integers
3.4 Bias adaptation 3.4 Bias adaptation
4. Bootstring parameters 4. Bootstring parameters
5. Parameter values for Punycode 5. Parameter values for Punycode
6. Bootstring algorithms 6. Bootstring algorithms
6.1 Bias adaptation function 6.1 Bias adaptation function
6.2 Decoding procedure 6.2 Decoding procedure
6.3 Encoding procedure 6.3 Encoding procedure
6.4 Alternative methods for handling overflow 6.4 Overflow handling
7. Punycode examples 7. Punycode examples
7.1 Sample strings 7.1 Sample strings
7.2 Decoding traces 7.2 Decoding traces
7.3 Encoding traces 7.3 Encoding traces
8. Security considerations 8. Security considerations
9. References 9. References (non-normative)
A. Author contact information A. Author contact information
B. Mixed-case annotation B. Mixed-case annotation
C. Disclaimer and license C. Disclaimer and license
D. Punycode sample implementation D. Punycode sample implementation
1. Introduction 1. Introduction
[IDNA] describes an architecture for supporting internationalized [IDNA] describes an architecture for supporting internationalized
domain names. Labels containing non-ASCII characters can be domain names. Labels containing non-ASCII characters can be
represented by ACE labels, which begin with a special prefix and represented by ACE labels, which begin with a special ACE prefix and
contain only ASCII characters. The remainder of the label after the contain only ASCII characters. The remainder of the label after the
prefix is a Punycode encoding of a Unicode string satisfying certain prefix is a Punycode encoding of a Unicode string satisfying certain
constraints. For the details of the prefix and constraints, see constraints. For the details of the prefix and constraints, see
[IDNA] and [NAMEPREP]. [IDNA] and [NAMEPREP].
1.1 Features 1.1 Features
Bootstring has been designed to have the following features: Bootstring has been designed to have the following features:
* Completeness: Every extended string (sequence of arbitrary code * Completeness: Every extended string (sequence of arbitrary code
skipping to change at line 112 skipping to change at line 112
length of a domain label to 63 characters. length of a domain label to 63 characters.
* Simplicity: The encoding and decoding algorithms are reasonably * Simplicity: The encoding and decoding algorithms are reasonably
simple to implement. The goals of efficiency and simplicity are simple to implement. The goals of efficiency and simplicity are
at odds; Bootstring aims at a good balance between them. at odds; Bootstring aims at a good balance between them.
* Readability: Basic code points appearing in the extended string * Readability: Basic code points appearing in the extended string
are represented as themselves in the basic string (although the are represented as themselves in the basic string (although the
main purpose is to improve efficiency, not readability). main purpose is to improve efficiency, not readability).
Punycode can also support an additional feature described in Punycode can also support an additional feature that is not used
appendix B "Mixed-case annotation". by the ToASCII and ToUnicode operations of [IDNA]. When extended
strings are case-folded prior to encoding, the basic string can
use mixed case to tell how to convert the folded string into a
mixed-case string. See appendix B "Mixed-case annotation".
1.2 Interaction of protocol parts 1.2 Interaction of protocol parts
Punycode is used by the IDNA protocol [IDNA] for converting domain Punycode is used by the IDNA protocol [IDNA] for converting domain
labels into ASCII; it is not designed for any other purpose. It is labels into ASCII; it is not designed for any other purpose. It is
explicitly not designed for processing arbitrary free text. explicitly not designed for processing arbitrary free text.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
skipping to change at line 383 skipping to change at line 386
skew = 38 skew = 38
damp = 700 damp = 700
initial_bias = 72 initial_bias = 72
initial_n = 128 = 0x80 initial_n = 128 = 0x80
Although the only restriction Punycode imposes on the input integers Although the only restriction Punycode imposes on the input integers
is that they be nonnegative, these parameters are especially is that they be nonnegative, these parameters are especially
designed to work well with Unicode [UNICODE] code points, which designed to work well with Unicode [UNICODE] code points, which
are integers in the range 0..10FFFF (but not D800..DFFF, which are are integers in the range 0..10FFFF (but not D800..DFFF, which are
reserved for use by the UTF-16 encoding of Unicode). The basic code reserved for use by the UTF-16 encoding of Unicode). The basic code
points are the ASCII code points (0..7F), of which U+002D (-) is the points are the ASCII [ASCII] code points (0..7F), of which U+002D
delimiter, and some of the others have digit-values as follows: (-) is the delimiter, and some of the others have digit-values as
follows:
code points digit-values code points digit-values
------------ ---------------------- ------------ ----------------------
41..5A (A-Z) = 0 to 25, respectively 41..5A (A-Z) = 0 to 25, respectively
61..7A (a-z) = 0 to 25, respectively 61..7A (a-z) = 0 to 25, respectively
30..39 (0-9) = 26 to 35, respectively 30..39 (0-9) = 26 to 35, respectively
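The digit-value table above can be sketched in C roughly as follows. This is an illustrative sketch only, not the code from appendix D: the names decode_digit, encode_digit, and NOT_A_DIGIT are assumptions made for this example, as is the choice to emit lowercase letters when encoding.

```c
/* Hypothetical sentinel for "this code point has no digit-value". */
#define NOT_A_DIGIT 36u

/* Map a basic code point to its digit-value (0..35), following the
   table: A-Z and a-z are 0..25, 0-9 are 26..35.  Hex constants are
   used so the sketch does not depend on the execution character set. */
static unsigned int decode_digit(unsigned int cp)
{
    if (cp >= 0x41 && cp <= 0x5A) return cp - 0x41;        /* A-Z */
    if (cp >= 0x61 && cp <= 0x7A) return cp - 0x61;        /* a-z */
    if (cp >= 0x30 && cp <= 0x39) return cp - 0x30 + 26;   /* 0-9 */
    return NOT_A_DIGIT;
}

/* Map a digit-value (0..35) back to a basic code point.  Lowercase
   letters are chosen here by assumption; the caller must ensure
   d < 36. */
static unsigned int encode_digit(unsigned int d)
{
    return d < 26 ? 0x61 + d : 0x30 + (d - 26);
}
```

Under these assumptions, decode_digit(encode_digit(d)) returns d for every d in 0..35, and the delimiter U+002D maps to NOT_A_DIGIT because it has no digit-value.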
Using hyphen-minus as the delimiter implies that the encoded string Using hyphen-minus as the delimiter implies that the encoded string
can end with a hyphen-minus only if the Unicode string consists can end with a hyphen-minus only if the Unicode string consists
entirely of basic code points, but IDNA forbids such strings from entirely of basic code points, but IDNA forbids such strings from
skipping to change at line 498 skipping to change at line 502
happen because of the way bias is computed and because of the happen because of the way bias is computed and because of the
constraints on the parameters. constraints on the parameters.
Because the decoder state can only advance monotonically, and there Because the decoder state can only advance monotonically, and there
is only one representation of any delta, there is therefore only is only one representation of any delta, there is therefore only
one encoded string that can represent a given sequence of integers. one encoded string that can represent a given sequence of integers.
The only error conditions are invalid code points, unexpected The only error conditions are invalid code points, unexpected
end-of-input, overflow, and basic code points encoded using deltas end-of-input, overflow, and basic code points encoded using deltas
instead of appearing literally. If the decoder fails on these instead of appearing literally. If the decoder fails on these
errors as shown above, then it cannot produce the same output for errors as shown above, then it cannot produce the same output for
two distinct inputs, and hence it need not re-encode its output to two distinct inputs. Without this property it would have been
verify that it matches the input. necessary to re-encode the output and verify that it matches the
input in order to guarantee the uniqueness of the encoding.
If the programming language does not provide overflow detection,
the following technique can be used. Suppose A, B, and C are
representable nonnegative integers and C is nonzero. Then A + B
overflows if and only if B > maxint - A, and A + (B * C) overflows
if and only if B > (maxint - A) div C, where maxint is the greatest
integer for which maxint + 1 cannot be represented. Refer to
appendix D "Punycode sample implementation" for demonstrations of
this technique in the C language. See also section 6.4 "Alternative
methods for handling overflow".
6.3 Encoding procedure 6.3 Encoding procedure
let n = initial_n let n = initial_n
let delta = 0 let delta = 0
let bias = initial_bias let bias = initial_bias
let h = b = the number of basic code points in the input let h = b = the number of basic code points in the input
copy them to the output in order, followed by a delimiter if b > 0 copy them to the output in order, followed by a delimiter if b > 0
{if the input contains a non-basic code point < n then fail} {if the input contains a non-basic code point < n then fail}
while h < length(input) do begin while h < length(input) do begin
skipping to change at line 561 skipping to change at line 556
than initial_n. than initial_n.
In the assignment of t, where t is clamped to the range tmin through In the assignment of t, where t is clamped to the range tmin through
tmax, "+ tmin" can always be omitted. This makes the clamping tmax, "+ tmin" can always be omitted. This makes the clamping
calculation incorrect when bias < k < bias + tmin, but that cannot calculation incorrect when bias < k < bias + tmin, but that cannot
happen because of the way bias is computed and because of the happen because of the way bias is computed and because of the
constraints on the parameters. constraints on the parameters.
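The simplified clamping described above, with "+ tmin" omitted, might look like the following C sketch. The function name threshold is an assumption made for illustration, and tmin and tmax are passed in as arguments rather than hard-coded, so the sketch does not depend on any particular parameter values.

```c
/* Threshold t for the digit at position k, clamped to tmin..tmax,
   written in the simplified form that omits "+ tmin" from the first
   comparison.  As the text notes, this form returns a value below
   tmin when bias < k < bias + tmin, so it relies on that case never
   arising for the parameters in use. */
static unsigned long threshold(unsigned long k, unsigned long bias,
                               unsigned long tmin, unsigned long tmax)
{
    if (k <= bias) return tmin;          /* at or below the bias */
    if (k >= bias + tmax) return tmax;   /* far above the bias */
    return k - bias;                     /* in between */
}
```

A caller that cannot rely on the parameter constraints could compare against bias + tmin in the first test instead, at the cost of one extra addition per digit.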
The checks for overflow are necessary to avoid producing invalid The checks for overflow are necessary to avoid producing invalid
output when the input contains very large values or is very long. output when the input contains very large values or is very long.
Wider integer variables can handle more extreme inputs. For IDNA,
26-bit unsigned integers are sufficient, because any string that
needed a 27-bit delta would have to exceed either the code point
limit (0..10FFFF) or the label length limit (63 characters).
The increment of delta at the bottom of the outer loop cannot The increment of delta at the bottom of the outer loop cannot
overflow because delta < length(input) before the increment, and overflow because delta < length(input) before the increment, and
length(input) is already assumed to be representable. The increment length(input) is already assumed to be representable. The increment
of n could overflow, but only if h == length(input), in which case of n could overflow, but only if h == length(input), in which case
the procedure is finished anyway. the procedure is finished anyway.
6.4 Alternative methods for handling overflow 6.4 Overflow handling
The encoding and decoding algorithms handle overflow by detecting For IDNA, 26-bit unsigned integers are sufficient to handle all
it whenever it happens. Another approach is to enforce limits on valid IDNA labels without overflow, because any string that
the inputs that prevent overflow from happening. For example, if needed a 27-bit delta would have to exceed either the code point
the encoder were to verify that no input code points exceed M and limit (0..10FFFF) or the label length limit (63 characters).
that the input length does not exceed L, then no delta could ever However, overflow handling is necessary because the inputs are not
exceed (M - initial_n) * (L + 1), and hence no overflow could occur necessarily valid IDNA labels.
if integer variables were capable of representing values that large.
This prevention approach would impose more restrictions on the input If the programming language does not provide overflow detection,
than the detection approach does, but might be considered simpler in the following technique can be used. Suppose A, B, and C are
some programming languages. representable nonnegative integers and C is nonzero. Then A + B
overflows if and only if B > maxint - A, and A + (B * C) overflows
if and only if B > (maxint - A) div C, where maxint is the greatest
integer for which maxint + 1 cannot be represented. Refer to
appendix D "Punycode sample implementation" for demonstrations of
this technique in the C language.
The decoding and encoding algorithms shown in sections 6.2 and
6.3 handle overflow by detecting it whenever it happens. Another
approach is to enforce limits on the inputs that prevent overflow
from happening. For example, if the encoder were to verify that
no input code points exceed M and that the input length does not
exceed L, then no delta could ever exceed (M - initial_n) * (L + 1),
and hence no overflow could occur if integer variables were capable
of representing values that large. This prevention approach would
impose more restrictions on the input than the detection approach
does, but might be considered simpler in some programming languages.
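The prevention approach can be illustrated with a short C sketch that asks whether the bound (M - initial_n) * (L + 1) fits in a given integer range. The function name delta_bound_fits and its maxint argument are assumptions made for this example; only initial_n = 0x80 comes from the Punycode parameters.

```c
#define INITIAL_N 0x80ul   /* initial_n for Punycode */

/* Returns 1 if (max_cp - initial_n) * (max_len + 1) <= maxint, so that
   no delta can overflow when every input code point is <= max_cp and
   the input length is <= max_len.  The comparison is rearranged as a
   division so that the test itself cannot overflow. */
static int delta_bound_fits(unsigned long max_cp, unsigned long max_len,
                            unsigned long maxint)
{
    if (max_cp < INITIAL_N) return 1;   /* no non-basic code points, no deltas */
    if (max_len >= maxint) return 0;    /* max_len + 1 alone is too large */
    return (max_cp - INITIAL_N) <= maxint / (max_len + 1);
}
```

For example, with M = 0x10FFFF and L = 63 the bound is (0x10FFFF - 0x80) * 64 = 71,294,912, which fits easily in a 32-bit unsigned integer. The bound is a worst case over all inputs within the limits, so it can be larger than the deltas that actually occur.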
In theory, the decoder could use an analogous approach, limiting the In theory, the decoder could use an analogous approach, limiting the
number of digits in a variable-length integer (that is, limiting the number of digits in a variable-length integer (that is, limiting the
number of iterations in the innermost loop). However, the number number of iterations in the innermost loop). However, the number
of digits that suffice to represent a given delta can sometimes of digits that suffice to represent a given delta can sometimes
represent much larger deltas (because of the adaptation), and hence represent much larger deltas (because of the adaptation), and hence
this approach would probably need integers wider than 32 bits. this approach would probably need integers wider than 32 bits.
Yet another approach for the decoder is to allow overflow to occur, Yet another approach for the decoder is to allow overflow to occur,
but to check the final output string by re-encoding it and comparing but to check the final output string by re-encoding it and comparing
skipping to change at line 610 skipping to change at line 617
In fact, if the decoder is used only inside the IDNA ToUnicode In fact, if the decoder is used only inside the IDNA ToUnicode
operation [IDNA], then it need not check for overflow at all, operation [IDNA], then it need not check for overflow at all,
because ToUnicode performs a higher level re-encoding and because ToUnicode performs a higher level re-encoding and
comparison, and a mismatch has the same consequence as if the comparison, and a mismatch has the same consequence as if the
Punycode decoder had failed. Punycode decoder had failed.
7. Punycode examples 7. Punycode examples
7.1 Sample strings 7.1 Sample strings
In the Punycode encodings below, the IDNA signature prefix is not In the Punycode encodings below, the ACE prefix is not shown.
shown. Backslashes show where line breaks have been inserted in Backslashes show where line breaks have been inserted in strings too
strings too long for one line. long for one line.
The first several examples are all translations of the sentence "Why The first several examples are all translations of the sentence "Why
can't they just speak in <language>?" (courtesy of Michael Kaplan's can't they just speak in <language>?" (courtesy of Michael Kaplan's
"provincial" page [PROVINCIAL]). Word breaks and punctuation have "provincial" page [PROVINCIAL]). Word breaks and punctuation have
been removed, as is often done in domain names. been removed, as is often done in domain names.
(A) Arabic (Egyptian): (A) Arabic (Egyptian):
u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644 u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644
u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F
Punycode: egbpdaj6bu4bxfgehfvwxn Punycode: egbpdaj6bu4bxfgehfvwxn
(B) Chinese (simplified): (B) Chinese (simplified):
u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587 u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
Punycode: ihqwcrb4cv8a8dqg056pqjye Punycode: ihqwcrb4cv8a8dqg056pqjye
(C) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky (C) Chinese (traditional):
u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587
Punycode: ihqwctvzc91f659drss3x8bo0yb
(D) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky
U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074 U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074
u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D
u+0065 u+0073 u+006B u+0079 u+0065 u+0073 u+006B u+0079
Punycode: Proprostnemluvesky-uyb24dma41a Punycode: Proprostnemluvesky-uyb24dma41a
(D) Hebrew: (E) Hebrew:
u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8 u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8
u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2 u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2
u+05D1 u+05E8 u+05D9 u+05EA u+05D1 u+05E8 u+05D9 u+05EA
Punycode: 4dbcagdahymbxekheh6e0a7fei0b Punycode: 4dbcagdahymbxekheh6e0a7fei0b
(E) Hindi (Devanagari): (F) Hindi (Devanagari):
u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D
u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939 u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939
u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947 u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947
u+0939 u+0948 u+0902 u+0939 u+0948 u+0902
Punycode: i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd Punycode: i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd
(F) Japanese (kanji and hiragana): (G) Japanese (kanji and hiragana):
u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092 u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092
u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B
Punycode: n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa Punycode: n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa
(G) Korean (Hangul syllables): (H) Korean (Hangul syllables):
u+C138 u+ACC4 u+C758 u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774 u+C138 u+ACC4 u+C758 u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774
u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74 u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74
u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C
Punycode: 989aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5j\ Punycode: 989aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5j\
psd879ccm6fea98c psd879ccm6fea98c
(H) Russian (Cyrillic): (I) Russian (Cyrillic):
U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440 u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
u+0438 u+0438
Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
(I) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol (J) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol
U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070 U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070
u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070 u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070
u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061 u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061
u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070 u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070
u+0061 u+00F1 u+006F u+006C u+0061 u+00F1 u+006F u+006C
Punycode: PorqunopuedensimplementehablarenEspaol-fmd56a Punycode: PorqunopuedensimplementehablarenEspaol-fmd56a
(J) Taiwanese:
u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587
Punycode: ihqwctvzc91f659drss3x8bo0yb
(K) Vietnamese: (K) Vietnamese:
T<adotbelow>isaoh<odotbelow>kh<ocirc>ngth<ecirchookabove>ch\ T<adotbelow>isaoh<odotbelow>kh<ocirc>ngth<ecirchookabove>ch\
<ihookabove>n<oacute>iti<ecircacute>ngVi<ecircdotbelow>t <ihookabove>n<oacute>iti<ecircacute>ngVi<ecircdotbelow>t
U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B
u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068 u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068
u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067 u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067
U+0056 u+0069 u+1EC7 u+0074 U+0056 u+0069 u+1EC7 u+0074
Punycode: TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g Punycode: TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g
The next several examples are all names of Japanese music artists, The next several examples are all names of Japanese music artists,
skipping to change at line 720 skipping to change at line 727
Punycode: 2-u9tlzr9756bt3uc0v Punycode: 2-u9tlzr9756bt3uc0v
(P) Maji<de>Koi<suru>5<byou><mae> (P) Maji<de>Koi<suru>5<byou><mae>
U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059 U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059
u+308B u+0035 u+79D2 u+524D u+308B u+0035 u+79D2 u+524D
Punycode: MajiKoi5-783gue6qz075azm5e Punycode: MajiKoi5-783gue6qz075azm5e
(Q) <pafii>de<runba> (Q) <pafii>de<runba>
u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0 u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
Punycode: de-jg4avhby1noc0d Punycode: de-jg4avhby1noc0d
(R) <sono><supiido><de> (R) <sono><supiido><de>
u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067 u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
Punycode: d9juau41awczczp Punycode: d9juau41awczczp
The last example is an ASCII string that breaks the existing rules
The last example is an ASCII string that breaks not only the for host name labels. (It is not a realistic example for IDNA,
existing rules for host name labels but also the rules in [NAMEPREP] because IDNA never encodes pure ASCII labels.)
for internationalized domain names.
(S) -> $1.00 <- (S) -> $1.00 <-
u+002D u+003E u+0020 u+0024 u+0031 u+002E u+0030 u+0030 u+0020 u+002D u+003E u+0020 u+0024 u+0031 u+002E u+0030 u+0030 u+0020
u+003C u+002D u+003C u+002D
Punycode: -> $1.00 <-- Punycode: -> $1.00 <--
7.2 Decoding traces 7.2 Decoding traces
In the following traces, the evolving state of the decoder is In the following traces, the evolving state of the decoder is
shown as a sequence of hexadecimal values, representing the code shown as a sequence of hexadecimal values, representing the code
skipping to change at line 880 skipping to change at line 887
a different authority, some of which could be spoofs that hijack a different authority, some of which could be spoofs that hijack
service requests intended for another. Therefore Punycode is service requests intended for another. Therefore Punycode is
designed so that each Unicode string has a unique encoding. designed so that each Unicode string has a unique encoding.
However, there can still be multiple Unicode representations of the However, there can still be multiple Unicode representations of the
"same" text, for various definitions of "same". This problem is "same" text, for various definitions of "same". This problem is
addressed to some extent by the Unicode standard under the topic of addressed to some extent by the Unicode standard under the topic of
canonicalization, and this work is leveraged for domain names by canonicalization, and this work is leveraged for domain names by
Nameprep [NAMEPREP]. Nameprep [NAMEPREP].
9. References 9. References (non-normative)
[ASCII] Vint Cerf, "ASCII format for Network Interchange",
1969-Oct-16, RFC 20.
[IDNA] Patrik Faltstrom, Paul Hoffman, Adam M. Costello, [IDNA] Patrik Faltstrom, Paul Hoffman, Adam M. Costello,
"Internationalizing Domain Names In Applications (IDNA)", "Internationalizing Domain Names In Applications (IDNA)",
2002-###-##, draft-ietf-idn-idna-07. draft-ietf-idn-idna.
[NAMEPREP] Paul Hoffman, Marc Blanchet, "Nameprep: A Stringprep [NAMEPREP] Paul Hoffman, Marc Blanchet, "Nameprep: A
Profile for Internationalized Domain Names", 2002-###-##, Stringprep Profile for Internationalized Domain Names",
draft-ietf-idn-nameprep-08. draft-ietf-idn-nameprep.
[PROVINCIAL] Michael Kaplan, "The 'anyone can be provincial!' page", [PROVINCIAL] Michael Kaplan, "The 'anyone can be provincial!' page",
http://www.trigeminal.com/samples/provincial.html. http://www.trigeminal.com/samples/provincial.html.
[RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
Table Specification", 1985-Oct, RFC 952. Table Specification", 1985-Oct, RFC 952.
[RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities", [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
1987-Nov, RFC 1034. 1987-Nov, RFC 1034.
skipping to change at line 952 skipping to change at line 962
author grants irrevocable permission to anyone to use, modify, author grants irrevocable permission to anyone to use, modify,
and distribute it in any way that does not diminish the rights and distribute it in any way that does not diminish the rights
of anyone else to use, modify, and distribute it, provided that of anyone else to use, modify, and distribute it, provided that
redistributed derivative works do not contain misleading author or redistributed derivative works do not contain misleading author or
version information. Derivative works need not be licensed under version information. Derivative works need not be licensed under
similar terms. similar terms.
D. Punycode sample implementation D. Punycode sample implementation
/* /*
punycode.c from draft-ietf-idn-punycode-01 punycode.c from draft-ietf-idn-punycode-02
http://www.nicemice.net/idn/ http://www.nicemice.net/idn/
Adam M. Costello Adam M. Costello
http://www.nicemice.net/amc/ http://www.nicemice.net/amc/
This is ANSI C code (C89) implementing This is ANSI C code (C89) implementing
Punycode (draft-ietf-idn-punycode-01). Punycode (draft-ietf-idn-punycode-02).
*/ */
/************************************************************/ /************************************************************/
/* Public interface (would normally go in its own .h file): */ /* Public interface (would normally go in its own .h file): */
#include <limits.h> #include <limits.h>
enum punycode_status { enum punycode_status {
punycode_success, punycode_success,
skipping to change at line 1461 skipping to change at line 1471
if (r < 0) fail(io_error); if (r < 0) fail(io_error);
} }
return EXIT_SUCCESS; return EXIT_SUCCESS;
} }
usage(argv); usage(argv);
return EXIT_SUCCESS; /* not reached, but quiets compiler warning */ return EXIT_SUCCESS; /* not reached, but quiets compiler warning */
} }
INTERNET-DRAFT expires 2002-Aug-24 INTERNET-DRAFT expires 2002-Nov-23
 End of changes. 28 change blocks. 
70 lines changed or deleted 80 lines changed or added
