INTERNET-DRAFT                                               Mark Welter
draft-ietf-idn-dude-02.txt                            Brian W. Spolarich
Expires 2001-Dec-07                                     Adam M. Costello

              Differential Unicode Domain Encoding (DUDE)

Status of this Memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note
    that other groups may also distribute working documents as
    Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at

    The list of Internet-Draft Shadow Directories can be accessed at

    Copyright (c) The Internet Society (2000).  All Rights Reserved.

    Distribution of this document is unlimited.  Please send comments to
    the authors or to the idn working group at


Abstract

    DUDE is a reversible transformation from a sequence of nonnegative
    integer values to a sequence of letters, digits, and hyphens (LDH
    characters).  DUDE provides a simple and efficient ASCII-Compatible
    Encoding (ACE) of Unicode strings [UNICODE] for use with
    Internationalized Domain Names [IDN] [IDNA].

Contents

    1. Introduction
    2. Terminology
    3. Overview
    4. Base-32 characters
    5. Encoding procedure
    6. Decoding procedure
    7. Example strings
    8. Security considerations
    9. References
    A. Acknowledgements
    B. Author contact information
    C. Mixed-case annotation
    D. Differences from draft-ietf-idn-dude-01
    E. Example implementation

1. Introduction


    The IDNA draft [IDNA] describes an architecture for supporting
    internationalized domain names.  Each label of a domain name may
    begin with a special prefix, in which case the remainder of the
    label is an ASCII-Compatible Encoding (ACE) of a Unicode string
    satisfying certain constraints.  For the details of the constraints,
    see [IDNA] and [NAMEPREP].  The prefix has not yet been specified,
    but see for prefixes to be used for testing
    and experimentation.

    DUDE is intended to be used as an ACE within IDNA, and has been
    designed to have the following features:

      * Completeness:  Every sequence of nonnegative integers maps to an
        LDH string.  Restrictions on which integers are allowed, and on
        sequence length, may be imposed by higher layers.

      * Uniqueness:  Every sequence of nonnegative integers maps to at
        most one LDH string.

      * Reversibility:  Any Unicode string mapped to an LDH string can
        be recovered from that LDH string.

      * Efficient encoding:  The ratio of encoded size to original size
        is small.  This is important in the context of domain names
        because [RFC1034] restricts the length of a domain label to 63
        characters.

      * Simplicity:  The encoding and decoding algorithms are reasonably
        simple to implement.  The goals of efficiency and simplicity are
        at odds; DUDE places greater emphasis on simplicity.

    An optional feature is described in appendix C "Mixed-case
    annotation".

2. Terminology

    The key words "must", "shall", "required", "should", "recommended",
    and "may" in this document are to be interpreted as described in
    RFC 2119 [RFC2119].

    LDH characters are the letters A-Z and a-z, the digits 0-9, and
    hyphen-minus.

    A quartet is a sequence of four bits (also known as a nibble).

    A quintet is a sequence of five bits.

    Hexadecimal values are shown preceded by "0x".  For example, 0x60
    is decimal 96.

    As in the Unicode Standard [UNICODE], Unicode code points are
    denoted by "U+" followed by four to six hexadecimal digits, while a
    range of code points is denoted by two hexadecimal numbers separated
    by "..", with no prefixes.

    XOR means bitwise exclusive or.  Given two nonnegative integer
    values A and B, A XOR B is the nonnegative integer value whose
    binary representation is 1 in whichever places the binary
    representations of A and B disagree, and 0 wherever they agree.
    For the purpose of applying this rule, recall that an integer's
    representation begins with an infinite number of unwritten zeros.
    In some programming languages, care may need to be taken that A and
    B are stored in variables of the same type and size.
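
    The caution about storing A and B in variables of the same type
    and size can be illustrated in C.  This is only a sketch: the
    helper name code_point_xor is hypothetical, and the typedef simply
    mirrors the guard used in appendix E.

```c
#include <limits.h>

/* Pick an unsigned type wide enough for any Unicode code point
   (same guard as the example implementation in appendix E). */
#if UINT_MAX >= 0x1FFFFF
typedef unsigned int u_code_point;
#else
typedef unsigned long u_code_point;
#endif

/* Hypothetical helper: both operands share one unsigned type, so
   neither is sign-extended or truncated before the XOR. */
static u_code_point code_point_xor(u_code_point a, u_code_point b)
{
  return a ^ b;
}
```

    For example, code_point_xor(0x60, 0x61) is 0x01, which is the
    value encoded for the first character of a string beginning with
    U+0061.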

3. Overview

    DUDE encodes a sequence of nonnegative integral values as a sequence
    of LDH characters, although implementations will of course need to
    represent the output characters somehow, typically as ASCII octets.
    When DUDE is used to encode Unicode characters, the input values are
    Unicode code points (integral values in the range 0..10FFFF, but not
    D800..DFFF, which are reserved for use by UTF-16).

    Each value in the input sequence is represented by one or more LDH
    characters in the encoded string.  The value 0x2D is represented
    by hyphen-minus (U+002D).  Each non-hyphen-minus character in
    the encoded string represents a quintet.  A sequence of quintets
    represents the bitwise XOR between each non-0x2D integer and the
    previous one.

4. Base-32 characters

        "a" =  0 = 0x00 = 00000         "s" = 16 = 0x10 = 10000
        "b" =  1 = 0x01 = 00001         "t" = 17 = 0x11 = 10001
        "c" =  2 = 0x02 = 00010         "u" = 18 = 0x12 = 10010
        "d" =  3 = 0x03 = 00011         "v" = 19 = 0x13 = 10011
        "e" =  4 = 0x04 = 00100         "w" = 20 = 0x14 = 10100
        "f" =  5 = 0x05 = 00101         "x" = 21 = 0x15 = 10101
        "g" =  6 = 0x06 = 00110         "y" = 22 = 0x16 = 10110
        "h" =  7 = 0x07 = 00111         "z" = 23 = 0x17 = 10111
        "i" =  8 = 0x08 = 01000         "2" = 24 = 0x18 = 11000
        "j" =  9 = 0x09 = 01001         "3" = 25 = 0x19 = 11001
        "k" = 10 = 0x0A = 01010         "4" = 26 = 0x1A = 11010
        "m" = 11 = 0x0B = 01011         "5" = 27 = 0x1B = 11011
        "n" = 12 = 0x0C = 01100         "6" = 28 = 0x1C = 11100
        "p" = 13 = 0x0D = 01101         "7" = 29 = 0x1D = 11101
        "q" = 14 = 0x0E = 01110         "8" = 30 = 0x1E = 11110
        "r" = 15 = 0x0F = 01111         "9" = 31 = 0x1F = 11111

    The digits "0" and "1" and the letters "o" and "l" are not used, to
    avoid transcription errors.

    A decoder must accept both the uppercase and lowercase forms of
    the base-32 characters (including mixtures of both forms).  An
    encoder should output only lowercase forms or only uppercase forms
    (unless it uses the feature described in appendix C "Mixed-case
    annotation").
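
    The table above can be held in a single 32-character string
    indexed by quintet value.  The following is a minimal sketch, not
    part of the draft's own code; the names base32_chars,
    quintet_to_char, and char_to_quintet are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* The 32 characters of section 4, indexed by quintet value. */
static const char base32_chars[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Hypothetical helper: quintet value (0..31) to lowercase character. */
static char quintet_to_char(unsigned int q)
{
  return base32_chars[q & 0x1F];
}

/* Hypothetical helper: character to quintet value, or -1 if the
   character is not a base-32 character.  Case is ignored, as a
   decoder must accept both forms. */
static int char_to_quintet(char c)
{
  const char *p;
  int lower = tolower((unsigned char) c);

  if (lower == 0) return -1;  /* strchr would match the terminator */
  p = strchr(base32_chars, lower);
  return p ? (int) (p - base32_chars) : -1;
}
```

    A linear search with strchr() is chosen here for brevity; a
    256-entry lookup array would avoid the search at the cost of a
    larger table.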

5. Encoding procedure

    All ordering of bits, quartets, and quintets is big-endian (most
    significant first).

    let prev = 0x60
    for each input integer n (in order) do begin
      if n == 0x2D then output hyphen-minus
      else begin
        let diff = prev XOR n
        represent diff in base 16 as a sequence of quartets,
          as few as are sufficient (but at least one)
        prepend 0 to the last quartet and 1 to each of the others
        output a base-32 character corresponding to each quintet
        let prev = n
      end
    end

    If an encoder encounters an input value larger than expected (for
    example, the largest Unicode code point is U+10FFFF, and nameprep
    [NAMEPREP03] can never output a code point larger than U+EFFFD),
    the encoder may either encode the value correctly, or may fail, but
    it must not produce incorrect output.  The encoder must fail if it
    encounters a negative input value.
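
    The encoding procedure can be sketched in C as follows.  This is
    an illustration under stated assumptions, not the draft's
    dude_encode(): the name dude_encode_sketch, its signature, and the
    -1 failure value are hypothetical, it emits lowercase only, and
    it uses unsigned long so that negative input cannot occur.

```c
#include <stddef.h>

static const char dude_base32[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Hypothetical helper: encodes n values from in[] into out[] as a
   null-terminated lowercase string.  Returns the string length, or
   -1 if out_size is too small. */
static long dude_encode_sketch(const unsigned long in[], size_t n,
                               char out[], size_t out_size)
{
  unsigned long prev = 0x60, diff, t;
  size_t i, len = 0;
  int quartets, q;

  for (i = 0; i < n; ++i) {
    if (in[i] == 0x2D) {
      if (len + 1 >= out_size) return -1;
      out[len++] = '-';           /* prev is deliberately unchanged */
      continue;
    }
    diff = prev ^ in[i];

    /* As few quartets as are sufficient, but at least one. */
    quartets = 1;
    for (t = diff >> 4; t != 0; t >>= 4) ++quartets;

    /* Need room for the quintets plus the terminating null. */
    if (len + (size_t) quartets >= out_size) return -1;

    /* Big-endian: most significant quartet first. */
    for (q = quartets - 1; q >= 0; --q) {
      unsigned int quintet = (unsigned int) ((diff >> (4 * q)) & 0xF);
      if (q > 0) quintet |= 0x10;  /* 1 on all but the last quartet */
      out[len++] = dude_base32[quintet];
    }
    prev = in[i];
  }
  if (len >= out_size) return -1;
  out[len] = '\0';
  return (long) len;
}
```

    Applied to the single value 0x61, the sketch computes diff = 0x01
    and emits "b", which agrees with example A in section 7.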

6. Decoding procedure

    let prev = 0x60
    while the input string is not exhausted do begin
      if the next character is hyphen-minus
      then consume it and output 0x2D
      else begin
        consume characters and convert them to quintets until
          encountering a quintet whose first bit is 0
        fail upon encountering a non-base-32 character or end-of-input
        strip the first bit of each quintet
        concatenate the resulting quartets to form diff
        let prev = prev XOR diff
        output prev
      end
    end
    encode the output sequence and compare it to the input string
    fail if they do not match (case-insensitively)

    The comparison at the end is necessary to guarantee the uniqueness
    property (there cannot be two distinct encoded strings representing
    the same sequence of integers).  This check also frees the decoder
    from having to check for overflow while decoding the base-32
    characters.  (If the decoder is one step of a larger decoding
    process, it may be possible to defer the re-encoding and comparison
    to the end of that larger decoding process.)
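
    The decoding procedure can be sketched in C as follows.  This is
    an illustration, not the draft's dude_decode(): the names
    dude_decode_sketch and put_diff, the signature, and the -1 failure
    value are hypothetical.  As a simplification, the sketch re-encodes
    and compares each value as soon as it is decoded rather than
    re-encoding the whole output at the end; it is assumed here that
    the two are equivalent because the encoded string is a
    concatenation of per-value encodings.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

static const char b32[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Most quartets an unsigned long can hold. */
#define MAX_QUARTETS (sizeof(unsigned long) * CHAR_BIT / 4)

/* Hypothetical helper: writes the encoding of diff to out[]
   (lowercase, no terminator) and returns its length. */
static size_t put_diff(unsigned long diff, char out[])
{
  size_t len = 0;
  int q = 0;
  unsigned long t;

  for (t = diff >> 4; t != 0; t >>= 4) ++q;   /* q = quartets - 1 */
  for (; q >= 0; --q)
    out[len++] = b32[((diff >> (4 * q)) & 0xF) | (q > 0 ? 0x10 : 0)];
  return len;
}

/* Hypothetical helper: decodes null-terminated in[] into out[]
   (capacity max_out).  Returns the number of values decoded, or -1
   on malformed input or insufficient space. */
static long dude_decode_sketch(const char in[], unsigned long out[],
                               size_t max_out)
{
  unsigned long prev = 0x60, diff;
  char canon[MAX_QUARTETS];
  size_t i = 0, n = 0, start, len, j;
  const char *p;
  int quintet;

  while (in[i] != '\0') {
    if (n >= max_out) return -1;
    if (in[i] == '-') {
      ++i;
      out[n++] = 0x2D;             /* prev is deliberately unchanged */
      continue;
    }
    start = i;
    diff = 0;
    do {
      if (in[i] == '\0') return -1;            /* end-of-input */
      p = strchr(b32, tolower((unsigned char) in[i]));
      if (p == NULL) return -1;                /* non-base-32 character */
      ++i;
      quintet = (int) (p - b32);
      diff = (diff << 4) | (unsigned long) (quintet & 0xF);
    } while (quintet & 0x10);      /* first bit 1: more quintets follow */

    prev ^= diff;
    if (prev == 0x2D) return -1;   /* an encoder would have output '-' */

    /* Re-encode and compare case-insensitively.  This rejects
       redundant leading zero quartets and overflowed values. */
    len = put_diff(diff, canon);
    if (len != i - start) return -1;
    for (j = 0; j < len; ++j)
      if (tolower((unsigned char) in[start + j]) != canon[j]) return -1;

    out[n++] = prev;
  }
  return (long) n;
}
```

    With this sketch the string "sb" is rejected, because it decodes
    to the same value as "b" and so would violate the uniqueness
    property.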

7. Example strings

    The first several examples are nonsense strings of mostly unassigned
    code points intended to exercise the corner cases of the algorithm.

    (A) u+0061
        DUDE: b

    (B) u+2C7EF u+2C7EF
        DUDE: u6z2ra

    (C) u+1752B u+1752A
        DUDE: tzxwmb

    (D) u+63AB1 u+63ABA
        DUDE: yv47bm

    (E) u+261AF u+261BF
        DUDE: uyt6rta

    (F) u+C3A31 u+C3A8C
        DUDE: 6v4xb5p

    (G) u+09F44 u+0954C
        DUDE: 39ue4si

    (H) u+8D1A3 u+8C8A3
        DUDE: 27t6dt3sa

    (I) u+6C2B6 u+CC266
        DUDE: y6u7g4ss7a

    (J) u+002D u+002D u+002D u+E848F
        DUDE: ---82w8r

    (K) u+BD08E u+002D u+002D u+002D
        DUDE: 57s8q---

    (L) u+A9A24 u+002D u+002D u+002D u+C05B7
        DUDE: 434we---y393d

    (M) u+7FFFFFFF
        DUDE: z999993r or explicit failure

    The next several examples are realistic Unicode strings that could
    be used in domain names.  They exhibit single-row text, two-row
    text, ideographic text, and mixtures thereof.  These examples are
    names of Japanese television programs, music artists, and songs,
    merely because one of the authors happened to have them handy.

    (N) 3<nen>b<gumi><kinpachi><sensei>  (Latin, kanji)
        u+0033 u+5E74 u+0062 u+7D44 u+91D1 u+516B u+5148 u+751F
        DUDE: xdx8whx8tgz7ug863f6s5kuduwxh

    (O) <amuro><namie>-with-super-monkeys  (Latin, kanji, hyphens)
        u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
        u+0068 u+002D u+0073 u+0075 u+0070 u+0065 u+0072 u+002D u+006D
        u+006F u+006E u+006B u+0065 u+0079 u+0073
        DUDE: x58jupu8nuy6gt99m-yssctqtptn-tmgftfth-trcbfqtnk

    (P) maji<de>koi<suru>5<byou><mae>  (Latin, hiragana, kanji)
        u+006D u+0061 u+006A u+0069 u+3067 u+006B u+006F u+0069 u+3059
        u+308B u+0035 u+79D2 u+524D
        DUDE: pnmdvssqvssnegvsva7cvs5qz38hu53r

    (Q) <pafii>de<runba>  (Latin, katakana)
        u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
        DUDE: vs5bezgxrvs3ibvs2qtiud

    (R) <sono><supiido><de>  (hiragana, katakana)
        u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
        DUDE: vsvpvd7hypuivf4q

8. Security considerations

    Users expect each domain name in DNS to be controlled by a single
    authority.  If a Unicode string intended for use as a domain label
    could map to multiple ACE labels, then an internationalized domain
    name could map to multiple ACE domain names, each controlled by
    a different authority, some of which could be spoofs that hijack
    service requests intended for another.  Therefore DUDE is designed
    so that each Unicode string has a unique encoding.

    However, there can still be multiple Unicode representations of the
    "same" text, for various definitions of "same".  This problem is
    addressed to some extent by the Unicode standard under the topic of
    canonicalization, and this work is leveraged for domain names by
    "nameprep" [NAMEPREP03].

9. References

    [IDN] Internationalized Domain Names (IETF working group),

    [IDNA] Patrik Faltstrom, Paul Hoffman, "Internationalizing Host
    Names In Applications (IDNA)", draft-ietf-idn-idna-01.

    [NAMEPREP03] Paul Hoffman, Marc Blanchet, "Preparation
    of Internationalized Host Names", 2001-Feb-24,

    [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
    Table Specification", 1985-Oct, RFC 952.

    [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
    1987-Nov, RFC 1034.

    [RFC1123] Internet Engineering Task Force, R. Braden (editor),
    "Requirements for Internet Hosts -- Application and Support",
    1989-Oct, RFC 1123.

    [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
    Requirement Levels", 1997-Mar, RFC 2119.

    [SFS] David Mazieres et al, "Self-certifying File System",

    [UNICODE] The Unicode Consortium, "The Unicode Standard",

A. Acknowledgements

    The basic encoding of integers to quartets to quintets to base-32
    comes from earlier IETF work by Martin Duerst.  DUDE uses a slight
    variation on the idea.

    Paul Hoffman provided helpful comments on this document.

    The idea of avoiding 0, 1, o, and l in base-32 strings was taken
    from SFS [SFS].

B. Author contact information

    Mark Welter <>
    Brian W. Spolarich <>
    WALID, Inc.
    State Technology Park
    2245 S. State St.
    Ann Arbor, MI  48104
    +1 734 822 2020

    Adam M. Costello <>
    University of California, Berkeley

C. Mixed-case annotation

    In order to use DUDE to represent case-insensitive Unicode strings,
    higher layers need to case-fold the Unicode strings prior to DUDE
    encoding.  The encoded string can, however, use mixed-case base-32
    (rather than all-lowercase or all-uppercase as recommended in
    section 4 "Base-32 characters") as an annotation telling how to
    convert the folded Unicode string into a mixed-case Unicode string
    for display purposes.

    Each Unicode code point (unless it is U+002D hyphen-minus) is
    represented by a sequence of base-32 characters, the last of which
    is always a letter (as opposed to a digit).  If that letter is
    uppercase, it is a suggestion that the Unicode character be mapped
    to uppercase (if possible); if the letter is lowercase, it is a
    suggestion that the Unicode character be mapped to lowercase (if
    possible).

    DUDE encoders and decoders are not required to support these
    annotations, and higher layers need not use them.
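
    Reading the annotation back out of an encoded string can be
    sketched in C as follows.  The helper name dude_case_flags is
    hypothetical, and the sketch assumes ASCII and a well-formed DUDE
    string.  It relies on the last character of each sequence being
    one of the letters that represent quintets 0 through 15 ("a"
    through "r"), while the characters "s" through "z" and "2"
    through "9" mark a sequence that continues.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: sets flags[k] to 1 if the k-th encoded value
   ends in an uppercase letter, else 0.  Hyphen-minus is reported as
   caseless (0).  Writes at most max_flags flags and returns the
   number written.  Assumes ASCII and well-formed input. */
static size_t dude_case_flags(const char in[], unsigned char flags[],
                              size_t max_flags)
{
  size_t i, n = 0;

  for (i = 0; in[i] != '\0' && n < max_flags; ++i) {
    unsigned char c = (unsigned char) in[i];
    int lower = tolower(c);

    if (c == '-')
      flags[n++] = 0;                        /* hyphen-minus: caseless */
    else if (lower >= 'a' && lower <= 'r')   /* final quintet, 0..15   */
      flags[n++] = (unsigned char) (isupper(c) ? 1 : 0);
    /* otherwise a continuation character (s-z, 2-9): no flag yet */
  }
  return n;
}
```

    For instance, "u6z2Ra" yields the flags 1 and 0: the first value
    ends in an uppercase "R" and the second in a lowercase "a".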

    Example:  In order to suggest that example O in section 7 "Example
    strings" be displayed as:

    one could capitalize the DUDE encoding as:


D. Differences from draft-ietf-idn-dude-01

    Four changes have been made since draft-ietf-idn-dude-01 (DUDE-01):

     1) DUDE-01 computed the XOR of each integer with the previous one
        in order to decide how many bits of each integer to encode, but
        now the XOR itself is encoded, so there is no need for a mask.

     2) DUDE-01 made the first quintet of each sequence different from
        the rest, while now it is the last quintet that differs, so it's
        easier for the decoder to detect the end of the sequence.

     3) The base-32 map has changed to avoid 0, 1, o, and l, to help
        humans avoid transcription errors.

     4) The initial value of the previous code point has changed from 0
        to 0x60, making the encodings of a few domain names shorter and
        none longer.

E. Example implementation

/* dude.c 0.2.3 (2001-May-31-Thu)         */
/* Adam M. Costello <> */

/* This is ANSI C code (C89) implementing */
/* DUDE (draft-ietf-idn-dude-02).         */

/* Public interface (would normally go in its own .h file): */

#include <limits.h>

enum dude_status {
  dude_success,
  dude_big_output  /* Output would exceed the space provided. */
};

enum case_sensitivity { case_sensitive, case_insensitive };

#if UINT_MAX >= 0x1FFFFF
typedef unsigned int u_code_point;
#else
typedef unsigned long u_code_point;
#endif

enum dude_status dude_encode(
  unsigned int input_length,
  const u_code_point input[],
  const unsigned char uppercase_flags[],
  unsigned int *output_size,
  char output[] );
    /* dude_encode() converts Unicode to DUDE (without any            */
    /* signature).  The input must be represented as an array         */
    /* of Unicode code points (not code units; surrogate pairs        */
    /* are not allowed), and the output will be represented as        */
    /* null-terminated ASCII.  The input_length is the number of code */
    /* points in the input.  The output_size is an in/out argument:   */
    /* the caller must pass in the maximum number of characters       */
    /* that may be output (including the terminating null), and on    */
    /* successful return it will contain the number of characters     */
    /* actually output (including the terminating null, so it will be */
    /* one more than strlen() would return, which is why it is called */
    /* output_size rather than output_length).  The uppercase_flags   */
    /* array must hold input_length boolean values, where nonzero     */
    /* means the corresponding Unicode character should be forced     */
    /* to uppercase after being decoded, and zero means it is         */
    /* caseless or should be forced to lowercase.  Alternatively,     */
    /* uppercase_flags may be a null pointer, which is equivalent     */
    /* to all zeros.  The encoder always outputs lowercase base-32    */
    /* characters except when nonzero values of uppercase_flags       */
    /* require otherwise.  The return value may be any of the         */
    /* dude_status values defined above; if not dude_success, then    */
    /* output_size and output may contain garbage.  On success, the   */
    /* encoder will never need to write an output_size greater than   */
    /* input_length*k+1 if all the input code points are less than 1  */
    /* << (4*k), because of how the encoding is defined.              */

enum dude_status dude_decode(
  enum case_sensitivity case_sensitivity,
  char scratch_space[],
  const char input[],
  unsigned int *output_length,
  u_code_point output[],
  unsigned char uppercase_flags[] );
    /* dude_decode() converts DUDE (without any signature) to         */
    /* Unicode.  The input must be represented as null-terminated     */
    /* ASCII, and the output will be represented as an array of       */
    /* Unicode code points.  The case_sensitivity argument influences */
    /* the check on the well-formedness of the input string; it       */
    /* must be case_sensitive if case-sensitive comparisons are       */
    /* allowed on encoded strings, case_insensitive otherwise.        */
    /* The scratch_space must point to space at least as large        */
    /* as the input, which will get overwritten (this allows the      */
    /* decoder to avoid calling malloc()).  The output_length is      */
    /* an in/out argument: the caller must pass in the maximum        */
    /* number of code points that may be output, and on successful    */
    /* return it will contain the actual number of code points        */
    /* output.  The uppercase_flags array must have room for at       */
    /* least output_length values, or it may be a null pointer if     */
    /* the case information is not needed.  A nonzero flag indicates  */
    /* that the corresponding Unicode character should be forced to   */
    /* uppercase by the caller, while zero means it is caseless or    */
    /* should be forced to lowercase.  The return value may be any    */
    /* of the dude_status values defined above; if not dude_success,  */
    /* then output_length, output, and uppercase_flags may contain    */
    /* garbage.  On success, the

[IDNrACE] Paul Hoffman, "RACE: Row-Based ASCII Compatible Encoding for
IDN", draft-ietf-idn-race;

[IDNLACE] Mark Davis, "LACE: Length-Based ASCII Compatible Encoding for
IDN", draft-ietf-idn-lace;

[IDNREQ] James Seng, "Requirements of Internationalized Domain Names",

[IDNNAMEPREP] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep;

[IDNDUERST] M. Duerst, "Internationalization of Domain Names",

[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) --
Part 1: Architecture and Basic Multilingual Plane.  Five amendments and
a technical corrigendum have been published up to now. UTF-16 is
described in Annex Q, published as Amendment 1. 17 other amendments are
currently at various stages of standardization;

[RFC2119] Scott Bradner, "Key words for use in RFCs decoder will never need to Indicate
Requirement Levels", March 1997, RFC 2119;

[STD13] Paul Mockapetris, "Domain names - implementation and
specification", November 1987, STD 13 (RFC 1035);

[UNICODE3] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at

A. Acknowledgements

The structure (and some write     */
    /* an output_length greater than the length of the structural text) input (not     */
    /* counting the null terminator), because of this document is
intentionally borrowed from how the LACE IDN draft (draft-ietf-idn-lace-00)
by Mark Davis and Paul Hoffman.

B. IANA Considerations

There are no IANA considerations in this document.

C. Author Contact Information

Mark Welter
Brian W. Spolarich
State Technology Park
2245 S. State St.
Ann Arbor, MI  48104

D. DUDE C++ Implementation

#include <stdio.h> encoding is  */
    /* defined.                                                       */
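
As a small illustration of the step the implementation in this appendix
performs for each non-hyphen code point, the following sketch XORs a
code point with its predecessor and writes the difference as base-32
quintets.  It is a sketch under stated assumptions rather than the
draft's own code: the names sketch_base32, sketch_encode_one, and
sketch_encode_selftest are hypothetical, an ASCII execution character
set is assumed, and the uppercase flags and output-size checks are
left out.

```c
#include <assert.h>
#include <string.h>

/* Assumption: ASCII execution character set, so a string literal   */
/* can stand in for a numeric table of base-32 characters.          */
static const char sketch_base32[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Hypothetical helper: writes the base-32 characters for codept,   */
/* given the previous non-hyphen code point prev, into out[] as a   */
/* null-terminated string.  out[] must have room for at least 9     */
/* characters.  Returns the number of base-32 characters written.   */
static unsigned int sketch_encode_one(unsigned long prev,
                                      unsigned long codept,
                                      char out[])
{
  unsigned long diff = (prev ^ codept) & 0xFFFFFFFFUL, tmp;
  unsigned int k, j;

  /* k = number of 4-bit groups needed for diff (at least one): */
  for (tmp = diff >> 4, k = 1;  tmp != 0;  ++k, tmp >>= 4);

  /* Fill from the right: the last quintet has the form 0xxxx, */
  /* and every earlier quintet has the form 1xxxx.             */
  out[k] = 0;
  out[k - 1] = sketch_base32[diff & 0xF];
  for (j = 2;  j <= k;  ++j) {
    diff >>= 4;
    out[k - j] = sketch_base32[0x10 | (diff & 0xF)];
  }
  return k;
}

/* Worked values, starting from the initial previous value 0x60: */
static void sketch_encode_selftest(void)
{
  char buf[9];

  assert(sketch_encode_one(0x60, 0x61, buf) == 1);      /* diff 0x01   */
  assert(strcmp(buf, "b") == 0);
  assert(sketch_encode_one(0x60, 0x3042, buf) == 4);    /* diff 0x3022 */
  assert(strcmp(buf, "vsuc") == 0);
  assert(sketch_encode_one(0x3042, 0x3044, buf) == 1);  /* diff 0x0006 */
  assert(strcmp(buf, "g") == 0);
}
```

Starting from 0x60, U+0061 needs one character and U+3042 needs four,
but a following U+3044 needs only one, because consecutive code points
from the same script differ only in their low-order bits.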

/* Implementation (would normally go in its own .c file): */

#include <string.h>
#include <ctype.h>
#include <limits.h>


/* Character utilities: */

/* base32[q] is the lowercase base-32 character representing  */
/* the number q from the range 0 to 31.  Note that we cannot  */
/* use string literals for ASCII characters because an ANSI C */
/* compiler does not necessarily use ASCII.                   */

static const char base32[] = {
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,     /* a-k */
  109, 110,                                               /* m-n */
  112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,  /* p-z */
  50, 51, 52, 53, 54, 55, 56, 57                          /* 2-9 */
};

/* base32_decode(c) returns the value of a base-32 character, in the */
/* range 0 to 31, or the constant base32_invalid if c is not a valid */
/* base-32 character.                                                */

enum { base32_invalid = 32 };

static unsigned int base32_decode(char c)
{
  if (c < 50) return base32_invalid;
  if (c <= 57) return c - 26;
  if (c < 97) c += 32;
  if (c < 97 || c == 108 || c == 111 || c > 122) return base32_invalid;
  return c - 97 - (c > 108) - (c > 111);
}

/* unequal(case_sensitivity,s1,s2) returns 0 if the strings s1 and s2 */
/* are equal, 1 otherwise.  If case_sensitivity is case_insensitive,  */
/* then ASCII A-Z are considered equal to a-z respectively.           */

static int unequal( enum case_sensitivity case_sensitivity,
                    const char s1[], const char s2[]        )
{
  char c1, c2;

  if (case_sensitivity != case_insensitive) return strcmp(s1,s2) != 0;

  for (;;) {
    c1 = *s1;
    c2 = *s2;
    if (c1 >= 65 && c1 <= 90) c1 += 32;
    if (c2 >= 65 && c2 <= 90) c2 += 32;
    if (c1 != c2) return 1;
    if (c1 == 0) return 0;
    ++s1, ++s2;
  }
}

/* Encoder: */

enum dude_status dude_encode(
  unsigned int input_length,
  const u_code_point input[],
  const unsigned char uppercase_flags[],
  unsigned int *output_size,
  char output[] )
{
  unsigned int max_out, in, out, k, j;
  u_code_point prev, codept, diff, tmp;
  char shift;

  prev = 0x60;
  max_out = *output_size;

  for (in = out = 0;  in < input_length;  ++in) {

    /* At the start of each iteration, in and out are the number of */
    /* items already input/output, or equivalently, the indices of  */
    /* the next items to be input/output.                           */

    codept = input[in];

    if (codept == 0x2D) {
      /* Hyphen-minus stands for itself. */
      if (max_out - out < 1) return dude_big_output;
      output[out++] = 0x2D;
      continue;
    }

    diff = prev ^ codept;

    /* Compute the number of base-32 characters (k): */
    for (tmp = diff >> 4, k = 1;  tmp != 0;  ++k, tmp >>= 4);

    if (max_out - out < k) return dude_big_output;
    shift = uppercase_flags && uppercase_flags[in] ? 32 : 0;
    /* shift controls the case of the last base-32 digit. */

    /* Each quintet has the form 1xxxx except the last is 0xxxx. */
    /* Computing the base-32 digits in reverse order is easiest. */

    out += k;
    output[out - 1] = base32[diff & 0xF] - shift;

    for (j = 2;  j <= k;  ++j) {
      diff >>= 4;
      output[out - j] = base32[0x10 | (diff & 0xF)];
    }

    prev = codept;
  }

  /* Append the null terminator: */
  if (max_out - out < 1) return dude_big_output;
  output[out++] = 0;

  *output_size = out;
  return dude_success;
}

/* Decoder: */

enum dude_status dude_decode(
  enum case_sensitivity case_sensitivity,
  char scratch_space[],
  const char input[],
  unsigned int *output_length,
  u_code_point output[],
  unsigned char uppercase_flags[] )
{
  u_code_point prev, q, diff;
  char c;
  unsigned int max_out, in, out, scratch_size;
  enum dude_status status;

  prev = 0x60;
  max_out = *output_length;

  for (c = input[in = 0], out = 0;  c != 0;  c = input[++in], ++out) {

    /* At the start of each iteration, in and out are the number of */
    /* items already input/output, or equivalently, the indices of  */
    /* the next items to be input/output.                           */

    if (max_out - out < 1) return dude_big_output;

    if (c == 0x2D) output[out] = c;  /* hyphen-minus is literal */
    else {
      /* Base-32 sequence.  Decode quintets until 0xxxx is found: */

      for (diff = 0;  ;  c = input[++in]) {
        q = base32_decode(c);
        if (q == base32_invalid) return dude_bad_input;
        diff = (diff << 4) | (q & 0xF);
        if (q >> 4 == 0) break;
      }

      prev = output[out] = prev ^ diff;
    }

    /* Case of last character determines uppercase flag: */
    if (uppercase_flags) uppercase_flags[out] = c >= 65 && c <= 90;
  }

  /* Enforce the uniqueness of the encoding by re-encoding */
  /* the output and comparing the result to the input:     */

  scratch_size = ++in;
  status = dude_encode(out, output, uppercase_flags,
                       &scratch_size, scratch_space);
  if (status != dude_success || scratch_size != in ||
      unequal(case_sensitivity, scratch_space, input)
     ) return dude_bad_input;

  *output_length = out;
  return dude_success;
}

/* Wrapper for testing (would normally go in a separate .c file): */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* For testing, we'll just set some compile-time limits rather than */
/* use malloc(), and set a compile-time option rather than using a  */
/* command-line option.                                             */

enum {
  unicode_max_length = 256,
  ace_max_size = 256,
  test_case_sensitivity = case_insensitive
                          /* suitable for host names */
};

static void usage(char **argv)
{
  fprintf(stderr,
    "%s -e reads code points and writes a DUDE string.\n"
    "%s -d reads a DUDE string and writes code points.\n"
    "Input and output are plain text in the native character set.\n"
    "Code points are in the form u+hex separated by whitespace.\n"
    "A DUDE string is a newline-terminated sequence of LDH characters\n"
    "(without any signature).\n"
    "The case of the u in u+hex is the force-to-uppercase flag.\n"
    , argv[0], argv[0]);
  exit(EXIT_FAILURE);
}

static void fail(const char *msg)
{
  fputs(msg,stderr);
  exit(EXIT_FAILURE);
}

static const char too_big[] =
  "input or output is too large, recompile with larger limits\n";
static const char invalid_input[] = "invalid input\n";
static const char io_error[] = "I/O error\n";

/* The following string is used to convert LDH      */
/* characters between ASCII and the native charset: */

static const char ldh_ascii[] =
  "................"
  "................"
  ".............-.."
  "0123456789......"
  ".ABCDEFGHIJKLMNO"
  "PQRSTUVWXYZ....."
  ".abcdefghijklmno"
  "pqrstuvwxyz";

int main(int argc, char **argv)
{
  enum dude_status status;
  int r;
  char *p;

  if (argc != 2) usage(argv);
  if (argv[1][0] != '-') usage(argv);
  if (argv[1][2] != 0) usage(argv);

  if (argv[1][1] == 'e') {
    u_code_point input[unicode_max_length];
    unsigned long codept;
    unsigned char uppercase_flags[unicode_max_length];
    char output[ace_max_size], uplus[3];
    unsigned int input_length, output_size, i;

    /* Read the input code points: */

    input_length = 0;

    for (;;) {
      r = scanf("%2s%lx", uplus, &codept);
      if (ferror(stdin)) fail(io_error);
      if (r == EOF || r == 0) break;

      if (r != 2 || uplus[1] != '+' || codept > (u_code_point)-1) {
        fail(invalid_input);
      }

      if (input_length == unicode_max_length) fail(too_big);

      if (uplus[0] == 'u') uppercase_flags[input_length] = 0;
      else if (uplus[0] == 'U') uppercase_flags[input_length] = 1;
      else fail(invalid_input);

      input[input_length++] = codept;
    }

    /* Encode: */

    output_size = ace_max_size;
    status = dude_encode(input_length, input, uppercase_flags,
                         &output_size, output);
    if (status == dude_bad_input) fail(invalid_input);
    if (status == dude_big_output) fail(too_big);
    assert(status == dude_success);

    /* Convert to native charset and output: */

    for (p = output;  *p != 0;  ++p) {
      i = *p;
      assert(i <= 122 && ldh_ascii[i] != '.');
      *p = ldh_ascii[i];
    }

    r = puts(output);
    if (r == EOF) fail(io_error);
    return EXIT_SUCCESS;
  }

  if (argv[1][1] == 'd') {
    char input[ace_max_size], scratch[ace_max_size], *pp;
    u_code_point output[unicode_max_length];
    unsigned char uppercase_flags[unicode_max_length];
    unsigned int input_length, output_length, i;

    /* Read the DUDE input string and convert to ASCII: */

    fgets(input, ace_max_size, stdin);
    if (ferror(stdin)) fail(io_error);
    if (feof(stdin)) fail(invalid_input);
    input_length = strlen(input);
    if (input[input_length - 1] != '\n') fail(too_big);
    input[--input_length] = 0;

    for (p = input;  *p != 0;  ++p) {
      pp = strchr(ldh_ascii, *p);
      if (pp == 0) fail(invalid_input);
      *p = pp - ldh_ascii;
    }

    /* Decode: */

    output_length = unicode_max_length;
    status = dude_decode(test_case_sensitivity, scratch, input,
                         &output_length, output, uppercase_flags);
    if (status == dude_bad_input) fail(invalid_input);
    if (status == dude_big_output) fail(too_big);
    assert(status == dude_success);

    /* Output the result: */

    for (i = 0;  i < output_length;  ++i) {
      r = printf("%s+%04lX\n",
                 uppercase_flags[i] ? "U" : "u",
                 (unsigned long) output[i] );
      if (r < 0) fail(io_error);
    }

    return EXIT_SUCCESS;
  }

  usage(argv);
  return EXIT_SUCCESS;  /* not reached, but quiets compiler warning */
}
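
To complement the listing above, the following sketch shows only the
decoding direction: quintets are accumulated four bits at a time until
one without the continuation bit is seen, and the accumulated value is
XORed into the previous code point.  It is a minimal sketch under stated
assumptions, not the draft's decoder: sketch_alphabet, sketch_quintet,
sketch_decode, and sketch_decode_selftest are hypothetical names, an
ASCII execution character set is assumed, and both the uppercase flags
and the re-encoding check that enforces uniqueness are omitted.

```c
#include <assert.h>
#include <string.h>

/* Assumption: ASCII execution character set. */
static const char sketch_alphabet[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Returns the quintet value (0 to 31) of a base-32 character,   */
/* ignoring case, or -1 if c is not in the alphabet.             */
static int sketch_quintet(char c)
{
  const char *p;

  if (c >= 'A' && c <= 'Z') c = (char)(c - 'A' + 'a');
  if (c == '\0') return -1;   /* strchr would match the terminator */
  p = strchr(sketch_alphabet, c);
  return p ? (int)(p - sketch_alphabet) : -1;
}

/* Decodes the null-terminated string s into out[], writing at   */
/* most max code points.  Returns the number of code points, or  */
/* -1 if s is malformed or out[] is too small.                   */
static int sketch_decode(const char *s, unsigned long out[], int max)
{
  unsigned long prev = 0x60, diff;
  int n = 0, q;

  while (*s != '\0') {
    if (n == max) return -1;
    if (*s == '-') {          /* hyphen-minus is literal */
      out[n++] = 0x2D;
      ++s;
      continue;
    }
    diff = 0;
    do {                      /* 1xxxx continues, 0xxxx ends */
      q = sketch_quintet(*s++);
      if (q < 0) return -1;
      diff = (diff << 4) | (unsigned long)(q & 0xF);
    } while (q & 0x10);
    prev ^= diff;
    out[n++] = prev;
  }
  return n;
}

static void sketch_decode_selftest(void)
{
  unsigned long cp[8];

  assert(sketch_decode("vsucg", cp, 8) == 2);
  assert(cp[0] == 0x3042 && cp[1] == 0x3044);
  assert(sketch_decode("B-D", cp, 8) == 3);
  assert(cp[0] == 0x61 && cp[1] == 0x2D && cp[2] == 0x62);
  assert(sketch_decode("l", cp, 8) == -1);   /* 'l' is not base-32  */
  assert(sketch_decode("v", cp, 8) == -1);   /* truncated sequence  */
}
```

Because this sketch skips the re-encoding comparison, it accepts
non-canonical inputs that dude_decode() would reject; an application
relying on the decoded string for security decisions would still need
that check.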

                   INTERNET-DRAFT expires 2001-Dec-07