draft-ietf-idn-amc-ace-z-00.txt   draft-ietf-idn-amc-ace-z-01.txt 
INTERNET-DRAFT Adam M. Costello INTERNET-DRAFT Adam M. Costello
draft-ietf-idn-amc-ace-z-00.txt 2001-Aug-16 draft-ietf-idn-amc-ace-z-01.txt 2001-Sep-04
Expires 2002-Feb-16 Expires 2002-Mar-04
AMC-ACE-Z version 0.3.0 AMC-ACE-Z version 0.3.1
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note Task Force (IETF), its areas, and its working groups. Note
that other groups may also distribute working documents as that other groups may also distribute working documents as
Internet-Drafts. Internet-Drafts.
skipping to change at line 64 skipping to change at line 64
3.1 Basic code point segregation 3.1 Basic code point segregation
3.2 Insertion unsort coding 3.2 Insertion unsort coding
3.3 Generalized variable-length integers 3.3 Generalized variable-length integers
3.4 Bias adaptation 3.4 Bias adaptation
4. Bootstring parameters 4. Bootstring parameters
5. Parameter values for AMC-ACE-Z 5. Parameter values for AMC-ACE-Z
6. Bootstring algorithms 6. Bootstring algorithms
6.1 Bias adaptation function 6.1 Bias adaptation function
6.2 Decoding procedure 6.2 Decoding procedure
6.3 Encoding procedure 6.3 Encoding procedure
6.4 Alternative methods for handling overflow
7. AMC-ACE-Z example strings 7. AMC-ACE-Z example strings
8. Security considerations 8. Security considerations
9. References 9. References
A. Author contact information A. Author contact information
B. Mixed-case annotation B. Mixed-case annotation
C. AMC-ACE-Z sample implementation C. AMC-ACE-Z sample implementation
1. Introduction 1. Introduction
The IDNA draft [IDNA] describes an architecture for supporting The IDNA draft [IDNA] describes an architecture for supporting
skipping to change at line 122 skipping to change at line 123
AMC-ACE-Z is a working name that should be changed if it is adopted. AMC-ACE-Z is a working name that should be changed if it is adopted.
(The Z merely indicates that it is the twenty-sixth ACE devised by (The Z merely indicates that it is the twenty-sixth ACE devised by
this author. Most were not worth releasing.) this author. Most were not worth releasing.)
2. Terminology 2. Terminology
The key words "must", "shall", "required", "should", "recommended", The key words "must", "shall", "required", "should", "recommended",
and "may" in this document are to be interpreted as described in RFC and "may" in this document are to be interpreted as described in RFC
2119 [RFC2119]. 2119 [RFC2119].
A "code point" is an integral value associated with a character in a
coded character set.
As in the Unicode Standard [UNICODE], Unicode code points are As in the Unicode Standard [UNICODE], Unicode code points are
denoted by "U+" followed by four to six hexadecimal digits, while a denoted by "U+" followed by four to six hexadecimal digits, while a
range of code points is denoted by two hexadecimal numbers separated range of code points is denoted by two hexadecimal numbers separated
by "..", with no prefixes. by "..", with no prefixes.
The operators div and mod perform integer division; (x div y) is the The operators div and mod perform integer division; (x div y) is the
quotient of x divided by y, discarding the remainder, and (x mod y) quotient of x divided by y, discarding the remainder, and (x mod y)
is the remainder, so (x div y) * y + (x mod y) == x. Bootstring is the remainder, so (x div y) * y + (x mod y) == x. Bootstring
uses these operators only with nonnegative operands, so the quotient uses these operators only with nonnegative operands, so the quotient
and remainder are always nonnegative. and remainder are always nonnegative.
The ?: operator is a conditional; (x ? y : z) means y if x is true,
z if x is false. It is just like "if x then y else z" except that y
and z are expressions rather than statements.
The "break" statement jumps out of the innermost loop (as in C). The "break" statement jumps out of the innermost loop (as in C).
An "overflow" is an attempt to compute a value that exceeds the
maximum value of an integer variable.
3. Bootstring description 3. Bootstring description
Bootstring represents an arbitrary sequence of code points (the Bootstring represents an arbitrary sequence of code points (the
"extended string") as a sequence of basic code points (the "extended string") as a sequence of basic code points (the
"basic string"). This section describes the representation. "basic string"). This section describes the representation.
Section 6 "Bootstring algorithms" presents the algorithms as Section 6 "Bootstring algorithms" presents the algorithms as
pseudocode. pseudocode.
3.1 Basic code point segregation 3.1 Basic code point segregation
skipping to change at line 255 skipping to change at line 258
integral value. integral value.
Bootstring uses little-endian ordering so that the deltas can be Bootstring uses little-endian ordering so that the deltas can be
separated starting with the first. The t(j) values are defined in separated starting with the first. The t(j) values are defined in
terms of the constants base, tmin, and tmax, and a state variable terms of the constants base, tmin, and tmax, and a state variable
called bias: called bias:
t(j) = base * (j + 1) - bias, t(j) = base * (j + 1) - bias,
clamped to the range tmin through tmax clamped to the range tmin through tmax
(The clamping means that if the formula yields a value less than The clamping means that if the formula yields a value less than tmin
tmin or greater than tmax, then t(j) = tmin or tmax, respectively.) or greater than tmax, then t(j) = tmin or tmax, respectively. (In
These t(j) values cause the representation to favor integers within the pseudocode in section 6 "Bootstring algorithms", the expression
a particular range determined by the bias. base * (j + 1) is denoted by k for performance reasons.) These
t(j) values cause the representation to favor integers within a
particular range determined by the bias.
3.4 Bias adaptation 3.4 Bias adaptation
After each delta is encoded or decoded, bias is set for the next After each delta is encoded or decoded, bias is set for the next
delta as follows: delta as follows:
1. Delta is scaled in order to avoid overflow in the next step: 1. Delta is scaled in order to avoid overflow in the next step:
let delta = delta div 2 let delta = delta div 2
skipping to change at line 308 skipping to change at line 313
expected to be last, tmin for the less significant digits up through expected to be last, tmin for the less significant digits up through
the one expected to be third-last, and somewhere between tmin and the one expected to be third-last, and somewhere between tmin and
tmax for the digit expected to be second-last (balancing the hope of tmax for the digit expected to be second-last (balancing the hope of
the expected-last digit being unnecessary against the danger of it the expected-last digit being unnecessary against the danger of it
being insufficient). being insufficient).
4. Bootstring parameters 4. Bootstring parameters
Given a set of basic code points, one must be designated as Given a set of basic code points, one must be designated as
the delimiter. The base can be no greater than the number of the delimiter. The base can be no greater than the number of
distinguishable basic code points remaining. The values 0 through distinguishable basic code points remaining. The digit-values
base-1 must be associated with non-delimiter basic code points. in the range 0 through base-1 must be associated with distinct
In some cases multiple code points must represent the same value; non-delimiter basic code points. In some cases multiple code points
for example, uppercase and lowercase versions of a letter must be must have the same digit-value; for example, uppercase and lowercase
equivalent if basic strings are case-insensitive. versions of the same letter must be equivalent if basic strings are
case-insensitive.
The initial value of n must be no greater than the minimum non-basic The initial value of n must be no greater than the minimum non-basic
code point that could appear in extended strings. code point that could appear in extended strings.
The remaining five parameters (tmin, tmax, skew, damp, and the The remaining five parameters (tmin, tmax, skew, damp, and the
initial value of bias) must satisfy the following constraints: initial value of bias) must satisfy the following constraints:
0 <= tmin <= tmax <= base-1 0 <= tmin <= tmax <= base-1
skew >= 1 skew >= 1
damp >= 2 damp >= 2
skipping to change at line 342 skipping to change at line 348
5. Parameter values for AMC-ACE-Z 5. Parameter values for AMC-ACE-Z
AMC-ACE-Z uses the following Bootstring parameter values: AMC-ACE-Z uses the following Bootstring parameter values:
base = 36 base = 36
tmin = 1 tmin = 1
tmax = 26 tmax = 26
skew = 38 skew = 38
damp = 700 damp = 700
initial_bias = 72 initial_bias = 72
initial_n = U+0080 initial_n = 0x80
In AMC-ACE-Z, code points are Unicode code points [UNICODE], that In AMC-ACE-Z, code points are Unicode code points [UNICODE], that
is, integers in the range 0..10FFFF, but not D800..DFFF, which are is, integers in the range 0..10FFFF, but not D800..DFFF, which are
reserved for use by UTF-16. The basic code points are the ASCII reserved for use by UTF-16. The basic code points are the ASCII
code points (0..7F), some of which have values associated with them: code points (0..7F), of which U+002D (-) is the delimiter, and some
of the others have digit-values as follows:
U+002D (-) = delimiter code points digit-values
------------ ----------------------
41..5A (A-Z) = 0 to 25, respectively 41..5A (A-Z) = 0 to 25, respectively
61..7A (a-z) = 0 to 25, respectively 61..7A (a-z) = 0 to 25, respectively
30..39 (0-9) = 26 to 35, respectively 30..39 (0-9) = 26 to 35, respectively
Using hyphen-minus as the delimiter implies that the ACE can end Using hyphen-minus as the delimiter implies that the ACE can end
with a hyphen-minus only if the Unicode string consists entirely with a hyphen-minus only if the Unicode string consists entirely
of basic code points, but IDNA forbids such strings from being of basic code points, but IDNA forbids such strings from being
ACE-encoded. Furthermore, the ACE can begin with a hyphen-minus ACE-encoded. Furthermore, the ACE can begin with a hyphen-minus
only if the Unicode string does, which is forbidden by IDNA. only if the Unicode string does, which is forbidden by IDNA.
Therefore IDNA using AMC-ACE-Z, regardless of whether the signature Therefore IDNA using AMC-ACE-Z, regardless of whether the signature
skipping to change at line 390 skipping to change at line 398
Such ambiguities are usually resolved by context, but in an ACE Such ambiguities are usually resolved by context, but in an ACE
there is no context apparent to humans. there is no context apparent to humans.
6. Bootstring algorithms 6. Bootstring algorithms
Some parts of the pseudocode can be omitted if the parameters Some parts of the pseudocode can be omitted if the parameters
satisfy certain conditions (for which AMC-ACE-Z qualifies). These satisfy certain conditions (for which AMC-ACE-Z qualifies). These
parts are enclosed in {braces}, and notes immediately following the parts are enclosed in {braces}, and notes immediately following the
pseudocode explain the conditions under which they may be omitted. pseudocode explain the conditions under which they may be omitted.
Formally, code points are integers, and hence the pseudocode assumes
that arithmetic operations can be performed directly on code points.
Some actual programming languages might require explicit conversion
between code points and integers.
6.1 Bias adaptation function 6.1 Bias adaptation function
function adapt(delta,numpoints,firsttime): function adapt(delta,numpoints,firsttime):
let delta = delta div (firsttime ? damp : 2) if firsttime then let delta = delta div damp
else let delta = delta div 2
let delta = delta + (delta div numpoints) let delta = delta + (delta div numpoints)
let k = 0 let k = 0
while delta > ((base - tmin) * tmax) div 2 while delta > ((base - tmin) * tmax) div 2 do begin
do let delta = delta div (base - tmin) and let k = k + base let delta = delta div (base - tmin)
let k = k + base
end
return k + (((base - tmin + 1) * delta) div (delta + skew)) return k + (((base - tmin + 1) * delta) div (delta + skew))
It does not matter whether the modifications to delta and k
inside adapt() affect variables of the same name inside the
encoding/decoding procedures, because after calling adapt() the
caller does not read those variables before overwriting them.
6.2 Decoding procedure 6.2 Decoding procedure
let n = initial_n let n = initial_n
let i = 0 let i = 0
let bias = initial_bias let bias = initial_bias
let output = an empty string indexed from 0 let output = an empty string indexed from 0
consume all code points before the last delimiter (if there is one) consume all code points before the last delimiter (if there is one)
and copy them to output, fail on any non-basic code point and copy them to output, fail on any non-basic code point
if more than zero code points were consumed then consume one more if more than zero code points were consumed then consume one more
(which will be the last delimiter)
while the input is not exhausted do begin while the input is not exhausted do begin
let oldi = i let oldi = i
let w = 1 let w = 1
for k = base to infinity in steps of base do begin for k = base to infinity in steps of base do begin
consume a code point, fail on end-of-input consume a code point, or fail if there was none to consume
let digit = the code point's value, fail if it has no value let digit = the code point's digit-value, fail if it has none
let i = i + digit * w, fail on overflow let i = i + digit * w, fail on overflow
let t = k <= bias ? tmin : k - bias > tmax ? tmax : k - bias let t = tmin if k <= bias, tmax if k >= bias + tmax, or
k - bias otherwise
if digit < t then break if digit < t then break
let w = w * (base - t), fail on overflow let w = w * (base - t), fail on overflow
end end
let bias = adapt(i - oldi, length(output) + 1, oldi == 0) let bias = adapt(i - oldi, length(output) + 1, test oldi is 0?)
let n = n + i div (length(output) + 1), fail on overflow let n = n + i div (length(output) + 1), fail on overflow
let i = i mod (length(output) + 1) let i = i mod (length(output) + 1)
{if n is a basic code point then fail} {if n is a basic code point then fail}
insert n into output at position i insert n into output at position i
increment i increment i
end end
The statement enclosed in braces (checking whether n is a basic The statement enclosed in braces (checking whether n is a basic
code point) may be omitted if initial_n exceeds all basic code code point) may be omitted if initial_n exceeds all basic code
points (which is true for AMC-ACE-Z), because n is never less than points (which is true for AMC-ACE-Z), because n is never less than
initial_n. initial_n.
Because the decoder state can only advance monotonically, and there Because the decoder state can only advance monotonically, and there
is only one representation of any delta, there is therefore only is only one representation of any delta, there is therefore only
one encoded string that can represent a given sequence of integers. one encoded string that can represent a given sequence of integers.
The only error conditions are invalid code points, unexpected The only error conditions are invalid code points, unexpected
end-of-input, overflow (attempts to compute values that exceed the end-of-input, overflow, and basic code points encoded using deltas
maximum value of an integer variable), and basic code points encoded instead of appearing literally. If the decoder fails on these
using deltas instead of appearing literally. If the decoder fails errors as shown above, then it cannot produce the same output for
on these errors as shown above, then it cannot produce the same two distinct inputs, and hence it need not re-encode its output to
output for two distinct inputs, and hence it need not re-encode its verify that it matches the input.
output to verify that it matches the input.
The assignment of t, where t is clamped to the range tmin through The assignment of t, where t is clamped to the range tmin through
tmax, does not handle the case where bias < k < bias + tmin, but tmax, does not handle the case where bias < k < bias + tmin, but
that is impossible because of the way bias is computed and because that is impossible because of the way bias is computed and because
of the constraints on the parameters. of the constraints on the parameters.
If the programming language does not provide overflow detection, If the programming language does not provide overflow detection,
the following technique can be used. Suppose A, B, and C are the following technique can be used. Suppose A, B, and C are
representable nonnegative integers and C is nonzero. Then A + B representable nonnegative integers and C is nonzero. Then A + B
overflows if and only if B > maxint - A, and A + (B * C) overflows overflows if and only if B > maxint - A, and A + (B * C) overflows
if and only if B > (maxint - A) div C. See appendix C "AMC-ACE-Z if and only if B > (maxint - A) div C, where maxint is the greatest
sample implementation" for demonstrations of this technique. integer for which maxint + 1 cannot be represented. Refer to
appendix C "AMC-ACE-Z sample implementation" for demonstrations of
this technique in the C language. See also section 6.4 "Alternative
methods for handling overflow".
6.3 Encoding procedure 6.3 Encoding procedure
let n = initial_n let n = initial_n
let delta = 0 let delta = 0
let bias = initial_bias let bias = initial_bias
let h = b = the number of basic code points in the input let h = b = the number of basic code points in the input
copy them to the output in order, followed by a delimiter if b > 0 copy them to the output in order, followed by a delimiter if b > 0
{if the input contains a non-basic code point < n then fail} {if the input contains a non-basic code point < n then fail}
while h < length(input) do begin while h < length(input) do begin
let m = the minimum {non-basic} code point >= n in the input let m = the minimum {non-basic} code point >= n in the input
let delta = delta + (m - n) * (h + 1), fail on overflow let delta = delta + (m - n) * (h + 1), fail on overflow
let n = m let n = m
for each code point m in the input (in order) do begin for each code point c in the input (in order) do begin
if m < n {or m is basic} then increment delta, fail on overflow if c < n {or c is basic} then increment delta, fail on overflow
if m == n then begin if c == n then begin
let q = delta let q = delta
for k = base to infinity in steps of base do begin for k = base to infinity in steps of base do begin
let t = k <= bias ? tmin : k - bias > tmax ? tmax : k - bias let t = tmin if k <= bias, tmax if k >= bias + tmax, or
k - bias otherwise
if q < t then break if q < t then break
output the code point for digit t + ((q - t) mod (base - t)) output the code point for digit t + ((q - t) mod (base - t))
let q = (q - t) div (base - t) let q = (q - t) div (base - t)
end end
output the code point for digit q output the code point for digit q
let bias = adapt(delta, h + 1, h == b) let bias = adapt(delta, h + 1, test h equals b?)
let delta = 0 let delta = 0
increment h increment h
end end
end end
increment delta and n increment delta and n
end end
The full statement enclosed in braces (checking whether the input The full statement enclosed in braces (checking whether the input
contains a non-basic code point less than n) can be omitted if all contains a non-basic code point less than n) can be omitted if all
code points less than initial_n are basic code points (which is true code points less than initial_n are basic code points (which is true
for AMC-ACE-Z if code points are unsigned). for AMC-ACE-Z if code points are unsigned).
The brace-enclosed conditions "non-basic" and "or m is basic" can be The brace-enclosed conditions "non-basic" and "or m is basic" can be
omitted if initial_n exceeds all basic code points (which is true omitted if initial_n exceeds all basic code points (which is true
for AMC-ACE-Z), because the code point being tested is never less for AMC-ACE-Z), because the code point being tested is never less
than initial_n. than initial_n.
skipping to change at line 497 skipping to change at line 525
code points less than initial_n are basic code points (which is true code points less than initial_n are basic code points (which is true
for AMC-ACE-Z if code points are unsigned). for AMC-ACE-Z if code points are unsigned).
The brace-enclosed conditions "non-basic" and "or m is basic" can be The brace-enclosed conditions "non-basic" and "or m is basic" can be
omitted if initial_n exceeds all basic code points (which is true omitted if initial_n exceeds all basic code points (which is true
for AMC-ACE-Z), because the code point being tested is never less for AMC-ACE-Z), because the code point being tested is never less
than initial_n. than initial_n.
The checks for overflow are necessary to avoid producing invalid The checks for overflow are necessary to avoid producing invalid
output when the input contains very large values or is very long. output when the input contains very large values or is very long.
Wider integer variables can handle more extreme inputs. For Wider integer variables can handle more extreme inputs. For IDNA,
AMC-ACE-Z, 26-bit unsigned integers are sufficient, because any 26-bit unsigned integers are sufficient, because any string that
string that required a 27-bit delta would have to exceed either required a 27-bit delta would have to exceed either the code point
the code point limit (0..10FFFF) or the label length limit (63 limit (0..10FFFF) or the label length limit (63 characters).
characters).
The increment of delta at the bottom of the outer loop cannot The increment of delta at the bottom of the outer loop cannot
overflow because delta < length(input) before the increment, and overflow because delta < length(input) before the increment, and
length(input) is already assumed to be representable. The increment length(input) is already assumed to be representable. The increment
of n could overflow, but only if h == length(input), in which case of n could overflow, but only if h == length(input), in which case
the procedure is finished anyway. the procedure is finished anyway.
6.4 Alternative methods for handling overflow
The encoding and decoding algorithms handle overflow by detecting
it whenever it happens. Another approach is to enforce limits on
the inputs that prevent overflow from happening. For example, if
the encoder were to verify that no input code points exceed M and
that the input length does not exceed L, then no delta could ever
exceed (M - initial_n) * (L + 1), and hence no overflow could occur
if integer variables were capable of representing values that large.
This prevention approach would impose more restrictions on the input
than the detection approach does, but might be considered simpler in
some programming languages.
In theory, the decoder could use an analogous approach, limiting the
number of digits in a variable-length integer (that is, limiting the
number of iterations in the innermost loop). However, the number
of digits that suffice to represent a given delta can sometimes
represent much larger deltas (because of the adaptation), and hence
this approach would probably require integers wider than 32 bits.
Yet another approach for the decoder is to allow overflow to occur,
but to check the final output string by re-encoding it and comparing
to the decoder input. If and only if they do not match (using a
case-insensitive ASCII comparison) overflow has occurred. This
delayed-detection approach would not impose any more restrictions on
the input than the immediate-detection approach does, and might be
considered simpler in some programming languages.
7. AMC-ACE-Z example strings 7. AMC-ACE-Z example strings
In the AMC-ACE-Z encodings below, the IDNA signature prefix is not In the AMC-ACE-Z encodings below, the IDNA signature prefix is not
shown. AMC-ACE-Z is abbreviated AMC-Z. Backslashes show where line shown. AMC-ACE-Z is abbreviated AMC-Z. Backslashes show where line
breaks have been inserted in strings too long for one line. breaks have been inserted in strings too long for one line.
The first several examples are all translations of the sentence "Why The first several examples are all translations of the sentence "Why
can't they just speak in <language>?" (courtesy of Michael Kaplan's can't they just speak in <language>?" (courtesy of Michael Kaplan's
"provincial" page [PROVINCIAL]). Word breaks and punctuation have "provincial" page [PROVINCIAL]). Word breaks and punctuation have
been removed, as is often done in domain names. been removed, as is often done in domain names.
skipping to change at line 676 skipping to change at line 731
Table Specification", 1985-Oct, RFC 952. Table Specification", 1985-Oct, RFC 952.
[RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities", [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
1987-Nov, RFC 1034. 1987-Nov, RFC 1034.
[UNICODE] The Unicode Consortium, "The Unicode Standard", [UNICODE] The Unicode Consortium, "The Unicode Standard",
http://www.unicode.org/unicode/standard/standard.html. http://www.unicode.org/unicode/standard/standard.html.
A. Author contact information A. Author contact information
Adam M. Costello <amc@cs.berkeley.edu> Adam M. Costello
University of California, Berkeley University of California, Berkeley
http://www.cs.berkeley.edu/~amc/ http://www.cs.berkeley.edu/~amc/
B. Mixed-case annotation B. Mixed-case annotation
In order to use AMC-ACE-Z to represent case-insensitive strings, In order to use AMC-ACE-Z to represent case-insensitive strings,
higher layers need to case-fold the strings prior to AMC-ACE-Z higher layers need to case-fold the strings prior to AMC-ACE-Z
encoding. The encoded string can, however, use mixed case as an encoding. The encoded string can, however, use mixed case as an
annotation telling how to convert the original folded string into a annotation telling how to convert the original folded string into a
mixed-case string for display purposes. mixed-case string for display purposes.
skipping to change at line 701 skipping to change at line 756
the last of which provides the annotation. If it is uppercase, the last of which provides the annotation. If it is uppercase,
it is a suggestion to map the non-basic code point to uppercase it is a suggestion to map the non-basic code point to uppercase
(if possible); if it is lowercase, it is a suggestion to map the (if possible); if it is lowercase, it is a suggestion to map the
non-basic code point to lowercase (if possible). non-basic code point to lowercase (if possible).
AMC-ACE-Z encoders and decoders are not required to support these AMC-ACE-Z encoders and decoders are not required to support these
annotations, and higher layers need not use them. annotations, and higher layers need not use them.
C. AMC-ACE-Z sample implementation C. AMC-ACE-Z sample implementation
/******************************************/ /********************************************/
/* amc-ace-z.c 0.3.0 (2001-Aug-07-Tue) */ /* amc-ace-z.c 0.3.1 (2001-Sep-01-Sat) */
/* Adam M. Costello <amc@cs.berkeley.edu> */ /* http://www.cs.berkeley.edu/~amc/charset/ */
/******************************************/ /* Adam M. Costello */
/* http://www.cs.berkeley.edu/~amc/ */
/********************************************/
/* This is ANSI C code (C89) implementing AMC-ACE-Z version 0.3.x. */ /* This is ANSI C code (C89) implementing AMC-ACE-Z version 0.3.x. */
/************************************************************/ /************************************************************/
/* Public interface (would normally go in its own .h file): */ /* Public interface (would normally go in its own .h file): */
#include <limits.h> #include <limits.h>
enum amc_ace_status { enum amc_ace_status {
amc_ace_success, amc_ace_success,
skipping to change at line 829 skipping to change at line 887
/* 0..25 map to ASCII a..z or A..Z */ /* 0..25 map to ASCII a..z or A..Z */
/* 26..35 map to ASCII 0..9 */ /* 26..35 map to ASCII 0..9 */
} }
/* flagged(bcp) tests whether a basic code point is flagged */ /* flagged(bcp) tests whether a basic code point is flagged */
/* (uppercase). The behavior is undefined if bcp is not a */ /* (uppercase). The behavior is undefined if bcp is not a */
/* basic code point. */ /* basic code point. */
#define flagged(bcp) ((amc_ace_z_uint)(bcp) - 65 < 26) #define flagged(bcp) ((amc_ace_z_uint)(bcp) - 65 < 26)
/*** Useful constants ***/ /*** Platform-specific constants ***/
/* maxint is the maximum value of an amc_ace_z_uint variable: */ /* maxint is the maximum value of an amc_ace_z_uint variable: */
static const amc_ace_z_uint maxint = -1; static const amc_ace_z_uint maxint = -1;
/* Because maxint is unsigned, -1 becomes the maximum value. */
/* lobase and cutoff are used in the calculation of bias: */ /*** Bias adaptation function ***/
enum { lobase = base - tmin, cutoff = lobase * tmax / 2 };
static amc_ace_z_uint adapt(
amc_ace_z_uint delta, amc_ace_z_uint numpoints, int firsttime )
{
amc_ace_z_uint k;
delta = firsttime ? delta / damp : delta >> 1;
/* delta >> 1 is a faster way of doing delta / 2 */
delta += delta / numpoints;
for (k = 0; delta > ((base - tmin) * tmax) / 2; k += base) {
delta /= base - tmin;
}
return k + (base - tmin + 1) * delta / (delta + skew);
}
/*** Main encode function ***/ /*** Main encode function ***/
enum amc_ace_status amc_ace_z_encode( enum amc_ace_status amc_ace_z_encode(
amc_ace_z_uint input_length, amc_ace_z_uint input_length,
const amc_ace_z_uint input[], const amc_ace_z_uint input[],
const unsigned char uppercase_flags[], const unsigned char uppercase_flags[],
amc_ace_z_uint *output_length, amc_ace_z_uint *output_length,
char output[] ) char output[] )
{ {
skipping to change at line 893 skipping to change at line 967
} }
/* Increase delta enough to advance the decoder's */ /* Increase delta enough to advance the decoder's */
/* <n,i> state to <m,0>, but guard against overflow: */ /* <n,i> state to <m,0>, but guard against overflow: */
if (m - n > (maxint - delta) / (h + 1)) return amc_ace_overflow; if (m - n > (maxint - delta) / (h + 1)) return amc_ace_overflow;
delta += (m - n) * (h + 1); delta += (m - n) * (h + 1);
n = m; n = m;
for (j = 0; j < input_length; ++j) { for (j = 0; j < input_length; ++j) {
#if 0 /* AMC-ACE-Z does not need to check whether input[j] is basic: */
if (input[j] < n || basic(input[j])) { if (input[j] < n /* || basic(input[j]) */ ) {
if (++delta == 0) return amc_ace_overflow; if (++delta == 0) return amc_ace_overflow;
} }
#endif
/* AMC-ACE-Z can use this simplified version instead: */
if (input[j] < n && ++delta == 0) return amc_ace_overflow;
if (input[j] == n) { if (input[j] == n) {
/* Represent delta as a generalized variable-length integer: */ /* Represent delta as a generalized variable-length integer: */
for (q = delta, k = base; ; k += base) { for (q = delta, k = base; ; k += base) {
if (out >= max_out) return amc_ace_big_output; if (out >= max_out) return amc_ace_big_output;
t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias; t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias;
if (q < t) break; if (q < t) break;
output[out++] = encode_digit(t + (q - t) % (base - t), 0); output[out++] = encode_digit(t + (q - t) % (base - t), 0);
q = (q - t) / (base - t); q = (q - t) / (base - t);
} }
output[out++] = output[out++] =
encode_digit(q, uppercase_flags && uppercase_flags[j]); encode_digit(q, uppercase_flags && uppercase_flags[j]);
skipping to change at line 914 skipping to change at line 984
for (q = delta, k = base; ; k += base) { for (q = delta, k = base; ; k += base) {
if (out >= max_out) return amc_ace_big_output; if (out >= max_out) return amc_ace_big_output;
t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias; t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias;
if (q < t) break; if (q < t) break;
output[out++] = encode_digit(t + (q - t) % (base - t), 0); output[out++] = encode_digit(t + (q - t) % (base - t), 0);
q = (q - t) / (base - t); q = (q - t) / (base - t);
} }
output[out++] = output[out++] =
encode_digit(q, uppercase_flags && uppercase_flags[j]); encode_digit(q, uppercase_flags && uppercase_flags[j]);
bias = adapt(delta, h + 1, h == b);
/* Adapt the bias: */
delta = h == b ? delta / damp : delta >> 1;
delta += delta / (h + 1);
for (bias = 0; delta > cutoff; bias += base) delta /= lobase;
bias += (lobase + 1) * delta / (delta + skew);
delta = 0; delta = 0;
++h; ++h;
} }
} }
++delta, ++n; ++delta, ++n;
} }
*output_length = out; *output_length = out;
return amc_ace_success; return amc_ace_success;
} }
/*** Main decode function ***/ /*** Main decode function ***/
enum amc_ace_status amc_ace_z_decode( enum amc_ace_status amc_ace_z_decode(
skipping to change at line 941 skipping to change at line 1006
/*** Main decode function ***/ /*** Main decode function ***/
enum amc_ace_status amc_ace_z_decode( enum amc_ace_status amc_ace_z_decode(
amc_ace_z_uint input_length, amc_ace_z_uint input_length,
const char input[], const char input[],
amc_ace_z_uint *output_length, amc_ace_z_uint *output_length,
amc_ace_z_uint output[], amc_ace_z_uint output[],
unsigned char uppercase_flags[] ) unsigned char uppercase_flags[] )
{ {
amc_ace_z_uint n, out, i, max_out, bias, b, j, amc_ace_z_uint n, out, i, max_out, bias,
in, oldi, w, k, delta, digit, t; b, j, in, oldi, w, k, digit, t;
/* Initialize the state: */ /* Initialize the state: */
n = initial_n; n = initial_n;
out = i = 0; out = i = 0;
max_out = *output_length; max_out = *output_length;
bias = initial_bias; bias = initial_bias;
/* Handle the basic code points: Let b be the number of input code */ /* Handle the basic code points: Let b be the number of input code */
/* points before the last delimiter, or 0 if there is none, then */ /* points before the last delimiter, or 0 if there is none, then */
skipping to change at line 988 skipping to change at line 1053
digit = decode_digit(input[in++]); digit = decode_digit(input[in++]);
if (digit >= base) return amc_ace_bad_input; if (digit >= base) return amc_ace_bad_input;
if (digit > (maxint - i) / w) return amc_ace_overflow; if (digit > (maxint - i) / w) return amc_ace_overflow;
i += digit * w; i += digit * w;
t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias; t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias;
if (digit < t) break; if (digit < t) break;
if (w > maxint / (base - t)) return amc_ace_overflow; if (w > maxint / (base - t)) return amc_ace_overflow;
w *= (base - t); w *= (base - t);
} }
/* Adapt the bias: */ bias = adapt(i - oldi, out + 1, oldi == 0);
delta = oldi == 0 ? i / damp : (i - oldi) >> 1;
delta += delta / (out + 1);
for (bias = 0; delta > cutoff; bias += base) delta /= lobase;
bias += (lobase + 1) * delta / (delta + skew);
/* i was supposed to wrap around from out+1 to 0, */ /* i was supposed to wrap around from out+1 to 0, */
/* incrementing n each time, so we'll fix that now: */ /* incrementing n each time, so we'll fix that now: */
if (i / (out + 1) > maxint - n) return amc_ace_overflow; if (i / (out + 1) > maxint - n) return amc_ace_overflow;
n += i / (out + 1); n += i / (out + 1);
i %= (out + 1); i %= (out + 1);
/* Insert n at position i of the output: */ /* Insert n at position i of the output: */
skipping to change at line 1190 skipping to change at line 1250
if (r < 0) fail(io_error); if (r < 0) fail(io_error);
} }
return EXIT_SUCCESS; return EXIT_SUCCESS;
} }
usage(argv); usage(argv);
return EXIT_SUCCESS; /* not reached, but quiets compiler warning */ return EXIT_SUCCESS; /* not reached, but quiets compiler warning */
} }
INTERNET-DRAFT expires 2002-Feb-16 INTERNET-DRAFT expires 2002-Mar-04
 End of changes. 41 change blocks. 
72 lines changed or deleted 132 lines changed or added

This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/