Network Working Group                                          C. Newman
Internet-Draft                                          Sun Microsystems
Expires: November 7, 2003                                    May 9, 2003


           Internet Application Protocol Comparator Registry
                  draft-newman-i18n-comparator-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 7, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   Many Internet application protocols include string-based lookup,
   searching, or sorting operations.  However, the problem space for
   searching and sorting international strings is large, not fully
   explored, and is outside the area of expertise for the Internet
   Engineering Task Force (IETF).  Rather than attempt to solve such a
   large problem, this specification creates an abstraction framework so
   that application protocols can precisely identify a comparison
   function and the repertoire of comparison functions can be extended
   in the future.







Newman                  Expires November 7, 2003                [Page 1]


Internet-Draft            Comparator Registry                   May 2003


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   1.1 Conventions Used in this Document  . . . . . . . . . . . . . .  3
   2.  Comparator Definition and Purpose  . . . . . . . . . . . . . .  3
   3.  Comparator Name Syntax . . . . . . . . . . . . . . . . . . . .  4
   4.  Comparator Specification Requirements  . . . . . . . . . . . .  5
   5.  Application Protocol Requirements  . . . . . . . . . . . . . .  7
   6.  Initial Comparators  . . . . . . . . . . . . . . . . . . . . .  9
   6.1 Octet Comparator . . . . . . . . . . . . . . . . . . . . . . .  9
   6.2 ASCII Numeric Comparator . . . . . . . . . . . . . . . . . . . 10
   6.3 ASCII Casemap Comparator . . . . . . . . . . . . . . . . . . . 10
   6.4 Nameprep Comparator  . . . . . . . . . . . . . . . . . . . . . 11
   6.5 Basic Comparator . . . . . . . . . . . . . . . . . . . . . . . 11
   7.  Use by ACAP and Sieve  . . . . . . . . . . . . . . . . . . . . 14
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 14
   8.1 Comparator Registration Procedure  . . . . . . . . . . . . . . 14
   8.2 Comparator Registration Template . . . . . . . . . . . . . . . 15
   8.3 Octet Comparator Registration  . . . . . . . . . . . . . . . . 15
   8.4 ASCII Numeric Comparator Registration  . . . . . . . . . . . . 16
   8.5 ASCII Casemap Comparator Registration  . . . . . . . . . . . . 16
   8.6 Nameprep Comparator Registration . . . . . . . . . . . . . . . 16
   8.7 Basic Comparator Registration  . . . . . . . . . . . . . . . . 17
   8.8 Structure of Comparator Registry . . . . . . . . . . . . . . . 17
   8.9 Example Initial Registry . . . . . . . . . . . . . . . . . . . 18
   9.  Guidelines for Expert Reviewer . . . . . . . . . . . . . . . . 18
   10. Security Considerations  . . . . . . . . . . . . . . . . . . . 19
   11. Open Issues  . . . . . . . . . . . . . . . . . . . . . . . . . 19
       Normative References . . . . . . . . . . . . . . . . . . . . . 20
       Informative References . . . . . . . . . . . . . . . . . . . . 20
       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 21
       Intellectual Property and Copyright Statements . . . . . . . . 22

1. Introduction

   The ACAP [10] specification introduced the concept of a comparator,
   but failed to create an IANA registry.  With the introduction of
   stringprep [5] and the Unicode Collation Algorithm [7], it is now
   time to create that registry and populate it with some initial values
   appropriate for an international community.  This specification
   replaces and generalizes the definition of a comparator in ACAP and
   creates a comparator registry.

1.1 Conventions Used in this Document

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [1].

   The attribute syntax specifications use the Augmented Backus-Naur
   Form (ABNF) [2] notation, including the core rules defined in
   Appendix A of [2].  This document also inherits ABNF rules from
   Language Tags [4].

2. Comparator Definition and Purpose

   A comparator is a named function which takes two arbitrary length
   octet strings (encoded in UTF-8 [3] for comparators which operate on
   characters) as input and can be used to perform one or more of three
   basic comparison operations: equality test, substring match and
   ordering test.

   Comparators provide a multi-protocol abstraction layer for comparison
   functions so the details of a particular comparison operation can be
   specified by someone with appropriate expertise independent of the
   application protocol that consumes that comparator.  This is similar
   to the way a charset [13] separates the details of octet to character
   mapping from a protocol specification such as MIME [8] or the way
   SASL [9] separates the details of an authentication mechanism from a
   protocol specification such as ACAP [10].

   Here is a small diagram to help illustrate the value of this
   abstraction layer:

                                                 +-----------------+
                                                 | Octet           |
   +-------------------+                      +--| Comparator Spec |
   | IMAP i18n SEARCH  |--+                   |  +-----------------+
   +-------------------+  |  +-------------+  |  +-----------------+
                          +--| Comparator  |--+--| A stringprep    |
   +-------------------+  |  | Registry    |  |  | Comparator Spec |
   | ACAP i18n SEARCH  |--+  +-------------+  |  +-----------------+
   +-------------------+                      |  +-----------------+
                                              |  | locale-specific |
                                              +--| Comparator Spec |
                                                 +-----------------+

   Thus IMAP, ACAP and future application protocols with international
   search capability simply specify how to interface to the comparator
   registry instead of each protocol spec having to specify all the
   comparators it supports.

   One component of a comparator is a canonicalization function which
   can be pre-applied to single strings and may enhance the performance
   of subsequent comparison operations.  Normally, this is an
   implementation detail of comparators, but at times it may be useful
   for an application protocol to expose comparator canonicalization
   in the protocol itself.  Comparator canonicalization can range from an
   identity mapping (e.g., the i;octet comparator) to a mapping which
   makes the string unreadable to a human (e.g., the basic comparator).

3. Comparator Name Syntax

   The comparator name itself is a single US-ASCII string beginning with
   a letter and made up of letters, digits, or one of the following 4
   symbols: "-", ";", "=" or ".".  The name MUST NOT be longer than 254
   characters.

     comparator-char  =  ALPHA / DIGIT / "-" / ";" / "=" / "."

     comparator-name  =  ALPHA *253comparator-char

   The string a client uses to select a comparator MAY contain a
   wildcard ("*") character which matches zero or more comparator-chars.
   Wildcard characters MUST NOT be adjacent.  Clients which support
   disconnected operation SHOULD NOT use wildcards to select a
   comparator, but clients which provide comparator operations only when
   connected to the server MAY use wildcards.  If the wildcard string
   matches multiple comparators, the server SHOULD select the comparator
   with the broadest scope (preferably international scope), the most
   recent table versions and the greatest number of supported
   operations.  A single wildcard character ("*") refers to the
   application protocol comparator behavior that would occur if no
   explicit negotiation were used.

   When used as a protocol element for ordering, the comparator name MAY
   be prefixed by either "+" or "-" to explicitly specify an ordering
   direction.  As described in Section 4, "+" has no effect on the
   ordering function, while "-" negates the result of the ordering
   function.  In general, comparator-order is used when a client
   requests a comparator, and comparator-sel is used when the server
   informs the client of the selected comparator.

     comparator-wild  =  ("*" / (ALPHA ["*"])) *(comparator-char ["*"])
                         ; MUST NOT exceed 255 characters total

     comparator-sel   =  ["+" / "-"] comparator-name

     comparator-order =  ["+" / "-"] comparator-wild
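
   As a non-normative illustration, the grammar above can be checked
   with regular expressions.  The following Python sketch is only one
   possible reading of the ABNF; the function names
   (is_comparator_name, is_comparator_wild, matching_names) are
   hypothetical and are not defined by this specification.

```python
import re

# comparator-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
_CHAR = r"[A-Za-z0-9;=.\-]"

# comparator-name = ALPHA *253comparator-char
_NAME_RE = re.compile(r"[A-Za-z]" + _CHAR + r"{0,253}")

# comparator-wild = ("*" / (ALPHA ["*"])) *(comparator-char ["*"])
_WILD_RE = re.compile(r"(?:\*|[A-Za-z]\*?)(?:" + _CHAR + r"\*?)*")


def is_comparator_name(s: str) -> bool:
    """True if s is a syntactically valid comparator-name."""
    return _NAME_RE.fullmatch(s) is not None


def is_comparator_wild(s: str) -> bool:
    """True if s is a valid comparator-wild (at most 255 characters)."""
    return len(s) <= 255 and _WILD_RE.fullmatch(s) is not None


def matching_names(pattern: str, names):
    """Return the names matched by a wildcard pattern.

    Each "*" matches zero or more comparator-chars.
    """
    if not is_comparator_wild(pattern):
        raise ValueError("not a valid comparator-wild string")
    parts = [re.escape(p) for p in pattern.split("*")]
    rx = re.compile((_CHAR + "*").join(parts))
    return [n for n in names if is_comparator_name(n) and rx.fullmatch(n)]
```

   The sketch does not model the special meaning of a lone "*" or the
   server's choice among several matching comparators; both are left
   to the application protocol and server as described above.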

   While this specification makes no absolute requirements on the
   structure of comparator names, naming consistency is important, so
   the following initial guidelines are provided.

   Comparator names with an international audience typically begin with
   "i;".  Comparator names intended for a particular language or locale
   typically begin with a language tag [4] followed by a ";".  After the
   first ";" is normally the name of the general comparator algorithm
   followed by a series of algorithm modifications separated by the "-"
   delimiter.  Parameterized modifications will use "=" to delimit the
   parameter from the value.  The version numbers of any lookup tables
   used by the algorithm SHOULD be present as parameterized
   modifications.  This MAY be followed by a ";" and a name for a set of
   customizations applied to the comparator algorithm.

   Comparator names of the form *;vnd-domain.com;* are reserved for
   vendor-specific comparators created by the owner of the domain name
   following the "vnd-" prefix.  Registration of such comparators (or
   the name space as a whole) with intended use of "Vendor" is
   encouraged when a public specification or open-source implementation
   is available, but is not required.

4. Comparator Specification Requirements

   A comparator specification MUST state which of the three basic
   functions are supported (equality, substring, ordering) and how to
   perform each of the supported functions on any two input
   octet-strings including empty strings.  Given a comparator with a
   specific name and any two fixed input strings, the result MUST be
   the same every time the comparison is performed.  The comparator
   specification MUST state whether the
   comparator operates on raw octets or on characters (in which case the
   UTF-8 charset is presumed).  Comparators MUST be transitive.

   A comparator specification MUST describe the internal
   canonicalization algorithm.  This algorithm can be applied to
   individual strings and the result strings can be stored to
   potentially optimize future comparison operations.  A comparator MAY
   specify that the canonicalization algorithm is the identity function.
   The output of the canonicalization algorithm MAY have no meaning to a
   human.

   Comparators which use more than one customizable lookup table in a
   documented format MUST assign numbers to the tables they use.  This
   permits an application protocol command to access the tables used by
   a server comparator.

   o  The equality function always returns "match" or "no-match" when
      supplied valid input and MAY return "error" if the input strings
      are not valid UTF-8 strings or violate other comparator
      constraints.

   o  The substring matching function determines if the first string is
      a substring of the second string.  A comparator which supports
      substring matching will automatically support the two special
      cases of substring matching: prefix and suffix matching if those
      special cases are supported by the application protocol.  It
      returns "match" or "no-match" when supplied valid input and
      returns "error" when supplied invalid input.

   o  The ordering function determines how two octet strings are
      ordered.  It returns "-1" if the first string is listed before the
      second string according to the comparator, "+1" if the second
      string is listed before the first string, and "0" if the two
      strings are equal.  If the order of the two strings is reversed,
      the result of the ordering function of the comparator MUST be
      negated.  In general, comparators SHOULD NOT return "0" unless the
      two octet sequences are identical.

      Since ordering is normally used to sort a list of items, "error"
      is not a useful return value from the ordering function.  Strings
      with errors that prevent the sorting algorithm from functioning
      correctly should sort to the end of the list.  Thus if the first
      string is invalid UTF-8 while the second string is valid, the
      result will be "+1".  If the second string is invalid UTF-8 while
      the first string is valid, the result will be "-1".  If the
      comparator is character-based, and both strings are invalid UTF-8,
      the result SHOULD match the result from the "i;octet" comparator.

      When the comparator is used with a "+" prefix, the behavior is the
      same as when used with no prefix.  When the comparator is used
      with a "-" prefix, results which would be "+1" are instead "-1"
      and results which would be "-1" are instead "+1".
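
   The error handling and prefix rules for the ordering function can
   be sketched as a thin wrapper around a comparator's own ordering.
   This Python example is illustrative only: the names order,
   char_order and octet_order are assumptions, and char_order stands
   for whatever ordering a character-based comparator defines for
   valid UTF-8 input.

```python
def _is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def octet_order(a: bytes, b: bytes) -> int:
    # Python compares bytes by unsigned octet value, with a proper
    # prefix sorting first, which matches the "i;octet" ordering.
    return (a > b) - (a < b)


def order(a: bytes, b: bytes, char_order, prefix: str = "+") -> int:
    """Ordering for a character-based comparator: returns -1, 0 or +1.

    char_order is a hypothetical callable that orders two valid
    UTF-8 strings and returns -1, 0 or +1.
    """
    a_ok, b_ok = _is_valid_utf8(a), _is_valid_utf8(b)
    if a_ok and b_ok:
        result = char_order(a, b)
    elif a_ok:
        result = -1   # invalid second string sorts after the first
    elif b_ok:
        result = +1   # invalid first string sorts after the second
    else:
        result = octet_order(a, b)   # both invalid: use "i;octet"
    # "+" (or no prefix) leaves the result alone; "-" negates it.
    return -result if prefix == "-" else result
```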

   Unless otherwise specified by the comparator or application protocol,
   a NULL string (as opposed to an empty string) is equal only to
   another NULL string, a NULL string is not a substring of any other
   string, and a NULL string sorts to a position after all non-NULL
   strings, but before strings which generate errors.

   Some application protocols will permit the use of multi-value
   attributes with a comparator.  This paragraph describes the rules
   that apply unless otherwise specified by the comparator or
   application protocol.  The equality and substring comparator
   algorithms will be iterated over each pair of single values from the
   two inputs.  If any combination produces an error, the result is an
   error.  Otherwise, if any combination produces a "match", the result
   is a match.  Otherwise the result is "no-match".  For the ordering
   function, the smallest ordinal octet string from the first set of
   values is compared to the smallest ordinal octet string from the
   second set of values.
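
   The default multi-value rules can be sketched as follows.  This
   Python example is non-normative.  It assumes both inputs are
   non-empty lists of single values, and it assumes that "smallest"
   is judged by the comparator's own ordering function; the names
   match_multi, order_multi, match_one and order_one are hypothetical.

```python
from functools import cmp_to_key
from itertools import product


def match_multi(values_a, values_b, match_one):
    """Equality or substring over multi-value inputs.

    match_one(a, b) returns "match", "no-match" or "error".
    """
    results = [match_one(a, b) for a, b in product(values_a, values_b)]
    if "error" in results:
        return "error"      # any error makes the whole result an error
    if "match" in results:
        return "match"      # otherwise any match is a match
    return "no-match"


def order_multi(values_a, values_b, order_one):
    """Order two multi-value inputs by comparing their smallest values.

    order_one(a, b) returns -1, 0 or +1.
    """
    key = cmp_to_key(order_one)
    return order_one(min(values_a, key=key), min(values_b, key=key))
```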

   Application protocols MAY return position information for substring
   matches.  If this is done, the position information MUST include both
   the starting offset and the ending offset in the string.  This is
   important because more sophisticated comparators can match strings of
   unequal length (for example, a pre-composed accented character will
   match a decomposed accented character).

   Comparator specifications intended for common use are expected to
   reference standards from standards bodies with significant experience
   dealing with the details of international character sets.

5. Application Protocol Requirements

   An application protocol which offers searching, substring matching
   and/or sorting and permits the use of characters outside the US-ASCII
   charset needs to consider the following requirements and issues:

   The protocol MUST provide a mechanism for the client to select the
   comparator to use with equality matching, substring matching and
   ordering.

   The protocol MUST specify how comparisons behave in the absence of an
   explicit comparator negotiation or when a comparator negotiation of
   "*" is used.  The protocol MAY specify that the default comparator
   used in such circumstances is sensitive to server configuration.

   The protocol SHOULD provide a way to list available comparators
   matching a given wildcard pattern or patterns.

   If the protocol provides positional information for the results of a
   substring match, that positional information MUST fully specify the
   substring in the result that matches independent of the length of the
   search string.  For example, returning both the starting and ending
   offset of the match would suffice, as would the starting offset and a
   length.  Returning just the starting offset is not acceptable.  This
   rule is necessary because advanced comparators can treat strings of
   different lengths as equal (for example, pre-composed and decomposed
   accented characters).
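
   To illustrate why a starting offset alone is insufficient, the
   Python sketch below uses Unicode canonical equivalence (NFC
   normalization from the standard unicodedata module) as a stand-in
   for an advanced comparator.  It is not one of the comparators
   defined in this document, the function name find_span is
   hypothetical, and offsets are counted in code points rather than
   octets.

```python
import unicodedata


def find_span(needle: str, haystack: str):
    """Return (start, end) of the first span of haystack that is
    canonically equivalent to needle, or None if there is none.

    Deliberately simple: it tries every span, so it is quadratic.
    """
    target = unicodedata.normalize("NFC", needle)
    for start in range(len(haystack)):
        for end in range(start + 1, len(haystack) + 1):
            span = unicodedata.normalize("NFC", haystack[start:end])
            if span == target:
                return start, end
    return None


# The search string has 4 code points (pre-composed e-acute), but the
# matching span in the stored string covers 5 (e + combining acute).
span = find_span("caf\u00e9", "cafe\u0301 au lait")
```

   Here the search string is four code points long while the matched
   span runs from offset 0 to offset 5, so a client given only the
   starting offset could not recover the end of the match.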

   If the protocol permits the use of comparators on stored character
   data which is not encoded with the UTF-8 charset, then the protocol
   specification has to describe relevant issues of the conversion.
   Details to consider include how to handle unknown charsets, any
   charsets which are mandatory-to-implement, any issues with byte-order
   that might apply, and any transfer encodings which need to be
   supported.

   If the protocol provides a canonicalization function for strings,
   then use of comparators MAY be appropriate for that function.

   If the protocol supports disconnected clients, then a mechanism for
   the client to precisely replicate the server's comparator algorithm
   is likely desirable.  Thus the protocol MAY wish to provide a command
   to fetch lookup tables used by charset conversions and comparators.

   The protocol specification should consider assigning protocol error
   codes for the following circumstances:

   o  The client requests the use of a comparator by name or pattern,
      but no implemented comparator matches that pattern.

   o  The client attempts to use a comparator for a function that is not
      supported by that comparator.  For example, attempting to use the
      "i;ascii-numeric" comparator for a substring matching function.

   o  The client uses an equality or substring matching comparator and
      the result is an error.  It may be appropriate to distinguish
      between the two input strings, particularly when one is supplied
      by the client and one is stored by the server.  It might also be
      appropriate to distinguish the specific case of an invalid UTF-8
      string.

   If the protocol permits the use of a comparator with data structures
   beyond those described in this specification (octet strings, NULL
   string, array of octet strings), the protocol MUST describe the
   default behavior for a comparator with that data structure.

6. Initial Comparators

   This section describes an initial set of comparators for the
   comparator registry.

6.1 Octet Comparator

   The "i;octet" comparator is a simple and fast comparator intended for
   use on binary octet strings rather than on character data.  It never
   returns an "error" result.  It provides equality, substring and
   ordering functions.  The ordering algorithm is as follows:

   1.  If both strings are the empty string, return the result "0".

   2.  If the first string is empty and the second is not, return the
       result "-1".

   3.  If the second string is empty and the first is not, return the
       result "+1".

   4.  If both strings begin with the same octet value, remove the first
       octet from both strings and repeat this algorithm from step 1.

   5.  If the unsigned value (0 to 255) of the first octet of the first
       string is less than the unsigned value of the first octet of the
       second string, then return "-1".

   6.  If this step is reached, return "+1".

   This algorithm is roughly equivalent to the C library function memcmp
   with appropriate length checks added.

   The equality function returns "match" if the ordering algorithm would
   return "0".  Otherwise the equality function returns "no-match".

   The substring function returns "match" if the first string is the
   empty string, or if there exists a substring of the second string of
   length equal to the length of the first string which would result in
   a "match" result from the equality function.  Otherwise the substring
   function returns "no-match".

   The associated canonicalization algorithm is the identity function.
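
   The three "i;octet" functions can be sketched in Python as shown
   below.  This is a non-normative illustration that follows the
   numbered ordering steps literally, using an index rather than
   removing octets; the function names are hypothetical.

```python
def octet_order(a: bytes, b: bytes) -> int:
    """Ordering function: returns -1, 0 or +1."""
    i = 0
    while True:
        a_empty, b_empty = i >= len(a), i >= len(b)
        if a_empty and b_empty:
            return 0        # step 1
        if a_empty:
            return -1       # step 2
        if b_empty:
            return +1       # step 3
        if a[i] == b[i]:
            i += 1          # step 4: drop the first octet and repeat
            continue
        # steps 5 and 6: compare unsigned octet values (0 to 255)
        return -1 if a[i] < b[i] else +1


def octet_equal(a: bytes, b: bytes) -> str:
    """Equality function: "match" when the ordering result is 0."""
    return "match" if octet_order(a, b) == 0 else "no-match"


def octet_substring(needle: bytes, haystack: bytes) -> str:
    """Substring function: is needle a substring of haystack?"""
    n = len(needle)
    if n == 0:
        return "match"
    for start in range(len(haystack) - n + 1):
        if octet_equal(needle, haystack[start:start + n]) == "match":
            return "match"
    return "no-match"
```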

6.2 ASCII Numeric Comparator

   The "i;ascii-numeric" comparator is a simple comparator intended for
   use with arbitrary sized decimal numbers stored as octet strings of
   US-ASCII digits (0x30 to 0x39).  It supports equality and ordering,
   but does not support the substring function.  The algorithm is as
   follows:

   1.  If neither string begins with a digit, return "error" if
       matching, or the result of the "i;octet" comparator for ordering.

   2.  If the first string begins with a digit and the second string
       does not, return "error" if matching and "-1" for ordering.

   3.  If the second string begins with a digit and the first string
       does not, return "error" if matching and "+1" for ordering.

   4.  Let "n" be the number of digits at the beginning of the first
       string, and "m" be the number of digits at the beginning of the
       second string.

   5.  If n is equal to m, return the result of the "i;octet"
       comparator.

   6.  If n is greater than m, prepend a string of "n - m" zeros to the
       second string and return the result of the "i;octet" comparator.

   7.  If m is greater than n, prepend a string of "m - n" zeros to the
       first string and return the result of the "i;octet" comparator.

   The associated canonicalization algorithm is to truncate the input
   string at the first non-digit character.
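
   A possible reading of the steps above is sketched in Python below.
   The text does not say whether steps 5 to 7 compare the whole
   strings or only their leading digits; this sketch assumes the
   latter, applying the canonicalization (truncation at the first
   non-digit) before comparing.  That choice and all function names
   are assumptions, not part of this specification.

```python
def _digit_prefix(s: bytes) -> bytes:
    """Canonicalization: truncate at the first non-digit octet."""
    n = 0
    while n < len(s) and 0x30 <= s[n] <= 0x39:
        n += 1
    return s[:n]


def _octet_order(a: bytes, b: bytes) -> int:
    # bytes compare by unsigned octet value, as "i;octet" does
    return (a > b) - (a < b)


def numeric_order(a: bytes, b: bytes) -> int:
    """Ordering function: returns -1, 0 or +1."""
    da, db = _digit_prefix(a), _digit_prefix(b)
    if not da and not db:
        return _octet_order(a, b)   # step 1 (ordering case)
    if not db:
        return -1                   # step 2
    if not da:
        return +1                   # step 3
    # steps 4 to 7: left-pad the shorter digit run with zeros
    width = max(len(da), len(db))
    return _octet_order(da.rjust(width, b"0"), db.rjust(width, b"0"))


def numeric_equal(a: bytes, b: bytes) -> str:
    """Equality function: "match", "no-match" or "error"."""
    da, db = _digit_prefix(a), _digit_prefix(b)
    if not da or not db:
        return "error"              # steps 1 to 3 (matching case)
    width = max(len(da), len(db))
    same = da.rjust(width, b"0") == db.rjust(width, b"0")
    return "match" if same else "no-match"
```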

6.3 ASCII Casemap Comparator

   The "en;ascii-casemap" comparator is a simple comparator intended for
   use with English language text in pure US-ASCII.  It provides
   equality, substring and ordering functions.  The algorithm first
   applies a canonicalization algorithm to both input strings which
   subtracts 32 (0x20) from all octet values between 97 (0x61) and 122
   (0x7A) inclusive.  The result of the comparator is then the same as
   the result of the "i;octet" comparator for the canonicalized strings.
   Care should be taken when using OS-supplied functions to implement
   this comparator as this is not locale sensitive, but functions such
   as strcasecmp and toupper can be locale sensitive.

   For historical reasons, in the context of ACAP and Sieve, the name
   "i;ascii-casemap" is a synonym for this comparator.
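
   A locale-independent implementation of this comparator can be
   sketched in Python as follows.  The example is non-normative and
   the function names are hypothetical.  It maps octets explicitly
   rather than calling a locale-aware case conversion routine.

```python
def casemap_canonicalize(s: bytes) -> bytes:
    # Subtract 0x20 from octets 0x61..0x7A ("a" to "z"); every other
    # octet, including those above 0x7F, is left unchanged.
    return bytes(o - 0x20 if 0x61 <= o <= 0x7A else o for o in s)


def casemap_order(a: bytes, b: bytes) -> int:
    """Ordering: "i;octet" ordering of the canonicalized strings."""
    ca, cb = casemap_canonicalize(a), casemap_canonicalize(b)
    return (ca > cb) - (ca < cb)


def casemap_equal(a: bytes, b: bytes) -> str:
    return "match" if casemap_order(a, b) == 0 else "no-match"


def casemap_substring(needle: bytes, haystack: bytes) -> str:
    found = casemap_canonicalize(needle) in casemap_canonicalize(haystack)
    return "match" if found else "no-match"
```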

6.4 Nameprep Comparator

   The "i;nameprep-v=1-uv=3.2" comparator is an implementation of the
   nameprep [6] specification based on normalization tables from Unicode
   version 3.2.  This comparator applies the nameprep canonicalization
   function to both input strings and then returns the result of the
   "i;octet" comparator on the canonicalized strings.  While this
   comparator offers all three functions, the ordering function it
   provides is inadequate for use by the majority of the world.

   Version number 1 is applied to nameprep as specified in RFC 3491.  If
   the nameprep specification is revised without any changes that would
   produce different results when given the same pair of input octet
   strings, then the version number will remain unchanged.

   The table numbers for tables used by nameprep are as follows:

                +--------------+-----------------------+
                | Table Number | Table Name            |
                +--------------+-----------------------+
                |            1 | UnicodeData-3.2.0.txt |
                |            2 | Table B.1             |
                |            3 | Table B.2             |
                |            4 | Table C.1.2           |
                |            5 | Table C.2.2           |
                |            6 | Table C.3             |
                |            7 | Table C.4             |
                |            8 | Table C.5             |
                |            9 | Table C.6             |
                |           10 | Table C.7             |
                |           11 | Table C.8             |
                |           12 | Table C.9             |
                +--------------+-----------------------+


6.5 Basic Comparator

   The basic comparator is intended to provide tolerable results for a
   number of languages for all three functions (equality, substring and
   ordering) so it is suitable as a mandatory-to-implement comparator
   for protocols which include ordering support.  The ordering function
   of the basic comparator is the Unicode Collation Algorithm [7]
   version 9 (UCAv9).

   The equality and substring functions are created as described in
   UCAv9 section 8.  While that section is informative to UCAv9, it is
   normative to this comparator specification.

   This comparator is based on Unicode version 3.2 and uses the
   following tables:

   1.  For the normalization step, UnicodeData-3.2.0.txt [15] is used.
       Column 5 is used to determine the canonical decomposition, while
       column 3 contains the canonical combining classes necessary to
       attain canonical order.

   2.  The table of characters which require a logical order exception
       is a subset of the table in PropList-3.2.0.txt [16] and is
       included here:

   0E40..0E44    ; Logical_Order_Exception
   # Lo   [5] THAI CHARACTER SARA E..THAI CHARACTER SARA AI MAIMALAI
   0EC0..0EC4    ; Logical_Order_Exception
   # Lo   [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI

   # Total code points: 10

   3.  The table used to translate normalized code points to a sort key
       is allkeys-3.1.1.txt [17].

   UCAv9 includes a number of configurable parameters and steps labelled
   as potentially optional.  The following list summarizes the defaults
   used by this comparator:

   o  The logical order exception step is mandatory by default to
      support the largest number of languages.

   o  Steps 2.1.1 to 2.1.3 are mandatory as the repertoire of the basic
      comparator is intended to be large.

   o  The second level in the sort key is evaluated forwards by default.

   o  The variable weighting uses the "non-ignorable" option by default.

   o  The semi-stable option is not used by default.

   o  Support for exactly three levels of collation is the default
      behavior.

   o  No preprocessing step is used by the basic comparator prior to
      applying the UCAv9 algorithm.  Note that an application protocol
      specification MAY require pre-processing prior to the use of any
      comparators.

   o  The equality and substring algorithms exclude differences at
      levels 2 and 3 by default (thus it is case-insensitive and ignores
      accentual distinctions).

   o  The equality and substring algorithms use the "Whole Characters
      Only" feature described in UCAv9 section 8 by default.

   The exact comparator name with these defaults is
   "i;basic-uca=3.1.1-uv=3.2".  When a specification states that the
   basic comparator is mandatory-to-implement, only this specific name
   is mandatory-to-implement.

   In order to allow modification of the optional behaviors, the
   following ABNF is used for variations of the basic comparator:

     basic-comparator  =  ("i" / Language-Tag) ";basic" basic-modifiers
                          "-uca=3.1.1-uv=3.2"
                          [";custom=" 1*comparator-char ]

     basic-modifiers   =  ["-2backwards"]
                          ["-blanked" / "-shifted" / "-shift-trimmed"]
                          ["-semi-stable"]
                          ["-quatlu"]
                          ["-match=accent" / "-match=case"]

   If multiple modifiers appear, they MUST appear in the order described
   above.  The modifiers have the following meanings:

   2backwards     When this modifier is selected, the order of the
                  second level sort keys is reversed.  This is useful
                  for French customizations.

   blanked        Use the "blanked" variable weighting option described
                  in UCAv9 section 3.2.2 rather than the default
                  "non-ignorable".

   shifted        Use the "shifted" variable weighting option described
                  in UCAv9 section 3.2.2 rather than the default
                  "non-ignorable".

   shift-trimmed  Use the "shift-trimmed" variable weighting option
                  described in UCAv9 section 3.2.2 rather than the
                  default "non-ignorable".

   semi-stable    Use the "semi-stable" option.  This involves appending
                  the input string to the end of the computed sort keys
                  so that only two identical strings will produce a
                  result of "0" from the order function.

   quatlu         Use the 4th level weight from the allkeys file for the
                  ordering function.

   match=accent   Both the first and second levels of the sort keys are
                  considered relevant to the equality and substring
                  operations (rather than the default of first level
                  only).  This makes the matching functions sensitive to
                  accentual distinctions.

   match=case     The first three levels of sort keys are considered
                  relevant to the equality and substring operations.
                  This makes the matching functions sensitive to both
                  case and accentual distinctions.
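
   As a non-normative illustration of the match and semi-stable
   modifiers above, the following Python sketch models a sort key as a
   tuple of per-level weight tuples.  That representation, the names
   MATCH_LEVELS, keys_equal and semi_stable_key, and the weights
   themselves are assumptions made for this example only; none of them
   is defined by UCAv9 or by this specification.

```python
# Illustrative only: a sort key is modeled as a tuple of per-level
# weight tuples (level 1 first).  Real keys come from the UCAv9
# algorithm; the weights used below are invented for the example.

# Number of sort key levels considered by equality and substring.
MATCH_LEVELS = {
    None: 1,            # default: first level only
    "match=accent": 2,  # first and second levels
    "match=case": 3,    # first three levels
}


def keys_equal(key_a, key_b, modifier=None):
    """Compare two sort keys using only the levels the modifier selects."""
    levels = MATCH_LEVELS[modifier]
    return key_a[:levels] == key_b[:levels]


def semi_stable_key(sort_key, original):
    """Pair the key with the input string as a final tie-breaker."""
    return (sort_key, original)


# Hypothetical keys for two strings that differ only in accents:
# identical level 1 weights, different level 2 weights.
plain = ((10, 20, 30), (1, 1, 1), (1, 1, 1))
accented = ((10, 20, 30), (1, 2, 1), (1, 1, 1))

print(keys_equal(plain, accented))                  # default matching
print(keys_equal(plain, accented, "match=accent"))  # accent-sensitive
```

   With these invented keys the default comparison treats the two
   strings as equal, while "match=accent" and "match=case" do not.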

   The canonicalization algorithm associated with this comparator is the
   output of step 3 of the UCAv9 algorithm (described in section 4.3 of
   the UCA specification).  This canonicalization is not suitable for
   human consumption.

   Finally, the UCAv9 algorithm permits the "allkeys" table to be
   customized.  People who make quality customizations are encouraged to
   register those customizations using the comparator registry.
   Customization names beginning with "x" are reserved for experimental
   use, are treated as "Limited use" and MUST NOT match wildcards if any
   registered comparator is available that does match.
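
   The wildcard rule for experimental customizations can be sketched
   as follows.  This is a rough illustration, not a definitive
   implementation: it assumes that "*" matches zero or more characters
   and borrows Python's fnmatch module for that purpose (which also
   gives "?" and "[" special meaning, so names are assumed not to
   contain them).  The function name match_wildcard, the
   (name, experimental) pairs, and the comparator name containing
   "xtest" are hypothetical.

```python
from fnmatch import fnmatchcase


def match_wildcard(pattern, comparators):
    """Return the comparator names selected by a wildcard pattern.

    `comparators` is an iterable of (name, experimental) pairs, where
    `experimental` is True for a customization name beginning with "x".
    How that flag is derived from a name is outside this sketch.
    """
    matches = [(name, exp) for name, exp in comparators
               if fnmatchcase(name, pattern)]
    registered = [name for name, exp in matches if not exp]
    if registered:
        # A registered comparator matches, so experimental ones
        # MUST NOT match the wildcard.
        return registered
    return [name for name, exp in matches]


# Hypothetical set of comparators available to an implementation.
available = [
    ("i;basic-uca=3.1.1-uv=3.2", False),
    ("i;basic-xtest-uca=3.1.1-uv=3.2", True),
]

print(match_wildcard("i;basic-*uca=3.1.1-uv=3.2*", available))
```

   In this sketch an experimental comparator is returned only when no
   registered comparator matches the same pattern.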

7. Use by ACAP and Sieve

   Both ACAP [10] and Sieve [14] are standards track specifications
   that used comparators before this specification and registry were
   created.  Those standards do not meet all the application protocol
   requirements described in Section 5.  For backwards compatibility,
   those protocols use the comparator name "i;ascii-casemap" instead
   of "en;ascii-casemap".

8. IANA Considerations

8.1 Comparator Registration Procedure

   IANA will create a mailing list comparator@iana.org which can be used
   for public discussion of comparator proposals prior to registration.
   Use of the mailing list is encouraged but not required.  The actual
   registration procedure will not begin until the completed
   registration template is sent to iana@iana.org.  The IESG will
   appoint a designated expert who will monitor the comparator@iana.org
   mailing list and review registrations forwarded from IANA.  The
   designated expert is expected to tell IANA and the submitter of the
   registration within two weeks whether the registration is approved,
   approved with minor changes, or rejected with cause.  When a
   registration is rejected with cause, it can be re-submitted if the
   concerns listed in the cause are addressed.  Decisions made by the
   designated expert can be appealed to the IESG and subsequently follow
   the normal appeals procedure for IESG decisions.

   Comparator registrations in a standards track, BCP or IESG-approved
   experimental RFC are owned by the IESG and changes to the
   registration follow normal procedures for updating such documents.
   Comparator registrations in other RFCs are owned by the RFC
   author(s).  Other comparator registrations are owned by the
   individual(s) listed in the contact field of the registration and
   IANA will preserve this information.  Changes to a registration MUST
   be approved by the owner.  In the event the owner can't be contacted
   for a period of one month and a change is deemed necessary, the IESG
   MAY re-assign ownership to an appropriate party.

8.2 Comparator Registration Template

   Comparator Name: {see comparator-wild syntax (Section 3)}

   Published Specification(s):

   Supported Functions: {one or more of "equality", "substring" and
   "order"}

   Scope: {one of "i18n", "Local" or "Other"}

   Intended Use: {one of "Common", "Limited", "Vendor" or "Deprecated"}

   Person and email address to contact for further information:
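
   For illustration, the template can be modeled as a small record
   with a check of the enumerated fields.  The class name
   Registration, its field names, and the problems method are
   assumptions of this sketch rather than part of the registration
   procedure, and the comparator-wild syntax from Section 3 is
   deliberately not checked here.

```python
from dataclasses import dataclass

# Allowed values, copied from the template above.
FUNCTIONS = {"equality", "substring", "order"}
SCOPES = {"i18n", "Local", "Other"}
INTENDED_USES = {"Common", "Limited", "Vendor", "Deprecated"}


@dataclass(frozen=True)
class Registration:
    name: str
    specification: str
    functions: frozenset
    scope: str
    intended_use: str
    contact: str

    def problems(self):
        """Return the template fields that are missing or invalid."""
        found = []
        if not self.name:
            found.append("Comparator Name")
        if not self.functions or not self.functions <= FUNCTIONS:
            found.append("Supported Functions")
        if self.scope not in SCOPES:
            found.append("Scope")
        if self.intended_use not in INTENDED_USES:
            found.append("Intended Use")
        if not self.contact:
            found.append("Contact")
        return found


# The octet comparator registration, expressed as a record.
octet = Registration(
    name="i;octet",
    specification="RFC XXXX Section 6.1",
    functions=frozenset({"equality", "substring", "order"}),
    scope="Other",
    intended_use="Common",
    contact="See the Author's Address section",
)

print(octet.problems())
```

   An empty list means that every enumerated field holds one of the
   values the template permits.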

8.3 Octet Comparator Registration

   Comparator Name: i;octet

   Published Specification: RFC XXXX Section 6.1

   Supported Functions: equality, substring, order

   Scope: Other

   Intended Use: Common

   Person and email address to contact for further information:

   See the "Author's Address" section near the end of this
   specification.

8.4 ASCII Numeric Comparator Registration

   Comparator Name: i;ascii-numeric

   Published Specification: RFC XXXX Section 6.2

   Supported Functions: equality, order

   Scope: Other

   Intended Use: Limited

   Person and email address to contact for further information:

   See the "Author's Address" section near the end of this
   specification.

8.5 ASCII Casemap Comparator Registration

   Comparator Name: en;ascii-casemap

   Published Specification: RFC XXXX Section 6.3

   Supported Functions: equality, substring, order

   Scope: Local

   Intended Use: Deprecated

   Person and email address to contact for further information:

   See the "Author's Address" section near the end of this
   specification.

8.6 Nameprep Comparator Registration

   Comparator Name: i;nameprep-v=1-uv=3.2

   Published Specification: RFC XXXX Section 6.4

   Supported Functions: equality, substring, order

   Scope: i18n

   Intended Use: Common

   Person and email address to contact for further information:

   See the "Author's Address" section near the end of this
   specification.

8.7 Basic Comparator Registration

   Comparator Name: i;basic-*uca=3.1.1-uv=3.2*

   Published Specification: RFC XXXX Section 6.5

   Supported Functions: equality, substring, order

   Scope: i18n

   Intended Use: Common

   Person and email address to contact for further information:

   See the "Author's Address" section near the end of this
   specification.

8.8 Structure of Comparator Registry

   The comparator registry itself is divided into four sections.  The
   first section is for comparators intended for common use.  This
   section is intended for comparator registrations published in IESG
   approved RFCs or for locally scoped comparators from the primary
   standards body for that locale.  The designated expert is encouraged
   to reject comparator registrations with an intended use of "common"
   if the expert believes it should be "limited", as it is desirable to
   keep the number of "common" registrations small and high quality.
   The second section is reserved for limited use comparators.  The
   third section is reserved for registered vendor-specific comparators.
   The final section is reserved for deprecated comparators.

8.9 Example Initial Registry

   The following is an example of how IANA might structure the initial
   registry:

     Comparator                        Functions Scope Reference
     ----------                        --------- ----- ---------
   Common Use Comparators:
     i;octet                           e, s, o   Other [RFC XXXX]
     i;nameprep-v=1-uv=3.2             e, s, o   i18n  [RFC XXXX]
     i;basic-*uca=3.1.1-uv=3.2*        e, s, o   i18n  [RFC XXXX]

   Limited Use Comparators:
     i;ascii-numeric                   e, o      Other [RFC XXXX]

   Vendor Comparators:

   Deprecated Comparators:
     en;ascii-casemap                  e, s, o   Local [RFC XXXX]


   References
   ----------
   [RFC XXXX]  Newman, C., "Internet Application Protocol Comparator
               Registry", RFC XXXX, Sun Microsystems, May 2003.
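
   As a rough sketch of the four-section structure, the following
   Python fragment groups registrations by their Intended Use field.
   The names SECTIONS and group_by_intended_use are hypothetical, and
   the Intended Use values are those given in the registrations in
   Sections 8.3 through 8.7.

```python
# The four registry sections, in the order they are described.
SECTIONS = ["Common", "Limited", "Vendor", "Deprecated"]


def group_by_intended_use(registrations):
    """Group (name, intended_use) pairs into the registry sections."""
    registry = {section: [] for section in SECTIONS}
    for name, intended_use in registrations:
        registry[intended_use].append(name)
    return registry


# Intended Use values copied from Sections 8.3 through 8.7.
initial = [
    ("i;octet", "Common"),
    ("i;ascii-numeric", "Limited"),
    ("en;ascii-casemap", "Deprecated"),
    ("i;nameprep-v=1-uv=3.2", "Common"),
    ("i;basic-*uca=3.1.1-uv=3.2*", "Common"),
]

for section, names in group_by_intended_use(initial).items():
    print(section + ":")
    for name in names:
        print("  " + name)
```

   Sections with no registrations, such as the vendor section here,
   are still present in the result so that the layout stays fixed.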


9. Guidelines for Expert Reviewer

   The expert reviewer appointed by the IESG has fairly broad latitude
   for this registry.  While a number of comparators are expected
   (particularly customizations of the basic comparator for localized
   use), an explosion of comparators (particularly common use
   comparators) is not desirable for widespread interoperability.
   However, it is important for the expert reviewer to provide cause
   when rejecting a registration, and when possible to describe
   corrective action to permit the registration to proceed.  The
   following list includes some example reasons to reject a
   registration with cause:

   o  The registration has intended use of "common", but there is no
      evidence the comparator will be widely deployed so it should be
      listed as "limited".

   o  The registration has intended use of "common", but is redundant
      with the functionality of a previously registered "common"
      comparator.

   o  The comparator name fails to precisely identify the version
      numbers of relevant tables to use.

   o  The registration fails to meet one of the "MUST" requirements in
      Section 4.

   o  The comparator name fails to meet the syntax in Section 3.

   o  The comparator specification referenced in the registration is
      vague or has optional features without a clear behavior specified.

   o  The referenced specification does not adequately address security
      considerations specific to that comparator.


10. Security Considerations

   Comparators will normally be used with UTF-8 strings.  Thus the
   security considerations for UTF-8 [3] and stringprep [5] also apply
   and are normative to this specification.

11. Open Issues

   1.  Should we permit non-ASCII characters in the comparator name?
       The benefit of allowing non-ASCII characters in the comparator
       name is it would make presenting comparators to an end-user
       simple, particularly if comparator names were structured to
       include a user-friendly name as part of the conventional
       structure.  However, because there are other solutions to the
       problem of user friendly selection of a comparator which would
       add less complexity to the common case, the author has erred on
       the side of simplicity.  As long as the set of common use
       comparators (excluding versions) is relatively small, user
       friendly names can be part of the client (ideally admin
       configuration for the client).  If the registry becomes large,
       then a lookup service could be used to translate a comparator
       name into a user-friendly name.

   2.  Is any Nameprep processing appropriate for the basic comparator?
       Because a result of "0" from an ordering algorithm is
       undesirable, much of the nameprep processing is inappropriate.
       Furthermore, a result of "error" which is important for nameprep
       is generally inappropriate as an internal result in an ordering
       algorithm since it makes the results less intuitive.  The sort
       key table also eliminates most problematic characters from
       consideration if the appropriate comparator modifier is used.
       Finally, exact compatibility with the Unicode Collation Algorithm
       is deemed desirable by the author, as even the smallest variation
       may require implementation of largely duplicate code.  However,
       this decision is outside the author's expertise, so alternate
       viewpoints are welcome.

   3.  The ICU implementation of the UCA includes additional
       algorithmic customizations such as the ability to be
       case-sensitive while at the same time being insensitive to
       accents.  Should these customizations be added to this
       specification?

   4.  Should a format for customization data for the basic comparator
       be defined so that disconnected clients might have the option of
       downloading that information?

Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
        Specifications: ABNF", RFC 2234, November 1997.

   [3]  Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC
        2279, January 1998.

   [4]  Alvestrand, H., "Tags for the Identification of Languages", BCP
        47, RFC 3066, January 2001.

   [5]  Hoffman, P. and M. Blanchet, "Preparation of Internationalized
        Strings ("stringprep")", RFC 3454, December 2002.

   [6]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
        Internationalized Domain Names (IDN)", RFC 3491, March 2003.

   [7]  Davis, M. and K. Whistler, "Unicode Collation Algorithm version
        9", July 2002, <http://www.unicode.org/reports/tr10/
        tr10-9.html>.

Informative References

   [8]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.

   [9]   Myers, J., "Simple Authentication and Security Layer (SASL)",
         RFC 2222, October 1997.

   [10]  Newman, C. and J. Myers, "ACAP -- Application Configuration
         Access Protocol", RFC 2244, November 1997.

   [11]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
         Considerations Section in RFCs", BCP 26, RFC 2434, October
         1998.

   [12]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.

   [13]  Freed, N. and J. Postel, "IANA Charset Registration
         Procedures", BCP 19, RFC 2978, October 2000.

   [14]  Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
         January 2001.

URIs

   [15]  <http://www.unicode.org/Public/3.2-Update/
         UnicodeData-3.2.0.txt>

   [16]  <http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.txt>

   [17]  <http://www.unicode.org/reports/tr10/allkeys-3.1.1.txt>


Author's Address

   Chris Newman
   Sun Microsystems
   1050 Lakes Drive
   West Covina, CA  91790
   US

   EMail: chris.newman@sun.com


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.


Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.