Network Working Group C. Newman
Internet-Draft Sun Microsystems
Expires: September 3, 2006 M. Duerst
AGU
A. Gulbrandsen
Oryx
March 2, 2006
Internet Application Protocol Collation Registry
draft-newman-i18n-comparator-07.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 3, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
Many Internet application protocols include string-based lookup,
searching, or sorting operations. However, the problem space for
searching and sorting international strings is large, not fully
explored, and is outside the area of expertise for the Internet
Engineering Task Force (IETF). Rather than attempt to solve such a
Newman, et al. Expires September 3, 2006 [Page 1]
Internet-Draft Collation Registry March 2006
large problem, this specification creates an abstraction framework so
that application protocols can precisely identify a comparison
function and the repertoire of comparison functions can be extended
in the future.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Conventions Used in this Document . . . . . . . . . . . . 4
2. Collation Definition and Purpose . . . . . . . . . . . . . . . 4
2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3. Some Other Terms Used in this Document . . . . . . . . . . 5
2.4. Sort Keys . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Collation Name Syntax . . . . . . . . . . . . . . . . . . . . 6
3.1. Basic Syntax . . . . . . . . . . . . . . . . . . . . . . . 6
3.2. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 6
3.3. Ordering Direction . . . . . . . . . . . . . . . . . . . . 6
3.4. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.5. Naming Guidelines . . . . . . . . . . . . . . . . . . . . 7
4. Collation Specification Requirements . . . . . . . . . . . . . 7
4.1. API . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.2. Operations Supported . . . . . . . . . . . . . . . . . . . 8
4.2.1. Equality . . . . . . . . . . . . . . . . . . . . . . . 8
4.2.2. Substring . . . . . . . . . . . . . . . . . . . . . . 9
4.2.3. Ordering . . . . . . . . . . . . . . . . . . . . . . . 9
4.3. Internal Canonicalization Algorithm . . . . . . . . . . . 10
4.4. Use of Lookup Tables . . . . . . . . . . . . . . . . . . . 10
4.5. Multi-Value Attributes . . . . . . . . . . . . . . . . . . 10
5. Application Protocol Requirements . . . . . . . . . . . . . . 11
5.1. Character Encoding . . . . . . . . . . . . . . . . . . . . 11
5.2. Operations . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 11
5.4. Canonicalization Function . . . . . . . . . . . . . . . . 12
5.5. Disconnected Clients . . . . . . . . . . . . . . . . . . . 12
5.6. Error Codes . . . . . . . . . . . . . . . . . . . . . . . 12
5.7. Octet Collation . . . . . . . . . . . . . . . . . . . . . 12
6. Use by Existing Protocols . . . . . . . . . . . . . . . . . . 13
7. Collation Registration . . . . . . . . . . . . . . . . . . . . 13
7.1. Collation Registration Procedure . . . . . . . . . . . . . 13
7.2. Collation Registration Format . . . . . . . . . . . . . . 14
7.2.1. Registration Template . . . . . . . . . . . . . . . . 14
7.2.2. The collation Element . . . . . . . . . . . . . . . . 14
7.2.3. The name Element . . . . . . . . . . . . . . . . . . . 15
7.2.4. The title Element . . . . . . . . . . . . . . . . . . 15
7.2.5. The functions Element . . . . . . . . . . . . . . . . 15
7.2.6. The specification Element . . . . . . . . . . . . . . 15
7.2.7. The submitter Element . . . . . . . . . . . . . . . . 15
7.2.8. The owner Element . . . . . . . . . . . . . . . . . . 15
7.2.9. The version Element . . . . . . . . . . . . . . . . . 16
7.2.10. The UnicodeVersion Element . . . . . . . . . . . . . . 16
7.2.11. The UCAVersion Element . . . . . . . . . . . . . . . . 16
7.2.12. The UCAMatchLevel Element . . . . . . . . . . . . . . 16
7.3. DTD for Collation Registration . . . . . . . . . . . . . . 17
7.4. Structure of Collation Registry . . . . . . . . . . . . . 17
7.5. Example Initial Registry Summary . . . . . . . . . . . . . 18
8. Guidelines for Expert Reviewer . . . . . . . . . . . . . . . . 18
9. Initial Collations . . . . . . . . . . . . . . . . . . . . . . 19
9.1. ASCII Numeric Collation . . . . . . . . . . . . . . . . . 19
9.1.1. ASCII Numeric Collation Description . . . . . . . . . 19
9.1.2. ASCII Numeric Collation Registration . . . . . . . . . 20
9.2. ASCII Casemap Collation . . . . . . . . . . . . . . . . . 20
9.2.1. ASCII Casemap Collation Description . . . . . . . . . 20
9.2.2. Legacy English Casemap Collation Registration . . . . 21
9.2.3. English Casemap Collation Registration . . . . . . . . 21
9.3. Nameprep Collation . . . . . . . . . . . . . . . . . . . . 21
9.3.1. Nameprep Collation Description . . . . . . . . . . . . 21
9.3.2. Nameprep Collation Registration . . . . . . . . . . . 22
9.4. Basic Collation . . . . . . . . . . . . . . . . . . . . . 22
9.4.1. Basic Collation Description . . . . . . . . . . . . . 22
9.4.2. Basic Collation Registration . . . . . . . . . . . . . 24
9.4.3. Basic Accent Sensitive Match Collation Registration . 25
9.4.4. Basic Case Sensitive Match Collation Registration . . 25
9.5. Octet Collation . . . . . . . . . . . . . . . . . . . . . 25
9.5.1. Octet Collation Description . . . . . . . . . . . . . 25
9.5.2. Octet Collation Registration . . . . . . . . . . . . . 26
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26
11. Security Considerations . . . . . . . . . . . . . . . . . . . 27
12. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 27
13. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 27
13.1. Changes From -06 . . . . . . . . . . . . . . . . . . . . . 27
13.2. Changes From -05 . . . . . . . . . . . . . . . . . . . . . 28
13.3. Changes From -04 . . . . . . . . . . . . . . . . . . . . . 28
13.4. Changes From -03 . . . . . . . . . . . . . . . . . . . . . 28
13.5. Changes From -02 . . . . . . . . . . . . . . . . . . . . . 29
13.6. Changes From -01 . . . . . . . . . . . . . . . . . . . . . 29
13.7. Changes From -00 . . . . . . . . . . . . . . . . . . . . . 29
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30
14.1. Normative References . . . . . . . . . . . . . . . . . . . 30
14.2. Informative References . . . . . . . . . . . . . . . . . . 30
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 32
Intellectual Property and Copyright Statements . . . . . . . . . . 33
1. Introduction
The ACAP [12] specification introduced the concept of a comparator
(which we call collation in this document), but failed to create an
IANA registry. With the introduction of stringprep [6] and the
Unicode Collation Algorithm [8], it is now time to create that
registry and populate it with some initial values appropriate for an
international community. This specification replaces and generalizes
the definition of a comparator in ACAP and creates a collation
registry.
1.1. Conventions Used in this Document
The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
in this document are to be interpreted as defined in "Key words for
use in RFCs to Indicate Requirement Levels" [1].
The attribute syntax specifications use the Augmented Backus-Naur
Form (ABNF) [2] notation, including the core rules defined in
Appendix A of [2]. This document also inherits ABNF rules from
Language Tags [5].
2. Collation Definition and Purpose
2.1. Definition
A collation is a named function which takes two arbitrary length
strings as input and can be used to perform one or more of three
basic comparison operations: equality test, substring match, and
ordering test.
2.2. Purpose
Collations provide a multi-protocol abstraction layer for comparison
functions so the details of a particular comparison operation can be
specified by someone with appropriate expertise independent of the
application protocol that consumes that collation. This is similar
to the way a charset [14] separates the details of octet to character
mapping from a protocol specification such as MIME [10] or the way
SASL [11] separates the details of an authentication mechanism from a
protocol specification such as ACAP [12].
Here is a small diagram to help illustrate the value of this
abstraction layer:
+-------------------+                         +-----------------+
| IMAP i18n SEARCH  |--+                      | Basic           |
+-------------------+  |                   +--| Collation Spec  |
                       |                   |  +-----------------+
+-------------------+  |  +-------------+  |  +-----------------+
| ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
+-------------------+  |  | Registry    |  |  | Collation Spec  |
                       |  +-------------+  |  +-----------------+
+-------------------+  |                   |  +-----------------+
| ...other protocol |--+                   |  | locale-specific |
+-------------------+                      +--| Collation Spec  |
                                              +-----------------+
Thus IMAP, ACAP and future application protocols with international
search capability simply specify how to interface to the collation
registry instead of each protocol specification having to specify all
the collations it supports.
2.3. Some Other Terms Used in this Document
The terms client, server and protocol are used in somewhat unusual
senses.
Client means a user, or a program acting directly on behalf of a
user. This may be a mail reader acting as an IMAP client, or it may
be an interactive shell where the user can type protocol directly, or
it may be a script or program written by the user.
Server means a program that performs services requested by the
client. This may be a traditional server such as an HTTP server, or
it may be a Sieve [15] interpreter running a Sieve script written by
a user. A server needs to use the operations provided by collations
in order to fulfil the client's requests.
The protocol describes how the client informs the server what it
wants done, and (if applicable) how the server tells the client about
the results. IMAP is a protocol by this definition, and so is the
Sieve language.
2.4. Sort Keys
One component of a collation is a transformation which turns a string
into a sort key, which is then used while sorting.
The transformation can range from an identity mapping (e.g., the
i;octet collation, Section 9.5) to a mapping which makes the string
unreadable to a human (e.g., the basic collation, Section 9.4).
This is an implementation detail of collations or servers. A
protocol SHOULD NOT expose it, since some collations leave the sort
key's format up to the implementation, and current conformant
implementations are known to use different formats.
3. Collation Name Syntax
3.1. Basic Syntax
The collation name itself is a single US-ASCII string beginning with
a letter and made up of letters, digits, or one of the following 4
symbols: "-", ";", "=" or ".". The name MUST NOT be longer than 254
characters.
collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
collation-name = ALPHA *253collation-char
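As an illustration only, the grammar above can be checked with a
regular expression. The following Python sketch is not part of this
specification; the name is_valid_collation_name is hypothetical.

```python
import re

# Mirrors the ABNF above: one ALPHA followed by at most 253
# collation-chars, giving a maximum length of 254 characters.
COLLATION_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9;=.\-]{0,253}")


def is_valid_collation_name(name: str) -> bool:
    """Return True if name matches the collation-name rule."""
    return COLLATION_NAME_RE.fullmatch(name) is not None
```

For example, "i;octet" and "i;ascii-casemap" are accepted, while a
name starting with a digit or containing "*" is rejected.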
3.2. Wildcards
The string a client uses to select a collation MAY contain a wildcard
("*") character which matches zero or more collation-chars. Wildcard
characters MUST NOT be adjacent. If the wildcard string matches
multiple collations, the server SHOULD select the collation with the
broadest scope (preferably international scope), the most recent
table versions and the greatest number of supported operations.
collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
; MUST NOT exceed 254 characters total
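One possible way to evaluate a wildcard string is to translate it
into a regular expression. The Python sketch below is illustrative
only; the function names are hypothetical, and the rule for choosing
among several matching collations (scope, table versions, supported
operations) is deliberately left out.

```python
import re

_CHAR = r"[A-Za-z0-9;=.\-]"

# collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
# Adjacent wildcards cannot match because "*" is not a collation-char.
_WILD_RE = re.compile(r"(?:\*|[A-Za-z]\*?)(?:" + _CHAR + r"\*?)*")


def is_valid_wildcard(pattern: str) -> bool:
    """Check the collation-wild rule and the 254 character limit."""
    return len(pattern) <= 254 and _WILD_RE.fullmatch(pattern) is not None


def wildcard_matches(pattern: str, name: str) -> bool:
    """Return True if the wildcard string selects the collation name."""
    if not is_valid_wildcard(pattern):
        raise ValueError("not a valid collation-wild string")
    # Each "*" stands for zero or more collation-chars.
    regex = "".join(
        _CHAR + "*" if ch == "*" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, name) is not None
```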
3.3. Ordering Direction
When used as a protocol element for ordering, the collation name MAY
be prefixed by either "+" or "-" to explicitly specify an ordering
direction. A "+" prefix has no effect on the
ordering operation, while a "-" prefix reverses the result of the ordering
operation. In general, collation-order is used when a client
requests a collation, and collation-sel is used when the server
informs the client of the selected collation.
collation-sel = ["+" / "-"] collation-name
collation-order = ["+" / "-"] collation-wild
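The handling of the direction prefix can be sketched as follows. This
Python fragment is an illustration, not a normative algorithm. It
assumes that an ordering result is represented as -1 ("less"),
0 ("equal") or 1 ("greater"); that representation and both function
names are assumptions made for the example.

```python
from typing import Tuple


def split_direction(token: str) -> Tuple[int, str]:
    """Split a collation-sel or collation-order token.

    Returns (sign, rest), where sign is -1 for a "-" prefix and
    +1 for a "+" prefix or no prefix at all.
    """
    if token[:1] == "-":
        return -1, token[1:]
    if token[:1] == "+":
        return 1, token[1:]
    return 1, token


def apply_direction(sign: int, result: int) -> int:
    """Reverse an ordering result when the sign is negative."""
    return sign * result
```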
3.4. URIs
Some protocols are designed to use URIs [4] to refer to collations
rather than simple tokens. A special section of the IANA web page is
reserved for such usage. The "collation-uri" form is used to refer
to a specific IANA registry entry for a specific named collation (the
collation registration may not actually be present if it is
experimental). The "collation-auri" form is an abstract name for an
ordering, a collation pattern or a vendor private collation.
collation-uri = "http://www.iana.org/assignments/collation/"
collation-name ".xml"
collation-auri = ( "http://www.iana.org/assignments/collation/"
collation-order [".xml"]) / other-uri
other-uri = <absoluteURI>
; excluding the IANA collation namespace.
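As a small illustration of the "collation-uri" form, the following
Python sketch maps between a collation name and its registry URI. The
function names are hypothetical, and the sketch does not check whether
a registration actually exists at the resulting location.

```python
from typing import Optional

IANA_BASE = "http://www.iana.org/assignments/collation/"


def collation_uri(name: str) -> str:
    """Build the collation-uri for a specific collation name."""
    return IANA_BASE + name + ".xml"


def name_from_uri(uri: str) -> Optional[str]:
    """Recover the collation name, or None for other URIs."""
    if uri.startswith(IANA_BASE) and uri.endswith(".xml"):
        return uri[len(IANA_BASE):-len(".xml")]
    return None
```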
3.5. Naming Guidelines
While this specification makes no absolute requirements on the
structure of collation names, naming consistency is important, so the
following initial guidelines are provided.
Collation names with an international audience typically begin with
"i;". Collation names intended for a particular language or locale
typically begin with a language tag [5] followed by a ";". After the
first ";" is normally the name of the general collation algorithm,
followed by a series of algorithm modifications separated by the ";"
delimiter. Parameterized modifications will use "=" to delimit the
parameter from the value. The version numbers of any lookup tables
used by the algorithm SHOULD be present as parameterized
modifications.
Collation names of the form *;vnd-domain.com;* are reserved for
vendor-specific collations created by the owner of the domain name
following the "vnd-" prefix (e.g. vnd-example.com for the vendor
example.com). Registration of such collations (or the name space as
a whole) with intended use of "Vendor" is encouraged when a public
specification or open-source implementation is available, but is not
required.
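Because these are guidelines rather than requirements, software
should not depend on them. Where names do follow the guidelines, they
can be taken apart as in this Python sketch. The function name and
the name "en;example;level=2;strict" are hypothetical and used for
illustration only.

```python
from typing import Dict, Optional, Tuple


def split_collation_name(
    name: str,
) -> Tuple[str, Optional[str], Dict[str, Optional[str]]]:
    """Split a name into (scope, algorithm, modifiers).

    scope is "i" or a language tag; modifiers maps each modification
    to its value, or to None when it carries no "=" parameter.
    """
    parts = name.split(";")
    scope = parts[0]
    algorithm = parts[1] if len(parts) > 1 else None
    modifiers: Dict[str, Optional[str]] = {}
    for part in parts[2:]:
        key, sep, value = part.partition("=")
        modifiers[key] = value if sep else None
    return scope, algorithm, modifiers
```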
4. Collation Specification Requirements
4.1. API
The collation itself decides what it operates on. Most collations
are expected to operate on character strings. The i;octet
(Section 9.5) collation operates on octet strings. The
i;ascii-numeric (Section 9.1) collation operates on numbers.
This specification defines the collation interface in terms of octet
strings; however, implementations may choose to use character strings
instead. Such implementations may not be able to implement some
collations, e.g. i;octet. Since i;octet is not currently mandatory to
implement for any protocol, this should not be a problem.
4.2. Operations Supported
A collation specification MUST state which of the three basic
operations are supported (equality, substring, ordering) and how to
perform each of the supported operations on any two input character
strings including empty strings. Collations must be deterministic,
i.e. given a collation with a specific name, and any two fixed input
strings, the result MUST be the same for the same operation.
In general, collation operations should behave as their names
suggest. While a collation may be new, the operations are not, so
the new collation's algorithm for each operation should be as similar
as possible to those of older collations. For example, a collation
should not provide a "substring" operator that would morph IMAP
substring SEARCH into another kind of search.
Note that for any single collation, either none or all of the
operations can return "error". For example, it is not possible to
have an equality operator that never returns "error" and a substring
operator that occasionally does.
4.2.1. Equality
The equality test always returns "match" or "no-match" when supplied
valid input, and MAY return "error" if one or both input strings are
not valid character strings or violate other collation constraints.
The equality test MUST be reflexive, symmetric and transitive.
If a collation provides either a substring or an ordering test, it
MUST also provide an equality test. The substring and/or ordering
tests MUST be consistent with the equality test.
In this specification, the return values of the equality test are
called "match", "no-match" and "error". This is not a specification,
merely a choice of phrasing.
4.2.2. Substring
The substring matching operation determines if the first string is a
substring of the second string, i.e., if the second string contains any
string which is equal to the first, as defined by the collation's
equality operator.
A collation which supports substring matching will automatically
support two special cases of substring matching: prefix and suffix
matching if those special cases are supported by the application
protocol. It returns "match" or "no-match" when supplied valid input
and returns "error" when supplied invalid input.
Application protocols MAY return position information for substring
matches. If this is done, the position information SHOULD include
both the starting offset and the ending offset for each match. All
matching substrings should be reported, even overlapping matches.
A string is a substring of itself. The empty string is a substring
of all strings. "Contain" is not defined by this specification.
Note that the substring operation of some collations can match
strings of unequal length. For example, a pre-composed accented
character can match a decomposed accented character. Unicode
Collation Algorithm [8] discusses this in more detail.
In this specification, the return values of the substring operator
are called "match", "no-match" and "error". This is not a
specification, merely a choice of phrasing.
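The relationship between substring matching and equality can be
illustrated with a brute-force search that tests every candidate
slice with the equality operator. The Python sketch below is an
illustration, not an efficient or normative algorithm. The equality
function nfc_equal (comparison after Unicode NFC normalization) is a
stand-in chosen for the example and is not a registered collation;
offsets are code point offsets with an exclusive end, which is also an
assumption of the example.

```python
import unicodedata
from typing import Callable, List, Tuple


def nfc_equal(a: str, b: str) -> bool:
    """Stand-in equality: compare after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_matches(
    needle: str, haystack: str, equal: Callable[[str, str], bool]
) -> List[Tuple[int, int]]:
    """Return (start, end) for every slice of haystack equal to needle.

    All matches are reported, including overlapping ones. Slices of
    any length are tried, since equal strings may differ in length.
    """
    matches = []
    n = len(haystack)
    for start in range(n + 1):
        for end in range(start, n + 1):
            if equal(needle, haystack[start:end]):
                matches.append((start, end))
    return matches
```

With this stand-in, a one-character pre-composed "e with acute"
matches the two-character decomposed sequence at the end of
"cafe" + U+0301, which is why reporting only a starting offset would
not identify the match.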
4.2.3. Ordering
The ordering operator determines how two character strings are
ordered. It MUST be transitive and trichotomous.
Ordering returns "less" if the first string is listed before the
second string according to the collation, "greater" if the second
string is listed before the first string, and "equal" if the two
strings are equal as defined by the collation's equality operator.
If the order of the two strings is reversed, the result of the
ordering operator of the collation MUST be reversed, i.e. results
which would be "greater" are instead "less" and results which would
be "less" are instead "greater", while results which would be "equal"
stay "equal".
Since ordering is normally used to sort a list of items, "error" is
not a useful return value from the ordering operator. Strings with
errors that prevent the sorting algorithm from functioning correctly
should sort to the end of the list. Thus if the first string is
invalid while the second string is valid, the result will be
"greater". If the second string is invalid while the first string is
valid, the result will be "less". If both strings are invalid, the
result SHOULD match the result from the "i;octet" collation.
When the collation is used with a "+" prefix, the behavior is the
same as when used with no prefix. When the collation is used with a
"-" prefix, the result of the ordering operator of the collation MUST
be reversed.
In this specification, the return values of the ordering operator are
called "less", "equal", "greater" and "error". This is not a
specification, merely a choice of phrasing.
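The treatment of invalid strings can be sketched as follows. This
Python fragment is an illustration under several assumptions: inputs
are octet strings, "invalid" means "not well-formed UTF-8", valid
strings are ordered by code point (a stand-in for a real collation),
and results are represented as -1 ("less"), 0 ("equal") and
1 ("greater"). Octet-by-octet comparison stands in for "i;octet".

```python
from functools import cmp_to_key
from typing import Optional


def _cmp(a, b) -> int:
    """Three-way compare: -1 (less), 0 (equal), 1 (greater)."""
    return (a > b) - (a < b)


def octet_order(a: bytes, b: bytes) -> int:
    """Octet-by-octet ordering, standing in for "i;octet"."""
    return _cmp(a, b)


def _decode(s: bytes) -> Optional[str]:
    """Return the decoded string, or None if s is not valid UTF-8."""
    try:
        return s.decode("utf-8")
    except UnicodeDecodeError:
        return None


def order(a: bytes, b: bytes) -> int:
    ta, tb = _decode(a), _decode(b)
    if ta is None and tb is None:
        return octet_order(a, b)  # both invalid: fall back to octets
    if ta is None:
        return 1   # invalid first string sorts after a valid one
    if tb is None:
        return -1  # valid first string sorts before an invalid one
    return _cmp(ta, tb)  # stand-in ordering for valid strings
```

Sorting with functools.cmp_to_key(order) as the key then places the
invalid strings at the end of the list.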
4.3. Internal Canonicalization Algorithm
A collation specification MUST describe the internal canonicalization
algorithm. This algorithm can be applied to individual strings and
the result can be stored to potentially optimize future comparison
operations. A collation MAY specify that the canonicalization
algorithm is the identity function. The output of the
canonicalization algorithm MAY have no meaning to a human.
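The optimization mentioned above can be illustrated by computing the
canonical form of each string once and reusing it for every later
comparison. In this Python sketch the canonicalization (mapping ASCII
lowercase letters to uppercase) is only a stand-in chosen for the
example; the function name is hypothetical.

```python
def canonicalize(s: bytes) -> bytes:
    """Stand-in canonicalization: map ASCII a-z to A-Z."""
    return bytes(c - 32 if 0x61 <= c <= 0x7A else c for c in s)


names = [b"bob", b"Alice", b"alice", b"Carol"]

# Canonicalize each string once and store the result.
keys = {name: canonicalize(name) for name in names}

# Later comparisons work on the stored canonical forms only.
ordered = sorted(names, key=keys.__getitem__)
same = keys[b"Alice"] == keys[b"alice"]
```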
4.4. Use of Lookup Tables
Collations which use more than one customizable lookup table in a
documented format MUST assign numbers to the tables they use. This
permits an application protocol command to access the tables used by
a server collation.
4.5. Multi-Value Attributes
Some application protocols will permit the use of multi-value
attributes with a collation. This paragraph describes the rules that
apply unless otherwise specified by the collation or application
protocol. In the case of the equality and substring operation, the
operations are applied over each pair of single values from the two
inputs. If any combination produces an error, the result is an
error. Otherwise, if any combination produces a "match", the result
is a match. Otherwise the result is "no-match". For the ordering
operator, the smallest ordinal character string from the first set of
values is compared to the smallest ordinal character string from the
second set of values.
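These default rules can be sketched in Python as follows. The sketch
is illustrative only. It assumes that an equality or substring
operator returns one of the strings "match", "no-match" or "error",
that an ordering operator returns -1, 0 or 1, and that both value
sets are non-empty; the function names are hypothetical.

```python
from functools import cmp_to_key
from typing import Callable, Sequence


def multi_match(
    left: Sequence[str],
    right: Sequence[str],
    op: Callable[[str, str], str],
) -> str:
    """Apply an equality or substring operator over all value pairs."""
    # Every pair is evaluated, so an error is never masked by a match.
    results = [op(a, b) for a in left for b in right]
    if "error" in results:
        return "error"
    if "match" in results:
        return "match"
    return "no-match"


def multi_order(
    left: Sequence[str],
    right: Sequence[str],
    order: Callable[[str, str], int],
) -> int:
    """Compare the smallest value of each set."""
    key = cmp_to_key(order)
    return order(min(left, key=key), min(right, key=key))


def ascii_equal(a: str, b: str) -> str:
    """Stand-in equality used only to exercise the rules above."""
    if not (a.isascii() and b.isascii()):
        return "error"
    return "match" if a.lower() == b.lower() else "no-match"


def code_point_order(a: str, b: str) -> int:
    """Stand-in ordering by code point."""
    return (a > b) - (a < b)
```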
5. Application Protocol Requirements
This section describes the requirements and issues that an
application protocol needs to consider if it offers searching,
substring matching and/or sorting, and permits the use of characters
outside the US-ASCII charset.
5.1. Character Encoding
The protocol specification has to make sure that it is clear on which
characters (rather than just octets) the collations are used. This
can be done by specifying the protocol itself in terms of characters
(e.g. in the case of a query language), by specifying a single
character encoding for the protocol (e.g. UTF-8 [3]), or by
carefully describing the relevant issues of character encoding
labeling and conversion. In the latter case, details to consider
include how to handle unknown charsets, any charsets which are
mandatory-to-implement, any issues with byte-order that might apply,
and any transfer encodings which need to be supported.
5.2. Operations
The protocol must specify which of the operations defined in this
specification (equality matching, substring matching and ordering)
can be invoked in the protocol, and how they are invoked. There may
be more than one way to invoke an operation.
The protocol MUST provide a mechanism for the client to select the
collation to use with equality matching, substring matching and
ordering.
If the protocol provides positional information for the results of a
substring match, that positional information SHOULD fully specify the
substring in the result that matches independent of the length of the
search string. For example, returning both the starting and ending
offset of the match would suffice, as would the starting offset and a
length. Returning just the starting offset is not acceptable. This
rule is necessary because advanced collations can treat strings of
different lengths as equal (for example, pre-composed and decomposed
accented characters).
5.3. Wildcards
The protocol MUST specify whether it allows the use of wildcards in
collation identifiers or not. If the protocol allows wildcards,
then:
The protocol MUST specify how comparisons behave in the absence of
explicit collation negotiation or when a collation of "*" is
requested. The protocol MAY specify that the default collation
used in such circumstances is sensitive to server configuration.
The protocol SHOULD provide a way to list available collations
matching a given wildcard pattern or patterns.
5.4. Canonicalization Function
If the protocol uses a canonicalization function for strings, then
use of collations MAY be appropriate for that function. As an
example, many protocols use case independent strings. In most cases,
a simple ASCII mapping to upper/lower case works well. However, in
some cases a collation may be better, e.g. to handle Turkish dotted/
dotless i.
5.5. Disconnected Clients
If the protocol supports disconnected clients, then a mechanism for
the client to precisely replicate the server's collation algorithm is
likely desirable. Thus the protocol MAY wish to provide a command to
fetch lookup tables used by charset conversions and collations.
5.6. Error Codes
The protocol specification should consider assigning protocol error
codes for the following circumstances:
o The client requests the use of a collation by name or pattern, but
no implemented collation matches that pattern.
o The client attempts to use a collation for an operation that is
not supported by that collation. For example, attempting to use
the "i;ascii-numeric" collation for substring matching.
o The client uses an equality or substring matching collation and
the result is an error. It may be appropriate to distinguish
between the two input strings, particularly when one is supplied
by the client and one is stored by the server. It might also be
appropriate to distinguish the specific case of an invalid UTF-8
string.
5.7. Octet Collation
The i;octet (Section 9.5) collation is only usable with protocols
based on octet-strings. Clients and servers MUST NOT use i;octet
with other protocols.
If the protocol permits the use of collations with data structures
other than strings, the protocol MUST describe the default behavior
for a collation with that data structure.
6. Use by Existing Protocols
Both ACAP [12] and Sieve [15] are standards track specifications
which used collations prior to the creation of this specification and
registry. Those standards do not meet all the application protocol
requirements described in Section 5. For backwards compatibility,
those protocols use the "i;ascii-casemap" instead of "en;ascii-
casemap". These protocols allow the use of the i;octet (Section 9.5)
collation working directly on UTF-8 data as used in these protocols.
IMAP [16] also uses collation, although the use is explicit only when
the COMPARATOR [18] extension is used. The built-in IMAP substring
operation and the ordering provided by the SORT [17] extension may
not meet the requirements made in this document.
Other protocols may be in a similar position.
7. Collation Registration
7.1. Collation Registration Procedure
The IETF will create a mailing list, collation@ietf.org, which can be
used for public discussion of collation proposals prior to
registration. Use of the mailing list is encouraged but not
required. The actual registration procedure will not begin until the
completed registration template is sent to iana@iana.org. The IESG
will appoint a designated expert who will monitor the
collation@ietf.org mailing list and review registrations forwarded
from IANA. The designated expert is expected to tell IANA and the
submitter of the registration within two weeks whether the
registration is approved, approved with minor changes, or rejected
with cause. When a registration is rejected with cause, it can be
re-submitted if the concerns listed in the cause are addressed.
Decisions made by the designated expert can be appealed to the IESG
and subsequently follow the normal appeals procedure for IESG
decisions.
Collation registrations in a standards track, BCP or IESG-approved
experimental RFC are owned by the IETF, and changes to the
registration follow normal procedures for updating such documents.
Collation registrations in other RFCs are owned by the RFC author(s).
Other collation registrations are owned by the individual(s) listed
in the contact field of the registration and IANA will preserve this
information. Changes to a registration MUST be approved by the
owner. In the event the owner cannot be contacted for a period of
one month and a change is deemed necessary, the IESG MAY re-assign
ownership to an appropriate party.
7.2. Collation Registration Format
Registration of a collation is done by sending a well-formed XML
document that validates with collationreg.dtd (Section 7.3).
7.2.1. Registration Template
Here is a template for the registration:
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="YYYY" scope="i18n" intendedUse="common">
<name>collation name</name>
<title>technical title for collation</title>
<functions>equality order substring</functions>
<specification>specification reference</specification>
<owner>email address of owner or IETF</owner>
<submitter>email address of submitter</submitter>
<version>1</version>
<UnicodeVersion>3.2</UnicodeVersion>
<UCAVersion>3.1.1</UCAVersion>
</collation>
7.2.2. The collation Element
The root of the registration document MUST be a <collation> element.
The collation element contains the other elements in the
registration, which are described in the following sub-subsections,
in the order given here.
The <collation> element MAY include an "rfc=" attribute if the
specification is in an RFC. The "rfc=" attribute gives only the
number of the RFC, without any prefix, such as "RFC", or suffix, such
as ".txt".
The <collation> element MUST include a "scope=" attribute, which MUST
have one of the values "i18n", "local" or "other".
The <collation> element MUST include an "intendedUse=" attribute,
which MUST have one of the values "common", "limited", "vendor", or
"deprecated". Collation specifications intended for "common" use are
expected to reference standards from standards bodies with
significant experience dealing with the details of international
character sets.
Be aware that future revisions of this specification may add
additional function types, as well as additional XML attributes and
values. Any system which automatically parses these XML documents
MUST take this into account to preserve future compatibility. A DTD
for the current definition of the collation registration template is
given in Section 7.3.
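One way for a parser to remain compatible with such additions is to
read only the elements and attributes it knows and to skip the rest.
The Python sketch below illustrates this with the standard
xml.etree.ElementTree module. It does not validate against the DTD,
and the sample registration (the name "i;example", the "fuzzy"
function, "futureAttr" and "futureElement") is entirely hypothetical.

```python
import xml.etree.ElementTree as ET

KNOWN_FUNCTIONS = {"equality", "order", "substring"}

# Hypothetical registration containing additions that a later
# revision of the format might introduce.
SAMPLE = """\
<collation scope="i18n" intendedUse="common" futureAttr="x">
  <name>i;example</name>
  <title>Example collation</title>
  <functions>equality order fuzzy</functions>
  <specification>example specification</specification>
  <owner>owner@example.com</owner>
  <futureElement>ignored</futureElement>
</collation>
"""


def parse_registration(xml_text: str) -> dict:
    """Extract the known fields; unknown content is tolerated."""
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")
    listed = (root.findtext("functions") or "").split()
    return {
        "name": root.findtext("name"),
        "scope": root.get("scope"),
        "intendedUse": root.get("intendedUse"),
        "functions": [f for f in listed if f in KNOWN_FUNCTIONS],
        "unrecognized": [f for f in listed if f not in KNOWN_FUNCTIONS],
        "owners": [e.text for e in root.findall("owner")],
    }
```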
7.2.3. The name Element
The <name> element gives the precise name of the collation. The
<name> element is mandatory.
7.2.4. The title Element
The <title> element gives the title of the collation. The <title>
element is mandatory.
7.2.5. The functions Element
The <functions> element lists which of the three operators
("equality", "order" or "substring") the collation provides. The
<functions> element is mandatory.
7.2.6. The specification Element
The <specification> element describes where to find the
specification. The <specification> element is mandatory. It MAY
have a URI attribute. There may be more than one <specification>
element. (For example, a collation which has previously been
specified by a vendor may have been published on that vendor's web
site, and subsequently by a standards organization.)
In case the different specifications differ, the RFC is the
definitive specification.
7.2.7. The submitter Element
The <submitter> element provides an RFC 2822 [13] email address for
the person who submitted the registration. It is optional if the
<owner> element contains an email address.
There may be more than one <submitter> element.
7.2.8. The owner Element
The <owner> element contains either the four letters "IETF" or an
email address of the owner of the registration. The <owner> element
is mandatory. There may be more than one <owner> element. If so,
all owners are equal. Each owner can speak for all.
7.2.9. The version Element
The <version> element is included when the registration is likely to
be revised or has been revised in such a way that the results change
for certain input strings. The <version> element is optional.
7.2.10. The UnicodeVersion Element
The <UnicodeVersion> element indicates the version number of the
UnicodeData file on which the collation is based. The
<UnicodeVersion> element is optional.
7.2.11. The UCAVersion Element
The <UCAVersion> element specifies the version of the Unicode
Collation Algorithm on which the collation is based. The
<UCAVersion> element is optional.
7.2.12. The UCAMatchLevel Element
The <UCAMatchLevel> element specifies the number of Unicode Collation
Algorithm sort key levels used for the equality and substring
operations. The <UCAMatchLevel> element is optional.
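The element descriptions in Sections 7.2.2 through 7.2.12 can be sketched as a small, tolerant reader. The Python below is illustrative only and is not part of the registration procedure; the function name parse_registration and the layout of the returned dictionary are assumptions made for this example. It skips attributes and elements it does not recognize, which is one possible way to meet the forward-compatibility requirement in Section 7.2.2.

```python
import xml.etree.ElementTree as ET

# Child elements described in Sections 7.2.3 through 7.2.12.
KNOWN_ELEMENTS = {
    "name", "title", "functions", "specification", "owner", "submitter",
    "version", "UnicodeVersion", "UCAVersion", "UCAMatchLevel",
}
MANDATORY_ELEMENTS = ("name", "title", "functions", "specification", "owner")
REPEATABLE = {"specification", "owner", "submitter"}


def parse_registration(xml_text):
    """Return a dict describing one <collation> registration.

    Unknown attributes and child elements are skipped rather than
    rejected, so documents from a later revision can still be read.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")
    for attr in ("scope", "intendedUse"):
        if root.get(attr) is None:
            raise ValueError("missing mandatory attribute: " + attr)

    reg = {
        "rfc": root.get("rfc"),  # optional; None when absent
        "scope": root.get("scope"),
        "intendedUse": root.get("intendedUse"),
    }
    for child in root:
        if child.tag not in KNOWN_ELEMENTS:
            continue  # tolerate elements added by future revisions
        text = (child.text or "").strip()
        if child.tag in REPEATABLE:
            reg.setdefault(child.tag, []).append(text)
        else:
            reg[child.tag] = text

    missing = [tag for tag in MANDATORY_ELEMENTS if tag not in reg]
    if missing:
        raise ValueError("missing mandatory element(s): " + ", ".join(missing))
    reg["functions"] = reg["functions"].split()
    return reg
```

This sketch checks only for the presence of mandatory items. It does not validate against the DTD in Section 7.3, and it does not check the rule that <submitter> is optional only when <owner> contains an email address.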
7.3. DTD for Collation Registration
<!--
DTD for Collation Registration Document
Data types:
entity      description
======      ===========
NUMBER      [0-9]+
URI         As defined in RFC 3986
CTEXT       printable ASCII text (no line-terminators)
TEXT        character data
-->
<!ENTITY % NUMBER "CDATA">
<!ENTITY % URI "CDATA">
<!ENTITY % CTEXT "#PCDATA">
<!ENTITY % TEXT "#PCDATA">
<!ELEMENT collation (name,title,functions,specification+,owner+,
submitter*,version?,UnicodeVersion?,
UCAVersion?,UCAMatchLevel?)>
<!ATTLIST collation
rfc %NUMBER; "0"
scope (i18n|local|other) #IMPLIED
intendedUse (common|limited|vendor|deprecated) #IMPLIED>
<!ELEMENT name (%CTEXT;)>
<!ELEMENT title (%CTEXT;)>
<!ELEMENT functions (%CTEXT;)>
<!ELEMENT specification (%TEXT;)>
<!ATTLIST specification
uri %URI; "">
<!ELEMENT owner (%CTEXT;)>
<!ELEMENT submitter (%CTEXT;)>
<!ELEMENT version (%CTEXT;)>
<!ELEMENT UnicodeVersion (%CTEXT;)>
<!ELEMENT UCAVersion (%CTEXT;)>
<!ELEMENT UCAMatchLevel (%CTEXT;)>
7.4. Structure of Collation Registry
Once the registration is approved, IANA will store each XML
registration document in a URL of the form
http://www.iana.org/assignments/collation/collation-name.xml where
collation-name is the contents of the name element in the
registration. Both the submitter and the designated expert are
responsible for verifying that the XML is well-formed and complies
with the DTD.
IANA will also maintain a text summary of the registry under the name
http://www.iana.org/assignments/collation/summary.txt. This summary
is divided into four sections. The first section is for collations
intended for common use. This section is intended for collation
registrations published in IESG approved RFCs or for locally scoped
collations from the primary standards body for that locale. The
designated expert is encouraged to reject collation registrations
with an intended use of "common" if the expert believes it should be
"limited", as it is desirable to keep the number of "common"
registrations small and high quality. The second section is reserved
for limited use collations. The third section is reserved for
registered vendor specific collations. The final section is reserved
for deprecated collations.
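The four-section layout of the summary can be sketched as a grouping by the intendedUse attribute. The Python below is a hypothetical illustration, not a description of how IANA produces the file; the function name summary_sections and the input format (pairs of collation name and intendedUse value) are assumptions, and the section headings are taken from the example in Section 7.5.

```python
# Section order of summary.txt, keyed by the intendedUse attribute value.
SECTIONS = [
    ("common", "Common Use Collations:"),
    ("limited", "Limited Use Collations:"),
    ("vendor", "Vendor Collations:"),
    ("deprecated", "Deprecated Collations:"),
]


def summary_sections(registrations):
    """Group (name, intendedUse) pairs into the four summary sections.

    Returns a list of (heading, [names]) in the order the summary uses.
    Sections with no registrations are kept, with an empty list.
    """
    grouped = {use: [] for use, _ in SECTIONS}
    for name, intended_use in registrations:
        grouped[intended_use].append(name)
    return [(heading, grouped[use]) for use, heading in SECTIONS]
```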
7.5. Example Initial Registry Summary
The following is an example of how IANA might structure the initial
registry summary.txt file:
Collation                              Functions  Scope  Reference
---------                              ---------  -----  ---------
Common Use Collations:
i;nameprep;v=1;uv=3.2                  e, o, s    i18n   [RFC XXXX]
i;basic;uca=3.1.1;uv=3.2               e, o, s    i18n   [RFC XXXX]
i;basic;uca=3.1.1;uv=3.2;match=accent  e, o, s    i18n   [RFC XXXX]
i;basic;uca=3.1.1;uv=3.2;match=case    e, o, s    i18n   [RFC XXXX]
en;ascii-casemap                       e, o, s    Local  [RFC XXXX]
Limited Use Collations:
i;octet                                e, o, s    Other  [RFC XXXX]
i;ascii-numeric                        e, o       Other  [RFC XXXX]
Vendor Collations:
Deprecated Collations:
i;ascii-casemap                        e, o, s    Local  [RFC XXXX]
References
----------
[RFC XXXX] Newman, C., "Internet Application Protocol Collation
Registry", RFC XXXX, Sun Microsystems, October 2003.
8. Guidelines for Expert Reviewer
The expert reviewer appointed by the IESG has fairly broad latitude
for this registry. While a number of collations are expected
(particularly customizations of the basic collation for localized
use), an explosion of collations (particularly common use collations)
is not desirable for widespread interoperability. However, it is
important for the expert reviewer to provide cause when rejecting a
registration, and when possible to describe corrective action to
permit the registration to proceed. The following list includes
some example reasons to reject a registration with cause:
o The registration is not a well-formed XML document that follows
the DTD.
o The registration has an intended use of "common", but there is no
evidence the collation will be widely deployed, so it should be
listed as "limited".
o The registration has an intended use of "common", but it is
redundant with the functionality of a previously registered
"common" collation.
o The registration has an intended use of "common", but the
specification is not detailed enough to allow interoperable
implementations by others.
o The collation name fails to precisely identify the version numbers
of relevant tables to use.
o The registration fails to meet one of the "MUST" requirements in
Section 4.
o The collation name fails to meet the syntax in Section 3.
o The collation specification referenced in the registration is
vague or has optional features without a clear behavior specified.
o The referenced specification does not adequately address security
considerations specific to that collation.
o The registration's operations are needlessly different from those
of traditional operations.
9. Initial Collations
This section describes an initial set of collations for the collation
registry.
9.1. ASCII Numeric Collation
9.1.1. ASCII Numeric Collation Description
The "i;ascii-numeric" collation is a simple collation intended for
use with arbitrary sized unsigned decimal integer numbers stored as
octet strings of US-ASCII digits (0x30 to 0x39). It supports
equality and ordering, but does not support the substring operator.
The equality operator returns "match" if the two strings represent
the same number (i.e., leading zeroes are disregarded), "no-match" if
the two strings represent different numbers, and "error" if either
string is empty or contains nondigits.
The ordering operator returns "less" if the first string represents a
smaller number than the second, "equal" if they represent the same
number, and "greater" if the first string represents a larger number
than the second. If either string is empty or contains nondigits,
the ordering operator returns the result of the i;octet ordering
function.
The associated canonicalization algorithm is to truncate the input
string at the first non-digit character.
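The behavior described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, inputs are taken to be Python bytes objects, and the result strings mirror the phrasing used in this section. Numbers are compared by stripped length and then by digits, so no conversion to a fixed-size integer is needed.

```python
def _is_number(s):
    """True if s is a non-empty octet string of US-ASCII digits only."""
    return len(s) > 0 and all(0x30 <= octet <= 0x39 for octet in s)


def _magnitude(s):
    """Key that orders digit strings numerically, ignoring leading zeroes."""
    stripped = s.lstrip(b"0")
    return (len(stripped), stripped)


def _octet_order(a, b):
    """i;octet ordering; Python compares bytes as unsigned octet values."""
    return "less" if a < b else "greater" if a > b else "equal"


def ascii_numeric_equality(a, b):
    if not (_is_number(a) and _is_number(b)):
        return "error"
    return "match" if _magnitude(a) == _magnitude(b) else "no-match"


def ascii_numeric_order(a, b):
    if not (_is_number(a) and _is_number(b)):
        # Empty or non-digit input falls back to the i;octet ordering.
        return _octet_order(a, b)
    x, y = _magnitude(a), _magnitude(b)
    return "less" if x < y else "greater" if x > y else "equal"


def ascii_numeric_canonicalize(s):
    """Truncate the input at the first non-digit octet."""
    for i, octet in enumerate(s):
        if not 0x30 <= octet <= 0x39:
            return s[:i]
    return s
```

For example, ascii_numeric_equality(b"007", b"7") yields "match", while ascii_numeric_order(b"9", b"10") yields "less" even though the octet ordering of the same two strings is "greater".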
9.1.2. ASCII Numeric Collation Registration
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="other" intendedUse="limited">
<name>i;ascii-numeric</name>
<title>ASCII Numeric</title>
<functions>equality order</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
</collation>
9.2. ASCII Casemap Collation
9.2.1. ASCII Casemap Collation Description
The "en;ascii-casemap" collation is a simple collation intended for
use with English language text in pure US-ASCII. It provides
equality, substring and ordering operators. The algorithm first
applies a canonicalization algorithm to both input strings which
subtracts 32 (0x20) from all octet values between 97 (0x61) and 122
(0x7A) inclusive. The result of the collation is then the same as
the result of the "i;octet" collation for the canonicalized strings.
Care should be taken when using OS-supplied functions to implement
this collation as this is not locale sensitive, but functions such as
strcasecmp and toupper are sometimes locale sensitive.
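A locale-independent sketch of this collation in Python follows. It is illustrative only; the function names are hypothetical and inputs are assumed to be bytes objects. The canonicalization touches only octets 97 to 122 and leaves every other octet unchanged, so it does not depend on any locale setting.

```python
def casemap_canonicalize(s):
    """Subtract 32 from every octet in the range 97..122 (a..z)."""
    return bytes(o - 32 if 97 <= o <= 122 else o for o in s)


def casemap_order(a, b):
    """i;octet ordering applied to the canonicalized strings."""
    x, y = casemap_canonicalize(a), casemap_canonicalize(b)
    return "less" if x < y else "greater" if x > y else "equal"


def casemap_equality(a, b):
    same = casemap_canonicalize(a) == casemap_canonicalize(b)
    return "match" if same else "no-match"


def casemap_substring(needle, haystack):
    """Match if the canonicalized needle occurs in the canonicalized haystack."""
    found = casemap_canonicalize(needle) in casemap_canonicalize(haystack)
    return "match" if found else "no-match"
```

Because lower-case letters are mapped to upper case before comparison, characters such as "_" (0x5F) order after all letters, which differs from a comparison that folds to lower case.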
For historical reasons, in the context of ACAP and Sieve, the name
"i;ascii-casemap" is a synonym for this collation.
9.2.2. Legacy English Casemap Collation Registration
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="local" intendedUse="deprecated">
<name>i;ascii-casemap</name>
<title>Legacy English Casemap</title>
<functions>equality order substring</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
</collation>
9.2.3. English Casemap Collation Registration
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="local" intendedUse="common">
<name>en;ascii-casemap</name>
<title>English Casemap</title>
<functions>equality order substring</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
</collation>
9.3. Nameprep Collation
9.3.1. Nameprep Collation Description
The "i;nameprep;v=1;uv=3.2" collation is an implementation of the
nameprep [7] specification based on normalization tables from Unicode
version 3.2. This collation applies the nameprep canonicalization
function to both input strings and then returns the result of the
i;octet collation on the canonicalized strings. While this collation
offers all three operators, the ordering operator it provides is
inadequate for use by the majority of the world.
Version number 1 is applied to nameprep as specified in RFC 3491. If
the nameprep specification is revised without any changes that would
produce different results when given the same pair of input octet
strings, then the version number will remain unchanged.
The table numbers for tables used by nameprep are as follows:
+--------------+-----------------------+
| Table Number | Table Name |
+--------------+-----------------------+
| 1 | UnicodeData-3.2.0.txt |
| 2 | Table B.1 |
| 3 | Table B.2 |
| 4 | Table C.1.2 |
| 5 | Table C.2.2 |
| 6 | Table C.3 |
| 7 | Table C.4 |
| 8 | Table C.5 |
| 9 | Table C.6 |
| 10 | Table C.7 |
| 11 | Table C.8 |
| 12 | Table C.9 |
+--------------+-----------------------+
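The structure described in Section 9.3.1 (canonicalize with nameprep, then compare as octets) can be sketched in Python using the nameprep function from the standard library's encodings.idna module. This is an assumption-laden illustration rather than a conforming implementation: the function names below are hypothetical, the comparison is done on the UTF-8 encoding of the canonicalized strings, and reporting "error" when nameprep rejects an input is a choice made for this example, not something this section specifies.

```python
from encodings.idna import nameprep


def nameprep_canonicalize(text):
    """Apply nameprep to a str and return the result as UTF-8 octets."""
    return nameprep(text).encode("utf-8")


def nameprep_equality(a, b):
    try:
        x, y = nameprep_canonicalize(a), nameprep_canonicalize(b)
    except UnicodeError:
        # Assumption: inputs that nameprep prohibits are reported as "error".
        return "error"
    return "match" if x == y else "no-match"


def nameprep_order(a, b):
    """i;octet ordering of the canonicalized strings (no error handling)."""
    x, y = nameprep_canonicalize(a), nameprep_canonicalize(b)
    return "less" if x < y else "greater" if x > y else "equal"
```

Whether a particular library's nameprep uses exactly the table versions listed above should be confirmed before relying on it for the "i;nameprep;v=1;uv=3.2" collation.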
9.3.2. Nameprep Collation Registration
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="i18n" intendedUse="common">
<name>i;nameprep;v=1;uv=3.2</name>
<title>Nameprep</title>
<functions>equality order substring</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
<version>1</version>
<UnicodeVersion>3.2</UnicodeVersion>
</collation>
9.4. Basic Collation
9.4.1. Basic Collation Description
The basic collation is intended to provide tolerable results for a
number of languages for all three operators (equality, substring and
ordering) so it is suitable as a mandatory-to-implement collation for
protocols which include ordering support. The ordering operator of
the basic collation is the Unicode Collation Algorithm [8] version 9
(UCAv9).
The equality and substring operators are created as described in
UCAv9 section 8. While that section is informative to UCAv9, it is
normative to this collation specification.
This collation is based on Unicode version 3.2, with the following
tables relevant:
1. For the normalization step,
<http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt>
is used. Column 5 is used to determine the canonical
decomposition, while column 3 contains the canonical combining
classes necessary to attain canonical order.
2. The table of characters which require a logical order exception
is a subset of the table in
<http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.txt> and
is included here:
0E40..0E44 ; Logical_Order_Exception
# Lo [5] THAI CHARACTER SARA E..THAI CHARACTER SARA AI MAIMALAI
0EC0..0EC4 ; Logical_Order_Exception
# Lo [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI
# Total code points: 10
3. The table used to translate normalized code points to a sort key
is <http://www.unicode.org/reports/tr10/allkeys-3.1.1.txt>.
UCAv9 includes a number of configurable parameters and steps labelled
as potentially optional. The following list summarizes the defaults
used by this collation:
o The logical order exception step is mandatory by default to
support the largest number of languages.
o Steps 2.1.1 to 2.1.3 are mandatory as the repertoire of the basic
collation is intended to be large.
o The second level in the sort key is evaluated forwards by default.
o The variable weighting uses the "non-ignorable" option by default.
o The semi-stable option is not used by default.
o Support for exactly three levels of collation is the default
behavior.
o No preprocessing step is used by the basic collation prior to
applying the UCAv9 algorithm. Note that an application protocol
specification MAY require pre-processing prior to the use of any
collations.
o The equality and substring algorithms exclude differences at
levels 2 and 3 by default (thus they are case-insensitive and
ignore accentual distinctions).
o The equality and substring algorithms use the "Whole Characters
Only" feature described in UCAv9 section 8 by default.
The exact collation name with these defaults is
"i;basic;uca=3.1.1;uv=3.2". When a specification states that the
basic collation is mandatory-to-implement, only this specific name is
mandatory-to-implement.
In order to allow modification of the optional behaviors, the
following ABNF is used for variations of the basic collation:
basic-collation = ("i" / Language-Tag) ";basic;uca=3.1.1;uv=3.2"
[";match=accent" / ";match=case"]
[";tailor=" 1*collation-char ]
If multiple modifiers appear, they MUST appear in the order described
above. The modifiers have the following meanings:
match=accent Both the first and second levels of the sort keys are
considered relevant to the equality and substring
operations (rather than the default of first level
only). This makes the matching functions sensitive to
accentual distinctions.
match=case The first three levels of sort keys are considered
relevant to the equality and substring operations.
This makes the matching functions sensitive to both
case and accentual distinctions.
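The naming rule and the meaning of the match modifiers can be sketched as follows. The Python below is illustrative; the function name basic_collation_name and its parameters are hypothetical. The tailoring value is treated as opaque and is not checked against the collation-char production, and the language argument is not checked against Language-Tag. The level numbers correspond to the <UCAMatchLevel> values used in the registrations in Sections 9.4.2 through 9.4.4.

```python
BASIC_SUFFIX = ";basic;uca=3.1.1;uv=3.2"

# Number of sort key levels compared by the equality and substring
# operations for each match modifier (None means no modifier).
MATCH_LEVELS = {None: 1, "accent": 2, "case": 3}


def basic_collation_name(language="i", match=None, tailor=None):
    """Build a basic collation name with modifiers in the required order."""
    if match not in MATCH_LEVELS:
        raise ValueError("match must be None, 'accent' or 'case'")
    if tailor == "":
        raise ValueError("tailor needs at least one character")
    name = language + BASIC_SUFFIX
    if match is not None:
        name += ";match=" + match      # match modifier comes first
    if tailor is not None:
        name += ";tailor=" + tailor    # tailoring comes last
    return name
```

With no arguments the function returns "i;basic;uca=3.1.1;uv=3.2", the only name that is mandatory-to-implement when a specification requires the basic collation.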
The default weighting option is "non-ignorable". The "semi-stable"
sort key option is not used by default.
The canonicalization algorithm associated with this collation is the
output of step 3 of the UCAv9 algorithm (described in section 4.3 of
the UCA specification). This canonicalization is not suitable for
human consumption.
Finally, the UCAv9 algorithm permits the "allkeys" table to be
tailored to a language. People who make quality tailorings are
encouraged to register those tailorings using the collation registry.
Tailoring names beginning with "x" are reserved for experimental use,
are treated as "Limited use" and MUST NOT match wildcards if any
registered collation is available that does match.
9.4.2. Basic Collation Registration
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="i18n" intendedUse="common">
<name>i;basic;uca=3.1.1;uv=3.2</name>
<title>Basic</title>
<functions>equality order substring</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
<UnicodeVersion>3.2</UnicodeVersion>
<UCAVersion>3.1.1</UCAVersion>
<UCAMatchLevel>1</UCAMatchLevel>
</collation>
9.4.3. Basic Accent Sensitive Match Collation Registration
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="i18n" intendedUse="common">
<name>i;basic;uca=3.1.1;uv=3.2;match=accent</name>
<title>Basic Accent Sensitive Match</title>
<functions>equality order substring</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
<UnicodeVersion>3.2</UnicodeVersion>
<UCAVersion>3.1.1</UCAVersion>
<UCAMatchLevel>2</UCAMatchLevel>
</collation>
9.4.4. Basic Case Sensitive Match Collation Registration
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="i18n" intendedUse="common">
<name>i;basic;uca=3.1.1;uv=3.2;match=case</name>
<title>Basic Case Sensitive Match</title>
<functions>equality order substring</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
<UnicodeVersion>3.2</UnicodeVersion>
<UCAVersion>3.1.1</UCAVersion>
<UCAMatchLevel>3</UCAMatchLevel>
</collation>
9.5. Octet Collation
9.5.1. Octet Collation Description
The "i;octet" collation is a simple and fast collation intended for
use on binary octet strings rather than on character data. It is the
only such collation; it is not possible to register additional
collations with this property. Protocols that want to make this
collation available have to do so by explicitly allowing it. If not
explicitly allowed, it MUST NOT be used. It never returns an "error"
result. It provides equality, substring and ordering operators.
The ordering algorithm is as follows:
1. If both strings are the empty string, return the result "equal".
2. If the first string is empty and the second is not, return the
result "less".
3. If the second string is empty and the first is not, return the
result "greater".
4. If both strings begin with the same octet value, remove the first
octet from both strings and repeat this algorithm from step 1.
5. If the unsigned value (0 to 255) of the first octet of the first
string is less than the unsigned value of the first octet of the
second string, then return "less".
6. If this step is reached, return "greater".
This algorithm is roughly equivalent to the C library function memcmp
with appropriate length checks added.
The equality operator returns "match" if the ordering algorithm would
return "equal". Otherwise the equality operator returns "no-match".
The substring operator returns "match" if the first string is the
empty string, or if there exists a substring of the second string of
length equal to the length of the first string which would result in
a "match" result from the equality function. Otherwise the substring
operator returns "no-match".
The associated canonicalization algorithm is the identity operator.
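The three operators of the octet collation can be sketched in Python as shown below. This is a minimal illustration, assuming bytes inputs and hypothetical function names. The ordering function follows the numbered steps, using an index instead of removing the first octet on each pass.

```python
def octet_order(a, b):
    """Ordering steps 1 to 6, walking both strings with a shared index."""
    i = 0
    while True:
        a_empty, b_empty = i >= len(a), i >= len(b)
        if a_empty and b_empty:
            return "equal"            # step 1
        if a_empty:
            return "less"             # step 2
        if b_empty:
            return "greater"          # step 3
        if a[i] == b[i]:
            i += 1                    # step 4: drop the first octet, repeat
            continue
        # Steps 5 and 6: indexing bytes yields unsigned values 0..255.
        return "less" if a[i] < b[i] else "greater"


def octet_equality(a, b):
    return "match" if octet_order(a, b) == "equal" else "no-match"


def octet_substring(needle, haystack):
    """Match if needle is empty or equals some substring of haystack."""
    n = len(needle)
    if n == 0:
        return "match"
    for start in range(len(haystack) - n + 1):
        if octet_equality(needle, haystack[start:start + n]) == "match":
            return "match"
    return "no-match"


def octet_canonicalize(s):
    """The canonicalization algorithm is the identity operator."""
    return s
```

In Python the built-in comparison of bytes objects gives the same ordering, so the explicit loop is shown only to mirror the steps in the text.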
9.5.2. Octet Collation Registration
This collation is defined with intendedUse="limited" because it can
only be used by protocols that explicitly allow it.
<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="XXXX" scope="i18n" intendedUse="limited">
<name>i;octet</name>
<title>Octet</title>
<functions>equality order substring</functions>
<specification>RFC XXXX</specification>
<owner>IETF</owner>
<submitter>chris.newman@sun.com</submitter>
</collation>
10. IANA Considerations
Section 7 defines how to register collations with IANA. Section 9
defines a list of predefined collations, which should be registered
when this document is approved and published as an RFC.
11. Security Considerations
Collations will normally be used with UTF-8 strings. Thus the
security considerations for UTF-8 [3], stringprep [6] and Unicode
TR-36 [9] also apply and are normative to this specification.
12. Open Issues
Mark Davis writes:
The sample registry would suffer a combinatorial explosion if
parameters are not handled differently. For example, with CLDR
collations, there can be hundreds of locales, six different strength
settings; four different case-first settings; three different
alternate settings, backwards settings, normalization settings, case
level settings, hiragana settings, and numeric settings; plus a
variable-top setting which has a string as an operand. Registering
the combinations that people are allowed to use would be untenable.
Use UCAv9 or UCAvLatest (14)? Instead of using UCAv9, it sounds
sensible to use "UCA", "UCA version x" or "UCA version x or greater"
The change to en;ascii-casemap indicates that its use is local. It's
not. The collation is not suitable to English since it doesn't
handle common typography, signs like the pound sign, and even some
words. And it's not local, since many protocols (like SMTP) use
ASCII casemapping even when used in non-English locales.
Can the text on lookup tables be simplified greatly, to remove all
the MUSTs related to e.g. downloading tables? The client could get
its tables from the collation spec rather than indirectly via the
server.
Should there be some text which references RFC 2434? If not, the
reference should die.
The section on multi-value attributes seems to be a solution looking
for a problem. Frank Ellermann had questions about what it means.
13. Change Log
13.1. Changes From -06
1. Clarified equality and identity: equality is as defined by a
collation, identity is stronger.
2. Added reference to
http://www.unicode.org/reports/tr10/#Searching.
3. Don't describe sort keys as a canonical representation of the
string.
4. Permit disconnected clients to use wildcards. (A disconnected
client has to resolve the wildcard itself, in the same way that a
server would.)
5. Change collation-wild to have the same length limit as collation.
6. Change to use "less" instead of "-1", etc., and specify that it's
just phrasing, not specification.
7. Don't describe the equality, substring and ordering operations as
functions. The definition of collation uses the word function
about the collation itself. A function that has three functions?
Something has to give.
8. Strike a requirement that selecting '*' is the same as not
selecting any collation. It restricted the protocol's default
too much. Existing code wasn't listening.
9. Left out the canonicalization/sort keys.
13.2. Changes From -05
1. Added definitions of client, server and protocol, and prose to
specify that while the IANA registrations of collations are
written in terms of octet strings, implementations may do it
differently.
2. Changed the wording for ascii-numeric to treat the numbers as
numbers, etc.
3. Added explicit property requirements for the three functions,
e.g. that equality be symmetric. Added requirements that the
three functions be consistent, and that if any operations are
present, equality must be (needed for consistency).
4. Random editing, e.g. changing 'numbers' for ascii-numeric to
'integer numbers'.
5. Gave IMAP/SORT/COMPARATOR the same grandfather treatment as ACAP
and SIEVE.
13.3. Changes From -04
Grammar and clarity changes only. One (weak) example added. No
substantive changes.
13.4. Changes From -03
(This does not include all changes made.)
1. Checked and resolved most issues marked 'check whether this is
true' or similar.
2. Resolved nameprep issue: No.
3. Removed NULL for compatibility with existing collations (IMAP
SORT, Sieve).
4. There can be multiple owners and submitters. Say how.
5. Added a requirement that common collations must now be
interoperable. Insufficiently detailed specs cannot be "common".
6. Added a guideline that the operations provided by new collations
should be reminiscent of similar operations on existing
collations.
13.5. Changes From -02
1. Changed from data being octet sequences (in UTF-8) to data being
character sequences (with octet collation as an exception).
2. Made XML format description much more structured.
3. Changed <submittor> to <submitter>, because this spelling is much
more common.
4. Defined 'protocol' to include query languages.
5. Reorganized document, in particular IANA considerations section
(which newly is just a list of pointers).
6. Added subsections, and a 'Structure of this Document' section.
7. Updated references.
8. Created a 'Change Log' chapter, with sections for each draft.
9. Reduced 'Open issues' section, open issues are now maintained at
http://www.w3.org/2004/08/ietf-collation.
13.6. Changes From -01
Add IANA comment to open issues. Otherwise this is just a re-publish
to keep the document alive.
13.7. Changes From -00
1. Replaced the term comparator with collation. While comparator is
somewhat more precise because these abstract functions are used
for matching as well as ordering, collation is the term used by
other parts of the industry. Thus I have changed the name to
collation for consistency.
2. Remove all modifiers to the basic collation except for the
customization and the match rules. The other behavior
modifications can be specified in a customization of the
collation.
3. Use ";" instead of "-" as delimiter between parameters to make
names more URL-ish.
4. Add URL form for comparator reference.
5. Switched registration template to use XML document.
6. Added a number of useful registration template elements related
to the Unicode Collation Algorithm.
7. Switched language from "custom" to "tailor" to match UCA language
for tailoring of the collation algorithm.
14. References
14.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[2] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 4234, October 2005.
[3] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
STD 63, RFC 3629, November 2003.
[4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", RFC 3986,
January 2005.
[5] Alvestrand, H., "Tags for the Identification of Languages",
BCP 47, RFC 3066, January 2001.
[6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized
Strings ("stringprep")", RFC 3454, December 2002.
[7] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[8] Davis, M. and K. Whistler, "Unicode Collation Algorithm version
9", July 2002,
<http://www.unicode.org/reports/tr10/tr10-9.html>.
[9] Davis, M. and M. Suignard, "Unicode Security Considerations",
February 2006, <http://www.unicode.org/reports/tr36/>.
14.2. Informative References
[10] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
RFC 2045, November 1996.
[11] Myers, J., "Simple Authentication and Security Layer (SASL)",
RFC 2222, October 1997.
[12] Newman, C. and J. Myers, "ACAP -- Application Configuration
Access Protocol", RFC 2244, November 1997.
[13] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
[14] Freed, N. and J. Postel, "IANA Charset Registration
Procedures", BCP 19, RFC 2978, October 2000.
[15] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
January 2001.
[16] Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
4rev1", RFC 3501, March 2003.
[17] Crispin, M. and K. Murchison, "INTERNET MESSAGE ACCESS PROTOCOL
- SORT AND THREAD EXTENSIONS", draft-ietf-imapext-sort-17.txt
(work in progress), May 2004.
[18] Newman, C. and A. Gulbrandsen, "Internet Message Access
Protocol Internationalization", draft-ietf-imapext-i18n-06.txt
(work in progress), January 2006.
Authors' Addresses
Chris Newman
Sun Microsystems
1050 Lakes Drive
West Covina, CA 91790
US
Email: chris.newman@sun.com
Martin Duerst
Aoyama Gakuin University
5-10-1 Fuchinobe
Sagamihara, Kanagawa 229-8558
Japan
Phone: +81 466 49 1170
Fax: +81 466 49 1171
Email: duerst@it.aoyama.ac.jp
URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/
(Note: Please write "Duerst" with u-umlaut wherever possible, for
example as "Dürst" in XML and HTML.)
Arnt Gulbrandsen
Oryx Mail Systems GmbH
Schweppermannstr. 8
Munich 81671
Germany
Phone: +49 89 4502 9757
Fax: +49 89 4502 9758
Email: arnt@oryx.com
URI: http://www.oryx.com/arnt/
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.