draft-ietf-ldapbis-strprep-03.txt   draft-ietf-ldapbis-strprep-04.txt 
Internet-Draft Kurt D. Zeilenga Internet-Draft Kurt D. Zeilenga
Intended Category: Standard Track OpenLDAP Foundation Intended Category: Standard Track OpenLDAP Foundation
Expires in six months 15 February 2004 Expires in six months 4 June 2004
LDAP: Internationalized String Preparation LDAP: Internationalized String Preparation
<draft-ietf-ldapbis-strprep-03.txt> <draft-ietf-ldapbis-strprep-04.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with all This document is intended to be published as a Standard Track RFC.
provisions of Section 10 of RFC2026.
Distribution of this memo is unlimited. Technical discussion of this Distribution of this memo is unlimited. Technical discussion of this
document will take place on the IETF LDAP Revision Working Group document will take place on the IETF LDAP Revision Working Group
mailing list <ietf-ldapbis@openldap.org>. Please send editorial mailing list <ietf-ldapbis@openldap.org>. Please send editorial
comments directly to the author <Kurt@OpenLDAP.org>. comments directly to the editor <Kurt@OpenLDAP.org>.
By submitting this Internet-Draft, I accept the provisions of Section
4 of RFC 3667. By submitting this Internet-Draft, I certify that any
applicable patent or other IPR claims of which I am aware have been
disclosed, and any of which I become aware will be disclosed, in
accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering Task Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference material
material or to cite them other than as ``work in progress.'' or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
<http://www.ietf.org/ietf/1id-abstracts.txt>. The list of http://www.ietf.org/ietf/1id-abstracts.txt. The list of
Internet-Draft Shadow Directories can be accessed at Internet-Draft Shadow Directories can be accessed at
<http://www.ietf.org/shadow.html>. http://www.ietf.org/shadow.html.
Copyright (C) The Internet Society (2004). All Rights Reserved. Copyright (C) The Internet Society (2004). All Rights Reserved.
Please see the Full Copyright section near the end of this document Please see the Full Copyright section near the end of this document
for more information. for more information.
Abstract Abstract
The previous Lightweight Directory Access Protocol (LDAP) technical The previous Lightweight Directory Access Protocol (LDAP) technical
specifications did not precisely define how character string matching specifications did not precisely define how character string matching
is to be performed. This led to a number of usability and is to be performed. This led to a number of usability and
interoperability problems. This document defines string preparation interoperability problems. This document defines string preparation
algorithms for character-based matching rules defined for use in LDAP. algorithms for character-based matching rules defined for use in LDAP.
Conventions Conventions and Terms
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14 [RFC2119]. document are to be interpreted as described in BCP 14 [RFC2119].
Character names in this document use the notation for code points and Character names in this document use the notation for code points and
names from the Unicode Standard [Unicode]. For example, the letter names from the Unicode Standard [Unicode]. For example, the letter
"a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>. "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
In the lists of mappings and the prohibited characters, the "U+" is In the lists of mappings and the prohibited characters, the "U+" is
left off to make the lists easier to read. The comments for character left off to make the lists easier to read. The comments for character
ranges are shown in square brackets (such as "[CONTROL CHARACTERS]") ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
and do not come from the standard. and do not come from the standard.
Note: a glossary of terms used in Unicode can be found in [Glossary]. Note: a glossary of terms used in Unicode can be found in [Glossary].
Information on the Unicode character encoding model can be found in Information on the Unicode character encoding model can be found in
[CharModel]. [CharModel].
The term "combining mark", as used in this specification, refers to
any Unicode [Unicode] code point which has a mark property (Mn, Mc,
Me). Appendix C provides a complete list of combining marks.
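The combining-mark definition above can be tested one code point at a time. The following is a minimal Python sketch. It assumes Unicode 3.2 character data, which is the version [StringPrep] uses and which CPython exposes as `unicodedata.ucd_3_2_0`, and it assumes Appendix C was drawn from that data. The helper name `is_combining_mark` is hypothetical.

```python
import unicodedata

# Unicode 3.2 database bundled with CPython; stringprep (RFC 3454) is based on 3.2.
UCD = unicodedata.ucd_3_2_0

# General Category values that carry the Unicode "mark" property.
MARK_CATEGORIES = frozenset({"Mn", "Mc", "Me"})


def is_combining_mark(ch: str) -> bool:
    """Return True if the single code point `ch` is Mn, Mc, or Me (hypothetical helper)."""
    return UCD.category(ch) in MARK_CATEGORIES


# U+0301 COMBINING ACUTE ACCENT (Mn), U+0903 DEVANAGARI SIGN VISARGA (Mc),
# U+20DD COMBINING ENCLOSING CIRCLE (Me), U+0061 LATIN SMALL LETTER A (Ll).
for sample in ("\u0301", "\u0903", "\u20DD", "a"):
    print(f"U+{ord(sample):04X}", is_combining_mark(sample))
```

Because the check is category based, running it against a newer database than 3.2 would also report marks that were added after Appendix C was compiled.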
1. Introduction 1. Introduction
1.1. Background 1.1. Background
A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
[Syntaxes] defines an algorithm for determining whether a presented [Syntaxes] defines an algorithm for determining whether a presented
value matches an attribute value in accordance with the criteria value matches an attribute value in accordance with the criteria
defined for the rule. The proposition may be evaluated to True, defined for the rule. The proposition may be evaluated to True,
False, or Undefined. False, or Undefined.
skipping to change at page 5, line 32 skipping to change at page 5, line 42
SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
points are mapped to nothing. COMBINING GRAPHEME JOINER (U+034F) and points are mapped to nothing. COMBINING GRAPHEME JOINER (U+034F) and
VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
mapped to nothing. The OBJECT REPLACEMENT CHARACTER (U+FFFC) is mapped to nothing. The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
mapped to nothing. mapped to nothing.
CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR) TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
(U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020). (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).
All other control code points (e.g., Cc) or code points with a control All other control code points (e.g., Cc) or code points with a control
function (e.g., Cf) are mapped to nothing. function (e.g., Cf) are mapped to nothing. The following is a
complete list of these code points: U+0000-0008, 000E-001F, 007F-0084,
0086-009F, 06DD, 070F, 180E, 200C-200F, 202A-202E, 2060-2063,
206A-206F, FEFF, FFF9-FFFB, 1D173-1D17A, E0001, E0020-E007F.
ZERO WIDTH SPACE (U+200B) is mapped to nothing. All other code points ZERO WIDTH SPACE (U+200B) is mapped to nothing. All other code points
with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or
Zp) are mapped to SPACE (U+0020). Zp) are mapped to SPACE (U+0020). The following is a complete list of
these code points: U+0020, 00A0, 1680, 2000-200A, 2028-2029, 202F,
Appendix B provides a table detailing the above mappings. 205F, 3000.
For case ignore, numeric, and stored prefix string matching rules, For case ignore, numeric, and stored prefix string matching rules,
characters are case folded per B.2 of [StringPrep]. characters are case folded per B.2 of [StringPrep].
The output is the mapped string. The output is the mapped string.
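The mapping rules above amount to a table lookup for each code point. The following is a minimal Python sketch that uses the code point lists given in the -04 text. The names `map_step`, `MAP_TO_NOTHING`, `MAP_TO_SPACE`, and `_ranges` are hypothetical. The sketch assumes the caller knows whether the matching rule is case ignore, numeric, or stored prefix. Case folding calls the standard library's `stringprep.map_table_b2`, which implements Table B.2 of [StringPrep] over Unicode 3.2 data.

```python
import stringprep


def _ranges(*pairs):
    """Expand inclusive (first, last) code point pairs into a set (hypothetical helper)."""
    return {cp for first, last in pairs for cp in range(first, last + 1)}


# Code points mapped to nothing, per the -04 wording of this step.
MAP_TO_NOTHING = _ranges(
    (0x00AD, 0x00AD), (0x1806, 0x1806),                    # soft hyphens
    (0x034F, 0x034F), (0x180B, 0x180D), (0xFE00, 0xFE0F),  # CGJ, variation selectors
    (0xFFFC, 0xFFFC), (0x200B, 0x200B),                    # U+FFFC, ZERO WIDTH SPACE
    # Remaining control (Cc) and format (Cf) code points:
    (0x0000, 0x0008), (0x000E, 0x001F), (0x007F, 0x0084), (0x0086, 0x009F),
    (0x06DD, 0x06DD), (0x070F, 0x070F), (0x180E, 0x180E), (0x200C, 0x200F),
    (0x202A, 0x202E), (0x2060, 0x2063), (0x206A, 0x206F), (0xFEFF, 0xFEFF),
    (0xFFF9, 0xFFFB), (0x1D173, 0x1D17A), (0xE0001, 0xE0001), (0xE0020, 0xE007F),
)

# Code points mapped to SPACE (U+0020): the listed controls plus all separators.
MAP_TO_SPACE = _ranges(
    (0x0009, 0x000D), (0x0085, 0x0085),
    (0x0020, 0x0020), (0x00A0, 0x00A0), (0x1680, 0x1680), (0x2000, 0x200A),
    (0x2028, 0x2029), (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
)


def map_step(s: str, case_fold: bool = False) -> str:
    """Apply the mappings above; pass case_fold=True for case ignore, numeric,
    and stored prefix string matching rules."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in MAP_TO_NOTHING:
            continue
        if cp in MAP_TO_SPACE:
            out.append(" ")
        elif case_fold:
            # Table B.2 of RFC 3454; may expand one code point into several.
            out.append(stringprep.map_table_b2(ch))
        else:
            out.append(ch)
    return "".join(out)


print(map_step("Foo\u00ADBar\tBaz", case_fold=True))
```

Using explicit lists rather than General Category tests keeps the sketch tied to the code points the text enumerates, independent of the Unicode version the runtime ships.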
2.3. Normalize 2.3. Normalize
The input string is to be normalized to Unicode Form KC (compatibility The input string is to be normalized to Unicode Form KC (compatibility
composed) as described in [UAX15]. The output is the normalized composed) as described in [UAX15]. The output is the normalized
skipping to change at page 6, line 14 skipping to change at page 6, line 28
2.4. Prohibit 2.4. Prohibit
All Unassigned code points are prohibited. Unassigned code points are All Unassigned code points are prohibited. Unassigned code points are
listed in Table A.1 of [StringPrep]. listed in Table A.1 of [StringPrep].
Characters which, per Section 5.8 of [StringPrep], change display Characters which, per Section 5.8 of [StringPrep], change display
properties or are deprecated are prohibited. These characters are properties or are deprecated are prohibited. These characters are
listed in Table C.8 of [StringPrep]. listed in Table C.8 of [StringPrep].
Private Use (U+E000-F8FF, F0000-FFFFD, 100000-10FFFD) code points are Private Use code points are prohibited. These characters are listed
prohibited. in Table C.3 of [StringPrep].
All non-character code points (U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF, All non-character code points are prohibited. These code points are
2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF, listed in Table C.4 of [StringPrep].
7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF,
CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, 10FFFE-10FFFF) are
prohibited.
Surrogate codes (U+D800-DFFF) are prohibited. Surrogate codes are prohibited. These characters are listed in Table
C.5 of [StringPrep].
The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited. The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.
The step fails if the input string contains any prohibited code point. The step fails if the input string contains any prohibited code point.
Otherwise, the output is the input string. Otherwise, the output is the input string.
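Each prohibition above corresponds to a table in [StringPrep]. Python's standard `stringprep` module exposes those tables as predicates. The following is a minimal sketch that assumes the input has already been mapped and normalized. The names `is_prohibited`, `find_prohibited`, and `prohibit_step` are hypothetical. Raising `ValueError` is only one possible way to signal that the step fails.

```python
import stringprep


def is_prohibited(ch: str) -> bool:
    """True if `ch` falls under any prohibition listed above."""
    return (
        stringprep.in_table_a1(ch)      # A.1: unassigned in Unicode 3.2
        or stringprep.in_table_c8(ch)   # C.8: change display properties or deprecated
        or stringprep.in_table_c3(ch)   # C.3: private use
        or stringprep.in_table_c4(ch)   # C.4: non-character code points
        or stringprep.in_table_c5(ch)   # C.5: surrogate codes
        or ch == "\uFFFD"               # REPLACEMENT CHARACTER
    )


def find_prohibited(s: str):
    """Return the first prohibited code point in `s`, or None if there is none."""
    return next((ch for ch in s if is_prohibited(ch)), None)


def prohibit_step(s: str) -> str:
    """Fail (raise ValueError) on any prohibited code point; otherwise return `s` unchanged."""
    bad = find_prohibited(s)
    if bad is not None:
        raise ValueError(f"prohibited code point U+{ord(bad):04X}")
    return s


print(prohibit_step("abc"))
```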
2.5. Check bidi 2.5. Check bidi
This step fails if the input string does not conform to the This step fails if the input string does not conform to the
bidirectional character restrictions detailed in Section 6 of [StringPrep]. bidirectional character restrictions detailed in Section 6 of [StringPrep].
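Section 6 of [StringPrep] imposes two structural rules. First, a string containing any RandALCat character (Table D.1) must not contain any LCat character (Table D.2). Second, such a string must both begin and end with a RandALCat character. The following is a minimal Python sketch of those two rules using the standard `stringprep` module. The name `check_bidi` is hypothetical. The sketch assumes Section 6's remaining requirement, that Table C.8 characters are prohibited, is already enforced by the prohibit step.

```python
import stringprep


def check_bidi(s: str) -> bool:
    """Return True if `s` satisfies the RandALCat/LCat rules of RFC 3454, Section 6."""
    has_randal = any(stringprep.in_table_d1(ch) for ch in s)
    if not has_randal:
        return True  # no right-to-left characters, so no further constraints
    if any(stringprep.in_table_d2(ch) for ch in s):
        return False  # RandALCat and LCat characters must not be mixed
    # A RandALCat character must be both the first and the last character.
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])


print(check_bidi("\u05D0\u05D1"), check_bidi("\u05D0a"))
```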
skipping to change at page 9, line 4 skipping to change at page 9, line 17
The approach used in this document is based upon design principles and The approach used in this document is based upon design principles and
algorithms described in "Preparation of Internationalized Strings algorithms described in "Preparation of Internationalized Strings
('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet. Some ('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet. Some
additional guidance was drawn from Unicode Technical Standards, additional guidance was drawn from Unicode Technical Standards,
Technical Reports, and Notes. Technical Reports, and Notes.
This document is a product of the IETF LDAP Revision (LDAPBIS) Working This document is a product of the IETF LDAP Revision (LDAPBIS) Working
Group. Group.
6. Author's Address 6. Author's Address
Kurt D. Zeilenga Kurt D. Zeilenga
OpenLDAP Foundation OpenLDAP Foundation
Email: Kurt@OpenLDAP.org Email: Kurt@OpenLDAP.org
7. References 7. References
[[Note to the RFC Editor: please replace the citation tags used in
referencing Internet-Drafts with tags of the form RFCnnnn.]]
7.1. Normative References 7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14 (also RFC 2119), March 1997. Requirement Levels", BCP 14 (also RFC 2119), March 1997.
[Roadmap] Zeilenga, K. (editor), "LDAP: Technical Specification [Roadmap] Zeilenga, K. (editor), "LDAP: Technical Specification
Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
progress. progress.
[StringPrep] Hoffman P. and M. Blanchet, "Preparation of [StringPrep] Hoffman P. and M. Blanchet, "Preparation of
skipping to change at page 17, line 38 skipping to change at page 18, line 10
48| 021e | 01cf | ?? | 01e8 | 013d | ?? | 0147 | 01d1 | 48| 021e | 01cf | ?? | 01e8 | 013d | ?? | 0147 | 01d1 |
50| ?? | ?? | 0158 | 0160 | 0164 | 01d3 | ?? | ?? | 50| ?? | ?? | 0158 | 0160 | 0164 | 01d3 | ?? | ?? |
58| ?? | ?? | 017d | ?? | ?? | ?? | ?? | ?? | 58| ?? | ?? | 017d | ?? | ?? | ?? | ?? | ?? |
60| ?? | 01ce | ?? | 010d | 010f | 011b | ?? | 01e7 | 60| ?? | 01ce | ?? | 010d | 010f | 011b | ?? | 01e7 |
68| 021f | 01d0 | 01f0 | 01e9 | 013e | ?? | 0148 | 01d2 | 68| 021f | 01d0 | 01f0 | 01e9 | 013e | ?? | 0148 | 01d2 |
70| ?? | ?? | 0159 | 0161 | 0165 | 01d4 | ?? | ?? | 70| ?? | ?? | 0159 | 0161 | 0165 | 01d4 | ?? | ?? |
78| ?? | ?? | 017e | ?? | ?? | ?? | ?? | ?? | 78| ?? | ?? | 017e | ?? | ?? | ?? | ?? | ?? |
--+------+------+------+------+------+------+------+------+ --+------+------+------+------+------+------+------+------+
Table B.14: Mapping of T.61 Caron Accent Combinations Table B.14: Mapping of T.61 Caron Accent Combinations
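The visible rows of Table B.14 can be cross-checked against Unicode canonical composition. The following is a minimal Python sketch based on two assumptions. First, each cell is indexed by the row label plus the column offset, giving the T.61 byte of the base character that follows the caron, and that byte coincides with ASCII for these letters. Second, a "??" cell presumably means no precomposed character exists. Under those assumptions, each non-"??" entry should equal the NFC composition of the base letter with U+030C COMBINING CARON. The names `TABLE_B14_ROWS`, `caron_composition`, and `mismatches` are hypothetical.

```python
import unicodedata

COMBINING_CARON = "\u030C"

# Rows 0x48-0x7F of Table B.14 as shown above; None stands for a "??" cell.
TABLE_B14_ROWS = {
    0x48: [0x021E, 0x01CF, None, 0x01E8, 0x013D, None, 0x0147, 0x01D1],
    0x50: [None, None, 0x0158, 0x0160, 0x0164, 0x01D3, None, None],
    0x58: [None, None, 0x017D, None, None, None, None, None],
    0x60: [None, 0x01CE, None, 0x010D, 0x010F, 0x011B, None, 0x01E7],
    0x68: [0x021F, 0x01D0, 0x01F0, 0x01E9, 0x013E, None, 0x0148, 0x01D2],
    0x70: [None, None, 0x0159, 0x0161, 0x0165, 0x01D4, None, None],
    0x78: [None, None, 0x017E, None, None, None, None, None],
}


def caron_composition(base: int):
    """Precomposed code point for chr(base) + U+030C under NFC, or None if none exists."""
    composed = unicodedata.normalize("NFC", chr(base) + COMBINING_CARON)
    return ord(composed) if len(composed) == 1 else None


def mismatches():
    """List (byte, table value, NFC value) triples where the table and NFC disagree."""
    return [
        (row + col, expected, caron_composition(row + col))
        for row, cells in TABLE_B14_ROWS.items()
        for col, expected in enumerate(cells)
        if caron_composition(row + col) != expected
    ]


print(mismatches())
```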
Appendix B -- Mapping Table    Appendix C. Combining Marks
Input      Output              This appendix is normative.
-----      ------
0000-0008                      0300-034F 0360-036F 0483-0486 0488-0489 0591-05A1 05A3-05B9 05BB-05BC
0009-000D  0020                05BF 05C1-05C2 05C4 064B-0655 0670 06D6-06DC 06DE-06E4 06E7-06E8
000E-001F                      06EA-06ED 0711 0730-074A 07A6-07B0 0901-0903 093C 093E-094F 0951-0954
007F-009F                      0962-0963 0981-0983 09BC 09BE-09C4 09C7-09C8 09CB-09CD 09D7 09E2-09E3
0085       0020                0A02 0A3C 0A3E-0A42 0A47-0A48 0A4B-0A4D 0A70-0A71 0A81-0A83 0ABC
00A0       0020                0ABE-0AC5 0AC7-0AC9 0ACB-0ACD 0B01-0B03 0B3C 0B3E-0B43 0B47-0B48
00AD                           0B4B-0B4D 0B56-0B57 0B82 0BBE-0BC2 0BC6-0BC8 0BCA-0BCD 0BD7 0C01-0C03
034F                           0C3E-0C44 0C46-0C48 0C4A-0C4D 0C55-0C56 0C82-0C83 0CBE-0CC4 0CC6-0CC8
06DD                           0CCA-0CCD 0CD5-0CD6 0D02-0D03 0D3E-0D43 0D46-0D48 0D4A-0D4D 0D57
070F                           0D82-0D83 0DCA 0DCF-0DD4 0DD6 0DD8-0DDF 0DF2-0DF3 0E31 0E34-0E3A
1680       0020                0E47-0E4E 0EB1 0EB4-0EB9 0EBB-0EBC 0EC8-0ECD 0F18-0F19 0F35 0F37 0F39
1806                           0F3E-0F3F 0F71-0F84 0F86-0F87 0F90-0F97 0F99-0FBC 0FC6 102C-1032
180B-180E                      1036-1039 1056-1059 1712-1714 1732-1734 1752-1753 1772-1773 17B4-17D3
2000-200A  0020                180B-180D 18A9 20D0-20EA 302A-302F 3099-309A FB1E FE00-FE0F FE20-FE23
200B-200F                      1D165-1D169 1D16D-1D172 1D17B-1D182 1D185-1D18B 1D1AA-1D1AD
2028-2029  0020
202A-202E
202F       0020
205F       0020
2060-2063
206A-206F
3000       0020
FEFF
FE00-FE0F
FFF9-FFFC
1D173-1D17A
E0001
E0020-E007F
Intellectual Property Rights Intellectual Property Rights
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to pertain Intellectual Property Rights or other rights that might be claimed to
to the implementation or use of the technology described in this pertain to the implementation or use of the technology described in
document or the extent to which any license under such rights might or this document or the extent to which any license under such rights
might not be available; neither does it represent that it has made any might or might not be available; nor does it represent that it has
effort to identify any such rights. Information on the IETF's made any independent effort to identify any such rights. Information
procedures with respect to rights in standards-track and on the procedures with respect to rights in RFC documents can be found
standards-related documentation can be found in BCP-11. Copies of in BCP 78 and BCP 79.
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to Copies of IPR disclosures made to the IETF Secretariat and any
obtain a general license or permission for the use of such proprietary assurances of licenses to be made available, or the result of an
rights by implementors or users of this specification can be obtained attempt made to obtain a general license or permission for the use of
from the IETF Secretariat. such proprietary rights by implementers or users of this specification
can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF Executive this standard. Please address the information to the IETF at
Director. ietf-ipr@ietf.org.
Full Copyright Full Copyright
Copyright (C) The Internet Society (2004). All Rights Reserved.
This document and translations of it may be copied and furnished to Copyright (C) The Internet Society (2004). This document is subject
others, and derivative works that comment on or otherwise explain it to the rights, licenses and restrictions contained in BCP 78, and
or assist in its implementation may be prepared, copied, published and except as set forth therein, the authors retain all their rights.
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are This document and the information contained herein are provided on an
included on all such copies and derivative works. However, this "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
document itself may not be modified in any way, such as by removing OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
the copyright notice or references to the Internet Society or other ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
Internet organizations, except as needed for the purpose of INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
developing Internet standards in which case the procedures for INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
copyrights defined in the Internet Standards process must be followed, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
or as required to translate it into languages other than English.
 End of changes. 
