draft-ietf-urn-syntax-01.txt   draft-ietf-urn-syntax-02.txt 
Internet-Draft Ryan Moats Internet-Draft Ryan Moats
draft-ietf-urn-syntax-01.txt AT&T draft-ietf-urn-syntax-02.txt AT&T
Expires in six months November 1996 Expires in six months January 1997
URN Syntax URN Syntax
Filename: draft-ietf-urn-syntax-01.txt Filename: draft-ietf-urn-syntax-02.txt
Status of This Memo Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts. distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other months and may be updated, replaced, or obsoleted by other
skipping to change at page 1, line 33 skipping to change at page 1, line 33
To learn the current status of any Internet-Draft, please check To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet- the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast). Coast), or ftp.isi.edu (US West Coast).
Abstract Abstract
Uniform Resource Names (URNs) are intended to serve as persistent Uniform Resource Names (URNs) are intended to serve as persistent
resource identifiers. This document sets forward the canonical syntax resource identifiers. This document sets forward the canonical syntax
for URNs. Support for both existing legacy and new namespaces is for URNs. A discussion of both existing legacy and new namespaces
discussed. Requirements for URN presentation and transmission are and requirements for URN presentation and transmission are presented.
presented. Finally, there is a discussion of URN equivalence and how Finally, there is a discussion of URN equivalence and how to
to determine it. determine it.
1. Introduction 1. Introduction
Uniform Resource Names (URNs) are intended to serve as persistent Uniform Resource Names (URNs) are intended to serve as persistent
resource identifiers and are designed to make it easy to map other resource identifiers and are designed to make it easy to map other
namespaces (which share the properties of URNs) into URN-space. The namespaces (which share the properties of URNs) into URN-space.
URN syntax therefore provides a means to encode character data in a Therefore, the URN syntax provides a means to encode character data
form that can be sent in existing protocols, transcribed on most in a form that can be sent in existing protocols, transcribed on most
keyboards, etc. keyboards, etc.
2. Syntax 2. Syntax
All URNs have the following syntax: All URNs have the following syntax (phrases enclosed in quotes are
REQUIRED):
<URN> ::= ["urn:"] <NID> ":" <NSS> <URN> ::= "urn:" <NID> ":" <NSS>
<NID> is the Namespace Identifier, and <NSS> is the Namespace where <NID> is the Namespace Identifier, and <NSS> is the Namespace
Specific String. The leading case-insensitive "urn:" sequence is Specific String. The leading "urn:" sequence is case-insensitive.
currently optional, as no closure on its definite presence or absence The Namespace ID determines the _syntactic_ interpretation of the
has been reached. The Namespace ID is used to determine the Namespace Specific String (as discussed in [1]).
_syntactic_ interpretation of the Namespace Specific String (as
discussed in [1]).
RFC 1737 [2] presents additional requirements on URN encoding, which RFC 1630 [2] and RFC 1737 [3] each present additional considerations
all have implications as far as limiting syntax. On the other hand, for URN encoding, which have implications as far as limiting syntax.
the requirement to support existing legacy naming systems has the On the other hand, the requirement to support existing legacy naming
effect of broadening syntax. Thus, we discuss the acceptable syntax systems has the effect of broadening syntax. Thus, we discuss the
for both the Namespace Identifier and the Namespace Specific String acceptable syntax for both the Namespace Identifier and the Namespace
separately. Specific String separately.
2.1 Namespace Identifier Syntax 2.1 Namespace Identifier Syntax
The following is the syntax for the Namespace Identifier. To (a) be The following is the syntax for the Namespace Identifier. To (a) be
consistent with all potential resolution schemes and (b) not put any consistent with all potential resolution schemes and (b) not put any
undue constraints on any potential resolution scheme, the syntax for undue constraints on any potential resolution scheme, the syntax for
the Namespace Identifier is: the Namespace Identifier is:
<NID> ::= <letter> [ <let-hyp> ] <NID> ::= <let-num> [ *<let-num-hyp> ]
<let-hyp> ::= <letter> | "-" | <let-hyp>
<letter> ::= any one of the 52 alphabetic characters A through Z
in upper case and a through z in lower case
This is slightly more restrictive than what is stated in RFC 1738 [4]
(which allows the period "."). Further, the Namespace Identifier is
case insensitive, so that "ISBN" and "isbn" refer to the same
namespace.
To avoid confusion with the optional "urn:" identifier, the NID "urn"
is reserved and may not be used.
2.2 Namespace Specific String Syntax
As required by 1737, there is a single canonical representation of
the NSS portion of an URN. The format of this single canonical form
follows:
<NSS> ::= <URN chars>*
<URN chars> ::= <trans> | "%" <hex> <hex>
<trans> ::= <upper> | <lower> | <number> | <other> <let-num-hyp> ::= <upper> | <lower> | <number> | "-"
<hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" <let-num> ::= <upper> | <lower> | <number>
<upper> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | <upper> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
"I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
"Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
"Y" | "Z" "Y" | "Z"
<lower> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | <lower> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
"i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
"q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
"y" | "z" "y" | "z"
<number> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | <number> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
"8" | "9" "8" | "9"
<other> ::= "(" | ")" | "+" | " This is slightly more restrictive than what is stated in [4] (which
":" | "=" | "?" | "@" allows the characters "." and "+"). Further, the Namespace
Identifier is case insensitive, so that "ISBN" and "isbn" refer to
the same namespace.
To avoid confusion with the "urn:" identifier, the NID "urn" is
reserved and MUST NOT be used.
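The NID rules in the -02 column can be sketched as a small check. This is only an illustration under stated assumptions: the function names is_valid_nid and same_namespace are hypothetical, and no maximum length is enforced because this draft does not state one.

```python
import re

# -02 grammar: <NID> ::= <let-num> [ *<let-num-hyp> ]
# First character is a letter or digit, the rest may also be "-".
NID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def is_valid_nid(nid: str) -> bool:
    """Return True if nid matches the grammar and is not the reserved NID "urn"."""
    if NID_PATTERN.fullmatch(nid) is None:
        return False
    # The NID is case insensitive, so "URN" is reserved as well.
    return nid.lower() != "urn"


def same_namespace(a: str, b: str) -> bool:
    """Compare two NIDs without regard to case ("ISBN" equals "isbn")."""
    return a.lower() == b.lower()
```

Characters such as "." and "+" are rejected here, which matches the remark that this syntax is slightly more restrictive than [4].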
2.2 Namespace Specific String Syntax
As required by RFC 1737, there is a single canonical representation
of the NSS portion of an URN. The format of this single canonical
form follows:
<NSS> ::= 1*<URN chars>
<URN chars> ::= <trans> | "%" <hex> <hex>
<trans> ::= <upper> | <lower> | <number> | <other> | <reserved>
<hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
<other> ::= "(" | ")" | "+" | "," | "-" | "." |
":" | "=" | "?" | "@" | ";" | "$" |
"_" | "!" | "~" | "*" | "'"
Depending on the rules governing a namespace, valid identifiers in a Depending on the rules governing a namespace, valid identifiers in a
namespace might contain characters that are not members of the URN namespace might contain characters that are not members of the URN
character set above (<URN chars>). Such strings MUST be translated character set above (<URN chars>). Such strings MUST be translated
into canonical NSS format before using them as protocol elements or into canonical NSS format before using them as protocol elements or
otherwise passing them on to other applications. Translation is done otherwise passing them on to other applications. Translation is done
by encoding each character outside the URN character set as a by encoding each character outside the URN character set as a
sequence of one to six octets using UTF-8 encoding, and the encoding sequence of one to six octets using UTF-8 encoding, and the encoding
of each of those octets as "%" followed by two characters from the of each of those octets as "%" followed by two characters from the
<hex> character set above. The two characters give the hexadecimal <hex> character set above. The two characters give the hexadecimal
representation of that octet. representation of that octet.
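The translation step described above can be sketched as follows. The function name to_canonical_nss is hypothetical. Two choices here are assumptions rather than requirements of the text: the sketch also escapes "%" and "/" (the characters the -02 column lists as reserved), and it emits upper-case hex digits, although the -02 grammar accepts either case.

```python
import string

# <other> characters from the -02 column.
OTHER = "()+,-.:=?@;$_!~*'"
# Characters copied through unchanged: <upper>, <lower>, <number>, <other>.
SAFE = set(string.ascii_letters + string.digits + OTHER)


def to_canonical_nss(native: str) -> str:
    """Translate a namespace-native string into canonical NSS form.

    Each character outside SAFE is encoded as UTF-8, and each resulting
    octet is written as "%" followed by two hex digits.
    """
    out = []
    for ch in native:
        if ch in SAFE:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

For example, a space becomes "%20" and a character that needs two UTF-8 octets becomes two escape sequences.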
2.3 Reserved characters
The remaining character set left to be discussed above is the
reserved character set, which contains various characters reserved
from normal use. The reserved character set follows, with a
discussion on the specifics of why each character is reserved.
The reserved character set is:
<reserved> ::= "/" | "%"
2.3.1 The "%" character
The "%" character is reserved in the URN syntax for introducing the
escape sequence for an octet. Literal use of the "%" character in a
namespace must be encoded using "%25" in URNs for that namespace.
The presence of an "%" character in an URN MUST be followed by two
characters from the <hex> character set.
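A syntax check for an NSS in the -02 form might look like the sketch below. The name is_valid_nss is hypothetical. The sketch reads the grammar together with the rule above, so a "%" is accepted only as the start of an escape sequence, and an unencoded "/" is accepted because the grammar lists it as reserved rather than excluded.

```python
import re

# One <URN chars> item: either a permitted literal character
# (<upper>, <lower>, <number>, <other>, or the reserved "/")
# or "%" followed by exactly two characters from <hex>.
_URN_CHAR = r"(?:[A-Za-z0-9()+,\-.:=?@;$_!~*'/]|%[0-9A-Fa-f]{2})"
# -02 grammar: <NSS> ::= 1*<URN chars>, so at least one item is required.
NSS_PATTERN = re.compile(_URN_CHAR + "+")


def is_valid_nss(nss: str) -> bool:
    """Return True if nss is a non-empty sequence of <URN chars>."""
    return NSS_PATTERN.fullmatch(nss) is not None
```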
Namespaces MAY designate one or more characters from the URN Namespaces MAY designate one or more characters from the URN
character set as having special meaning for that namespace. If the character set as having special meaning for that namespace. If the
namespace also uses that character in a literal sense as well, the namespace also uses that character in a literal sense as well, the
character used in a literal sense must be encoded with "%" followed character used in a literal sense MUST be encoded with "%" followed
by the hexadecimal representation of that octet. Therefore, the by the hexadecimal representation of that octet. Therefore, the
process of registering a namespace identifier shall include process of registering a namespace identifier shall include
publication of a definition of which characters have a special publication of a definition of which characters have a special
meaning and how to encode these characters if used in a literal meaning to that namespace.
sense.
3. Support of existing legacy naming systems and new naming systems 2.3.2 The "/" character
URN-aware applications MAY accept as input other resource identifiers The "/" character is RESERVED for future developments. It might be
from existing legacy namespaces. If such identifiers contain used for denoting hierarchy to allow for relative URN processing, but
characters that are not members of the URN character set specified in the WG has not yet reached consensus on this, so such developments
section 2.2, the identifier MUST be translated to canonical format as will be documented separately. Meanwhile, namespace developers
discussed in section 2.2. SHOULD NOT use an unencoded "/", but rather use %-encoding for "/"
("%2F").
Some existing name spaces that have the properties of the URN-space 2.4 Excluded characters
contain some human-significant components, and these exist in a wide
variety of languages. However, URNs are NOT intended to convey
information that is significant to humans. While the translation
rule in section 2.2 is provided for existing namespaces, new
namespaces, as part of their registration documentation, MUST define
a discipline for assigning new URNs that does not simplify the
generation of human-significant names.
4. URN presentation and transport The following list is included only for the sake of completeness.
Any octets/characters on this list are explicitly NOT part of the URN
character set, and if used in an URN, MUST be %encoded:
URN-aware applications MAY support "natural" display of URNs which <excluded> ::= octets 0-32 (0-20 hex) | "\" | """ | "#" | "&" | "<"
contain characters encoded using "%" notation. However, they MUST | ">" | "[" | "]" | "^" | "`" | "{" | "|" | "}" | octets 127-255 (7F-FF hex)
provide for display of URNs in canonical form (i.e. in a format
suitable for transcription).
URNs may only be transported in canonical format. An URN ends when an octet/character from the excluded character set
(<excluded>) is encountered. The character from the excluded
character set is NOT part of the URN.
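The termination rule above can be sketched as a scan that stops at the first excluded character. The names is_excluded and urn_prefix are hypothetical, and the sketch assumes the input is a Python string that begins with an URN; any code point at or above 127 is treated as excluded, which is an assumption based on the octet ranges in the list.

```python
# Excluded punctuation from the -02 column.
EXCLUDED_PUNCT = set('\\"#&<>[]^`{|}')


def is_excluded(ch: str) -> bool:
    """Return True for octets 0-32, 127 and above, or excluded punctuation."""
    code = ord(ch)
    return code <= 0x20 or code >= 0x7F or ch in EXCLUDED_PUNCT


def urn_prefix(text: str) -> str:
    """Return the leading part of text up to the first excluded character.

    The excluded character itself is not part of the result.
    """
    for i, ch in enumerate(text):
        if is_excluded(ch):
            return text[:i]
    return text
```

This is useful when an URN is embedded in running text, since the space or ">" that follows it marks the end of the URN.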
5. Equivalence in URNs 3. Support of existing legacy naming systems and new naming systems
URNs are considered equivalent if they return the same resource. For Any namespace (existing or newly-devised) that is proposed as an
various purposes, such as caching, a test is necessary to determine URN-namespace and fulfills the criteria of URN-namespaces MUST be
equivalence without actually resolving the URNs and fetching/comparing expressed in this syntax. If names in these namespaces contain
the underlying resources. "Lexical equivalence" is a stricter condition characters other than those defined for the URN character set, they
than the equivalence described above (functional equivalence). MUST be translated into canonical form as discussed in section 2.2.
5.1 Lexical Equivalence 4. URN presentation and transport
Lexical equivalence may be determined by comparing two URNs without The URN syntax defines the canonical format for URNs and all URN
making any network accesses. Two URNs are lexically equivalent if transport and interchanges MUST take place in this format. Further,
they are octet-by-octet equal after the following preprocessing all URN-aware applications MUST offer the option of displaying URNs
in this canonical form to allow for direct transcription (for example
by cut and paste techniques). Such applications MAY support display
of URNs in a more human-friendly form and may use a character set
that includes characters that aren't permitted in URN syntax as
defined in this RFC (that is, they may replace %-notation by
characters in some extended character set in display to humans).
1. drop any preceding "urn:" token 5. Lexical Equivalence in URNs
For various purposes such as caching, it's often desirable to determine
if two URNs are the same without resolving them. The general purpose
means of doing so is by testing for "lexical equivalence" as defined
below.
Two URNs are lexically equivalent if they are octet-by-octet equal after
the following preprocessing:
1. normalize the case of the leading "urn:" token
2. normalize the case of the NID 2. normalize the case of the NID
3. normalize the case of any %-escaping
Note that %-escaping MUST NOT be removed.
Some namespaces may define additional lexical equivalences, such as Some namespaces may define additional lexical equivalences, such as
case-insensitivity of the NSS (or parts thereof). Additional lexical case-insensitivity of the NSS (or parts thereof). Additional lexical
equivalences MUST be documented as part of namespace registration, equivalences MUST be documented as part of namespace registration, MUST
MUST always have the effect of eliminating some of the false always have the effect of eliminating some of the false negatives
negatives obtained by the procedure above, and MUST NEVER says that obtained by the procedure above, and MUST NEVER say that two URNs are
two URNs are not equivalent if the procedure above says they are not equivalent if the procedure above says they are equivalent.
equivalent. 6. Examples of lexical equivalence
5.2 Functional Equivalence
Resolvers determine functional equivalence based on specific rules The following URN comparisons highlight the lexical equivalence
for the namespace. Therefore, namespace registration must include definitions:
documentation on how to determine functional equivalence for that
namespace.
5.3 Examples 1- URN:foo:a123/456
2- urn:foo:a123/456
3- urn:FOO:a123/456
4- urn:foo:A123/456
5- urn:foo:a123%2F456
6- URN:FOO:a123%2f456
URNs 1, 2, and 3 are all lexically equivalent. URN 4 is not
lexically equivalent to any of the other URNs of the above set. URNs 5
and 6 are only lexically equivalent to each other.
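The three preprocessing steps of the -02 column and the six examples above can be checked with a short sketch. The names normalize and lexically_equivalent are hypothetical. Lower case for the "urn:" token and NID and upper case for escape hex digits are arbitrary choices; any consistent case would do. Escapes are never decoded, in line with the note that %-escaping MUST NOT be removed.

```python
import re

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize(urn: str) -> str:
    """Apply the three preprocessing steps used for lexical comparison."""
    # Split into the "urn" token, the NID, and the NSS (which may contain ":").
    prefix, nid, nss = urn.split(":", 2)
    if prefix.lower() != "urn":
        raise ValueError("missing leading 'urn:' token")
    # Step 3: normalize the case of %-escapes without decoding them.
    nss = _ESCAPE.sub(lambda m: m.group(0).upper(), nss)
    # Steps 1 and 2: normalize the case of "urn:" and of the NID.
    return f"urn:{nid.lower()}:{nss}"


def lexically_equivalent(a: str, b: str) -> bool:
    """Return True if the two URNs are equal after preprocessing."""
    return normalize(a) == normalize(b)
```

With this sketch, URNs 1 through 3 compare equal, URN 4 differs because the NSS is compared case-sensitively, and URNs 5 and 6 match only each other because "%2F" is not decoded to "/".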
The following URN comparisons highlight the difference between these 7. Functional Equivalence in URNs
types of equivalence:
urn:isbn:1-23485-8-29, isbn:1-23485-8-29 are lexically equiv. Functional equivalence is determined by practice within a given
urn:isbn:1-23485-8-29, ISBN:1-23485-8-29 are lexically equiv. namespace and managed by resolvers for that namespace. Namespace
urn:isbn:1-23485-8-29, isbn:123485829 are not lexically equiv. registration must include guidance on how to determine functional
but may be functionally equivalent. equivalence for that namespace, i.e. when two URNs are identical
within a namespace.
6. Security considerations 8. Security considerations
Because of the number of potential namespaces, it must be restated This document specifies the syntax for URNs. While some namespace
that certain of the characters in the Namespace Specific String may resolvers may assign special meaning to certain of the characters of
have special meaning to certain namespace resolvers. The process of the Namespace Specific String, any security considerations resulting
registering a namespace identifier shall therefore include from such assignment are outside the scope of this document. It is
publication of a definition of which characters have a special strongly recommended that the process of registering a namespace
meaning. identifier include any such considerations.
7. Acknowledgments 9. Acknowledgments
Thanks to various members of the URN working group and <<your name Thanks to various members of the URN working group and <<your name
here!!>> for comments on earlier drafts of this document. This here!!>> for comments on earlier drafts of this document. This
document is partially supported by the National Science Foundation. document is partially supported by the National Science Foundation,
Cooperative Agreement NCR-9218179.
8. References 10. References
Request For Comments (RFC) and Internet Draft documents are available Request For Comments (RFC) and Internet Draft documents are available
from <URL:ftp://ftp.internic.net> and numerous mirror sites. from <URL:ftp://ftp.internic.net> and numerous mirror sites.
[1] L. L. Daigle, P. Faltstrom, R. Iannella. "A Framework [1] K. R. Sollins, "Requirements and a Framework for
for the Assignment and Resolution of Uniform URN Resolution Systems," Internet Draft (work in
Resource Names," Internet Draft (work in progress). progress), November 1996.
June 1996.
[2] K. Sollins, L. Masinter. "Functional Requirements [2]
for Uniform Resource Names," RFC 1737. December T. Berners-Lee, "Universal Resource Identifiers in WWW," RFC
1994. 1630, June 1994.
[3] T. Berners-Lee. "Universal Resource Identifiers in [3] K. Sollins and L. Masinter, "Functional Requirements
WWW," RFC 1630. June 1994. for Uniform Resource Names," RFC 1737.
December 1994.
[4] T. Berners-Lee, L. Masinter, M. McCahill. "Uniform [4] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
Resource Locators (URL)," RFC 1738. December 1994. Resource Locators (URL)," Internet Draft (work in
progress), December 1996.
9. Editor's address 11. Editor's address
Ryan Moats Ryan Moats
AT&T AT&T
15621 Drexel Circle 15621 Drexel Circle
Omaha, NE 68135-2358 Omaha, NE 68135-2358
USA USA
Phone: +1 402 894-9456 Phone: +1 402 894-9456
EMail: jayhawk@ds.internic.net EMail: jayhawk@ds.internic.net
This Internet Draft expires May 19, 1997. Appendix A. Handling of URNs by URL resolvers/browsers.
The URN syntax has been defined so that URNs can be used in places
where URLs are expected. A resolver that conforms to the current URL
syntax specification [4] will extract a scheme value of "urn:"
rather than a scheme value of "urn:<nid>".
An URN MUST be considered an opaque URL by URL resolvers and
passed (with the "urn:" tag) to an URN resolver for resolution. The
URN resolver can either be an external resolver that the URL resolver
knows of, or it can be functionality built-in to the URL resolver.
To avoid confusion of users, an URL browser SHOULD display the
complete URN (including the "urn:" tag) to ensure that there is no
confusion between URN namespace identifiers and URL scheme identifiers.
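The behaviour described in this appendix can be observed with a generic URL parser. The sketch below uses urlsplit from the Python standard library as a stand-in for "a resolver that conforms to the URL syntax"; the helper name split_urn_as_url is hypothetical, and the choice of parser is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def split_urn_as_url(urn: str) -> tuple[str, str]:
    """Parse an URN with a generic URL parser.

    Returns (scheme, rest). The scheme is "urn", not "urn:<nid>";
    the NID and NSS stay together in the opaque remainder.
    """
    parts = urlsplit(urn)
    return parts.scheme, parts.path
```

A URL resolver built this way would see only the "urn" scheme and would hand the whole identifier, including the "urn:" tag, to an URN resolver.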
This Internet Draft expires July 31, 1997.
 End of changes. 43 change blocks. 
126 lines changed or deleted 165 lines changed or added
