Internet-Draft                                                Ryan Moats
draft-ietf-urn-syntax-01.txt                                        AT&T
Expires in six months                                      November 1996

                               URN Syntax
                 Filename: draft-ietf-urn-syntax-01.txt

Status of This Memo

      This document is an Internet-Draft.  Internet-Drafts are working
      documents of the Internet Engineering Task Force (IETF), its
      areas, and its working groups.  Note that other groups may also
      distribute working documents as Internet-Drafts.

      Internet-Drafts are draft documents valid for a maximum of six
      months and may be updated, replaced, or obsoleted by other
      documents at any time.  It is inappropriate to use Internet-
      Drafts as reference material or to cite them other than as ``work
      in progress.''

      To learn the current status of any Internet-Draft, please check
      the ``1id-abstracts.txt'' listing contained in the Internet-
      Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
      (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
      Coast), or ftp.isi.edu (US West Coast).

Abstract

   Uniform Resource Names (URNs) are intended to serve as persistent
   resource identifiers. This document sets forward the canonical syntax
   for URNs.  Support for both existing legacy and new namespaces is
   discussed. Requirements for URN presentation and transmission are
   presented.  Finally, there is a discussion of URN equivalence and how
   to determine it.

1. Introduction

   Uniform Resource Names (URNs) are intended to serve as persistent
   resource identifiers and are designed to make it easy to map other
   namespaces (which share the properties of URNs) into URN-space. The
   URN syntax therefore provides a means to encode character data in a
   form that can be sent in existing protocols, transcribed on most
   keyboards, etc.

2. Syntax

   All URNs have the following syntax:

                    <URN> ::= ["urn:"] <NID> ":" <NSS>

   <NID> is the Namespace Identifier, and <NSS> is the Namespace
   Specific String.  The leading case-insensitive "urn:" sequence is
   currently optional, as no closure on its definite presence or absence
   has been reached.  The Namespace ID is used to determine the
   _syntactic_ interpretation of the Namespace Specific String (as
   discussed in [1]).
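
   As an illustration of the production above, the following is a
   minimal sketch of splitting a URN into its <NID> and <NSS> parts.
   The function name split_urn is hypothetical and not part of this
   document; the sketch only assumes that the "urn:" prefix is optional
   and case-insensitive and that the first ":" after it ends the NID.

```python
from typing import Tuple


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a URN into (NID, NSS), dropping an optional leading "urn:".

    Hypothetical helper: it only separates the two parts and does not
    validate either of them against the grammar.
    """
    rest = urn
    # The leading "urn:" sequence is optional and case-insensitive.
    if rest[:4].lower() == "urn:":
        rest = rest[4:]
    # The NID cannot contain ":", so the first ":" terminates it.
    nid, sep, nss = rest.partition(":")
    if not sep or not nid:
        raise ValueError("expected <NID>:<NSS> in %r" % urn)
    return nid, nss
```

   For example, split_urn("urn:isbn:1-23485-8-29") and
   split_urn("isbn:1-23485-8-29") both yield the NID "isbn" and the NSS
   "1-23485-8-29".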

   RFC 1737 [2] presents additional requirements on URN encoding, all of
   which have implications for limiting syntax.  On the other hand,
   the requirement to support existing legacy naming systems has the
   effect of broadening syntax.  Thus, we discuss the acceptable syntax
   for both the Namespace Identifier and the Namespace Specific String
   separately.

2.1 Namespace Identifier Syntax

   The following is the syntax for the Namespace Identifier. To (a) be
   consistent with all potential resolution schemes and (b) not put any
   undue constraints on any potential resolution scheme, the syntax for
   the Namespace Identifier is:

   <NID>         ::= <letter> [ <let-hyp> ]

   <let-hyp>     ::= <letter> [ <let-hyp> ] | "-" [ <let-hyp> ]

   <letter>      ::= any one of the 52 alphabetic characters A through Z
                     in upper case and a through z in lower case

   This is slightly more restrictive than what is stated in RFC 1738 [4]
   (which allows the period "."). Further, the Namespace Identifier is
   case insensitive, so that "ISBN" and "isbn" refer to the same
   namespace.

   To avoid confusion with the optional "urn:" identifier, the NID "urn"
   is reserved and may not be used.
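
   The rules in this section can be sketched as a small check.  This is
   an illustration only: the name normalize_nid is hypothetical, and the
   sketch assumes the intended reading of the grammar is one letter
   followed by any number of letters or hyphens.

```python
import re

# Assumed reading of <NID>: one letter, then zero or more letters or "-".
_NID_RE = re.compile(r"[A-Za-z][A-Za-z-]*")


def normalize_nid(nid: str) -> str:
    """Validate a Namespace Identifier and return it in lower case."""
    if not _NID_RE.fullmatch(nid):
        raise ValueError("not a valid NID: %r" % nid)
    # The NID is case insensitive, so "ISBN" and "isbn" are the same.
    lowered = nid.lower()
    # "urn" is reserved to avoid confusion with the optional prefix.
    if lowered == "urn":
        raise ValueError('the NID "urn" is reserved')
    return lowered
```

   Lower case is an arbitrary choice for the normalized form; any
   consistent case folding would serve the same purpose.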

2.2 Namespace Specific String Syntax

   As required by RFC 1737 [2], there is a single canonical
   representation of the NSS portion of a URN.  The format of this
   single canonical form follows:

   <NSS>        ::= <URN chars>*

   <URN chars>  ::= <trans> | "%" <hex> <hex>

   <trans>      ::= <upper> | <lower> | <number> | <other>

   <hex>        ::= <number> | "A" | "B" | "C" | "D" | "E" | "F"

   <upper>      ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                    "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                    "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                    "Y" | "Z"

   <lower>      ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                    "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                    "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                    "y" | "z"

   <number>     ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                    "8" | "9"

   <other>      ::= "(" | ")" | "+" | "," | "-" | "." |
                    ":" | "=" | "?" | "@"

   Depending on the rules governing a namespace, valid identifiers in a
   namespace might contain characters that are not members of the URN
   character set above (<URN chars>).  Such strings MUST be translated
   into canonical NSS format before using them as protocol elements or
   otherwise passing them on to other applications.  Translation is done
   by encoding each character outside the URN character set as a
   sequence of one to six octets using UTF-8 encoding, and then encoding
   each of those octets as "%" followed by two characters from the
   <hex> character set above.  The two characters give the hexadecimal
   representation of that octet.

   Namespaces MAY designate one or more characters from the URN
   character set as having special meaning for that namespace.  If the
   namespace also uses that character in a literal sense, the character
   used in a literal sense must be encoded with "%" followed by the
   hexadecimal representation of that octet.  Therefore, the process of
   registering a namespace identifier shall include publication of a
   definition of which characters have a special meaning and how to
   encode these characters if used in a literal sense.
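
   The encoding described in this section can be sketched as follows.
   This is an illustration under stated assumptions, not a normative
   algorithm: the name to_canonical_nss is hypothetical, and the set of
   <other> characters used here ( ) + , - . : = ? @ is an assumption
   about the grammar above.

```python
import string

# Assumed <trans> set: letters, digits, and the assumed <other> characters.
URN_TRANS = frozenset(string.ascii_letters + string.digits + "()+,-.:=?@")


def to_canonical_nss(identifier: str) -> str:
    """Percent-encode every character that is not in the <trans> set."""
    out = []
    for ch in identifier:
        if ch in URN_TRANS:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then write each octet as
            # "%" followed by two upper-case hexadecimal digits.
            out.extend("%%%02X" % octet for octet in ch.encode("utf-8"))
    return "".join(out)
```

   Note that a literal "%" is itself outside the <trans> set, so it is
   written as "%25"; the function therefore expects an identifier that
   has not already been encoded.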

3. Support of existing legacy naming systems

   To support existing legacy and new naming systems (as required by
   [2]), URN-aware applications MAY accept as input other resource
   identifiers from existing legacy namespaces.  If such identifiers
   contain characters that are not members of the URN character set
   specified in section 2.2, the identifier MUST be translated to
   canonical format as discussed in section 2.2.

   Some existing namespaces that have the properties of the URN-space
   contain some human-significant components, and these exist in a wide
   variety of languages.  However, URNs are NOT intended to convey
   information that is significant to humans.  While the translation
   rule in section 2.2 is provided for existing namespaces, new
   namespaces, as part of their registration documentation, MUST define
   a namespace discipline for assigning new URNs that does not simplify
   the generation of human-significant names.

4. URN presentation and transport

   URN-aware applications MAY support "natural" display of URNs which
   contain characters encoded using "%" notation.  However, they MUST
   provide for display of URNs in canonical form (i.e. in a format
   suitable for transcription).

   URNs may only be transported in canonical format.
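
   One possible reading of "natural" display is to undo the "%"
   notation for presentation only.  The sketch below assumes the
   encoded octets are UTF-8; the name natural_display is hypothetical.
   It uses the standard library function urllib.parse.unquote.

```python
from urllib.parse import unquote


def natural_display(nss: str) -> str:
    """Return a human-readable rendering of a canonical NSS.

    For display only: the canonical form is still what gets stored
    and transported.
    """
    # unquote() removes the "%" notation and decodes the octets as
    # UTF-8; malformed sequences are shown as U+FFFD instead of raising.
    return unquote(nss, encoding="utf-8", errors="replace")
```

   An application using such a function would still need a way to show
   the canonical form on request.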

5. Equivalence in URNs

   URNs are considered equivalent if they return the same resource.  For
   various purposes, such as caching, a test is necessary to determine
   equivalence without actually resolving the URNs and
   fetching/comparing the underlying resources.  "Lexical equivalence"
   is a stricter condition than the equivalence described above
   (functional equivalence).

5.1 Lexical Equivalence

   Lexical equivalence may be determined by comparing two URNs without
   making any network accesses. Two URNs are lexically equivalent if
   they are octet-by-octet equal after the following preprocessing:

           1. drop any preceding "urn:" token
           2. normalize the case of the NID

   Some namespaces may define additional lexical equivalences, such as
   case-insensitivity of the NSS (or parts thereof).  Additional lexical
   equivalences MUST be documented as part of namespace registration,
   MUST always have the effect of eliminating some of the false
   negatives obtained by the procedure above, and MUST NEVER say that
   two URNs are not equivalent if the procedure above says they are
   equivalent.

5.2 Functional Equivalence

   Resolvers determine functional equivalence based on specific rules
   for the namespace.  Therefore, namespace registration must include
   documentation on how to determine functional equivalence for that
   namespace.

5.3 Examples

   The following URN comparisons highlight the difference between these
   types of equivalence:

     urn:isbn:1-23485-8-29, isbn:1-23485-8-29 are lexically equiv.
     urn:isbn:1-23485-8-29, ISBN:1-23485-8-29 are lexically equiv.
     urn:isbn:1-23485-8-29, isbn:123485829 are not lexically equiv.
        but may be functionally equivalent.
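
   The lexical comparison used in these examples can be sketched as
   below.  The sketch assumes both URNs are already in canonical form,
   so only the optional "urn:" prefix and the case of the NID need
   normalizing.  The names lexical_key and lexically_equivalent are
   hypothetical.

```python
def lexical_key(urn: str) -> str:
    """Return the form of a URN used for lexical comparison."""
    rest = urn
    # Drop any preceding "urn:" token (case-insensitive).
    if rest[:4].lower() == "urn:":
        rest = rest[4:]
    # Normalize the case of the NID only; the NSS is left untouched.
    nid, sep, nss = rest.partition(":")
    return nid.lower() + sep + nss


def lexically_equivalent(a: str, b: str) -> bool:
    """Compare two URNs without making any network accesses."""
    return lexical_key(a) == lexical_key(b)
```

   Functional equivalence is not covered by this sketch, since it
   depends on the rules of each namespace.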

6. Security considerations

   Because of the number of potential namespaces, it must be restated
   that certain of the characters in the Namespace Specific String may
   have special meaning to certain namespace resolvers.  The process of
   registering a namespace identifier shall therefore include
   publication of a definition of which characters have a special
   meaning and how to encode these characters if used in a literal
   sense.

7. Acknowledgments

   Thanks to various members of the URN working group and <<your name
   here!!>> for comments on earlier drafts of this document.  This
   document is partially supported by the National Science Foundation.

8. References

   Request For Comments (RFC) and Internet Draft documents are available
   from <URL:ftp://ftp.internic.net> and numerous mirror sites.

         [1]         L. Daigle, P. Faltstrom, R. Iannella.  "A Framework
                     for the Assignment and Resolution of Uniform
                     Resource Names," Internet Draft (work in progress).
                     June 1996.

         [2]         K. Sollins, L. Masinter.  "Functional Requirements
                     for Uniform Resource Names," RFC 1737.  December
                     1994.

         [3]         T. Berners-Lee. "Universal Resource Identifiers in
                     WWW," RFC 1630. June 1994.

         [4]         T. Berners-Lee, L. Masinter, M. McCahill. "Uniform
                     Resource Locators (URL)," RFC 1738.  December 1994.

9. Editor's address

   Ryan Moats
   AT&T
   15621 Drexel Circle
   Omaha, NE 68135-2358
   USA

   Phone:  +1 402 894-9456
   EMail:  jayhawk@ds.internic.net

                 This Internet Draft expires May 19, 1997.