INTERNET-DRAFT                                         Larry Masinter
draft-masinter-dated-uri-01.txt                         March 1, 2002
Expires September 2002

        "duri" and "tdb" URN namespaces based on dated URIs

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Abstract

This document defines two persistent namespaces of URNs based on
prepending a date to an (encoded) URI. The results are namespaces in
which names are readily assigned but which offer the persistence of
reference that is required by URNs. The first namespace (duri) is used
to refer to URI-identified resources themselves, while the second
namespace (tdb) is used to refer to abstractions that are not
themselves networked resources but are "described by" them. One
reason for defining these is to help illustrate the boundaries
of applicability for URIs as permanent identifiers and for naming
abstract non-networked resources. Similar ideas have been discussed
in many fora for a number of years.

This document is not a product of any working group, but may be
discussed on the mailing list <uri@w3.org>.

1. Overview and Requirements

The URN namespaces defined here solve two separate but related
problems, discussed in this section.

1.1 Intrinsically Persistent Identifiers

Many people have wondered about how to create globally unique and
persistent identifiers. There are a number of URI schemes and URN
namespaces already registered. However, many of them lack an adequate
guarantee of both uniqueness and persistence.

In some cases, the guarantee of persistence comes through a promise of
good management practice, such as is encouraged in "Cool URIs don't
change" [COOL].  However, relying on a promise of good management
practice is not the same as having a design that guarantees
reliability independent of actual administrative practice.

A primary design goal for URIs is that they are intended to mean the
same thing, no matter in what context they appear: a "Uniform" way to
Identify a Resource. However, even when URIs have Uniform meaning from
the point of view of the source of the reference, they don't guarantee
stability over time. Despite best efforts and intentions, identifying
information can change in unpredictable ways: domain names can
disappear or be reassigned, and name-assigning organizations can
change structure or responsibility, merge, or disappear.

1.2 URIs for abstractions

The URI specification [RFC 2396] describes a scope for 'Resource'
that is quite broad:

     A resource can be anything that has identity.  Familiar examples
     include an electronic document, an image, a service (e.g.,
     "today's weather report for Los Angeles"), and a collection of
     other resources.  Not all resources are network "retrievable";
     e.g., human beings, corporations, and bound books in a library
     can also be considered resources.

However, most of the URI mechanisms are either quite concrete,
(including an identification of protocol and protocol parameters for
connecting to a network communication endpoint), or else quite vague
about the way in which they are connected to the resource they
identify.

The interpretation of many URNs depends significantly on the concept
of a "naming authority". The authority is presumably some individual
or organization that serves both to ensure uniqueness of assignment
and to help with understanding the meaning of the link between the
name and the named.

However, authorities, whether individuals or organizations, have a
lifetime, and must be consulted at some point to understand the
bindings. The functioning of names as unique identifiers and holders
of meaning depends on having a reliable infrastructure for consulting
the authority or the authority's records to determine the thing
referenced. The goal, then, of the second URN namespace proposed
below is to provide a mechanism which is, at the same time:

   * permanent (the identity of the resource identified
       is not subject to reinterpretation over time)
   * explicitly bound (the mechanism by which the identified
       resource can be determined is explicitly included in
       the URI)
   * able to identify resources outside the network:
       people, organizations, abstract concepts
   * independent of reliable administrative processes
       of authorities for either assignment or interpretation

2. Namespace definitions

2.1 "duri" namespace

It is conventional for references and citations in printed works to
include the date of publication; this practice serves the important
purpose of allowing the context of the naming to be determined.

The "duri" URN namespace takes the form:

     urn:duri:<date>:<encoded-URI>

where <date> is a digit string corresponding to a date (Section 4),
and an <encoded-URI> is an absolute URI-reference [RFC 2396] in which
any character excluded from URN syntax has been escaped (Section 3).

The meaning of a duri is "the resource (or fragment) that was
identified by the <encoded-URI> (after hex decoding) at the very first
instant of the date(time) given".

For example, 'urn:duri:2001:http://www.ietf.org' is a persistent
identifier to 'http://www.ietf.org' as of the very first moment of the
year 2001. A duri may not be a resource locator in a practical sense,
because the time of location has passed. However, it is an acceptable
resource identifier, and fulfills all of the requirements for
URNs [RFC 1737].

2.2 "tdb" namespace

The second URN namespace defined is a parallel space which is useful
for describing entities, concepts, abstractions, and other items which
are not themselves network accessible resources, but have been at some
point described by network accessible resources.

The "tdb" namespace designates the "thing described by" the resource
at a given URI at a given time. It takes the form:

        urn:tdb:<date>:<encoded-URI>

with the same syntactic rules as 'duri'.

The intent is to express the inverse of the relation "is a web
resource about".  It is common practice to give a reference for a
concept by including a pointer to a document, segment, or phrase that
defines the concept.  "tdb" attempts to capture this practice in URI
space.

For example, "urn:tdb:2001:http://www.ietf.org" can be used to
designate the Internet Engineering Task Force organization, at least
as it was described by or referenced by its home page at the first
instant of 2001.

The "tdb" namespace differs from most other mechanisms for identifying
abstractions because the designation of what is actually identified by
the tdb doesn't depend on knowing the intention of the "assigner" of
the identifier. Unlike many of the alternatives proposed, the
identification is not dependent on the context of use.

The "tdb" namespace can be thought of as following another level of
indirection to URI resolution. While one could imagine using 'tdb'
without a date, it would leave the possibility that a reference that
is unambiguous at one time might become ambiguous at some other time.

3. Encoding URIs

Both "duri" and "tdb" URN namespaces require that some characters in
the URI references be encoded.

3.1 Characters that must be encoded

The characters that must be encoded are:

* All characters marked <excluded> in RFC 2141, section 2.4
  These are excluded because they are not allowed in URNs.
                \"&<>[]^`{|}~

* The character "#"
  Note that the <encoded-URI> of a "duri" or "tdb" can include a
  fragment identifier, but the "#" character used to delimit it must
  be encoded.

* The character "%"
  The encoded-URI can itself contain encoded characters, which are
  encoded with the same method. To ensure that decoding happens at the
  right level of processing, the "%" itself must be encoded.

Unfortunately, this often leads to double encoding of characters:
first to construct the embedded URI itself, and second to embed
that URI within the tdb or duri URN.
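
As an illustration only, the encoding rules above can be sketched in
Python. The names MUST_ENCODE, encode_uri and decode_uri are
hypothetical and are not defined by this document. As a defensive
assumption, the sketch also escapes space and any octet outside
printable ASCII, which a well-formed URI should not contain.

```python
from urllib.parse import unquote

# Characters Section 3.1 requires to be escaped: the RFC 2141
# <excluded> punctuation, plus "#" and "%".
MUST_ENCODE = set('\\"&<>[]^`{|}~#%')


def encode_uri(uri: str) -> str:
    """Escape a URI for use as the <encoded-URI> of a duri or tdb."""
    out = []
    for ch in uri:
        if ch in MUST_ENCODE or not (" " < ch < "\x7f"):
            # Escape each UTF-8 octet as "%" plus two hex digits.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def decode_uri(encoded: str) -> str:
    """Undo exactly one level of escaping."""
    return unquote(encoded)


if __name__ == "__main__":
    print(encode_uri("http://example.com/a%20b#frag"))
```

Because "%" is itself escaped, an embedded URI that already contains
"%20" comes out of encode_uri with "%2520", and a single decoding
pass restores the original "%20" rather than a space.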



3.2 No need to encode "/"

The URN recommendation discourages the use of "/" in URNs because, in
general, there is no good interpretation of hierarchy and relative
URIs for assigned names. However, for the particular case of
duris (at least), there seems to be no good reason to avoid
the "/" because it corresponds fairly naturally (in many cases)
to the hierarchy of the original space.

4. Dates

A <date> is a simple expression of date, optional time, with arbitrary
precision. The goal is to allow relatively short expressions of dates
with no ambiguity, and with arbitrary precision. (The idea for this
syntax came from [RFC 2550].)

   date = year [ month [ day [ hour [ minute [ second [ fraction ]]]]]]

   year     = 4digit
   month    = 2digit
   day      = 2digit
   hour     = 2digit
   minute   = 2digit
   second   = 2digit
   fraction = *digit

The representation of a date or time refers to the very first instant
of the given date, so that, for example, 1999 and 199901010000 are
equivalent. If necessary, dates can include times and even fractional
times, so that a generator of duris can be arbitrarily precise.

Dates are interpreted relative to International Atomic Time [TAI], so
that there is no ambiguity about time zone.
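
The grammar and the "very first instant" rule can be sketched as
follows. This is a minimal illustration, not a normative algorithm:
the name normalize_date and the choice of a 14-digit canonical form
are assumptions made here, and no range checking of months, days or
times is attempted.

```python
import re

# The <date> grammar: a 4-digit year, then optional 2-digit fields,
# each allowed only if the previous one is present, then a fraction.
_DATE_RE = re.compile(
    r"([0-9]{4})(?:([0-9]{2})(?:([0-9]{2})(?:([0-9]{2})"
    r"(?:([0-9]{2})(?:([0-9]{2})([0-9]*))?)?)?)?)?"
)


def normalize_date(date: str) -> str:
    """Return YYYYMMDDhhmmss plus any significant fraction digits."""
    m = _DATE_RE.fullmatch(date)
    if m is None:
        raise ValueError("not a valid <date>: " + repr(date))
    year, month, day, hour, minute, second, fraction = m.groups()
    # Omitted fields denote the first instant: month and day default
    # to "01", time fields to "00".
    fields = [year, month or "01", day or "01",
              hour or "00", minute or "00", second or "00"]
    # Trailing zeros in the fraction do not change the instant.
    return "".join(fields) + (fraction or "").rstrip("0")


def same_instant(a: str, b: str) -> bool:
    """True if two <date> strings denote the same instant."""
    return normalize_date(a) == normalize_date(b)


if __name__ == "__main__":
    print(same_instant("1999", "199901010000"))
```

Two dates are then equivalent exactly when their normalized forms
are identical strings.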

5. Additional Considerations

5.1 Embedded URI schemes

Many URI schemes are appropriate for use inside duris and tdb URNs.

Of course, a common usage would be to use an "http" URI to refer to a
web page or the subject of a web site at a given time. This can be a
way of referring to a web site at some date in the past, or an
organization that has changed or merged.

Local systems that have unique host names can use "file" URIs in
their duris and tdbs; for example,

 urn:tdb:20010814142327:file://this.example.com/c%7C/temp/test.txt

can uniquely and unambiguously refer to a concept whose description is
contained in a system's local disk. While file URIs are difficult to
use for global resolution because of ambiguities of file system and
access methods, in this case, because the instant is fixed, the naming
mechanism of the host can prevail. (Using 'file:' URIs without a host
name is not recommended, because the interpretation is not uniform.)

Even URNs might appear within a duri in unusual circumstances.  For
example, the assignment of names in a URN namespace might not in
practice be permanent, or one might want to refer to the assignment
as of a given date. In this case, it is possible to use a "urn"
within a "duri", e.g.,

     urn:duri:2000:urn:ietf:std:50

might be used to refer to "the document that was STD 50 in effect as
of the first instant of 2000". [RFC 2648]

5.2 Using the "data" URI scheme with tdb

It's possible to use "tdb" to designate concepts that can be
described uniquely and briefly inline. For example,

   urn:tdb:2001:data:,The%2520US%2520president

names the concept described by the (text/plain) string "The US
president" at the very first instant of 2001. (Note the awkward double
quoting of space as "%20" and then the "%" as "%25".) Of course, this
practice is only useful if the referent of the data is (or was at the
time designated) well-defined. In the case of 'data', there is no
assigning authority at all; the interpretation of the 'tdb' URN
depends on the interpreting community. 'urn:tdb:2001:data:,it' would
not be useful.
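
The two levels of encoding in the example above can be reproduced
with the following sketch. The function names tdb_for_text and
text_of_tdb are hypothetical, and the sketch assumes a "data" URI
with no media type or parameters before the comma.

```python
from urllib.parse import quote, unquote

# Second-level escapes from Section 3.1 (RFC 2141 <excluded>
# punctuation, plus "#" and "%").
_ESCAPES = {ord(c): "%{:02X}".format(ord(c)) for c in '\\"&<>[]^`{|}~#%'}


def tdb_for_text(date: str, text: str) -> str:
    """Build a tdb URN naming the thing described by a short text."""
    # First level: build the "data" URI, escaping the text itself.
    data_uri = "data:," + quote(text, safe="")
    # Second level: escape the "data" URI for embedding in the URN.
    return "urn:tdb:" + date + ":" + data_uri.translate(_ESCAPES)


def text_of_tdb(urn: str) -> str:
    """Recover the describing text from a URN built by tdb_for_text."""
    # The first three colons delimit "urn", "tdb" and the date.
    encoded = urn.split(":", 3)[3]
    data_uri = unquote(encoded)                # undo URN-level escaping
    return unquote(data_uri.split(",", 1)[1])  # undo data-level escaping


if __name__ == "__main__":
    print(tdb_for_text("2001", "The US president"))
```

Each level is undone by exactly one decoding pass, in the reverse
order from that in which the escapes were applied.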

5.3 Useful dates

Dates in the future SHOULD NOT be used, because the meaning of the
duri or tdb cannot readily be determined in advance reliably.  Dates
prior to the actual assignment of the resource to the embedded URI
(and, certainly, dates far in the past) SHOULD NOT be used, because
the meaning of the reference is left in question. For example, using
http URIs before a web service was available at the given URI doesn't
make much sense.

However, although these practices are NOT RECOMMENDED, there is no
assurance that this advice has been followed; by itself, a duri/tdb
does not constitute an assertion that the encoded-URI was available
or assigned at the date specified.

Note that the use of the "very first instant" means that a duri/tdb
using only a year must give a year later than the year in which the
corresponding URI was first published; if a web page is published in
the middle of 2001, then "urn:duri:2001:..." would be inappropriate.
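
A generator of duris could check this condition mechanically. The
sketch below is illustrative only: the names first_instant and
is_appropriate are hypothetical, the publication time is assumed to
be expressed in the same <date> syntax, and both inputs are assumed
to be syntactically valid.

```python
def first_instant(date: str) -> str:
    """Expand a <date> to YYYYMMDDhhmmss plus any fraction digits."""
    # Missing month and day default to "01"; time fields to "00".
    defaults = "00000101000000"
    expanded = date + defaults[len(date):]
    # Trailing zeros in the fraction do not change the instant.
    return expanded[:14] + expanded[14:].rstrip("0")


def is_appropriate(date: str, published: str) -> bool:
    """True if the date's first instant is not before publication."""
    # The first 14 digits are fixed width and fraction digits are
    # compared left to right, so string order is chronological.
    return first_instant(date) >= first_instant(published)


if __name__ == "__main__":
    print(is_appropriate("2001", "20010615"))
    print(is_appropriate("2002", "20010615"))
```

For a page published on 15 June 2001, the year-only date "2001"
fails the check, while "2002" passes.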

5.4 Free assignment

Because of the many possible schemes that can be used in the
<encoded-URI> portion, there should be no difficulty in almost any
computational process being able to assign duris or tdbs at will. Of
course, it is necessary for there to be some resource which is
available at some point in time, and to have a clock which is accurate
to the granularity of the frequency of assignment.

5.5 Resolution

There are no accurate resolution servers for duri or tdb URNs.  A duri
might be "resolvable" in the sense that a resource that was accessed
at a point in time might have the result of that access cached or
archived in an Internet archive service. A "tdb" is only resolvable in
the sense that if the corresponding duri can be resolved, it may be
possible that the result can be accessed and interpreted.

Clients without access to an Internet archive service might take the
decoded <encoded-URI> of a duri and attempt resolution of *that*
identifier. This will give an approximation whose reliability depends
on the amount of time elapsed since the date indicated.
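
A client following this approach first has to separate the date
from the embedded URI. A minimal sketch is given below; the function
name parse_dated_urn is hypothetical, and the "urn" prefix and
namespace identifier are treated as case-insensitive, following
RFC 2141.

```python
from urllib.parse import unquote


def parse_dated_urn(urn: str):
    """Split a duri or tdb URN into (namespace, date, decoded URI)."""
    # Only the first three colons are delimiters; the embedded URI
    # keeps its own colons (for example "http:" or a nested "urn:").
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn":
        raise ValueError("not a URN with a date and an encoded URI")
    nid, date, encoded = parts[1].lower(), parts[2], parts[3]
    if nid not in ("duri", "tdb"):
        raise ValueError("unsupported namespace: " + nid)
    if not (date.isascii() and date.isdigit()):
        raise ValueError("date must be a digit string")
    # One decoding pass recovers the URI as originally written.
    return nid, date, unquote(encoded)


if __name__ == "__main__":
    print(parse_dated_urn("urn:duri:2001:http://www.ietf.org"))
```

The decoded URI can then be handed to an ordinary resolver, or to
an archive service together with the date, bearing in mind that the
former yields only an approximation.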

5.6 Why Names with Semantics?

There are a number of proposals for URN schemes that create otherwise
unbound "names", where the URN scheme only provides for uniqueness.
Neither "duri" nor "tdb" intrinsically has the property that the
names assigned are without any resolution semantics. This is
intentional; it's difficult to create names that carry no semantics
whatsoever about the authority that assigned the name and the
intention of the authority for what the name should designate.

5.7 Avoiding MetaData

One might consider the date in a duri/tdb to be just one piece of
additional metadata about the encoded-URI, and consider adding other
pieces of metadata as annotation.

However, the use of the date in a duri/tdb is intended primarily as a
mechanism of accomplishing uniqueness over time. No other bit of
metadata or description readily fills that purpose. Further, the date
is not descriptive (an assertion about the encoded-URI) but merely
refining.

5.8 Avoiding duri and tdb

Many applications of URIs already provide a context of date. For
example, one could imagine a hypertext system where the URIs contained
within a document were intended to refer to the resources as of the
date of the enclosing document. This would be a reasonable
interpretation of URIs within an Internet archive system, for example.

And some applications of URIs arguably already contain the level of
interpretive indirection that is explicit with "tdb". For example, one
might consider the use of URIs as namespace names within XML [XMLNAME]
as a reference to the "thing described by" the URI used.

5.9 tdb and RDF

The Resource Description Framework [RDF] is an XML-based framework for
describing assertions. RDF uses URIs to identify the objects being
described and XML-based tags to describe the relationships between
them.

The relations in RDF, however, may already provide for the "thing
described by" indirection. For example, the example in Section 3.2.1
of [RDF] claims the model for the sentence

           "The students in course 6.001 are Amy, Tim and Mary"

would be written in RDF/XML as

 <rdf:RDF>
   <rdf:Description about="http://mycollege.edu/courses/6.001">
     <s:students>
       <rdf:Bag>
        <rdf:li resource="http://mycollege.edu/students/Amy"/>
        <rdf:li resource="http://mycollege.edu/students/Tim"/>
        <rdf:li resource="http://mycollege.edu/students/Mary"/>
      </rdf:Bag>
    </s:students>
  </rdf:Description>
</rdf:RDF>

but the resources listed are web pages (served by HTTP) and the class
and students are the "things described by" those web pages.

6. URN Specification Templates

6.1 "duri" Specification Template

  Namespace ID:
      "duri" requested.

  Registration Information:
      Registration Version: 1
      Registration Date: 2001-08-19

  Declared registrant of the namespace:
      Larry Masinter (see Section 11 of this document.)

  Declaration of syntactic structure:
      Briefly, the syntax is
          urn:duri:<date>:<encoded-URI>
      The syntax is described in Sections 2-4 of this document.

  Relevant ancillary documentation:
      (See Section 12, References, of this document)

  Identifier uniqueness considerations:
      Uniqueness is guaranteed by the structure of adding
      a designation of a specific instant to a URI. However,
      URIs with ambiguous interpretation at any given
      instant (e.g., "file" URIs without a given host name)
      will not be unique.

  Identifier persistence considerations:
      The designation of a dated URI is completely persistent
      for all time.

  Process of identifier assignment:
      Any date can be used with any URI independently
      by anyone.

  Process of identifier resolution:
      Identifiers can only be resolved approximately. See
      Section 5.5.

  Conformance with URN Syntax:
      Note that the use of "/" for hierarchy, while discouraged
      in the URN specification, is allowed in duris.

  Rules for Lexical Equivalence:
      For dates, YYYY is equivalent to YYYY01, YYYYMM is equivalent to
      YYYYMM01, while YYYYMMDD is equivalent to YYYYMMDD0... followed
      by any number of 0's.

      In considering equivalence of the encoded URI, if two duris with
      equivalent dates contain lexically equivalent URIs, the duris
      are equivalent.

  Validation mechanism:
      Dates should be reasonable and meet the syntactic requirements.
      The URI encoded within should meet the syntactic requirements of
      the URI scheme used.

  Scope:
       Global.

6.2 "tdb" Specification Template

  Namespace ID:
      "tdb" requested.

  Registration Information:
      Registration Version: 1
      Registration Date: 2001-08-19

  Declared registrant of the namespace:
      Larry Masinter (see Section 11 of this document.)

  Declaration of syntactic structure:
      Briefly, the syntax is
          urn:tdb:<date>:<encoded-URI>
      The syntax is described in Sections 2-4 of this document.

  Relevant ancillary documentation:
      (See Section 12, References, of this document)

  Identifier uniqueness considerations:
      Uniqueness is guaranteed by the structure of adding
      a designation of a specific instant to a URI. However,
      URIs with ambiguous interpretation at any given
      instant (e.g., "file" URIs without a given host name)
      will not be unique.

  Identifier persistence considerations:
      The designation of a dated URI is completely persistent
      for all time, although the intent of a resource that
      is no longer available will be hard to discern.

  Process of identifier assignment:
      Any date can be used with any URI independently
      by anyone.

  Process of identifier resolution:
      Resolution of "tdb" identifiers requires interpreting
      the resource identified by the corresponding "duri".
      See Section 5.5 of this document.

  Rules for Lexical Equivalence:
      As with "duri", see Section 6.1.

  Conformance with URN Syntax:
      As with "duri", see Section 6.1.

  Validation mechanism:
      As with "duri", see Section 6.1.

  Scope:
       Global.



7. IANA considerations

This document includes two URN NID registrations (Sections 6.1 and
6.2) that should be entered into the IANA registry of URN NIDs.

8. Security Considerations

duris and tdbs are not any more reliable because they are dated.
URIs don't contain enough information to supply the authority for
deciding what was or wasn't at a given URI at a given date.

9. Acknowledgements

Many thanks to the many discussions on the relationship of URLs, URNs,
URIs and resource identifiers, as well as similar ideas, that have
been floated over the last many years.

10. Copyright

Copyright (C) The Internet Society, 2002. All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works.  However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process must be followed, or as required to
translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN
WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


11. Author's address

          Larry Masinter
          345 Park Ave, #W14
          San Jose, CA 95110
          mailto: LMM@acm.org
          http://larry.masinter.net

12. References

[RFC 2141] R. Moats, "URN Syntax", RFC 2141, May 1997.

[COOL] Tim Berners-Lee, "Cool URIs don't change", 1998.
    <http://www.w3.org/Provider/Style/URI>.

[RFC 2396] R. Fielding, L. Masinter, "Uniform Resource Identifiers
    (URI): Generic Syntax", RFC 2396, August 1998.

[RFC 1737] K. Sollins, L. Masinter, "Functional Requirements for
    Uniform Resource Names", RFC 1737, December 1994.

[RFC 2550] S. Glassman, M. Manasse, J. Mogul, "Y10K and Beyond", RFC
  2550, April 1, 1999.
  <urn:duri:19990401:http://www.ietf.org/rfc/rfc2550.txt>

[TAI] "International Atomic Time",
    <http://www.bipm.fr/enus/5_Scientific/c_time/time_1.html>

[RFC 2648] R. Moats, "A URN Namespace for IETF Documents", RFC 2648,
    August 1999. <urn:ietf:rfc:2648>.

[XMLNAME] "Namespaces in XML", World Wide Web Consortium
    Recommendation,
    <urn:duri:19990114:http://www.w3.org/TR/REC-xml-names>.

[RDF] "Resource Description Framework (RDF) Model and Syntax
    Specification", World Wide Web Consortium Recommendation,
    <http://www.w3.org/TR/REC-rdf-syntax/>

