< draft-kunze-ark-21.txt   draft-kunze-ark-22.txt >
Network Working Group J. Kunze Network Working Group J. Kunze
Internet-Draft California Digital Library Internet-Draft California Digital Library
Intended status: Informational E. Bermes Intended status: Informational E. Bermes
Expires: November 30, 2019 Bibliotheque nationale de France Expires: December 24, 2019 Bibliotheque nationale de France
May 29, 2019 June 22, 2019
The ARK Identifier Scheme The ARK Identifier Scheme
draft-kunze-ark-21 draft-kunze-ark-22
Abstract Abstract
The ARK (Archival Resource Key) naming scheme is designed to The ARK (Archival Resource Key) naming scheme is designed to
facilitate the high-quality and persistent identification of facilitate the high-quality and persistent identification of
information objects. A founding principle of the ARK is that information objects. A founding principle of the ARK is that
persistence is purely a matter of service and is neither inherent in persistence is purely a matter of service and is neither inherent in
an object nor conferred on it by a particular naming syntax. The an object nor conferred on it by a particular naming syntax. The
best that an identifier can do is to lead users to the services that best that an identifier can do is to lead users to the services that
support robust reference. The term ARK itself refers both to the support robust reference. The term ARK itself refers both to the
skipping to change at page 2, line 12 skipping to change at page 2, line 12
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 30, 2019. This Internet-Draft will expire on December 24, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 43 skipping to change at page 2, line 43
2. ARK Anatomy . . . . . . . . . . . . . . . . . . . . . . . . . 8 2. ARK Anatomy . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1. The Name Mapping Authority Hostport (NMAH) . . . . . . . 9 2.1. The Name Mapping Authority Hostport (NMAH) . . . . . . . 9
2.2. The ARK Label Part (ark:) . . . . . . . . . . . . . . . . 11 2.2. The ARK Label Part (ark:) . . . . . . . . . . . . . . . . 11
2.3. The Name Assigning Authority Number (NAAN) . . . . . . . 11 2.3. The Name Assigning Authority Number (NAAN) . . . . . . . 11
2.4. The Name Part . . . . . . . . . . . . . . . . . . . . . . 12 2.4. The Name Part . . . . . . . . . . . . . . . . . . . . . . 12
2.5. The Qualifier Part . . . . . . . . . . . . . . . . . . . 13 2.5. The Qualifier Part . . . . . . . . . . . . . . . . . . . 13
2.5.1. ARKs that Reveal Object Hierarchy . . . . . . . . . . 14 2.5.1. ARKs that Reveal Object Hierarchy . . . . . . . . . . 14
2.5.2. ARKs that Reveal Object Variants . . . . . . . . . . 15 2.5.2. ARKs that Reveal Object Variants . . . . . . . . . . 15
2.6. Character Repertoires . . . . . . . . . . . . . . . . . . 16 2.6. Character Repertoires . . . . . . . . . . . . . . . . . . 16
2.7. Normalization and Lexical Equivalence . . . . . . . . . . 17 2.7. Normalization and Lexical Equivalence . . . . . . . . . . 17
3. Naming Considerations . . . . . . . . . . . . . . . . . . . . 18 3. Naming Considerations . . . . . . . . . . . . . . . . . . . . 19
3.1. ARKS Embedded in Language . . . . . . . . . . . . . . . . 19 3.1. ARKS Embedded in Language . . . . . . . . . . . . . . . . 19
3.2. Objects Should Wear Their Identifiers . . . . . . . . . . 19 3.2. Objects Should Wear Their Identifiers . . . . . . . . . . 19
3.3. Names are Political, not Technological . . . . . . . . . 19 3.3. Names are Political, not Technological . . . . . . . . . 20
3.4. Choosing a Hostname or NMA . . . . . . . . . . . . . . . 20 3.4. Choosing a Hostname or NMA . . . . . . . . . . . . . . . 20
3.5. Assigners of ARKs . . . . . . . . . . . . . . . . . . . . 21 3.5. Assigners of ARKs . . . . . . . . . . . . . . . . . . . . 22
3.6. NAAN Namespace Management . . . . . . . . . . . . . . . . 22 3.6. NAAN Namespace Management . . . . . . . . . . . . . . . . 22
3.7. Sub-Object Naming . . . . . . . . . . . . . . . . . . . . 23 3.7. Sub-Object Naming . . . . . . . . . . . . . . . . . . . . 24
4. Finding a Name Mapping Authority . . . . . . . . . . . . . . 24 4. Finding a Name Mapping Authority . . . . . . . . . . . . . . 24
4.1. Looking Up NMAHs in a Globally Accessible File . . . . . 25 4.1. Looking Up NMAHs in a Globally Accessible File . . . . . 25
5. Generic ARK Service Definition . . . . . . . . . . . . . . . 26 5. Generic ARK Service Definition . . . . . . . . . . . . . . . 27
5.1. Generic ARK Access Service (access, location) . . . . . . 27 5.1. Generic ARK Access Service (access, location) . . . . . . 27
5.1.1. Generic Policy Service (permanence, naming, etc.) . . 27 5.1.1. Generic Policy Service (permanence, naming, etc.) . . 27
5.1.2. Generic Description Service . . . . . . . . . . . . . 29 5.1.2. Generic Description Service . . . . . . . . . . . . . 29
5.2. Overview of The HTTP URL Mapping Protocol (THUMP) . . . . 29 5.2. Overview of The HTTP URL Mapping Protocol (THUMP) . . . . 29
5.3. The Electronic Resource Citation (ERC) . . . . . . . . . 32 5.3. The Electronic Resource Citation (ERC) . . . . . . . . . 32
5.4. Advice to Web Clients . . . . . . . . . . . . . . . . . . 34 5.4. Advice to Web Clients . . . . . . . . . . . . . . . . . . 34
5.5. Security Considerations . . . . . . . . . . . . . . . . . 35 5.5. Security Considerations . . . . . . . . . . . . . . . . . 35
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 35
Appendix A. ARK Maintenance Agency: arks.org . . . . . . . . . . 37 Appendix A. ARK Maintenance Agency: arks.org . . . . . . . . . . 37
Appendix B. Looking up NMAHs Distributed via DNS . . . . . . . . 38 Appendix B. Looking up NMAHs Distributed via DNS . . . . . . . . 38
skipping to change at page 11, line 12 skipping to change at page 11, line 12
presumably evolve to perform this (currently simple) transformation presumably evolve to perform this (currently simple) transformation
automatically. automatically.
2.2. The ARK Label Part (ark:) 2.2. The ARK Label Part (ark:)
The label part distinguishes an ARK from an ordinary identifier. The label part distinguishes an ARK from an ordinary identifier.
There is a new form of the label, "ark:", and an old form, "ark:/", There is a new form of the label, "ark:", and an old form, "ark:/",
both of which must be recognized in perpetuity. Implementations both of which must be recognized in perpetuity. Implementations
should generate new ARKs in the new form (without the "/") and should generate new ARKs in the new form (without the "/") and
resolvers must always treat received ARKs as equivalent if they resolvers must always treat received ARKs as equivalent if they
differ only regard to new form versus old form labels. Thus these differ only in regard to new form versus old form labels. Thus these
two ARKs are equivalent: two ARKs are equivalent:
ark:/12025/654xz321 ark:/12025/654xz321
ark:12025/654xz321 ark:12025/654xz321
In a URL found in the wild, the label indicates that the URL stands a In a URL found in the wild, the label indicates that the URL stands a
reasonable chance of being an ARK. If the context warrants, reasonable chance of being an ARK. If the context warrants,
verification that it actually is an ARK can be done by testing it for verification that it actually is an ARK can be done by testing it for
existence of the three ARK services. existence of the three ARK services.
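The label rule above can be illustrated with a short Python sketch. It is not part of either draft; the helper names canonical_label and make_ark are hypothetical, and the sketch assumes a lower-case label as shown in the two example ARKs.

```python
import re

# Matches the old form "ark:/" or the new form "ark:" of the label.
_LABEL = re.compile(r"ark:/?")


def canonical_label(ark: str) -> str:
    """Rewrite the first label occurrence to the new form "ark:"."""
    return _LABEL.sub("ark:", ark, count=1)


def make_ark(naan: str, name: str) -> str:
    """Generate a new ARK in the new form (without the "/")."""
    return f"ark:{naan}/{name}"


old_form = "ark:/12025/654xz321"
new_form = "ark:12025/654xz321"
print(canonical_label(old_form) == canonical_label(new_form))
```

A resolver written along these lines would treat the two example ARKs as equivalent while only ever emitting the new form.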
skipping to change at page 18, line 9 skipping to change at page 18, line 9
are compared for lexical equivalence after first being normalized. are compared for lexical equivalence after first being normalized.
Since ARK strings may appear in various forms (e.g., having different Since ARK strings may appear in various forms (e.g., having different
NMAHs), normalizing them minimizes the chances that comparing two ARK NMAHs), normalizing them minimizes the chances that comparing two ARK
strings for equality will fail unless they actually identify strings for equality will fail unless they actually identify
different objects. In a specified-host ARK (one having an NMAH), the different objects. In a specified-host ARK (one having an NMAH), the
NMAH never participates in such comparisons. Normalization described NMAH never participates in such comparisons. Normalization described
here serves to define lexical equivalence but does not restrict how here serves to define lexical equivalence but does not restrict how
implementors normalize ARKs locally for storage. implementors normalize ARKs locally for storage.
draft-kunze-ark-21:

Normalization of a received ARK for the purpose of octet-by-octet
equality comparison with another ARK consists of several steps.
First, the NMAH part (eg, everything from an initial "http://" up to
the next slash), if present is removed. Second, any URI query string
is removed (everything from the first literal '?' to the end of the
string). Third, the first case-insensitive match on "ark:/" or
"ark:" is converted to "ark:" (replacing any upper case letters and
removing any terminal '/'). Fourth, in the string that remains, the
two characters following every occurrence of `%' are converted to
lower case. The case of all other letters in the ARK string must be
preserved. Fifth, all hyphens are removed.

Sixth, if normalization is being done as part of a resolution step,
and if the end of the remaining string matches a known inflection,
the inflection is noted and removed. Seventh, structural characters
(slash and period) are normalized: initial and final occurrences are
removed, and two structural characters in a row (e.g., // or ./) are
replaced by the first character, iterating until each occurrence has
at least one non-structural character on either side. Finally, if
there are any components with a period on the left and a slash on the
right, either the component and the preceding period must be moved to
the end of the Name part or the ARK must be thrown out as malformed.

The fourth and final step is to arrange the suffixes in ASCII
collating sequence (that is, to sort them) and to remove duplicate
suffixes, if any. It is also permissible to throw out ARKs for which
the suffixes are not sorted.

draft-kunze-ark-22:

Normalization of a received ARK for the purpose of octet-by-octet
equality comparison with another ARK consists of the following steps.

1. The NMAH part (eg, everything from an initial "http://" up to the
next slash), if present is removed.

2. Any URI query string is removed (everything from the first
literal '?' to the end of the string).

3. The first case-insensitive match on "ark:/" or "ark:" is
converted to "ark:" (replacing any upper case letters and
removing any terminal '/').

4. In the string that remains, the two characters following every
occurrence of `%' are converted to lower case. The case of all
other letters in the ARK string must be preserved.

5. All hyphens, are removed.

6. If normalization is being done as part of a resolution step, and
if the end of the remaining string matches a known inflection,
the inflection is noted and removed.

7. Structural characters (slash and period) are normalized: initial
and final occurrences are removed, and two structural characters
in a row (e.g., // or ./) are replaced by the first character,
iterating until each occurrence has at least one non-structural
character on either side.

8. If there are any components with a period on the left and a slash
on the right, either the component and the preceding period must
be moved to the end of the Name part or the ARK must be thrown
out as malformed.

9. The final step is to arrange the suffixes in ASCII collating
sequence (that is, to sort them) and to remove duplicate
suffixes, if any. It is also permissible to throw out ARKs for
which the suffixes are not sorted.
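The normalization steps can be sketched in Python as follows. This is a minimal illustration, not part of either draft. The function name normalize_ark and the host example.org are hypothetical. As assumptions, "https://" is accepted alongside "http://", any text before the label is discarded, and structural-character cleanup is applied only to the part after the label. Steps 6, 8, and 9 (inflections, misplaced components, suffix sorting) are omitted because they depend on definitions outside this excerpt.

```python
import re


def normalize_ark(ark: str) -> str:
    """Partial normalization of an ARK string (steps 1-5 and 7 only)."""
    # Step 1: remove the NMAH, from an initial "http://" up to the next slash
    # (also accepting "https://", which is an assumption of this sketch).
    s = re.sub(r"^https?://[^/]*/", "", ark, flags=re.IGNORECASE)
    # Step 2: remove any query string (first literal '?' to the end).
    s = s.split("?", 1)[0]
    # Step 3: find the first case-insensitive "ark:/" or "ark:" label.
    # Text before the label is discarded (assumption of this sketch).
    label = re.search(r"ark:/?", s, flags=re.IGNORECASE)
    if label is None:
        raise ValueError("no ARK label found")
    rest = s[label.end():]
    # Step 4: lower-case the two characters following every '%'.
    rest = re.sub(r"%(..)", lambda m: "%" + m.group(1).lower(), rest)
    # Step 5: remove all hyphens.
    rest = rest.replace("-", "")
    # Step 7: replace each run of structural characters (slash, period)
    # with its first character, then drop initial and final occurrences.
    rest = re.sub(r"([/.])[/.]+", r"\1", rest)
    rest = rest.strip("/.")
    return "ark:" + rest


print(normalize_ark("http://example.org/ark:/12025/654-xz3%2F21?info"))
print(normalize_ark("ARK:12025//654xz321/"))
```

Under these assumptions, two ARK strings are lexically equivalent when normalize_ark returns the same string for both, with the comparison itself being case-sensitive.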
The resulting ARK string is now normalized. Comparisons between The resulting ARK string is now normalized. Comparisons between
normalized ARKs are case-sensitive, meaning that upper case letters normalized ARKs are case-sensitive, meaning that upper case letters
are considered different from their lower case counterparts. are considered different from their lower case counterparts.
To keep ARK string variation to a minimum, no reserved ARK characters To keep ARK string variation to a minimum, no reserved ARK characters
should be %-encoded unless it is deliberately to conceal their should be %-encoded unless it is deliberately to conceal their
reserved meanings. No non-reserved ARK characters should ever be reserved meanings. No non-reserved ARK characters should ever be
%-encoded. Finally, no %-encoded character should ever appear in an %-encoded. Finally, no %-encoded character should ever appear in an
ARK in its decoded form. ARK in its decoded form.
skipping to change at page 22, line 29 skipping to change at page 22, line 42
The ARK namespace reserved for an NAA is the set of names bearing its The ARK namespace reserved for an NAA is the set of names bearing its
particular NAAN. For example, all strings beginning with particular NAAN. For example, all strings beginning with
"ark:12025/" are under control of the NAA registered under 12025, "ark:12025/" are under control of the NAA registered under 12025,
which might be the National Library of Finland. Because each NAA has which might be the National Library of Finland. Because each NAA has
a different NAAN, names from one namespace cannot conflict with those a different NAAN, names from one namespace cannot conflict with those
from another. Each NAA is free to assign names from its namespace from another. Each NAA is free to assign names from its namespace
(or delegate assignment) according to its own policies. These (or delegate assignment) according to its own policies. These
policies must be documented in a manner similar to the declarations policies must be documented in a manner similar to the declarations
required for URN Namespace registration [RFC2611]. required for URN Namespace registration [RFC2611].
draft-kunze-ark-21:

To register for a NAAN, please read about the mapping authority
discovery file in the next section and send email to ark@cdlib.org.

draft-kunze-ark-22:

Organizations can request or update a NAAN by filling out a form
[NAANrequest].
3.6. NAAN Namespace Management 3.6. NAAN Namespace Management
Every NAA must have a namespace management strategy. A time-honored Every NAA must have a namespace management strategy. A time-honored
technique is to hierarchically partition a namespace into technique is to hierarchically partition a namespace into
subnamespaces using prefixes that guarantee non-collision of names in subnamespaces using prefixes that guarantee non-collision of names in
different partition. This practice is strongly encouraged for all different partition. This practice is strongly encouraged for all
NAAs, especially when subnamespace management will be delegated to NAAs, especially when subnamespace management will be delegated to
other departments, units, or projects within an organization. For other departments, units, or projects within an organization. For
example, with a NAAN that is assigned to a university and managed by example, with a NAAN that is assigned to a university and managed by
skipping to change at page 25, line 48 skipping to change at page 26, line 12
be reloaded periodically to incorporate updates. It is not expected be reloaded periodically to incorporate updates. It is not expected
that the size of the file or frequency of update should impose an that the size of the file or frequency of update should impose an
undue maintenance or searching burden any time soon, for even undue maintenance or searching burden any time soon, for even
primitive linear search of a file with ten-thousand NAAs is a primitive linear search of a file with ten-thousand NAAs is a
subsecond operation on modern server machines. The proposed file subsecond operation on modern server machines. The proposed file
strategy is similar to the /etc/hosts file strategy that supported strategy is similar to the /etc/hosts file strategy that supported
Internet host address lookup for a period of years before the advent Internet host address lookup for a period of years before the advent
of DNS. of DNS.
draft-kunze-ark-21:

The name authority table file is updated on an ongoing basis and is
available for copying over the internet from the California Digital
Library at http://www.cdlib.org/inside/diglib/ark/natab and from a
number of mirror sites. The file contains comment lines (lines that
begin with `#') explaining the format and giving the file's
modification time, reloading address, and NAA registration
instructions. There is even a Perl script that processes the file
embedded in the file's comments. The currently registered Name
Assigning Authorities are:

draft-kunze-ark-22:

The name authority table file is updated on an ongoing basis and is
available for copying over the internet from a number of mirror sites
[NAANregistry]. The file contains comment lines (lines that begin
with `#') explaining the format and giving the file's modification
time, reloading address, and NAA registration instructions. There is
even a Perl script that processes the file embedded in the file's
comments. The currently registered Name Assigning Authorities are:
12025 National Library of Medicine 12025 National Library of Medicine
12026 Library of Congress 12026 Library of Congress
12027 National Agriculture Library 12027 National Agriculture Library
13030 California Digital Library 13030 California Digital Library
13038 World Intellectual Property Organization 13038 World Intellectual Property Organization
20775 University of California San Diego 20775 University of California San Diego
29114 University of California San Francisco 29114 University of California San Francisco
28722 University of California Berkeley 28722 University of California Berkeley
21198 University of California Los Angeles 21198 University of California Los Angeles
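The linear search described above can be sketched in Python. This is only an illustration under stated assumptions: the table is hard-coded from the entries listed above rather than parsed from the registry file (whose format is not described in this excerpt), the names NAA_TABLE and find_naa are hypothetical, and the Name part in the usage example is made up.

```python
import re

# In-memory copy of the NAAN entries listed above (not the real file format).
NAA_TABLE = [
    ("12025", "National Library of Medicine"),
    ("12026", "Library of Congress"),
    ("12027", "National Agriculture Library"),
    ("13030", "California Digital Library"),
    ("13038", "World Intellectual Property Organization"),
    ("20775", "University of California San Diego"),
    ("29114", "University of California San Francisco"),
    ("28722", "University of California Berkeley"),
    ("21198", "University of California Los Angeles"),
]


def find_naa(ark: str):
    """Return the NAA name registered for the ARK's NAAN, or None."""
    # The NAAN is taken to be the text between the label and the next slash.
    m = re.search(r"ark:/?([^/]+)/", ark)
    if m is None:
        return None
    naan = m.group(1)
    # Primitive linear search, as the text suggests is sufficient.
    for entry_naan, name in NAA_TABLE:
        if entry_naan == naan:
            return name
    return None


print(find_naa("ark:/13030/tf5p30086k"))
```

A dictionary would give constant-time lookup, but the linear scan mirrors the text's point that even a simple search over such a table is cheap.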
skipping to change at page 28, line 4 skipping to change at page 28, line 14
The permanence declaration for an object is a rating defined with The permanence declaration for an object is a rating defined with
respect to an identified permanence provider (guarantor), which will respect to an identified permanence provider (guarantor), which will
be the NMA. It may include the following aspects. be the NMA. It may include the following aspects.
(a) "object availability" -- whether and how access to the object (a) "object availability" -- whether and how access to the object
is supported (e.g., online 24x7, or offline only), is supported (e.g., online 24x7, or offline only),
(b) "identifier validity" -- under what conditions the identifier (b) "identifier validity" -- under what conditions the identifier
will be or has been re-assigned, will be or has been re-assigned,
(c) "content invariance" -- under what conditions the content of (c) "content invariance" -- under what conditions the content of
the object is subject to change, and the object is subject to change, and
(d) "change history" -- access to corrections, migrations, and (d) "change history" -- access to corrections, migrations, and
revisions, whether through links to the changed objects themselves revisions, whether through links to the changed objects themselves
or through a document summarizing the change history or through a document summarizing the change history
A recent approach to persistence statements, conceived independently A recent approach to persistence statements, conceived independently
from ARKs, can be found at [PStatements], with ongoing work available from ARKs, can be found at [PStatements], with ongoing work available
at Appendix A.. An older approach to a permanence rating framework at Appendix A. An older approach to a permanence rating framework is
is given in [NLMPerm], which identified the following "permanence given in [NLMPerm], which identified the following "permanence
levels": levels":
Not Guaranteed: No commitment has been made to retain this Not Guaranteed: No commitment has been made to retain this
resource. It could become unavailable at any time. Its resource. It could become unavailable at any time. Its
identifier could be changed. identifier could be changed.
Permanent: Dynamic Content: A commitment has been made to keep Permanent: Dynamic Content: A commitment has been made to keep
this resource permanently available. Its identifier will always this resource permanently available. Its identifier will always
provide access to the resource. Its content could be revised or provide access to the resource. Its content could be revised or
replaced. replaced.
skipping to change at page 36, line 13 skipping to change at page 36, line 13
April 1999, <http://www.icsti.org/forum/30/#lannom>. April 1999, <http://www.icsti.org/forum/30/#lannom>.
[Kernel] Kunze, J., "A Metadata Kernel for Electronic Permanence", [Kernel] Kunze, J., "A Metadata Kernel for Electronic Permanence",
Journal of Digital Information Vol 2, Issue 2, Journal of Digital Information Vol 2, Issue 2,
ISSN 1368-7506, January 2002, ISSN 1368-7506, January 2002,
<http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/>. <http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/>.
[N2T] Library, C. D., "Name-to-Thing Resolver", August 2006, [N2T] Library, C. D., "Name-to-Thing Resolver", August 2006,
<http://n2t.net>. <http://n2t.net>.
[NAANregistry]
ARKs.org, "NAAN Registry", 2019,
<https://arks.org/e/pub/naan_registry.txt>.
[NAANrequest] [NAANrequest]
ARKs.org, "NAAN Request Form", 2018, ARKs.org, "NAAN Request Form", 2018,
<https://arks.org/e/naan_request>. <https://n2t.net/e/naan_request>.
[NLMPerm] Byrnes, M., "Defining NLM's Commitment to the Permanence [NLMPerm] Byrnes, M., "Defining NLM's Commitment to the Permanence
of Electronic Information", ARL 212:8-9, October 2000, of Electronic Information", ARL 212:8-9, October 2000,
<http://www.arl.org/newsltr/212/nlm.html>. <http://www.arl.org/newsltr/212/nlm.html>.
[NOID] Kunze, J., "Nice Opaque Identifiers", February 2005, [NOID] Kunze, J., "Nice Opaque Identifiers", February 2005,
<http://www.cdlib.org/inside/diglib/ark/noid.pdf>. <http://www.cdlib.org/inside/diglib/ark/noid.pdf>.
[PStatements] [PStatements]
Kunze, J., "Persistence statements: describing digital Kunze, J., "Persistence statements: describing digital
 End of changes. 18 change blocks. 
47 lines changed or deleted 61 lines changed or added

This html diff was produced by rfcdiff 1.47. The latest version is available from http://tools.ietf.org/tools/rfcdiff/