Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: August 24, 2010
                                                                 A. Ford
                                                     Roke Manor Research
                                                       February 20, 2010


    Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers
                      draft-bryan-metalinkhttp-15

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP Headers, a different way to get information that is
   usually contained in the Metalink XML-based download description
   format.  Metalink/HTTP describes multiple download locations
   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
   and other information using existing standards for HTTP headers.
   Clients can transparently use this information to make file transfers
   more robust and reliable.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 24, 2010.

Copyright Notice




Bryan, et al.            Expires August 24, 2010                [Page 1]

Internet-Draft      Metalink/HTTP: Mirrors and Hashes      February 2010


   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Operation Overview . . . . . . . . . . . . . . . . . . . .  4
     1.2.  Examples . . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.3.  Notational Conventions . . . . . . . . . . . . . . . . . .  5
   2.  Requirements . . . . . . . . . . . . . . . . . . . . . . . . .  5
   3.  Mirrors / Multiple Download Locations  . . . . . . . . . . . .  6
     3.1.  Mirror Priority  . . . . . . . . . . . . . . . . . . . . .  6
     3.2.  Mirror Geographical Location . . . . . . . . . . . . . . .  6
     3.3.  Coordinated Mirror Policies  . . . . . . . . . . . . . . .  7
     3.4.  Mirror Depth . . . . . . . . . . . . . . . . . . . . . . .  7
   4.  Peer-to-Peer / Metainfo  . . . . . . . . . . . . . . . . . . .  7
     4.1.  Metalink/XML Files . . . . . . . . . . . . . . . . . . . .  8
   5.  OpenPGP Signatures . . . . . . . . . . . . . . . . . . . . . .  8
   6.  Cryptographic Hashes of Whole Files  . . . . . . . . . . . . .  8
   7.  Client / Server Multi-source Download Interaction  . . . . . .  9
     7.1.  Error Prevention, Detection, and Correction  . . . . . . . 11
       7.1.1.  Error Prevention (Early File Mismatch Detection) . . . 11
       7.1.2.  Error Correction . . . . . . . . . . . . . . . . . . . 12
   8.  Multi-server Performance . . . . . . . . . . . . . . . . . . . 13
   9.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 13
   10. Security Considerations  . . . . . . . . . . . . . . . . . . . 14
     10.1. URIs and IRIs  . . . . . . . . . . . . . . . . . . . . . . 14
     10.2. Spoofing . . . . . . . . . . . . . . . . . . . . . . . . . 14
     10.3. Cryptographic Hashes . . . . . . . . . . . . . . . . . . . 14
     10.4. Signing  . . . . . . . . . . . . . . . . . . . . . . . . . 15
   11. Normative References . . . . . . . . . . . . . . . . . . . . . 15
   Appendix A.  Acknowledgements and Contributors . . . . . . . . . . 16
   Appendix B.  Comparisons to Similar Options  . . . . . . . . . . . 16
   Appendix C.  Document History  . . . . . . . . . . . . . . . . . . 17
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18





1.  Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [draft-bryan-metalink].  Metalink/HTTP attempts to provide as
   much functionality as the Metalink/XML format by using existing
   standards such as Web Linking [draft-nottingham-http-link-header],
   Instance Digests in HTTP [RFC3230], and ETags.  Metalink/HTTP is used
   to list information about a file to be downloaded.  This can include
   lists of multiple URIs (mirrors), Peer-to-Peer information,
   cryptographic hashes, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.  This distributes the load across multiple servers, and
   should also increase throughput and resilience.  At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated.  Users
   will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   might not be optimal for their needs and leads to a particular
   server getting a disproportionate share of load.  The use of
   suboptimal mirrors can lead to the user canceling and restarting the
   download to try to manually find a better source.  During downloads,
   errors in transmission can corrupt the file.  There are no easy ways
   to repair these files.  For large downloads this can be extremely
   troublesome.  Any of the problems that can occur during a download
   can lead to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput.  Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized.  All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more can be transferred in
   coordinated HTTP Headers.  This Metalink transfers the knowledge of
   the download server (and mirror database) to the client.  Clients can
   fall back to other mirrors if the current one has an issue.  With
   this knowledge, the client is enabled to work its way to a successful
   download even under adverse circumstances.  All this is done
   transparently to the user and the download is much more reliable and
   efficient.  In contrast, a traditional HTTP redirect to a mirror
   conveys only extremely minimal information - one link to one server,
   and there is no provision in the HTTP protocol to handle failures.
   Furthermore, in order to provide better load distribution across
   servers and potentially faster downloads to users, Metalink/HTTP
   facilitates multi-source downloads, where portions of a file are
   downloaded from multiple mirrors (and optionally, Peer-to-Peer)
   simultaneously.

   [[ Discussion of this draft should take place on IETF HTTP WG mailing
   list at ietf-http-wg@w3.org or the Metalink discussion mailing list
   located at metalink-discussion@googlegroups.com.  To join the list,
   visit http://groups.google.com/group/metalink-discussion . ]]

1.1.  Operation Overview

   Detailed discussion of Metalink operation is covered in Section 2;
   this section will present a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a
   cryptographic hash of the whole resource.  The client will then be
   able to request chunks of the file from the various sources,
   scheduling appropriately in order to maximise the download rate.

1.2.  Examples

   A brief Metalink server response with ETag, mirrors, .metalink,
   OpenPGP signature, and a cryptographic hash of the whole file:

   ETag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: <http://www2.example.com/example.ext>; rel="duplicate"
   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
   Link: <http://example.com/example.ext.torrent>; rel="describedby";
   type="application/x-bittorrent"
   Link: <http://example.com/example.ext.metalink>; rel="describedby";
   type="application/metalink4+xml"
   Link: <http://example.com/example.ext.asc>; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==







1.3.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.


2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP which consists of
   mirrors and cryptographic hashes in HTTP Headers as described in this
   document.  "Metalink/XML" refers to the XML format described in
   [draft-bryan-metalink].

   Metalink resources include a Link header
   [draft-nottingham-http-link-header] to present a list of mirrors in
   the response to a client request for the resource.  The cryptographic
   hash of a resource must be included via Instance Digests in HTTP
   [RFC3230].

   Metalink servers are HTTP servers with one or more Metalink
   resources.  Mirror and cryptographic hash information provided by the
   originating Metalink server MUST be considered authoritative.
   Metalink servers and their associated mirror servers SHOULD all share
   the same ETag policy (ETag Synchronization), i.e. based on the file
   contents (cryptographic hash) and not server-unique filesystem
   metadata.  The emitted ETag MAY be implemented the same as the
   Instance Digest for simplicity.  Metalink servers MAY offer Metalink/
   XML documents that contain cryptographic hashes of parts of the file
   and other information.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  HTTP mirror servers SHOULD share the same ETag policy as
   the originating Metalink server.  HTTP Mirror servers SHOULD support
   Instance Digests in HTTP [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the mirror becomes unreachable.  Metalink clients SHOULD
   support multi-source, or parallel, downloads, where portions of a
   file are downloaded from multiple mirrors simultaneously (and
   optionally, from Peer-to-Peer sources).  Metalink clients MUST
   support Instance Digests in HTTP [RFC3230] by requesting and
   verifying cryptographic hashes.  Metalink clients MAY make use of
   digital signatures if they are offered.


3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

   Link: <http://www2.example.com/example.ext>; rel="duplicate";
   pri=1; pref=1
   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
   pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors.  Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send.  Such
   a decision could be a hard-coded limit, a random selection, based on
   file size, or based on server load.
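
   As an illustration of how a client might read these headers, the
   following Python sketch extracts mirror records from Link header
   values.  It is not part of the specification.  The function names
   and the record layout are hypothetical, and the parser assumes one
   link-value per header with no ";" or ">" inside quoted strings,
   which holds for the examples in this document but not for every
   valid Link header.

```python
def parse_link(value):
    """Split one Link header value into (uri, params)."""
    uri_part, _, rest = value.strip().partition(">")
    if not uri_part.startswith("<"):
        raise ValueError("Link value must start with <URI>")
    params = {}
    for piece in rest.split(";"):
        name, _, val = piece.strip().partition("=")
        if name:
            params[name.strip().lower()] = val.strip().strip('"')
    return uri_part[1:], params


def mirrors_from_links(link_values):
    """Collect the rel="duplicate" links as mirror records, in listed order."""
    mirrors = []
    for value in link_values:
        uri, params = parse_link(value)
        if "duplicate" not in params.get("rel", "").split():
            continue  # e.g. rel="describedby" links are not mirrors
        mirrors.append({
            "uri": uri,
            # "pri" is optional; None means "use the listed order".
            "pri": int(params["pri"]) if "pri" in params else None,
            # Mirrors are "normal" unless pref=1 is given.
            "pref": params.get("pref") == "1",
            "geo": params.get("geo"),
            # depth=0 is the default stated in Section 3.4.
            "depth": int(params.get("depth", "0")),
        })
    return mirrors
```

   Applied to the two Link headers shown above, this yields one record
   with pri 1 and pref set, and one record with pri 2, geo "gb", and
   depth 1.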

3.1.  Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.  A client MUST
   respect the server's priority ordering, however.
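
   A minimal Python sketch of one possible ordering follows.  The
   function name is hypothetical.  This document does not say how to
   rank a mirror that lacks "pri" against one that has it, so placing
   such mirrors last, in their listed order, is an assumption of the
   sketch.

```python
def order_by_priority(mirrors):
    """Return mirrors most-preferred first.

    `mirrors` is a list of (uri, pri) pairs in the order the server
    listed them; pri is an int, or None when the parameter was absent.
    """
    def sort_key(item):
        index, (_uri, pri) = item
        # Lower "pri" values are used first.  Assumption: mirrors with
        # no "pri" come after those with one, keeping their listed order.
        return (pri is None, pri if pri is not None else 0, index)

    return [mirror for _, mirror in sorted(enumerate(mirrors), key=sort_key)]
```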

   [[Would it make more sense to use qvalue-style policies here, i.e.
   q=1.0 through q=0.0 ?]]

3.2.  Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is a [ISO3166-1] alpha-2
   two letter country code for the geographical location of the physical
   server the URI is used to access.  A client may use this information
   to select a mirror, or set of mirrors, that are geographically near
   (if the client has access to such information), with the aim of
   reducing network load at inter-country bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server.  Optimally, they
   will also support Instance Digests in HTTP [RFC3230].  Preferred
   mirrors make it possible to detect early on, before data is
   transferred, if the file requested matches the desired file.
   Preferred HTTP mirror servers have a "pref" value of 1.  If "pref" is
   unspecified, a mirror is considered "normal" and is not assumed to
   share the same ETag policy.  FTP mirrors, as they do not emit ETags,
   MUST always be considered "normal".

   HTTP Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].

   [[Suggestion: In order for clients to identify servers that have
   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4.  Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default.  A value of 0 means ONLY that file is mirrored.  A value of
   1 means that file and all other files and subdirectories in the
   directory are mirrored.  A value of 2 means the directory above, and
   all files and subdirectories, are mirrored.

   A mirror with a depth value of 4:

   Link: <http://www2.example.com/dir1/dir2/dir3/dir4/dir5/example.ext>;
   rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.
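
   The depth rule can be sketched in Python as follows.  The function
   name is hypothetical and the code is only one reading of the rule:
   depth 0 covers the file alone, depth 1 covers its directory, and
   each further step moves one directory up.

```python
from urllib.parse import urlsplit, urlunsplit


def mirrored_root(uri, depth=0):
    """Return the URI of the topmost mirrored location for `uri`."""
    if depth < 0:
        raise ValueError("depth must not be negative")
    if depth == 0:
        return uri  # only the file itself is mirrored
    parts = urlsplit(uri)
    segments = parts.path.split("/")   # ['', 'dir1', ..., 'example.ext']
    directories = segments[1:-1]       # drop leading '' and the file name
    if depth - 1 > len(directories):
        raise ValueError("depth exceeds the number of directories in the path")
    kept = directories[:len(directories) - (depth - 1)]
    path = "/" + "/".join(kept) + ("/" if kept else "")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

   For the depth=4 example above, the result is
   http://www2.example.com/dir1/dir2/, which is the directory the text
   refers to as /dir2/.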


4.  Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over Peer-to-
   Peer networks or otherwise, are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter that indicates the MIME type of
   the metadata available at the URI.

   A brief Metalink server response with .torrent and .metalink:

   Link: <http://example.com/example.ext.torrent>; rel="describedby";
   type="application/x-bittorrent"
   Link: <http://example.com/example.ext.metalink>; rel="describedby";
   type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4.  This is particularly useful for providing
   metadata such as cryptographic hashes of parts of a file, allowing a
   client to recover from partial errors (see Section 7.1.2).


5.  OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: <http://example.com/example.ext.asc>; rel="describedby";
   type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures.


6.  Cryptographic Hashes of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors.  Mirror servers SHOULD as well.

   A brief Metalink server response with cryptographic hash:

   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==
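
   The following Python sketch shows how a SHA-256 Digest value could
   be produced and checked.  It assumes the value is the base64
   encoding of the raw 32-byte hash.  The example value above decodes
   to the hexadecimal text of a hash instead, so implementers should
   confirm which encoding their peers expect.  The function names are
   hypothetical.

```python
import base64
import hashlib


def sha256_digest_header(data: bytes) -> str:
    """Build a Digest header value for `data` (base64 of the raw hash)."""
    raw = hashlib.sha256(data).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


def matches_digest(data: bytes, header_value: str) -> bool:
    """Check `data` against the SHA-256 entry of a Digest header value.

    Returns False when the header carries no SHA-256 entry.
    """
    expected = sha256_digest_header(data).partition("=")[2]
    for item in header_value.split(","):
        # Split on the first "=" only; base64 padding also uses "=".
        algorithm, _, encoded = item.strip().partition("=")
        if algorithm.upper() == "SHA-256":
            return encoded == expected
    return False
```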






7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  A Range limit is optional, not
   required.  Alternatively, Metalink clients can begin with a HEAD
   request to the Metalink server to discover mirrors via Link headers.
   After that, the client follows with a GET request to the desired
   mirrors.


   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   ETag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: <http://www2.example.com/example.ext>; rel="duplicate"; pref=1
   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
   Link: <http://example.com/example.ext.torrent>; rel="describedby";
   type="application/x-bittorrent"
   Link: <http://example.com/example.ext.metalink>; rel="describedby";
   type="application/metalink4+xml"
   Link: <http://example.com/example.ext.asc>; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which may describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file cryptographic hashes
      to repair a file.
   o  Digital signature.
   o  Instance Digest, which is the whole file cryptographic hash.

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and gets delivered slower than expected then
   the Metalink client starts a number of parallel ranged downloads (one
   per selected mirror server other than the first) using mirrors
   provided by the Link header with "duplicate" relation type, using the
   location of the original GET request in the "Referer" header field.
   The size and number of ranges requested from each server is for the
   client to decide, based upon the performance observed from each
   server.  Further discussion of performance considerations is
   presented in Section 8.

   If no range limit was given in the original request then work from
   the tail of the object (the first request is still running and will
   eventually catch up), otherwise continue after the range requested in
   the first request.  If no Range was provided, the original connection
   must be terminated once all parts of the resource have been
   retrieved.  It is recommended that a HEAD request is undertaken
   first, so that the client can find out if there are any Link headers,
   and then Range-based requests are undertaken to the mirror servers as
   well as on the original connection.
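
   Since the division of ranges is left to the client, the following
   Python sketch shows just one simple possibility: an even split of
   the object into contiguous byte ranges, one per connection.  The
   function names are hypothetical, and a real client would likely
   adapt range sizes to observed performance rather than split evenly.

```python
def split_ranges(size, parts):
    """Split `size` bytes into `parts` contiguous inclusive byte ranges."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)           # never create empty ranges
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` ranges carry one additional byte.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start, end):
    """Format an inclusive byte range as a Range header value."""
    return f"bytes={start}-{end}"
```

   For the 14867603-byte object in this section, a two-way split gives
   bytes 0-7433801 and bytes 7433802-14867602.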

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag SHOULD be used
   to quickly detect out-of-date mirrors by using the ETag from the
   Metalink server response.  If no indication of ETag
   synchronisation/knowledge is given, then If-Match should not be
   used.  Optimally, the mirror response will contain an Instance
   Digest that the client can use to detect a mismatch early; if not, a
   mismatch won't be detected until the completed object is verified.
   Early file mismatch detection is described in detail in
   Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext
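
   A Python sketch of how a client might assemble the header fields of
   such a request follows.  The function name and parameters are
   hypothetical.  If-Match is sent only to preferred mirrors, following
   the guidance above.

```python
def mirror_request_headers(host, start, referer, end=None,
                           etag=None, preferred=False):
    """Build header fields for a ranged GET to a mirror server.

    `end` of None requests everything from `start` to the end of the
    object.  `etag` is the ETag from the Metalink server response,
    including its surrounding double quotes.
    """
    byte_range = f"bytes={start}-" if end is None else f"bytes={start}-{end}"
    headers = {
        "Host": host,
        "Range": byte_range,
        # Location of the original GET request.
        "Referer": referer,
    }
    # Only preferred mirrors are known to share the ETag policy.
    if preferred and etag is not None:
        headers["If-Match"] = etag
    return headers
```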

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.  The mirror server response, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   ETag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the first request was not Range limited then abort it by closing
   the connection when it catches up with the other parallel downloads
   of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.
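
   One way a client could apply the file size check to a 206 response
   is sketched below in Python.  The function names are hypothetical.
   The sketch handles only the "bytes first-last/total" form of
   Content-Range and treats an unknown total ("*") as an error, which
   is a simplification.

```python
import re

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def parse_content_range(value):
    """Parse a Content-Range value into (first, last, total)."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: " + value)
    first, last, total = (int(group) for group in match.groups())
    if first > last or last >= total:
        raise ValueError("inconsistent Content-Range: " + value)
    return first, last, total


def mirror_size_matches(content_range, expected_size):
    """True if the mirror reports the same total size as the origin."""
    return parse_content_range(content_range)[2] == expected_size
```

   A client following the rule above would abort the download whenever
   mirror_size_matches returns False.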

   Once the download has completed, the Metalink client MUST verify the
   cryptographic hash of the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and Instance
   Digests.  Error detection requires Instance Digests, or
   cryptographic hashes, to determine after transfers if there has been
   an error.  Error correction, or download repair, is possible with
   partial file cryptographic hashes.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers do implement a
   synchronized ETag as well.  In fact, the emitted ETag may be
   implemented the same as the Instance Digest for simplicity, but there
   is no need to specify how the ETag is generated, just that it needs
   to be shared among the mirror servers.  If the mirror server provides
   neither synchronized ETag nor Instance Digest, then early detection
   of mismatches is not possible unless file length also differs.  Even
   so, the error is still detectable, after the download has completed,
   when the merged response is verified.

   ETag cannot be used for verifying the integrity of the received
   content.  But it is a guarantee issued by the Metalink server that
   the content is correct for that ETag.  And if the ETag given by the
   mirror server matches the ETag given by the master server, then we
   have a chain of trust where the master server authorizes these
   responses as valid for that object.

   This guarantees that a mismatch will be detected by using only the
   synchronized ETag from a master server and mirror server.  The
   mirror servers themselves signal the mismatch by responding with an
   error, which prevents accidental merges of ranges from different
   versions of files with the same name.  This covers many, but not
   all, malicious attacks where the data on the mirror has been
   replaced by some other file.

   Synchronized ETag cannot strictly protect against malicious attacks
   or server or network errors replacing content, but neither can
   Instance Digest on the mirror servers.  An attacker can most
   certainly make the server seemingly respond with the expected
   Instance Digest even if the file contents have been modified, just
   as with ETag, and various system failures can likewise cause bad
   data to be returned.  The Metalink client has to rely on the Instance
   Digest returned by the Metalink master server in the first response
   for the verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   The set of trusted mirrors doing that can be substituted as master
   servers accepting the initial request if one likes.

   The benefit of having slave mirror servers (those not trusted as
   masters) return Instance Digest is that the client then can detect
   mismatches early even if ETag is not used.  Both ETag and slave
   mirror Instance Digest do provide value, but just one is sufficient
   for early detection of mismatches.  If neither is provided then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.
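
   The decision described in this section can be summarised in a short
   Python sketch.  The function name, parameter names, and the three
   result strings are hypothetical, and the sketch is one
   interpretation rather than a required algorithm.

```python
def early_check(origin_etag, origin_digest, origin_size,
                mirror_etag=None, mirror_digest=None, mirror_size=None,
                preferred=False):
    """Classify a mirror response before its data is merged.

    Returns "mismatch", "consistent", or "unknown".  "unknown" means a
    mismatch can only be found once the merged object is verified.
    """
    # A differing file length is always a detectable mismatch.
    if mirror_size is not None and mirror_size != origin_size:
        return "mismatch"
    verdicts = []
    # ETags are comparable only when the mirror shares the ETag policy.
    if preferred and mirror_etag is not None:
        verdicts.append(mirror_etag == origin_etag)
    # An Instance Digest from the mirror allows early detection as well.
    if mirror_digest is not None and origin_digest is not None:
        verdicts.append(mirror_digest == origin_digest)
    if not verdicts:
        return "unknown"
    return "consistent" if all(verdicts) else "mismatch"
```

   A "consistent" result does not replace the final verification
   against the Instance Digest from the Metalink server.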

7.1.2.  Error Correction

   Partial file cryptographic hashes can be used to detect errors during
   the download.  Metalink servers are not required to offer partial
   file cryptographic hashes, but they are encouraged to do so.

   If the object cryptographic hash does not match the Instance Digest,
   then the client needs to fetch the Metalink/XML as specified in
   Section 4.1, where partial file cryptographic hashes may be found.
   These allow detection of which server returned incorrect data, and
   from there the client can figure out what of the downloaded data can
   be recovered and what needs to be fetched again.  If no partial
   cryptographic hashes are available, then the client MUST fetch the
   complete object from other mirrors.
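
   The repair step can be sketched in Python as follows.  The sketch
   assumes the partial file hashes have already been extracted from
   the Metalink/XML document as a piece length and a list of
   hexadecimal SHA-256 values, one per piece.  That input shape, like
   the function name, is an assumption of the sketch.

```python
import hashlib


def find_bad_pieces(data, piece_length, expected_hex):
    """Return inclusive byte ranges of pieces whose hash does not match.

    `expected_hex` lists one hexadecimal SHA-256 value per piece, in
    order.  The last piece may be shorter than `piece_length`.
    """
    bad = []
    for index, expected in enumerate(expected_hex):
        start = index * piece_length
        piece = data[start:start + piece_length]
        if hashlib.sha256(piece).hexdigest() != expected.lower():
            # Only this range needs to be fetched again.
            bad.append((start, min(start + piece_length, len(data)) - 1))
    return bad
```

   Each returned range can be requested again from another mirror,
   while the remaining pieces are kept.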


8.  Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range
      requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved.  But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster.  Based on laboratory experiments,
   we suggest a good default number of simultaneous connections is
   probably four, with three of these being used for the best three
   mirrors found so far, and one being used to evaluate whether any
   other mirror might offer better performance.
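
   The suggested default can be illustrated with a small Python sketch
   that fills four connection slots: the fastest mirrors measured so
   far, plus one mirror that has not been tried yet.  The function
   name and the input shapes are hypothetical.

```python
def pick_connections(measured, untried, slots=4):
    """Choose which mirrors to use for the next round of requests.

    `measured` maps mirror URI to observed throughput (bytes/second).
    `untried` lists mirrors with no measurement yet.
    Returns (best, explore): mirrors to keep using and mirrors to probe.
    """
    # Reserve one slot for exploration when there is something to explore.
    explore = untried[:1]
    fastest_first = sorted(measured, key=measured.get, reverse=True)
    best = fastest_first[:slots - len(explore)]
    return best, explore
```

   With four measured mirrors and at least one untried mirror, the
   three fastest are kept and one untried mirror is probed.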

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be
   pipelined effectively without needing a very high rate of chunk
   requests.  At the same time, the amount of time wasted waiting for
   the last chunk to download from the last server after all the other
   servers have finished should be minimized.  Thus we currently
   recommend that a chunk size of at least 10KBytes should be used.  If
   the file being transferred is very large, or the download speed very
   high, this can be increased to perhaps 1MByte.  As network bandwidths
   increase, we expect these numbers to increase appropriately, so that
   the time to transfer a chunk remains significantly larger than the
   latency of requesting a chunk from a server.
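
   One way to turn this guidance into a number is sketched below in
   Python.  The bounds come from the text above, read as 10 * 1024 and
   1024 * 1024 bytes.  The function name and the factor of 10 used to
   keep transfer time well above request latency are assumptions of
   the sketch, not recommendations of this document.

```python
MIN_CHUNK = 10 * 1024        # "at least 10KBytes"
MAX_CHUNK = 1024 * 1024      # "perhaps 1MByte"


def chunk_size(rate_bytes_per_s, latency_s, factor=10):
    """Pick a chunk size whose transfer time is `factor` times the latency.

    The result is clamped to the range [MIN_CHUNK, MAX_CHUNK].
    """
    wanted = int(rate_bytes_per_s * latency_s * factor)
    return max(MIN_CHUNK, min(MAX_CHUNK, wanted))
```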


9.  IANA Considerations

   Accordingly, IANA has made the following registration to the Link
   Relation Type registry.



   o Relation Name: duplicate

   o Description: Refers to a resource whose available representations
   are byte-for-byte identical with the corresponding representations of
   the context IRI.

   o Reference: This specification.

   o Notes: This relation is for static resources.  That is, an HTTP GET
   request on any duplicate will return the same representation.  It
   does not make sense for dynamic or POSTable resources and should not
   be used for them.


10.  Security Considerations

10.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs.  See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information.  This could deceive unaware
   downloaders into downloading a malicious or worthless file.  Also,
   malicious publishers could attempt a distributed denial of service
   attack by inserting unrelated URIs into Metalinks.

10.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure.  These include the whole
   Message Digest family of algorithms, which are not suitable for
   cryptographically strong verification.  Attackers could provide
   files that appear to be identical to another file because of a
   collision, i.e., the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include "sha-256" which is SHA-256, as specified in
   [FIPS-180-3], or stronger.  It MAY also include other hashes.
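
   A client-side check of a whole file SHA-256 hash might look like the
   following Python sketch.  The function names are hypothetical, and
   the sketch assumes the expected value is the base64 encoding of the
   raw SHA-256 output; a client should confirm the encoding actually
   used for the "sha-256" digest value before relying on it.

```python
import base64
import hashlib


def sha256_digest_value(stream, block_size=64 * 1024):
    """Return the base64-encoded SHA-256 of a binary stream.

    The stream is read in blocks so large files need not fit in memory.
    Assumption: the digest value is base64 of the raw hash output.
    """
    h = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        h.update(block)
    return base64.b64encode(h.digest()).decode("ascii")


def verify_download(stream, expected_value):
    """Return True if the stream's SHA-256 matches the expected value."""
    return sha256_digest_value(stream) == expected_value
```

   A client would typically open the completed download in binary mode
   and pass the file object as the stream.  A mismatch indicates that
   the file differs from the one the hash describes.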

10.4.  Signing

   Metalinks should include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.


11.  Normative References

   [FIPS-180-3]
              National Institute of Standards and Technology (NIST),
              "Secure Hash Standard (SHS)", FIPS PUB 180-3,
              October 2008.

   [ISO3166-1]
              International Organization for Standardization, "ISO 3166-
              1:2006.  Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [draft-bryan-metalink]
              Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. Poeml,
              "The Metalink Download Description Format",
              draft-bryan-metalink-28 (work in progress), February 2010.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-07 (work in progress),
              January 2010.



Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham,
   Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, Matt Domsch, Micah
   Cowan, and David Morris.

   Support for simultaneous download from multiple mirrors is based upon
   work by Mark Handley and Javier Vela Diago, who also provided
   validation of the benefits of this approach.


Appendix B.  Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format
   [draft-bryan-metalink] :

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type.  It's more of a collection/coordinated feature set.
   o  (?)  The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      cryptographic hashes.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination may be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to originating
      server.
   o  (-?)  Tied to HTTP, not as generic.  FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      access to users at all mirrors (FTP included) to all download
      information with no changes needed to the server.
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.


Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Some organizations have many mirrors.  Should all be sent, or only
      a certain number?  All should be included in the Metalink/XML, if
      used.
   o  Would it make more sense to use qvalue-style policies to describe
      mirror priority, i.e. q=1.0 through q=0.0 ?
   o  Using Metalink/XML for partial file cryptographic hashes.  That
      adds XML dependency to apps for an important feature.  Is there a
      better method?
   o  Do we need an "official" MIME type for .torrent files or allow
      "application/x-bittorrent"?

   -15 : December 31, 2009.
   o  Update references and terminology.

   -14 : December 31, 2009.
   o  Baseline file hash: SHA-256.

   -13 : November 22, 2009.
   o  Metalink/XML for partial file cryptographic hashes.

   -12 : November 11, 2009.
   o  Clarifications.

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 12, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 6, 2009.
   o  Content-MD5 for partial file cryptographic hashes.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.


Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org


   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org


   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/


   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk