IETF conneg working group                         Graham Klyne, editor
Internet draft                                5GM/Content Technologies
Category: Work-in-progress                              Larry Masinter
                                                     Xerox Corporation
                                                Expires: October December 1999

                 Identifying composite media features
               <draft-ietf-conneg-feature-hash-01.txt>
               <draft-ietf-conneg-feature-hash-02.txt>

Status of this memo

  This document is an Internet-Draft and is in full conformance with
  all provisions of Section 10 of RFC 2026.

  Internet-Drafts are working documents of the Internet Engineering
  Task Force (IETF), its areas, and its working groups.  Note that
  other groups may also distribute working documents as Internet-
  Drafts.

  Internet-Drafts are draft documents valid for a maximum of six
  months and may be updated, replaced, or obsoleted by other
  documents at any time.  It is inappropriate to use Internet- Drafts
  as reference material or to cite them other than as "work in
  progress."

  The list of current Internet-Drafts can be accessed at
  http://www.ietf.org/ietf/1id-abstracts.txt

  The list of Internet-Draft Shadow Directories can be accessed at
  http://www.ietf.org/shadow.html.

Copyright Notice

  Copyright (C) The Internet Society 1999.  All Rights Reserved.

Abstract

  In "A syntax for describing media feature sets" RFC 2533 [1], an expression format is presented for describing
  media feature capabilities as a combination of simple media feature
  tags [2].

  This document proposes describes an abbreviated format for a composite media
  feature set, based upon a hash of the feature expression describing
  that composite composite, or the URI of a resource containing the feature
  expression.

Internet Draft                    Identifying composite media features

Table of contents

  1. Introduction ............................................2
     1.1 Organization of this document                        2
     1.2 Terminology and document conventions                 3
     1.3 Discussion of this document                          3
  2. Motivation and goals ....................................4
  3. Composite feature representation ........................5
     3.1 Feature set hashed reference format                  5
       3.1.1 Hash value calculation                           6
       3.1.2 Base-32 value representation                     7
     3.2 Feature set URI reference format                     7                     8
     3.3 Resolving feature set references                     7                     9
       3.3.1 URI reference                                    8                                    9
       3.3.2 Inline feature set details                       9
     3.4 The birthday problem                                 9                       10
  4. Examples ................................................11
  5. Internationalization considerations .....................11
  6. Security considerations .................................11 .................................12
  7. Full copyright statement ................................12
  8. Acknowledgements ........................................12 ........................................13
  9. References ..............................................12 ..............................................13
  10. Authors' addresses .....................................14
  Appendix A: A The birthday problem ............................14
  Appendix B: Revision history ...............................14 ...............................16

1. Introduction

  In "A syntax for describing media feature sets" [1], an expression
  format is presented for describing media feature capabilities as a
  combination of simple media feature tags [2].

  This document proposes an abbreviated format for a composite media
  feature set, based upon a hash of the feature expression describing
  that composite.

1.1 Organization of this document

  Section 2 sets out somne some of the background and goals for feature set
  references.

  Section 3 preents presents a syntax for feature set references, and
  describes how they are related to feature set expressions.

  Section 4 discusses how feature set references are used in conction
  with feature set matching.

Internet Draft                    Identifying composite media features

1.2 Terminology and document conventions

  This section defines a number of terms and other document
  conventions, which are used with specific meaning in this memo.
  The terms are listed in alphabetical order.

  dereference
            the act of replacing a feature set reference with its
            corresponding feature set expression.  Also called
            "resolution".

  feature set
            some set of media features described by a media feature
            assertion, as described in "A syntax for describing media
            feature sets" [1].  (See that memo for a more formal
            definition of this term.)

  feature set expression
            a string that describes some feature set, formulated
            according to the rules in "A syntax for describing media
            feature sets" [1] (and possibly extended by other
            specifications).

  feature set reference
            a brief construct that references some feature set.
            (See also: "dereference".)

  feature set tag
            a name that conforms to the syntax of a feature tag [1]
            that is used to denote a feature set rather than a single
            feature.

  resolution
            (See "dereference").

  This specification uses syntax notation and conventions described
  in RFC2234 "Augmented BNF for Syntax Specifications: ABNF" [3].

       NOTE:  Comments like this provide additional nonessential
       information about the rationale behind this document.
       Such information is not needed for building a conformant
       implementation, but may help those who wish to understand
       the design in greater depth.

1.3 Discussion of this document

  Discussion of this document should take place on the content
  negotiation and media feature registration mailing list hosted by
  the Internet Mail Consortium (IMC).

Internet Draft                    Identifying composite media features
  Please send comments regarding this document to:

      ietf-medfree@imc.org

  To subscribe to this list, send a message with the body 'subscribe'
  to "ietf-medfree-request@imc.org".

  To see what has gone on before you subscribed, please see the
  mailing list archive at:

      http://www.imc.org/ietf-medfree/

2. Motivation and goals

  The range of media feature capabilities of a message handling
  system can be quite extensive, and the corresponding feature set
  expression [1] can reach a significant size.

  A requirement has been identified to allow recurring feature sets
  to be identified by a single reference value, which can be combined
  with other elements in a feature set expression.  It is anticipated
  that mechanisms will be provided that allow the recipient of such a
  feature set reference to discover the corresponding feature set
  expression.

  Thus, the goals for this proposal are:

  o  to provide an abbreviated form for referencing an arbitrary
     feature set expression.

  o  the meaning of (i.e. the corresponding feature set expression) a
     feature set reference should be independent of any particular
     mechanism that may be used to dereference it.

  o  to be able to verify whether a given feature set expression
     corresponds to some feature set reference without having to
     perform an explicit dereferencing operation (i.e. without
     incurring additional network traffic).

  o  for protocol processors that conform to [1] to be able to
     sensibly handle a feature set reference without explicit
     knowledge of its meaning (i.e. the introduction of feature set
     references should not break existing feature expression
     processors).

  o  to allow, but not require, some indication of how to dereference
     a feature set reference to be included in a feature set
     expression.

Internet Draft                    Identifying composite media features
       NOTE:  This proposal does not attempt to address the
       "override" or "default" problem.  (Also called
       "delegation", where a feature set may be referenced and
       selectively modified.)

3. Composite feature representation

  This specification hinges on three central ideas:

  o  the use of auxiliary predicates (introduced in [1]) to form the
     basis of a feature set reference, and

  o  the use of a token based on a hash function computed over the
     referenced feature set expression.

  o  the use of an expression containing a URI to indicate a mechanism an
     identifier and possibly a service for resolution of a composite
     feature set tag. set.

  A key reason to use a hash function to generate an identifier is to
  define a global name space without requiring a central naming
  authority.  New feature set tags can be introduced by any party
  following the appropriate rules of formulation, without reference
  to any centralized authority.

  Local resolution services may be needed to map feature set tags to
  their corresponding feature set expressions, but these are not able
  to vary the meaning of any given tag.  Failure of a resolution
  service to return the correct expression is detectable by a calling
  application, which should reject any incorrect value supplied.

  This memo also suggests that an expression containing a URI in the
  format '<URI>' may be used to suggest indicate a mechanism and location of
  a service to perform feature set resolution.

3.1 Feature set hashed reference format

  This specification introduces a special form of auxililiary auxiliary predicate
  name with the following syntax:

     fname       = "h." 1*HEXDIG 1*BASE32DIGIT
     BASE32DIGIT = DIGIT
                 / "A" / "B" / "C" / "D" / "E" / "F" / "G" / "H"
                 / "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P"
                 / "Q" / "R" / "S" / "T" / "U" / "V"

  The sequence of hexadecimal base-32 digits is represents the value of a hash
  function calculated over the corresponding feature set expression (see next
  section), represented as a hexadecimal number.

Internet Draft                    Identifying composite media features
  (see following sections).  Note that the above syntax allows upper-
  or lower- case letters for base-32 digits (per RFC 2234 [3]).

  Thus, within a feature set expression, a hashed feature set
  reference would have the following form:

     (h.123456789abcdef0123456789abcdef0)

       NOTE:  Base64 representation (per MIME [4]) would be more
       compact (21 rather than 32 characters for the MD5 128-bit
       hash value), but an auxiliary predicate name is defined
       (by [1]) to have the same syntax as a feature tag, and
       the feature tag matching rules (per [2]) state that
       feature tag matching is case insensitive.

     (h.123456789abcdefghijklmnopq)

3.1.1 Hash value calculation

  The hash value is calculated using the MD5 algorithm [6] over the
  text of the referenced feature set expression subjected to certain
  normalizations.  The feature expression must conform to the syntax
  given in "A syntax for describing media feature sets" [1] for
  'filter': 'filter' in RFC 2533 [1]:

     filter = "(" filtercomp ")" *( ";" parameter )

  The steps for calculating a hash value are:

  1. Whitespace normalization:  all spaces, CR, LF, TAB and any other
     layout control characters that may be embedded in the feature
     expression string are removed (or ignored for the purpose of hash
     value computation).

  2. Case normalization:  all lower case letters in the feature
     expression, other than those contained within quoted strings, are
     converted to upper case.  That is, unquoted characters with
     values 97 to 122 (decimal) are changed to corresponding
     characters in the range 65 to 90.

  3. Hash computation:  the MD5 algorithm [6] algorithm, described in RFC 1321 [6],
     is applied to the normalized feature expression string.

  The result obtained in step 3 is a 128-bit number (16 octet) value that is
  converted to a hexadecimal base-32 representation to form the feature set
  reference.

       NOTE:  under some circumstances, removal of ALL
       whitespace may result in an invalid feature expression
       string.  This should not be a problem as this is done
       only for the purpose of calculating a hash value, and
       significantly different feature expressions are expected
       to differ in ways other than their whitespace.

       NOTE:  case normalization is deemed appropriate since
       feature tag and token matching is case insensitive.

Internet Draft                    Identifying composite media features

3.2 Feature set URI reference format

  This section introduces a new form of feature set predicate by
  extending the feature set syntax [1] as follows:

     filter =/ "<" URI ">" *( ";" parameter )

  where 'URI' is described by "Uniform Resource Identifiers (URI):
  Generic Syntax" [5].

  The meaning of this construct is defined

3.1.2 Base-32 value representation

  RFC 1321 [6] describes how to be the meaning of the
  expression in which '<URI>' calculate an MD5 hash value that is replaced by a copy
  sequence of the resource
  indicated by 'URI'.  The indicated resource 16 octets.  This is then required to be coded as a
  text value containing
  base-32 value, which is a valid feature set expression, NOT itself
  containing a '<URI>' reference.

  If a '<URI>' reference is used within a feature expression that
  defines sequence of base-32 digit characters.

  Each successive character in a hash reference, then the hash base-32 value is calculated over
  the expression obtained after represents 5
  successive bits of the resource has been subsituted. underlying octet sequence.  Thus, the following are examples each group
  of feature set expressions using
  URI references:

     <http://www.acme.com/widget-feature/modelT>

     (& (dpi=100) <http://www.acme.com/widget-feature/modelT> )

  This specification does not indicate:

  o  any specific URI schemes to be supported,

  o  any meaning if the resource cannot be accessed, 8 characters represents a sequence of if the 5 octets (40 bits):

                 1          2          3
      01234567 89012345 67890123 45678901 23456789
     +--------+--------+--------+--------+--------+
     |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
     +--------+--------+--------+--------+--------+
                                             <===> 8th character
                                       <====> 7th character
                                  <===> 6th character
                            <====> 5th character
                      <====> 4th character
                 <===> 3rd character
           <====> 2nd character
      <===> 1st character

  The value
     obtained does not correspond to some recognized format.

  These details must be (i.e. sequence of bits) represented by each base-32 digit
  character is indicated by the specification following table:
       "0"  0       "A"  10     "K"  20      "U"  30
       "1"  1       "B"  11     "L"  21      "V"  31
       "2"  2       "C"  12     "M"  22
       "3"  3       "D"  13     "N"  23
       "4"  4       "E"  14     "O"  24
       "5"  5       "F"  15     "P"  25
       "6"  6       "G"  16     "Q"  26
       "7"  7       "H"  17     "R"  27
       "8"  8       "I"  18     "S"  28
       "9"  9       "J"  19     "T"  29

  When encoding a base-32 value, each full group of any
  application or protocol that relies upon this interpretation 5 octets is
  represented by a sequence of an
  auxiliary feature predicate.

  If the URI uses 8 characters other than indicated above.  If a designated subset
  group of US-
  ASCII then those less than 5 octets remain after this, they are encoded
  using as many additional characters should as may be needed:  1, 2, 3 or 4
  octets are encoded by 2, 4, 5 or 7 characters respectively.  Any
  spare bits represented by the base-32 digit characters are selected
  to be zero.

  When decoding a base-32 value, the reverse mapping is applied:
  each full group of 8 characters codes a sequence of US-ASCII 5 octets.  A
  final group of 2, 4, 5 or 7 characters allowed codes a sequence of 1, 2, 3
  or 4 octets respectively.  Any spare bits represented by RFC 2396 [5].

3.3 Resolving feature set references

  This memo does not mandate any particular mechanism the final
  group of characters are discarded.

Internet Draft                    Identifying composite media features
  Thus, for
  defeferencing a feature set reference.  It 128-bit (16 octet) MD5 hash value, the first 15 octets
  are coded as 24 base 32 digit characters, and the final octet is expected that
  specific dereferencing mechanisms will
  coded by two characters.

       NOTE:  Base64 representation (per MIME [4]) would be specified more
       compact (21 rather than 26 characters for any
  application or protocol that uses them.

Internet Draft                    Identifying composite media features
  The following sections describe some ways that feature set
  dereferencing information may be incorporated into a feature set
  expression.  Both of these mechanisms are based on the MD5 128-bit
       hash value), but an auxiliary predicate definitions within a "where" clause [1].

  When a hash-based feature set reference name is used, conformance defined
       (by [1]) to have the
  hashing rules takes precedence over any other determination of same syntax as a feature tag, and
       the feature expression.  Any expression, however obtained, may tag matching rules (per [2]) state that
       feature tag matching is case insensitive.

       Base36 representation was considered (i.e. using all
       letters "A"-"Z") but was not be
  substituted for the hash-based reference unless it yields used because this would
       require extended precision multiplication and division
       operations to encode and decode the
  correct hash value.

3.3.1 values.

3.2 Feature set URI reference

  The two formats for format

  This section introduces a new form of feature set references described above may be
  combined predicate by defining
  extending the feature set syntax [1] as follows:

     filter =/ "<" URI ">" *( ";" parameter )

  where 'URI' is described by "Uniform Resource Identifiers (URI):
  Generic Syntax" [5].

  The meaning of a hash-based reference this construct is defined to be a
  URI-based reference.  For example:

     (& (dpi=100) (h.1234567890) )
     where
     (h.1234567890) :- <http://www.acme.com/widget-feature/modelT>
     end

  This indicates that the meaning of the hash-based form is contained
  expression in which '<URI>' is replaced by a copy of the resource whose URI is given.  In this case, an HTTP
  indicated by 'URI'.  The indicated resource
  retrieval is suggested.

  The hash required to be a
  text value containing a valid feature set expression, NOT itself
  containing a '<URI>' reference.

  If a '<URI>' reference is used within a feature expression that
  defines a hash reference, then the hash value is calculated over
  the feature set expression obtained by defererencing after the URI form expression.

       NOTE:  How a calling application processes resource has been subsituted.

  Thus, the URI is not
       specified here.  For URIs that following are URLs, one reasonable
       approach would be to use the URL scheme protocol to
       access the corresponding examples of feature set expression.  But
       other mechanisms might expressions using
  URI references:

     <http://www.acme.com/widget-feature/modelT>

     (& (dpi=100) <http://www.acme.com/widget-feature/modelT> )

  This specification does not indicate:

  o  any specific URI schemes to be used;  e.g. protocols developed
       by supported,

  o  any meaning if the IETF resource capability (RESCAP) working group
       [8].  In any case, any mechanism used must cannot be specified
       by an application that uses URI references in this way.

  When a hash-based feature set reference is resolved using a URI
  value, the retrieving program should use accessed, of if the feature expression
  thus value
     obtained only if it hashes does not correspond to the correct value. some recognized format.

Internet Draft                    Identifying composite media features

3.3.2 Inline feature set
  These details

  The feature set expression associated with a reference value may be
  specified directly in a "where" clause, using the auxiliary
  predicate definition syntax [1];  e.g.

     (& (dpi=100) (h.1234567890) )
     where
     (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) )
     end

  This form might must be used on request (where the request mechanism is
  defined indicated by the invoking specification of any
  application protocol), or when the
  originator believes the recipient may not understand the reference.

       NOTE:  viewed in isolation, this format does not have any
       obvious value, in protocol that the (h.xxx) relies upon this form of auxiliary
       predicate could a feature
  predicate.

  If the URI uses characters other than a designated subset of US-
  ASCII then those additional characters should be replaced represented by any arbitrary name.

       It is anticipated that this form might be used as a
       follow-up response
  sequence of US-ASCII characters allowed by RFC 2396 [5].

       NOTE:  the syntax above allows a '<URI>' reference to
       appear almost anywhere in a sequence along feature set expression.  In
       some contexts, the lines of:
          A> Capabilities are:
            (& (dpi=100) (h.1234567890) )
          B> Do not understand:
            (h.1234567890)
          A> Capabilities are:
            (& (dpi=100) (h.1234567890) )
            where
            (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) )
            end

  It is an error if appearance of these references may be
       restricted to specific locations (e.g. the inline right hand
       side of a auxiliary feature expression does not yield the
  hash value contained predicate, as indicated in auxiliary predicate name.

3.4 The birthday problem

       NOTE:  this entire
       section is commentary, and 3.3.1).  The specification of any application or
       protocol should fully describe any such restriction that
       it may impose.

3.3 Resolving feature set references

  This memo does not
       affect the mandate any particular mechanism for
  defeferencing a feature set reference specification in reference.  It is expected that
  specific dereferencing mechanisms will be specified for any
       way.
  application or protocol that uses them.

  The use of following sections describe some ways that feature set
  dereferencing information may be incorporated into a hash value to represent an arbitrary feature set is
  expression.  Both of these mechanisms are based on auxiliary
  predicate definitions within a presumption that no two distinct "where" clause [1].

  When a hashed feature sets will yield
  the same hash value.

  There set reference is clearly a small but distinct possibility that two
  different used, conformance to the
  hashing rules takes precedence over any other determination of the
  feature sets will indeed yield expression.  Any expression, however obtained, may not be
  substituted for the same hash value.

  We assume that hash-based reference unless it yields the
  correct hash function distributes hash values value.

3.3.1 URI reference

  The two formats for feature sets with even very small differences randomly and evenly
  through set references described above may be
  combined by defining the range meaning of 2^128 (approximately 3*10^38) possible values. a hashed reference to be a URI-
  based reference.  For example:

     (& (dpi=100) (h.1234567890) )
     where
     (h.1234567890) :- <http://www.acme.com/widget-feature/modelT>
     end

  This indicates that the meaning of the hash-based form is contained
  in the resource whose URI is given.  In this case, an HTTP resource
  retrieval is suggested.

Internet Draft                    Identifying composite media features
  This
  The hash value used is a fundamental property of a good digest algorithm like MD5.
  Thus, calculated over the chance that any two distinct feature set expressions
  yield expression
  obtained by defererencing the same hash is less than 1 in 10^38.  This URI form expression.

       IMPORTANT:  How a calling application processes a URI is negligible
  when compared with, say, the probability
       not specified here.

       For URIs that a receiving system
  will fail having received data conforming are URLs, one reasonable approach would be
       to a negotiated feature
  set.

  But when use the number of distinct feature sets in circulation
  increases, URL scheme protocol to access the probability of clashing hash values increases
  surprisingly.  This is illustrated
       corresponding feature set expression.  But other
       mechanisms might be used;  e.g. protocols developed by
       the "birthday paradox":
  given a random collection of just 23 people, there is a greater
  than even chance IETF resource capability (RESCAP) working group.  In
       any case, any mechanism used must be specified by an
       application or protocol that there exists some pair with the same birthay.
  This topic is discussed further uses URI references in sections 7.4 and 7.5 of Bruce
  Schneier's "Applied Cryptography" [7].

          Number of this
       way.

  When a hashed feature   Probability of two
          sets in use         sets with the same
                              hash value
          1                   0
          2                   3E-39
          10                  1E-37
          1E3                 1E-33
          1E6                 1E-27
          1E9                 1E-21
          1E12                1E-15
          1E15                1E-9
          1E18                1E-3

       The above probability computations are approximate, being
       performed set reference is resolved using logarithms of a Gamma function
       approximation by Lanczos [10].  The probability formula
       is 'P=1-(m!/((m-n)! m^n))', where 'm' is URI value,
  the total number
       of possible hash values (2^128) and 'n' is retrieving program should use the number of feature sets in use.

  If original feature set expressions are generated manually, or expression thus
  obtained only
  in response if it hashes to some manually constrained process, the total number
  of correct value.

3.3.2 Inline feature sets in circulation set details

  In this case, a reference is likely to remain very small resolved by including its definition
  inline in
  relation to the total number of possible hash values. an expression.

  The outcome of all this is:  assuming that the feature sets are
  manually generated, even taking account of the birthday paradox
  effect, the probability of incorrectly identifying a feature set
  using expression associated with a hash reference value is still negligibly small when compared with
  other possible failure modes.

Internet Draft                    Identifying composite media features may be
  specified directly in a "where" clause, using the auxiliary
  predicate definition syntax [1];  e.g.

     (& (dpi=100) (h.1234567890) )
     where
     (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) )
     end

  This form might be used on request (where the request mechanism is
  defined by the invoking application protocol), or when the
  originator believes the recipient may not understand the reference.

  It is an error if the inline feature expression does not yield the
  hash value contained in auxiliary predicate name.

       NOTE:  viewed in isolation, this format does not have any
       obvious value, in that the (h.xxx) form of auxiliary
       predicate could be replaced by any arbitrary name.

Internet Draft                    Identifying composite media features
       It is anticipated that this form might be used as a
       follow-up response in a sequence along the lines of:
          A> Capabilities are:
            (& (dpi=100) (h.1234567890) )
          B> Do not understand:
            (h.1234567890)
          A> Capabilities are:
            (& (dpi=100) (h.1234567890) )
            where
            (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) )
            end

4. Examples

  The following are some examples of feature set expressions
  containing feature set references:

     (& (dpi=100) (h.1234567890abcdef1234567890abcdef) (h.1234567890abcdefghijklmnop) )

     (& (dpi=100)
        <http://www.acme.com/widget-feature/modelT> )

     (& (dpi=100) (h.1234567890abcdef1234567890abcdef) (h.1234567890abcdefghijklmnop) )
     where
     (h.1234567890abcdef1234567890abcdef)
     (h.1234567890abcdefghijklmnop) :-
        <http://www.acme.com/widget-feature/modelT>
     end

5. Internationalization considerations

  Feature set expressions and URI strings are currently defined to
  consist of only characters from the US-ASCII repertoire [1,5];
  under these circumstances this specification is not impacted by
  internationalization considerations (other than any already
  applicable to URIs [5]).

  But, if future revisions of the feature set syntax permit non-US-
  ASCII characters (e.g. within quoted strings), then some canonical
  representation must be defined for the purposes of calculating hash
  values.  One choice might be to use a UTF-8 equivalent
  representation as the basis for calculating the feature set hash.
  Another choice might be to leave this as an application protocol
  issue (but this could lead to non-interoperable feature sets
  between different protocols).

Internet Draft                    Identifying composite media features
  Another conceivable issue is that of up-casing the feature
  expression in preparation for computing a hash value.  This does
  not apply to the content of strings so is not likely to be an
  issue.  But if changes are made that do permit non-US-ASCII
  characters in feature tags or token strings, consideration must be
  given to properly defining how case conversion is to be performed.

6. Security considerations

  For the most part, security considerations are the same as those
  that apply for capability identification in general [1,2,9].

  A possible added consideration is that use of a specific feature
  set tag may reveal more information about a system than is
  necessary for a transaction at hand.

Internet Draft                    Identifying composite media features

7. Full copyright statement

  Copyright (C) The Internet Society 1999.  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain
  it or assist in its implementation may be prepared, copied,
  published and distributed, in whole or in part, without restriction
  of any kind, provided that the above copyright notice and this
  paragraph are included on all such copies and derivative works.
  However, this document itself may not be modified in any way, such
  as by removing the copyright notice or references to the Internet
  Society or other Internet organizations, except as needed for the
  purpose of developing Internet standards in which case the
  procedures for copyrights defined in the Internet Standards process
  must be followed, or as required to translate it into languages
  other than English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on
  an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
  ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
  THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Internet Draft                    Identifying composite media features

8. Acknowledgements

  Much of the initial work for URI references to feature sets was
  provided by Bill Newman.  Some of the ideas here have been improved
  by early discussions with Martin Duerst, Al Gilman and Ted Hardie.

9. References

[1]  RFC 2533, "A syntax for describing media feature sets"
     Graham Klyne, 5GM/Content Technologies
     March 1999.

[2]  RFC 2506, "Media Feature Tag Registration Procedure"
     Koen Holtman, TUE
     Andrew Mutz, Hewlett-Packard
     Ted Hardie, Equinix
     March 1999.

Internet Draft                    Identifying composite media features

[3]  RFC 2234, "Augmented BNF for Syntax Specifications: ABNF"
     D. Crocker (editor), Internet Mail Consortium
     P. Overell, Demon Internet Ltd.
     November 1997.

[4]  RFC 2045, "Multipurpose Internet Mail Extensions (MIME)
     Part 1: Format of Internet message bodies"
     N. Freed, Innosoft
     N. Borenstein, First Virtual
     November 1996.

[5]  RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax",
     Tim Berners-Lee, World Wide Web Consortium/MIT
     Roy T. Fielding, University of California, Irvine
     Larry Masinter, Xerox PARC
     August 1998.

[6]  RFC 1321, "The MD5 Message-Digest Algorithm",
     R. Rivest, MIT Laboratory for Computer Science and RSA Data
     Security, Inc.,
     April 1992.

[7]  "Applied Cryptography"
     Bruce Schneier
     John Wiley and Sons, 1996 (second edition)
     ISBN 0-471-12845-7 (cloth)
     ISBN 0-471-11709-9 (paper)

Internet Draft                    Identifying composite media features

[8]  Resource capability protocol
     IETF RESCAP, work in progress
     (No details published as of March 1999.)

[9]  "Protocol-independent content negotiation framework"
     Graham Klyne, 5GM/Content Technologies
     Internet draft: <draft-ietf-conneg-requirements-02.txt>
     Work in progress, March 1999.

[10]

[9]  "Numerical Recipes"
     William H Press, Brian P Flannery, Saul A Teukolski and William T
     Vetterling
     Cambridge University Press (1986)
     ISBN 0 521 30811 9
     (The Gamma function approximation is presented presented in chapter 6 on
     "Special Functions".  There have been several later editions of
     this book published, so the chapter reference may change.)

10. Authors' addresses

  Graham Klyne
  5th Generation Messaging Ltd.    Content Technologies Ltd.
  5 Watlington Street              Forum 1, Station Road
  Nettlebed                        Theale
  Henley-on-Thames, RG9 5AB        Reading, RG7 4RA
  United Kingdom                   United Kingdom.
  Telephone: +44 1491 641 641      +44 118 930 1300
  Facsimile: +44 1491 641 611      +44 118 930 1301
  E-mail: GK@ACM.ORG

  Larry Masinter
  Xerox Corporation
  3333 Coyote Hill Road
  Palo Alto, CA 94304
  Facsimile: +1 650 812 4333
  EMail: masinter@parc.xerox.com
  http://www.parc.xerox.com/masinter

Appendix A The birthday problem

       NOTE:  this entire section is commentary, and does not
       affect the feature set reference specification in any
       way.

  The use of a hash value to represent an arbitrary feature set is
  based on a presumption that no two distinct feature sets will yield
  the same hash value.

  There is a small but distinct possibility that two different
  feature sets will indeed yield the same hash value.

Internet Draft                    Identifying composite media features
  We assume that the 128-bit hash function distributes hash values
  for feature sets, even those with very small differences, randomly
  and evenly through the range of 2^128 (approximately 3*10^38)
  possible values.  This is a fundamental property of a good digest
  algorithm like MD5.  Thus, the chance that any two distinct feature
  set expressions yield the same hash is less than 1 in 10^38.  This
  is negligible when compared with, say, the probability that a
  receiving system will fail having received data conforming to a
  negotiated feature set.

  But when the number of distinct feature sets in circulation
  increases, the probability of repeating a hash value increases
  surprisingly.  This is illustrated by the "birthday paradox":
  given a random collection of just 23 people, there is a greater
  than even chance that there exists some pair with the same birthay.
  This topic is discussed further in sections 7.4 and 7.5 of Bruce
  Schneier's "Applied Cryptography" [7].

  The table below shows the "birthday paradox" probabilities that at
  least one pair of feature sets has the same hash value for
  different numbers of feature sets in use.

          Number of feature   Probability of two
          sets in use         sets with the same
                              hash value
          1                   0
          2                   3E-39
          10                  1E-37
          1E3                 1E-33
          1E6                 1E-27
          1E9                 1E-21
          1E12                1E-15
          1E15                1E-9
          1E18                1E-3

       The above probability computations are approximate, being
       performed using logarithms of a Gamma function
       approximation by Lanczos [9].  The probability formula is
       'P=1-(m!/((m-n)! m^n))', where 'm' is the total number of
       possible hash values (2^128) and 'n' is the number of
       feature sets in chapter 6 on
     "Special Functions".  There have been several later editions use.

  If original feature set expressions are generated manually, or only
  in response to some manually constrained process, the total number
  of
     this book published, so feature sets in circulation is likely to remain very small in
  relation to the chapter reference may change.) total number of possible hash values.

Internet Draft                    Identifying composite media features

10. Authors' addresses

  Graham Klyne
  5th Generation Messaging Ltd.    Content Technologies Ltd.
  5 Watlington Street              Forum 1, Station Road
  Nettlebed                        Theale
  Henley-on-Thames, RG9 5AB        Reading, RG7 4RA
  United Kingdom                   United Kingdom.
  Telephone: +44 1491 641 641      +44 118 930 1300
  Facsimile: +44 1491 641 611      +44 118 930 1301
  E-mail: GK@ACM.ORG

  Larry Masinter
  Xerox Corporation
  3333 Coyote Hill Road
  Palo Alto, CA 94304
  Facsimile: +1 650 812 4333
  EMail: masinter@parc.xerox.com
  http://www.parc.xerox.com/masinter
  The outcome of all this is:  assuming that the feature sets are
  manually generated, even taking account of the birthday paradox
  effect, the probability of incorrectly identifying a feature set
  using a hash value is still negligibly small when compared with
  other possible failure modes.

Appendix A: B: Revision history

  [[[RFC editor:  please remove this section on publication]]]

  00a  10-Feb-1999  Initial draft.

  01a  16-Feb-1999  Added pointers to mailing list for discussion.

  01b  25-Mar-1999  Name all authors.  Add some terms to the glossary.
                    Expand on meaning of URI tag used as auxiliary
                    predicate name.  Update references.  Rework
                    section 3 to deal more evenly with both hash and
                    URI based feature set references.  State absolute
                    requirement for hash-based references to be
                    resolved to expressions that yield the correct
                    hash value.

  01c  06-Apr-1999  Define form of URI reference using new '<...>'
                    syntax, and adjust other text accordingly.

  01d  06-Apr-1999  Editorial revisions.  Include values in table of
                    probabilities for hash value clashes.  Remove
                    discussion of algebraic simplification of hash
                    references.  Correct syntax of some examples.

  02a  16-Jun-1999  Move birthday problem to an appendix.  Remove
                    RESCAP citation.  Use base-32 to represent feature
                    hashes;  describe base-32 encoding.

  02b  16-Jun-1999  Add note that the <URI> form of feature reference
                    may not be allowed at arbitrary locations in all
                    contexts.