draft-ietf-conneg-feature-hash-01.txt   draft-ietf-conneg-feature-hash-02.txt 
IETF conneg working group Graham Klyne, editor IETF conneg working group Graham Klyne, editor
Internet draft 5GM/Content Technologies Internet draft 5GM/Content Technologies
Category: Work-in-progress Larry Masinter Category: Work-in-progress Larry Masinter
Xerox Corporation Xerox Corporation
Expires: October 1999 Expires: December 1999
Identifying composite media features Identifying composite media features
<draft-ietf-conneg-feature-hash-01.txt> <draft-ietf-conneg-feature-hash-02.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026. all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 40 skipping to change at page 1, line 39
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society 1999. All Rights Reserved. Copyright (C) The Internet Society 1999. All Rights Reserved.
Abstract Abstract
In "A syntax for describing media feature sets" [1], an expression In RFC 2533 [1], an expression format is presented for describing
format is presented for describing media feature capabilities as a media feature capabilities as a combination of simple media feature
combination of simple media feature tags [2]. tags [2].
This document proposes an abbreviated format for a composite media This document describes an abbreviated format for a composite media
feature set, based upon a hash of the feature expression describing feature set, based upon a hash of the feature expression describing
that composite or the URI of a resource containing the feature that composite, or the URI of a resource containing the feature
expression. expression.
Internet Draft Identifying composite media features Internet Draft Identifying composite media features
Table of contents Table of contents
1. Introduction ............................................2 1. Introduction ............................................2
1.1 Organization of this document 2 1.1 Organization of this document 2
1.2 Terminology and document conventions 3 1.2 Terminology and document conventions 3
1.3 Discussion of this document 3 1.3 Discussion of this document 3
2. Motivation and goals ....................................4 2. Motivation and goals ....................................4
3. Composite feature representation ........................5 3. Composite feature representation ........................5
3.1 Feature set hashed reference format 5 3.1 Feature set hashed reference format 5
3.1.1 Hash value calculation 6 3.1.1 Hash value calculation 6
3.2 Feature set URI reference format 7 3.1.2 Base-32 value representation 7
3.3 Resolving feature set references 7 3.2 Feature set URI reference format 8
3.3.1 URI reference 8 3.3 Resolving feature set references 9
3.3.2 Inline feature set details 9 3.3.1 URI reference 9
3.4 The birthday problem 9 3.3.2 Inline feature set details 10
4. Examples ................................................11 4. Examples ................................................11
5. Internationalization considerations .....................11 5. Internationalization considerations .....................11
6. Security considerations .................................11 6. Security considerations .................................12
7. Full copyright statement ................................12 7. Full copyright statement ................................12
8. Acknowledgements ........................................12 8. Acknowledgements ........................................13
9. References ..............................................12 9. References ..............................................13
10. Authors' addresses .....................................14 10. Authors' addresses .....................................14
Appendix A: Revision history ...............................14 Appendix A: The birthday problem ...........................14
Appendix B: Revision history ...............................16
1. Introduction 1. Introduction
In "A syntax for describing media feature sets" [1], an expression In "A syntax for describing media feature sets" [1], an expression
format is presented for describing media feature capabilities as a format is presented for describing media feature capabilities as a
combination of simple media feature tags [2]. combination of simple media feature tags [2].
This document proposes an abbreviated format for a composite media This document proposes an abbreviated format for a composite media
feature set, based upon a hash of the feature expression describing feature set, based upon a hash of the feature expression describing
that composite. that composite.
1.1 Organization of this document 1.1 Organization of this document
Section 2 sets out somne of the background and goals for feature Section 2 sets out some of the background and goals for feature set
set references. references.
Section 3 preents a syntax for feature set references, and Section 3 presents a syntax for feature set references, and
describes how they are related to feature set expressions. describes how they are related to feature set expressions.
Section 4 discusses how feature set references are used in conjunction
with feature set matching.
1.2 Terminology and document conventions 1.2 Terminology and document conventions
This section defines a number of terms and other document This section defines a number of terms and other document
conventions, which are used with specific meaning in this memo. conventions, which are used with specific meaning in this memo.
The terms are listed in alphabetical order. The terms are listed in alphabetical order.
dereference dereference
the act of replacing a feature set reference with its the act of replacing a feature set reference with its
skipping to change at page 5, line 21 skipping to change at page 5, line 21
3. Composite feature representation 3. Composite feature representation
This specification hinges on three central ideas: This specification hinges on three central ideas:
o the use of auxiliary predicates (introduced in [1]) to form the o the use of auxiliary predicates (introduced in [1]) to form the
basis of a feature set reference, basis of a feature set reference,
o the use of a token based on a hash function computed over the o the use of a token based on a hash function computed over the
referenced feature set expression, and referenced feature set expression, and
o the use of an expression containing a URI to indicate a mechanism o the use of an expression containing a URI to indicate an
and service for resolution of a feature set tag. identifier and possibly a service for resolution of a composite
feature set.
A key reason to use a hash function to generate an identifier is to A key reason to use a hash function to generate an identifier is to
define a global name space without requiring a central naming define a global name space without requiring a central naming
authority. New feature set tags can be introduced by any party authority. New feature set tags can be introduced by any party
following the appropriate rules of formulation, without reference following the appropriate rules of formulation, without reference
to any centralized authority. to any centralized authority.
Local resolution services may be needed to map feature set tags to Local resolution services may be needed to map feature set tags to
their corresponding feature set expressions, but these are not able their corresponding feature set expressions, but these are not able
to vary the meaning of any given tag. Failure of a resolution to vary the meaning of any given tag. Failure of a resolution
service to return the correct expression is detectable by a calling service to return the correct expression is detectable by a calling
application, which should reject any incorrect value supplied. application, which should reject any incorrect value supplied.
This memo also suggests that an expression containing a URI in the This memo also suggests that an expression containing a URI in the
format '<URI>' may be used to suggest a mechanism and location of a format '<URI>' may be used to indicate a mechanism and location of
service to perform feature set resolution. a service to perform feature set resolution.
3.1 Feature set hashed reference format 3.1 Feature set hashed reference format
This specification introduces a special form of auxililiary This specification introduces a special form of auxiliary predicate
predicate name with the following syntax: name with the following syntax:
fname = "h." 1*HEXDIG fname = "h." 1*BASE32DIGIT
BASE32DIGIT = DIGIT
/ "A" / "B" / "C" / "D" / "E" / "F" / "G" / "H"
/ "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P"
/ "Q" / "R" / "S" / "T" / "U" / "V"
The sequence of hexadecimal digits is the value of a hash function The sequence of base-32 digits represents the value of a hash
calculated over the corresponding feature set expression (see next function calculated over the corresponding feature set expression
section), represented as a hexadecimal number.
Thus, within a feature set expression, a feature set reference (see following sections). Note that the above syntax allows upper-
would have the following form: or lower- case letters for base-32 digits (per RFC 2234 [3]).
(h.123456789abcdef0123456789abcdef0) Thus, within a feature set expression, a hashed feature set
reference would have the following form:
NOTE: Base64 representation (per MIME [4]) would be more (h.123456789abcdefghijklmnopq)
compact (22 rather than 32 characters for the MD5 128-bit
hash value), but an auxiliary predicate name is defined
(by [1]) to have the same syntax as a feature tag, and
the feature tag matching rules (per [2]) state that
feature tag matching is case insensitive.
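As an illustration only, the -02 'fname' syntax can be recognized with a short Python sketch. The names HASHED_NAME, is_hashed_reference_name and same_reference are hypothetical. The sketch assumes that the case-insensitivity of ABNF literals (RFC 2234) applies both to the "h." prefix and to the letter digits:

```python
import re

# fname = "h." 1*BASE32DIGIT, where BASE32DIGIT = DIGIT / "A" .. "V".
# ABNF literals are case-insensitive, so lower-case letters are accepted.
# re.ASCII stops IGNORECASE from matching non-ASCII look-alike letters.
HASHED_NAME = re.compile(r"h\.[0-9A-V]+", re.IGNORECASE | re.ASCII)


def is_hashed_reference_name(name: str) -> bool:
    """Return True if name has the hashed auxiliary predicate name syntax."""
    return HASHED_NAME.fullmatch(name) is not None


def same_reference(a: str, b: str) -> bool:
    """Compare two names case-insensitively, as feature tags are matched."""
    return a.upper() == b.upper()


print(is_hashed_reference_name("h.123456789abcdefghijklmnopq"))  # True
print(is_hashed_reference_name("h.12345w"))                      # False
```

Names written in the -01 hexadecimal form also match, because hexadecimal digits are a subset of the base-32 digits. For an MD5 value, the digit count tells the two apart: 32 hexadecimal digits in -01 against 26 base-32 digits in -02.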
3.1.1 Hash value calculation 3.1.1 Hash value calculation
The hash value is calculated using the MD5 algorithm [6] over the The hash value is calculated using the MD5 algorithm [6] over the
text of the referenced feature set expression subjected to certain text of the referenced feature set expression subjected to certain
normalizations. The feature expression must conform to the syntax normalizations. The feature expression must conform to the syntax
given in "A syntax for describing media feature sets" [1] for given for 'filter' in RFC 2533 [1]:
'filter':
filter = "(" filtercomp ")" *( ";" parameter ) filter = "(" filtercomp ")" *( ";" parameter )
The steps for calculating a hash value are: The steps for calculating a hash value are:
1. Whitespace normalization: all spaces, CR, LF, TAB and any other 1. Whitespace normalization: all spaces, CR, LF, TAB and any other
layout control characters that may be embedded in the feature layout control characters that may be embedded in the feature
expression string are removed (or ignored for the purpose of hash expression string are removed (or ignored for the purpose of hash
value computation). value computation).
2. Case normalization: all lower case letters in the feature 2. Case normalization: all lower case letters in the feature
expression, other than those contained within quoted strings, are expression, other than those contained within quoted strings, are
converted to upper case. That is, unquoted characters with converted to upper case. That is, unquoted characters with
values 97 to 122 (decimal) are changed to corresponding values 97 to 122 (decimal) are changed to corresponding
characters in the range 65 to 90. characters in the range 65 to 90.
3. Hash computation: the MD5 algorithm [6] is applied to the 3. Hash computation: the MD5 algorithm, described in RFC 1321 [6],
normalized feature expression string. is applied to the normalized feature expression string.
The result obtained in step 3 is a 128-bit number that is converted The result obtained in step 3 is a 128-bit (16 octet) value that is
to a hexadecimal representation to form the feature set reference. converted to a base-32 representation to form the feature set
reference.
NOTE: under some circumstances, removal of ALL NOTE: under some circumstances, removal of ALL
whitespace may result in an invalid feature expression whitespace may result in an invalid feature expression
string. This should not be a problem as significantly string. This should not be a problem as this is done
different feature expressions are expected to differ in only for the purpose of calculating a hash value, and
ways other than their whitespace. significantly different feature expressions are expected
to differ in ways other than their whitespace.
NOTE: case normalization is deemed appropriate since NOTE: case normalization is deemed appropriate since
feature tag and token matching is case insensitive. feature tag and token matching is case insensitive.
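The three steps can be sketched in Python as follows. This is a minimal, non-normative sketch, and the function names are hypothetical. It rests on three assumptions:

o "layout control characters" means character codes 0 to 32 and 127;

o step 1 is applied literally, so whitespace inside quoted strings is removed as well;

o quoted strings never contain an embedded double quote character.

```python
import hashlib


def normalize_feature_expression(expr: str) -> str:
    """Steps 1 and 2: remove layout characters, upper-case outside quotes."""
    out = []
    in_quotes = False
    for ch in expr:
        code = ord(ch)
        # Step 1: drop SP, CR, LF, TAB and other control characters
        # (assumed: codes 0-32 and 127), including any inside quotes.
        if code <= 32 or code == 127:
            continue
        # Assumed: every '"' either opens or closes a quoted string.
        if ch == '"':
            in_quotes = not in_quotes
        # Step 2: outside quoted strings, codes 97-122 become 65-90.
        if not in_quotes and 97 <= code <= 122:
            ch = chr(code - 32)
        out.append(ch)
    return "".join(out)


def feature_set_digest(expr: str) -> bytes:
    """Step 3: the 16-octet MD5 value of the normalized expression."""
    normalized = normalize_feature_expression(expr)
    # Feature expressions are US-ASCII (section 5); other characters raise.
    return hashlib.md5(normalized.encode("ascii")).digest()


print(normalize_feature_expression('(& (pix-x<=200) (type="Text/Plain") )'))
# (&(PIX-X<=200)(TYPE="Text/Plain"))
```

Because of the normalization, expressions that differ only in layout, or in the case of unquoted text, produce the same digest.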
3.1.2 Base-32 value representation
RFC 1321 [6] describes how to calculate an MD5 hash value that is a
sequence of 16 octets. This is then required to be coded as a
base-32 value, which is a sequence of base-32 digit characters.
Each successive character in a base-32 value represents 5
successive bits of the underlying octet sequence. Thus, each group
of 8 characters represents a sequence of 5 octets (40 bits):
               1          2          3
    01234567 89012345 67890123 45678901 23456789
   +--------+--------+--------+--------+--------+
   |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
   +--------+--------+--------+--------+--------+
                                           <===> 8th character
                                     <====> 7th character
                                <===> 6th character
                          <====> 5th character
                    <====> 4th character
               <===> 3rd character
         <====> 2nd character
    <===> 1st character
The value (i.e. sequence of bits) represented by each base-32 digit
character is indicated by the following table:
"0" 0 "A" 10 "K" 20 "U" 30
"1" 1 "B" 11 "L" 21 "V" 31
"2" 2 "C" 12 "M" 22
"3" 3 "D" 13 "N" 23
"4" 4 "E" 14 "O" 24
"5" 5 "F" 15 "P" 25
"6" 6 "G" 16 "Q" 26
"7" 7 "H" 17 "R" 27
"8" 8 "I" 18 "S" 28
"9" 9 "J" 19 "T" 29
When encoding a base-32 value, each full group of 5 octets is
represented by a sequence of 8 characters as indicated above. If
fewer than 5 octets remain after this, they are encoded
using as many additional characters as may be needed: 1, 2, 3 or 4
octets are encoded by 2, 4, 5 or 7 characters respectively. Any
spare bits represented by the base-32 digit characters are selected
to be zero.
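The encoding rule is shown in the minimal Python sketch below. It assumes the most significant bit of each octet is taken first, as in the bit diagram above. BASE32_DIGITS and base32_encode are hypothetical names, and the digit string lists the table's characters in value order:

```python
BASE32_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # digit for values 0..31


def base32_encode(data: bytes) -> str:
    """Encode octets as base-32 digits, 5 bits per digit, high bits first."""
    digits = []
    buffer = 0  # bits read but not yet encoded
    nbits = 0   # how many bits are held in buffer
    for octet in data:
        buffer = (buffer << 8) | octet
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            digits.append(BASE32_DIGITS[(buffer >> nbits) & 0x1F])
        buffer &= (1 << nbits) - 1  # discard bits already encoded
    if nbits:
        # Final partial digit: the spare low-order bits are zero.
        digits.append(BASE32_DIGITS[(buffer << (5 - nbits)) & 0x1F])
    return "".join(digits)


print([len(base32_encode(bytes(n))) for n in range(1, 6)])  # [2, 4, 5, 7, 8]
```

No padding characters are emitted. The digit set and bit order appear to match the "base32hex" alphabet later standardized in RFC 4648, without its "=" padding.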
When decoding a base-32 value, the reverse mapping is applied:
each full group of 8 characters codes a sequence of 5 octets. A
final group of 2, 4, 5 or 7 characters codes a sequence of 1, 2, 3
or 4 octets respectively. Any spare bits represented by the final
group of characters are discarded.
Thus, for a 128-bit (16 octet) MD5 hash value, the first 15 octets
are coded as 24 base-32 digit characters, and the final octet is
coded by two characters.
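The decoding rule, and the 26-digit length of an MD5 value, can be checked with a similar hypothetical Python sketch. It accepts digits in either case, as the syntax in section 3.1 allows. It also rejects a final group of 1, 3 or 6 characters, which the encoding rule never produces:

```python
_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
# Map each digit, upper or lower case, to its 5-bit value.
BASE32_VALUES = {c: v for v, c in enumerate(_DIGITS)}
BASE32_VALUES.update({c.lower(): v for v, c in enumerate(_DIGITS)})


def base32_decode(text: str) -> bytes:
    """Decode base-32 digits to octets, discarding spare final bits."""
    if len(text) % 8 not in (0, 2, 4, 5, 7):
        raise ValueError("final group must have 2, 4, 5 or 7 characters")
    out = bytearray()
    buffer = 0  # bits decoded but not yet emitted as an octet
    nbits = 0
    for ch in text:
        if ch not in BASE32_VALUES:
            raise ValueError("not a base-32 digit: %r" % ch)
        buffer = (buffer << 5) | BASE32_VALUES[ch]
        nbits += 5
        if nbits >= 8:  # 5 new bits complete at most one octet
            nbits -= 8
            out.append((buffer >> nbits) & 0xFF)
            buffer &= (1 << nbits) - 1
    return bytes(out)  # any remaining bits (fewer than 8) are discarded


digest = base32_decode("123456789abcdefghijklmnopq")
print(len(digest))  # 16
```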
NOTE: Base64 representation (per MIME [4]) would be more
compact (22 rather than 26 characters for the MD5 128-bit
hash value), but an auxiliary predicate name is defined
(by [1]) to have the same syntax as a feature tag, and
the feature tag matching rules (per [2]) state that
feature tag matching is case insensitive.
Base36 representation was considered (i.e. using all
letters "A"-"Z") but was not used because this would
require extended precision multiplication and division
operations to encode and decode the hash values.
3.2 Feature set URI reference format 3.2 Feature set URI reference format
This section introduces a new form of feature set predicate by This section introduces a new form of feature set predicate by
extending the feature set syntax [1] as follows: extending the feature set syntax [1] as follows:
filter =/ "<" URI ">" *( ";" parameter ) filter =/ "<" URI ">" *( ";" parameter )
where 'URI' is described by "Uniform Resource Identifiers (URI): where 'URI' is described by "Uniform Resource Identifiers (URI):
Generic Syntax" [5]. Generic Syntax" [5].
skipping to change at page 7, line 41 skipping to change at page 9, line 5
(& (dpi=100) <http://www.acme.com/widget-feature/modelT> ) (& (dpi=100) <http://www.acme.com/widget-feature/modelT> )
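The following simplified, hypothetical Python sketch lists the '<URI>' references in an expression, to show where they can occur. It assumes that comparisons in [1] are written only with "=", "<=" and ">=". Under that assumption, a "<" immediately followed by a URI scheme name and ":" can be taken as the start of a reference. It does not attempt to skip quoted strings:

```python
import re
from typing import List

# "<" scheme ":" ... ">"; the scheme rule follows RFC 2396.
URI_REFERENCE = re.compile(r'<([A-Za-z][A-Za-z0-9+.\-]*:[^<>\s"]*)>')


def uri_references(expr: str) -> List[str]:
    """Return the URIs of all '<URI>' references found in expr."""
    return URI_REFERENCE.findall(expr)


print(uri_references(
    "(& (dpi=100) <http://www.acme.com/widget-feature/modelT> )"))
# ['http://www.acme.com/widget-feature/modelT']
```

The same pattern also finds a reference placed on the right-hand side of an auxiliary predicate definition in a "where" clause.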
This specification does not indicate: This specification does not indicate:
o any specific URI schemes to be supported, o any specific URI schemes to be supported,
o any meaning if the resource cannot be accessed, or if the value o any meaning if the resource cannot be accessed, or if the value
obtained does not correspond to some recognized format. obtained does not correspond to some recognized format.
These details must be indicated by the specification of any These details must be indicated by the specification of any
application or protocol that relies upon this interpretation of an application or protocol that relies upon this form of a feature
auxiliary feature predicate. predicate.
If the URI uses characters other than a designated subset of US- If the URI uses characters other than a designated subset of US-
ASCII then those additional characters should be represented by a ASCII then those additional characters should be represented by a
sequence of US-ASCII characters allowed by RFC 2396 [5]. sequence of US-ASCII characters allowed by RFC 2396 [5].
NOTE: the syntax above allows a '<URI>' reference to
appear almost anywhere in a feature set expression. In
some contexts, the appearance of these references may be
restricted to specific locations (e.g. the right-hand
side of an auxiliary feature predicate, as indicated in
section 3.3.1). The specification of any application or
protocol should fully describe any such restriction that
it may impose.
3.3 Resolving feature set references 3.3 Resolving feature set references
This memo does not mandate any particular mechanism for This memo does not mandate any particular mechanism for
dereferencing a feature set reference. It is expected that dereferencing a feature set reference. It is expected that
specific dereferencing mechanisms will be specified for any specific dereferencing mechanisms will be specified for any
application or protocol that uses them. application or protocol that uses them.
The following sections describe some ways that feature set The following sections describe some ways that feature set
dereferencing information may be incorporated into a feature set dereferencing information may be incorporated into a feature set
expression. Both of these mechanisms are based on auxiliary expression. Both of these mechanisms are based on auxiliary
predicate definitions within a "where" clause [1]. predicate definitions within a "where" clause [1].
When a hash-based feature set reference is used, conformance to the When a hashed feature set reference is used, conformance to the
hashing rules takes precedence over any other determination of the hashing rules takes precedence over any other determination of the
feature expression. Any expression, however obtained, may not be feature expression. Any expression, however obtained, may not be
substituted for the hash-based reference unless it yields the substituted for the hash-based reference unless it yields the
correct hash value. correct hash value.
3.3.1 URI reference 3.3.1 URI reference
The two formats for feature set references described above may be The two formats for feature set references described above may be
combined by defining the meaning of a hash-based reference to be a combined by defining the meaning of a hashed reference to be a URI-
URI-based reference. For example: based reference. For example:
(& (dpi=100) (h.1234567890) ) (& (dpi=100) (h.1234567890) )
where where
(h.1234567890) :- <http://www.acme.com/widget-feature/modelT> (h.1234567890) :- <http://www.acme.com/widget-feature/modelT>
end end
This indicates that the meaning of the hash-based form is contained This indicates that the meaning of the hash-based form is contained
in the resource whose URI is given. In this case, an HTTP resource in the resource whose URI is given. In this case, an HTTP resource
retrieval is suggested. retrieval is suggested.
The hash value used is calculated over the feature set expression The hash value used is calculated over the feature set expression
obtained by dereferencing the URI form expression. obtained by dereferencing the URI form expression.
NOTE: How a calling application processes the URI is not IMPORTANT: How a calling application processes a URI is
specified here. For URIs that are URLs, one reasonable not specified here.
approach would be to use the URL scheme protocol to
access the corresponding feature set expression. But
other mechanisms might be used; e.g. protocols developed
by the IETF resource capability (RESCAP) working group
[8]. In any case, any mechanism used must be specified
by an application that uses URI references in this way.
When a hash-based feature set reference is resolved using a URI For URIs that are URLs, one reasonable approach would be
value, the retrieving program should use the feature expression to use the URL scheme protocol to access the
thus obtained only if it hashes to the correct value. corresponding feature set expression. But other
mechanisms might be used; e.g. protocols developed by
the IETF resource capability (RESCAP) working group. In
any case, any mechanism used must be specified by an
application or protocol that uses URI references in this
way.
When a hashed feature set reference is resolved using a URI value,
the retrieving program should use the feature expression thus
obtained only if it hashes to the correct value.
3.3.2 Inline feature set details 3.3.2 Inline feature set details
In this case, a reference is resolved by including its definition
inline in an expression.
The feature set expression associated with a reference value may be The feature set expression associated with a reference value may be
specified directly in a "where" clause, using the auxiliary specified directly in a "where" clause, using the auxiliary
predicate definition syntax [1]; e.g. predicate definition syntax [1]; e.g.
(& (dpi=100) (h.1234567890) ) (& (dpi=100) (h.1234567890) )
where where
(h.1234567890) :- (& (pix-x<=200) (pix-y<=150) ) (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) )
end end
This form might be used on request (where the request mechanism is This form might be used on request (where the request mechanism is
defined by the invoking application protocol), or when the defined by the invoking application protocol), or when the
originator believes the recipient may not understand the reference. originator believes the recipient may not understand the reference.
It is an error if the inline feature expression does not yield the
hash value contained in the auxiliary predicate name.
NOTE: viewed in isolation, this format does not have any NOTE: viewed in isolation, this format does not have any
obvious value, in that the (h.xxx) form of auxiliary obvious value, in that the (h.xxx) form of auxiliary
predicate could be replaced by any arbitrary name. predicate could be replaced by any arbitrary name.
It is anticipated that this form might be used as a It is anticipated that this form might be used as a
follow-up response in a sequence along the lines of: follow-up response in a sequence along the lines of:
A> Capabilities are: A> Capabilities are:
(& (dpi=100) (h.1234567890) ) (& (dpi=100) (h.1234567890) )
B> Do not understand: B> Do not understand:
(h.1234567890) (h.1234567890)
A> Capabilities are: A> Capabilities are:
(& (dpi=100) (h.1234567890) ) (& (dpi=100) (h.1234567890) )
where where
(h.1234567890) :- (& (pix-x<=200) (pix-y<=150) ) (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) )
end end
It is an error if the inline feature expression does not yield the
hash value contained in the auxiliary predicate name.
3.4 The birthday problem
NOTE: this entire section is commentary, and does not
affect the feature set reference specification in any
way.
The use of a hash value to represent an arbitrary feature set is
based on a presumption that no two distinct feature sets will yield
the same hash value.
There is clearly a small but distinct possibility that two
different feature sets will indeed yield the same hash value.
We assume that the hash function distributes hash values for
feature sets with even very small differences randomly and evenly
through the range of 2^128 (approximately 3*10^38) possible values.
This is a fundamental property of a good digest algorithm like MD5.
Thus, the chance that any two distinct feature set expressions
yield the same hash is less than 1 in 10^38. This is negligible
when compared with, say, the probability that a receiving system
will fail having received data conforming to a negotiated feature
set.
But when the number of distinct feature sets in circulation
increases, the probability of clashing hash values increases
surprisingly. This is illustrated by the "birthday paradox":
given a random collection of just 23 people, there is a greater
than even chance that there exists some pair with the same birthday.
This topic is discussed further in sections 7.4 and 7.5 of Bruce
Schneier's "Applied Cryptography" [7].
Number of feature      Probability of two
sets in use            sets with the same
                       hash value
      1                0
      2                3E-39
     10                1E-37
    1E3                1E-33
    1E6                1E-27
    1E9                1E-21
   1E12                1E-15
   1E15                1E-9
   1E18                1E-3
The above probability computations are approximate, being
performed using logarithms of a Gamma function
approximation by Lanczos [10]. The probability formula
is 'P=1-(m!/((m-n)! m^n))', where 'm' is the total number
of possible hash values (2^128) and 'n' is the number of
feature sets in use.
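The table can be approximately reproduced with the short Python sketch below. It does not use the Lanczos log-Gamma method mentioned above. As a swapped-in technique, it sums log(1 - k/m) directly for modest n. For larger n it uses the standard approximation P ~= 1 - exp(-n(n-1)/(2m)). The function name and the exact_limit threshold are hypothetical:

```python
import math

HASH_SPACE = 2 ** 128  # m: number of possible MD5 hash values


def collision_probability(n: int, m: int = HASH_SPACE,
                          exact_limit: int = 10 ** 5) -> float:
    """Chance that some pair among n random values in m slots coincide."""
    if n < 2:
        return 0.0
    if n > m:
        return 1.0  # more values than slots: a clash is certain
    if n <= exact_limit:
        # ln of m!/((m-n)! m^n) = sum of ln(1 - k/m) for k = 1 .. n-1
        log_no_clash = sum(math.log1p(-k / m) for k in range(1, n))
    else:
        # Standard approximation, accurate while n is much smaller than m
        log_no_clash = -(n * (n - 1)) / (2 * m)
    return -math.expm1(log_no_clash)  # 1 - exp(...), without cancellation


for n in (2, 10, 10 ** 3, 10 ** 6, 10 ** 9, 10 ** 12, 10 ** 15, 10 ** 18):
    print(n, "%.0E" % collision_probability(n))
```

When rounded to one significant figure, the printed values agree with the table. Python writes exponents with at least two digits, so the 1E-3 entry prints as 1E-03. With m = 365, the same function gives about 0.507 for 23 people, which is the "greater than even" birthday result.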
If original feature set expressions are generated manually, or only
in response to some manually constrained process, the total number
of feature sets in circulation is likely to remain very small in
relation to the total number of possible hash values.
The outcome of all this is: assuming that the feature sets are
manually generated, even taking account of the birthday paradox
effect, the probability of incorrectly identifying a feature set
using a hash value is still negligibly small when compared with
other possible failure modes.
4. Examples 4. Examples
The following are some examples of feature set expressions The following are some examples of feature set expressions
containing feature set references: containing feature set references:
(& (dpi=100) (h.1234567890abcdef1234567890abcdef) ) (& (dpi=100) (h.1234567890abcdefghijklmnop) )
(& (dpi=100) (& (dpi=100)
<http://www.acme.com/widget-feature/modelT> ) <http://www.acme.com/widget-feature/modelT> )
(& (dpi=100) (h.1234567890abcdef1234567890abcdef) ) (& (dpi=100) (h.1234567890abcdefghijklmnop) )
where where
(h.1234567890abcdef1234567890abcdef) :- (h.1234567890abcdefghijklmnop) :-
<http://www.acme.com/widget-feature/modelT> <http://www.acme.com/widget-feature/modelT>
end end
5. Internationalization considerations 5. Internationalization considerations
Feature set expressions and URI strings are currently defined to Feature set expressions and URI strings are currently defined to
consist of only characters from the US-ASCII repertoire [1,5]; consist of only characters from the US-ASCII repertoire [1,5];
under these circumstances this specification is not impacted by under these circumstances this specification is not impacted by
internationalization considerations (other than any already internationalization considerations (other than any already
applicable to URIs [5]). applicable to URIs [5]).
But, if future revisions of the feature set syntax permit non-US- But, if future revisions of the feature set syntax permit non-US-
ASCII characters (e.g. within quoted strings), then some canonical ASCII characters (e.g. within quoted strings), then some canonical
representation must be defined for the purposes of calculating hash representation must be defined for the purposes of calculating hash
values. One choice might be to use a UTF-8 equivalent values. One choice might be to use a UTF-8 equivalent
representation as the basis for calculating the feature set hash. representation as the basis for calculating the feature set hash.
Another choice might be to leave this as an application protocol Another choice might be to leave this as an application protocol
issue (but this could lead to non-interoperable feature sets issue (but this could lead to non-interoperable feature sets
between different protocols). between different protocols).
Another conceivable issue is that of up-casing the feature Another conceivable issue is that of up-casing the feature
expression in preparation for computing a hash value. This does expression in preparation for computing a hash value. This does
not apply to the content of strings so is not likely to be an not apply to the content of strings so is not likely to be an
issue. But if changes are made that do permit non-US-ASCII issue. But if changes are made that do permit non-US-ASCII
characters in feature tags or token strings, consideration must be characters in feature tags or token strings, consideration must be
given to properly defining how case conversion is to be performed. given to properly defining how case conversion is to be performed.
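The hypothetical Python sketch below shows how step 2's ASCII-only mapping differs from a general Unicode case mapping. ascii_upper changes only codes 97 to 122. Python's built-in str.upper applies full Unicode case mapping, which can even change the length of a string:

```python
def ascii_upper(text: str) -> str:
    """Step 2 mapping: only codes 97-122 change, to codes 65-90."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


word = "stra\u00dfe"  # "strasse" spelled with LATIN SMALL LETTER SHARP S
print(ascii_upper(word))  # STRAßE  (sharp s left unchanged)
print(word.upper())       # STRASSE (one character became two)
```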
6. Security considerations 6. Security considerations
For the most part, security considerations are the same as those For the most part, security considerations are the same as those
that apply for capability identification in general [1,2,9]. that apply for capability identification in general [1,2,9].
A possible added consideration is that use of a specific feature A possible added consideration is that use of a specific feature
set tag may reveal more information about a system than is set tag may reveal more information about a system than is
necessary for the transaction at hand. necessary for the transaction at hand.
7. Full copyright statement 7. Full copyright statement
Copyright (C) The Internet Society 1999. All Rights Reserved. Copyright (C) The Internet Society 1999. All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied, it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works. paragraph are included on all such copies and derivative works.
skipping to change at page 12, line 35 skipping to change at page 13, line 5
The limited permissions granted above are perpetual and will not be The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns. revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
8. Acknowledgements 8. Acknowledgements
Much of the initial work for URI references to feature sets was Much of the initial work for URI references to feature sets was
provided by Bill Newman. Some of the ideas here have been improved provided by Bill Newman. Some of the ideas here have been improved
by early discussions with Martin Duerst, Al Gilman and Ted Hardie. by early discussions with Martin Duerst, Al Gilman and Ted Hardie.
9. References 9. References
[1] RFC 2533, "A syntax for describing media feature sets" [1] RFC 2533, "A syntax for describing media feature sets"
Graham Klyne, 5GM/Content Technologies Graham Klyne, 5GM/Content Technologies
March 1999. March 1999.
[2] RFC 2506, "Media Feature Tag Registration Procedure" [2] RFC 2506, "Media Feature Tag Registration Procedure"
Koen Holtman, TUE Koen Holtman, TUE
Andrew Mutz, Hewlett-Packard Andrew Mutz, Hewlett-Packard
Ted Hardie, Equinix Ted Hardie, Equinix
March 1999. March 1999.
[3] RFC 2234, "Augmented BNF for Syntax Specifications: ABNF" [3] RFC 2234, "Augmented BNF for Syntax Specifications: ABNF"
D. Crocker (editor), Internet Mail Consortium D. Crocker (editor), Internet Mail Consortium
P. Overell, Demon Internet Ltd. P. Overell, Demon Internet Ltd.
November 1997. November 1997.
[4] RFC 2045, "Multipurpose Internet Mail Extensions (MIME) [4] RFC 2045, "Multipurpose Internet Mail Extensions (MIME)
Part 1: Format of Internet message bodies" Part 1: Format of Internet message bodies"
N. Freed, Innosoft N. Freed, Innosoft
N. Borenstein, First Virtual N. Borenstein, First Virtual
November 1996. November 1996.
skipping to change at page 13, line 35 skipping to change at page 14, line 5
R. Rivest, MIT Laboratory for Computer Science and RSA Data R. Rivest, MIT Laboratory for Computer Science and RSA Data
Security, Inc., Security, Inc.,
April 1992. April 1992.
[7] "Applied Cryptography" [7] "Applied Cryptography"
Bruce Schneier Bruce Schneier
John Wiley and Sons, 1996 (second edition) John Wiley and Sons, 1996 (second edition)
ISBN 0-471-12845-7 (cloth) ISBN 0-471-12845-7 (cloth)
ISBN 0-471-11709-9 (paper) ISBN 0-471-11709-9 (paper)
[8] Resource capability protocol
IETF RESCAP, work in progress
(No details published as of March 1999.)
[9] "Protocol-independent content negotiation framework" [8] "Protocol-independent content negotiation framework"
Graham Klyne, 5GM/Content Technologies Graham Klyne, 5GM/Content Technologies
Internet draft: <draft-ietf-conneg-requirements-02.txt> Internet draft: <draft-ietf-conneg-requirements-02.txt>
Work in progress, March 1999. Work in progress, March 1999.
[10] "Numerical Recipes" [9] "Numerical Recipes"
William H Press, Brian P Flannery, Saul A Teukolski and William T William H Press, Brian P Flannery, Saul A Teukolski and William T
Vetterling Vetterling
Cambridge University Press (1986) Cambridge University Press (1986)
ISBN 0 521 30811 9 ISBN 0 521 30811 9
(The Gamma function approximation is presented in chapter 6 on (The Gamma function approximation is presented in chapter 6 on
"Special Functions". There have been several later editions of "Special Functions". There have been several later editions of
this book published, so the chapter reference may change.) this book published, so the chapter reference may change.)
10. Authors' addresses 10. Authors' addresses
Graham Klyne Graham Klyne
5th Generation Messaging Ltd. Content Technologies Ltd. 5th Generation Messaging Ltd. Content Technologies Ltd.
5 Watlington Street Forum 1, Station Road 5 Watlington Street Forum 1, Station Road
Nettlebed Theale Nettlebed Theale
Henley-on-Thames, RG9 5AB Reading, RG7 4RA Henley-on-Thames, RG9 5AB Reading, RG7 4RA
United Kingdom United Kingdom. United Kingdom United Kingdom.
Telephone: +44 1491 641 641 +44 118 930 1300 Telephone: +44 1491 641 641 +44 118 930 1300
Facsimile: +44 1491 641 611 +44 118 930 1301 Facsimile: +44 1491 641 611 +44 118 930 1301
E-mail: GK@ACM.ORG E-mail: GK@ACM.ORG
Larry Masinter Larry Masinter
Xerox Corporation Xerox Corporation
3333 Coyote Hill Road 3333 Coyote Hill Road
Palo Alto, CA 94304 Palo Alto, CA 94304
Facsimile: +1 650 812 4333 Facsimile: +1 650 812 4333
EMail: masinter@parc.xerox.com EMail: masinter@parc.xerox.com
http://www.parc.xerox.com/masinter http://www.parc.xerox.com/masinter
Appendix A: Revision history Appendix A: The birthday problem
NOTE: this entire section is commentary, and does not
affect the feature set reference specification in any
way.
The use of a hash value to represent an arbitrary feature set is
based on a presumption that no two distinct feature sets will yield
the same hash value.
There is a small but nonzero possibility that two different
feature sets will yield the same hash value.
We assume that the 128-bit hash function distributes hash values
for feature sets, even those with very small differences, uniformly
at random across the range of 2^128 (approximately 3.4*10^38)
possible values. This is a fundamental property of a good digest
algorithm like MD5. Thus, the chance that any two given distinct
feature set expressions yield the same hash is 1 in 2^128, which is
less than 1 in 10^38. This is negligible compared with, say, the
probability that a receiving system will fail after receiving data
that conforms to a negotiated feature set.
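The digest-size arithmetic above can be checked with a short Python sketch. It hashes two hypothetical feature set expressions, written in the style of RFC 2533 and differing in a single digit, using MD5. It shows that the digest is 128 bits long and that near-identical inputs give unrelated-looking values. The sketch hashes the raw expression strings directly and applies none of the normalization (such as case conversion) that the specification may call for, so these digests are not feature set tags.

```python
import hashlib

# Two hypothetical feature set expressions that differ only in one digit.
expr_a = "(& (pix-x<=200) (pix-y<=150) )"
expr_b = "(& (pix-x<=201) (pix-y<=150) )"

digest_a = hashlib.md5(expr_a.encode("ascii")).digest()
digest_b = hashlib.md5(expr_b.encode("ascii")).digest()

print(len(digest_a) * 8)  # digest size in bits: 128
print(digest_a.hex())
print(digest_b.hex())

# Size of the hash value space, and the chance that two given distinct
# inputs collide under an ideal 128-bit hash.
space = 2 ** 128
print(f"{space:.2e}")      # about 3.40e+38
print(f"{1 / space:.2e}")  # about 2.94e-39, below 1e-38
```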
But as the number of distinct feature sets in circulation
increases, the probability that some hash value is repeated rises
surprisingly quickly. This is illustrated by the "birthday paradox":
in a random group of just 23 people, there is a better than even
chance that at least two of them share a birthday.
This topic is discussed further in sections 7.4 and 7.5 of Bruce
Schneier's "Applied Cryptography" [7].
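The 23-person figure can be checked directly. The following Python sketch evaluates the same formula that this appendix uses for hash values, with m = 365 days in place of 2^128 hash values. It assumes that every day is equally likely and ignores leap days. The function name is illustrative.

```python
from math import prod


def shared_birthday_probability(people: int, days: int = 365) -> float:
    """P = 1 - days!/((days-people)! * days^people): the chance that at
    least two of `people` share a birthday, assuming all `days` are
    equally likely."""
    all_distinct = prod((days - k) / days for k in range(people))
    return 1 - all_distinct


for n in (22, 23):
    print(n, round(shared_birthday_probability(n), 4))
```

The printed probability should fall just below one half for 22 people and just above it for 23.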
The table below shows the "birthday paradox" probabilities that at
least one pair of feature sets has the same hash value for
different numbers of feature sets in use.
Number of feature       Probability of two
sets in use             sets with the same
                        hash value
1                       0
2                       3E-39
10                      1E-37
1E3                     1E-33
1E6                     1E-27
1E9                     1E-21
1E12                    1E-15
1E15                    1E-9
1E18                    1E-3
The above probabilities are approximate. They were computed using
logarithms of the Gamma function, evaluated with the approximation
due to Lanczos [9]. The probability formula is
'P=1-(m!/((m-n)! m^n))', where 'm' is the total number of
possible hash values (2^128) and 'n' is the number of
feature sets in use.
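The table can be reproduced with a short Python sketch, but the sketch does not follow the Lanczos log-Gamma route described above. A direct double-precision evaluation of that form is avoided here because the log-factorials involved are around 3*10^40, far too large to resolve differences of 10^-3 or less. Instead, the sketch uses the standard approximation ln(1 - k/m) ~= -k/m. This is valid because every n in the table is far smaller than m, and it gives P ~= 1 - exp(-n(n-1)/(2m)). An exact rational evaluation is included as a cross-check for small n. Both function names are illustrative.

```python
import math
from fractions import Fraction

M = 2 ** 128  # total number of possible 128-bit hash values


def clash_probability(n: int, m: int = M) -> float:
    """Approximate P = 1 - m!/((m-n)! m^n) for n much smaller than m.

    ln(m!/((m-n)! m^n)) is the sum of ln(1 - k/m) for k = 0..n-1.
    With every k/m tiny, each term is about -k/m, so the sum is about
    -n(n-1)/(2m). math.expm1 keeps full precision when P is tiny.
    """
    if n < 2:
        return 0.0
    return -math.expm1(-(n * (n - 1)) / (2 * m))


def clash_probability_exact(n: int, m: int = M) -> float:
    """Exact rational evaluation of the same formula; practical only for small n."""
    no_clash = Fraction(math.prod(range(m - n + 1, m + 1)), m ** n)
    return float(1 - no_clash)


for n in (1, 2, 10, 10**3, 10**6, 10**9, 10**12, 10**15, 10**18):
    print(f"{n:>22}  {clash_probability(n):.0E}")

# Cross-check the approximation against exact arithmetic for small n.
for n in (2, 10, 1000):
    print(n, clash_probability(n), clash_probability_exact(n))
```

Rounded to one significant figure, the printed probabilities should agree with the table entries.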
If original feature set expressions are generated manually, or only
in response to some manually constrained process, the total number
of feature sets in circulation is likely to remain very small in
relation to the total number of possible hash values.
In summary: assuming that feature sets are generated manually, and
even taking account of the birthday paradox effect, the probability
of incorrectly identifying a feature set by its hash value is still
negligibly small compared with other possible failure modes.
Appendix B: Revision history
[[[RFC editor: please remove this section on publication]]] [[[RFC editor: please remove this section on publication]]]
00a 10-Feb-1999 Initial draft. 00a 10-Feb-1999 Initial draft.
01a 16-Feb-1999 Added pointers to mailing list for discussion. 01a 16-Feb-1999 Added pointers to mailing list for discussion.
01b 25-Mar-1999 Name all authors. Add some terms to the glossary. 01b 25-Mar-1999 Name all authors. Add some terms to the glossary.
Expand on meaning of URI tag used as auxiliary Expand on meaning of URI tag used as auxiliary
predicate name. Update references. Rework predicate name. Update references. Rework
skipping to change at line 681 skipping to change at page 16, line 36
resolved to expressions that yield the correct resolved to expressions that yield the correct
hash value. hash value.
01c 06-Apr-1999 Define form of URI reference using new '<...>' 01c 06-Apr-1999 Define form of URI reference using new '<...>'
syntax, and adjust other text accordingly. syntax, and adjust other text accordingly.
01d 06-Apr-1999 Editorial revisions. Include values in table of 01d 06-Apr-1999 Editorial revisions. Include values in table of
probabilities for hash value clashes. Remove probabilities for hash value clashes. Remove
discussion of algebraic simplification of hash discussion of algebraic simplification of hash
references. Correct syntax of some examples. references. Correct syntax of some examples.
02a 16-Jun-1999 Move birthday problem to an appendix. Remove
RESCAP citation. Use base-32 to represent feature
hashes; describe base-32 encoding.
02b 16-Jun-1999 Add note that the <URI> form of feature reference
may not be allowed at arbitrary locations in all
contexts.
 End of changes. 53 change blocks. 
153 lines changed or deleted 231 lines changed or added

This html diff was produced by rfcdiff 1.34. The latest version is available from http://tools.ietf.org/tools/rfcdiff/