draft-ietf-httpbis-header-structure-01.txt   draft-ietf-httpbis-header-structure-02.txt 
HTTP Working Group P-H. Kamp HTTP Working Group M. Nottingham
Internet-Draft The Varnish Cache Project Internet-Draft Fastly
Intended status: Standards Track April 24, 2017 Intended status: Standards Track P-H. Kamp
Expires: October 26, 2017 Expires: May 31, 2018 The Varnish Cache Project
November 27, 2017
HTTP Header Common Structure Structured Headers for HTTP
draft-ietf-httpbis-header-structure-01 draft-ietf-httpbis-header-structure-02
Abstract Abstract
An abstract data model for HTTP headers, "Common Structure", and a This document describes Structured Headers, a way of simplifying HTTP
HTTP/1 serialization of it, generalized from current HTTP headers. header field definition and parsing. It is intended for use by new
specifications of HTTP header fields. This includes revisions of
existing specifications when doing so does not cause interoperability
issues.
Note to Readers Note to Readers
Discussion of this draft takes place on the HTTP working group Discussion of this draft takes place on the HTTP working group
mailing list (ietf-http-wg@w3.org), which is archived at mailing list (ietf-http-wg@w3.org), which is archived at
https://lists.w3.org/Archives/Public/ietf-http-wg/ . https://lists.w3.org/Archives/Public/ietf-http-wg/ [1].
Working Group information can be found at http://httpwg.github.io/ ; _RFC EDITOR: please remove this section before publication_
source code and issues list for this draft can be found at
https://github.com/httpwg/http-extensions/labels/header-structure . Working Group information can be found at https://httpwg.github.io/
[2]; source code and issues list for this draft can be found at
https://github.com/httpwg/http-extensions/labels/header-structure
[3].
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 26, 2017. This Internet-Draft will expire on May 31, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
1. Introduction 1. Introduction
The HTTP protocol does not impose any structure or data model on the Specifying the syntax of new HTTP header fields is an onerous task;
information in HTTP headers; the HTTP/1 serialization is the even with the guidance in [RFC7231], Section 8.3.1, there are many
data model: an ASCII string without control characters. decisions - and pitfalls - for a prospective HTTP header field
author.
HTTP header definitions specify how the string must be formatted, and
while families of similar headers exist, processing HTTP traffic
correctly still requires an uncomfortably large number of bespoke
parsing and validation routines.
To improve performance, HTTP/2 and HPACK use naive text
compression, which incidentally decoupled the on-the-wire
serialization from the data model.
During the development of HPACK it became evident that significantly
bigger gains were available if semantic compression could be used,
most notably with timestamps. However, the lack of a common data
structure for HTTP headers would make semantic compression one long
list of special cases.
In parallel, various proposals were floated for how to fulfill
data-transportation needs and, to a lesser degree, to impose some
order on HTTP headers, at least going forward.
All of these proposals (JSON, CBOR, etc.) run into the same basic
problem: their serialization is incompatible with RFC 7230's
[RFC7230] ABNF definition of 'field-value'.
For binary formats, such as CBOR, a wholesale base64/85
reserialization would be needed, with negative results for both
debuggability and bandwidth.
For textual formats, such as JSON, the format must first be neutered
to not violate field-value's ABNF, and then workarounds added to
reintroduce the features just lost, for instance UNICODE strings.
The post-surgery format is no longer JSON, and experience
indicates that almost-but-not-quite compatibility is worse than no
compatibility.
This proposal starts from the other end, and builds and generalizes a
data structure definition from existing HTTP headers, which means
that HTTP/1 serialization and 'field-value' compatibility is built
in.
If all future HTTP headers are defined to fit into this Common
Structure, we will at least have halted the proliferation of bespoke
parsers and started to pave the road for semantic compression
serializations of HTTP traffic.
1.1. Terminology
In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].
2. Definition of HTTP Header Common Structure
The data model of Common Structure is an ordered sequence of named Likewise, bespoke parsers often need to be written for specific HTTP
dictionaries. Please see Appendix A for how this model was derived. headers, because each has slightly different handling of what looks
like common syntax.
The definition of the data model is deliberately abstract, decoupled This document introduces structured HTTP header field values
from any protocol serialization or programming-environment (hereafter, Structured Headers) to address these problems.
representation; it is meant as the foundation on which all such Structured Headers define a generic, abstract model for data, along
manifestations of the model can be built. with a concrete serialisation for expressing that model in textual
HTTP headers, as used by HTTP/1 [RFC7230] and HTTP/2 [RFC7540].
Common Structure in ABNF (Slightly bastardized relative to RFC5234 HTTP headers that are defined as Structured Headers use the types
[RFC5234]): defined in this specification to define their syntax and basic
handling rules, thereby simplifying both their definition and
parsing.
import token from RFC7230 Additionally, future versions of HTTP can define alternative
import DIGIT from RFC5234 serialisations of the abstract model of Structured Headers, allowing
headers that use it to be transmitted more efficiently without being
redefined.
common-structure = 1* ( identifier dictionary ) Note that it is not a goal of this document to redefine the syntax of
existing HTTP headers; the mechanisms described herein are only
intended to be used with headers that explicitly opt into them.
dictionary = * ( identifier [ value ] ) To specify a header field that uses Structured Headers, see
Section 2.
value = identifier / Section 4 defines a number of abstract data types that can be used in
integer / Structured Headers, of which only three are allowed at the "top"
number / level: lists, dictionaries, or items.
ascii-string /
unicode-string /
blob /
timestamp /
common-structure
Recursion is included as a way to support deep and more general Those abstract types can be serialised into textual headers - such as
data structures, but its use is highly discouraged and where it is those used in HTTP/1 and HTTP/2 - using the algorithms described in
used the depth of recursion SHALL always be explicitly limited in the Section 3.
specifications of the HTTP headers which allow it.
identifier = token [ "/" token ] 1.1. Notational Conventions
integer = ["-"] 1*19 DIGIT The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
Integers SHALL be in the range +/- 2^63-1 (= +/- 9223372036854775807) This document uses the Augmented Backus-Naur Form (ABNF) notation of
[RFC5234], including the DIGIT, ALPHA and DQUOTE rules from that
document. It also includes the OWS rule from [RFC7230].
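In the -01 model, the integer rule admits up to 19 digits, and 19 digits can exceed the +/- 2^63-1 range required in the prose (for example, 9999999999999999999 matches the ABNF). A parser therefore needs a separate range check after the syntactic match. The following is a minimal Python sketch; the helper name `parse_h1_integer` is hypothetical:

```python
import re

# -01 ABNF: integer = ["-"] 1*19 DIGIT
# [0-9] is used instead of \d, which would also match non-ASCII digits.
_INTEGER_RE = re.compile(r"-?[0-9]{1,19}")
INT63_MAX = 2**63 - 1  # 9223372036854775807


def parse_h1_integer(text: str) -> int:
    """Parse a -01 integer, enforcing both the ABNF and the SHALL range."""
    if _INTEGER_RE.fullmatch(text) is None:
        raise ValueError(f"not an integer: {text!r}")
    value = int(text)
    # The 19-digit ABNF alone would accept up to 9999999999999999999.
    if not -INT63_MAX <= value <= INT63_MAX:
        raise ValueError(f"integer out of range: {text!r}")
    return value
```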
number = ["-"] DIGIT '.' 1*14DIGIT / 2. Specifying Structured Headers
["-"] 2DIGIT '.' 1*13DIGIT /
["-"] 3DIGIT '.' 1*12DIGIT /
... /
["-"] 12DIGIT '.' 1*3DIGIT /
["-"] 13DIGIT '.' 1*2DIGIT /
["-"] 14DIGIT '.' 1DIGIT
The limit of 15 significant digits is chosen so that numbers can be HTTP headers that use Structured Headers need to be defined to do so
correctly represented by IEEE 754 64-bit binary floating point. explicitly; recipients and generators need to know that the
requirements of this document are in effect. The simplest way to do
that is by referencing this document in its definition.
ascii-string = * %x20-7e The field's definition will also need to specify the field-value's
allowed syntax, in terms of the types described in Section 4, along
with their associated semantics.
This is intended to be an efficient, "safe" and uncomplicated string Field definitions MUST NOT relax or otherwise modify the requirements
type, for uses where the string content is culturally neutral or of this specification; doing so would preclude handling by generic
where it will not be user visible. software.
unicode-string = * UNICODE However, field definitions are encouraged to clearly state additional
constraints upon the syntax, as well as the consequences when those
constraints are violated.
UNICODE = <U+0000-U+D7FF / U+E000-U+10FFFF> For example:
# UNICODE nicked from draft-seantek-unicode-in-abnf-02
Unicode-strings are unrestricted because there is no sane and/or # FooExample Header
culturally neutral way to subset or otherwise make Unicode "safe",
and Unicode is still evolving new and interesting code points.
Users of unicode-string SHALL be prepared for the full gamut of The FooExample HTTP header field conveys a list of numbers about how
glyph-gymnastics in order to avoid U+1F4A9 U+08 U+1F574. much Foo the sender has.
blob = * %x00-FF FooExample is a Structured Header [RFCxxxx]. Its value MUST be a
dictionary ([RFCxxxx], Section Y.Y).
Blobs are intended primarily for cryptographic data, but can be used The dictionary MUST contain:
for any otherwise unsatisfied needs.
timestamp = number * A member whose key is "foo", and whose value is an integer
([RFCxxxx], Section Y.Y), indicating the number of foos in
the message.
* A member whose key is "bar", and whose value is a string
([RFCxxxx], Section Y.Y), conveying the characteristic bar-ness
of the message.
A timestamp counts seconds since the UNIX time_t epoch, including the If the parsed header field does not contain both, it MUST be ignored.
"invisible leap-seconds" misfeature.
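Because a -01 timestamp is simply a number of seconds since the UNIX epoch, converting one usually takes a single library call. The following minimal Python sketch uses the standard `datetime` module and the value from the `Date` example in Section 7.1. Like POSIX time_t, Python's conversion does not count leap seconds. The function name is hypothetical:

```python
from datetime import datetime, timezone


def timestamp_to_datetime(value: float) -> datetime:
    """Convert a -01 timestamp (seconds since the UNIX epoch) to UTC."""
    # POSIX-style: leap seconds are not counted, matching time_t.
    return datetime.fromtimestamp(value, tz=timezone.utc)


print(timestamp_to_datetime(1475061449.201).isoformat())
# 2016-09-28T11:17:29.201000+00:00
```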
3. HTTP/1 Serialization of HTTP Header Common Structure Note that empty header field values are not allowed by the syntax,
and therefore will be considered errors.
In ABNF: 3. Parsing Requirements for Textual Headers
import OWS from RFC7230 When a receiving implementation parses textual HTTP header fields
import HEXDIG, DQUOTE from RFC5234 (e.g., in HTTP/1 or HTTP/2) that are known to be Structured Headers,
import EmbeddedUnicodeChar from RFC5137 it is important that care be taken, as there are a number of edge
cases that can cause interoperability or even security problems.
This section specifies the algorithm for doing so.
h1-common-structure-header = Given an ASCII string input_string that represents the chosen
h1-common-structure-legacy-header / header's field-value, return the parsed header value. Note that
h1-common-structure-self-identifying-header input_string may incorporate multiple header lines combined into one
comma-separated field-value, as per [RFC7230], Section 3.2.2.
h1-common-structure-legacy-header = 1. Discard any OWS from the beginning of input_string.
field-name ":" OWS h1-common-structure
Only white-listed legacy headers (see Section 8) can use this format. 2. If the field-value is defined to be a dictionary, return the
result of Parsing a Dictionary from Textual Headers
(Section 4.7).
h1-common-structure-self-identifying-header = 3. If the field-value is defined to be a list, return the result of
field-name ":" OWS ">" h1-common-structure "<" Parsing a List from Textual Headers (Section 4.8).
h1-common-structure = h1-element * ("," h1-element) 4. If the field-value is defined to be a parameterised label, return
the result of Parsing a Parameterised Label from Textual Headers
(Section 4.4).
h1-element = identifier * (";" identifier ["=" h1-value]) 5. Otherwise, return the result of Parsing an Item from Textual
Headers (Section 4.6).
h1-value = identifier / Note that in the case of lists and dictionaries, this has the effect
integer / of combining multiple instances of the header field into one.
number / However, for singular items and parameterised labels, it has the
h1-ascii-string / effect of selecting the first value and ignoring any subsequent
h1-unicode-string / instances of the field, as well as extraneous text afterwards.
h1-blob /
h1-timestamp /
">" h1-common-structure "<"
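The top-level procedure in Section 3 of -02 amounts to three steps: combine the field lines, trim leading OWS, and dispatch on the type that the field definition declares. The following is a minimal Python sketch. The `parsers` mapping and the type names used as its keys are hypothetical stand-ins for the per-type algorithms in Section 4:

```python
from typing import Callable, Dict, Sequence

Parser = Callable[[str], object]


def parse_structured_field(lines: Sequence[str], field_type: str,
                           parsers: Dict[str, Parser]) -> object:
    """Combine field lines, discard leading OWS, then dispatch by type."""
    # RFC 7230, Section 3.2.2: combine multiple field lines, in order,
    # into one comma-separated field-value.
    input_string = ", ".join(lines)
    # Step 1: discard OWS (spaces and horizontal tabs) from the start.
    input_string = input_string.lstrip(" \t")
    # Steps 2-4: dictionaries, lists and parameterised labels.
    if field_type in ("dictionary", "list", "parameterised"):
        return parsers[field_type](input_string)
    # Step 5: anything else is parsed as a single item.
    return parsers["item"](input_string)
```

If the item parser stops after the first value, only the first of several combined lines takes effect, which is the behaviour the note on singular items describes.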
h1-ascii-string = DQUOTE *( Additionally, note that the effect of the parsing algorithms as
( "\" DQUOTE ) / specified is generally intolerant of syntax errors; if one is
( "\" "\" ) / encountered, the typical response is to throw an error, thereby
%x20-21 / discarding the entire header field value. This includes any non-
%x23-5B / ASCII characters in input_string.
%x5D-7E
) DQUOTE
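Generating an h1-ascii-string requires only two things: escaping DQUOTE and backslash, and refusing any character outside the printable ASCII ranges. The -02 string rule (Section 4.2) uses the same escaping. The following minimal Python sketch uses a hypothetical function name and does not apply the length limits that -02 adds:

```python
def serialize_ascii_string(value: str) -> str:
    """Serialize text as an h1-ascii-string: DQUOTE *( escape / char ) DQUOTE."""
    out = ['"']
    for ch in value:
        if ch in ('"', "\\"):
            out.append("\\" + ch)  # the only two escapes the ABNF defines
        elif 0x20 <= ord(ch) <= 0x7E:
            out.append(ch)  # %x20-21 / %x23-5B / %x5D-7E
        else:
            raise ValueError(f"character {ch!r} cannot be serialized")
    out.append('"')
    return "".join(out)
```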
h1-unicode-string = DQUOTE *( 4. Structured Header Data Types
( "\" DQUOTE ) /
( "\" "\" ) /
EmbeddedUnicodeChar /
%x20-21 /
%x23-5B /
%x5D-7E
) DQUOTE
The dim prospects of ever getting a majority of HTTP/1 paths 8-bit This section defines the abstract value types that can be composed
clean make UTF-8 unviable as an H1 serialization. Given that very into Structured Headers, along with the textual HTTP serialisations
little of the information in HTTP headers is presented to users in of them.
the first place, improving H1 and HPACK efficiency by inventing
escape sequences more efficient than RFC5137's seems unwarranted.
h1-blob = ":" base64 ":" 4.1. Numbers
# XXX: where to import base64 from ?
h1-timestamp = number Abstractly, numbers are integers with an optional fractional part.
They have a maximum of fifteen digits available to be used in one or
both of the parts, as reflected in the ABNF below; this allows them
to be stored as IEEE 754 double precision numbers (binary64)
([IEEE754]).
XXX: Allow OWS in parsers, but not in generators ? The textual HTTP serialisation of numbers allows a maximum of fifteen
In programming environments which do not define a native digits between the integer and fractional part, along with an
representation or serialization of Common Structure, the HTTP/1 optional "-" indicating negative numbers.
serialization should be used.
4. When to use Common Structure Parser number = ["-"] ( "." 1*15DIGIT /
DIGIT "." 1*14DIGIT /
2DIGIT "." 1*13DIGIT /
3DIGIT "." 1*12DIGIT /
4DIGIT "." 1*11DIGIT /
5DIGIT "." 1*10DIGIT /
6DIGIT "." 1*9DIGIT /
7DIGIT "." 1*8DIGIT /
8DIGIT "." 1*7DIGIT /
9DIGIT "." 1*6DIGIT /
10DIGIT "." 1*5DIGIT /
11DIGIT "." 1*4DIGIT /
12DIGIT "." 1*3DIGIT /
13DIGIT "." 1*2DIGIT /
14DIGIT "." 1DIGIT /
1*15DIGIT )
All future standardized and all private HTTP headers using Common integer = ["-"] 1*15DIGIT
Structure should self-identify as such; in the HTTP/1 serialization, unsigned = 1*15DIGIT
this is done by making the first character ">" and the last "<".
(These two characters are deliberately "the wrong way" round, so as
not to clash with existing usages.)
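A receiver could decide whether to apply the Common Structure parser in two stages: first check for the ">" ... "<" markers, then fall back to a snapshot of the registry for white-listed legacy headers. The following is a minimal Python sketch. The two-entry snapshot is hypothetical (Accept and Connection are taken from the Appendix A.4 list), and so is the function name:

```python
from typing import FrozenSet, Optional

# Hypothetical registry snapshot: legacy headers marked "True" (Section 8).
LEGACY_COMMON_STRUCTURE: FrozenSet[str] = frozenset({"accept", "connection"})


def common_structure_payload(field_name: str, field_value: str) -> Optional[str]:
    """Return the text to feed to a Common Structure parser, or None."""
    value = field_value.strip(" \t")
    # Self-identifying header: ">" ... "<" around the h1-common-structure.
    if len(value) >= 2 and value.startswith(">") and value.endswith("<"):
        return value[1:-1]
    # White-listed legacy header: parsed as-is, without markers.
    if field_name.lower() in LEGACY_COMMON_STRUCTURE:
        return value
    return None  # not Common Structure; needs a bespoke parser
```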
Legacy HTTP headers which fit into Common Structure are marked as integer and unsigned are defined as conveniences to specification
such in the IANA Message Header Registry (see Section 8), and a authors; if their use is specified and their ABNF is not matched, a
snapshot of the registry can be used to trigger parsing according to parser MUST consider it to be invalid.
Common Structure of these headers.
5. Desired Normative Effects For example, a header whose value is defined as a number could look
like:
All new HTTP headers SHOULD use the Common Structure if at all ExampleNumberHeader: 4.5
possible.
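Section 4.1.1 is still TBD, so the ABNF is the only definition available. The digit budget it encodes (at most fifteen digits in total, with at least one digit after any ".") can be checked with a regular expression plus a length test. The following is a minimal Python sketch. It assumes that the digits-only alternative means 1*15DIGIT, which is consistent with the integer rule and the prose, and its function names are hypothetical:

```python
import re

_DECIMAL_RE = re.compile(r"-?([0-9]*)\.([0-9]+)")  # [int] "." fraction
_INTEGER_RE = re.compile(r"-?[0-9]{1,15}")          # integer
_UNSIGNED_RE = re.compile(r"[0-9]{1,15}")           # unsigned


def is_integer(text: str) -> bool:
    return _INTEGER_RE.fullmatch(text) is not None


def is_unsigned(text: str) -> bool:
    return _UNSIGNED_RE.fullmatch(text) is not None


def is_number(text: str) -> bool:
    """True if text matches the number ABNF (at most 15 digits in total)."""
    if is_integer(text):
        return True
    m = _DECIMAL_RE.fullmatch(text)
    return m is not None and len(m.group(1)) + len(m.group(2)) <= 15


def parse_number(text: str) -> float:
    """Validate, then store as an IEEE 754 binary64 value."""
    if not is_number(text):
        raise ValueError(f"invalid number: {text!r}")
    return float(text)
```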
6. Open/Outstanding issues to resolve 4.1.1. Parsing Numbers from Textual Headers
6.1. Single/Multiple Headers TBD
Should we allow splitting common structure data over multiple headers 4.2. Strings
?
Pro: Abstractly, strings are ASCII strings [RFC0020], excluding control
characters (i.e., the range 0x20 to 0x7E). Note that this excludes
tabs, newlines and carriage returns. They may be at most 1024
characters long.
Avoids size restrictions, easier on-the-fly editing The textual HTTP serialisation of strings uses a backslash ("\") to
escape double quotes and backslashes in strings.
Contra: string = DQUOTE 1*1024(char) DQUOTE
char = unescaped / escape ( DQUOTE / "\" )
unescaped = %x20-21 / %x23-5B / %x5D-7E
escape = "\"
For example, a header whose value is defined as a string could look
like:
Cannot act on any such header until all headers have been received. ExampleStringHeader: "hello world"
We must define where headers can be split (between identifier and Note that strings only use DQUOTE as a delimiter; single quotes do
dictionary ?, in the middle of dictionaries ?) not delimit strings. Furthermore, only DQUOTE and "\" can be escaped;
other sequences MUST generate an error.
Most on-the-fly editing is hackish at best. Unicode is not directly supported in Structured Headers, because it
causes a number of interoperability issues, and - with few exceptions
- header values do not require it.
7. Future Work When it is necessary for a field value to convey non-ASCII string
7.1. Redefining existing headers for better performance content, binary content (Section 4.5) SHOULD be specified, along with
a character encoding (most likely, UTF-8).
The HTTP/1 serialization's self-identification mechanism makes it 4.2.1. Parsing a String from Textual Headers
possible to extend the definition of existing Appendix A.5 headers
into Common Structure.
For instance one could imagine: Given an ASCII string input_string, return an unquoted string.
input_string is modified to remove the parsed value.
Date: >1475061449.201< 1. Let output_string be an empty string.
Which would be faster to parse and validate than the current 2. If the first character of input_string is not DQUOTE, throw an
definition of the Date header and more precise too. error.
Some kind of signal/negotiation mechanism would be required to make 3. Discard the first character of input_string.
this work in practice.
7.2. Define a validation dictionary 4. If input_string contains more than 1025 characters, throw an
error.
A machine-readable specification of the legal contents of HTTP 5. While input_string is not empty:
headers would go a long way to improve efficiency and security in
HTTP implementations.
8. IANA Considerations 1. Let char be the result of removing the first character of
input_string.
The IANA Message Header Registry will be extended with an additional 2. If char is a backslash ("\"):
field named "Common Structure" which can have the values "True",
"False" or "Unknown".
The RFC723x headers listed in Appendix A.4 will get the value "True" 1. If input_string is now empty, throw an error.
in the new field.
The RFC723x headers listed in Appendix A.5 will get the value "False" 2. Else:
in the new field.
All other existing entries in the registry will be set to "Unknown" 1. Let next_char be the result of removing the first
until and if the owner of the entry requests otherwise. character of input_string.
9. Security Considerations 2. If next_char is not DQUOTE or "\", throw an error.
Unique dictionary keys are required to reduce the risk of smuggling 3. Append next_char to output_string.
attacks.
10. References 3. Else, if char is DQUOTE, return output_string.
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 4. Else, append char to output_string.
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC5137] Klensin, J., "ASCII Escaping of Unicode Characters", 6. Otherwise, throw an error.
BCP 137, RFC 5137, DOI 10.17487/RFC5137, February 2008,
<http://www.rfc-editor.org/info/rfc5137>.
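The string-parsing steps of Section 4.2.1 translate directly into a loop. Python strings are immutable, so this sketch returns the unconsumed remainder alongside the result instead of modifying input_string. It also makes two assumptions drawn from the ABNF rather than from the numbered steps:

* The 1024-character limit applies to the parsed string. As written, step 4 would also count any text after the closing DQUOTE.
* Unescaped characters outside %x20-7E are rejected.

On reaching the closing DQUOTE, the sketch returns without discarding a further character, because step 5.1 has already removed it. The function name is hypothetical:

```python
from typing import Tuple

DQUOTE = '"'
MAX_STRING_LENGTH = 1024  # string = DQUOTE 1*1024(char) DQUOTE


def parse_string(input_string: str) -> Tuple[str, str]:
    """Parse a string; return (value, remaining input)."""
    if not input_string.startswith(DQUOTE):  # step 2
        raise ValueError("expected DQUOTE")
    rest = input_string[1:]  # step 3
    output = []  # step 1
    pos = 0
    while pos < len(rest):  # step 5
        char = rest[pos]
        pos += 1
        if char == "\\":
            if pos >= len(rest):
                raise ValueError("input ends after backslash")
            next_char = rest[pos]
            pos += 1
            if next_char not in (DQUOTE, "\\"):
                raise ValueError(f"invalid escape sequence \\{next_char}")
            output.append(next_char)
        elif char == DQUOTE:
            return "".join(output), rest[pos:]
        elif 0x20 <= ord(char) <= 0x7E:
            output.append(char)
        else:
            raise ValueError(f"invalid character {char!r}")
        if len(output) > MAX_STRING_LENGTH:
            raise ValueError("string longer than 1024 characters")
    raise ValueError("missing closing DQUOTE")  # step 6
```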
[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 4.3. Labels
Specifications: ABNF", STD 68, RFC 5234,
DOI 10.17487/RFC5234, January 2008,
<http://www.rfc-editor.org/info/rfc5234>.
[RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer Labels are short (up to 256 characters) textual identifiers; their
Protocol (HTTP/1.1): Message Syntax and Routing", abstract model is identical to their expression in the textual HTTP
RFC 7230, DOI 10.17487/RFC7230, June 2014, serialisation.
<http://www.rfc-editor.org/info/rfc7230>.
10.2. Informative References label = lcalpha *255( lcalpha / DIGIT / "_" / "-"/ "*" / "/" )
lcalpha = %x61-7A ; a-z
[RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer Note that labels can only contain lowercase letters.
Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
DOI 10.17487/RFC7231, June 2014,
<http://www.rfc-editor.org/info/rfc7231>.
[RFC7232] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer For example, a header whose value is defined as a label could look
Protocol (HTTP/1.1): Conditional Requests", RFC 7232, like:
DOI 10.17487/RFC7232, June 2014,
<http://www.rfc-editor.org/info/rfc7232>.
[RFC7233] Fielding, R., Ed., Lafon, Y., Ed., and J. Reschke, Ed., ExampleLabelHeader: foo/bar
"Hypertext Transfer Protocol (HTTP/1.1): Range Requests",
RFC 7233, DOI 10.17487/RFC7233, June 2014,
<http://www.rfc-editor.org/info/rfc7233>.
[RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, 4.3.1. Parsing a Label from Textual Headers
Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
RFC 7234, DOI 10.17487/RFC7234, June 2014,
<http://www.rfc-editor.org/info/rfc7234>.
[RFC7235] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer Given an ASCII string input_string, return a label. input_string is
Protocol (HTTP/1.1): Authentication", RFC 7235, modified to remove the parsed value.
DOI 10.17487/RFC7235, June 2014,
<http://www.rfc-editor.org/info/rfc7235>.
[RFC7239] Petersson, A. and M. Nilsson, "Forwarded HTTP Extension", 1. If input_string contains more than 256 characters, throw an
RFC 7239, DOI 10.17487/RFC7239, June 2014, error.
<http://www.rfc-editor.org/info/rfc7239>.
[RFC7694] Reschke, J., "Hypertext Transfer Protocol (HTTP) Client- 2. If the first character of input_string is not lcalpha, throw an
Initiated Content-Encoding", RFC 7694, error.
DOI 10.17487/RFC7694, November 2015,
<http://www.rfc-editor.org/info/rfc7694>.
Appendix A. Do HTTP headers have any common structure ? 3. Let output_string be an empty string.
Several proposals have been floated in recent years to use some 4. While input_string is not empty:
preexisting structured data serialization or other format for HTTP
headers, to impose some sanity.
None of these proposals have gained traction and no obvious candidate 1. Let char be the result of removing the first character of
data serializations have been left unexamined. input_string.
This effort tries to tackle the question from the other side, by 2. If char is not one of lcalpha, DIGIT, "_", "-", "*" or "/":
asking if there is a common structure in existing HTTP headers we can
generalize for this purpose.
A.1. Survey of HTTP header structure 1. Prepend char to input_string.
The RFC723x family of HTTP/1 standards controls 49 entries in the IANA 2. Return output_string.
Message Header Registry, and they share two common motifs.
The majority of RFC723x HTTP headers are lists. A few of them are 3. Append char to output_string.
ordered ('Content-Encoding'), some are unordered ('Connection'), and
some are ordered by 'q=%f' weight parameters ('Accept').
In most cases, the list elements are some kind of identifier, usually 5. Return output_string.
derived from ABNF 'token' as defined by [RFC7230].
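The label-parsing steps of Section 4.3.1 stop at the first character that cannot appear in a label and leave that character in the input (the "prepend" in step 4.2). The following minimal Python sketch returns the unconsumed remainder instead of modifying input_string. It applies the 256-character limit to the label itself, as the ABNF does, rather than to the whole remaining input, as step 1 reads. The function name is hypothetical:

```python
import string
from typing import Tuple

LCALPHA = frozenset(string.ascii_lowercase)
LABEL_CHARS = LCALPHA | frozenset(string.digits) | frozenset("_-*/")
MAX_LABEL_LENGTH = 256  # lcalpha *255( ... )


def parse_label(input_string: str) -> Tuple[str, str]:
    """Parse a label; return (label, remaining input)."""
    if not input_string or input_string[0] not in LCALPHA:  # step 2
        raise ValueError("a label must start with a lowercase letter")
    end = 1
    # Step 4: consume characters until one is not allowed in a label.
    while end < len(input_string) and input_string[end] in LABEL_CHARS:
        end += 1
    if end > MAX_LABEL_LENGTH:
        raise ValueError("label longer than 256 characters")
    return input_string[:end], input_string[end:]
```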
A subgroup of headers, mostly related to MIME, uses what one could 4.4. Parameterised Labels
call a 'qualified token':
qualified-token = token-or-asterix [ "/" token-or-asterix ] Parameterised Labels are labels (Section 4.3) with up to 256
parameters; each parameter has a label and an optional value that is
an item (Section 4.6). Ordering between parameters is not
significant, and duplicate parameters MUST be considered an error.
The second motif is parameterized list elements. The best known is The textual HTTP serialisation uses semicolons (";") to delimit the
the "q=0.5" weight parameter, but other parameters exist as well. parameters from each other, and equals ("=") to delimit the parameter
name from its value.
Generalizing from these motifs, our candidate "Common Structure" data parameterised = label *256( OWS ";" OWS label [ "=" item ] )
model becomes an ordered list of named dictionaries.
In pidgin ABNF, ignoring white-space for the sake of clarity, the For example,
HTTP/1.1 serialization of Common Structure is something like:
token-or-asterix = token from RFC7230, but also allowing "*" ExampleParamHeader: abc; a=1; b=2; c
qualified-token = token-or-asterix [ "/" token-or-asterix ] 4.4.1. Parsing a Parameterised Label from Textual Headers
field-name, see RFC7230 Given an ASCII string input_string, return a label with a mapping of
parameters. input_string is modified to remove the parsed value.
Common-Structure-Header = field-name ":" 1#named-dictionary 1. Let primary_label be the result of Parsing a Label from Textual
Headers (Section 4.3) from input_string.
named-dictionary = qualified-token [ *(";" param) ] 2. Let parameters be an empty mapping.
param = token [ "=" value ] 3. In a loop:
value = we'll get back to this in a moment. 1. Consume any OWS from the beginning of input_string.
Nineteen of the 48 RFC723x headers, almost 40%, can already be 2. If the first character of input_string is not ";", exit the
parsed using this definition, and none of the rest have requirements loop.
that could not be met by this data model. See Appendix A.4 and
Appendix A.5 for the full survey details.
A.2. Survey of values in HTTP headers 3. Consume a ";" character from the beginning of input_string.
Surveying the datatypes of HTTP headers, standardized as well as 4. Consume any OWS from the beginning of input_string.
private, the following picture emerges:
A.2.1. Numbers 5. Let param_name be the result of Parsing a Label from Textual
Headers (Section 4.3) from input_string.
Integer and floating point are both used. Range and precision are 6. If param_name is already present in parameters, throw an
mostly unspecified in controlling documents. error.
Scientific notation (9.192631770e9) does not seem to be used 7. Let param_value be a null value.
anywhere.
The ranges used seem to be minus several thousand to plus a couple of 8. If the first character of input_string is "=":
billions, the high end almost exclusively being POSIX time_t
timestamps.
A.2.2. Timestamps 1. Consume the "=" character at the beginning of
input_string.
RFC723x text format, but POSIX time_t represented as integer or 2. Let param_value be the result of Parsing an Item from
floating point is not uncommon. ISO 8601 timestamps have also been spotted. Textual Headers (Section 4.6) from input_string.
A.2.3. Strings 9. If parameters has more than 255 members, throw an error.
The vast majority are pure ASCII strings, with either no escapes, %xx 10. Add param_name to parameters with the value param_value.
URL-like escapes or C-style back-slash escapes, possibly with the
addition of \uxxxx UNICODE escapes.
Where non-ASCII character sets are used, they are almost always 4. Return the tuple (primary_label, parameters).
implicit, rather than explicit. UTF-8 and ISO-8859-1 seem to be the
most common.
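The parameterised-label steps of Section 4.4.1 can be sketched in Python as follows. To keep the block self-contained, it includes a compact label parser and a hypothetical, deliberately reduced item parser that accepts only integers and labels. A full implementation would dispatch to all the item types in Section 4.6. The unconsumed remainder is returned instead of modifying input_string:

```python
import re
import string
from typing import Dict, Optional, Tuple, Union

LCALPHA = frozenset(string.ascii_lowercase)
LABEL_CHARS = LCALPHA | frozenset(string.digits) | frozenset("_-*/")
SIMPLE_INTEGER = re.compile(r"-?[0-9]{1,15}")
Item = Union[int, str]
Parameterised = Tuple[str, Dict[str, Optional[Item]]]


def parse_label(s: str) -> Tuple[str, str]:
    if not s or s[0] not in LCALPHA:
        raise ValueError("expected a label")
    end = 1
    while end < len(s) and s[end] in LABEL_CHARS:
        end += 1
    if end > 256:
        raise ValueError("label too long")
    return s[:end], s[end:]


def parse_simple_item(s: str) -> Tuple[Item, str]:
    """Hypothetical reduced item parser: integers and labels only."""
    s = s.lstrip(" \t")  # Section 4.6.1, step 1
    m = SIMPLE_INTEGER.match(s)
    if m:
        return int(m.group()), s[m.end():]
    return parse_label(s)


def parse_parameterised(s: str) -> Tuple[Parameterised, str]:
    primary_label, s = parse_label(s)  # step 1
    parameters: Dict[str, Optional[Item]] = {}  # step 2
    while True:  # step 3
        s = s.lstrip(" \t")
        if not s.startswith(";"):
            break
        s = s[1:].lstrip(" \t")
        param_name, s = parse_label(s)
        if param_name in parameters:
            raise ValueError(f"duplicate parameter {param_name!r}")
        param_value: Optional[Item] = None
        if s.startswith("="):
            param_value, s = parse_simple_item(s[1:])
        if len(parameters) > 255:
            raise ValueError("more than 256 parameters")
        parameters[param_name] = param_value
    return (primary_label, parameters), s  # step 4
```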
A.2.4. Binary blobs 4.5. Binary Content
Often used for cryptographic data. Usually in base64 encoding, Arbitrary binary content up to 16K in size can be conveyed in
sometimes ""-quoted, more often not. Base85 encoding is also seen, Structured Headers.
usually quoted.
A.2.5. Identifiers The textual HTTP serialisation indicates their presence by a leading
"*", with the data encoded using Base 64 Encoding [RFC4648], without
padding (as "=" might be confused with the use of dictionaries).
Seems to almost always fit in the RFC723x 'token' definition. binary = "*" 1*21846(base64)
base64 = ALPHA / DIGIT / "+" / "/"
A.3. Is this actually a useful thing to generalize ? For example, a header whose value is defined as binary content could
look like:
The number one wishlist item seems to be UNICODE strings, with a big ExampleBinaryHeader: *cHJldGVuZCB0aGlzIGlzIGJpbmFyeSBjb250ZW50Lg
side order of not having to write a new parser routine every time
somebody comes up with a new header.
Having a common parser would indeed be a good thing, and having an 4.5.1. Parsing Binary Content from Textual Headers
underlying data model which makes it possible to define a compressed
serialization, rather than rely on serialization to text followed by
text compression (i.e., HPACK) seems like a good idea too.
However, when using a datamodel and a parser general enough to Given an ASCII string input_string, return binary content.
transport useful data, it will have to be followed by a validation input_string is modified to remove the parsed value.
step, which checks that the data also makes sense.
Today validation, such as it is, is often done by the bespoke 1. If the first character of input_string is not "*", throw an
parsers. error.
This then is probably where the next big potential for improvement 2. Discard the first character of input_string.
lies:
Ideally, there would be a machine-readable "data dictionary" that makes it possible 3. Let b64_content be the result of removing content of input_string
to copy that text out of RFCs, run it through a code generator which up to but not including the first character that is not in ALPHA,
spits out validation code which operates on the output of the common DIGIT, "+" or "/".
parser.
But history has been particularly unkind to that idea. 4. Let binary_content be the result of Base 64 Decoding [RFC4648]
b64_content, synthesising padding if necessary. If an error is
encountered, throw it.
Most attempts studied as part of this effort have sunk under 5. Return binary_content.
complexity caused by reaching for generality, but where scope has
been wisely limited, it seems to be possible.
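The five parsing steps above map directly onto a short Python sketch. The function name parse_binary is hypothetical, and because Python strings are immutable it returns the unconsumed remainder instead of modifying input_string in place.

```python
import base64
import string

# Characters permitted in b64_content: ALPHA / DIGIT / "+" / "/".
BASE64_CHARS = frozenset(string.ascii_letters + string.digits + "+/")


def parse_binary(input_string: str) -> tuple[bytes, str]:
    """Return (binary_content, unconsumed remainder of input_string)."""
    # 1. The value must start with "*".
    if not input_string.startswith("*"):
        raise ValueError("binary content must start with '*'")
    # 2. Discard the "*".
    rest = input_string[1:]
    # 3. Take everything up to the first character outside BASE64_CHARS.
    end = 0
    while end < len(rest) and rest[end] in BASE64_CHARS:
        end += 1
    b64_content, rest = rest[:end], rest[end:]
    # 4. Synthesise the omitted "=" padding, then decode. Decoding errors
    #    surface as binascii.Error, which is a subclass of ValueError.
    padded = b64_content + "=" * (-len(b64_content) % 4)
    binary_content = base64.b64decode(padded, validate=True)
    # 5. Return the content, plus the remainder for the caller.
    return binary_content, rest
```

For example, parse_binary("*w4ZibGV0w6ZydGUK, x") returns the UTF-8 bytes of "Æbletærte" followed by a newline, together with the remainder ", x".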
So file that idea under "future work". 4.6. Items
A.4. RFC723x headers with "common structure" An item can be a number (Section 4.1), string (Section 4.2), label
(Section 4.3) or binary content (Section 4.5).
o Accept [RFC7231], Section 5.3.2 item = number / string / label / binary
o Accept-Charset [RFC7231], Section 5.3.3 4.6.1. Parsing an Item from Textual Headers
o Accept-Encoding [RFC7231], Section 5.3.4, [RFC7694], Section 3 Given an ASCII string input_string, return an item. input_string is
modified to remove the parsed value.
o Accept-Language [RFC7231], Section 5.3.5 1. Discard any OWS from the beginning of input_string.
o Age [RFC7234], Section 5.1 2. If the first character of input_string is a "-" or a DIGIT,
process input_string as a number (Section 4.1) and return the
result, throwing any errors encountered.
o Allow [RFC7231], Section 7.4.1 3. If the first character of input_string is a DQUOTE, process
input_string as a string (Section 4.2) and return the result,
throwing any errors encountered.
o Connection [RFC7230], Section 6.1 4. If the first character of input_string is "*", process
input_string as binary content (Section 4.5) and return the
result, throwing any errors encountered.
o Content-Encoding [RFC7231], Section 3.1.2.2 5. If the first character of input_string is an lcalpha, process
input_string as a label (Section 4.3) and return the result,
throwing any errors encountered.
o Content-Language [RFC7231], Section 3.1.3.2 6. Otherwise, throw an error.
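The item dispatch in steps 1 to 6 can be sketched in Python as follows. Sections 4.1 to 4.3 are not reproduced here, so the number, string and label sub-parsers below are simplified assumptions (for example, no length limits are enforced), and every function name is hypothetical. Each function returns a (value, remaining input) pair in place of modifying input_string.

```python
import base64
import re

# ASSUMPTION: simplified stand-ins for the number, string and label grammars
# of Sections 4.1-4.3, which are not reproduced in this section.
NUMBER_RE = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
LABEL_RE = re.compile(r"[a-z][a-z0-9_\-*/]*")
BASE64_RE = re.compile(r"[A-Za-z0-9+/]*")


def parse_number(s):
    m = NUMBER_RE.match(s)
    if not m:
        raise ValueError("invalid number")
    text = m.group()
    return (float(text) if "." in text else int(text)), s[m.end():]


def parse_string(s):
    # Assumed form: DQUOTE ... DQUOTE, where "\" escapes DQUOTE or "\".
    chars, i = [], 1
    while i < len(s):
        if s[i] == "\\":
            if i + 1 >= len(s) or s[i + 1] not in '"\\':
                raise ValueError("invalid escape in string")
            chars.append(s[i + 1])
            i += 2
        elif s[i] == '"':
            return "".join(chars), s[i + 1:]
        else:
            chars.append(s[i])
            i += 1
    raise ValueError("unterminated string")


def parse_label(s):
    m = LABEL_RE.match(s)
    if not m:
        raise ValueError("invalid label")
    return m.group(), s[m.end():]


def parse_binary(s):
    m = BASE64_RE.match(s, 1)  # position 1 skips the leading "*"
    b64 = m.group()
    return base64.b64decode(b64 + "=" * (-len(b64) % 4), validate=True), s[m.end():]


def parse_item(input_string):
    """Dispatch on the first character, following steps 1-6 above."""
    s = input_string.lstrip(" \t")            # 1. discard leading OWS
    first = s[:1]
    if first == "-" or "0" <= first <= "9":   # 2. number
        return parse_number(s)
    if first == '"':                          # 3. string
        return parse_string(s)
    if first == "*":                          # 4. binary content
        return parse_binary(s)
    if "a" <= first <= "z":                   # 5. label (lcalpha)
        return parse_label(s)
    raise ValueError(f"unexpected start of item: {first!r}")  # 6. otherwise
```

Dispatching on a single leading character works because the four item types begin with disjoint characters ("-" or DIGIT, DQUOTE, "*", lcalpha).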
o Content-Length [RFC7230], Section 3.3.2 4.7. Dictionaries
o Content-Type [RFC7231], Section 3.1.1.5 Dictionaries are unordered maps of key-value pairs, where the keys
are labels (Section 4.3) and the values are items (Section 4.6).
There can be between 1 and 1024 members, and keys are required to be
unique.
o Expect [RFC7231], Section 5.1.1 In the textual HTTP serialisation, keys and values are separated by
"=" (without whitespace), and key/value pairs are separated by a
comma with optional whitespace.
o Max-Forwards [RFC7231], Section 5.1.2 dictionary = label "=" item *1023( OWS "," OWS label "=" item )
o MIME-Version [RFC7231], Appendix A.1 For example, a header field whose value is defined as a dictionary
could look like:
o TE [RFC7230], Section 4.3 ExampleDictHeader: foo=1.23, en="Applepie", da=*w4ZibGV0w6ZydGUK
o Trailer [RFC7230], Section 4.4 Typically, a header field specification will define the semantics of
individual keys, as well as whether their presence is required or
optional. Recipients MUST ignore keys that are undefined or unknown,
unless the header field's specification specifically disallows them.
o Transfer-Encoding [RFC7230], Section 3.3.1 4.7.1. Parsing a Dictionary from Textual Headers
o Upgrade [RFC7230], Section 6.7 Given an ASCII string input_string, return a mapping of (label,
item). input_string is modified to remove the parsed value.
o Vary [RFC7231], Section 7.1.4 1. Let dictionary be an empty mapping.
A.5. RFC723x headers with "uncommon structure" 2. While input_string is not empty:
One of the RFC723x headers is only reserved, and therefore has no 1. Let this_key be the result of running Parse Label from
structure at all: Textual Headers (Section 4.3) with input_string. If an error
is encountered, throw it.
o Close [RFC7230], Section 8.1 2. If dictionary already contains this_key, raise an error.
5 of the RFC723x headers are HTTP dates: 3. Consume a "=" from input_string; if none is present, raise an
error.
o Date [RFC7231], Section 7.1.1.2 4. Let this_value be the result of running Parse Item from
Textual Headers (Section 4.6) with input_string. If an error
is encountered, throw it.
o Expires [RFC7234], Section 5.3 5. Add key this_key with value this_value to dictionary.
o If-Modified-Since [RFC7232], Section 3.3 6. Discard any leading OWS from input_string.
o If-Unmodified-Since [RFC7232], Section 3.4 7. If input_string is empty, return dictionary.
o Last-Modified [RFC7232], Section 2.2 8. Consume a COMMA from input_string; if no comma is present,
raise an error.
24 of the RFC723x headers use bespoke formats which only a single or 9. Discard any leading OWS from input_string.
in rare cases two headers share:
o Accept-Ranges [RFC7233], Section 2.3 3. Return dictionary.
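A minimal Python sketch of the dictionary algorithm follows. To stay self-contained it carries a compact, regex-based stand-in for Parse Item (Section 4.6) whose number, string and label patterns are assumptions, not the draft's grammar; the helper names are hypothetical. The numbered comments map to the steps above.

```python
import base64
import re

# ASSUMPTION: compact stand-in for "Parse Item" (Section 4.6). The number,
# string and label patterns are simplified, not the draft's exact grammar.
ITEM_RE = re.compile(
    r'(?P<num>-?[0-9]+(?:\.[0-9]+)?)'
    r'|"(?P<str>(?:[^"\\]|\\["\\])*)"'
    r'|\*(?P<bin>[A-Za-z0-9+/]*)'
    r'|(?P<label>[a-z][a-z0-9_\-*/]*)'
)
LABEL_RE = re.compile(r"[a-z][a-z0-9_\-*/]*")
OWS = " \t"


def parse_item(s):
    """Return (item, remaining input)."""
    s = s.lstrip(OWS)
    m = ITEM_RE.match(s)
    if not m:
        raise ValueError(f"cannot parse item at {s[:10]!r}")
    if m.group("num") is not None:
        text = m.group("num")
        value = float(text) if "." in text else int(text)
    elif m.group("str") is not None:
        value = re.sub(r'\\(["\\])', r"\1", m.group("str"))
    elif m.group("bin") is not None:
        b64 = m.group("bin")
        value = base64.b64decode(b64 + "=" * (-len(b64) % 4), validate=True)
    else:
        value = m.group("label")
    return value, s[m.end():]


def parse_dictionary(input_string):
    """Return a dict mapping label keys to items."""
    dictionary = {}                               # 1. empty mapping
    s = input_string
    while s:                                      # 2. while input remains
        m = LABEL_RE.match(s)                     # 2.1 parse the key
        if not m:
            raise ValueError("expected a label key")
        this_key, s = m.group(), s[m.end():]
        if this_key in dictionary:                # 2.2 keys must be unique
            raise ValueError(f"duplicate key {this_key!r}")
        if not s.startswith("="):                 # 2.3 consume "="
            raise ValueError("expected '='")
        this_value, s = parse_item(s[1:])         # 2.4 parse the value
        dictionary[this_key] = this_value         # 2.5 store the pair
        s = s.lstrip(OWS)                         # 2.6 discard OWS
        if not s:                                 # 2.7 done
            return dictionary
        if not s.startswith(","):                 # 2.8 consume a comma
            raise ValueError("expected ','")
        s = s[1:].lstrip(OWS)                     # 2.9 discard OWS
    return dictionary                             # 3. return
```

Applied to the ExampleDictHeader value above, this yields foo mapped to 1.23, en to "Applepie", and da to the UTF-8 bytes of "Æbletærte" plus a newline. Like the algorithm, the sketch does not enforce the 1024-member limit.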
* bytes-unit / other-range-unit 4.8. Lists
o Authorization [RFC7235], Section 4.2 Lists are arrays of items (Section 4.6) or parameterised labels
(Section 4.4), with one to 1024 members.
o Proxy-Authorization [RFC7235], Section 4.4 In the textual HTTP serialisation, each member is separated by a
comma and optional whitespace.
* credentials list = list_member *1023( OWS "," OWS list_member )
list_member = item / parameterised
o Cache-Control [RFC7234], Section 5.2 For example, a header field whose value is defined as a list of
labels could look like:
* 1#cache-directive ExampleLabelListHeader: foo, bar, baz_45
o Content-Location [RFC7231], Section 3.1.4.2 and a header field whose value is defined as a list of parameterised
labels could look like:
* absolute-URI / partial-URI ExampleParamListHeader: abc/def; g="hi";j, klm/nop
o Content-Range [RFC7233], Section 4.2 4.8.1. Parsing a List from Textual Headers
* byte-content-range / other-content-range Given an ASCII string input_string, return a list of items.
input_string is modified to remove the parsed value.
o ETag [RFC7232], Section 2.3 1. Let items be an empty array.
* entity-tag 2. While input_string is not empty:
o Forwarded [RFC7239] 1. Let item be the result of running Parse Item from Textual
Headers (Section 4.6) with input_string. If an error is
encountered, throw it.
* 1#forwarded-element 2. Append item to items.
o From [RFC7231], Section 5.5.1 3. Discard any leading OWS from input_string.
* mailbox 4. If input_string is empty, return items.
o If-Match [RFC7232], Section 3.1 5. Consume a COMMA from input_string; if no comma is present,
o If-None-Match [RFC7232], Section 3.2 raise an error.
* "*" / 1#entity-tag 6. Discard any leading OWS from input_string.
o If-Range [RFC7233], Section 3.2 3. Return items.
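The list algorithm can be sketched in Python the same way. It reuses a compact, regex-based stand-in for Parse Item (Section 4.6) whose number, string and label patterns are assumptions; parse_list and the other names are hypothetical.

```python
import base64
import re

# ASSUMPTION: compact stand-in for "Parse Item" (Section 4.6). The number,
# string and label patterns are simplified, not the draft's exact grammar.
ITEM_RE = re.compile(
    r'(?P<num>-?[0-9]+(?:\.[0-9]+)?)'
    r'|"(?P<str>(?:[^"\\]|\\["\\])*)"'
    r'|\*(?P<bin>[A-Za-z0-9+/]*)'
    r'|(?P<label>[a-z][a-z0-9_\-*/]*)'
)
OWS = " \t"


def parse_item(s):
    """Return (item, remaining input)."""
    s = s.lstrip(OWS)
    m = ITEM_RE.match(s)
    if not m:
        raise ValueError(f"cannot parse item at {s[:10]!r}")
    if m.group("num") is not None:
        text = m.group("num")
        value = float(text) if "." in text else int(text)
    elif m.group("str") is not None:
        value = re.sub(r'\\(["\\])', r"\1", m.group("str"))
    elif m.group("bin") is not None:
        b64 = m.group("bin")
        value = base64.b64decode(b64 + "=" * (-len(b64) % 4), validate=True)
    else:
        value = m.group("label")
    return value, s[m.end():]


def parse_list(input_string):
    """Return a Python list of items."""
    items = []                                    # 1. empty array
    s = input_string
    while s:                                      # 2. while input remains
        item, s = parse_item(s)                   # 2.1 parse an item
        items.append(item)                        # 2.2 append it
        s = s.lstrip(OWS)                         # 2.3 discard OWS
        if not s:                                 # 2.4 done
            return items
        if not s.startswith(","):                 # 2.5 consume a comma
            raise ValueError("expected ','")
        s = s[1:].lstrip(OWS)                     # 2.6 discard OWS
    return items                                  # 3. return
```

The ExampleLabelListHeader value above parses to ["foo", "bar", "baz_45"]. As written, the list algorithm only handles items, so this sketch rejects the ExampleParamListHeader value at the first ";"; parameterised labels (Section 4.4) would need an additional branch.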
* entity-tag / HTTP-date 5. IANA Considerations
o Host [RFC7230], Section 5.4 This draft has no actions for IANA.
* uri-host [ ":" port ] 6. Security Considerations
o Location [RFC7231], Section 7.1.2 TBD
* URI-reference 7. References
o Pragma [RFC7234], Section 5.4 7.1. Normative References
* 1#pragma-directive [RFC0020] Cerf, V., "ASCII format for network interchange", STD 80,
RFC 20, DOI 10.17487/RFC0020, October 1969,
<https://www.rfc-editor.org/info/rfc20>.
o Range [RFC7233], Section 3.1 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
* byte-ranges-specifier / other-ranges-specifier [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
<https://www.rfc-editor.org/info/rfc4648>.
o Referer [RFC7231], Section 5.5.2 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234,
DOI 10.17487/RFC5234, January 2008,
<https://www.rfc-editor.org/info/rfc5234>.
* absolute-URI / partial-URI [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
Protocol (HTTP/1.1): Message Syntax and Routing",
RFC 7230, DOI 10.17487/RFC7230, June 2014,
<https://www.rfc-editor.org/info/rfc7230>.
o Retry-After [RFC7231], Section 7.1.3 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
* HTTP-date / delay-seconds 7.2. Informative References
o Server [RFC7231], Section 7.4.2 [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", 2008,
<http://grouper.ieee.org/groups/754/>.
o User-Agent [RFC7231], Section 5.5.3 [RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
DOI 10.17487/RFC7231, June 2014,
<https://www.rfc-editor.org/info/rfc7231>.
* product *( RWS ( product / comment ) ) [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
DOI 10.17487/RFC7540, May 2015,
<https://www.rfc-editor.org/info/rfc7540>.
o Via [RFC7230], Section 5.7.1 7.3. URIs
* 1#( received-protocol RWS received-by [ RWS comment ] ) [1] https://lists.w3.org/Archives/Public/ietf-http-wg/
o Warning [RFC7234], Section 5.5 [2] https://httpwg.github.io/
* 1#warning-value [3] https://github.com/httpwg/http-extensions/labels/header-structure
o Proxy-Authenticate [RFC7235], Section 4.3 Appendix A. Changes
o WWW-Authenticate [RFC7235], Section 4.1
* 1#challenge A.1. Since draft-ietf-httpbis-header-structure-01
Appendix B. Changes Replaced with draft-nottingham-structured-headers.
B.1. Since draft-ietf-httpbis-header-structure-00 A.2. Since draft-ietf-httpbis-header-structure-00
Added signed 64bit integer type. Added signed 64bit integer type.
Drop UTF8, and settle on BCP137 [RFC5137]::EmbeddedUnicodeChar for Drop UTF8, and settle on BCP137 ::EmbeddedUnicodeChar for
h1-unicode-string. h1-unicode-string.
Change h1_blob delimiter to ":" since "'" is valid t_char Change h1_blob delimiter to ":" since "'" is valid t_char
Author's Address Authors' Addresses
Mark Nottingham
Fastly
Email: mnot@mnot.net
URI: https://www.mnot.net/
Poul-Henning Kamp Poul-Henning Kamp
The Varnish Cache Project The Varnish Cache Project
Email: phk@varnish-cache.org Email: phk@varnish-cache.org
 End of changes. 212 change blocks. 
442 lines changed or deleted 435 lines changed or added