HTTP Working Group                                         M. Nottingham
Internet-Draft                                                    Fastly
Intended status: Standards Track                               P-H. Kamp
Expires: May 31, 2018                          The Varnish Cache Project
                                                       November 27, 2017

                       Structured Headers for HTTP
                 draft-ietf-httpbis-header-structure-02

Abstract

   This document describes Structured Headers, a way of simplifying HTTP
   header field definition and parsing.  It is intended for use by new
   specifications of HTTP header fields.  This includes revisions of
   existing specifications when doing so does not cause interoperability
   issues.

Note to Readers

   Discussion of this draft takes place on the HTTP working group
   mailing list (ietf-http-wg@w3.org), which is archived at
   https://lists.w3.org/Archives/Public/ietf-http-wg/ [1].

   _RFC EDITOR: please remove this section before publication_

   Working Group information can be found at https://httpwg.github.io/
   [2]; source code and issues list for this draft can be found at
   https://github.com/httpwg/http-extensions/labels/header-structure
   [3].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 31, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   Specifying the syntax of new HTTP header fields is an onerous task;
   even with the guidance in [RFC7231], Section 8.3.1, there are many
   decisions - and pitfalls - for a prospective HTTP header field
   author.

   Likewise, bespoke parsers often need to be written for specific HTTP
   headers, because each has slightly different handling of what looks
   like common syntax.

   This document introduces structured HTTP header field values
   (hereafter, Structured Headers) to address these problems.
   Structured Headers define a generic, abstract model for data, along
   with a concrete serialisation for expressing that model in textual
   HTTP headers, as used by HTTP/1 [RFC7230] and HTTP/2 [RFC7540].

   HTTP headers that are defined as Structured Headers use the types
   defined in this specification to define their syntax and basic
   handling rules, thereby simplifying both their definition and
   parsing.

   Additionally, future versions of HTTP can define alternative
   serialisations of the abstract model of Structured Headers, allowing
   headers that use it to be transmitted more efficiently without being
   redefined.

   Note that it is not a goal of this document to redefine the syntax of
   existing HTTP headers; the mechanisms described herein are only
   intended to be used with headers that explicitly opt into them.

   To specify a header field that uses Structured Headers, see
   Section 2.

   Section 4 defines a number of abstract data types that can be used in
   Structured Headers, of which only three are allowed at the "top"
   level: lists, dictionaries, or items.

   Those abstract types can be serialised into textual headers - such as
   those used in HTTP/1 and HTTP/2 - using the algorithms described in
   Section 3.
1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the Augmented Backus-Naur Form (ABNF) notation of
   [RFC5234], including the DIGIT, ALPHA and DQUOTE rules from that
   document.  It also includes the OWS rule from [RFC7230].

2.  Specifying Structured Headers

   HTTP headers that use Structured Headers need to be defined to do so
   explicitly; recipients and generators need to know that the
   requirements of this document are in effect.  The simplest way to do
   that is by referencing this document in its definition.

   The field's definition will also need to specify the field-value's
   allowed syntax, in terms of the types described in Section 4, along
   with their associated semantics.

   Field definitions MUST NOT relax or otherwise modify the requirements
   of this specification; doing so would preclude handling by generic
   software.

   However, field definitions are encouraged to clearly state additional
   constraints upon the syntax, as well as the consequences when those
   constraints are violated.

   For example:

   # FooExample Header

   The FooExample HTTP header field conveys a list of numbers about how
   much Foo the sender has.

   FooExample is a Structured header [RFCxxxx]. Its value MUST be a
   dictionary ([RFCxxxx], Section Y.Y).

   The dictionary MUST contain:

   * A member whose key is "foo", and whose value is an integer
     ([RFCxxxx], Section Y.Y), indicating the number of foos in
     the message.
   * A member whose key is "bar", and whose value is a string
     ([RFCxxxx], Section Y.Y), conveying the characteristic bar-ness
     of the message.

   If the parsed header field does not contain both, it MUST be ignored.

   Note that empty header field values are not allowed by the syntax,
   and therefore will be considered errors.
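
   As an illustration of how a field definition's extra constraints
   might be applied after generic parsing, the following Python sketch
   checks the FooExample requirements above.  It assumes the header has
   already been parsed into a plain Python dict; the function name
   check_foo_example and the type checks on the members are assumptions
   made for this sketch, not requirements stated by this document.

```python
def check_foo_example(parsed):
    """Return (foo, bar) if the FooExample constraints hold, else None.

    `parsed` is assumed to be a dict produced by a generic Structured
    Headers parser (hypothetical; not defined in this sketch).
    Returning None models "the header field MUST be ignored".
    """
    foo = parsed.get("foo")
    bar = parsed.get("bar")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(foo, int) or isinstance(foo, bool):
        return None
    if not isinstance(bar, str):
        return None
    return foo, bar


if __name__ == "__main__":
    print(check_foo_example({"foo": 3, "bar": "very"}))
    print(check_foo_example({"foo": 3}))
```

   Keeping this check separate from parsing reflects the split described
   above: generic software handles syntax, and the field definition
   supplies the additional constraints.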

3.  Parsing Requirements for Textual Headers

   When a receiving implementation parses textual HTTP header fields
   (e.g., in HTTP/1 or HTTP/2) that are known to be Structured Headers,
   it is important that care be taken, as there are a number of edge
   cases that can cause interoperability or even security problems.
   This section specifies the algorithm for doing so.

   Given an ASCII string input_string that represents the chosen
   header's field-value, return the parsed header value.  Note that
   input_string may incorporate multiple header lines combined into one
   comma-separated field-value, as per [RFC7230], Section 3.2.2.

   1.  Discard any OWS from the beginning of input_string.

   2.  If the field-value is defined to be a dictionary, return the
       result of Parsing a Dictionary from Textual Headers
       (Section 4.7).

   3.  If the field-value is defined to be a list, return the result of
       Parsing a List from Textual Headers (Section 4.8).

   4.  If the field-value is defined to be a parameterised label, return
       the result of Parsing a Parameterised Label from Textual Headers
       (Section 4.4).

   5.  Otherwise, return the result of Parsing an Item from Textual
       Headers (Section 4.6).

   Note that in the case of lists and dictionaries, this has the effect
   of combining multiple instances of the header field into one.
   However, for singular items and parameterised labels, it has the
   effect of selecting the first value and ignoring any subsequent
   instances of the field, as well as extraneous text afterwards.

   Additionally, note that the effect of the parsing algorithms as
   specified is generally intolerant of syntax errors; if one is
   encountered, the typical response is to throw an error, thereby
   discarding the entire header field value.  This includes any non-
   ASCII characters in input_string.

4.  Structured Header Data Types

   This section defines the abstract value types that can be composed
   into Structured Headers, along with the textual HTTP serialisations
   of them.

4.1.  Numbers

   Abstractly, numbers are integers with an optional fractional part.
   They have a maximum of fifteen digits available to be used in one or
   both of the parts, as reflected in the ABNF below; this allows them
   to be stored as IEEE 754 double precision numbers (binary64)
   ([IEEE754]).

   The textual HTTP serialisation of numbers allows a maximum of fifteen
   digits between the integer and fractional part, along with an
   optional "-" indicating negative numbers.

   number   = ["-"] ( "." 1*15DIGIT /
                DIGIT "." 1*14DIGIT /
               2DIGIT "." 1*13DIGIT /
               3DIGIT "." 1*12DIGIT /
               4DIGIT "." 1*11DIGIT /
               5DIGIT "." 1*10DIGIT /
               6DIGIT "." 1*9DIGIT /
               7DIGIT "." 1*8DIGIT /
               8DIGIT "." 1*7DIGIT /
               9DIGIT "." 1*6DIGIT /
              10DIGIT "." 1*5DIGIT /
              11DIGIT "." 1*4DIGIT /
              12DIGIT "." 1*3DIGIT /
              13DIGIT "." 1*2DIGIT /
              14DIGIT "." 1DIGIT /
              15DIGIT )

   integer  = ["-"] 1*15DIGIT
   unsigned = 1*15DIGIT

   integer and unsigned are defined as conveniences to specification
   authors; if their use is specified and their ABNF is not matched, a
   parser MUST consider it to be invalid.

   For example, a header whose value is defined as a number could look
   like:

   ExampleNumberHeader: 4.5
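
   Because the parsing algorithm for numbers is still to be defined, the
   following Python sketch only checks whether a string matches the
   "number" and "integer" ABNF rules above, read literally.  The function
   names are assumptions made for this sketch.  Note that, as written,
   the "number" rule accepts a value without a decimal point only when
   it has exactly fifteen digits; shorter whole values match "integer"
   instead.

```python
import re

# integer = ["-"] 1*15DIGIT
_INTEGER = re.compile(r"-?[0-9]{1,15}")
# Decimal alternatives of "number": 0 to 14 integer digits, a ".",
# then at least one fractional digit (the total is checked separately).
_DECIMAL = re.compile(r"([0-9]{0,14})\.([0-9]{1,15})")
# Final alternative of "number": exactly 15DIGIT, no decimal point.
_FIFTEEN = re.compile(r"[0-9]{15}")


def matches_integer(text):
    """True if text matches the 'integer' rule."""
    return _INTEGER.fullmatch(text) is not None


def matches_number(text):
    """True if text matches the 'number' rule, read literally."""
    body = text[1:] if text.startswith("-") else text
    if _FIFTEEN.fullmatch(body):
        return True
    match = _DECIMAL.fullmatch(body)
    if match is None:
        return False
    # At most fifteen digits shared between both parts.
    return len(match.group(1)) + len(match.group(2)) <= 15


if __name__ == "__main__":
    for candidate in ("4.5", "-.5", "4.", "1234567890123.123", "42"):
        print(candidate, matches_number(candidate), matches_integer(candidate))
```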

4.1.1.  Parsing Numbers from Textual Headers

   TBD

4.2.  Strings

   Abstractly, strings are ASCII strings [RFC0020], excluding control
   characters (i.e., the range 0x20 to 0x7E).  Note that this excludes
   tabs, newlines and carriage returns.  They may be at most 1024
   characters long.

   The textual HTTP serialisation of strings uses a backslash ("\") to
   escape double quotes and backslashes in strings.

   string    = DQUOTE 1*1024(char) DQUOTE
   char      = unescaped / escape ( DQUOTE / "\" )
   unescaped = %x20-21 / %x23-5B / %x5D-7E
   escape    = "\"

   For example, a header whose value is defined as a string could look
   like:

   ExampleStringHeader: "hello world"

   Note that strings only use DQUOTE as a delimiter; single quotes do
   not delimit strings.  Furthermore, only DQUOTE and "\" can be escaped;
   other sequences MUST generate an error.

   Unicode is not directly supported in Structured Headers, because it
   causes a number of interoperability issues, and - with few exceptions
   - header values do not require it.

   When it is necessary for a field value to convey non-ASCII string
   content, binary content (Section 4.5) SHOULD be specified, along with
   a character encoding (most likely, UTF-8).

4.2.1.  Parsing a String from Textual Headers

   Given an ASCII string input_string, return an unquoted string.
   input_string is modified to remove the parsed value.

   1.  Let output_string be an empty string.

   2.  If the first character of input_string is not DQUOTE, throw an
       error.

   3.  Discard the first character of input_string.

   4.  If input_string contains more than 1025 characters, throw an
       error.

   5.  While input_string is not empty:

       1.  Let char be the result of removing the first character of
           input_string.

       2.  If char is a backslash ("\"):

           1.  If input_string is now empty, throw an error.

           2.  Else:

               1.  Let next_char be the result of removing the first
                   character of input_string.

               2.  If next_char is not DQUOTE or "\", throw an error.

               3.  Append next_char to output_string.

       3.  Else, if char is DQUOTE, remove the first character of
           input_string and return output_string.

       4.  Else, append char to output_string.

   6.  Otherwise, throw an error.
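
   The string-parsing steps of this section can be sketched in Python
   as follows.  This is an illustration under stated assumptions, not a
   reference implementation: the names parse_string and ParseError are
   invented here, and because Python strings are immutable the function
   returns the unconsumed remainder instead of modifying input_string.
   The sketch also assumes that the closing DQUOTE has already been
   consumed by the time it is recognised, so nothing further is removed
   at that point.

```python
DQUOTE = '"'
BACKSLASH = "\\"


class ParseError(ValueError):
    """Raised where the algorithm says to throw an error."""


def parse_string(input_string):
    """Return (output_string, remaining_input)."""
    if not input_string.startswith(DQUOTE):          # step 2
        raise ParseError("string must start with DQUOTE")
    rest = input_string[1:]                          # step 3
    if len(rest) > 1025:                             # step 4, read literally
        raise ParseError("input too long")
    output = []                                      # step 1
    pos = 0
    while pos < len(rest):                           # step 5
        char = rest[pos]
        pos += 1
        if char == BACKSLASH:
            if pos >= len(rest):
                raise ParseError("dangling backslash")
            next_char = rest[pos]
            pos += 1
            if next_char not in (DQUOTE, BACKSLASH):
                raise ParseError("invalid escape sequence")
            output.append(next_char)
        elif char == DQUOTE:
            # Closing quote: it was consumed when `char` was read.
            return "".join(output), rest[pos:]
        else:
            output.append(char)
    raise ParseError("missing closing DQUOTE")       # step 6


if __name__ == "__main__":
    print(parse_string('"hello world"'))
```

   Returning the remainder lets a caller continue with whatever follows
   the string, for example the next member of a list.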

4.3.  Labels

   Labels are short (up to 256 characters) textual identifiers; their
   abstract model is identical to their expression in the textual HTTP
   serialisation.

   label = lcalpha *255( lcalpha / DIGIT / "_" / "-" / "*" / "/" )
   lcalpha = %x61-7A ; a-z

   Note that labels can only contain lowercase letters.

   For example, a header whose value is defined as a label could look
   like:

   ExampleLabelHeader: foo/bar

4.3.1.  Parsing a Label from Textual Headers

   Given an ASCII string input_string, return a label. input_string is
   modified to remove the parsed value.

   1.  If input_string contains more than 256 characters, throw an
       error.

   2.  If the first character of input_string is not lcalpha, throw an
       error.

   3.  Let output_string be an empty string.

   4.  While input_string is not empty:

       1.  Let char be the result of removing the first character of
           input_string.

       2.  If char is not one of lcalpha, DIGIT, "_", "-", "*" or "/":

           1.  Prepend char to input_string.

           2.  Return output_string.

       3.  Append char to output_string.

   5.  Return output_string.
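
   A minimal Python sketch of the label-parsing steps follows.  The
   names parse_label and ParseError are assumptions made for this
   illustration, and the function returns the unconsumed remainder
   rather than modifying input_string in place.  Step 1 is applied
   literally, to the whole remaining input.

```python
LCALPHA = "abcdefghijklmnopqrstuvwxyz"
LABEL_CHARS = LCALPHA + "0123456789" + "_-*/"


class ParseError(ValueError):
    """Raised where the algorithm says to throw an error."""


def parse_label(input_string):
    """Return (label, remaining_input)."""
    if len(input_string) > 256:                      # step 1
        raise ParseError("input too long")
    if not input_string or input_string[0] not in LCALPHA:   # step 2
        raise ParseError("label must start with a lowercase letter")
    end = 0
    # Steps 3 to 5: take characters while they are allowed in a label.
    while end < len(input_string) and input_string[end] in LABEL_CHARS:
        end += 1
    return input_string[:end], input_string[end:]


if __name__ == "__main__":
    print(parse_label("foo/bar"))
    print(parse_label("abc; a=1"))
```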

4.4.  Parameterised Labels

   Parameterised Labels are labels (Section 4.3) with up to 256
   parameters; each parameter has a label and an optional value that is
   an item (Section 4.6).  Ordering between parameters is not
   significant, and duplicate parameters MUST be considered an error.

   The textual HTTP serialisation uses semicolons (";") to delimit the
   parameters from each other, and equals ("=") to delimit the parameter
   name from its value.

   parameterised = label *256( OWS ";" OWS label [ "=" item ] )

   For example,

   ExampleParamHeader: abc; a=1; b=2; c

4.4.1.  Parsing a Parameterised Label from Textual Headers

   Given an ASCII string input_string, return a label with a mapping of
   parameters. input_string is modified to remove the parsed value.

   1.  Let primary_label be the result of Parsing a Label from Textual
       Headers (Section 4.3) from input_string.

   2.  Let parameters be an empty mapping.

   3.  In a loop:

       1.   Consume any OWS from the beginning of input_string.

       2.   If the first character of input_string is not ";", exit the
            loop.

       3.   Consume a ";" character from the beginning of input_string.

       4.   Consume any OWS from the beginning of input_string.

       5.   Let param_name be the result of Parsing a Label from Textual
            Headers (Section 4.3) from input_string.

       6.   If param_name is already present in parameters, throw an
            error.

       7.   Let param_value be a null value.

       8.   If the first character of input_string is "=":

            1.  Consume the "=" character at the beginning of
                input_string.

            2.  Let param_value be the result of Parsing an Item from
                Textual Headers (Section 4.6) from input_string.

       9.   If parameters has more than 255 members, throw an error.

       10.  Add param_name to parameters with the value param_value.

   4.  Return the tuple (primary_label, parameters).
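
   The loop for parsing a parameterised label can be sketched in Python
   as below.  This is an illustration only.  Item parsing (Section 4.6)
   is not reproduced here, so the sketch takes it as a caller-supplied
   function; parse_digits is a hypothetical stand-in that accepts only
   unsigned integers.  All function names are assumptions, and each
   function returns the unconsumed remainder instead of modifying
   input_string in place.

```python
LCALPHA = "abcdefghijklmnopqrstuvwxyz"
LABEL_CHARS = LCALPHA + "0123456789" + "_-*/"
OWS = " \t"


class ParseError(ValueError):
    """Raised where the algorithm says to throw an error."""


def parse_label(text):
    """Compact label parser; returns (label, remaining_input)."""
    if len(text) > 256 or not text or text[0] not in LCALPHA:
        raise ParseError("invalid label")
    end = 0
    while end < len(text) and text[end] in LABEL_CHARS:
        end += 1
    return text[:end], text[end:]


def parse_digits(text):
    """Hypothetical item parser for this sketch: unsigned integers only."""
    end = 0
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == 0:
        raise ParseError("expected digits")
    return int(text[:end]), text[end:]


def parse_parameterised(input_string, parse_item):
    """Return ((primary_label, parameters), remaining_input)."""
    primary_label, rest = parse_label(input_string)      # step 1
    parameters = {}                                      # step 2
    while True:                                          # step 3
        rest = rest.lstrip(OWS)
        if not rest.startswith(";"):
            break
        rest = rest[1:].lstrip(OWS)
        param_name, rest = parse_label(rest)
        if param_name in parameters:
            raise ParseError("duplicate parameter: " + param_name)
        param_value = None
        if rest.startswith("="):
            param_value, rest = parse_item(rest[1:])
        if len(parameters) > 255:
            raise ParseError("too many parameters")
        parameters[param_name] = param_value
    return (primary_label, parameters), rest             # step 4


if __name__ == "__main__":
    print(parse_parameterised("abc; a=1; b=2; c", parse_digits))
```

   Passing the item parser in as an argument keeps the sketch
   self-contained while leaving the real item grammar to Section 4.6.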

4.5.  Binary Content

   Arbitrary binary content up to 16K in size can be conveyed in
   Structured Headers.

   The textual HTTP serialisation indicates their presence by a leading
   "*", with the data encoded using Base 64 Encoding [RFC4648], without
   padding (as "=" might be confused with the use of dictionaries).

   binary = "*" 1*21846(base64)
   base64 = ALPHA / DIGIT / "+" / "/"

   For example, a header whose value is defined as binary content could
   look like:

   ExampleBinaryHeader: *cHJldGVuZCB0aGlzIGlzIGJpbmFyeSBjb250ZW50Lg

4.5.1.  Parsing Binary Content from Textual Headers

   Given an ASCII string input_string, return binary content.
   input_string is modified to remove the parsed value.

   1.  If the output first character of input_string is not "*", throw an
       error.

   2.  Discard the common
   parser.

   But history has been particularly unkind first character of input_string.

   3.  Let b64_content be the result of removing content of input_string
       up to but not including the first character that idea.

   Most attempts studied as part is not in ALPHA,
       DIGIT, "+" or "/".

   4.  Let binary_content be the result of this effort, have sunk under
   complexity caused by reaching for generality, but where scope has
   been wisely limited, it seems to Base 64 Decoding [RFC4648]
       b64_content, synthesising padding if necessary.  If an error is
       encountered, throw it.

   5.  Return binary_content.

4.6.  Items

   An item is can be possible.

   So file that idea under "future work".

A.4.  RFC723x headers with "common structure"

   o  Accept [RFC7231], Section 5.3.2

   o  Accept-Charset [RFC7231], Section 5.3.3

   o  Accept-Encoding [RFC7231], Section 5.3.4, [RFC7694], Section 3

   o  Accept-Language [RFC7231], Section 5.3.5

   o  Age [RFC7234], Section 5.1

   o  Allow [RFC7231], Section 7.4.1

   o  Connection [RFC7230], Section 6.1

   o  Content-Encoding [RFC7231], Section 3.1.2.2

   o  Content-Language [RFC7231], Section 3.1.3.2

   o  Content-Length [RFC7230], Section 3.3.2

   o  Content-Type [RFC7231], Section 3.1.1.5

   o  Expect [RFC7231], Section 5.1.1

   o  Max-Forwards [RFC7231], Section 5.1.2

   o  MIME-Version [RFC7231], Appendix A.1

   o  TE [RFC7230], Section 4.3

   o  Trailer [RFC7230], Section 4.4

   o  Transfer-Encoding [RFC7230], Section 3.3.1

   o  Upgrade [RFC7230], Section 6.7

   o  Vary [RFC7231], Section 7.1.4

A.5.  RFC723x headers with "uncommon structure"

   1 a number (Section 4.1), string (Section 4.2), label
   (Section 4.3) or binary content (Section 4.5).

   item = number / string / label / binary

4.6.1.  Parsing an Item from Textual Headers

   Given an ASCII string input_string, return an item. input_string is
   modified to remove the parsed value.

   1.  Discard any OWS from the beginning of input_string.

   2.  If the RFC723x headers first character of input_string is only reserved, a "-" or a DIGIT,
       process input_string as a number (Section 4.1) and therefore have no
   structure at all:

   o  Close [RFC7230], Section 8.1

   5 return the
       result, throwing any errors encountered.

   3.  If the first character of input_string is a DQUOTE, process
       input_string as a string (Section 4.2) and return the RFC723x headers are HTTP dates:

   o  Date [RFC7231], Section 7.1.1.2

   o  Expires [RFC7234], Section 5.3

   o  If-Modified-Since [RFC7232], Section 3.3

   o  If-Unmodified-Since [RFC7232], Section 3.4

   o  Last-Modified [RFC7232], Section 2.2

   24 result,
       throwing any errors encountered.

   4.  If the first character of input_string is "*", process
       input_string as binary content (Section 4.5) and return the RFC723x headers use bespoke formats which only
       result, throwing any errors encountered.

   5.  If the first character of input_string is an lcalpha, process
       input_string as a single or
   in rare cases two headers share:

   o  Accept-Ranges [RFC7233], Section 2.3

      *  bytes-unit / other-range-unit

   o  Authorization [RFC7235], Section 4.2

   o  Proxy-Authorization [RFC7235], Section 4.4

      *  credentials

   o  Cache-Control [RFC7234], Section 5.2

      *  1#cache-directive

   o  Content-Location [RFC7231], Section 3.1.4.2

      *  absolute-URI / partial-URI

   o  Content-Range [RFC7233], Section 4.2

      *  byte-content-range / other-content-range

   o  ETag [RFC7232], Section 2.3

      *  entity-tag

   o  Forwarded [RFC7239]

      *  1#forwarded-element

   o  From [RFC7231], Section 5.5.1

      *  mailbox

   o  If-Match [RFC7232], Section 3.1
   o  If-None-Match [RFC7232], Section 3.2

      *  "*" / 1#entity-tag

   o  If-Range [RFC7233], Section 3.2

      *  entity-tag / HTTP-date

   o  Host [RFC7230], Section 5.4

      *  uri-host [ ":" port ]

   o  Location [RFC7231], Section 7.1.2

      *  URI-reference

   o  Pragma [RFC7234], Section 5.4

      *  1#pragma-directive

   o  Range [RFC7233], Section 3.1

      *  byte-ranges-specifier / other-ranges-specifier

   o  Referer [RFC7231], Section 5.5.2

      *  absolute-URI / partial-URI

   o  Retry-After [RFC7231], Section 7.1.3

      *  HTTP-date / delay-seconds

   o  Server [RFC7231], Section 7.4.2

   o  User-Agent [RFC7231], Section 5.5.3

      *  product *( RWS ( product / comment ) label (Section 4.3) and return the result,
       throwing any errors encountered.

   6.  Otherwise, throw an error.
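
   As a non-normative illustration, the steps of Sections 4.5.1 and
   4.6.1 could be sketched in Python as follows.  The function names,
   the use of ValueError for "throw an error", and the choice to return
   a (value, remaining_input) pair rather than modify input_string in
   place are assumptions of this sketch.  The parse_number,
   parse_string and parse_label helpers are simplified stand-ins for
   Sections 4.1 to 4.3, whose full rules are not repeated here.

```python
import base64
import re
import string

B64_CHARS = set(string.ascii_letters + string.digits + "+/")


def parse_binary(input_string):
    """Section 4.5.1; returns (binary_content, remaining_input)."""
    # Step 1: the value has to start with "*".
    if not input_string.startswith("*"):
        raise ValueError("binary content must start with '*'")
    # Step 2: discard the "*".
    rest = input_string[1:]
    # Step 3: take characters up to the first one outside the base64 set.
    end = 0
    while end < len(rest) and rest[end] in B64_CHARS:
        end += 1
    b64_content, rest = rest[:end], rest[end:]
    # Step 4: synthesise the omitted padding, then decode.
    padded = b64_content + "=" * (-len(b64_content) % 4)
    # binascii.Error, raised on malformed input, is a ValueError subclass.
    return base64.b64decode(padded, validate=True), rest


def _parse_with(pattern, convert, what, input_string):
    """Helper for the stand-in parsers below (an assumption, not spec text)."""
    match = re.match(pattern, input_string)
    if match is None:
        raise ValueError("invalid " + what)
    return convert(match.group(0)), input_string[match.end():]


def parse_number(s):
    # Stand-in for Section 4.1: optional "-", digits, optional fraction.
    def to_number(text):
        return float(text) if "." in text else int(text)
    return _parse_with(r"-?[0-9]+(\.[0-9]+)?", to_number, "number", s)


def parse_string(s):
    # Stand-in for Section 4.2: a quoted string without escape handling.
    return _parse_with(r'"[^"\\]*"', lambda text: text[1:-1], "string", s)


def parse_label(s):
    # Stand-in for Section 4.3: assumed label character set.
    return _parse_with(r"[a-z][a-z0-9_\-*/]*", str, "label", s)


def parse_item(input_string):
    """Section 4.6.1; returns (item, remaining_input)."""
    s = input_string.lstrip(" \t")  # Step 1: OWS is spaces and tabs.
    first = s[:1]                   # Empty when nothing is left.
    if first == "-" or "0" <= first <= "9":
        return parse_number(s)      # Step 2
    if first == '"':
        return parse_string(s)      # Step 3
    if first == "*":
        return parse_binary(s)      # Step 4
    if "a" <= first <= "z":
        return parse_label(s)       # Step 5
    raise ValueError("not an item") # Step 6
```

   Returning the unconsumed input alongside the value lets a caller,
   such as a list or dictionary parser, continue from the point where
   the item ended.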

4.7.  Dictionaries

   Dictionaries are unordered maps of key-value pairs, where the keys
   are labels (Section 4.3) and the values are items (Section 4.6).
   There can be between 1 and 1024 members, and keys are required to be
   unique.

   In the textual HTTP serialisation, keys and values are separated by
   "=" (without whitespace), and key/value pairs are separated by a
   comma with optional whitespace.

   dictionary = label "=" item *1023( OWS "," OWS label "=" item )

   For example, a header field whose value is defined as a dictionary
   could look like:

   ExampleDictHeader: foo=1.23, en="Applepie", da=*w4ZibGV0w6ZydGUK

   Typically, a header field specification will define the semantics of
   individual keys, as well as whether their presence is required or
   optional.  Recipients MUST ignore keys that are undefined or unknown,
   unless the header field's specification specifically disallows them.
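
   A header field specification's key policy could be applied to an
   already-parsed dictionary along the following lines.  This is a
   non-normative sketch: the function name apply_key_policy, its
   parameters, and the example keys are hypothetical and are not
   defined by this document.

```python
def apply_key_policy(dictionary, known_keys, required_keys=frozenset(),
                     reject_unknown=False):
    """Return only the keys a (hypothetical) header definition knows.

    dictionary     -- mapping produced by a dictionary parser
    known_keys     -- keys whose semantics the header definition gives
    required_keys  -- keys the header definition marks as required
    reject_unknown -- True when the definition disallows unknown keys
    """
    missing = set(required_keys) - set(dictionary)
    if missing:
        raise ValueError("missing required keys: " + ", ".join(sorted(missing)))
    unknown = set(dictionary) - set(known_keys)
    if unknown and reject_unknown:
        raise ValueError("unknown keys: " + ", ".join(sorted(unknown)))
    # Default behaviour: unknown keys are silently ignored.
    return {k: v for k, v in dictionary.items() if k in known_keys}
```

   With known keys "foo" and "en" and required key "foo", a parsed
   value containing an extra key is reduced to the known keys, while a
   value lacking "foo" is rejected.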

4.7.1.  Parsing a Dictionary from Textual Headers

   Given an ASCII string input_string, return a mapping of (label,
   item). input_string is modified to remove the parsed value.

   1.  Let dictionary be an empty mapping.

   2.  While input_string is not empty:

       1.  Let this_key be the result of running Parse Label from
           Textual Headers (Section 4.3) with input_string.  If an error
           is encountered, throw it.

       2.  If dictionary already contains this_key, raise an error.

       3.  Consume a "=" from input_string; if none is present, raise an
           error.

       4.  Let this_value be the result of running Parse Item from
           Textual Headers (Section 4.6) with input_string.  If an error
           is encountered, throw it.

       5.  Add key this_key with value this_value to dictionary.

       6.  Discard any leading OWS from input_string.

       7.  If input_string is empty, return dictionary.

       8.  Consume a COMMA from input_string; if no comma is present,
           raise an error.

       9.  Discard any leading OWS from input_string.

   3.  Return dictionary.
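
   The steps above could be sketched in Python as follows.  This is a
   non-normative illustration: the function names, the use of
   ValueError for errors, and returning the result instead of modifying
   input_string in place are assumptions of the sketch.  The parse_item
   helper and both regular expressions are compact stand-ins for the
   item and label rules of Sections 4.1 to 4.6, not complete
   implementations of them.

```python
import base64
import re

OWS = " \t"
# Assumed label character set (stand-in for Section 4.3).
LABEL_RE = re.compile(r"[a-z][a-z0-9_\-*/]*")
# Stand-in for Section 4.6: number, unescaped string, binary, or label.
ITEM_RE = re.compile(
    r'(?P<num>-?[0-9]+(?:\.[0-9]+)?)'
    r'|"(?P<str>[^"\\]*)"'
    r'|\*(?P<bin>[A-Za-z0-9+/]*)'
    r'|(?P<label>[a-z][a-z0-9_\-*/]*)'
)


def parse_item(s):
    """Simplified item parser; returns (item, remaining_input)."""
    s = s.lstrip(OWS)
    m = ITEM_RE.match(s)
    if m is None:
        raise ValueError("not an item")
    if m.group("num") is not None:
        text = m.group("num")
        value = float(text) if "." in text else int(text)
    elif m.group("str") is not None:
        value = m.group("str")
    elif m.group("bin") is not None:
        b64 = m.group("bin")
        # Synthesise the padding that the textual form omits.
        value = base64.b64decode(b64 + "=" * (-len(b64) % 4))
    else:
        value = m.group("label")
    return value, s[m.end():]


def parse_dictionary(input_string):
    """Section 4.7.1; returns a dict mapping labels to items."""
    dictionary = {}                               # Step 1
    s = input_string
    while s:                                      # Step 2
        m = LABEL_RE.match(s)                     # Step 2.1
        if m is None:
            raise ValueError("expected a label")
        this_key, s = m.group(0), s[m.end():]
        if this_key in dictionary:                # Step 2.2
            raise ValueError("duplicate key: " + this_key)
        if not s.startswith("="):                 # Step 2.3
            raise ValueError("expected '='")
        this_value, s = parse_item(s[1:])         # Step 2.4
        dictionary[this_key] = this_value         # Step 2.5
        s = s.lstrip(OWS)                         # Step 2.6
        if not s:                                 # Step 2.7
            return dictionary
        if not s.startswith(","):                 # Step 2.8
            raise ValueError("expected ','")
        s = s[1:].lstrip(OWS)                     # Step 2.9
    return dictionary                             # Step 3
```

   Because it follows the steps literally, the sketch does not enforce
   the limit of 1024 members; a caller that needs the limit would have
   to check the size of the result separately.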

4.8.  Lists

   Lists are arrays of items (Section 4.6) or parameterised labels
   (Section 4.4), with one to 1024 members.

   In the textual HTTP serialisation, each member is separated by a
   comma and optional whitespace.

   list = list_member *1023( OWS "," OWS list_member )
   list_member = item / parameterised

   For example, a header field whose value is defined as a list of
   labels could look like:

   ExampleLabelListHeader: foo, bar, baz_45

   and a header field whose value is defined as a list of parameterised
   labels could look like:

   ExampleParamListHeader: abc/def; g="hi";j, klm/nop

4.8.1.  Parsing a List from Textual Headers

   Given an ASCII string input_string, return a list of items.
   input_string is modified to remove the parsed value.

   1.  Let items be an empty array.

   2.  While input_string is not empty:

       1.  Let item be the result of running Parse Item from Textual
           Headers (Section 4.6) with input_string.  If an error is
           encountered, throw it.

       2.  Append item to items.

       3.  Discard any leading OWS from input_string.

       4.  If input_string is empty, return items.

       5.  Consume a COMMA from input_string; if no comma is present,
           raise an error.

       6.  Discard any leading OWS from input_string.

   3.  Return items.
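
   The steps above could be sketched in Python as follows.  This is a
   non-normative illustration under several assumptions: the function
   names are invented for the example, errors are reported with
   ValueError, and the result is returned rather than input_string
   being modified in place.  Since list members can be either items or
   parameterised labels, the sketch takes the member parser as an
   argument; parse_label_member is a hypothetical, label-only stand-in
   for it that uses an assumed label character set.

```python
import re

OWS = " \t"
# Assumed label character set (stand-in for Section 4.3).
LABEL_RE = re.compile(r"[a-z][a-z0-9_\-*/]*")


def parse_label_member(s):
    """Hypothetical member parser that accepts labels only.

    Returns (label, remaining_input), skipping leading OWS first.
    """
    s = s.lstrip(OWS)
    m = LABEL_RE.match(s)
    if m is None:
        raise ValueError("expected a label")
    return m.group(0), s[m.end():]


def parse_list(input_string, parse_member):
    """Section 4.8.1; parse_member returns (value, remaining_input)."""
    items = []                          # Step 1
    s = input_string
    while s:                            # Step 2
        item, s = parse_member(s)       # Step 2.1
        items.append(item)              # Step 2.2
        s = s.lstrip(OWS)               # Step 2.3
        if not s:                       # Step 2.4
            return items
        if not s.startswith(","):       # Step 2.5
            raise ValueError("expected ','")
        s = s[1:].lstrip(OWS)           # Step 2.6
    return items                        # Step 3
```

   Called as parse_list("foo, bar, baz_45", parse_label_member), the
   sketch yields the three labels of the ExampleLabelListHeader value
   in order.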

5.  IANA Considerations

   This draft has no actions for IANA.

6.  Security Considerations

   TBD

7.  References

7.1.  Normative References

   [RFC0020]  Cerf, V., "ASCII format for network interchange", STD 80,
              RFC 20, DOI 10.17487/RFC0020, October 1969,
              <https://www.rfc-editor.org/info/rfc20>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
              <https://www.rfc-editor.org/info/rfc4648>.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008,
              <https://www.rfc-editor.org/info/rfc5234>.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014,
              <https://www.rfc-editor.org/info/rfc7230>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

7.2.  Informative References

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", 2008,
              <http://grouper.ieee.org/groups/754/>.

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              DOI 10.17487/RFC7231, June 2014,
              <https://www.rfc-editor.org/info/rfc7231>.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015,
              <https://www.rfc-editor.org/info/rfc7540>.

7.3.  URIs

   [1] https://lists.w3.org/Archives/Public/ietf-http-wg/

   [2] https://httpwg.github.io/

   [3] https://github.com/httpwg/http-extensions/labels/header-structure

Appendix A.  Changes

A.1.  Since draft-ietf-httpbis-header-structure-01

   Replaced with draft-nottingham-structured-headers.

A.2.  Since draft-ietf-httpbis-header-structure-00

   Added signed 64bit integer type.

   Drop UTF8, and settle on BCP137 ::EmbeddedUnicodeChar for
   h1-unicode-string.

   Change h1_blob delimiter to ":" since "'" is valid t_char

Authors' Addresses

   Mark Nottingham
   Fastly

   Email: mnot@mnot.net
   URI:   https://www.mnot.net/

   Poul-Henning Kamp
   The Varnish Cache Project

   Email: phk@varnish-cache.org