--- 1/draft-ietf-httpbis-header-compression-01.txt 2013-08-21 15:14:36.609124952 -0700 +++ 2/draft-ietf-httpbis-header-compression-02.txt 2013-08-21 15:14:36.653126047 -0700 @@ -1,443 +1,529 @@ HTTPbis Working Group R. Peon Internet-Draft Google, Inc Intended status: Informational H. Ruellan -Expires: January 10, 2014 Canon CRF - July 09, 2013 +Expires: February 22, 2014 Canon CRF + August 21, 2013 - HTTP/2.0 Header Compression - draft-ietf-httpbis-header-compression-01 + HPACK + draft-ietf-httpbis-header-compression-02 Abstract - This document describes a format adapted to efficiently represent - HTTP headers in the context of HTTP/2.0. + This document describes HPACK, a format adapted to efficiently + represent HTTP headers in the context of HTTP/2.0. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on January 10, 2014. + This Internet-Draft will expire on February 22, 2014. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. 
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents - 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 - 2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 2 + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 + 2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Outline . . . . . . . . . . . . . . . . . . . . . . . . . 3 - 3. Header Encoding . . . . . . . . . . . . . . . . . . . . . . . 3 - 3.1. Encoding Components . . . . . . . . . . . . . . . . . . . 3 - 3.2. Header Table . . . . . . . . . . . . . . . . . . . . . . 4 - 3.3. Header Representation . . . . . . . . . . . . . . . . . . 5 - 3.3.1. Literal Representation . . . . . . . . . . . . . . . 5 - 3.3.2. Indexed Representation . . . . . . . . . . . . . . . 6 - 3.4. Differential Coding . . . . . . . . . . . . . . . . . . . 6 - 4. Detailed Format . . . . . . . . . . . . . . . . . . . . . . . 7 - 4.1. Header Blocks . . . . . . . . . . . . . . . . . . . . . . 7 - 4.2. Low-level representations . . . . . . . . . . . . . . . . 7 - 4.2.1. Integer representation . . . . . . . . . . . . . . . 7 - 4.2.2. String literal representation . . . . . . . . . . . . 9 - 4.3. Indexed Header Representation . . . . . . . . . . . . . . 9 - 4.4. Literal Header Representation . . . . . . . . . . . . . . 10 - 4.4.1. Literal Header without Indexing . . . . . . . . . . . 10 - 4.4.2. Literal Header with Incremental Indexing . . . . . . 10 - 4.4.3. Literal Header with Substitution Indexing . . . . . . 11 - 5. Parameter Negotiation . . . . . . . . . . . . . . . . . . . . 12 - 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 - 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 - 8. Informative References . . . . . . . . . . . . . . . . . . . 
13 - Appendix A. Initial header names . . . . . . . . . . . . . . . . 13 - A.1. Requests . . . . . . . . . . . . . . . . . . . . . . . . 14 - A.2. Responses . . . . . . . . . . . . . . . . . . . . . . . . 15 - Appendix B. Example . . . . . . . . . . . . . . . . . . . . . . 16 - B.1. First header set . . . . . . . . . . . . . . . . . . . . 16 - B.2. Second header set . . . . . . . . . . . . . . . . . . . . 17 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18 + 3. Header Encoding . . . . . . . . . . . . . . . . . . . . . . . 4 + 3.1. Encoding Concepts . . . . . . . . . . . . . . . . . . . . 4 + 3.1.1. Encoding Context . . . . . . . . . . . . . . . . . . . 4 + 3.1.2. Header Table . . . . . . . . . . . . . . . . . . . . . 4 + 3.1.3. Reference Set . . . . . . . . . . . . . . . . . . . . 5 + 3.1.4. Header set . . . . . . . . . . . . . . . . . . . . . . 6 + 3.1.5. Header Representation . . . . . . . . . . . . . . . . 6 + 3.1.6. Header Emission . . . . . . . . . . . . . . . . . . . 6 + 3.2. Header Set Processing . . . . . . . . . . . . . . . . . . 7 + 3.2.1. Header Representation Processing . . . . . . . . . . . 7 + 3.2.2. Reference Set Emission . . . . . . . . . . . . . . . . 8 + 3.2.3. Header Set Completion . . . . . . . . . . . . . . . . 8 + 3.2.4. Header Table Management . . . . . . . . . . . . . . . 8 + 3.2.5. Specific Use Cases . . . . . . . . . . . . . . . . . . 8 + 4. Detailed Format . . . . . . . . . . . . . . . . . . . . . . . 9 + 4.1. Low-level representations . . . . . . . . . . . . . . . . 9 + 4.1.1. Integer representation . . . . . . . . . . . . . . . . 9 + 4.1.2. Header Name Representation . . . . . . . . . . . . . . 11 + 4.1.3. Header Value Representation . . . . . . . . . . . . . 11 + 4.2. Indexed Header Representation . . . . . . . . . . . . . . 11 + 4.3. Literal Header Representation . . . . . . . . . . . . . . 12 + 4.3.1. Literal Header without Indexing . . . . . . . . . . . 12 + 4.3.2. Literal Header with Incremental Indexing . . . 
. . . . 13 + 4.3.3. Literal Header with Substitution Indexing . . . . . . 14 + 5. Parameter Negotiation . . . . . . . . . . . . . . . . . . . . 15 + 6. Security Considerations . . . . . . . . . . . . . . . . . . . 15 + 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 + 8. Informative References . . . . . . . . . . . . . . . . . . . . 16 + Appendix A. Change Log (to be removed by RFC Editor before + publication . . . . . . . . . . . . . . . . . . . . . 16 + A.1. Since draft-ietf-httpbis-header-compression-01 . . . . . . 16 + Appendix B. Initial Header Tables . . . . . . . . . . . . . . . . 17 + B.1. Requests . . . . . . . . . . . . . . . . . . . . . . . . . 17 + B.2. Responses . . . . . . . . . . . . . . . . . . . . . . . . 18 + Appendix C. Example . . . . . . . . . . . . . . . . . . . . . . . 19 + C.1. First header set . . . . . . . . . . . . . . . . . . . . . 19 + C.2. Second header set . . . . . . . . . . . . . . . . . . . . 21 1. Introduction - This document describes a format adapted to efficiently represent - HTTP headers in the context of HTTP/2.0. + This document describes HPACK, a format adapted to efficiently + represent HTTP headers in the context of HTTP/2.0. 2. Overview In HTTP/1.X, HTTP headers, which are necessary for the functioning of the protocol, are transmitted with no transformations. Unfortunately, the amount of redundancy in both the keys and the - values of these headers is astonishingly high, and is the cause of - increased latency on lower bandwidth links. This indicates that an - alternate encoding for headers would be beneficial to latency, and - that is what is proposed here. As shown by SPDY [SPDY], Deflate - compresses HTTP very effectively. However, the use of a compression - scheme which allows for arbitrary matches against the previously - encoded data (such as Deflate) exposes users to security issues. 
In - particular, the compression of sensitive data, together with other - data controlled by an attacker, may lead to leakage of that sensitive - data, even when the resultant bytes are transmitted over an encrypted - channel. Another consideration is that processing and memory costs - of a compressor such as Deflate may also be too high for some classes - of devices, for example when doing forward or reverse proxying. + values of these headers is high, and is the cause of increased + latency on lower bandwidth links. This indicates that an alternate + more compact encoding for headers would be beneficial to latency, and + that is what is proposed here. + + As shown by SPDY [SPDY], Deflate compresses HTTP very effectively. + However, the use of a compression scheme which allows for arbitrary + matches against the previously encoded data (such as Deflate) exposes + users to security issues. In particular, the compression of + sensitive data, together with other data controlled by an attacker, + may lead to leakage of that sensitive data, even when the resultant + bytes are transmitted over an encrypted channel. + + Another consideration is that processing and memory costs of a + compressor such as Deflate may also be too high for some classes of + devices, for example when doing forward or reverse proxying. 2.1. Outline - The HTTP header representation described in this document is based on - indexing tables that store (name, value) pairs, called header tables - in the remainder of this document. This scheme is believed to be - safe for all known attacks against the compression context today. - Header tables are incrementally updated during the whole HTTP/2.0 - session. Two independent header tables are used during a HTTP/2.0 - session, one for HTTP request headers and one for HTTP response - headers. + The HTTP header encoding described in this document is based on a + header table that map (name, value) pairs to index values. 
This + scheme is believed to be safe for all known attacks against the + compression context today. Header tables are incrementally updated + during the HTTP/2.0 session. The encoder is responsible for deciding which headers to insert as - (name, value) pairs in the header table. The decoder then does - exactly what the encoder prescribes, ending in a state that exactly - matches the encoder's state. This enables decoders to remain simple - and understand a wide variety of encoders. + new entries in the header table. The decoder then does exactly what + the encoder prescribes, ending in a state that exactly matches the + encoder's state. This enables decoders to remain simple and + understand a wide variety of encoders. - A header may be represented as a literal or as an index. If - represented as a literal, the representation specifies whether this - header is used to update the indexing table. The different - representations are described in Section 3.3. + As two consecutive sets of headers often have headers in common, each + set of headers is coded as a difference from the previous set of + headers. The goal is to only encode the changes (headers present in + one of the set and not in the other) between the two sets of headers. - A set of headers is coded as a difference from the previous set of + An example illustrating the use of these different mechanisms to + represent headers is available in Appendix C. + +3. Header Encoding + +3.1. Encoding Concepts + + The encoding and decoding of headers relies on some components and + concepts. The set of components used form an encoding context. + + Header Table: The header table (see Section 3.1.2) is a component + used to associate headers to index values. + + Reference Set: The reference set (see Section 3.1.3) is a component + containing a group of headers used as a reference for the + differential encoding of a new set of headers. 
+ + Header Set: A header set (see Section 3.1.4) is a group of headers + that are encoded jointly. A complete set of key-value pairs as + encoded in an HTTP request or response is a header set. + + Header Representation: A header can be represented in encoded form + either as a literal or as an index (see Section 3.1.5). The + indexed representation is based on the header table. + + Header Emission: When decoding a set of headers, some operations + emit a header (see Section 3.1.6). An emitted header is added to + the set of headers. Once emitted, a header can't be removed from + the set of headers. + +3.1.1. Encoding Context + + The set of components used to encode or decode a header set form an + encoding context: an encoding context contains a header table and a + reference set. + + Using HTTP, messages are exchanged between a client and a server in + both direction. To keep the encoding of headers in each direction + independent from the other direction, there is one encoding context + for each direction. + + The headers contained in a PUSH_PROMISE frame sent by a server to a + client are encoded within the same context as the headers contained + in the HEADERS frame corresponding to a response sent from the server + to the client. + +3.1.2. Header Table + + A header table consists of an ordered list of (name, value) pairs. + The first entry of a header table is assigned the index 0. + + A header can be represented by an entry of the header table if they + match. A header and an entry match if both their name and their + value match. A header name and an entry name match if they are equal + using a character-based, _case insensitive_ comparison (the case + insensitive comparison is used because HTTP header names are defined + in a case insensitive way). A header value and an entry value match + if they are equal using a character-based, _case sensitive_ + comparison. + + Generally, the header table will not contain duplicate entries. 
+ However, implementations MUST be prepared to accept duplicates + without signalling an error. + + Initially, a header table contains a list of common headers. Two + initial lists of header are provided in Appendix B. One list is for + headers transmitted from a client to a server, the other for the + reverse direction. + + A header table is modified by either adding a new entry at the end of + the table, or by replacing an existing entry. + + The encoder decides how to update the header table and as such can + control how much memory is used by the header table. To limit the + memory requirements on the decoder side, the header table size is + bounded (see the SETTINGS_MAX_BUFFER_SIZE in Section 5). + + The size of an entry is the sum of its name's length in bytes (as + defined in Section 4.1.2), of its value's length in bytes + (Section 4.1.3) and of 32 bytes. The 32 bytes are an accounting for + the entry structure overhead. For example, an entry structure using + two 64-bits pointers to reference the name and the value and the + entry, and two 64-bits integer for counting the number of references + to these name and value would use 32 bytes. + + The size of a header table is the sum of the size of its entries. + +3.1.3. Reference Set + + A reference set is defined as an unordered set of references to + entries of the header table. + + The initial reference set is the empty set. + + The reference set is updated during the processing of a set of headers. - An example illustrating the use these different mechanisms to - represent headers is available in Appendix B. + Using the differential encoding, a header that is not present in the + reference set can be encoded either with an indexed representation + (if the header is present in the header table), or with a literal + representation (if the header is not present in the header table). -3. Header Encoding + A header that is to be removed from the reference set is encoded with + an indexed representation. -3.1. 
Encoding Components +3.1.4. Header set - The encoding and decoding of headers relies on a few components. - First, a header table (see Section 3.2) is used to associate headers - to index values. Second, a set of headers is encoded as a difference - from the previous reference set of headers (see Section 3.4). + A header set is a group of header fields that are encoded as a whole. + Each header field is a (name, value) pair. - As messages are exchanged in two directions, from client to server - and from server to client, there are two sets of components: one for - each direction. All the headers sent in messages from the client to - the server are encoded (and decoded) using one set of components. - All the headers sent in messages from the server to the client - (including headers contained in PUSH_PROMISE frame) are encoded using - the other set of compotents. + A header set is encoded using an ordered list of zero or more header + representations. All the header representations describing a header + set are grouped into a header block. -3.2. Header Table +3.1.5. Header Representation - A header table consists of an ordered list of (name, value) pairs. A - pair is either inserted at the end of the table or replaces an - existing pair depending on the chosen representation. A pair can be - represented as an index which is its position in the table, starting - with 0 for the first entry. + A header can be represented either as a literal or as an index. - An input header name matches the header name of a (name, value) pair - stored in the Header Table if they are equal using a character-based, - _case sensitive_ comparison. An input header value matches the - header value of a (name, value) pair stored in the Header Table if - they are equal using a character-based, _case sensitive_ comparison. - An input header (name, value) pair matches a pair in the Header Table - if both the name and value are matching as per above. 
+ Literal Representation: A literal representation defines a new + header. The header name is represented either literally or as a + reference to an entry of the header table. The header value is + represented literally. - Generally, the header table will not contain duplicate header (name, - value) entries. However, implementations MUST be prepared to accept - duplicates without signaling an error. If duplicates are added to - the table, they MUST be treated as distinct entries with their own - index positions. + Three different literal representations are provided: - The header table is progressively updated based on headers - represented as literal (as defined in Section 3.3.1). Two update - mechanisms are defined: + * A literal representation that does not add the header to the + header table (see Section 4.3.1). - o Incremental indexing: the represented header is inserted at the - end of the header table as a (name, value) pair. The inserted - pair index is set to the next free index in the table: it is equal - to the number of headers in the table before its insertion. + * A literal representation that adds the header at the end of the + header table (see Section 4.3.2). - o Substitution indexing: the represented header contains an index to - an existing (name, value) pair. The existing pair value is - replaced by the pair representing the new header. + * A literal representation that uses the header to replace an + existing entry of the header table (see Section 4.3.3). - Incremental and substitution indexing are optional. If none of them - is selected in a header representation, the header table is not - updated. In particular, no update happens on the header table when - processing an indexed representation. + Indexed Representation: The indexed representation defines a header + as a reference in the header table (see Section 4.2). 
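As an illustration of how these representations relate to the header table, the following Python sketch picks a representation for a single header. It shows one possible encoder policy, not something the draft prescribes: the function names and the tuple shapes are hypothetical, the table is modelled as a plain list of (name, value) pairs, and the reference set is ignored entirely. Names are compared case-insensitively and values case-sensitively, following the matching rules of Section 3.1.2.

```python
def find_match(table, name, value):
    """Return (full_match_index, name_match_index); either may be None.

    `table` is a list of (name, value) pairs, index 0 first.
    """
    full = None
    name_only = None
    for index, (entry_name, entry_value) in enumerate(table):
        # Header names match case-insensitively (Section 3.1.2).
        if entry_name.lower() == name.lower():
            if name_only is None:
                name_only = index
            # Header values match case-sensitively.
            if entry_value == value:
                full = index
                break
    return full, name_only


def choose_representation(table, name, value):
    """Pick a representation for one header (hypothetical encoder policy)."""
    full, name_only = find_match(table, name, value)
    if full is not None:
        # Whole header is in the table: indexed representation.
        return ("indexed", full)
    if name_only is not None:
        # Only the name is in the table: literal value, name by reference.
        return ("literal", ("name-index", name_only), value)
    # Nothing matches: literal name and literal value.
    return ("literal", ("name-literal", name), value)
```

A real encoder would also have to consult the reference set before choosing, because an indexed representation for an entry already in the reference set removes it instead of emitting it.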
- The header table size can be bounded so as to limit the memory - requirements (see the SETTINGS_MAX_BUFFER_SIZE in Section 5). The - header table size is defined as the sum of the size of each entry of - the table. The size of an entry is the sum of the length in bytes - (as defined in Section 4.2.2) of its name, of value's length in bytes - and of 32 bytes (for accounting for the entry structure overhead). - The header table size MUST NOT exceed this limit. +3.1.6. Header Emission - Before adding a new entry to the header table or changing an existing - one, a check has to be performed to ensure that the change will not - cause the table to grow in size beyond the SETTINGS_MAX_BUFFER_SIZE - limit. If necessary, one or more items from the beginning of the - table are removed until there is enough free space available to make - the modification. Dropping an entry from the beginning of the table - causes the index positions of the remaining entries in the table to - be decremented by 1. [[Feedback is needed on this automatic eviction - strategy. ]] + The emission of a header is the process of adding a header to the + current set of headers. Once a header is emitted, it can't be + removed from the current set of headers. - When using substitution indexing, it is possible that the existing - item being replaced might be one of the items removed when performing - the necessary size adjustment. In such cases, the substituted value - being added to the header table is inserted at the beginning of the - header table (at index position #0) and the index positions of the - other remaining entries in the table are incremented by 1. + The concept of header emission allows a decoder to know when it can + pass a header safely to a higher level on the receiver side. This + allows a decoder to be implemented in a streaming way, and as such to + only keep in memory the header table and the reference set. 
With + such an implementation, the amount of memory used by the decoder is + bounded, even in presence of a very large set of headers. The + management of memory for handling very large sets of headers can + therefore be deferred to the application, which may be able to emit + the header to the wire and thus free up memory quickly. - To optimize the representation of the headers exchanged at the - beginning of an HTTP/2.0 session, the header table is initialized - with common headers. Two lists of initial headers are provided in - Appendix A. One is for messages sent from a client to a server, the - other is for messages sent from a server to a client. +3.2. Header Set Processing -3.3. Header Representation + The processing of an encoded header set to obtain a list of headers + is defined in this section. To ensure a correct decoding of a header + set, a decoder MUST obey the following rules. -3.3.1. Literal Representation +3.2.1. Header Representation Processing - The literal representation defines a new header. A literal header is - represented as: + All the header representations contained in a header block are + processed in the order in which they are presented, as specified + below. - o A header name, with two possible representations: + An _indexed representation_ corresponding to an entry _not present_ + in the reference set entails the following actions: - * A literal string, as described in Section 4.2.2. + o The header corresponding to the entry is emitted. - * A index in the header table referencing the name of the - corresponding header. The index is represented as an integer, - as described in Section 4.2.1. + o The entry is added to the reference set. - o The header value, represented as a literal string, as described in - Section 4.2.2. + An _indexed representation_ corresponding to an entry _present_ in + the reference set entails the following actions: -3.3.2. Indexed Representation + o The entry is removed from the reference set. 
- The indexed representation defines a header as a match to a (name, - value) pair in the header table. An indexed header is represented - as: + A _literal representation_ that is _not added_ to the header table + entails the following action: - o An integer representing the index of the matching (name, value) - pair, as described in Section 4.2.1. + o The header is emitted. -3.4. Differential Coding + A _literal representation_ that is _added_ to the header table + entails the following actions: - A set of headers is encoded as a difference from the previous - reference set of headers. The initial reference set of headers is - the empty set. + o The header is emitted. - An indexed representation toggles the presence of the header in the - current set of headers. If the header corresponding to the indexed - representation was not in the set, it is added to the set. If the - header index was in the set, it is removed from it. + o The header is added to the header table, at the location defined + by the representation. - A literal representation adds a header to the current set of headers. + o The new entry is added to the reference set. - To ensure a correct decoding of a set of headers, the following steps - or equivalent ones MUST be executed by the decoder. +3.2.2. Reference Set Emission - First, upon starting the decoding of a new set of headers, the - reference set of headers is interpreted into the working set of - headers: for each header in the reference set, an entry is added to - the working set, containing the header name, its value, and its - current index in the header table. + Once all the representations contained in a header block have been + processed, the headers that are in common with the previous header + set are emitted, during the reference set emission. - Then, the header representations are processed in their order of - occurrence in the frame. 
+ For the reference set emission, each header contained in the + reference set that has not been emitted during the processing of the + header block is emitted. - For an indexed representation, the decoder checks whether the index - is present in the working set. If true, the corresponding entry is - removed from the working set. If several entries correspond to this - encoded index, all these entries are removed from the working set. - If the index is not present in the working set, it is used to - retrieve the corresponding header from the header table, and a new - entry is added to the working set representing this header. +3.2.3. Header Set Completion - For a literal representation, a new entry is added to the working set - representing this header. If the literal representation specifies - that the header is to be indexed, the header is added accordingly to - the header table, and its index is included in the entry in the - working set. Otherwise, the entry in the working set contains an - undefined index. + Once all of the header representations have been processed, and the + remaining items in the reference set have been emitted, the header + set is complete. - When all the header representations have been processed, the working - set contains all the headers of the set of headers. +3.2.4. Header Table Management - The new reference set of headers is computed by removing from the - working set all the headers that are not present in the header table. + The header table can be modified by either adding a new entry to it + or by replacing an existing one. Before doing such a modification, + it has to be ensured that the header table size will stay lower than + or equal to the SETTINGS_MAX_BUFFER_SIZE limit (see Section 5). To + achieve this, repeatedly, the first entry of the header table is + removed, until enough space is available for the modification. 
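The eviction rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class and method names are hypothetical, `max_size` stands in for the SETTINGS_MAX_BUFFER_SIZE limit, only the "add at the end" modification is modelled (substitution is left out), and entry sizes use the accounting of Section 3.1.2 (name octets plus value octets plus 32). An entry larger than the limit empties the table and is not added, as this section goes on to state.

```python
ENTRY_OVERHEAD = 32  # per-entry accounting overhead in bytes (Section 3.1.2)


def entry_size(name, value):
    """Size of one entry: name octets + value octets + 32 bytes."""
    return len(name.encode("ascii")) + len(value.encode("utf-8")) + ENTRY_OVERHEAD


class HeaderTable:
    """Hypothetical header table; max_size models SETTINGS_MAX_BUFFER_SIZE."""

    def __init__(self, max_size, initial=()):
        self.max_size = max_size
        self.entries = list(initial)  # index 0 is the first entry

    def size(self):
        return sum(entry_size(n, v) for n, v in self.entries)

    def append(self, name, value):
        """Add an entry at the end, evicting from the front as needed.

        Returns the index of the new entry, or None if it was too large
        to be added at all.
        """
        needed = entry_size(name, value)
        if needed > self.max_size:
            # Oversized entry: every entry is dropped, nothing is added.
            self.entries.clear()
            return None
        while self.size() + needed > self.max_size:
            # Removing the first entry renumbers the remaining ones.
            self.entries.pop(0)
        self.entries.append((name, value))
        return len(self.entries) - 1
```

Because eviction renumbers the remaining entries, any structure that refers to entries by index (the reference set in particular) has to be adjusted at the same time; the sketch leaves that to the caller.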
- It should be noted that during the decoding of the header - representations, the same index may be associated to different - headers in the working set and in the header table. + A consequence of removing one or more entries at the beginning of the + header table is that the remaining entries are renumbered. The first + entry of the header table is always associated to the index 0. -4. Detailed Format + When the modification of the header table is the replacement of an + existing entry, the replaced entry is the one indicated in the + literal representation before any entry is removed from the header + table. If the entry to be replaced is removed from the header table + when performing the size adjustment, the replacement entry is + inserted at the beginning of the header table. -4.1. Header Blocks + The addition of a new entry with a size greater than the + SETTINGS_MAX_BUFFER_SIZE limit causes all the entries from the header + table to be dropped and the new entry not to be added to the header + table. The replacement of an existing entry with a new entry with a + size greater than the SETTINGS_MAX_BUFFER_SIZE has the same + consequences. - A header block consists of a set of header fields, which are name- - value pairs. Each header field is encoded using one of the header - representation. +3.2.5. Specific Use Cases -4.2. Low-level representations + Three occurrences of the same indexed representation, corresponding + to an entry not present in the reference set, emit the associated + header twice: -4.2.1. Integer representation + o The first occurrence emits the header a first time and adds the + corresponding entry to the reference set. + + o The second occurrence removes the header's entry from the + reference set. + + o The third occurrence emits the header a second time and adds again + its entry to the reference set. + + This allows for headers sets which include duplicate header entries + to be encoded efficiently and faithfully. 
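The processing rules of Sections 3.2.1 to 3.2.3, including the three-occurrence case just described, can be sketched in Python. The sketch rests on several assumptions that are not part of the draft: representations are modelled as already-parsed tuples with hypothetical tags ("indexed", "literal"), the header table is a plain list that never needs eviction, literals that update the table are always appended (no substitution), and the reference set holds table indexes and is emitted in ascending index order purely to keep the output deterministic.

```python
def decode_header_set(block, table, reference_set):
    """Process one header block; returns the emitted headers in order.

    block          list of ("indexed", index) or
                   ("literal", name, value, add_to_table) tuples
    table          list of (name, value) pairs, modified in place
    reference_set  set of table indexes, modified in place
    """
    emitted = []
    emitted_refs = set()  # entries already emitted while processing this block

    for rep in block:
        if rep[0] == "indexed":
            index = rep[1]
            if index in reference_set:
                # Entry present in the reference set: remove it.
                reference_set.discard(index)
            else:
                # Entry not present: emit the header, then reference it.
                emitted.append(table[index])
                reference_set.add(index)
                emitted_refs.add(index)
        elif rep[0] == "literal":
            _, name, value, add_to_table = rep
            emitted.append((name, value))
            if add_to_table:
                table.append((name, value))
                index = len(table) - 1
                reference_set.add(index)
                emitted_refs.add(index)
        else:
            raise ValueError("unknown representation: %r" % (rep,))

    # Reference set emission: headers shared with the previous header set
    # that were not emitted while processing the block.
    for index in sorted(reference_set):
        if index not in emitted_refs:
            emitted.append(table[index])
    return emitted
```

With a table holding ("a", "1") at index 0 and an empty reference set, a block of three ("indexed", 0) representations yields the header twice and leaves index 0 in the reference set, which matches the use case above.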
+
+   The first occurrence of the indexed representation can be replaced by
+   a literal representation creating an entry for the header.
+
+4. Detailed Format
+
+4.1. Low-level representations
+
+4.1.1. Integer representation

    Integers are used to represent name indexes, pair indexes or string
-   lengths. The integer representation keeps byte-alignment as much as
-   possible as this allows various processing optimizations as well as
-   efficient use of DEFLATE. For that purpose, an integer
+   lengths. To allow for optimized processing, an integer
    representation always finishes at the end of a byte.

    An integer is represented in two parts: a prefix that fills the
    current byte and an optional list of bytes that are used if the
    integer value does not fit in the prefix. The number of bits of the
    prefix (called N) is a parameter of the integer representation.

    The N-bit prefix allows filling the current byte. If the value is
    small enough (strictly less than 2^N-1), it is encoded within the
    N-bit prefix. Otherwise all the bits of the prefix are set to 1 and
    the value is encoded using an unsigned variable length integer [1]
    representation.

    The algorithm to represent an integer I is as follows:

-   1. If I < 2^N - 1, encode I on N bits
-
-   2. Else, encode 2^N - 1 on N bits and do the following steps:
-
-   3.
-
-       1. Set I to (I - (2^N - 1)) and Q to 1
-       2. While Q > 0
-
-       3.
-
-           1. Compute Q and R, quotient and remainder of I divided by
-              2^7
-
-           2. If Q is strictly greater than 0, write one 1 bit;
-              otherwise, write one 0 bit
-
-           3. Encode R on the next 7 bits
-
-           4. I = Q
+   If I < 2^N - 1, encode I on N bits
+   Else
+       encode 2^N - 1 on N bits
+       While I >= 128
+            Encode (I % 128 + 128) on 8 bits
+            I = I / 128
+       encode (I) on 8 bits

-4.2.1.1. Example 1: Encoding 10 using a 5-bit prefix
+4.1.1.1. Example 1: Encoding 10 using a 5-bit prefix

    The value 10 is to be encoded with a 5-bit prefix.

    o 10 is less than 31 (= 2^5 - 1) and is represented using the 5-bit
      prefix.

      0   1   2   3   4   5   6   7
    +---+---+---+---+---+---+---+---+
    | X | X | X | 0 | 1 | 0 | 1 | 0 |  10 stored on 5 bits
    +---+---+---+---+---+---+---+---+

-4.2.1.2. Example 2: Encoding 1337 using a 5-bit prefix
+4.1.1.2. Example 2: Encoding 1337 using a 5-bit prefix

    The value I=1337 is to be encoded with a 5-bit prefix.

-   o 1337 is greater than 31 (= 2^5 - 1).
+   1337 is greater than 31 (= 2^5 - 1).

-   o
+   The 5-bit prefix is filled with its max value (31).

-      * The 5-bit prefix is filled with its max value (31).
+   I = 1337 - (2^5 - 1) = 1306.

-   o The value to represent on next bytes is I = 1337 - (2^5 - 1) =
-     1306.
+   I (1306) is greater than or equal to 128, the while loop body
+   executes:

-   o
+      I % 128 == 26

-      * 1306 = 128*10 + 26, i.e. Q=10 and R=26.
+      26 + 128 == 154

-      * Q is greater than 1, bit 8 is set to 1.
+      154 is encoded in 8 bits as: 10011010

-      * The remainder R=26 is encoded on next 7 bits.
+      I is set to 10 (1306 / 128 == 10)

-      * I is replaced by the quotient Q=10.
+      I is no longer greater than or equal to 128, the while loop
+      terminates.

-   o The value to represent on next bytes is I = 10.
+   I, now 10, is encoded on 8 bits as: 00001010

-   o
+   The process ends.

-      * 10 = 128*0 + 10, i.e. Q=0 and R=10.
+  0   1   2   3   4   5   6   7
++---+---+---+---+---+---+---+---+
+| X | X | X | 1 | 1 | 1 | 1 | 1 |  Prefix = 31, I = 1306
+| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |  1306>=128, encode(154), I = 1306/128
+| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |  10<128, encode(10), done
++---+---+---+---+---+---+---+---+

-      * Q is equal to 0, bit 16 is set to 0.
+4.1.2. Header Name Representation

-      * The remainder R=10 is encoded on next 7 bits.
+   Header names are sequences of ASCII characters that MUST conform to
+   the following header-name ABNF construction:

-      * I is replaced by the quotient Q=0.
+   LOWERALPHA = %x61-7A
+   header-char = "!" / "#" / "$" / "%" / "&" / "'" /
+                 "*" / "+" / "-" / "." / "^" / "_" /
+                 "`" / "|" / "~" / DIGIT / LOWERALPHA
+   header-name = [":"] 1*header-char

-   o The process ends.
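The integer representation of Section 4.1.1, which also carries the length prefix of header names and values, can be sketched in Python as below. The function names are hypothetical and the prefix is limited to 1..8 bits for simplicity. One assumption is worth flagging: the revised pseudocode as printed does not show the subtraction of 2^N - 1 before the loop, whereas Example 2 performs it (I = 1337 - 31 = 1306); the sketch follows the worked example. The bits above the prefix (the X positions in the diagrams) are left at zero, so a caller would OR the representation's bit pattern into the first byte.

```python
from typing import Tuple


def encode_integer(value: int, prefix_bits: int) -> bytes:
    """Encode a non-negative integer with an N-bit prefix (N = prefix_bits)."""
    if value < 0 or not 1 <= prefix_bits <= 8:
        raise ValueError("value must be >= 0 and prefix_bits in 1..8")
    limit = (1 << prefix_bits) - 1  # 2^N - 1
    if value < limit:
        # Small enough: the value fits in the prefix itself.
        return bytes([value])
    out = bytearray([limit])  # prefix filled with ones
    value -= limit            # step taken in Example 2 (1337 - 31 = 1306)
    while value >= 128:
        out.append(value % 128 + 128)  # 7 payload bits, continuation bit set
        value //= 128
    out.append(value)                  # last byte, continuation bit clear
    return bytes(out)


def decode_integer(data: bytes, prefix_bits: int) -> Tuple[int, int]:
    """Decode an integer; returns (value, number of bytes consumed)."""
    limit = (1 << prefix_bits) - 1
    value = data[0] & limit  # ignore the bits above the prefix
    if value < limit:
        return value, 1
    shift = 0
    pos = 1
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos
```

Under these assumptions, encoding 10 with a 5-bit prefix gives the single byte 00001010, and encoding 1337 gives the three bytes 00011111 10011010 00001010 shown in the two examples.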
+ They are encoded in two parts: - 0 1 2 3 4 5 6 7 - +---+---+---+---+---+---+---+---+ - | X | X | X | 1 | 1 | 1 | 1 | 1 | Prefix = 31 - | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | Q>=1, R=26 - | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | Q=0 , R=10 - +---+---+---+---+---+---+---+---+ + 1. The length of the text, defined as the number of octets of + storage required to store the text, represented as a variable- + length-quantity (Section 4.1.1). -4.2.2. String literal representation + 2. The specific sequence of ASCII octets - Literal strings can represent header names or header values. They +4.1.3. Header Value Representation + + Header values are encoded as sequences of UTF-8 encoded text. They are encoded in two parts: - 1. The string length, defined as the number of bytes needed to store - its UTF-8 representation, is represented as an integer with a - zero bits prefix. If the string length is strictly less than - 128, it is represented as one byte. + 1. The length of the text, defined as the number of octets of + storage required to store the text, represented as a variable- + length-quantity (Section 4.1.1). - 2. The string value represented as a list of UTF-8 characters. + 2. The specific sequence of octets representing the UTF-8 text. -4.3. Indexed Header Representation + Invalid UTF-8 octet sequences, "over-long" UTF-8 encodings, and UTF-8 + octets that represent invalid Unicode Codepoints MUST NOT be used. + +4.2. Indexed Header Representation 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | 1 | Index (7+) | +---+---------------------------+ Indexed Header This representation starts with the '1' 1-bit pattern, followed by the index of the matching pair, represented as an integer with a 7-bit prefix. -4.4. Literal Header Representation +4.3. Literal Header Representation -4.4.1. Literal Header without Indexing +4.3.1. 
Literal Header without Indexing

      0   1   2   3   4   5   6   7
    +---+---+---+---+---+---+---+---+
    | 0 | 1 | 1 |    Index (5+)     |
    +---+---+---+-------------------+
    |       Value Length (8+)       |
    +-------------------------------+
    | Value String (Length octets)  |
    +-------------------------------+
@@ -460,220 +546,271 @@
    This representation, which does not involve updating the header
    table, starts with the '011' 3-bit pattern.

    If the header name matches the header name of a (name, value) pair
    stored in the Header Table, the index of the pair increased by one
    (index + 1) is represented as an integer with a 5-bit prefix.  Note
    that if the index is strictly below 30, one byte is used.

    If the header name does not match a header name entry, the value 0 is
-   represented on 5 bits followed by the header name, represented as a
-   literal string.
+   represented on 5 bits followed by the header name (Section 4.1.2).

    Header name representation is followed by the header value
-   represented as a literal string as described in Section 4.2.2.
+   represented as a literal string as described in Section 4.1.3.
+
+4.3.2.  Literal Header with Incremental Indexing

-4.4.2.
Literal Header with Incremental Indexing

      0   1   2   3   4   5   6   7
    +---+---+---+---+---+---+---+---+
    | 0 | 1 | 0 |    Index (5+)     |
    +---+---+---+-------------------+
    |       Value Length (8+)       |
    +-------------------------------+
    | Value String (Length octets)  |
    +-------------------------------+

-        Literal Header with Incremental Indexing - Indexed Name
+                Literal Header with Incremental Indexing -
+                               Indexed Name

      0   1   2   3   4   5   6   7
    +---+---+---+---+---+---+---+---+
    | 0 | 1 | 0 |         0         |
    +---+---+---+-------------------+
    |       Name Length (8+)        |
    +-------------------------------+
    |  Name String (Length octets)  |
    +-------------------------------+
    |       Value Length (8+)       |
    +-------------------------------+
    | Value String (Length octets)  |
    +-------------------------------+

-          Literal Header with Incremental Indexing - New Name
+                Literal Header with Incremental Indexing -
+                                 New Name

    This representation starts with the '010' 3-bit pattern.

    If the header name matches the header name of a (name, value) pair
    stored in the Header Table, the index of the pair increased by one
    (index + 1) is represented as an integer with a 5-bit prefix.  Note
    that if the index is strictly below 30, one byte is used.

    If the header name does not match a header name entry, the value 0 is
-   represented on 5 bits followed by the header name, represented as a
-   literal string.
+   represented on 5 bits followed by the header name (Section 4.1.2).

    Header name representation is followed by the header value
-   represented as a literal string as described in Section 4.2.2.
+   represented as a literal string as described in Section 4.1.3.

-4.4.3.  Literal Header with Substitution Indexing
+4.3.3.
Literal Header with Substitution Indexing 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | 0 | 0 | Index (6+) | +---+---+-----------------------+ | Substituted Index (8+) | +-------------------------------+ | Value Length (8+) | +-------------------------------+ | Value String (Length octets) | +-------------------------------+ - Literal Header with Substitution Indexing - Indexed Name + Literal Header with Substitution Indexing - + Indexed Name 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | 0 | 0 | 0 | +---+---+-----------------------+ | Name Length (8+) | +-------------------------------+ | Name String (Length octets) | +-------------------------------+ | Substituted Index (8+) | +-------------------------------+ | Value Length (8+) | +-------------------------------+ | Value String (Length octets) | +-------------------------------+ - Literal Header with Substitution Indexing - New Name + Literal Header with Substitution Indexing - + New Name This representation starts with the '00' 2-bit pattern. If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 6-bit prefix. Note that if the index is strictly below 62, one byte is used. If the header name does not match a header name entry, the value 0 is - represented on 6 bits followed by the header name, represented as a - literal string. + represented on 6 bits followed by the header name (Section 4.1.2). The index of the substituted (name, value) pair is inserted after the header name representation as a 0-bit prefix integer. The index of the substituted pair MUST correspond to a position in the header table containing a non-void entry. An index for the substituted pair that corresponds to empty position in the header table MUST be treated as an error. This index is followed by the header value represented as a literal - string as described in Section 4.2.2. 
+   string as described in Section 4.1.3.

 5.  Parameter Negotiation
-   A few parameters can be used to accomodate client and server
-   processing and memory requirements.  [[These settings are currently
-   not supported as they have not been integrated in the main
+
+   A few parameters can be used to accommodate client and server
+   processing and memory requirements.  [[anchor3: These settings are
+   currently not supported as they have not been integrated in the main
    specification.  Therefore, the maximum buffer size for the header
    table is fixed at 4096 bytes.]]

    SETTINGS_MAX_BUFFER_SIZE:  Allows the sender to inform the remote
       endpoint of the maximum size it accepts for the header table.

       The default value is 4096 bytes.

-      [[Is this default value OK?  Do we need a maximum size?  Do we
-      want to allow infinite buffer?]]
+      [[anchor4: Is this default value OK?  Do we need a maximum size?
+      Do we want to allow infinite buffer?]]

       When the remote endpoint receives a SETTINGS frame containing a
       SETTINGS_MAX_BUFFER_SIZE setting with a value smaller than the one
       currently in use, it MUST send as soon as possible a HEADERS frame
       with a stream identifier of 0x0 containing a value smaller than or
       equal to the received setting value.

-      [[This changes slightly the behaviour of the HEADERS frame, which
-      should be updated as follows: ]]
+      [[anchor5: This changes slightly the behaviour of the HEADERS
+      frame, which should be updated as follows:]]

       A HEADERS frame with a stream identifier of 0x0 indicates that the
       sender has reduced the maximum size of the header table.  The new
       maximum size of the header table is encoded on 32 bits.  The
       decoder MUST reduce its own header table by dropping entries from
       it until the size of the header table is lower than or equal to
       the transmitted maximum size.

 6.  Security Considerations

-   TODO?
+ This compressor exists to solve security issues present in stream + compressors such as DEFLATE whereby the compression context can be + efficiently probed to reveal secrets. A conformant implementation of + this specification should be fairly safe against that kind of attack, + as the reaping of any information from the compression context + requires more work than guessing and verifying the plaintext data + directly with the server. As with any secret, however, the longer + the length of the secret, the more difficult the secret is to guess. + It is inadvisable to have short cookies that are relied upon to + remain secret for any duration of time. + + A proper security-conscious implementation will also need to prevent + timing attacks by ensuring that the amount of time it takes to do + string comparisons is always a function of the total length of the + strings, and not a function of the number of matched characters. + + Another common security problem is when the remote endpoint + successfully causes the local endpoint to exhaust its memory. This + compressor attempts to deal with the most obvious ways that this + could occur by limiting both the peak and the steady-state amount of + memory consumed in the compressor state, by providing ways for the + application to consume/flush the emitted headers in small chunks, and + by considering overhead in the state size calculation. Implementors + must still be careful in the creation of APIs to an implementation of + this compressor by ensuring that header keys and values are either + emitted as a stream, or that the compression implementation have a + limit on the maximum size of a key or value. Failure to implement + these kinds of safeguards may still result in a scenario where the + local endpoint exhausts its memory. 7. IANA Considerations This memo includes no request to IANA. 8. Informative References [SPDY] Belshe, M. and R. Peon, "SPDY Protocol", February 2012, . -Appendix A. 
Initial header names
+   [1]
-   [[The tables in this section should be updated based on statistical
-   analysis of header names frequency and specific HTTP 2.0 header rules
-   (like removal of some headers). ]]
-   [[These tables are not adapted for headers contained in PUSH_PROMISE
-   frames.  Either the tables can be merged, or the table for responses
-   can be updated. ]]
+Appendix A.  Change Log (to be removed by RFC Editor before publication)

-A.1.  Requests
+A.1.  Since draft-ietf-httpbis-header-compression-01
+
+   o  Refactored the Header Encoding section: split definitions and
+      processing rules.
+
+   o  Backward incompatible change: Updated reference set management as
+      per issue #214.  This changes how the interaction between the
+      reference set and eviction works.  This also changes the working
+      of the reference set in some specific cases.
+
+   o  Backward incompatible change: modified initial header list, as per
+      issue #188.
+
+   o  Added example of 32-byte entry structure (issue #191).
+
+   o  Added Header Set Completion section.  Reflowed some text.
+      Clarified some writing which was awkward.  Added text about
+      duplicate header entry encoding.  Clarified some language w.r.t.
+      Header Set.  Changed x-my-header to mynewheader.  Added text in the
+      HeaderEmission section indicating that the application may also be
+      able to free up memory more quickly.  Added information in
+      Security Considerations section.
+
+Appendix B.  Initial Header Tables
+
+   [[anchor9: The tables in this section should be updated based on
+   statistical analysis of header names frequency and specific HTTP 2.0
+   header rules (like removal of some headers).]]
+   [[anchor10: These tables are not adapted for headers contained in
+   PUSH_PROMISE frames.  Either the tables can be merged, or the table
+   for responses can be updated.]]
+
+B.1.  Requests

    The following table lists the pre-defined headers that make up the
    initial header table used to represent requests sent from a client to
    a server.
    +-------+---------------------+--------------+
    | Index | Header Name         | Header Value |
    +-------+---------------------+--------------+
    | 0     | :scheme             | http         |
    | 1     | :scheme             | https        |
    | 2     | :host               |              |
    | 3     | :path               | /            |
    | 4     | :method             | GET          |
    | 5     | accept              |              |
    | 6     | accept-charset      |              |
    | 7     | accept-encoding     |              |
    | 8     | accept-language     |              |
    | 9     | cookie              |              |
    | 10    | if-modified-since   |              |
-   | 11    | keep-alive          |              |
-   | 12    | user-agent          |              |
-   | 13    | proxy-connection    |              |
-   | 14    | referer             |              |
-   | 15    | accept-datetime     |              |
-   | 16    | authorization       |              |
-   | 17    | allow               |              |
-   | 18    | cache-control       |              |
-   | 19    | connection          |              |
-   | 20    | content-length      |              |
-   | 21    | content-md5         |              |
-   | 22    | content-type        |              |
-   | 23    | date                |              |
-   | 24    | expect              |              |
-   | 25    | from                |              |
-   | 26    | if-match            |              |
-   | 27    | if-none-match       |              |
-   | 28    | if-range            |              |
-   | 29    | if-unmodified-since |              |
-   | 30    | max-forwards        |              |
-   | 31    | pragma              |              |
-   | 32    | proxy-authorization |              |
-   | 33    | range               |              |
-   | 34    | te                  |              |
-   | 35    | upgrade             |              |
-   | 36    | via                 |              |
-   | 37    | warning             |              |
+   | 11    | user-agent          |              |
+   | 12    | referer             |              |
+   | 13    | authorization       |              |
+   | 14    | allow               |              |
+   | 15    | cache-control       |              |
+   | 16    | connection          |              |
+   | 17    | content-length      |              |
+   | 18    | content-type        |              |
+   | 19    | date                |              |
+   | 20    | expect              |              |
+   | 21    | from                |              |
+   | 22    | if-match            |              |
+   | 23    | if-none-match       |              |
+   | 24    | if-range            |              |
+   | 25    | if-unmodified-since |              |
+   | 26    | max-forwards        |              |
+   | 27    | proxy-authorization |              |
+   | 28    | range               |              |
+   | 29    | via                 |              |
    +-------+---------------------+--------------+
-                               Table 1
-A.2.  Responses
+             Table 1: Initial Header Table for Requests
+
+B.2.  Responses

    The following table lists the pre-defined headers that make up the
    initial header table used to represent responses sent from a server
    to a client.  The same header table is also used to represent request
    headers sent from a server to a client in a PUSH_PROMISE frame.
+-------+-----------------------------+--------------+ | Index | Header Name | Header Value | +-------+-----------------------------+--------------+ | 0 | :status | 200 | @@ -690,65 +827,62 @@ | 11 | vary | | | 12 | via | | | 13 | access-control-allow-origin | | | 14 | accept-ranges | | | 15 | allow | | | 16 | connection | | | 17 | content-disposition | | | 18 | content-encoding | | | 19 | content-language | | | 20 | content-location | | - | 21 | content-md5 | | - | 22 | content-range | | - | 23 | link | | - | 24 | location | | - | 25 | p3p | | - | 26 | pragma | | - | 27 | proxy-authenticate | | - | 28 | refresh | | - | 29 | retry-after | | - | 30 | strict-transport-security | | - | 31 | trailer | | - | 32 | transfer-encoding | | - | 33 | warning | | - | 34 | www-authenticate | | + | 21 | content-range | | + | 22 | link | | + | 23 | location | | + | 24 | proxy-authenticate | | + | 25 | refresh | | + | 26 | retry-after | | + | 27 | strict-transport-security | | + | 28 | transfer-encoding | | + | 29 | www-authenticate | | +-------+-----------------------------+--------------+ - Table 2 -Appendix B. Example + Table 2: Initial Header Table for Responses + +Appendix C. Example Here is an example that illustrates different representations and how - tables are updated. [[This section needs to be updated to integrate - differential coding.]] + tables are updated. [[anchor13: This section needs to be updated to + better reflect the new processing of header fields, and include more + examples.]] -B.1. First header set +C.1. First header set The first header set to represent is the following: :path: /my-example/index.html user-agent: my-user-agent - x-my-header: first + mynewheader: first The header table is empty, all headers are represented as literal - headers with indexing. The 'x-my-header' header name is not in the + headers with indexing. The 'mynewheader' header name is not in the header name table and is encoded literally. 
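   The encoding of this first header set can be sketched in Python as
   follows.  This is an illustrative sketch under stated assumptions:
   the helper names are hypothetical, string lengths are assumed to be
   encoded as integers with a 0-bit prefix, and the name indexes 3 and
   12 are simply the ones used in the byte listing of this example.

```python
from typing import Optional


def encode_integer(value: int, prefix_bits: int, pattern: int = 0) -> bytes:
    """Integer with an N-bit prefix; pattern fills the bits above the prefix."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([pattern | value])
    out = bytearray([pattern | limit]) if prefix_bits else bytearray()
    value -= limit
    while value >= 128:
        out.append(value % 128 + 128)
        value //= 128
    out.append(value)
    return bytes(out)


def encode_string(text: str) -> bytes:
    """Length (assumed 0-bit prefix integer) followed by the UTF-8 octets."""
    octets = text.encode("utf-8")
    return encode_integer(len(octets), 0) + octets


def literal_incremental(value: str, name_index: Optional[int] = None,
                        name: Optional[str] = None) -> bytes:
    """Literal header with incremental indexing ('010' pattern = 0x40)."""
    if name_index is not None:
        # Indexed name: (index + 1) on a 5-bit prefix, then the value.
        return encode_integer(name_index + 1, 5, 0x40) + encode_string(value)
    if name is None:
        raise ValueError("either name_index or name is required")
    # New name: prefix value 0, then the name string, then the value.
    return bytes([0x40]) + encode_string(name) + encode_string(value)


first_set = (
    literal_incremental("/my-example/index.html", name_index=3)
    + literal_incremental("my-user-agent", name_index=12)
    + literal_incremental("first", name="mynewheader")
)
```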
    This gives the following representation:

    0x44      (literal header with incremental indexing, name index = 3)
    0x16      (header value string length = 22)
    /my-example/index.html
    0x4D      (literal header with incremental indexing, name index = 12)
    0x0D      (header value string length = 13)
    my-user-agent
    0x40      (literal header with incremental indexing, new name)
    0x0B      (header name string length = 11)
-   x-my-header
+   mynewheader
    0x05      (header value string length = 5)
    first

    The header table is as follows after the processing of these headers:

    Header table
    +---------+----------------+---------------------------+
    | Index   | Header Name    | Header Value              |
    +---------+----------------+---------------------------+
    | 0       | :scheme        | http                      |
@@ -756,45 +890,45 @@
    | 1       | :scheme        | https                     |
    +---------+----------------+---------------------------+
    | ...     | ...            | ...                       |
    +---------+----------------+---------------------------+
    | 37      | warning        |                           |
    +---------+----------------+---------------------------+
    | 38      | :path          | /my-example/index.html    | added header
    +---------+----------------+---------------------------+
    | 39      | user-agent     | my-user-agent             | added header
    +---------+----------------+---------------------------+
-   | 40      | x-my-header    | first                     | added header
+   | 40      | mynewheader    | first                     | added header
    +---------+----------------+---------------------------+

    As all the headers in the first header set are indexed in the header
    table, all are kept in the reference set of headers, which is:

    Reference Set:
    :path, /my-example/index.html
    user-agent, my-user-agent
-   x-my-header, first
+   mynewheader, first

-B.2.  Second header set
+C.2.  Second header set

    The second header set to represent is the following:

    :path: /my-example/resources/script.js
    user-agent: my-user-agent
-   x-my-header: second
+   mynewheader: second

    Comparing this second header set to the reference set, the first and
    third headers from the reference set are not present in this second
    header set and must be removed.
In addition, in this new set, the first and third headers have to be encoded. The path header is - represented as a literal header with substitution indexing. The x - -my-header will be represented as a literal header with incremental + represented as a literal header with substitution indexing. The + mynewheader will be represented as a literal header with incremental indexing. 0xa6 (indexed header, index = 38: removal from reference set) 0xa8 (indexed header, index = 40: removal from reference set) 0x04 (literal header, substitution indexing, name index = 3) 0x26 (replaced entry index = 38) 0x1f (header value string length = 31) /my-example/resources/script.js 0x5f 0x0a (literal header, incremental indexing, name index = 40) 0x06 (header value string length = 6) @@ -812,33 +946,33 @@ +---------+----------------+---------------------------+ | ... | ... | ... | +---------+----------------+---------------------------+ | 37 | warning | | +---------+----------------+---------------------------+ | 38 | :path | /my-example/resources/ | replaced | | | script.js | header +---------+----------------+---------------------------+ | 39 | user-agent | my-user-agent | +---------+----------------+---------------------------+ - | 40 | x-my-header | first | + | 40 | mynewheader | first | +---------+----------------+---------------------------+ - | 41 | x-my-header | second | added header + | 41 | mynewheader | second | added header +---------+----------------+---------------------------+ All the headers in this second header set are indexed in the header table, therefore, all are kept in the reference set of headers, which becomes: Reference Set: :path, /my-example/resources/script.js user-agent, my-user-agent - x-my-header, second + mynewheader, second Authors' Addresses Roberto Peon Google, Inc EMail: fenix@google.com Herve Ruellan Canon CRF