[Docs] [txt|pdf|xml|html] [Tracker] [Email] [Diff1] [Diff2] [Nits]

Versions: (draft-kucherawy-httpbis-dict-sec) 00

Individual submission                                          W. Handte
Internet-Draft                                            Facebook, Inc.
Intended status: Informational                          October 29, 2019
Expires: May 1, 2020


       Security Considerations Regarding Compression Dictionaries
                    draft-handte-httpbis-dict-sec-00

Abstract

   Dictionary-based compression enables better performance, but brings
   state into the process of compression, with all the complexities that
   follow.  This document explores the security implications of this
   technique in the context of internet protocols and enumerates known
   risks and mitigations.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 1, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Handte                     Expires May 1, 2020                  [Page 1]


Internet-Draft       Compression Dictionary Security        October 2019


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Compression Environments  . . . . . . . . . . . . . . . .   3
     2.2.  Security Properties . . . . . . . . . . . . . . . . . . .   4
     2.3.  Threat Model  . . . . . . . . . . . . . . . . . . . . . .   4
     2.4.  Existing Attacks  . . . . . . . . . . . . . . . . . . . .   4
   3.  Dictionaries  . . . . . . . . . . . . . . . . . . . . . . . .   5
     3.1.  Dictionary Compression  . . . . . . . . . . . . . . . . .   5
     3.2.  Dictionary Contents . . . . . . . . . . . . . . . . . . .   5
       3.2.1.  Unstructured Dictionaries . . . . . . . . . . . . . .   5
       3.2.2.  Structured Dictionaries . . . . . . . . . . . . . . .   6
     3.3.  Using Dictionaries  . . . . . . . . . . . . . . . . . . .   6
       3.3.1.  Generating Dictionaries . . . . . . . . . . . . . . .   6
       3.3.2.  Identifying Dictionaries  . . . . . . . . . . . . . .   7
       3.3.3.  Distributing Dictionaries . . . . . . . . . . . . . .   9
       3.3.4.  Selecting Dictionaries  . . . . . . . . . . . . . . .   9
       3.3.5.  Using Dictionaries  . . . . . . . . . . . . . . . . .  10
       3.3.6.  Deleting Dictionaries . . . . . . . . . . . . . . . .  10
   4.  Risks . . . . . . . . . . . . . . . . . . . . . . . . . . . .  11
     4.1.  Revealing Message Content . . . . . . . . . . . . . . . .  11
       4.1.1.  By Observing Which Dictionary is Used . . . . . . . .  11
       4.1.2.  By Observing Message Size . . . . . . . . . . . . . .  12
       4.1.3.  By Observing Timing . . . . . . . . . . . . . . . . .  13
     4.2.  Revealing Dictionary Content  . . . . . . . . . . . . . .  14
       4.2.1.  By Observing Message Size . . . . . . . . . . . . . .  14
       4.2.2.  In Compression  . . . . . . . . . . . . . . . . . . .  14
       4.2.3.  In Decompression  . . . . . . . . . . . . . . . . . .  14
     4.3.  Manipulating Message Content  . . . . . . . . . . . . . .  15
       4.3.1.  By Manipulating Message Content . . . . . . . . . . .  16
       4.3.2.  By Manipulating Dictionary Content  . . . . . . . . .  16
       4.3.3.  By Manipulating Dictionary Identifiers  . . . . . . .  17
     4.4.  Obfuscating Message Content . . . . . . . . . . . . . . .  17
       4.4.1.  From Intermediaries . . . . . . . . . . . . . . . . .  17
       4.4.2.  Multiple Representations  . . . . . . . . . . . . . .  18
     4.5.  Tracking Users  . . . . . . . . . . . . . . . . . . . . .  18
       4.5.1.  Through Dictionary Negotiation  . . . . . . . . . . .  18
       4.5.2.  Through Dictionary Retrieval  . . . . . . . . . . . .  19
     4.6.  Denial of Service . . . . . . . . . . . . . . . . . . . .  19
     4.7.  Resource Exhaustion . . . . . . . . . . . . . . . . . . .  19
       4.7.1.  Resources . . . . . . . . . . . . . . . . . . . . . .  19
       4.7.2.  Targets . . . . . . . . . . . . . . . . . . . . . . .  22
     4.8.  Generating Dictionaries . . . . . . . . . . . . . . . . .  24
       4.8.1.  Handling Samples  . . . . . . . . . . . . . . . . . .  24
       4.8.2.  Tagging Mitigations . . . . . . . . . . . . . . . . .  24
       4.8.3.  Probabilistic Mitigations . . . . . . . . . . . . . .  25
     4.9.  Complexity  . . . . . . . . . . . . . . . . . . . . . . .  25



Handte                     Expires May 1, 2020                  [Page 2]


Internet-Draft       Compression Dictionary Security        October 2019


   5.  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . .  25
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  26
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  26
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  26
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  26
     8.2.  Other Examples of Dictionary-Like Compression . . . . . .  26
     8.3.  Informative References  . . . . . . . . . . . . . . . . .  27
   Appendix A.  Acknowledgements . . . . . . . . . . . . . . . . . .  30
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  30

1.  Introduction

   General-purpose data compression algorithms are designed to achieve
   good performance on many different kinds of data.  However, that
   general-purpose nature makes them, to a certain extent, jacks of all
   trades and masters of none: a compressor that has been tuned for a
   specific use case can always perform better than a generic
   equivalent.

   In response, a number of modern compression algorithms (including
   DEFLATE [DEFLATE], Brotli [BROTLI], and Zstandard [ZSTD]) have
   developed a generic capability to specialize themselves.  In addition
   to the actual message to be processed, these compressors allow users
   to provide additional context information, which the compressor and
   decompressor can use to tailor their internal states to that
   particular use case.  To the extent that this auxiliary data matches
   the nature of the message being compressed, the compressor can use it
   to produce a smaller compressed representation of the message.  This
   auxiliary data can include various things, but it has come to be
   known as a "dictionary."

   As dictionary-based compression has been adopted, it has been found
   that its use can present security challenges.  This document is a
   collection of those challenges.  As future use cases for dictionaries
   are contemplated, this document can be used as a checklist to ensure
   that the protocols, their specifications, and their implementations
   have been appropriately evaluated against these concerns.

2.  Basis

2.1.  Compression Environments

   The security of any use of compression depends greatly on the
   environment in which it is deployed, and the threats it is subjected
   to.  This document analyzes dictionary-based compression as it might
   be used by a generic internet protocol, in which:





Handte                     Expires May 1, 2020                  [Page 3]


Internet-Draft       Compression Dictionary Security        October 2019


   o  Agents exchange messages over possibly-trusted, possibly-
      authenticated, possibly-encrypted channels, which are vulnerable
      to some combination of traffic analysis, eavesdropping, and
      manipulation.

   o  Agents exchange messages with parties they may not trust.

   o  Agents may take protocol actions (generating, sending, receiving,
      and interpreting messages) in response to triggers other than user
      action.  Some examples include:

      *  Replying automatically to received messages.

      *  Relaying or forwarding received messages to other agents (e.g.,
         an SMTP relay).

      *  Exchanging messages at the behest of trusted or untrusted code
         (e.g., trusted: a website codebase generating responses to HTTP
         requests, untrusted: a website's JavaScript code running in a
         browser.

   This document aims to enumerate all security risks raised when using
   dictionary-based compression in this baseline environment.  In
   addition to attempting an exhaustive list of possible security risks,
   this document will identify desirable properties of the protocol
   stack and environment in which the compression is used and other
   methods with which individual concerns can be obviated or mitigated.

2.2.  Security Properties

   [TODO]

2.3.  Threat Model

   [TODO]

2.4.  Existing Attacks

   This document excludes from its analysis security risks that are
   already present without the use of dictionary compression.

   In particular, compression as it broadly used today--without
   dictionaries--is known to introduce vulnerabilities.  The most well
   known series of these attacks ([CRIME] et al.) recovers message
   content of inaccessible or encrypted traffic by observing message
   sizes while manipulating other parts of the message or traffic
   stream.




Handte                     Expires May 1, 2020                  [Page 4]


Internet-Draft       Compression Dictionary Security        October 2019


3.  Dictionaries

3.1.  Dictionary Compression

   Classically, compression algorithms operate as stateless, pure
   functions.  In that mode, their output depends solely on the input
   message and the algorithm's implementation details.  Dictionaries
   break that paradigm, introducing an additional input to the
   compression and decompression operations.  Compressors may then
   leverage the contents of that additional input--the dictionary--to
   produce more compact representations of their inputs.

                    +--------- dictionary ----------+
                    |                               |
                    V                               V
              +------------+   compressed    +--------------+
   message -> | compressor | --> message --> | decompressor | -> message
              +------------+ representation  +--------------+

   In introducing this other element, the interpretation of the
   compressed message becomes dependent on the content of the
   dictionary, and therefore same dictionary that was used to compress a
   message must be presented at decompression time.  In this way,
   dictionaries are in effect an out-of-band communication or pre-shared
   key between the compressor and decompressor.

3.2.  Dictionary Contents

   In principle, the contents of a dictionary are solely the concern of
   the compressor and decompressor, and implementations should be free
   to treat them as opaque blobs.  However, when analyzing their
   security characteristics, it's useful to understand the data that is
   actually present in a dictionaries.

   Dictionaries take two broad forms.

3.2.1.  Unstructured Dictionaries

   Some compressors (e.g., DEFLATE [DEFLATE] and Zstandard [ZSTD])
   accept arbitrary, unstructured bytestreams as dictionaries.  In these
   cases, the dictionary is used purely as a buffer in which LZ77-style
   content matches can be found [LZ77].  That is, when the dictionary
   contains some sequence of bytes that is also present in the message,
   the compressor can choose to represent those bytes by referencing
   them in the dictionary, rather than by representing them literally.






Handte                     Expires May 1, 2020                  [Page 5]


Internet-Draft       Compression Dictionary Security        October 2019


3.2.2.  Structured Dictionaries

   Some compressors (e.g., Brotli [BROTLI] and Zstandard [ZSTD]) accept
   dictionaries that conform to a specific and defined format.  In these
   cases, the dictionary data can consist of multiple components, each
   of which is used in different ways.

   metadata:  The dictionary may contain metadata that identifies the
      dictionary.  For example, Zstandard dictionaries include a 32-bit
      integer ID field.

   statistics:  The dictionary may contain frequency distributions of
      various kinds of symbols, which the compressor can use to more
      efficiently encode the corresponding streams instead of using a
      default frequency distribution.

   initial values:  For example, Zstandard allows the dictionary to
      initialize certain parts of the compressor's internal state (in
      particular, the initial values of Repeated_Offset1,
      Repeated_Offset2, and Repeated_Offset3) [ZSTD].

   instructions:  The dictionary may describe preprocessing or
      transformation steps to be taken on the input.  [TODO: expand]

   corpus content:

      untokenized:  For LZ77-style compressors [LZ77], the structured
         dictionaries may still contain unstructured content for the
         compressor to make matches against.

      tokenized:  Alternatively, for LZ78-style compressors [LZ78], the
         match content is tokenized (i.e., it consists of a collection
         of independent strings, serialized in some form).

3.3.  Using Dictionaries

   In order to use compression dictionaries in a system, it is not only
   the internals and integration points of the compressor and
   decompressor whose behavior must change.  Dictionaries make
   compression stateful, and applications that use dictionaries must
   therefore participate in the whole lifecycle of state management.

3.3.1.  Generating Dictionaries

   As noted in Section 3.2.1, some compression algorithms can accept
   arbitrary, unstructured inputs as dictionaries.  These unstructured
   dictionaries do not require an explicit generation step; users can
   simply repurpose existing messages as dictionaries.  This potentially



Handte                     Expires May 1, 2020                  [Page 6]


Internet-Draft       Compression Dictionary Security        October 2019


   avoids the need to perform additional coordination and communication
   to distribute purpose-built dictionaries.  See for example the
   Compression Dictionaries for HTTP/2 proposal
   [I-D.vkrasnov-h2-compression-dictionaries].

   Alternatively, the dictionary may be a separate object, purpose-built
   for the task.  Generating such a dictionary may be desirable for a
   number of reasons, including:

   o  Building a dictionary is necessary to produce the structure in a
      structured dictionary.

   o  Trained dictionaries generally perform better than using raw
      content.  The training process selects the parts of the sample
      corpus that are useful for compression and discard the parts that
      are not, producing a more compact and more effective dictionary.

   o  The training process is an opportunity to sanitize the content
      that ends up being used as a dictionary, potentially enhancing
      security and privacy (see Section 4.8).

   In general, an algorithm is run over a corpus of sample messages
   (such as the COVER algorithm [COVER] in Zstandard), which selects
   commonly occurring substrings and bundles them together.

   Any structured metadata (e.g., symbol distribution statistics) can
   then be calculated.  For example, Zstandard then compresses some of
   the sample messages it was given with the dictionary and aggregates
   the statistics resulting from those compressions and writes them into
   the dictionary's header.

3.3.2.  Identifying Dictionaries

   If freedom exists in a system as to which dictionary is to be used
   for a given message, there must be some way to distinguish which
   dictionary to use, so that decompressors can use the same one.  In
   practice, this means associating each dictionary with an identifier.

   Popular methods to do this include:

   Identity ID:  The "identifier" for the dictionary is the dictionary
      itself.  This is not really very popular, since information theory
      strongly suggests that a compressed message without a dictionary
      will always be smaller than a message compressed with a dictionary
      plus the dictionary.

   Arbitrary IDs:  The scheme associates an arbitrary identifier (e.g.,
      a number or string) with this dictionary.  This can have the



Handte                     Expires May 1, 2020                  [Page 7]


Internet-Draft       Compression Dictionary Security        October 2019


      advantage of being the most compact, but has the disadvantage that
      it neither describes the content of the dictionary nor where to
      get it.

   Content-Derived IDs:  Identifiers that are deterministically derived
      from the content they identify (such as hashes), when designed
      well, have the benefit that they can validate the associated
      dictionary without requiring trusting the dictionary source.
      (Though they are of course vulnerable to collision attacks.)  They
      have the disadvantage that they do not describe where to source
      the dictionary.  In order to be secure, they may also have to be
      relatively verbose.

   Location-Based IDs:  Identifiers of this form (notably, URLs) do not
      identify the content directly, but rather describe where to get
      it.  They are suitable insofar as that source can be trusted to
      reliably serve the same content to different participants.

3.3.2.1.  Existing Systems

   Existing compression schemes have selected the following
   identification systems:

   DEFLATE:  DEFLATE writes an Adler32 checksum of the dictionary into
      its compressed message header and checks it at decompression-time.

   Brotli:  Brotli always implicitly uses a single static dictionary.
      As such, no identifier is needed or provided [BROTLI].

   Shared Brotli:  Shared Brotli uses either a 256-bit Highwayhash
      digest of the dictionary or a direct pointer to the dictionary
      when it is included in the same compressed stream
      [I-D.vandevenne-shared-brotli-format].

   Zstandard:  Zstandard uses 32-bit integers to identify dictionaries
      [ZSTD].

   SDCH:  SDCH uses a URL to describe how to fetch a dictionary and then
      a hash (a 96-bit prefix of the SHA-256 digest of the dictionary)
      in negotiations [I-D.lee-sdch-spec].

   CDH2:  Compression Dictionaries for HTTP/2 uses an 8-bit integer
      [I-D.vkrasnov-h2-compression-dictionaries].








Handte                     Expires May 1, 2020                  [Page 8]


Internet-Draft       Compression Dictionary Security        October 2019


3.3.3.  Distributing Dictionaries

   Dictionaries must themselves be made accessible to participants.
   There are several possible approaches to doing this:

   static:  The protocol defines the set of dictionaries.  Protocol
      implementations can statically include or independently generate
      these dictionaries.  No further distribution mechanism is
      required.

   local:  When dictionaries are not specified by the protocol, but are
      derived locally or provided by the user, no dictionary
      distribution mechanism is required, although a negotiation
      mechanism might be.

   centralized:  The set of dictionaries in use by the system changes
      over time, coordinated by and available from a central authority.

   distributed:  The set of dictionaries in use by the system changes
      over time.  Some or all participants can generate and publish
      dictionaries.

3.3.4.  Selecting Dictionaries

   Related to the above, because the same dictionary must be used to
   compress and decompress a particular message, it is necessary for the
   compressor and decompressor to come to an understanding as to which
   dictionary they will use for a given message, presumably based on
   selecting which dictionary of those available to both the sender and
   receiver is most suitable.  This selection process can take multiple
   forms:

   implicit:  In situations where the compressor or protocol specifies a
      single dictionary that is always used (e.g., Brotli [BROTLI]), no
      particular selection process is required.  Use of the compression
      scheme at all (which may or may not itself be negotiated) is
      sufficient to identify the dictionary to use.

   unilateral:  When the set of dictionaries available to the
      decompressing agent is known to the compressing agent, the
      compressing agent may unilaterally select a dictionary to use, and
      include an identification of that dictionary in either the
      compressed message itself (e.g., Zstandard's Dictionary_ID field
      in the frame header) or in protocol metadata (e.g., an HTTP
      response header).  This mechanism can be applied in simple
      situations, such as when the set of dictionaries used by the
      protocol is fixed and guaranteed to be immediately available to
      all participants (such as by being included in the



Handte                     Expires May 1, 2020                  [Page 9]


Internet-Draft       Compression Dictionary Security        October 2019


      implementation's installation).  It can also be applied to a more
      loose definition of availability, if the decompressing agent is
      known to be capable of retrieving the dictionary based on the
      provided identifier, even if it doesn't have the dictionary at
      present.

   bilateral:  When the set of dictionaries available to each party is
      not known to the other, additional messages may be required in
      order for the compressing agent to select a dictionary available
      to both both parties.  In particular, while other negotiation
      patterns only require a flow of information from the compressor to
      the decompressor, which matches the flow of the compressed message
      itself, this mechanism requires communication in both directions.

3.3.5.  Using Dictionaries

   Having selected and retrieved a dictionary, it remains to actually
   present the dictionary to the compressor or decompressor and perform
   the compression operation.

   Dictionaries, whether structured or not, are flat byte streams.  In
   order to be used (especially in compression), most implementations
   require that a preparation step be performed on the content of the
   dictionary, populating the compressor's internal datastructures.

   This materialization of the dictionary can sometimes be performed
   transparently as part of the compression or decompression operation.
   Alternatively, some compressors allow this materialization step to be
   performed separately / explicitly.  When this capability is used, the
   work of processing the serialized dictionary into the compressor's
   internal datastructures only needs to be performed once, even when
   this materialized dictionary object is used for many compressions or
   decompressions.  This can lead to significant efficiencies.

3.3.6.  Deleting Dictionaries

   Dictionary compression inherently entangles the lifetimes of
   different pieces of data.  When a dictionary is generated, it
   collects and incorporates information about the data it was trained
   on (whether that be diffuse statistical information, small common
   substrings or tokens, or significant contiguous excerpts of the
   training data).  When that dictionary is used to compress a set of
   messages, it must be retained by the system for as long as the system
   desires to be able to decompress any of those messages.  The lifetime
   of information derived from individual messages is thus tied to the
   lifetime of many messages, or even the whole system.  This introduces
   complexities for systems that wish to minimize or bound the lifetime
   of individual pieces of data.



Handte                     Expires May 1, 2020                 [Page 10]


Internet-Draft       Compression Dictionary Security        October 2019


4.  Risks

   These subsections each describe a class of security issues that have
   been raised concerning dictionary-based compression and the
   surrounding protocol mechanisms.  Where possible and known,
   mitigations are described.

4.1.  Revealing Message Content

   This section discusses attacks that use dictionary compression to
   recover content in the message.

4.1.1.  By Observing Which Dictionary is Used

   Because dictionaries' effectiveness improves the more that they
   target a specific type of data, a protocol may want to use multiple
   dictionaries, each targeting a subclass of the system's traffic.
   Alternatively, a participant may always avoid using a dictionary in
   certain scenarios, such as when reporting an error.  When this is the
   case, the use of a particular dictionary or not for a message implies
   that the message belongs to the corresponding subclass of traffic.

   The metadata identifying which dictionary was used to compress a
   message should therefore be protected to the same extent that the
   message content is protected.  (Similarly, the choice of dictionary
   and any data exchanged in that selection process may reveal other
   information about the sender and receiver, independent of the content
   of the specific message being handled, which is discussed in
   Section 4.5.1.)

   This information may itself be inferred from other signals, and
   therefore serve as a stepping stone connecting those signals to
   conclusions about message content.

   Message Size:  Observations of message sizes, especially headers or
      connection negotiations (also discussed in Section 4.1.2), can
      indicate whether a dictionary was used, or even perhaps which
      dictionary was used.

   Timing:  Compression with and without a dictionary may take
      observably different amounts of time.  This is also discussed in
      Section 4.1.3.

   Dictionary Retrieval:  When dictionaries are retrieved dynamically,
      another vector for learning this information is simply observing
      whether a message triggers a fetch for a dictionary, and if so,
      which dictionary.  (This is also discussed in Section 4.5.2.)
      Protocols should consider decoupling retrieving dictionaries



Handte                     Expires May 1, 2020                 [Page 11]


Internet-Draft       Compression Dictionary Security        October 2019


      (especially when doing so is easily observable) from using them.
      For example, SDCH advertises and retrieves dictionaries
      independently of using them [I-D.lee-sdch-spec].

4.1.2.  By Observing Message Size

   By manipulating a portion of the message and observing the overall
   size of the compressed message, the attacker can recover information
   about the portions of the message not under its control [BREACH]
   [CRIME] [HEIST].  Given that dictionary-based compression is an
   extension of dictionary-less compression, it is certainly also
   vulnerable to this attack.

   In particular, the dictionary itself can be used in this sort of
   attack, to the extent that its contents are attacker-controlled.
   Note that the ability to control which dictionary is used may
   indirectly give an attacker the effective ability to modify the
   contents of the dictionary.

   Protocol designers should therefore prevent parties that will not
   have access to the message content from being able to influence the
   dictionary used to compress the message.

   In settings where the dictionary that is used is derived from
   previous traffic, especially if previous traffic is directly used as
   a dictionary, the problem of ensuring that private data and attacker-
   controlled data grows in complexity.  In such a scheme, the attacker
   may also be able to exercise more control over the content of the
   dictionary if they can influence the order in which messages are
   exchanged.  Protocols of this sort may wish to place strong controls
   on the kinds of messages that can be included in the dictionary.  See
   for example [I-D.vkrasnov-h2-compression-dictionaries].

   The remaining question is whether the dictionary constitutes a third
   class of data (fixed, known data), with distinct security properties.
   That is, even if the dictionary is neither under attacker control nor
   does it contain private information, can its use still reveal
   information about the contents of the message under compression.

4.1.2.1.  Mitigating with Padding

   One possible mitigation of the compressed message size oracle is to
   add padding to messages, either at the compression level or at the
   transport layer (e.g., [I-D.pironti-tls-length-hiding]).  Even simple
   padding schemes can significantly inflate the cost of mounting such
   an attack, if not mitigate it completely.





Handte                     Expires May 1, 2020                 [Page 12]


Internet-Draft       Compression Dictionary Security        October 2019


4.1.2.2.  Mitigating by Separating Content

   Another possible strategy to mitigate this attack is to avoid letting
   attacker-controlled data be matched against private data.  This can
   be accomplished by avoiding compressing one or the other, or by
   compressing them independently of each other.  See, e.g.,
   [CLOUDFLARE-NO-COMPRESS].

4.1.2.3.  Mitigating by Avoiding Repeated Compressions

   A crucial feature of these attacks is that they require the message
   under attack to be re-compressed many times (proportional to the
   amount of information being extracted).  The attack can therefore be
   mitigated either by limiting the number of times the same message can
   be compressed (rate-limiting), or by making sure that it is not the
   same message that is compressed every time.

   That is to say, these attacks are most effective when the attacker-
   controlled data is the only thing that is changing between
   compressions.  Changing or randomizing content (ideally, including
   the secrets in question) in the message on each compression can make
   it much harder to extract information.

4.1.3.  By Observing Timing

   Timing is another classic side-channel through which information can
   leak.  An attacker could potentially observe the time taken during
   compression or decompression, and draw conclusions about the contents
   of a message.  As discussed in Section 4.7.1.3.1, it is possible that
   a dictionary could affect the efficiency of compression and
   decompression.

   In addition, timing can act as a vector for extracting information
   from another side-channel.  As described in the HEIST attack [HEIST],
   compression ratio information can be leaked by counting round-trip
   latencies.

   Alternatively, while compression and decompression are usually
   relatively fast and fairly content-insensitive operations, retrieving
   and initializing a dictionary might be a high-latency operation, and
   therefore may be identifiable by observing timing.  Timing is
   therefore another potential avenue to observe which dictionary is
   used, which may in turn reveal information about the message being
   processed (Section 4.1.1).







Handte                     Expires May 1, 2020                 [Page 13]


Internet-Draft       Compression Dictionary Security        October 2019


4.2.  Revealing Dictionary Content

   This section investigates the ability to leverage dictionary-based
   compression to reveal data other than the message content being
   compressed (i.e., revealing content used as the dictionary).  Note
   that this is only of interest when there are secrets in the
   dictionary, which violates the common model that is mostly analyzed
   in this document, in which the dictionary is assumed to be a shared,
   public resource.

   In systems with multiple privacy domains, the ability to nominate
   arbitrary resources in that system as dictionaries poses a risk.

   Protocol designers and implementors should ensure that compressing
   and decompressing agents cannot use as dictionaries resources from
   privacy domains that either agent does not have access to.

   A corollary is that a transport system that mixes resources from
   multiple privacy domains into the same compression context through
   dictionary-based compression should not reveal the compressed
   representation of messages (or information derived from the
   compressed representation, such as its size) to other components of
   the system that are only trusted in a particular privacy domain.

4.2.1.  By Observing Message Size

   Analogously to Section 4.1.2, an attacker can exploit knowledge about
   the contents of a message and its compressed size to draw conclusions
   about the contents of the dictionary.

4.2.2.  In Compression

   If an attacker can inspect the compressed representation of a
   message, they may be able to draw conclusions about the contents of
   the dictionary that was used to compress it.  This is especially the
   case if the attacker knows the original message that was compressed
   (i.e., a known-plaintext attack) or if the attacker can supply the
   message to be compressed (i.e., a chosen-plaintext attack), and is
   helped if the attacker can cause the message to be compressed
   multiple times while varying some aspect of the compression.

4.2.3.  In Decompression

   In compression schemes that support the use of dictionaries, and
   especially unstructured dictionaries, it is possible to craft
   compressed messages independent of a dictionary in such a way that,
   when decompressed with a provided dictionary, the decompressed
   message that is produced will reveal information about the contents



Handte                     Expires May 1, 2020                 [Page 14]


Internet-Draft       Compression Dictionary Security        October 2019


   of the dictionary that was not known by the compressor (possibly
   trivially, by directly reproducing some or all of the dictionary's
   contents).

   Consider a protocol that allows a compressing agent to freely
   identify any other resource in the system as the dictionary for a
   message.  The compressing agent could select as a dictionary some
   resource to which the decompressing agent has access, but to which it
   does not.  Without access to that resource, it could nonetheless
   generate a compressed message the effect of which would be to
   reproduce that resource in part or in its entirety.  This message,
   decompressed by the target, would cause a resource in the compressing
   agent's trust domain to appear to have the contents of a resource it
   does not itself have access to.

   This could cause the decompressing agent to take some action that the
   compressing agent would not otherwise have had the authority to
   initiate.  Alternatively, with some additional mechanism, the
   compressing agent could then cause the decompressing agent to reveal
   the uncompressed message (i.e., the selected third-party resource)
   back to the compressing agent.

4.3.  Manipulating Message Content

   When the decompressing agent uses a different dictionary to
   decompress a message than was used to compress the message (which is
   possible due to confusion on the part of either the compressing or
   decompressing agent), the reconstituted message produced by
   decompression may differ from the original message the compressing
   agent intended.

   An attacker that can induce this situation can therefore use
   dictionary compression to manipulate the perceived content of
   messages, even when they cannot directly manipulate the contents of
   the messages themselves.

   A particular implication of this is that a compressed message may
   have multiple interpretations.  In one context (with one dictionary),
   the message can be constructed so as to appear benign or to pass a
   validation or authentication step when decompressed.  Later, if a
   different component or agent can be induced to decompress the same
   message with a different dictionary, the reconstructed message may be
   completely different.

   A general mitigation against this attack is to specify mechanisms to
   validate the integrity of the message.  In particular, it may be
   desirable to validate the ultimate, uncompressed message, rather than
   validating the various components that the decompressing agent relies



Handte                     Expires May 1, 2020                 [Page 15]


Internet-Draft       Compression Dictionary Security        October 2019


   on to reconstitute the uncompressed message--the compressed message,
   the metadata identifying the dictionary, the associated dictionary
   contents, etc.  (However, this has its own problems
   [ENCRYPT-THEN-AUTHENTICATE].)

4.3.1.  By Manipulating Message Content

   The degenerate version of this attack is to manipulate the
   uncompressed message by directly manipulating the compressed
   representation of the message.  In such a scenario, the presence or
   absence of a dictionary is irrelevant.  In most cases, this attack is
   defended against by some scheme that protects the integrity of the
   compressed message.

   However, it is useful to point this attack out, as the other attacks
   in this space aim to achieve the same result indirectly, and may do
   so by exploiting protocols which protect the integrity of the
   compressed message, but perhaps not its metadata describing which
   dictionary to use nor the contents of that dictionary, such as might
   arise particularly if dictionary-based compression is an extension to
   an existing protocol.

4.3.2.  By Manipulating Dictionary Content

   One possible avenue for this kind of attack is to cause the
   compressing agent and decompressing agent to have differing views of
   the same dictionary (whether by manipulating a participant's local
   copy or by causing a fetch to return different results for different
   users or otherwise).

   Protocol designers should therefore take care to protect the
   integrity of dictionaries.  Two broad strategies exist to do so.

4.3.2.1.  Mitigating by Validating Dictionary Contents

   In the first, the identifier for the dictionary may itself be used to
   validate the contents that are retrieved, if the identifier scheme
   includes a cryptographically secure digest of the identified
   dictionary's contents (see Section 3.3.2).  Alternatively, even if
   the identifier itself does not provide for , designers should specify
   other mechanisms to ensure the integrity and correctness of
   dictionaries (signatures, checksums, etc.).  See for example schemes
   like Subresource Integrity [SRI].








Handte                     Expires May 1, 2020                 [Page 16]


Internet-Draft       Compression Dictionary Security        October 2019


4.3.2.2.  Mitigating by Validating Dictionary Sources

   Alternatively, participants can rely on a secure chain of custody
   from a trusted source. ... [TODO]

   In practice, it is probably advisable to implement both mitigations
   in some form.

4.3.3.  By Manipulating Dictionary Identifiers

   Another similar attack is to cause the different agents to have
   differing views of which dictionary to use.  That is, even if the
   integrities of compressed messages and dictionary contents are
   protected, if the association between one and the other can be
   manipulated, the same effect can be achieved.

4.4.  Obfuscating Message Content

   This section discusses attacks that obfuscate a malicious response's
   content through the use of dictionary-based compression.

4.4.1.  From Intermediaries

   Various internet protocols exchange messages through intermediaries
   which inspect or modify the traffic as it passes by (proxies, caches,
   firewalls, etc.), sometimes for reasons that include security.  If
   the compressing and decompressing agents on a connection use a
   dictionary to compress the messages they exchange, and the
   intermediaries between them are not themselves capable of processing
   messages compressed this way, the intermediaries may be prevented
   from being able to inspect the traffic, which may harm their ability
   to detect and filter malicious traffic.

   In practice, the relevance of this concern is questionable.
   Intermediaries of this form [PERVASIVE-MONITORING] can be more
   harmful than they are beneficial to the security of participants and
   their traffic.  Many protocols are moving towards end-to-end
   encrypted models that preclude intermediaries from interacting with
   messages in this way.

   Nonetheless, designers of protocols that involve intermediaries that
   might not support dictionary based compression should give those
   intermediaries the ability to downgrade the message exchange to not
   use dictionaries.  Intermediaries which inspect messages in the
   course of their business should either implement the dictionary based
   compression scheme in question or downgrade the message exchange to
   avoid its use.




Handte                     Expires May 1, 2020                 [Page 17]


Internet-Draft       Compression Dictionary Security        October 2019


4.4.2.  Multiple Representations

   Although the majority (if not the entirety) of compression schemes do
   not guarantee determinism in compression, many implementations are
   deterministic in practice (under fixed parameters).  Experience has
   demonstrated that this state of affairs sometimes entices
   implementors into confusing equality-of-message comparison with
   equality-of-representation comparison.  Representing the same message
   in a new way can therefore violate assumptions and potentially be
   used as a vector for exploitation.  Dictionaries potentially
   contribute to this issue, by introducing a new vector for non-
   determinacy in the compressed representation of a message.

   Users of compression should therefore avoid assumptions that a
   message will always be transformed into the same compressed
   representation.

4.5.  Tracking Users

   This section discusses attacks that identify users through their
   negotiation and use of dictionaries.

   Like any other protocol extension or option, the use or advertisement
   of dictionaries, may allow observers to distinguish participants that
   do and do not support the feature.

4.5.1.  Through Dictionary Negotiation

   In systems which distribute dictionaries dynamically, a participant
   or observer may be able to learn about the past actions of other
   participants by observing the dictionaries they advertise or select.

   For example, if a user exchanged messages with some site
   (www.mybank.com), and in doing so acquired dictionaries published by
   that operator, and then sometime later negotiated a connection with
   some other site (www.curiousaboutyou.com), in which the user
   advertised the dictionaries in their possession, the second operator
   could reasonably conclude that the user had a bank account at MyBank.

   Designers of protocols that use dynamically distributed and
   negotiated dictionaries should therefore take care that dictionaries
   distributed in one privacy domain are not advertised or used in
   others without reason.








Handte                     Expires May 1, 2020                 [Page 18]


Internet-Draft       Compression Dictionary Security        October 2019


4.5.2.  Through Dictionary Retrieval

   The distributor of a dictionary may also be able track the
   propagation of traffic amongst participants as it receives requests
   for a particular dictionary, especially if it can collude with the
   party that generated that message to use a unique dictionary
   identifier.

   Dictionaries that are dynamically fetched should therefore be fetched
   from the same privacy domain they are used in.

4.6.  Denial of Service

   Because dictionary-based compression introduces additional
   dependencies to the processes of generating and interpreting
   messages, an attacker that cause those dependencies to be unavailable
   can potentially cause participants to fail to process messages.

   Protocols that use dictionary-based compression, especially when the
   dictionaries are retrieved in ways that could fail, should be
   prepared to gracefully degrade when those fetches fail.  Designers
   may consider whether messages should only be compressed with
   dictionaries known to already be in the possession of the recipients.

4.7.  Resource Exhaustion

   This section discusses attacks that use dictionaries and dictionary-
   based compression to induce failures through the exhaustion of
   various resources.

   Aside from more specific concerns and corresponding protections
   discussed in the following sections, implementors should take care to
   apply at least the same resource usage constraints to dictionaries
   that they do to the other traffic they handle.  Stronger constraints
   may be warranted, in fact, since the goal of dictionaries is to lower
   total resource consumption.

4.7.1.  Resources

4.7.1.1.  Bandwidth

   Attacks of this form cause the target to consume their network
   resources, resulting in expense and degradation of service.








Handte                     Expires May 1, 2020                 [Page 19]


Internet-Draft       Compression Dictionary Security        October 2019


4.7.1.1.1.  Messages

   If dictionaries can be used to make the compressed representation of
   messages artificially large, it may be possible to cause normal
   traffic to consume disproportionately large bandwidth.  With existing
   dictionary schemes, this is unlikely.

   The reverse is also potentially dangerous, though.  Systems that are
   accustomed to using dictionary-based compression (and whose resources
   are allocated according to the efficiencies achieved thereby) may be
   vulnerable to resource exhaustion when subjected to downgrade
   attacks.  If an attacker can force the system to fall back to not
   using dictionaries, or to using bad dictionaries, or to not using
   compression at all, the system may exceed its allocated network
   resources.

4.7.1.1.2.  Dictionaries

   In protocols in which dictionaries are distributed dynamically, it
   may be possible to cause a target to repeatedly attempt to fetch
   dictionaries, whether by causing dictionary fetches to fail,
   triggering retries, or by causing the target to use many new
   dictionaries that it must then load.

   Since dictionaries can be quite large relative to the messages they
   are used to compress, this could potentially be an effective
   amplification attack.

4.7.1.2.  Storage

   Attacks of this form target the storage resources of a participant
   (any of main memory, cache, disk space, etc.).

4.7.1.2.1.  Message Size

   The same concerns apply here as in Section 4.7.1.1.1.

   Additionally, if dictionaries can be used to make the compressed
   representation of a message extremely small relative to the its
   uncompressed size, they may play a role in enabling a "zip bomb" type
   attack, in which a specially crafted, small (and therefore cheap to
   send) message causes the recipient to consume a huge amount of
   storage space after decompression.

   Implementors should therefore apply storage quotas to messages based
   on the size of the representation in which they will actually be
   stored.  Implementors may also wish to consider rejecting messages




Handte                     Expires May 1, 2020                 [Page 20]


Internet-Draft       Compression Dictionary Security        October 2019


   whose compressed representation is significantly larger than the
   message represented.

4.7.1.2.2.  Message Duplication

   Obviously, flooding a target with messages is an easy way exhaust
   that participant's resources.  Using a dictionary does not natively
   affect that brute force strategy.  However, simple mitigations to
   this sort of attack sometimes leave chinks in systems' armor, which
   dictionaries might play a role in exploiting.

   For example, if an attacker can cause a participant to receive and
   store a single logical message more than once, with different
   metadata (such as the dictionary used) or with a different compressed
   representation (as a result of using a different dictionary), the
   participant may not be able or willing to deduplicate the message.
   For example, an HTTP Cache may be forced to store the same resource
   multiple times, compressed with different dictionaries, if the choice
   of dictionary is part of the cache's secondary key [HTTP-CACHING].

4.7.1.2.3.  Dictionaries

   Another possible avenue of attack would be to cause a participant to
   consume space by storing the dictionaries themselves.  The
   effectiveness of attacks of this form are driven by the product of
   (1) the number of dictionaries stored, (2) their size, and (3) how
   long they are retained.

   Dictionaries may themselves be fairly large.  But one thing to note
   in particular is that, when in use, the space consumed by a
   dictionary may be significantly greater than its raw size.  In order
   to be used in compression or decompression (but particularly in
   compression), the dictionary contents must be loaded into the
   compressor's internal datastructures.  This can be done at
   compression-time, for every compression, using the datastructures
   already allocated for that compression.

   Alternatively, some compression algorithms allow the user to do this
   preparation step separately, producing a materialized representation
   of the dictionary in memory that can be reused across a number of
   compression operations (e.g., a ZSTD_CDict).  While this avoids
   duplicated work (processing the dictionary for each compression),
   applications which cache these materialized dictionaries can
   accidentally consume a lot of memory.  In addition to the factors
   mentioned above that control the total size of stored dictionaries,
   the expansion factor as those dictionaries are materialized is
   controlled by the compression settings (and potentially instructions
   in the dictionary).



Handte                     Expires May 1, 2020                 [Page 21]


Internet-Draft       Compression Dictionary Security        October 2019


   Applications that allow other participants to influence the contents,
   number, size, retention period, or compression settings of
   dictionaries should take care to constrain the total at rest and in-
   memory footprints of those dictionaries.

4.7.1.3.  Computation

   Attacks of this form target the computational resources (and by
   extension, the time and energy) of a participant in the protocol.

4.7.1.3.1.  Using a Dictionary

   For existing compressors that support dictionaries, compression and
   decompression with a dictionary is usually faster than without one.

   However, as the kinds of information captured in dictionaries grows,
   as described in Section 3.2.2, dictionaries may come to include
   instructions that significantly influence the speed of the
   compressor.  For example, dictionaries might specify a particularly
   laborious transformation to be performed on the input.  Or they might
   specify internal compression parameters, which might instruct the
   compressor to do huge amounts of work during compression.

   If dictionary-based compressions systems evolve to include these
   sorts of features, care should be taken to avoid allowing
   dictionaries from untrusted sources to influence compression behavior
   or parameters.  Note: this is not a concern for existing
   dictionaries.

   Analogously, care should be taken to avoid allowing dictionaries to
   influence decompression performance.

4.7.1.3.2.  Generating Dictionaries

   Training a dictionary, depending on the methodology, can be a very
   expensive computation (building an optimal dictionary is NP-hard).
   Designers of protocols that involve creating new dictionaries on the
   fly should constrain either or both of (1) who can cause a
   participant to train a new dictionary and (2) the computational cost
   of training a new dictionary (by selecting a fast algorithm or by
   limiting the amount of data over which the algorithm is run).

4.7.2.  Targets

   In addition to the immediate compressing and decompressing agents,
   the mechanisms surrounding dictionary-based compression may allow for
   the targeting of other agents.




Handte                     Expires May 1, 2020                 [Page 22]


Internet-Draft       Compression Dictionary Security        October 2019


4.7.2.1.  An Intermediary

   Insofar as intermediaries in internet protocols are often responsible
   for handling a much higher volume of traffic in a much lighter-weight
   way than protocol endpoints, any additional per-message or per-
   connection burden has the potential to significantly increase the
   workload of the intermediary.  Retrieving, caching, and processing
   dictionaries, especially when the set of dictionaries is unbounded,
   is potentially untenable for intermediaries of that type.

4.7.2.2.  A Third Party

   The mechanisms surrounding dictionary-based compression potentially
   also enable attacks against third parties, including parties with
   whom the attacker cannot exchange messages directly.

   If a recipient can be induced to relay messages to a third-party, or
   to generate new messages directed at a third-party, a third party can
   become the effective recipient of dictionary-compressed traffic.  If
   the dictionaries used to compress these messages are hard or slow to
   load (or even non-existent), the work of handling these messages
   could be significant.  This is especially a risk when decompression
   of the message is required before it can be evaluated against an
   access-control policy or otherwise distinguished from legitimate
   traffic.

   Protocol designers should therefore consider carefully the risks of
   using dictionary-based compression on (the parts of) messages that
   are used for authentication.

   Another possible attack, when dictionaries are distributed
   dynamically, arises from the ability for compressed messages to
   trigger the retrieval of a dictionary from a third party.  This is
   especially a risk when the source for a dictionary can be arbitrarily
   specified (as, for example, a URL).

   These approaches potentially allow an attacker to amplify their
   efforts and turn their attack into a distributed one.

   Protocol designers should consider how the source for the retrieval
   of a dictionary is derived, who can influence that derivation, and
   whether it should be constrained to preclude nominating a third
   party.

   Protocol designers and implementors who relay messages should also
   consider whether those messages should be relayed compressed with the
   same dictionary, or whether dictionary selection and negotiation
   should occur for each hop in the path of a message.



Handte                     Expires May 1, 2020                 [Page 23]


Internet-Draft       Compression Dictionary Security        October 2019


4.8.  Generating Dictionaries

   This section discusses the potential for inadvertent leakage of
   private information in the creation of dictionaries.

   As described in Section 3.3.1, dictionaries are commonly generated by
   an algorithm run over a corpus sampled from the application's
   traffic.  For systems which wish to publish dictionaries publicly
   (or, at any rate, with less strict access controls than the traffic
   on which they are trained), it is important to prevent the leakage of
   private information in the creation of dictionaries.

   The output of this training process, the dictionary, as described in
   Section 3.2, may be composed of several different kinds of data.
   Some of these pieces, like statistical summaries around symbol
   frequencies, are unlikely to represent vectors for leaking useful
   information about the corpus they were trained on.  Other components,
   however, directly represent substrings found in the input corpus.

   Protocol designers, implementors, and participants that construct
   their own dictionaries should take care to do so in a way that does
   not reproduce private data in the produced dictionaries' contents.

4.8.1.  Handling Samples

   Since dictionaries are generally produced from a collection of sample
   data, implementing a dictionary training capability may require
   storing or otherwise handling message traffic in ways it would
   otherwise not.  This in itself can create an attack surface, for
   example if secrets that would normally exist only in transit or in
   memory are persisted or passed to other systems.

   Care should be taken by implementors to protect the security of
   messages that are selected as samples for future use in dictionary
   training.  Protections should be implemented both at rest and in
   transit, including retention limits, so as to limit the window of
   compromise.

4.8.2.  Tagging Mitigations

   One strategy for ensuring that private data does not appear in
   dictionaries is to avoid presenting private data to the training
   algorithm at all.  This sanitization of the training samples can be
   accomplished either by removing just the specific parts of samples
   that are private or by entirely removing samples that contain any
   private data in them.





Handte                     Expires May 1, 2020                 [Page 24]


Internet-Draft       Compression Dictionary Security        October 2019


   This discrimination of private and public content can rely on being
   able to identify private information on sight (e.g.,
   [CLOUDFLARE-NO-COMPRESS]).

   Alternatively, the trainer can rely on explicit signals, provided
   alongside the messages, to perform that discrimination.

4.8.3.  Probabilistic Mitigations

   Another strategy relies on a statistical approach for the
   identification and removal of private information.

   In building the dictionary's contents, the goal of the dictionary
   training algorithm is to collect the set of strings that most
   effectively improve the compression ratio of messages in the corpus.
   This goal is best served by including strings that appear frequently
   in the sample corpus and rejecting strings that appear rarely.

   In a loose way, it is reasonable to expect that commonly occurring
   substrings are less private, and rarely occurring substrings may be
   more private.  So the dictionary trainer's interests are broadly
   aligned with this goal of not including private information in the
   dictionary.

   While existing public dictionary training algorithms largely do not
   include specific protections or offer hard guarantees to prevent the
   inclusion of private data in their output, there is ongoing research
   in this area.  Future algorithms may be able to provide confidence
   that private data (that is not somehow overrepresented in the
   training corpus) will be filtered out of the produced dictionary.

4.9.  Complexity

   Complexity is ever the enemy of security.  It is unavoidably the case
   that dictionary-based compression is more complicated than stateless
   compression.

5.  Conclusions

   This document attempts to analyze risks and responses at the
   intersection of several widely varying factors--the protocol, the
   environment, the threat model--and its conclusions are necessarily
   situational.

   From that space of configurations, some broad conclusions can
   nonetheless be drawn.  Much of the complexity and risk in
   implementing dictionary-based compression comes from its surrounding
   apparatus: creating dictionaries, handling them, distributing them,



Handte                     Expires May 1, 2020                 [Page 25]


Internet-Draft       Compression Dictionary Security        October 2019


   storing them, identifying them, and so on.  A significant distinction
   can therefore be drawn between systems that have to grapple with
   those challenges versus those that don't.

   [TODO]

6.  IANA Considerations

   This document includes no actions for IANA.

   [RFC Editor: Please remove this section before publication.]

7.  Security Considerations

   This document enumerates known security considerations about a space
   that is under development.  The list of issues discussed above may
   not be exhaustive, but it is hopefully complete enough to aid in the
   design and implementation of future systems and protocols.

8.  References

8.1.  Normative References

   [BROTLI]   Alakuijala, J. and Z. Szabadka, "Brotli Compressed Data
              Format", RFC 7932, DOI 10.17487/RFC7932, July 2016,
              <https://www.rfc-editor.org/info/rfc7932>.

   [DEFLATE]  Deutsch, P., "DEFLATE Compressed Data Format Specification
              version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996,
              <https://www.rfc-editor.org/info/rfc1951>.

   [ZSTD]     Collet, Y. and M. Kucherawy, Ed., "Zstandard Compression
              and the application/zstd Media Type", RFC 8478,
              DOI 10.17487/RFC8478, October 2018,
              <https://www.rfc-editor.org/info/rfc8478>.

8.2.  Other Examples of Dictionary-Like Compression

   [HPACK]    Peon, R. and H. Ruellan, "HPACK: Header Compression for
              HTTP/2", RFC 7541, DOI 10.17487/RFC7541, May 2015,
              <https://www.rfc-editor.org/info/rfc7541>.

   [HTTP-DELTA-ENCODING]
              Mogul, J., Krishnamurthy, B., Douglis, F., Feldmann, A.,
              Goland, Y., van Hoff, A., and D. Hellerstein, "Delta
              encoding in HTTP", RFC 3229, DOI 10.17487/RFC3229, January
              2002, <https://www.rfc-editor.org/info/rfc3229>.




Handte                     Expires May 1, 2020                 [Page 26]


Internet-Draft       Compression Dictionary Security        October 2019


   [I-D.ietf-quic-qpack]
              Krasic, C., Bishop, M., and A. Frindell, Ed., "QPACK:
              Header Compression for HTTP/3", draft-ietf-quic-qpack-10
              (work in progress), 2019.

   [I-D.lee-sdch-spec]
              Butler, J., Lee, W., McQuade, B., and K. Mixter, "A
              Proposal for Shared Dictionary Compression over HTTP",
              draft-lee-sdch-spec-00 (work in progress), October 2016.

   [I-D.reschke-http-oob-encoding]
              Reschke, J. and S. Loreto, "'Out-Of-Band' Content Coding
              for HTTP", draft-reschke-http-oob-encoding-12 (work in
              progress), 2017.

   [I-D.vandevenne-shared-brotli-format]
              Alakuijala, J., Duong, T., Kliuchnikov, E., Obryk, R.,
              Szabadka, Z., and L. Vandevenne, Ed., "Shared Brotli
              Compressed Data Format", draft-vandevenne-shared-brotli-
              format-04 (work in progress), August 2019.

   [I-D.vkrasnov-h2-compression-dictionaries]
              Krasnov, V. and Y. Weiss, "Compression Dictionaries for
              HTTP/2", draft-vkrasnov-h2-compression-dictionaries-03
              (work in progress), 2018.

8.3.  Informative References

   [BREACH]   Prado, A., Harris, N., and Y. Gluck, "BREACH: SSL, Gone in
              30 Seconds", 2013, <https://breachattack.com/>.

   [CLOUDFLARE-NO-COMPRESS]
              Loring, B., "A Solution to Compression Oracles on the
              Web", March 2018, <https://blog.cloudflare.com/
              a-solution-to-compression-oracles-on-the-web/>.

   [COOKIES]  Barth, A., "HTTP State Management Mechanism", RFC 6265,
              DOI 10.17487/RFC6265, April 2011,
              <https://www.rfc-editor.org/info/rfc6265>.

   [COVER]    Liao, K., Petri, M., Moffat, A., and A. Wirth, "Effective
              Construction of Relative Lempel-Ziv Dictionaries",
              DOI 10.1145/2872427.2883042, 2016,
              <https://doi.org/10.1145/2872427.2883042>.

   [CRIME]    Rizzo, J. and T. Duong, "Compression Ratio Info-leak Made
              Easy", 2012, <https://www.ekoparty.org/archive/2012/
              CRIME_ekoparty2012.pdf>.



Handte                     Expires May 1, 2020                 [Page 27]


Internet-Draft       Compression Dictionary Security        October 2019


   [ENCRYPT-THEN-AUTHENTICATE]
              Krawczyk, H., "The Order of Encryption and Authentication
              for Protecting Communications (Or: How Secure is SSL?)",
              2001, <https://iacr.org/archive/crypto2001/21390309.pdf>.

   [HEIST]    Vanhoef, M. and T. Van Goethem, "HEIST: HTTP Encrypted
              Information can be Stolen through TCP-windows", 2016,
              <https://tom.vg/papers/heist_blackhat2016.pdf>.

   [HTTP-CACHING]
              Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
              RFC 7234, DOI 10.17487/RFC7234, June 2014,
              <https://www.rfc-editor.org/info/rfc7234>.

   [I-D.arkko-arch-internet-threat-model]
              Arkko, J., "Changes in the Internet Threat Model", draft-
              arkko-arch-internet-threat-model-01 (work in progress),
              July 2019.

   [I-D.draft-farrell-etm]
              Farrell, S., "We're gonna need a bigger threat model",
              draft-farrell-etm-03 (work in progress), July 2019.

   [I-D.draft-kucherawy-httpbis-dict-sec]
              Kucherawy, M., "Security Considerations Regarding
              Compression Dictionaries", draft-kucherawy-httpbis-dict-
              sec-00 (work in progress), November 2018.

   [I-D.pironti-tls-length-hiding]
              Pironti, A. and N. Mavrogiannopoulos, "Length Hiding
              Padding for the Transport Layer Security Protocol", draft-
              pironti-tls-length-hiding-02 (work in progress), September
              2013.

   [LZ77]     Ziv, J. and A. Lempel, "A Universal Algorithm for
              Sequential Data Compression",
              DOI 10.1109/TIT.1977.1055714, May 1977,
              <https://ieeexplore.ieee.org/document/1055714>.

   [LZ78]     Ziv, J. and A. Lempel, "Compression of individual
              sequences via variable-rate coding",
              DOI 10.1109/TIT.1978.1055934, September 1978,
              <https://ieeexplore.ieee.org/document/1055934>.







Handte                     Expires May 1, 2020                 [Page 28]


Internet-Draft       Compression Dictionary Security        October 2019


   [PERVASIVE-MONITORING]
              Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014, <https://www.rfc-editor.org/info/rfc7258>.

   [PRIVACY]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              DOI 10.17487/RFC6973, July 2013,
              <https://www.rfc-editor.org/info/rfc6973>.

   [RFC2360]  Scott, G., "Guide for Internet Standards Writers", BCP 22,
              RFC 2360, DOI 10.17487/RFC2360, June 1998,
              <https://www.rfc-editor.org/info/rfc2360>.

   [SECURITY-GUIDELINES]
              Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              DOI 10.17487/RFC3552, July 2003,
              <https://www.rfc-editor.org/info/rfc3552>.

   [SRI]      Akhawe, D., Braun, F., Marier, F., and J. Weinberger,
              "Subresource Integrity", March 2014,
              <https://www.w3.org/TR/SRI/>.

   [ZSTD-DICTS]
              Collet, Y., Handte, W., and N. Terrell, "5 ways Facebook
              improved compression at scale with Zstandard", December
              2018, <https://code.fb.com/core-data/zstandard/>.






















Handte                     Expires May 1, 2020                 [Page 29]


Internet-Draft       Compression Dictionary Security        October 2019


Appendix A.  Acknowledgements

   The author wishes to acknowledge the following for their help in
   writing and improving this document: Murray Kucherawy, Yann Collet,
   Nick Terrell, ... [TODO]

Author's Address

   W. Felix P. Handte
   Facebook, Inc.
   770 Broadway
   New York, NY  10003
   US

   EMail: felixh@fb.com




































Handte                     Expires May 1, 2020                 [Page 30]


Html markup produced by rfcmarkup 1.129d, available from https://tools.ietf.org/tools/rfcmarkup/