IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                             Hitachi Europe
Expires: April 14, 2010                                 October 11, 2009


                     IP Flow Anonymisation Support
                      draft-ietf-ipfix-anon-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 14, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document describes anonymisation techniques for IP flow data and
   the export of anonymised data using the IPFIX protocol.  It provides
   a categorization of common anonymisation schemes and defines the



Boschi & Trammell        Expires April 14, 2010                 [Page 1]

Internet-Draft        IP Flow Anonymisation Support         October 2009


   parameters needed to describe them.  It provides guidelines for the
   implementation of anonymised data export and storage over IPFIX, and
   describes an Options-based method for anonymisation metadata export
   within the IPFIX protocol, providing the basis for the definition of
   information models for configuring anonymisation techniques within an
   IPFIX Metering or Exporting Process, and for reporting the technique
   in use to an IPFIX Collecting Process.


Table of Contents

   1.  Introduction
     1.1.  IPFIX Protocol Overview
     1.2.  IPFIX Documents Overview
     1.3.  Anonymisation within the IPFIX Architecture
   2.  Terminology
   3.  Categorisation of Anonymisation Techniques
   4.  Anonymisation of IP Flow Data
     4.1.  IP Address Anonymisation
       4.1.1.  Truncation
       4.1.2.  Random Permutation
       4.1.3.  Prefix-preserving Pseudonymisation
     4.2.  Hardware Address Anonymisation
       4.2.1.  Random Permutation
       4.2.2.  Structured Pseudonymisation
     4.3.  Timestamp Anonymisation
       4.3.1.  Precision Degradation
       4.3.2.  Enumeration
       4.3.3.  Random Time Shifts
     4.4.  Counter Anonymisation
       4.4.1.  Precision Degradation
       4.4.2.  Binning
       4.4.3.  Random Noise Addition
     4.5.  Anonymisation of Other Flow Fields
       4.5.1.  Binning
       4.5.2.  Random Permutation
   5.  Parameters for the Description of Anonymisation Techniques
     5.1.  Stability
     5.2.  Truncation Length
     5.3.  Bin Map
     5.4.  Permutation
     5.5.  Shift Amount
   6.  Anonymisation Export Support in IPFIX
     6.1.  Anonymisation Options Template
     6.2.  Recommended Information Elements for Anonymisation Metadata
       6.2.1.  anonymisationStability
       6.2.2.  anonymisationTechnique
       6.2.3.  informationElementIndex
   7.  Applying Anonymisation Techniques to IPFIX Export and Storage
     7.1.  Arrangement of Processes in IPFIX Anonymisation
     7.2.  IPFIX-Specific Anonymisation Guidelines
       7.2.1.  Appropriate Use of Information Elements for Anonymised Data
       7.2.2.  Anonymisation of Header Data
       7.2.3.  Anonymisation of Options Data
   8.  Examples
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. References
     12.1. Normative References
     12.2. Informative References
   Authors' Addresses


1.  Introduction

   The standardisation of an IP flow information export protocol
   [RFC5101] and associated representations removes a technical barrier
   to the sharing of IP flow data across organizational boundaries and
   with network operations, security, and research communities for a
   wide variety of purposes.  However, with wider dissemination come
   greater risks to the privacy of the users of networks under
   measurement, and to the security of those networks.  While it is not
   a complete solution to the issues posed by distribution of IP flow
   information, anonymisation (i.e., the deletion or transformation of
   information that is considered sensitive and could be used to reveal
   the identity of subjects involved in a communication) is an important
   tool for the protection of privacy within network measurement
   infrastructures.

   This document presents a mechanism for representing anonymised data
   within IPFIX and guidelines for using it.  It begins with a
   categorization of anonymisation techniques.  It then describes the
   applicability of each technique to commonly anonymisable fields of IP
   flow data, organized by information element data type and semantics
   as in [RFC5102]; enumerates the parameters required by each of the
   applicable anonymisation techniques; and provides guidelines for the
   use of each of these techniques in accordance with best practices in
   data protection.  Finally, it specifies a mechanism for exporting
   anonymised data and binding anonymisation metadata to templates using
   IPFIX Options.

1.1.  IPFIX Protocol Overview

   In the IPFIX protocol, { type, length, value } tuples are expressed
   in Templates containing { type, length } pairs, specifying which
   { value } fields are present in Data Records conforming to the
   Template, giving great flexibility as to what data is transmitted.
   Since Templates are sent very infrequently compared with Data
   Records, this results in significant bandwidth savings.  Different
   data formats may be transmitted simply by sending new Templates
   specifying the { type, length } pairs for the new data format.  See
   [RFC5101] for more information.
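
   As a rough illustration of this separation, the Python sketch below
   encodes one Template Set and one Data Set following the Set and
   Template Record layouts of [RFC5101].  It omits the IPFIX Message
   header and Set padding; the helper names, the Template ID 256, and
   the field values are arbitrary choices made for the sketch.

```python
import struct

# Information Element IDs from the IPFIX Information Model [RFC5102].
OCTET_DELTA_COUNT = 1
SOURCE_IPV4_ADDRESS = 8
DESTINATION_IPV4_ADDRESS = 12

TEMPLATE_SET_ID = 2  # Set ID identifying a Template Set
TEMPLATE_ID = 256    # arbitrary choice; Data Set IDs start at 256


def encode_template_set(template_id, fields):
    """Template Set: a list of { type, length } pairs, with no values."""
    body = struct.pack("!HH", template_id, len(fields))
    for ie_id, length in fields:
        body += struct.pack("!HH", ie_id, length)
    # Set header: Set ID and total Set length including the 4-byte header.
    return struct.pack("!HH", TEMPLATE_SET_ID, 4 + len(body)) + body


def encode_data_set(template_id, records):
    """Data Set: only the { value } fields, laid out in Template order."""
    body = b"".join(records)
    return struct.pack("!HH", template_id, 4 + len(body)) + body


fields = [(SOURCE_IPV4_ADDRESS, 4), (DESTINATION_IPV4_ADDRESS, 4),
          (OCTET_DELTA_COUNT, 8)]
template = encode_template_set(TEMPLATE_ID, fields)

record = (bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7])
          + struct.pack("!Q", 1500))
data = encode_data_set(TEMPLATE_ID, [record, record])
```

   Here the Template costs 20 bytes and is sent infrequently, while each
   Data Record costs only the 16 bytes of its values.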

   The IPFIX information model [RFC5102] defines a large number of
   standard Information Elements which provide the necessary { type }
   information for Templates.  The use of standard elements enables
   interoperability among different vendors' implementations.
   Additionally, non-standard enterprise-specific elements may be
   defined for private use.

1.2.  IPFIX Documents Overview

   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
   Flow Information" [RFC5101] and its associated documents define the
   IPFIX Protocol, which provides network engineers and administrators
   with access to IP traffic flow information.

   "Architecture for IP Flow Information Export" [RFC5470] defines the
   architecture for the export of measured IP flow information out of an
   IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
   terminology used to describe the elements of this architecture, per
   the requirements defined in "Requirements for IP Flow Information
   Export" [RFC3917].  The IPFIX Protocol document [RFC5101] then covers
   the details of the method for transporting IPFIX Data Records and
   Templates via a congestion-aware transport protocol from an IPFIX
   Exporting Process to an IPFIX Collecting Process.

   "Information Model for IP Flow Information Export" [RFC5102]
   describes the Information Elements used by IPFIX, including details
   on Information Element naming, numbering, and data type encoding.
   Finally, "IPFIX Applicability" [RFC5472] describes the various
   applications of the IPFIX protocol and their use of information
   exported via IPFIX, and relates the IPFIX architecture to other
   measurement architectures and frameworks.

   Additionally, the "Specification of the IPFIX File Format"
   [I-D.ietf-ipfix-file] describes a file format based upon the IPFIX
   Protocol for the storage of flow data.

   This document references the Protocol and Architecture documents for
   terminology, and extends the IPFIX Information Model to provide new
   Information Elements for anonymisation metadata.  The anonymisation
   techniques described herein are equally applicable to the IPFIX
   Protocol and data stored in IPFIX Files.

1.3.  Anonymisation within the IPFIX Architecture

   "Architecture for IP Flow Information Export" [RFC5470] defines the
   functions performed in sequence by the various functional blocks in
   an IPFIX Device as in the figure below.


                    Packet(s) coming into Observation Point(s)
                      |                                   |
                      v                                   v
     +----------------+-------------------------+   +-----+-------+
     |          Metering Process on an          |   |             |
     |             Observation Point            |   |             |
     |                                          |   |             |
     |   packet header capturing                |   |             |
     |        |                                 |...| Metering    |
     |   timestamping                           |   | Process N   |
     |        |                                 |   |             |
     | +----->+                                 |   |             |
     | |      |                                 |   |             |
     | |   sampling Si (1:1 in case of no       |   |             |
     | |      |          sampling)              |   |             |
     | |   filtering Fi (select all when        |   |             |
     | |      |          no criteria)           |   |             |
     | +------+                                 |   |             |
     |        |                                 |   |             |
     |        |        Timing out Flows         |   |             |
     |        |    Handle resource overloads    |   |             |
     +--------|---------------------------------+   +-----|-------+
              |                                           |
      Flow Records (identified by Observation Domain)  Flow Records
              |                                           |
              +---------+---------------------------------+
                        |
   +--------------------|----------------------------------------------+
   |                    |     Exporting Process                        |
   |+-------------------|-------------------------------------------+  |
   ||                   v       IPFIX Protocol                      |  |
   ||+-----------------------------+  +----------------------------+|  |
   |||Rules for                    |  |Functions                   ||  |
   ||| Picking/sending Templates   |  |-Packetise selected Control ||  |
   ||| Picking/sending Flow Records|->|  & data Information into   ||  |
   ||| Encoding Template & data    |  |  IPFIX export packets.     ||  |
   ||| Selecting Flows to export(*)|  |-Handle export errors       ||  |
   ||+-----------------------------+  +----------------------------+|  |
   |+----------------------------+----------------------------------+  |
   |                             |                                     |
   |                    exported IPFIX Messages                        |
   |                             |                                     |
   |                +------------+-----------------+                   |
   |                |  Anonymise export packet(*)  |                   |
   |                +------------+-----------------+                   |
   |                             |                                     |
   |                +------------+-----------------+                   |
   |                |       Transport  Protocol    |                   |
   |                +------------+-----------------+                   |
   |                             |                                     |
   +-----------------------------+-------------------------------------+
                                 |
                                 v
                    IPFIX export packet to Collector

   (*) indicates that the block is optional.


                 Figure 1: IPFIX Device functional blocks

   Note that, according to the original architecture specification,
   IPFIX Message anonymisation is optionally performed as the final
   operation before handing the Message to the transport protocol for
   export.  While no provision is made in the architecture for
   anonymisation metadata as in Section 6, this arrangement does allow
   for the message rewriting necessary for comprehensive anonymisation
   of IPFIX export as in Section 7.  The IPFIX Mediation framework
   [I-D.ietf-ipfix-mediators-framework] and the IPFIX File Format
   [I-D.ietf-ipfix-file] expand upon this initial architectural
   allowance for anonymisation by adding to the list of places where
   anonymisation may be applied.  The former specifies IPFIX Mediators,
   which rewrite existing IPFIX Messages, and the latter specifies a
   method for storage of IPFIX data in files.

   More detail on the applicable architectural arrangements of
   anonymisation can be found in Section 7.1.


2.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [RFC5101] document are to be
   interpreted as defined there.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


3.  Categorisation of Anonymisation Techniques

   Anonymisation modifies a data set in order to protect the identity of
   the people or entities described by the data set from disclosure.
   With respect to network traffic data, anonymisation generally
   attempts to preserve some set of properties of the network traffic
   useful for a given application or applications, while ensuring the
   data cannot be traced back to the specific networks, hosts, or users
   generating the traffic.

   Anonymisation may be broadly classified according to two properties:
   recoverability and countability.  All anonymisation techniques map
   the real space of identifiers or values into a separate, anonymised
   space, according to some function.  A technique is said to be
   recoverable when the function used is invertible or can otherwise be
   reversed and a real identifier can be recovered from a given
   replacement identifier.

   Countability compares the dimension of the anonymised space (N) to
   the dimension of the real space (M), and denotes how the count of
   unique values is preserved by the anonymisation function.  If the
   anonymised space is smaller than the real space, then the function is
   said to generalise the input, mapping more than one input point to
   each anonymous value (e.g., as with aggregation).  By definition,
   generalisation is not recoverable.

   If the dimensions of the anonymised and real spaces are the same,
   such that the count of unique values is preserved, then the function
   is said to be a direct substitution function.  If the dimension of
   the anonymised space is larger, such that each real value maps to a
   set of anonymised values, then the function is said to be a set
   substitution function.  Note that with set substitution functions,
   the sets of anonymised values are not necessarily disjoint.  Either
   direct or set substitution functions are said to be one-way if there
   exists no method for recovering the real data point from an
   anonymised one.

   This classification is summarised in the table below.

   +------------------------+-----------------+------------------------+
   | Recoverability /       | Recoverable     | Non-recoverable        |
   | Countability           |                 |                        |
   +------------------------+-----------------+------------------------+
   | N < M                  | N.A.            | Generalisation         |
   | N = M                  | Direct          | One-way Direct         |
   |                        | Substitution    | Substitution           |
   | N > M                  | Set             | One-way Set            |
   |                        | Substitution    | Substitution           |
   +------------------------+-----------------+------------------------+
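
   For a finite domain, the countability axis of this table can be
   checked mechanically.  The Python sketch below (the function name is
   hypothetical) takes a mapping from each real value to the set of
   anonymised values that may replace it, and compares N with M exactly
   as defined above.  It says nothing about recoverability, which
   depends on whether the function can be inverted, not on counts.

```python
from typing import Hashable, Iterable, Mapping


def countability(mapping: Mapping[Hashable, Iterable[Hashable]]) -> str:
    """Classify a finite anonymisation function by comparing N with M.

    `mapping` sends each real value to the set of anonymised values that
    may replace it (a one-element set for an ordinary function).
    """
    m = len(mapping)  # M: number of distinct real values
    anonymised = set()
    for values in mapping.values():
        anonymised.update(values)
    n = len(anonymised)  # N: number of distinct anonymised values
    if n < m:
        return "generalisation"
    if n == m:
        return "direct substitution"
    return "set substitution"


# 8-bit truncation collapses two hosts into one /24: N = 1 < M = 2.
print(countability({"192.0.2.1": {"192.0.2.0"}, "192.0.2.9": {"192.0.2.0"}}))
# generalisation
```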


4.  Anonymisation of IP Flow Data

   Due to the restricted semantics of IP flow data, there is a
   relatively limited set of specific anonymisation techniques available
   for flow data, though each falls into the broad categories above.
   Each type of field that may commonly appear in a flow record may have
   its own applicable specific techniques.

   While anonymisation is generally applied at the resolution of single
   fields within a flow record, attacks against anonymisation use entire
   flows and relationships between hosts and flows within a given data
   set.  Therefore, fields which may not necessarily be identifying by
   themselves may be anonymised in order to increase the anonymity of
   the data set as a whole.

   Of all the fields in an IP flow record, only IP addresses directly
   identify entities in the real world.  Each IP address is associated
   with an interface on a network host, and can potentially be
   identified with a single user.  Additionally, IP addresses are
   structured identifiers; that is, partial IP address prefixes may be
   used to identify networks just as full IP addresses identify hosts.
   This makes anonymisation of IP addresses particularly important.

   Hardware addresses uniquely identify devices on the network; while
   they are not often available in traffic data collected at Layer 3,
   and cannot be used to locate devices within the network, some traces
   may contain sub-IP data including hardware address data.  Hardware
   addresses may be mappable to device serial numbers, and to the
   entities or individuals who purchased the devices, when combined with
   external databases.  They may also leak via IPv6 addresses in certain
   circumstances.  Therefore, hardware address anonymisation is also
   important.

   Port numbers identify abstract entities (applications) as opposed to
   real-world entities, but they can be used to classify hosts and user
   behavior.  Passive port fingerprinting, both of well-known and
   ephemeral ports, can be used to determine the operating system
   running on a host.  Relative data volumes by port can also be used to
   determine the host's function (workstation, web server, etc.); this
   information can be used to identify hosts and users.

   While not identifiers in and of themselves, timestamps and counters
   can reveal the behavior of the hosts and users on a network.  Any
   given network activity is recognizable by a pattern of relative time
   differences and data volumes in the associated sequence of flows,
   even without host address information.  They can therefore be used to
   identify hosts and users.  Timestamps and counters are also
   vulnerable to traffic injection attacks, where traffic with a known
   pattern is injected into a network under measurement, and this
   pattern is later identified in the anonymised data set.

   The simplest and most extreme form of anonymisation, which can be
   applied to any field of a flow record, is black-marker anonymisation,
   or complete deletion of a given field.  Note that black-marker
   anonymisation is equivalent to simply not exporting the field(s) in
   question.

   While black-marker anonymisation completely protects the data in the
   deleted fields from the risk of disclosure, it also reduces the
   utility of the anonymised data set as a whole.  Techniques that
   retain some information while reducing (though not eliminating) the
   disclosure risk are discussed in the following sections, which treat
   IP addresses, hardware addresses, timestamps, counters, and other
   flow fields such as ports separately.

4.1.  IP Address Anonymisation

   Since IP addresses are the most common identifiers within flow data
   that can be used to directly identify a person, organization, or
   host, most of the work on flow and trace data anonymisation has gone
   into IP address anonymisation techniques.  Indeed, the aim of most
   attacks against anonymisation is to recover the map from anonymised
   IP addresses to original IP addresses, thereby identifying the hosts
   behind them.  There is therefore a wide range of IP address
   anonymisation schemes, which fit into the following categories.

       +------------------------------------+---------------------+
       | Scheme                             | Action              |
       +------------------------------------+---------------------+
       | Truncation                         | Generalisation      |
       | Random Permutation                 | Direct Substitution |
       | Prefix-preserving Pseudonymisation | Direct Substitution |
       +------------------------------------+---------------------+

4.1.1.  Truncation

   Truncation removes the "n" least significant bits from an IP
   address, replacing them with zeroes.  In effect, it replaces a host
   address with a network address for some fixed netblock; for IPv4
   addresses, 8-bit truncation corresponds to replacement with a /24
   network address.  Truncation is a non-reversible generalisation
   scheme.  Note that while truncation is effective for making hosts
   non-identifiable, it preserves information which can be used to
   identify an organization, a geographic region, a country, or a
   continent (or RIR region of responsibility).

   Truncation to an address length of 0 is equivalent to black-marker
   anonymisation.  Removal of IP address information is only recommended
   for analysis tasks which have no need to separate flow data by host
   or network; for example, as a first stage in per-application (port)
   or time-series total volume analyses.
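
   A minimal Python sketch of truncation using the standard ipaddress
   module follows; the function name and its bits_removed parameter
   (the "n" above) are illustrative, not defined by this document.
   Removing all 32 (IPv4) or 128 (IPv6) bits yields the black-marker
   case.

```python
import ipaddress


def truncate_address(addr: str, bits_removed: int) -> str:
    """Replace the `bits_removed` least significant bits with zeroes."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= bits_removed <= width:
        raise ValueError("bits_removed must be between 0 and %d" % width)
    # Mask keeping the (width - bits_removed) most significant bits.
    mask = ((1 << width) - 1) ^ ((1 << bits_removed) - 1)
    # Rebuild with the same class so IPv6 results stay IPv6.
    return str(type(ip)(int(ip) & mask))


print(truncate_address("192.0.2.77", 8))       # 192.0.2.0 (a /24)
print(truncate_address("2001:db8::1234", 16))  # 2001:db8::
```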

4.1.2.  Random Permutation

   Random permutation is a direct substitution technique, replacing each
   IP address with an address randomly selected from the set of possible
   IP addresses, guaranteeing that each anonymised address represents a
   unique original address.  The random permutation does not preserve
   any structural information about a network, but it does preserve the
   unique count of IP addresses.  Any application that requires more
   structure than host-uniqueness will not be able to use randomly
   permuted IP addresses.
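
   Because materialising a permutation of the entire address space is
   impractical, one simple approach, sketched here in Python for IPv4
   only, is to build the permutation lazily: each new address receives
   a randomly drawn address not yet in use, and the mapping is
   remembered so that repeated occurrences map consistently.  The class
   name is hypothetical, and the table itself is sensitive, since anyone
   holding it can reverse the mapping.

```python
import ipaddress
import secrets


class RandomIPv4Permutation:
    """Lazily built random one-to-one mapping over the IPv4 space."""

    def __init__(self):
        self._forward = {}  # original address -> anonymised address
        self._used = set()  # anonymised addresses already assigned

    def anonymise(self, addr: str) -> str:
        original = int(ipaddress.IPv4Address(addr))
        if original not in self._forward:
            # Redraw on collision so the mapping stays one-to-one.
            candidate = secrets.randbits(32)
            while candidate in self._used:
                candidate = secrets.randbits(32)
            self._forward[original] = candidate
            self._used.add(candidate)
        return str(ipaddress.IPv4Address(self._forward[original]))


perm = RandomIPv4Permutation()
a = perm.anonymise("192.0.2.1")
b = perm.anonymise("192.0.2.2")
```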

4.1.3.  Prefix-preserving Pseudonymisation

   Prefix-preserving pseudonymisation is a direct substitution
   technique, further restricted such that the structure of subnets is
   preserved at each level while anonymising IP addresses.  If two real
   IP addresses match on a prefix of "n" bits, the two anonymised IP
   addresses will match on a prefix of "n" bits as well.  This is useful
   when relationships among networks must be preserved for a given
   analysis task, but introduces structure into the anonymised data
   which can be exploited in attacks against the anonymisation
   technique.
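
   One way to obtain the prefix-preserving property, in the style of
   the Crypto-PAn scheme, is sketched below in Python: bit i of the
   output is bit i of the input XORed with a pseudorandom bit computed
   from the i preceding input bits.  HMAC-SHA256 stands in here for the
   AES-based function used by Crypto-PAn, so the output is not
   interoperable with Crypto-PAn implementations; the function names
   and the key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(addr: str, key: bytes) -> str:
    """Prefix-preserving pseudonymisation of an IPv4 address."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address.
        prefix = value >> (32 - i)
        # Include the prefix length so prefixes of different lengths differ.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(x: str, y: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(x)) ^ int(ipaddress.IPv4Address(y))
    return 32 - diff.bit_length()


key = b"example key, keep secret"
a1 = prefix_preserving("192.0.2.1", key)
a2 = prefix_preserving("192.0.2.200", key)
print(common_prefix_len("192.0.2.1", "192.0.2.200"), common_prefix_len(a1, a2))
# 24 24
```

   Because identical prefixes always produce identical flip bits, the
   two anonymised addresses share exactly the 24-bit prefix that the
   originals share.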

4.2.  Hardware Address Anonymisation

   Flow data containing sub-IP information can also contain identifying
   information in the form of the hardware (MAC) address.  While
   hardware address information cannot be used to locate a node within a
   network, it can be used to directly and uniquely identify a specific
   device.  Vendors or organizations within the supply chain may then
   have the information necessary to identify the entity or individual
   that purchased the device.

   Hardware address information is not as structured as IP address
   information.  EUI-48 and EUI-64 hardware addresses contain an
   Organizationally Unique Identifier (OUI) in the three most
   significant bytes of the address; this OUI additionally contains bits
   noting whether the address is locally or globally administered.
   Beyond this, the address is unstructured, and there is no particular
   relationship among the OUIs assigned to a given vendor.

   Note that hardware address information also appears within IPv6
   addresses, as the EUI-64 address, or an EUI-48 address encoded as an
   EUI-64 address, is used as the least significant 64 bits of the IPv6
   address in the case of link-local addressing or stateless
   autoconfiguration; the considerations and techniques in this section
   may then apply to such IPv6 addresses as well.
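
   To illustrate the leak, the Python sketch below recovers a hardware
   address from an IPv6 address whose interface identifier was formed
   from an EUI-48 address in the modified EUI-64 format of RFC 4291
   (0xFFFE inserted in the middle, universal/local bit inverted).  The
   check for 0xFFFE is a heuristic, since a randomly generated interface
   identifier can contain those bytes by chance; the function name is
   hypothetical.

```python
import ipaddress
from typing import Optional


def mac_from_ipv6(addr: str) -> Optional[str]:
    """Return the EUI-48 address embedded in an IPv6 interface identifier,
    or None if the identifier does not look EUI-64-derived."""
    iid = ipaddress.IPv6Address(addr).packed[8:]  # low 64 bits
    if iid[3:5] != b"\xff\xfe":
        return None
    # Undo the universal/local bit inversion and drop the inserted FF:FE.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join("%02x" % b for b in mac)


print(mac_from_ipv6("fe80::21b:21ff:fe3a:4f5c"))  # 00:1b:21:3a:4f:5c
```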

           +-----------------------------+---------------------+
           | Scheme                      | Action              |
           +-----------------------------+---------------------+
           | Random Permutation          | Direct Substitution |
           | Structured Pseudonymisation | Direct Substitution |
           +-----------------------------+---------------------+

4.2.1.  Random Permutation

   Random permutation is a direct substitution technique, replacing each
   hardware address with an address randomly selected from the set of
   possible hardware addresses, guaranteeing that each anonymised
   address represents a unique original address.  The random
   permutation does not preserve any structural information (such as
   the OUI), but it does preserve the unique count of hardware
   addresses.  Any application that requires more structure than
   device-uniqueness will not be able to use randomly permuted hardware
   addresses.

4.2.2.  Structured Pseudonymisation

   Structured pseudonymisation for MAC addresses is a direct
   substitution technique, like random permutation, but restricted such
   that the OUI (the most significant three bytes) is permuted
   separately from the node identifier, the remainder.  This is useful
   when the uniqueness of OUIs must be preserved for a given analysis
   task, but introduces structure into the anonymised data which can be
   exploited in attacks against the anonymisation technique.
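
   A Python sketch of structured pseudonymisation follows, assuming two
   independent, lazily built random permutations of the 24-bit space:
   one applied to OUIs and one to node identifiers.  The class and
   function names are hypothetical, and the sketch does not attempt to
   preserve the locally/globally administered bits within the OUI.

```python
import random


class LazyPermutation:
    """Random one-to-one mapping over a `bits`-wide space, built on demand."""

    def __init__(self, bits: int, rng=None):
        self.bits = bits
        self.rng = rng or random.SystemRandom()
        self.forward = {}
        self.used = set()

    def __call__(self, value: int) -> int:
        if value not in self.forward:
            candidate = self.rng.getrandbits(self.bits)
            while candidate in self.used:  # keep the mapping one-to-one
                candidate = self.rng.getrandbits(self.bits)
            self.forward[value] = candidate
            self.used.add(candidate)
        return self.forward[value]


def structured_pseudonymise(mac: str, oui_perm, nic_perm) -> str:
    """Permute the OUI (top three bytes) and node identifier separately."""
    raw = bytes.fromhex(mac.replace(":", ""))
    oui = oui_perm(int.from_bytes(raw[:3], "big"))
    nic = nic_perm(int.from_bytes(raw[3:], "big"))
    out = oui.to_bytes(3, "big") + nic.to_bytes(3, "big")
    return ":".join("%02x" % b for b in out)


oui_perm, nic_perm = LazyPermutation(24), LazyPermutation(24)
x = structured_pseudonymise("00:1b:21:3a:4f:5c", oui_perm, nic_perm)
y = structured_pseudonymise("00:1b:21:aa:bb:cc", oui_perm, nic_perm)
# x and y share an anonymised OUI because the originals share one.
```

   Because the node identifier permutation here is shared across OUIs,
   equal node identifiers under different OUIs stay equal after
   anonymisation; keying that permutation per OUI is an alternative
   design.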

4.3.  Timestamp Anonymisation

   The particular time at which a flow began or ended is not
   particularly identifiable information, but it can be used as part of
   attacks against other anonymisation techniques or for user profiling.
   Precise timestamps can be used in injected-traffic fingerprinting
   attacks [CITE] as well as to identify certain activity by response
   delay and size fingerprinting [CITE].  Therefore, timestamp
   information may be anonymised in order to ensure the protection of
   the entire dataset.

          +-----------------------+----------------------------+
          | Scheme                | Action                     |
          +-----------------------+----------------------------+
          | Precision Degradation | Generalisation             |
          | Enumeration           | Direct or Set Substitution |
          | Random Time Shifts    | Direct Substitution        |
          +-----------------------+----------------------------+

4.3.1.  Precision Degradation

   Precision Degradation is a generalisation technique that removes the
   most precise components of a timestamp, accounting all events
   occurring in each given interval (e.g. one millisecond for
   millisecond level degradation) as simultaneous.  This has the effect
   of potentially collapsing many timestamps into one.  With this
   technique, time precision is reduced and sequencing may be lost, but
   the approximate time at which each event occurred is preserved.  The
   anonymised data may not be generally useful for applications which
   require strict sequencing of flows.
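
   As a minimal sketch, assuming timestamps held as integer milliseconds
   since the epoch, precision degradation is a truncation to the start
   of the enclosing interval.  The function name degrade_precision is
   hypothetical.

```python
def degrade_precision(timestamp_ms: int, interval_ms: int) -> int:
    """Truncate a millisecond timestamp to the start of its interval.

    All timestamps within the same interval collapse to one value, so
    ordering inside an interval is lost but the coarse time is kept.
    """
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return timestamp_ms - timestamp_ms % interval_ms
```

   For example, degrade_precision(1255262405123, 1000) yields
   1255262405000; longer intervals such as 60000 give the minute-level
   binning discussed below.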

   Note that flow meters with low time precision (e.g. second precision,
   or millisecond precision on high-capacity networks) perform the
   equivalent of precision degradation anonymisation by their design.

   Note also that degradation to a very low precision (e.g. on the order
   of minutes, hours, or days) is commonly used in analyses operating on
   time-series aggregated data, and may also be described as binning;
   though the time scales are longer and applicability more restricted,
   this is in principle the same operation.

   Precision degradation to infinitely low precision is equivalent to
   black-marker anonymisation.  Removal of timestamp information is only
   recommended for analysis tasks which have no need to separate flows
   in time, for example for counting total volumes or unique occurrences
   of other flow keys in an entire dataset.

4.3.2.  Enumeration

   Enumeration is a substitution function that retains the chronological
   order in which events occurred while eliminating time information.
   Timestamps are substituted by equidistant timestamps (or numbers)
   starting from a randomly chosen start value.  The resulting data is
   useful for applications requiring strict sequencing, but not for
   those requiring good timing information (e.g. delay- or jitter-
   measurement for QoS applications or SLA validation).
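
   The following sketch illustrates enumeration under two assumptions not
   stated above: equal timestamps receive equal substitutes, and the
   random start value is drawn from an arbitrary 31-bit range.  The name
   enumerate_timestamps is hypothetical.

```python
import random


def enumerate_timestamps(timestamps: list[int], step: int,
                         rng: random.Random) -> list[int]:
    """Replace timestamps with equidistant values preserving their order.

    Equal timestamps receive the same substitute.  The first substitute is a
    random start value; the range used for it here is an arbitrary choice.
    """
    start = rng.randrange(2**31)
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [start + rank[t] * step for t in timestamps]
```

   Unequal gaps in the original data (here, whatever separates distinct
   timestamps) all become exactly step apart, so order survives while
   timing information is eliminated.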

4.3.3.  Random Time Shifts

   Random time shifts add a single random offset to every timestamp
   within a dataset.  Because the same offset is applied throughout, this
   substitution technique, which is reversible given the offset, retains
   duration and inter-event interval information as well as the
   chronological order of flows.  It is primarily intended to defeat
   traffic injection fingerprinting attacks.
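
   A minimal sketch of a random time shift follows; the bound max_shift
   on the offset and the name random_time_shift are hypothetical.  The
   offset is returned only to show that it reverses the shift.

```python
import random


def random_time_shift(timestamps: list[int], max_shift: int,
                      rng: random.Random) -> tuple[list[int], int]:
    """Add one random offset to every timestamp in a dataset.

    Returns the shifted timestamps and the offset; the offset reverses the
    shift, so it must be kept secret (see Section 5.5).
    """
    offset = rng.randint(-max_shift, max_shift)
    return [t + offset for t in timestamps], offset
```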

4.4.  Counter Anonymisation

   Counters (such as packet and octet volumes per flow), like timestamps,
   can be used in fingerprinting and injection attacks against
   anonymisation, or for user profiling.  Counter anonymisation can help
   defeat these attacks, but is only usable for analysis tasks for which
   relative or imprecise magnitudes of activity are useful.

          +-----------------------+----------------------------+
          | Scheme                | Action                     |
          +-----------------------+----------------------------+
          | Precision Degradation | Generalisation             |
          | Binning               | Generalisation             |
          | Random noise addition | Direct or Set Substitution |
          +-----------------------+----------------------------+

4.4.1.  Precision Degradation

   As with precision degradation in timestamps, precision degradation of
   counters removes lower-order bits of the counters, treating all the
   counters in a given range as having the same value.  Depending on the
   precision reduction, this loses information about the relationships
   between sizes of similarly-sized flows, but keeps relative magnitude
   information.
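
   As a minimal sketch, assuming non-negative integer counters, removing
   lower-order bits is a bitwise mask; the name degrade_counter is
   hypothetical.

```python
def degrade_counter(value: int, dropped_bits: int) -> int:
    """Clear the `dropped_bits` lowest-order bits of a non-negative counter.

    Every counter in a range of 2**dropped_bits values maps to the bottom of
    that range, so large and small flows stay distinguishable while flows
    of similar size collapse together.
    """
    return value & ~((1 << dropped_bits) - 1)
```

   With dropped_bits = 8, octet counts of 1300 and 1500 both become
   1280, while 40000 becomes 39936.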

4.4.2.  Binning

   Binning can be seen as a generalisation of precision degradation; the
   operation is identical, except that in precision degradation the
   counter ranges are uniform, while in binning they need not be.  For
   example, a common counter binning scheme for packet counters could be
   to bin values 1-2 together, and 3-infinity together, thereby
   separating potentially completely-opened TCP connections from
   unopened ones.  Binning schemes are generally chosen to keep
   precisely the amount of information required in a counter for a given
   analysis task.  Note that, also unlike precision degradation, the bin
   label need not be within the bin's range.
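
   The sketch below expresses a bin mapping as sorted inclusive upper
   bounds plus one label per bin, and applies it to the packet-counter
   example above.  The names make_binner and packet_bins, and the labels
   0 and 1, are hypothetical; the labels were chosen to lie outside
   their bins, as the text permits.

```python
import bisect


def make_binner(upper_bounds: list[int], labels: list[int]):
    """Build a bin function from sorted inclusive upper bounds.

    Bin 0 holds values <= upper_bounds[0], bin i holds values in
    (upper_bounds[i-1], upper_bounds[i]], and the last bin holds everything
    above upper_bounds[-1].  Labels are arbitrary and need not lie in the bin.
    """
    if len(labels) != len(upper_bounds) + 1:
        raise ValueError("need exactly one more label than upper bounds")

    def binner(value: int) -> int:
        return labels[bisect.bisect_left(upper_bounds, value)]

    return binner


# The packet-counter example above: 1-2 packets versus 3 or more.
packet_bins = make_binner([2], [0, 1])
```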

   Binning counters to a single bin 0-infinity, or alternately precision
   degradation to infinitely low precision, is equivalent to black-
   marker anonymisation.  Removal of counter information is only
   recommended for analysis tasks which have no need to evaluate the
   removed counter, for example for counting only unique occurrences of
   other flow keys.

4.4.3.  Random Noise Addition

   Random noise addition adds a random amount to a counter in each flow;
   this is used to keep relative magnitude information and minimise the
   disruption to size relationship information while avoiding
   fingerprinting attacks against anonymisation.  Note that there is no
   guarantee that random noise addition will maintain ranking order by a
   counter among members of a set.  Random noise addition is
   particularly useful when the derived analysis data will not be
   presented in such a way as to require the lower-order bits of the
   counters.
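
   A minimal sketch of random noise addition follows.  The uniform noise
   distribution, the bound max_noise, the clamp at zero, and the name
   add_noise are all assumptions made for illustration.

```python
import random


def add_noise(counter: int, max_noise: int, rng: random.Random) -> int:
    """Add uniform integer noise in [-max_noise, max_noise] to a counter.

    The uniform distribution and the clamp at zero are illustrative choices;
    the result is kept non-negative because counters cannot be negative.
    """
    return max(0, counter + rng.randint(-max_noise, max_noise))
```

   Two counters closer together than 2 * max_noise may swap order after
   noise is added, which is why ranking order is not guaranteed.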

4.5.  Anonymisation of Other Flow Fields

   Other fields, particularly port numbers and protocol numbers, can be
   used to partially identify the applications that generated the
   traffic in a given flow trace.  This information can be used in
   fingerprinting attacks, and may be of interest on its own (e.g., to
   reveal that a certain application with suspected vulnerabilities is
   running on a given network).  These fields are generally anonymised
   using one of two techniques.

               +--------------------+---------------------+
               | Scheme             | Action              |
               +--------------------+---------------------+
               | Binning            | Generalisation      |
               | Random Permutation | Direct Substitution |
               +--------------------+---------------------+

4.5.1.  Binning

   Binning is a generalisation technique mapping a set of potentially
   non-uniform ranges into a set of arbitrarily labeled bins.  Common
   bin arrangements depend on the field type and the analysis
   application.  For example, an IP protocol bin arrangement may
   preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
   other protocols into a single bin, to mitigate the use of uncommon
   protocols in fingerprinting attacks.  Another example arrangement may
   bin source and destination ports into low (0-1023) and high (1024-
   65535) bins in order to tell service from ephemeral ports without
   identifying individual applications.
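
   The two example arrangements above can be sketched as follows.  The
   function names and the bin labels are hypothetical: 255 (a reserved
   value in the IANA protocol number registry) labels all other
   protocols, and 0 and 1024 label the low and high port bins.

```python
PRESERVED_PROTOCOLS = {1, 6, 17}  # ICMP, TCP, UDP
OTHER_PROTOCOL_BIN = 255          # hypothetical label for all other protocols


def bin_protocol(proto: int) -> int:
    """Keep ICMP, TCP and UDP; map every other protocol to one bin."""
    return proto if proto in PRESERVED_PROTOCOLS else OTHER_PROTOCOL_BIN


def bin_port(port: int) -> int:
    """Map ports to a low bin (0-1023, label 0) or high bin (label 1024)."""
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    return 0 if port < 1024 else 1024
```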

   Binning other flow key fields to a single bin is equivalent to black-
   marker anonymisation.  Removal of other flow key information is only
   recommended for analysis tasks which have no need to differentiate
   flows on the removed keys, for example for total traffic counts or
   unique counts of other flow keys.

4.5.2.  Random Permutation

   Random permutation is a direct substitution technique, replacing each
   value with a value randomly selected from the set of possible values,
   guaranteeing that each anonymised value represents a unique original
   value.  This is used to preserve the count of unique values without
   preserving information about, or the ordering of, the values
   themselves.
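
   For a small value space such as 16-bit port numbers, a random
   permutation can be sketched as a shuffled lookup table.  The name
   make_port_permutation is hypothetical.  Python's random module is not
   cryptographically secure, so this is an illustration only; in any
   case the seed, and hence the table, must not be exported (see
   Section 5.4).

```python
import random


def make_port_permutation(rng: random.Random) -> list[int]:
    """Return a random permutation of the 16-bit port space as a table."""
    table = list(range(65536))
    rng.shuffle(table)
    return table
```

   An anonymised port is then perm[port]; equal inputs map to equal
   outputs and distinct inputs to distinct outputs, so unique-value
   counts are preserved.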


5.  Parameters for the Description of Anonymisation Techniques

   This section details the abstract parameters used to describe the
   anonymisation techniques examined in the previous section, on a per-
   parameter basis.  These parameters and their export safety inform the
   design of the IPFIX anonymisation metadata export specified in the
   following section.

5.1.  Stability

   Any given anonymisation technique may be applied with a varying range
   of stability.  Stability is important for assessing the comparability
   of anonymised information in different data sets, or in the same data
   set over different time periods.  In general, stability ranges from
   completely stable to completely unstable; however, note that the
   completely unstable case is indistinguishable from black-marker
   anonymisation.  A completely stable anonymisation will always map a
   given value in the real space to the same value in the anonymised
   space.  In practice, an anonymisation may also be stable for every
   data set published by a particular producer to a particular
   consumer, stable for a stated time period within a dataset or across
   datasets, or stable only for a single data set.

   If no information about stability is available, users of anonymised
   data may assume that the techniques used are stable across the entire
   dataset, but unstable across datasets.  Note that stability presents
   a risk-utility tradeoff, as completely stable anonymisation can be
   used for longer-term trend analysis tasks but also presents more risk
   of attack given the stable mapping.
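
   One possible way to control stability, sketched here as a
   hypothetical scheme rather than anything this document specifies, is
   to derive the seed of each mapping from a secret key and a scope
   label using HMAC-SHA256.  The function name mapping_rng and the
   scope labels are assumptions; random.Random is used only for
   illustration and is not cryptographically secure.

```python
import hashlib
import hmac
import random


def mapping_rng(secret: bytes, scope: str) -> random.Random:
    """Seed a mapping generator from a secret key and a stability scope.

    The same (secret, scope) pair always yields the same sequence, and hence
    the same mapping; a different scope yields an unrelated mapping.
    """
    digest = hmac.new(secret, scope.encode("utf-8"), hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest, "big"))
```

   Reusing one scope label for every dataset gives completely stable
   anonymisation; a label per dataset, per consumer, or per time period
   gives the narrower kinds of stability described above.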

5.2.  Truncation Length

   Truncation and precision degradation are described by the truncation
   length, or the amount of data still remaining in the anonymised field
   after anonymisation.

   Truncation length can be inferred from a given data set, and need not
   be specially exported or protected.

5.3.  Bin Map

   Binning is described by the specification of a bin mapping function.
   This function can be generally expressed in terms of an associative
   array that maps each point in the original space to a bin, although
   from an implementation standpoint most bin functions are much simpler
   and more efficient.

   Since knowledge of the bin mapping function can be used to partially
   deanonymise binned data, depending on the degree of generalisation,
   no information about the bin mapping function should be exported.

5.4.  Permutation

   Like binning, permutation is described by the specification of a
   permutation function.  In the general case, this can be expressed in
   terms of an associative array that maps each point in the original
   space to a point in the anonymised space.  Unlike binning, each point
   in the anonymised space must correspond to a single, unique point in
   the original space.

   Since knowledge of the permutation function can be used to completely
   deanonymise permuted data, no information about the permutation
   function or its parameters should be exported.

5.5.  Shift Amount

   Shifting requires an amount to shift each value by.  Since the shift
   amount can be used to deanonymise data protected by shifting, no
   information about the shift amount should be exported.


6.  Anonymisation Export Support in IPFIX

   Anonymised data exported via IPFIX SHOULD be annotated with
   anonymisation metadata, which details which fields described by which
   Templates are anonymised, and provides appropriate information on the
   anonymisation techniques used.  This metadata SHOULD be exported in
   Data Records defined by the recommended Options Templates specified
   in this section; these Options Templates use the additional
   Information Elements described in the following subsection.

   Note that fields anonymised using the black-marker (removal)
   technique do not require any special metadata support.  Black-marker
   anonymised fields SHOULD NOT be exported at all; the absence of the
   field in a given Data Set is implicitly declared by not including the
   corresponding Information Element in the Template describing that
   Data Set; exporting "empty" data elements is inefficient and in the
   general case impossible, as many non-counter Information Elements do
   not have semantically distinct null values.

6.1.  Anonymisation Options Template

   The Anonymisation Options Template describes anonymisation records,
   which allow anonymisation metadata to be exported inline over IPFIX
   or stored in an IPFIX File, by binding information about
   anonymisation techniques to Information Elements within defined
   Templates.  IPFIX Exporting Processes SHOULD export anonymisation
   records for any Template describing exported anonymised Data Records;
   IPFIX Collecting Processes and processes downstream from them MAY use
   anonymisation records to treat anonymised data differently depending
   on the applied technique.

   An Exporting Process SHOULD export anonymisation records after the
   Templates they describe have been exported, and SHOULD export
   anonymisation records reliably.

   Anonymisation records, like Templates, MUST be handled by Collecting
   Processes as scoped to the Transport Session in which they are sent.
   While the anonymisationStability IE can be used to declare that a
   given anonymisation technique's mapping will remain stable across
   multiple sessions, each session MUST re-export the anonymisation
   records along with the Templates.

   [EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the
   same time is indicated with multiple elements of the same type (in
   application order as in PSAMP).  Need to verify this is actually
   useful given the defined techniques.]

   +-------------------------+-----------------------------------------+
   | IE                      | Description                             |
   +-------------------------+-----------------------------------------+
   | templateId [scope]      | The Template ID of the Template         |
   |                         | containing the Information Element      |
   |                         | described by this anonymisation record. |
   |                         | This Information Element MUST be        |
   |                         | defined as a Scope Field.               |
   | informationElementId    | The Information Element identifier of   |
   | [scope]                 | the Information Element described by    |
   |                         | this anonymisation record.  This        |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field.                          |
   | informationElementIndex | The Information Element index of the    |
   | [scope] [optional]      | instance of the Information Element     |
   |                         | described by this anonymisation record  |
   |                         | identified by the informationElementId  |
   |                         | within the Template.  Optional; need    |
   |                         | only be present when describing         |
   |                         | Templates that have multiple instances  |
   |                         | of the same Information Element.  This  |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field if present.  This         |
   |                         | Information Element is defined in       |
   |                         | Section 6.2, below.                     |
   | anonymisationStability  | The stability class of the anonymised   |
   |                         | data.  MUST be present.  This           |
   |                         | Information Element is defined in       |
   |                         | Section 6.2, below.                     |
   | anonymisationTechnique  | The technique used to anonymise the     |
   |                         | data.  MUST be present.  This           |
   |                         | Information Element is defined in       |
   |                         | Section 6.2, below.                     |
   +-------------------------+-----------------------------------------+
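
   The sketch below encodes and decodes the body of a single
   anonymisation record in network byte order.  It assumes a
   hypothetical field order (templateId, informationElementId,
   anonymisationStability, anonymisationTechnique) without the optional
   informationElementIndex, and assumes templateId and
   informationElementId are carried as unsigned16.  The Options Template
   Record itself is not shown, since the element identifiers of the new
   Information Elements are still TBD.

```python
import struct

# Hypothetical layout: templateId (unsigned16), informationElementId
# (unsigned16), anonymisationStability (unsigned8), anonymisationTechnique
# (unsigned8); "!" selects network byte order with no padding.
ANON_RECORD = struct.Struct("!HHBB")


def encode_anon_record(template_id: int, ie_id: int,
                       stability: int, technique: int) -> bytes:
    return ANON_RECORD.pack(template_id, ie_id, stability, technique)


def decode_anon_record(data: bytes) -> tuple:
    return ANON_RECORD.unpack(data)
```

   For example, encode_anon_record(256, 8, 1, 6) describes
   sourceIPv4Address (element 8) in Template 256 as anonymised by
   structured permutation with Session stability.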

6.2.  Recommended Information Elements for Anonymisation Metadata

6.2.1.  anonymisationStability

   Description:   A description of the stability class of the
      anonymisation technique applied to a referenced Information
      Element within a referenced Template.  Stability classes refer to
      the stability of the parameters of the anonymisation technique,
      and therefore the comparability of the mapping between the real
      and anonymised values over time.  This determines which anonymised
      datasets may be compared with each other.

   +-------+-----------------------------------------------------------+
   | Value | Description                                               |
   +-------+-----------------------------------------------------------+
   | 0     | Undefined: the Exporting Process makes no representation  |
   |       | as to how stable the mapping is, or over what time period |
   |       | values of this field will remain comparable; while the    |
   |       | Collecting Process MAY assume Session level stability,    |
   |       | Session level stability is not guaranteed.  This is       |
   |       | equivalent to 0x01 Session level stability while advising |
   |       | the Collecting Process that no special effort has been    |
   |       | made to ensure stability.  Collecting Processes SHOULD    |
   |       | assume this is the case in the absence of stability class |
   |       | information; this is the default stability class.         |
   | 1     | Session: the Exporting Process will ensure that the       |
   |       | parameters of the anonymisation technique are stable      |
   |       | during the Transport Session.  All the values of the      |
   |       | described Information Element for each Record described   |
   |       | by the referenced Template within the Transport Session   |
   |       | are comparable.  The Exporting Process SHOULD endeavour   |
   |       | to ensure at least this stability class.                  |
   | 2     | Exporter-Collector Pair: the Exporting Process will       |
   |       | ensure that the parameters of the anonymisation technique |
   |       | are stable across Transport Sessions over time with the   |
   |       | given Collecting Process, but may use different           |
   |       | parameters for different Collecting Processes.  Data      |
   |       | exported to different Collecting Processes is not         |
   |       | comparable.                                               |
   | 3     | Stable: the Exporting Process will ensure that the        |
   |       | parameters of the anonymisation technique are stable      |
   |       | across Transport Sessions over time, regardless of the    |
   |       | Collecting Process to which it is sent.                   |
   +-------+-----------------------------------------------------------+

   Abstract Data Type:   unsigned8

   ElementId:   TBD1

   Status:   Proposed
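
   A Collecting Process might use the stability class to decide whether
   two values of a described field can be compared.  The following is a
   minimal sketch; the enum mirrors the table above, while the function
   values_comparable and its boolean inputs are hypothetical.

```python
from enum import IntEnum


class AnonymisationStability(IntEnum):
    """Codepoints of anonymisationStability (unsigned8)."""
    UNDEFINED = 0
    SESSION = 1
    EXPORTER_COLLECTOR_PAIR = 2
    STABLE = 3


def values_comparable(stability: int, same_session: bool,
                      same_collector: bool) -> bool:
    """Decide whether two values of a described field may be compared.

    same_session: both values arrived in the same Transport Session.
    same_collector: both values were exported to the same Collecting Process.
    """
    stability = AnonymisationStability(stability)
    if stability is AnonymisationStability.STABLE:
        return True
    if stability is AnonymisationStability.EXPORTER_COLLECTOR_PAIR:
        return same_collector
    # SESSION guarantees comparability within a session; for UNDEFINED the
    # Collecting Process MAY assume the same, but it is not guaranteed.
    return same_session
```

   In the absence of any stability information, a caller would pass
   AnonymisationStability.UNDEFINED, the default class.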

6.2.2.  anonymisationTechnique

   Description:   A description of the anonymisation technique applied
      to a referenced Information Element within a referenced Template.
      Each technique may be applicable only to certain Information
      Elements and recommended only for certain Information Elements;
      these restrictions are noted in the table below.

   +-------+--------------------------------+------------+-------------+
   | Value | Description                    | Applicable | Recommended |
   |       |                                | to         | for         |
   +-------+--------------------------------+------------+-------------+
   | 0     | Undefined: the Exporting       | all        | all         |
   |       | Process makes no               |            |             |
   |       | representation as to whether   |            |             |
   |       | the defined field is           |            |             |
   |       | anonymised or not.  While the  |            |             |
   |       | Collecting Process MAY assume  |            |             |
   |       | that the field is not          |            |             |
   |       | anonymised, it is not          |            |             |
   |       | guaranteed not to be.  This is |            |             |
   |       | the default anonymisation      |            |             |
   |       | technique.                     |            |             |
   | 1     | None: the values exported are  | all        | all         |
   |       | real.                          |            |             |
   | 2     | Precision                      | all        | all         |
   |       | Degradation/Truncation: the    |            |             |
   |       | values exported are anonymised |            |             |
   |       | using simple precision         |            |             |
   |       | degradation or truncation.     |            |             |
   |       | The new precision is implicit  |            |             |
   |       | in the exported data, and can  |            |             |
   |       | be deduced by the Collecting   |            |             |
   |       | Process.                       |            |             |
   | 3     | Binning: the values exported   | all        | all         |
   |       | are anonymised into bins.      |            |             |
   | 4     | Enumeration: the values        | all        | timestamps  |
   |       | exported are anonymised by     |            |             |
   |       | enumeration.                   |            |             |
   | 5     | Permutation: the values        | all        | identifiers |
   |       | exported are anonymised by     |            |             |
   |       | random permutation.            |            |             |
   | 6     | Structured Permutation: the    | addresses  |             |
   |       | values exported are anonymised |            |             |
   |       | by random permutation,         |            |             |
   |       | preserving bit-level structure |            |             |
   |       | as appropriate; this           |            |             |
   |       | represents prefix-preserving   |            |             |
   |       | IP address anonymisation or    |            |             |
   |       | structured MAC address         |            |             |
   |       | anonymisation.                 |            |             |
   +-------+--------------------------------+------------+-------------+

   Abstract Data Type:   unsigned8

   ElementId:   TBD2

   Status:   Proposed

6.2.3.  informationElementIndex

   Description:   A zero-based index of an Information Element
      referenced by informationElementId within a Template referenced by
      templateId; used to disambiguate scope for templates containing
      multiple identical Information Elements.

   Abstract Data Type:   unsigned16

   ElementId:   TBD3

   Status:   Proposed
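
   Under one reading of this definition (an assumption), the index
   counts earlier occurrences of the same Information Element identifier
   within the Template.  The hypothetical helper below pairs each field
   of a Template with such an index.

```python
def element_indices(template_ie_ids: list[int]) -> list[tuple[int, int]]:
    """Pair each Template field with a zero-based per-element index.

    The index counts earlier occurrences of the same Information Element
    identifier within the Template.
    """
    seen = {}  # element identifier -> occurrences so far
    result = []
    for ie_id in template_ie_ids:
        index = seen.get(ie_id, 0)
        seen[ie_id] = index + 1
        result.append((ie_id, index))
    return result
```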


7.  Applying Anonymisation Techniques to IPFIX Export and Storage

   When exporting or storing anonymised flow data using IPFIX, certain
   interactions between the IPFIX Protocol and the anonymisation
   techniques in use must be considered; these are treated in the
   subsections below.

7.1.  Arrangement of Processes in IPFIX Anonymisation

   Anonymisation may be applied to IPFIX data at three stages within
   the collection infrastructure: on initial export, at a mediator, or
   after collection, as shown in Figure 2.  Each of these locations has
   specific considerations and applicability.

               +==========================================+
               | Exporting Process                        |
               +==========================================+
                 |                                      |
                 |    (Anonymised at Original Exporter) |
                 V                                      |
               +=============================+          |
               | Mediator                    |          |
               +=============================+          |
                 |                                      |
                 | (Anonymising Mediator)               |
                 V                                      V
               +==========================================+
               | Collecting Process                       |
               +==========================================+
                       |
                       | (Anonymising CP/File Writer)
                       V
               +--------------------+
               | IPFIX File Storage |
               +--------------------+

                Figure 2: Potential Anonymisation Locations

   Anonymisation is generally performed before the wider dissemination
   or repurposing of a flow data set, e.g., adapting operational
   measurement data for research.  Therefore, direct anonymisation of
   flow data on initial export is only applicable in certain restricted
   circumstances: when the Exporting Process is "publishing" data to a
   Collecting Process directly, and the Exporting Process and Collecting
   Process are operated by different entities.  Note that certain
   guidelines in Section 7.2.2 with respect to timestamp anonymisation
   may not apply in this case, as the Collecting Process may be able to
   deduce certain timing information from the time at which each Message
   is received.

   A much more flexible arrangement is to anonymise data within a
   Mediator [I-D.ietf-ipfix-mediators-framework].  Here, original data
   is sent to a Mediator, which performs the anonymisation function and
   re-exports the anonymised data.  Such a Mediator could be located at
   the administrative domain boundary of the initial Exporting Process
   operator, exporting anonymised data to other consumers outside the
   organisation.  In this case, the original Exporter SHOULD use TLS as
   specified in [RFC5101] to secure the channel to the Mediator, and the
   Mediator should follow the guidelines in Section 7.2, to mitigate the
   risk of original data disclosure.

   When data is to be published as an anonymised data set in an IPFIX
   File [I-D.ietf-ipfix-file], the anonymisation may be done at the
   final Collecting Process before storage and dissemination, as well.
   In this case, the Collector should follow the guidelines in
   Section 7.2, especially as regards File-specific Options in
   Section 7.2.3.

   In each of these data flows, the anonymisation of records is
   undertaken by an Intermediate Anonymisation Process (IAP); the data
   flows into and out of this IAP are shown in Figure 3 below.

   packets --+                     +- IPFIX Messages -+
             |                     |                  |
             V                     V                  V
   +==================+ +====================+ +=============+
   | Metering Process | | Collecting Process | | File Reader |
   +==================+ +====================+ +=============+
             |      Non-anonymised | Records          |
             V                     V                  V
   +=========================================================+
   |          Intermediate Anonymisation Process (IAP)       |
   +=========================================================+
             | Anonymised     ^            Anonymised |
             | Records        |               Records |
             V                |                       V
   +===================+    Anonymisation      +=============+
   | Exporting Process |<--- Parameters ------>| File Writer |
   +===================+                       +=============+
             |                                        |
             +------------> IPFIX Messages <----------+

          Figure 3: Data flows through the anonymisation process

   Anonymisation parameters must also be available to the Exporting
   Process and/or File Writer in order to ensure header data is also
   appropriately anonymised as in Section 7.2.2.

   Following each of the data flows through the IAP, we describe five
   basic types of anonymisation arrangements within this framework in
   Figure 4.  In addition to the three arrangements described in detail
   above, anonymisation can also be done at a collocated Metering
   Process and File Writer (see section 7.3.2 of [I-D.ietf-ipfix-file]),
   or at a file manipulator (see section 7.3.7 of
   [I-D.ietf-ipfix-file]).

         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| EP |-> anonymisation on Original Exporter
         +----+  +-----+  +----+
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masquerading Proxy)
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator
 File    +----+  +-----+  +----+

        Figure 4: Possible anonymisation arrangements in the IPFIX
                               architecture

   Note that anonymisation may occur at more than one location within a
   given collection infrastructure, to provide varying levels of
   anonymisation, disclosure risk, or data utility for specific
   purposes.

7.2.  IPFIX-Specific Anonymisation Guidelines

   In implementing and deploying the anonymisation techniques described
   in this document, implementors should note that IPFIX already
   provides features that support anonymised data export, and use these
   where appropriate.  Care must also be taken that data structures
   supporting the operation of the protocol itself do not leak data that
   could be used to reverse the anonymisation applied to the flow data.
   Such data structures may appear in the header, or within the data
   stream itself, especially as options data.  Each of these and their
   impact on specific anonymisation techniques is noted in a separate
   subsection below.

7.2.1.  Appropriate Use of Information Elements for Anonymised Data

   Note, as in Section 6 above, that black-marker anonymised fields
   SHOULD NOT be exported at all; the absence of the field in a given
   Data Set is implicitly declared by not including the corresponding
   Information Element in the Template describing that Data Set.

   When using precision degradation of timestamps, Exporting Processes
   SHOULD export timing information using Information Elements of an
   appropriate precision, as explained in Section 4.5 of [RFC5153].  For
   example, timestamps measured in millisecond-level precision and
   degraded to second-level precision should use flowStartSeconds and
   flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.
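
   As a sketch, assuming timestamps are held as integer milliseconds
   since the UNIX epoch and that truncation (rather than rounding) is
   the chosen degradation, the degraded values can be keyed directly by
   the second-precision Information Elements:

```python
def degrade_to_seconds(flow_start_ms: int, flow_end_ms: int) -> dict:
    """Degrade millisecond timestamps to second precision.

    The result is keyed by the second-precision Information Elements, so
    the exported Template does not imply millisecond precision.
    """
    # Integer division truncates non-negative epoch values to the start of
    # the second, discarding the sub-second part.
    return {
        "flowStartSeconds": flow_start_ms // 1000,
        "flowEndSeconds": flow_end_ms // 1000,
    }


fields = degrade_to_seconds(1255255200123, 1255255201987)
print(fields)  # {'flowStartSeconds': 1255255200, 'flowEndSeconds': 1255255201}
```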

   When exporting anonymised data and anonymisation metadata, Exporting
   Processes SHOULD ensure that each combination of Information Element
   and declared anonymisation technique is compatible.  The applicable
   and recommended Information Element types and semantics for each
   technique are noted in the description of the anonymisationTechnique
   Information Element in Section 6.2.2.  In this description, a
   timestamp is an Information Element with the data type
   dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
   dateTimeNanoseconds; an address is an Information Element with the
   data type ipv4Address, ipv6Address, or macAddress; and an identifier
   is an Information Element with identifier data type semantics.
   Exporting Processes MUST NOT export Anonymisation Options records
   binding techniques to Information Elements to which they are not
   applicable, and SHOULD NOT export Anonymisation Options records
   binding techniques to Information Elements for which they are not
   recommended.
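
   The classification above can be sketched in Python as follows.  The
   function names and the applicability tables are hypothetical; the
   tables shown are partial and for illustration only, and an
   implementation would take them from the anonymisationTechnique
   description in Section 6.2.2.  The MUST NOT and SHOULD NOT cases map
   to distinct results:

```python
from typing import Dict, Set

TIMESTAMP_TYPES = {"dateTimeSeconds", "dateTimeMilliseconds",
                   "dateTimeMicroseconds", "dateTimeNanoseconds"}
ADDRESS_TYPES = {"ipv4Address", "ipv6Address", "macAddress"}


def ie_category(data_type: str, semantics: str = "") -> str:
    """Classify an Information Element as timestamp, address, or identifier."""
    if data_type in TIMESTAMP_TYPES:
        return "timestamp"
    if data_type in ADDRESS_TYPES:
        return "address"
    if semantics == "identifier":
        return "identifier"
    return "other"


def check_binding(technique: str, data_type: str, semantics: str,
                  applicable: Dict[str, Set[str]],
                  recommended: Dict[str, Set[str]]) -> str:
    """Decide whether an Anonymisation Options record may be exported.

    Returns "reject" (MUST NOT export), "discouraged" (SHOULD NOT
    export), or "ok".
    """
    category = ie_category(data_type, semantics)
    if category not in applicable.get(technique, set()):
        return "reject"
    if category not in recommended.get(technique, set()):
        return "discouraged"
    return "ok"


# Hypothetical, partial tables for illustration only.
applicable = {"precision degradation": {"timestamp"}}
recommended = {"precision degradation": {"timestamp"}}

print(check_binding("precision degradation", "dateTimeMilliseconds", "",
                    applicable, recommended))  # ok
print(check_binding("precision degradation", "unsigned64", "totalCounter",
                    applicable, recommended))  # reject
```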

7.2.2.  Anonymisation of Header Data

   Each IPFIX Message contains a Message Header; this Message Header
   contains two fields that may be used to break certain anonymisation
   techniques: the Export Time and the Observation Domain ID.

   When exporting IPFIX Messages containing anonymised timestamp data,
   if the original Export Time Message Header field has some
   relationship to the timestamps being anonymised, the Exporting
   Process SHOULD anonymise the Export Time header field using an
   equivalent technique, if possible.  Otherwise, relationships between
   export time and flow time could be used to partially or totally
   reverse the timestamp anonymisation.
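
   As an illustration, suppose (as an assumption; this document does not
   mandate it) that timestamps are anonymised by adding a single secret
   offset.  An equivalent treatment of the Export Time then applies the
   same offset, so the header does not reveal it.  The dataclass and
   function below are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlowTimes:
    flow_start_seconds: int
    flow_end_seconds: int


def shift_message_times(export_time: int, flows: List[FlowTimes],
                        offset: int) -> Tuple[int, List[FlowTimes]]:
    """Apply one secret offset to the flow timestamps and the Export Time.

    Shifting only the flow timestamps would leave the original Export Time
    in the Message Header, and the small gap between the two would reveal
    the offset.
    """
    shifted = [FlowTimes(f.flow_start_seconds + offset,
                         f.flow_end_seconds + offset) for f in flows]
    # Export Time is an unsigned 32-bit field; keep it in range.
    new_export_time = (export_time + offset) % 2**32
    return new_export_time, shifted


export_time, flows = shift_message_times(
    1255255260, [FlowTimes(1255255200, 1255255230)], offset=-86400 * 365)
print(export_time - flows[0].flow_end_seconds)  # 30, as before anonymisation
```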

   The similarity in size between an Observation Domain ID and an IPv4
   address (32 bits) may lead to a temptation to use an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID.  If this address bears some relation to the IP addresses
   in the flow data (e.g., shares a network prefix with internal
   addresses) and the IP addresses in the flow data are anonymised in a
   structure-preserving way, then the Observation Domain ID may be used
   to break the IP address anonymisation.  Use of an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID is NOT RECOMMENDED in this case.
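
   An operator choosing an Observation Domain ID might check for this
   case with a sketch like the following, which uses Python's standard
   ipaddress module; the function name and the notion of a single
   internal prefix are assumptions for illustration:

```python
import ipaddress


def odid_leaks_prefix(observation_domain_id: int, internal_prefix: str) -> bool:
    """Return True if the Observation Domain ID, read as an IPv4 address,
    falls inside the given (un-anonymised) internal network prefix."""
    as_address = ipaddress.IPv4Address(observation_domain_id)
    return as_address in ipaddress.ip_network(internal_prefix)


odid = int(ipaddress.IPv4Address("192.0.2.10"))
print(odid_leaks_prefix(odid, "192.0.2.0/24"))  # True: choose another ODID
print(odid_leaks_prefix(17, "192.0.2.0/24"))    # False
```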

7.2.3.  Anonymisation of Options Data

   IPFIX uses the Options mechanism to export, among other things,
   metadata about exported flows and the flow collection infrastructure.
   As with the IPFIX Message Header, certain Options recommended in
   [RFC5101] and the IPFIX File Format [I-D.ietf-ipfix-file] containing
   flow timestamps and network addresses of Exporting and Collecting
   Processes may be used to break certain anonymisation techniques; care
   should be taken while using them with anonymised data export and
   storage.

   The Exporting Process Reliability Statistics Options Template,
   recommended in [RFC5101], contains an Exporting Process ID field,
   which may be an exportingProcessIPv4Address Information Element or an
   exportingProcessIPv6Address Information Element.  If the Exporting
   Process address bears some relation to the IP addresses in the flow
   data (e.g., shares a network prefix with internal addresses) and the
   IP addresses in the flow data are anonymised in a structure-
   preserving way, then the Exporting Process address may be used to
   break the IP address anonymisation.  Exporting Processes exporting
   anonymised data in this situation SHOULD mitigate the risk of attack
   either by omitting Options described by the Exporting Process
   Reliability Statistics Options Template, or by anonymising the
   Exporting Process address using a similar technique to that used to
   anonymise the IP addresses in the exported data.
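
   The second mitigation can be sketched as follows, assuming (for
   illustration only) that flow addresses are anonymised with a bitwise
   prefix-preserving construction keyed by HMAC-SHA256.  This is not a
   vetted anonymisation implementation, and the key shown is a
   placeholder.  The essential point is that the Exporting Process
   address passes through the same keyed mapping as the flow addresses,
   so its relationship to the flow data survives only in anonymised
   form:

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving_ipv4(address: str, key: bytes) -> str:
    """Prefix-preserving pseudonymisation of an IPv4 address.

    Output bit i is input bit i XORed with a keyed pseudorandom bit that
    depends only on input bits 0..i-1, so two addresses sharing an n-bit
    prefix map to pseudonyms sharing an n-bit prefix.  Each step is
    invertible, so distinct inputs give distinct outputs.
    """
    value = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        prefix = value >> (32 - i)  # the i most significant input bits
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


KEY = b"hypothetical-secret-key"  # placeholder; real keys must stay secret
flow_addresses = ["192.0.2.1", "192.0.2.200"]
exporter_address = "192.0.2.254"  # shares a /24 with the flow data
anon_flows = [prefix_preserving_ipv4(a, KEY) for a in flow_addresses]
anon_exporter = prefix_preserving_ipv4(exporter_address, KEY)
print(anon_flows, anon_exporter)
```

   Because the exporter address is mapped with the same key, it still
   shares a /24 with the anonymised flow addresses, but no longer
   discloses the original prefix.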

   Similarly, the Export Session Details Options Template and Message
   Details Options Template specified for the IPFIX File Format
   [I-D.ietf-ipfix-file] may contain the exportingProcessIPv4Address
   Information Element or the exportingProcessIPv6Address Information
   Element to identify an Exporting Process from which a flow record was
   received, and the collectingProcessIPv4Address Information Element or
   the collectingProcessIPv6Address Information Element to identify the
   Collecting Process which received it.  If the Exporting Process or
   Collecting Process address bears some relation to the IP addresses in
   the flow data (e.g., shares a network prefix with internal addresses)
   and the IP addresses in the flow data are anonymised in a structure-
   preserving way, then the Exporting Process or Collecting Process
   address may be used to break the IP address anonymisation.  Since
   these Options Templates are primarily intended for storing IPFIX
   Transport Session data for auditing, replay, and testing purposes, it
   is NOT RECOMMENDED that storage of anonymised data include these
   Options Templates in order to mitigate the risk of attack.

   The Message Details Options Template specified for the IPFIX File
   Format [I-D.ietf-ipfix-file] also contains the
   collectionTimeMilliseconds Information Element.  As with the Export
   Time Message Header field, if the exported flow data contains
   anonymised timestamp information, and the collectionTimeMilliseconds
   Information Element in a given Message has some relationship to the
   anonymised timestamp information, then this relationship can be
   exploited to reverse the timestamp anonymisation.  Since this Options
   Template is primarily intended for storing IPFIX Transport Session
   data for auditing, replay, and testing purposes, it is NOT
   RECOMMENDED that storage of anonymised data include this Options
   Template in order to mitigate the risk of attack.
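
   A file writer following these recommendations might filter Options
   Templates before writing, as in the sketch below.  The Template IDs
   and abbreviated field lists are hypothetical, and the sensitive set
   contains only the Information Elements named in this section.  The
   same filter also implements the omission option for the Exporting
   Process Reliability Statistics Options Template described above:

```python
from typing import Dict, List

# Information Elements named in this section whose presence in an
# Options Template can leak data usable against anonymisation.
SENSITIVE_OPTION_IES = {
    "exportingProcessIPv4Address",
    "exportingProcessIPv6Address",
    "collectingProcessIPv4Address",
    "collectingProcessIPv6Address",
    "collectionTimeMilliseconds",
}


def filter_options_templates(
        options_templates: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Keep only Options Templates carrying none of the sensitive IEs.

    Records described by a dropped Template should not be written either.
    """
    return {tid: ies for tid, ies in options_templates.items()
            if not SENSITIVE_OPTION_IES.intersection(ies)}


# Hypothetical Template IDs and (abbreviated) field lists.
templates = {
    256: ["exportingProcessIPv4Address", "collectingProcessIPv4Address"],
    257: ["collectionTimeMilliseconds", "exportSequenceNumber"],
    258: ["informationElementIndex", "anonymisationTechnique"],
}
print(sorted(filter_options_templates(templates)))  # [258]
```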

   Since the Time Window Options Template specified for the IPFIX File
   Format [I-D.ietf-ipfix-file] refers to the timestamps within the flow
   data to provide partial table of contents information for an IPFIX
   File, care must be taken to ensure that Options described by this
   template are written using the anonymised timestamps instead of the
   original ones.
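
   For example, assuming flows are represented as hypothetical (start,
   end) pairs of anonymised timestamps in seconds, the time window
   should be computed from the values actually written to the file:

```python
from typing import Iterable, Tuple


def time_window(anonymised_flows: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (earliest start, latest end) over anonymised flow timestamps.

    The input must be the timestamps as written to the file, after
    anonymisation; computing the window from the original timestamps
    would publish the true observation period.
    """
    flows = list(anonymised_flows)
    if not flows:
        raise ValueError("no flows")
    return min(s for s, _ in flows), max(e for _, e in flows)


anonymised = [(1223719200, 1223719230), (1223719100, 1223719250)]
print(time_window(anonymised))  # (1223719100, 1223719250)
```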


8.  Examples

   [TODO: write this section.]


9.  Security Considerations

   This document provides guidelines for exporting metadata about
   anonymised data in IPFIX, or storing metadata about anonymised data
   in IPFIX Files.  It is not intended as a general statement on the
   applicability of specific flow data anonymisation techniques.
   Exporters or publishers of anonymised data must take care that the
   applied anonymisation technique is appropriate for the data source,
   the purpose, and the risk of deanonymisation of a given application.

   We note specifically that anonymisation is not a replacement for
   encryption for confidentiality.  It is only appropriate for
   protecting identifying information in data to be used for purposes in
   which the protected data is irrelevant.  Confidentiality in export is
   best served by using TLS or DTLS as in the Security Considerations
   section of [RFC5101], and in long-term storage by implementation-
   specific protection applied as in the Security Considerations section
   of [I-D.ietf-ipfix-file].  Indeed, confidentiality and anonymisation
   are not mutually exclusive: encryption for confidentiality may also
   be applied to the export or storage of anonymised data when that
   data is not intended for public release.

   When using pseudonymisation techniques that have a mutable mapping,
   there is an inherent tradeoff in the stability of the map between
   long-term comparability and security of the dataset against
   deanonymisation.  In general, deanonymisation attacks are more
   effective given more information, so the longer a given mapping is
   valid, the more information can be applied to deanonymisation.  The
   specific details of this are technique-dependent and therefore out of
   the scope of this document.
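
   The tradeoff can be made concrete with a sketch of a keyed-hash
   pseudonymisation whose key changes every epoch.  The key-derivation
   scheme, the epoch length, and the names used are assumptions for
   illustration, not a technique defined by this document:

```python
import hashlib
import hmac


def pseudonym(value: str, master_key: bytes, timestamp: int,
              epoch_seconds: int) -> str:
    """Keyed pseudonym whose mapping changes every epoch_seconds.

    A longer epoch keeps pseudonyms comparable over a longer period, but
    gives an attacker more data observed under a single mapping.
    """
    epoch = timestamp // epoch_seconds
    epoch_key = hmac.new(master_key, epoch.to_bytes(8, "big"),
                         hashlib.sha256).digest()
    return hmac.new(epoch_key, value.encode(), hashlib.sha256).hexdigest()[:16]


KEY = b"hypothetical-master-key"  # placeholder
DAY = 86400
same_day = (pseudonym("192.0.2.1", KEY, 10 * DAY + 5, DAY)
            == pseudonym("192.0.2.1", KEY, 10 * DAY + 80000, DAY))
next_day = (pseudonym("192.0.2.1", KEY, 10 * DAY + 5, DAY)
            == pseudonym("192.0.2.1", KEY, 11 * DAY + 5, DAY))
print(same_day, next_day)
```

   Within one epoch the mapping is stable and flows remain comparable;
   across epochs the same address receives unrelated pseudonyms.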

   When releasing anonymised data, publishers need to ensure that data
   that could be used in deanonymisation is not leaked through the
   export protocol; guidelines for addressing this risk are provided in
   Section 7.2.

   Note also that the Security Considerations section of [RFC5101]
   applies to the export of anonymised data, and that the Security
   Considerations section of [I-D.ietf-ipfix-file] applies to the
   storage of anonymised data and the publication of anonymised traces.


10.  IANA Considerations

   This document specifies the creation of several new IPFIX Information
   Elements in the IPFIX Information Element registry located at
   http://www.iana.org/assignments/ipfix, as defined in Section 6.2
   above.  IANA has assigned the following Information Element numbers
   for their respective Information Elements as specified below:

   o  Information Element number TBD1 for the anonymisationStability
      Information Element.

   o  Information Element number TBD2 for the anonymisationTechnique
      Information Element.

   o  Information Element number TBD3 for the informationElementIndex
      Information Element.

   [NOTE for IANA: The text TBDn should be replaced with the respective
   assigned Information Element numbers where they appear in this
   document.]


11.  Acknowledgments

   We thank Paul Aitken for his comments and insight, and the PRISM
   project for its support of this work.


12.  References

12.1.  Normative References

   [RFC5101]  Claise, B., "Specification of the IP Flow Information
              Export (IPFIX) Protocol for the Exchange of IP Traffic
              Flow Information", RFC 5101, January 2008.

   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              RFC 5102, January 2008.

12.2.  Informative References

   [RFC5472]  Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
              Flow Information Export (IPFIX) Applicability", RFC 5472,
              March 2009.

   [RFC5470]  Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
              "Architecture for IP Flow Information Export", RFC 5470,
              March 2009.

   [I-D.ietf-ipfix-file]
              Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IPFIX File Format",
              draft-ietf-ipfix-file-05 (work in progress), August 2009.

   [I-D.ietf-ipfix-mediators-framework]
              Kobayashi, A., Nishida, H., and B. Claise, "IPFIX
              Mediation: Framework",
              draft-ietf-ipfix-mediators-framework-03 (work in
              progress), July 2009.

   [I-D.ietf-ipfix-mediators-problem-statement]
              Kobayashi, A., Claise, B., Nishida, H., Sommer, C.,
              Dressler, F., and E. Stephan, "IPFIX Mediation: Problem
              Statement",
              draft-ietf-ipfix-mediators-problem-statement-05 (work in
              progress), July 2009.

   [RFC5153]  Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
              Aitken, "IP Flow Information Export (IPFIX) Implementation
              Guidelines", RFC 5153, April 2008.

   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
              "Requirements for IP Flow Information Export (IPFIX)",
              RFC 3917, October 2004.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

   Elisa Boschi
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 57
   Email: elisa.boschi@hitachi-eu.com


   Brian Trammell
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 13
   Email: brian.trammell@hitachi-eu.com
