|Network Working Group||M. Hansen|
|Intended status: Informational||H. Tschofenig|
|Expires: September 11, 2012||Nokia Siemens Networks|
|March 12, 2012|
Privacy Terminology and Concepts
Privacy is a concept that has been debated and argued throughout the last few millennia. Its most striking feature is the difficulty that disparate parties encounter when they attempt to precisely define it. In order to discuss privacy in a meaningful way, a tightly defined context is necessary. The specific context of privacy used within this document is that of personal data in Internet protocols. Personal data is any information relating to a data subject, where a data subject is an identified natural person or a natural person who can be identified, directly or indirectly.
A lot of work within the IETF involves defining protocols that can potentially transport (either explicitly or implicitly) personal data. This document aims to establish a consistent lexicon around privacy for IETF contributors to use when discussing privacy considerations within their work.
Note: This document is discussed at https://www.ietf.org/mailman/listinfo/ietf-privacy
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 11, 2012.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Privacy is a concept that has been debated and argued throughout the last few millennia by all manner of people, including philosophers, psychologists, lawyers, and more recently, computer scientists. Its most striking feature is the difficulty that disparate parties encounter when they attempt to precisely define it. Each individual, group, and culture has its own views and preconceptions about privacy, some of which are mutually complimentary and some of which diverge. However, it is generally (but not unanimously) agreed that the protection of privacy is "A Good Thing." People often do not realize how they value privacy until they lose it.
In order to discuss privacy in a meaningful way, a tightly defined context is necessary. The specific context of privacy used within this document is that of "personal data" in Internet protocols. Personal data is any information relating to a data subject, where a data subject is an identified natural person or a natural person who can be identified, directly or indirectly.
A lot of work within the IETF involves defining protocols that can potentially transport personal data. Protocols are therefore capable of enabling both privacy protections and privacy breaches. Protocol architects often do not assume a specific relationship between the identifiers and data elements communicated in protocols and the humans using the software running the protocols. However, a protocol may facilitate the identification of a natural person depending on how protocol identifiers and other state are created and communicated.
One commonly held privacy objective is that of data minimization -- eliminating the potential for personal data to be collected. Often, however, the collection of personal data cannot not be prevented entirely, in which case the goal is to minimize the amount of personal data that can be collected for a given purpose and to offer ways to control the dissemination of personal data. This document focuses on introducing terms used to describe privacy properties that support data minimization.
Other techniques have been proposed and implemented that aim to enhance privacy by providing misinformation (inaccurate or erroneous information, provided usually without conscious effort to mislead or deceive) or disinformation (deliberately false or distorted information provided in order to mislead or deceive). These techniques are out of scope for this document.
This document aims to establish a basic lexicon around privacy so that IETF contributors who wish to discuss privacy considerations within their work (see [I-D.iab-privacy-considerations]) can do so using terminology consistent across areas. Note that it does not attempt to define all aspects of privacy terminology, rather it discusses terms describing the most common ideas and concepts.
The following sub-sections define terms related to different ways of reducing identifiability.
To enable anonymity of a data subject, there must exist a set of data subjects with potentially the same attributes, i.e., to the attacker or the observer these data subjects must appear indistinguishable from each other. The set of all such data subjects is known as the anonymity set and membership of this set may vary over time.
The composition of the anonymity set depends on the knowledge of the observer or attacker. Thus anonymity is relative with respect to the observer or attacker. An initiator may be anonymous only within a set of potential initiators -- its initiator anonymity set -- which itself may be a subset of all data subjects that may initiate communications. Conversely, a recipient may be anonymous only within a set of potential receipients -- its receipient anonymity set. Both anonymity sets may be disjoint, may overlap, or may be the same.
As an example consider RFC 3325 (P-Asserted-Identity, PAI) [RFC3325], an extension for the Session Initiation Protocol (SIP), that allows a data subject, such as a VoIP caller, to instruct an intermediary that he or she trusts not to populate the SIP From header field with the subject's authenticated and verified identity. The recipient of the call, as well as any other entity outside of the data subject's trust domain, would therefore only learn that the SIP message (typically a SIP INVITE) was sent with a header field 'From: "Anonymous" <sip:email@example.com>' rather than the subject's address-of-record, which is typically thought of as the "public address" of the user (the data subject). When PAI is used, the data subject becomes anonymous within the initiator anonymity set that is populated by every data subject making use of that specific intermediary.
Note: This example ignores the fact that other personal data may be inferred from the other SIP protocol payloads. This caveat makes the analysis of the specific protocol extension easier but cannot be assumed when conducting analysis of an entire architecture.
In the context of IETF protocols almost all identifiers are pseudonyms since there is typically no requirement to use real names in protocols. However, in certain scenarios it is reasonable to assume that real names will be used (with vCard [RFC6350], for example).
Pseudonymity is strengthened when less personal data can be linked to the pseudonym; when the same pseudonym is used less often and across fewer contexts; and when independently chosen pseudonyms are more frequently used for new actions (making them, from an observer's or attacker's perspective, unlinkable).
For Internet protocols it is important whether protocols allow pseudonyms to be changed without human interaction, the default length of pseudonym lifetimes, to whom pseudonyms are exposed, how data subjects are able to control disclosure, how often pseudonyms can be changed, and the consequences of changing them. These aspects are described in [I-D.iab-privacy-considerations].
As an example, consider the network access authentication procedures utilizing the Extensible Authentication Protocol (EAP) [RFC3748]. EAP includes an identity exchange where the Identity Response is primarily used for routing purposes and selecting which EAP method to use. Since EAP Identity Requests and Responses are sent in cleartext, eavesdroppers and intermediaries along the communication path between the EAP peer and the EAP server can snoop on the identity. To address this treat, as discussed in RFC 4282 [RFC4282], the user's identity can be hidden against these observers with the cryptography support by EAP methods. Identity confidentiality has become a recommended design criteria for EAP (see [RFC4017]). EAP-AKA [RFC4187], for example, protects the EAP peer's identity against passive adversaries by utilizing temporal identities. EAP-IKEv2 [RFC5106] is an example of an EAP method that offers protection against active observers with regard to the data subject's identity.
Unlinkability of two or more messages may depend on whether their content is protected against the observer or attacker. In the cases where this is not true, messages may only be unlinkable if it is assumed that the observer or attacker is not able to infer information about the initiator or receipient from the message content itself. It is worth noting that even if the content itself does not betray linkable information explicitly, deep semantic analysis of a message sequence can often detect certain characteristics that link them together, including similarities in structure, style, use of particular words or phrases, consistent appearance of certain grammatical errors, and so forth.
There are several items of terminology highly related to unlinkability:
In contrast to anonymity and unlinkability, where the IOI is protected indirectly through protection of the IOI's relationship to a subject or other IOI, undetectability means the IOI is directly protected. For example, undetectability is as a desirable property of steganographic systems.
If we consider the case where an IOI is a message, then undetectability means that the message is not sufficiently discernible from other messages (from, e.g., random noise).
Achieving anonymity, unlinkability, and undetectability may enable extreme data minimization. Unfortunately, this would also prevent a certain class of useful two-way communication scenarios. Therefore, for many applications, a certain amount of linkability and detectability is usually accepted while attempting to retain unlinkability between the data subject and his or her transactions. This is achieved through the use of appropriate kinds of pseudonymous identifiers. These identifiers are then often used to refer to established state or are used for access control purposes, see [I-D.iab-identifier-comparison].
[To be provided in a future version once the guidance is settled.]
Parts of this document utilizes content from [anon_terminology], which had a long history starting in 2000 and whose quality was improved due to the feedback from a number of people. The authors would like to thank Andreas Pfitzmann for his work on an earlier draft version of this document.
Within the IETF a number of persons had provided their feedback to this document. We would like to thank Scott Brim, Marc Linsner, Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Scott Bradner, Nat Sakimura, Bjoern Hoehrmann, David Singer, Dean Willis, Christine Runnegar, Lucy Lynch, Trend Adams, Mark Lizar, Martin Thomson, Josh Howlett, Mischa Tuffield, S. Moonesamy, Ted Hardie, Zhou Sujing, Claudia Diaz, Leif Johansson, and Klaas Wierenga.
This document introduces terminology for talking about privacy within IETF specifications. Since privacy protection often relies on security mechanisms then this document is also related to security in its broader context.
This document does not require actions by IANA.
|[I-D.iab-privacy-considerations]||Cooper, A, Tschofenig, H, Aboba, B, Peterson, J and J Morris, "Privacy Considerations for Internet Protocols", Internet-Draft draft-iab-privacy-considerations-01, October 2011.|
|[id]||Identifier - Wikipedia", Wikipedia , URL: http://en.wikipedia.org/wiki/Identifier, Dec 2011., "|
|[RFC3325]||Jennings, C., Peterson, J. and M. Watson, "Private Extensions to the Session Initiation Protocol (SIP) for Asserted Identity within Trusted Networks", RFC 3325, November 2002.|
|[I-D.iab-identifier-comparison]||Thaler, D, "Issues in Identifier Comparison for Security Purposes", Internet-Draft draft-iab-identifier-comparison-00, July 2011.|
|[RFC6265]||Barth, A., "HTTP State Management Mechanism", RFC 6265, April 2011.|
|[RFC3748]||Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J. and H. Levkowetz, "Extensible Authentication Protocol (EAP)", RFC 3748, June 2004.|
|[RFC4017]||Stanley, D., Walker, J. and B. Aboba, "Extensible Authentication Protocol (EAP) Method Requirements for Wireless LANs", RFC 4017, March 2005.|
|[RFC4282]||Aboba, B., Beadles, M., Arkko, J. and P. Eronen, "The Network Access Identifier", RFC 4282, December 2005.|
|[RFC4187]||Arkko, J. and H. Haverinen, "Extensible Authentication Protocol Method for 3rd Generation Authentication and Key Agreement (EAP-AKA)", RFC 4187, January 2006.|
|[RFC4949]||Shirey, R., "Internet Security Glossary, Version 2", RFC 4949, August 2007.|
|[RFC5106]||Tschofenig, H., Kroeselberg, D., Pashalidis, A., Ohba, Y. and F. Bersani, "The Extensible Authentication Protocol-Internet Key Exchange Protocol version 2 (EAP-IKEv2) Method", RFC 5106, February 2008.|
|[RFC6350]||Perreault, S., "vCard Format Specification", RFC 6350, August 2011.|
|[RFC5077]||Salowey, J., Zhou, H., Eronen, P. and H. Tschofenig, "Transport Layer Security (TLS) Session Resumption without Server-Side State", RFC 5077, January 2008.|
|[RFC5246]||Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.|
|[panopticlick]||Eckersley, P., "How Unique Is Your Web Browser?", Electronig Frontier Foundation , URL: https://panopticlick.eff.org/browser-uniqueness.pdf, 2009.|
|[Chau81]||Chaum, D., "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms", Communications of the ACM , 24/2, 84-88, 1981.|
|[anon_terminology]||Pfitzmann, A. and M. Hansen, "A terminology for talking about privacy by data minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management", URL: http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.34.pdf , version 034, 2010.|