Network Working Group S. Rao
Internet-Draft Grab
Intended status: Experimental S. Sahib
Expires: May 7, 2020 R. Guest
Salesforce
November 4, 2019
Personal Information Tagging for Logs (PITFoL)
draft-rao-pitfol-00
Abstract
Software applications typically generate a large amount of log data
in the course of their operation in order to help with monitoring,
troubleshooting, etc. However, like all data generated and operated
upon by software systems, logs can contain information sensitive to
users. Personal data identification and anonymization in logs is
thus crucial to ensure that no personal data is inadvertently logged
and retained, which would make the logging application run afoul of
laws around storing private information. This document explores
mechanisms for specifying personal or sensitive data in logs, so that
any server collecting, processing or analyzing logs can identify
personal data and then, potentially, enforce redaction.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 7, 2020.
Rao, et al. Expires May 7, 2020 [Page 1]
Internet-Draft PITFoL November 2019
Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Motivation and Use Cases . . . . . . . . . . . . . . . . . . 3
4. Techniques . . . . . . . . . . . . . . . . . . . . . . . . . 4
4.1. Field Level Tagging . . . . . . . . . . . . . . . . . . . 4
4.2. Log Level Tagging . . . . . . . . . . . . . . . . . . . . 4
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
6. Security Considerations . . . . . . . . . . . . . . . . . . . 5
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5
8. Normative References . . . . . . . . . . . . . . . . . . . . 5
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 6
1. Introduction
Personal data identification and redaction is crucial to make sure
that a logging application is not storing and potentially leaking
users' private information. There are known techniques that help
discover and extract sensitive data. For example, we can define
regular expressions or lookup rules that match a person's name,
credit card number, email address and so on. In addition, models
trained on data dictionaries and datasets can predict the presence of
sensitive data. In most cases, however, what data is considered
personal and sensitive is subjective, provisional and contextual to
the data source or the application processing the data, which makes
it hard to use automated techniques to identify personal data. The
challenges are summarized as follows:
- What comprises personal data is often subjective and use case
specific.
- There are many disparate types of personal data, and detecting
them often requires a multitude of approaches.
- There are no standards that govern the formats of sensitive data,
which makes automation difficult for most common use cases.
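The regular-expression approach mentioned above can be sketched as
follows. This is a minimal illustration, not a recommended detector:
the pattern table and the function name find_candidates are
assumptions made for this example, and the two patterns are
deliberately simple.

```python
import re

# Hypothetical patterns, chosen for illustration only; a real
# deployment would need patterns tuned to its own data.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_candidates(text):
    """Return (kind, matched_text) pairs for every pattern hit in text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits
```

A value such as user_name="XYZZY" matches neither pattern, which
illustrates the first challenge listed above: whether a value is
personal often depends on context that a format-based rule cannot
see.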
Once the personal information is identified, it has to be
appropriately tagged. Personal data tagging is especially important
in cases where log data is flowing in from disparate sources. In
cases where tagging at source is not possible (e.g. log data
generated by a legacy IoT device, Web server or a Firewall), a
centralized logging server can be tasked with making sure the log
data is tagged before passing it on downstream. Once the logs are
tagged, the logging application can use anonymization techniques to
redact the fields appropriately. This document focuses on the
tagging aspect of log redaction.
2. Terminology
Personal data: RFC 6973 [RFC6973] defines personal data as "any
information relating to an individual who can be identified, directly
or indirectly." This typically includes information such as IP
addresses and usernames. However, the definition of personal data
varies heavily with what other information is available, the
jurisdiction of operation and other such factors. Hence, this
document does not focus on prescriptively listing which log fields
contain personal data, but rather on what a tagging mechanism would
look like once a logging application has determined which fields it
considers to hold personal data.
3. Motivation and Use Cases
Most systems, such as network devices, web servers and application
services, record information about user activity, transactions,
network flows, etc., as log data. Logs are incredibly useful for
various purposes such as security monitoring, application debugging
and operational maintenance. In addition, there are use cases in
which organizations export or share logs with third-party log
analyzers for purposes such as security incident response, monitoring
and business analytics, where logs can be a valuable source of
information. In such cases, there are concerns about potential
exposure of personal data to unintended systems or recipients. This
document explores techniques for tagging logs to aid identification
of personal data.
4. Techniques
Once personal data is identified, whether via manual detection or via
models trained on data dictionaries or datasets, the log is annotated
with tag information at either the field level or the log level.
This is an example of a log message in RFC 3164 [RFC3164] format. We
can imagine that a logging application determines that user_name,
err_user and ip_addr are fields that can contain sensitive personal
data.
<120> Nov 16 16:00:00 10.0.1.11 ABCDEFG: [AF@0 event="AF-Authority
failure" violation="A-Not authorized to object" actual_type="AF-A"
jrn_seq="1001363" timestamp="20120418163258988000"
job_name="QPADEV000B" user_name="XYZZY" job_number="256937"
err_user="TESTFORAF" ip_addr="10.0.1.21" port="55875"
action="Undefined(x00)" val_job="QPADEV000B" val_user="XYZZY"
val_jobno="256937" object="TEST" object_library="CUS9242"
object_type="*FILE" pgm_name="" pgm_libr="" workstation=""]
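Before any tagging can happen, the name="value" pairs in such a
message have to be extracted. The sketch below shows one way to do
that. The names PARAM_RE, SENSITIVE, parse_params and
sensitive_params are hypothetical and introduced only for this
example, and the pattern assumes that values never contain an escaped
double quote.

```python
import re

# Hypothetical helper pattern: matches name="value" pairs and assumes
# values never contain an escaped double quote.
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')

# Field names the text above treats as potentially personal.
SENSITIVE = ("user_name", "err_user", "ip_addr")


def parse_params(message):
    """Return the name="value" pairs of a log message as a dict."""
    return dict(PARAM_RE.findall(message))


def sensitive_params(message):
    """Return only the pairs whose name is listed in SENSITIVE."""
    params = parse_params(message)
    return {name: params[name] for name in SENSITIVE if name in params}
```

The list of sensitive names is kept separate from the parser so that
each logging application can supply its own determination of which
fields hold personal data.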
4.1. Field Level Tagging
In the field-level tagging method, each identified <attribute, value>
field is tagged with a "pii_data=true" attribute that marks the field
as sensitive or personal. In the log-level tagging approach, the
personal fields are instead listed in a metadata attribute such as
"pii_name", which contains a list of one or more field names deemed
sensitive or personal.
The log above can be transformed as follows using the field-level
tagging technique:
<120> Apr 18 16:32:58 10.0.1.11 QAUDJRN: [AF@0 event="AF-Authority
failure" violation="A-Not authorized to object" actual_type="AF-A"
jrn_seq="1001363" timestamp="20120418163258988000"
job_name="QPADEV000B" {user_name="XYZZY" pii_data="true"}
job_number="256937" {err_user="XYZZY" pii_data="true"}
{ip_addr="10.0.1.21" pii_data="true"} port="55875"
action="Undefined(x00)" val_job="QPADEV000B" val_jobno="256937"
object="TEST" object_library="CUS9242" object_type="*FILE"
pgm_name="" pgm_libr="" workstation=""]
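A tagger producing this kind of output could look like the sketch
below. It assumes the curly-brace grouping shown for user_name in the
example; the names PARAM_RE and tag_fields are hypothetical, and the
pattern assumes that values never contain an escaped double quote.

```python
import re

# Hypothetical helper pattern: matches name="value" pairs.
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def tag_fields(message, sensitive):
    """Wrap each sensitive name="value" pair with a pii_data="true" tag."""
    def wrap(match):
        # Leave pairs alone unless their name was flagged as sensitive.
        if match.group(1) in sensitive:
            return '{%s pii_data="true"}' % match.group(0)
        return match.group(0)
    return PARAM_RE.sub(wrap, message)
```

Because the tag sits next to the value it describes, a downstream
consumer can act on each field without consulting any other part of
the message. The sketch is not idempotent: running it on a message
that is already tagged would wrap the sensitive fields a second time.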
4.2. Log Level Tagging
<120> Apr 18 16:32:58 10.0.1.11 QAUDJRN: [AF@0 event="AF-Authority
failure" violation="A-Not authorized to object" actual_type="AF-A"
jrn_seq="1001363" timestamp="20120418163258988000"
job_name="QPADEV000B" user_name="XYZZY" job_number="256937"
err_user="XYZZY" ip_addr="10.0.1.21" port="55875"
action="Undefined(x00)" val_job="QPADEV000B" val_jobno="256937"
object="TEST" object_library="CUS9242" object_type="*FILE"
pgm_name="" pgm_libr="" workstation="", pii="user_name,err_user,
ip_addr"]
Here, a new metadata field, "pii", is added to the MSG part of the
syslog message. It lists the names of the fields that contain
personal data.
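The log-level approach can be sketched as a function that appends the
"pii" parameter just before the closing bracket, following the comma
separator used in the example above. The function name tag_log is
hypothetical, and the sketch assumes that the last "]" in the message
closes the parameter list.

```python
import re


def tag_log(message, sensitive):
    """Append a pii="..." parameter naming the sensitive fields present."""
    # Keep only the names that actually occur as name="..." in the message.
    present = [name for name in sensitive
               if re.search(r'\b%s="' % re.escape(name), message)]
    body, bracket, rest = message.rpartition("]")
    if not bracket:
        raise ValueError("message has no closing ']'")
    if not present:
        return message
    return '%s, pii="%s"]%s' % (body, ",".join(present), rest)
```

Unlike field-level tagging, this leaves the original parameters
untouched, so consumers that do not understand the tag still see the
fields in their usual form.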
A more complicated example follows. It can be used to support
redacting different fields in different ways, as required by a
privacy preservation policy.
<120> Apr 18 16:32:58 10.0.1.11 QAUDJRN: [AF@0 event="AF-Authority
failure" violation="A-Not authorized to object" actual_type="AF-A"
jrn_seq="1001363" timestamp="20120418163258988000"
job_name="QPADEV000B" user_name="XYZZY" job_number="256937"
err_user="XYZZY" ip_addr="10.0.1.21" port="55875"
action="Undefined(x00)" val_job="QPADEV000B" val_jobno="256937"
object="TEST" object_library="CUS9242" object_type="*FILE"
pgm_name="" pgm_libr="" workstation="",
pii_name="user_name,err_user", pii_ipaddr="ip_addr"]
Here, the log data is tagged with "pii_name" and "pii_ipaddr"
attributes that specify the sensitive data in the log at a more
granular level.
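A consumer of such a log could use the two attributes to apply a
different redaction to each category. The sketch below is one
possible reading, not a defined behavior: the policy table, the
masking functions and the choice of redactions (replacing names with
a fixed string, zeroing the last octet of an IPv4 address) are all
assumptions made for this example.

```python
import re

# Hypothetical helper pattern: matches name="value" pairs.
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def mask_name(value):
    """Replace the whole value with a fixed marker."""
    return "REDACTED"


def mask_ipv4(value):
    """Zero the last octet; assumes a dotted-quad IPv4 address."""
    return value.rsplit(".", 1)[0] + ".0"


# Hypothetical policy: tag attribute -> redaction function.
POLICY = {"pii_name": mask_name, "pii_ipaddr": mask_ipv4}


def redact(message):
    """Redact every field named by a pii_* tag, per the POLICY table."""
    params = dict(PARAM_RE.findall(message))
    # Map each tagged field name to the redaction its tag calls for.
    actions = {}
    for tag, action in POLICY.items():
        for field in params.get(tag, "").split(","):
            field = field.strip()
            if field:
                actions[field] = action

    def apply(match):
        name, value = match.group(1), match.group(2)
        if name in actions:
            return '%s="%s"' % (name, actions[name](value))
        return match.group(0)

    return PARAM_RE.sub(apply, message)
```

The tag attributes themselves are left in place, so a later stage can
still tell which fields were treated as personal data.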
5. IANA Considerations
We can consider defining a Structured Data ID for PII to specify
various structured parameters.
6. Security Considerations
TBD
7. Acknowledgements
TBD
8. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC3164] Lonvick, C., "The BSD Syslog Protocol", RFC 3164,
DOI 10.17487/RFC3164, August 2001,
<https://www.rfc-editor.org/info/rfc3164>.
[RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
Morris, J., Hansen, M., and R. Smith, "Privacy
Considerations for Internet Protocols", RFC 6973,
DOI 10.17487/RFC6973, July 2013,
<https://www.rfc-editor.org/info/rfc6973>.
Authors' Addresses
Sandeep Rao
Grab
Bangalore
India
Email: sandeeprao.ietf@gmail.com
Shivan Sahib
Salesforce
Email: shivankaulsahib@gmail.com
Ryan Guest
Salesforce
Email: rguest@salesforce.com