< draft-irtf-pearg-safe-internet-measurement-00.txt   draft-irtf-pearg-safe-internet-measurement-01.txt >
Network Working Group I. Learmonth Network Working Group I. Learmonth
Internet-Draft Tor Project Internet-Draft Tor Project
Intended status: Informational July 7, 2019 Intended status: Informational July 8, 2019
Expires: January 8, 2020 Expires: January 9, 2020
Guidelines for Performing Safe Measurement on the Internet Guidelines for Performing Safe Measurement on the Internet
draft-irtf-pearg-safe-internet-measurement-00 draft-irtf-pearg-safe-internet-measurement-01
Abstract Abstract
Researchers from industry and academia often use Internet Researchers from industry and academia often use Internet
measurements as part of their work. While these measurements can measurements as part of their work. While these measurements can
give insight into the functioning and usage of the Internet, they can give insight into the functioning and usage of the Internet, they can
come at the cost of user privacy. This document describes guidelines come at the cost of user privacy. This document describes guidelines
for ensuring that such measurements can be carried out safely. for ensuring that such measurements can be carried out safely.
Note Note
skipping to change at page 1, line 43 skipping to change at page 1, line 43
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 8, 2020. This Internet-Draft will expire on January 9, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 5, line 20 skipping to change at page 5, line 20
new behaviour to any user should be considered appropriate if some new behaviour to any user should be considered appropriate if some
users are to remain with the old behavior. users are to remain with the old behavior.
In the event that something does go wrong with the update, it should In the event that something does go wrong with the update, it should
be easy for a user to discover that they have been part of an be easy for a user to discover that they have been part of an
experiment and roll back the change, allowing for explicit refusal of experiment and roll back the change, allowing for explicit refusal of
consent to override the presumed implied consent. consent to override the presumed implied consent.
3. Safety Considerations 3. Safety Considerations
3.1. Use a testbed 3.1. Isolate risk with a dedicated testbed
Wherever possible, use a testbed. An isolated network means that Wherever possible, use a testbed. An isolated network means that
there are no other users sharing the infrastructure you are using for there are no other users sharing the infrastructure you are using for
your experiments. your experiments.
When measuring performance, competing traffic can have negative When measuring performance, competing traffic can have negative
effects on the performance of your test traffic and so the testbed effects on the performance of your test traffic and so the testbed
approach can also produce more accurate and repeatable results than approach can also produce more accurate and repeatable results than
experiments using the public Internet. experiments using the public Internet.
WAN link conditions can be emulated through artificial delays and/or WAN link conditions can be emulated through artificial delays and/or
packet loss using a tool like [netem]. Competing traffic can also be packet loss using a tool like [netem]. Competing traffic can also be
emulated using traffic generators. emulated using traffic generators.
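As a minimal sketch of the emulation described above, the following Python helper builds a tc command line that attaches a netem qdisc with a fixed delay and random packet loss. The interface name, delay, and loss values are illustrative assumptions, not recommendations from the draft. The command is only constructed, not executed, since applying it requires root privileges on a Linux testbed host.

```python
def netem_command(interface, delay_ms, loss_percent):
    """Build (but do not run) a tc command that adds a netem qdisc."""
    if delay_ms < 0 or not 0 <= loss_percent <= 100:
        raise ValueError("delay must be >= 0 and loss within 0..100")
    return [
        "tc", "qdisc", "add", "dev", interface, "root", "netem",
        "delay", f"{delay_ms}ms",
        "loss", f"{loss_percent}%",
    ]


# "eth0", 100 ms and 1% loss are placeholder values for a testbed link.
cmd = netem_command("eth0", 100, 1)
print(" ".join(cmd))
```

On a testbed host the resulting list could be passed to subprocess.run, and the qdisc removed again with "tc qdisc del dev eth0 root" once the experiment ends.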
3.2. Only record your own traffic 3.2. Be respectful of others' infrastructure
When performing active measurements be sure to only capture traffic
that you have generated. Traffic may be identified by IP ranges or
by some token that is unlikely to be used by other users.
Again, this can help to improve the accuracy and repeatability of
your experiment. [RFC2544], for performance benchmarking, requires
that any frames received that were not part of the test traffic are
discarded and not counted in the results.
3.3. Be respectful of others' infrastructure
If your experiment is designed to trigger a response from If your experiment is designed to trigger a response from
infrastructure that is not your own, consider what the negative infrastructure that is not your own, consider what the negative
consequences of that may be. At the very least your experiment will consequences of that may be. At the very least your experiment will
consume bandwidth that may have to be paid for. consume bandwidth that may have to be paid for.
In more extreme circumstances, you could cause traffic to be In more extreme circumstances, you could cause traffic to be
generated that causes legal trouble for the owner of that generated that causes legal trouble for the owner of that
infrastructure. The Internet is a global network crossing many legal infrastructure. The Internet is a global network crossing many legal
jurisdictions and so what may be legal for you is not necessarily jurisdictions and so what may be legal for you is not necessarily
legal for everyone. legal for everyone.
If you are sending a lot of traffic quickly, or otherwise generally If you are sending a lot of traffic quickly, or otherwise generally
deviate from typical client behaviour, a network may identify this as deviate from typical client behaviour, a network may identify this as
an attack which means that you will not be collecting results that an attack which means that you will not be collecting results that
are representative of what a typical client would see. are representative of what a typical client would see.
3.3.1. Maintain a "Do Not Scan" list 3.2.1. Maintain a "Do Not Scan" list
When performing active measurements on a shared network, maintain a When performing active measurements on a shared network, maintain a
list of hosts that you will never scan regardless of whether they list of hosts that you will never scan regardless of whether they
appear in your target lists. When developing tools for performing appear in your target lists. When developing tools for performing
active measurement, or traffic generation for use in a larger active measurement, or traffic generation for use in a larger
measurement system, ensure that the tool will support the use of a measurement system, ensure that the tool will support the use of a
"Do Not Scan" list. "Do Not Scan" list.
If complaints are made that request you do not generate traffic If complaints are made that request you do not generate traffic
towards a host or network, you must add that host or network to your towards a host or network, you must add that host or network to your
skipping to change at page 6, line 43 skipping to change at page 6, line 32
you plan to share the reasoning when publishing your measurement you plan to share the reasoning when publishing your measurement
results, e.g. in an academic paper, you must seek consent for this results, e.g. in an academic paper, you must seek consent for this
from the requester. from the requester.
Be aware that in publishing your measurement results, it may be Be aware that in publishing your measurement results, it may be
possible to infer your "Do Not Scan" list from those results. For possible to infer your "Do Not Scan" list from those results. For
example, if you measured a well-known list of popular websites then example, if you measured a well-known list of popular websites then
it would be possible to correlate the results with that list to it would be possible to correlate the results with that list to
determine which are missing. determine which are missing.
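One way a measurement tool might support a "Do Not Scan" list is sketched below in Python, using the standard ipaddress module. The function names and the list format (one host or CIDR prefix per entry) are assumptions made for illustration. The addresses come from the documentation ranges and stand in for real targets.

```python
import ipaddress


def load_do_not_scan(entries):
    """Parse hosts or CIDR prefixes into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def filter_targets(targets, do_not_scan):
    """Return only the targets not covered by any Do Not Scan entry."""
    allowed = []
    for target in targets:
        addr = ipaddress.ip_address(target)
        if not any(addr in net for net in do_not_scan):
            allowed.append(target)
    return allowed


# Placeholder entries: one whole prefix and one single host.
do_not_scan = load_do_not_scan(["192.0.2.0/24", "198.51.100.7"])
targets = ["192.0.2.15", "198.51.100.7", "198.51.100.8", "203.0.113.1"]
print(filter_targets(targets, do_not_scan))
# prints ['198.51.100.8', '203.0.113.1']
```

Applying the filter immediately before traffic is generated, rather than when the target list is built, means that entries added in response to a complaint take effect even for target lists prepared earlier.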
3.4. Only collect data that is safe to make public 3.3. Data Minimization
When collecting, using, disclosing, and storing data from a
measurement, use only the minimal data necessary to perform a task.
Reducing the amount of data reduces the amount of data that can be
misused or leaked.
When deciding on the data to collect, assume that any data collected When deciding on the data to collect, assume that any data collected
might become public. There are many ways that this could happen, might be disclosed. There are many ways that this could happen,
through operational security mistakes or compulsion by a judicial through operational security mistakes or compulsion by a judicial
system. system.
3.5. Minimization When directly instrumenting a protocol to provide metrics to a
passive observer, see section 6.1 of RFC6973 [RFC6973] for data
minimization considerations specific to this use case.
For all data collected, consider whether or not it is really needed. 3.3.1. Discarding Data
3.6. Aggregation XXX: Discard data that is not required to perform the task.
When performing active measurements be sure to only capture traffic
that you have generated. Traffic may be identified by IP ranges or
by some token that is unlikely to be used by other users.
Again, this can help to improve the accuracy and repeatability of
your experiment. [RFC2544], for performance benchmarking, requires
that any frames received that were not part of the test traffic are
discarded and not counted in the results.
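The identification step described above could look roughly like the following Python sketch, which keeps a record only if it comes from the experiment's own address range or carries the experiment's token. The prefix, the token value, and the record layout are hypothetical and chosen only for illustration.

```python
import ipaddress

# Hypothetical identifiers for the experiment's own traffic.
TEST_PREFIX = ipaddress.ip_network("192.0.2.0/24")
TEST_TOKEN = b"msmt-7f3a9c"


def is_own_traffic(src_ip, payload):
    """True if the packet came from our range or carries our token."""
    return ipaddress.ip_address(src_ip) in TEST_PREFIX or TEST_TOKEN in payload


def keep_own(records):
    """Drop, without storing, anything not identified as test traffic."""
    return [r for r in records if is_own_traffic(r["src"], r["payload"])]


records = [
    {"src": "192.0.2.10", "payload": b"probe"},
    {"src": "203.0.113.5", "payload": b"hello msmt-7f3a9c"},
    {"src": "203.0.113.9", "payload": b"someone else's traffic"},
]
print(len(keep_own(records)))  # prints 2
```

The filter is meant to run before anything is written to storage, so that traffic from other users is never retained.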
3.3.2. Masking Data
XXX: Mask data that is not required to perform the task.
Particularly useful for content of traffic to indicate that either a
particular class of content existed or did not exist, or the length
of the content, but not recording the content itself. Can also
replace content with tokens, or encrypt.
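A minimal Python sketch of these masking options follows. It records only the content length, whether a class of content was present, and a keyed token in place of the content itself. The function name, the choice of HMAC-SHA256, and the truncated token length are assumptions, not something the draft specifies.

```python
import hashlib
import hmac
import secrets

# Per-experiment key held only in memory. Discarding it after the
# experiment prevents tokens from being recomputed from guessed content.
KEY = secrets.token_bytes(32)


def mask_content(content, pattern):
    """Describe the content without retaining it."""
    return {
        "length": len(content),
        "matched": pattern in content,
        "token": hmac.new(KEY, content, hashlib.sha256).hexdigest()[:16],
    }


record = mask_content(b"GET /secret HTTP/1.1", b"HTTP/1.1")
print(record["length"], record["matched"])  # prints 20 True
```

Identical content yields the same token within one experiment, so repeated content can still be counted, while the content itself is never stored.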
3.3.3. Reduce Accuracy
XXX: Binning, categorizing, geoip, noise.
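Two of the techniques named here, binning and coarsening of addresses, can be sketched in Python as below. The bin width and the prefix lengths (/24 for IPv4, /48 for IPv6) are arbitrary assumptions for illustration. Appropriate values depend on the measurement, and geolocation lookups and noise are not shown.

```python
import ipaddress


def bin_value(value, width):
    """Round a measurement down to the lower edge of its bin."""
    return (value // width) * width


def truncate_ip(addr, v4_prefix=24, v6_prefix=48):
    """Replace an address with the covering prefix of the given length."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


print(bin_value(137, 50))                    # prints 100
print(truncate_ip("192.0.2.77"))             # prints 192.0.2.0/24
print(truncate_ip("2001:db8:1234:5678::1"))  # prints 2001:db8:1234::/48
```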
3.3.4. Data Aggregation
When collecting data, consider if the granularity can be limited by When collecting data, consider if the granularity can be limited by
using bins or adding noise. XXX: Differential privacy. using bins or adding noise. XXX: Differential privacy.
3.7. Source Aggregation XXX: Do this at the source, definitely do it before you write to
disk.
Do this at the source, definitely do it before you write to disk.
[Tor.2017-04-001] presents a case-study on the in-memory statistics [Tor.2017-04-001] presents a case-study on the in-memory statistics
in the software used by the Tor network, as an example. in the software used by the Tor network, as an example.
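The idea of aggregating at the source can be illustrated with the Python sketch below. It keeps exact counts in memory only and, when flushed, emits counts with Laplace noise added and rounded up to a bin boundary. The class name, bin width, and noise scale are assumptions. The noise is not calibrated to any formal privacy guarantee, and the sketch is not a description of how Tor implements its statistics.

```python
import math
import random
from collections import Counter


class InMemoryAggregator:
    """Hold exact counts in memory; report only noised, binned counts."""

    def __init__(self, bin_width=8, noise_scale=2.0, rng=None):
        self._counts = Counter()
        self._bin_width = bin_width
        self._noise_scale = noise_scale  # must be > 0
        self._rng = rng or random.Random()

    def observe(self, category):
        self._counts[category] += 1

    def _laplace(self):
        # The difference of two i.i.d. exponential samples with mean
        # noise_scale is Laplace-distributed with that scale.
        rate = 1.0 / self._noise_scale
        return self._rng.expovariate(rate) - self._rng.expovariate(rate)

    def flush(self):
        """Return the report to be written out, then forget exact counts."""
        report = {}
        for category, count in self._counts.items():
            noisy = max(0.0, count + self._laplace())
            bins = math.ceil(noisy / self._bin_width)
            report[category] = bins * self._bin_width
        self._counts.clear()
        return report


agg = InMemoryAggregator()
for _ in range(20):
    agg.observe("connections")
print(agg.flush())
```

Only the dictionary returned by flush() would be written to disk. The exact counts exist solely in process memory and are cleared as soon as the report is produced.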
4. Risk Analysis 4. Risk Analysis
The benefits should outweigh the risks. Consider auxiliary data The benefits should outweigh the risks. Consider auxiliary data
(e.g. third-party data sets) when assessing the risks. (e.g. third-party data sets) when assessing the risks.
5. Security Considerations 5. Security Considerations
skipping to change at page 8, line 31 skipping to change at page 8, line 49
[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
Network Interconnect Devices", RFC 2544, Network Interconnect Devices", RFC 2544,
DOI 10.17487/RFC2544, March 1999, DOI 10.17487/RFC2544, March 1999,
<https://www.rfc-editor.org/info/rfc2544>. <https://www.rfc-editor.org/info/rfc2544>.
[RFC6349] Constantine, B., Forget, G., Geib, R., and R. Schrage, [RFC6349] Constantine, B., Forget, G., Geib, R., and R. Schrage,
"Framework for TCP Throughput Testing", RFC 6349, "Framework for TCP Throughput Testing", RFC 6349,
DOI 10.17487/RFC6349, August 2011, DOI 10.17487/RFC6349, August 2011,
<https://www.rfc-editor.org/info/rfc6349>. <https://www.rfc-editor.org/info/rfc6349>.
[RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
Morris, J., Hansen, M., and R. Smith, "Privacy
Considerations for Internet Protocols", RFC 6973, July
2013, <https://www.rfc-editor.org/info/rfc6973>.
[Tor.2017-04-001] [Tor.2017-04-001]
Herm, K., "Privacy analysis of Tor's in-memory Herm, K., "Privacy analysis of Tor's in-memory
statistics", Tor Tech Report 2017-04-001, April 2017, statistics", Tor Tech Report 2017-04-001, April 2017,
<https://research.torproject.org/techreports/ <https://research.torproject.org/techreports/
privacy-in-memory-2017-04-28.pdf>. privacy-in-memory-2017-04-28.pdf>.
[TorSafetyBoard] [TorSafetyBoard]
Tor Project, "Tor Research Safety Board", Tor Project, "Tor Research Safety Board",
<https://research.torproject.org/safetyboard/>. <https://research.torproject.org/safetyboard/>.
 End of changes. 13 change blocks. 
26 lines changed or deleted 49 lines changed or added
