< draft-learmonth-pearg-safe-internet-measurement-01.txt | draft-learmonth-pearg-safe-internet-measurement-02.txt > | |||
---|---|---|---|---|
Network Working Group I. Learmonth | Network Working Group I. Learmonth | |||
Internet-Draft Tor Project | Internet-Draft Tor Project | |||
Intended status: Informational December 12, 2018 | Intended status: Informational May 16, 2019 | |||
Expires: June 15, 2019 | Expires: November 17, 2019 | |||
Guidelines for Performing Safe Measurement on the Internet | Guidelines for Performing Safe Measurement on the Internet | |||
draft-learmonth-pearg-safe-internet-measurement-01 | draft-learmonth-pearg-safe-internet-measurement-02 | |||
Abstract | Abstract | |||
Researchers from industry and academia will often use Internet | Researchers from industry and academia will often use Internet | |||
measurements as a part of their work. While these measurements can | measurements as a part of their work. While these measurements can | |||
give insight into the functioning and usage of the Internet, they can | give insight into the functioning and usage of the Internet, they can | |||
come at the cost of user privacy. This document describes guidelines | come at the cost of user privacy. This document describes guidelines | |||
for ensuring that such measurements can be carried out safely. | for ensuring that such measurements can be carried out safely. | |||
Note | Note | |||
skipping to change at page 1, line 43 | skipping to change at page 1, line 43 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on June 15, 2019. | This Internet-Draft will expire on November 17, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. | to this document. | |||
1. Introduction | 1. Introduction | |||
Performing research using the Internet, as opposed to an | Performing research using the Internet, as opposed to an | |||
isolated testbed or simulation platform, means that your research co- | isolated testbed or simulation platform, means that your research co- | |||
exists in a space with other users. This document outlines | exists in a space with other users. This document outlines | |||
guidelines for academic and industry researchers that might use the | guidelines for academic and industry researchers that might use the | |||
Internet as part of scientific experimentation. | Internet as part of scientific experimentation. | |||
1.1. Scope of this document | ||||
Following the guidelines contained within this document is not a | Following the guidelines contained within this document is not a | |||
substitute for any institutional ethics review process you may have, | substitute for any institutional ethics review process you may have, | |||
although these guidelines could help to inform that process. | although these guidelines could help to inform that process. | |||
Similarly, these guidelines are not legal advice and local laws | Similarly, these guidelines are not legal advice and local laws must | |||
should be considered before starting any experiment that could have | also be considered before starting any experiment that could have | |||
adverse impacts on user privacy. | adverse impacts on user privacy. | |||
Considerations are grouped into two categories: those that primarily | 1.2. Active and passive measurements | |||
apply to active measurements and those that primarily apply to | ||||
passive measurements. Some of these considerations may be applicable | ||||
to both depending on the experiment design. | ||||
2. Active measurements | Internet measurement studies can be broadly categorized into two | |||
groups: active measurements and passive measurements. Active | ||||
measurements generate traffic. Performance measurements such as TCP | ||||
throughput testing [RFC6349] or functional measurements such as the | ||||
feature-dependent connectivity failure tests performed by | ||||
[PATHspider] both fall into this category. Performing passive | ||||
measurements requires existing traffic. Passive measurements can | ||||
help to inform new developments in Internet protocols but can also | ||||
carry risk. | ||||
Active measurements generate traffic. Performance measurements such | The type of measurement is not truly binary and many studies will | |||
as TCP throughput testing [RFC6349] or functional measurements such | include both active and passive components. Each of the | |||
as the feature-dependent connectivity failure tests performed by | considerations in this document must be carefully considered for | |||
[PATHspider] both fall into this category. | their applicability regardless of the type of measurement. | |||
2.1. Use a testbed | 2. Consent | |||
Ideally, informed consent would be collected from all users of a | ||||
shared network before measurements were performed on them. In cases | ||||
where it is practical to do so, this should be done. | ||||
For consent to be informed, all possible risks must be presented to | ||||
the users. The considerations in this document can be used to | ||||
provide a starting point although other risks may be present | ||||
depending on the nature of the measurements to be performed. | ||||
2.1. Proxy Consent | ||||
In cases where it is not practical to collect informed consent from | ||||
all users of a shared network, it may be possible to obtain proxy | ||||
consent. Proxy consent may be given by a network operator or | ||||
employer that would be more familiar with the expectations of users | ||||
of a network than the researcher. | ||||
2.2. Implied consent | ||||
In larger scale measurements, even proxy consent collection may not | ||||
be practical. In this case, implied consent may be presumed from | ||||
users for some measurements. Consider that users of a network will | ||||
have certain expectations of privacy and those expectations may not | ||||
align with the privacy guarantees offered by the technologies they | ||||
are using. As a thought experiment, consider how users might respond | ||||
if you asked for their informed consent for the measurements you'd | ||||
like to perform. | ||||
For example, the operator of a web server that is exposed to the | ||||
Internet hosting a popular website would have the expectation that it | ||||
may be included in surveys that look at supported protocols or | ||||
extensions but would not expect that attempts be made to degrade the | ||||
service with large numbers of simultaneous connections. | ||||
If practical, attempt to obtain informed consent or proxy consent | ||||
from a sample of users to better understand the expectations of other | ||||
users. | ||||
3. Safety Considerations | ||||
3.1. Use a testbed | ||||
Wherever possible, use a testbed. An isolated network means that | Wherever possible, use a testbed. An isolated network means that | |||
there are no other users sharing the infrastructure you are using for | there are no other users sharing the infrastructure you are using for | |||
your experiments. | your experiments. | |||
When measuring performance, competing traffic can have negative | When measuring performance, competing traffic can have negative | |||
effects on the performance of your test traffic and so the testbed | effects on the performance of your test traffic and so the testbed | |||
approach can also produce more accurate and repeatable results than | approach can also produce more accurate and repeatable results than | |||
experiments using the public Internet. | experiments using the public Internet. | |||
WAN link conditions can be emulated through artificial delays and/or | WAN link conditions can be emulated through artificial delays and/or | |||
packet loss using a tool like [netem]. Competing traffic can also be | packet loss using a tool like [netem]. Competing traffic can also be | |||
emulated using traffic generators. | emulated using traffic generators. | |||
2.2. Only record your own traffic | 3.2. Only record your own traffic | |||
When performing measurements be sure to only capture traffic that you | When performing measurements be sure to only capture traffic that you | |||
have generated. Traffic may be identified by IP ranges or by some | have generated. Traffic may be identified by IP ranges or by some | |||
token that is unlikely to be used by other users. | token that is unlikely to be used by other users. | |||
Again, this can help to improve the accuracy and repeatability of | Again, this can help to improve the accuracy and repeatability of | |||
your experiment. [RFC2544], for performance benchmarking, requires | your experiment. [RFC2544], for performance benchmarking, requires | |||
that any frames received that were not part of the test traffic are | that any frames received that were not part of the test traffic are | |||
discarded and not counted in the results. | discarded and not counted in the results. | |||
2.3. Be respectful of others' infrastructure | 3.3. Be respectful of others' infrastructure | |||
If your experiment is designed to trigger a response from | If your experiment is designed to trigger a response from | |||
infrastructure that is not your own, consider what the negative | infrastructure that is not your own, consider what the negative | |||
consequences of that may be. At the very least your experiment will | consequences of that may be. At the very least your experiment will | |||
consume bandwidth that may have to be paid for. | consume bandwidth that may have to be paid for. | |||
In more extreme circumstances, you could cause traffic to be | In more extreme circumstances, you could cause traffic to be | |||
generated that causes legal trouble for the owner of that | generated that causes legal trouble for the owner of that | |||
infrastructure. The Internet is a global network crossing many legal | infrastructure. The Internet is a global network crossing many legal | |||
jurisdictions and so what may be legal for you is not necessarily | jurisdictions and so what may be legal for you is not necessarily | |||
legal for everyone. | legal for everyone. | |||
If you are sending a lot of traffic quickly, or otherwise generally | If you are sending a lot of traffic quickly, or otherwise generally | |||
deviate from typical client behaviour, a network may identify this as | deviate from typical client behaviour, a network may identify this as | |||
an attack which means that you will not be collecting results that | an attack which means that you will not be collecting results that | |||
are representative of what a typical client would see. | are representative of what a typical client would see. | |||
3. Passive measurements | 3.3.1. Maintain a "Do Not Scan" list | |||
Performing passive measurements requires existing traffic. Passive | When performing active measurements on a shared network, maintain a | |||
measurements can help to inform new developments in Internet | list of hosts that you will never scan regardless of whether they | |||
protocols but can also carry risk. | appear in your target lists. When developing tools for performing | |||
active measurement, or traffic generation for use in a larger | ||||
measurement system, ensure that the tool will support the use of a | ||||
"Do Not Scan" list. | ||||
3.1. Consider the expectation of privacy | If complaints are made that request you do not generate traffic | |||
towards a host or network, you must add that host or network to your | ||||
"Do Not Scan" list, even if no explanation is given or the request is | ||||
automated. | ||||
If you are in a position to perform passive measurements of live | You may ask the requester for their reasoning if it would be useful | |||
network traffic, you are also in a position of responsibility. Users | to your experiment. This can also be an oppertunity to explain your | |||
of a network will have certain expectations of privacy and those | research and offer to share any results that may be of interest. If | |||
expectations may not align with the privacy guarantees offered by the | you plan to share the reasoning when publishing your measurement | |||
technologies they are using. As a thought experiment, consider how | results, e.g. in an academic paper, you must seek consent for this | |||
users might respond if you asked for their informed consent for the | from the requester. | |||
measurements you'd like to perform. | ||||
3.2. Only collect data that is safe to make public | Be aware that in publishing your measurement results, it may be | |||
possible to infer your "Do Not Scan" list from those results. For | ||||
example, if you measured a well-known list of popular websites then | ||||
it would be possible to correlate the results with that list to | ||||
determine which are missing. | ||||
3.4. Only collect data that is safe to make public | ||||
When deciding on the data to collect, assume that any data collected | When deciding on the data to collect, assume that any data collected | |||
might become public. There are many ways that this could happen, | might become public. There are many ways that this could happen, | |||
through operational security mistakes or compulsion by a judicial | through operational security mistakes or compulsion by a judicial | |||
system. | system. | |||
3.3. Minimization | 3.5. Minimization | |||
For all data collected, consider whether or not it is really needed. | For all data collected, consider whether or not it is really needed. | |||
3.4. Aggregation | 3.6. Aggregation | |||
When collecting data, consider if the granularity can be limited by | When collecting data, consider if the granularity can be limited by | |||
using bins or adding noise. XXX: Differential privacy. | using bins or adding noise. XXX: Differential privacy. | |||
3.5. Source Aggregation | 3.7. Source Aggregation | |||
Aggregate at the source, and in any case before you write to disk. | Aggregate at the source, and in any case before you write to disk. | |||
[Tor.2017-04-001] presents a case-study on the in-memory statistics | [Tor.2017-04-001] presents a case-study on the in-memory statistics | |||
in the software used by the Tor network, as an example. | in the software used by the Tor network, as an example. | |||
4. Risk Analysis | 4. Risk Analysis | |||
The benefits should outweigh the risks. Consider auxiliary data | The benefits should outweigh the risks. Consider auxiliary data | |||
(e.g. third-party data sets) when assessing the risks. | (e.g. third-party data sets) when assessing the risks. | |||
skipping to change at page 4, line 48 | skipping to change at page 6, line 15 | |||
6. IANA Considerations | 6. IANA Considerations | |||
This document has no actions for IANA. | This document has no actions for IANA. | |||
7. Acknowledgements | 7. Acknowledgements | |||
Many of these considerations are based on those from the | Many of these considerations are based on those from the | |||
[TorSafetyBoard] adapted and generalised to be applied to Internet | [TorSafetyBoard] adapted and generalised to be applied to Internet | |||
research. | research. | |||
Other considerations are taken from the Menlo Report [MenloReport] | ||||
and its companion document [MenloReportCompanion]. | ||||
8. Informative References | 8. Informative References | |||
[MenloReport] | ||||
Dittrich, D. and E. Kenneally, "The Menlo Report: Ethical | ||||
Principles Guiding Information and Communication | ||||
Technology Research", August 2012, | ||||
<https://www.caida.org/publications/papers/2012/ | ||||
menlo_report_actual_formatted/>. | ||||
[MenloReportCompanion] | ||||
Bailey, M., Dittrich, D., and E. Kenneally, "Applying | ||||
Ethical Principles to Information and Communication | ||||
Technology Research", October 2013, | ||||
<https://www.impactcybertrust.org/link_docs/ | ||||
Menlo-Report-Companion.pdf>. | ||||
[netem] Hemminger, S., "Network emulation with NetEm", April 2005. | [netem] Hemminger, S., "Network emulation with NetEm", April 2005. | |||
[PATHspider] | [PATHspider] | |||
Learmonth, I., Trammell, B., Kuehlewind, M., and G. | Learmonth, I., Trammell, B., Kuehlewind, M., and G. | |||
Fairhurst, "PATHspider: A tool for active measurement of | Fairhurst, "PATHspider: A tool for active measurement of | |||
path transparency", DOI 10.1145/2959424.2959441, July | path transparency", DOI 10.1145/2959424.2959441, July | |||
2016, | 2016, | |||
<https://dl.acm.org/citation.cfm?doid=2959424.2959441>. | <https://dl.acm.org/citation.cfm?doid=2959424.2959441>. | |||
[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for | [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for | |||
skipping to change at page 5, line 24 | skipping to change at page 7, line 7 | |||
DOI 10.17487/RFC2544, March 1999, | DOI 10.17487/RFC2544, March 1999, | |||
<https://www.rfc-editor.org/info/rfc2544>. | <https://www.rfc-editor.org/info/rfc2544>. | |||
[RFC6349] Constantine, B., Forget, G., Geib, R., and R. Schrage, | [RFC6349] Constantine, B., Forget, G., Geib, R., and R. Schrage, | |||
"Framework for TCP Throughput Testing", RFC 6349, | "Framework for TCP Throughput Testing", RFC 6349, | |||
DOI 10.17487/RFC6349, August 2011, | DOI 10.17487/RFC6349, August 2011, | |||
<https://www.rfc-editor.org/info/rfc6349>. | <https://www.rfc-editor.org/info/rfc6349>. | |||
[Tor.2017-04-001] | [Tor.2017-04-001] | |||
Herm, K., "Privacy analysis of Tor's in-memory | Herm, K., "Privacy analysis of Tor's in-memory | |||
statistics", Tor Tech Report 2017-04-001, | statistics", Tor Tech Report 2017-04-001, April 2017, | |||
<https://research.torproject.org/techreports/ | <https://research.torproject.org/techreports/ | |||
privacy-in-memory-2017-04-28.pdf>. | privacy-in-memory-2017-04-28.pdf>. | |||
[TorSafetyBoard] | [TorSafetyBoard] | |||
Tor Project, "Tor Research Safety Board", | Tor Project, "Tor Research Safety Board", | |||
<https://research.torproject.org/safetyboard.html>. | <https://research.torproject.org/safetyboard/>. | |||
Author's Address | Author's Address | |||
Iain R. Learmonth | Iain R. Learmonth | |||
Tor Project | Tor Project | |||
Email: irl@torproject.org | Email: irl@torproject.org | |||
End of changes. 24 change blocks. | ||||
37 lines changed or deleted | 114 lines changed or added | |||
This html diff was produced by rfcdiff 1.47. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |
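
The "Do Not Scan" list introduced in section 3.3.1 of the -02 revision could be supported by a measurement tool roughly as follows. This is a minimal sketch and is not taken from the draft: the names DoNotScanList, add, blocks and filter_targets are hypothetical, and it assumes that entries and targets are given as IP addresses or CIDR prefixes. A real tool would likely also need to handle hostnames and to persist the list between runs.

```python
import ipaddress


class DoNotScanList:
    """Hypothetical helper: hosts and networks that must never be scanned."""

    def __init__(self, entries=()):
        self._networks = []
        for entry in entries:
            self.add(entry)

    def add(self, entry):
        # Accept a single address ("2001:db8::1") or a prefix ("192.0.2.0/24").
        # strict=False tolerates prefixes written with host bits set.
        self._networks.append(ipaddress.ip_network(entry, strict=False))

    def blocks(self, target):
        # True if the target address falls inside any listed network.
        addr = ipaddress.ip_address(target)
        return any(
            addr.version == net.version and addr in net
            for net in self._networks
        )

    def filter_targets(self, targets):
        # Drop listed targets even though they appear in the target list.
        return [t for t in targets if not self.blocks(t)]


dns_list = DoNotScanList(["192.0.2.0/24"])
dns_list.add("2001:db8::1")  # e.g. added after a complaint, no reason required
targets = ["192.0.2.10", "198.51.100.5", "2001:db8::1", "2001:db8::2"]
allowed = dns_list.filter_targets(targets)
```

Filtering the target list before any traffic is generated, rather than discarding results afterwards, keeps the tool from ever contacting a listed host.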