draft-ietf-tcpm-2140bis-01.txt | draft-ietf-tcpm-2140bis-02.txt | |||
---|---|---|---|---|
TCPM WG J. Touch | TCPM WG J. Touch | |||
Internet Draft Independent | Internet Draft Independent | |||
Intended status: Informational M. Welzl | Intended status: Informational M. Welzl | |||
Obsoletes: 2140 S. Islam | Obsoletes: 2140 S. Islam | |||
Expires: May 2020 University of Oslo | Expires: August 2020 University of Oslo | |||
November 19, 2019 | February 28, 2020 | |||
TCP Control Block Interdependence | TCP Control Block Interdependence | |||
draft-ietf-tcpm-2140bis-01.txt | draft-ietf-tcpm-2140bis-02.txt | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
This document may contain material from IETF Documents or IETF | This document may contain material from IETF Documents or IETF | |||
Contributions published or made publicly available before November | Contributions published or made publicly available before November | |||
10, 2008. The person(s) controlling the copyright in some of this | 10, 2008. The person(s) controlling the copyright in some of this | |||
material may not have granted the IETF Trust the right to allow | material may not have granted the IETF Trust the right to allow | |||
skipping to change at page 1, line 45 ¶ | skipping to change at page 1, line 45 ¶ | |||
months and may be updated, replaced, or obsoleted by other documents | months and may be updated, replaced, or obsoleted by other documents | |||
at any time. It is inappropriate to use Internet-Drafts as | at any time. It is inappropriate to use Internet-Drafts as | |||
reference material or to cite them other than as "work in progress." | reference material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html | http://www.ietf.org/shadow.html | |||
This Internet-Draft will expire on May 19, 2020. | This Internet-Draft will expire on August 28, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2020 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with | carefully, as they describe your rights and restrictions with | |||
respect to this document. Code Components extracted from this | respect to this document. Code Components extracted from this | |||
document must include Simplified BSD License text as described in | document must include Simplified BSD License text as described in | |||
Section 4.e of the Trust Legal Provisions and are provided | Section 4.e of the Trust Legal Provisions and are provided | |||
skipping to change at page 2, line 42 ¶ | skipping to change at page 2, line 42 ¶ | |||
across connections to the same host. Such sharing is intended to | across connections to the same host. Such sharing is intended to | |||
improve overall transient transport performance, while maintaining | improve overall transient transport performance, while maintaining | |||
backward-compatibility with existing implementations. The sharing | backward-compatibility with existing implementations. The sharing | |||
described herein is limited to only the TCB initialization and so | described herein is limited to only the TCB initialization and so | |||
has no effect on the long-term behavior of TCP after a connection | has no effect on the long-term behavior of TCP after a connection | |||
has been established. | has been established. | |||
Table of Contents | Table of Contents | |||
1. Introduction...................................................3 | 1. Introduction...................................................3 | |||
2. Conventions used in this document..............................4 | 2. Conventions Used in This Document..............................4 | |||
3. Terminology....................................................4 | 3. Terminology....................................................4 | |||
4. The TCP Control Block (TCB)....................................4 | 4. The TCP Control Block (TCB)....................................4 | |||
5. TCB Interdependence............................................5 | 5. TCB Interdependence............................................5 | |||
6. An Example of Temporal Sharing.................................6 | 6. Temporal Sharing...............................................6 | |||
7. An Example of Ensemble Sharing.................................9 | 6.1. Initialization of the new TCB................................6 | |||
8. Compatibility Issues..........................................11 | 6.2. Updates to the new TCB.......................................7 | |||
9. Implications..................................................13 | 6.3. Discussion...................................................8 | |||
10. Implementation Observations..................................14 | 7. Ensemble Sharing...............................................9 | |||
11. Updates to RFC 2140..........................................15 | 7.1. Initialization of a new TCB..................................9 | |||
12. Security Considerations......................................16 | 7.2. Updates to the new TCB......................................10 | |||
13. IANA Considerations..........................................16 | 7.3. Discussion..................................................11 | |||
14. References...................................................16 | 8. Compatibility Issues..........................................12 | |||
14.1. Normative References....................................16 | 8.1. Traversing the same network path............................13 | |||
14.2. Informative References..................................17 | 8.2. State dependence............................................13 | |||
15. Acknowledgments..............................................19 | 8.3. Problems with IP sharing....................................14 | |||
16. Change log...................................................20 | 9. Implications..................................................14 | |||
Appendix A : TCB sharing history.................................22 | 9.1. Layering....................................................14 | |||
Appendix B : TCP Option Sharing and Caching......................22 | 9.2. Other possibilities.........................................15 | |||
10. Implementation Observations..................................15 | ||||
11. Updates to RFC 2140..........................................16 | ||||
12. Security Considerations......................................17 | ||||
13. IANA Considerations..........................................17 | ||||
14. References...................................................18 | ||||
14.1. Normative References....................................18 | ||||
14.2. Informative References..................................18 | ||||
15. Acknowledgments..............................................21 | ||||
16. Change log...................................................21 | ||||
Appendix A : TCB Sharing History.................................24 | ||||
Appendix B : TCP Option Sharing and Caching......................25 | ||||
Appendix C : Automating the Initial Window in TCP over Long | Appendix C : Automating the Initial Window in TCP over Long | |||
Timescales.......................................................25 | Timescales.......................................................27 | |||
C.1. Introduction.............................................25 | C.1. Introduction.............................................27 | |||
C.2. Design Considerations....................................25 | C.2. Design Considerations....................................27 | |||
C.3. Proposed IW Algorithm....................................26 | C.3. Proposed IW Algorithm....................................28 | |||
C.4. Discussion...............................................29 | C.4. Discussion...............................................31 | |||
C.5. Observations.............................................30 | C.5. Observations.............................................32 | |||
1. Introduction | 1. Introduction | |||
TCP is a connection-oriented reliable transport protocol layered | TCP is a connection-oriented reliable transport protocol layered | |||
over IP [RFC793]. Each TCP connection maintains state, usually in a | over IP [RFC793]. Each TCP connection maintains state, usually in a | |||
data structure called the TCP Control Block (TCB). The TCB contains | data structure called the TCP Control Block (TCB). The TCB contains | |||
information about the connection state, its associated local | information about the connection state, its associated local | |||
process, and feedback parameters about the connection's transmission | process, and feedback parameters about the connection's transmission | |||
properties. As originally specified and usually implemented, most | properties. As originally specified and usually implemented, most | |||
TCB information is maintained on a per-connection basis. Some | TCB information is maintained on a per-connection basis. Some | |||
implementations can (and now do) share certain TCB information | implementations can (and now do) share certain TCB information | |||
across connections to the same host [RFC2140]. Such sharing is | across connections to the same host [RFC2140]. Such sharing is | |||
intended to lead to better overall transient performance, especially | intended to lead to better overall transient performance, especially | |||
for numerous short-lived and simultaneous connections, as often used | for numerous short-lived and simultaneous connections, as often used | |||
in the World-Wide Web [Be94],[Br02]. This sharing of state is | in the World-Wide Web [Be94][Br02]. This sharing of state is | |||
intended to help TCP connections converge to steady-state behavior | intended to help TCP connections converge to steady-state behavior | |||
more quickly without affecting TCP interoperability. | more quickly without affecting TCP interoperability. | |||
This document updates RFC 2140's discussion of TCB state sharing and | This document updates RFC 2140's discussion of TCB state sharing and | |||
provides a complete replacement for that document. This state | provides a complete replacement for that document. This state | |||
sharing affects only TCB initialization [RFC2140] and thus has no | sharing affects only TCB initialization [RFC2140] and thus has no | |||
effect on the long-term behavior of TCP after a connection has been | effect on the long-term behavior of TCP after a connection has been | |||
established nor on interoperability. Path information shared across | established nor on interoperability. Path information shared across | |||
SYN destination port numbers assumes that TCP segments having the | SYN destination port numbers assumes that TCP segments having the | |||
same host-pair experience the same path properties, irrespective of | same host-pair experience the same path properties, irrespective of | |||
TCP port numbers. The observations about TCB sharing in this | TCP port numbers. The observations about TCB sharing in this | |||
document apply similarly to any protocol with congestion state, | document apply similarly to any protocol with congestion state, | |||
including SCTP [RFC4960] and DCCP [RFC4340], as well as for | including SCTP [RFC4960] and DCCP [RFC4340], as well as for | |||
individual subflows in Multipath TCP [RFC6824]. | individual subflows in Multipath TCP [RFC6824]. | |||
2. Conventions used in this document | 2. Conventions Used in This Document | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
However, this document is intended to describe behavior that is | However, this document is intended to describe behavior that is | |||
already permitted by TCP implementers. As a result, it provides | already permitted by TCP standards. As a result, it provides | |||
informative guidance but does not use such normative language, | informative guidance but does not use such normative language, | |||
except when quoting other documents. | except when quoting other documents. | |||
3. Terminology | 3. Terminology | |||
Host - a source or sink of TCP segments associated with a single IP | Host - a source or sink of TCP segments associated with a single IP | |||
address | address | |||
Host-pair - a pair of hosts and their corresponding IP addresses | Host-pair - a pair of hosts and their corresponding IP addresses | |||
skipping to change at page 5, line 18 ¶ | skipping to change at page 5, line 18 ¶ | |||
pointers to Internet Protocol (IP) PCB | pointers to Internet Protocol (IP) PCB | |||
Per-connection shared state | Per-connection shared state | |||
macro-state | macro-state | |||
connection state | connection state | |||
timers | timers | |||
flags | flags | |||
local and remote host numbers and ports | local and remote host numbers and ports | |||
TCP option state | TCP option state | |||
micro-state | micro-state | |||
send and receive window state (size*, current number) | send and receive window state (size*, current number) | |||
round-trip time and variance | ||||
cong. window size (snd_cwnd)* | cong. window size (snd_cwnd)* | |||
cong. window size threshold (ssthresh)* | cong. window size threshold (ssthresh)* | |||
max window size seen* | max window size seen* | |||
sendMSS# | sendMSS# | |||
MMS_S# | MMS_S# | |||
MMS_R# | MMS_R# | |||
PMTU# | PMTU# | |||
round-trip time and variance# | round-trip time and variance# | |||
The per-connection information is shown as split into macro-state | The per-connection information is shown as split into macro-state | |||
skipping to change at page 6, line 5 ¶ | skipping to change at page 6, line 5 ¶ | |||
5. TCB Interdependence | 5. TCB Interdependence | |||
There are two cases of TCB interdependence. Temporal sharing occurs | There are two cases of TCB interdependence. Temporal sharing occurs | |||
when the TCB of an earlier (now CLOSED) connection to a host is used | when the TCB of an earlier (now CLOSED) connection to a host is used | |||
to initialize some parameters of a new connection to that same host, | to initialize some parameters of a new connection to that same host, | |||
i.e., in sequence. Ensemble sharing occurs when a currently active | i.e., in sequence. Ensemble sharing occurs when a currently active | |||
connection to a host is used to initialize another (concurrent) | connection to a host is used to initialize another (concurrent) | |||
connection to that host. | connection to that host. | |||
6. An Example of Temporal Sharing | 6. Temporal Sharing | |||
The TCB data cache is accessed in two ways: it is read to initialize | The TCB data cache is accessed in two ways: it is read to initialize | |||
new TCBs and written when more current per-host state is available. | new TCBs and written when more current per-host state is available. | |||
New TCBs can be initialized using context from past connections as | ||||
follows: | ||||
TEMPORAL SHARING - TCB Initialization | 6.1. Initialization of the new TCB | |||
TCBs for new connections can be initialized using context from past | ||||
connections as follows: | ||||
TEMPORAL SHARING - TCB Initialization | ||||
Cached TCB New TCB | Cached TCB New TCB | |||
-------------------------------------- | -------------------------------------- | |||
old_MMS_S old_MMS_S or not cached | old_MMS_S old_MMS_S or not cached | |||
old_MMS_R old_MMS_R or not cached | old_MMS_R old_MMS_R or not cached | |||
old_sendMSS old_sendMSS | old_sendMSS old_sendMSS | |||
old_PMTU old_PMTU | old_PMTU old_PMTU | |||
skipping to change at page 6, line 34 ¶ | skipping to change at page 6, line 37 ¶ | |||
old_RTT old_RTT | old_RTT old_RTT | |||
old_RTTvar old_RTTvar | old_RTTvar old_RTTvar | |||
old_option (option specific) | old_option (option specific) | |||
old_ssthresh old_ssthresh | old_ssthresh old_ssthresh | |||
old_snd_cwnd old_snd_cwnd | old_snd_cwnd old_snd_cwnd | |||
Sections 8 and 9 discuss compatibility issues and implications of | ||||
sharing the specific information listed above. Section 10 gives an | ||||
overview of known implementations. | ||||
Most cached TCB values are updated when a connection closes. The | ||||
exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122], | ||||
PMTU which is updated after Path MTU Discovery | ||||
[RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the | ||||
MSS option is received in the TCP SYN header. | ||||
Sharing sendMSS information affects only data in the SYN of the next | ||||
connection, because sendMSS information is typically included in | ||||
most TCP SYN segments. Caching PMTU can accelerate the efficiency of | ||||
PMTUD, but can also result in black-holing until corrected if in | ||||
error. Caching MMS_R and MMS_S may be of little direct value as they | ||||
are reported by the local IP stack anyway. | ||||
The way in which other TCP option state can be shared depends on the | ||||
details of that option. E.g., TFO state includes the TCP Fast Open | ||||
Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open | ||||
response. RFC 7413 states, "The client MUST cache negative responses | ||||
from the server in order to avoid potential connection failures. | ||||
Negative responses include the server not acknowledging the data in | ||||
the SYN, ICMP error messages, and (most importantly) no response | ||||
(SYN-ACK) from the server at all, i.e., connection timeout." | | |||
[RFC7413]. TFOinfo is cached when a connection is established. | | |||
Other TCP option state might not be as readily cached. E.g., TCP-AO | ||||
[RFC5925] success or failure between a host pair for a single SYN | ||||
destination port might be usefully cached. TCP-AO success or failure | ||||
to other SYN destination ports on that host pair is never useful to | ||||
cache because TCP-AO security parameters can vary per service. | ||||
The table below gives an overview of option-specific information | The table below gives an overview of option-specific information | |||
that can be shared. Additional information on TCP options and | that can be shared. Additional information on some specific TCP | |||
sharing is provided in Appendix B. | options and sharing is provided in Appendix B. | |||
TEMPORAL SHARING - Option info | TEMPORAL SHARING - Option Info Initialization | |||
Cached New | Cached New | |||
---------------------------------------- | ---------------------------------------- | |||
old_TFO_Cookie old_TFO_Cookie | old_TFO_Cookie old_TFO_Cookie | |||
old_TFO_Failure old_TFO_Failure | old_TFO_Failure old_TFO_Failure | |||
6.2. Updates to the new TCB | ||||
During the connection, the associated TCB can be updated based on | ||||
particular events, as shown below: | ||||
TEMPORAL SHARING - Cache Updates | TEMPORAL SHARING - Cache Updates | |||
Cached TCB Current TCB when? New Cached TCB | Cached TCB Current TCB when? New Cached TCB | |||
------------------------------------------------------ | ------------------------------------------------------ | |||
old_MMS_S curr_MMS_S OPEN curr_MMS_S | old_MMS_S curr_MMS_S OPEN curr_MMS_S | |||
old_MMS_R curr_MMS_R OPEN curr_MMS_R | old_MMS_R curr_MMS_R OPEN curr_MMS_R | |||
old_sendMSS curr_sendMSS MSSopt curr_sendMSS | old_sendMSS curr_sendMSS MSSopt curr_sendMSS | |||
skipping to change at page 8, line 26 ¶ | skipping to change at page 7, line 40 ¶ | |||
old_RTT curr_RTT CLOSE merge(curr,old) | old_RTT curr_RTT CLOSE merge(curr,old) | |||
old_RTTvar curr_RTTvar CLOSE merge(curr,old) | old_RTTvar curr_RTTvar CLOSE merge(curr,old) | |||
old_option curr_option ESTAB (depends on option) | old_option curr_option ESTAB (depends on option) | |||
old_ssthresh curr_ssthresh CLOSE merge(curr,old) | old_ssthresh curr_ssthresh CLOSE merge(curr,old) | |||
old_snd_cwnd curr_snd_cwnd CLOSE merge(curr,old) | old_snd_cwnd curr_snd_cwnd CLOSE merge(curr,old) | |||
Caching PMTU and sendMSS is trivial; reported values are cached, and | The table below gives an overview of option-specific information | |||
the most recent values are used. The cache is updated when the MSS | that can be similarly shared. | |||
option is received in a SYN or after PMTUD (i.e., when an ICMPv4 | ||||
Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is | | |||
received [RFC8201] or the equivalent is inferred, e.g. as from | ||||
PLPMTUD [RFC4821]), respectively, so the cache always has the most | ||||
recent values from any connection. For sendMSS, the cache is | ||||
consulted only at connection establishment and not otherwise | ||||
updated, which means that MSS options do not affect current | ||||
connections. The default sendMSS is never saved; only reported MSS | ||||
values update the cache, so an explicit override is required to | ||||
reduce the sendMSS. There is no particular benefit to caching MMS_S | ||||
and MMS_R as these are reported by the local IP stack. | | |||
TCP options are copied or merged depending on the details of each | TEMPORAL SHARING - Option Info Updates | |||
option, where "merge" is some function that combines the values of | ||||
"curr" and "old". E.g., TFO state is updated when a connection is | Cached Current when? New Cached | |||
established and read before establishing a new connection. | ---------------------------------------------------------------- | |||
old_TFO_Cookie old_TFO_Cookie ESTAB old_TFO_Cookie | ||||
old_TFO_Failure old_TFO_Failure ESTAB old_TFO_Failure | ||||
6.3. Discussion | ||||
There is no particular benefit to caching MMS_S and MMS_R as these | ||||
are reported by the local IP stack. Caching sendMSS and PMTU is | ||||
trivial; reported values are cached, and the most recent values are | ||||
used. The cache is updated when the MSS option is received in a SYN | ||||
or after PMTUD (i.e., when an ICMPv4 Fragmentation Needed [RFC1191] | ||||
or ICMPv6 Packet Too Big message is received [RFC8201] or the | ||||
equivalent is inferred, e.g. as from PLPMTUD [RFC4821]), | ||||
respectively, so the cache always has the most recent values from | ||||
any connection. For sendMSS, the cache is consulted only at | ||||
connection establishment and not otherwise updated, which means that | ||||
MSS options do not affect current connections. The default sendMSS | ||||
is never saved; only reported MSS values update the cache, so an | ||||
explicit override is required to reduce the sendMSS. | ||||
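
The sendMSS and PMTU caching rules described above can be sketched as follows. This is only an illustrative sketch, not part of either draft: the class name PathCache, its method names, and the per-host dictionary layout are hypothetical, and the fallback value of 536 bytes is an assumed IPv4 default used when no MSS option has ever been reported.

```python
from typing import Dict, Optional

# Assumption: fallback sendMSS used when no MSS option has been reported.
DEFAULT_SEND_MSS = 536


class PathCache:
    """Hypothetical per-host cache of the most recently reported values."""

    def __init__(self) -> None:
        self._send_mss: Dict[str, int] = {}
        self._pmtu: Dict[str, int] = {}

    def on_mss_option(self, host: str, mss: int) -> None:
        # Called when an MSS option arrives in a SYN; the newest report wins.
        self._send_mss[host] = mss

    def on_pmtu_update(self, host: str, pmtu: int) -> None:
        # Called after PMTUD learns (or infers) a new path MTU.
        self._pmtu[host] = pmtu

    def initial_send_mss(self, host: str) -> int:
        # Consulted only at connection establishment. The default is
        # returned but never written back into the cache.
        return self._send_mss.get(host, DEFAULT_SEND_MSS)

    def cached_pmtu(self, host: str) -> Optional[int]:
        # None means no PMTU has been cached for this host yet.
        return self._pmtu.get(host)
```

Because initial_send_mss is read only when a TCB is created, a later MSS report changes the cache without touching connections that are already established.
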
RTT values are updated by formulae that merge the old and new | RTT values are updated by formulae that merge the old and new | |||
values. Dynamic RTT estimation requires a sequence of RTT | values. Dynamic RTT estimation requires a sequence of RTT | |||
measurements. As a result, the cached RTT (and its variance) is an | measurements. As a result, the cached RTT (and its variance) is an | |||
average of its previous value with the contents of the currently | average of its previous value with the contents of the currently | |||
active TCB for that host, when a TCB is closed. RTT values are | active TCB for that host, when a TCB is closed. RTT values are | |||
updated only when a connection is closed. The method for merging old | updated only when a connection is closed. The method for merging old | |||
and current values needs to attempt to reduce the transient for new | and current values needs to attempt to reduce the transient effects | |||
connections. | of the new connections. | |||
The updates for RTT, RTTvar and ssthresh rely on existing | The updates for RTT, RTTvar and ssthresh rely on existing | |||
information, i.e., old values. Should no such values exist, the | information, i.e., old values. Should no such values exist, the | |||
current values are cached instead. | current values are cached instead. | |||
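
A minimal sketch of the CLOSE-time update described above, under stated assumptions: the text only says that "merge" averages old and current values, so the equal weighting (weight = 0.5), the CachedPathState type, and the function names here are hypothetical choices for illustration. Times are in milliseconds.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedPathState:
    rtt: float       # milliseconds
    rttvar: float    # milliseconds
    ssthresh: float  # bytes


def merge(curr: float, old: Optional[float], weight: float = 0.5) -> float:
    """Blend the closing connection's value with the cached one.

    Assumption: a simple weighted average. If no old value exists,
    the current value is cached as-is.
    """
    if old is None:
        return curr
    return weight * curr + (1.0 - weight) * old


def update_cache_on_close(cache: Dict[str, CachedPathState],
                          host: str,
                          curr: CachedPathState,
                          weight: float = 0.5) -> CachedPathState:
    """Update the per-host cache entry when a connection closes."""
    old = cache.get(host)
    merged = CachedPathState(
        rtt=merge(curr.rtt, old.rtt if old else None, weight),
        rttvar=merge(curr.rttvar, old.rttvar if old else None, weight),
        ssthresh=merge(curr.ssthresh, old.ssthresh if old else None, weight),
    )
    cache[host] = merged
    return merged
```

A smaller weight makes the cache change more slowly, which damps the effect of one unusual connection on the TCBs initialized after it.
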
TEMPORAL SHARING - Option info Updates | TCP options are copied or merged depending on the details of each | |||
option, where "merge" is some function that combines the values of | ||||
"curr" and "old". E.g., TFO state is updated when a connection is | ||||
established and read before establishing a new connection. | ||||
Cached Current when? New Cached | Sections 8 and 9 discuss compatibility issues and implications of | |||
---------------------------------------------------------------- | sharing the specific information listed above. Section 10 gives an | |||
old_TFO_Cookie old_TFO_Cookie ESTAB old_TFO_Cookie | overview of known implementations. | |||
old_TFO_Failure old_TFO_Failure ESTAB old_TFO_Failure | Most cached TCB values are updated when a connection closes. The | |||
exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122], | ||||
PMTU which is updated after Path MTU Discovery | ||||
[RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the | ||||
MSS option is received in the TCP SYN header. | ||||
7. An Example of Ensemble Sharing | Sharing sendMSS information affects only data in the SYN of the next | |||
connection, because sendMSS information is typically included in | ||||
most TCP SYN segments. Caching PMTU can accelerate the efficiency of | ||||
PMTUD, but can also result in black-holing until corrected if in | ||||
error. Caching MMS_R and MMS_S may be of little direct value as they | ||||
are reported by the local IP stack anyway. | ||||
The way in which other TCP option state can be shared depends on the | ||||
details of that option. E.g., TFO state includes the TCP Fast Open | ||||
Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open | ||||
response. RFC 7413 states, "The client MUST cache negative responses | ||||
from the server in order to avoid potential connection failures. | ||||
Negative responses include the server not acknowledging the data in | ||||
the SYN, ICMP error messages, and (most importantly) no response | ||||
(SYN-ACK) from the server at all, i.e., connection timeout." | ||||
[RFC7413]. TFOinfo is cached when a connection is established. | ||||
Other TCP option state might not be as readily cached. E.g., TCP-AO | ||||
[RFC5925] success or failure between a host pair for a single SYN | ||||
destination port might be usefully cached. TCP-AO success or failure | ||||
to other SYN destination ports on that host pair is never useful to | ||||
cache because TCP-AO security parameters can vary per service. | ||||
7. Ensemble Sharing | ||||
Sharing cached TCB data across concurrent connections requires | Sharing cached TCB data across concurrent connections requires | |||
attention to the aggregate nature of some of the shared state. For | attention to the aggregate nature of some of the shared state. For | |||
example, although MSS and RTT values can be shared by copying, it | example, although MSS and RTT values can be shared by copying, it | |||
may not be appropriate to simply copy congestion window or ssthresh | may not be appropriate to simply copy congestion window or ssthresh | |||
information; instead, the new values can be a function (f) of the | information; instead, the new values can be a function (f) of the | |||
cumulative values and the number of connections (N). | cumulative values and the number of connections (N). | |||
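
As an illustration of the paragraph above, the sketch below copies sendMSS and RTT values directly and derives the congestion values from the aggregate. The text does not define f, so the equal-share rule (sum divided by N) is purely an assumption, as are the names EnsembleCache and init_new_tcb and the choice to count the new connection in N.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EnsembleCache:
    send_mss: int       # bytes, shared by copying
    rtt: float          # milliseconds, shared by copying
    rttvar: float       # milliseconds, shared by copying
    ssthresh_sum: int   # bytes, cumulative over the ensemble
    snd_cwnd_sum: int   # bytes, cumulative over the ensemble


def f(total: int, n: int) -> int:
    """Assumed sharing function: an equal share of the aggregate."""
    if n < 1:
        raise ValueError("n must count at least the new connection")
    return total // n


def init_new_tcb(cache: EnsembleCache, n: int) -> Dict[str, float]:
    """Build initial values for a connection joining the ensemble.

    n is the number of connections including the new one (an assumption).
    """
    return {
        "send_mss": cache.send_mss,
        "rtt": cache.rtt,
        "rttvar": cache.rttvar,
        "ssthresh": f(cache.ssthresh_sum, n),
        "snd_cwnd": f(cache.snd_cwnd_sum, n),
    }
```

Other choices of f are possible; the point of the sketch is only that congestion state is derived from the aggregate rather than copied.
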
7.1. Initialization of a new TCB | ||||
TCBs for new connections can be initialized using context from | ||||
concurrent connections as follows: | ||||
ENSEMBLE SHARING - TCB Initialization | ENSEMBLE SHARING - TCB Initialization | |||
Cached TCB New TCB | Cached TCB New TCB | |||
-------------------------------- | -------------------------------- | |||
old_MMS_S old_MMS_S | old_MMS_S old_MMS_S | |||
old_MMS_R old_MMS_R | old_MMS_R old_MMS_R | |||
old_sendMSS old_sendMSS | old_sendMSS old_sendMSS | |||
skipping to change at page 10, line 5 ¶ | skipping to change at page 10, line 27 ¶ | |||
old_RTT old_RTT | old_RTT old_RTT | |||
old_RTTvar old_RTTvar | old_RTTvar old_RTTvar | |||
old_ssthresh_sum f(old_ssthresh_sum, N) | old_ssthresh_sum f(old_ssthresh_sum, N) | |||
old_snd_cwnd_sum f(old_snd_cwnd_sum, N) | old_snd_cwnd_sum f(old_snd_cwnd_sum, N) | |||
old_option (option-specific) | old_option (option-specific) | |||
Sections 8 and 9 discuss compatibility issues and implications of | ||||
sharing the specific information listed above. | ||||
The table below gives an overview of option-specific information | The table below gives an overview of option-specific information | |||
that can be shared. | that can be similarly shared. | |||
ENSEMBLE SHARING Option info | ENSEMBLE SHARING - Option Info Initialization | |||
Cached New | Cached New | |||
---------------------------------------- | ---------------------------------------- | |||
old_TFO_Cookie old_TFO_Cookie | old_TFO_Cookie old_TFO_Cookie | |||
old_TFO_Failure old_TFO_Failure | old_TFO_Failure old_TFO_Failure | |||
7.2. Updates to the new TCB | ||||
During the connection, the associated TCB can be updated based on | ||||
changes to concurrent connections, as shown below: | ||||
ENSEMBLE SHARING - Cache Updates | ENSEMBLE SHARING - Cache Updates | |||
Cached TCB Current TCB when? New Cached TCB | Cached TCB Current TCB when? New Cached TCB | |||
----------------------------------------------------- | ----------------------------------------------------- | |||
old_MMS_S curr_MMS_S OPEN curr_MMS_S | old_MMS_S curr_MMS_S OPEN curr_MMS_S | |||
old_MMS_R curr_MMS_R OPEN curr_MMS_R | old_MMS_R curr_MMS_R OPEN curr_MMS_R | |||
old_sendMSS curr_sendMSS MSSopt curr_sendMSS | old_sendMSS curr_sendMSS MSSopt curr_sendMSS | |||
skipping to change at page 10, line 42 ¶ | skipping to change at page 11, line 28 ¶ | |||
old_RTT curr_RTT update rtt_update(old,curr) | old_RTT curr_RTT update rtt_update(old,curr) | |||
old_RTTvar curr_RTTvar update rtt_update(old,curr) | old_RTTvar curr_RTTvar update rtt_update(old,curr) | |||
old_ssthresh curr_ssthresh update adjust sum as appropriate | old_ssthresh curr_ssthresh update adjust sum as appropriate | |||
old_snd_cwnd curr_snd_cwnd update adjust sum as appropriate | old_snd_cwnd curr_snd_cwnd update adjust sum as appropriate | |||
old_option curr_option (depends) (option specific) | old_option curr_option (depends) (option specific) | |||
The table below gives an overview of option-specific information | ||||
that can be similarly shared. | ||||
ENSEMBLE SHARING - Option Info Updates | ||||
Cached Current when? New Cached | ||||
---------------------------------------------------------------- | ||||
old_TFO_Cookie old_TFO_Cookie ESTAB old_TFO_Cookie | ||||
old_TFO_Failure old_TFO_Failure ESTAB old_TFO_Failure | ||||
7.3. Discussion | ||||
For ensemble sharing, TCB information should be cached as early as | For ensemble sharing, TCB information should be cached as early as | |||
possible, sometimes before a connection is closed. Otherwise, | possible, sometimes before a connection is closed. Otherwise, | |||
opening multiple concurrent connections may not result in TCB data | opening multiple concurrent connections may not result in TCB data | |||
sharing if no connection closes before others open. The amount of | sharing if no connection closes before others open. The amount of | |||
work involved in updating the aggregate average should be minimized, | work involved in updating the aggregate average should be minimized, | |||
but the resulting value should be equivalent to having all values | but the resulting value should be equivalent to having all values | |||
measured within a single connection. The function "rtt_update" in | measured within a single connection. The function "rtt_update" in | |||
the ensemble sharing table indicates this operation, which occurs | the ensemble sharing table indicates this operation, which occurs | |||
whenever the RTT would have been updated in the individual TCP | whenever the RTT would have been updated in the individual TCP | |||
connection. As a result, the cache contains the shared RTT | connection. As a result, the cache contains the shared RTT | |||
variables, which no longer need to reside in the TCB. | variables, which no longer need to reside in the TCB. | |||
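As a rough illustration of the "rtt_update" operation described above,
the following Python sketch keeps one set of RTT variables per host
pair and updates it whenever any connection in the ensemble obtains a
sample. The smoothing gains (1/8 and 1/4) and the first-sample rule
are borrowed from the standard TCP RTT estimator and are an assumption
here; the class name SharedRttEntry and its fields are hypothetical
and do not come from this document.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # gain for the smoothed RTT (assumed, standard estimator value)
BETA = 1 / 4   # gain for the RTT variance (assumed, standard estimator value)


@dataclass
class SharedRttEntry:
    """Hypothetical per-host-pair cache entry holding shared RTT state."""
    srtt: Optional[float] = None    # smoothed RTT in seconds
    rttvar: Optional[float] = None  # RTT variance in seconds

    def rtt_update(self, sample: float) -> None:
        """Fold one RTT sample from any connection into the shared state."""
        if self.srtt is None:
            # First sample seen for this host pair.
            self.srtt = sample
            self.rttvar = sample / 2
            return
        # Update the variance before the mean, using the old mean.
        self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample)
        self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample
```

Because every connection calls rtt_update on the same entry, the
result is the same as if all samples had been measured within a single
connection, and each update costs a constant amount of work.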
Congestion window size and ssthresh aggregation are more complicated | Congestion window size and ssthresh aggregation are more complicated | |||
in the concurrent case. When there is an ensemble of connections, we | in the concurrent case. When there is an ensemble of connections, we | |||
need to decide how that ensemble would have shared these variables, | need to decide how that ensemble would have shared these variables, | |||
in order to derive initial values for new TCBs. | in order to derive initial values for new TCBs. | |||
ENSEMBLE SHARING - Option info Updates | Sections 8 and 9 discuss compatibility issues and implications of | |||
sharing the specific information listed above. | ||||
Cached Current when? New Cached | ||||
---------------------------------------------------------------- | ||||
old_TFO_Cookie old_TFO_Cookie ESTAB old_TFO_Cookie | ||||
old_TFO_Failure old_TFO_Failure ESTAB old_TFO_Failure | ||||
Any assumption of this sharing can be incorrect because identical | Any assumption of TCB information sharing can be incorrect because | |||
endpoint address pairs may not share network paths. In current | identical endpoint address pairs may not share network paths. In | |||
implementations, new congestion windows are set at an initial value | current implementations, new congestion windows are set at an | |||
of 4-10 segments [RFC3390][RFC6928], so that the sum of the current | initial value of 4-10 segments [RFC3390][RFC6928], so that the sum | |||
windows is increased for any new connection. This can have | of the current windows is increased for any new connection. This can | |||
detrimental consequences where several connections share a highly | have detrimental consequences where several connections share a | |||
congested link. | highly congested link. | |||
There are several ways to initialize the congestion window in a new | There are several ways to initialize the congestion window in a new | |||
TCB among an ensemble of current connections to a host. Current TCP | TCB among an ensemble of current connections to a host. Current TCP | |||
implementations initialize it to four segments as standard [RFC3390] | implementations initialize it to four segments as standard [RFC3390] | |||
and 10 segments experimentally [RFC6928]. These approaches assume | and 10 segments experimentally [RFC6928]. These approaches assume | |||
that new connections should behave as conservatively as possible. | that new connections should behave as conservatively as possible. | |||
The algorithm described in [Ba12] adjusts the initial cwnd depending | The algorithm described in [Ba12] adjusts the initial cwnd depending | |||
on the cwnd values of ongoing connections. There have also been | on the cwnd values of ongoing connections. There have also been | |||
suggestions to use the kind of sharing mechanisms described in this | suggestions to use the kind of sharing mechanisms described in this | |||
document over long timescales to adapt TCP's initial window | document over long timescales to adapt TCP's initial window | |||
automatically, as described further in Appendix A [To12]. | automatically, as described further in Appendix A [To12]. | |||
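To make the trade-off concrete, the Python sketch below shows one
possible policy for admitting a new connection into an ensemble
without increasing the sum of the current windows. It is only an
illustration under stated assumptions: it is not the algorithm of
[Ba12], and the function name admit_connection, its parameters, and
the proportional scaling rule are all hypothetical. Windows are
counted in segments.

```python
from typing import Dict


def admit_connection(cwnds: Dict[str, int], new_id: str,
                     default_iw: int = 10) -> Dict[str, int]:
    """Return per-connection cwnd values after new_id joins the ensemble.

    Assumed policy: the new connection takes an equal share of the
    aggregate window, capped at default_iw, and existing connections
    are scaled down proportionally so the aggregate does not grow.
    """
    total = sum(cwnds.values())
    if not cwnds or total <= 0:
        # No ensemble to share with: fall back to the default initial window.
        return {new_id: default_iw}
    # Equal share of the aggregate, at least 1 and at most default_iw.
    new_cwnd = max(1, min(default_iw, total // (len(cwnds) + 1)))
    remaining = max(0, total - new_cwnd)
    # Scale existing windows; integer division rounds down, floor of 1.
    updated = {cid: max(1, cwnd * remaining // total)
               for cid, cwnd in cwnds.items()}
    updated[new_id] = new_cwnd
    return updated
```

For example, starting from a single connection with a window of 40
segments, admitting two connections in turn yields windows of 22, 7,
and 10 segments, so the aggregate stays at or below 40 rather than
growing to 60. The floor of one segment per connection means the
aggregate can still grow slightly when many connections have very
small windows.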
8. Compatibility Issues | 8. Compatibility Issues | |||
Here, we discuss various types of problems that may arise with TCB | ||||
information sharing. | ||||
For the congestion and current window information, the initial | For the congestion and current window information, the initial | |||
values computed by TCB interdependence may not be consistent with | values computed by TCB interdependence may not be consistent with | |||
the long-term aggregate behavior of a set of concurrent connections | the long-term aggregate behavior of a set of concurrent connections | |||
between the same endpoints. Under conventional TCP congestion | between the same endpoints. Under conventional TCP congestion | |||
control, if a single existing connection has converged to a | control, if a single existing connection has converged to a | |||
congestion window of 40 segments, two newly joining concurrent | congestion window of 40 segments, two newly joining concurrent | |||
connections assume initial windows of 10 segments [RFC6928], and the | connections assume initial windows of 10 segments [RFC6928], and the | |||
current connection's window does not decrease to accommodate this | current connection's window does not decrease to accommodate this | |||
additional load, so the connections can mutually interfere. One example | additional load, so the connections can mutually interfere. One example | |||
of this is seen on low-bandwidth, high-delay links, where concurrent | of this is seen on low-bandwidth, high-delay links, where concurrent | |||
connections supporting Web traffic can collide because their initial | connections supporting Web traffic can collide because their initial | |||
windows were too large, even when set at one segment. | windows were too large, even when set at one segment. | |||
The authors of [Hu12] recommend caching ssthresh for temporal | The authors of [Hu12] recommend caching ssthresh for temporal | |||
sharing only when flows are long. Some studies suggest that sharing | sharing only when flows are long. Some studies suggest that sharing | |||
ssthresh between short flows can deteriorate the performance of | ssthresh between short flows can deteriorate the performance of | |||
individual connections [Hu12, Du16], although this may benefit | individual connections [Hu12, Du16], although this may benefit | |||
aggregate network performance. | aggregate network performance. | |||
Due to mechanisms like ECMP and LAG [RFC7424], TCP connections | 8.1. Traversing the same network path | |||
sharing the same host-pair may not always share the same path. This | ||||
does not matter for host-specific information such as RWIN and TCP | TCP is sometimes used in situations where packets of the same host- | |||
option state, such as TFOinfo. When TCB information is shared across | pair do not always take the same path. Multipath routing that relies | |||
different SYN destination ports, path-related information can be | on examining transport headers, such as ECMP and LAG [RFC7424], may | |||
incorrect; however, the impact of this error is potentially | not result in repeatable path selection when TCP segments are | |||
diminished if (as discussed here) TCB sharing affects only the | encapsulated, encrypted, or altered - for example, in some Virtual | |||
transient event of a connection start or if TCB information is | Private Network (VPN) tunnels that rely on proprietary | |||
shared only within connections to the same SYN destination port. In | encapsulation. Similarly, such approaches cannot operate | |||
case of Temporal Sharing, TCB information could also become invalid | deterministically when the TCP header is encrypted, e.g., when using | |||
over time. Because this is similar to the case when a connection | IPsec ESP (although TCB interdependence among the entire set sharing | |||
becomes idle, mechanisms that address idle TCP connections (e.g., | the same endpoint IP addresses should work without problems when the | |||
[RFC7661]) could also be applied to TCB cache management, especially | TCP header is encrypted). Measures to increase the probability that | |||
when TCP Fast Open is used [RFC7413]. | connections use the same path could be applied: e.g., the | |||
connections could be given the same IPv6 flow label. TCB | ||||
interdependence can also be extended to sets of host IP address | ||||
pairs that share the same network path conditions, such as when a | ||||
group of addresses is on the same LAN (see Section 9). | ||||
Traversing the same path is not important for host-specific | ||||
information such as RWIN and TCP option state, such as TFOinfo. When | ||||
TCB information is shared across different SYN destination ports, | ||||
path-related information can be incorrect; however, the impact of | ||||
this error is potentially diminished if (as discussed here) TCB | ||||
sharing affects only the transient event of a connection start or if | ||||
TCB information is shared only within connections to the same SYN | ||||
destination port. In case of Temporal Sharing, TCB information could | ||||
also become invalid over time. Because this is similar to the case | ||||
when a connection becomes idle, mechanisms that address idle TCP | ||||
connections (e.g., [RFC7661]) could also be applied to TCB cache | ||||
management, especially when TCP Fast Open is used [RFC7413]. | ||||
8.2. State dependence | ||||
There may be additional considerations to the way in which TCB | There may be additional considerations to the way in which TCB | |||
interdependence rebalances congestion feedback among the current | interdependence rebalances congestion feedback among the current | |||
connections, e.g., it may be appropriate to consider the impact of a | connections, e.g., it may be appropriate to consider the impact of a | |||
connection being in Fast Recovery [RFC5681] or some other similar | connection being in Fast Recovery [RFC5681] or some other similar | |||
unusual feedback state, e.g., as inhibiting or affecting the | unusual feedback state, e.g., as inhibiting or affecting the | |||
calculations described herein. | calculations described herein. | |||
TCP is sometimes used in situations where packets of the same host- | 8.3. Problems with IP sharing | |||
pair do not always take the same path. Multipath routing that relies | ||||
on examining transport headers, such as ECMP and LAG, may not result | ||||
in repeatable path selection when TCP segments are encapsulated, | ||||
encrypted, or altered - for example, in some Virtual Private Network | ||||
(VPN) tunnels that rely on proprietary encapsulation. Similarly, | ||||
such approaches cannot operate deterministically when the TCP header | ||||
is encrypted, e.g., when using IPsec ESP. TCB interdependence among | ||||
the entire set sharing the same endpoint IP addresses should work | ||||
without problems under these circumstances. Moreover, measures to | ||||
increase the probability that connections use the same path could be | ||||
applied: e.g., the connections could be given the same IPv6 flow | ||||
label. TCB interdependence can also be extended to sets of host IP | ||||
address pairs that share the same network path conditions, such as | ||||
when a group of addresses is on the same LAN (see Section 9). | ||||
It can be wrong to share TCB information between TCP connections on | It can be wrong to share TCB information between TCP connections on | |||
the same host as identified by the IP address if an IP address is | the same host as identified by the IP address if an IP address is | |||
assigned to a new host (e.g., IP address spinning, as is used by | assigned to a new host (e.g., IP address spinning, as is used by | |||
ISPs to inhibit running servers). It can be wrong if Network Address | ISPs to inhibit running servers). It can be wrong if Network Address | |||
(and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing | (and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing | |||
mechanism is used. Such mechanisms are less likely to be used with | mechanism is used. Such mechanisms are less likely to be used with | |||
IPv6. Other methods to identify a host could also be considered to | IPv6. Other methods to identify a host could also be considered to | |||
make correct TCB sharing more likely. Moreover, some TCB information | make correct TCB sharing more likely. Moreover, some TCB information | |||
is about dominant path properties rather than the specific host. IP | is about dominant path properties rather than the specific host. IP | |||
skipping to change at page 13, line 34 ¶ | skipping to change at page 14, line 34 ¶ | |||
[RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection | [RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection | |||
reestablishment costs by serializing or multiplexing a set of per- | reestablishment costs by serializing or multiplexing a set of per- | |||
host connections across a single TCP connection. This avoids TCP's | host connections across a single TCP connection. This avoids TCP's | |||
per-connection OPEN handshake and also avoids recomputing the MSS, | per-connection OPEN handshake and also avoids recomputing the MSS, | |||
RTT, and congestion window values. By avoiding the so-called | RTT, and congestion window values. By avoiding the so-called | |||
"slow-start restart," performance can be optimized [Hu01]. TCB | "slow-start restart," performance can be optimized [Hu01]. TCB | |||
interdependence can provide the "slow-start restart avoidance" of | interdependence can provide the "slow-start restart avoidance" of | |||
multiplexing, without requiring a multiplexing mechanism at the | multiplexing, without requiring a multiplexing mechanism at the | |||
application layer. | application layer. | |||
Like the initial version of this document [RFC2140], this update's | ||||
approach to TCB interdependence focuses on sharing a set of TCBs by | ||||
updating the TCB state to reduce the impact of transients when | ||||
connections begin or end. Other mechanisms have since been proposed | ||||
to continuously share information between all ongoing communication | ||||
(including connectionless protocols), updating the congestion state | ||||
during any congestion-related event (e.g., timeout, loss | ||||
confirmation, etc.) [RFC3124]. By dealing exclusively with | ||||
transients, TCB interdependence is more likely to exhibit the same | ||||
behavior as unmodified, independent TCP connections. | ||||
9.1. Layering | ||||
TCB interdependence pushes some of the TCP implementation from the | TCB interdependence pushes some of the TCP implementation from the | |||
traditional transport layer (in the ISO model) to the network | traditional transport layer (in the ISO model) to the network | |||
layer. This acknowledges that some state is in fact per-host-pair or | layer. This acknowledges that some state is in fact per-host-pair or | |||
can be per-path as indicated solely by that host-pair. Transport | can be per-path as indicated solely by that host-pair. Transport | |||
protocols typically manage per-application-pair associations (per | protocols typically manage per-application-pair associations (per | |||
stream), and network protocols manage per-host-pair and path | stream), and network protocols manage per-host-pair and path | |||
associations (routing). Round-trip time, MSS, and congestion | associations (routing). Round-trip time, MSS, and congestion | |||
information could be more appropriately handled in a network-layer | information could be more appropriately handled in a network-layer | |||
fashion, aggregated among concurrent connections, and shared across | fashion, aggregated among concurrent connections, and shared across | |||
connection instances [RFC3124]. | connection instances [RFC3124]. | |||
skipping to change at page 14, line 8 ¶ | skipping to change at page 15, line 20 ¶ | |||
An earlier version of RTT sharing suggested implementing RTT state | An earlier version of RTT sharing suggested implementing RTT state | |||
at the IP layer, rather than at the TCP layer. Our observations | at the IP layer, rather than at the TCP layer. Our observations | |||
describe sharing state among TCP connections, which avoids some of | describe sharing state among TCP connections, which avoids some of | |||
the difficulties in an IP-layer solution. One such problem of an IP | the difficulties in an IP-layer solution. One such problem of an IP | |||
layer solution is determining the correspondence between packet | layer solution is determining the correspondence between packet | |||
exchanges using IP header information alone, where such | exchanges using IP header information alone, where such | |||
correspondence is needed to compute RTT. Because TCB sharing | correspondence is needed to compute RTT. Because TCB sharing | |||
computes RTTs inside the TCP layer using TCP header information, it | computes RTTs inside the TCP layer using TCP header information, it | |||
can be implemented more directly and simply than at the IP layer. | can be implemented more directly and simply than at the IP layer. | |||
This is a case where information should be computed at the transport | This is a case where information should be computed at the transport | |||
layer, but could be shared at the network layer. | layer but could be shared at the network layer. | |||
9.2. Other possibilities | ||||
Per-host-pair associations are not the limit of these techniques. It | Per-host-pair associations are not the limit of these techniques. It | |||
is possible that TCBs could be similarly shared between hosts on a | is possible that TCBs could be similarly shared between hosts on a | |||
subnet or within a cluster, because the predominant path can be | subnet or within a cluster, because the predominant path can be | |||
subnet-subnet, rather than host-host. Additionally, TCB | subnet-subnet, rather than host-host. Additionally, TCB | |||
interdependence can be applied to any protocol with congestion | interdependence can be applied to any protocol with congestion | |||
state, including SCTP [RFC4960] and DCCP [RFC4340], as well as for | state, including SCTP [RFC4960] and DCCP [RFC4340], as well as for | |||
individual subflows in Multipath TCP [RFC6824]. | individual subflows in Multipath TCP [RFC6824]. | |||
There may be other information that can be shared between concurrent | There may be other information that can be shared between concurrent | |||
connections. For example, knowing that another connection has just | connections. For example, knowing that another connection has just | |||
tried to expand its window size and failed, a connection may not | tried to expand its window size and failed, a connection may not | |||
attempt to do the same for some period. The idea is that existing | attempt to do the same for some period. The idea is that existing | |||
TCP implementations infer the behavior of all competing connections, | TCP implementations infer the behavior of all competing connections, | |||
including those within the same host or subnet. One possible | including those within the same host or subnet. One possible | |||
optimization is to make that implicit feedback explicit, via | optimization is to make that implicit feedback explicit, via | |||
extended information associated with the endpoint IP address and its | extended information associated with the endpoint IP address and its | |||
TCP implementation, rather than per-connection state in the TCB. | TCP implementation, rather than per-connection state in the TCB. | |||
Like the initial version of this document [RFC2140], this update's | ||||
approach to TCB interdependence focuses on sharing a set of TCBs by | ||||
updating the TCB state to reduce the impact of transients when | ||||
connections begin or end. Other mechanisms have since been proposed | ||||
to continuously share information between all ongoing communication | ||||
(including connectionless protocols), updating the congestion state | ||||
during any congestion-related event (e.g., timeout, loss | ||||
confirmation, etc.) [RFC3124]. By dealing exclusively with | ||||
transients, TCB interdependence is more likely to exhibit the same | ||||
behavior as unmodified, independent TCP connections. | ||||
10. Implementation Observations | 10. Implementation Observations | |||
The observation that some TCB state is host-pair specific rather | The observation that some TCB state is host-pair specific rather | |||
than application-pair dependent is not new and is a common | than application-pair dependent is not new and is a common | |||
engineering decision in layered protocol implementations. Although | engineering decision in layered protocol implementations. Although | |||
now deprecated, T/TCP [RFC1644] was the first to propose using | now deprecated, T/TCP [RFC1644] was the first to propose using | |||
caches in order to maintain TCB states (see Appendix A for more | caches in order to maintain TCB states (see Appendix A for more | |||
information). | information). | |||
The table below describes the current implementation status for some | The table below describes the current implementation status for some | |||
TCB information in Linux kernel version 4.6, FreeBSD 10 and Windows | TCB information in Linux kernel version 4.6, FreeBSD 10 and Windows | |||
(as of October 2016). In the table, "shared" only refers to temporal | (as of October 2016). In the table, "shared" only refers to temporal | |||
sharing. | sharing. | |||
CURRENT IMPLEMENTATION STATUS (as of 2016) | ||||
TCB data Status | TCB data Status | |||
----------------------------------------------------------- | ----------------------------------------------------------- | |||
old MMS_S Not shared | old MMS_S Not shared | |||
old MMS_R Not shared | old MMS_R Not shared | |||
old_sendMSS Cached and shared in Linux (MSS) | old_sendMSS Cached and shared in Linux (MSS) | |||
old PMTU Cached and shared in FreeBSD and Windows (PMTU) | old PMTU Cached and shared in FreeBSD and Windows (PMTU) | |||
skipping to change at page 16, line 18 ¶ | skipping to change at page 17, line 21 ¶ | |||
sharing over long timescales to adapt TCP's initial window | sharing over long timescales to adapt TCP's initial window | |||
automatically, largely imported from [To12]. | automatically, largely imported from [To12]. | |||
Finally, this document updates and significantly expands the | Finally, this document updates and significantly expands the | |||
referenced literature. | referenced literature. | |||
12. Security Considerations | 12. Security Considerations | |||
The implementation methods presented here do not have additional | The implementation methods presented here do not have additional | |||
ramifications for explicit attacks. They may be susceptible to | ramifications for explicit attacks. They may be susceptible to | |||
denial-of-service attacks if not otherwise secured. For example, an | denial-of-service attacks if not otherwise secured. | |||
application can open a connection and set its window size to zero, | ||||
denying service to any other subsequent connection between those | ||||
hosts. | ||||
TCB sharing may be susceptible to denial-of-service attacks, | TCB sharing may be susceptible to denial-of-service attacks, | |||
wherever the TCB is shared, between connections in a single host, or | wherever the TCB is shared, between connections in a single host, or | |||
between hosts if TCB sharing is implemented within a subnet (see | between hosts if TCB sharing is implemented within a subnet (see | |||
Implications section). Some shared TCB parameters are used only to | Implications section). Some shared TCB parameters are used only to | |||
create new TCBs, others are shared among the TCBs of ongoing | create new TCBs, others are shared among the TCBs of ongoing | |||
connections. New connections can join the ongoing set, e.g., to | connections. New connections can join the ongoing set, e.g., to | |||
optimize send window size among a set of connections to the same | optimize send window size among a set of connections to the same | |||
host. | host. | |||
Attacks on parameters used only for initialization affect only the | Attacks on parameters used only for initialization affect only the | |||
transient performance of a TCP connection. For short connections, | transient performance of a TCP connection. For short connections, | |||
the performance ramification can approach that of a denial-of- | the performance ramification can approach that of a denial-of- | |||
service attack. E.g., if an application changes its TCB to have a | service attack. E.g., if an application changes its TCB to have a | |||
false and small window size, subsequent connections would experience | false and small window size, subsequent connections would experience | |||
performance degradation until their window grew appropriately. | performance degradation until their window grew appropriately. | |||
TCB sharing reuses and mixes information from past and current | ||||
connections. Although reusing information could create a potential | ||||
for fingerprinting to identify hosts, the mixing reduces that | ||||
potential. There has been no evidence of fingerprinting based on | ||||
this technique and it is currently considered safe in that regard. | ||||
13. IANA Considerations | 13. IANA Considerations | |||
There are no IANA implications or requests in this document. | There are no IANA implications or requests in this document. | |||
This section should be removed upon final publication as an RFC. | This section should be removed upon final publication as an RFC. | |||
14. References | 14. References | |||
14.1. Normative References | 14.1. Normative References | |||
[RFC793] Postel, Jon, "Transmission Control Protocol," Network | ||||
Working Group RFC-793/STD-7, ISI, Sept. 1981. | ||||
[RFC1122] Braden, R. (ed), "Requirements for Internet Hosts -- | ||||
Communication Layers", RFC-1122, Oct. 1989. | ||||
[RFC1191] Mogul, J., Deering, S., "Path MTU Discovery," RFC 1191, | ||||
Nov. 1990. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU | ||||
Discovery," RFC 4821, Mar. 2007. | ||||
[RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion | ||||
Control," RFC 5681 (Standards Track), Sep. 2009. | ||||
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast | ||||
Open", RFC 7413, Dec. 2014. | ||||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", RFC 8174, May 2017. | 2119 Key Words", RFC 8174, May 2017. | |||
[RFC8201] McCann, J., Deering, S., Mogul, J., Hinden, R. (Ed.), | ||||
"Path MTU Discovery for IP version 6," RFC 8201, Jul. | ||||
2017. | ||||
14.2. Informative References | 14.2. Informative References | |||
[Al10] Allman, M., "Initial Congestion Window Specification", | [Al10] Allman, M., "Initial Congestion Window Specification", | |||
(work in progress), draft-allman-tcpm-bump-initcwnd-00, | (work in progress), draft-allman-tcpm-bump-initcwnd-00, | |||
Nov. 2010. | Nov. 2010. | |||
[Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A | [Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A | |||
Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala | Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala | |||
Lumpur, Malaysia, May 23-27 2016. | Lumpur, Malaysia, May 23-27 2016. | |||
skipping to change at page 17, line 48 ¶ | skipping to change at page 19, line 29 ¶ | |||
Start Restart After Idle", draft-hughes-restart-00 | Start Restart After Idle", draft-hughes-restart-00 | |||
(expired), Dec. 2001. | (expired), Dec. 2001. | |||
[Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for | [Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for | |||
short TCP flows," 2012 IEEE International Conference on | short TCP flows," 2012 IEEE International Conference on | |||
Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213. | Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213. | |||
[Ja88] Jacobson, V., M. Karels, "Congestion Avoidance and | [Ja88] Jacobson, V., M. Karels, "Congestion Avoidance and | |||
Control", Proc. Sigcomm 1988. | Control", Proc. Sigcomm 1988. | |||
[RFC793] Postel, Jon, "Transmission Control Protocol," Network | ||||
Working Group RFC-793/STD-7, ISI, Sept. 1981. | ||||
[RFC1122] Braden, R. (ed), "Requirements for Internet Hosts -- | ||||
Communication Layers", RFC-1122, Oct. 1989. | ||||
[RFC1191] Mogul, J., Deering, S., "Path MTU Discovery," RFC 1191, | ||||
Nov. 1990. | ||||
[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions | [RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions | |||
Functional Specification," RFC-1644, July 1994. | Functional Specification," RFC-1644, July 1994. | |||
[RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379, | [RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379, | |||
September 1992. | September 1992. | |||
[RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast | [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast | |||
Retransmit, and Fast Recovery Algorithms", RFC2001 | Retransmit, and Fast Recovery Algorithms", RFC2001 | |||
(Standards Track), Jan. 1997. | (Standards Track), Jan. 1997. | |||
skipping to change at page 18, line 46 ¶ | skipping to change at page 20, line 17 ¶ | |||
[RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's | [RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's | |||
Initial Window," RFC 3390, Oct. 2002. | Initial Window," RFC 3390, Oct. 2002. | |||
[RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager," | [RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager," | |||
RFC 3124, June 2001. | RFC 3124, June 2001. | |||
[RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion | [RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion | |||
Control Protocol (DCCP)," RFC 4340, Mar. 2006. | Control Protocol (DCCP)," RFC 4340, Mar. 2006. | |||
[RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU | ||||
Discovery," RFC 4821, Mar. 2007. | ||||
[RFC4960] Stewart, R., (Ed.), "Stream Control Transmission | [RFC4960] Stewart, R., (Ed.), "Stream Control Transmission | |||
Protocol," RFC4960, Sept. 2007. | Protocol," RFC4960, Sept. 2007. | |||
[RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion | ||||
Control," RFC 5681 (Standards Track), Sep. 2009. | ||||
[RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication | [RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication | |||
Option," RFC 5925, June 2010. | Option," RFC 5925, June 2010. | |||
[RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP | [RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP | |||
Extensions for Multipath Operation with Multiple | Extensions for Multipath Operation with Multiple | |||
Addresses," RFC 6824, Jan. 2013. | Addresses," RFC 6824, Jan. 2013. | |||
[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing | [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing | |||
TCP's Initial Window," RFC 6928, Apr. 2013. | TCP's Initial Window," RFC 6928, Apr. 2013. | |||
[RFC7231] Fielding, R., J. Reschke, Eds., "HTTP/1.1 Semantics and | [RFC7231] Fielding, R., J. Reschke, Eds., "HTTP/1.1 Semantics and | |||
Content," RFC-7231, June 2014. | Content," RFC-7231, June 2014. | |||
[RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger | [RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger | |||
(Ed.), "TCP Extensions for High Performance," RFC 7323, | (Ed.), "TCP Extensions for High Performance," RFC 7323, | |||
Sept. 2014. | Sept. 2014. | |||
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast | ||||
Open", RFC 7413, Dec. 2014. | ||||
[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish, | [RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish, | |||
B., "Mechanisms for Optimizing Link Aggregation Group | B., "Mechanisms for Optimizing Link Aggregation Group | |||
(LAG) and Equal-Cost Multipath (ECMP) Component Link | (LAG) and Equal-Cost Multipath (ECMP) Component Link | |||
Utilization in Networks", RFC 7424, Jan. 2015. | Utilization in Networks", RFC 7424, Jan. 2015. | |||
[RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer | [RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer | |||
Protocol Version 2 (HTTP/2)", RFC 7540, May 2015. | Protocol Version 2 (HTTP/2)", RFC 7540, May 2015. | |||
[RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP | [RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP | |||
to Support Rate-Limited Traffic", RFC 7661, Oct. 2015. | to Support Rate-Limited Traffic", RFC 7661, Oct. 2015. | |||
[RFC8201] McCann, J., Deering, S., Mogul, J., Hinden, R. (Ed.), | ||||
"Path MTU Discovery for IP version 6," RFC 8201, Jul. | ||||
2017. | ||||
[To12] Touch, J., "Automating the Initial Window in TCP," draft- | [To12] Touch, J., "Automating the Initial Window in TCP," draft- | |||
touch-tcpm-automatic-iw-03 (expired), July 2012. | touch-tcpm-automatic-iw-03 (expired), July 2012. | |||
15. Acknowledgments | 15. Acknowledgments | |||
The authors would like to thank Praveen Balasubramanian for | The authors would like to thank Praveen Balasubramanian for | |||
information regarding TCB sharing in Windows, and Yuchung Cheng, | information regarding TCB sharing in Windows, and Yuchung Cheng, | |||
Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on | Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on | |||
earlier versions of the draft. Earlier revisions of this work | earlier versions of the draft. Earlier revisions of this work | |||
received funding from a collaborative research project between the | received funding from a collaborative research project between the | |||
University of Oslo and Huawei Technologies Co., Ltd. and were partly | University of Oslo and Huawei Technologies Co., Ltd. and were partly | |||
supported by USC/ISI's Postel Center. | supported by USC/ISI's Postel Center. | |||
This document was prepared using 2-Word-v2.0.template.dot. | This document was prepared using 2-Word-v2.0.template.dot. | |||
16. Change log | 16. Change log | |||
This section should be removed upon final publication as an RFC. | This section should be removed upon final publication as an RFC. | |||
ietf-02: | ||||
- Minor reorganization and correction of typographic errors | ||||
- Added text to address fingerprinting in Security section | ||||
- Now retains Appendix B and body option tables upon publication | ||||
ietf-01: | ietf-01: | |||
- Added Appendix C to address long-timescale temporal adaptation. | - Added Appendix C to address long-timescale temporal adaptation. | |||
ietf-00: | ietf-00: | |||
- Re-issued as draft-ietf-tcpm-2140bis due to WG adoption. | - Re-issued as draft-ietf-tcpm-2140bis due to WG adoption. | |||
- Cleaned orphan references to T/TCP, removed incomplete refs | - Cleaned orphan references to T/TCP, removed incomplete refs | |||
- Moved references to informative section and updated Sec 2 | - Moved references to informative section and updated Sec 2 | |||
- Updated to clarify no impact to interoperability | - Updated to clarify no impact to interoperability | |||
skipping to change at page 21, line 14 ¶ | skipping to change at page 22, line 26 ¶ | |||
- Marked entries that are considered safe to share with an | - Marked entries that are considered safe to share with an | |||
asterisk (suggestion was to split the table) | asterisk (suggestion was to split the table) | |||
- Discussed correct host identification: NATs may make IP | - Discussed correct host identification: NATs may make IP | |||
addresses the wrong input, could e.g. use HTTP cookie. | addresses the wrong input, could e.g. use HTTP cookie. | |||
- Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and | - Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and | |||
MTU | MTU | |||
- Added information about option sharing, listed options in | - Added information about option sharing, listed options in 0 | |||
Appendix B | ||||
Authors' Addresses | Authors' Addresses | |||
Joe Touch | Joe Touch | |||
Manhattan Beach, CA 90266 | Manhattan Beach, CA 90266 | |||
USA | USA | |||
Phone: +1 (310) 560-0334 | Phone: +1 (310) 560-0334 | |||
Email: touch@strayalpha.com | Email: touch@strayalpha.com | |||
skipping to change at page 21, line 34 ¶ | skipping to change at page 23, line 4 ¶ | |||
Michael Welzl | Michael Welzl | |||
University of Oslo | University of Oslo | |||
PO Box 1080 Blindern | PO Box 1080 Blindern | |||
Oslo N-0316 | Oslo N-0316 | |||
Norway | Norway | |||
Phone: +47 22 85 24 20 | Phone: +47 22 85 24 20 | |||
Email: michawe@ifi.uio.no | Email: michawe@ifi.uio.no | |||
Safiqul Islam | Safiqul Islam | |||
University of Oslo | University of Oslo | |||
PO Box 1080 Blindern | PO Box 1080 Blindern | |||
Oslo N-0316 | Oslo N-0316 | |||
Norway | Norway | |||
Phone: +47 22 84 08 37 | Phone: +47 22 84 08 37 | |||
Email: safiquli@ifi.uio.no | Email: safiquli@ifi.uio.no | |||
Appendix A: TCB sharing history | Appendix A: TCB Sharing History | |||
T/TCP proposed using caches to maintain TCB information across | T/TCP proposed using caches to maintain TCB information across | |||
instances (temporal sharing), e.g., smoothed RTT, RTT variance, | instances (temporal sharing), e.g., smoothed RTT, RTT variance, | |||
congestion avoidance threshold, and MSS [RFC1644]. These values were | congestion avoidance threshold, and MSS [RFC1644]. These values were | |||
in addition to connection counts used by T/TCP to accelerate data | in addition to connection counts used by T/TCP to accelerate data | |||
delivery prior to the full three-way handshake during an OPEN. The | delivery prior to the full three-way handshake during an OPEN. The | |||
goal was to aggregate TCB components where they reflect one | goal was to aggregate TCB components where they reflect one | |||
association - that of the host-pair, rather than artificially | association - that of the host-pair, rather than artificially | |||
separating those components by connection. | separating those components by connection. | |||
At least one T/TCP implementation saved the MSS and aggregated the | At least one T/TCP implementation saved the MSS and aggregated the | |||
RTT parameters across multiple connections, but omitted caching the | RTT parameters across multiple connections but omitted caching the | |||
congestion window information [Br94], as originally specified in | congestion window information [Br94], as originally specified in | |||
[RFC1379]. Some T/TCP implementations immediately updated MSS when | [RFC1379]. Some T/TCP implementations immediately updated MSS when | |||
the TCP MSS header option was received [Br94], although this was not | the TCP MSS header option was received [Br94], although this was not | |||
addressed specifically in the concepts or functional specification | addressed specifically in the concepts or functional specification | |||
[RFC1379][RFC1644]. In later T/TCP implementations, RTT values were | [RFC1379][RFC1644]. In later T/TCP implementations, RTT values were | |||
updated only after a CLOSE, which does not benefit concurrent | updated only after a CLOSE, which does not benefit concurrent | |||
sessions. | sessions. | |||
Temporal sharing of cached TCB data was originally implemented in | Temporal sharing of cached TCB data was originally implemented in | |||
the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same | the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same | |||
[FreeBSD]. As mentioned before, only the MSS and RTT parameters were | [FreeBSD]. As mentioned before, only the MSS and RTT parameters were | |||
cached, as originally specified in [RFC1379]. Later discussion of | cached, as originally specified in [RFC1379]. Later discussion of | |||
T/TCP suggested including congestion control parameters in this | T/TCP suggested including congestion control parameters in this | |||
cache; for example, [RFC1644] (Section 3.1) hints at initializing | cache; for example, [RFC1644] (Section 3.1) hints at initializing | |||
the congestion window to the old window size. | the congestion window to the old window size. | |||
Appendix B: TCP Option Sharing and Caching | Appendix B: TCP Option Sharing and Caching | |||
In addition to the options that can be cached and shared, this memo | In addition to the options that can be cached and shared, this memo | |||
also lists known options for which state is unsafe to be kept. This | also lists known options for which state is unsafe to be kept. This | |||
list is meant to avoid work duplication and should be removed upon | list is not intended to be authoritative or exhaustive. | |||
publication. | ||||
Obsolete (unsafe to keep state): | Obsolete (unsafe to keep state): | |||
ECHO | ECHO | |||
ECHO REPLY | ECHO REPLY | |||
PO Conn permitted | PO Conn permitted | |||
PO service profile | PO service profile | |||
End of changes. 56 change blocks. | ||||
189 lines changed or deleted | 248 lines changed or added | |||