draft-paxson-tcpm-rfc2988bis-02.txt   rfc6298.txt 
Internet Engineering Task Force V. Paxson Internet Engineering Task Force (IETF) V. Paxson
INTERNET DRAFT ICSI/UC Berkeley Request for Comments: 6298 ICSI/UC Berkeley
File: draft-paxson-tcpm-rfc2988bis-02.txt M. Allman Obsoletes: 2988 M. Allman
Intended status: Proposed Standard ICSI Updates: 1122 ICSI
J. Chu Category: Standards Track J. Chu
Google ISSN: 2070-1721 Google
M. Sargent M. Sargent
CWRU CWRU
March 14, 2011 June 2011
Computing TCP's Retransmission Timer Computing TCP's Retransmission Timer
Status of this Memo Abstract
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering This document defines the standard algorithm that Transmission
Task Force (IETF), its areas, and its working groups. Note that Control Protocol (TCP) senders are required to use to compute and
other groups may also distribute working documents as Internet- manage their retransmission timer. It expands on the discussion in
Drafts. Section 4.2.3.1 of RFC 1122 and upgrades the requirement of
supporting the algorithm from a SHOULD to a MUST. This document
obsoletes RFC 2988.
Internet-Drafts are draft documents valid for a maximum of six Status of This Memo
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at This is an Internet Standards Track document.
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at This document is a product of the Internet Engineering Task Force
http://www.ietf.org/shadow.html. (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
This Internet-Draft will expire on September 14, 2011. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6298.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Abstract
This document defines the standard algorithm that Transmission This document is subject to BCP 78 and the IETF Trust's Legal
Control Protocol (TCP) senders are required to use to compute and Provisions Relating to IETF Documents
manage their retransmission timer. It expands on the discussion in (http://trustee.ietf.org/license-info) in effect on the date of
section 4.2.3.1 of RFC 1122 and upgrades the requirement of publication of this document. Please review these documents
supporting the algorithm from a SHOULD to a MUST. carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
1 Introduction 1. Introduction
The Transmission Control Protocol (TCP) [Pos81] uses a retransmission The Transmission Control Protocol (TCP) [Pos81] uses a retransmission
timer to ensure data delivery in the absence of any feedback from the timer to ensure data delivery in the absence of any feedback from the
remote data receiver. The duration of this timer is referred to as remote data receiver. The duration of this timer is referred to as
RTO (retransmission timeout). RFC 1122 [Bra89] specifies that the RTO (retransmission timeout). RFC 1122 [Bra89] specifies that the
RTO should be calculated as outlined in [Jac88]. RTO should be calculated as outlined in [Jac88].
This document codifies the algorithm for setting the RTO. In This document codifies the algorithm for setting the RTO. In
addition, this document expands on the discussion in section 4.2.3.1 addition, this document expands on the discussion in Section 4.2.3.1
of RFC 1122 and upgrades the requirement of supporting the algorithm of RFC 1122 and upgrades the requirement of supporting the algorithm
from a SHOULD to a MUST. RFC 5681 [APB09] outlines the algorithm TCP from a SHOULD to a MUST. RFC 5681 [APB09] outlines the algorithm TCP
uses to begin sending after the RTO expires and a retransmission is uses to begin sending after the RTO expires and a retransmission is
sent. This document does not alter the behavior outlined in RFC 5681 sent. This document does not alter the behavior outlined in RFC 5681
[APB09]. [APB09].
In some situations it may be beneficial for a TCP sender to be more In some situations, it may be beneficial for a TCP sender to be more
conservative than the algorithms detailed in this document allow. conservative than the algorithms detailed in this document allow.
However, a TCP MUST NOT be more aggressive than the following However, a TCP MUST NOT be more aggressive than the following
algorithms allow. algorithms allow. This document obsoletes RFC 2988 [PA00].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [Bra97]. document are to be interpreted as described in [Bra97].
2 The Basic Algorithm 2. The Basic Algorithm
To compute the current RTO, a TCP sender maintains two state To compute the current RTO, a TCP sender maintains two state
variables, SRTT (smoothed round-trip time) and RTTVAR (round-trip variables, SRTT (smoothed round-trip time) and RTTVAR (round-trip
time variation). In addition, we assume a clock granularity of G time variation). In addition, we assume a clock granularity of G
seconds. seconds.
The rules governing the computation of SRTT, RTTVAR, and RTO are as The rules governing the computation of SRTT, RTTVAR, and RTO are as
follows: follows:
(2.1) Until a round-trip time (RTT) measurement has been made for a (2.1) Until a round-trip time (RTT) measurement has been made for a
segment sent between the sender and receiver, the sender SHOULD segment sent between the sender and receiver, the sender SHOULD
set RTO <- 1 second, though the "backing off" on repeated set RTO <- 1 second, though the "backing off" on repeated
retransmission discussed in (5.5) still applies. retransmission discussed in (5.5) still applies.
Note that the previous version of this document used an Note that the previous version of this document used an initial
initial RTO of 3 seconds [PA00]. A TCP implementation MAY RTO of 3 seconds [PA00]. A TCP implementation MAY still use
still use this value (or any other value > 1 second). This this value (or any other value > 1 second). This change in the
change in the lower bound on the initial RTO is discussed in lower bound on the initial RTO is discussed in further detail
further detail in Appendix A. in Appendix A.
(2.2) When the first RTT measurement R is made, the host MUST set (2.2) When the first RTT measurement R is made, the host MUST set
SRTT <- R SRTT <- R
RTTVAR <- R/2 RTTVAR <- R/2
RTO <- SRTT + max (G, K*RTTVAR) RTO <- SRTT + max (G, K*RTTVAR)
where K = 4. where K = 4.
(2.3) When a subsequent RTT measurement R' is made, a host MUST set (2.3) When a subsequent RTT measurement R' is made, a host MUST set
skipping to change at page 3, line 24 skipping to change at page 3, line 43
before updating SRTT itself using the second assignment. That before updating SRTT itself using the second assignment. That
is, updating RTTVAR and SRTT MUST be computed in the above is, updating RTTVAR and SRTT MUST be computed in the above
order. order.
The above SHOULD be computed using alpha=1/8 and beta=1/4 (as The above SHOULD be computed using alpha=1/8 and beta=1/4 (as
suggested in [JK88]). suggested in [JK88]).
After the computation, a host MUST update After the computation, a host MUST update
RTO <- SRTT + max (G, K*RTTVAR) RTO <- SRTT + max (G, K*RTTVAR)
(2.4) Whenever RTO is computed, if it is less than 1 second then the (2.4) Whenever RTO is computed, if it is less than 1 second, then the
RTO SHOULD be rounded up to 1 second. RTO SHOULD be rounded up to 1 second.
Traditionally, TCP implementations use coarse grain clocks to Traditionally, TCP implementations use coarse grain clocks to
measure the RTT and trigger the RTO, which imposes a large measure the RTT and trigger the RTO, which imposes a large
minimum value on the RTO. Research suggests that a large minimum value on the RTO. Research suggests that a large
minimum RTO is needed to keep TCP conservative and avoid minimum RTO is needed to keep TCP conservative and avoid
spurious retransmissions [AP99]. Therefore, this spurious retransmissions [AP99]. Therefore, this specification
specification requires a large minimum RTO as a conservative requires a large minimum RTO as a conservative approach, while
approach, while at the same time acknowledging that at some at the same time acknowledging that at some future point,
future point, research may show that a smaller minimum RTO is research may show that a smaller minimum RTO is acceptable or
acceptable or superior. superior.
(2.5) A maximum value MAY be placed on RTO provided it is at least 60 (2.5) A maximum value MAY be placed on RTO provided it is at least 60
seconds. seconds.
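As an illustration only, rules (2.1) through (2.5) can be sketched as a small estimator. The class name, field names, and the 1 msec default for G are assumptions made for this example. The two update assignments for step (2.3) are elided in the text above; the sketch uses the RTTVAR and SRTT updates given in step (2.3) of RFC 6298, with RTTVAR updated first.

```python
from dataclasses import dataclass
from typing import Optional

K = 4              # multiplier from (2.2)
ALPHA = 1 / 8      # SRTT gain suggested in (2.3)
BETA = 1 / 4       # RTTVAR gain suggested in (2.3)
INITIAL_RTO = 1.0  # seconds, per (2.1)
MIN_RTO = 1.0      # seconds, lower bound from (2.4)
MAX_RTO = 60.0     # seconds, optional cap; (2.5) requires at least 60


@dataclass
class RtoEstimator:
    """Hypothetical holder for the SRTT, RTTVAR, and RTO state."""
    granularity: float = 0.001      # G in seconds (assumed value)
    srtt: Optional[float] = None    # None until the first sample
    rttvar: Optional[float] = None
    rto: float = INITIAL_RTO

    def on_rtt_sample(self, r: float) -> float:
        """Fold one RTT measurement (in seconds) into the state."""
        if self.srtt is None:
            # (2.2) first measurement
            self.srtt = r
            self.rttvar = r / 2
        else:
            # (2.3) RTTVAR is updated first, using the old SRTT
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        rto = self.srtt + max(self.granularity, K * self.rttvar)
        # (2.4) round up to the minimum, (2.5) apply the optional maximum
        self.rto = min(MAX_RTO, max(MIN_RTO, rto))
        return self.rto
```

With a first sample of 0.5 seconds this gives SRTT = 0.5, RTTVAR = 0.25, and RTO = 1.5 seconds. A first sample of 0.1 seconds computes 0.3 seconds, which (2.4) rounds up to 1 second.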
3 Taking RTT Samples 3. Taking RTT Samples
TCP MUST use Karn's algorithm [KP87] for taking RTT samples. That TCP MUST use Karn's algorithm [KP87] for taking RTT samples. That
is, RTT samples MUST NOT be made using segments that were is, RTT samples MUST NOT be made using segments that were
retransmitted (and thus for which it is ambiguous whether the reply retransmitted (and thus for which it is ambiguous whether the reply
was for the first instance of the packet or a later instance). The was for the first instance of the packet or a later instance). The
only case when TCP can safely take RTT samples from retransmitted only case when TCP can safely take RTT samples from retransmitted
segments is when the TCP timestamp option [JBB92] is employed, since segments is when the TCP timestamp option [JBB92] is employed, since
the timestamp option removes the ambiguity regarding which instance the timestamp option removes the ambiguity regarding which instance
of the data segment triggered the acknowledgment. of the data segment triggered the acknowledgment.
Traditionally, TCP implementations have taken one RTT measurement at Traditionally, TCP implementations have taken one RTT measurement at
a time (typically once per RTT). However, when using the timestamp a time (typically, once per RTT). However, when using the timestamp
option, each ACK can be used as an RTT sample. RFC 1323 [JBB92] option, each ACK can be used as an RTT sample. RFC 1323 [JBB92]
suggests that TCP connections utilizing large congestion windows suggests that TCP connections utilizing large congestion windows
should take many RTT samples per window of data to avoid aliasing should take many RTT samples per window of data to avoid aliasing
effects in the estimated RTT. A TCP implementation MUST take at effects in the estimated RTT. A TCP implementation MUST take at
least one RTT measurement per RTT (unless that is not possible per least one RTT measurement per RTT (unless that is not possible per
Karn's algorithm). Karn's algorithm).
For fairly modest congestion window sizes research suggests that For fairly modest congestion window sizes, research suggests that
timing each segment does not lead to a better RTT estimator [AP99]. timing each segment does not lead to a better RTT estimator [AP99].
Additionally, when multiple samples are taken per RTT the alpha and Additionally, when multiple samples are taken per RTT, the alpha and
beta defined in section 2 may keep an inadequate RTT history. A beta defined in Section 2 may keep an inadequate RTT history. A
method for changing these constants is currently an open research method for changing these constants is currently an open research
question. question.
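The sampling rule above can be sketched as follows. This is a simplified illustration, not a full implementation: SentSegment and rtt_sample are hypothetical names, and the echoed timestamp is assumed to be expressed in the same clock units as the ACK arrival time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentSegment:
    """Hypothetical record of one transmitted segment."""
    send_time: float             # time of the first transmission, seconds
    retransmitted: bool = False  # True once the segment has been resent


def rtt_sample(seg: SentSegment, ack_time: float,
               echoed_timestamp: Optional[float] = None) -> Optional[float]:
    """Return an RTT sample for an acknowledged segment, or None."""
    if echoed_timestamp is not None:
        # The timestamp option identifies which transmission triggered
        # the ACK, so the sample is unambiguous even after a retransmit.
        return ack_time - echoed_timestamp
    if seg.retransmitted:
        # Karn's algorithm: the ACK may answer either transmission,
        # so no sample is taken.
        return None
    return ack_time - seg.send_time
```

A caller would pass any non-None result to the SRTT and RTTVAR computation of Section 2 and simply skip the update when None is returned.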
4 Clock Granularity 4. Clock Granularity
There is no requirement for the clock granularity G used for There is no requirement for the clock granularity G used for
computing RTT measurements and the different state variables. computing RTT measurements and the different state variables.
However, if the K*RTTVAR term in the RTO calculation equals zero, However, if the K*RTTVAR term in the RTO calculation equals zero, the
the variance term MUST be rounded to G seconds (i.e., use the variance term MUST be rounded to G seconds (i.e., use the equation
equation given in step 2.3). given in step 2.3).
RTO <- SRTT + max (G, K*RTTVAR) RTO <- SRTT + max (G, K*RTTVAR)
Experience has shown that finer clock granularities (<= 100 msec) Experience has shown that finer clock granularities (<= 100 msec)
perform somewhat better than more coarse granularities. perform somewhat better than coarser granularities.
Note that [Jac88] outlines several clever tricks that can be used to Note that [Jac88] outlines several clever tricks that can be used to
obtain better precision from coarse granularity timers. These obtain better precision from coarse granularity timers. These
changes are widely implemented in current TCP implementations. changes are widely implemented in current TCP implementations.
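To make the role of G concrete, the following sketch assumes an implementation that keeps SRTT and RTTVAR as whole clock ticks, so that G corresponds to exactly one tick. The function name and the tick representation are assumptions for illustration.

```python
K = 4  # multiplier from (2.2)


def rto_ticks(srtt_ticks: int, rttvar_ticks: int) -> int:
    """RTO in clock ticks, where one tick is G seconds."""
    # With integer arithmetic RTTVAR can reach zero on a steady path;
    # the max() keeps the variance term at no less than one tick (G).
    return srtt_ticks + max(1, K * rttvar_ticks)
```

For example, with SRTT = 5 ticks and RTTVAR = 0 the result is 6 ticks rather than 5, so the timer never expires at exactly the smoothed RTT.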
5 Managing the RTO Timer 5. Managing the RTO Timer
An implementation MUST manage the retransmission timer(s) in such a An implementation MUST manage the retransmission timer(s) in such a
way that a segment is never retransmitted too early, i.e. less than way that a segment is never retransmitted too early, i.e., less than
one RTO after the previous transmission of that segment. one RTO after the previous transmission of that segment.
The following is the RECOMMENDED algorithm for managing the The following is the RECOMMENDED algorithm for managing the
retransmission timer: retransmission timer:
(5.1) Every time a packet containing data is sent (including a (5.1) Every time a packet containing data is sent (including a
retransmission), if the timer is not running, start it running retransmission), if the timer is not running, start it running
so that it will expire after RTO seconds (for the current value so that it will expire after RTO seconds (for the current value
of RTO). of RTO).
skipping to change at page 4, line 57 skipping to change at page 5, line 36
(5.3) When an ACK is received that acknowledges new data, restart the (5.3) When an ACK is received that acknowledges new data, restart the
retransmission timer so that it will expire after RTO seconds retransmission timer so that it will expire after RTO seconds
(for the current value of RTO). (for the current value of RTO).
When the retransmission timer expires, do the following: When the retransmission timer expires, do the following:
(5.4) Retransmit the earliest segment that has not been acknowledged (5.4) Retransmit the earliest segment that has not been acknowledged
by the TCP receiver. by the TCP receiver.
(5.5) The host MUST set RTO <- RTO * 2 ("back off the timer"). The (5.5) The host MUST set RTO <- RTO * 2 ("back off the timer"). The
maximum value discussed in (2.5) above may be used to provide an maximum value discussed in (2.5) above may be used to provide
upper bound to this doubling operation. an upper bound to this doubling operation.
(5.6) Start the retransmission timer, such that it expires after RTO (5.6) Start the retransmission timer, such that it expires after RTO
seconds (for the value of RTO after the doubling operation seconds (for the value of RTO after the doubling operation
outlined in 5.5). outlined in 5.5).
(5.7) If the timer expires awaiting the ACK of a SYN segment and the (5.7) If the timer expires awaiting the ACK of a SYN segment and the
TCP implementation is using an RTO less than 3 seconds, the RTO TCP implementation is using an RTO less than 3 seconds, the RTO
MUST be re-initialized to 3 seconds when data transmission MUST be re-initialized to 3 seconds when data transmission
begins (i.e., after the three-way handshake completes). begins (i.e., after the three-way handshake completes).
This represents a change from the previous version of this This represents a change from the previous version of this
document [PA00] and is discussed in Appendix A. document [PA00] and is discussed in Appendix A.
Note that after retransmitting, once a new RTT measurement is Note that after retransmitting, once a new RTT measurement is
obtained (which can only happen when new data has been sent and obtained (which can only happen when new data has been sent and
acknowledged), the computations outlined in section 2 are performed, acknowledged), the computations outlined in Section 2 are performed,
including the computation of RTO, which may result in "collapsing" including the computation of RTO, which may result in "collapsing"
RTO back down after it has been subject to exponential backoff RTO back down after it has been subject to exponential back off (rule
(rule 5.5). 5.5).
Note that a TCP implementation MAY clear SRTT and RTTVAR after Note that a TCP implementation MAY clear SRTT and RTTVAR after
backing off the timer multiple times as it is likely that the backing off the timer multiple times as it is likely that the current
current SRTT and RTTVAR are bogus in this situation. Once SRTT and SRTT and RTTVAR are bogus in this situation. Once SRTT and RTTVAR
RTTVAR are cleared they should be initialized with the next RTT are cleared, they should be initialized with the next RTT sample
sample taken per (2.2) rather than using (2.3). taken per (2.2) rather than using (2.3).
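The timer rules above can be sketched as a small state machine. This is one possible reading, not a definitive implementation: the class and method names are hypothetical, time is passed in explicitly rather than read from a clock, and the caller is assumed to perform the retransmission in (5.4). Rule (5.2) is elided in the text above; in RFC 6298 it turns the timer off once all outstanding data has been acknowledged. The sketch reads (5.7) as raising a current RTO below 3 seconds to 3 seconds after a SYN timeout.

```python
from typing import Optional

MAX_RTO = 60.0          # optional cap from (2.5), seconds
SYN_FALLBACK_RTO = 3.0  # value required by (5.7), seconds


class RetransmitTimer:
    """Hypothetical model of the retransmission timer."""

    def __init__(self, rto: float = 1.0) -> None:
        self.rto = rto
        self.deadline: Optional[float] = None  # None means not running
        self.syn_timed_out = False

    def on_data_sent(self, now: float) -> None:
        # (5.1) start the timer only if it is not already running
        if self.deadline is None:
            self.deadline = now + self.rto

    def on_new_ack(self, now: float, all_acked: bool) -> None:
        # (5.2) stop when nothing is outstanding, else (5.3) restart
        self.deadline = None if all_acked else now + self.rto

    def on_expiry(self, now: float, awaiting_syn_ack: bool = False) -> None:
        # (5.4) is the caller's job: resend the earliest unacked segment
        self.rto = min(MAX_RTO, self.rto * 2)  # (5.5) back off
        self.deadline = now + self.rto         # (5.6) restart
        if awaiting_syn_ack:
            self.syn_timed_out = True

    def on_handshake_complete(self) -> None:
        # (5.7) after a SYN timeout, do not start data with RTO < 3 s
        if self.syn_timed_out and self.rto < SYN_FALLBACK_RTO:
            self.rto = SYN_FALLBACK_RTO
```

Starting from the 1 second initial RTO, a single SYN timeout backs the RTO off to 2 seconds, and completing the handshake then raises it to 3 seconds before data transmission begins.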
6 Security Considerations 6. Security Considerations
This document requires a TCP to wait for a given interval before This document requires a TCP to wait for a given interval before
retransmitting an unacknowledged segment. An attacker could cause a retransmitting an unacknowledged segment. An attacker could cause a
TCP sender to compute a large value of RTO by adding delay to a TCP sender to compute a large value of RTO by adding delay to a timed
timed packet's latency, or that of its acknowledgment. However, packet's latency, or that of its acknowledgment. However, the
the ability to add delay to a packet's latency often coincides with ability to add delay to a packet's latency often coincides with the
the ability to cause the packet to be lost, so it is difficult to ability to cause the packet to be lost, so it is difficult to see
see what an attacker might gain from such an attack that could cause what an attacker might gain from such an attack that could cause more
more damage than simply discarding some of the TCP connection's damage than simply discarding some of the TCP connection's packets.
packets.
The Internet to a considerable degree relies on the correct The Internet, to a considerable degree, relies on the correct
implementation of the RTO algorithm (as well as those described in implementation of the RTO algorithm (as well as those described in
RFC 5681) in order to preserve network stability and avoid RFC 5681) in order to preserve network stability and avoid congestion
congestion collapse. An attacker could cause TCP endpoints to collapse. An attacker could cause TCP endpoints to respond more
respond more aggressively in the face of congestion by forging aggressively in the face of congestion by forging acknowledgments for
acknowledgments for segments before the receiver has actually segments before the receiver has actually received the data, thus
received the data, thus lowering RTO to an unsafe value. But to do lowering RTO to an unsafe value. But to do so requires spoofing the
so requires spoofing the acknowledgments correctly, which is acknowledgments correctly, which is difficult unless the attacker can
difficult unless the attacker can monitor traffic along the path monitor traffic along the path between the sender and the receiver.
between the sender and the receiver. In addition, even if the In addition, even if the attacker can cause the sender's RTO to reach
attacker can cause the sender's RTO to reach too small a value, it too small a value, it appears the attacker cannot leverage this into
appears the attacker cannot leverage this into much of an attack much of an attack (compared to the other damage they can do if they
(compared to the other damage they can do if they can spoof packets can spoof packets belonging to the connection), since the sending TCP
belonging to the connection), since the sending TCP will still back will still back off its timer in the face of an incorrectly
off its timer in the face of an incorrectly transmitted packet's transmitted packet's loss due to actual congestion.
loss due to actual congestion.
7 IANA Considerations The security considerations in RFC 5681 [APB09] are also applicable
to this document.
None 7. Changes from RFC 2988
Acknowledgments This document reduces the initial RTO from the previous 3 seconds
[PA00] to 1 second, unless the SYN or the ACK of the SYN is lost, in
which case the default RTO is reverted to 3 seconds before data
transmission begins.
8. Acknowledgments
The RTO algorithm described in this memo was originated by Van The RTO algorithm described in this memo was originated by Van
Jacobson in [Jac88]. Jacobson in [Jac88].
Much of the data that motivated changing the initial RTO from 3 Much of the data that motivated changing the initial RTO from 3
seconds to 1 second came from Robert Love, Andre Broido and Mike seconds to 1 second came from Robert Love, Andre Broido, and Mike
Belshe. Belshe.
Normative References 9. References
[APB09] Allman, M., Paxson V. and E. Blanton, "TCP Congestion 9.1. Normative References
[APB09] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, September 2009.
[Bra89] Braden, R., "Requirements for Internet Hosts -- [Bra89] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989. Communication Layers", STD 3, RFC 1122, October 1989.
[Bra97] Bradner, S., "Key words for use in RFCs to Indicate [Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[JBB92] Jacobson, V., R. Braden, D. Borman, "TCP Extensions for High [JBB92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions for
Performance", RFC 1323, May 1992. High Performance", RFC 1323, May 1992.
[Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981. September 1981.
Non-Normative References 9.2. Informative References
[AP99] Allman, M. and V. Paxson, "On Estimating End-to-End Network [AP99] Allman, M. and V. Paxson, "On Estimating End-to-End Network
Path Properties", SIGCOMM 99. Path Properties", SIGCOMM 99.
[Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century", [Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century",
http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July
2009. 2009.
[SLS09] Schulman, A., Levin, D., and Spring, N., "CRAWDAD data set [SLS09] Schulman, A., Levin, D., and Spring, N., "CRAWDAD data set
umd/sigcomm2008 (v. 2009-03-02)", umd/sigcomm2008 (v. 2009-03-02)",
http://crawdad.cs.dartmouth.edu/umd/sigcomm2008, March, http://crawdad.cs.dartmouth.edu/umd/sigcomm2008, March, 2009.
2009.
[HKA04] Henderson, T., Kotz, D., and Abyzov, I., "CRAWDAD trace [HKA04] Henderson, T., Kotz, D., and Abyzov, I., "CRAWDAD trace
dartmouth/campus/tcpdump/fall03 (v. 2004-11-09)", dartmouth/campus/tcpdump/fall03 (v. 2004-11-09)",
http://crawdad.cs.dartmouth.edu/dartmouth/campus/tcpdump/fall03, http://crawdad.cs.dartmouth.edu/dartmouth/campus/
November 2004. tcpdump/fall03, November 2004.
[Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. Communication Review, vol. 18, no. 4, pp. 314-329, Aug.
1988.
[JK88] Jacobson, V. and M. Karels, "Congestion Avoidance and [JK88] Jacobson, V. and M. Karels, "Congestion Avoidance and
Control", ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. Control", ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
[KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
Estimates in Reliable Transport Protocols", SIGCOMM 87. Estimates in Reliable Transport Protocols", SIGCOMM 87.
[PA00] Paxson, V. and M. Allman, "Computing TCP's Retransmission [PA00] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000. Timer", RFC 2988, November 2000.
Author's Addresses Appendix A. Rationale for Lowering the Initial RTO
Choosing a reasonable initial RTO requires balancing two competing
considerations:
1. The initial RTO should be sufficiently large to cover most of the
end-to-end paths to avoid spurious retransmissions and their
associated negative performance impact.
2. The initial RTO should be small enough to ensure a timely recovery
from packet loss occurring before an RTT sample is taken.
Traditionally, TCP has used 3 seconds as the initial RTO [Bra89]
[PA00]. This document calls for lowering this value to 1 second
using the following rationale:
- Modern networks are simply faster than the state-of-the-art was at
the time the initial RTO of 3 seconds was defined.
- Studies have found that the round-trip times of more than 97.5% of
the connections observed in a large scale analysis were less than 1
second [Chu09], suggesting that 1 second meets criterion 1 above.
- In addition, the studies observed retransmission rates within the
three-way handshake of roughly 2%. This shows that reducing the
initial RTO has benefit to a non-negligible set of connections.
- However, roughly 2.5% of the connections studied in [Chu09] have an
RTT longer than 1 second. For those connections, a 1 second
initial RTO guarantees a retransmission during connection
establishment (needed or not).
When this happens, this document calls for reverting to an initial
RTO of 3 seconds for the data transmission phase. Therefore, the
implications of the spurious retransmission are modest: (1) an
extra SYN is transmitted into the network, and (2) according to RFC
5681 [APB09] the initial congestion window will be limited to 1
segment. While (2) clearly puts such connections at a
disadvantage, this document at least resets the RTO such that the
connection will not continually run into problems with a short
timeout. (Of course, if the RTT is more than 3 seconds, the
connection will still encounter difficulties. But that is not a
new issue for TCP.)
In addition, we note that when using timestamps, TCP will be able
to take an RTT sample even in the presence of a spurious
retransmission, facilitating convergence to a correct RTT estimate
when the RTT exceeds 1 second.
As an additional check on the results presented in [Chu09], we
analyzed packet traces of client behavior collected at four different
vantage points at different times, as follows:
Name       Dates           Pkts.   Cnns.   Clnts.   Servs.
-----------------------------------------------------------
LBL-1      Oct/05--Mar/06  292M    242K    228      74K
LBL-2      Nov/09--Feb/10  1.1B    1.2M    1047     38K
ICSI-1     Sep/11--18/07   137M    2.1M    193      486K
ICSI-2     Sep/11--18/08   163M    1.9M    177      277K
ICSI-3     Sep/14--21/09   334M    3.1M    170      253K
ICSI-4     Sep/11--18/10   298M    5M      183      189K
Dartmouth  Jan/4--21/04    1B      4M      3782     132K
SIGCOMM    Aug/17--21/08   11.6M   133K    152      29K
The "LBL" data was taken at the Lawrence Berkeley National
Laboratory, the "ICSI" data from the International Computer Science
Institute, the "SIGCOMM" data from the wireless network that served
the attendees of SIGCOMM 2008, and the "Dartmouth" data was collected
from Dartmouth College's wireless network. The latter two datasets
are available from the CRAWDAD data repository [HKA04] [SLS09]. The
table lists the dates of the data collections, the number of packets
collected, the number of TCP connections observed, the number of
local clients monitored, and the number of remote servers contacted.
We consider only connections initiated near the tracing vantage
point.
Analysis of these datasets finds the prevalence of retransmitted SYNs
to be between 0.03% (ICSI-4) and roughly 2% (LBL-1 and Dartmouth).
We then analyzed the data to determine the number of additional and
spurious retransmissions that would have been incurred if the initial
RTO was assumed to be 1 second. In most of the datasets, the
proportion of connections with spurious retransmits was less than
0.1%. However, in the Dartmouth dataset, approximately 1.1% of the
connections would have sent a spurious retransmit with a lower
initial RTO. We attribute this to the fact that the monitored
network is wireless and therefore susceptible to additional delays
from RF effects.
Finally, there are obviously performance benefits from retransmitting
lost SYNs with a reduced initial RTO. Across our datasets, the
percentage of connections that retransmitted a SYN and would realize
at least a 10% performance improvement by using the smaller initial
RTO specified in this document ranges from 43% (LBL-1) to 87%
(ICSI-4). The percentage of connections that would realize at least
a 50% performance improvement ranges from 17% (ICSI-1 and SIGCOMM) to
73% (ICSI-4).
From the data to which we have access, we conclude that the lower
initial RTO is likely to be beneficial to many connections, and
harmful to relatively few.
Authors' Addresses
Vern Paxson Vern Paxson
ICSI ICSI/UC Berkeley
1947 Center Street 1947 Center Street
Suite 600 Suite 600
Berkeley, CA 94704-1198 Berkeley, CA 94704-1198
Phone: 510-666-2882 Phone: 510-666-2882
EMail: vern@icir.org EMail: vern@icir.org
http://www.icir.org/vern/ http://www.icir.org/vern/
Mark Allman Mark Allman
ICSI ICSI
skipping to change at page 7, line 42 skipping to change at page 11, line 37
Phone: 440-235-1792 Phone: 440-235-1792
EMail: mallman@icir.org EMail: mallman@icir.org
http://www.icir.org/mallman/ http://www.icir.org/mallman/
H.K. Jerry Chu H.K. Jerry Chu
Google, Inc. Google, Inc.
1600 Amphitheatre Parkway 1600 Amphitheatre Parkway
Mountain View, CA 94043 Mountain View, CA 94043
Phone: 650-253-3010 Phone: 650-253-3010
Email: hkchu@google.com EMail: hkchu@google.com
Matt Sargent Matt Sargent
Case Western Reserve University Olin Building Case Western Reserve University
Olin Building
10900 Euclid Avenue 10900 Euclid Avenue
Room 505 Room 505
Cleveland, OH 44106 Cleveland, OH 44106
Phone: 440-223-5932 Phone: 440-223-5932
Email: mts71@case.edu EMail: mts71@case.edu
Appendix A
Choosing a reasonable initial RTO requires balancing two
competing considerations:
1. The initial RTO should be sufficiently large to cover most of the
end-to-end paths to avoid spurious retransmissions and their
associated negative performance impact.
2. The initial RTO should be small enough to ensure a timely
recovery from packet loss occurring before an RTT sample is
taken.
Traditionally, TCP has used 3 seconds as the initial RTO
[Bra89,PA00]. This document calls for lowering this value to 1
second using the following rationale:
- Modern networks are simply faster than the state-of-the-art was
at the time the initial RTO of 3 seconds was defined.
- Studies have found that the round-trip times of more than 97.5% of
the connections observed in a large-scale analysis were less than
1 second [Chu09], suggesting that 1 second meets criterion 1 above.
- In addition, the studies observed retransmission rates within
the three-way handshake of roughly 2%. This shows that reducing
the initial RTO has benefit to a non-negligible set of connections.
- However, roughly 2.5% of the connections studied in [Chu09] have
an RTT longer than 1 second. For those connections, a 1 second
initial RTO guarantees a retransmission during connection
establishment (needed or not).
When this happens, this document calls for reverting to an initial
RTO of 3 seconds for the data transmission phase. Therefore, the
implications of the spurious retransmission are modest: (1) an
extra SYN is transmitted into the network, and (2) according to
[RFC5681] the initial congestion window will be limited to 1
segment. While (2) clearly puts such connections at a
disadvantage, this document at least resets the RTO such that the
connection will not continually run into problems with a short
timeout. (Of course, if the RTT is more than three seconds, the
connection will still encounter difficulties. But that is not a
new issue for TCP.)
In addition, we note that when using timestamps, TCP will be able
to take an RTT sample even in the presence of a spurious
retransmission, facilitating convergence to a correct RTT estimate
when the RTT exceeds 1 second.
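The fallback described above can be sketched in a few lines of Python. This is a minimal illustration, not a normative implementation: the function name `handshake_outcome` and its parameters are hypothetical, the handshake is assumed to be loss-free, clock granularity is ignored, and the RTO after a clean handshake is taken as 3 times the first RTT sample (SRTT plus 4 times RTTVAR, with RTTVAR set to half the sample) subject to a 1 second floor.

```python
def handshake_outcome(path_rtt, initial_rto=1.0, fallback_rto=3.0):
    """Model a loss-free three-way handshake on a path with the given RTT.

    Returns (spurious_syn_retransmits, rto_at_start_of_data), in seconds.
    All names here are illustrative assumptions, not taken from the document.
    """
    retransmits = 0
    rto = initial_rto
    elapsed = 0.0
    # The SYN timer fires whenever the current RTO elapses before the
    # SYN-ACK arrives; each expiry retransmits the SYN and doubles the RTO.
    while elapsed + rto < path_rtt:
        elapsed += rto
        rto *= 2
        retransmits += 1

    if retransmits > 0 and initial_rto < fallback_rto:
        # The SYN timer expired with a short initial RTO: revert to
        # 3 seconds for the data transmission phase.
        return retransmits, fallback_rto
    if retransmits > 0:
        # Simplification: keep the backed-off value when no fallback applies.
        return retransmits, rto
    # Clean handshake: first RTT sample R gives SRTT = R, RTTVAR = R/2,
    # so RTO = SRTT + 4 * RTTVAR = 3R, rounded up to at least 1 second.
    return 0, max(1.0, 3 * path_rtt)


if __name__ == "__main__":
    for rtt in (0.2, 0.5, 1.5, 3.5):
        print(rtt, handshake_outcome(rtt))
```

Under these assumptions, a path with a 1.5 second RTT incurs one spurious SYN retransmission and starts the data phase with a 3 second RTO, while a path with an RTT above 3 seconds still retransmits more than once, matching the caveat in the text.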
As an additional check on the results presented in [Chu09], we
analyzed packet traces of client behavior collected at four
different vantage points at different times, as follows:
Name       Dates           Pkts.   Cnns.   Clnts.   Servs.
----------------------------------------------------------
LBL-1      Oct/05--Mar/06  292M    242K    228      74K
LBL-2      Nov/09--Feb/10  1.1B    1.2M    1047     38K
ICSI-1     Sep/11--18/07   137M    2.1M    193      486K
ICSI-2     Sep/11--18/08   163M    1.9M    177      277K
ICSI-3     Sep/14--21/09   334M    3.1M    170      253K
ICSI-4     Sep/11--18/10   298M    5M      183      189K
Dartmouth  Jan/4--21/04    1B      4M      3782     132K
SIGCOMM    Aug/17--21/08   11.6M   133K    152      29K
The "LBL" data was taken at the Lawrence Berkeley National
Laboratory, the "ICSI" data from the International Computer Science
Institute, the "SIGCOMM" data from the wireless network that served
the attendees of SIGCOMM 2008, and the "Dartmouth" data was
collected from Dartmouth College's wireless network. The latter two
datasets are available from the CRAWDAD data repository
[HKA04,SLS09]. The table lists the dates of the data collections,
the number of packets collected, the number of TCP connections
observed, the number of local clients monitored, and the number of
remote servers contacted. We consider only connections initiated
near the tracing vantage point.
Analysis of these datasets finds the prevalence of retransmitted
SYNs to range from 0.03% (ICSI-4) to roughly 2% (LBL-1 and
Dartmouth).
We then analyzed the data to determine the number of
additional---and spurious---retransmissions that would have been
incurred if the initial RTO was assumed to be 1 second. In most of
the datasets, the proportion of connections with spurious
retransmits was less than 0.1%. However, in the Dartmouth dataset
approximately 1.1% of the connections would have sent a spurious
retransmit with a lower initial RTO. We attribute this to the fact
that the monitored network is wireless and therefore susceptible to
additional delays from RF effects.
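One way to estimate the proportion of spurious retransmits from a trace is sketched below. The text does not spell out the exact methodology, so this is only an assumed approach: it takes the handshake RTTs of connections whose SYN was not actually lost and counts those whose RTT exceeds the candidate initial RTO. The function name `spurious_fraction` and the sample values are hypothetical.

```python
def spurious_fraction(handshake_rtts, initial_rto=1.0):
    """Fraction of connections whose handshake RTT exceeds initial_rto.

    handshake_rtts: SYN to SYN-ACK times in seconds for connections whose
    SYN was not lost (an assumed input format, not the trace format used
    in the document).
    """
    if not handshake_rtts:
        return 0.0
    # A SYN would be retransmitted needlessly whenever the SYN-ACK takes
    # longer than the initial RTO to arrive.
    spurious = sum(1 for rtt in handshake_rtts if rtt > initial_rto)
    return spurious / len(handshake_rtts)


if __name__ == "__main__":
    made_up_rtts = [0.05, 0.2, 1.4, 0.9]  # illustrative values only
    print(spurious_fraction(made_up_rtts))
```

Running the same function with `initial_rto=3.0` gives the baseline against which the additional spurious retransmissions would be counted.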
Finally, there are obviously performance benefits from
retransmitting lost SYNs with a reduced initial RTO. Across our
datasets, the percentage of connections that retransmitted a SYN and
would realize at least a 10% performance improvement by using the
smaller initial RTO specified in this document ranges from 43%
(LBL-1) to 87% (ICSI-4). The percentage of connections that would
realize at least a 50% performance improvement ranges from 17%
(ICSI-1 and SIGCOMM) to 73% (ICSI-4).
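The text does not define how the performance improvement is measured. As one plausible reading, the sketch below assumes a connection that lost exactly one SYN and waited out a 3 second initial RTO, and expresses the improvement as the fraction of the connection's total duration saved by waiting 1 second instead. The function name `improvement` and this definition are assumptions made for illustration.

```python
def improvement(duration, old_rto=3.0, new_rto=1.0):
    """Relative reduction in connection duration from a smaller initial RTO.

    duration: observed connection duration in seconds, including the
    old_rto wait caused by a single lost SYN (an assumed scenario).
    """
    if duration < old_rto:
        raise ValueError("duration must include the initial RTO wait")
    # Time saved is the difference between the two timer values.
    saved = old_rto - new_rto
    return saved / duration


if __name__ == "__main__":
    for d in (4.0, 10.0, 20.0):
        print(d, improvement(d))
```

Under this reading, a connection sees at least a 10% improvement when its observed duration is 20 seconds or less, and at least 50% when it is 4 seconds or less.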
From the data to which we have access, we conclude that the lower
initial RTO is likely to be beneficial to many connections, and
harmful to relatively few.
 End of changes. 52 change blocks. 
122 lines changed or deleted 223 lines changed or added
