draft-paxson-tcpm-rfc2988bis-01.txt   draft-paxson-tcpm-rfc2988bis-02.txt 
Internet Engineering Task Force V. Paxson Internet Engineering Task Force V. Paxson
INTERNET DRAFT ICSI/UC Berkeley INTERNET DRAFT ICSI/UC Berkeley
File: draft-paxson-tcpm-rfc2988bis-01.txt M. Allman File: draft-paxson-tcpm-rfc2988bis-02.txt M. Allman
ICSI Intended status: Proposed Standard ICSI
J. Chu J. Chu
Google Google
M. Sargent M. Sargent
CWRU CWRU
December 6, 2010 March 14, 2011
Computing TCP's Retransmission Timer Computing TCP's Retransmission Timer
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79. the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 36 skipping to change at page 1, line 36
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 6, 2011. This Internet-Draft will expire on September 14, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the BSD License. warranty as described in the Simplified BSD License.
Abstract Abstract
This document defines the standard algorithm that Transmission This document defines the standard algorithm that Transmission
Control Protocol (TCP) senders are required to use to compute and Control Protocol (TCP) senders are required to use to compute and
manage their retransmission timer. It expands on the discussion in manage their retransmission timer. It expands on the discussion in
section 4.2.3.1 of RFC 1122 and upgrades the requirement of section 4.2.3.1 of RFC 1122 and upgrades the requirement of
supporting the algorithm from a SHOULD to a MUST. supporting the algorithm from a SHOULD to a MUST.
1 Introduction 1 Introduction
The Transmission Control Protocol (TCP) [Pos81] uses a retransmission The Transmission Control Protocol (TCP) [Pos81] uses a retransmission
timer to ensure data delivery in the absence of any feedback from the timer to ensure data delivery in the absence of any feedback from the
remote data receiver. The duration of this timer is referred to as remote data receiver. The duration of this timer is referred to as
RTO (retransmission timeout). RFC 1122 [Bra89] specifies that the RTO (retransmission timeout). RFC 1122 [Bra89] specifies that the
RTO should be calculated as outlined in [Jac88]. RTO should be calculated as outlined in [Jac88].
This document codifies the algorithm for setting the RTO. In This document codifies the algorithm for setting the RTO. In
addition, this document expands on the discussion in section 4.2.3.1 addition, this document expands on the discussion in section 4.2.3.1
of RFC 1122 and upgrades the requirement of supporting the algorithm of RFC 1122 and upgrades the requirement of supporting the algorithm
from a SHOULD to a MUST. RFC 2581 [APS99] outlines the algorithm TCP from a SHOULD to a MUST. RFC 5681 [APB09] outlines the algorithm TCP
uses to begin sending after the RTO expires and a retransmission is uses to begin sending after the RTO expires and a retransmission is
sent. This document does not alter the behavior outlined in RFC 2581 sent. This document does not alter the behavior outlined in RFC 5681
[APS99]. [APB09].
In some situations it may be beneficial for a TCP sender to be more In some situations it may be beneficial for a TCP sender to be more
conservative than the algorithms detailed in this document allow. conservative than the algorithms detailed in this document allow.
However, a TCP MUST NOT be more aggressive than the following However, a TCP MUST NOT be more aggressive than the following
algorithms allow. algorithms allow.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [Bra97]. document are to be interpreted as described in [Bra97].
skipping to change at page 2, line 49 skipping to change at page 2, line 49
The rules governing the computation of SRTT, RTTVAR, and RTO are as The rules governing the computation of SRTT, RTTVAR, and RTO are as
follows: follows:
(2.1) Until a round-trip time (RTT) measurement has been made for a (2.1) Until a round-trip time (RTT) measurement has been made for a
segment sent between the sender and receiver, the sender SHOULD segment sent between the sender and receiver, the sender SHOULD
set RTO <- 1 second, though the "backing off" on repeated set RTO <- 1 second, though the "backing off" on repeated
retransmission discussed in (5.5) still applies. retransmission discussed in (5.5) still applies.
Note that the previous version of this document used an Note that the previous version of this document used an
initial RTO of 3 seconds [RFC2988]. A TCP implementation MAY initial RTO of 3 seconds [PA00]. A TCP implementation MAY
still use this value (or any other value > 1 second). This still use this value (or any other value > 1 second). This
change in the lower bound on the initial RTO is discussed in change in the lower bound on the initial RTO is discussed in
further detail in Appendix A. further detail in Appendix A.
(2.2) When the first RTT measurement R is made, the host MUST set (2.2) When the first RTT measurement R is made, the host MUST set
SRTT <- R SRTT <- R
RTTVAR <- R/2 RTTVAR <- R/2
RTO <- SRTT + max (G, K*RTTVAR) RTO <- SRTT + max (G, K*RTTVAR)
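Rules (2.1) and (2.2) can be sketched as follows. This is an illustration only, not text from either draft: the class name RtoEstimator and the default clock granularity G = 0.1 seconds are hypothetical, and K = 4 is assumed from RFC 2988 rather than from the lines shown in this diff. The sketch covers only these two rules, so any bounds applied elsewhere in the document are omitted.

```python
from dataclasses import dataclass
from typing import Optional

K = 4  # assumption: the value RFC 2988 assigns to K


@dataclass
class RtoEstimator:
    """Hypothetical holder for the SRTT, RTTVAR, and RTO state variables."""
    g: float = 0.1                  # clock granularity G in seconds (assumed)
    srtt: Optional[float] = None    # no smoothed RTT before the first sample
    rttvar: Optional[float] = None  # no RTT variance before the first sample
    rto: float = 1.0                # rule (2.1): initial RTO of 1 second

    def first_measurement(self, r: float) -> float:
        """Apply rule (2.2) to the first RTT measurement r, in seconds."""
        self.srtt = r
        self.rttvar = r / 2
        self.rto = self.srtt + max(self.g, K * self.rttvar)
        return self.rto


est = RtoEstimator()
initial_rto = est.rto
first_rto = est.first_measurement(0.2)
```

With R = 0.2 seconds, this gives SRTT = 0.2, RTTVAR = 0.1, and an RTO of about 0.6 seconds, since K*RTTVAR = 0.4 exceeds the assumed G.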
skipping to change at page 5, line 16 skipping to change at page 5, line 16
(5.6) Start the retransmission timer, such that it expires after RTO (5.6) Start the retransmission timer, such that it expires after RTO
seconds (for the value of RTO after the doubling operation seconds (for the value of RTO after the doubling operation
outlined in 5.5). outlined in 5.5).
(5.7) If the timer expires awaiting the ACK of a SYN segment and the (5.7) If the timer expires awaiting the ACK of a SYN segment and the
TCP implementation is using an RTO less than 3 seconds, the RTO TCP implementation is using an RTO less than 3 seconds, the RTO
MUST be re-initialized to 3 seconds when data transmission MUST be re-initialized to 3 seconds when data transmission
begins (i.e., after the three-way handshake completes). begins (i.e., after the three-way handshake completes).
This represents a change from the previous version of this This represents a change from the previous version of this
document [RFC2988] and is discussed in Appendix A. document [PA00] and is discussed in Appendix A.
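The interaction of rules (5.5) through (5.7) might look like the sketch below. The class and method names are hypothetical. The sketch also assumes one particular reading of (5.7): the "RTO less than 3 seconds" test is evaluated at the moment the SYN timer expires, and the reset to 3 seconds is applied once the handshake completes, regardless of how far the RTO has been backed off in between.

```python
from dataclasses import dataclass

SYN_FALLBACK_RTO = 3.0  # seconds, the value named in rule (5.7)


@dataclass
class RetransmitTimer:
    """Hypothetical timer state; not an API from any real TCP stack."""
    rto: float = 1.0             # current RTO in seconds
    syn_timed_out: bool = False  # set when (5.7) will apply later

    def on_timeout(self, awaiting_syn_ack: bool) -> float:
        """Handle a timer expiry and return the duration to re-arm with."""
        if awaiting_syn_ack and self.rto < SYN_FALLBACK_RTO:
            # Remember that a SYN timed out under an RTO below 3 seconds.
            self.syn_timed_out = True
        self.rto *= 2    # the doubling operation of rule (5.5)
        return self.rto  # rule (5.6): restart the timer with the new RTO

    def on_handshake_complete(self) -> float:
        """Apply rule (5.7) when data transmission begins."""
        if self.syn_timed_out:
            self.rto = SYN_FALLBACK_RTO
            self.syn_timed_out = False
        return self.rto


timer = RetransmitTimer()
after_syn_loss = timer.on_timeout(awaiting_syn_ack=True)
data_phase_rto = timer.on_handshake_complete()
```

Here a SYN lost under the 1 second initial RTO re-arms the timer at 2 seconds, and the connection then enters the data phase with an RTO of 3 seconds. A sender that already starts at 3 seconds or more is unaffected by the reset.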
Note that after retransmitting, once a new RTT measurement is Note that after retransmitting, once a new RTT measurement is
obtained (which can only happen when new data has been sent and obtained (which can only happen when new data has been sent and
acknowledged), the computations outlined in section 2 are performed, acknowledged), the computations outlined in section 2 are performed,
including the computation of RTO, which may result in "collapsing" including the computation of RTO, which may result in "collapsing"
RTO back down after it has been subject to exponential backoff RTO back down after it has been subject to exponential backoff
(rule 5.5). (rule 5.5).
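The "collapsing" effect can be illustrated with a small sketch. Everything below the comment lines is an assumption drawn from RFC 2988 rather than from the lines shown in this diff: the smoothing update with alpha = 1/8 and beta = 1/4, K = 4, and the 1 second floor on a computed RTO. The function name update_rto and the granularity g = 0.1 seconds are hypothetical.

```python
from typing import Tuple

# Assumed constants from RFC 2988 section 2 (not shown in this diff).
ALPHA = 1 / 8
BETA = 1 / 4
K = 4
MIN_RTO = 1.0  # seconds; RFC 2988 says a computed RTO SHOULD be at least this


def update_rto(srtt: float, rttvar: float, r: float,
               g: float = 0.1) -> Tuple[float, float, float]:
    """Fold a new RTT sample r into SRTT and RTTVAR and recompute RTO."""
    # RTTVAR is updated first, using the SRTT value from before this sample.
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
    srtt = (1 - ALPHA) * srtt + ALPHA * r
    rto = max(MIN_RTO, srtt + max(g, K * rttvar))
    return srtt, rttvar, rto


# An RTO of 1 second that has been doubled three times by rule (5.5).
backed_off_rto = 1.0 * 2 ** 3

# A fresh sample on a 200 ms path recomputes RTO from the estimators alone.
srtt, rttvar, rto = update_rto(srtt=0.2, rttvar=0.1, r=0.2)
```

In this example the backed-off value of 8 seconds plays no part in the recomputation, so the RTO drops straight back to the 1 second floor once the new sample arrives.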
Note that a TCP implementation MAY clear SRTT and RTTVAR after Note that a TCP implementation MAY clear SRTT and RTTVAR after
backing off the timer multiple times as it is likely that the backing off the timer multiple times as it is likely that the
skipping to change at page 5, line 45 skipping to change at page 5, line 45
TCP sender to compute a large value of RTO by adding delay to a TCP sender to compute a large value of RTO by adding delay to a
timed packet's latency, or that of its acknowledgment. However, timed packet's latency, or that of its acknowledgment. However,
the ability to add delay to a packet's latency often coincides with the ability to add delay to a packet's latency often coincides with
the ability to cause the packet to be lost, so it is difficult to the ability to cause the packet to be lost, so it is difficult to
see what an attacker might gain from such an attack that could cause see what an attacker might gain from such an attack that could cause
more damage than simply discarding some of the TCP connection's more damage than simply discarding some of the TCP connection's
packets. packets.
The Internet to a considerable degree relies on the correct The Internet to a considerable degree relies on the correct
implementation of the RTO algorithm (as well as those described in implementation of the RTO algorithm (as well as those described in
RFC 2581) in order to preserve network stability and avoid RFC 5681) in order to preserve network stability and avoid
congestion collapse. An attacker could cause TCP endpoints to congestion collapse. An attacker could cause TCP endpoints to
respond more aggressively in the face of congestion by forging respond more aggressively in the face of congestion by forging
acknowledgments for segments before the receiver has actually acknowledgments for segments before the receiver has actually
received the data, thus lowering RTO to an unsafe value. But to do received the data, thus lowering RTO to an unsafe value. But to do
so requires spoofing the acknowledgments correctly, which is so requires spoofing the acknowledgments correctly, which is
difficult unless the attacker can monitor traffic along the path difficult unless the attacker can monitor traffic along the path
between the sender and the receiver. In addition, even if the between the sender and the receiver. In addition, even if the
attacker can cause the sender's RTO to reach too small a value, it attacker can cause the sender's RTO to reach too small a value, it
appears the attacker cannot leverage this into much of an attack appears the attacker cannot leverage this into much of an attack
(compared to the other damage they can do if they can spoof packets (compared to the other damage they can do if they can spoof packets
skipping to change at page 6, line 21 skipping to change at page 6, line 21
The RTO algorithm described in this memo was originated by Van The RTO algorithm described in this memo was originated by Van
Jacobson in [Jac88]. Jacobson in [Jac88].
Much of the data that motivated changing the initial RTO from 3 Much of the data that motivated changing the initial RTO from 3
seconds to 1 second came from Robert Love, Andre Broido and Mike seconds to 1 second came from Robert Love, Andre Broido and Mike
Belshe. Belshe.
Normative References Normative References
[APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion [APB09] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 2581, April 1999. Control", RFC 5681, September 2009.
[Bra89] Braden, R., "Requirements for Internet Hosts -- [Bra89] Braden, R., "Requirements for Internet Hosts --
Communication Layers", STD 3, RFC 1122, October 1989. Communication Layers", STD 3, RFC 1122, October 1989.
[Bra97] Bradner, S., "Key words for use in RFCs to Indicate [Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[JBB92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions for High
Performance", RFC 1323, May 1992.
[Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981. September 1981.
Non-Normative References Non-Normative References
[AP99] Allman, M. and V. Paxson, "On Estimating End-to-End Network [AP99] Allman, M. and V. Paxson, "On Estimating End-to-End Network
Path Properties", SIGCOMM 99. Path Properties", SIGCOMM 99.
[Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century", [Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century",
http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July
skipping to change at page 7, line 8 skipping to change at page 7, line 11
[Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
[JK88] Jacobson, V. and M. Karels, "Congestion Avoidance and [JK88] Jacobson, V. and M. Karels, "Congestion Avoidance and
Control", ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. Control", ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
[KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
Estimates in Reliable Transport Protocols", SIGCOMM 87. Estimates in Reliable Transport Protocols", SIGCOMM 87.
[PA00] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000.
Authors' Addresses Authors' Addresses
Vern Paxson Vern Paxson
ICSI ICSI
1947 Center Street 1947 Center Street
Suite 600 Suite 600
Berkeley, CA 94704-1198 Berkeley, CA 94704-1198
Phone: 510-666-2882 Phone: 510-666-2882
EMail: vern@icir.org EMail: vern@icir.org
skipping to change at page 8, line 11 skipping to change at page 8, line 16
1. The initial RTO should be sufficiently large to cover most of the 1. The initial RTO should be sufficiently large to cover most of the
end-to-end paths to avoid spurious retransmissions and their end-to-end paths to avoid spurious retransmissions and their
associated negative performance impact. associated negative performance impact.
2. The initial RTO should be small enough to ensure a timely 2. The initial RTO should be small enough to ensure a timely
recovery from packet loss occurring before an RTT sample is recovery from packet loss occurring before an RTT sample is
taken. taken.
Traditionally, TCP has used 3 seconds as the initial RTO Traditionally, TCP has used 3 seconds as the initial RTO
[RFC1122,RFC2988]. This document calls for lowering this value to 1 [Bra89,PA00]. This document calls for lowering this value to 1
second using the following rationale: second using the following rationale:
- Modern networks are simply faster than the state of the art was - Modern networks are simply faster than the state of the art was
at the time the initial RTO of 3 seconds was defined. at the time the initial RTO of 3 seconds was defined.
- Studies have found that the round-trip times of more than 97.5% of - Studies have found that the round-trip times of more than 97.5% of
the connections observed in a large scale analysis were less than the connections observed in a large scale analysis were less than
1 second [Chu09], suggesting that 1 second meets criterion 1 above. 1 second [Chu09], suggesting that 1 second meets criterion 1 above.
- In addition, the studies observed retransmission rates within - In addition, the studies observed retransmission rates within
 End of changes. 14 change blocks. 
15 lines changed or deleted 21 lines changed or added
