draft-ietf-tcpm-rfc4138bis-00.txt   draft-ietf-tcpm-rfc4138bis-01.txt 
Internet Engineering Task Force P. Sarolahti Internet Engineering Task Force P. Sarolahti
INTERNET-DRAFT Nokia Research Center INTERNET-DRAFT Nokia Research Center
draft-ietf-tcpm-rfc4138bis-00.txt M. Kojo draft-ietf-tcpm-rfc4138bis-01.txt M. Kojo
Expires: December 2007 University of Helsinki Expires: May 2008 University of Helsinki
K. Yamamoto K. Yamamoto
M. Hata M. Hata
NTT Docomo NTT Docomo
18 November 2007
Forward RTO-Recovery (F-RTO): An Algorithm for Detecting Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
Spurious Retransmission Timeouts with TCP Spurious Retransmission Timeouts with TCP
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
skipping to change at page 1, line 35 skipping to change at page 1, line 38
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 2007. This Internet-Draft will expire on May 2008.
Abstract Abstract
Spurious retransmission timeouts cause suboptimal TCP performance Spurious retransmission timeouts cause suboptimal TCP performance
because they often result in unnecessary retransmission of the last because they often result in unnecessary retransmission of the last
window of data. This document describes the F-RTO detection window of data. This document describes the F-RTO detection
algorithm for detecting spurious TCP retransmission timeouts. F-RTO algorithm for detecting spurious TCP retransmission timeouts. F-RTO
is a TCP sender-only algorithm that does not require any TCP options is a TCP sender-only algorithm that does not require any TCP options
to operate. After retransmitting the first unacknowledged segment to operate. After retransmitting the first unacknowledged segment
triggered by a timeout, the F-RTO algorithm of the TCP sender triggered by a timeout, the F-RTO algorithm of the TCP sender
monitors the incoming acknowledgments to determine whether the monitors the incoming acknowledgments to determine whether the
timeout was spurious. It then decides whether to send new segments timeout was spurious. It then decides whether to send new segments
or retransmit unacknowledged segments. The algorithm effectively or retransmit unacknowledged segments. The algorithm effectively
helps to avoid additional unnecessary retransmissions and thereby helps to avoid additional unnecessary retransmissions and thereby
improves TCP performance in the case of a spurious timeout. improves TCP performance in the case of a spurious timeout.
Table of Contents Table of Contents
1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Conventions and Terminology. . . . . . . . . . . . . . . 5 1.1. Conventions and Terminology. . . . . . . . . . . . . . . 5
2. F-RTO Algorithm . . . . . . . . . . . . . . . . . . . . . . . 5 2. Basic F-RTO Algorithm . . . . . . . . . . . . . . . . . . . . 5
2.1. The Algorithm. . . . . . . . . . . . . . . . . . . . . . 6 2.1. The Algorithm. . . . . . . . . . . . . . . . . . . . . . 6
2.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 7 2.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 7
3. Taking Actions after Detecting Spurious RTO . . . . . . . . . 9 3. SACK-Enhanced Version of the F-RTO Algorithm. . . . . . . . . 9
4. Evaluation of RFC 4138 and Differences to this 4. Taking Actions after Detecting Spurious RTO . . . . . . . . . 11
Document . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5. Evaluation of RFC 4138 and Differences to this
5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 Document . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
6. Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . 11 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 7. Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . 14
A. Discussion of Window-Limited Cases. . . . . . . . . . . . . . 12 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
B. List of Changes . . . . . . . . . . . . . . . . . . . . . . . 13 A. Discussion of Window-Limited Cases. . . . . . . . . . . . . . 14
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 B. List of Changes . . . . . . . . . . . . . . . . . . . . . . . 15
Normative References . . . . . . . . . . . . . . . . . . . . . . 13 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Informative References . . . . . . . . . . . . . . . . . . . . . 14 Normative References . . . . . . . . . . . . . . . . . . . . . . 16
AUTHORS' ADDRESSES . . . . . . . . . . . . . . . . . . . . . . . 16 Informative References . . . . . . . . . . . . . . . . . . . . . 17
Full Copyright Statement . . . . . . . . . . . . . . . . . . . . 17 AUTHORS' ADDRESSES . . . . . . . . . . . . . . . . . . . . . . . 18
Intellectual Property. . . . . . . . . . . . . . . . . . . . . . 17 Full Copyright Statement . . . . . . . . . . . . . . . . . . . . 20
Intellectual Property. . . . . . . . . . . . . . . . . . . . . . 20
1. Introduction 1. Introduction
The Transmission Control Protocol (TCP) [Pos81] has two methods for The Transmission Control Protocol (TCP) [Pos81] has two methods for
triggering retransmissions. First, the TCP sender relies on triggering retransmissions. First, the TCP sender relies on
incoming duplicate ACKs, which indicate that the receiver is missing incoming duplicate ACKs, which indicate that the receiver is missing
some of the data. After a required number of successive duplicate some of the data. After a required number of successive duplicate
ACKs have arrived at the sender, it retransmits the first ACKs have arrived at the sender, it retransmits the first
unacknowledged segment [APS99] and continues with a loss recovery unacknowledged segment [APS99] and continues with a loss recovery
algorithm such as NewReno [FHG04] or SACK-based loss recovery algorithm such as NewReno [FHG04] or SACK-based loss recovery
skipping to change at page 4, line 11 skipping to change at page 4, line 13
whole window of segments during the RTO recovery. Furthermore, whole window of segments during the RTO recovery. Furthermore,
after a spurious retransmission timeout, a conventional TCP sender after a spurious retransmission timeout, a conventional TCP sender
increases the congestion window on each late acknowledgment in slow increases the congestion window on each late acknowledgment in slow
start. This injects a large number of data segments into the start. This injects a large number of data segments into the
network within one round-trip time, thus violating the packet network within one round-trip time, thus violating the packet
conservation principle [Jac88]. conservation principle [Jac88].
There are a number of potential reasons for spurious retransmission There are a number of potential reasons for spurious retransmission
timeouts. First, some mobile networking technologies involve sudden timeouts. First, some mobile networking technologies involve sudden
delay spikes on transmission because of actions taken during a hand- delay spikes on transmission because of actions taken during a hand-
off. Second, on a low-bandwidth link the arrival of competing off. Second, a hand-off may take place from a low latency path to a
traffic (possibly with higher priority), or some other change in high latency path, suddenly increasing the round-trip time beyond
available bandwidth, can cause a sudden increase of round-trip time. the current RTO value. Third, on a low-bandwidth link the arrival
This may trigger a spurious retransmission timeout. A persistently of competing traffic (possibly with higher priority), or some other
reliable link layer can also cause a sudden delay when a data frame change in available bandwidth, can cause a sudden increase of the
and several retransmissions of it are lost for some reason. This round-trip time. This may trigger a spurious retransmission
document does not distinguish between the different causes of such a timeout. A persistently reliable link layer can also cause a sudden
delay spike. Rather, it discusses the spurious retransmission delay when a data frame and several retransmissions of it are lost
timeouts caused by a delay spike in general. for some reason. This document does not distinguish between the
different causes of such a delay spike. Rather, it discusses the
spurious retransmission timeouts caused by a delay spike in general.
This document describes the F-RTO detection algorithm. It is based This document describes the F-RTO detection algorithm. It is based
on the detection mechanism of the "Forward RTO-Recovery" (F-RTO) on the detection mechanism of the "Forward RTO-Recovery" (F-RTO)
algorithm [SKR03] that is used for detecting spurious retransmission algorithm [SKR03] that is used for detecting spurious retransmission
timeouts and thus avoids unnecessary retransmissions following the timeouts and thus avoids unnecessary retransmissions following the
retransmission timeout. When the timeout is not spurious, the F-RTO retransmission timeout. When the timeout is not spurious, the F-RTO
algorithm reverts back to the conventional RTO recovery algorithm, algorithm reverts back to the conventional RTO recovery algorithm,
and therefore has similar behavior and performance. In contrast to and therefore has similar behavior and performance. In contrast to
alternative algorithms proposed for detecting unnecessary alternative algorithms proposed for detecting unnecessary
retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms
skipping to change at page 4, line 47 skipping to change at page 4, line 51
has received a duplicate segment, enabling the sender to detect has received a duplicate segment, enabling the sender to detect
afterwards whether it has retransmitted segments unnecessarily. The afterwards whether it has retransmitted segments unnecessarily. The
F-RTO algorithm only attempts to detect and avoid unnecessary F-RTO algorithm only attempts to detect and avoid unnecessary
retransmissions after an RTO. Eifel and DSACK can also be used for retransmissions after an RTO. Eifel and DSACK can also be used for
detecting unnecessary retransmissions caused by other events, such detecting unnecessary retransmissions caused by other events, such
as packet reordering. as packet reordering.
When an RTO expires, the F-RTO sender retransmits the first When an RTO expires, the F-RTO sender retransmits the first
unacknowledged segment as usual [APS99]. Deviating from the normal unacknowledged segment as usual [APS99]. Deviating from the normal
operation after a timeout, it then tries to transmit new, previously operation after a timeout, it then tries to transmit new, previously
unsent data, for the first acknowledgment that arrives after the unsent data for the first acknowledgment that arrives after the
timeout, given that the acknowledgment advances the window. If the timeout, given that the acknowledgment advances the window. If the
second acknowledgment that arrives after the timeout advances the second acknowledgment that arrives after the timeout advances the
window (i.e., acknowledges data that was not retransmitted), the F- window (i.e., acknowledges data that was not retransmitted), the F-
RTO sender declares the timeout spurious and exits the RTO recovery. RTO sender declares the timeout spurious and exits the RTO recovery.
However, if either of these two acknowledgments is a duplicate ACK, However, if either of these two acknowledgments is a duplicate ACK,
there will not be sufficient evidence of a spurious timeout. there will not be sufficient evidence of a spurious timeout.
Therefore, the F-RTO sender retransmits the unacknowledged segments Therefore, the F-RTO sender retransmits the unacknowledged segments
in slow start similarly to the traditional algorithm. in slow start similarly to the traditional algorithm.
With a SACK-enhanced version of the F-RTO algorithm, spurious With a SACK-enhanced version of the F-RTO algorithm, spurious
timeouts may be detected even if duplicate ACKs arrive after an RTO timeouts may be detected even if duplicate ACKs arrive after an RTO
retransmission. In addition, the F-RTO algorithm can also be retransmission. Even though this document only specifies F-RTO
applied to the Stream Control Transmission Protocol (SCTP) [Ste00], algorithm for TCP, the algorithm can also be applied to the Stream
because SCTP has acknowledgment and packet retransmission concepts Control Transmission Protocol (SCTP) [Ste00] that has acknowledgment
similar to TCP. This document specifies and discusses only the and packet retransmission concepts similar to TCP. Considerations on
basic F-RTO algorithm. The SACK-enhanced version of the F-RTO applying F-RTO for SCTP are discussed in RFC 4138 [SK05].
algorithm and how the F-RTO algorithm can be applied to the SCTP
protocol are discussed in RFC 4138 [SK05].
This document is organized as follows. Section 2 describes the This document is organized as follows. Section 2 describes the
basic F-RTO algorithm. Section 3 discusses the possible actions to basic F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is
be taken after detecting a spurious RTO and Section 4 discusses the given in Section 3. Section 4 discusses the possible actions to be
taken after detecting a spurious RTO and Section 5 discusses the
security considerations. security considerations.
1.1. Conventions and Terminology 1.1. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119] and indicate requirement levels for protocols. [RFC2119] and indicate requirement levels for protocols.
2. F-RTO Algorithm 2. Basic F-RTO Algorithm
A timeout is considered spurious if it would have been avoided had A timeout is considered spurious if it would have been avoided had
the sender waited longer for an acknowledgment to arrive [LM03]. F- the sender waited longer for an acknowledgment to arrive [LM03]. F-
RTO affects the TCP sender behavior only after a retransmission RTO affects the TCP sender behavior only after a retransmission
timeout. Otherwise, the TCP behavior remains the same. When the timeout. Otherwise, the TCP behavior remains the same. When the
RTO expires, the F-RTO algorithm monitors incoming acknowledgments RTO expires, the F-RTO algorithm monitors incoming acknowledgments
and if the TCP sender gets an acknowledgment for a segment that was and if the TCP sender gets an acknowledgment for a segment that was
not retransmitted due to timeout, the F-RTO algorithm declares a not retransmitted due to timeout, the F-RTO algorithm declares a
timeout spurious. The actions taken in response to a spurious timeout spurious. The actions taken in response to a spurious
timeout are not specified in this document, but we discuss some timeout are not specified in this document, but we discuss some
alternatives in Section 3. This section introduces the algorithm alternatives in Section 4. This section introduces the algorithm
and then discusses the different steps of the algorithm in more and then discusses the different steps of the algorithm in more
detail. detail.
Following the practice used with the Eifel Detection algorithm Following the practice used with the Eifel Detection algorithm
[LM03], we use the "SpuriousRecovery" variable to indicate whether [LM03], we use the "SpuriousRecovery" variable to indicate whether
the retransmission is declared spurious by the sender. This the retransmission is declared spurious by the sender. This
variable can be used as an input for a corresponding response variable can be used as an input for a corresponding response
algorithm. With F-RTO, the value of SpuriousRecovery can be either algorithm. With F-RTO, the value of SpuriousRecovery can be either
SPUR_TO (indicating a spurious retransmission timeout) or FALSE SPUR_TO (indicating a spurious retransmission timeout) or FALSE
(indicating that the timeout is not declared spurious), and the TCP (indicating that the timeout is not declared spurious), and the TCP
sender should follow the conventional RTO recovery algorithm. sender should follow the conventional RTO recovery algorithm.
2.1. The Algorithm 2.1. The Algorithm
A TCP sender implementing the basic F-RTO algorithm MUST take the A TCP sender implementing the basic F-RTO algorithm MUST take the
following steps after the retransmission timer expires. If the following steps after the retransmission timer expires. If the
sender implements some loss recovery algorithm other than Reno or retransmission timer expires again during the execution of the F-RTO
NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered when algorithm, the TCP sender MUST re-start the algorithm processing
earlier fast recovery is underway. from step 1. If the sender implements some loss recovery algorithm
other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT
be entered when earlier fast recovery is underway.
The F-RTO algorithm takes different actions based on whether an
incoming acknowledgement advances the cumulative acknowledgement
point for an received in-order segment, or whether it is a duplicate
acknowledgement to indicate an out-of-order segment. Duplicate
acknowledgement is defined in [APB07]. The F-RTO algorithm does not
specify actions for receiving a segment that does not acknowledge
new data but is not a duplicate acknowledgement. The TCP sender
SHOULD ignore such segments and wait for a segment that either
acknowledges new data or is a duplicate acknowledgment.
1) When RTO expires, retransmit the first unacknowledged segment and 1) When RTO expires, retransmit the first unacknowledged segment and
set SpuriousRecovery to FALSE. Also, store the highest sequence set SpuriousRecovery to FALSE. Also, store the highest sequence
number transmitted so far in variable "recover". number transmitted so far in variable "recover".
2) When the first acknowledgment after the RTO retransmission 2) When the first acknowledgment after the RTO retransmission
arrives at the sender, the sender chooses one of the following arrives at the TCP sender, the TCP sender chooses one of the
actions, depending on whether the ACK advances the window or following actions, depending on whether the ACK advances the
whether it is a duplicate ACK. window or whether it is a duplicate ACK.
a) If the acknowledgment is a duplicate ACK OR it acknowledges a a) If the acknowledgment is a duplicate ACK OR the
sequence number equal to the value of "recover" OR it does not Acknowledgement field covers "recover" but not more than
acknowledge all of the data that was retransmitted in step 1, "recover" OR the acknowledgment does not acknowledge all of
revert to the conventional RTO recovery and continue by the data that was retransmitted in step 1, revert to the
retransmitting unacknowledged data in slow start. Do not conventional RTO recovery and continue by retransmitting
enter step 3 of this algorithm. The SpuriousRecovery variable unacknowledged data in slow start. Do not enter step 3 of
remains as FALSE. this algorithm. The SpuriousRecovery variable remains as
FALSE.
b) Else, if the acknowledgment advances the window AND it is b) Else, if the acknowledgment advances the window AND the
below the value of "recover", transmit up to two new Acknowledgement field does not cover "recover", transmit up to
(previously unsent) segments and enter step 3 of this two new (previously unsent) segments and enter step 3 of this
algorithm. If the TCP sender does not have enough unsent algorithm. If the TCP sender does not have enough unsent data,
data, it can send only one segment. In addition, the TCP it can send only one segment. In addition, the TCP sender MAY
sender MAY override the Nagle algorithm [Nag84] and override the Nagle algorithm [Nag84] and immediately send a
immediately send a segment if needed. Note that sending two segment if needed. Note that sending two segments in this step
segments in this step is allowed by TCP congestion control is allowed by TCP congestion control requirements [APS99]: An
requirements [APS99]: An F-RTO TCP sender simply chooses F-RTO TCP sender simply chooses different segments to
different segments to transmit. transmit.
If the TCP sender does not have any new data to send, or the If the TCP sender does not have any new data to send, or the
advertised window prohibits new transmissions, the recommended advertised window prohibits new transmissions, the recommended
action is to skip step 3 of this algorithm and continue with action is to skip step 3 of this algorithm and continue with
slow start retransmissions, following the conventional RTO slow start retransmissions, following the conventional RTO
recovery algorithm. However, alternative ways of handling the recovery algorithm. However, alternative ways of handling the
window-limited cases that could result in better performance window-limited cases that could result in better performance
are discussed in Appendix A. are discussed in Appendix A.
3) When the second acknowledgment after the RTO retransmission 3) When the second acknowledgment after the RTO retransmission
arrives at the sender, the TCP sender either declares the timeout arrives at the TCP sender, the TCP sender either declares the
spurious, or starts retransmitting the unacknowledged segments. timeout spurious, or starts retransmitting the unacknowledged
segments.
a) If the acknowledgment is a duplicate ACK, set the congestion a) If the acknowledgment is a duplicate ACK, set the congestion
window to no more than 3 * MSS, and continue with the slow window to no more than 3 * MSS, and continue with the slow
start algorithm retransmitting unacknowledged segments. The start algorithm retransmitting unacknowledged segments. The
congestion window can be set to 3 * MSS, because two round- congestion window can be set to 3 * MSS, because two round-
trip times have elapsed since the RTO, and a conventional TCP trip times have elapsed since the RTO, and a conventional TCP
sender would have increased cwnd to 3 during the same time. sender would have increased cwnd to 3 during the same time.
Leave SpuriousRecovery set to FALSE. Leave SpuriousRecovery set to FALSE.
b) If the acknowledgment advances the window (i.e., if it b) If the acknowledgment advances the window (i.e., if it
skipping to change at page 8, line 20 skipping to change at page 8, line 38
sender and receiver might run out of segments, and no further sender and receiver might run out of segments, and no further
acknowledgments would arrive. Therefore, in the window-limited acknowledgments would arrive. Therefore, in the window-limited
case, the recommendation is to revert to the conventional RTO case, the recommendation is to revert to the conventional RTO
recovery with slow start retransmissions. Appendix A discusses some recovery with slow start retransmissions. Appendix A discusses some
alternative solutions for window-limited situations. alternative solutions for window-limited situations.
If the retransmission timeout is declared spurious, the TCP sender If the retransmission timeout is declared spurious, the TCP sender
sets the value of the "recover" variable to SND.UNA in order to sets the value of the "recover" variable to SND.UNA in order to
allow fast retransmit [FHG04]. The "recover" variable was proposed allow fast retransmit [FHG04]. The "recover" variable was proposed
for avoiding unnecessary, multiple fast retransmits when RTO expires for avoiding unnecessary, multiple fast retransmits when RTO expires
during fast recovery with NewReno TCP. Because the sender during fast recovery with NewReno TCP. Because the F-RTO sender
retransmits only the segment that triggered the timeout, the problem retransmits only the segment that triggered the timeout, the problem
of unnecessary multiple fast retransmits [FHG04] cannot occur. of unnecessary multiple fast retransmits [FHG04] cannot occur.
Therefore, if three duplicate ACKs arrive at the sender after the Therefore, if three duplicate ACKs arrive at the sender after the
timeout, they probably indicate a packet loss, and thus fast timeout, they probably indicate a packet loss, and thus fast
retransmit should be used to allow efficient recovery. If there are retransmit should be used to allow efficient recovery. If there are
not enough duplicate ACKs arriving at the sender after a packet not enough duplicate ACKs arriving at the sender after a packet
loss, the retransmission timer expires again and the sender enters loss, the retransmission timer expires again and the sender enters
step 1 of this algorithm. step 1 of this algorithm.
When the timeout is declared spurious, the TCP sender cannot detect When the timeout is declared spurious, the TCP sender cannot detect
skipping to change at page 8, line 49 skipping to change at page 9, line 18
The F-RTO algorithm has a side-effect on the TCP round-trip time The F-RTO algorithm has a side-effect on the TCP round-trip time
measurement. Because the TCP sender can avoid most of the measurement. Because the TCP sender can avoid most of the
unnecessary retransmissions after detecting a spurious timeout, the unnecessary retransmissions after detecting a spurious timeout, the
sender is able to take round-trip time samples on the delayed sender is able to take round-trip time samples on the delayed
segments. If the regular RTO recovery was used without TCP segments. If the regular RTO recovery was used without TCP
timestamps, this would not be possible due to the retransmission timestamps, this would not be possible due to the retransmission
ambiguity. As a result, the RTO is likely to have more accurate and ambiguity. As a result, the RTO is likely to have more accurate and
larger values with F-RTO than with the regular TCP after a spurious larger values with F-RTO than with the regular TCP after a spurious
timeout that was triggered due to delayed segments. We believe this timeout that was triggered due to delayed segments. We believe this
is an advantage in the networks that are prone to delay spikes. is an advantage in networks that are prone to delay spikes.
There are some situations where the F-RTO algorithm may not avoid There are some situations where the F-RTO algorithm may not avoid
unnecessary retransmissions after a spurious timeout. If packet unnecessary retransmissions after a spurious timeout. If packet
reordering or packet duplication occurs on the segment that reordering or packet duplication occurs on the segment that
triggered the spurious timeout, the F-RTO algorithm may not detect triggered the spurious timeout, the F-RTO algorithm may not detect
the spurious timeout due to incoming duplicate ACKs. Additionally, the spurious timeout due to incoming duplicate ACKs. Additionally,
if a spurious timeout occurs during fast recovery, the F-RTO if a spurious timeout occurs during fast recovery, the F-RTO
algorithm often cannot detect the spurious timeout because the algorithm often cannot detect the spurious timeout because the
segments that were transmitted before the fast recovery trigger segments that were transmitted before the fast recovery trigger
duplicate ACKs. However, we consider these cases rare, and note duplicate ACKs. However, we consider these cases rare, and note
that in cases where F-RTO fails to detect the spurious timeout, it that in cases where F-RTO fails to detect the spurious timeout, it
retransmits the unacknowledged segments in slow start, and thus retransmits the unacknowledged segments in slow start, and thus
performs similarly to the regular RTO recovery. performs similarly to the regular RTO recovery.
3. Taking Actions after Detecting Spurious RTO 3. SACK-Enhanced Version of the F-RTO Algorithm
Upon retransmission timeout, a conventional TCP sender assumes that This section describes an alternative version of the F-RTO algorithm
outstanding segments are lost and starts retransmitting the that uses the TCP Selective Acknowledgment Option [MMFR96]. By
using the SACK option, the TCP sender detects spurious timeouts in
most of the cases when packet reordering or packet duplication is
present. If the SACK blocks acknowledge new data that was not
transmitted after the RTO retransmission, the sender may declare the
timeout spurious, even when duplicate ACKs follow the RTO.
Given that the TCP Selective Acknowledgment Option [MMFR96] is
enabled for a TCP connection, a TCP sender MAY implement the SACK-
enhanced F-RTO algorithm. If the sender applies the SACK-enhanced
F-RTO algorithm, it MUST follow the steps below. This algorithm
SHOULD NOT be applied if the TCP sender is already in SACK loss
recovery when retransmission timeout occurs.
The steps of the SACK-enhanced version of the F-RTO algorithm are as
follows. If the retransmission timer expires again during the
execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
re-start the algorithm processing from step 1.
1) When the RTO expires, retransmit the first unacknowledged segment
and set SpuriousRecovery to FALSE. Set variable "RecoveryPoint"
to indicate the highest segment transmitted so far. Following the
recommendation in SACK specification [MMFR96], reset the SACK
scoreboard.
2) Wait until the acknowledgment of the data retransmitted due to
the timeout arrives at the sender. If duplicate ACKs arrive
before the cumulative acknowledgment for retransmitted data,
adjust the scoreboard according to the incoming SACK information.
Stay in step 2 and wait for the next new acknowledgment. If RTO
expires again, go to step 1 of the algorithm.
a) if the Cumulative Acknowledgement field covers "RecoveryPoint"
but not more than "RecoveryPoint", revert to the conventional
RTO recovery and set the congestion window to no more than 2 *
MSS, like a regular TCP would do. Do not enter step 3 of this
algorithm.
b) else, if the Cumulative Acknowledgement field does not cover
"RecoveryPoint" but is larger than SND.UNA, transmit up to two
new (previously unsent) segments and proceed to step 3. If
the TCP sender is not able to transmit any previously unsent
data -- either due to receiver window limitation or because it
does not have any new data to send -- the recommended action
is to refrain from entering step 3 of this algorithm. Rather,
continue with slow start retransmissions following the
conventional RTO recovery algorithm.
It is also possible to apply some of the alternatives for
handling window-limited cases discussed in Appendix A.
3) The next acknowledgment arrives at the sender. Either a
duplicate ACK or a new cumulative ACK (advancing the window)
applies in this step. Other types of ACKs are ignored without any
action.
a) if the Cumulative Acknowledgement field or a SACK block covers
more than "RecoveryPoint", set the congestion window to no
more than 3 * MSS and proceed with the conventional RTO
recovery, retransmitting unacknowledged segments. Take this
branch also when the acknowledgment is a duplicate ACK and it
does not acknowledge any new, previously unacknowledged data
below "RecoveryPoint" in the SACK blocks. Leave
SpuriousRecovery set to FALSE.
b) if the Cumulative Acknowledgement field or a SACK block in the
ACK does not cover more than "RecoveryPoint" AND it
acknowledges data that was not acknowledged earlier (either
with cumulative acknowledgment or using SACK blocks), declare
the timeout spurious and set SpuriousRecovery to SPUR_TO. The
retransmission timeout can be declared spurious, because the
segment acknowledged with this ACK was transmitted before the
timeout.
If there are unacknowledged holes between the received SACK blocks,
those segments are retransmitted similarly to the conventional SACK
recovery algorithm [BAFW03]. If the algorithm exits with
SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
thus allowing fast recovery on incoming duplicate acknowledgments.
The SACK enhanced algorithm works on the same principle as the basic
algorithm, but by utilizing the additional information from the SACK
option. When a genuine retransmission timeout occurs during a steady
state of a connection, it can be assumed that there are no segments
left in the pipe. Otherwise, the acknowledgments triggered by these
segments would have triggered the SACK loss recovery or transmission
of new segments. Therefore, if the F-RTO sender receives
acknowledgements for segments transmitted before the retransmission
timeout in response to the two new segments sent at the algorithm
step 2, the normal operation of TCP has been just delayed, and the
retransmission timeout is considered spurious. Note that this
reasoning works only when the TCP sender is not in SACK loss
recovery at the time the retransmission timeout occurs.
4. Taking Actions after Detecting Spurious RTO
Upon a retransmission timeout, a conventional TCP sender assumes
that outstanding segments are lost and starts retransmitting the
unacknowledged segments. When the retransmission timeout is unacknowledged segments. When the retransmission timeout is
detected to be spurious, the TCP sender should not continue detected to be spurious, the TCP sender should not continue
retransmitting based on the timeout. For example, if the sender was retransmitting based on the timeout. For example, if the sender was
in congestion avoidance phase transmitting new, previously unsent in congestion avoidance phase transmitting new, previously unsent
segments, it should continue transmitting previously unsent segments, it should continue transmitting previously unsent segments
segments. in congestion avoidance.
There are currently two alternatives specified for a spurious There are currently two alternatives specified for a spurious
timeout response algorithm, the Eifel Response Algorithm [LG04], and timeout response algorithm, the Eifel Response Algorithm [LG04], and
an algorithm for adapting the retransmission timeout after a an algorithm for adapting the retransmission timeout after a
spurious RTO [BBA06]. If no specific response algorithm is spurious RTO [BBA06]. If no specific response algorithm is
implemented, the TCP should respond to spurious timeout implemented, the TCP SHOULD respond to spurious timeout
conservatively, applying the TCP congestion control specification conservatively, applying the TCP congestion control specification
[APS99]. Different response algorithms for spurious retransmission [APS99]. Different response algorithms for spurious retransmission
timeouts have been analyzed in some research papers [GL03, Sar03] timeouts have been analyzed in some research papers [GL03, Sar03]
and IETF documents [SL03]. and IETF documents [SL03].
4. Evaluation of RFC 4138 and Differences to this Document 5. Evaluation of RFC 4138 and Differences to this Document
F-RTO was first specified in an Experimental RFC 4138 that has been F-RTO was first specified in an Experimental RFC 4138 that has been
implemented in a number of operating systems since it was published. implemented in a number of operating systems since it was published.
Gained experience has been documented in a separate document Gained experience has been documented in a separate document
[KYHS07], and can be summarized as follows. [KYHS07], and can be summarized as follows.
If the TCP sender employs F-RTO, it is able to detect spurious RTOs If the TCP sender employs F-RTO, it is able to detect spurious RTOs
and avoid the unnecessary retransmission of the whole window of and avoid the unnecessary retransmission of the whole window of
data. Because F-RTO avoids the unnecessary retransmissions after a data. Because F-RTO avoids the unnecessary retransmissions after a
spurious RTO, it is able to adhere to the packet conservation spurious RTO, it is able to adhere to the packet conservation
skipping to change at page 10, line 47 skipping to change at page 13, line 19
of any benefit to the receiver. Therefore we believe it is not of any benefit to the receiver. Therefore we believe it is not
likely that receivers would start employing such tricks in a likely that receivers would start employing such tricks in a
significant scale. Finally, loss of the unnecessary RTO significant scale. Finally, loss of the unnecessary RTO
retransmission cannot be detected without using some explicit retransmission cannot be detected without using some explicit
acknowledgement scheme such as DSACK. This is common to the other acknowledgement scheme such as DSACK. This is common to the other
mechanisms for detecting spurious RTO, as well as to regular TCP mechanisms for detecting spurious RTO, as well as to regular TCP
that does not use DSACK. We note that if the congestion control that does not use DSACK. We note that if the congestion control
response to spurious RTO is conservative enough, the above corner response to spurious RTO is conservative enough, the above corner
cases do not cause problems due to increased congestion. cases do not cause problems due to increased congestion.
RFC 4138 includes the SACK-based variant of F-RTO and discussion on 6. Security Considerations
applying F-RTO to SCTP. These sections have been left out from this
document because the authors are not aware of extensive experiments
made with SACK-enhanced F-RTO or SCTP, and therefore think the
community does not yet have sufficient experience with these
variants.
5. Security Considerations
The main security threat regarding F-RTO is the possibility that a The main security threat regarding F-RTO is the possibility that a
receiver could mislead the sender into setting too large a receiver could mislead the sender into setting too large a
congestion window after an RTO. There are two possible ways a congestion window after an RTO. There are two possible ways a
malicious receiver could trigger a wrong output from the F-RTO malicious receiver could trigger a wrong output from the F-RTO
algorithm. First, the receiver can acknowledge data that it has not algorithm. First, the receiver can acknowledge data that it has not
received. Second, it can delay acknowledgment of a segment it has received. Second, it can delay acknowledgment of a segment it has
received earlier, and acknowledge the segment after the TCP sender received earlier, and acknowledge the segment after the TCP sender
has been deluded to enter algorithm step 3. has been deluded to enter algorithm step 3.
skipping to change at page 11, line 33 skipping to change at page 13, line 46
window. window.
A common case for a retransmission timeout is that a fast A common case for a retransmission timeout is that a fast
retransmission of a segment is lost. If all other segments have retransmission of a segment is lost. If all other segments have
been received, the RTO retransmission causes the whole window to be been received, the RTO retransmission causes the whole window to be
acknowledged at once. This case is recognized in F-RTO algorithm acknowledged at once. This case is recognized in F-RTO algorithm
branch (2a). However, if the receiver only acknowledges one segment branch (2a). However, if the receiver only acknowledges one segment
after receiving the RTO retransmission, and then the rest of the after receiving the RTO retransmission, and then the rest of the
segments, it could cause the timeout to be declared spurious when it segments, it could cause the timeout to be declared spurious when it
is not. Therefore, it is suggested that, when an RTO expires during is not. Therefore, it is suggested that, when an RTO expires during
fast recovery phase, the sender would not fully revert the the fast recovery phase, the sender would not fully revert the
congestion window even if the timeout was declared spurious. congestion window even if the timeout was declared spurious.
Instead, the sender would reduce the congestion window to 1. Instead, the sender would reduce the congestion window to 1.
If there is more than one segment missing at the time of a If there is more than one segment missing at the time of a
retransmission timeout, the receiver does not benefit from retransmission timeout, the receiver does not benefit from
misleading the sender to declare a spurious timeout because the misleading the sender to declare a spurious timeout because the
sender would have to go through another recovery period to sender would have to go through another recovery period to
retransmit the missing segments, usually after an RTO has elapsed. retransmit the missing segments, usually after an RTO has elapsed.
6. Acknowledgements 7. Acknowledgements
We are thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, Mark The authors would like to thank Alfred Hoenes and Ilpo Jarvinen for
Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias the comments on this document.
We are also thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton,
Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias
Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber, Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber,
Samu Kontinen, and Kostas Pentikousis who gave valuable feedback Samu Kontinen, and Kostas Pentikousis who gave valuable feedback
during the preparation of RFC 4138, the precursor of this document. during the preparation of RFC 4138, the precursor of this document.
Appendix Appendix
A. Discussion of Window-Limited Cases A. Discussion of Window-Limited Cases
When the advertised window limits the transmission of two new When the advertised window limits the transmission of two new
previously unsent segments, or there are no new data to send, it is previously unsent segments, or there are no new data to send, it is
recommended in F-RTO algorithm step (2b) that the TCP sender recommended in F-RTO algorithm step (2b) that the TCP sender
continue with the conventional RTO recovery algorithm. The continue with the conventional RTO recovery algorithm. The
disadvantage is that the sender may continue unnecessary disadvantage is that the sender may continue unnecessary
retransmissions due to possible spurious timeout. This section retransmissions due to possible spurious timeout. This section
briefly discusses the options that can potentially improve briefly discusses the options that can potentially improve
performance when transmitting previously unsent data is not performance when transmitting previously unsent data is not
possible. possible.
- The TCP sender could reserve an unused space of a size of one or - The TCP sender could reserve an unused space of a size of one or
two segments in the advertised window to ensure the use of two segments in the advertised window to ensure the use of
algorithms such as F-RTO or Limited Transmit [ABF01] in window- algorithms such as F-RTO or Limited Transmit [ABF01] in receiver
limited situations. On the other hand, while doing this, the TCP window-limited situations. On the other hand, while doing this,
sender should ensure that the window of outstanding segments is the TCP sender should ensure that the window of outstanding
large enough for proper utilization of the available pipe. segments is large enough for proper utilization of the available
pipe.
- Use additional information if available, e.g., TCP timestamps with - Use additional information if available, e.g., TCP timestamps with
the Eifel Detection algorithm, for detecting a spurious timeout. the Eifel Detection algorithm, for detecting a spurious timeout.
However, Eifel detection may yield different results from F-RTO However, Eifel detection may yield different results from F-RTO
when ACK losses and an RTO occur within the same round-trip time when ACK losses and an RTO occur within the same round-trip time
[SKR03]. [SKR03].
- Retransmit data from the tail of the retransmission queue and - Retransmit data from the tail of the retransmission queue and
continue with step 3 of the F-RTO algorithm. It is possible that continue with step 3 of the F-RTO algorithm. It is possible that
the retransmission will be made unnecessarily. Thus, this option the retransmission will be made unnecessarily. Furthermore, the
is not encouraged, except for hosts that are known to operate in operation of the SACK-based F-RTO algorithm would need to consider
an environment that is prone to spurious timeouts. On the other this case separately, to not use the retransmitted segment to
hand, with this method it is possible to limit unnecessary indicate spurious timeout. Given these considerations, this option
retransmissions due to spurious timeout to one retransmission. is not recommended.
- Send a zero-sized segment below SND.UNA, similar to TCP Keep-Alive - Send a zero-sized segment below SND.UNA, similar to TCP Keep-Alive
probe, and continue with step 3 of the F-RTO algorithm. Because probe, and continue with step 3 of the F-RTO algorithm. Because
the receiver replies with a duplicate ACK, the sender is able to the receiver replies with a duplicate ACK, the sender is able to
detect whether the timeout was spurious from the incoming detect whether the timeout was spurious from the incoming
acknowledgment. This method does not send data unnecessarily, but acknowledgment. This method does not send data unnecessarily, but
it delays the recovery by one round-trip time in cases where the it delays the recovery by one round-trip time in cases where the
timeout was not spurious. Therefore, this method is not timeout was not spurious. Therefore, this method is not
encouraged. encouraged.
- In receiver-limited cases, send one octet of new data, regardless - In receiver-limited cases, send one octet of new data, regardless
of the advertised window limit, and continue with step 3 of the F- of the advertised window limit, and continue with step 3 of the F-
RTO algorithm. It is possible that the receiver will have free RTO algorithm. It is possible that the receiver will have free
buffer space to receive the data by the time the segment has buffer space to receive the data by the time the segment has
propagated through the network, in which case no harm is done. If propagated through the network, in which case no harm is done. If
the receiver is not capable of receiving the segment, it rejects the receiver is not capable of receiving the segment, it rejects
the segment and sends a duplicate ACK. the segment and sends a duplicate ACK.
B. List of Changes B. List of Changes
Changes between different document versions are summarized below,
apart from minor editing and language improvements.
Changes from draft-ietf-tcpm-rfc4138bis-00:
* Added back the original SACK-algorithm from RFC 4138 after the
common feedback to have the SACK-algorithm in the document.
Clarified the algorithm a bit, and added one paragraph of
description of the basic idea of the algorithm.
* Clarified behavior on multiple timeouts.
* Added a paragraph on acknowledgements that do not acknowledge new
data but are not duplicate acknowledgements
Changes from RFC 4138: Changes from RFC 4138:
* Removed description of the SACK-enhanced algorithm * Removed description of the SACK-enhanced algorithm
* Removed SCTP considerations * Removed SCTP considerations
* Removed earlier Appendix sections, except Appendix C from RFC * Removed earlier Appendix sections, except Appendix C from RFC
4138, which is now Appendix A 4138, which is now Appendix A
* Clarified text about the possible response algorithms * Clarified text about the possible response algorithms
* Added section that summarizes the evaluation of RFC 4138 * Added section that summarizes the evaluation of RFC 4138
References References
Normative References Normative References
[APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion [APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999. Control", RFC 2581, April 1999.
[APB07] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", Internet-Draft "draft-ietf-tcpm-
rfc2581bis-03.txt",
September 2007.
[BAFW03] Blanton, E., Allman, M., Fall, K., and L. Wang, "A [BAFW03] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
Conservative Selective Acknowledgment (SACK)-based Loss Conservative Selective Acknowledgment (SACK)-based Loss
Recovery Algorithm for TCP", RFC 3517, April 2003. Recovery Algorithm for TCP", RFC 3517, April 2003.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[FHG04] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno [FHG04] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
Modification to TCP's Fast Recovery Algorithm", RFC 3782, Modification to TCP's Fast Recovery Algorithm", RFC 3782,
April 2004. April 2004.
 End of changes. 32 change blocks. 
95 lines changed or deleted 227 lines changed or added

This html diff was produced by rfcdiff 1.34. The latest version is available from http://tools.ietf.org/tools/rfcdiff/