draft-ietf-tsvwg-tcp-eifel-alg-05.txt   draft-ietf-tsvwg-tcp-eifel-alg-06.txt 
Network Working Group Reiner Ludwig Network Working Group Reiner Ludwig
INTERNET-DRAFT Michael Meyer INTERNET-DRAFT Michael Meyer
Expires: April 2003 Ericsson Research Expires: April 2003 Ericsson Research
October, 2002 October, 2002
The Eifel Detection Algorithm for TCP The Eifel Detection Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-alg-05.txt> <draft-ietf-tsvwg-tcp-eifel-alg-06.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 1, line 33 skipping to change at page 1, line 33
material or cite them other than as "work in progress". material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/lid-abstracts.txt http://www.ietf.org/ietf/lid-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Abstract Abstract
The Eifel detection algorithm allows the TCP sender to detect a The Eifel detection algorithm allows a TCP sender to detect a
posteriori whether it has entered loss recovery unnecessarily. It posteriori whether it has entered loss recovery unnecessarily. It
already determines this in the beginning of loss recovery when the requires that the TCP Timestamps option defined in RFC1323 is enabled
first acceptable ACK arrives after the timeout-based retransmit or for a connection. The Eifel detection algorithm makes use of the fact
the fast retransmit has been sent. The algorithm requires that the that the TCP Timestamps option eliminates the retransmission
TCP Timestamps option defined in RFC1323 is enabled for a connection. ambiguity in TCP. Based on the timestamp of the first acceptable ACK
The idea is to use the timestamps echoed in returning ACKs to that arrives during loss recovery, it decides whether loss recovery
eliminate the retransmission ambiguity in TCP. The Eifel detection was entered unnecessarily. The Eifel detection algorithm provides a
algorithm provides a basis for future TCP enhancements. This includes basis for future TCP enhancements. This includes response algorithms
response algorithms to back out of loss recovery by restoring a TCP to back out of loss recovery by restoring a TCP sender's congestion
sender's congestion control state. control state.
Terminology Terminology
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119]. document, are to be interpreted as described in [RFC2119].
We refer to the first-time transmission of an octet as the 'original We refer to the first-time transmission of an octet as the 'original
transmit'. A subsequent transmission of the same octet is referred to transmit'. A subsequent transmission of the same octet is referred to
as a 'retransmit'. In most cases this terminology can likewise be as a 'retransmit'. In most cases this terminology can likewise be
applied to data segments as opposed to octets. However, with applied to data segments as opposed to octets. However, with
repacketization a segment can contain both first-time transmissions repacketization a segment can contain both first-time transmissions
and retransmissions of octets. In that case, this terminology is only and retransmissions of octets. In that case, this terminology is only
consistent when applied to "octets". For the Eifel detection consistent when applied to octets. For the Eifel detection algorithm
algorithm this makes no difference as it also operates correctly when this makes no difference as it also operates correctly when
repacketization occurs. repacketization occurs.
We use the term 'acceptable ACK' as defined in [RFC793]. That is an We use the term 'acceptable ACK' as defined in [RFC793]. That is an
ACK that acknowledges previously unacknowledged data. We use the term ACK that acknowledges previously unacknowledged data. We use the term
'duplicate ACK', and the variable 'dupacks' as defined in [WS95]. The 'duplicate ACK', and the variable 'dupacks' as defined in [WS95]. The
variable 'dupacks' is a counter of duplicate ACKs that have already variable 'dupacks' is a counter of duplicate ACKs that have already
been received by the TCP sender before the fast retransmit is sent. been received by a TCP sender before the fast retransmit is sent. We
We use the variable 'DupThresh' to refer to the so-called duplicate use the variable 'DupThresh' to refer to the so-called duplicate
acknowledgement threshold, i.e., the number of duplicate ACKs that acknowledgement threshold, i.e., the number of duplicate ACKs that
need to arrive at the TCP sender to trigger a fast retransmit. need to arrive at a TCP sender to trigger a fast retransmit.
Currently, DupThresh is specified as a fixed value of three Currently, DupThresh is specified as a fixed value of three
[RFC2581]. Future TCPs might implement an adaptive DupThresh. [RFC2581]. Future TCPs might implement an adaptive DupThresh.
1. Introduction 1. Introduction
The retransmission ambiguity problem [Zh86][KP87] is the TCP sender's The retransmission ambiguity problem [Zh86][KP87] is a TCP sender's
inability to distinguish whether the first acceptable ACK that inability to distinguish whether the first acceptable ACK that
arrives after a retransmit, was sent in response to the original arrives after a retransmit, was sent in response to the original
transmit or the retransmit. This problem occurs after a timeout-based transmit or the retransmit. This problem occurs after a timeout-based
retransmit and after a fast retransmit. The Eifel detection algorithm retransmit and after a fast retransmit. The Eifel detection algorithm
uses the TCP Timestamps option defined in [RFC1323] to eliminate the uses the TCP Timestamps option defined in [RFC1323] to eliminate the
retransmission ambiguity. It thereby allows the TCP sender to detect retransmission ambiguity. It thereby allows a TCP sender to detect a
a posteriori whether it has entered loss recovery unnecessarily. posteriori whether it has entered loss recovery unnecessarily.
This added capability of the TCP sender is useful in environments This added capability of a TCP sender is useful in environments where
where TCP's loss recovery and congestion control algorithms may often TCP's loss recovery and congestion control algorithms may often get
get falsely triggered. This can be caused by packet reordering, falsely triggered. This can be caused by packet reordering, packet
packet duplication, or a sudden delay increase in the data or the ACK duplication, or a sudden delay increase in the data or the ACK path
path that results in a spurious timeout. For example, such sudden that results in a spurious timeout. For example, such sudden delay
delay increases can often occur in wide-area wireless access networks increases can often occur in wide-area wireless access networks due
due to handovers, resource preemption due to higher priority traffic to handovers, resource preemption due to higher priority traffic
(e.g., voice), or because the mobile transmitter traverses through a (e.g., voice), or because the mobile transmitter traverses through a
radio coverage hole (e.g., see [Gu01]). In such wireless networks, radio coverage hole (e.g., see [Gu01]). In such wireless networks,
the often unnecessary go-back-N retransmits that typically occur the often unnecessary go-back-N retransmits that typically occur
after a spurious timeout create a serious problem. They decrease end- after a spurious timeout create a serious problem. They decrease end-
to-end throughput, are useless load upon a potentially congested to-end throughput, are useless load upon the network, and waste
network, and waste transmission (battery) power. Note, that across transmission (battery) power. Note, that across such networks the use
such networks the use of timestamps is recommended anyway [IMLGK02]. of timestamps is recommended anyway [IMLGK02].
Based on the Eifel detection algorithm, the TCP sender may then Based on the Eifel detection algorithm, a TCP sender may then choose
choose to implement dedicated response algorithms. One goal of such a to implement dedicated response algorithms. One goal of such a
response algorithm would be to alleviate the consequences of a response algorithm would be to alleviate the consequences of a
falsely triggered loss recovery. This may include restoring the TCP falsely triggered loss recovery. This may include restoring the TCP
sender's congestion control state, and avoiding the mentioned sender's congestion control state, and avoiding the mentioned
unnecessary go-back-N retransmits. Another goal would be to adapt unnecessary go-back-N retransmits. Another goal would be to adapt
protocol parameters such as the duplicate acknowledgement threshold protocol parameters such as the duplicate acknowledgement threshold
[RFC2581], and the RTT estimators [RFC2988]. This is to reduce the [RFC2581], and the RTT estimators [RFC2988]. This is to reduce the
risk of falsely triggering TCP's loss recovery again as the risk of falsely triggering TCP's loss recovery again as the
connection progresses. However, such response algorithms are outside connection progresses. However, such response algorithms are outside
the scope of this document. Note: The original proposal, the "Eifel the scope of this document. Note: The original proposal, the "Eifel
algorithm" [LK00], comprises both a detection and a response algorithm" [LK00], comprises both a detection and a response
algorithm. This document only defines the detection part. The algorithm. This document only defines the detection part. The
response part is defined in [LG02]. response part is defined in [LG02].
A key feature of the Eifel detection algorithm is that it already A key feature of the Eifel detection algorithm is that it already
detects upon the first acceptable ACK that arrives during loss detects upon the first acceptable ACK that arrives during loss
recovery whether a fast retransmit or a timeout was spurious. This is recovery whether a fast retransmit or a timeout was spurious. This is
crucial to be able to avoid the mentioned go-back-N retransmits. crucial to be able to avoid the mentioned go-back-N retransmits.
Another feature is that the Eifel detection algorithm is fairly Another feature is that the Eifel detection algorithm is fairly
robust against the loss of ACKs. robust against the loss of ACKs.
Also the DSACK option [RFC2883] can be used to detect a posteriori
whether a TCP sender has entered loss recovery unnecessarily [BA02].
However, the first ACK carrying a DSACK option usually arrives at the
TCP sender only after loss recovery has already terminated. Thus, the
DSACK option cannot be used to eliminate the retransmission
ambiguity. Consequently, it cannot be used to avoid the mentioned
unnecessary go-back-N retransmits. Moreover, a DSACK-based detection
algorithm is less robust against ACK losses. A recent proposal
neither based on the TCP timestamps nor the DSACK option does not
have the limitation of DSACK-based schemes, but only addresses the
case of spurious timeouts [SK02].
2. Events that Falsely Trigger TCP Loss Recovery 2. Events that Falsely Trigger TCP Loss Recovery
The following events falsely trigger a TCP sender's loss recovery and The following events may falsely trigger a TCP sender's loss recovery
congestion control algorithms. This causes a so-called spurious and congestion control algorithms. This causes a so-called spurious
retransmit, and an unnecessary reduction of the TCP sender's retransmit, and an unnecessary reduction of the TCP sender's
congestion window and slow start threshold [RFC2581]. congestion window and slow start threshold [RFC2581].
- Spurious timeout - Spurious timeout
- Packet reordering - Packet reordering
- Packet duplication - Packet duplication
A spurious timeout is a timeout that would not have occurred had the A spurious timeout is a timeout that would not have occurred had the
sender "waited longer". This may be caused by increased delay that sender "waited longer". This may be caused by increased delay that
suddenly occurs in the data and/or the ACK path. That in turn might suddenly occurs in the data and/or the ACK path. That in turn might
cause an acceptable ACK to arrive too late, i.e., only after the TCP cause an acceptable ACK to arrive too late, i.e., only after a TCP
sender's retransmission timer has expired. For the purpose of sender's retransmission timer has expired. For the purpose of
specifying the algorithm in Section 3, we define this case as SPUR_TO specifying the algorithm in Section 3, we define this case as SPUR_TO
(equal 1). (equal 1).
Note: There is another case where a timeout would not have Note: There is another case where a timeout would not have
occurred had the sender "waited longer": the retransmission occurred had the sender "waited longer": the retransmission timer
timer expires, and afterwards the TCP sender receives the expires, and afterwards a TCP sender receives the duplicate ACK
duplicate ACK that would have triggered a fast retransmit of the that would have triggered a fast retransmit of the oldest
oldest outstanding segment. We call this a "fast timeout" since outstanding segment. We call this a "fast timeout" since in
in competition with the fast retransmit algorithm the timeout competition with the fast retransmit algorithm the timeout was
was faster. However, a fast timeout is not spurious since faster. However, a fast timeout is not spurious since apparently a
apparently a segment was in fact lost, i.e., loss recovery was segment was in fact lost, i.e., loss recovery was initiated
initiated rightfully. In this document, we do not consider fast rightfully. In this document, we do not consider fast timeouts.
timeouts. This case is addressed in an independent document This case is addressed in an independent document [Lu02].
[Lu02].
Packet reordering in the network may occur because IP [RFC791] does Packet reordering in the network may occur because IP [RFC791] does
not guarantee in-order delivery of packets. Additionally, a TCP not guarantee in-order delivery of packets. Additionally, a TCP
receiver generates a duplicate ACK for each segment that arrives out- receiver generates a duplicate ACK for each segment that arrives out-
of-order. This results in a spurious fast retransmit if three or more of-order. This results in a spurious fast retransmit if three or more
data segments arrive out-of-order at the TCP receiver, and at least data segments arrive out-of-order at a TCP receiver, and at least
three of the resulting duplicate ACKs arrive at the TCP sender. This three of the resulting duplicate ACKs arrive at the TCP sender. This
assumes that the duplicate acknowledgement threshold is set to three assumes that the duplicate acknowledgement threshold is set to three
as defined in [RFC2581]. as defined in [RFC2581].
Packet duplication may occur because a receiving IP does not (cannot) Packet duplication may occur because a receiving IP does not (cannot)
remove packets that have been duplicated in the network. A TCP remove packets that have been duplicated in the network. A TCP
receiver in turn also generates a duplicate ACK for each duplicate receiver in turn also generates a duplicate ACK for each duplicate
segment. As with packet reordering, this results in a spurious fast segment. As with packet reordering, this results in a spurious fast
retransmit if duplication of data segments or ACKs results in three retransmit if duplication of data segments or ACKs results in three
or more duplicate ACKs to arrive at the TCP sender. Again, this or more duplicate ACKs to arrive at a TCP sender. Again, this assumes
assumes that the duplicate acknowledgement threshold is set to three. that the duplicate acknowledgement threshold is set to three.
The negative impact on TCP performance caused by packet reordering The negative impact on TCP performance caused by packet reordering
and packet duplication is commonly the same: a single spurious and packet duplication is commonly the same: a single spurious
retransmit (the fast retransmit), and the unnecessary halving of the retransmit (the fast retransmit), and the unnecessary halving of a
TCP sender's congestion window as a result of the subsequent fast TCP sender's congestion window as a result of the subsequent fast
recovery phase [RFC2581]. recovery phase [RFC2581].
The negative impact on TCP performance caused by a spurious timeout The negative impact on TCP performance caused by a spurious timeout
is more severe. First, the timeout event itself causes a single is more severe. First, the timeout event itself causes a single
spurious retransmit, and unnecessarily forces the TCP sender into spurious retransmit, and unnecessarily forces a TCP sender into slow
slow start [RFC2581]. Then, as the connection progresses, a chain start [RFC2581]. Then, as the connection progresses, a chain reaction
reaction gets triggered that further decreases TCP's performance. gets triggered that further decreases TCP's performance. Since the
Since the timeout was spurious, at least some ACKs for original timeout was spurious, at least some ACKs for original transmits
transmits will typically arrive at the TCP sender before the ACK for typically arrive at the TCP sender before the ACK for the retransmit
the retransmit arrives. (This is unless severe packet reordering arrives. (This is unless severe packet reordering coincided with the
coincided with the spurious timeout in such a way that the ACK for spurious timeout in such a way that the ACK for the retransmit is the
the retransmit is the first acceptable ACK to arrive at the TCP first acceptable ACK to arrive at the TCP sender.) Those ACKs for
sender.) Those ACKs for original transmits will then trigger an original transmits then trigger an implicit go-back-N loss recovery
implicit go-back-N loss recovery at the TCP sender. Assuming that at the TCP sender. Assuming that none of the outstanding segments and
none of the outstanding segments and none of the corresponding ACKs none of the corresponding ACKs were lost, all outstanding segments
were lost, all outstanding segments will get retransmitted get retransmitted unnecessarily. In fact, during this phase the TCP
unnecessarily. In fact, during this phase the TCP sender breaks sender breaks 'packet conservation' [Jac88]. This is because the
'packet conservation' [Jac88]. This is because the unnecessary unnecessary go-back-N retransmits are sent during slow start. Thus,
go-back-N retransmits are sent during slow start. Thus, for each for each packet that leaves the network and that belongs to the first
packet that leaves the network and that belongs to the first half of half of the original flight, two useless retransmits are sent into
the original flight, two useless retransmits are sent into the the network. Moreover, some TCPs in addition suffer from a spurious
network. Moreover, some TCPs will in addition suffer from a spurious
fast retransmit. This is because the unnecessary go-back-N fast retransmit. This is because the unnecessary go-back-N
retransmits will arrive as duplicates at the TCP receiver, which in retransmits arrive as duplicates at the TCP receiver, which in turn
turn triggers a series of duplicate ACKs. Note that this last triggers a series of duplicate ACKs. Note that this last spurious
spurious fast retransmit could be avoided with the careful variant of fast retransmit could be avoided with the careful variant of 'bug
'bug fix' [RFC2582]. fix' [RFC2582].
More detailed explanations including TCP trace plots that visualize More detailed explanations including TCP trace plots that visualize
the effects of spurious timeouts and packet reordering can be found the effects of spurious timeouts and packet reordering can be found
in the original proposal [LK00]. in the original proposal [LK00].
3. The Eifel Detection Algorithm 3. The Eifel Detection Algorithm
3.1 The Idea 3.1 The Idea
The goal of the Eifel detection algorithm is to allow the TCP sender The goal of the Eifel detection algorithm is to allow a TCP sender to
to detect a posteriori whether it has entered loss recovery detect a posteriori whether it has entered loss recovery
unnecessarily. Furthermore, the TCP sender should be able to make unnecessarily. Furthermore, the TCP sender should be able to make
this decision upon the first acceptable ACK that arrives after the this decision upon the first acceptable ACK that arrives after the
timeout-based retransmit or the fast retransmit has been sent. This timeout-based retransmit or the fast retransmit has been sent. This
in turn requires extra information in ACKs by which the TCP sender in turn requires extra information in ACKs by which the TCP sender
can unambiguously distinguish whether that first acceptable ACK was can unambiguously distinguish whether that first acceptable ACK was
sent in response to the original transmit or the retransmit. Such sent in response to the original transmit or the retransmit. Such
extra information is provided by the TCP Timestamps option [RFC1323]. extra information is provided by the TCP Timestamps option [RFC1323].
Generally speaking, timestamps are monotonously increasing "serial Generally speaking, timestamps are monotonously increasing "serial
numbers" added into every segment that are then echoed within the numbers" added into every segment that are then echoed within the
corresponding ACKs. This is exploited by the Eifel detection corresponding ACKs. This is exploited by the Eifel detection
algorithm in the following way. algorithm in the following way.
Given that timestamps are enabled for a connection, the TCP sender Given that timestamps are enabled for a connection, the TCP sender
always stores the timestamp of the retransmit sent in the beginning always stores the timestamp of the retransmit sent in the beginning
of loss recovery, i.e., the timestamp of the timeout-based retransmit of loss recovery, i.e., the timestamp of the timeout-based retransmit
or the fast retransmit. If the timestamp of the first acceptable ACK, or the fast retransmit. If the timestamp of the first acceptable ACK,
arriving after the retransmit was sent, is smaller then the stored that arrives after the retransmit was sent, is smaller then the
timestamp of that retransmit, then that ACK must have been sent in stored timestamp of that retransmit, then that ACK must have been
response to an original transmit. Hence, the TCP sender must have sent in response to an original transmit. Hence, the TCP sender must
entered loss recovery unnecessarily. have entered loss recovery unnecessarily.
The fact that the Eifel detection algorithm decides upon the first The fact that the Eifel detection algorithm decides upon the first
acceptable ACK is crucial to allow future response algorithms to acceptable ACK is crucial to allow future response algorithms to
avoid the unnecessary go-back-N retransmits that typically occur avoid the unnecessary go-back-N retransmits that typically occur
after a spurious timeout. Also, if loss recovery was entered after a spurious timeout. Also, if loss recovery was entered
unnecessarily, a window worth of ACKs are outstanding that all carry unnecessarily, a window worth of ACKs are outstanding that all carry
a timestamp that is smaller than the stored timestamp of the a timestamp that is smaller than the stored timestamp of the
retransmit. The arrival of any one of those ACKs suffices the Eifel retransmit. The arrival of any one of those ACKs suffices the Eifel
detection algorithm to work. Hence, the solution is fairly robust detection algorithm to work. Hence, the solution is fairly robust
against ACK losses. Even the ACK sent in response to the retransmit, against ACK losses. Even the ACK sent in response to the retransmit,
i.e., the one that carries the stored timestamp, may get lost. i.e., the one that carries the stored timestamp, may get lost.
Note: Also the DSACK option [RFC2883] can be used to detect a
posteriori whether the TCP sender has entered loss recovery
unnecessarily. However, the first ACK carrying a DSACK option
usually arrives at the TCP sender only after loss recovery has
already terminated. Thus, the DSACK option cannot be used to
eliminate the retransmission ambiguity. Consequently, it cannot
be used to avoid the mentioned unnecessary go-back-N
retransmits. Moreover, a DSACK-based detection algorithm is less
robust against ACK losses.
3.2 The Algorithm 3.2 The Algorithm
Given that the TCP Timestamps option [RFC1323] is enabled for a Given that the TCP Timestamps option [RFC1323] is enabled for a
connection, a TCP sender MAY use the Eifel detection algorithm as connection, a TCP sender MAY use the Eifel detection algorithm as
defined in this subsection. defined in this subsection.
If the Eifel detection algorithm is used, the following steps MUST be If the Eifel detection algorithm is used, the following steps MUST be
taken by the TCP sender, but only upon initiation of loss recovery, taken by a TCP sender, but only upon initiation of loss recovery,
i.e., when either the timeout-based retransmit or the fast retransmit i.e., when either the timeout-based retransmit or the fast retransmit
is sent. The Eifel detection algorithm MUST NOT be reinitiated after is sent. The Eifel detection algorithm MUST NOT be reinitiated after
loss recovery has already started. In particular, it may not be loss recovery has already started. In particular, it may not be
reinitiated upon subsequent timeouts for the same segment, and not reinitiated upon subsequent timeouts for the same segment, and not
upon retransmitting segments other than the oldest outstanding upon retransmitting segments other than the oldest outstanding
segment, e.g., during selective loss recovery. segment, e.g., during selective loss recovery.
(1) Set a "RetransmitTS" variable to the value of the (1) Set a "SpuriousRecovery" variable to FALSE (equal 0).
Timestamp Value field of the Timestamps option included
in the retransmit sent when loss recovery is initiated. A (2) Set a "RetransmitTS" variable to the value of the
TCP sender must ensure that RetransmitTS does not get Timestamp Value field of the Timestamps option included in
the retransmit sent when loss recovery is initiated. A TCP
sender must ensure that RetransmitTS does not get
overwritten as loss recovery progresses, e.g., in case of overwritten as loss recovery progresses, e.g., in case of
a second timeout and subsequent second retransmit of the a second timeout and subsequent second retransmit of the
same octet. same octet.
(2) Set a "SpuriousRecovery" variable to FALSE (equal 0).
(3) Wait for the arrival of an acceptable ACK. When an (3) Wait for the arrival of an acceptable ACK. When an
acceptable ACK has arrived proceed to step (4). acceptable ACK has arrived proceed to step (4).
(4) If the acceptable ACK neither carries a SACK option (4) If the value of the Timestamp Echo Reply field of the
[RFC2018] nor a DSACK option [RFC2883], then proceed to acceptable ACK's Timestamps option is smaller than the
step (5), value of the variable RetransmitTS, then proceed to step
(5),
else proceed to step (DONE). else proceed to step (DONE).
(5) If the value of the Timestamp Echo Reply field of the (5) If the acceptable ACK does not carry a DSACK option
acceptable ACK's Timestamps option is smaller than the [RFC2883], then proceed to step (6),
value of the variable RetransmitTS, then proceed to step
(6),
else proceed to step (DONE). else proceed to step (DONE).
(6) If the loss recovery has been initiated with a timeout- (6) If the loss recovery has been initiated with a timeout-
based retransmit, then set based retransmit, then set
SpuriousRecovery <- SPUR_TO (equal 1), SpuriousRecovery <- SPUR_TO (equal 1),
else set else set
SpuriousRecovery <- dupacks+1 SpuriousRecovery <- dupacks+1
(RESP) Do nothing (Placeholder for a response algorithm). (RESP) Do nothing (Placeholder for a response algorithm).
(DONE) No further processing. (DONE) No further processing.
The comparison "smaller than" in step (5) is conservative. In theory, The comparison "smaller than" in step (4) is conservative. In theory,
if the "timestamp clock" is slow or the network is fast, RetransmitTS if the timestamp clock is slow or the network is fast, RetransmitTS
could at most be equal to the timestamp echoed by an ACK sent in could at most be equal to the timestamp echoed by an ACK sent in
response to an original transmit. In that case, it is assumed that response to an original transmit. In that case, it is assumed that
the loss recovery was not falsely triggered. the loss recovery was not falsely triggered.
3.3 A Corner Case: "Timeout due to loss of all ACKs" 3.3 A Corner Case: "Timeout due to loss of all ACKs"
The TCP sender is forced into a timeout even though the oldest Even though the oldest outstanding segment arrived at a TCP receiver,
outstanding segment arrived at the TCP receiver when all ACKs are the TCP sender is forced into a timeout when all ACKs are lost.
lost. Although, the resulting retransmit is unnecessary, such a Although, the resulting retransmit is unnecessary, such a timeout is
timeout is unavoidable. It should therefore not be considered to be unavoidable. It should therefore not be considered to be spurious.
spurious. This particular case of a timeout is considered in this section.
The retransmit will arrive as a duplicate at the TCP receiver. In In that case, the retransmit arrives as a duplicate at the TCP
response to duplicates, RFC1323 mandates that the timestamp of the receiver. In response to duplicates, RFC1323 mandates that the
last segment that arrived in-sequence should be echoed. That timestamp of the last segment that arrived in-sequence should be
timestamp will commonly be smaller than the timestamp carried by the echoed. That timestamp is carried by the first acceptable ACK that
retransmit. Consequently, the Eifel detection algorithm will mistake arrives at the TCP sender after loss recovery was entered, and is
such a timeout as spurious, unless the SACK and DSACK option are commonly smaller than the timestamp carried by the retransmit.
enabled for that TCP connection. Consequently, the Eifel detection algorithm mistakes such a timeout
as spurious, unless that first acceptable ACK also carries a DSACK
option (see step (5) of the algorithm).
Note: Not all TCP implementations strictly follow RFC1323. In Note: Not all TCP implementations strictly follow RFC1323. In
response to a duplicate data segment, some TCP receivers will response to a duplicate data segment, some TCP receivers echo the
echo the timestamp of the duplicate. With such TCP receivers, timestamp of the duplicate. With such TCP receivers, the corner
the corner case discussed in this section does not apply. That case discussed in this section does not apply. The timestamp
is, even though all ACKs were lost, and independent of whether carried by the retransmit would be echoed in the first acceptable
the SACK and DSACK option was enabled for a connection, the ACK, and the Eifel detection algorithm would be terminated through
Eifel detection algorithm would have no effect. step (4). Thus, even though all ACKs were lost, and independent of
whether the DSACK option was enabled for a connection, the Eifel
With the SACK and DSACK option the Eifel detection algorithm will detection algorithm would have no effect.
detect this corner case. This motivates step (4) of the algorithm. If
data segments beyond the oldest outstanding one are also lost, then
the first acceptable ACK that arrives during loss recovery will carry
a SACK or a DSACK option. That ACK must have been sent in response to
a retransmit of the oldest outstanding segment that arrived as a
duplicate at the TCP receiver. It carries a SACK option if there are
holes in the TCP receiver's sequence space. Otherwise, it carries a
DSACK option.
A concern with this corner case arises if the Eifel detection A concern with this corner case arises if the Eifel detection
algorithm is combined with a response algorithm like the Eifel algorithm is combined with a response algorithm like the Eifel
response algorithm [LG02]. That algorithm will back out of loss response algorithm [LG02]. That algorithm backs out of loss recovery
recovery by reversing congestion control state after a spurious by reversing congestion control state after a spurious timeout has
timeout has been detected. I.e., this would also happen in cases when been detected. The argument against doing so, is that the TCP sender
all ACKs are lost, and the SACK and/or the DSACK option are not used should keep its congestion window halved in case all ACKs are lost
for that connection. The argument against that, is that the TCP since that is a sign of severe congestion on the ACK path.
sender should keep its congestion window halved in case all ACKs are
lost since that is a sign of severe congestion on the ACK path.
This is not really a problem as long as data segments were lost in This is not really a problem as long as data segments were lost in
addition to the entire flight of ACKs. The Eifel detection algorithm addition to the entire flight of ACKs. The Eifel detection algorithm
would misinterpret the timeout as spurious, and the Eifel response would misinterpret the timeout as spurious, and the Eifel response
algorithm would reverse congestion control state. Still, the TCP algorithm would reverse congestion control state. Still, the TCP
sender would respond to congestion (in the data path) as soon as it sender would respond to congestion (in the data path) as soon as it
finds out about the first loss in the outstanding flight. I.e., the finds out about the first loss in the outstanding flight. I.e., the
TCP sender would still half its congestion window for that flight of TCP sender would still half its congestion window for that flight of
packets. packets.
The concern remains, though, in case the DSACK option is not enabled, The concern remains, though, in case the DSACK option is not enabled,
and an entire flight of ACKs is lost while all of the data segments and an entire flight of ACKs is lost while all of the data segments
arrive at the TCP receiver. Without the Eifel detection and response arrive at the TCP receiver. Without the Eifel detection and response
algorithm a TCP sender would go into slow start. With those algorithm the TCP sender would go into slow start. With those
algorithms it would not respond to the congestion in the ACK path. algorithms it would not respond to the congestion in the ACK path.
We do not believe that this is a serious concern. First, we assume We do not believe that this is a serious concern. First, we assume
that DSACK will get increasingly deployed. And even if this was not that DSACK will get increasingly deployed. And even if this was not
the case, this special corner case must occur sufficiently often to the case, this special corner case must occur sufficiently often to
become a problem for the progress of a TCP connection. This seems become a problem for the progress of a TCP connection. It is unlikely
rather pathological, since this suggests that the ACK path is pretty that any TCP could manage to grow its congestion window much beyond
much broken. It is unlikely that any TCP could manage to grow its one maximum segment size with such an ACK path. And in that case, the
congestion much beyond one maximum segment size with such an ACK reversing of congestion control state becomes meaningless. Moreover,
path. And in that case, the reversing of congestion control state in an implementation one might choose to disable the Eifel detection
becomes meaningless. algorithm if such an ACK path is encountered.
3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant) 3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant)
A TCP receiver can easily make a genuine retransmit appear to the TCP A TCP receiver can easily make a genuine retransmit appear to a TCP
sender as a spurious retransmit by forging echoed timestamps. This sender as a spurious retransmit by forging echoed timestamps. This
may pose a security concern. may pose a security concern.
Fortunately, there is a way to modify the Eifel detection algorithm Fortunately, there is a way to modify the Eifel detection algorithm
in a way that makes it robust against lying TCP receivers. The idea in a way that makes it robust against lying TCP receivers. The idea
is to use timestamps as a "segment's secret" that the TCP receiver is to use timestamps as a "segment's secret" that a TCP receiver only
only gets to know if it receives the segment. Conversely, a TCP gets to know if it receives the segment. Conversely, a TCP receiver
receiver will not know the timestamp of a segment that was lost. will not know the timestamp of a segment that was lost. Hence, to
Hence, to "prove" that it received the original transmit of a segment "prove" that it received the original transmit of a segment that a
that the TCP sender retransmitted, the TCP receiver would need to TCP sender retransmitted, the TCP receiver would need to return the
return the timestamp of that original transmit. The Eifel detection timestamp of that original transmit. The Eifel detection algorithm
algorithm could then be modified to only decide that loss recovery could then be modified to only decide that loss recovery has been
has been unnecessarily entered if the first acceptable ACK echoes the unnecessarily entered if the first acceptable ACK echoes the
timestamp of the original transmit. timestamp of the original transmit.
Hence, implementers may choose to implement the algorithm with the Hence, implementers may choose to implement the algorithm with the
following modifications. following modifications.
Step (1) is replaced with step (1'): Step (2) is replaced with step (2'):
(1') Set a "RetransmitTS" variable to the value of the (2') Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option that was Timestamp Value field of the Timestamps option that was
included in the original transmit corresponding to the included in the original transmit corresponding to the
retransmit. Note: This step requires that the TCP sender retransmit. Note: This step requires that the TCP sender
stores the timestamps of all outstanding original stores the timestamps of all outstanding original
transmits. transmits.
Step (5) is replaced with step (5'): Step (4) is replaced with step (4'):
(5') If the value of the Timestamp Echo Reply field of the (4') If the value of the Timestamp Echo Reply field of the
acceptable ACK's Timestamps option is equal to the value acceptable ACK's Timestamps option is equal to the value
of the variable RetransmitTS, then proceed to step (6), of the variable RetransmitTS, then proceed to step (5),
else proceed to step (DONE). else proceed to step (DONE).
These modifications come at a cost: the modified algorithm is fairly These modifications come at a cost: the modified algorithm is fairly
sensitive against ACK losses since it relies on the arrival of the sensitive against ACK losses since it relies on the arrival of the
acceptable ACK that corresponds to the original transmit. acceptable ACK that corresponds to the original transmit.
Note: The first acceptable ACK that arrives after loss recovery Note: The first acceptable ACK that arrives after loss recovery
has been unnecessarily entered, should echo the timestamp of the has been unnecessarily entered, should echo the timestamp of the
original transmit. This assumes that the ACK corresponding to original transmit. This assumes that the ACK corresponding to the
the original transmit was not lost, that that ACK was not original transmit was not lost, that that ACK was not reordered in
reordered in the network, and that the TCP receiver does not the network, and that the TCP receiver does not forge timestamps
forge timestamps but complies with RFC1323. In case of a but complies with RFC1323. In case of a spurious fast retransmit,
spurious fast retransmit, this is implied by the rules for this is implied by the rules for generating ACKs for data segments
generating ACKs for data segments that fill in all or part of a that fill in all or part of a gap in the sequence space (see
gap in the sequence space (see section 4.2 of [RFC2581]) and by section 4.2 of [RFC2581]) and by the rules for echoing timestamps
the rules for echoing timestamps in that case (see rule (C) in in that case (see rule (C) in section 3.4 of [RFC1323]). In case
section 3.4 of [RFC1323]). In case of a spurious timeout, it is of a spurious timeout, it is likely that the delay (in the data
likely that the delay (in the data path) that has caused the path) that has caused the spurious timeout has also caused the TCP
spurious timeout has also caused the TCP receiver's delayed ACK receiver's delayed ACK timer [RFC1122] to expire before the
timer [RFC1122] to expire before the original transmit arrives. original transmit arrives. Also, in this case the rules for
Also, in this case the rules for generating ACKs and the rules generating ACKs and the rules for echoing timestamps (see rule (A)
for echoing timestamps (see rule (A) in section 3.4 of in section 3.4 of [RFC1323]) ensure that the original transmit's
[RFC1323]) ensure that the original transmit's timestamp is timestamp is echoed.
echoed.
A remaining problem is that a TCP receiver might guess a lost A remaining problem is that a TCP receiver might guess a lost
segment's timestamp from observing the timestamps of segments that segment's timestamp from observing the timestamps of recently
arrived earlier. This could be avoided by having the TCP sender add a received segments. For example, if segment N was lost while segment
"random counter" to the timestamp of every segment it sends, and then N-1 and N+1 have arrived, a TCP receiver could guess the timestamp
increment that counter by a random value, e.g., between 1 and 100. that lies in the middle of the timestamps of segments N-1 and N+1,
and echo it in the ACK for segment N+1. Especially if a TCP sender
implements timestamps with a coarse granularity, a misbehaving TCP
receiver is likely to be successful with such an approach. In fact,
with the 500 ms granularity suggested in [WS95], it even becomes
quite likely that the timestamps of segments N-1, N, N+1 are
identical.
This ensures that timestamps remain monotonously increasing while One way to reduce this risk is to implement fine grained timestamps.
making it difficult for a TCP receiver to guess the timestamp of a Note that the granularity of the timestamps is independent of the
lost segment. granularity of the retransmission timer. For example, some TCP
implementations run a timestamp clock that even ticks every
microsecond. This should make it more difficult for a TCP receiver to
guess the timestamp of a lost segment. Alternatively, it might be
possible to combine the timestamps with a nonce, as done for the
Explicit Congestion Notification (ECN) [RFC3168]. One would need to
take care, though, that the timestamps of succeeding segments remain
monotonously increasing and do not interfere with the RTT timing
defined in [RFC1323].
4. Security Considerations 4. Security Considerations
There do not seem to be any security considerations associated with There do not seem to be any security considerations associated with
the Eifel detection algorithm. This is because the Eifel detection the Eifel detection algorithm. This is because the Eifel detection
algorithm does not alter existing protocol state at the TCP sender algorithm does not alter existing protocol state at a TCP sender.
nor at the receiver. That is, no value of a TCP sender's state Note that the Eifel detection algorithm only requires changes to the
variable is changed. implementation of the TCP sender.
Moreover, a variant of the Eifel detection algorithm has been Moreover, a variant of the Eifel detection algorithm has been
proposed in Section 3.4 that makes it robust against lying TCP proposed in Section 3.4 that makes it robust against lying TCP
receivers. receivers.
Acknowledgments Acknowledgments
Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
Sarolahti, and Alexey Kuznetsov for useful discussions that Sarolahti, and Alexey Kuznetsov for useful discussions that
skipping to change at page 10, line 51 skipping to change at page 11, line 10
Performance, RFC 1323, May 1992. Performance, RFC 1323, May 1992.
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective
Acknowledgement Options, RFC 2018, October 1996. Acknowledgement Options, RFC 2018, October 1996.
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September [RFC793] J. Postel, Transmission Control Protocol, RFC793, September
1981. 1981.
Informative References Informative References
[BA02] E. Blanton, M. Allman, Using TCP DSACKs and SCTP Duplicate
TSNs to Detect Spurious Retransmissions, work in progress,
October 2002..
[RFC1122] R. Braden, Requirements for Internet Hosts - Communication [RFC1122] R. Braden, Requirements for Internet Hosts - Communication
Layers, RFC 1122, October 1989. Layers, RFC 1122, October 1989.
[RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
Fast Recovery Algorithm, RFC 2582, April 1999. Fast Recovery Algorithm, RFC 2582, April 1999.
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In
Proceedings of IFIP Personal Wireless Communications, Proceedings of IFIP Personal Wireless Communications,
August 2001. August 2001.
skipping to change at page 11, line 33 skipping to change at page 11, line 48
[LG02] R. Ludwig, A. Gurtov, The Eifel Response Algorithm for TCP, [LG02] R. Ludwig, A. Gurtov, The Eifel Response Algorithm for TCP,
work in progress, October 2002. work in progress, October 2002.
[Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in [Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in
progress, July 2002. progress, July 2002.
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
RFC 2988, November 2000. RFC 2988, November 2000.
[RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981.
[RFC3168] K. Ramakrishnan, S. Floyd, D. Black, The Addition of
Explicit Congestion Notification (ECN) to IP, RFC 3168,
September 2001.
[SK02] P. Sarolahti, M. Kojo, F-RTO: A TCP RTO Recovery Algorithm
for Avoiding Unnecessary Retransmissions, work in progress,
June 2002.
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
(The Implementation), Addison Wesley, January 1995. (The Implementation), Addison Wesley, January 1995.
[Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of [Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of
ACM SIGCOMM 88. ACM SIGCOMM 88.
Author's Address Author's Address
Reiner Ludwig Reiner Ludwig
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/