draft-ietf-tsvwg-tcp-eifel-alg-03.txt   draft-ietf-tsvwg-tcp-eifel-alg-04.txt 
Network Working Group Reiner Ludwig Network Working Group Reiner Ludwig
INTERNET-DRAFT Michael Meyer INTERNET-DRAFT Michael Meyer
Expires: August 2002 Ericsson Research Expires: January 2003 Ericsson Research
February, 2002 July, 2002
The Eifel Detection Algorithm for TCP The Eifel Detection Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-alg-03.txt> <draft-ietf-tsvwg-tcp-eifel-alg-04.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 1, line 36 skipping to change at page 1, line 36
http://www.ietf.org/ietf/lid-abstracts.txt http://www.ietf.org/ietf/lid-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Abstract Abstract
The Eifel detection algorithm allows the TCP sender to detect a The Eifel detection algorithm allows the TCP sender to detect a
posteriori whether it has entered loss recovery unnecessarily. It posteriori whether it has entered loss recovery unnecessarily. It
already determines this in the beginning of loss recovery when the already determines this in the beginning of loss recovery when the
first new ACK arrives after the timeout-based or fast retransmit has first acceptable ACK arrives after the timeout-based retransmit or
been sent. The algorithm requires that the TCP Timestamps option fast retransmit has been sent. The algorithm requires that the TCP
defined in RFC1323 is enabled for a connection. The idea is to use Timestamps option defined in RFC1323 is enabled for a connection. The
the timestamps echoed in returning ACKs to eliminate the idea is to use the timestamps echoed in returning ACKs to eliminate
retransmission ambiguity in TCP. The Eifel detection algorithm the retransmission ambiguity in TCP. The Eifel detection algorithm
provides a basis for future TCP enhancements. This includes response provides a basis for future TCP enhancements. This includes response
algorithms to back out of loss recovery by restoring a TCP sender's algorithms to back out of loss recovery by restoring a TCP sender's
congestion control state. congestion control state.
Terminology Terminology
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119]. document, are to be interpreted as described in [RFC2119].
We use the term 'new ACK' to refer to an ACK that acknowledges We refer to the first-time transmission of an octet as the 'original
previously unacknowledged data. We use the term 'duplicate ACK' as transmit'. A subsequent transmission of the same octet is referred to
defined in [WS95]. Furthermore, we refer to the first-time as a 'retransmit'. In most cases this terminology can likewise be
transmission of a data segment as the 'original transmit'. applied to "data segments" as opposed to "octets". However, with
repacketization a segment can contain both first-time transmissions
and retransmissions of octets. In that case this terminology is only
consistent when applied to "octets". For the Eifel detection
algorithm this makes no difference as it also operates correctly when
repacketization occurs.
We use the term 'acceptable ACK' as defined in [RFC793]. That is an
ACK that acknowledges previously unacknowledged data. We use the term
'duplicate ACK', and the variable 'dupacks' as defined in [WS95]. The
variable 'dupacks' is a counter of duplicate ACKs that have already
been received by the TCP sender before the fast retransmit is sent.
We use the variable 'DupThresh' to refer to the so-called duplicate
acknowledgement threshold, i.e., the number of duplicate ACKs that
need to arrive at the TCP sender to trigger a fast retransmit.
Currently, DupThresh is specified as a fixed value of three
[RFC2581]. Future TCPs might implement an adaptive DupThresh.
1. Introduction 1. Introduction
The retransmission ambiguity problem [KP87] is the TCP sender's The retransmission ambiguity problem [KP87] is the TCP sender's
inability to distinguish whether the first new ACK that arrives after inability to distinguish whether the first acceptable ACK that
a retransmit was sent in response to the original transmit or the arrives after a retransmit, was sent in response to the original
retransmit. This problem occurs after a timeout-based retransmit and transmit or the retransmit. This problem occurs after a timeout-based
after a fast retransmit. The Eifel detection algorithm uses the TCP retransmit and after a fast retransmit. The Eifel detection algorithm
Timestamps option defined in RFC1323 to eliminate the retransmission uses the TCP Timestamps option defined in [RFC1323] to eliminate the
ambiguity. It thereby allows the TCP sender to detect a posteriori retransmission ambiguity. It thereby allows the TCP sender to detect
whether it has entered loss recovery unnecessarily. a posteriori whether it has entered loss recovery unnecessarily.
This added capability of the TCP sender is useful in environments This added capability of the TCP sender is useful in environments
where TCP's loss recovery and congestion control algorithms may often where TCP's loss recovery and congestion control algorithms may often
get falsely triggered. This can be caused by packet reordering, get falsely triggered. This can be caused by packet reordering,
packet duplication, the loss of a flight of ACKs, or a sudden delay packet duplication, the loss of a flight of ACKs, or a sudden delay
increase in the data or the ACK path that results in a spurious increase in the data or the ACK path that results in a spurious
timeout. For example, such sudden delay increases can often occur in timeout. For example, such sudden delay increases can often occur in
wide-area wireless access networks due to handovers, resource wide-area wireless access networks due to handovers, resource
preemption, or because the mobile transmitter traverses through a preemption due to higher priority traffic (e.g., voice), or because
radio coverage hole (e.g., see [Gu01]). the mobile transmitter traverses through a radio coverage hole (e.g.,
see [Gu01]). In such wireless networks, the often unnecessary
go-back-N retransmits that typically occur after a spurious timeout
create a serious problem. They decrease end-to-end throughput, are
useless load upon a potentially congested network, and waste
transmission (battery) power.
Based on the Eifel detection algorithm, the TCP sender may then Based on the Eifel detection algorithm, the TCP sender may then
choose to implement dedicated response algorithms. One goal of such a choose to implement dedicated response algorithms. One goal of such a
response algorithm would be to alleviate the consequences of a response algorithm would be to alleviate the consequences of a
falsely triggered loss recovery. This may include restoring the TCP falsely triggered loss recovery. This may include restoring the TCP
sender's congestion control state, and/or avoiding the often sender's congestion control state, and avoiding the mentioned
unnecessary go-back-N retransmits that typically occur after a unnecessary go-back-N retransmits. Another goal would be to adapt
spurious timeout. Another goal would be to adapt protocol parameters protocol parameters such as the duplicate acknowledgement threshold
such as the duplicate acknowledgement threshold [RFC2581], and/or the [RFC2581], and the RTT estimators [RFC2988]. This is to reduce the
RTT estimators [RFC2988]. This is to reduce the risk of falsely risk of falsely triggering TCP's loss recovery again as the
triggering TCP's loss recovery again as the connection progresses. connection progresses. However, such response algorithms are outside
However, such response algorithms are outside the scope of this the scope of this document. Note: The original proposal, the "Eifel
document. Note: The original proposal, the "Eifel algorithm" [LK00], algorithm" [LK00], comprises both a detection and a response
comprises both a detection and a response algorithm. This document algorithm. This document only defines the detection part. The
only defines the detection part. response part is defined in [LG02].
A key feature of the Eifel detection algorithm is that it already A key feature of the Eifel detection algorithm is that it already
detects upon the first new ACK that arrives during loss recovery detects upon the first acceptable ACK that arrives during loss
whether a fast retransmit or a timeout was spurious. This is crucial recovery whether a fast retransmit or a timeout was spurious. This is
to be able to avoid the mentioned go-back-N retransmits. Another crucial to be able to avoid the mentioned go-back-N retransmits.
feature is that the Eifel detection algorithm is fairly robust Another feature is that the Eifel detection algorithm is fairly
against the loss of ACKs. robust against the loss of ACKs.
2. Events that Falsely Trigger TCP Loss Recovery 2. Events that Falsely Trigger TCP Loss Recovery
The following events falsely trigger a TCP sender's loss recovery and The following events falsely trigger a TCP sender's loss recovery and
congestion control algorithms. This causes a so-called spurious congestion control algorithms. This causes a so-called spurious
retransmit, and an unnecessary reduction of the TCP sender's retransmit, and an unnecessary reduction of the TCP sender's
congestion window and slow start threshold [RFC2581]. congestion window and slow start threshold [RFC2581].
- Spurious timeout - Spurious timeout
- Packet reordering - Packet reordering
- Packet duplication - Packet duplication
The common understanding of a spurious timeout is a timeout that The common understanding of a spurious timeout is a timeout that
would not have occurred had the sender "waited longer". This may be would not have occurred had the sender "waited longer". This may be
caused by increased delay that suddenly occurs in the data and/or the caused by increased delay that suddenly occurs in the data and/or the
ACK path. That in turn might cause an ACK to arrive too late, i.e., ACK path. That in turn might cause an acceptable ACK to arrive too
only after the TCP sender's retransmission timer has expired. Note late, i.e., only after the TCP sender's retransmission timer has
that such a late ACK could either be a new ACK that would acknowledge expired. For the purpose of specifying the algorithm in Section 3, we
the oldest outstanding segment, or it could be the third duplicate define this case as SPUR_TO (equal 1). However, in this document, we
ACK that would trigger a fast retransmit of the oldest outstanding stretch the definition of a spurious timeout. We also speak of a
segment [RFC2581]. For the purpose of specifying the algorithm in spurious timeout when the timeout occurred because an entire flight
Section 3, we define the former case as SPUR_TO_NEWACK, and the of ACKs was lost while in fact the original transmit of the oldest
latter as SPUR_TO_DUPACK. However, in this document we stretch the outstanding segment had arrived at the TCP receiver. This case is
definition of a spurious timeout. We also speak of a spurious timeout also considered to fall under the definition of SPUR_TO since a TCP
when the timeout occurred because an entire flight of ACKs was lost sender is not able to distinguish it from the other case mentioned
while in fact the original transmit of the oldest outstanding segment above. Such a timeout is certainly unavoidable, i.e., it would not
had arrived at the TCP receiver. This case is also considered to fall have made a difference if the TCP sender had "waited longer". Still,
under the definition of SPUR_TO_NEWACK. Even though such a timeout is the triggering of loss recovery was spurious since that original
certainly unavoidable, the triggering of loss recovery was spurious transmit was not lost, i.e., there had been no congestion in the data
since that original transmit was not lost. Hence, we use the term path. Hence, we use the term 'spurious timeout' in the sense that
'spurious timeout' in the sense that entering the associated loss initiating the associated loss recovery was unnecessary. Yet, we
recovery was unnecessary. Yet, we exclude from the definition of a exclude from the definition of a spurious timeout the case where the
spurious timeout the case where the retransmit arrives at the TCP retransmit arrives at the TCP receiver before the corresponding
receiver before the corresponding original transmit. original transmit.
Note: There is another case where a timeout would not have
occurred had the sender "waited longer": the retransmission
timer expires, and afterwards the TCP sender receives the
duplicate ACK that would have triggered a fast retransmit of the
oldest outstanding segment. We call this a "fast timeout" since
in competition with the fast retransmit algorithm the timeout
was faster. However, a fast timeout is not spurious since
apparently a segment was in fact lost, i.e., loss recovery was
initiating rightfully. In this document, we do not consider fast
timeouts. This case is addressed in an independent document
[Lu02].
Packet reordering in the network may occur because IP [RFC791] does Packet reordering in the network may occur because IP [RFC791] does
not guarantee in-order delivery of packets. Additionally, a TCP not guarantee in-order delivery of packets. Additionally, a TCP
receiver generates a duplicate ACK for each segment that arrives out- receiver generates a duplicate ACK for each segment that arrives out-
of-order. This results in a spurious fast retransmit if three or more of-order. This results in a spurious fast retransmit if three or more
data segments arrive out-of-order at the TCP receiver, and at least data segments arrive out-of-order at the TCP receiver, and at least
three of the resulting duplicate ACKs arrive at the TCP sender. We three of the resulting duplicate ACKs arrive at the TCP sender. This
define such a case as SPUR_FR. assumes that the duplicate acknowledgement threshold is set to three
as defined in [RFC2581].
Packet duplication may occur because a receiving IP does not (cannot) Packet duplication may occur because a receiving IP does not (cannot)
remove packets that have been duplicated in the network. A TCP remove packets that have been duplicated in the network. A TCP
receiver in turn also generates a duplicate ACK for each duplicate receiver in turn also generates a duplicate ACK for each duplicate
segment. As with packet reordering, this results in a spurious fast segment. As with packet reordering, this results in a spurious fast
retransmit if duplication of data segments or ACKs results in three retransmit if duplication of data segments or ACKs results in three
or more duplicate ACKs to arrive at the TCP sender. This case is also or more duplicate ACKs to arrive at the TCP sender. Again, this
considered to fall under the definition of SPUR_FR. assumes that the duplicate acknowledgement threshold is set to three.
The negative impact on TCP performance caused by packet reordering The negative impact on TCP performance caused by packet reordering
and packet duplication is commonly the same: a single spurious and packet duplication is commonly the same: a single spurious
retransmit (the fast retransmit), and the unnecessary halving of the retransmit (the fast retransmit), and the unnecessary halving of the
TCP sender's congestion window as a result of the subsequent fast TCP sender's congestion window as a result of the subsequent fast
recovery phase [RFC2581]. recovery phase [RFC2581].
The negative impact on TCP performance caused by a spurious timeout The negative impact on TCP performance caused by a spurious timeout
is more severe. First, the timeout event itself causes a single is more severe. First, the timeout event itself causes a single
spurious retransmit, and unnecessarily forces the TCP sender into spurious retransmit, and unnecessarily forces the TCP sender into
skipping to change at page 4, line 21 skipping to change at page 5, line 4
and packet duplication is commonly the same: a single spurious and packet duplication is commonly the same: a single spurious
retransmit (the fast retransmit), and the unnecessary halving of the retransmit (the fast retransmit), and the unnecessary halving of the
TCP sender's congestion window as a result of the subsequent fast TCP sender's congestion window as a result of the subsequent fast
recovery phase [RFC2581]. recovery phase [RFC2581].
The negative impact on TCP performance caused by a spurious timeout The negative impact on TCP performance caused by a spurious timeout
is more severe. First, the timeout event itself causes a single is more severe. First, the timeout event itself causes a single
spurious retransmit, and unnecessarily forces the TCP sender into spurious retransmit, and unnecessarily forces the TCP sender into
slow start [RFC2581]. Then, as the connection progresses, a chain slow start [RFC2581]. Then, as the connection progresses, a chain
reaction gets triggered that further decreases TCP's performance. reaction gets triggered that further decreases TCP's performance.
Since the timeout was spurious, at least some ACKs for original Since the timeout was spurious, at least some ACKs for original
transmits will typically arrive at the TCP sender before the ACK for transmits will typically arrive at the TCP sender before the ACK for
the retransmit arrives. (This is unless severe packet reordering the retransmit arrives. (This is unless severe packet reordering
coincided with the spurious timeout in such a way that the ACK for coincided with the spurious timeout in such a way that the ACK for
the retransmit is the first new ACK to arrive at the TCP sender.) the retransmit is the first acceptable ACK to arrive at the TCP
Those ACKs for original transmits will then trigger an implicit go- sender.) Those ACKs for original transmits will then trigger an
back-N loss recovery at the TCP sender. Assuming that none of the implicit go-back-N loss recovery at the TCP sender. Assuming that
outstanding segments and none of the corresponding ACKs were lost, none of the outstanding segments and none of the corresponding ACKs
all outstanding segments will get retransmitted unnecessarily. were lost, all outstanding segments will get retransmitted
Moreover, in some TCPs this may then cause a spurious fast unnecessarily. In fact, during this phase the TCP sender breaks
retransmit. This is because the spurious go-back-N retransmits will 'packet conservation' [Jac88]. This is because the unnecessary
arrive as duplicates at the TCP receiver, which in turn triggers a go-back-N retransmits are sent during slow start, i.e., for each
series of duplicate ACKs. Note that this last spurious fast original packet leaving the network, two useless retransmits are sent
retransmit could be avoided with the careful variant of 'bug fix' into the network. Moreover, some TCPs will in addition suffer from a
[RFC2582]. spurious fast retransmit. This is because the spurious go-back-N
retransmits will arrive as duplicates at the TCP receiver, which in
turn triggers a series of duplicate ACKs. Note that this last
spurious fast retransmit could be avoided with the careful variant of
'bug fix' [RFC2582].
More detailed explanations including TCP trace plots that visualize More detailed explanations including TCP trace plots that visualize
the effects of spurious timeouts and packet reordering can be found the effects of spurious timeouts and packet reordering can be found
in [LK00]. in [LK00].
3. The Eifel Detection Algorithm 3. The Eifel Detection Algorithm
3.1 The Idea 3.1 The Idea
The goal of the Eifel detection algorithm is to allow the TCP sender The goal of the Eifel detection algorithm is to allow the TCP sender
to detect a posteriori whether it has entered loss recovery to detect a posteriori whether it has entered loss recovery
unnecessarily. Furthermore, the TCP sender should be able to make unnecessarily. Furthermore, the TCP sender should be able to make
this decision upon the first new ACK that arrives after the timeout- this decision upon the first acceptable ACK that arrives after the
based or fast retransmit has been sent. This in turn requires extra timeout-based retransmit or the fast retransmit has been sent. This
information in every ACK by which the TCP sender can unambiguously in turn requires extra information in every ACK by which the TCP
distinguish whether that first new ACK was sent in response to the sender can unambiguously distinguish whether that first acceptable
original transmit or the retransmit. Such extra information is ACK was sent in response to the original transmit or the retransmit.
provided by the TCP Timestamps option [RFC1323]. Generally speaking, Such extra information is provided by the TCP Timestamps option
timestamps are monotonously increasing "serial numbers" added into [RFC1323]. Generally speaking, timestamps are monotonously increasing
every segment that are then echoed within the corresponding ACKs. "serial numbers" added into every segment that are then echoed within
This is exploited by the Eifel detection algorithm in the following the corresponding ACKs. This is exploited by the Eifel detection
way. algorithm in the following way.
Given that timestamps are enabled for a connection the TCP sender Given that timestamps are enabled for a connection, the TCP sender
always stores the timestamp of the retransmit sent in the beginning always stores the timestamp of the retransmit sent in the beginning
of loss recovery, i.e., the timestamp of the timeout-based or the of loss recovery, i.e., the timestamp of the timeout-based retransmit
fast retransmit. If the timestamp of the first new ACK, arriving or the fast retransmit. If the timestamp of the first acceptable ACK,
after the mentioned retransmit was sent, is smaller then the stored arriving after the retransmit was sent, is smaller then the stored
timestamp of that retransmit, then that ACK must have been sent in timestamp of that retransmit, then that ACK must have been sent in
response to an original transmit. Hence, the TCP sender must have response to an original transmit. Hence, the TCP sender must have
entered loss recovery unnecessarily. entered loss recovery unnecessarily.
The fact that the Eifel detection algorithm decides upon this first The fact that the Eifel detection algorithm decides upon the first
new ACK is crucial to allow future response algorithms to avoid the acceptable ACK is crucial to allow future response algorithms to
spurious go-back-N retransmits that typically occur after a spurious avoid the spurious go-back-N retransmits that typically occur after a
timeout. Also, if loss recovery was entered unnecessarily, a window spurious timeout. Also, if loss recovery was entered unnecessarily, a
worth of ACKs are outstanding that all carry a timestamp that is window worth of ACKs are outstanding that all carry a timestamp that
smaller than the stored timestamp of the retransmit. The arrival of is smaller than the stored timestamp of the retransmit. The arrival
any one of those ACKs suffices the Eifel detection algorithm to work. of any one of those ACKs suffices the Eifel detection algorithm to
Hence, the solution is fairly robust against ACK losses. Even the ACK work. Hence, the solution is fairly robust against ACK losses. Even
sent in response to the retransmit, i.e., the one that carries the the ACK sent in response to the retransmit, i.e., the one that
stored timestamp, may get lost. carries the stored timestamp, may get lost.
Note: Also the DSACK option [RFC2883] can be used to detect a Note: Also the DSACK option [RFC2883] can be used to detect a
posteriori whether the TCP sender has entered loss recovery posteriori whether the TCP sender has entered loss recovery
unnecessarily. However, the first ACK carrying a DSACK option usually unnecessarily. However, the first ACK carrying a DSACK option
arrives at the TCP sender only after loss recovery has already usually arrives at the TCP sender only after loss recovery has
terminated. Thus, the DSACK option cannot be used to eliminate the already terminated. Thus, the DSACK option cannot be used to
retransmission ambiguity. Consequently, it cannot be used to avoid eliminate the retransmission ambiguity. Consequently, it cannot
the mentioned spurious go-back-N retransmits. Moreover, a DSACK-based be used to avoid the mentioned spurious go-back-N retransmits.
detection algorithm is less robust against ACK losses. Moreover, a DSACK-based detection algorithm is less robust
against ACK losses.
3.2 The Algorithm 3.2 The Algorithm
Given that the TCP Timestamps option [RFC1323] is enabled for a Given that the TCP Timestamps option [RFC1323] is enabled for a
connection, a TCP sender MAY use the Eifel detection algorithm as connection, a TCP sender MAY use the Eifel detection algorithm as
defined in this subsection. defined in this subsection.
If the Eifel detection algorithm is used, the following steps MUST be If the Eifel detection algorithm is used, the following steps MUST be
taken by the TCP sender only upon initiation of loss recovery, i.e., taken by the TCP sender, but only upon initiation of loss recovery,
when either the timeout-based retransmit or fast retransmit is sent. i.e., when either the timeout-based retransmit or the fast retransmit
Note: The Eifel detection algorithm should not be (re-)initiated is sent. The Eifel detection algorithm MUST NOT be reinitiated after
after loss recovery has already started. In particular, it may not be loss recovery has already started. In particular, it may not be
(re-)initiated upon subsequent timeouts for the same segment, and not reinitiated upon subsequent timeouts for the same segment, and not
upon retransmitting segments other than the oldest outstanding upon retransmitting segments other than the oldest outstanding
segment. segment, e.g., during selective loss recovery.
(1) Set a "SpuriousRecovery" variable to FALSE.
(2) Set a "RetransmitTS" variable to the value of the (1) Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option included Timestamp Value field of the Timestamps option included
in the retransmit. A TCP sender must ensure that in the retransmit sent when loss recovery is initiated. A
RetransmitTS does not get overwritten as loss recovery TCP sender must ensure that RetransmitTS does not get
progresses, e.g., in case of a second timeout and overwritten as loss recovery progresses, e.g., in case of
subsequent second retransmit of the same octet. a second timeout and subsequent second retransmit of the
same octet.
(3) Wait for the arrival of a valid ACK. If a valid ACK (2) Set a "SpuriousRecovery" variable to FALSE.
arrives, then proceed to step (4).
(4) If a first or second duplicate ACK has arrived, then (3) Wait for the arrival of an acceptable ACK. If an
return to step (3), else proceed to step (5). acceptable ACK has arrived, then proceed to step (4).
(5) If a third duplicate ACK has arrived and the loss (4) If the value of the Timestamp Echo Reply field of the
recovery had been initiated with a timeout-based acceptable ACK's Timestamps option is smaller than the
retransmit, then set SpuriousRecovery to SPUR_TO_DUPACK value of the variable RetransmitTS, then proceed to step
and proceed to step (DONE), else proceed to step (6). (5),
(6) If a new ACK has arrived and the Timestamp Echo Reply else proceed to step (DONE).
field of the ACK's Timestamps option is smaller than
RetransmitTS, then proceed to step (7), else proceed to
step (DONE).
(7) If the loss recovery had been initiated with a timeout- (5) If the loss recovery has been initiated with a timeout-
based retransmit then set SpuriousRecovery to based retransmit, then set
SPUR_TO_NEWACK, else set SpuriousRecovery to SPUR_FR. SpuriousRecovery <- SPUR_TO,
Proceed to step (DONE).
else set
SpuriousRecovery <- dupacks+1
(RESP) Do nothing (Placeholder for a response algorithm).
(DONE) No further processing. (DONE) No further processing.
The comparison operator "smaller than" in step (6) is conservative. The comparison "smaller than" in step (4) is conservative. In theory,
In theory, when the "timestamp clock" is slow or the network is fast, when the "timestamp clock" is slow or the network is fast,
RetransmitTS could at most be equal to the timestamp echoed by an ACK RetransmitTS could at most be equal to the timestamp echoed by an ACK
sent in response to an original transmit. In that case, it is assumed sent in response to an original transmit. In that case, it is assumed
that the loss recovery was not falsely triggered. that the loss recovery was not falsely triggered.
3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts 3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts
Not all TCP implementations strictly follow RFC1323. There are Not all TCP implementations strictly follow RFC1323. There are
differences concerning which timestamp a TCP receiver echoes when a differences concerning which timestamp a TCP receiver echoes when a
duplicate segment arrives. The relevant case to consider in this duplicate segment arrives. The relevant case to consider in this
context is a spurious timeout that occurred because an entire flight context is a spurious timeout that occurred because an entire flight
of ACKs got lost. (Recall the definition of a spurious timeout in of ACKs got lost (recall the definition of a spurious timeout from
Section 2.) The question is which timestamp the TCP receiver echoes Section 2). The question is which timestamp the TCP receiver echoes
in response to the spurious retransmit that typically arrives as a in response to the spurious retransmit that typically arrives as a
duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX
kernel 2.4) will echo the timestamp of the last segment that arrived kernel 2.4.x) will echo the timestamp of the last segment that
in-sequence, while some non-RFC1323-compliant TCP receivers will echo arrived in-sequence, while some non-RFC1323-compliant TCP receivers
the timestamp of the retransmit. will echo the timestamp of the retransmit.
The Eifel detection algorithm will only detect such spurious timeouts The Eifel detection algorithm will only detect such spurious timeouts
in case the TCP receiver is RFC1323-compliant in this respect. in case the TCP receiver is RFC1323-compliant in this respect.
Otherwise, the Eifel detection algorithm will simply not kick in.
This is not any different than from running the TCP sender without
the Eifel algorithm.
3.4 Protecting Against Misbehaving TCP Receivers 3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant)
A TCP receiver can easily make a genuine retransmit appear to the TCP A TCP receiver can easily make a genuine retransmit appear to the TCP
sender as a spurious retransmit by forging echoed timestamps. This sender as a spurious retransmit by forging echoed timestamps. This
may pose a security concern. may pose a security concern.
Fortunately, there is a way to modify the Eifel detection algorithm Fortunately, there is a way to modify the Eifel detection algorithm
in a way that makes it robust against lying TCP receivers. The idea in a way that makes it robust against lying TCP receivers. The idea
is to use timestamps as a "segment's secret" that the TCP receiver is to use timestamps as a "segment's secret" that the TCP receiver
only gets to know if it receives the segment. Conversely, a TCP only gets to know if it receives the segment. Conversely, a TCP
receiver will not know the timestamp of a segment that was lost. receiver will not know the timestamp of a segment that was lost.
Hence, to "prove" that it received the original transmit of a segment Hence, to "prove" that it received the original transmit of a segment
that the TCP sender retransmitted, the TCP receiver would need to that the TCP sender retransmitted, the TCP receiver would need to
return the timestamp of that original transmit. The Eifel detection return the timestamp of that original transmit. The Eifel detection
algorithm could then be modified to only decide that loss recovery algorithm could then be modified to only decide that loss recovery
had been unnecessarily entered if the first new ACK echoes the has been unnecessarily entered if the first acceptable ACK echoes the
timestamp of the original transmit. timestamp of the original transmit.
Hence, implementers may choose to implement the algorithm with the Hence, implementers may choose to implement the algorithm with the
following modifications. following modifications.
Step (2) is replaced with step (2'): Step (1) is replaced with step (1'):
(2') Set a "RetransmitTS" variable to the value of the (1') Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option that was Timestamp Value field of the Timestamps option that was
included in the original transmit corresponding to the included in the original transmit corresponding to the
retransmit. Note: This step requires that the TCP sender retransmit. Note: This step requires that the TCP sender
stores the timestamps of all outstanding original stores the timestamps of all outstanding original
transmits. transmits.
Step (4) is replaced with step (4'): Step (4) is replaced with step (4'):
(4') If a first or second duplicate ACK has arrived, then (4') If the value of the Timestamp Echo Reply field of the
proceed to step (DONE), else proceed to step (6). acceptable ACK's Timestamps option is equal to the value
of the variable RetransmitTS, then proceed to step (5),
Step (6) is replaced with step (6'):
(6') If a new ACK has arrived and the Timestamp Echo Reply else proceed to step (DONE).
field of the ACK's Timestamps option is equal to
RetransmitTS, then proceed to step (7), else proceed to
step (DONE).
These modifications come at a cost. First, spurious timeouts of type These modifications come at a cost: the modified algorithm is fairly
SPUR_TO_DUPACK cannot be detected any longer. Also, the modified sensitive against ACK losses since it relies on the arrival of the
algorithm is fairly sensitive against ACK losses since it relies on acceptable ACK that corresponds to the original transmit.
the arrival of the new ACK that corresponds to the original transmit.
Note: The first new ACK that arrives after loss recovery had been Note: The first acceptable ACK that arrives after loss recovery
unnecessarily entered, typically echoes the timestamp of the original has been unnecessarily entered, should echo the timestamp of the
transmit. This assumes that the ACK corresponding to the original original transmit. This assumes that the ACK corresponding to
transmit was not lost, and the TCP receiver does not forge timestamps the original transmit was not lost, that that ACK was not
but complies with RFC1323. In case of a spurious fast retransmit, reordered in the network, and that the TCP receiver does not
this is implied by the rules for generating ACKs for data segments forge timestamps but complies with RFC1323. In case of a
that fill in all or part of a gap in the sequence space (see section spurious fast retransmit, this is implied by the rules for
4.2 of [RFC2581]) and by the rules for echoing timestamps in that generating ACKs for data segments that fill in all or part of a
case (see rule (C) in section 3.4 of [RFC1323]). In case of a gap in the sequence space (see section 4.2 of [RFC2581]) and by
spurious timeout, it is likely that the delay that has caused the the rules for echoing timestamps in that case (see rule (C) in
spurious timeout has also caused the TCP receiver's delayed ACK timer section 3.4 of [RFC1323]). In case of a spurious timeout, it is
[RFC1122] to expire before the original transmit arrives. Also, in likely that the delay that has caused the spurious timeout has
this case the rules for generating ACKs and the rules for echoing also caused the TCP receiver's delayed ACK timer [RFC1122] to
timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure that the expire before the original transmit arrives. Also, in this case
original transmit's timestamp is echoed. the rules for generating ACKs and the rules for echoing
timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure
that the original transmit's timestamp is echoed.
A remaining problem is that a TCP receiver might guess a lost A remaining problem is that a TCP receiver might guess a lost
segment's timestamp from observing the timestamps of segments that segment's timestamp from observing the timestamps of segments that
arrived earlier. This could be avoided by having the TCP sender add a arrived earlier. This could be avoided by having the TCP sender add a
"random counter" to the timestamp of every segment it sends, and then "random counter" to the timestamp of every segment it sends, and then
increment that counter by a random value, e.g., between 1 and 100. increment that counter by a random value, e.g., between 1 and 100.
This ensures that timestamps remain monotonously increasing while This ensures that timestamps remain monotonously increasing while
making it difficult for a TCP receiver to guess the timestamp of a making it difficult for a TCP receiver to guess the timestamp of a
lost segment. lost segment.
4. Security Considerations 4. Security Considerations
There do not seem to be any security considerations associated with There do not seem to be any security considerations associated with
the Eifel detection algorithm. This is because the Eifel detection the Eifel detection algorithm. This is because the Eifel detection
algorithm does not alter existing protocol state at the TCP sender algorithm does not alter existing protocol state at the TCP sender
nor at the receiver. That is, no value of a state variable other than nor at the receiver. That is, no value of a TCP sender's state
one introduced by the algorithm itself (SpuriousRecovery, and variable is changed.
RetransmitTS) is changed.
Moreover, a variant of the Eifel detection algorithm has been Moreover, a variant of the Eifel detection algorithm has been
proposed in Section 3.4 that makes it robust against lying TCP proposed in Section 3.4 that makes it robust against lying TCP
receivers. receivers.
Acknowledgments Acknowledgments
Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
Sarolahti, and Alexey Kuznetsov for useful discussions that Sarolahti, and Alexey Kuznetsov for useful discussions that
contributed to this work. contributed to this work.
References References
[RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
RFC 2581, April 1999. RFC 2581, April 1999.
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
Requirement Levels, RFC 2119, March 1997. Requirement Levels, RFC 2119, March 1997.
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
Performance, RFC 1323, May 1992.
[RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
Fast Recovery Algorithm, RFC 2582, April 1999. Fast Recovery Algorithm, RFC 2582, April 1999.
[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
An Extension to the Selective Acknowledgement (SACK) Option An Extension to the Selective Acknowledgement (SACK) Option
for TCP, RFC 2883, July 2000. for TCP, RFC 2883, July 2000.
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In
Proceedings of IFIP Personal Wireless Communications, Proceedings of IFIP Personal Wireless Communications,
August 2001. August 2001.
[Jac88] V. Jacobson, Congestion Avoidance and Control, In
Proceedings of ACM SIGCOMM 88.
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
Performance, RFC 1323, May 1992.
[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates
in Reliable Transport Protocols, In Proceedings of ACM in Reliable Transport Protocols, In Proceedings of ACM
SIGCOMM 87. SIGCOMM 87.
[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
Robust Against Spurious Retransmissions, ACM Computer Robust Against Spurious Retransmissions, ACM Computer
Communication Review, Vol. 30, No. 1, January 2000, Communication Review, Vol. 30, No. 1, January 2000,
available at http://www.acm.org/sigcomm/ccr/archive/2000/ available at http://www.acm.org/sigcomm/ccr/archive/2000/
jan00/ccr-200001-ludwig.html (easier studied when jan00/ccr-200001-ludwig.html (easier studied when
viewed/printed in color). viewed/printed in color).
[LG02] R. Ludwig, A. Gurtov, The Eifel Response Algorithm for TCP,
work in progress, July 2002.
[Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in
progress, July 2002.
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
RFC 2988, November 2000. RFC 2988, November 2000.
[RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981.
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September [RFC793] J. Postel, Transmission Control Protocol, RFC793, September
1981. 1981.
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
(The Implementation), Addison Wesley, January 1995. (The Implementation), Addison Wesley, January 1995.
Author's Address Author's Address
Reiner Ludwig Reiner Ludwig
Ericsson Research (EED) Ericsson Research
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Phone: +49 2407 575 719
Fax: +49 2407 575 400
Email: Reiner.Ludwig@eed.ericsson.se Email: Reiner.Ludwig@eed.ericsson.se
Michael Meyer Michael Meyer
Ericsson Research (EED) Ericsson Research
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Phone: +49 2407 575 654
Fax: +49 2407 575 400
Email: Michael.Meyer@eed.ericsson.se Email: Michael.Meyer@eed.ericsson.se
This Internet-Draft expires in August 2002. This Internet-Draft expires in January 2003.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/