draft-ietf-tsvwg-tcp-eifel-alg-02.txt   draft-ietf-tsvwg-tcp-eifel-alg-03.txt 
Network Working Group Reiner Ludwig Network Working Group Reiner Ludwig
INTERNET-DRAFT Ericsson Research INTERNET-DRAFT Michael Meyer
Expires: May 2002 November, 2001 Expires: August 2002 Ericsson Research
February, 2002
The Eifel Algorithm for TCP The Eifel Detection Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-alg-02.txt> <draft-ietf-tsvwg-tcp-eifel-alg-03.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 1, line 32 skipping to change at page 1, line 33
material or cite them other than as "work in progress". material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/lid-abstracts.txt http://www.ietf.org/ietf/lid-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Abstract Abstract
A solution to eliminate the retransmission ambiguity in TCP, is to The Eifel detection algorithm allows the TCP sender to detect a
mark ACKs with a special retransmit-marker. The marker would need to posteriori whether it has entered loss recovery unnecessarily. It
be present in those ACKs, and only those ACKs, that the TCP receiver already determines this in the beginning of loss recovery when the
sends in response to retransmits; both genuine and spurious first new ACK arrives after the timeout-based or fast retransmit has
retransmits. Based on such a retransmit-marker, the Eifel algorithm been sent. The algorithm requires that the TCP Timestamps option
allows the TCP sender to detect a posteriori that a fast retransmit defined in RFC1323 is enabled for a connection. The idea is to use
or a timeout was spurious. Three alternative retransmit-markers are the timestamps echoed in returning ACKs to eliminate the
defined in this document, and the Eifel algorithm may be based on retransmission ambiguity in TCP. The Eifel detection algorithm
either one of them: the TCP RXT flag, the TCP Timestamps option, and provides a basis for future TCP enhancements. This includes response
the TCP SACK option. The Eifel algorithm provides a basis for future algorithms to back out of loss recovery by restoring a TCP sender's
TCP enhancements such as response schemes that may change a TCP congestion control state.
sender's protocol state to improve end-to-end performance.
Terminology Terminology
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119]. document, are to be interpreted as described in [RFC2119].
We use the term 'new ACK' to refer to an ACK that acknowledges We use the term 'new ACK' to refer to an ACK that acknowledges
outstanding data. We use the term 'duplicate' ACK as defined in previously unacknowledged data. We use the term 'duplicate ACK' as
[WS95]. Furthermore, we refer to the first-time transmission of a defined in [WS95]. Furthermore, we refer to the first-time
data segment as the 'original transmit', and 'HighData' is the transmission of a data segment as the 'original transmit'.
highest sequence number transmitted at a given point.
1. Introduction 1. Introduction
The retransmission ambiguity problem [KP87] is the TCP sender's The retransmission ambiguity problem [KP87] is the TCP sender's
inability to distinguish whether the first new ACK that arrives after inability to distinguish whether the first new ACK that arrives after
a retransmit was sent in response to the original transmit or the a retransmit was sent in response to the original transmit or the
retransmit. A solution to eliminate the retransmission ambiguity is retransmit. This problem occurs after a timeout-based retransmit and
to mark ACKs with a special "retransmit-marker". The marker would after a fast retransmit. The Eifel detection algorithm uses the TCP
need to be present in those ACKs, and only those ACKs, that the TCP Timestamps option defined in RFC1323 to eliminate the retransmission
receiver sends in response to retransmits; both genuine and spurious ambiguity. It thereby allows the TCP sender to detect a posteriori
retransmits. whether it has entered loss recovery unnecessarily.
Based on such a retransmit-marker, the Eifel algorithm allows the TCP This added capability of the TCP sender is useful in environments
sender to detect a posteriori that a fast retransmit or a timeout was where TCP's loss recovery and congestion control algorithms may often
spurious. This added capability of the TCP sender is useful in get falsely triggered. This can be caused by packet reordering,
environments where TCP's loss recovery and congestion control packet duplication, the loss of a flight of ACKs, or a sudden delay
algorithms may often get falsely triggered. Packet reordering, packet increase in the data or the ACK path that results in a spurious
duplication, or a sudden delay increase in the data or the ACK path timeout. For example, such sudden delay increases can often occur in
that results in a spurious timeout can cause this. For example, such wide-area wireless access networks due to handovers, resource
sudden delay increases can often occur in wide-area wireless access preemption, or because the mobile transmitter traverses through a
networks due to handovers, resource preemption, or because the mobile radio coverage hole (e.g., see [Gu01]).
transmitter traverses through a radio coverage hole (e.g., see
[Gu01]).
Based on the Eifel algorithm, the TCP sender may then choose to Based on the Eifel detection algorithm, the TCP sender may then
implement dedicated response schemes. One goal of such a scheme would choose to implement dedicated response algorithms. One goal of such a
be to alleviate the consequences of a falsely triggered loss recovery response algorithm would be to alleviate the consequences of a
phase. For example, an important feature of the scheme proposed in falsely triggered loss recovery. This may include restoring the TCP
[LG01] is that it avoids the spurious go-back-N retransmits that sender's congestion control state, and/or avoiding the often
typically occur after a spurious timeout. Another goal would be to unnecessary go-back-N retransmits that typically occur after a
"upgrade" the congestion control parameters, the congestion window spurious timeout. Another goal would be to adapt protocol parameters
and slow start threshold [RFC2581]. This is to compensate for the such as the duplicate acknowledgement threshold [RFC2581], and/or the
unnecessary reduction of their values when the loss recovery was RTT estimators [RFC2988]. This is to reduce the risk of falsely
falsely triggered. Yet, another goal would be to adapt other protocol triggering TCP's loss recovery again as the connection progresses.
parameters, e.g., the duplicate acknowledgement threshold [RFC2581], However, such response algorithms are outside the scope of this
and the RTT estimators [RFC2988]. This is to reduce the risk of document. Note: The original proposal, the "Eifel algorithm" [LK00],
falsely triggering TCP's loss recovery again in the future. However, comprises both a detection and a response algorithm. This document
such response schemes are outside the scope of this document. They only defines the detection part.
are addressed in different documents (e.g., see [LG01] and [BA01a]).
A key feature of the Eifel algorithm is that it already detects upon A key feature of the Eifel detection algorithm is that it already
the first new ACK that arrives during a loss recovery phase that a detects upon the first new ACK that arrives during loss recovery
fast retransmit or a timeout was spurious. This is crucial to be able whether a fast retransmit or a timeout was spurious. This is crucial
to avoid the mentioned spurious go-back-N retransmits. Another to be able to avoid the mentioned go-back-N retransmits. Another
feature is that the Eifel algorithm is fairly robust against ACK feature is that the Eifel detection algorithm is fairly robust
losses. Especially, the loss of the single ACK carrying the against the loss of ACKs.
retransmit-marker does not affect the functioning of the Eifel
algorithm.
2. Spurious Retransmits 2. Events that Falsely Trigger TCP Loss Recovery
The following events falsely trigger a TCP sender's loss recovery and The following events falsely trigger a TCP sender's loss recovery and
congestion control algorithms. This causes a so-called spurious congestion control algorithms. This causes a so-called spurious
retransmit, and an unnecessary reduction of the TCP sender's retransmit, and an unnecessary reduction of the TCP sender's
congestion window. congestion window and slow start threshold [RFC2581].
- Spurious timeouts - Spurious timeout
- Packet reordering - Packet reordering
- Packet duplication - Packet duplication
A spurious timeout is a timeout that would not have occurred had the The common understanding of a spurious timeout is a timeout that
sender "waited longer". It may be caused by increased delay that would not have occurred had the sender "waited longer". This may be
suddenly occurs in the data or the ACK path. This in turn might cause caused by increased delay that suddenly occurs in the data and/or the
an ACK to arrive too late, i.e., only after the TCP sender's ACK path. That in turn might cause an ACK to arrive too late, i.e.,
retransmission timer has expired. This would then trigger a spurious only after the TCP sender's retransmission timer has expired. Note
retransmit. Note that the mentioned ACK could either be a new ACK that such a late ACK could either be a new ACK that would acknowledge
that would acknowledge the oldest outstanding segment, or it could be the oldest outstanding segment, or it could be the third duplicate
the third duplicate ACK that would have triggered a fast retransmit ACK that would trigger a fast retransmit of the oldest outstanding
of the oldest outstanding segment [RFC2581]. Note: In the latter case segment [RFC2581]. For the purpose of specifying the algorithm in
the sender should suppress the fast retransmit that the third Section 3, we define the former case as SPUR_TO_NEWACK, and the
duplicate ACK would usually trigger. In general, a TCP sender should latter as SPUR_TO_DUPACK. However, in this document we stretch the
ignore all duplicate ACKs that arrive after a timeout [GL01]. definition of a spurious timeout. We also speak of a spurious timeout
when the timeout occurred because an entire flight of ACKs was lost
while in fact the original transmit of the oldest outstanding segment
had arrived at the TCP receiver. This case is also considered to fall
under the definition of SPUR_TO_NEWACK. Even though such a timeout is
certainly unavoidable, the triggering of loss recovery was spurious
since that original transmit was not lost. Hence, we use the term
'spurious timeout' in the sense that entering the associated loss
recovery was unnecessary. Yet, we exclude from the definition of a
spurious timeout the case where the retransmit arrives at the TCP
receiver before the corresponding original transmit.
Packet reordering in the network may occur because IP [RFC791] is a Packet reordering in the network may occur because IP [RFC791] does
connection-less protocol. Thus, IP does not guarantee an in-order not guarantee in-order delivery of packets. Additionally, a TCP
delivery of packets. This results in a spurious fast retransmit if receiver generates a duplicate ACK for each segment that arrives out-
three or more data segments arrive out-of-order at the TCP receiver, of-order. This results in a spurious fast retransmit if three or more
and at least three of the resulting duplicate ACKs arrive at the TCP data segments arrive out-of-order at the TCP receiver, and at least
sender. This is because a TCP receiver generates a duplicate ACK for three of the resulting duplicate ACKs arrive at the TCP sender. We
each segment that arrives out-of-order, and three consecutive define such a case as SPUR_FR.
duplicate ACKs trigger the TCP sender's fast retransmit algorithm.
Packet duplication in the network may also occur because IP is Packet duplication may occur because a receiving IP does not (cannot)
connection-less. That is, the receiving IP does not (cannot) remove remove packets that have been duplicated in the network. A TCP
duplicate packets. A TCP receiver in turn also generates a duplicate receiver in turn also generates a duplicate ACK for each duplicate
ACK for each duplicate segment. As with packet reordering, this segment. As with packet reordering, this results in a spurious fast
results in a spurious fast retransmit if duplication of data segments retransmit if duplication of data segments or ACKs results in three
or ACKs results in three or more duplicate ACKs to arrive at the TCP or more duplicate ACKs to arrive at the TCP sender. This case is also
sender. considered to fall under the definition of SPUR_FR.
The negative impact on TCP performance caused by packet reordering The negative impact on TCP performance caused by packet reordering
and packet duplication is commonly the same: a single spurious and packet duplication is commonly the same: a single spurious
retransmit (the fast retransmit), and the unnecessary halving of the retransmit (the fast retransmit), and the unnecessary halving of the
TCP sender's congestion window as a result of the subsequent fast TCP sender's congestion window as a result of the subsequent fast
recovery phase [RFC2581]. recovery phase [RFC2581].
The negative impact on TCP performance caused by a spurious timeout The negative impact on TCP performance caused by a spurious timeout
is more severe. First, the timeout event itself causes a single is more severe. First, the timeout event itself causes a single
spurious retransmit, and unnecessarily forces the TCP sender into spurious retransmit, and unnecessarily forces the TCP sender into
skipping to change at page 4, line 27 skipping to change at page 4, line 30
reaction gets triggered that further decreases TCP's performance. reaction gets triggered that further decreases TCP's performance.
Since the timeout was spurious, at least some ACKs for original Since the timeout was spurious, at least some ACKs for original
transmits will typically arrive at the TCP sender before the ACK for transmits will typically arrive at the TCP sender before the ACK for
the retransmit arrives. (This is unless severe packet reordering the retransmit arrives. (This is unless severe packet reordering
coincided with the spurious timeout in such a way that the ACK for coincided with the spurious timeout in such a way that the ACK for
the retransmit is the first new ACK to arrive at the TCP sender.) the retransmit is the first new ACK to arrive at the TCP sender.)
Those ACKs for original transmits will then trigger an implicit go- Those ACKs for original transmits will then trigger an implicit go-
back-N loss recovery at the TCP sender. Assuming that none of the back-N loss recovery at the TCP sender. Assuming that none of the
outstanding segments and none of the corresponding ACKs were lost, outstanding segments and none of the corresponding ACKs were lost,
all outstanding segments will get retransmitted unnecessarily. all outstanding segments will get retransmitted unnecessarily.
Moreover, this will then typically cause a spurious fast retransmit. Moreover, in some TCPs this may then cause a spurious fast
This is because the spurious go-back-N retransmits will arrive as retransmit. This is because the spurious go-back-N retransmits will
duplicates at the TCP receiver, which in turn triggers a series of arrive as duplicates at the TCP receiver, which in turn triggers a
duplicate ACKs. Note that this last spurious fast retransmit could be series of duplicate ACKs. Note that this last spurious fast
avoided with the careful variant of 'bug fix' [RFC2582]. retransmit could be avoided with the careful variant of 'bug fix'
[RFC2582].
More detailed explanations including TCP trace plots that visualize More detailed explanations including TCP trace plots that visualize
the effects of packet reordering and spurious timeouts can be found the effects of spurious timeouts and packet reordering can be found
in [LK00]. in [LK00].
3. The Eifel Algorithm 3. The Eifel Detection Algorithm
3.1 The Idea 3.1 The Idea
As mentioned before, a solution to eliminate the retransmission The goal of the Eifel detection algorithm is to allow the TCP sender
ambiguity in TCP, is to mark ACKs with a special retransmit-marker. to detect a posteriori whether it has entered loss recovery
The marker would need to be present in those ACKs, and only those unnecessarily. Furthermore, the TCP sender should be able to make
ACKs, that the TCP receiver sends in response to retransmits; both this decision upon the first new ACK that arrives after the timeout-
genuine and spurious retransmits. Based on such a retransmit-marker, based or fast retransmit has been sent. This in turn requires extra
the Eifel algorithm allows the TCP sender to detect a posteriori that information in every ACK by which the TCP sender can unambiguously
a fast retransmit or a timeout was spurious. distinguish whether that first new ACK was sent in response to the
original transmit or the retransmit. Such extra information is
provided by the TCP Timestamps option [RFC1323]. Generally speaking,
timestamps are monotonously increasing "serial numbers" added into
every segment that are then echoed within the corresponding ACKs.
This is exploited by the Eifel detection algorithm in the following
way.
[Note: The Eifel algorithm as defined here is a pure detection Given that timestamps are enabled for a connection the TCP sender
mechanism. The original proposal of the Eifel algorithm [LK00] always stores the timestamp of the retransmit sent in the beginning
included the TCP sender's response to a detected spurious of loss recovery, i.e., the timestamp of the timeout-based or the
retransmit, and a marking scheme that allows a TCP sender to fast retransmit. If the timestamp of the first new ACK, arriving
distinguish an ACK for an original transmit from that of a after the mentioned retransmit was sent, is smaller then the stored
retransmit. Such response and marking schemes are addressed in timestamp of that retransmit, then that ACK must have been sent in
different documents [RFC2883], [BA01a], [BA01b], [LM01], response to an original transmit. Hence, the TCP sender must have
[LG01].] entered loss recovery unnecessarily.
The key idea of the Eifel algorithm is to let the TCP sender take the The fact that the Eifel detection algorithm decides upon this first
absence of a retransmit-marker in the first new ACK that arrives new ACK is crucial to allow future response algorithms to avoid the
after a retransmit as sufficient evidence that the loss recovery spurious go-back-N retransmits that typically occur after a spurious
phase was falsely triggered. Being able to make this decision upon timeout. Also, if loss recovery was entered unnecessarily, a window
the first new ACK is crucial for response schemes such as [LG01] to worth of ACKs are outstanding that all carry a timestamp that is
avoid the spurious go-back-N retransmits that typically occur after a smaller than the stored timestamp of the retransmit. The arrival of
spurious timeout. any one of those ACKs suffices the Eifel detection algorithm to work.
Hence, the solution is fairly robust against ACK losses. Even the ACK
sent in response to the retransmit, i.e., the one that carries the
stored timestamp, may get lost.
However, making this decision upon the first new ACK is not Note: Also the DSACK option [RFC2883] can be used to detect a
absolutely reliable. A counter-example can be constructed where this posteriori whether the TCP sender has entered loss recovery
approach fails: unnecessarily. However, the first ACK carrying a DSACK option usually
arrives at the TCP sender only after loss recovery has already
terminated. Thus, the DSACK option cannot be used to eliminate the
retransmission ambiguity. Consequently, it cannot be used to avoid
the mentioned spurious go-back-N retransmits. Moreover, a DSACK-based
detection algorithm is less robust against ACK losses.
In case the original transmit was in fact dropped in the 3.2 The Algorithm
network, a new ACK that arrives after a retransmit would also
not carry a retransmit-marker if (1) the retransmit arrived at
the TCP receiver in sequence, i.e., if it had jumped ahead of
all data segments that were outstanding when the retransmit was
sent, and if (2) the ACK for the retransmit carrying the
retransmit-marker got lost. In that case the mentioned new ACK
would correspond to one of the data segments that were
outstanding when the retransmit was sent. Note: This example
holds independent of whether the loss recovery phase was
triggered by the arrival of the third duplicate ACK or by a
timeout. Note also: in case the SACK option is used as the
retransmit-marker (see Section 3.2 and 3.4), condition (2) is
not required.
Nevertheless, this counter-example is regarded as being a rather Given that the TCP Timestamps option [RFC1323] is enabled for a
pathological case. In addition, it seems to be the only conceivable connection, a TCP sender MAY use the Eifel detection algorithm as
counter-example. Therefore, it is regarded as sufficiently safe to defined in this subsection.
take the absence of a retransmit-marker in the first new ACK that
arrives after a retransmit as the signal that the TCP sender falsely
entered the loss recovery phase.
This approach makes the Eifel algorithm fairly robust against ACK If the Eifel detection algorithm is used, the following steps MUST be
losses. This is because a number of ACKs that do not carry the taken by the TCP sender only upon initiation of loss recovery, i.e.,
retransmit-marker are commonly in flight during a falsely triggered when either the timeout-based retransmit or fast retransmit is sent.
loss recovery phase. The arrival of at least one of those ACKs would Note: The Eifel detection algorithm should not be (re-)initiated
be sufficient for the Eifel algorithm to make a decision. Also, the after loss recovery has already started. In particular, it may not be
loss of the single ACK carrying the retransmit-marker does not affect (re-)initiated upon subsequent timeouts for the same segment, and not
the functioning of the Eifel algorithm. upon retransmitting segments other than the oldest outstanding
segment.
Three alternative retransmit-markers are defined in this document, (1) Set a "SpuriousRecovery" variable to FALSE.
and the Eifel algorithm may be based on either one of them:
(1) the TCP RXT flag [LM01], (2) Set a "RetransmitTS" variable to the value of the
(2) the TCP Timestamps option [RFC1323], and Timestamp Value field of the Timestamps option included
in the retransmit. A TCP sender must ensure that
RetransmitTS does not get overwritten as loss recovery
progresses, e.g., in case of a second timeout and
subsequent second retransmit of the same octet.
(3) the TCP SACK option [RFC2018]. (3) Wait for the arrival of a valid ACK. If a valid ACK
arrives, then proceed to step (4).
The exact use of those markers is specified in the following (4) If a first or second duplicate ACK has arrived, then
subsections. The RXT flag is the most efficient retransmit-marker return to step (3), else proceed to step (5).
since it does not enlarge TCP segments or ACKs by an option. Unlike
for the RXT flag and the Timestamps option, the SACK option has the
drawback that it can only detect whether a fast retransmit was
spurious. It cannot be used to detect spurious timeouts as explained
in Section 3.4.
Note: The DSACK option [RFC2883] is not an appropriate retransmit- (5) If a third duplicate ACK has arrived and the loss
marker that would allow to eliminate the retransmission ambiguity. recovery had been initiated with a timeout-based
The reason is that the first new ACK that arrives after a retransmit retransmit, then set SpuriousRecovery to SPUR_TO_DUPACK
will commonly not carry a DSACK option. That is, the DACK option and proceed to step (DONE), else proceed to step (6).
commonly arrives one or more ACKs later than the first new ACK for a
retransmit. This is independent of whether the retransmit was genuine
or spurious.
Given that both the TCP sender and receiver support the RXT flag, or (6) If a new ACK has arrived and the Timestamp Echo Reply
the Timestamps option, or the SACK option, a TCP sender MAY implement field of the ACK's Timestamps option is smaller than
the Eifel algorithm as defined in the following subsections. RetransmitTS, then proceed to step (7), else proceed to
step (DONE).
3.2 The Algorithm (7) If the loss recovery had been initiated with a timeout-
based retransmit then set SpuriousRecovery to
SPUR_TO_NEWACK, else set SpuriousRecovery to SPUR_FR.
Proceed to step (DONE).
The following steps MUST be taken by the TCP sender when the third (DONE) No further processing.
duplicate ACK arrives or a timeout occurs, i.e., when a loss recovery
phase starts:
(1) Set a "SpuriousRecovery" variable to 'false'. If this The comparison operator "smaller than" in step (6) is conservative.
variable is true at step (7) below, the loss recovery In theory, when the "timestamp clock" is slow or the network is fast,
phase had been falsely triggered. RetransmitTS could at most be equal to the timestamp echoed by an ACK
sent in response to an original transmit. In that case, it is assumed
that the loss recovery was not falsely triggered.
(2) Set a "RecoveryPoint" variable to HighData. When the TCP 3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts
sender receives the first new ACK for this data octet the
loss recovery phase is terminated.
(3-TS) This step only applies when the Timestamps option is used Not all TCP implementations strictly follow RFC1323. There are
as the retransmit-marker: Set a "RetransmitTS" variable differences concerning which timestamp a TCP receiver echoes when a
to the value of the Timestamp Value field of the duplicate segment arrives. The relevant case to consider in this
Timestamps option included in the first retransmit. A TCP context is a spurious timeout that occurred because an entire flight
sender MUST ensure that RetransmitTS does not get of ACKs got lost. (Recall the definition of a spurious timeout in
overwritten as the loss recovery phase progresses, e.g., Section 2.) The question is which timestamp the TCP receiver echoes
in case of a second timeout and subsequent second in response to the spurious retransmit that typically arrives as a
retransmit of the same octet. duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX
kernel 2.4) will echo the timestamp of the last segment that arrived
in-sequence, while some non-RFC1323-compliant TCP receivers will echo
the timestamp of the retransmit.
The following steps MUST be taken by the TCP sender when the first The Eifel detection algorithm will only detect such spurious timeouts
new ACK arrives after the loss recovery phase started: in case the TCP receiver is RFC1323-compliant in this respect.
(4) If the ACK's sequence number is greater than the value of 3.4 Protecting Against Misbehaving TCP Receivers
RecoveryPoint , proceed to step (7) below. This condition
ensures that only those ACKs are considered that
correspond to segments that were outstanding at the time
the retransmit was sent.
(5-SACK) This step only applies when the SACK option is used as A TCP receiver can easily make a genuine retransmit appear to the TCP
the retransmit-marker: If the ACK's sequence number is sender as a spurious retransmit by forging echoed timestamps. This
equal to the value of RecoveryPoint, proceed to step (7) may pose a security concern.
below. This step is motivated in Section 3.4.
(6-RXT) This step only applies when the RXT flag is used as the Fortunately, there is a way to modify the Eifel detection algorithm
retransmit-marker: If the ACK does not have the RXT flag in a way that makes it robust against lying TCP receivers. The idea
set (binary 1), set SpuriousRecovery to 'true' and is to use timestamps as a "segment's secret" that the TCP receiver
proceed to step (7) below. only gets to know if it receives the segment. Conversely, a TCP
receiver will not know the timestamp of a segment that was lost.
Hence, to "prove" that it received the original transmit of a segment
that the TCP sender retransmitted, the TCP receiver would need to
return the timestamp of that original transmit. The Eifel detection
algorithm could then be modified to only decide that loss recovery
had been unnecessarily entered if the first new ACK echoes the
timestamp of the original transmit.
(6-TS) This step only applies when the Timestamps option is used Hence, implementers may choose to implement the algorithm with the
as the retransmit-marker: If the value of the Timestamp following modifications.
Echo Reply field of the ACK's Timestamps option is
smaller than RetransmitTS, set SpuriousRecovery to 'true'
and proceed to step (7) below. This step is commented in
Section 3.3.
(6-SACK) This step only applies when the SACK option is used as Step (2) is replaced with step (2'):
the retransmit-marker, and only if the loss recovery
phase had been triggered by the arrival of the third
duplicate ACK: If the ACK does not carry a SACK option,
set SpuriousRecovery to 'true' and proceed to step (7)
below. This step is motivated in Section 3.4.
(7) Done. No further processing. (2') Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option that was
included in the original transmit corresponding to the
retransmit. Note: This step requires that the TCP sender
stores the timestamps of all outstanding original
transmits.
3.3 Comments to the Timestamp-based Detection Step (4) is replaced with step (4'):
The comparison operator "smaller than" in step (6-TS) is (4') If a first or second duplicate ACK has arrived, then
conservative. In theory, when the "timestamp clock" is slow or the proceed to step (DONE), else proceed to step (6).
network is fast, RetransmitTS could at most be equal to the timestamp
echoed by an ACK sent in response to an original transmit. In that
case, it is assumed that the loss recovery was not falsely triggered.
3.4 Comments to the SACK-based Detection Step (6) is replaced with step (6'):
The SACK option is not a retransmit-marker in the sense that it is (6') If a new ACK has arrived and the Timestamp Echo Reply
present only in those ACKs that the TCP receiver sends in response to field of the ACK's Timestamps option is equal to
retransmits (see Section 3.1). This is why the SACK option cannot be RetransmitTS, then proceed to step (7), else proceed to
used to detect that a timeout was spurious. For this, we only need to step (DONE).
consider the case where an entire flight of segments was lost versus
the case of a spurious timeout. Assuming that packet reordering did
not concur, the first new ACK arriving after the retransmit would not
carry a SACK option in either case. Thus, the retransmission
ambiguity problem would still exist.
Nevertheless, the SACK option can be used to detect that a fast These modifications come at a cost. First, spurious timeouts of type
retransmit was spurious. Let's assume a SACK-enabled TCP receiver. If SPUR_TO_DUPACK cannot be detected any longer. Also, the modified
only a single segment was in fact lost, then the first new ACK that algorithm is fairly sensitive against ACK losses since it relies on
arrives after the fast retransmit will not carry a SACK option, and the arrival of the new ACK that corresponds to the original transmit.
will acknowledge RecoveryPoint. If multiple segments were in fact
lost, then the first new ACK that arrives after the fast retransmit
will carry a SACK option, and will acknowledge below RecoveryPoint,
i.e., the ACK will be partial. Thus, partial ACKs that do not carry a
SACK option are impossible independent of how many segments were
lost. Conversely, the first new ACK after the fast retransmit that is
a partial ACK and that does not carry a SACK option, signals that the
loss recovery phase was falsely triggered. This motivates steps (5-
SACK) and (6-SACK) in Section 3.2.
Note: The SACK option cannot be used to detect that a fast retransmit Note: The first new ACK that arrives after loss recovery had been
was spurious when the entire flight of segments jumps ahead of the unnecessarily entered, typically echoes the timestamp of the original
first segment (the oldest outstanding segment) of that flight. This transmit. This assumes that the ACK corresponding to the original
is because the resulting new ACK would not carry a SACK option but transmit was not lost, and the TCP receiver does not forge timestamps
would acknowledge RecoveryPoint. Thus, step (5-SACK) in Section 3.2 but complies with RFC1323. In case of a spurious fast retransmit,
would become effective. this is implied by the rules for generating ACKs for data segments
that fill in all or part of a gap in the sequence space (see section
4.2 of [RFC2581]) and by the rules for echoing timestamps in that
case (see rule (C) in section 3.4 of [RFC1323]). In case of a
spurious timeout, it is likely that the delay that has caused the
spurious timeout has also caused the TCP receiver's delayed ACK timer
[RFC1122] to expire before the original transmit arrives. Also, in
this case the rules for generating ACKs and the rules for echoing
timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure that the
original transmit's timestamp is echoed.
A remaining problem is that a TCP receiver might guess a lost
segment's timestamp from observing the timestamps of segments that
arrived earlier. This could be avoided by having the TCP sender add a
"random counter" to the timestamp of every segment it sends, and then
increment that counter by a random value, e.g., between 1 and 100.
This ensures that timestamps remain monotonously increasing while
making it difficult for a TCP receiver to guess the timestamp of a
lost segment.
4. Security Considerations 4. Security Considerations
There do not seem to be any security considerations associated with There do not seem to be any security considerations associated with
the Eifel algorithm itself. This is because the Eifel algorithm is the Eifel detection algorithm. This is because the Eifel detection
only a detection scheme that is not tied to any specific action that algorithm does not alter existing protocol state at the TCP sender
might alter existing protocol state at the TCP sender or receiver. nor at the receiver. That is, no value of a state variable other than
That is, no value of a state variable other than one introduced by one introduced by the algorithm itself (SpuriousRecovery, and
the algorithm itself (SpuriousRecovery, RecoverPoint, and
RetransmitTS) is changed. RetransmitTS) is changed.
However, security considerations might exist for response schemes Moreover, a variant of the Eifel detection algorithm has been
that use the Eifel algorithm as a basis. In particular, it needs to proposed in Section 3.4 that makes it robust against lying TCP
be considered that the TCP receiver might by lying about the receivers.
retransmit-marker.
Acknowledgments Acknowledgments
Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, and Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
Andrei Gurtov for very useful discussions that contributed to this Sarolahti, and Alexey Kuznetsov for useful discussions that
work. contributed to this work.
References References
[RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
RFC 2581, April 1999. RFC 2581, April 1999.
[BA01a] E. Blanton, M. Allman, Adjusting the Duplicate ACK
Threshold to Avoid Spurious Retransmits, work in progress,
July 2001.
[BA01b] E. Blanton, M. Allman, Using TCP DSACKs and SCTP Duplicate
TSNs to Detect Spurious Retransmissions, work in progress,
August 2001.
[RFC1122] R. Braden, Requirements for Internet Hosts - Communication
Layers, RFC 1122, October 1989.
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
Requirement Levels, RFC 2119, March 1997. Requirement Levels, RFC 2119, March 1997.
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
Performance, RFC 1323, May 1992. Performance, RFC 1323, May 1992.
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective
Acknowledgement Options, RFC 2018, October 1996.
[RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
Fast Recovery Algorithm, RFC 2582, April 1999. Fast Recovery Algorithm, RFC 2582, April 1999.
[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
An Extension to the Selective Acknowledgement (SACK) Option An Extension to the Selective Acknowledgement (SACK) Option
for TCP, RFC 2883, July 2000. for TCP, RFC 2883, July 2000.
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In
Proceedings of IFIP Personal Wireless Communications, Proceedings of IFIP Personal Wireless Communications,
August 2001. August 2001.
[GL01] A. Gurtov, R. Ludwig, Making TCP Robust Against Delay
Spikes, work in progress, November 2001.
[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates
in Reliable Transport Protocols, In Proceedings of ACM in Reliable Transport Protocols, In Proceedings of ACM
SIGCOMM 87. SIGCOMM 87.
[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
Robust Against Spurious Retransmissions, ACM Computer Robust Against Spurious Retransmissions, ACM Computer
Communication Review, Vol. 30, No. 1, January 2000, Communication Review, Vol. 30, No. 1, January 2000,
available at http://www.acm.org/sigcomm/ccr/archive/2000/ available at http://www.acm.org/sigcomm/ccr/archive/2000/
jan00/ccr-200001-ludwig.html (easier studied when jan00/ccr-200001-ludwig.html (easier studied when
viewed/printed in color). viewed/printed in color).
[LM01] R. Ludwig, M. Meyer, The TCP Retransmit (RXT) Flag, work in
progress, November 2001.
[LG01] R. Ludwig, A. Gurtov, Responding to Spurious Timeouts in
TCP, work in progress, November 2001.
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
RFC 2988, November 2000. RFC 2988, November 2000.
[RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981.
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September [RFC793] J. Postel, Transmission Control Protocol, RFC793, September
1981. 1981.
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
(The Implementation), Addison Wesley, January 1995. (The Implementation), Addison Wesley, January 1995.
Author's Address Author's Address
Reiner Ludwig Reiner Ludwig
Ericsson Research (EED) Ericsson Research (EED)
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Phone: +49 2407 575 719 Phone: +49 2407 575 719
Fax: +49 2407 575 400 Fax: +49 2407 575 400
Email: Reiner.Ludwig@Ericsson.com Email: Reiner.Ludwig@eed.ericsson.se
This Internet-Draft expires in May 2002. Michael Meyer
Ericsson Research (EED)
Ericsson Allee 1
52134 Herzogenrath, Germany
Phone: +49 2407 575 654
Fax: +49 2407 575 400
Email: Michael.Meyer@eed.ericsson.se
This Internet-Draft expires in August 2002.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/