draft-ietf-tsvwg-tcp-eifel-alg-04.txt   draft-ietf-tsvwg-tcp-eifel-alg-05.txt 
Network Working Group Reiner Ludwig Network Working Group Reiner Ludwig
INTERNET-DRAFT Michael Meyer INTERNET-DRAFT Michael Meyer
Expires: January 2003 Ericsson Research Expires: April 2003 Ericsson Research
July, 2002 October, 2002
The Eifel Detection Algorithm for TCP The Eifel Detection Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-alg-04.txt> <draft-ietf-tsvwg-tcp-eifel-alg-05.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 1, line 37 skipping to change at page 1, line 37
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Abstract Abstract
The Eifel detection algorithm allows the TCP sender to detect a The Eifel detection algorithm allows the TCP sender to detect a
posteriori whether it has entered loss recovery unnecessarily. It posteriori whether it has entered loss recovery unnecessarily. It
already determines this in the beginning of loss recovery when the already determines this in the beginning of loss recovery when the
first acceptable ACK arrives after the timeout-based retransmit or first acceptable ACK arrives after the timeout-based retransmit or
fast retransmit has been sent. The algorithm requires that the TCP the fast retransmit has been sent. The algorithm requires that the
Timestamps option defined in RFC1323 is enabled for a connection. The TCP Timestamps option defined in RFC1323 is enabled for a connection.
idea is to use the timestamps echoed in returning ACKs to eliminate The idea is to use the timestamps echoed in returning ACKs to
the retransmission ambiguity in TCP. The Eifel detection algorithm eliminate the retransmission ambiguity in TCP. The Eifel detection
provides a basis for future TCP enhancements. This includes response algorithm provides a basis for future TCP enhancements. This includes
algorithms to back out of loss recovery by restoring a TCP sender's response algorithms to back out of loss recovery by restoring a TCP
congestion control state. sender's congestion control state.
Terminology Terminology
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119]. document, are to be interpreted as described in [RFC2119].
We refer to the first-time transmission of an octet as the 'original We refer to the first-time transmission of an octet as the 'original
transmit'. A subsequent transmission of the same octet is referred to transmit'. A subsequent transmission of the same octet is referred to
as a 'retransmit'. In most cases this terminology can likewise be as a 'retransmit'. In most cases this terminology can likewise be
applied to "data segments" as opposed to "octets". However, with applied to data segments as opposed to octets. However, with
repacketization a segment can contain both first-time transmissions repacketization a segment can contain both first-time transmissions
and retransmissions of octets. In that case this terminology is only and retransmissions of octets. In that case, this terminology is only
consistent when applied to "octets". For the Eifel detection consistent when applied to "octets". For the Eifel detection
algorithm this makes no difference as it also operates correctly when algorithm this makes no difference as it also operates correctly when
repacketization occurs. repacketization occurs.
We use the term 'acceptable ACK' as defined in [RFC793]. That is an We use the term 'acceptable ACK' as defined in [RFC793]. That is an
ACK that acknowledges previously unacknowledged data. We use the term ACK that acknowledges previously unacknowledged data. We use the term
'duplicate ACK', and the variable 'dupacks' as defined in [WS95]. The 'duplicate ACK', and the variable 'dupacks' as defined in [WS95]. The
variable 'dupacks' is a counter of duplicate ACKs that have already variable 'dupacks' is a counter of duplicate ACKs that have already
been received by the TCP sender before the fast retransmit is sent. been received by the TCP sender before the fast retransmit is sent.
We use the variable 'DupThresh' to refer to the so-called duplicate We use the variable 'DupThresh' to refer to the so-called duplicate
acknowledgement threshold, i.e., the number of duplicate ACKs that acknowledgement threshold, i.e., the number of duplicate ACKs that
need to arrive at the TCP sender to trigger a fast retransmit. need to arrive at the TCP sender to trigger a fast retransmit.
Currently, DupThresh is specified as a fixed value of three Currently, DupThresh is specified as a fixed value of three
[RFC2581]. Future TCPs might implement an adaptive DupThresh. [RFC2581]. Future TCPs might implement an adaptive DupThresh.
1. Introduction 1. Introduction
The retransmission ambiguity problem [KP87] is the TCP sender's The retransmission ambiguity problem [Zh86][KP87] is the TCP sender's
inability to distinguish whether the first acceptable ACK that inability to distinguish whether the first acceptable ACK that
arrives after a retransmit, was sent in response to the original arrives after a retransmit, was sent in response to the original
transmit or the retransmit. This problem occurs after a timeout-based transmit or the retransmit. This problem occurs after a timeout-based
retransmit and after a fast retransmit. The Eifel detection algorithm retransmit and after a fast retransmit. The Eifel detection algorithm
uses the TCP Timestamps option defined in [RFC1323] to eliminate the uses the TCP Timestamps option defined in [RFC1323] to eliminate the
retransmission ambiguity. It thereby allows the TCP sender to detect retransmission ambiguity. It thereby allows the TCP sender to detect
a posteriori whether it has entered loss recovery unnecessarily. a posteriori whether it has entered loss recovery unnecessarily.
This added capability of the TCP sender is useful in environments This added capability of the TCP sender is useful in environments
where TCP's loss recovery and congestion control algorithms may often where TCP's loss recovery and congestion control algorithms may often
get falsely triggered. This can be caused by packet reordering, get falsely triggered. This can be caused by packet reordering,
packet duplication, the loss of a flight of ACKs, or a sudden delay packet duplication, or a sudden delay increase in the data or the ACK
increase in the data or the ACK path that results in a spurious path that results in a spurious timeout. For example, such sudden
timeout. For example, such sudden delay increases can often occur in delay increases can often occur in wide-area wireless access networks
wide-area wireless access networks due to handovers, resource due to handovers, resource preemption due to higher priority traffic
preemption due to higher priority traffic (e.g., voice), or because (e.g., voice), or because the mobile transmitter traverses through a
the mobile transmitter traverses through a radio coverage hole (e.g., radio coverage hole (e.g., see [Gu01]). In such wireless networks,
see [Gu01]). In such wireless networks, the often unnecessary the often unnecessary go-back-N retransmits that typically occur
go-back-N retransmits that typically occur after a spurious timeout after a spurious timeout create a serious problem. They decrease end-
create a serious problem. They decrease end-to-end throughput, are to-end throughput, are useless load upon a potentially congested
useless load upon a potentially congested network, and waste network, and waste transmission (battery) power. Note, that across
transmission (battery) power. such networks the use of timestamps is recommended anyway [IMLGK02].
Based on the Eifel detection algorithm, the TCP sender may then Based on the Eifel detection algorithm, the TCP sender may then
choose to implement dedicated response algorithms. One goal of such a choose to implement dedicated response algorithms. One goal of such a
response algorithm would be to alleviate the consequences of a response algorithm would be to alleviate the consequences of a
falsely triggered loss recovery. This may include restoring the TCP falsely triggered loss recovery. This may include restoring the TCP
sender's congestion control state, and avoiding the mentioned sender's congestion control state, and avoiding the mentioned
unnecessary go-back-N retransmits. Another goal would be to adapt unnecessary go-back-N retransmits. Another goal would be to adapt
protocol parameters such as the duplicate acknowledgement threshold protocol parameters such as the duplicate acknowledgement threshold
[RFC2581], and the RTT estimators [RFC2988]. This is to reduce the [RFC2581], and the RTT estimators [RFC2988]. This is to reduce the
risk of falsely triggering TCP's loss recovery again as the risk of falsely triggering TCP's loss recovery again as the
skipping to change at page 3, line 42 skipping to change at page 3, line 42
congestion control algorithms. This causes a so-called spurious congestion control algorithms. This causes a so-called spurious
retransmit, and an unnecessary reduction of the TCP sender's retransmit, and an unnecessary reduction of the TCP sender's
congestion window and slow start threshold [RFC2581]. congestion window and slow start threshold [RFC2581].
- Spurious timeout - Spurious timeout
- Packet reordering - Packet reordering
- Packet duplication - Packet duplication
The common understanding of a spurious timeout is a timeout that A spurious timeout is a timeout that would not have occurred had the
would not have occurred had the sender "waited longer". This may be sender "waited longer". This may be caused by increased delay that
caused by increased delay that suddenly occurs in the data and/or the suddenly occurs in the data and/or the ACK path. That in turn might
ACK path. That in turn might cause an acceptable ACK to arrive too cause an acceptable ACK to arrive too late, i.e., only after the TCP
late, i.e., only after the TCP sender's retransmission timer has sender's retransmission timer has expired. For the purpose of
expired. For the purpose of specifying the algorithm in Section 3, we specifying the algorithm in Section 3, we define this case as SPUR_TO
define this case as SPUR_TO (equal 1). However, in this document, we (equal 1).
stretch the definition of a spurious timeout. We also speak of a
spurious timeout when the timeout occurred because an entire flight
of ACKs was lost while in fact the original transmit of the oldest
outstanding segment had arrived at the TCP receiver. This case is
also considered to fall under the definition of SPUR_TO since a TCP
sender is not able to distinguish it from the other case mentioned
above. Such a timeout is certainly unavoidable, i.e., it would not
have made a difference if the TCP sender had "waited longer". Still,
the triggering of loss recovery was spurious since that original
transmit was not lost, i.e., there had been no congestion in the data
path. Hence, we use the term 'spurious timeout' in the sense that
initiating the associated loss recovery was unnecessary. Yet, we
exclude from the definition of a spurious timeout the case where the
retransmit arrives at the TCP receiver before the corresponding
original transmit.
Note: There is another case where a timeout would not have Note: There is another case where a timeout would not have
occurred had the sender "waited longer": the retransmission occurred had the sender "waited longer": the retransmission
timer expires, and afterwards the TCP sender receives the timer expires, and afterwards the TCP sender receives the
duplicate ACK that would have triggered a fast retransmit of the duplicate ACK that would have triggered a fast retransmit of the
oldest outstanding segment. We call this a "fast timeout" since oldest outstanding segment. We call this a "fast timeout" since
in competition with the fast retransmit algorithm the timeout in competition with the fast retransmit algorithm the timeout
was faster. However, a fast timeout is not spurious since was faster. However, a fast timeout is not spurious since
apparently a segment was in fact lost, i.e., loss recovery was apparently a segment was in fact lost, i.e., loss recovery was
initiating rightfully. In this document, we do not consider fast initiated rightfully. In this document, we do not consider fast
timeouts. This case is addressed in an independent document timeouts. This case is addressed in an independent document
[Lu02]. [Lu02].
Packet reordering in the network may occur because IP [RFC791] does Packet reordering in the network may occur because IP [RFC791] does
not guarantee in-order delivery of packets. Additionally, a TCP not guarantee in-order delivery of packets. Additionally, a TCP
receiver generates a duplicate ACK for each segment that arrives out- receiver generates a duplicate ACK for each segment that arrives out-
of-order. This results in a spurious fast retransmit if three or more of-order. This results in a spurious fast retransmit if three or more
data segments arrive out-of-order at the TCP receiver, and at least data segments arrive out-of-order at the TCP receiver, and at least
three of the resulting duplicate ACKs arrive at the TCP sender. This three of the resulting duplicate ACKs arrive at the TCP sender. This
assumes that the duplicate acknowledgement threshold is set to three assumes that the duplicate acknowledgement threshold is set to three
skipping to change at page 5, line 16 skipping to change at page 4, line 50
transmits will typically arrive at the TCP sender before the ACK for transmits will typically arrive at the TCP sender before the ACK for
the retransmit arrives. (This is unless severe packet reordering the retransmit arrives. (This is unless severe packet reordering
coincided with the spurious timeout in such a way that the ACK for coincided with the spurious timeout in such a way that the ACK for
the retransmit is the first acceptable ACK to arrive at the TCP the retransmit is the first acceptable ACK to arrive at the TCP
sender.) Those ACKs for original transmits will then trigger an sender.) Those ACKs for original transmits will then trigger an
implicit go-back-N loss recovery at the TCP sender. Assuming that implicit go-back-N loss recovery at the TCP sender. Assuming that
none of the outstanding segments and none of the corresponding ACKs none of the outstanding segments and none of the corresponding ACKs
were lost, all outstanding segments will get retransmitted were lost, all outstanding segments will get retransmitted
unnecessarily. In fact, during this phase the TCP sender breaks unnecessarily. In fact, during this phase the TCP sender breaks
'packet conservation' [Jac88]. This is because the unnecessary 'packet conservation' [Jac88]. This is because the unnecessary
go-back-N retransmits are sent during slow start, i.e., for each go-back-N retransmits are sent during slow start. Thus, for each
original packet leaving the network, two useless retransmits are sent packet that leaves the network and that belongs to the first half of
into the network. Moreover, some TCPs will in addition suffer from a the original flight, two useless retransmits are sent into the
spurious fast retransmit. This is because the spurious go-back-N network. Moreover, some TCPs will in addition suffer from a spurious
fast retransmit. This is because the unnecessary go-back-N
retransmits will arrive as duplicates at the TCP receiver, which in retransmits will arrive as duplicates at the TCP receiver, which in
turn triggers a series of duplicate ACKs. Note that this last turn triggers a series of duplicate ACKs. Note that this last
spurious fast retransmit could be avoided with the careful variant of spurious fast retransmit could be avoided with the careful variant of
'bug fix' [RFC2582]. 'bug fix' [RFC2582].
More detailed explanations including TCP trace plots that visualize More detailed explanations including TCP trace plots that visualize
the effects of spurious timeouts and packet reordering can be found the effects of spurious timeouts and packet reordering can be found
in [LK00]. in the original proposal [LK00].
3. The Eifel Detection Algorithm 3. The Eifel Detection Algorithm
3.1 The Idea 3.1 The Idea
The goal of the Eifel detection algorithm is to allow the TCP sender The goal of the Eifel detection algorithm is to allow the TCP sender
to detect a posteriori whether it has entered loss recovery to detect a posteriori whether it has entered loss recovery
unnecessarily. Furthermore, the TCP sender should be able to make unnecessarily. Furthermore, the TCP sender should be able to make
this decision upon the first acceptable ACK that arrives after the this decision upon the first acceptable ACK that arrives after the
timeout-based retransmit or the fast retransmit has been sent. This timeout-based retransmit or the fast retransmit has been sent. This
in turn requires extra information in every ACK by which the TCP in turn requires extra information in ACKs by which the TCP sender
sender can unambiguously distinguish whether that first acceptable can unambiguously distinguish whether that first acceptable ACK was
ACK was sent in response to the original transmit or the retransmit. sent in response to the original transmit or the retransmit. Such
Such extra information is provided by the TCP Timestamps option extra information is provided by the TCP Timestamps option [RFC1323].
[RFC1323]. Generally speaking, timestamps are monotonously increasing Generally speaking, timestamps are monotonously increasing "serial
"serial numbers" added into every segment that are then echoed within numbers" added into every segment that are then echoed within the
the corresponding ACKs. This is exploited by the Eifel detection corresponding ACKs. This is exploited by the Eifel detection
algorithm in the following way. algorithm in the following way.
Given that timestamps are enabled for a connection, the TCP sender Given that timestamps are enabled for a connection, the TCP sender
always stores the timestamp of the retransmit sent in the beginning always stores the timestamp of the retransmit sent in the beginning
of loss recovery, i.e., the timestamp of the timeout-based retransmit of loss recovery, i.e., the timestamp of the timeout-based retransmit
or the fast retransmit. If the timestamp of the first acceptable ACK, or the fast retransmit. If the timestamp of the first acceptable ACK,
arriving after the retransmit was sent, is smaller then the stored arriving after the retransmit was sent, is smaller then the stored
timestamp of that retransmit, then that ACK must have been sent in timestamp of that retransmit, then that ACK must have been sent in
response to an original transmit. Hence, the TCP sender must have response to an original transmit. Hence, the TCP sender must have
entered loss recovery unnecessarily. entered loss recovery unnecessarily.
The fact that the Eifel detection algorithm decides upon the first The fact that the Eifel detection algorithm decides upon the first
acceptable ACK is crucial to allow future response algorithms to acceptable ACK is crucial to allow future response algorithms to
avoid the spurious go-back-N retransmits that typically occur after a avoid the unnecessary go-back-N retransmits that typically occur
spurious timeout. Also, if loss recovery was entered unnecessarily, a after a spurious timeout. Also, if loss recovery was entered
window worth of ACKs are outstanding that all carry a timestamp that unnecessarily, a window worth of ACKs are outstanding that all carry
is smaller than the stored timestamp of the retransmit. The arrival a timestamp that is smaller than the stored timestamp of the
of any one of those ACKs suffices the Eifel detection algorithm to retransmit. The arrival of any one of those ACKs suffices the Eifel
work. Hence, the solution is fairly robust against ACK losses. Even detection algorithm to work. Hence, the solution is fairly robust
the ACK sent in response to the retransmit, i.e., the one that against ACK losses. Even the ACK sent in response to the retransmit,
carries the stored timestamp, may get lost. i.e., the one that carries the stored timestamp, may get lost.
Note: Also the DSACK option [RFC2883] can be used to detect a Note: Also the DSACK option [RFC2883] can be used to detect a
posteriori whether the TCP sender has entered loss recovery posteriori whether the TCP sender has entered loss recovery
unnecessarily. However, the first ACK carrying a DSACK option unnecessarily. However, the first ACK carrying a DSACK option
usually arrives at the TCP sender only after loss recovery has usually arrives at the TCP sender only after loss recovery has
already terminated. Thus, the DSACK option cannot be used to already terminated. Thus, the DSACK option cannot be used to
eliminate the retransmission ambiguity. Consequently, it cannot eliminate the retransmission ambiguity. Consequently, it cannot
be used to avoid the mentioned spurious go-back-N retransmits. be used to avoid the mentioned unnecessary go-back-N
Moreover, a DSACK-based detection algorithm is less robust retransmits. Moreover, a DSACK-based detection algorithm is less
against ACK losses. robust against ACK losses.
3.2 The Algorithm 3.2 The Algorithm
Given that the TCP Timestamps option [RFC1323] is enabled for a Given that the TCP Timestamps option [RFC1323] is enabled for a
connection, a TCP sender MAY use the Eifel detection algorithm as connection, a TCP sender MAY use the Eifel detection algorithm as
defined in this subsection. defined in this subsection.
If the Eifel detection algorithm is used, the following steps MUST be If the Eifel detection algorithm is used, the following steps MUST be
taken by the TCP sender, but only upon initiation of loss recovery, taken by the TCP sender, but only upon initiation of loss recovery,
i.e., when either the timeout-based retransmit or the fast retransmit i.e., when either the timeout-based retransmit or the fast retransmit
skipping to change at page 6, line 49 skipping to change at page 6, line 33
segment, e.g., during selective loss recovery. segment, e.g., during selective loss recovery.
(1) Set a "RetransmitTS" variable to the value of the (1) Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option included Timestamp Value field of the Timestamps option included
in the retransmit sent when loss recovery is initiated. A in the retransmit sent when loss recovery is initiated. A
TCP sender must ensure that RetransmitTS does not get TCP sender must ensure that RetransmitTS does not get
overwritten as loss recovery progresses, e.g., in case of overwritten as loss recovery progresses, e.g., in case of
a second timeout and subsequent second retransmit of the a second timeout and subsequent second retransmit of the
same octet. same octet.
(2) Set a "SpuriousRecovery" variable to FALSE. (2) Set a "SpuriousRecovery" variable to FALSE (equal 0).
(3) Wait for the arrival of an acceptable ACK. If an (3) Wait for the arrival of an acceptable ACK. When an
acceptable ACK has arrived, then proceed to step (4). acceptable ACK has arrived proceed to step (4).
(4) If the value of the Timestamp Echo Reply field of the (4) If the acceptable ACK neither carries a SACK option
[RFC2018] nor a DSACK option [RFC2883], then proceed to
step (5),
else proceed to step (DONE).
(5) If the value of the Timestamp Echo Reply field of the
acceptable ACK's Timestamps option is smaller than the acceptable ACK's Timestamps option is smaller than the
value of the variable RetransmitTS, then proceed to step value of the variable RetransmitTS, then proceed to step
(5), (6),
else proceed to step (DONE). else proceed to step (DONE).
(5) If the loss recovery has been initiated with a timeout- (6) If the loss recovery has been initiated with a timeout-
based retransmit, then set based retransmit, then set
SpuriousRecovery <- SPUR_TO, SpuriousRecovery <- SPUR_TO (equal 1),
else set else set
SpuriousRecovery <- dupacks+1 SpuriousRecovery <- dupacks+1
(RESP) Do nothing (Placeholder for a response algorithm). (RESP) Do nothing (Placeholder for a response algorithm).
(DONE) No further processing. (DONE) No further processing.
The comparison "smaller than" in step (4) is conservative. In theory, The comparison "smaller than" in step (5) is conservative. In theory,
when the "timestamp clock" is slow or the network is fast, if the "timestamp clock" is slow or the network is fast, RetransmitTS
RetransmitTS could at most be equal to the timestamp echoed by an ACK could at most be equal to the timestamp echoed by an ACK sent in
sent in response to an original transmit. In that case, it is assumed response to an original transmit. In that case, it is assumed that
that the loss recovery was not falsely triggered. the loss recovery was not falsely triggered.
3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts 3.3 A Corner Case: "Timeout due to loss of all ACKs"
Not all TCP implementations strictly follow RFC1323. There are The TCP sender is forced into a timeout even though the oldest
differences concerning which timestamp a TCP receiver echoes when a outstanding segment arrived at the TCP receiver when all ACKs are
duplicate segment arrives. The relevant case to consider in this lost. Although, the resulting retransmit is unnecessary, such a
context is a spurious timeout that occurred because an entire flight timeout is unavoidable. It should therefore not be considered to be
of ACKs got lost (recall the definition of a spurious timeout from spurious.
Section 2). The question is which timestamp the TCP receiver echoes
in response to the spurious retransmit that typically arrives as a
duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX
kernel 2.4.x) will echo the timestamp of the last segment that
arrived in-sequence, while some non-RFC1323-compliant TCP receivers
will echo the timestamp of the retransmit.
The Eifel detection algorithm will only detect such spurious timeouts The retransmit will arrive as a duplicate at the TCP receiver. In
in case the TCP receiver is RFC1323-compliant in this respect. response to duplicates, RFC1323 mandates that the timestamp of the
Otherwise, the Eifel detection algorithm will simply not kick in. last segment that arrived in-sequence should be echoed. That
This is not any different than from running the TCP sender without timestamp will commonly be smaller than the timestamp carried by the
the Eifel algorithm. retransmit. Consequently, the Eifel detection algorithm will mistake
such a timeout as spurious, unless the SACK and DSACK option are
enabled for that TCP connection.
Note: Not all TCP implementations strictly follow RFC1323. In
response to a duplicate data segment, some TCP receivers will
echo the timestamp of the duplicate. With such TCP receivers,
the corner case discussed in this section does not apply. That
is, even though all ACKs were lost, and independent of whether
the SACK and DSACK option was enabled for a connection, the
Eifel detection algorithm would have no effect.
With the SACK and DSACK option the Eifel detection algorithm will
detect this corner case. This motivates step (4) of the algorithm. If
data segments beyond the oldest outstanding one are also lost, then
the first acceptable ACK that arrives during loss recovery will carry
a SACK or a DSACK option. That ACK must have been sent in response to
a retransmit of the oldest outstanding segment that arrived as a
duplicate at the TCP receiver. It carries a SACK option if there are
holes in the TCP receiver's sequence space. Otherwise, it carries a
DSACK option.
A concern with this corner case arises if the Eifel detection
algorithm is combined with a response algorithm like the Eifel
response algorithm [LG02]. That algorithm will back out of loss
recovery by reversing congestion control state after a spurious
timeout has been detected. I.e., this would also happen in cases when
all ACKs are lost, and the SACK and/or the DSACK option are not used
for that connection. The argument against that, is that the TCP
sender should keep its congestion window halved in case all ACKs are
lost since that is a sign of severe congestion on the ACK path.
This is not really a problem as long as data segments were lost in
addition to the entire flight of ACKs. The Eifel detection algorithm
would misinterpret the timeout as spurious, and the Eifel response
algorithm would reverse congestion control state. Still, the TCP
sender would respond to congestion (in the data path) as soon as it
finds out about the first loss in the outstanding flight. I.e., the
TCP sender would still half its congestion window for that flight of
packets.
The concern remains, though, in case the DSACK option is not enabled,
and an entire flight of ACKs is lost while all of the data segments
arrive at the TCP receiver. Without the Eifel detection and response
algorithm a TCP sender would go into slow start. With those
algorithms it would not respond to the congestion in the ACK path.
We do not believe that this is a serious concern. First, we assume
that DSACK will get increasingly deployed. And even if this was not
the case, this special corner case must occur sufficiently often to
become a problem for the progress of a TCP connection. This seems
rather pathological, since this suggests that the ACK path is pretty
much broken. It is unlikely that any TCP could manage to grow its
congestion much beyond one maximum segment size with such an ACK
path. And in that case, the reversing of congestion control state
becomes meaningless.
3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant) 3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant)
A TCP receiver can easily make a genuine retransmit appear to the TCP A TCP receiver can easily make a genuine retransmit appear to the TCP
sender as a spurious retransmit by forging echoed timestamps. This sender as a spurious retransmit by forging echoed timestamps. This
may pose a security concern. may pose a security concern.
Fortunately, there is a way to modify the Eifel detection algorithm Fortunately, there is a way to modify the Eifel detection algorithm
in a way that makes it robust against lying TCP receivers. The idea in a way that makes it robust against lying TCP receivers. The idea
is to use timestamps as a "segment's secret" that the TCP receiver is to use timestamps as a "segment's secret" that the TCP receiver
skipping to change at page 8, line 29 skipping to change at page 9, line 19
Step (1) is replaced with step (1'): Step (1) is replaced with step (1'):
(1') Set a "RetransmitTS" variable to the value of the (1') Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option that was Timestamp Value field of the Timestamps option that was
included in the original transmit corresponding to the included in the original transmit corresponding to the
retransmit. Note: This step requires that the TCP sender retransmit. Note: This step requires that the TCP sender
stores the timestamps of all outstanding original stores the timestamps of all outstanding original
transmits. transmits.
Step (4) is replaced with step (4'): Step (5) is replaced with step (5'):
(4') If the value of the Timestamp Echo Reply field of the (5') If the value of the Timestamp Echo Reply field of the
acceptable ACK's Timestamps option is equal to the value acceptable ACK's Timestamps option is equal to the value
of the variable RetransmitTS, then proceed to step (5), of the variable RetransmitTS, then proceed to step (6),
else proceed to step (DONE). else proceed to step (DONE).
These modifications come at a cost: the modified algorithm is fairly These modifications come at a cost: the modified algorithm is fairly
sensitive against ACK losses since it relies on the arrival of the sensitive against ACK losses since it relies on the arrival of the
acceptable ACK that corresponds to the original transmit. acceptable ACK that corresponds to the original transmit.
Note: The first acceptable ACK that arrives after loss recovery Note: The first acceptable ACK that arrives after loss recovery
has been unnecessarily entered, should echo the timestamp of the has been unnecessarily entered, should echo the timestamp of the
original transmit. This assumes that the ACK corresponding to original transmit. This assumes that the ACK corresponding to
the original transmit was not lost, that that ACK was not the original transmit was not lost, that that ACK was not
reordered in the network, and that the TCP receiver does not reordered in the network, and that the TCP receiver does not
forge timestamps but complies with RFC1323. In case of a forge timestamps but complies with RFC1323. In case of a
spurious fast retransmit, this is implied by the rules for spurious fast retransmit, this is implied by the rules for
generating ACKs for data segments that fill in all or part of a generating ACKs for data segments that fill in all or part of a
gap in the sequence space (see section 4.2 of [RFC2581]) and by gap in the sequence space (see section 4.2 of [RFC2581]) and by
the rules for echoing timestamps in that case (see rule (C) in the rules for echoing timestamps in that case (see rule (C) in
section 3.4 of [RFC1323]). In case of a spurious timeout, it is section 3.4 of [RFC1323]). In case of a spurious timeout, it is
likely that the delay that has caused the spurious timeout has likely that the delay (in the data path) that has caused the
also caused the TCP receiver's delayed ACK timer [RFC1122] to spurious timeout has also caused the TCP receiver's delayed ACK
expire before the original transmit arrives. Also, in this case timer [RFC1122] to expire before the original transmit arrives.
the rules for generating ACKs and the rules for echoing Also, in this case the rules for generating ACKs and the rules
timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure for echoing timestamps (see rule (A) in section 3.4 of
that the original transmit's timestamp is echoed. [RFC1323]) ensure that the original transmit's timestamp is
echoed.
A remaining problem is that a TCP receiver might guess a lost A remaining problem is that a TCP receiver might guess a lost
segment's timestamp from observing the timestamps of segments that segment's timestamp from observing the timestamps of segments that
arrived earlier. This could be avoided by having the TCP sender add a arrived earlier. This could be avoided by having the TCP sender add a
"random counter" to the timestamp of every segment it sends, and then "random counter" to the timestamp of every segment it sends, and then
increment that counter by a random value, e.g., between 1 and 100. increment that counter by a random value, e.g., between 1 and 100.
This ensures that timestamps remain monotonously increasing while This ensures that timestamps remain monotonously increasing while
making it difficult for a TCP receiver to guess the timestamp of a making it difficult for a TCP receiver to guess the timestamp of a
lost segment. lost segment.
4. Security Considerations 4. Security Considerations
There do not seem to be any security considerations associated with There do not seem to be any security considerations associated with
the Eifel detection algorithm. This is because the Eifel detection the Eifel detection algorithm. This is because the Eifel detection
algorithm does not alter existing protocol state at the TCP sender algorithm does not alter existing protocol state at the TCP sender
nor at the receiver. That is, no value of a TCP sender's state nor at the receiver. That is, no value of a TCP sender's state
skipping to change at page 9, line 36 skipping to change at page 10, line 28
proposed in Section 3.4 that makes it robust against lying TCP proposed in Section 3.4 that makes it robust against lying TCP
receivers. receivers.
Acknowledgments Acknowledgments
Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
Sarolahti, and Alexey Kuznetsov for useful discussions that Sarolahti, and Alexey Kuznetsov for useful discussions that
contributed to this work. contributed to this work.
References Normative References
[RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
RFC 2581, April 1999. RFC 2581, April 1999.
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
Requirement Levels, RFC 2119, March 1997. Requirement Levels, RFC 2119, March 1997.
[RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
Fast Recovery Algorithm, RFC 2582, April 1999.
[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
An Extension to the Selective Acknowledgement (SACK) Option An Extension to the Selective Acknowledgement (SACK) Option
for TCP, RFC 2883, July 2000. for TCP, RFC 2883, July 2000.
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
Performance, RFC 1323, May 1992.
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective
Acknowledgement Options, RFC 2018, October 1996.
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September
1981.
Informative References
[RFC1122] R. Braden, Requirements for Internet Hosts - Communication
Layers, RFC 1122, October 1989.
[RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
Fast Recovery Algorithm, RFC 2582, April 1999.
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In
Proceedings of IFIP Personal Wireless Communications, Proceedings of IFIP Personal Wireless Communications,
August 2001. August 2001.
[IMLGK02] H. Inamura et. al., TCP over Second (2.5G) and Third (3G)
Generation Wireless Networks, work in progress, July 2002.
[Jac88] V. Jacobson, Congestion Avoidance and Control, In [Jac88] V. Jacobson, Congestion Avoidance and Control, In
Proceedings of ACM SIGCOMM 88. Proceedings of ACM SIGCOMM 88.
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
Performance, RFC 1323, May 1992.
[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates
in Reliable Transport Protocols, In Proceedings of ACM in Reliable Transport Protocols, In Proceedings of ACM
SIGCOMM 87. SIGCOMM 87.
[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
Robust Against Spurious Retransmissions, ACM Computer Robust Against Spurious Retransmissions, ACM Computer
Communication Review, Vol. 30, No. 1, January 2000, Communication Review, Vol. 30, No. 1, January 2000.
available at http://www.acm.org/sigcomm/ccr/archive/2000/
jan00/ccr-200001-ludwig.html (easier studied when
viewed/printed in color).
[LG02] R. Ludwig, A. Gurtov, The Eifel Response Algorithm for TCP, [LG02] R. Ludwig, A. Gurtov, The Eifel Response Algorithm for TCP,
work in progress, July 2002. work in progress, October 2002.
[Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in [Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in
progress, July 2002. progress, July 2002.
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
RFC 2988, November 2000. RFC 2988, November 2000.
[RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981.
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September
1981.
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
(The Implementation), Addison Wesley, January 1995. (The Implementation), Addison Wesley, January 1995.
[Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of
ACM SIGCOMM 88.
Author's Address Author's Address
Reiner Ludwig Reiner Ludwig
Ericsson Research Ericsson Research
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Email: Reiner.Ludwig@eed.ericsson.se Email: Reiner.Ludwig@eed.ericsson.se
Michael Meyer Michael Meyer
Ericsson Research Ericsson Research
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Email: Michael.Meyer@eed.ericsson.se Email: Michael.Meyer@eed.ericsson.se
This Internet-Draft expires in January 2003. This Internet-Draft expires in April 2003.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/