draft-ietf-tsvwg-tcp-eifel-alg-06.txt   draft-ietf-tsvwg-tcp-eifel-alg-07.txt 
Network Working Group Reiner Ludwig Network Working Group Reiner Ludwig
INTERNET-DRAFT Michael Meyer INTERNET-DRAFT Michael Meyer
Expires: April 2003 Ericsson Research Expires: June 2003 Ericsson Research
October, 2002 December, 2002
The Eifel Detection Algorithm for TCP The Eifel Detection Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-alg-06.txt> <draft-ietf-tsvwg-tcp-eifel-alg-07.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 3, line 31 skipping to change at page 3, line 31
A key feature of the Eifel detection algorithm is that it already A key feature of the Eifel detection algorithm is that it already
detects upon the first acceptable ACK that arrives during loss detects upon the first acceptable ACK that arrives during loss
recovery whether a fast retransmit or a timeout was spurious. This is recovery whether a fast retransmit or a timeout was spurious. This is
crucial to be able to avoid the mentioned go-back-N retransmits. crucial to be able to avoid the mentioned go-back-N retransmits.
Another feature is that the Eifel detection algorithm is fairly Another feature is that the Eifel detection algorithm is fairly
robust against the loss of ACKs. robust against the loss of ACKs.
Also the DSACK option [RFC2883] can be used to detect a posteriori Also the DSACK option [RFC2883] can be used to detect a posteriori
whether a TCP sender has entered loss recovery unnecessarily [BA02]. whether a TCP sender has entered loss recovery unnecessarily [BA02].
However, the first ACK carrying a DSACK option usually arrives at the However, the first ACK carrying a DSACK option usually arrives at a
TCP sender only after loss recovery has already terminated. Thus, the TCP sender only after loss recovery has already terminated. Thus, the
DSACK option cannot be used to eliminate the retransmission DSACK option cannot be used to eliminate the retransmission
ambiguity. Consequently, it cannot be used to avoid the mentioned ambiguity. Consequently, it cannot be used to avoid the mentioned
unnecessary go-back-N retransmits. Moreover, a DSACK-based detection unnecessary go-back-N retransmits. Moreover, a DSACK-based detection
algorithm is less robust against ACK losses. A recent proposal algorithm is less robust against ACK losses. A recent proposal
neither based on the TCP timestamps nor the DSACK option does not neither based on the TCP timestamps nor the DSACK option does not
have the limitation of DSACK-based schemes, but only addresses the have the limitation of DSACK-based schemes, but only addresses the
case of spurious timeouts [SK02]. case of spurious timeouts [SK02].
2. Events that Falsely Trigger TCP Loss Recovery 2. Events that Falsely Trigger TCP Loss Recovery
skipping to change at page 4, line 14 skipping to change at page 4, line 14
A spurious timeout is a timeout that would not have occurred had the A spurious timeout is a timeout that would not have occurred had the
sender "waited longer". This may be caused by increased delay that sender "waited longer". This may be caused by increased delay that
suddenly occurs in the data and/or the ACK path. That in turn might suddenly occurs in the data and/or the ACK path. That in turn might
cause an acceptable ACK to arrive too late, i.e., only after a TCP cause an acceptable ACK to arrive too late, i.e., only after a TCP
sender's retransmission timer has expired. For the purpose of sender's retransmission timer has expired. For the purpose of
specifying the algorithm in Section 3, we define this case as SPUR_TO specifying the algorithm in Section 3, we define this case as SPUR_TO
(equal 1). (equal 1).
Note: There is another case where a timeout would not have Note: There is another case where a timeout would not have
occurred had the sender "waited longer": the retransmission timer occurred had the sender "waited longer": the retransmission timer
expires, and afterwards a TCP sender receives the duplicate ACK expires, and afterwards the TCP sender receives the duplicate ACK
that would have triggered a fast retransmit of the oldest that would have triggered a fast retransmit of the oldest
outstanding segment. We call this a "fast timeout" since in outstanding segment. We call this a "fast timeout" since in
competition with the fast retransmit algorithm the timeout was competition with the fast retransmit algorithm the timeout was
faster. However, a fast timeout is not spurious since apparently a faster. However, a fast timeout is not spurious since apparently a
segment was in fact lost, i.e., loss recovery was initiated segment was in fact lost, i.e., loss recovery was initiated
rightfully. In this document, we do not consider fast timeouts. rightfully. In this document, we do not consider fast timeouts.
This case is addressed in an independent document [Lu02]. This case is addressed in an independent document [Lu02].
Packet reordering in the network may occur because IP [RFC791] does Packet reordering in the network may occur because IP [RFC791] does
not guarantee in-order delivery of packets. Additionally, a TCP not guarantee in-order delivery of packets. Additionally, a TCP
skipping to change at page 5, line 8 skipping to change at page 5, line 8
start [RFC2581]. Then, as the connection progresses, a chain reaction start [RFC2581]. Then, as the connection progresses, a chain reaction
gets triggered that further decreases TCP's performance. Since the gets triggered that further decreases TCP's performance. Since the
timeout was spurious, at least some ACKs for original transmits timeout was spurious, at least some ACKs for original transmits
typically arrive at the TCP sender before the ACK for the retransmit typically arrive at the TCP sender before the ACK for the retransmit
arrives. (This is unless severe packet reordering coincided with the arrives. (This is unless severe packet reordering coincided with the
spurious timeout in such a way that the ACK for the retransmit is the spurious timeout in such a way that the ACK for the retransmit is the
first acceptable ACK to arrive at the TCP sender.) Those ACKs for first acceptable ACK to arrive at the TCP sender.) Those ACKs for
original transmits then trigger an implicit go-back-N loss recovery original transmits then trigger an implicit go-back-N loss recovery
at the TCP sender. Assuming that none of the outstanding segments and at the TCP sender. Assuming that none of the outstanding segments and
none of the corresponding ACKs were lost, all outstanding segments none of the corresponding ACKs were lost, all outstanding segments
get retransmitted unnecessarily. In fact, during this phase the TCP get retransmitted unnecessarily. In fact, during this phase a TCP
sender breaks 'packet conservation' [Jac88]. This is because the sender breaks 'packet conservation' [Jac88]. This is because the
unnecessary go-back-N retransmits are sent during slow start. Thus, unnecessary go-back-N retransmits are sent during slow start. Thus,
for each packet that leaves the network and that belongs to the first for each packet that leaves the network and that belongs to the first
half of the original flight, two useless retransmits are sent into half of the original flight, two useless retransmits are sent into
the network. Moreover, some TCPs in addition suffer from a spurious the network. Moreover, some TCPs in addition suffer from a spurious
fast retransmit. This is because the unnecessary go-back-N fast retransmit. This is because the unnecessary go-back-N
retransmits arrive as duplicates at the TCP receiver, which in turn retransmits arrive as duplicates at the TCP receiver, which in turn
triggers a series of duplicate ACKs. Note that this last spurious triggers a series of duplicate ACKs. Note that this last spurious
fast retransmit could be avoided with the careful variant of 'bug fast retransmit could be avoided with the careful variant of 'bug
fix' [RFC2582]. fix' [RFC2582].
skipping to change at page 5, line 42 skipping to change at page 5, line 42
timeout-based retransmit or the fast retransmit has been sent. This timeout-based retransmit or the fast retransmit has been sent. This
in turn requires extra information in ACKs by which the TCP sender in turn requires extra information in ACKs by which the TCP sender
can unambiguously distinguish whether that first acceptable ACK was can unambiguously distinguish whether that first acceptable ACK was
sent in response to the original transmit or the retransmit. Such sent in response to the original transmit or the retransmit. Such
extra information is provided by the TCP Timestamps option [RFC1323]. extra information is provided by the TCP Timestamps option [RFC1323].
Generally speaking, timestamps are monotonously increasing "serial Generally speaking, timestamps are monotonously increasing "serial
numbers" added into every segment that are then echoed within the numbers" added into every segment that are then echoed within the
corresponding ACKs. This is exploited by the Eifel detection corresponding ACKs. This is exploited by the Eifel detection
algorithm in the following way. algorithm in the following way.
Given that timestamps are enabled for a connection, the TCP sender Given that timestamps are enabled for a connection, a TCP sender
always stores the timestamp of the retransmit sent in the beginning always stores the timestamp of the retransmit sent in the beginning
of loss recovery, i.e., the timestamp of the timeout-based retransmit of loss recovery, i.e., the timestamp of the timeout-based retransmit
or the fast retransmit. If the timestamp of the first acceptable ACK, or the fast retransmit. If the timestamp of the first acceptable ACK,
that arrives after the retransmit was sent, is smaller then the that arrives after the retransmit was sent, is smaller then the
stored timestamp of that retransmit, then that ACK must have been stored timestamp of that retransmit, then that ACK must have been
sent in response to an original transmit. Hence, the TCP sender must sent in response to an original transmit. Hence, the TCP sender must
have entered loss recovery unnecessarily. have entered loss recovery unnecessarily.
The fact that the Eifel detection algorithm decides upon the first The fact that the Eifel detection algorithm decides upon the first
acceptable ACK is crucial to allow future response algorithms to acceptable ACK is crucial to allow future response algorithms to
skipping to change at page 6, line 42 skipping to change at page 6, line 41
sender must ensure that RetransmitTS does not get sender must ensure that RetransmitTS does not get
overwritten as loss recovery progresses, e.g., in case of overwritten as loss recovery progresses, e.g., in case of
a second timeout and subsequent second retransmit of the a second timeout and subsequent second retransmit of the
same octet. same octet.
(3) Wait for the arrival of an acceptable ACK. When an (3) Wait for the arrival of an acceptable ACK. When an
acceptable ACK has arrived proceed to step (4). acceptable ACK has arrived proceed to step (4).
(4) If the value of the Timestamp Echo Reply field of the (4) If the value of the Timestamp Echo Reply field of the
acceptable ACK's Timestamps option is smaller than the acceptable ACK's Timestamps option is smaller than the
value of the variable RetransmitTS, then proceed to step value of RetransmitTS, then proceed to step (5),
(5),
else proceed to step (DONE). else proceed to step (DONE).
(5) If the acceptable ACK does not carry a DSACK option (5) If the acceptable ACK carries a DSACK option [RFC2883],
[RFC2883], then proceed to step (6), then proceed to step (DONE),
else if during the lifetime of the TCP connection the TCP
sender has previously received an ACK with a DSACK option,
or the acceptable ACK does not acknowledge all outstanding
data, then proceed to step (6),
else proceed to step (DONE). else proceed to step (DONE).
(6) If the loss recovery has been initiated with a timeout- (6) If the loss recovery has been initiated with a timeout-
based retransmit, then set based retransmit, then set
SpuriousRecovery <- SPUR_TO (equal 1), SpuriousRecovery <- SPUR_TO (equal 1),
else set else set
SpuriousRecovery <- dupacks+1 SpuriousRecovery <- dupacks+1
(RESP) Do nothing (Placeholder for a response algorithm). (RESP) Do nothing (Placeholder for a response algorithm).
(DONE) No further processing. (DONE) No further processing.
The comparison "smaller than" in step (4) is conservative. In theory, The comparison "smaller than" in step (4) is conservative. In theory,
if the timestamp clock is slow or the network is fast, RetransmitTS if the timestamp clock is slow or the network is fast, RetransmitTS
could at most be equal to the timestamp echoed by an ACK sent in could at most be equal to the timestamp echoed by an ACK sent in
response to an original transmit. In that case, it is assumed that response to an original transmit. In that case, it is assumed that
the loss recovery was not falsely triggered. the loss recovery was not falsely triggered.
3.3 A Corner Case: "Timeout due to loss of all ACKs" Note that the condition "if during the lifetime of the TCP connection
the TCP sender has previously received an ACK with a DSACK option" in
step (5) would be true in case the TCP receiver would signal in the
SYN that it is DSACK-enabled. But unfortunately, this is not required
by [RFC2883].
3.3 Fixing a Corner Case: "Timeout due to loss of all ACKs" (step 5)
Even though the oldest outstanding segment arrived at a TCP receiver, Even though the oldest outstanding segment arrived at a TCP receiver,
the TCP sender is forced into a timeout when all ACKs are lost. the TCP sender is forced into a timeout if all ACKs are lost.
Although, the resulting retransmit is unnecessary, such a timeout is Although, the resulting retransmit is unnecessary, such a timeout is
unavoidable. It should therefore not be considered to be spurious. unavoidable. It should therefore not be considered to be spurious.
This particular case of a timeout is considered in this section. Moreover, the subsequent reduction of the congestion window is a
crucial response to the potentially heavy congestion in the ACK path.
The original proposal [LK00] does not handle this case well. It
effectively disables this implicit form of congestion control for the
ACK path, which otherwise does not exist in TCP. This problem is
fixed by step (5) of the Eifel detection algorithm as explained in
the remainder of this section.
In that case, the retransmit arrives as a duplicate at the TCP If all ACKs are lost while the oldest outstanding segment arrived at
receiver. In response to duplicates, RFC1323 mandates that the the TCP receiver, the retransmit arrives as a duplicate. In response
timestamp of the last segment that arrived in-sequence should be to duplicates, RFC1323 mandates that the timestamp of the last
echoed. That timestamp is carried by the first acceptable ACK that segment that arrived in-sequence should be echoed. That timestamp is
arrives at the TCP sender after loss recovery was entered, and is carried by the first acceptable ACK that arrives at the TCP sender
commonly smaller than the timestamp carried by the retransmit. after loss recovery was entered, and is commonly smaller than the
Consequently, the Eifel detection algorithm mistakes such a timeout timestamp carried by the retransmit. Consequently, the Eifel
as spurious, unless that first acceptable ACK also carries a DSACK detection algorithm misinterprets such a timeout as being spurious,
option (see step (5) of the algorithm). unless the TCP receiver is DSACK-enabled [RFC2883]. In that case, the
acceptable ACK carries a DSACK option, and the Eifel algorithm is
terminated through the first part of step (5).
Note: Not all TCP implementations strictly follow RFC1323. In Note: Not all TCP implementations strictly follow RFC1323. In
response to a duplicate data segment, some TCP receivers echo the response to a duplicate data segment, some TCP receivers echo the
timestamp of the duplicate. With such TCP receivers, the corner timestamp of the duplicate. With such TCP receivers, the corner
case discussed in this section does not apply. The timestamp case discussed in this section does not apply. The timestamp
carried by the retransmit would be echoed in the first acceptable carried by the retransmit would be echoed in the first acceptable
ACK, and the Eifel detection algorithm would be terminated through ACK, and the Eifel detection algorithm would be terminated through
step (4). Thus, even though all ACKs were lost, and independent of step (4). Thus, even though all ACKs were lost, and independent of
whether the DSACK option was enabled for a connection, the Eifel whether the DSACK option was enabled for a connection, the Eifel
detection algorithm would have no effect. detection algorithm would have no effect.
A concern with this corner case arises if the Eifel detection With TCP receivers that are not DSACK-enabled, disabling the
algorithm is combined with a response algorithm like the Eifel mentioned implicit congestion control for the ACK path is not a
response algorithm [LG02]. That algorithm backs out of loss recovery problem as long as data segments are lost in addition to the entire
by reversing congestion control state after a spurious timeout has flight of ACKs. The Eifel detection algorithm misinterprets such a
been detected. The argument against doing so, is that the TCP sender timeout as being spurious, and the Eifel response algorithm would
should keep its congestion window halved in case all ACKs are lost reverse congestion control state. Still, the TCP sender would respond
since that is a sign of severe congestion on the ACK path. to congestion (in the data path) as soon as it finds out about the
first loss in the outstanding flight. I.e., the TCP sender would
This is not really a problem as long as data segments were lost in still half its congestion window for that flight of packets. If no
addition to the entire flight of ACKs. The Eifel detection algorithm data segment is lost while the entire flight of ACKs is lost, the
would misinterpret the timeout as spurious, and the Eifel response first acceptable ACK, that arrives at the TCP sender after loss
algorithm would reverse congestion control state. Still, the TCP recovery was entered, acknowledges all outstanding data. In that
sender would respond to congestion (in the data path) as soon as it case, the Eifel algorithm is terminated through the second part of
finds out about the first loss in the outstanding flight. I.e., the step (5).
TCP sender would still half its congestion window for that flight of
packets.
The concern remains, though, in case the DSACK option is not enabled,
and an entire flight of ACKs is lost while all of the data segments
arrive at the TCP receiver. Without the Eifel detection and response
algorithm the TCP sender would go into slow start. With those
algorithms it would not respond to the congestion in the ACK path.
We do not believe that this is a serious concern. First, we assume Note that there is little concern about breaking 'packet
that DSACK will get increasingly deployed. And even if this was not conservation' when entering slow start after an unavoidable timeout
the case, this special corner case must occur sufficiently often to caused by the loss of an entire flight of ACKs, i.e., when the Eifel
become a problem for the progress of a TCP connection. It is unlikely detection algorithm was terminated through step (5). This is because
that any TCP could manage to grow its congestion window much beyond in that case, the acceptable ACK corresponds to the retransmit, which
one maximum segment size with such an ACK path. And in that case, the is a strong indication that the pipe has drained entirely, i.e., that
reversing of congestion control state becomes meaningless. Moreover, no more original transmits are in the network. This is different with
in an implementation one might choose to disable the Eifel detection spurious timeouts as discussed in Section 2.
algorithm if such an ACK path is encountered.
3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant) 3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant)
A TCP receiver can easily make a genuine retransmit appear to a TCP A TCP receiver can easily make a genuine retransmit appear to the TCP
sender as a spurious retransmit by forging echoed timestamps. This sender as a spurious retransmit by forging echoed timestamps. This
may pose a security concern. may pose a security concern.
Fortunately, there is a way to modify the Eifel detection algorithm Fortunately, there is a way to modify the Eifel detection algorithm
in a way that makes it robust against lying TCP receivers. The idea in a way that makes it robust against lying TCP receivers. The idea
is to use timestamps as a "segment's secret" that a TCP receiver only is to use timestamps as a "segment's secret" that a TCP receiver only
gets to know if it receives the segment. Conversely, a TCP receiver gets to know if it receives the segment. Conversely, a TCP receiver
will not know the timestamp of a segment that was lost. Hence, to will not know the timestamp of a segment that was lost. Hence, to
"prove" that it received the original transmit of a segment that a "prove" that it received the original transmit of a segment that a
TCP sender retransmitted, the TCP receiver would need to return the TCP sender retransmitted, the TCP receiver would need to return the
skipping to change at page 9, line 49 skipping to change at page 9, line 52
original transmit arrives. Also, in this case the rules for original transmit arrives. Also, in this case the rules for
generating ACKs and the rules for echoing timestamps (see rule (A) generating ACKs and the rules for echoing timestamps (see rule (A)
in section 3.4 of [RFC1323]) ensure that the original transmit's in section 3.4 of [RFC1323]) ensure that the original transmit's
timestamp is echoed. timestamp is echoed.
A remaining problem is that a TCP receiver might guess a lost A remaining problem is that a TCP receiver might guess a lost
segment's timestamp from observing the timestamps of recently segment's timestamp from observing the timestamps of recently
received segments. For example, if segment N was lost while segment received segments. For example, if segment N was lost while segment
N-1 and N+1 have arrived, a TCP receiver could guess the timestamp N-1 and N+1 have arrived, a TCP receiver could guess the timestamp
that lies in the middle of the timestamps of segments N-1 and N+1, that lies in the middle of the timestamps of segments N-1 and N+1,
and echo it in the ACK for segment N+1. Especially if a TCP sender and echo it in the ACK for segment N+1. Especially if the TCP sender
implements timestamps with a coarse granularity, a misbehaving TCP implements timestamps with a coarse granularity, a misbehaving TCP
receiver is likely to be successful with such an approach. In fact, receiver is likely to be successful with such an approach. In fact,
with the 500 ms granularity suggested in [WS95], it even becomes with the 500 ms granularity suggested in [WS95], it even becomes
quite likely that the timestamps of segments N-1, N, N+1 are quite likely that the timestamps of segments N-1, N, N+1 are
identical. identical.
One way to reduce this risk is to implement fine grained timestamps. One way to reduce this risk is to implement fine grained timestamps.
Note that the granularity of the timestamps is independent of the Note that the granularity of the timestamps is independent of the
granularity of the retransmission timer. For example, some TCP granularity of the retransmission timer. For example, some TCP
implementations run a timestamp clock that even ticks every implementations run a timestamp clock that even ticks every
microsecond. This should make it more difficult for a TCP receiver to microsecond. This should make it more difficult for a TCP receiver to
guess the timestamp of a lost segment. Alternatively, it might be guess the timestamp of a lost segment. Alternatively, it might be
possible to combine the timestamps with a nonce, as done for the possible to combine the timestamps with a nonce, as done for the
Explicit Congestion Notification (ECN) [RFC3168]. One would need to Explicit Congestion Notification (ECN) [RFC3168]. One would need to
take care, though, that the timestamps of succeeding segments remain take care, though, that the timestamps of succeeding segments remain
monotonously increasing and do not interfere with the RTT timing monotonously increasing and do not interfere with the RTT timing
defined in [RFC1323]. defined in [RFC1323].
4. Security Considerations 4. IPR Considerations
The IETF has been notified of intellectual property rights claimed in
regard to some or all of the specification contained in this
document. For more information consult the online list of claimed
rights at http://www.ietf.org/ipr.
The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
5. Security Considerations
There do not seem to be any security considerations associated with There do not seem to be any security considerations associated with
the Eifel detection algorithm. This is because the Eifel detection the Eifel detection algorithm. This is because the Eifel detection
algorithm does not alter existing protocol state at a TCP sender. algorithm does not alter existing protocol state at a TCP sender.
Note that the Eifel detection algorithm only requires changes to the Note that the Eifel detection algorithm only requires changes to the
implementation of the TCP sender. implementation of a TCP sender.
Moreover, a variant of the Eifel detection algorithm has been Moreover, a variant of the Eifel detection algorithm has been
proposed in Section 3.4 that makes it robust against lying TCP proposed in Section 3.4 that makes it robust against lying TCP
receivers. receivers. This may become relevant when the Eifel detection
algorithm is combined with a response algorithm such as the Eifel
response algorithm [LG02].
Acknowledgments Acknowledgments
Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
Sarolahti, and Alexey Kuznetsov for useful discussions that Sarolahti, and Alexey Kuznetsov for useful discussions that
contributed to this work. contributed to this work.
Normative References Normative References
skipping to change at page 12, line 25 skipping to change at page 12, line 52
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Email: Reiner.Ludwig@eed.ericsson.se Email: Reiner.Ludwig@eed.ericsson.se
Michael Meyer Michael Meyer
Ericsson Research Ericsson Research
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Email: Michael.Meyer@eed.ericsson.se Email: Michael.Meyer@eed.ericsson.se
This Internet-Draft expires in April 2003. This Internet-Draft expires in June 2003.
 End of changes. 

This html diff was produced by rfcdiff 1.25, available from http://www.levkowetz.com/ietf/tools/rfcdiff/