Network Working Group                                      Reiner Ludwig
INTERNET-DRAFT                                         Ericsson Research
Expires: May 2002                                         November, 2001

                      The Eifel Algorithm for TCP
               <draft-ietf-tsvwg-tcp-eifel-alg-01.txt>
                <draft-ietf-tsvwg-tcp-eifel-alg-02.txt>

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   A solution to eliminate the retransmission ambiguity in TCP, is to
   mark ACKs with a special retransmit-marker. The marker would need to
   be present in those ACKs, and only those ACKs, that the TCP receiver
   sends in response to retransmits; both genuine and spurious
   retransmits. Based on such a retransmit-marker, the Eifel algorithm
   allows the TCP sender to detect a posteriori that a fast retransmit
   or a timeout was spurious. Three alternative retransmit-markers are
   defined in this document, and the Eifel algorithm may be based on
   either one of them: the TCP RXT flag, the TCP Timestamps option, and
   the TCP SACK option. The Eifel algorithm provides a basis for future
   TCP enhancements such as response schemes that may change a TCP
   sender's protocol state to improve end-to-end performance.

Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

   We use the term 'new ACK' to refer to an ACK that acknowledges
   outstanding data. We use the term 'duplicate' ACK as defined in
   [WS95]. Furthermore, we refer to the first-time transmission of a
   data segment as the 'original transmit', and 'HighData' is the
   highest sequence number transmitted at a given point.

1. Introduction

   The retransmission ambiguity problem [KP87] is the TCP sender's
   inability to distinguish whether the first new ACK that arrives after
   a retransmit was sent in response to the original transmit or the
   retransmit. A solution to eliminate the retransmission ambiguity is
   to mark ACKs with a special "retransmit-marker". The marker would
   need to be present in those ACKs, and only those ACKs, that the TCP
   receiver sends in response to retransmits; both genuine and spurious
   retransmits.

   Based on such a retransmit-marker, the Eifel algorithm allows the TCP
   sender to detect a posteriori that a fast retransmit or a timeout was
   spurious. This added capability of the TCP sender is useful in
   environments where TCP's loss recovery and congestion control
   algorithms may often get falsely triggered. This can be caused by
   packet Packet reordering, packet
   duplication, or a sudden delay increase in the data or the ACK path
   that results in a spurious timeout. timeout can cause this. For example, such
   sudden delay increases can often occur in wide-area wireless access
   networks due to handovers, resource preemption, or because the mobile
   transmitter traverses through a radio coverage hole (e.g., see
   [Gu01]).

   Based on the Eifel algorithm, the TCP sender may then choose to
   implement dedicated response schemes. One goal of such a scheme would
   be to alleviate the consequences of a falsely triggered loss recovery
   phase. For example, an important feature of the scheme proposed in
   [LG01] is that it avoids the spurious go-back-N retransmits that
   typically occur after a spurious timeout. Another goal would be to
   "upgrade" the congestion control parameters, the congestion window
   and slow start threshold [RFC2581]. This is to compensate for the
   unnecessary reduction of their values when the loss recovery was
   falsely triggered. Yet, another goal would be to adapt other protocol
   parameters, e.g., the duplicate acknowledgement threshold [RFC2581],
   and the RTT estimators [RFC2988]. This is to reduce the risk of
   falsely triggering TCP's loss recovery again in the future. However,
   such response schemes are outside the scope of this document. They
   are addressed in different documents (e.g., see [LG01] and [BA01]). [BA01a]).

   A key feature of the Eifel algorithm is that it already detects upon
   the first new ACK that arrives during a loss recovery phase that a
   fast retransmit or a timeout was spurious. This is crucial to be able
   to avoid the mentioned spurious go-back-N retransmits. Another
   feature is that the Eifel algorithm is fairly robust against ACK
   losses. Especially, the loss of the single ACK carrying the
   retransmit-marker does not affect the functioning of the Eifel
   algorithm.

2. Spurious Retransmits

   The following events falsely trigger a TCP sender's loss recovery and
   congestion control algorithms. This causes a so-called spurious
   retransmit, and an unnecessary reduction of the TCP sender's
   congestion window.

     - spurious Spurious timeouts

     - packet Packet reordering

     - packet Packet duplication

   A spurious timeout is a timeout that would not have occurred had the
   sender "waited longer". It may be caused by increased delay that
   suddenly occurs in the data or the ACK path. This in turn might cause
   an ACK to arrive too late, i.e., only after the TCP sender's
   retransmission timer has expired. This would then trigger a spurious
   retransmit. Note that the mentioned ACK could either be a new ACK
   that would acknowledge the oldest outstanding segment, or it could be
   the third duplicate ACK that would have triggered a fast retransmit
   of the oldest outstanding segment [RFC2581]. Note: In the latter case
   the sender should suppress the fast retransmit that the third
   duplicate ACK would usually trigger. In general, a TCP sender should
   ignore all duplicate ACKs that arrive after a timeout [GL01].

   Packet reordering in the network may occur because IP [RFC791] is a
   connection-less protocol. Thus, IP does not guarantee an in-order
   delivery of packets. This results in a spurious fast retransmit if
   three or more data segments arrive out-of-order at the TCP receiver,
   and at least three of the resulting duplicate ACKs arrive at the TCP
   sender. This is because a TCP receiver generates a duplicate ACK for
   each segment that arrives out-of-order, and three consecutive
   duplicate ACKs trigger the TCP sender's fast retransmit algorithm.

   Packet duplication in the network may also occur because IP is
   connection-less. That is, the receiving IP does not (cannot) remove
   duplicate packets. A TCP receiver in turn also generates a duplicate
   ACK for each duplicate segment. As with packet reordering, this
   results in a spurious fast retransmit if duplication of data segments
   or ACKs results in three or more duplicate ACKs to arrive at the TCP
   sender.

   The negative impact on TCP performance caused by packet reordering
   and packet duplication is commonly the same: a single spurious
   retransmit (the fast retransmit), and the unnecessary halving of the
   TCP sender's congestion window as a result of the subsequent fast
   recovery phase [RFC2581].

   The negative impact on TCP performance caused by a spurious timeout
   is more severe. First, the timeout event itself causes a single
   spurious retransmit, and unnecessarily forces the TCP sender into
   slow start [RFC2581]. Then, as the connection progresses, a chain
   reaction gets triggered that further decreases TCP's performance.
   Since the timeout was spurious, at least some ACKs for original
   transmits will typically arrive at the TCP sender before the ACK for
   the retransmit arrives. (This is unless severe packet reordering
   coincided with the spurious timeout in such a way that the ACK for
   the retransmit is the first new ACK to arrive at the TCP sender.)
   Those ACKs for original transmits will then trigger an implicit go-
   back-N loss recovery at the TCP sender. Assuming that none of the
   outstanding segments and none of the corresponding ACKs were lost,
   all outstanding segments will get retransmitted unnecessarily.
   Moreover, this will then typically cause a spurious fast retransmit.
   This is because the spurious go-back-N retransmits will arrive as
   duplicates at the TCP receiver receiver, which in turn triggers a series of
   duplicate ACKs. Note that this last spurious fast retransmit could be
   avoided with the careful variant of 'bug fix' [RFC2582].

   More detailed explanations including TCP trace plots that visualize
   the effects of packet reordering and spurious timeouts can be found
   in [LK00].

3. The Eifel Algorithm

3.1 The Idea

   As mentioned before, a solution to eliminate the retransmission
   ambiguity in TCP, is to mark ACKs with a special retransmit-marker.
   The marker would need to be present in those ACKs, and only those
   ACKs, that the TCP receiver sends in response to retransmits; both
   genuine and spurious retransmits. Based on such a retransmit-marker,
   the Eifel algorithm allows the TCP sender to detect a posteriori that
   a fast retransmit or a timeout was spurious.

       [Note, that the

       [Note: The Eifel algorithm as defined here is a pure detection
       mechanism. The original proposal of the Eifel algorithm [LK00]
       included the TCP sender's response to a detected spurious
       retransmit, and a marking scheme that allows a TCP sender to
       distinguish an ACK for an original transmit from that of a
       retransmit. Such response and marking schemes are addressed in
       different documents [RFC2883], [BA01], [BA01a], [BA01b], [LM01],
       [LG01].]

   The key idea of the Eifel algorithm is to let the TCP sender take the
   absence of a retransmit-marker in the first new ACK that arrives
   after a retransmit as sufficient evidence that the loss recovery
   phase was falsely triggered. Being able to make this decision upon
   the first new ACK is crucial for response schemes such as [LG01] to
   avoid the spurious go-back-N retransmits that typically occur after a
   spurious timeout.

   However, making this decision upon the first new ACK is not
   absolutely reliable. A counter-example can be constructed where this
   approach fails:

       In case the original transmit was in fact dropped in the
       network, a new ACK acknowledging the that arrives after a retransmit would also
       not carry a retransmit-marker if (1) the retransmit arrived at
       the TCP receiver in sequence, i.e., if it had jumped ahead of
       all data segments that were outstanding when the retransmit was
       sent, and if (2) the ACK for the retransmit carrying the
       retransmit-marker got lost. In that case the mentioned new ACK
       would correspond to one of the data segments that were
       outstanding when the retransmit was sent. Note: This example
       holds independent of whether the loss recovery phase was
       triggered by the arrival of the third duplicate ACK or by a
       timeout. Note also: in case the SACK option is used as the
       retransmit-marker (see Section 3.2 and 3.4), condition (2) is
       not required.

   Nevertheless, this counter-example is regarded as being a rather
   pathological case. In addition, it seems to be the only conceivable
   counter-example. Therefore, it is regarded as sufficiently safe to
   take the absence of a retransmit-marker in the first new ACK that
   arrives after a retransmit as the signal that the TCP sender falsely
   entered the loss recovery phase.

   This approach makes the Eifel algorithm fairly robust against ACK
   losses. This is because a number of ACKs that do not carry the
   retransmit-marker are commonly in flight during a falsely triggered
   loss recovery phase. The arrival of at least one of those ACKs would
   be sufficient for the Eifel algorithm to make a decision. Also, the
   loss of the single ACK carrying the retransmit-marker does not affect
   the functioning of the Eifel algorithm.

   Three alternative retransmit-markers are defined in this document,
   and the Eifel algorithm may be based on either one of them:

       (1) the TCP RXT flag [LM01],
       (2) the TCP Timestamps option [RFC1323], and

       (3) the TCP SACK option [RFC2018].

   The exact use of those markers is specified in the following
   subsections. The RXT flag is the most efficient retransmit-marker
   since it does not enlarge TCP segments nor or ACKs by an option. Unlike
   for the RXT flag and the Timestamps option, the SACK option has the
   drawback that it can only detect whether a fast retransmit was
   spurious. It cannot be used to detect spurious timeouts as explained
   in Section 3.4.

   Note: The DSACK option [RFC2883] is not an appropriate retransmit-
   marker that would allow to eliminate the retransmission ambiguity.
   The reason is that the first new ACK acknowledging that arrives after a retransmit
   will commonly not carry a DSACK option. That is, the DACK option
   commonly arrives one or more ACKs later than the first new ACK for a
   retransmit. This is independent of whether the retransmit was genuine
   or spurious.

   Given that both the TCP sender and receiver support the RXT flag, or
   the Timestamps option, or the SACK option, a TCP sender MAY implement
   the Eifel algorithm as defined in the following subsections.

3.2 The Algorithm

   The following steps MUST be taken by the TCP sender when the third
   duplicate ACK arrives or the a timeout occurs, i.e., when the a loss recovery
   phase starts:

      (1)     Set a "SpuriousRecovery" variable to 'false'. If this
               variable is true at step (7) below, the loss recovery
               phase had been falsely triggered.

      (2)     Set a "RecoveryPoint" variable to HighData. When the TCP
               sender receives the first new ACK for this data octet the
               loss recovery phase is terminated.

      (3-TS)  This step only applies when the Timestamps option is used
               as the retransmit-marker: Set a "RetransmitTS" variable
               to the value of the Timestamp Value field of the
               Timestamps option included in the first retransmit. A TCP
               sender MUST ensure that RetransmitTS does not get
               overwritten as the loss recovery phase progresses, e.g.,
               in case of a second timeout and subsequent second
               retransmit of the same octet.

   The following steps MUST be taken by the TCP sender when the first
   new ACK arrives after the loss recovery phase started:

      (4)     If the ACK's sequence number is greater than the value of
               RecoveryPoint , proceed to step (7) below. This condition
               ensures that only those ACKs are considered that
               correspond to segments that were outstanding at the time
               the retransmit was sent.

      (5-SACK) This step only applies when the SACK option is used as
               the retransmit-marker: If the ACK's sequence number is
               equal to the value of RecoveryPoint, proceed to step (7)
               below. This step is motivated in Section 3.4.

      (6-RXT) This step only applies when the RXT flag is used as the
               retransmit-marker: If the ACK does not have the RXT flag
               set (binary 1), set SpuriousRecovery to 'true' and
               proceed to step (7) below.

      (6-TS)  This step only applies when the Timestamps option is used
               as the retransmit-marker: If the value of the Timestamp
               Echo Reply field of the ACK's Timestamps option is
               smaller than RetransmitTS, set SpuriousRecovery to 'true'
               and proceed to step (7) below. This step is commented in
               Section 3.3.

      (6-SACK) This step only applies when the SACK option is used as
               the retransmit-marker, and only if the loss recovery
               phase had been triggered by the arrival of the third
               duplicate ACK: If the ACK does not carry a SACK option,
               set SpuriousRecovery to 'true' and proceed to step (7)
               below. This step is motivated in Section 3.4.

      (7)     Done. No further processing.

3.3 Comments to the Timestamp-based Detection

   The comparison operator "smaller than" in step (6-TS) is
   conservative. In theory, when the "timestamp clock" is slow or the
   network is fast, RetransmitTS could at most be equal to the timestamp
   echoed by an ACK sent in response to an original transmit. In that
   case, it is assumed that the loss recovery was not falsely triggered.

3.4 Comments to the SACK-based Detection

   The SACK option is not a retransmit-marker in the sense that it is
   present only in those ACKs that the TCP receiver sends in response to
   retransmits (see Section 3.1). This is why the SACK option cannot be
   used to detect that a timeout was spurious. For this, we only need to
   consider the case where an entire flight of segments was lost versus
   the case of a spurious timeout. Assuming that packet reordering did
   not concur, the first new ACK arriving after the retransmit would not
   carry a SACK option in either case. Thus, the retransmission
   ambiguity problem would still exist.

   Nevertheless, the SACK option can be used to detect that a fast
   retransmit was spurious. Let's assume a SACK-enabled TCP receiver. If
   only a single segment was in fact lost, then the first new ACK that
   arrives after the fast retransmit will not carry a SACK option, and
   will acknowledge RecoveryPoint. If multiple segments were in fact
   lost, then the first new ACK that arrives after the fast retransmit
   will carry a SACK option, and will acknowledge below RecoveryPoint,
   i.e., the ACK will be partial. Thus, partial ACKs that do not carry a
   SACK option are impossible independent of how many segments were
   lost. Conversely, the first new ACK after the fast retransmit that is
   a partial ACK and that does not carry a SACK option, signals that the
   loss recovery phase was falsely triggered. This motivates steps (5-SACK) and (6- (5-
   SACK) and (6-SACK) in Section 3.2.

   Note: The SACK option cannot be used to detect that a fast retransmit
   was spurious when the entire flight of segments jumps ahead of the
   first segment (the oldest outstanding segment) of that flight. This
   is because the resulting new ACK would not carry a SACK option but
   would acknowledge RecoveryPoint. Thus, step (5-SACK) in Section 3.2
   would become effective.

4. Security Considerations

   There do not seem to be any security considerations associated with
   the Eifel algorithm itself. This is because the Eifel algorithm is
   only a detection scheme that is not tied to any specific action that
   might alter existing protocol state at the TCP sender or receiver.
   That is, no value of a state variable other than one introduced by
   the algorithm itself (SpuriousRecovery, RecoverPoint, and
   RetransmitTS) is changed.

   However, security considerations might exist for response schemes
   that use the Eifel algorithm as a basis. In particular, it needs to
   be considered that the TCP receiver might by lying about the
   retransmit-marker.

Acknowledgments

   Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan
   Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, and
   Andrei Gurtov for very useful discussions that contributed to this
   work.

References

   [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
             RFC 2581, April 1999.

   [BA01]

   [BA01a]   E. Blanton, M. Allman, Adjusting the Duplicate ACK
             Threshold to Avoid Spurious Retransmits, work in progress,
             July 2001.

   [BA01b]   E. Blanton, M. Allman, Using TCP DSACKs and SCTP Duplicate
             TSNs to Detect Spurious Retransmissions, work in progress,
             August 2001.

   [RFC1122] R. Braden, Requirements for Internet Hosts - Communication
             Layers, RFC 1122, October 1989.

   [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
             Requirement Levels, RFC 2119, March 1997.

   [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
             Performance, RFC 1323, May 1992.

   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective
             Acknowledgement Options, RFC 2018, October 1996.

   [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
             Fast Recovery Algorithm, RFC 2582, April 1999.

   [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
             An Extension to the Selective Acknowledgement (SACK) Option
             for TCP, RFC 2883, July 2000.

   [Gu01]    A. Gurtov, Effect of Delays on TCP Performance, In
             Proceedings of IFIP Personal Wireless Communications,
             August 2001.

   [GL01]    A. Gurtov, R. Ludwig, Making TCP Robust Against Delay
             Spikes, work in progress, November 2001.

   [KP87]    P. Karn, C. Partridge, Improving Round-Trip Time Estimates
             in Reliable Transport Protocols, In Proceedings of ACM
             SIGCOMM 87.

   [LK00]    R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
             Robust Against Spurious Retransmissions, ACM Computer
             Communication Review, Vol. 30, No. 1, January 2000,
             available at http://www.acm.org/sigcomm/ccr/archive/2000/
             jan00/ccr-200001-ludwig.html (easier studied when
             viewed/printed in color).

   [LM01]    R. Ludwig, M. Meyer, The TCP Retransmit (RXT) Flag, work in
             progress, November 2001.

   [LG01]    R. Ludwig, A. Gurtov, Responding to Spurious Timeouts in
             TCP, work in progress, November 2001.

   [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
             RFC 2988, November 2000.

   [RFC791]  J. Postel, Internet Protocol, RFC 791, September 1981.

   [RFC793]  J. Postel, Transmission Control Protocol, RFC793, September
             1981.

   [WS95]    G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
             (The Implementation), Addison Wesley, January 1995.

Author's Address

     Reiner Ludwig
     Ericsson Research (EED)
     Ericsson Allee 1
     52134 Herzogenrath, Germany
     Phone: +49 2407 575 719
     Fax:   +49 2407 575 400
     Email: Reiner.Ludwig@Ericsson.com

This Internet-Draft expires in May 2002.