Network Working Group                                      Reiner Ludwig
INTERNET-DRAFT                                         Ericsson Research
Expires: August 2001                                   February 23, May 2002                                         November, 2001

                      The Eifel Algorithm for TCP
                <draft-ietf-tsvwg-tcp-eifel-alg-00.txt>
               <draft-ietf-tsvwg-tcp-eifel-alg-01.txt>

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   TCP's intertwined error and congestion control is not robust against
   spurious timeouts nor is it robust against packet re-orderings.

   A
   packet that is delayed in the network beyond solution to eliminate the expiration of TCP's retransmission timer, ambiguity in TCP, is mistaken for a packet loss by a TCP sender.
   Also, to
   mark ACKs with a packet that is re-ordered special retransmit-marker. The marker would need to
   be present in those ACKs, and only those ACKs, that the network beyond TCP's
   duplicate acknowledgment threshold, is eventually mistaken for a
   packet loss by a TCP sender. Both situations lead receiver
   sends in response to a spurious
   retransmit of the oldest outstanding segment, retransmits; both genuine and an unnecessary
   reduction of the congestion window at the sender. Moreover, a spurious timeout forces the sender into
   retransmits. Based on such a go-back-N retransmission
   mode leading to spurious retransmits of all outstanding segments.

   We propose retransmit-marker, the "Eifel algorithm" as a way to make TCP robust against
   spurious timeouts and packet re-orderings. The Eifel algorithm uses
   extra information in
   allows the ACKs TCP sender to reliably detect (a posteriori) a
   spurious retransmit of the oldest outstanding segment at the TCP
   sender. In response to such posteriori that a detection, the Eifel algorithm restores
   the congestion window, and prevents the spurious go-back-N
   retransmits following fast retransmit
   or a spurious timeout. As extra information timeout was spurious. Three alternative retransmit-markers are
   defined in the
   ACKs, this document, and the Eifel algorithm allows for two alternatives: the timestamp
   option [RFC1323] and/or two new flags in the Reserved field may be based on
   either one of them: the TCP header.

1. Introduction

   In this document, we use RXT flag, the terms 'valid ACK' as defined in
   [RFC793], TCP Timestamps option, and
   the terms 'duplicate ACK' (DUPACK), 'Congestion Window'
   (cwnd), and 'Slow Start Threshold' (ssthresh) TCP SACK option. The Eifel algorithm provides a basis for future
   TCP enhancements such as defined in
   [RFC2581]. Further, our use of the term 'retransmit' includes both
   fast retransmits triggered by the third DUPACK and retransmits
   triggered by response schemes that may change a timeout. TCP
   sender's protocol state to improve end-to-end performance.

Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

   TCP's intertwined error and congestion control is not robust against
   spurious timeouts nor is it robust against packet re-orderings. A
   packet that is delayed in the network beyond the expiration of TCP's
   retransmission timer, is mistaken for a packet loss by a TCP sender.
   This results in a so-called spurious timeout, i.e., a timeout that
   would not have occurred had

   We use the sender "waited longer". Also, a
   packet term 'new ACK' to refer to an ACK that is re-ordered in the network beyond TCP's DUPACK
   threshold of 3, is eventually mistaken for a packet loss by a TCP
   sender. This is because the fast retransmit algorithm uses acknowledges
   outstanding data. We use the
   arrival of 3 DUPACKs term 'duplicate' ACK as an indication that a segment has been lost
   [RFC2581]. Both situations lead defined in
   [WS95]. Furthermore, we refer to a spurious retransmit of the
   oldest outstanding segment, and an unnecessary reduction first-time transmission of the
   congestion window at the sender. Moreover, a spurious timeout forces
   the sender into a go-back-N retransmission mode leading to spurious
   retransmits of all outstanding segments. A detailed explanation of
   these effects using trace plots is found in [LK00].

   We propose the "Eifel algorithm"
   data segment as a way to make TCP robust against
   spurious timeouts the 'original transmit', and packet re-orderings. The Eifel algorithm 'HighData' is
   based on the observation that the spurious go-back-N retransmits
   following
   highest sequence number transmitted at a spurious timeout and the unnecessary reduction of the
   congestion window caused by packet re-ordering have the same root:
   the retransmission ambiguity. given point.

1. Introduction

   The retransmission ambiguity problem [KP87] is the TCP sender's
   inability to distinguish an whether the first new ACK for that arrives after
   a retransmit was sent in response to the original transmit of a segment from or the ACK for its
   retransmit. The
   Eifel algorithm uses extra information in A solution to eliminate the ACKs retransmission ambiguity is
   to reliably detect
   (a posteriori) mark ACKs with a spurious retransmit of the oldest outstanding
   segment at special "retransmit-marker". The marker would
   need to be present in those ACKs, and only those ACKs, that the TCP sender. In
   receiver sends in response to such a detection, retransmits; both genuine and spurious
   retransmits.

   Based on such a retransmit-marker, the Eifel algorithm restores the congestion window, and prevents allows the spurious
   go-back-N retransmits following TCP
   sender to detect a spurious timeout.

   Spurious timeouts have not generally been posteriori that a concern in the past since
   they are rare [Pax97]. fast retransmit or a timeout was
   spurious. This can be credited to the conservativeness added capability of TCP's retransmission timer [LS00]. Yet, there is benefit in
   avoiding the detrimental effects that spurious timeouts have on TCP
   performance. This is since those effects create a strong incentive
   for keeping a conservative - potentially too conservative -
   retransmission timer. However, a retransmission timer that sender is too
   conservative useful in
   environments where TCP's loss recovery and congestion control
   algorithms may cause long idle times before a lost packet is
   retransmitted. often get falsely triggered. This can degrade performance. This is obvious for
   interactive request/response-style connections. But it also affects
   bulk data transfers whenever the sender has exhausted its send window
   before the retransmission timer has expired. The Eifel algorithm
   opens the door to the development of be caused by
   packet reordering, packet duplication, or a more optimistic retransmission
   timer as it ensures that the penalty for underestimating the round-
   trip time is minimal. In sudden delay increase in
   the common case, data or the only penalty is ACK path that results in a
   single spurious retransmit.

   Packet re-orderings timeout. For
   example, such sudden delay increases can often occur in wide-area
   wireless access networks due to handovers, resource preemption, or
   because the connection-less nature of IP
   [RFC791] which does not guarantee an in-order delivery of packets.
   However, it is difficult to evaluate how serious this is in the
   Internet today. Early studies [Pax97] conclude that this occurs
   rarely, while recent studies [BPS99] find this problem to be more
   serious. Clearly, this depends mobile transmitter traverses through a radio coverage
   hole (e.g., see [Gu01]).

   Based on the paths underlying such studies,
   e.g., whenever routers are inter-connected via multiple links/paths
   (for fault tolerance) and load balancing is performed across those
   links/paths on Eifel algorithm, the aggregate traffic, packet re-orderings will occur
   more frequently.

2. Detecting a Spurious Retransmit in TCP

   Detecting the retransmission ambiguity requires extra information in
   the ACKs that the sender can use may then choose to unambiguously distinguish an ACK
   for the original transmit
   implement dedicated response schemes. One goal of such a segment from that scheme would
   be to alleviate the consequences of a retransmit.
   This falsely triggered loss recovery
   phase. For example, an important feature of the scheme proposed in turn requires
   [LG01] is that every segment and the corresponding ACK
   carry the extra information to allow the sender to avoid it avoids the spurious go-back-N retransmits described in Section 1. Waiting for the
   receiver to signal in DUPACKs that is has correctly received
   duplicate segments, as proposed in [RFC2883],
   typically occur after a spurious timeout. Another goal would be too late, to
   "upgrade" the congestion control parameters, the congestion window
   and slow start threshold [RFC2581]. This is thus not an alternative.

   As extra information in to compensate for the ACKs,
   unnecessary reduction of their values when the Eifel algorithm allows for two
   alternatives: loss recovery was
   falsely triggered. Yet, another goal would be to adapt other protocol
   parameters, e.g., the timestamp option [RFC1323] and/or two new flags in duplicate acknowledgement threshold [RFC2581],
   and the Reserved field RTT estimators [RFC2988]. This is to reduce the risk of
   falsely triggering TCP's loss recovery again in the TCP header. Both alternatives future. However,
   such response schemes are specified
   in outside the following two subsections. In Section 2.3, we specify
   precedence rules among those two alternatives. We speak scope of this document. They
   are addressed in different documents (e.g., see [LG01] and [BA01]).

   A key feature of the
   timestamp-based Eifel algorithm to emphasize is that timestamps are
   being used it already detects upon
   the first new ACK that arrives during a loss recovery phase that a
   fast retransmit or a timeout was spurious. This is crucial to detect be able
   to avoid the mentioned spurious go-back-N retransmits. Likewise, we speak of Another
   feature is that the
   flag-based Eifel algorithm.

2.1. Detection based on the Xmit-Echo Flag

   We define Bit 6 and Bit 7 in algorithm is fairly robust against ACK
   losses. Especially, the Reserved field loss of the TCP header as
   the "Xmit flag" and "Xmit-Echo flag", respectively. The sender uses
   the Xmit flag to mark retransmits while single ACK carrying the receiver sets
   retransmit-marker does not affect the Xmit-
   Echo flag in functioning of the ACKs it sends in response to Eifel
   algorithm.

2. Spurious Retransmits

   The following events falsely trigger a segment with the Xmit
   flag set. Since, TCP can be sender's loss recovery and
   congestion control algorithms. This causes a sender so-called spurious
   retransmit, and receiver at the same time,
   two separate bits need to be used for those flags. It is worth noting
   that the two flags are similar to the sub-sequence field proposed in
   [ISO8073]. The location an unnecessary reduction of the 6-bit Reserved field in the TCP header
   is shown in Figure 3 of [RFC793]. Bit 8 and 9 of the Reserved field sender's
   congestion window.

     - spurious timeouts

     - packet reordering

     - packet duplication

   A spurious timeout is a timeout that would not have been assigned to occurred had the Explicit Congestion Notification (ECN)
   [RFC2481].

2.1.1. TCP Initialization

   The text
   sender "waited longer". It may be caused by increased delay that
   suddenly occurs in this subsection has been derived from Section 6.1.1 of
   [RFC2481]. This is because the same initialization semantics also
   apply to the flag-based Eifel algorithm. Thus, TCP's that support
   ECN, and wish to support data or the flag-based Eifel algorithm, should be
   able ACK path. This in turn might cause
   an ACK to re-use most of arrive too late, i.e., only after the initialization code implemented for ECN.

   When a TCP sends sender's
   retransmission timer has expired. This would then trigger a SYN packet, it MAY set (i.e., equal to 1) spurious
   retransmit. Note that the Xmit
   and Xmit-Echo flag. For mentioned ACK could either be a SYN packet, the setting of both flags is
   defined as an indication new ACK
   that would acknowledge the sending TCP whishes to use oldest outstanding segment, or it could be
   the
   flag-based Eifel algorithm, rather than as an indication third duplicate ACK that the SYN
   packet is a retransmit. More precisely, such would have triggered a SYN packet indicates
   that fast retransmit
   of the TCP transmitting oldest outstanding segment [RFC2581]. Note: In the SYN packet will participate in latter case
   the
   flag-based Eifel algorithm as both a sender and receiver.

   Only if should suppress the fast retransmit that the third
   duplicate ACK would usually trigger. In general, a TCP receives sender should
   ignore all duplicate ACKs that arrive after a SYN packet with both timeout [GL01].

   Packet reordering in the Xmit and Xmit-Echo
   flags set, MAY it respond with network may occur because IP [RFC791] is a SYN-ACK packet
   connection-less protocol. Thus, IP does not guarantee an in-order
   delivery of packets. This results in which it sets the
   Xmit-Echo flag, but unsets (i.e., sets equal to 0) the Xmit flag. For a SYN-ACK packet, spurious fast retransmit if
   three or more data segments arrive out-of-order at the pattern TCP receiver,
   and at least three of the Xmit-Echo flag set and resulting duplicate ACKs arrive at the Xmit
   flag unset is defined as an indication TCP
   sender. This is because a TCP receiver generates a duplicate ACK for
   each segment that arrives out-of-order, and three consecutive
   duplicate ACKs trigger the TCP transmitting sender's fast retransmit algorithm.

   Packet duplication in the network may also occur because IP is
   connection-less. That is, the
   SYN-ACK receiving IP does not (cannot) remove
   duplicate packets. A TCP receiver in turn also generates a duplicate
   ACK for each duplicate segment. As with packet agrees to participate reordering, this
   results in the flag-based Eifel
   algorithm as both a sender spurious fast retransmit if duplication of data segments
   or ACKs results in three or more duplicate ACKs to arrive at the TCP
   sender.

   The negative impact on TCP performance caused by packet reordering
   and receiver.

   This asymmetry packet duplication is necessary for commonly the robust negotiation of same: a single spurious
   retransmit (the fast retransmit), and the use unnecessary halving of the flag-based Eifel algorithm with deployed
   TCP implementations (see
   section 6.1.1 sender's congestion window as a result of [RFC2481] for details).

   For the subsequent fast
   recovery phase [RFC2581].

   The negative impact on TCP transmitting the SYN packet with both the Xmit and Xmit-
   Echo flags set, the flag-based Eifel algorithm has been successfully
   negotiated, if it receives performance caused by a SYN-ACK packet in which the Xmit-Echo
   flag spurious timeout
   is set, but more severe. First, the Xmit flag is unset. For timeout event itself causes a single
   spurious retransmit, and unnecessarily forces the TCP transmitting the
   SYN-ACK packet, sender into
   slow start [RFC2581]. Then, as the flag-based Eifel algorithm has been successfully
   negotiated, if it has received connection progresses, a SYN packet with both the Xmit and
   Xmit-Echo flags set, while it has set chain
   reaction gets triggered that further decreases TCP's performance.
   Since the Xmit-Echo flag but unset timeout was spurious, at least some ACKs for original
   transmits will typically arrive at the Xmit flag in its SYN-ACK.

2.1.2. The TCP Receiver

   If the flag-based Eifel algorithm has been successfully negotiated, sender before the following rules apply.

   The receiver SHOULD send an immediate ACK for
   the retransmit arrives. (This is unless severe packet reordering
   coincided with the Xmit-Echo flag set spurious timeout in response to an incoming data segment such a way that has the Xmit flag set.
   The immediate ACK for
   the retransmit is RECOMMENDED because of the range check that first new ACK to arrive at the
   sender performs on incoming TCP sender.)
   Those ACKs after a retransmit (see Section
   2.1.2).

   In all other cases, the receiver MUST unset (i.e., set equal to 0)
   the Xmit-Echo flag in all ACKs it sends.

2.1.3. The TCP Sender

   If the flag-based Eifel algorithm has been successfully negotiated,
   the following rules apply.

   The sender MUST set the Xmit flag in for original transmits will then trigger an implicit go-
   back-N loss recovery at the TCP header of retransmits.
   This is REQUIRED since otherwise the Eifel algorithm might get
   (falsely) triggered in response to a genuine packet loss (see Section
   5). Recall, sender. Assuming that our use none of the term 'retransmit' includes both fast
   retransmits triggered by the third DUPACK and retransmits triggered
   by a timeout.

   The sender MUST store in "cwnd_prev" the value that the sender's cwnd
   had before it is reduced when the retransmit occurs. Likewise, the
   sender MUST store in "ssthresh_prev" the value that the sender's
   ssthresh had before it is reduced when the retransmit occurs. This is
   REQUIRED since the sender will use cwnd_prev and ssthresh_prev to
   restore its cwnd and ssthresh after it has detected that the
   retransmit was spurious (see Section 3).

   When a retransmit is sent, the sender MUST store the next (previously
   unsent) sequence number to be sent in "sent_high", i.e., sent_high is
   the highest
   outstanding sequence number transmitted so far plus 1, or
   alternatively, the ACK number segments and none of a valid ACK that would ack the corresponding ACKs were lost,
   all outstanding data. segments will get retransmitted unnecessarily.
   Moreover, this will then typically cause a spurious fast retransmit.
   This is REQUIRED since because the sender will use
   sent_high to detect a spurious retransmit go-back-N retransmits will arrive as described below. Note
   that
   duplicates at the definition of "send_high" (spelled with a 'd') TCP receiver which in [RFC2582]
   is different.

   When the first valid ACK that is not a DUPACK arrives after turn triggers a
   retransmit was sent, the sender detects series of
   duplicate ACKs. Note that the retransmit was this last spurious if all of the following conditions are true:

     - the sender was expecting an ACK for a retransmit, and
     - fast retransmit could be
   avoided with the ACK number of that ACK is less than or equal to sent_high,
       and
     - that ACK does not have the Xmit-Echo bit set.

   The range check implied by the second condition prevents that the
   Eifel algorithm is triggered in situations where a series of ACKs is
   lost and a cumulative ACK beyond sent_high acks the retransmit.

2.2. Detection based on Timestamps

   The timestamp-based Eifel algorithm requires that both the sender and
   receiver have correctly implemented the timestamp option as specified
   in [RFC1323]. In addition, the TCP sender implementation needs to be
   enhanced as specified in Section 2.2.2. No change to the TCP protocol
   is required nor any change to the TCP receiver implementation.

2.2.1. TCP Initialization

   The timestamp-based Eifel algorithm has been successfully negotiated,
   if use of the timestamp option has been successfully negotiated
   during connection setup (see [RFC1323]).

2.2.2. The TCP Receiver

   No change is required beyond the implementation of the timestamp
   option as specified in [RFC1323].

2.2.3. The TCP Sender

   If the timestamp-based Eifel algorithm has been successfully
   negotiated, the following rules apply.

   The sender MUST store in "ts_first_xmit" the timestamp of the first
   retransmit for a data segment. This is REQUIRED since otherwise the
   Eifel algorithm might get (falsely) triggered in response to a
   genuine packet loss (see Section 5). For the same reason, any
   subsequent retransmit for the same oldest outstanding sequence number
   MUST NOT overwrite ts_first_xmit. Recall, that our use of the term
   'retransmit' includes both fast retransmits triggered by the third
   DUPACK and retransmits triggered by a timeout.

   As with the flag-based Eifel algorithm, the sender MUST store in
   cwnd_prev the value that the sender's cwnd had before it is reduced
   when the retransmit occurs. Likewise, the sender MUST store in
   ssthresh_prev the value that the sender's ssthresh had before it is
   reduced when the retransmit occurs.

   When the first valid ACK that is not a DUPACK arrives after a
   retransmit was sent, the sender detects that the retransmit was
   spurious if all of the following conditions are true:

     - the sender was expecting an ACK for a retransmit, and
     - the value of the Timestamp Echo Reply field in the timestamp
       option field of that ACK is less than ts_first_xmit.

   Using the comparison operator "less than" in the second condition is
   conservative. In theory, when the timestamp clock is slow or the
   network is fast, ts_first_xmit could (at most) also be equal to the
   value of the Timestamp Echo Reply field in the timestamp option field
   of the first or a subsequent ACK that acks the retransmit. Thus, in
   such a case the sender assumes that the retransmit was a genuine
   retransmit, i.e., that it was not spurious.

2.3. Timestamps or the Xmit-Echo Flag?

   The advantage of using the timestamp-based over the flag-based Eifel
   algorithm is that it does not require changes to the TCP protocol nor
   to the TCP receiver implementation. Also, [RFC1323] is already a
   standards track document, and the timestamp option has already been
   widely deployed in TCP implementations [???].

   The disadvantage of using the timestamp-based over the flag-based
   Eifel algorithm is that including the 12 bytes TCP timestamp option
   field in every segment and the corresponding ACKs introduces extra
   protocol overhead. Moreover, current TCP/IP header compression
   schemes [RFC1144], [RFC2507] do not compress timestamp option fields.
   For those reasons, a sender might not choose to negotiate the
   timestamp option. Note, that timestamps are only required for the
   PAWS mechanism (Protect Against Wrapped Sequences) [RFC1323], since
   for the RTTM mechanism (Round Trip Time Measurement) [RFC1323] there
   exist implementation alternatives careful variant of 'bug fix' [RFC2582].

   More detailed explanations including TCP trace plots that work without visualize
   the timestamp
   option field [LS00]. effects of packet reordering and spurious timeouts can be found
   in [LK00].

3. The flag-based Eifel algorithm has none of the above Algorithm

3.1 The Idea

   As mentioned
   disadvantages, but instead requires changes before, a solution to eliminate the TCP protocol (two
   new flags retransmission
   ambiguity in the Reserved field of the TCP header) and TCP, is to the TCP
   receiver implementation. Hence, we define the following precedence
   rules that mark ACKs with a special retransmit-marker.
   The marker would allow for an incremental deployment of the Eifel
   algorithm.

     - A TCP sender need to be present in those ACKs, and only those
   ACKs, that implements the timestamp option MAY also
       implement the timestamp-based Eifel algorithm.

     - A TCP sender MAY implement the flag-based Eifel algorithm, and if
       it does, MAY try to negotiate its use during connection setup.
       There are situations where it might be advisable not receiver sends in response to operate
       the flag-based Eifel algorithm (see Section 5).

     - If retransmits; both
   genuine and spurious retransmits. Based on such a receiver correctly sets the Xmit-Echo flag (see Section5),
       the operation of retransmit-marker,
   the Eifel algorithm should be based on allows the Xmit-
       Echo flag independent of whether timestamps are also being used.

       In case timestamps are also being used, a TCP sender may use
       timestamps as an additional check to verify whether detect a posteriori that
   a fast retransmit or a timeout was spurious. This rule implies

       [Note, that the negotiation to use the
       Xmit-Echo flag has succeeded.

     - If timestamps are being used but the Xmit-Echo flag Eifel algorithm as defined here is not being
       used for a particular TCP connection, pure
       detection mechanism. The original proposal of the Eifel
       algorithm MAY
       be operated based on timestamps. This rule implies that either
       the negotiation to use [LK00] included the Xmit-Echo flag has failed or it's use
       has been turned off due TCP sender's response to a broken receiver (see Section 5).

3. Responding
       detected spurious retransmit, and a marking scheme that allows a
       TCP sender to distinguish an ACK for an original transmit from
       that of a Spurious Retransmit retransmit. Such response and marking schemes are
       addressed in TCP

   If different documents [RFC2883], [BA01], [LM01],
       [LG01].]

   The key idea of the flag- or timestamp-based Eifel algorithm has been successfully
   negotiated, is to let the following rules apply.

   When a TCP sender has detected take the
   absence of a retransmit-marker in the first new ACK that it has performed arrives
   after a spurious
   retransmit, retransmit as sufficient evidence that the sender SHOULD resume transmission with loss recovery
   phase was falsely triggered. Being able to make this decision upon
   the next
   unsent segment. In addition, it restores cwnd and ssthresh first new ACK is crucial for response schemes such as
   outlined below based on [LG01] to
   avoid the definition of cwnd_prev and ssthresh_prev
   provided in Section 2.1.2 and 2.2.2.

   If spurious go-back-N retransmits that typically occur after a
   spurious timeout.

   However, making this decision upon the first (in new ACK is not
   absolutely reliable. A counter-example can be constructed where this
   approach fails:

       In case of multiple) spurious retransmit the original transmit was triggered
   by in fact dropped in the
       network, a spurious timeout, and

       if only new ACK acknowledging the retransmit would also not
       carry a single retransmit-marker if (1) the retransmit arrived at the
       TCP receiver in sequence, i.e., if it had been sent, jumped ahead of all
       data segments that were outstanding when the sender MAY
       restore cwnd with cwnd_prev retransmit was
       sent, and ssthresh with ssthresh_prev.

   Else if (2) the first (in ACK for the retransmit carrying the
       retransmit-marker got lost. In that case the mentioned new ACK
       would correspond to one of multiple) spurious the data segments that were
       outstanding when the retransmit was not sent. Note: This example
       holds independent of whether the loss recovery phase was
       triggered by the arrival of the third duplicate ACK or by a spurious timeout,
       timeout. Note also: in case the SACK option is used as the
       retransmit-marker (see Section 3.2 and

       if only 3.4), condition (2) is
       not required.

   Nevertheless, this counter-example is regarded as being a single retransmit had been sent, the sender
       first sets cwnd rather
   pathological case. In addition, it seems to ssthresh, as be the only conceivable
   counter-example. Therefore, it would do anyway [RFC2581], and
       then MAY restore ssthresh with cwnd_prev.

   If two retransmits is regarded as sufficiently safe to
   take the absence of a retransmit-marker in the same oldest outstanding sequence number had
   been sent, first new ACK that
   arrives after a retransmit as the sender MAY restore both cwnd and ssthresh with one
   half signal that the value of cwnd_prev.

   If more than two retransmits of TCP sender falsely
   entered the same oldest outstanding sequence
   number had been sent, loss recovery phase.

   This approach makes the Eifel algorithm MUST NOT have any effect on
   cwnd and ssthresh.

   Note, fairly robust against ACK
   losses. This is because a number of ACKs that do not carry the response
   retransmit-marker are commonly in case the first spurious retransmit was not flight during a falsely triggered by
   loss recovery phase. The arrival of at least one of those ACKs would
   be sufficient for the Eifel algorithm to make a spurious timeout as described above, is different from decision. Also, the original proposal described in [LK00]. However, this new response
   avoids
   loss of the packet burst that single ACK carrying the response described in [LK00] would
   typically cause.

4. Avoiding Competition between Timeout- and DUPACK-based Error Recovery

   In retransmit-marker does not affect
   the spirit functioning of the Eifel algorithm, although unrelated to spurious
   retransmits, we propose algorithm.

   Three alternative retransmit-markers are defined in this document,
   and the following rule, Eifel algorithm may be based on the definition either one of cwnd_prev them:

       (1) the TCP RXT flag [LM01],
       (2) the TCP Timestamps option [RFC1323], and ssthresh_prev provided

       (3) the TCP SACK option [RFC2018].

   The exact use of those markers is specified in Section 2.1.2 and 2.2.2. the following
   subsections. The rule applies to RXT flag is the case when most efficient retransmit-marker
   since it does not enlarge TCP segments nor ACKs by an option. Unlike
   for the third DUPACK arrives after RXT flag and the
   first timeout for Timestamps option, the same oldest outstanding sequence number SACK option has
   already occurred. In that case, the sender SHOULD suppress the
   drawback that it can only detect whether a fast retransmit and MAY restore both cwnd and ssthresh with one half the
   value of cwnd_prev. I.e., the sender restores cwnd and ssthresh was
   spurious. It cannot be used to detect spurious timeouts as if
   the timeout had not occurred, but instead goes into congestion
   avoidance [RFC2581].

5. Security Considerations

   There is a considerable risk when implementing the Eifel algorithm explained
   in
   a naive fashion. This is since a misbehaving receiver can severely
   upset congestion control at the sender. Section 3.4.

   Note: The risk DSACK option [RFC2883] is not an appropriate retransmit-
   marker that would allow to eliminate the Eifel
   algorithm retransmission ambiguity.
   The reason is (falsely) triggered in response to a genuine packet
   loss. In that case the Eifel algorithm would mistake first new ACK acknowledging a genuine retransmit as a spurious retransmit. As will
   commonly not carry a consequence DSACK option. That is, the sender
   would effectively not reduce its congestion window in response to DACK option commonly
   arrives one or more ACKs later than the
   lost packet. However, there are reliable sender-side mechanisms to
   protect against this case as outlined below.

   One needs to distinguish between broken receivers that misbehave due
   to an implementation mistake versus malicious receivers that
   deliberately misbehave. We first describe two mechanism to protect
   against broken receivers, followed by new ACK for a different mechanism to
   protect against malicious receivers. We only recommend that
   protection against broken receivers SHOULD be implemented together
   with the Eifel algorithm at the sender.
   retransmit. This is motivated by independent of whether the fact retransmit was genuine
   or spurious.

   Given that both the current TCP implicitly assumes a trust relationship between sender and receiver, i.e., it can be assumed that receivers are not
   malicious. Note that even without the Eifel algorithm, there are ways
   a misbehaving receiver can upset congestion control at the sender
   [SCWA99].

   We do not discuss problems with respect to misbehaving senders
   assuming that support the implementation of RXT flag, or
   the sender-side Eifel algorithm
   complies with Timestamps option, or the specifications in this text.

   Protection Against Broken Receivers

   There is no risk to falsely trigger SACK option, a TCP sender MAY implement
   the timestamp-based Eifel algorithm as long as defined in the receiver correctly implements following subsections.

3.2 The Algorithm

   The following steps MUST be taken by the timestamp
   option [RFC1323]. However, TCP sender when the flag-based Eifel algorithm can be
   falsely triggered third
   duplicate ACK arrives or the timeout occurs, i.e., when the receiver has agreed loss
   recovery phase starts:

      (1)     Set a "SpuriousRecovery" variable to set 'false'. If this
               variable is true at step (7) below, the Xmit-Echo
   flag in ACKs for retransmits but then "forgets" to do so. The ACK for
   a genuine retransmit would then loss recovery
               phase had been falsely trigger triggered.

      (2)     Set a "RecoveryPoint" variable to HighData. When the Eifel algorithm
   once it arrives at TCP
               sender receives the sender. To protect against first new ACK for this case the
   sender SHOULD implement data octet the following mechanism if it uses
               loss recovery phase is terminated.

      (3-TS)  This step only applies when the flag-
   based Eifel algorithm.

   After Timestamps option is used
               as the sender has detected retransmit-marker: Set a spurious retransmit and in response
   restores cwnd and ssthresh with cwnd_prev and ssthresh_prev,
   respectively (see Section 3), it also saves "RetransmitTS" variable
               to the former values value of cwnd
   and ssthresh the Timestamp Value field of the
               Timestamps option included in cwnd_prev and ssthresh_prev, respectively (e.g., help
   = cwnd; cwnd = cwnd_prev; cwnd_prev = help;). Until an ACK arrives at the first retransmit. A TCP
               sender MUST ensure that acks beyond sent_high, RetransmitTS does not get
               overwritten as the sender checks for loss recovery phase progresses, e.g.,
               in case of a valid
   ACK that arrives with second timeout and subsequent second
               retransmit of the Xmit-Echo flag set. If such an ACK arrives, same octet.

   The following steps MUST be taken by the TCP sender assumes that the Eifel algorithm was rightful triggered
   and does nothing further. Otherwise, when an the first
   new ACK arrives at after the
   sender that acks beyond sent_high, loss recovery phase started:

      (4)     If the sender assumes ACK's sequence number is greater than the value of
               RecoveryPoint , proceed to step (7) below. This condition
               ensures that only those ACKs are considered that
               correspond to segments that were outstanding at the Eifel
   algorithm was falsely triggered and reverses time
               the effects of retransmit was sent.

      (5-SACK) This step only applies when the Eifel
   algorithm. I.e. cwnd is (re-)restored to cwnd_prev and ssthresh SACK option is
   (re-)restored to ssthresh_prev. Since, also used as
               the ACKs with retransmit-marker: If the Xmit-
   Echo flag set can get lost, it would be too conservative ACK's sequence number is
               equal to
   completely disable the Eifel algorithm for the rest value of the connection RecoveryPoint, proceed to step (7)
               below. This step is motivated in this situation.

   Another risk stems from receivers that set Section 3.4.

      (6-RXT) This step only applies when the Xmit-Echo RXT flag for
   segments that have not been retransmitted. A TCP sender SHOULD
   implement an appropriate detection mechanism, since in this case, it
   cannot reliably use is used as the flag-based Eifel algorithm.
               retransmit-marker: If a sender
   detects such misbehavior, it SHOULD disable the flag-based Eifel
   algorithm for ACK does not have the rest of RXT flag
               set (binary 1), set SpuriousRecovery to 'true' and
               proceed to step (7) below.

      (6-TS)  This step only applies when the connection.

   Protection Against Malicious Receivers

   There are number of ways in which a malicious receiver could falsely
   trigger Timestamps option is used
               as the Eifel algorithm at retransmit-marker: If the sender. A sender is particularly
   vulnerable to this if it operates value of the flag-based Eifel algorithm.
   Hence, Timestamp
               Echo Reply field of the sender MAY choose ACK's Timestamps option is
               smaller than RetransmitTS, set SpuriousRecovery to 'true'
               and proceed to step (7) below. This step is commented in
               Section 3.3.

      (6-SACK) This step only use applies when the timestamp-based Eifel
   algorithm, SACK option is used as
               the retransmit-marker, and in addition implement only if the following mechanism. The
   mechanism combines loss recovery
               phase had been triggered by the idea arrival of the third
               duplicate ACK: If the ACK does not carry a "singular nonce" proposed SACK option,
               set SpuriousRecovery to 'true' and proceed to step (7)
               below. This step is motivated in
   [SCWA99] with Section 3.4.

      (7)     Done. No further processing.

3.3 Comments to the timestamp option specified in [RFC1323]. Timestamp-based Detection

   The mechanism comparison operator "smaller than" in step (6-TS) is based on
   conservative. In theory, when the observation that after a spurious
   retransmit "timestamp clock" is slow or the sender will
   network is fast, RetransmitTS could at some point receive most be equal to the ACK that was
   triggered timestamp
   echoed by the corresponding an ACK sent in response to an original transmit assuming transmit. In that
   case, it is assumed that
   ACK the loss recovery was not lost. That ACK can be expected falsely triggered.

3.4 Comments to echo the timestamp of the original transmit even if the receiver implements the delayed ACK
   algorithm. In case of packet re-ordering, this SACK-based Detection

   The SACK option is implied by not a retransmit-marker in the
   rules for generating ACKs for data segments sense that fill in all or part
   of a gap it is
   present only in those ACKs that the sequence space TCP receiver sends in response to
   retransmits (see section 4.2 of [RFC2581]) and by Section 3.1). This is why the rules for echoing timestamps in SACK option cannot be
   used to detect that a timeout was spurious. For this, we only need to
   consider the case (see rule (C) in
   section 3.4 where an entire flight of [RFC1323]). In segments was lost versus
   the case of a spurious timeout, it is quite
   likely timeout. Assuming that packet reordering did
   not concur, the delay that has caused first new ACK arriving after the spurious timeout has also
   caused retransmit would not
   carry a SACK option in either case. Thus, the receiver's delayed ACK timer [RFC1122] to expire. Hence, retransmission
   ambiguity problem would still exist.

   Nevertheless, the SACK option can be used to protect against detect that a malicious receiver, the sender should fast
   retransmit was spurious. Let's assume a SACK-enabled TCP receiver. If
   only
   trigger the Eifel algorithm a single segment was in response to fact lost, then the first new ACK for the original
   transmit and only after it has authenticated that ACK as described
   below.

   To protect against malicious receivers that spoof ACKs, the sender
   MAY implement
   the following modification to the timestamp option
   specified in [RFC1323]. The sender adds fast retransmit will not carry a separate random number to
   each timestamp to be included in an outgoing segment, SACK option, and writes will
   acknowledge RecoveryPoint. If multiple segments were in fact lost,
   then the
   result into first new ACK after the Timestamp Value field in fast retransmit will carry a SACK
   option, and will acknowledge below RecoveryPoint, i.e., the timestamp ACK will
   be partial. Thus, partial ACKs that do not carry a SACK option field.
   A are
   impossible independent of how many segments were lost. Conversely,
   the first new random number ACK after the fast retransmit that is generated per RTT to avoid a partial ACK and
   that does not carry a SACK option, signals that the receiver
   "learns" the random number. That random number should loss recovery
   phase was falsely triggered. This motivates steps (5-SACK) and (6-
   SACK) in Section 3.2.

   Note: The SACK option cannot be carefully
   chosen used to avoid bad interactions with detect that a fast retransmit
   was spurious when the PAWS mechanism specified in
   [RFC1323]. Clearly, entire flight of segments jumps ahead of the random number(s) need to be accounted for in
   first segment (the oldest outstanding segment) of that flight. This
   is because the RTTM mechanism specified resulting new ACK would not carry a SACK option but
   would acknowledge RecoveryPoint. Thus, step (5-SACK) in [RFC1323], i.e., it needs Section 3.2
   would become effective.

4. Security Considerations

   There do not seem to be
   subtracted from the echoed timestamp before any security considerations associated with
   the RTT can be
   calculated. For this to work, Eifel algorithm itself. This is because the sender needs Eifel algorithm is
   only a detection scheme that is not tied to store all
   "outstanding timestamps" and the corresponding random number. With
   this modification, any specific action that
   might alter existing protocol state at the TCP sender only identifies or receiver.
   That is, no value of a retransmit as
   spurious if state variable other than one introduced by
   the ACK algorithm itself (SpuriousRecovery, RecoverPoint, and
   RetransmitTS) is changed.

   However, security considerations might exist for the original transmit echoes the "random
   timestamp" response schemes
   that was sent. Thus, assuming a receiver has no easy
   access to use the mentioned random numbers, this should provide for Eifel algorithm as a
   fairly secure protection against malicious receivers basis. In particular, it needs to
   be considered that spoof the
   "right" ACK that would trigger TCP receiver might by lying about the Eifel algorithm.
   retransmit-marker.

Acknowledgments

   Many thanks to Keith Sklower for helping to develop the tools that
   allowed the study of spurious timeouts and packet re-orderings. Many
   thanks to Sklower, Randy Katz, Michael Meyer, Stephan
   Baucke, Sally Floyd, Vern Paxson, and Mark Allman Allman, Ethan Blanton, and
   Andrei Gurtov for very useful discussions around the Eifel
   algorithm. that contributed to this
   work.

References

   [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
             RFC 2581, April 1999.

   [BPS99]   J.C.R. Bennett, C. Partridge, N. Shectman, Packet
             Reordering is Not Pathological Network Behavior, IEEE/ACM
             Transactions on Networking, December `99.

   [RFC1122] R. Braden, Requirements for Internet Hosts - Communication
             Layers, RFC 1122, October 1989.

   [BA01]    E. Blanton, M. Allman, Adjusting the Duplicate ACK
             Threshold to Avoid Spurious Retransmits, work in progress,
             July 2001.

   [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
             Requirement Levels, RFC 2119, March 1997.

   [RFC2507]

   [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
             Performance, RFC 1323, May 1992.

   [RFC2018] M. Degermark, B. Nordgren, Mathis, J. Mahdavi, S. Pink, IP Header Compression, Floyd, A. Romanow, TCP Selective
             Acknowledgement Options, RFC 2018, October 1996.

   [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
             Fast Recovery Algorithm, RFC 2507, February 2582, April 1999.

   [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
             An Extension to the Selective Acknowledgement (SACK) Option
             for TCP, RFC 2883, July 2000.

   [ISO8073] ISO/IEC, Information processing systems - Open Systems
             Interconnection - Connection oriented transport protocol
             specification, International Standard ISO/IEC 8073,
             December 1988.

   [RFC1323] V. Jacobson, R. Braden, D. Borman,

   [Gu01]    A. Gurtov, Effect of Delays on TCP Extensions for High Performance, RFC 1323, May 1992.

   [RFC1144] V. Jacobson, Compressing TCP/IP Headers for Low-Speed
             Serial Links, RFC 1144, February 1990. In
             Proceedings of IFIP Personal Wireless Communications,
             August 2001.

   [GL01]    A. Gurtov, R. Ludwig, Making TCP Robust Against Delay
             Spikes, work in progress, November 2001.

   [KP87]    P. Karn, C. Partridge, Improving Round-Trip Time Estimates
             in Reliable Transport Protocols, In Proceedings of ACM
             SIGCOMM 87.

   [LK00]    R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
             Robust Against Spurious Retransmissions, ACM Computer
             Communication Review, Vol. 30, No. 1, January 2000,
             available at http://www.acm.org/sigcomm/ccr/archive/2000/
             jan00/ccr-200001-ludwig.html (easier studied when
             viewed/printed in color).

   [LS00]

   [LM01]    R. Ludwig, K. Sklower, M. Meyer, The Eifel Retransmissions Timer, ACM
             Computer Communication Review, Vol. 30, No. 3, July 2000.

   [Pax97] TCP Retransmit (RXT) Flag, work in
             progress, November 2001.

   [LG01]    R. Ludwig, A. Gurtov, Responding to Spurious Timeouts in
             TCP, work in progress, November 2001.

   [RFC2988] V. Paxson, End-to-End Routing Behavior in the Internet,
             IEEE/ACM Transactions on Networking, Vol.5, No.5, October
             1997. M. Allman, Computing TCP's Retransmission Timer,
             RFC 2988, November 2000.

   [RFC791]  J. Postel, Internet Protocol, RFC 791, September 1981.

   [RFC793]  J. Postel, Transmission Control Protocol, RFC793, September
             1981.

   [SCWA99]  S. Savage, N. Cardwell, D. Wetherall, T. Anderson, TCP
             Congestion Control with a Misbehaving Receiver, ACM
             Computer Communication Review, Vol. 29, No. 5, October
             1999.

   [WS95]    G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
             (The Implementation), Addison Wesley, January 1995.

Author's Address

     Reiner Ludwig
     Ericsson Research (EED)
     Ericsson Allee 1
     52134 Herzogenrath, Germany
     Phone: +49 2407 575 719
     Fax:   +49 2407 575 400
     Email: Reiner.Ludwig@Ericsson.com

This Internet-Draft expires in August 2001. May 2002.