Network Working Group Reiner Ludwig INTERNET-DRAFT Ericsson Research Expires:
August 2001 February 23,May 2002 November, 2001 The Eifel Algorithm for TCP <draft-ietf-tsvwg-tcp-eifel-alg-00.txt><draft-ietf-tsvwg-tcp-eifel-alg-01.txt> Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Abstract TCP's intertwined error and congestion control is not robust against spurious timeouts nor is it robust against packet re-orderings.A packet that is delayed in the network beyondsolution to eliminate the expiration of TCP'sretransmission timer,ambiguity in TCP, is mistaken for a packet loss by a TCP sender. Also,to mark ACKs with a packet that is re-orderedspecial retransmit-marker. The marker would need to be present in those ACKs, and only those ACKs, that the network beyond TCP's duplicate acknowledgment threshold, is eventually mistaken for a packet loss by aTCP sender. Both situations leadreceiver sends in response to a spurious retransmit of the oldest outstanding segment,retransmits; both genuine and an unnecessary reduction of the congestion window at the sender. Moreover, aspurious timeout forces the sender intoretransmits. Based on such a go-back-N retransmission mode leading to spurious retransmits of all outstanding segments. We proposeretransmit-marker, the "Eifel algorithm" as a way to make TCP robust against spurious timeouts and packet re-orderings. TheEifel algorithm uses extra information inallows the ACKsTCP sender to reliablydetect (a posteriori)a spurious retransmit of the oldest outstanding segment at the TCP sender. In response to suchposteriori that a detection, the Eifel algorithm restores the congestion window, and prevents the spurious go-back-N retransmits followingfast retransmit or a spurious timeout. As extra informationtimeout was spurious. Three alternative retransmit-markers are defined in the ACKs,this document, and the Eifel algorithm allows for two alternatives: the timestamp option [RFC1323] and/or two new flags in the Reserved fieldmay be based on either one of them: the TCP header. 1. Introduction In this document, we useRXT flag, the terms 'valid ACK' as defined in [RFC793],TCP Timestamps option, and the terms 'duplicate ACK' (DUPACK), 'Congestion Window' (cwnd), and 'Slow Start Threshold' (ssthresh)TCP SACK option. The Eifel algorithm provides a basis for future TCP enhancements such as defined in [RFC2581]. Further, our use of the term 'retransmit' includes both fast retransmits triggered by the third DUPACK and retransmits triggered byresponse schemes that may change a timeout.TCP sender's protocol state to improve end-to-end performance. Terminology The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119]. TCP's intertwined error and congestion control is not robust against spurious timeouts nor is it robust against packet re-orderings. A packet that is delayed in the network beyond the expiration of TCP's retransmission timer, is mistaken for a packet loss by a TCP sender. This results in a so-called spurious timeout, i.e., a timeout that would not have occurred hadWe use the sender "waited longer". Also, a packetterm 'new ACK' to refer to an ACK that is re-ordered in the network beyond TCP's DUPACK threshold of 3, is eventually mistaken for a packet loss by a TCP sender. This is because the fast retransmit algorithm usesacknowledges outstanding data. We use the arrival of 3 DUPACKsterm 'duplicate' ACK as an indication that a segment has been lost [RFC2581]. Both situations leaddefined in [WS95]. Furthermore, we refer to a spurious retransmit ofthe oldest outstanding segment, and an unnecessary reductionfirst-time transmission of the congestion window at the sender. Moreover, a spurious timeout forces the sender intoa go-back-N retransmission mode leading to spurious retransmits of all outstanding segments. A detailed explanation of these effects using trace plots is found in [LK00]. We propose the "Eifel algorithm"data segment as a way to make TCP robust against spurious timeoutsthe 'original transmit', and packet re-orderings. The Eifel algorithm'HighData' is based on the observation thatthe spurious go-back-N retransmits followinghighest sequence number transmitted at a spurious timeout and the unnecessary reduction of the congestion window caused by packet re-ordering have the same root: the retransmission ambiguity.given point. 1. Introduction The retransmission ambiguity problem [KP87] is the TCP sender's inability to distinguish anwhether the first new ACK forthat arrives after a retransmit was sent in response to the original transmit of a segment fromor the ACK for itsretransmit. The Eifel algorithm uses extra information inA solution to eliminate the ACKsretransmission ambiguity is to reliably detect (a posteriori)mark ACKs with a spurious retransmit of the oldest outstanding segment atspecial "retransmit-marker". The marker would need to be present in those ACKs, and only those ACKs, that the TCP sender. Inreceiver sends in response to such a detection,retransmits; both genuine and spurious retransmits. Based on such a retransmit-marker, the Eifel algorithm restores the congestion window, and preventsallows the spurious go-back-N retransmits followingTCP sender to detect a spurious timeout. Spurious timeouts have not generally beenposteriori that a concern in the past since they are rare [Pax97].fast retransmit or a timeout was spurious. This can be credited to the conservativenessadded capability of TCP's retransmission timer [LS00]. Yet, there is benefit in avoidingthe detrimental effects that spurious timeouts have onTCP performance. This is since those effects create a strong incentive for keeping a conservative - potentially too conservative - retransmission timer. However, a retransmission timer thatsender is too conservativeuseful in environments where TCP's loss recovery and congestion control algorithms may cause long idle times before a lost packet is retransmitted.often get falsely triggered. This can degrade performance. This is obvious for interactive request/response-style connections. But it also affects bulk data transfers whenever the sender has exhausted its send window before the retransmission timer has expired. The Eifel algorithm opens the door to the development ofbe caused by packet reordering, packet duplication, or a more optimistic retransmission timer as it ensures that the penalty for underestimating the round- trip time is minimal. Insudden delay increase in the common case,data or the only penalty isACK path that results in a singlespurious retransmit. Packet re-orderingstimeout. For example, such sudden delay increases can often occur in wide-area wireless access networks due to handovers, resource preemption, or because the connection-less nature of IP [RFC791] which does not guarantee an in-order delivery of packets. However, it is difficult to evaluate how serious this is in the Internet today. Early studies [Pax97] conclude that this occurs rarely, while recent studies [BPS99] find this problem to be more serious. Clearly, this dependsmobile transmitter traverses through a radio coverage hole (e.g., see [Gu01]). Based on the paths underlying such studies, e.g., whenever routers are inter-connected via multiple links/paths (for fault tolerance) and load balancing is performed across those links/paths onEifel algorithm, the aggregate traffic, packet re-orderings will occur more frequently. 2. Detecting a Spurious Retransmit inTCP Detecting the retransmission ambiguity requires extra information in the ACKs that thesender can usemay then choose to unambiguously distinguish an ACK for the original transmitimplement dedicated response schemes. One goal of such a segment from thatscheme would be to alleviate the consequences of a retransmit. Thisfalsely triggered loss recovery phase. For example, an important feature of the scheme proposed in turn requires[LG01] is that every segment and the corresponding ACK carry the extra information to allow the sender to avoidit avoids the spurious go-back-N retransmits described in Section 1. Waiting for the receiver to signal in DUPACKsthat is has correctly received duplicate segments, as proposed in [RFC2883],typically occur after a spurious timeout. Another goal would be too late,to "upgrade" the congestion control parameters, the congestion window and slow start threshold [RFC2581]. This is thus not an alternative. As extra information into compensate for the ACKs,unnecessary reduction of their values when the Eifel algorithm allows for two alternatives:loss recovery was falsely triggered. Yet, another goal would be to adapt other protocol parameters, e.g., the timestamp option [RFC1323] and/or two new flags induplicate acknowledgement threshold [RFC2581], and the Reserved fieldRTT estimators [RFC2988]. This is to reduce the risk of falsely triggering TCP's loss recovery again in the TCP header. Both alternativesfuture. However, such response schemes are specified inoutside the following two subsections. In Section 2.3, we specify precedence rules among those two alternatives. We speakscope of this document. They are addressed in different documents (e.g., see [LG01] and [BA01]). A key feature of the timestamp-basedEifel algorithm to emphasizeis that timestamps are being usedit already detects upon the first new ACK that arrives during a loss recovery phase that a fast retransmit or a timeout was spurious. This is crucial to detectbe able to avoid the mentioned spurious go-back-N retransmits. Likewise, we speak ofAnother feature is that the flag-basedEifel algorithm. 2.1. Detection based on the Xmit-Echo Flag We define Bit 6 and Bit 7 inalgorithm is fairly robust against ACK losses. Especially, the Reserved fieldloss of the TCP header as the "Xmit flag" and "Xmit-Echo flag", respectively. The sender uses the Xmit flag to mark retransmits whilesingle ACK carrying the receiver setsretransmit-marker does not affect the Xmit- Echo flag infunctioning of the ACKs it sends in response toEifel algorithm. 2. Spurious Retransmits The following events falsely trigger a segment with the Xmit flag set. Since,TCP can besender's loss recovery and congestion control algorithms. This causes a senderso-called spurious retransmit, and receiver at the same time, two separate bits need to be used for those flags. It is worth noting that the two flags are similar to the sub-sequence field proposed in [ISO8073]. The locationan unnecessary reduction of the 6-bit Reserved field in theTCP header is shown in Figure 3 of [RFC793]. Bit 8 and 9 of the Reserved fieldsender's congestion window. - spurious timeouts - packet reordering - packet duplication A spurious timeout is a timeout that would not have been assigned tooccurred had the Explicit Congestion Notification (ECN) [RFC2481]. 2.1.1. TCP Initialization The textsender "waited longer". It may be caused by increased delay that suddenly occurs in this subsection has been derived from Section 6.1.1 of [RFC2481]. This is becausethe same initialization semantics also apply to the flag-based Eifel algorithm. Thus, TCP's that support ECN, and wish to supportdata or the flag-based Eifel algorithm, should be ableACK path. This in turn might cause an ACK to re-use most ofarrive too late, i.e., only after the initialization code implemented for ECN. When aTCP sendssender's retransmission timer has expired. This would then trigger a SYN packet, it MAY set (i.e., equal to 1)spurious retransmit. Note that the Xmit and Xmit-Echo flag. Formentioned ACK could either be a SYN packet, the setting of both flags is defined as an indicationnew ACK that would acknowledge the sending TCP whishes to useoldest outstanding segment, or it could be the flag-based Eifel algorithm, rather than as an indicationthird duplicate ACK that the SYN packet is a retransmit. More precisely, suchwould have triggered a SYN packet indicates thatfast retransmit of the TCP transmittingoldest outstanding segment [RFC2581]. Note: In the SYN packet will participate inlatter case the flag-based Eifel algorithm as both asender and receiver. Only ifshould suppress the fast retransmit that the third duplicate ACK would usually trigger. In general, a TCP receivessender should ignore all duplicate ACKs that arrive after a SYN packet with bothtimeout [GL01]. Packet reordering in the Xmit and Xmit-Echo flags set, MAY it respond withnetwork may occur because IP [RFC791] is a SYN-ACK packetconnection-less protocol. Thus, IP does not guarantee an in-order delivery of packets. This results in which it sets the Xmit-Echo flag, but unsets (i.e., sets equal to 0) the Xmit flag. Fora SYN-ACK packet,spurious fast retransmit if three or more data segments arrive out-of-order at the patternTCP receiver, and at least three of the Xmit-Echo flag set andresulting duplicate ACKs arrive at the Xmit flag unset is defined as an indicationTCP sender. This is because a TCP receiver generates a duplicate ACK for each segment that arrives out-of-order, and three consecutive duplicate ACKs trigger the TCP transmittingsender's fast retransmit algorithm. Packet duplication in the network may also occur because IP is connection-less. That is, the SYN-ACKreceiving IP does not (cannot) remove duplicate packets. A TCP receiver in turn also generates a duplicate ACK for each duplicate segment. As with packet agrees to participatereordering, this results in the flag-based Eifel algorithm as botha senderspurious fast retransmit if duplication of data segments or ACKs results in three or more duplicate ACKs to arrive at the TCP sender. The negative impact on TCP performance caused by packet reordering and receiver. This asymmetrypacket duplication is necessary forcommonly the robust negotiation ofsame: a single spurious retransmit (the fast retransmit), and the useunnecessary halving of the flag-based Eifel algorithm with deployedTCP implementations (see section 6.1.1sender's congestion window as a result of [RFC2481] for details). Forthe subsequent fast recovery phase [RFC2581]. The negative impact on TCP transmitting the SYN packet with both the Xmit and Xmit- Echo flags set, the flag-based Eifel algorithm has been successfully negotiated, if it receivesperformance caused by a SYN-ACK packet in which the Xmit-Echo flagspurious timeout is set, butmore severe. First, the Xmit flag is unset. Fortimeout event itself causes a single spurious retransmit, and unnecessarily forces the TCP transmitting the SYN-ACK packet,sender into slow start [RFC2581]. Then, as the flag-based Eifel algorithm has been successfully negotiated, if it has receivedconnection progresses, a SYN packet with both the Xmit and Xmit-Echo flags set, while it has setchain reaction gets triggered that further decreases TCP's performance. Since the Xmit-Echo flag but unsettimeout was spurious, at least some ACKs for original transmits will typically arrive at the Xmit flag in its SYN-ACK. 2.1.2. TheTCP Receiver If the flag-based Eifel algorithm has been successfully negotiated,sender before the following rules apply. The receiver SHOULD send an immediateACK for the retransmit arrives. (This is unless severe packet reordering coincided with the Xmit-Echo flag setspurious timeout in response to an incoming data segmentsuch a way that hasthe Xmit flag set. The immediateACK for the retransmit is RECOMMENDED because ofthe range check thatfirst new ACK to arrive at the sender performs on incomingTCP sender.) Those ACKs after a retransmit (see Section 2.1.2). In all other cases, the receiver MUST unset (i.e., set equal to 0) the Xmit-Echo flag in all ACKs it sends. 2.1.3. The TCP Sender If the flag-based Eifel algorithm has been successfully negotiated, the following rules apply. The sender MUST set the Xmit flag infor original transmits will then trigger an implicit go- back-N loss recovery at the TCP header of retransmits. This is REQUIRED since otherwise the Eifel algorithm might get (falsely) triggered in response to a genuine packet loss (see Section 5). Recall,sender. Assuming that our usenone of the term 'retransmit' includes both fast retransmits triggered by the third DUPACK and retransmits triggered by a timeout. The sender MUST store in "cwnd_prev" the value that the sender's cwnd had before it is reduced when the retransmit occurs. Likewise, the sender MUST store in "ssthresh_prev" the value that the sender's ssthresh had before it is reduced when the retransmit occurs. This is REQUIRED since the sender will use cwnd_prev and ssthresh_prev to restore its cwnd and ssthresh after it has detected that the retransmit was spurious (see Section 3). When a retransmit is sent, the sender MUST store the next (previously unsent) sequence number to be sent in "sent_high", i.e., sent_high is the highestoutstanding sequence number transmitted so far plus 1, or alternatively, the ACK numbersegments and none of a valid ACK that would ackthe corresponding ACKs were lost, all outstanding data.segments will get retransmitted unnecessarily. Moreover, this will then typically cause a spurious fast retransmit. This is REQUIRED sincebecause the sender will use sent_high to detect aspurious retransmitgo-back-N retransmits will arrive as described below. Note thatduplicates at the definition of "send_high" (spelled with a 'd')TCP receiver which in [RFC2582] is different. When the first valid ACK that is not a DUPACK arrives afterturn triggers a retransmit was sent, the sender detectsseries of duplicate ACKs. Note that the retransmit wasthis last spurious if all of the following conditions are true: - the sender was expecting an ACK for a retransmit, and -fast retransmit could be avoided with the ACK number of that ACK is less than or equal to sent_high, and - that ACK does not have the Xmit-Echo bit set. The range check implied by the second condition prevents that the Eifel algorithm is triggered in situations where a series of ACKs is lost and a cumulative ACK beyond sent_high acks the retransmit. 2.2. Detection based on Timestamps The timestamp-based Eifel algorithm requires that both the sender and receiver have correctly implemented the timestamp option as specified in [RFC1323]. In addition, the TCP sender implementation needs to be enhanced as specified in Section 2.2.2. No change to the TCP protocol is required nor any change to the TCP receiver implementation. 2.2.1. TCP Initialization The timestamp-based Eifel algorithm has been successfully negotiated, if use of the timestamp option has been successfully negotiated during connection setup (see [RFC1323]). 2.2.2. The TCP Receiver No change is required beyond the implementation of the timestamp option as specified in [RFC1323]. 2.2.3. The TCP Sender If the timestamp-based Eifel algorithm has been successfully negotiated, the following rules apply. The sender MUST store in "ts_first_xmit" the timestamp of the first retransmit for a data segment. This is REQUIRED since otherwise the Eifel algorithm might get (falsely) triggered in response to a genuine packet loss (see Section 5). For the same reason, any subsequent retransmit for the same oldest outstanding sequence number MUST NOT overwrite ts_first_xmit. Recall, that our use of the term 'retransmit' includes both fast retransmits triggered by the third DUPACK and retransmits triggered by a timeout. As with the flag-based Eifel algorithm, the sender MUST store in cwnd_prev the value that the sender's cwnd had before it is reduced when the retransmit occurs. Likewise, the sender MUST store in ssthresh_prev the value that the sender's ssthresh had before it is reduced when the retransmit occurs. When the first valid ACK that is not a DUPACK arrives after a retransmit was sent, the sender detects that the retransmit was spurious if all of the following conditions are true: - the sender was expecting an ACK for a retransmit, and - the value of the Timestamp Echo Reply field in the timestamp option field of that ACK is less than ts_first_xmit. Using the comparison operator "less than" in the second condition is conservative. In theory, when the timestamp clock is slow or the network is fast, ts_first_xmit could (at most) also be equal to the value of the Timestamp Echo Reply field in the timestamp option field of the first or a subsequent ACK that acks the retransmit. Thus, in such a case the sender assumes that the retransmit was a genuine retransmit, i.e., that it was not spurious. 2.3. Timestamps or the Xmit-Echo Flag? The advantage of using the timestamp-based over the flag-based Eifel algorithm is that it does not require changes to the TCP protocol nor to the TCP receiver implementation. Also, [RFC1323] is already a standards track document, and the timestamp option has already been widely deployed in TCP implementations [???]. The disadvantage of using the timestamp-based over the flag-based Eifel algorithm is that including the 12 bytes TCP timestamp option field in every segment and the corresponding ACKs introduces extra protocol overhead. Moreover, current TCP/IP header compression schemes [RFC1144], [RFC2507] do not compress timestamp option fields. For those reasons, a sender might not choose to negotiate the timestamp option. Note, that timestamps are only required for the PAWS mechanism (Protect Against Wrapped Sequences) [RFC1323], since for the RTTM mechanism (Round Trip Time Measurement) [RFC1323] there exist implementation alternativescareful variant of 'bug fix' [RFC2582]. More detailed explanations including TCP trace plots that work withoutvisualize the timestamp option field [LS00].effects of packet reordering and spurious timeouts can be found in [LK00]. 3. The flag-basedEifel algorithm has none of the aboveAlgorithm 3.1 The Idea As mentioned disadvantages, but instead requires changesbefore, a solution to eliminate the TCP protocol (two new flagsretransmission ambiguity in the Reserved field of the TCP header) andTCP, is to the TCP receiver implementation. Hence, we define the following precedence rules thatmark ACKs with a special retransmit-marker. The marker would allow for an incremental deployment of the Eifel algorithm. - A TCP senderneed to be present in those ACKs, and only those ACKs, that implements the timestamp option MAY also implementthe timestamp-based Eifel algorithm. - ATCP sender MAY implement the flag-based Eifel algorithm, and if it does, MAY try to negotiate its use during connection setup. There are situations where it might be advisable notreceiver sends in response to operate the flag-based Eifel algorithm (see Section 5). - Ifretransmits; both genuine and spurious retransmits. Based on such a receiver correctly sets the Xmit-Echo flag (see Section5), the operation ofretransmit-marker, the Eifel algorithm should be based onallows the Xmit- Echo flag independent of whether timestamps are also being used. In case timestamps are also being used, aTCP sender may use timestamps as an additional checkto verify whetherdetect a posteriori that a fast retransmit or a timeout was spurious. This rule implies[Note, that the negotiation to use the Xmit-Echo flag has succeeded. - If timestamps are being used but the Xmit-Echo flagEifel algorithm as defined here is not being used fora particular TCP connection,pure detection mechanism. The original proposal of the Eifel algorithm MAY be operated based on timestamps. This rule implies that either the negotiation to use[LK00] included the Xmit-Echo flag has failed or it's use has been turned off dueTCP sender's response to a broken receiver (see Section 5). 3. Respondingdetected spurious retransmit, and a marking scheme that allows a TCP sender to distinguish an ACK for an original transmit from that of a Spurious Retransmitretransmit. Such response and marking schemes are addressed in TCP Ifdifferent documents [RFC2883], [BA01], [LM01], [LG01].] The key idea of the flag- or timestamp-basedEifel algorithm has been successfully negotiated,is to let the following rules apply. When aTCP sender has detectedtake the absence of a retransmit-marker in the first new ACK that it has performedarrives after a spurious retransmit,retransmit as sufficient evidence that the sender SHOULD resume transmission withloss recovery phase was falsely triggered. Being able to make this decision upon the next unsent segment. In addition, it restores cwnd and ssthreshfirst new ACK is crucial for response schemes such as outlined below based on[LG01] to avoid the definition of cwnd_prev and ssthresh_prev provided in Section 2.1.2 and 2.2.2. Ifspurious go-back-N retransmits that typically occur after a spurious timeout. However, making this decision upon the first (innew ACK is not absolutely reliable. A counter-example can be constructed where this approach fails: In case of multiple) spurious retransmitthe original transmit was triggered byin fact dropped in the network, a spurious timeout, and if onlynew ACK acknowledging the retransmit would also not carry a singleretransmit-marker if (1) the retransmit arrived at the TCP receiver in sequence, i.e., if it had been sent,jumped ahead of all data segments that were outstanding when the sender MAY restore cwnd with cwnd_prevretransmit was sent, and ssthresh with ssthresh_prev. Elseif (2) the first (inACK for the retransmit carrying the retransmit-marker got lost. In that case the mentioned new ACK would correspond to one of multiple) spuriousthe data segments that were outstanding when the retransmit was notsent. Note: This example holds independent of whether the loss recovery phase was triggered by the arrival of the third duplicate ACK or by a spurious timeout,timeout. Note also: in case the SACK option is used as the retransmit-marker (see Section 3.2 and if only3.4), condition (2) is not required. Nevertheless, this counter-example is regarded as being a single retransmit had been sent, the sender first sets cwndrather pathological case. In addition, it seems to ssthresh, asbe the only conceivable counter-example. Therefore, it would do anyway [RFC2581], and then MAY restore ssthresh with cwnd_prev. If two retransmitsis regarded as sufficiently safe to take the absence of a retransmit-marker in the same oldest outstanding sequence number had been sent,first new ACK that arrives after a retransmit as the sender MAY restore both cwnd and ssthresh with one halfsignal that the value of cwnd_prev. If more than two retransmits ofTCP sender falsely entered the same oldest outstanding sequence number had been sent,loss recovery phase. This approach makes the Eifel algorithm MUST NOT have any effect on cwnd and ssthresh. Note,fairly robust against ACK losses. This is because a number of ACKs that do not carry the responseretransmit-marker are commonly in case the first spurious retransmit was notflight during a falsely triggered byloss recovery phase. The arrival of at least one of those ACKs would be sufficient for the Eifel algorithm to make a spurious timeout as described above, is different fromdecision. Also, the original proposal described in [LK00]. However, this new response avoidsloss of the packet burst thatsingle ACK carrying the response described in [LK00] would typically cause. 4. Avoiding Competition between Timeout- and DUPACK-based Error Recovery Inretransmit-marker does not affect the spiritfunctioning of the Eifel algorithm, although unrelated to spurious retransmits, we proposealgorithm. Three alternative retransmit-markers are defined in this document, and the following rule,Eifel algorithm may be based on the definitioneither one of cwnd_prevthem: (1) the TCP RXT flag [LM01], (2) the TCP Timestamps option [RFC1323], and ssthresh_prev provided(3) the TCP SACK option [RFC2018]. The exact use of those markers is specified in Section 2.1.2 and 2.2.2.the following subsections. The rule applies toRXT flag is the case whenmost efficient retransmit-marker since it does not enlarge TCP segments nor ACKs by an option. Unlike for the third DUPACK arrives afterRXT flag and the first timeout forTimestamps option, the same oldest outstanding sequence numberSACK option has already occurred. In that case, the sender SHOULD suppressthe drawback that it can only detect whether a fast retransmit and MAY restore both cwnd and ssthresh with one half the value of cwnd_prev. I.e., the sender restores cwnd and ssthreshwas spurious. It cannot be used to detect spurious timeouts as if the timeout had not occurred, but instead goes into congestion avoidance [RFC2581]. 5. Security Considerations There is a considerable risk when implementing the Eifel algorithmexplained in a naive fashion. This is since a misbehaving receiver can severely upset congestion control at the sender.Section 3.4. Note: The riskDSACK option [RFC2883] is not an appropriate retransmit- marker that would allow to eliminate the Eifel algorithmretransmission ambiguity. The reason is (falsely) triggered in response to a genuine packet loss. Inthat casethe Eifel algorithm would mistakefirst new ACK acknowledging a genuineretransmit as a spurious retransmit. Aswill commonly not carry a consequenceDSACK option. That is, the sender would effectively not reduce its congestion window in response toDACK option commonly arrives one or more ACKs later than the lost packet. However, there are reliable sender-side mechanisms to protect against this case as outlined below. One needs to distinguish between broken receivers that misbehave due to an implementation mistake versus malicious receivers that deliberately misbehave. Wefirst describe two mechanism to protect against broken receivers, followed bynew ACK for a different mechanism to protect against malicious receivers. We only recommend that protection against broken receivers SHOULD be implemented together with the Eifel algorithm at the sender.retransmit. This is motivated byindependent of whether the factretransmit was genuine or spurious. Given that both the currentTCP implicitly assumes a trust relationship betweensender and receiver, i.e., it can be assumed that receivers are not malicious. Note that even without the Eifel algorithm, there are ways a misbehavingreceiver can upset congestion control at the sender [SCWA99]. We do not discuss problems with respect to misbehaving senders assuming thatsupport the implementation ofRXT flag, or the sender-side Eifel algorithm complies withTimestamps option, or the specifications in this text. Protection Against Broken Receivers There is no risk to falsely triggerSACK option, a TCP sender MAY implement the timestamp-basedEifel algorithm as long asdefined in the receiver correctly implementsfollowing subsections. 3.2 The Algorithm The following steps MUST be taken by the timestamp option [RFC1323]. However,TCP sender when the flag-based Eifel algorithm can be falsely triggeredthird duplicate ACK arrives or the timeout occurs, i.e., when the receiver has agreedloss recovery phase starts: (1) Set a "SpuriousRecovery" variable to set'false'. If this variable is true at step (7) below, the Xmit-Echo flag in ACKs for retransmits but then "forgets" to do so. The ACK for a genuine retransmit would thenloss recovery phase had been falsely triggertriggered. (2) Set a "RecoveryPoint" variable to HighData. When the Eifel algorithm once it arrives atTCP sender receives the sender. To protect againstfirst new ACK for this case the sender SHOULD implementdata octet the following mechanism if it usesloss recovery phase is terminated. (3-TS) This step only applies when the flag- based Eifel algorithm. AfterTimestamps option is used as the sender has detectedretransmit-marker: Set a spurious retransmit and in response restores cwnd and ssthresh with cwnd_prev and ssthresh_prev, respectively (see Section 3), it also saves"RetransmitTS" variable to the former valuesvalue of cwnd and ssthreshthe Timestamp Value field of the Timestamps option included in cwnd_prev and ssthresh_prev, respectively (e.g., help = cwnd; cwnd = cwnd_prev; cwnd_prev = help;). Until an ACK arrives atthe first retransmit. A TCP sender MUST ensure that acks beyond sent_high,RetransmitTS does not get overwritten as the sender checks forloss recovery phase progresses, e.g., in case of a valid ACK that arrives withsecond timeout and subsequent second retransmit of the Xmit-Echo flag set. If such an ACK arrives,same octet. The following steps MUST be taken by the TCP sender assumes that the Eifel algorithm was rightful triggered and does nothing further. Otherwise,when anthe first new ACK arrives atafter the sender that acks beyond sent_high,loss recovery phase started: (4) If the sender assumesACK's sequence number is greater than the value of RecoveryPoint , proceed to step (7) below. This condition ensures that only those ACKs are considered that correspond to segments that were outstanding at the Eifel algorithm was falsely triggered and reversestime the effects ofretransmit was sent. (5-SACK) This step only applies when the Eifel algorithm. I.e. cwnd is (re-)restored to cwnd_prev and ssthreshSACK option is (re-)restored to ssthresh_prev. Since, alsoused as the ACKs withretransmit-marker: If the Xmit- Echo flag set can get lost, it would be too conservativeACK's sequence number is equal to completely disable the Eifel algorithm forthe restvalue of the connectionRecoveryPoint, proceed to step (7) below. This step is motivated in this situation. Another risk stems from receivers that setSection 3.4. (6-RXT) This step only applies when the Xmit-EchoRXT flag for segments that have not been retransmitted. A TCP sender SHOULD implement an appropriate detection mechanism, since in this case, it cannot reliably useis used as the flag-based Eifel algorithm.retransmit-marker: If a sender detects such misbehavior, it SHOULD disablethe flag-based Eifel algorithm forACK does not have the rest ofRXT flag set (binary 1), set SpuriousRecovery to 'true' and proceed to step (7) below. (6-TS) This step only applies when the connection. Protection Against Malicious Receivers There are number of ways in which a malicious receiver could falsely triggerTimestamps option is used as the Eifel algorithm atretransmit-marker: If the sender. A sender is particularly vulnerable to this if it operatesvalue of the flag-based Eifel algorithm. Hence,Timestamp Echo Reply field of the sender MAY chooseACK's Timestamps option is smaller than RetransmitTS, set SpuriousRecovery to 'true' and proceed to step (7) below. This step is commented in Section 3.3. (6-SACK) This step only useapplies when the timestamp-based Eifel algorithm,SACK option is used as the retransmit-marker, and in addition implementonly if the following mechanism. The mechanism combinesloss recovery phase had been triggered by the ideaarrival of the third duplicate ACK: If the ACK does not carry a "singular nonce" proposedSACK option, set SpuriousRecovery to 'true' and proceed to step (7) below. This step is motivated in [SCWA99] withSection 3.4. (7) Done. No further processing. 3.3 Comments to the timestamp option specified in [RFC1323].Timestamp-based Detection The mechanismcomparison operator "smaller than" in step (6-TS) is based onconservative. In theory, when the observation that after a spurious retransmit"timestamp clock" is slow or the sender willnetwork is fast, RetransmitTS could at some point receivemost be equal to the ACK that was triggeredtimestamp echoed by the correspondingan ACK sent in response to an original transmit assumingtransmit. In that case, it is assumed that ACKthe loss recovery was not lost. That ACK can be expectedfalsely triggered. 3.4 Comments to echo the timestamp ofthe original transmit even if the receiver implements the delayed ACK algorithm. In case of packet re-ordering, thisSACK-based Detection The SACK option is implied bynot a retransmit-marker in the rules for generating ACKs for data segmentssense that fill in all or part of a gapit is present only in those ACKs that the sequence spaceTCP receiver sends in response to retransmits (see section 4.2 of [RFC2581]) and bySection 3.1). This is why the rules for echoing timestamps inSACK option cannot be used to detect that a timeout was spurious. For this, we only need to consider the case (see rule (C) in section 3.4where an entire flight of [RFC1323]). Insegments was lost versus the case of a spurious timeout, it is quite likelytimeout. Assuming that packet reordering did not concur, the delay that has causedfirst new ACK arriving after the spurious timeout has also causedretransmit would not carry a SACK option in either case. Thus, the receiver's delayed ACK timer [RFC1122] to expire. Hence,retransmission ambiguity problem would still exist. Nevertheless, the SACK option can be used to protect againstdetect that a malicious receiver, the sender shouldfast retransmit was spurious. Let's assume a SACK-enabled TCP receiver. If only trigger the Eifel algorithma single segment was in response tofact lost, then the first new ACK for the original transmit and onlyafter it has authenticated that ACK as described below. To protect against malicious receivers that spoof ACKs, the sender MAY implementthe following modification to the timestamp option specified in [RFC1323]. The sender addsfast retransmit will not carry a separate random number to each timestamp to be included in an outgoing segment,SACK option, and writeswill acknowledge RecoveryPoint. If multiple segments were in fact lost, then the result intofirst new ACK after the Timestamp Value field infast retransmit will carry a SACK option, and will acknowledge below RecoveryPoint, i.e., the timestampACK will be partial. Thus, partial ACKs that do not carry a SACK option field. Aare impossible independent of how many segments were lost. Conversely, the first new random numberACK after the fast retransmit that is generated per RTT to avoida partial ACK and that does not carry a SACK option, signals that the receiver "learns" the random number. That random number shouldloss recovery phase was falsely triggered. This motivates steps (5-SACK) and (6- SACK) in Section 3.2. Note: The SACK option cannot be carefully chosenused to avoid bad interactions withdetect that a fast retransmit was spurious when the PAWS mechanism specified in [RFC1323]. Clearly,entire flight of segments jumps ahead of the random number(s) need to be accounted for infirst segment (the oldest outstanding segment) of that flight. This is because the RTTM mechanism specifiedresulting new ACK would not carry a SACK option but would acknowledge RecoveryPoint. Thus, step (5-SACK) in [RFC1323], i.e., it needsSection 3.2 would become effective. 4. Security Considerations There do not seem to be subtracted from the echoed timestamp beforeany security considerations associated with the RTT can be calculated. For this to work,Eifel algorithm itself. This is because the sender needsEifel algorithm is only a detection scheme that is not tied to store all "outstanding timestamps" and the corresponding random number. With this modification,any specific action that might alter existing protocol state at the TCP sender only identifiesor receiver. That is, no value of a retransmit as spurious ifstate variable other than one introduced by the ACKalgorithm itself (SpuriousRecovery, RecoverPoint, and RetransmitTS) is changed. However, security considerations might exist for the original transmit echoes the "random timestamp"response schemes that was sent. Thus, assuming a receiver has no easy access touse the mentioned random numbers, this should provide forEifel algorithm as a fairly secure protection against malicious receiversbasis. In particular, it needs to be considered that spoofthe "right" ACK that would triggerTCP receiver might by lying about the Eifel algorithm.retransmit-marker. Acknowledgments Many thanks to Keith Sklower for helping to develop the tools that allowed the study of spurious timeouts and packet re-orderings. Many thanks toSklower, Randy Katz, Michael Meyer, Stephan Baucke, Sally Floyd, Vern Paxson, andMark AllmanAllman, Ethan Blanton, and Andrei Gurtov for very useful discussions around the Eifel algorithm.that contributed to this work. References [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, RFC 2581, April 1999. [BPS99] J.C.R. Bennett, C. Partridge, N. Shectman, Packet Reordering is Not Pathological Network Behavior, IEEE/ACM Transactions on Networking, December `99. [RFC1122] R. Braden, Requirements for Internet Hosts - Communication Layers, RFC 1122, October 1989.[BA01] E. Blanton, M. Allman, Adjusting the Duplicate ACK Threshold to Avoid Spurious Retransmits, work in progress, July 2001. [RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997. [RFC2507][RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High Performance, RFC 1323, May 1992. [RFC2018] M. Degermark, B. Nordgren,Mathis, J. Mahdavi, S. Pink, IP Header Compression,Floyd, A. Romanow, TCP Selective Acknowledgement Options, RFC 2018, October 1996. [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's Fast Recovery Algorithm, RFC 2507, February2582, April 1999. [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, An Extension to the Selective Acknowledgement (SACK) Option for TCP, RFC 2883, July 2000. [ISO8073] ISO/IEC, Information processing systems - Open Systems Interconnection - Connection oriented transport protocol specification, International Standard ISO/IEC 8073, December 1988. [RFC1323] V. Jacobson, R. Braden, D. Borman,[Gu01] A. Gurtov, Effect of Delays on TCP Extensions for HighPerformance, RFC 1323, May 1992. [RFC1144] V. Jacobson, Compressing TCP/IP Headers for Low-Speed Serial Links, RFC 1144, February 1990.In Proceedings of IFIP Personal Wireless Communications, August 2001. [GL01] A. Gurtov, R. Ludwig, Making TCP Robust Against Delay Spikes, work in progress, November 2001. [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates in Reliable Transport Protocols, In Proceedings of ACM SIGCOMM 87. [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions, ACM Computer Communication Review, Vol. 30, No. 1, January 2000, available at http://www.acm.org/sigcomm/ccr/archive/2000/ jan00/ccr-200001-ludwig.html (easier studied when viewed/printed in color). [LS00][LM01] R. Ludwig, K. Sklower,M. Meyer, The Eifel Retransmissions Timer, ACM Computer Communication Review, Vol. 30, No. 3, July 2000. [Pax97]TCP Retransmit (RXT) Flag, work in progress, November 2001. [LG01] R. Ludwig, A. Gurtov, Responding to Spurious Timeouts in TCP, work in progress, November 2001. [RFC2988] V. Paxson, End-to-End Routing Behavior in the Internet, IEEE/ACM Transactions on Networking, Vol.5, No.5, October 1997.M. Allman, Computing TCP's Retransmission Timer, RFC 2988, November 2000. [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. [RFC793] J. Postel, Transmission Control Protocol, RFC793, September 1981. [SCWA99] S. Savage, N. Cardwell, D. Wetherall, T. Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM Computer Communication Review, Vol. 29, No. 5, October 1999.[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 (The Implementation), Addison Wesley, January 1995. Author's Address Reiner Ludwig Ericsson Research (EED) Ericsson Allee 1 52134 Herzogenrath, Germany Phone: +49 2407 575 719 Fax: +49 2407 575 400 Email: Reiner.Ludwig@Ericsson.com This Internet-Draft expires in August 2001.May 2002.