draft-ietf-tsvwg-tcp-eifel-response-02.txt | draft-ietf-tsvwg-tcp-eifel-response-03.txt | |||
---|---|---|---|---|
Network Working Group Reiner Ludwig | Network Working Group Reiner Ludwig | |||
INTERNET-DRAFT Ericsson Research | INTERNET-DRAFT Ericsson Research | |||
Expires: June 2003 Andrei Gurtov | Expires: September 2003 Andrei Gurtov | |||
Sonera Corporation | Sonera Corporation | |||
December, 2002 | March, 2003 | |||
The Eifel Response Algorithm for TCP | The Eifel Response Algorithm for TCP | |||
<draft-ietf-tsvwg-tcp-eifel-response-02.txt> | <draft-ietf-tsvwg-tcp-eifel-response-03.txt> | |||
Status of this memo | Status of this memo | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that other | Task Force (IETF), its areas, and its working groups. Note that other | |||
groups may also distribute working documents as Internet-Drafts. | groups may also distribute working documents as Internet-Drafts. | |||
skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
material or cite them other than as "work in progress". | material or cite them other than as "work in progress". | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/lid-abstracts.txt | http://www.ietf.org/ietf/lid-abstracts.txt | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html | http://www.ietf.org/shadow.html | |||
Abstract | Abstract | |||
The Eifel response algorithm uses the Eifel detection algorithm to | The Eifel response algorithm requires a detection algorithm to detect | |||
detect a posteriori whether the TCP sender has entered loss recovery | a posteriori whether the TCP sender has entered loss recovery | |||
unnecessarily. In response to a spurious timeout it avoids the often | unnecessarily. In response to a spurious timeout it adapts the | |||
unnecessary go-back-N retransmits that would otherwise be sent, and | retransmission timer to avoid further spurious timeouts, and can | |||
adapts the retransmission timer to avoid further spurious timeouts. | avoid - depending on the detection algorithm - the often unnecessary | |||
Likewise, it adapts the duplicate acknowledgement threshold in | go-back-N retransmits that would otherwise be sent. Likewise, it | |||
response to a spurious fast retransmit. In both cases, the Eifel | adapts the duplicate acknowledgement threshold in response to a | |||
response algorithm restores the congestion control state in such a | spurious fast retransmit. In both cases, the Eifel response algorithm | |||
way that packet bursts are avoided. | restores the congestion control state in such a way that packet | |||
bursts are avoided. | ||||
Terminology | Terminology | |||
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | |||
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | |||
document, are to be interpreted as described in [RFC2119]. | document, are to be interpreted as described in [RFC2119]. | |||
We refer to the first-time transmission of an octet as the 'original | We refer to the first-time transmission of an octet as the 'original | |||
transmit'. A subsequent transmission of the same octet is referred to | transmit'. A subsequent transmission of the same octet is referred to | |||
as a 'retransmit'. In most cases this terminology can likewise be | as a 'retransmit'. In most cases this terminology can likewise be | |||
skipping to change at page 3, line 7 | skipping to change at page 3, line 7 | |||
alternatively, the difference between SND.MAX and SND.UNA at a given | alternatively, the difference between SND.MAX and SND.UNA at a given | |||
point in time. The IW is the size of the sender's congestion window | point in time. The IW is the size of the sender's congestion window | |||
after the three-way handshake is completed. We use the TCP sender | after the three-way handshake is completed. We use the TCP sender | |||
state variables 'SRTT' and 'RTTVAR', and the term 'RTO' as defined in | state variables 'SRTT' and 'RTTVAR', and the term 'RTO' as defined in | |||
[RFC2988]. In addition, we assume that the TCP sender maintains in | [RFC2988]. In addition, we assume that the TCP sender maintains in | |||
the variable 'RTT-SAMPLE' the value of the latest round-trip time | the variable 'RTT-SAMPLE' the value of the latest round-trip time | |||
(RTT) measurement. | (RTT) measurement. | |||
1. Introduction | 1. Introduction | |||
The Eifel response algorithm relies on the Eifel detection algorithm | The Eifel response algorithm relies on a detection algorithm such as | |||
defined in [LM02]. That document discusses the relevant background | the Eifel detection algorithm defined in [RFC***B]. That document | |||
and motivation that also applies to this document. Hence, the reader | discusses the relevant background and motivation that also applies to | |||
is expected to be familiar with [LM02]. Note that alternative | this document. Hence, the reader is expected to be familiar with | |||
response algorithms are conceivable that could also rely on the Eifel | [RFC***B]. Note that alternative response algorithms have been | |||
detection algorithm. | proposed [BDA03] that could also rely on the Eifel detection | |||
algorithm, and vice versa alternative detection algorithms have been | ||||
proposed [BA02b], [SK03] that could work together with the Eifel | ||||
response algorithm. | ||||
The Eifel response algorithm uses the Eifel detection algorithm to | The Eifel response algorithm requires a detection algorithm to detect | |||
detect a posteriori whether the TCP sender has entered loss recovery | a posteriori whether the TCP sender has entered loss recovery | |||
unnecessarily. In response to a spurious timeout it avoids the often | unnecessarily. In response to a spurious timeout it adapts the | |||
unnecessary go-back-N retransmits that would otherwise be sent, and | retransmission timer to avoid further spurious timeouts, and can | |||
adapts the retransmission timer to avoid further spurious timeouts. | avoid - depending on the detection algorithm - the often unnecessary | |||
Likewise, it adapts the duplicate acknowledgement threshold in | go-back-N retransmits that would otherwise be sent. Likewise, it | |||
response to a spurious fast retransmit. In both cases, the Eifel | adapts the duplicate acknowledgement threshold in response to a | |||
response algorithm restores the congestion control state in such a | spurious fast retransmit. In both cases, the Eifel response algorithm | |||
way that packet bursts are avoided. | restores the congestion control state in such a way that packet | |||
bursts are avoided. | ||||
2. The Eifel Response Algorithm | 2. Interworking with Detection Algorithms | |||
If the Eifel response algorithm is implemented at the TCP sender, it | ||||
MUST be implemented together with a detection algorithm that is | ||||
specified in an RFC. | ||||
Designers of detection algorithms who want to offer the possibility | ||||
that their detection algorithms can work together with the Eifel | ||||
response algorithm MUST reuse the variable SpuriousRecovery with the | ||||
semantics and defined values as specified in [RFC***B]. In addition, | ||||
we define LATE_SPUR_TO (equal -1) as another possible value of the | ||||
variable SpuriousRecovery. Detection algorithms must set the value of | ||||
SpuriousRecovery to LATE_SPUR_TO if the detection is based upon | ||||
receiving the ACK for the retransmit. For example, this applies to | ||||
detection algorithms that are based on the DSACK option. | ||||
3. The Eifel Response Algorithm | ||||
The complete algorithm is specified in section 2.1. In sections 2.2 | The complete algorithm is specified in section 3.1. In sections 3.2 | |||
to 2.4, we motivate the different steps of the algorithm. | to 3.4, we motivate the different steps of the algorithm. | |||
2.1. The Algorithm | 3.1. The Algorithm | |||
Given that a TCP sender has enabled the Eifel detection algorithm | ||||
[LM02] for a connection, a TCP sender MAY use the Eifel response | ||||
algorithm as defined in this subsection. Note that this implies that | ||||
the TCP Timestamps option [RFC1323] is used for that connection. | ||||
Since the Eifel response algorithm is dependent on the Eifel | ||||
detection algorithm, we describe it as an extension of the latter. | ||||
If the combined Eifel detection and response algorithm is used, the | Given that a TCP sender has enabled a detection algorithm that | |||
following steps MUST be taken by the TCP sender, but only upon | complies with the requirements set in Section 2, a TCP sender MAY use | |||
initiation of loss recovery, i.e., when either the timeout-based | the Eifel response algorithm as defined in this subsection. | |||
retransmit or the fast retransmit is sent. Note: The algorithm MUST | ||||
NOT be reinitiated after loss recovery has already started. In | ||||
particular, it may not be reinitiated upon subsequent timeouts for | ||||
the same segment, and not upon retransmitting segments other than the | ||||
oldest outstanding segment. | ||||
Steps (1)-(6) are a one-to-one copy of the Eifel detection algorithm | If the Eifel response algorithm is used, the following steps MUST be | |||
specified in [LM02], step (0) has been added, and step (RESP) from | taken by the TCP sender, but only upon initiation of loss recovery, | |||
[LM02] has been replaced by steps (RESP)-(ReCC) given below. | i.e., when either the timeout-based retransmit or the fast retransmit | |||
is sent. Note: The algorithm MUST NOT be reinitiated after loss | ||||
recovery has already started. In particular, it may not be | ||||
reinitiated upon subsequent timeouts for the same segment, and not | ||||
upon retransmitting segments other than the oldest outstanding | ||||
segment. | ||||
(0) Before the variables cwnd and ssthresh get updated when | (0) Before the variables cwnd and ssthresh get updated when | |||
loss recovery is initiated, set a "pipe_prev" variable as | loss recovery is initiated, set a "pipe_prev" variable as | |||
follows: | follows: | |||
pipe_prev <- max (FlightSize, ssthresh) | pipe_prev <- max (FlightSize, ssthresh) | |||
(1) Set a "SpuriousRecovery" variable to FALSE (equal 0). | ||||
(2) Set a "RetransmitTS" variable to the value of the | (DTCT) This is a placeholder for a detection algorithm that must | |||
Timestamp Value field of the Timestamps option included in | be executed at this point. In case [RFC***B] is used as | |||
the retransmit sent when loss recovery is initiated. A TCP | the detection algorithm, steps (1) - (6) of that algorithm | |||
sender must ensure that RetransmitTS does not get | go here. | |||
overwritten as loss recovery progresses, e.g., in case of | ||||
a second timeout and subsequent second retransmit of the | ||||
same octet. | ||||
(3) Wait for the arrival of an acceptable ACK. When an | ||||
acceptable ACK has arrived proceed to step (4). | ||||
(4) If the value of the Timestamp Echo Reply field of the | ||||
acceptable ACK's Timestamps option is smaller than the | ||||
value of RetransmitTS, then proceed to step (5), | ||||
else proceed to step (DONE). | ||||
(5) If the acceptable ACK carries a DSACK option [RFC2883], | ||||
then proceed to step (DONE), | ||||
else if during the lifetime of the TCP connection the TCP | ||||
sender has previously received an ACK with a DSACK option, | ||||
or the acceptable ACK does not acknowledge all outstanding | ||||
data, then proceed to step (6), | ||||
else proceed to step (DONE). | ||||
(6) If the loss recovery has been initiated with a timeout- | (RESP) If SpuriousRecovery equals FALSE, then proceed to step | |||
based retransmit, then set | (DONE), | |||
SpuriousRecovery <- SPUR_TO (equal 1), | ||||
else set | else if SpuriousRecovery equals SPUR_TO, then proceed to | |||
SpuriousRecovery <- dupacks+1 | step (STO.1), | |||
(RESP) If SpuriousRecovery equals SPUR_TO, then proceed to step | else if SpuriousRecovery equals LATE_SPUR_TO, then proceed | |||
(STO.1), | to step (STO.2), | |||
else (spurious fast retransmit) proceed to step (SFR). | else (spurious fast retransmit) proceed to step (SFR). | |||
(STO.1) Resume transmission off the top: | (STO.1) Resume transmission off the top: | |||
Set | Set | |||
SND.NXT <- SND.MAX | SND.NXT <- SND.MAX | |||
(STO.2) Adapt the Conservativeness of the Retransmission Timer: | (STO.2) Adapt the Conservativeness of the Retransmission Timer: | |||
skipping to change at page 5, line 40 | skipping to change at page 5, line 29 | |||
to step (DONE), | to step (DONE), | |||
else set | else set | |||
cwnd <- min (pipe_prev, (FlightSize + IW)) | cwnd <- min (pipe_prev, (FlightSize + IW)) | |||
ssthresh <- pipe_prev | ssthresh <- pipe_prev | |||
Proceed to step (DONE). | Proceed to step (DONE). | |||
(DONE) No further processing. | (DONE) No further processing. | |||
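
The visible steps of the right-hand (-03) column can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, the numeric constants are the values stated in the text (FALSE equal 0, SPUR_TO equal 1, LATE_SPUR_TO equal -1), and the steps elided by the diff (the body of STO.2, step SFR, and the condition that precedes the cwnd update) are deliberately not modeled.

```python
# Values of SpuriousRecovery as stated in the text.
FALSE = 0          # loss recovery was not found to be spurious
SPUR_TO = 1        # spurious timeout
LATE_SPUR_TO = -1  # spurious timeout detected from the ACK for the retransmit


def pipe_prev(flight_size, ssthresh):
    """Step (0): remember the pipe size before cwnd and ssthresh are updated."""
    return max(flight_size, ssthresh)


def resp_next_step(spurious_recovery):
    """Step (RESP): pick the next step from the value of SpuriousRecovery."""
    if spurious_recovery == FALSE:
        return "DONE"
    if spurious_recovery == SPUR_TO:
        return "STO.1"
    if spurious_recovery == LATE_SPUR_TO:
        return "STO.2"
    # Any other value (dupacks+1 in the Eifel detection algorithm)
    # marks a spurious fast retransmit.
    return "SFR"


def restore_congestion_state(pipe_prev_value, flight_size, iw):
    """Final assignments shown above; the elided precondition is not modeled."""
    cwnd = min(pipe_prev_value, flight_size + iw)
    ssthresh = pipe_prev_value
    return cwnd, ssthresh


if __name__ == "__main__":
    saved = pipe_prev(flight_size=4000, ssthresh=10000)
    print(resp_next_step(SPUR_TO))
    print(restore_congestion_state(saved, flight_size=4000, iw=2920))
```

Bounding cwnd by FlightSize + IW is what keeps the restored window from releasing a burst of packets, even when pipe_prev is much larger than the amount of data currently in flight.
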
2.2 Responding to Spurious Timeouts | 3.2 Responding to Spurious Timeouts | |||
2.2.1 Suppressing the Unnecessary go-back-N Retransmits (step STO.1) | 3.2.1 Suppressing the Unnecessary go-back-N Retransmits (step STO.1) | |||
Without the use of the TCP timestamps option, the TCP sender suffers | Without the use of the TCP timestamps option, the TCP sender suffers | |||
from the retransmission ambiguity problem [Zh86], [KP87]. This means | from the retransmission ambiguity problem [Zh86], [KP87]. This means | |||
that when the first acceptable ACK arrives after a spurious timeout, | that when the first acceptable ACK arrives after a spurious timeout, | |||
the TCP sender must believe that that ACK was sent in response to the | the TCP sender must believe that that ACK was sent in response to the | |||
retransmit when in fact it was sent in response to the original | retransmit when in fact it was sent in response to the original | |||
transmit. Furthermore, the TCP sender must also believe that all | transmit. Furthermore, the TCP sender must also believe that all | |||
other segments outstanding at that point were lost. | other segments outstanding at that point were lost. | |||
Note: Except for certain cases where original ACKs were lost, that | Note: Except for certain cases where original ACKs were lost, that | |||
skipping to change at page 6, line 26 | skipping to change at page 6, line 15 | |||
network, two retransmits are sent into the network as long as SND.NXT | network, two retransmits are sent into the network as long as SND.NXT | |||
does not equal SND.MAX (see [LK00] for more detail). | does not equal SND.MAX (see [LK00] for more detail). | |||
The use of the TCP timestamps option reliably eliminates the | The use of the TCP timestamps option reliably eliminates the | |||
retransmission ambiguity problem. Thus, once the Eifel detection | retransmission ambiguity problem. Thus, once the Eifel detection | |||
algorithm has detected that a timeout was spurious, it is safe | algorithm has detected that a timeout was spurious, it is safe | |||
to let the TCP sender resume the transmission with new data. Thus, | to let the TCP sender resume the transmission with new data. Thus, | |||
the Eifel response algorithm changes the TCP sender's state by | the Eifel response algorithm changes the TCP sender's state by | |||
setting SND.NXT to SND.MAX in that case. | setting SND.NXT to SND.MAX in that case. | |||
2.2.2 Adapting the Retransmission Timer (step STO.2) | 3.2.2 Adapting the Retransmission Timer (step STO.2) | |||
There is currently only one retransmission timer standardized for TCP | There is currently only one retransmission timer standardized for TCP | |||
[RFC2988]. We therefore only address that timer explicitly. Future | [RFC2988]. We therefore only address that timer explicitly. Future | |||
standards that might define alternatives to [RFC2988] should propose | standards that might define alternatives to [RFC2988] should propose | |||
similar measures to adapt the conservativeness of the retransmission | similar measures to adapt the conservativeness of the retransmission | |||
timer. | timer. | |||
Since the timeout was spurious, the TCP sender's RTT estimators are | Since the timeout was spurious, the TCP sender's RTT estimators are | |||
likely to be off. However, since timestamps are being used, a new and | likely to be off. However, since timestamps are being used, a new and | |||
valid RTT measurement (RTT-SAMPLE) can be derived from the acceptable | valid RTT measurement (RTT-SAMPLE) can be derived from the acceptable | |||
skipping to change at page 7, line 21 | skipping to change at page 7, line 10 | |||
that the RTT estimators "lose" their memory too soon. This is a known | that the RTT estimators "lose" their memory too soon. This is a known | |||
conflict between [RFC2988] and [RFC1323]. Especially, a large RTO | conflict between [RFC2988] and [RFC1323]. Especially, a large RTO | |||
resulting from an RTT spike will decay within one or two RTTs (e.g., | resulting from an RTT spike will decay within one or two RTTs (e.g., | |||
see [LS00]). Hence, simply reinitializing RFC2988's RTT estimators | see [LS00]). Hence, simply reinitializing RFC2988's RTT estimators | |||
from RTT-SAMPLE is probably not enough to make the retransmission | from RTT-SAMPLE is probably not enough to make the retransmission | |||
timer sufficiently conservative for at least the next couple of RTTs. | timer sufficiently conservative for at least the next couple of RTTs. | |||
A solution for the case when every segment is timed according to | A solution for the case when every segment is timed according to | |||
[RFC1323] is to make the gains adaptive to the FlightSize [LS00]. We | [RFC1323] is to make the gains adaptive to the FlightSize [LS00]. We | |||
suggest to adopt this solution for at least the SRTT. | suggest to adopt this solution for at least the SRTT. | |||
2.3 Responding to Spurious Fast Retransmits (step SFR) | 3.3 Responding to Spurious Fast Retransmits (step SFR) | |||
The assumption behind the fast retransmit algorithm [RFC2581] is that | The assumption behind the fast retransmit algorithm [RFC2581] is that | |||
a segment was lost if as many duplicate ACKs have arrived at the TCP | a segment was lost if as many duplicate ACKs have arrived at the TCP | |||
sender as indicated by DupThresh. Currently, DupThresh is specified | sender as indicated by DupThresh. Currently, DupThresh is specified | |||
as a fixed value of three [RFC2581]. That value is assumed to be | as a fixed value of three [RFC2581]. That value is assumed to be | |||
sufficiently conservative so that packet reordering and/or packet | sufficiently conservative so that packet reordering and/or packet | |||
duplication does not falsely trigger the fast retransmit algorithm. | duplication does not falsely trigger the fast retransmit algorithm. | |||
Clearly, this assumption does not hold for a particular TCP | Clearly, this assumption does not hold for a particular TCP | |||
connection once the TCP sender detects that the last fast retransmit | connection once the TCP sender detects that the last fast retransmit | |||
was spurious. It is therefore suggested to dynamically adapt | was spurious. It is therefore suggested to dynamically adapt | |||
DupThresh to the reordering characteristics observed over the course | DupThresh to the reordering characteristics observed over the course | |||
of a particular connection. | of a particular connection. | |||
At the beginning of a connection DupThresh is initialized with three. | At the beginning of a connection DupThresh is initialized with three. | |||
Then for each spurious fast retransmit that is detected, DupThresh is | Then for each spurious fast retransmit that is detected, DupThresh is | |||
set to the maximum of the previous DupThresh, and the lowest value | set to the maximum of the previous DupThresh, and the lowest value | |||
that would have avoided that last spurious fast retransmit. Note that | that would have avoided that last spurious fast retransmit. Note that | |||
the Eifel detection algorithm records the latter value in | the Eifel detection algorithm records the latter value in | |||
SpuriousRecovery. This strategy ensures that the TCP sender is able | SpuriousRecovery. This strategy ensures that the TCP sender is able | |||
to cope with the longest reordering length seen on a particular | to cope with the longest reordering length seen on a particular | |||
connection so far. | connection so far. However, the strategy may lead to fast timeouts | |||
[RFC***B], i.e., an event where the retransmission timer expires | ||||
However, the strategy bears the risk that the retransmission timer | before the TCP sender receives the duplicate ACK that would trigger a | |||
expires before the TCP sender receives the duplicate ACK that would | fast retransmit of the oldest outstanding segment. | |||
trigger a fast retransmit of the oldest outstanding segment. To | ||||
alleviate that potential problem the TCP sender may implement the | ||||
Fast Timeout algorithm proposed in [Lu02]. | ||||
Also, we believe that this strategy should be implemented together | Also, we believe that this strategy should be implemented together | |||
with an advanced version of the Limited Transmit algorithm [RFC3042]. | with an advanced version of the Limited Transmit algorithm [RFC3042]. | |||
That is, for each duplicate ACK that arrives until DupThresh is | That is, for each duplicate ACK that arrives until DupThresh is | |||
reached, the TCP sender should send a new data segment if allowed by | reached, the TCP sender should send a new data segment if allowed by | |||
the TCP receiver's advertised window, and if new data is available. | the TCP receiver's advertised window, and if new data is available. | |||
Although the current Limited Transmit algorithm only allows this for | Although the current Limited Transmit algorithm only allows this for | |||
the first two duplicate ACKs, we believe that such an advanced | the first two duplicate ACKs, we believe that such an advanced | |||
limited transmit strategy is safe. It is already implemented in | limited transmit strategy is safe. It is already implemented in | |||
widely deployed TCPs [SK02]. | widely deployed TCPs [SK02]. | |||
Other alternatives for responding to spurious fast retransmits are | Other alternatives for responding to spurious fast retransmits are | |||
discussed in [BA02a]. | discussed in [BA02a]. | |||
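
The DupThresh strategy and the advanced limited transmit rule described in this section can be sketched in Python as follows. This is a hedged illustration only: the function and parameter names are hypothetical, and the receiver-window check (FlightSize plus one SMSS must fit into the advertised window) is an assumption about how "allowed by the TCP receiver's advertised window" would be tested.

```python
INITIAL_DUPTHRESH = 3  # DupThresh at the beginning of a connection


def adapt_dupthresh(dupthresh, spurious_recovery):
    """Raise DupThresh after a spurious fast retransmit.

    spurious_recovery holds the lowest threshold that would have avoided
    the last spurious fast retransmit (dupacks+1 in Eifel detection).
    """
    return max(dupthresh, spurious_recovery)


def may_send_new_segment(dupacks, dupthresh, flight_size, rwnd, smss,
                         new_data_available):
    """Advanced limited transmit: one new segment per duplicate ACK
    that arrives before DupThresh is reached."""
    if dupacks >= dupthresh:
        return False  # threshold reached: fast retransmit takes over
    if not new_data_available:
        return False
    # Assumption: the new segment must fit into the advertised window.
    return flight_size + smss <= rwnd


if __name__ == "__main__":
    thresh = adapt_dupthresh(INITIAL_DUPTHRESH, 5)
    print(thresh)
    print(may_send_new_segment(4, thresh, 8000, 16000, 1460, True))
```

Because DupThresh only ever grows under this rule, the sender ends up tolerating the longest reordering length it has seen so far on the connection.
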
2.4 Reverting Congestion Control State (step ReCC) | 3.4 Reverting Congestion Control State (step ReCC) | |||
When a TCP sender enters loss recovery, it also assumes that it has | When a TCP sender enters loss recovery, it also assumes that it has | |||
received a congestion indication. In response to that it reduces | received a congestion indication. In response to that it reduces | |||
cwnd, and ssthresh. However, once the TCP sender detects that the | cwnd, and ssthresh. However, once the TCP sender detects that the | |||
loss recovery has been falsely triggered, this reduction was | loss recovery has been falsely triggered, this reduction was | |||
unnecessary. In fact, no congestion signal has been received. We | unnecessary. In fact, no congestion signal has been received. We | |||
therefore believe that it is safe to revert to the previous | therefore believe that it is safe to revert to the previous | |||
congestion control state. | congestion control state. | |||
We suggest to restore cwnd to the minimum of the previous FlightSize, | We suggest to restore cwnd to the minimum of the previous FlightSize, | |||
skipping to change at page 9, line 5 | skipping to change at page 8, line 39 | |||
segment. Allowing three timeouts while still reverting congestion | segment. Allowing three timeouts while still reverting congestion | |||
control state goes beyond [RFC2581]. That standard recommends setting | control state goes beyond [RFC2581]. That standard recommends setting | |||
cwnd to no more than the restart window (one SMSS) if the TCP sender | cwnd to no more than the restart window (one SMSS) if the TCP sender | |||
has not sent data in an interval exceeding the current RTO. That is | has not sent data in an interval exceeding the current RTO. That is | |||
done to restart the ACK clock which is believed to be lost. The case | done to restart the ACK clock which is believed to be lost. The case | |||
in step (ReCC) of the Eifel response algorithm is different. Since | in step (ReCC) of the Eifel response algorithm is different. Since | |||
an acceptable ACK corresponding to an original transmit has finally | an acceptable ACK corresponding to an original transmit has finally | |||
returned, the TCP has reason to believe that the ACK clock was merely | returned, the TCP has reason to believe that the ACK clock was merely | |||
interrupted but has now resumed "ticking" again. | interrupted but has now resumed "ticking" again. | |||
3. Non-Conservative Advanced Loss Recovery after Spurious Timeouts | 4. Non-Conservative Advanced Loss Recovery after Spurious Timeouts | |||
A TCP sender MAY implement an optimistic form of advanced loss | A TCP sender MAY implement an optimistic form of advanced loss | |||
recovery after a spurious timeout has been detected as motivated in | recovery after a spurious timeout has been detected as motivated in | |||
this section. Such a scheme MUST be terminated after the highest | this section. Such a scheme MUST be terminated after the highest | |||
sequence number outstanding when the spurious timeout was detected | sequence number outstanding when the spurious timeout was detected | |||
has been acknowledged. | has been acknowledged. | |||
We believe that there are no problems concerning interoperability | We have studied environments where spurious timeouts and multiple | |||
with advanced loss recovery schemes such as NewReno [RFC2582], or | losses from the same flight of packets often coincide [GL02]. Note | |||
SACK-based schemes [2018], [BA02b]. This is because in case loss | that we refer to the case where the oldest outstanding segment does | |||
recovery has been initiated unnecessarily, the Eifel response | arrive at the TCP receiver but one or more packets from the remaining | |||
algorithm makes the TCP sender back out of loss recovery before those | outstanding flight are lost. We found that in such a case TCP-Reno's | |||
schemes would have a chance to kick in. | performance can even suffer if the Eifel response algorithm is | |||
operated without an advanced loss recovery scheme such as NewReno | ||||
In fact, if an optimistic loss recovery scheme is not chosen (see | [RFC2582], or SACK-based schemes [2018], [RFC***A]. The reason is | |||
below), we recommend that the Eifel response algorithm is implemented | TCP-Reno's aggressiveness after a spurious timeout. Even though it | |||
together with one of the mentioned advanced loss recovery schemes; | breaks 'packet conservation' (see Section 3.2.1) when blindly | |||
ideally a SACK-based alternative. In an environment where spurious | retransmitting all outstanding segments, it usually recovers the | |||
timeouts and back-to-back packet losses often coincide, we have found | back-to-back packet losses within a single round-trip time. On the | |||
that TCP's performance can even suffer if the Eifel response | contrary, the more conservative TCP-Reno/Eifel was forced into | |||
algorithm is operated without an advanced loss recovery scheme | another (backed-off) timeout in that case. | |||
[GL02]. | ||||
In that study, we among other variants compared TCP-Reno with and | ||||
without the Eifel response algorithm (TCP-Reno/Eifel vs. TCP-Reno), | ||||
and without an advanced loss recovery scheme for both variants. The | ||||
reason that TCP-Reno performed better in the mentioned scenario, is | ||||
its aggressiveness after a spurious timeout. Even though it breaks | ||||
'packet conservation' (see Section 2.2.1) when blindly retransmitting | ||||
all outstanding segments, it usually recovers the back-to-back packet | ||||
losses within a single round-trip time. On the contrary, the more | ||||
conservative TCP-Reno/Eifel was forced into another (backed-off) | ||||
timeout in that case. In case NewReno is chosen as the advanced loss | ||||
recovery scheme, we found that it performs better if the 'bugfix' | ||||
feature is disabled. That feature often leads the TCP sender to the | ||||
wrong decision. | ||||
However, in a more recent study [GL03], we found that those advanced | However, in a more recent study [GL03], we found that the mentioned | |||
loss recovery schemes are often too conservative to compete against | advanced loss recovery schemes are often too conservative to compete | |||
TCP-Reno's blind go-back-N in terms of quickly recovering multiple | against TCP-Reno's blind go-back-N in terms of quickly recovering | |||
losses after a spurious timeout. The problem with the NewReno scheme | multiple losses after a spurious timeout. The problem with the | |||
is that it does not exploit knowledge (e.g., provided through SACK | NewReno scheme is that it does not exploit knowledge (e.g., provided | |||
options) about which segments were lost. The problem with the | through SACK options) about which segments were lost. The problem | |||
conservative SACK-based scheme [BA02b] is that it waits for three | with the conservative SACK-based scheme [RFC***A] is that it waits | |||
SACKs before it retransmits a lost segment. This may often lead to a | for three SACKs before it retransmits a lost segment. This may often | |||
second - and in this case genuine - (potentially backed-off) timeout. | lead to a second - and in this case genuine - (potentially backed- | |||
In those cases TCP-Reno's loss recovery is often quicker due to the | off) timeout. In those cases TCP-Reno's loss recovery is often | |||
blind go-back-N. This could be viewed as a disincentive to the | quicker due to the blind go-back-N. This could be viewed as a | |||
deployment of the Eifel response algorithm. | disincentive to the deployment of the Eifel response algorithm. | |||
[Making TCP (even) more conservative by fixing a misbehavior in | [Making TCP (even) more conservative by fixing a misbehavior in | |||
the name of 'packet conservation' would probably at most result in | the name of 'packet conservation' would probably at most result in | |||
credits in the academic world.] | credits in the academic world.] | |||
We therefore suggest that a TCP sender MAY implement an optimistic | We therefore suggest that a TCP sender MAY implement an optimistic | |||
(non-conservative) form of advanced loss recovery after a spurious | (non-conservative) form of advanced loss recovery after a spurious | |||
timeout has been detected, if the following guidelines are met: | timeout has been detected, if the following guidelines are met: | |||
- Packet Conservation: The TCP sender may not have more segments | - Packet Conservation: The TCP sender may not have more segments | |||
skipping to change at page 10, line 25 | skipping to change at page 9, line 44 | |||
than indicated by the congestion window. | than indicated by the congestion window. | |||
- A retransmit may only be sent when a potential loss has been | - A retransmit may only be sent when a potential loss has been | |||
indicated. For example, a single duplicate ACK is such an | indicated. For example, a single duplicate ACK is such an | |||
indication; potentially with the corresponding SACK info in case | indication; potentially with the corresponding SACK info in case | |||
the SACK option is enabled for the connection. | the SACK option is enabled for the connection. | |||
We have developed and evaluated such a scheme (a variant of NewReno | We have developed and evaluated such a scheme (a variant of NewReno | |||
that exploits SACK info) in [GL03] that shows good results. | that exploits SACK info) in [GL03] that shows good results. | |||
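The two guidelines above can be read as a single gate on each candidate retransmit. The following is only a minimal sketch under stated assumptions, not the scheme evaluated in [GL03]: the names SenderState and may_retransmit are hypothetical, the window is counted in segments rather than bytes, and "more segments than indicated by the congestion window" is interpreted here as segments in flight.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    """Hypothetical per-connection state; field names are illustrative only."""
    cwnd_segments: int       # congestion window, counted in segments (assumption)
    in_flight_segments: int  # segments sent but not yet acknowledged
    dup_acks: int            # duplicate ACKs received since the last new ACK
    sack_holes: int          # un-SACKed gaps reported by the receiver (0 without SACK)


def may_retransmit(s: SenderState) -> bool:
    """Return True if one more retransmit satisfies both guidelines."""
    # Packet conservation: after sending one more segment, the number of
    # segments in flight must still not exceed the congestion window.
    conserves_packets = s.in_flight_segments < s.cwnd_segments
    # Loss indication: a single duplicate ACK suffices; SACK info, when the
    # option is enabled, is treated as an equivalent indication here.
    loss_indicated = s.dup_acks >= 1 or s.sack_holes >= 1
    return conserves_packets and loss_indicated
```

A sender following this sketch would call may_retransmit() before each retransmit after a spurious timeout has been detected. Unlike the conservative SACK-based scheme, it does not wait for three SACKs, which is what makes it optimistic.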
4. IPR Considerations | 5. IPR Considerations | |||
The IETF has been notified of intellectual property rights claimed in | The IETF has been notified of intellectual property rights claimed in | |||
regard to some or all of the specification contained in this | regard to some or all of the specification contained in this | |||
document. For more information consult the online list of claimed | document. For more information consult the online list of claimed | |||
rights at http://www.ietf.org/ipr. | rights at http://www.ietf.org/ipr. | |||
The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
intellectual property or other rights that might be claimed to | intellectual property or other rights that might be claimed to | |||
pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
might or might not be available; neither does it represent that it | might or might not be available; neither does it represent that it | |||
has made any effort to identify any such rights. Information on the | has made any effort to identify any such rights. Information on the | |||
IETF's procedures with respect to rights in standards-track and | IETF's procedures with respect to rights in standards-track and | |||
standards-related documentation can be found in BCP-11. Copies of | standards-related documentation can be found in BCP-11. Copies of | |||
claims of rights made available for publication and any assurances of | claims of rights made available for publication and any assurances of | |||
licenses to be made available, or the result of an attempt made to | licenses to be made available, or the result of an attempt made to | |||
obtain a general license or permission for the use of such | obtain a general license or permission for the use of such | |||
proprietary rights by implementors or users of this specification can | proprietary rights by implementors or users of this specification can | |||
be obtained from the IETF Secretariat. | be obtained from the IETF Secretariat. | |||
5. Security Considerations | 6. Security Considerations | |||
There is a risk that TCP receivers make genuine retransmits appear to | There is a risk that a detection algorithm is fooled by spoofed ACKs | |||
the TCP sender as spurious retransmits by forging echoed timestamps. | that make genuine retransmits appear to the TCP sender as spurious | |||
This could effectively disable congestion control at the TCP sender. | retransmits. When such a detection algorithm is run together with the | |||
A reliable method to protect against that risk is to implement the | Eifel response algorithm, this could effectively disable congestion | |||
safe variant of the Eifel detection algorithm specified in [LM02]. | control at the TCP sender. Should this become a concern, the Eifel | |||
response algorithm SHOULD only be run together with detection | ||||
algorithms that are known to be safe against such "ACK spoofing | ||||
attacks". | ||||
For example, the safe variant of the Eifel detection algorithm | ||||
[RFC***B], is a reliable method to protect against this risk. | ||||
Acknowledgments | Acknowledgments | |||
Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan | Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan | |||
Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi | Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi | |||
Sarolahti, and Alexey Kuznetsov for very useful discussions that | Sarolahti, and Alexey Kuznetsov for very useful discussions that | |||
contributed to this work. | contributed to this work. | |||
Normative References | Normative References | |||
skipping to change at page 11, line 33 | skipping to change at page 11, line 5 | |||
[RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's | [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's | |||
Fast Recovery Algorithm, RFC 2582, April 1999. | Fast Recovery Algorithm, RFC 2582, April 1999. | |||
[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, | [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, | |||
An Extension to the Selective Acknowledgement (SACK) Option | An Extension to the Selective Acknowledgement (SACK) Option | |||
for TCP, RFC 2883, July 2000. | for TCP, RFC 2883, July 2000. | |||
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High | [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High | |||
Performance, RFC 1323, May 1992. | Performance, RFC 1323, May 1992. | |||
[LM02] R. Ludwig, M. Meyer, The Eifel Detection Algorithm for TCP, | [RFC***B] R. Ludwig, M. Meyer, The Eifel Detection Algorithm for TCP, | |||
work in progress, draft-ietf-tsvwg-tcp-eifel-alg-07.txt, | RFC***B, March 2003. | |||
October 2002. | ||||
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective | [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective | |||
Acknowledgement Options, RFC 2018, October 1996. | Acknowledgement Options, RFC 2018, October 1996. | |||
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, | [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, | |||
RFC 2988, November 2000. | RFC 2988, November 2000. | |||
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September | [RFC793] J. Postel, Transmission Control Protocol, RFC793, September | |||
1981. | 1981. | |||
[RFC3168] K. Ramakrishnan, S. Floyd, D. Black, The Addition of | [RFC3168] K. Ramakrishnan, S. Floyd, D. Black, The Addition of | |||
Explicit Congestion Notification (ECN) to IP, RFC 3168, | Explicit Congestion Notification (ECN) to IP, RFC 3168, | |||
September 2001. | September 2001. | |||
Informative References | Informative References | |||
[BA02a] E. Blanton, M. Allman, On Making TCP More Robust to Packet | [BA02a] E. Blanton, M. Allman, On Making TCP More Robust to Packet | |||
Reordering, ACM Computer Communication Review, Vol. 32, | Reordering, ACM Computer Communication Review, Vol. 32, | |||
No. 1, January 2002. | No. 1, January 2002. | |||
[BA02b] E. Blanton, M. Allman, A Conservative SACK-based Loss | [BA02b] E. Blanton, M. Allman, Using TCP DSACKs and SCTP Duplicate | |||
Recovery Algorithm for TCP, work in progress, draft-allman- | TSNs to Detect Spurious Retransmissions, draft-blanton- | |||
tcp-sack-13.txt, October 2002. | dsack-use-02.txt (work in progress), October 2002. | |||
[BDA03] E. Blanton, R. Dimond, M. Allman. Practices for TCP Senders | ||||
in the Face of Segment Reordering, draft-blanton-tcp- | ||||
reordering-00.txt (work in progress), February 2003. | ||||
[RFC***A] E. Blanton, M. Allman, K. Fall, L. Wang, A Conservative | ||||
SACK-based Loss Recovery Algorithm for TCP, RFC***A, | ||||
March 2003. | ||||
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In | [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In | |||
Proceedings of IFIP Personal Wireless Conference, | Proceedings of IFIP Personal Wireless Conference, | |||
August 2001. | August 2001. | |||
[GL02] A. Gurtov, R. Ludwig, Evaluating the Eifel Algorithm for | [GL02] A. Gurtov, R. Ludwig, Evaluating the Eifel Algorithm for | |||
TCP in a GPRS Network, In Proceedings of the European | TCP in a GPRS Network, In Proceedings of the European | |||
Wireless Conference, February 2002. | Wireless Conference, February 2002. | |||
[GL03] A. Gurtov, R. Ludwig, Responding to Spurious Timeouts in | [GL03] A. Gurtov, R. Ludwig, Responding to Spurious Timeouts in | |||
TCP, To Appear in Proceedings of IEEE INFOCOM 03. | TCP, In Proceedings of IEEE INFOCOM 03, . | |||
[IMLGK02] H. Inamura et al., TCP over Second (2.5G) and Third (3G) | [RFC3481] H. Inamura, G. Montenegro, R. Ludwig, A. Gurtov, | |||
Generation Wireless Networks, work in progress, draft-ietf- | F. Khafizov, TCP over Second (2.5G) and Third (3G) | |||
pilc-2.5g3g-11.txt, July 2002. | Generation Wireless Networks, RFC3481, February 2003. | |||
[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates | [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates | |||
in Reliable Transport Protocols, In Proceedings of ACM | in Reliable Transport Protocols, In Proceedings of ACM | |||
SIGCOMM 87. | SIGCOMM 87. | |||
[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP | [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP | |||
Robust Against Spurious Retransmissions, ACM Computer | Robust Against Spurious Retransmissions, ACM Computer | |||
Communication Review, Vol. 30, No. 1, January 2000. | Communication Review, Vol. 30, No. 1, January 2000. | |||
[LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM | [LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM | |||
Computer Communication Review, Vol. 30, No. 3, July 2000. | Computer Communication Review, Vol. 30, No. 3, July 2000. | |||
[Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in | ||||
progress, draft-ludwig-tsvwg-tcp-fast-timeouts-00.txt, | ||||
July 2002. | ||||
[SK02] P. Sarolahti, A. Kuznetsov, Congestion Control in Linux | [SK02] P. Sarolahti, A. Kuznetsov, Congestion Control in Linux | |||
TCP, In Proceedings of USENIX, June 2002. | TCP, In Proceedings of USENIX, June 2002. | |||
[SK03] P. Sarolahti, M. Kojo, F-RTO: A TCP RTO Recovery Algorithm | ||||
for Avoiding Unnecessary Retransmissions, draft-sarolahti- | ||||
tsvwg-tcp-frto-03.txt (work in progress), January 2003. | ||||
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 | [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 | |||
(The Implementation), Addison Wesley, January 1995. | (The Implementation), Addison Wesley, January 1995. | |||
[Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of | [Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of | |||
ACM SIGCOMM 88. | ACM SIGCOMM 88. | |||
Author's Address | Author's Address | |||
Reiner Ludwig | Reiner Ludwig | |||
Ericsson Research (EED) | Ericsson Research (EED) | |||
Ericsson Allee 1 | Ericsson Allee 1 | |||
52134 Herzogenrath, Germany | 52134 Herzogenrath, Germany | |||
Email: Reiner.Ludwig@ericsson.com | Email: Reiner.Ludwig@ericsson.com | |||
Andrei Gurtov | Andrei Gurtov | |||
Cellular Systems Development | Cellular Systems Development | |||
P.O. Box 970, FIN-00051 Sonera | P.O. Box 970, FIN-00051 Sonera | |||
Helsinki, Finland | Helsinki, Finland | |||
Phone: +358(0)20401 | Phone: +358(0)20401 | |||
Fax: +358(0)204064365 | Fax: +358(0)204064365 | |||
Email: andrei.gurtov@sonera.com | Email: andrei.gurtov@sonera.com | |||
Homepage: http://www.cs.helsinki.fi/u/gurtov | Homepage: http://www.cs.helsinki.fi/u/gurtov | |||
This Internet-Draft expires in June 2003. | This Internet-Draft expires in September 2003. | |||
End of changes. | ||||
This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/ |