draft-ietf-tsvwg-tcp-eifel-response-01.txt | draft-ietf-tsvwg-tcp-eifel-response-02.txt | |||
---|---|---|---|---|
Network Working Group Reiner Ludwig | Network Working Group Reiner Ludwig | |||
INTERNET-DRAFT Ericsson Research | INTERNET-DRAFT Ericsson Research | |||
Expires: April 2003 Andrei Gurtov | Expires: June 2003 Andrei Gurtov | |||
Sonera Corporation | Sonera Corporation | |||
October, 2002 | December, 2002 | |||
The Eifel Response Algorithm for TCP | The Eifel Response Algorithm for TCP | |||
<draft-ietf-tsvwg-tcp-eifel-response-01.txt> | <draft-ietf-tsvwg-tcp-eifel-response-02.txt> | |||
Status of this memo | Status of this memo | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that other | Task Force (IETF), its areas, and its working groups. Note that other | |||
groups may also distribute working documents as Internet-Drafts. | groups may also distribute working documents as Internet-Drafts. | |||
skipping to change at page 1, line 38 | skipping to change at page 1, line 38 | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html | http://www.ietf.org/shadow.html | |||
Abstract | Abstract | |||
The Eifel response algorithm uses the Eifel detection algorithm to | The Eifel response algorithm uses the Eifel detection algorithm to | |||
detect a posteriori whether the TCP sender has entered loss recovery | detect a posteriori whether the TCP sender has entered loss recovery | |||
unnecessarily. In response to a spurious timeout it avoids the often | unnecessarily. In response to a spurious timeout it avoids the often | |||
unnecessary go-back-N retransmits that would otherwise be sent, and | unnecessary go-back-N retransmits that would otherwise be sent, and | |||
reinitializes the RTT estimators to avoid further spurious timeouts. | adapts the retransmission timer to avoid further spurious timeouts. | |||
Likewise, it adapts the duplicate acknowledgement threshold in | Likewise, it adapts the duplicate acknowledgement threshold in | |||
response to a spurious fast retransmit. In both cases, the Eifel | response to a spurious fast retransmit. In both cases, the Eifel | |||
response algorithm restores the congestion control state in such a | response algorithm restores the congestion control state in such a | |||
way that packet bursts are avoided. | way that packet bursts are avoided. | |||
Terminology | Terminology | |||
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | |||
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | |||
document, are to be interpreted as described in [RFC2119]. | document, are to be interpreted as described in [RFC2119]. | |||
skipping to change at page 2, line 42 | skipping to change at page 2, line 42 | |||
Furthermore, we use the TCP sender state variables 'SND.UNA' and | Furthermore, we use the TCP sender state variables 'SND.UNA' and | |||
'SND.NXT' as defined in [RFC793]. SND.UNA holds the segment sequence | 'SND.NXT' as defined in [RFC793]. SND.UNA holds the segment sequence | |||
number of the oldest outstanding segment. SND.NXT holds the segment | number of the oldest outstanding segment. SND.NXT holds the segment | |||
sequence number of the next segment the TCP sender will | sequence number of the next segment the TCP sender will | |||
(re-)transmit. In addition, we define as 'SND.MAX' the segment | (re-)transmit. In addition, we define as 'SND.MAX' the segment | |||
sequence number of the next original transmit to be sent. The | sequence number of the next original transmit to be sent. The | |||
definition of SND.MAX is equivalent to the definition of snd_max in | definition of SND.MAX is equivalent to the definition of snd_max in | |||
[WS95]. | [WS95]. | |||
We use the TCP sender state variables 'cwnd' (congestion window), and | We use the TCP sender state variables 'cwnd' (congestion window), and | |||
'ssthresh' (slow start threshold), and the terms 'SMSS', and | 'ssthresh' (slow start threshold), and the terms 'SMSS', | |||
'FlightSize' as defined in [RFC2581]. FlightSize is the amount of | 'FlightSize', and 'Initial Window (IW)' as defined in [RFC2581]. | |||
outstanding data in the network, or alternatively, the difference | FlightSize is the amount of outstanding data in the network, or | |||
between SND.MAX and SND.UNA at a given point in time. We use the TCP | alternatively, the difference between SND.MAX and SND.UNA at a given | |||
sender state variables 'SRTT' and 'RTTVAR', and the term 'RTO' as | point in time. The IW is the size of the sender's congestion window | |||
defined in [RFC2988]. In addition, we assume that the TCP sender | after the three-way handshake is completed. We use the TCP sender | |||
maintains in the variable 'RTT-SAMPLE' the value of the latest round- | state variables 'SRTT' and 'RTTVAR', and the term 'RTO' as defined in | |||
trip time (RTT) measurement. | [RFC2988]. In addition, we assume that the TCP sender maintains in | |||
the variable 'RTT-SAMPLE' the value of the latest round-trip time | ||||
(RTT) measurement. | ||||
1. Introduction | 1. Introduction | |||
The Eifel response algorithm relies on the Eifel detection algorithm | The Eifel response algorithm relies on the Eifel detection algorithm | |||
defined in [LM02]. That document discusses the relevant background | defined in [LM02]. That document discusses the relevant background | |||
and motivation that also applies to this document. Hence, the reader | and motivation that also applies to this document. Hence, the reader | |||
is expected to be familiar with [LM02]. Note that alternative | is expected to be familiar with [LM02]. Note that alternative | |||
response algorithms are conceivable that could also rely on the Eifel | response algorithms are conceivable that could also rely on the Eifel | |||
detection algorithm. | detection algorithm. | |||
The Eifel response algorithm uses the Eifel detection algorithm to | The Eifel response algorithm uses the Eifel detection algorithm to | |||
detect a posteriori whether the TCP sender has entered loss recovery | detect a posteriori whether the TCP sender has entered loss recovery | |||
unnecessarily. In response to a spurious timeout it avoids the often | unnecessarily. In response to a spurious timeout it avoids the often | |||
unnecessary go-back-N retransmits that would otherwise be sent, and | unnecessary go-back-N retransmits that would otherwise be sent, and | |||
reinitializes the RTT estimators to avoid further spurious timeouts. | adapts the retransmission timer to avoid further spurious timeouts. | |||
Likewise, it adapts the duplicate acknowledgement threshold in | Likewise, it adapts the duplicate acknowledgement threshold in | |||
response to a spurious fast retransmit. In both cases, the Eifel | response to a spurious fast retransmit. In both cases, the Eifel | |||
response algorithm restores the congestion control state in such a | response algorithm restores the congestion control state in such a | |||
way that packet bursts are avoided. | way that packet bursts are avoided. | |||
2. The Eifel Response Algorithm | 2. The Eifel Response Algorithm | |||
The complete algorithm is specified in section 2.1. In sections 2.2 | The complete algorithm is specified in section 2.1. In sections 2.2 | |||
to 2.4, we motivate the different steps of the algorithm. | to 2.4, we motivate the different steps of the algorithm. | |||
skipping to change at page 3, line 42 | skipping to change at page 3, line 47 | |||
If the combined Eifel detection and response algorithm is used, the | If the combined Eifel detection and response algorithm is used, the | |||
following steps MUST be taken by the TCP sender, but only upon | following steps MUST be taken by the TCP sender, but only upon | |||
initiation of loss recovery, i.e., when either the timeout-based | initiation of loss recovery, i.e., when either the timeout-based | |||
retransmit or the fast retransmit is sent. Note: The algorithm MUST | retransmit or the fast retransmit is sent. Note: The algorithm MUST | |||
NOT be reinitiated after loss recovery has already started. In | NOT be reinitiated after loss recovery has already started. In | |||
particular, it may not be reinitiated upon subsequent timeouts for | particular, it may not be reinitiated upon subsequent timeouts for | |||
the same segment, and not upon retransmitting segments other than the | the same segment, and not upon retransmitting segments other than the | |||
oldest outstanding segment. | oldest outstanding segment. | |||
Note that steps (1)-(6) are a one-to-one copy of the Eifel detection | Steps (1)-(6) are a one-to-one copy of the Eifel detection algorithm | |||
algorithm specified in [LM02], step (0) has been added, and step | specified in [LM02], step (0) has been added, and step (RESP) from | |||
(RESP) from [LM02] has been replaced by steps (RESP)-(ReCC) given | [LM02] has been replaced by steps (RESP)-(ReCC) given below. | |||
below. | ||||
(0) Before the variables cwnd and ssthresh get updated when | (0) Before the variables cwnd and ssthresh get updated when | |||
loss recovery is initiated, set a "pipe_prev" variable as | loss recovery is initiated, set a "pipe_prev" variable as | |||
follows: | follows: | |||
pipe_prev <- max (FlightSize, ssthresh) | pipe_prev <- max (FlightSize, ssthresh) | |||
(1) Set a "SpuriousRecovery" variable to FALSE (equal 0). | (1) Set a "SpuriousRecovery" variable to FALSE (equal 0). | |||
(2) Set a "RetransmitTS" variable to the value of the | (2) Set a "RetransmitTS" variable to the value of the | |||
Timestamp Value field of the Timestamps option included in | Timestamp Value field of the Timestamps option included in | |||
the retransmit sent when loss recovery is initiated. A TCP | the retransmit sent when loss recovery is initiated. A TCP | |||
sender must ensure that RetransmitTS does not get | sender must ensure that RetransmitTS does not get | |||
overwritten as loss recovery progresses, e.g., in case of | overwritten as loss recovery progresses, e.g., in case of | |||
a second timeout and subsequent second retransmit of the | a second timeout and subsequent second retransmit of the | |||
same octet. | same octet. | |||
(3) Wait for the arrival of an acceptable ACK. When an | (3) Wait for the arrival of an acceptable ACK. When an | |||
acceptable ACK has arrived proceed to step (4). | acceptable ACK has arrived proceed to step (4). | |||
(4) If the value of the Timestamp Echo Reply field of the | (4) If the value of the Timestamp Echo Reply field of the | |||
acceptable ACK's Timestamps option is smaller than the | acceptable ACK's Timestamps option is smaller than the | |||
value of the variable RetransmitTS, then proceed to step | value of RetransmitTS, then proceed to step (5), | |||
(5), | ||||
else proceed to step (DONE). | else proceed to step (DONE). | |||
(5) If the acceptable ACK does not carry a DSACK option | (5) If the acceptable ACK carries a DSACK option [RFC2883], | |||
[RFC2883], then proceed to step (6), | then proceed to step (DONE), | |||
else if during the lifetime of the TCP connection the TCP | ||||
sender has previously received an ACK with a DSACK option, | ||||
or the acceptable ACK does not acknowledge all outstanding | ||||
data, then proceed to step (6), | ||||
else proceed to step (DONE). | else proceed to step (DONE). | |||
(6) If the loss recovery has been initiated with a timeout- | (6) If the loss recovery has been initiated with a timeout- | |||
based retransmit, then set | based retransmit, then set | |||
SpuriousRecovery <- SPUR_TO (equal 1), | SpuriousRecovery <- SPUR_TO (equal 1), | |||
else set | else set | |||
SpuriousRecovery <- dupacks+1 | SpuriousRecovery <- dupacks+1 | |||
(RESP) If SpuriousRecovery equals SPUR_TO, then proceed to step | (RESP) If SpuriousRecovery equals SPUR_TO, then proceed to step | |||
(STO.1), | (STO.1), | |||
else (spurious fast retransmit) proceed to step (SFR). | else (spurious fast retransmit) proceed to step (SFR). | |||
(STO.1) Resume transmission off the top: | (STO.1) Resume transmission off the top: | |||
Set | Set | |||
SND.NXT <- SND.MAX | SND.NXT <- SND.MAX | |||
(STO.2) Reinitialize the RTT estimators: | (STO.2) Adapt the Conservativeness of the Retransmission Timer: | |||
Set | If the retransmission timer is implemented according to | |||
[RFC2988], then change the calculation of SRTT to | ||||
SRTT <- SRTT + 1/FlightSize * (RTT-SAMPLE - SRTT) | ||||
and set | ||||
SRTT <- RTT-SAMPLE | SRTT <- RTT-SAMPLE | |||
RTTVAR <- RTT-SAMPLE/2, | RTTVAR <- RTT-SAMPLE/2, | |||
recalculate the RTO, and restart the retransmission timer. | recalculate the RTO, and restart the retransmission timer, | |||
Note: Even after changing the calculation of SRTT, the | ||||
retransmission timer is considered as being | ||||
implemented according to [RFC2988]. | ||||
else adapt the conservativeness of the retransmission | ||||
timer. | ||||
Proceed to step (ReCC). | Proceed to step (ReCC). | |||
(SFR) Adapt the duplicate acknowledgement threshold: | (SFR) Adapt the duplicate acknowledgement threshold: | |||
Set | Set | |||
DupThresh <- max (DupThresh, SpuriousRecovery) | DupThresh <- max (DupThresh, SpuriousRecovery) | |||
Proceed to step (ReCC). | Proceed to step (ReCC). | |||
(ReCC) Revert the congestion control state: | (ReCC) Revert the congestion control state: | |||
If the acceptable ACK has the ECN-Echo flag [RFC3168] set | If the acceptable ACK has the ECN-Echo flag [RFC3168] set | |||
OR the TCP sender has already taken more than three | OR the TCP sender has already taken more than three | |||
timeouts for the oldest outstanding segment, then proceed | timeouts for the oldest outstanding segment, then proceed | |||
to step (DONE), | to step (DONE), | |||
else set | else set | |||
cwnd <- FlightSize + SMSS | cwnd <- min (pipe_prev, (FlightSize + IW)) | |||
ssthresh <- pipe_prev | ssthresh <- pipe_prev | |||
Note: At this point in the algorithm, the value of | ||||
FlightSize might be different from the value of FlightSize | ||||
in step (0). | ||||
Proceed to step (DONE). | Proceed to step (DONE). | |||
(DONE) No further processing. | (DONE) No further processing. | |||
2.2 Responding to Spurious Timeouts | 2.2 Responding to Spurious Timeouts | |||
2.2.1 Suppressing the Unnecessary go-back-N Retransmits (step STO.1) | 2.2.1 Suppressing the Unnecessary go-back-N Retransmits (step STO.1) | |||
Without the use of the TCP timestamps option, the TCP sender suffers | Without the use of the TCP timestamps option, the TCP sender suffers | |||
from the retransmission ambiguity problem [Zh86], [KP87]. This means | from the retransmission ambiguity problem [Zh86], [KP87]. This means | |||
skipping to change at page 5, line 54 | skipping to change at page 6, line 15 | |||
Consequently, once the TCP sender's state has been updated after the | Consequently, once the TCP sender's state has been updated after the | |||
first acceptable ACK has arrived, SND.NXT equals SND.UNA. This is | first acceptable ACK has arrived, SND.NXT equals SND.UNA. This is | |||
what causes the often unnecessary go-back-N retransmits. Now every | what causes the often unnecessary go-back-N retransmits. Now every | |||
arriving acceptable ACK that was sent in response to an original | arriving acceptable ACK that was sent in response to an original | |||
transmit will advance SND.NXT. But as long as SND.NXT is smaller than | transmit will advance SND.NXT. But as long as SND.NXT is smaller than | |||
the value that SND.MAX had when the timeout occurred, those ACKs will | the value that SND.MAX had when the timeout occurred, those ACKs will | |||
clock out retransmits, whether those segments were lost or not. | clock out retransmits, whether those segments were lost or not. | |||
In fact, during this phase the TCP sender breaks 'packet | In fact, during this phase the TCP sender breaks 'packet | |||
conservation' [Jac88]. This is because the go-back-N retransmits are | conservation' [Jac88]. This is because the go-back-N retransmits are | |||
sent during slow start. I.e., for each original packet leaving the | sent during slow start. I.e., for each original transmit leaving the | |||
network, two retransmits are sent into the network as long as SND.NXT | network, two retransmits are sent into the network as long as SND.NXT | |||
does not equal SND.MAX (see [LK00] for more detail). | does not equal SND.MAX (see [LK00] for more detail). | |||
The use of the TCP timestamps option reliably eliminates the | The use of the TCP timestamps option reliably eliminates the | |||
retransmission ambiguity problem. Hence, once the Eifel detection | retransmission ambiguity problem. Hence, once the Eifel detection | |||
algorithm has detected that a timeout was spurious, it is safe | algorithm has detected that a timeout was spurious, it is safe | |||
to let the TCP sender resume the transmission with new data. In that | to let the TCP sender resume the transmission with new data. In that | |||
case, the Eifel response algorithm changes the TCP sender's state by | case, the Eifel response algorithm changes the TCP sender's state by | |||
setting SND.NXT to SND.MAX. | setting SND.NXT to SND.MAX. | |||
2.2.2 Re-Initializing the RTT Estimators (step STO.2) | 2.2.2 Adapting the Retransmission Timer (step STO.2) | |||
There is currently only one retransmission timer standardized for TCP | ||||
[RFC2988]. We therefore only address that timer explicitly. Future | ||||
standards that might define alternatives to [RFC2988] should propose | ||||
similar measures to adapt the conservativeness of the retransmission | ||||
timer. | ||||
Since the timeout was spurious, the TCP sender's RTT estimators are | Since the timeout was spurious, the TCP sender's RTT estimators are | |||
likely to be off. On the other hand, since timestamps are used, a new | likely to be off. However, since timestamps are being used, a new and | |||
and valid RTT measurement (RTT-SAMPLE) can be derived from the | valid RTT measurement (RTT-SAMPLE) can be derived from the acceptable | |||
acceptable ACK. It is therefore suggested to reinitialize the RTT | ACK. It is therefore suggested to reinitialize the RTT estimators | |||
estimators from RTT-SAMPLE. | from RTT-SAMPLE. Note that this RTT-SAMPLE will be relatively large | |||
since it will include the delay spike that caused the spurious | ||||
timeout in the first place. To have the new RTO become effective, the | ||||
retransmission timer needs to be restarted. This is consistent with | ||||
[RFC2988] which recommends restarting the retransmission timer with | ||||
the arrival of an acceptable ACK. | ||||
To have the new RTO become effective, the retransmission timer needs | When the path's RTT varies widely, it is recommended to take RTT | |||
to be restarted. This is consistent with [RFC2988] which recommends | samples more frequently than only once per RTT. This allows the TCP | |||
restarting the retransmission timer with the arrival of an acceptable | sender to track changes in the RTT more closely. In particular, a TCP | |||
ACK. | sender can react more quickly to sudden increases of the RTT by | |||
updating the RTO to a more conservative value sooner. The TCP | ||||
Timestamps option [RFC1323] provides this capability, allowing the | ||||
TCP sender to sample the RTT from every segment that is acknowledged. | ||||
Using timestamps across such paths leads to a more conservative TCP | ||||
retransmission timer and reduces the risk of triggering spurious | ||||
timeouts [IMLGK02]. | ||||
On the other hand, it is known that executing the RTO calculation | ||||
defined in [RFC2988] more often than once per RTT leads to an RTO | ||||
that decays too quickly, i.e., that converges to the RTT too quickly. | ||||
This is because of the fixed gains (1/8 and 1/4) of RFC2988's RTT | ||||
estimators. When timing every segment these gains are increasingly | ||||
too large with an increasing FlightSize. This leads to the effect | ||||
that the RTT estimators "lose" their memory too soon. This is a known | ||||
conflict between [RFC2988] and [RFC1323]. In particular, a large RTO | ||||
resulting from an RTT spike will decay within one or two RTTs (e.g., | ||||
see [LS00]). Hence, simply reinitializing RFC2988's RTT estimators | ||||
from RTT-SAMPLE is probably not enough to make the retransmission | ||||
timer sufficiently conservative for at least the next couple of RTTs. | ||||
A solution for the case when every segment is timed according to | ||||
[RFC1323] is to make the gains adaptive to the FlightSize [LS00]. We | ||||
suggest adopting this solution for at least the SRTT. | ||||
2.3 Responding to Spurious Fast Retransmits (step SFR) | 2.3 Responding to Spurious Fast Retransmits (step SFR) | |||
The assumption behind the fast retransmit algorithm [RFC2581] is that | The assumption behind the fast retransmit algorithm [RFC2581] is that | |||
a segment was lost if as many duplicate ACKs have arrived at the TCP | a segment was lost if as many duplicate ACKs have arrived at the TCP | |||
sender as indicated by DupThresh. Currently, DupThresh is specified | sender as indicated by DupThresh. Currently, DupThresh is specified | |||
as a fixed value of three [RFC2581]. That value is assumed to be | as a fixed value of three [RFC2581]. That value is assumed to be | |||
sufficiently conservative so that packet reordering and/or packet | sufficiently conservative so that packet reordering and/or packet | |||
duplication does not falsely trigger the fast retransmit algorithm. | duplication does not falsely trigger the fast retransmit algorithm. | |||
Clearly, this assumption does not hold for a particular TCP | Clearly, this assumption does not hold for a particular TCP | |||
skipping to change at page 7, line 30 | skipping to change at page 8, line 23 | |||
2.4 Reverting Congestion Control State (step ReCC) | 2.4 Reverting Congestion Control State (step ReCC) | |||
When a TCP sender enters loss recovery, it also assumes that it has | When a TCP sender enters loss recovery, it also assumes that it has | |||
received a congestion indication. In response to that it reduces | received a congestion indication. In response to that it reduces | |||
cwnd, and ssthresh. However, once the TCP sender detects that the | cwnd, and ssthresh. However, once the TCP sender detects that the | |||
loss recovery has been falsely triggered, this reduction was | loss recovery has been falsely triggered, this reduction was | |||
unnecessary. In fact, no congestion signal has been received. We | unnecessary. In fact, no congestion signal has been received. We | |||
therefore believe that it is safe to revert to the previous | therefore believe that it is safe to revert to the previous | |||
congestion control state. | congestion control state. | |||
To avoid packet bursts, we suggest to restore cwnd to the amount of | We suggest to restore cwnd to the minimum of the previous FlightSize, | |||
data currently outstanding in the network plus one SMSS. That will | and the current FlightSize plus IW. The latter avoids large packet | |||
allow no more than a single packet to be clocked out by the first | bursts that may occur with less careful variants for restoring | |||
acceptable ACK. In addition, we suggest to restore ssthresh to | congestion control state. For example, the original proposal [LK00] | |||
pipe_prev, i.e., the maximum of the previous value of ssthresh and | typically causes large bursts after packet reordering. The current | |||
the value that FlightSize had when loss recovery was unnecessarily | proposal limits a potential packet burst to IW, which is considered | |||
entered. As a result, the TCP sender either immediately resumes | an acceptable burst size. It is the amount of data that a TCP sender | |||
probing the network for more bandwidth in congestion avoidance, or it | may send into a yet "unprobed" network at the beginning of a | |||
first slow starts until it has reached its previous share of the | connection. | |||
available bandwidth. | ||||
In addition, we suggest to restore ssthresh to pipe_prev, i.e., the | ||||
maximum of the previous value of ssthresh and the value that | ||||
FlightSize had when loss recovery was unnecessarily entered. As a | ||||
result, the TCP sender either immediately resumes probing the network | ||||
for more bandwidth in congestion avoidance, or it first slow starts | ||||
until it has reached its previous share of the available bandwidth. | ||||
Clearly, when the acceptable ACK signals congestion through the | Clearly, when the acceptable ACK signals congestion through the | |||
ECN-Echo flag [RFC3168], the TCP sender MUST refrain from reverting | ECN-Echo flag [RFC3168], the TCP sender MUST refrain from reverting | |||
congestion control state. The same is true if the TCP sender has | congestion control state. The same is true if the TCP sender has | |||
already taken more than three timeouts for the oldest outstanding | already taken more than three timeouts for the oldest outstanding | |||
segment. Allowing three timeouts while still reverting congestion | segment. Allowing three timeouts while still reverting congestion | |||
control state goes beyond [RFC2581]. That standard recommends setting | control state goes beyond [RFC2581]. That standard recommends setting | |||
cwnd to no more than the restart window (one SMSS) if the TCP sender | cwnd to no more than the restart window (one SMSS) if the TCP sender | |||
has not sent data in an interval exceeding the current RTO. That is | has not sent data in an interval exceeding the current RTO. That is | |||
done to restart the ACK clock which is believed to be lost. The case | done to restart the ACK clock which is believed to be lost. The case | |||
in step (ReCC) of the Eifel response algorithm is different. Since | in step (ReCC) of the Eifel response algorithm is different. Since | |||
an acceptable ACK corresponding to an original transmit has finally | an acceptable ACK corresponding to an original transmit has finally | |||
returned, the TCP sender has reason to believe that the ACK clock was merely | returned, the TCP sender has reason to believe that the ACK clock was merely | |||
interrupted but has now resumed "ticking" again. | interrupted but has now resumed "ticking" again. | |||
3. Interoperability with Advanced Loss Recovery Schemes | 3. Non-Conservative Advanced Loss Recovery after Spurious Timeouts | |||
A TCP sender MAY implement an optimistic form of advanced loss | ||||
recovery after a spurious timeout has been detected as motivated in | ||||
this section. Such a scheme MUST be terminated after the highest | ||||
sequence number outstanding when the spurious timeout was detected | ||||
has been acknowledged. | ||||
We believe that there are no problems concerning interoperability | We believe that there are no problems concerning interoperability | |||
with advanced loss recovery schemes such as NewReno [RFC2582], or | with advanced loss recovery schemes such as NewReno [RFC2582], or | |||
SACK-based schemes [RFC2018], [BA02b]. This is because in case loss | SACK-based schemes [RFC2018], [BA02b]. This is because in case loss | |||
recovery has been initiated unnecessarily, the Eifel response | recovery has been initiated unnecessarily, the Eifel response | |||
algorithm makes the TCP sender back out of loss recovery before those | algorithm makes the TCP sender back out of loss recovery before those | |||
schemes would have a chance to kick in. | schemes would have a chance to kick in. | |||
In fact, we recommend that the Eifel response algorithm is | In fact, if an optimistic loss recovery scheme is not chosen (see | |||
implemented together with one of those advanced loss recovery | below), we recommend that the Eifel response algorithm is implemented | |||
schemes; ideally a SACK-based alternative. In an environment where | together with one of the mentioned advanced loss recovery schemes; | |||
spurious timeouts and back-to-back packet losses often coincide, we | ideally a SACK-based alternative. In an environment where spurious | |||
have found that TCP's performance can even suffer if the Eifel | timeouts and back-to-back packet losses often coincide, we have found | |||
response algorithm is operated without an advanced loss recovery | that TCP's performance can even suffer if the Eifel response | |||
scheme [GL02]. | algorithm is operated without an advanced loss recovery scheme | |||
[GL02]. | ||||
In that study, we compared, among other variants, TCP-Reno with and | In that study, we compared, among other variants, TCP-Reno with and | |||
without the Eifel response algorithm (TCP-Reno/Eifel vs. TCP-Reno), | without the Eifel response algorithm (TCP-Reno/Eifel vs. TCP-Reno), | |||
in both cases without an advanced loss recovery scheme. The | in both cases without an advanced loss recovery scheme. The | |||
reason that TCP-Reno performed better in the mentioned scenario is | reason that TCP-Reno performed better in the mentioned scenario is | |||
its aggressiveness after a spurious timeout. Even though it breaks | its aggressiveness after a spurious timeout. Even though it breaks | |||
'packet conservation' (see Section 2.2.1) when blindly retransmitting | 'packet conservation' (see Section 2.2.1) when blindly retransmitting | |||
all outstanding segments, it usually recovers the back-to-back packet | all outstanding segments, it usually recovers the back-to-back packet | |||
losses within a single round-trip time. On the contrary, the more | losses within a single round-trip time. On the contrary, the more | |||
conservative TCP-Reno/Eifel was forced into another (backed-off) | conservative TCP-Reno/Eifel was forced into another (backed-off) | |||
timeout in that case. In the study, we found that the best end-to-end | timeout in that case. In case NewReno is chosen as the advanced loss | |||
performance was achieved when the TCP sender implemented both the | recovery scheme, we found that it performs better if the 'bugfix' | |||
Eifel response algorithm and SACK-based loss recovery. In case | feature is disabled. That feature often leads the TCP sender to the | |||
NewReno is chosen as the advanced loss recovery scheme, we found that | wrong decision. | |||
it performs better if the 'bugfix' feature is disabled. That feature | ||||
often leads the TCP sender to the wrong decision. | ||||
4. Security Considerations | However, in a more recent study [GL03], we found that those advanced | |||
loss recovery schemes are often too conservative to compete against | ||||
TCP-Reno's blind go-back-N in terms of quickly recovering multiple | ||||
losses after a spurious timeout. The problem with the NewReno scheme | ||||
is that it does not exploit knowledge (e.g., provided through SACK | ||||
options) about which segments were lost. The problem with the | ||||
conservative SACK-based scheme [BA02b] is that it waits for three | ||||
SACKs before it retransmits a lost segment. This may often lead to a | ||||
second - and in this case genuine - (potentially backed-off) timeout. | ||||
In those cases TCP-Reno's loss recovery is often quicker due to the | ||||
blind go-back-N. This could be viewed as a disincentive to the | ||||
deployment of the Eifel response algorithm. | ||||
[Making TCP (even) more conservative by fixing a misbehavior in | ||||
the name of 'packet conservation' would probably at most result in | ||||
credits in the academic world.] | ||||
We therefore suggest that a TCP sender MAY implement an optimistic | ||||
(non-conservative) form of advanced loss recovery after a spurious | ||||
timeout has been detected, if the following guidelines are met: | ||||
- Packet Conservation: The TCP sender may not have more segments | ||||
(counting both original transmits and retransmits) in flight | ||||
than indicated by the congestion window. | ||||
- A retransmit may only be sent when a potential loss has been | ||||
indicated. For example, a single duplicate ACK is such an | ||||
indication; potentially with the corresponding SACK info in case | ||||
the SACK option is enabled for the connection. | ||||
We have developed and evaluated such a scheme (a variant of NewReno | ||||
that exploits SACK info) in [GL03] that shows good results. | ||||
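The two guidelines above can be read as a gate that every retransmit must pass. The following is only a minimal sketch under stated assumptions, not the scheme evaluated in [GL03]: the names SenderState, on_duplicate_ack, may_retransmit, and retransmit are hypothetical, all quantities are counted in whole segments, and the bookkeeping that lowers in_flight when ACKs arrive is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class SenderState:
    """Hypothetical, simplified sender state, counted in whole segments."""
    cwnd: int                 # congestion window
    in_flight: int = 0        # original transmits plus retransmits in flight
    loss_indicated: set = field(default_factory=set)  # segments with a loss hint


def on_duplicate_ack(state, suspected_segment):
    """Record a potential loss; a single duplicate ACK counts as an indication.

    With SACK enabled, suspected_segment would be derived from the SACK info;
    how that is done is outside this sketch.
    """
    state.loss_indicated.add(suspected_segment)


def may_retransmit(state, segment):
    """Return True only if both guidelines allow retransmitting the segment."""
    has_window = state.in_flight < state.cwnd         # packet conservation
    has_indication = segment in state.loss_indicated  # potential loss indicated
    return has_window and has_indication


def retransmit(state, segment):
    """Account for a retransmit if permitted; return whether it was sent."""
    if not may_retransmit(state, segment):
        return False
    state.in_flight += 1                   # the retransmit counts as in flight
    state.loss_indicated.discard(segment)  # one retransmit per indication
    return True
```

In this sketch a segment without any loss indication is never retransmitted, which is what separates the approach from a blind go-back-N, and a full congestion window blocks a retransmit even when a loss has been indicated.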
4. IPR Considerations | ||||
The IETF has been notified of intellectual property rights claimed in | ||||
regard to some or all of the specification contained in this | ||||
document. For more information consult the online list of claimed | ||||
rights at http://www.ietf.org/ipr. | ||||
The IETF takes no position regarding the validity or scope of any | ||||
intellectual property or other rights that might be claimed to | ||||
pertain to the implementation or use of the technology described in | ||||
this document or the extent to which any license under such rights | ||||
might or might not be available; neither does it represent that it | ||||
has made any effort to identify any such rights. Information on the | ||||
IETF's procedures with respect to rights in standards-track and | ||||
standards-related documentation can be found in BCP-11. Copies of | ||||
claims of rights made available for publication and any assurances of | ||||
licenses to be made available, or the result of an attempt made to | ||||
obtain a general license or permission for the use of such | ||||
proprietary rights by implementors or users of this specification can | ||||
be obtained from the IETF Secretariat. | ||||
5. Security Considerations | ||||
There is a risk that TCP receivers make genuine retransmits appear to | There is a risk that TCP receivers make genuine retransmits appear to | |||
the TCP sender as spurious retransmits by forging echoed timestamps. | the TCP sender as spurious retransmits by forging echoed timestamps. | |||
This could effectively disable congestion control at the TCP sender. | This could effectively disable congestion control at the TCP sender. | |||
A reliable method to protect against that risk is to implement the | A reliable method to protect against that risk is to implement the | |||
safe variant of the Eifel detection algorithm specified in [LM02]. | safe variant of the Eifel detection algorithm specified in [LM02]. | |||
Acknowledgments | Acknowledgments | |||
Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan | Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan | |||
Baucke, Sally Floyd, Vern Paxson, Mark Allman, and Ethan Blanton for | Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi | |||
very useful discussions that contributed to this work. | Sarolahti, and Alexey Kuznetsov for very useful discussions that | |||
contributed to this work. | ||||
Normative References | Normative References | |||
[RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, | [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, | |||
RFC 2581, April 1999. | RFC 2581, April 1999. | |||
[RFC3042] M. Allman, H. Balakrishnan, S. Floyd, Enhancing TCP's Loss | [RFC3042] M. Allman, H. Balakrishnan, S. Floyd, Enhancing TCP's Loss | |||
Recovery Using Limited Transmit, RFC 3042, January 2001. | Recovery Using Limited Transmit, RFC 3042, January 2001. | |||
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate | [RFC2119] S. Bradner, Key words for use in RFCs to Indicate | |||
skipping to change at page 9, line 27 | skipping to change at page 11, line 34 | |||
Fast Recovery Algorithm, RFC 2582, April 1999. | Fast Recovery Algorithm, RFC 2582, April 1999. | |||
[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, | [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, | |||
An Extension to the Selective Acknowledgement (SACK) Option | An Extension to the Selective Acknowledgement (SACK) Option | |||
for TCP, RFC 2883, July 2000. | for TCP, RFC 2883, July 2000. | |||
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High | [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High | |||
Performance, RFC 1323, May 1992. | Performance, RFC 1323, May 1992. | |||
[LM02] R. Ludwig, M. Meyer, The Eifel Detection Algorithm for TCP, | [LM02] R. Ludwig, M. Meyer, The Eifel Detection Algorithm for TCP, | |||
work in progress, October 2002. | work in progress, draft-ietf-tsvwg-tcp-eifel-alg-07.txt, | |||
October 2002. | ||||
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective | [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective | |||
Acknowledgement Options, RFC 2018, October 1996. | Acknowledgement Options, RFC 2018, October 1996. | |||
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, | [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, | |||
RFC 2988, November 2000. | RFC 2988, November 2000. | |||
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September | [RFC793] J. Postel, Transmission Control Protocol, RFC793, September | |||
1981. | 1981. | |||
skipping to change at page 9, line 49 | skipping to change at page 12, line 6 | |||
Explicit Congestion Notification (ECN) to IP, RFC 3168, | Explicit Congestion Notification (ECN) to IP, RFC 3168, | |||
September 2001 | September 2001 | |||
Informative References | Informative References | |||
[BA02a] E. Blanton, M. Allman, On Making TCP More Robust to Packet | [BA02a] E. Blanton, M. Allman, On Making TCP More Robust to Packet | |||
Reordering, ACM Computer Communication Review, Vol. 32, | Reordering, ACM Computer Communication Review, Vol. 32, | |||
No. 1, January 2002. | No. 1, January 2002. | |||
[BA02b] E. Blanton, M. Allman, A Conservative SACK-based Loss | [BA02b] E. Blanton, M. Allman, A Conservative SACK-based Loss | |||
Recovery Algorithm for TCP, work in progress, October 2002. | Recovery Algorithm for TCP, work in progress, draft-allman- | |||
tcp-sack-13.txt, October 2002. | ||||
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In | [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In | |||
Proceedings of IFIP Personal Wireless Conference, | Proceedings of IFIP Personal Wireless Conference, | |||
August 2001. | August 2001. | |||
[GL02] A. Gurtov, R. Ludwig, Evaluating the Eifel Algorithm for | [GL02] A. Gurtov, R. Ludwig, Evaluating the Eifel Algorithm for | |||
TCP in a GPRS Network, In Proceedings of the European | TCP in a GPRS Network, In Proceedings of the European | |||
Wireless Conference, February 2002. | Wireless Conference, February 2002. | |||
[GL03] A. Gurtov, R. Ludwig, Responding to Spurious Timeouts in | ||||
TCP, To Appear in Proceedings of IEEE INFOCOM 03. | ||||
[IMLGK02] H. Inamura et al., TCP over Second (2.5G) and Third (3G) | ||||
Generation Wireless Networks, work in progress, draft-ietf- | ||||
pilc-2.5g3g-11.txt, July 2002. | ||||
[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates | [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates | |||
in Reliable Transport Protocols, In Proceedings of ACM | in Reliable Transport Protocols, In Proceedings of ACM | |||
SIGCOMM 87. | SIGCOMM 87. | |||
[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP | [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP | |||
Robust Against Spurious Retransmissions, ACM Computer | Robust Against Spurious Retransmissions, ACM Computer | |||
Communication Review, Vol. 30, No. 1, January 2000. | Communication Review, Vol. 30, No. 1, January 2000. | |||
[LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM | ||||
Computer Communication Review, Vol. 30, No. 3, July 2000. | ||||
[Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in | [Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in | |||
progress, July 2002. | progress, draft-ludwig-tsvwg-tcp-fast-timeouts-00.txt, | |||
July 2002. | ||||
[SK02] P. Sarolahti, A. Kuznetsov, Congestion Control in Linux | [SK02] P. Sarolahti, A. Kuznetsov, Congestion Control in Linux | |||
TCP, In Proceedings of USENIX, June 2002. | TCP, In Proceedings of USENIX, June 2002. | |||
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 | [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 | |||
(The Implementation), Addison Wesley, January 1995. | (The Implementation), Addison Wesley, January 1995. | |||
[Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of | [Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of | |||
ACM SIGCOMM 86. | ACM SIGCOMM 86. | |||
skipping to change at page 10, line 36 | skipping to change at page 13, line 4 | |||
Author's Address | Author's Address | |||
Reiner Ludwig | Reiner Ludwig | |||
Ericsson Research (EED) | Ericsson Research (EED) | |||
Ericsson Allee 1 | Ericsson Allee 1 | |||
52134 Herzogenrath, Germany | 52134 Herzogenrath, Germany | |||
Email: Reiner.Ludwig@ericsson.com | Email: Reiner.Ludwig@ericsson.com | |||
Andrei Gurtov | Andrei Gurtov | |||
Cellular Systems Development | Cellular Systems Development | |||
P.O. Box 970, FIN-00051 Sonera | P.O. Box 970, FIN-00051 Sonera | |||
Helsinki, Finland | Helsinki, Finland | |||
Phone: +358(0)20401 | Phone: +358(0)20401 | |||
Fax: +358(0)204064365 | Fax: +358(0)204064365 | |||
Email: andrei.gurtov@sonera.com | Email: andrei.gurtov@sonera.com | |||
Homepage: http://www.cs.helsinki.fi/u/gurtov | Homepage: http://www.cs.helsinki.fi/u/gurtov | |||
This Internet-Draft expires in April 2003. | This Internet-Draft expires in June 2003. | |||
End of changes. | ||||