draft-ietf-tcpm-rfc3782-bis-05.txt | rfc6582.txt | |||
---|---|---|---|---|
TCP Maintenance and Minor T. Henderson | Internet Engineering Task Force (IETF) T. Henderson | |||
Extensions Working Group Boeing | Request for Comments: 6582 Boeing | |||
Internet-Draft S. Floyd | Obsoletes: 3782 S. Floyd | |||
Obsoletes: 3782 (if approved) ICSI | Category: Standards Track ICSI | |||
Intended status: Standards Track A. Gurtov | ISSN: 2070-1721 A. Gurtov | |||
Expires: July 18, 2012 University of Oulu | University of Oulu | |||
Y. Nishida | Y. Nishida | |||
WIDE Project | WIDE Project | |||
January 18, 2012 | April 2012 | |||
The NewReno Modification to TCP's Fast Recovery Algorithm | The NewReno Modification to TCP's Fast Recovery Algorithm | |||
draft-ietf-tcpm-rfc3782-bis-05.txt | ||||
Abstract | Abstract | |||
RFC 5681 documents the following four intertwined TCP | RFC 5681 documents the following four intertwined TCP congestion | |||
congestion control algorithms: slow start, congestion avoidance, fast | control algorithms: slow start, congestion avoidance, fast | |||
retransmit, and fast recovery. RFC 5681 explicitly allows | retransmit, and fast recovery. RFC 5681 explicitly allows certain | |||
certain modifications of these algorithms, including modifications | modifications of these algorithms, including modifications that use | |||
that use the TCP Selective Acknowledgement (SACK) option (RFC 2883), | the TCP Selective Acknowledgment (SACK) option (RFC 2883), and | |||
and modifications that respond to "partial acknowledgments" (ACKs | modifications that respond to "partial acknowledgments" (ACKs that | |||
which cover new data, but not all the data outstanding when loss was | cover new data, but not all the data outstanding when loss was | |||
detected) in the absence of SACK. This document describes a specific | detected) in the absence of SACK. This document describes a specific | |||
algorithm for responding to partial acknowledgments, referred to as | algorithm for responding to partial acknowledgments, referred to as | |||
NewReno. This response to partial acknowledgments was first proposed | "NewReno". This response to partial acknowledgments was first | |||
by Janey Hoe. This document obsoletes RFC 3782. | proposed by Janey Hoe. This document obsoletes RFC 3782. | |||
Status of this Memo | ||||
This Internet-Draft is submitted in full conformance with the | Status of This Memo | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | This is an Internet Standards Track document. | |||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at http://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 5741. | ||||
This Internet-Draft will expire on July 18, 2012. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
http://www.rfc-editor.org/info/rfc6582. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
the document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
skipping to change at page 3, line 7 | skipping to change at page 2, line 34 | |||
modifications of such material outside the IETF Standards Process. | modifications of such material outside the IETF Standards Process. | |||
Without obtaining an adequate license from the person(s) controlling | Without obtaining an adequate license from the person(s) controlling | |||
the copyright in such materials, this document may not be modified | the copyright in such materials, this document may not be modified | |||
outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
than English. | than English. | |||
1. Introduction | 1. Introduction | |||
For the typical implementation of the TCP Fast Recovery algorithm | For the typical implementation of the TCP fast recovery algorithm | |||
described in [RFC5681] (first implemented in the 1990 BSD Reno | described in [RFC5681] (first implemented in the 1990 BSD Reno | |||
release, and referred to as the Reno algorithm in [FF96]), the TCP | release, and referred to as the "Reno algorithm" in [FF96]), the TCP | |||
data sender only retransmits a packet after a retransmit timeout has | data sender only retransmits a packet after a retransmit timeout has | |||
occurred, or after three duplicate acknowledgments have arrived | occurred, or after three duplicate acknowledgments have arrived | |||
triggering the Fast Retransmit algorithm. A single retransmit | triggering the fast retransmit algorithm. A single retransmit | |||
timeout might result in the retransmission of several data packets, | timeout might result in the retransmission of several data packets, | |||
but each invocation of the Fast Retransmit algorithm in RFC 5681 | but each invocation of the fast retransmit algorithm in RFC 5681 | |||
leads to the retransmission of only a single data packet. | leads to the retransmission of only a single data packet. | |||
Two problems arise with Reno TCP when multiple packet losses occur | Two problems arise with Reno TCP when multiple packet losses occur in | |||
in a single window. First, Reno will often take a timeout, as | a single window. First, Reno will often take a timeout, as has been | |||
has been documented in [Hoe95]. Second, even if a retransmission | documented in [Hoe95]. Second, even if a retransmission timeout is | |||
timeout is avoided, multiple fast retransmits and window reductions | avoided, multiple fast retransmits and window reductions can occur, | |||
can occur, as documented in [F94]. When multiple packet losses | as documented in [F94]. When multiple packet losses occur, if the | |||
occur, if the SACK option [RFC2883] is available, the TCP sender | SACK option [RFC2883] is available, the TCP sender has the | |||
has the information to make intelligent decisions about which packets | information to make intelligent decisions about which packets to | |||
to retransmit and which packets not to retransmit during Fast | retransmit and which packets not to retransmit during fast recovery. | |||
Recovery. This document applies to TCP connections that are | ||||
unable to use the TCP Selective Acknowledgement (SACK) option, | This document applies to TCP connections that are unable to use the | |||
either because the option is not locally supported or | TCP Selective Acknowledgment (SACK) option, either because the option | |||
because the TCP peer did not indicate a willingness to use SACK. | is not locally supported or because the TCP peer did not indicate a | |||
willingness to use SACK. | ||||
In the absence of SACK, there is little information available to the | In the absence of SACK, there is little information available to the | |||
TCP sender in making retransmission decisions during Fast | TCP sender in making retransmission decisions during fast recovery. | |||
Recovery. From the three duplicate acknowledgments, the sender | From the three duplicate acknowledgments, the sender infers a packet | |||
infers a packet loss, and retransmits the indicated packet. After | loss, and retransmits the indicated packet. After this, the data | |||
this, the data sender could receive additional duplicate | sender could receive additional duplicate acknowledgments, as the | |||
acknowledgments, as the data receiver acknowledges additional data | data receiver acknowledges additional data packets that were already | |||
packets that were already in flight when the sender entered Fast | in flight when the sender entered fast retransmit. | |||
Retransmit. | ||||
In the case of multiple packets dropped from a single window of data, | In the case of multiple packets dropped from a single window of data, | |||
the first new information available to the sender comes when the | the first new information available to the sender comes when the | |||
sender receives an acknowledgment for the retransmitted packet (that | sender receives an acknowledgment for the retransmitted packet (that | |||
is, the packet retransmitted when Fast Retransmit was first | is, the packet retransmitted when fast retransmit was first entered). | |||
entered). If there is a single packet drop and no reordering, then | If there is a single packet drop and no reordering, then the | |||
the acknowledgment for this packet will acknowledge all of the | acknowledgment for this packet will acknowledge all of the packets | |||
packets transmitted before Fast Retransmit was entered. However, if | transmitted before fast retransmit was entered. However, if there | |||
there are multiple packet drops, then the acknowledgment for the | are multiple packet drops, then the acknowledgment for the | |||
retransmitted packet will acknowledge some but not all of the packets | retransmitted packet will acknowledge some but not all of the packets | |||
transmitted before the Fast Retransmit. We call this acknowledgment | transmitted before the fast retransmit. We call this acknowledgment | |||
a partial acknowledgment. | a partial acknowledgment. | |||
Along with several other suggestions, [Hoe95] suggested that during | Along with several other suggestions, [Hoe95] suggested that during | |||
Fast Recovery the TCP data sender responds to a partial | fast recovery the TCP data sender respond to a partial acknowledgment | |||
acknowledgment by inferring that the next in-sequence packet has been | by inferring that the next in-sequence packet has been lost and | |||
lost, and retransmitting that packet. This document describes a | retransmitting that packet. This document describes a modification | |||
modification to the Fast Recovery algorithm in RFC 5681 that | to the fast recovery algorithm in RFC 5681 that incorporates a | |||
incorporates a response to partial acknowledgments received during | response to partial acknowledgments received during fast recovery. | |||
Fast Recovery. We call this modified Fast Recovery algorithm | We call this modified fast recovery algorithm NewReno, because it is | |||
NewReno, because it is a slight but significant variation of the | a slight but significant variation of the behavior that has been | |||
basic Reno algorithm in RFC 5681. This document does not discuss the | historically referred to as Reno. This document does not discuss the | |||
other suggestions in [Hoe95] and [Hoe96], such as a change to the | other suggestions in [Hoe95] and [Hoe96], such as a change to the | |||
ssthresh parameter during Slow-Start, or the proposal to send a new | ssthresh parameter during slow start, or the proposal to send a new | |||
packet for every two duplicate acknowledgments during Fast | packet for every two duplicate acknowledgments during fast recovery. | |||
Recovery. The version of NewReno in this document also draws on | The version of NewReno in this document also draws on other | |||
other discussions of NewReno in the literature [LM97, Hen98]. | discussions of NewReno in the literature [LM97] [Hen98]. | |||
We do not claim that the NewReno version of Fast Recovery described | We do not claim that the NewReno version of fast recovery described | |||
here is an optimal modification of Fast Recovery for responding to | here is an optimal modification of fast recovery for responding to | |||
partial acknowledgments, for TCP connections that are unable to use | partial acknowledgments, for TCP connections that are unable to use | |||
SACK. Based on our experiences with the NewReno modification in the | SACK. Based on our experiences with the NewReno modification in the | |||
NS simulator [NS] and with numerous implementations of NewReno, we | network simulator known as ns-2 [NS] and with numerous | |||
believe that this modification improves the performance of the Fast | implementations of NewReno, we believe that this modification | |||
Retransmit and Fast Recovery algorithms in a wide variety of | improves the performance of the fast retransmit and fast recovery | |||
scenarios. Previous versions of this RFC [RFC2582, RFC3782] provide | algorithms in a wide variety of scenarios. Previous versions of this | |||
simulation-based evidence of the possible performance gains. | RFC [RFC2582] [RFC3782] provide simulation-based evidence of the | |||
possible performance gains. | ||||
2. Terminology and Definitions | 2. Terminology and Definitions | |||
This document assumes that the reader is familiar with the terms | This document assumes that the reader is familiar with the terms | |||
SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and | SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and | |||
FLIGHT SIZE (FlightSize) defined in [RFC5681]. | FLIGHT SIZE (FlightSize) defined in [RFC5681]. | |||
This document defines an additional sender-side state variable | This document defines an additional sender-side state variable called | |||
called RECOVER: | "recover": | |||
RECOVER: | recover: | |||
When in Fast Recovery, this variable records the send sequence | When in fast recovery, this variable records the send sequence | |||
number that must be acknowledged before the Fast Recovery | number that must be acknowledged before the fast recovery | |||
procedure is declared to be over. | procedure is declared to be over. | |||
3. The Fast Retransmit and Fast Recovery Algorithms in NewReno | 3. The Fast Retransmit and Fast Recovery Algorithms in NewReno | |||
3.1. Protocol Overview | 3.1. Protocol Overview | |||
The basic idea of these extensions to the Fast Retransmit and | The basic idea of these extensions to the fast retransmit and fast | |||
Fast Recovery algorithms described in Section 3.2 of [RFC5681] | recovery algorithms described in Section 3.2 of [RFC5681] is as | |||
is as follows. The TCP sender can infer, from the arrival of | follows. The TCP sender can infer, from the arrival of duplicate | |||
duplicate acknowledgments, whether multiple losses in the same | acknowledgments, whether multiple losses in the same window of data | |||
window of data have most likely occurred, and avoid taking a | have most likely occurred, and avoid taking a retransmit timeout or | |||
retransmit timeout or making multiple congestion window reductions | making multiple congestion window reductions due to such an event. | |||
due to such an event. | ||||
The NewReno modification applies to the Fast Recovery procedure that | The NewReno modification applies to the fast recovery procedure that | |||
begins when three duplicate ACKs are received and ends when either a | begins when three duplicate ACKs are received and ends when either a | |||
retransmission timeout occurs or an ACK arrives that acknowledges all | retransmission timeout occurs or an ACK arrives that acknowledges all | |||
of the data up to and including the data that was outstanding when | of the data up to and including the data that was outstanding when | |||
the Fast Recovery procedure began. | the fast recovery procedure began. | |||
3.2. Specification | 3.2. Specification | |||
The procedures specified in Section 3.2 of [RFC5681] are followed | The procedures specified in Section 3.2 of [RFC5681] are followed, | |||
with the following modifications. Note that this specification | with the modifications listed below. Note that this specification | |||
avoids the use of the key words defined in RFC 2119 [RFC2119] since | avoids the use of the key words defined in RFC 2119 [RFC2119], since | |||
it mainly provides sender-side implementation guidance for | it mainly provides sender-side implementation guidance for | |||
performance improvement, and does not affect interoperability. | performance improvement, and does not affect interoperability. | |||
1) Initialization of TCP protocol control block: | 1) Initialization of TCP protocol control block: | |||
When the TCP protocol control block is initialized, Recover is | When the TCP protocol control block is initialized, recover is | |||
set to the initial send sequence number. | set to the initial send sequence number. | |||
2) Three duplicate ACKs: | 2) Three duplicate ACKs: | |||
When the third duplicate ACK is received, the TCP sender first | When the third duplicate ACK is received, the TCP sender first | |||
checks the value of Recover to see if the Cumulative | checks the value of recover to see if the Cumulative | |||
Acknowledgment field covers more than Recover. If so, the value | Acknowledgment field covers more than recover. If so, the value | |||
of Recover is incremented to the value of the highest sequence | of recover is incremented to the value of the highest sequence | |||
number transmitted by the TCP so far. The TCP then enters Fast | number transmitted by the TCP so far. The TCP then enters fast | |||
Retransmit (step 2 of Section 3.2 of [RFC5681]). If not, the TCP | retransmit (step 2 of Section 3.2 of [RFC5681]). If not, the TCP | |||
does not enter fast retransmit and does not reset ssthresh. | does not enter fast retransmit and does not reset ssthresh. | |||
3) Response to newly acknowledged data: | 3) Response to newly acknowledged data: | |||
Step 6 of [RFC5681] specifies the response to the next ACK that | Step 6 of [RFC5681] specifies the response to the next ACK that | |||
acknowledges previously unacknowledged data. When an ACK | acknowledges previously unacknowledged data. When an ACK arrives | |||
arrives that acknowledges new data, this ACK could be the | that acknowledges new data, this ACK could be the acknowledgment | |||
acknowledgment elicited by the retransmission from step 2, or | elicited by the initial retransmission from fast retransmit, or | |||
elicited by a later retransmission. There are two cases. | elicited by a later retransmission. There are two cases: | |||
Full acknowledgments: | Full acknowledgments: | |||
If this ACK acknowledges all of the data up to and including | If this ACK acknowledges all of the data up to and including | |||
Recover, then the ACK acknowledges all the intermediate | recover, then the ACK acknowledges all the intermediate segments | |||
segments sent between the original transmission of the lost | sent between the original transmission of the lost segment and | |||
segment and the receipt of the third duplicate ACK. Set cwnd to | the receipt of the third duplicate ACK. Set cwnd to either (1) | |||
either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or | min (ssthresh, max(FlightSize, SMSS) + SMSS) or (2) ssthresh, | |||
(2) ssthresh, where ssthresh is the value set when Fast | where ssthresh is the value set when fast retransmit was entered, | |||
Retransmit was entered, and where FlightSize in (1) is the amount | and where FlightSize in (1) is the amount of data presently | |||
of data presently outstanding. This is termed "deflating" the | outstanding. This is termed "deflating" the window. If the | |||
window. If the second option is selected, the implementation | second option is selected, the implementation is encouraged to | |||
is encouraged to take measures to avoid a possible burst of | take measures to avoid a possible burst of data, in case the | |||
data, in case the amount of data outstanding in the network is | amount of data outstanding in the network is much less than the | |||
much less than the new congestion window allows. A simple | new congestion window allows. A simple mechanism is to limit the | |||
mechanism is to limit the number of data packets that can be sent | number of data packets that can be sent in response to a single | |||
in response to a single acknowledgment. Exit the Fast Recovery | acknowledgment. Exit the fast recovery procedure. | |||
procedure. | ||||
Partial acknowledgments: | Partial acknowledgments: | |||
If this ACK does *not* acknowledge all of the data up to and | If this ACK does *not* acknowledge all of the data up to and | |||
including Recover, then this is a partial ACK. In this case, | including recover, then this is a partial ACK. In this case, | |||
retransmit the first unacknowledged segment. Deflate the | retransmit the first unacknowledged segment. Deflate the | |||
congestion window by the amount of new data acknowledged by the | congestion window by the amount of new data acknowledged by the | |||
cumulative acknowledgment field. If the partial ACK | Cumulative Acknowledgment field. If the partial ACK acknowledges | |||
acknowledges at least one SMSS of new data, then add back SMSS | at least one SMSS of new data, then add back SMSS bytes to the | |||
bytes to the congestion window. This artificially | congestion window. This artificially inflates the congestion | |||
inflates the congestion window in order to reflect the additional | window in order to reflect the additional segment that has left | |||
segment that has left the network. Send a new segment if | the network. Send a new segment if permitted by the new value of | |||
permitted by the new value of cwnd. This "partial window | cwnd. This "partial window deflation" attempts to ensure that, | |||
deflation" attempts to ensure that, when Fast Recovery eventually | when fast recovery eventually ends, approximately ssthresh amount | |||
ends, approximately ssthresh amount of data will be outstanding | of data will be outstanding in the network. Do not exit the fast | |||
in the network. Do not exit the Fast Recovery procedure (i.e., | recovery procedure (i.e., if any duplicate ACKs subsequently | |||
if any duplicate ACKs subsequently arrive, execute Step 4 of | arrive, execute step 4 of Section 3.2 of [RFC5681]). | |||
Section 3.2 of [RFC5681]. | ||||
For the first partial ACK that arrives during Fast Recovery, also | For the first partial ACK that arrives during fast recovery, also | |||
reset the retransmit timer. Timer management is discussed in | reset the retransmit timer. Timer management is discussed in | |||
more detail in Section 4. | more detail in Section 4. | |||
4) Retransmit timeouts: | 4) Retransmit timeouts: | |||
After a retransmit timeout, record the highest sequence number | After a retransmit timeout, record the highest sequence number | |||
transmitted in the variable Recover and exit the Fast | transmitted in the variable recover, and exit the fast recovery | |||
Recovery procedure if applicable. | procedure if applicable. | |||
Step 2 above specifies a check that the Cumulative Acknowledgment | Step 2 above specifies a check that the Cumulative Acknowledgment | |||
field covers more than Recover. Because the acknowledgment field | field covers more than recover. Because the acknowledgment field | |||
contains the sequence number that the sender next expects to receive, | contains the sequence number that the sender next expects to receive, | |||
the acknowledgment "ack_number" covers more than Recover when: | the acknowledgment "ack_number" covers more than recover when | |||
ack_number - 1 > Recover; | ack_number - 1 > recover; | |||
i.e., at least one byte more of data is acknowledged beyond the | i.e., at least one byte more of data is acknowledged beyond the | |||
highest byte that was outstanding when Fast Retransmit was last | highest byte that was outstanding when fast retransmit was last | |||
entered. | entered. | |||
Note that in Step 3 above, the congestion window is deflated after | Note that in step 3 above, the congestion window is deflated after a | |||
a partial acknowledgment is received. The congestion window was | partial acknowledgment is received. The congestion window was likely | |||
likely to have been inflated considerably when the partial | to have been inflated considerably when the partial acknowledgment | |||
acknowledgment was received. In addition, depending on the original | was received. In addition, depending on the original pattern of | |||
pattern of packet losses, the partial acknowledgment might | packet losses, the partial acknowledgment might acknowledge nearly a | |||
acknowledge nearly a window of data. In this case, if the congestion | window of data. In this case, if the congestion window was not | |||
window was not deflated, the data sender might be able to send nearly | deflated, the data sender might be able to send nearly a window of | |||
a window of data back-to-back. | data back-to-back. | |||
This document does not specify the sender's response to duplicate | This document does not specify the sender's response to duplicate | |||
ACKs when the Fast Retransmit/Fast Recovery algorithm is not | ACKs when the fast retransmit/fast recovery algorithm is not invoked. | |||
invoked. This is addressed in other documents, such as those | This is addressed in other documents, such as those describing the | |||
describing the Limited Transmit procedure [RFC3042]. This document | Limited Transmit procedure [RFC3042]. This document also does not | |||
also does not address issues of adjusting the duplicate | address issues of adjusting the duplicate acknowledgment threshold, | |||
acknowledgment threshold, but assumes the threshold specified in | but assumes the threshold specified in the IETF standards; the | |||
the IETF standards; the current standard is [RFC5681], which | current standard is [RFC5681], which specifies a threshold of three | |||
specifies a threshold of three duplicate acknowledgments. | duplicate acknowledgments. | |||
As a final note, we would observe that in the absence of the SACK | As a final note, we would observe that in the absence of the SACK | |||
option, the data sender is working from limited information. When | option, the data sender is working from limited information. When | |||
the issue of recovery from multiple dropped packets from a single | the issue of recovery from multiple dropped packets from a single | |||
window of data is of particular importance, the best alternative | window of data is of particular importance, the best alternative | |||
would be to use the SACK option. | would be to use the SACK option. | |||
4. Handling Duplicate Acknowledgments After A Timeout | 4. Handling Duplicate Acknowledgments after a Timeout | |||
After each retransmit timeout, the highest sequence number | After each retransmit timeout, the highest sequence number | |||
transmitted so far is recorded in the variable "recover". | transmitted so far is recorded in the variable recover. If, after a | |||
If, after a retransmit timeout, the TCP data sender retransmits three | retransmit timeout, the TCP data sender retransmits three consecutive | |||
consecutive packets that have already been received by the data | packets that have already been received by the data receiver, then | |||
receiver, then the TCP data sender will receive three duplicate | the TCP data sender will receive three duplicate acknowledgments that | |||
acknowledgments that do not cover more than "recover". In this | do not cover more than recover. In this case, the duplicate | |||
case, the duplicate acknowledgments are not an indication of a new | acknowledgments are not an indication of a new instance of | |||
instance of congestion. They are simply an indication that the | congestion. They are simply an indication that the sender has | |||
sender has unnecessarily retransmitted at least three packets. | unnecessarily retransmitted at least three packets. | |||
However, when a retransmitted packet is itself dropped, the sender | However, when a retransmitted packet is itself dropped, the sender | |||
can also receive three duplicate acknowledgments that do not cover | can also receive three duplicate acknowledgments that do not cover | |||
more than "recover". In this case, the sender would have been | more than recover. In this case, the sender would have been better | |||
better off if it had initiated Fast Retransmit. For a TCP sender | off if it had initiated fast retransmit. For a TCP sender that | |||
that implements the algorithm specified in Section 3.2 of this | implements the algorithm specified in Section 3.2 of this document, | |||
document, the sender does not infer a packet drop from duplicate | the sender does not infer a packet drop from duplicate | |||
acknowledgments in this scenario. As always, the retransmit timer | acknowledgments in this scenario. As always, the retransmit timer is | |||
is the backup mechanism for inferring packet loss in this case. | the backup mechanism for inferring packet loss in this case. | |||
There are several heuristics, based on timestamps or on the amount of | There are several heuristics, based on timestamps or on the amount of | |||
advancement of the cumulative acknowledgment field, that allow the | advancement of the Cumulative Acknowledgment field, that allow the | |||
sender to distinguish, in some cases, between three duplicate | sender to distinguish, in some cases, between three duplicate | |||
acknowledgments following a retransmitted packet that was dropped, | acknowledgments following a retransmitted packet that was dropped, | |||
and three duplicate acknowledgments from the unnecessary | and three duplicate acknowledgments from the unnecessary | |||
retransmission of three packets [Gur03, GF04]. The TCP sender may | retransmission of three packets [Gur03] [GF04]. The TCP sender may | |||
use such a heuristic to decide to invoke a Fast Retransmit in some | use such a heuristic to decide to invoke a fast retransmit in some | |||
cases, even when the three duplicate acknowledgments do not cover | cases, even when the three duplicate acknowledgments do not cover | |||
more than "recover". | more than recover. | |||
For example, when three duplicate acknowledgments are caused by the | For example, when three duplicate acknowledgments are caused by the | |||
unnecessary retransmission of three packets, this is likely to be | unnecessary retransmission of three packets, this is likely to be | |||
accompanied by the cumulative acknowledgment field advancing by at | accompanied by the Cumulative Acknowledgment field advancing by at | |||
least four segments. Similarly, a heuristic based on timestamps uses | least four segments. Similarly, a heuristic based on timestamps uses | |||
the fact that when there is a hole in the sequence space, the | the fact that when there is a hole in the sequence space, the | |||
timestamp echoed in the duplicate acknowledgment is the timestamp of | timestamp echoed in the duplicate acknowledgment is the timestamp of | |||
the most recent data packet that advanced the cumulative | the most recent data packet that advanced the Cumulative | |||
acknowledgment field [RFC1323]. If timestamps are used, and the | Acknowledgment field [RFC1323]. If timestamps are used, and the | |||
sender stores the timestamp of the last acknowledged segment, then | sender stores the timestamp of the last acknowledged segment, then | |||
the timestamp echoed by duplicate acknowledgments can be used to | the timestamp echoed by duplicate acknowledgments can be used to | |||
distinguish between a retransmitted packet that was dropped and | distinguish between a retransmitted packet that was dropped and three | |||
three duplicate acknowledgments from the unnecessary | duplicate acknowledgments from the unnecessary retransmission of | |||
retransmission of three packets. | three packets. | |||
4.1. ACK Heuristic | 4.1. ACK Heuristic | |||
If the ACK-based heuristic is used, then following the advancement of | If the ACK-based heuristic is used, then following the advancement of | |||
the cumulative acknowledgment field, the sender stores the value of | the Cumulative Acknowledgment field, the sender stores the value of | |||
the previous cumulative acknowledgment as prev_highest_ack, and | the previous cumulative acknowledgment as prev_highest_ack, and | |||
stores the latest cumulative ACK as highest_ack. In addition, the | stores the latest cumulative ACK as highest_ack. In addition, the | |||
following check is performed if, in Step 2 of Section 3.2, the | following check is performed if, in step 2 of Section 3.2, the | |||
Cumulative Acknowledgment field does not cover more than "recover". | Cumulative Acknowledgment field does not cover more than recover. | |||
1*) If the Cumulative Acknowledgment field didn't cover more than | 2*) If the Cumulative Acknowledgment field didn't cover more than | |||
"recover", check to see if the congestion window is greater | recover, check to see if the congestion window is greater than | |||
than SMSS bytes and the difference between highest_ack and | SMSS bytes and the difference between highest_ack and | |||
prev_highest_ack is at most 4*SMSS bytes. If true, duplicate | prev_highest_ack is at most 4*SMSS bytes. If true, duplicate | |||
ACKs indicate a lost segment (enter Fast Retransmit). | ACKs indicate a lost segment (enter fast retransmit). | |||
Otherwise, duplicate ACKs likely result from unnecessary | Otherwise, duplicate ACKs likely result from unnecessary | |||
retransmissions (do not enter Fast Retransmit). | retransmissions (do not enter fast retransmit). | |||
The congestion window check serves to protect against fast retransmit | The congestion window check serves to protect against fast retransmit | |||
immediately after a retransmit timeout. | immediately after a retransmit timeout. | |||
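
The check in the ACK heuristic can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names and the byte-valued arguments are assumptions made for the example, and only the two conditions (congestion window greater than SMSS, cumulative ACK advance of at most 4*SMSS) come from the text above. The difference is taken modulo 2**32 so that it stays correct across sequence-number wraparound.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def seq_diff(a: int, b: int) -> int:
    """Return how far sequence number a is ahead of b, modulo 2**32."""
    return (a - b) % SEQ_MOD


def ack_heuristic_allows_fast_retransmit(
    cwnd: int, smss: int, highest_ack: int, prev_highest_ack: int
) -> bool:
    """Hypothetical helper for the ACK heuristic.

    Meant to be called only when the Cumulative Acknowledgment field does
    not cover more than recover. Returns True when the duplicate ACKs are
    taken to indicate a lost segment (enter fast retransmit).
    """
    ack_advance = seq_diff(highest_ack, prev_highest_ack)
    # cwnd > SMSS guards against fast retransmit right after a timeout.
    return cwnd > smss and ack_advance <= 4 * smss
```

With SMSS = 1460 bytes, an advance of one segment passes the check, while an advance of five segments is attributed to unnecessary retransmissions and does not.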
If several ACKs are lost, the sender can see a jump in the cumulative | If several ACKs are lost, the sender can see a jump in the cumulative | |||
ACK of more than three segments, and the heuristic can fail. | ACK of more than three segments, and the heuristic can fail. | |||
[RFC5681] recommends that a receiver should | [RFC5681] recommends that a receiver should send duplicate ACKs for | |||
send duplicate ACKs for every out-of-order data packet, such as a | every out-of-order data packet, such as a data packet received during | |||
data packet received during Fast Recovery. The ACK heuristic is more | fast recovery. The ACK heuristic is more likely to fail if the | |||
likely to fail if the receiver does not follow this advice, because | receiver does not follow this advice, because then a smaller number | |||
then a smaller number of ACK losses are needed to produce a | of ACK losses are needed to produce a sufficient jump in the | |||
sufficient jump in the cumulative ACK. | cumulative ACK. | |||
4.2. Timestamp Heuristic | 4.2. Timestamp Heuristic | |||
If this heuristic is used, the sender stores the timestamp of the | If this heuristic is used, the sender stores the timestamp of the | |||
last acknowledged segment. In addition, the last sentence of step | last acknowledged segment. In addition, the last sentence of step 2 | |||
2 in Section 3.2 is replaced as follows: | in Section 3.2 of this document is replaced as follows: | |||
1**) If the Cumulative Acknowledgment field didn't cover more than | 2**) If the Cumulative Acknowledgment field didn't cover more than | |||
"recover", check to see if the echoed timestamp in the last | recover, check to see if the echoed timestamp in the last | |||
non-duplicate acknowledgment equals the | non-duplicate acknowledgment equals the stored timestamp. If | |||
stored timestamp. If true, duplicate ACKs indicate a lost | true, duplicate ACKs indicate a lost segment (enter fast | |||
segment (enter Fast Retransmit). Otherwise, duplicate | retransmit). Otherwise, duplicate ACKs likely result from | |||
ACKs likely result from unnecessary retransmissions (do not | unnecessary retransmissions (do not enter fast retransmit). | |||
enter Fast Retransmit). | ||||
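
The timestamp check reduces to a single equality test, sketched below. The class and attribute names are hypothetical, and the sketch deliberately leaves out how the two timestamp values are extracted from the TCP Timestamps option; it only mirrors the comparison described in the step above.

```python
from dataclasses import dataclass


@dataclass
class TimestampHeuristicState:
    """Hypothetical per-connection state for the timestamp heuristic."""

    # Timestamp of the last acknowledged segment, stored by the sender.
    stored_timestamp: int = 0
    # Timestamp echoed in the last non-duplicate acknowledgment.
    last_new_ack_echo: int = 0

    def allows_fast_retransmit(self) -> bool:
        """True when the duplicate ACKs are taken to indicate a lost segment.

        Meant to be consulted only when the Cumulative Acknowledgment field
        does not cover more than recover.
        """
        return self.last_new_ack_echo == self.stored_timestamp
```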
The timestamp heuristic works correctly, both when the receiver | The timestamp heuristic works correctly, both when the receiver | |||
echoes timestamps as specified by [RFC1323], and by its revision | echoes timestamps, as specified by [RFC1323], and by its revision | |||
attempts. However, if the receiver arbitrarily echoes timestamps, | attempts. However, if the receiver arbitrarily echoes timestamps, | |||
the heuristic can fail. The heuristic can also fail if a timeout was | the heuristic can fail. The heuristic can also fail if a timeout was | |||
spurious and returning ACKs are not from retransmitted segments. | spurious and returning ACKs are not from retransmitted segments. | |||
This can be prevented by detection algorithms such as [RFC3522]. | This can be prevented by detection algorithms such as the Eifel | |||
detection algorithm [RFC3522]. | ||||
5. Implementation Issues for the Data Receiver | 5. Implementation Issues for the Data Receiver | |||
[RFC5681] specifies that "Out-of-order data segments SHOULD be | [RFC5681] specifies that "Out-of-order data segments SHOULD be | |||
acknowledged immediately, in order to accelerate loss recovery." | acknowledged immediately, in order to accelerate loss recovery". | |||
Neal Cardwell has noted that some data receivers do not send an | Neal Cardwell has noted that some data receivers do not send an | |||
immediate acknowledgment when they send a partial acknowledgment, | immediate acknowledgment when they send a partial acknowledgment, but | |||
but instead wait first for their delayed acknowledgment timer to | instead wait first for their delayed acknowledgment timer to expire | |||
expire [C98]. As [C98] notes, this severely limits the potential | [C98]. As [C98] notes, this severely limits the potential benefit of | |||
benefit of NewReno by delaying the receipt of the partial | NewReno by delaying the receipt of the partial acknowledgment at the | |||
acknowledgment at the data sender. Echoing [RFC5681], our | data sender. Echoing [RFC5681], our recommendation is that the data | |||
recommendation is that the data receiver send an immediate | receiver send an immediate acknowledgment for an out-of-order | |||
acknowledgment for an out-of-order segment, even when that | segment, even when that out-of-order segment fills a hole in the | |||
out-of-order segment fills a hole in the buffer. | buffer. | |||
6. Implementation Issues for the Data Sender | 6. Implementation Issues for the Data Sender | |||
In Section 3, Step 5 above, it is noted that implementations should | In Section 3.2, step 3 above, it is noted that implementations should | |||
take measures to avoid a possible burst of data when leaving Fast | take measures to avoid a possible burst of data when leaving fast | |||
Recovery, in case the amount of new data that the sender is eligible | recovery, in case the amount of new data that the sender is eligible | |||
to send due to the new value of the congestion window is large. This | to send due to the new value of the congestion window is large. This | |||
can arise during NewReno when ACKs are lost or treated as pure window | can arise during NewReno when ACKs are lost or treated as pure window | |||
updates, thereby causing the sender to underestimate the number of | updates, thereby causing the sender to underestimate the number of | |||
new segments that can be sent during the recovery procedure. | new segments that can be sent during the recovery procedure. | |||
Specifically, bursts can occur when the FlightSize is much less than | Specifically, bursts can occur when the FlightSize is much less than | |||
the new congestion window when exiting from Fast Recovery. One | the new congestion window when exiting from fast recovery. One | |||
simple mechanism to avoid a burst of data when leaving Fast Recovery | simple mechanism to avoid a burst of data when leaving fast recovery | |||
is to limit the number of data packets that can be sent in response | is to limit the number of data packets that can be sent in response | |||
to a single acknowledgment. (This is known as "maxburst_" in the ns | to a single acknowledgment. (This is known as "maxburst_" in ns-2 | |||
simulator.) Other possible mechanisms for avoiding bursts include | [NS].) Other possible mechanisms for avoiding bursts include rate- | |||
rate-based pacing, or setting the slow-start threshold to the | based pacing, or setting the slow start threshold to the resultant | |||
resultant congestion window and then resetting the congestion window | congestion window and then resetting the congestion window to | |||
to FlightSize. A recommendation on the general mechanism to avoid | FlightSize. A recommendation on the general mechanism to avoid | |||
excessively bursty sending patterns is outside the scope of this | excessively bursty sending patterns is outside the scope of this | |||
document. | document. | |||
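
As an illustration of the first mechanism mentioned above (limiting the number of packets sent in response to a single acknowledgment), a sender could cap its per-ACK output as follows. This is a sketch under stated assumptions: the function name and the `maxburst` parameter are illustrative, all sizes are in bytes, and the receiver's advertised window is ignored for brevity.

```python
def segments_to_send(cwnd: int, flight_size: int, smss: int, maxburst: int) -> int:
    """Return how many full-sized segments to send in response to one ACK.

    The congestion window may permit many segments when FlightSize is much
    smaller than cwnd on exit from fast recovery; maxburst caps that burst.
    """
    # Room left in the congestion window, in whole segments.
    window_room = max(0, cwnd - flight_size) // smss
    return min(window_room, maxburst)
```

For example, with cwnd of 20 segments, FlightSize of 2 segments, and a cap of 4, the window alone would allow 18 segments but only 4 are sent for this ACK.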
An implementation may want to use a separate flag to record whether | An implementation may want to use a separate flag to record whether | |||
or not it is presently in the Fast Recovery procedure. The use of | or not it is presently in the fast recovery procedure. The use of | |||
the value of the duplicate acknowledgment counter for this purpose is | the value of the duplicate acknowledgment counter for this purpose is | |||
not reliable because it can be reset upon window updates and | not reliable, because it can be reset upon window updates and out-of- | |||
out-of-order acknowledgments. | order acknowledgments. | |||
When updating the Cumulative Acknowledgment field outside of | When updating the Cumulative Acknowledgment field outside of fast | |||
Fast Recovery, the "recover" state variable may also need to be | recovery, the state variable recover may also need to be updated in | |||
updated in order to continue to permit possible entry into Fast | order to continue to permit possible entry into fast recovery | |||
Recovery (Section 3, step 1). This issue arises when an update | (Section 3.2, step 2). This issue arises when an update of the | |||
of the Cumulative Acknowledgment field results in a sequence | Cumulative Acknowledgment field results in a sequence wraparound that | |||
wraparound that affects the ordering between the Cumulative | affects the ordering between the Cumulative Acknowledgment field and | |||
Acknowledgment field and the "recover" state variable. Entry | the state variable recover. Entry into fast recovery is only | |||
into Fast Recovery is only possible when the Cumulative | possible when the Cumulative Acknowledgment field covers more than | |||
Acknowledgment field covers more than the "recover" state variable. | the state variable recover. | |||
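
One way to keep the ordering between the Cumulative Acknowledgment field and recover meaningful across wraparound is sketched below. The policy is an assumption made for illustration and is not prescribed by the text: "covers more than" is treated as a wraparound-aware greater-than comparison, and outside fast recovery recover is pulled up to just behind the cumulative ACK whenever the ACK is already past it, which preserves the outcome of the comparison. All function names are hypothetical.

```python
SEQ_MOD = 1 << 32   # sequence numbers wrap modulo 2**32
SEQ_HALF = 1 << 31  # comparisons are only meaningful within half the space


def seq_gt(a: int, b: int) -> bool:
    """Wraparound-aware test that sequence number a is after b."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_HALF


def covers_more_than(ack: int, recover: int) -> bool:
    """Assumed reading of 'the ACK covers more than recover'."""
    return seq_gt(ack, recover)


def refresh_recover(ack: int, recover: int, in_fast_recovery: bool) -> int:
    """Return the value recover should hold after processing this ACK.

    Outside fast recovery, if the ACK is already past recover, move recover
    to just behind the ACK so it cannot fall half a sequence space behind.
    """
    if in_fast_recovery:
        return recover
    if covers_more_than(ack, recover):
        return (ack - 1) % SEQ_MOD
    return recover
```

Without the refresh, a recover value left untouched while the cumulative ACK advances by more than 2**31 bytes would compare as being ahead of the ACK, and entry into fast recovery would be blocked.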
It is important for the sender to respond correctly to duplicate ACKs | It is important for the sender to respond correctly to duplicate ACKs | |||
received when the sender is no longer in Fast Recovery (e.g., because | received when the sender is no longer in fast recovery (e.g., because | |||
of a Retransmit Timeout). The Limited Transmit procedure [RFC3042] | of a retransmit timeout). The Limited Transmit procedure [RFC3042] | |||
describes possible responses to the first and second duplicate | describes possible responses to the first and second duplicate | |||
acknowledgments. When three or more duplicate acknowledgments are | acknowledgments. When three or more duplicate acknowledgments are | |||
received, the Cumulative Acknowledgment field doesn't cover more | received, the Cumulative Acknowledgment field doesn't cover more than | |||
than "recover", and a new Fast Recovery is not invoked, it is | recover, and a new fast recovery is not invoked, the sender should | |||
important that the sender not execute the Fast Recovery steps (3) and | follow the guidance in Section 4. Otherwise, the sender could end up | |||
(4) in Section 3. Otherwise, the sender could end up in a chain of | in a chain of spurious timeouts. We mention this only because | |||
spurious timeouts. We mention this only because several NewReno | several NewReno implementations had this bug, including the | |||
implementations had this bug, including the implementation in the NS | implementation in ns-2 [NS]. | |||
simulator. | ||||
It has been observed that some TCP implementations enter a slow start | It has been observed that some TCP implementations enter a slow start | |||
or congestion avoidance window updating algorithm immediately after | or congestion avoidance window updating algorithm immediately after | |||
the cwnd is set by the equation found in (Section 3, step 5), even | the cwnd is set by the equation found in Section 3.2, step 3, even | |||
without a new external event generating the cwnd change. Note that | without a new external event generating the cwnd change. Note that | |||
after cwnd is set based on the procedure for exiting Fast Recovery | after cwnd is set based on the procedure for exiting fast recovery | |||
(Section 3, step 5), cwnd should not be updated until a further | (Section 3.2, step 3), cwnd should not be updated until a further | |||
event occurs (e.g., arrival of an ack, or timeout) after this | event occurs (e.g., arrival of an ack, or timeout) after this | |||
adjustment. | adjustment. | |||
7. Security Considerations | 7. Security Considerations | |||
[RFC5681] discusses general security considerations concerning TCP | [RFC5681] discusses general security considerations concerning TCP | |||
congestion control. This document describes a specific algorithm | congestion control. This document describes a specific algorithm | |||
that conforms with the congestion control requirements of [RFC5681], | that conforms with the congestion control requirements of [RFC5681], | |||
and so those considerations apply to this algorithm, too. There are | and so those considerations apply to this algorithm, too. There are | |||
no known additional security concerns for this specific algorithm. | no known additional security concerns for this specific algorithm. | |||
8. IANA Considerations | 8. Conclusions | |||
This document has no actions for IANA. | ||||
9. Conclusions | ||||
This document specifies the NewReno Fast Retransmit and Fast Recovery | This document specifies the NewReno fast retransmit and fast recovery | |||
algorithms for TCP. This NewReno modification to TCP can even be | algorithms for TCP. This NewReno modification to TCP can even be | |||
important for TCP implementations that support the SACK option, | important for TCP implementations that support the SACK option, | |||
because the SACK option can only be used for TCP connections when | because the SACK option can only be used for TCP connections when | |||
both TCP end-nodes support the SACK option. NewReno performs better | both TCP end-nodes support the SACK option. NewReno performs better | |||
than Reno (RFC5681) in a number of scenarios discussed in | than Reno in a number of scenarios discussed in previous versions of | |||
previous versions of this RFC ([RFC2582], [RFC3782]). | this RFC ([RFC2582] [RFC3782]). | |||
A number of options to the basic algorithm presented in Section 3 are | A number of options for the basic algorithms presented in Section 3 | |||
also referenced in Appendix A to this document. These include the | are also referenced in Appendix A of this document. These include | |||
handling of the retransmission timer, the response to partial | the handling of the retransmission timer, the response to partial | |||
acknowledgments, and whether or not the sender must maintain a state | acknowledgments, and whether or not the sender must maintain a state | |||
variable called Recover. Our belief is that the differences | variable called recover. Our belief is that the differences between | |||
between these variants of NewReno are small compared to the | these variants of NewReno are small compared to the differences | |||
differences between Reno and NewReno. That is, the important thing | between Reno and NewReno. That is, the important thing is to | |||
is to implement NewReno instead of Reno, for a TCP connection | implement NewReno instead of Reno for a TCP connection without SACK; | |||
without SACK; it is less important exactly which of the variants of | it is less important exactly which variant of NewReno is implemented. | |||
NewReno is implemented. | ||||
10. Acknowledgments | 9. Acknowledgments | |||
Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu, | Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu, | |||
Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed | Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed | |||
feedback on this document or on its precursor, RFC 2582. Jeffrey | feedback on the precursor RFCs 2582 and 3782. Jeffrey Hsu provided | |||
Hsu provided clarifications on the handling of the recover variable | clarifications on the handling of the variable recover; these | |||
that were applied to RFC 3782 as errata, and now are in Section 8 | clarifications were applied to RFC 3782 via an erratum and are | |||
of this document. Yoshifumi Nishida contributed a modification | incorporated into the text of Section 6 of this document. Yoshifumi | |||
to the fast recovery algorithm to account for the case in which | Nishida contributed a modification to the fast recovery algorithm to | |||
flightsize is 0 when the TCP sender leaves fast recovery, and the | account for the case in which FlightSize is 0 when the TCP sender | |||
TCP receiver uses delayed acknowledgments. Alexander Zimmermann | leaves fast recovery and the TCP receiver uses delayed | |||
provided several suggestions to improve the clarity of the document. | acknowledgments. Alexander Zimmermann provided several suggestions | |||
to improve the clarity of the document. | ||||
11. References | 10. References | |||
11.1. Normative References | 10.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
11.2. Informative References | 10.2. Informative References | |||
[C98] Cardwell, N., "delayed ACKs for retransmitted packets: | [C98] Cardwell, N., "delayed ACKs for retransmitted packets: | |||
ouch!". November 1998, Email to the tcpimpl mailing list, | ouch!". November 1998, Email to the tcpimpl mailing list, | |||
Message-ID | archived at | |||
"Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs. | <http://groups.yahoo.com/group/tcp-impl/message/1428>. | |||
washington.edu", | ||||
archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". | [F94] Floyd, S., "TCP and Successive Fast Retransmits", Technical | |||
report, May 1995. | ||||
<ftp://ftp.ee.lbl.gov/papers/fastretrans.ps>. | ||||
[FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of | [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of | |||
Tahoe, Reno and SACK TCP", Computer Communication Review, | Tahoe, Reno and SACK TCP", Computer Communication Review, | |||
July 1996. | July 1996. <ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z>. | |||
URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z". | ||||
[F94] Floyd, S., "TCP and Successive Fast Retransmits", Technical | ||||
report, October 1994. URL | ||||
"ftp://ftp.ee.lbl.gov/papers/fastretrans.ps". | ||||
[GF04] Gurtov, A. and S. Floyd, "Resolving Acknowledgment | [GF04] Gurtov, A. and S. Floyd, "Resolving Acknowledgment | |||
Ambiguity in non-SACK TCP", Next Generation Teletraffic and | Ambiguity in non-SACK TCP", NExt Generation Teletraffic and | |||
Wired/Wireless Advanced Networking (NEW2AN'04), February | Wired/Wireless Advanced Networking (NEW2AN'04), | |||
2004. URL "http://www.cs.helsinki.fi/u/gurtov/papers/ | February 2004. <http://www.cs.helsinki.fi/u/gurtov/ | |||
heuristics.html". | papers/heuristics.html>. | |||
[Gur03] Gurtov, A., "[Tsvwg] resolving the problem of unnecessary | [Gur03] Gurtov, A., "[Tsvwg] resolving the problem of unnecessary | |||
fast retransmits in go-back-N", email to the tsvwg mailing | fast retransmits in go-back-N", email to the tsvwg mailing | |||
list, message ID <3F25B467.9020609@cs.helsinki.fi>, | list, July 28, 2003. <http://www.ietf.org/mail-archive/ | |||
July 28, 2003. URL "http://www1.ietf.org/mail-archive/ | web/tsvwg/current/msg04334.html>. | |||
working-groups/tsvwg/current/msg04334.html". | ||||
[Hen98] Henderson, T., Re: NewReno and the 2001 Revision. September | [Hen98] Henderson, T., "Re: NewReno and the 2001 Revision", | |||
1998. Email to the tcpimpl mailing list, Message ID | September 1998. Email to the tcpimpl mailing list, | |||
"Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS. | archived at | |||
Berkeley.EDU", | <http://groups.yahoo.com/group/tcp-impl/message/1321>. | |||
archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". | ||||
[Hoe95] Hoe, J., "Startup Dynamics of TCP's Congestion Control and | [Hoe95] Hoe, J., "Startup Dynamics of TCP's Congestion Control and | |||
Avoidance Schemes", Master's Thesis, MIT, 1995. | Avoidance Schemes", Master's Thesis, MIT, June 1995. | |||
[Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion | [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion | |||
Control Scheme for TCP", ACM SIGCOMM, August 1996. URL | Control Scheme for TCP", ACM SIGCOMM, August 1996. | |||
"http://www.acm.org/sigcomm/sigcomm96/program.html". | <http://ccr.sigcomm.org/archive/1996/conf/hoe.pdf>. | |||
[LM97] Lin, D. and R. Morris, "Dynamics of Random Early | [LM97] Lin, D. and R. Morris, "Dynamics of Random Early | |||
Detection", SIGCOMM 97, September 1997. URL | Detection", SIGCOMM 97, October 1997. | |||
"http://www.acm.org/sigcomm/sigcomm97/program.html". | ||||
[NS] The Network Simulator (NS). | [NS] "The Network Simulator version 2 (ns-2)", | |||
URL "http://www.isi.edu/nsnam/ns/". | <http://www.isi.edu/nsnam/ns/>. | |||
[RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for | [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions | |||
High Performance", RFC 1323, May 1992. | for High Performance", RFC 1323, May 1992. | |||
[RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to | [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to | |||
TCP's Fast Recovery Algorithm", RFC 2582, April 1999. | TCP's Fast Recovery Algorithm", RFC 2582, April 1999. | |||
[RFC2883] Floyd, S., J. Mahdavi, M. Mathis, and M. Podolsky, "The | [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An | |||
Selective Acknowledgment (SACK) Option for TCP, RFC 2883, | Extension to the Selective Acknowledgement (SACK) Option | |||
July 2000. | for TCP", RFC 2883, July 2000. | |||
[RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's | [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing | |||
Loss Recovery Using Limited Transmit", RFC 3042, | TCP's Loss Recovery Using Limited Transmit", RFC 3042, | |||
January 2001. | January 2001. | |||
[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for | [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for | |||
TCP", RFC 3522, April 2003. | TCP", RFC 3522, April 2003. | |||
[RFC3782] Floyd, S., T. Henderson, and A. Gurtov, "The NewReno | [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno | |||
Modification to TCP's Fast Recovery Algorithm", RFC 3782, | Modification to TCP's Fast Recovery Algorithm", RFC 3782, | |||
April 2004. | April 2004. | |||
Appendix A. Additional Information | Appendix A. Additional Information | |||
Previous versions of this RFC ([RFC2582], [RFC3782]) contained | Previous versions of this RFC ([RFC2582] [RFC3782]) contained | |||
additional informative material on the following subjects, and | additional informative material on the following subjects, and may be | |||
may be consulted by readers who may want more information about | consulted by readers who may want more information about possible | |||
possible variants to the algorithm and who may want references | variants to the algorithms and who may want references to specific | |||
to specific [NS] simulations that provide NewReno test cases. | [NS] simulations that provide NewReno test cases. | |||
Section 4 of [RFC3782] discusses some alternative behaviors for | Section 4 of [RFC3782] discusses some alternative behaviors for | |||
resetting the retransmit timer after a partial acknowledgment. | resetting the retransmit timer after a partial acknowledgment. | |||
Section 5 of [RFC3782] discusses some alternative behaviors for | Section 5 of [RFC3782] discusses some alternative behaviors for | |||
performing retransmission after a partial acknowledgment. | performing retransmission after a partial acknowledgment. | |||
Section 6 of [RFC3782] describes more information about the | Section 6 of [RFC3782] describes more information about the | |||
motivation for the sender's state variable Recover. | motivation for the sender's state variable recover. | |||
Section 9 of [RFC3782] introduces some NS simulation test | Section 9 of [RFC3782] introduces some NS simulation test suites for | |||
suites for NewReno. In addition, references to simulation | NewReno. In addition, references to simulation results can be found | |||
results can be found throughout [RFC3782]. | throughout [RFC3782]. | |||
Section 10 of [RFC3782] provides a comparison of Reno and | Section 10 of [RFC3782] provides a comparison of Reno and | |||
NewReno TCP. | NewReno TCP. | |||
Section 11 of [RFC3782] listed changes relative to [RFC2582]. | Section 11 of [RFC3782] lists changes relative to [RFC2582]. | |||
Appendix B. Changes Relative to RFC 3782 | Appendix B. Changes Relative to RFC 3782 | |||
In [RFC3782], the cwnd after Full ACK reception will be set to | In [RFC3782], the cwnd after Full ACK reception will be set to | |||
(1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh. However, | (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh. However, the | |||
there is a risk in the first option which results in performance | first option carries a risk of performance degradation: With the | |||
degradation. With the first option, if FlightSize is zero, the | first option, if FlightSize is zero, the result will be 1 SMSS. This | |||
result will be 1 SMSS. This means TCP can transmit only 1 segment | means TCP can transmit only 1 segment at that moment, which can cause | |||
at this moment, which can cause delay in ACK transmission at receiver | a delay in ACK transmission at the receiver due to a delayed ACK | |||
due to delayed ACK algorithm. | algorithm. | |||
The FlightSize on Full ACK reception can be zero in some situations. | The FlightSize on Full ACK reception can be zero in some situations. | |||
A typical example is where sending window size during fast recovery | A typical example is where the sending window size during fast | |||
is small. In this case, the retransmitted packet and new data packets | recovery is small. In this case, the retransmitted packet and new | |||
can be transmitted within a short interval. If all these packets | data packets can be transmitted within a short interval. If all | |||
successfully arrive, the receiver may generate a Full ACK that | these packets successfully arrive, the receiver may generate a Full | |||
acknowledges all outstanding data. Even if window size is not small, | ACK that acknowledges all outstanding data. Even if the window size | |||
loss of ACK packets or receive buffer shortage during fast recovery | is not small, loss of ACK packets or a receive buffer shortage during | |||
can also increase the possibility of falling into this situation. | fast recovery can also increase the possibility of falling into this | |||
situation. | ||||
The proposed fix in this document, which sets cwnd to at least 2*SMSS | The proposed fix in this document, which sets cwnd to at least 2*SMSS | |||
if the implementation uses option 1 in the Full ACK case (Section 3.2 | if the implementation uses option 1 in the Full ACK case | |||
step 3, option 1), ensures that the sender TCP transmits at least two | (Section 3.2, step 3, option 1), ensures that the sender TCP | |||
segments on Full ACK reception. | transmits at least two segments on Full ACK reception. | |||
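
The effect of the fix can be checked with a small calculation. The sketch below compares option 1 as quoted above for [RFC3782] with the revised form, min(ssthresh, max(FlightSize, SMSS) + SMSS); that revised expression is taken from Section 3.2, step 3, which lies outside this excerpt, so treat it as an assumption here. The function names are illustrative.

```python
def cwnd_option1_rfc3782(ssthresh: int, flight_size: int, smss: int) -> int:
    """Option 1 as described for RFC 3782: min(ssthresh, FlightSize + SMSS)."""
    return min(ssthresh, flight_size + smss)


def cwnd_option1_revised(ssthresh: int, flight_size: int, smss: int) -> int:
    """Revised option 1 (assumed): min(ssthresh, max(FlightSize, SMSS) + SMSS)."""
    return min(ssthresh, max(flight_size, smss) + smss)


smss = 1460
ssthresh = 4 * smss
# FlightSize is zero when the Full ACK acknowledges all outstanding data.
old_cwnd = cwnd_option1_rfc3782(ssthresh, 0, smss)
new_cwnd = cwnd_option1_revised(ssthresh, 0, smss)
```

Here `old_cwnd` is one SMSS, so a single segment is sent and the receiver's delayed ACK timer may come into play, while `new_cwnd` is two SMSS. The lower bound of 2*SMSS holds as long as ssthresh is itself at least 2*SMSS.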
In addition, errata for RFC3782 (editorial clarification to Section 8 | In addition, an erratum was reported for RFC 3782 (an editorial | |||
of RFC2582, which is now Section 6 of this document) has been | clarification to Section 8); this erratum has been addressed in | |||
applied. | Section 6 of this document. | |||
The specification text (Section 3.2 herein) was rewritten to more | The specification text (Section 3.2 herein) was rewritten to more | |||
closely track Section 3.2 of [RFC5681]. | closely track Section 3.2 of [RFC5681]. | |||
Sections 4, 5, 9-11 of [RFC3782] were removed, and instead Appendix | Sections 4, 5, and 9-11 of [RFC3782] were removed, and instead | |||
A of this document was added to back-reference this informative | Appendix A of this document was added to back-reference this | |||
material. A few references that have no citation in the main body | informative material. A few references that have no citation in the | |||
of the draft have been removed. | main body of the document have been removed. | |||
Appendix C. Document Revision History | ||||
To be removed upon publication | ||||
+----------+--------------------------------------------------+ | ||||
| Revision | Comments | | ||||
+----------+--------------------------------------------------+ | ||||
| draft-00 | RFC3782 errata applied, and changes applied from | | ||||
| | draft-nishida-newreno-modification-02 | | ||||
+----------+--------------------------------------------------+ | ||||
| draft-01 | Non-normative sections moved to appendices, | | ||||
| | editorial clarifications applied as suggested | | ||||
| | by Alexander Zimmermann. | | ||||
+----------+--------------------------------------------------+ | ||||
| draft-02 | Better align specification text with RFC5681. | | ||||
| | Replace informative appendices by a new appendix | | ||||
| | that just provides back-references to earlier | | ||||
| | NewReno RFCs. | | ||||
+----------+--------------------------------------------------+ | ||||
| draft-03 | Document refresh and fix id-nits | | ||||
+----------+--------------------------------------------------+ | ||||
| draft-04 | Address editorial comments received from secdir | | ||||
| | review (provided by Tom Yu). | | ||||
+----------+--------------------------------------------------+ | ||||
| draft-05 | Address IESG review comments from David | | ||||
| | Harrington, and Gen-ART review comments from | | ||||
| | Ben Campbell. | | ||||
+----------+--------------------------------------------------+ | ||||
Authors' Addresses | Authors' Addresses | |||
Tom Henderson | Tom Henderson | |||
The Boeing Company | The Boeing Company | |||
EMail: thomas.r.henderson@boeing.com | EMail: thomas.r.henderson@boeing.com | |||
Sally Floyd | Sally Floyd | |||
International Computer Science Institute | International Computer Science Institute | |||
skipping to change at page 15, line 43 | skipping to change at page 16, line 34 | |||
Finland | Finland | |||
EMail: gurtov@ee.oulu.fi | EMail: gurtov@ee.oulu.fi | |||
Yoshifumi Nishida | Yoshifumi Nishida | |||
WIDE Project | WIDE Project | |||
Endo 5322 | Endo 5322 | |||
Fujisawa, Kanagawa 252-8520 | Fujisawa, Kanagawa 252-8520 | |||
Japan | Japan | |||
Email: nishida@wide.ad.jp | EMail: nishida@wide.ad.jp | |||
End of changes. 107 change blocks. | ||||
377 lines changed or deleted | 333 lines changed or added | |||