--- 1/draft-ietf-tsvwg-newreno-01.txt 2006-02-05 02:04:56.000000000 +0100 +++ 2/draft-ietf-tsvwg-newreno-02.txt 2006-02-05 02:04:56.000000000 +0100 @@ -1,18 +1,18 @@ Internet Engineering Task Force S. Floyd INTERNET DRAFT ICSI -draft-ietf-tsvwg-newreno-01.txt T. Henderson +draft-ietf-tsvwg-newreno-02.txt T. Henderson Boeing A. Gurtov U. Helsinki - September 2003 + November 2003 The NewReno Modification to TCP's Fast Recovery Algorithm Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that @@ -29,37 +29,41 @@ The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract RFC 2581 [RFC2581] documents the following four intertwined TCP congestion control algorithms: Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery. RFC 2581 [RFC2581] explicitly allows certain modifications of these algorithms, including modifications - that use the TCP Selective Acknowledgement (SACK) option [RFC2018], - and modifications that respond to "partial acknowledgments" (ACKs - which cover new data, but not all the data outstanding when loss was - detected) in the absence of SACK. The NewReno mechanism uses an - algorithm for responding to partial acknowledgments that was first - proposed by Janey Hoe in [Hoe95]. + that use the TCP Selective Acknowledgement (SACK) option + [RFC2018,RFC3517], and modifications that respond to "partial + acknowledgments" (ACKs which cover new data, but not all the data + outstanding when loss was detected) in the absence of SACK. The + NewReno mechanism uses an algorithm for responding to partial + acknowledgments that was first proposed by Janey Hoe in [Hoe95]. RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental in 1999. This document is a small revision of RFC 2582 intended to advance the NewReno mechanisms to Proposed Standard. RFC 2581 notes that the Fast Retransmit/Fast Recovery algorithm specified in that document does not recover very efficiently from multiple losses in a single flight of packets, and that RFC 2582 contains one set of modifications to address this problem. NOTE TO THE RFC EDITOR: PLEASE REMOVE THIS SECTION UPON PUBLICATION. + Changes from draft-ietf-tsvwg-newreno-01.txt: + + * Some improvements in phrasing suggested by Mark Allman. + Changes from draft-ietf-tsvwg-newreno-00.txt: * In Section 8, added a cautionary note about using the duplicate acknowledgment counter as a flag for whether Fast Recovery is in effect. * In Section 8, added a note about pulling along "recover" with "snd_una" when Fast Recovery is not in effect. * Added a discussion in Section 6 about heuristics for distinguishing @@ -266,71 +270,78 @@ "recover", then the ACK acknowledges all the intermediate segments sent between the original transmission of the lost segment and the receipt of the third duplicate ACK. Set cwnd to either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh, where ssthresh is the value set in step 1; this is termed "deflating" the window. (We note that "FlightSize" in step 1 referred to the amount of data outstanding in step 1, when Fast Recovery was entered, while "FlightSize" in step 5 refers to the amount of data outstanding in step 5, when Fast Recovery is exited.) If the second option is selected, the implementation - should take measures to avoid a possible burst of data, in case - the amount of data outstanding in the network was much less than - the new congestion window allows. A simple mechanism is to limit - the number of data packets that can be sent in response to a - single acknowledgement. (This is known as "maxburst_" in the NS - simulator). Exit the Fast Recovery procedure. + is encouraged to take measures to avoid a possible burst of + data, in case the amount of data outstanding in the network was + much less than the new congestion window allows. A simple + mechanism is to limit the number of data packets that can be + sent in response to a single acknowledgement. (This is known + as "maxburst_" in the NS simulator). Exit the Fast Recovery + procedure. Partial acknowledgements: If this ACK does *not* acknowledge all of the data up to and including "recover", then this is a partial ACK. In this case, retransmit the first unacknowledged segment. Deflate the - congestion window by the amount of new data acknowledged, then - add back one SMSS (if the partial ACK acknowledges at least one - SMSS of new data) and send a new segment if permitted by the new - value of cwnd. This "partial window deflation" attempts to - ensure that, when Fast Recovery eventually ends, approximately - ssthresh amount of data will be outstanding in the network. Do - not exit the Fast Recovery procedure (i.e., if any duplicate ACKs - subsequently arrive, execute Steps 3 and 4 above). + congestion window by the amount of new data acknowledged by the + cumulative acknowledgement field. If the partial ACK + acknowledges at least one SMSS of new data, then add back SMSS + bytes to the congestion window. As in Step 3, this artificially + inflates the congestion window in order to reflect the additional + segment that has left the network. Send a new segment if + permitted by the new value of cwnd. This "partial window + deflation" attempts to ensure that, when Fast Recovery eventually + ends, approximately ssthresh amount of data will be outstanding + in the network. Do not exit the Fast Recovery procedure (i.e., + if any duplicate ACKs subsequently arrive, execute Steps 3 and + 4 above). For the first partial ACK that arrives during Fast Recovery, also - reset the retransmit timer. + reset the retransmit timer. Timer management is discussed in + more detail in Section 4. 6) Retransmit timeouts: After a retransmit timeout, record the highest sequence number transmitted in the variable "recover" and exit the Fast Recovery procedure if applicable. Step 1 specifies a check that the Cumulative Acknowledgement field covers more than "recover". Because the acknowledgement field contains the sequence number that the sender next expects to receive, the acknowledgement "ack_number" covers more than "recover" when: - ack_number - one > recover. + ack_number - 1 > recover. Note that in Step 5, the congestion window is deflated after a partial acknowledgement is received. The congestion window was likely to have been inflated considerably when the partial acknowledgement was received. In addition, depending on the original pattern of packet losses, the partial acknowledgement might acknowledge nearly a window of data. In this case, if the congestion window was not deflated, the data sender might be able to send nearly a window of data back-to-back. This document does not specify the sender's response to duplicate ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked. This is addressed in other documents, such as those describing the Limited Transmit procedure [RFC3042]. This document also does not address issues of adjusting the duplicate acknowledgement threshold, - but assumes the threshold of three duplicate acknowledgements - currently specified in RFC 2581. + but assumes the threshold specified in the IETF standards; the + current standard is RFC 2581, which specifies a threshold of three + duplicate acknowledgements. As a final note, we would observe that in the absence of the SACK option, the data sender is working from limited information. When the issue of recovery from multiple dropped packets from a single window of data is of particular importance, the best alternative would be to use the SACK option. 4. Resetting the Retransmit Timer in Response to Partial Acknowledgements. @@ -415,23 +426,23 @@ SACK, recovering as quickly as slow-start introduces the likelihood of unnecessarily retransmitting packets, and this could significantly complicate the recovery mechanisms. We note that the response to partial acknowledgements specified in Section 3 of this document and in RFC 2582 differs from the response in [FF96], even though both approaches only retransmit one packet in response to a partial acknowledgement. Step 5 of Section 3 specifies that the TCP sender responds to a partial ACK by deflating the congestion window by the amount of new data acknowledged, then adding - back one SMSS if the partial ACK acknowledges at least one SMSS of - new data, and sending a new segment if permitted by the new value of - cwnd. Thus, only one previously-sent packet is retransmitted in + back SMSS bytes if the partial ACK acknowledges at least SMSS bytes + of new data, and sending a new segment if permitted by the new value + of cwnd. Thus, only one previously-sent packet is retransmitted in response to each partial acknowledgement, but additional new packets might be transmitted as well, depending on the amount of new data acknowledged by the partial acknowledgement. In contrast, the variant of NewReno illustrated in [FF96] simply set the congestion window to ssthresh when a partial acknowledgement was received. The approach in [FF96] is more conservative, and does not attempt to accurately track the actual number of outstanding packets after a partial acknowledgement is received. While either of these approaches gives acceptable performance, the variant specified in Section 3 recovers more smoothly when multiple packets are dropped @@ -538,39 +548,42 @@ If the ACK-based heuristic is used, then following the advancement of the cumulative acknowledgement field, the sender stores the value of previous cumulative acknowledgement as prev_highest_ack and stores the latest cumulative ACK as highest_ack. In addition, the following step is performed if Step 1 in Section 3 fails, before proceeding to Step 1B. 1*) If the Cumulative Acknowledgement field didn't cover more than "recover", check to see if the congestion window is greater - than one SMSS and the difference between highest_ack and - prev_highest_ack is at most four SMSS. If true, duplicate + than SMSS bytes and the difference between highest_ack and + prev_highest_ack is at most 4*SMSS bytes. If true, duplicate ACKs indicate a lost segment (proceed to Step 1A in Section 3). Otherwise, duplicate ACKs likely result from unnecessary retransmissions (proceed to Step 1B in Section 3). The congestion window check serves to protect against fast retransmit immediately after a retransmit timeout, similar to the "exitFastRetrans_" variable in NS. Examples of applying the ACK heuristic are in validation tests "./test-all-newreno newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in directory "tcl/test" of the NS simulator. If several ACKs are lost, the sender can see a jump in the cumulative ACK of more than three segments and the heuristic can fail. A validation test for this scenario is "./test-all-newreno - newreno_rto_loss_ackf". The ACK heuristic is more likely to fail if - the receiver uses delayed ACKs, because then a smaller number of ACK - losses are needed to produce a sufficient jump in the cumulative ACK. + newreno_rto_loss_ackf". RFC 2581 recommends that a receiver should + send duplicate ACKs for every out-of-order data packet, such as a + data packet received during Fast Recovery. The ACK heuristic is more + likely to fail if the receiver does not follow this advice, because + then a smaller number of ACK losses are needed to produce a + sufficient jump in the cumulative ACK. 6.2. Timestamp Heuristic. If this heuristic is used, the sender stores the timestamp of the last acknowledged segment. In addition, the second paragraph of step 1 in Section 3 is replaced as follows: 1**) If the Cumulative Acknowledgement field didn't cover more than "recover", check to see if the echoed timestamp equals the stored timestamp. If true, duplicate ACKs indicate a lost @@ -590,24 +603,24 @@ 7. Implementation Issues for the Data Receiver [RFC2581] specifies that "Out-of-order data segments SHOULD be acknowledged immediately, in order to accelerate loss recovery." Neal Cardwell has noted that some data receivers do not send an immediate acknowledgement when they send a partial acknowledgment, but instead wait first for their delayed acknowledgement timer to expire [C98]. As [C98] notes, this severely limits the potential benefit from NewReno by delaying the receipt of the partial - acknowledgement at the data sender. Our recommendation is that the - data receiver send an immediate acknowledgement for an out-of-order - segment, even when that out-of-order segment fills a hole in the - buffer. + acknowledgement at the data sender. Echoing RFC 2581, our + recommendation is that the data receiver send an immediate + acknowledgement for an out-of-order segment, even when that out-of- + order segment fills a hole in the buffer. 8. Implementation Issues for the Data Sender In Section 3, Step 5 above, it is noted that implementations should take measures to avoid a possible burst of data when leaving Fast Recovery, in case the amount of new data that the sender is eligible to send due to the new value of the congestion window is large. This can arise during NewReno when ACKs are lost or treated as pure window updates, thereby causing the sender to underestimate the number of new segments that can be sent during the recovery procedure. @@ -845,20 +858,24 @@ "http://www.acm.org/sigcomm/sigcomm97/program.html". [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/". [PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web Servers", June 2001, SIGCOMM 2001. [RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for High Performance,", RFC 1323, May 1992. + [RFC3517] E. Blanton, M. Allman, K. Fall, and L. Wang, "A + Conservative Selective Acknowledgment (SACK)-based Loss Recovery + Algorithm for TCP", RFC 3517, April 2003. + [RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for TCP, RFC 3522, April 2003. 15. Security Considerations RFC 2581 discusses general security considerations concerning TCP congestion control. This document describes a specific algorithm that conforms with the congestion control requirements of RFC 2581, and so those considerations apply to this algorithm, too. There are no known additional security concerns for this specific algorithm.