--- 1/draft-ietf-tcpm-rfc3782-bis-04.txt 2012-01-21 01:13:57.222670860 +0100 +++ 2/draft-ietf-tcpm-rfc3782-bis-05.txt 2012-01-21 01:13:57.254671066 +0100 @@ -1,22 +1,23 @@ + TCP Maintenance and Minor T. Henderson Extensions Working Group Boeing Internet-Draft S. Floyd Obsoletes: 3782 (if approved) ICSI Intended status: Standards Track A. Gurtov -Expires: June 5, 2012 HIIT +Expires: July 18, 2012 University of Oulu Y. Nishida WIDE Project - December 5, 2011 + January 18, 2012 The NewReno Modification to TCP's Fast Recovery Algorithm - draft-ietf-tcpm-rfc3782-bis-04.txt + draft-ietf-tcpm-rfc3782-bis-05.txt Abstract RFC 5681 documents the following four intertwined TCP congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery. RFC 5681 explicitly allows certain modifications of these algorithms, including modifications that use the TCP Selective Acknowledgement (SACK) option (RFC 2883), and modifications that respond to "partial acknowledgments" (ACKs which cover new data, but not all the data outstanding when loss was @@ -33,25 +34,25 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on June 5, 2012. + This Internet-Draft will expire on July 18, 2012. Copyright Notice - Copyright (c) 2011 IETF Trust and the persons identified as + Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as @@ -135,33 +136,23 @@ partial acknowledgments, for TCP connections that are unable to use SACK. Based on our experiences with the NewReno modification in the NS simulator [NS] and with numerous implementations of NewReno, we believe that this modification improves the performance of the Fast Retransmit and Fast Recovery algorithms in a wide variety of scenarios. Previous versions of this RFC [RFC2582, RFC3782] provide simulation-based evidence of the possible performance gains. 2. Terminology and Definitions - The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL - NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and - "OPTIONAL" in this document are to be interpreted as described in - RFC 2119 [RFC2119]. - This document assumes that the reader is familiar with the terms SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and - FLIGHT SIZE (FlightSize) defined in [RFC5681]. FLIGHT SIZE is - defined as in [RFC5681] as follows: - - FLIGHT SIZE: - The amount of data that has been sent but not yet cumulatively - acknowledged. + FLIGHT SIZE (FlightSize) defined in [RFC5681]. This document defines an additional sender-side state variable called RECOVER: RECOVER: When in Fast Recovery, this variable records the send sequence number that must be acknowledged before the Fast Recovery procedure is declared to be over. 3. The Fast Retransmit and Fast Recovery Algorithms in NewReno @@ -178,21 +169,24 @@ The NewReno modification applies to the Fast Recovery procedure that begins when three duplicate ACKs are received and ends when either a retransmission timeout occurs or an ACK arrives that acknowledges all of the data up to and including the data that was outstanding when the Fast Recovery procedure began. 3.2. Specification The procedures specified in Section 3.2 of [RFC5681] are followed - with the following modifications. + with the following modifications. Note that this specification + avoids the use of the key words defined in RFC 2119 [RFC2119] since + it mainly provides sender-side implementation guidance for + performance improvement, and does not affect interoperability. 1) Initialization of TCP protocol control block: When the TCP protocol control block is initialized, Recover is set to the initial send sequence number. 2) Three duplicate ACKs: When the third duplicate ACK is received, the TCP sender first checks the value of Recover to see if the Cumulative Acknowledgment field covers more than Recover. If so, the value of Recover is incremented to the value of the highest sequence @@ -206,24 +200,24 @@ arrives that acknowledges new data, this ACK could be the acknowledgment elicited by the retransmission from step 2, or elicited by a later retransmission. There are two cases. Full acknowledgments: If this ACK acknowledges all of the data up to and including Recover, then the ACK acknowledges all the intermediate segments sent between the original transmission of the lost segment and the receipt of the third duplicate ACK. Set cwnd to either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or - (2) ssthresh, where ssthresh is the value set when Fast Retransmit - was entered, and where FlightSize in (1) is the amount of data - presently outstanding. This is termed "deflating" the window. - If the second option is selected, the implementation + (2) ssthresh, where ssthresh is the value set when Fast + Retransmit was entered, and where FlightSize in (1) is the amount + of data presently outstanding. This is termed "deflating" the + window. If the second option is selected, the implementation is encouraged to take measures to avoid a possible burst of data, in case the amount of data outstanding in the network is much less than the new congestion window allows. A simple mechanism is to limit the number of data packets that can be sent in response to a single acknowledgment. Exit the Fast Recovery procedure. Partial acknowledgments: If this ACK does *not* acknowledge all of the data up to and including Recover, then this is a partial ACK. In this case, @@ -293,32 +287,32 @@ consecutive packets that have already been received by the data receiver, then the TCP data sender will receive three duplicate acknowledgments that do not cover more than "recover". In this case, the duplicate acknowledgments are not an indication of a new instance of congestion. They are simply an indication that the sender has unnecessarily retransmitted at least three packets. However, when a retransmitted packet is itself dropped, the sender can also receive three duplicate acknowledgments that do not cover more than "recover". In this case, the sender would have been - better off if it had initiated Fast Retransmit. For a TCP that - implements the algorithm specified in Section 3.2 of this document, the - sender does not infer a packet drop from duplicate acknowledgments - in this scenario. As always, the retransmit timer is the backup - mechanism for inferring packet loss in this case. + better off if it had initiated Fast Retransmit. For a TCP sender + that implements the algorithm specified in Section 3.2 of this + document, the sender does not infer a packet drop from duplicate + acknowledgments in this scenario. As always, the retransmit timer + is the backup mechanism for inferring packet loss in this case. There are several heuristics, based on timestamps or on the amount of advancement of the cumulative acknowledgment field, that allow the sender to distinguish, in some cases, between three duplicate acknowledgments following a retransmitted packet that was dropped, and three duplicate acknowledgments from the unnecessary - retransmission of three packets [Gur03, GF04]. The TCP sender MAY + retransmission of three packets [Gur03, GF04]. The TCP sender may use such a heuristic to decide to invoke a Fast Retransmit in some cases, even when the three duplicate acknowledgments do not cover more than "recover". For example, when three duplicate acknowledgments are caused by the unnecessary retransmission of three packets, this is likely to be accompanied by the cumulative acknowledgment field advancing by at least four segments. Similarly, a heuristic based on timestamps uses the fact that when there is a hole in the sequence space, the timestamp echoed in the duplicate acknowledgment is the timestamp of @@ -336,23 +330,23 @@ the cumulative acknowledgment field, the sender stores the value of the previous cumulative acknowledgment as prev_highest_ack, and stores the latest cumulative ACK as highest_ack. In addition, the following check is performed if, in Step 2 of Section 3.2, the Cumulative Acknowledgment field does not cover more than "recover". 1*) If the Cumulative Acknowledgment field didn't cover more than "recover", check to see if the congestion window is greater than SMSS bytes and the difference between highest_ack and prev_highest_ack is at most 4*SMSS bytes. If true, duplicate - ACKs indicate a lost segment (enter Fast Retransmit). Otherwise, - duplicate ACKs likely result from unnecessary retransmissions - (do not enter Fast Retransmit). + ACKs indicate a lost segment (enter Fast Retransmit). + Otherwise, duplicate ACKs likely result from unnecessary + retransmissions (do not enter Fast Retransmit). The congestion window check serves to protect against fast retransmit immediately after a retransmit timeout. If several ACKs are lost, the sender can see a jump in the cumulative ACK of more than three segments, and the heuristic can fail. [RFC5681] recommends that a receiver should send duplicate ACKs for every out-of-order data packet, such as a data packet received during Fast Recovery. The ACK heuristic is more likely to fail if the receiver does not follow this advice, because @@ -363,22 +357,22 @@ If this heuristic is used, the sender stores the timestamp of the last acknowledged segment. In addition, the last sentence of step 2 in Section 3.2 is replaced as follows: 1**) If the Cumulative Acknowledgment field didn't cover more than "recover", check to see if the echoed timestamp in the last non-duplicate acknowledgment equals the stored timestamp. If true, duplicate ACKs indicate a lost segment (enter Fast Retransmit). Otherwise, duplicate - ACKs likely result from unnecessary retransmissions (do not enter - Fast Retransmit). + ACKs likely result from unnecessary retransmissions (do not + enter Fast Retransmit). The timestamp heuristic works correctly, both when the receiver echoes timestamps as specified by [RFC1323], and by its revision attempts. However, if the receiver arbitrarily echoes timestamps, the heuristic can fail. The heuristic can also fail if a timeout was spurious and returning ACKs are not from retransmitted segments. This can be prevented by detection algorithms such as [RFC3522]. 5. Implementation Issues for the Data Receiver @@ -442,21 +436,21 @@ (4) in Section 3. Otherwise, the sender could end up in a chain of spurious timeouts. We mention this only because several NewReno implementations had this bug, including the implementation in the NS simulator. It has been observed that some TCP implementations enter a slow start or congestion avoidance window updating algorithm immediately after the cwnd is set by the equation found in (Section 3, step 5), even without a new external event generating the cwnd change. Note that after cwnd is set based on the procedure for exiting Fast Recovery - (Section 3, step 5), cwnd SHOULD NOT be updated until a further + (Section 3, step 5), cwnd should not be updated until a further event occurs (e.g., arrival of an ack, or timeout) after this adjustment. 7. Security Considerations [RFC5681] discusses general security considerations concerning TCP congestion control. This document describes a specific algorithm that conforms with the congestion control requirements of [RFC5681], and so those considerations apply to this algorithm, too. There are no known additional security concerns for this specific algorithm. @@ -502,100 +495,90 @@ 11. References 11.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009. - [RFC6298] Paxson, V., M. Allman, J. Chu, and M. Sargent, "Computing - TCP's Retransmission Timer", RFC 6298, June 2011. - 11.2. Informative References [C98] Cardwell, N., "delayed ACKs for retransmitted packets: ouch!". November 1998, Email to the tcpimpl mailing list, Message-ID - "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu", + "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs. + washington.edu", archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". - [F98] Floyd, S., Revisions to RFC 2001, "Presentation to the - TCPIMPL Working Group", August 1998. URLs - "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps" and - "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.pdf". - - [F03] Floyd, S., "Moving NewReno from Experimental to Proposed - Standard? Presentation to the TSVWG Working Group", March 2003. - URLs "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and - "http://www.icir.org/floyd/talks/newreno-Mar03.pdf". - [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of - Tahoe, Reno and SACK TCP", Computer Communication Review, July 1996. + Tahoe, Reno and SACK TCP", Computer Communication Review, + July 1996. URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z". [F94] Floyd, S., "TCP and Successive Fast Retransmits", Technical report, October 1994. URL "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps". [GF04] Gurtov, A. and S. Floyd, "Resolving Acknowledgment Ambiguity in non-SACK TCP", Next Generation Teletraffic and Wired/Wireless Advanced Networking (NEW2AN'04), February 2004. URL "http://www.cs.helsinki.fi/u/gurtov/papers/ heuristics.html". [Gur03] Gurtov, A., "[Tsvwg] resolving the problem of unnecessary - fast retransmits in go-back-N", email to the tsvwg mailing list, - message ID <3F25B467.9020609@cs.helsinki.fi>, July 28, 2003. URL - "http://www1.ietf.org/mail-archive/working-groups/tsvwg/current/ - msg04334.html". + fast retransmits in go-back-N", email to the tsvwg mailing + list, message ID <3F25B467.9020609@cs.helsinki.fi>, + July 28, 2003. URL "http://www1.ietf.org/mail-archive/ + working-groups/tsvwg/current/msg04334.html". [Hen98] Henderson, T., Re: NewReno and the 2001 Revision. September 1998. Email to the tcpimpl mailing list, Message ID - "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU", + "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS. + Berkeley.EDU", archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". [Hoe95] Hoe, J., "Startup Dynamics of TCP's Congestion Control and Avoidance Schemes", Master's Thesis, MIT, 1995. [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion Control Scheme for TCP", ACM SIGCOMM, August 1996. URL "http://www.acm.org/sigcomm/sigcomm96/program.html". [LM97] Lin, D. and R. Morris, "Dynamics of Random Early Detection", SIGCOMM 97, September 1997. URL "http://www.acm.org/sigcomm/sigcomm97/program.html". [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/". - [PF01] Padhye, J. and S. Floyd, "Identifying the TCP Behavior of - Web Servers", June 2001, SIGCOMM 2001. - [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992. [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to TCP's Fast Recovery Algorithm", RFC 2582, April 1999. [RFC2883] Floyd, S., J. Mahdavi, M. Mathis, and M. Podolsky, "The - Selective Acknowledgment (SACK) Option for TCP, RFC 2883, July 2000. + Selective Acknowledgment (SACK) Option for TCP, RFC 2883, + July 2000. [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's - Loss Recovery Using Limited Transmit", RFC 3042, January 2001. + Loss Recovery Using Limited Transmit", RFC 3042, + January 2001. [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for TCP", RFC 3522, April 2003. [RFC3782] Floyd, S., T. Henderson, and A. Gurtov, "The NewReno - Modification to TCP's Fast Recovery Algorithm", RFC 3782, April 2004. + Modification to TCP's Fast Recovery Algorithm", RFC 3782, + April 2004. Appendix A. Additional Information Previous versions of this RFC ([RFC2582], [RFC3782]) contained additional informative material on the following subjects, and may be consulted by readers who may want more information about possible variants to the algorithm and who may want references to specific [NS] simulations that provide NewReno test cases. Section 4 of [RFC3782] discusses some alternative behaviors for @@ -607,21 +590,21 @@ Section 6 of [RFC3782] describes more information about the motivation for the sender's state variable Recover. Section 9 of [RFC3782] introduces some NS simulation test suites for NewReno. In addition, references to simulation results can be found throughout [RFC3782]. Section 10 of [RFC3782] provides a comparison of Reno and NewReno TCP. - Section 11 of [RFC3782] listed changes relative to [RFC3782]. + Section 11 of [RFC3782] listed changes relative to [RFC2582]. Appendix B. Changes Relative to RFC 3782 In [RFC3782], the cwnd after Full ACK reception will be set to (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh. However, there is a risk in the first option which results in performance degradation. With the first option, if FlightSize is zero, the result will be 1 SMSS. This means TCP can transmit only 1 segment at this moment, which can cause delay in ACK transmission at receiver due to delayed ACK algorithm. @@ -629,75 +612,85 @@ The FlightSize on Full ACK reception can be zero in some situations. A typical example is where sending window size during fast recovery is small. In this case, the retransmitted packet and new data packets can be transmitted within a short interval. If all these packets successfully arrive, the receiver may generate a Full ACK that acknowledges all outstanding data. Even if window size is not small, loss of ACK packets or receive buffer shortage during fast recovery can also increase the possibility of falling into this situation. The proposed fix in this document, which sets cwnd to at least 2*SMSS - if the implementation uses option 1 in the Full ACK case (Section 3.2, + if the implementation uses option 1 in the Full ACK case (Section 3.2 step 3, option 1), ensures that the sender TCP transmits at least two segments on Full ACK reception. In addition, errata for RFC3782 (editorial clarification to Section 8 of RFC2582, which is now Section 6 of this document) has been applied. The specification text (Section 3.2 herein) was rewritten to more closely track Section 3.2 of [RFC5681]. Sections 4, 5, 9-11 of [RFC3782] were removed, and instead Appendix A of this document was added to back-reference this informative - material. + material. A few references that have no citation in the main body + of the draft have been removed. Appendix C. Document Revision History To be removed upon publication +----------+--------------------------------------------------+ | Revision | Comments | +----------+--------------------------------------------------+ | draft-00 | RFC3782 errata applied, and changes applied from | | | draft-nishida-newreno-modification-02 | +----------+--------------------------------------------------+ | draft-01 | Non-normative sections moved to appendices, | | | editorial clarifications applied as suggested | | | by Alexander Zimmermann. | +----------+--------------------------------------------------+ | draft-02 | Better align specification text with RFC5681. | | | Replace informative appendices by a new appendix | | | that just provides back-references to earlier | | | NewReno RFCs. | +----------+--------------------------------------------------+ + | draft-03 | Document refresh and fix id-nits | + +----------+--------------------------------------------------+ + | draft-04 | Address editorial comments received from secdir | + | | review (provided by Tom Yu). | + +----------+--------------------------------------------------+ + | draft-05 | Address IESG review comments from David | + | | Harrington, and Gen-ART review comments from | + | | Ben Campbell. | + +----------+--------------------------------------------------+ Authors' Addresses Tom Henderson The Boeing Company EMail: thomas.r.henderson@boeing.com Sally Floyd International Computer Science Institute Phone: +1 (510) 666-2989 EMail: floyd@acm.org URL: http://www.icir.org/floyd/ Andrei Gurtov - HIIT - Helsinki Institute for Information Technology - P.O. Box 19215 - 00076 Aalto + University of Oulu + Centre for Wireless Communications CWC + P.O. Box 4500 + FI-90014 University of Oulu Finland - EMail: gurtov@hiit.fi + EMail: gurtov@ee.oulu.fi Yoshifumi Nishida WIDE Project Endo 5322 Fujisawa, Kanagawa 252-8520 Japan Email: nishida@wide.ad.jp