--- 1/draft-ietf-tcpm-hystartplusplus-02.txt 2021-07-25 12:13:20.867829437 -0700 +++ 2/draft-ietf-tcpm-hystartplusplus-03.txt 2021-07-25 12:13:20.879829591 -0700 @@ -1,61 +1,60 @@ Network Working Group P. Balasubramanian Internet-Draft Y. Huang Intended status: Standards Track M. Olson -Expires: January 13, 2022 Microsoft - July 12, 2021 +Expires: 26 January 2022 Microsoft + 25 July 2021 HyStart++: Modified Slow Start for TCP - draft-ietf-tcpm-hystartplusplus-02 + draft-ietf-tcpm-hystartplusplus-03 Abstract This document describes HyStart++, a simple modification to the slow start phase of TCP congestion control algorithms. Traditional slow - start can cause overshotting of the ideal send rate and cause large + start can cause overshooting of the ideal send rate and cause large packet loss within a round-trip time which results in poor - performance. HyStart++ is composed of the delay increase variant of - HyStart to prevent overshooting of the ideal sending rate, while also - mitigating poor performance which can result from false positives. + performance. HyStart++ uses a delay increase heuristic to exit slow + start early while also mitigating poor performance which can result + from false positives. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on January 13, 2022. + This Internet-Draft will expire on 26 January 2022. 
Copyright Notice Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal - Provisions Relating to IETF Documents - (https://trustee.ietf.org/license-info) in effect on the date of - publication of this document. Please review these documents - carefully, as they describe your rights and restrictions with respect - to this document. Code Components extracted from this document must - include Simplified BSD License text as described in Section 4.e of - the Trust Legal Provisions and are provided without warranty as - described in the Simplified BSD License. + Provisions Relating to IETF Documents (https://trustee.ietf.org/ + license-info) in effect on the date of publication of this document. + Please review these documents carefully, as they describe your rights + and restrictions with respect to this document. Code Components + extracted from this document must include Simplified BSD License text + as described in Section 4.e of the Trust Legal Provisions and are + provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 4. HyStart++ Algorithm . . . . . . . . . . . . . . . . . . . . . 3 4.1. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 3 4.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 4 4.3. Tuning constants . . . . . . . . . . . . . . . . . . . . 6 @@ -65,42 +64,35 @@ 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 8.1. Normative References . . . . . . . . . . . . . . . . . . 7 8.2. Informative References . . . . . . . . . . . . . . . . . 8 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 1. 
Introduction [RFC5681] describes the slow start congestion control algorithm for TCP. The slow start algorithm is used when the congestion window (cwnd) is less than the slow start threshold (ssthresh). During slow - start, in absence of packet loss signals, TCP sender increases cwnd - exponentially to probe the network capacity. Such a fast growth can - lead to overshooting the ideal sending rate and cause significant - packet loss. This is counter-productive for the TCP flow itself, and - also impacts the rest of the traffic sharing the bottleneck link. - TCP has several mechanisms for loss recovery, but they are only - effective for moderate loss. When these techniques are unable to - recover lost packets, a last-resort retransmission timeout (RTO) is - used to trigger packet recovery. In most operating systems, the - minimum RTO is set to a large value (200 msec or 300 msec) to prevent - spurious timeouts. This results in a long idle time which - drastically impairs flow completion times. + start, in absence of packet loss signals, TCP increases cwnd + exponentially to probe the network capacity. This fast growth can + overshoot the ideal sending rate and cause significant packet loss + which cannot always be recovered efficiently, impairing flow + completion time. - HyStart++ adds delay increase as a signal to exit slow start before - any packet loss occurs. This is one of two algorithms specified in - [HyStart]. After the HyStart delay algorithm finds an exit point, a - Conservative Slow Start (CSS) phase is used to determine if the slow - start exit was spurious. This provides protection against jitter and - prevents pefrormance problems that result from early slow start exit - due to false positives. HyStart++ reduces packet loss and - retransmissions, and improves goodput in lab measurements as well as - real world deployments. + HyStart++ first uses delay increase as a signal to exit slow start + before any packet loss occurs. 
This is one of two algorithms + specified in [HyStart]. After the HyStart delay algorithm finds an + exit point, a novel Conservative Slow Start (CSS) phase is used to + determine whether the slow start exit was spurious. This provides + protection against jitter and prevents performance problems that + result from early slow start exit due to false positives. HyStart++ + reduces packet loss and retransmissions, and improves goodput in lab + measurements as well as real world deployments. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 3. Definitions We repeat here some definitions from [RFC5681] to aid the reader. @@ -132,131 +124,138 @@ [HyStart] specifies two algorithms (a "Delay Increase" algorithm and an "Inter-Packet Arrival" algorithm) to be run in parallel to detect that the sending rate has reached capacity. In practice, the Inter-Packet Arrival algorithm does not perform well and is not able to detect congestion early, primarily due to ACK compression. The idea of the Delay Increase algorithm is to look for RTT spikes, which suggest that the bottleneck buffer is filling up. In HyStart++, a TCP sender uses traditional slow start and then uses the "Delay Increase" algorithm to trigger an exit from slow start. - But instead of using a congestion avoidance algorithm, the sender - uses a Conservative Slow Start (CSS) algorithm to determine if the - exit was spurious. If the exit is determined to be spurious, slow - start is resumed. If the exit is determined to be not spurious, the - sender enters congestion avoidance. + But instead of going straight from slow start to congestion + avoidance, the sender spends a number of RTTs in a Conservative Slow + Start (CSS) phase to determine whether the exit was spurious. 
During + CSS, the congestion window is grown exponentially like in regular + slow start, but with a smaller exponential base, resulting in less + aggressive growth. If the RTT shrinks at any time during CSS, it's + concluded that the RTT spike was not related to congestion caused by + the connection sending too fast (i.e. the exit was spurious), and the + connection resumes slow start. If the RTT inflation persists + throughout CSS, the connection enters congestion avoidance. 4.2. Algorithm Details We assume that Appropriate Byte Counting (as described in [RFC3465]) - is in use and L is the cwnd increase limit. The choice of value of L - is up to the implementation. + is in use and L is the cwnd increase limit as discussed in RFC 3465. - A round is chosen to be approximately the Round-Trip Time (RTT). - Round can be approximated using sequence numbers as follows: + A round is chosen to be approximately the Round-Trip Time (RTT). We + recommend that rounds be measured using sequence numbers. 
Round can + be approximated using sequence numbers as follows: Define windowEnd as a sequence number initialized to SND.UNA When windowEnd is ACKed, the current round ends and windowEnd is set to SND.NXT - At the start of each round during normal slow start and CSS: + At the start of each round during standard slow start ([RFC5681]) and + CSS: lastRoundMinRTT = currentRoundMinRTT currentRoundMinRTT = infinity rttSampleCount = 0 For each arriving ACK in slow start, where N is the number of previously unacknowledged bytes acknowledged in the arriving ACK: Update the cwnd - cwnd = cwnd + min (N, L * SMSS) + - cwnd = cwnd + min (N, L * SMSS) Keep track of minimum observed RTT - currentRoundMinRTT = min(currentRoundMinRTT, currRTT) + - currentRoundMinRTT = min(currentRoundMinRTT, currRTT) - where currRTT is the RTT sampled from the incoming ACK + - where currRTT is the RTT sampled from the latest incoming ACK - rttSampleCount += 1 + - rttSampleCount += 1 For rounds where cwnd is at or higher than LOW_CWND and N_RTT_SAMPLE RTT samples have been obtained, check if delay increase triggers slow start exit - - if (cwnd >= (LOW_CWND * SMSS) AND rttSampleCount >= + - if (cwnd >= (LOW_CWND * SMSS) AND rttSampleCount >= N_RTT_SAMPLE) - RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, + + o RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, MAX_RTT_THRESH) - if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) + o if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) - cssBaselineMinRtt = currentRoundMinRTT + + cssBaselineMinRtt = currentRoundMinRTT - exit slow start and enter CSS + + exit slow start and enter CSS - CSS lasts CSS_ROUNDS rounds. If the transition into CSS happens in - the middle of a round, that partial round counts towards the limit. + CSS lasts at most CSS_ROUNDS rounds. If the transition into CSS + happens in the middle of a round, that partial round counts towards + the limit. 
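Editorial illustration (not part of either draft revision): the delay-increase exit check above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: RTTs are floats in milliseconds, cwnd and SMSS are in bytes, and the constants take the RECOMMENDED values from Section 4.3. The function names are hypothetical; only the variable names and the clamp/compare logic come from the pseudocode.

```python
# Illustrative sketch only; not a definitive implementation of HyStart++.
# Constant values are the RECOMMENDED ones from Section 4.3 of the draft.
LOW_CWND = 16           # segments
MIN_RTT_THRESH = 4.0    # msec
MAX_RTT_THRESH = 16.0   # msec
N_RTT_SAMPLE = 8


def clamp(lo, value, hi):
    """Limit value to the closed range [lo, hi]."""
    return max(lo, min(value, hi))


def rtt_thresh(last_round_min_rtt):
    """RttThresh: one eighth of last round's min RTT, clamped to the bounds."""
    return clamp(MIN_RTT_THRESH, last_round_min_rtt / 8, MAX_RTT_THRESH)


def should_enter_css(cwnd, smss, rtt_sample_count,
                     last_round_min_rtt, current_round_min_rtt):
    """Return True when the delay increase should trigger slow start exit."""
    # The check only applies once cwnd is large enough and enough RTT
    # samples have been collected in the current round.
    if cwnd < LOW_CWND * smss or rtt_sample_count < N_RTT_SAMPLE:
        return False
    threshold = rtt_thresh(last_round_min_rtt)
    return current_round_min_rtt >= last_round_min_rtt + threshold
```

With these values, a path whose previous-round minimum RTT was 80 msec gets a threshold of 10 msec, so slow start is exited once the current round's minimum RTT reaches 90 msec. On exit, the caller would record cssBaselineMinRtt = currentRoundMinRTT as in the pseudocode.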
For each arriving ACK in CSS, where N is the number of previously unacknowledged bytes acknowledged in the arriving ACK: Update the cwnd - cwnd = cwnd + (min (N, L * SMSS) / CSS_GROWTH_DIVISOR) + - cwnd = cwnd + (min (N, L * SMSS) / CSS_GROWTH_DIVISOR) Keep track of minimum observed RTT - currentRoundMinRTT = min(currentRoundMinRTT, currRTT) + - currentRoundMinRTT = min(currentRoundMinRTT, currRTT) - where currRTT is the sampled RTT from the incoming ACK + - where currRTT is the sampled RTT from the incoming ACK - rttSampleCount += 1 + - rttSampleCount += 1 For CSS rounds where N_RTT_SAMPLE RTT samples have been obtained, check if current round's minRTT drops below baseline indicating that HyStart exit was spurious. - if (currentRoundMinRTT < cssBaselineMinRtt) + - if (currentRoundMinRTT < cssBaselineMinRtt) - cssBaselineMinRtt = infinity + o cssBaselineMinRtt = infinity - resume slow start including HyStart++ + o resume slow start including HyStart++ If CSS_ROUNDS rounds are complete, enter congestion avoidance. - ssthresh = cwnd + * ssthresh = cwnd - If congestion is observed anytime during slow start or CSS, enter - congestion avoidance. + If loss or ECN-marking is observed anytime during standard slow start + or CSS, enter congestion avoidance. - ssthresh = cwnd + * ssthresh = cwnd 4.3. Tuning constants It is RECOMMENDED that a HyStart++ implementation use the following constants: - LOW_CWND = 16 + * LOW_CWND = 16 - MIN_RTT_THRESH = 4 msec + * MIN_RTT_THRESH = 4 msec - MAX_RTT_THRESH = 16 msec + * MAX_RTT_THRESH = 16 msec - N_RTT_SAMPLE = 8 + * N_RTT_SAMPLE = 8 - CSS_GROWTH_DIVISOR = 4 + * CSS_GROWTH_DIVISOR = 4 - CSS_ROUNDS = 5 + * CSS_ROUNDS = 5 These constants have been determined with lab measurements and real world deployments. An implementation MAY tune them for different network characteristics. 
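Editorial illustration (not part of either draft revision): the CSS phase described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: RTTs are floats in milliseconds, cwnd is in bytes, the cwnd increment uses integer division, and the spurious-exit check runs on each ACK once N_RTT_SAMPLE samples exist in the round, which is one possible reading of the pseudocode. The CssState class, the phase strings, and the function names are hypothetical; L is passed in by the caller because the draft defers its value to RFC 3465.

```python
# Illustrative sketch only; not a definitive implementation of HyStart++.
import math
from dataclasses import dataclass

# RECOMMENDED values from Section 4.3 of the draft.
N_RTT_SAMPLE = 8
CSS_GROWTH_DIVISOR = 4
CSS_ROUNDS = 5


@dataclass
class CssState:
    cwnd: int                        # congestion window, bytes
    css_baseline_min_rtt: float      # min RTT recorded at slow start exit
    current_round_min_rtt: float = math.inf
    rtt_sample_count: int = 0
    rounds_done: int = 0
    ssthresh: float = math.inf
    phase: str = "CSS"               # or "SLOW_START", "CONGESTION_AVOIDANCE"


def on_ack_in_css(s, acked_bytes, rtt, smss, limit_l):
    """Process one ACK while in CSS."""
    # Grow cwnd as in slow start, but divided by CSS_GROWTH_DIVISOR.
    s.cwnd += min(acked_bytes, limit_l * smss) // CSS_GROWTH_DIVISOR
    s.current_round_min_rtt = min(s.current_round_min_rtt, rtt)
    s.rtt_sample_count += 1
    # If the RTT fell back below the baseline, the exit was spurious.
    if (s.rtt_sample_count >= N_RTT_SAMPLE
            and s.current_round_min_rtt < s.css_baseline_min_rtt):
        s.css_baseline_min_rtt = math.inf
        s.phase = "SLOW_START"


def on_round_end_in_css(s):
    """Process the end of a round (windowEnd was ACKed) while in CSS."""
    s.rounds_done += 1
    if s.phase == "CSS" and s.rounds_done >= CSS_ROUNDS:
        # RTT inflation persisted throughout CSS: enter congestion avoidance.
        s.ssthresh = s.cwnd
        s.phase = "CONGESTION_AVOIDANCE"
    # Reset the per-round RTT tracking for the next round.
    s.current_round_min_rtt = math.inf
    s.rtt_sample_count = 0
```

The sketch omits loss and ECN handling, which per the text above would also set ssthresh = cwnd and enter congestion avoidance.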
Using smaller values of LOW_CWND will cause the algorithm to kick in before the last round RTT can be measured, particularly if the implementation uses an initial cwnd of 10 MSS. Higher values will delay the detection of delay increase and reduce the ability of HyStart++ to prevent overshoot problems. @@ -265,59 +264,58 @@ MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious exits from slow start. Larger values of MAX_RTT_THRESH may result in slow start not exiting until loss is encountered for connections on large RTT paths. A TCP implementation is required to take at least one RTT sample each round. Using lower values of N_RTT_SAMPLE will lower the accuracy of the measured RTT for the round; higher values will improve accuracy at the cost of more processing. - The minimum value of CSS_GROWTH_DIVISOR SHOULD be at least 2. - Otherwise the cwnd growth could again become too aggressive and cause - ideal send rate overshoot. Values larger than 4 will cause the - algorithm to be less aggressive and maybe less performant. + The minimum value of CSS_GROWTH_DIVISOR MUST be at least 2. A value + of 1 results in the same aggressive behavior as regular slow start. + Values larger than 4 will cause the algorithm to be less aggressive + and maybe less performant. Smaller values of CSS_ROUNDS may miss detecting jitter and larger values may limit performance. An implementation SHOULD use HyStart++ only for the initial slow start (when ssthresh is at its initial value of arbitrarily high per - [RFC5681]) and fall back to using traditional slow start for the remainder of the connection lifetime. This is acceptable because subsequent slow starts will use the discovered ssthresh value to exit slow start and avoid the overshoot problem. An implementation MAY use HyStart++ to grow the restart window ([RFC5681]) after a long idle period. 5. 
Deployments and Performance Evaluations - As of the time of writing, HyStart++ has been default enabled for all - TCP connections in Windows for two years. The original Hystart has - been default-enabled for all TCP connections in Linux TCP for a - decade. + As of the time of writing, HyStart++ draft 01 was default enabled for + all TCP connections in Windows for two years. The original Hystart + has been default-enabled for all TCP connections using the default + congestion control module CUBIC ([RFC8312]) for a decade. In lab measurements with Windows TCP, HyStart++ shows both goodput improvements as well as reductions in packet loss and retransmissions. For example across a variety of tests on a 100 Mbps link with a bottleneck buffer size of bandwidth-delay product, HyStart++ reduces bytes retransmitted by 50% and retransmission timeouts by 36%. - In an A/B test across a large Windows device population, out of 52 - billion TCP connections, 0.7% of connections move from 1 RTO to 0 - RTOs and another 0.7% connections move from 2 RTOs to 1 RTO with - HyStart++. This test did not focus on send heavy connections and the - impact on send heavy connections is likely much higher. We plan to - conduct more such production experiments to gather more data in the - future. + In an A/B test for HyStart++ draft 01 across a large Windows device + population, out of 52 billion TCP connections, 0.7% of connections + move from 1 RTO to 0 RTOs and another 0.7% connections move from 2 + RTOs to 1 RTO with HyStart++. This test did not focus on send heavy + connections and the impact on send heavy connections is likely much + higher. We plan to conduct more such production experiments to + gather more data in the future. 6. Security Considerations HyStart++ enhances slow start and inherits the general security considerations discussed in [RFC5681]. 7. IANA Considerations This document has no actions for IANA. @@ -340,27 +338,32 @@ 8.2. Informative References [HyStart] Ha, S. and I. 
Rhee, "Hybrid Slow Start for High-Bandwidth and Long-Distance Networks", DOI 10.1145/1851182.1851192, International Workshop on Protocols for Fast Long-Distance Networks, 2008. + [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and + R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", + RFC 8312, DOI 10.17487/RFC8312, February 2018. + Authors' Addresses Praveen Balasubramanian Microsoft One Microsoft Way Redmond, WA 98052 - USA + United States of America Phone: +1 425 538 2782 Email: pravb@microsoft.com Yi Huang Microsoft Phone: +1 425 703 0447 Email: huanyi@microsoft.com