draft-ietf-tcpm-hystartplusplus-03.txt | draft-ietf-tcpm-hystartplusplus-04.txt | |||
---|---|---|---|---|
Network Working Group P. Balasubramanian | Network Working Group P. Balasubramanian | |||
Internet-Draft Y. Huang | Internet-Draft Y. Huang | |||
Intended status: Standards Track M. Olson | Intended status: Standards Track M. Olson | |||
Expires: 26 January 2022 Microsoft | Expires: 27 July 2022 Microsoft | |||
25 July 2021 | 23 January 2022 | |||
HyStart++: Modified Slow Start for TCP | HyStart++: Modified Slow Start for TCP | |||
draft-ietf-tcpm-hystartplusplus-03 | draft-ietf-tcpm-hystartplusplus-04 | |||
Abstract | Abstract | |||
This document describes HyStart++, a simple modification to the slow | This document describes HyStart++, a simple modification to the slow | |||
start phase of TCP congestion control algorithms. Traditional slow | start phase of TCP congestion control algorithms. Traditional slow | |||
start can cause overshooting of the ideal send rate and cause large | start can overshoot the ideal send rate in many cases, causing high | |||
packet loss within a round-trip time which results in poor | packet loss and poor performance. HyStart++ uses a delay increase | |||
performance. HyStart++ uses a delay increase heuristic to exit slow | heuristic to find an exit point before possible overshoot. It also | |||
start early while also mitigating poor performance which can result | adds a mitigation to prevent jitter from causing premature slow start | |||
from false positives. | exit. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 26 January 2022. | This Internet-Draft will expire on 27 July 2022. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
extracted from this document must include Simplified BSD License text | extracted from this document must include Revised BSD License text as | |||
as described in Section 4.e of the Trust Legal Provisions and are | described in Section 4.e of the Trust Legal Provisions and are | |||
provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Revised BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
4. HyStart++ Algorithm . . . . . . . . . . . . . . . . . . . . . 3 | 4. HyStart++ Algorithm . . . . . . . . . . . . . . . . . . . . . 3 | |||
4.1. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 4.1. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
4.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 4 | 4.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 4 | |||
4.3. Tuning constants . . . . . . . . . . . . . . . . . . . . 6 | 4.3. Tuning constants . . . . . . . . . . . . . . . . . . . . 6 | |||
skipping to change at page 2, line 39 ¶ | skipping to change at page 2, line 39 ¶ | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
1. Introduction | 1. Introduction | |||
[RFC5681] describes the slow start congestion control algorithm for | [RFC5681] describes the slow start congestion control algorithm for | |||
TCP. The slow start algorithm is used when the congestion window | TCP. The slow start algorithm is used when the congestion window | |||
(cwnd) is less than the slow start threshold (ssthresh). During slow | (cwnd) is less than the slow start threshold (ssthresh). During slow | |||
start, in the absence of packet loss signals, TCP increases cwnd | start, in the absence of packet loss signals, TCP increases cwnd | |||
exponentially to probe the network capacity. This fast growth can | exponentially to probe the network capacity. This fast growth can | |||
overshoot the ideal sending rate and cause significant packet loss | overshoot the ideal sending rate and cause significant packet loss | |||
which cannot always be recovered efficiently, impairing flow | which cannot always be recovered efficiently. | |||
completion time. | ||||
HyStart++ first uses delay increase as a signal to exit slow start | HyStart++ uses delay increase as a signal to exit slow start before | |||
before any packet loss occurs. This is one of two algorithms | potential packet loss occurs as a result of overshoot. This is one | |||
specified in [HyStart]. After the HyStart delay algorithm finds an | of two algorithms specified in [HyStart]. After the slow start exit, | |||
exit point, a novel Conservative Slow Start (CSS) phase is used to | a novel Conservative Slow Start (CSS) phase is used to determine | |||
determine whether the slow start exit was spurious. This provides | whether the slow start exit was premature and to resume slow start. | |||
protection against jitter and prevents performance problems that | This mitigation improves performance in the presence of jitter. | |||
result from early slow start exit due to false positives. HyStart++ | HyStart++ reduces packet loss and retransmissions, and improves | |||
reduces packet loss and retransmissions, and improves goodput in lab | goodput in lab measurements and real world deployments. | |||
measurements as well as real world deployments. | ||||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
3. Definitions | 3. Definitions | |||
We repeat here some definitions from [RFC5681] to aid the reader. | We repeat here some definitions from [RFC5681] to aid the reader. | |||
skipping to change at page 3, line 44 ¶ | skipping to change at page 3, line 44 ¶ | |||
4. HyStart++ Algorithm | 4. HyStart++ Algorithm | |||
4.1. Summary | 4.1. Summary | |||
[HyStart] specifies two algorithms (a "Delay Increase" algorithm and | [HyStart] specifies two algorithms (a "Delay Increase" algorithm and | |||
an "Inter-Packet Arrival" algorithm) to be run in parallel to detect | an "Inter-Packet Arrival" algorithm) to be run in parallel to detect | |||
that the sending rate has reached capacity. In practice, the Inter- | that the sending rate has reached capacity. In practice, the Inter- | |||
Packet Arrival algorithm does not perform well and is not able to | Packet Arrival algorithm does not perform well and is not able to | |||
detect congestion early, primarily due to ACK compression. The idea | detect congestion early, primarily due to ACK compression. The idea | |||
of the Delay Increase algorithm is to look for RTT spikes, which | of the Delay Increase algorithm is to look for spikes in RTT (round- | |||
suggest that the bottleneck buffer is filling up. | trip time), which suggest that the bottleneck buffer is filling up. | |||
In HyStart++, a TCP sender uses traditional slow start and then uses | In HyStart++, a TCP sender uses traditional slow start and then uses | |||
the "Delay Increase" algorithm to trigger an exit from slow start. | the "Delay Increase" algorithm to trigger an exit from slow start. | |||
But instead of going straight from slow start to congestion | But instead of going straight from slow start to congestion | |||
avoidance, the sender spends a number of RTTs in a Conservative Slow | avoidance, the sender spends a number of RTTs in a Conservative Slow | |||
Start (CSS) phase to determine whether the exit was spurious. During | Start (CSS) phase to determine whether the exit from slow start was | |||
CSS, the congestion window is grown exponentially like in regular | premature. During CSS, the congestion window is grown exponentially | |||
slow start, but with a smaller exponential base, resulting in less | like in regular slow start, but with a smaller exponential base, | |||
aggressive growth. If the RTT shrinks at any time during CSS, it's | resulting in less aggressive growth. If the RTT reduces during CSS, | |||
concluded that the RTT spike was not related to congestion caused by | it's concluded that the RTT spike was not related to congestion | |||
the connection sending too fast (i.e. the exit was spurious), and the | caused by the connection sending at a rate greater than the ideal | |||
connection resumes slow start. If the RTT inflation persists | send rate, and the connection resumes slow start. If the RTT | |||
throughout CSS, the connection enters congestion avoidance. | inflation persists throughout CSS, the connection enters congestion | |||
avoidance. | ||||
4.2. Algorithm Details | 4.2. Algorithm Details | |||
We assume that Appropriate Byte Counting (as described in [RFC3465]) | For the pseudocode, we assume that Appropriate Byte Counting (as | |||
is in use and L is the cwnd increase limit as discussed in RFC 3465. | described in [RFC3465]) is in use and L is the cwnd increase limit as | |||
discussed in RFC 3465. | ||||
A round is chosen to be approximately the Round-Trip Time (RTT). We | lastRoundMinRTT and currentRoundMinRTT are initialized to infinity at | |||
recommend that rounds be measured using sequence numbers. Round can | the initialization time. | |||
be approximated using sequence numbers as follows: | ||||
Define windowEnd as a sequence number initialize to SND.UNA | HyStart++ measures rounds using sequence numbers, as follows: | |||
Define windowEnd as a sequence number initialized to SND.UNA | ||||
When windowEnd is ACKed, the current round ends and windowEnd is | When windowEnd is ACKed, the current round ends and windowEnd is | |||
set to SND.NXT | set to SND.NXT | |||
At the start of each round during standard slow start ([RFC5681]) and | At the start of each round during standard slow start ([RFC5681]) and | |||
CSS: | CSS: | |||
lastRoundMinRTT = currentRoundMinRTT | lastRoundMinRTT = currentRoundMinRTT | |||
currentRoundMinRTT = infinity | currentRoundMinRTT = infinity | |||
skipping to change at page 4, line 48 ¶ | skipping to change at page 5, line 4 ¶ | |||
- cwnd = cwnd + min (N, L * SMSS) | - cwnd = cwnd + min (N, L * SMSS) | |||
Keep track of minimum observed RTT | Keep track of minimum observed RTT | |||
- currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | - currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | |||
- where currRTT is the RTT sampled from the latest incoming ACK | - where currRTT is the RTT sampled from the latest incoming ACK | |||
- rttSampleCount += 1 | - rttSampleCount += 1 | |||
For rounds where N_RTT_SAMPLE RTT samples have been obtained and | ||||
For rounds where cwnd is at or higher than LOW_CWND and | currentRoundMinRTT and lastRoundMinRTT are valid, check if delay | |||
N_RTT_SAMPLE RTT samples have been obtained, check if delay | ||||
increase triggers slow start exit | increase triggers slow start exit | |||
- if (cwnd >= (LOW_CWND * SMSS) AND rttSampleCount >= | ||||
N_RTT_SAMPLE) | - if (rttSampleCount >= N_RTT_SAMPLE AND currentRoundMinRTT != | |||
infinity AND lastRoundMinRTT != infinity) | ||||
o RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, | o RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, | |||
MAX_RTT_THRESH) | MAX_RTT_THRESH) | |||
o if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) | o if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) | |||
+ cssBaselineMinRtt = currentRoundMinRTT | + cssBaselineMinRtt = currentRoundMinRTT | |||
+ exit slow start and enter CSS | + exit slow start and enter CSS | |||
skipping to change at page 6, line 12 ¶ | skipping to change at page 6, line 14 ¶ | |||
If loss or ECN-marking is observed at any time during standard slow start | If loss or ECN-marking is observed at any time during standard slow start | |||
or CSS, enter congestion avoidance. | or CSS, enter congestion avoidance. | |||
* ssthresh = cwnd | * ssthresh = cwnd | |||
4.3. Tuning constants | 4.3. Tuning constants | |||
It is RECOMMENDED that a HyStart++ implementation use the following | It is RECOMMENDED that a HyStart++ implementation use the following | |||
constants: | constants: | |||
* LOW_CWND = 16 | ||||
* MIN_RTT_THRESH = 4 msec | * MIN_RTT_THRESH = 4 msec | |||
* MAX_RTT_THRESH = 16 msec | * MAX_RTT_THRESH = 16 msec | |||
* N_RTT_SAMPLE = 8 | * N_RTT_SAMPLE = 8 | |||
* CSS_GROWTH_DIVISOR = 4 | * CSS_GROWTH_DIVISOR = 4 | |||
* CSS_ROUNDS = 5 | * CSS_ROUNDS = 5 | |||
These constants have been determined with lab measurements and real | These constants have been determined with lab measurements and real | |||
world deployments. An implementation MAY tune them for different | world deployments. An implementation MAY tune them for different | |||
network characteristics. | network characteristics. | |||
Using smaller values of LOW_CWND will cause the algorithm to kick in | ||||
before the last round RTT can be measured, particularly if the | ||||
implementation uses an initial cwnd of 10 MSS. Higher values will | ||||
delay the detection of delay increase and reduce the ability of | ||||
HyStart++ to prevent overshoot problems. | ||||
The delay increase sensitivity is determined by MIN_RTT_THRESH and | The delay increase sensitivity is determined by MIN_RTT_THRESH and | |||
MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious | MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious | |||
exits from slow start. Larger values of MAX_RTT_THRESH may result in | exits from slow start. Larger values of MAX_RTT_THRESH may result in | |||
slow start not exiting until loss is encountered for connections on | slow start not exiting until loss is encountered for connections on | |||
large RTT paths. | large RTT paths. | |||
A TCP implementation is required to take at least one RTT sample each | A TCP implementation is required to take at least one RTT sample each | |||
round. Using lower values of N_RTT_SAMPLE will lower the accuracy of | round. Using lower values of N_RTT_SAMPLE will lower the accuracy of | |||
the measured RTT for the round; higher values will improve accuracy | the measured RTT for the round; higher values will improve accuracy | |||
at the cost of more processing. | at the cost of more processing. | |||
skipping to change at page 7, line 16 ¶ | skipping to change at page 7, line 10 ¶ | |||
start (when ssthresh is at its initial, arbitrarily high value per | start (when ssthresh is at its initial, arbitrarily high value per | |||
[RFC5681]) and fall back to using traditional slow start for the | [RFC5681]) and fall back to using traditional slow start for the | |||
remainder of the connection lifetime. This is acceptable because | remainder of the connection lifetime. This is acceptable because | |||
subsequent slow starts will use the discovered ssthresh value to exit | subsequent slow starts will use the discovered ssthresh value to exit | |||
slow start and avoid the overshoot problem. An implementation MAY | slow start and avoid the overshoot problem. An implementation MAY | |||
use HyStart++ to grow the restart window ([RFC5681]) after a long | use HyStart++ to grow the restart window ([RFC5681]) after a long | |||
idle period. | idle period. | |||
5. Deployments and Performance Evaluations | 5. Deployments and Performance Evaluations | |||
As of the time of writing, HyStart++ draft 01 has been default-enabled for | As of the time of writing, HyStart++ as described in draft versions | |||
all TCP connections in Windows for two years. The original HyStart | 01 through 04 has been default-enabled for all TCP connections in the | |||
has been default-enabled for all TCP connections using the default | Windows operating system for over three years. The original HyStart | |||
congestion control module CUBIC ([RFC8312]) for a decade. | has been default-enabled for all TCP connections in the Linux | |||
operating system using the default congestion control module CUBIC | ||||
([RFC8312]) for a decade. | ||||
In lab measurements with Windows TCP, HyStart++ shows both goodput | In lab measurements with Windows TCP, HyStart++ shows both goodput | |||
improvements and reductions in packet loss and | improvements and reductions in packet loss and | |||
retransmissions. For example, across a variety of tests on a 100 Mbps | retransmissions. For example, across a variety of tests on a 100 Mbps | |||
link with a bottleneck buffer size equal to the bandwidth-delay product, | link with a bottleneck buffer size equal to the bandwidth-delay product, | |||
HyStart++ reduces bytes retransmitted by 50% and retransmission | HyStart++ reduces bytes retransmitted by 50% and retransmission | |||
timeouts by 36%. | timeouts by 36%. | |||
In an A/B test for HyStart++ draft 01 across a large Windows device | In an A/B test for HyStart++ draft 01 across a large Windows device | |||
population, out of 52 billion TCP connections, 0.7% of connections | population, out of 52 billion TCP connections, 0.7% of connections | |||
End of changes. 18 change blocks. | ||||
57 lines changed or deleted | 52 lines changed or added | |||
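
The delay-increase check on the -04 side of Section 4.2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It covers only the steps visible in this diff: round tracking via windowEnd, the per-round minimum RTT, and the RttThresh clamp, with LOW_CWND already removed. The class and method names are hypothetical. Resetting the sample counter at each round boundary and treating windowEnd as ACKed once the cumulative ACK reaches it are assumptions, and sequence-number wraparound is ignored. RTTs are in milliseconds.

```python
import math

# Recommended constants from Section 4.3 (times in milliseconds).
MIN_RTT_THRESH = 4
MAX_RTT_THRESH = 16
N_RTT_SAMPLE = 8


def rtt_thresh(last_round_min_rtt):
    """RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, MAX_RTT_THRESH)."""
    return max(MIN_RTT_THRESH, min(last_round_min_rtt / 8, MAX_RTT_THRESH))


class DelayIncreaseDetector:
    """Hypothetical helper; the class and method names are not from the draft."""

    def __init__(self, snd_una):
        # Both minimums start at infinity, as the -04 text specifies.
        self.last_round_min_rtt = math.inf
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0
        self.window_end = snd_una

    def ack_ends_round(self, ack_seq, snd_nxt):
        """Return True if this ACK closes the current round."""
        # Assumption: windowEnd counts as ACKed once the cumulative ACK
        # reaches it; sequence-number wraparound is ignored here.
        if ack_seq < self.window_end:
            return False
        self.window_end = snd_nxt
        self.last_round_min_rtt = self.current_round_min_rtt
        self.current_round_min_rtt = math.inf
        # Assumption: the per-round sample counter restarts with the round.
        self.rtt_sample_count = 0
        return True

    def on_rtt_sample(self, curr_rtt):
        """Record one RTT sample; return True when slow start should exit to CSS."""
        self.current_round_min_rtt = min(self.current_round_min_rtt, curr_rtt)
        self.rtt_sample_count += 1
        # The -04 condition: enough samples, and both minimums are valid.
        if (self.rtt_sample_count < N_RTT_SAMPLE
                or math.isinf(self.current_round_min_rtt)
                or math.isinf(self.last_round_min_rtt)):
            return False
        threshold = rtt_thresh(self.last_round_min_rtt)
        return self.current_round_min_rtt >= self.last_round_min_rtt + threshold
```

With a previous-round minimum of 40 ms the threshold is 5 ms, so a current-round minimum of 46 ms signals an exit once eight samples have arrived, while 44 ms does not. In the first round lastRoundMinRTT is still infinity, so the check cannot fire. That validity test is what replaces the LOW_CWND guard of -03.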