draft-ietf-tcpm-hystartplusplus-01.txt   draft-ietf-tcpm-hystartplusplus-02.txt 
Network Working Group P. Balasubramanian Network Working Group P. Balasubramanian
Internet-Draft Y. Huang Internet-Draft Y. Huang
Intended status: Standards Track M. Olson Intended status: Standards Track M. Olson
Expires: July 10, 2021 Microsoft Expires: January 13, 2022 Microsoft
January 6, 2021 July 12, 2021
HyStart++: Modified Slow Start for TCP HyStart++: Modified Slow Start for TCP
draft-ietf-tcpm-hystartplusplus-01 draft-ietf-tcpm-hystartplusplus-02
Abstract Abstract
This document describes HyStart++, a simple modification to the slow This document describes HyStart++, a simple modification to the slow
start phase of TCP congestion control algorithms. Traditional slow start phase of TCP congestion control algorithms. Traditional slow
start can cause overshooting of the ideal send rate and cause large start can cause overshooting of the ideal send rate and cause large
packet loss within a round-trip time which results in poor packet loss within a round-trip time which results in poor
performance. HyStart++ combines the use of one variant of HyStart performance. HyStart++ is composed of the delay increase variant of
and Limited Slow Start (LSS) to prevent overshooting of the ideal HyStart to prevent overshooting of the ideal sending rate, while also
sending rate, while also mitigating poor performance which can result mitigating poor performance which can result from false positives.
from false positives when HyStart is used alone.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 10, 2021. This Internet-Draft will expire on January 13, 2022.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 15 skipping to change at page 2, line 14
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3
4. HyStart++ Algorithm . . . . . . . . . . . . . . . . . . . . . 3 4. HyStart++ Algorithm . . . . . . . . . . . . . . . . . . . . . 3
4.1. Use of HyStart Delay Increase and Limited Slow Start . . 3 4.1. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 3
4.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 4 4.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 4
4.3. Tuning constants . . . . . . . . . . . . . . . . . . . . 5 4.3. Tuning constants . . . . . . . . . . . . . . . . . . . . 6
5. Deployments and Performance Evaluations . . . . . . . . . . . 6 5. Deployments and Performance Evaluations . . . . . . . . . . . 7
6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 7
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 8.1. Normative References . . . . . . . . . . . . . . . . . . 7
9.1. Normative References . . . . . . . . . . . . . . . . . . 7 8.2. Informative References . . . . . . . . . . . . . . . . . 8
9.2. Informative References . . . . . . . . . . . . . . . . . 7
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8
1. Introduction 1. Introduction
[RFC5681] describes the slow start congestion control algorithm for [RFC5681] describes the slow start congestion control algorithm for
TCP. The slow start algorithm is used when the congestion window TCP. The slow start algorithm is used when the congestion window
(cwnd) is less than the slow start threshold (ssthresh). During slow (cwnd) is less than the slow start threshold (ssthresh). During slow
start, in absence of packet loss signals, TCP sender increases cwnd start, in absence of packet loss signals, TCP sender increases cwnd
exponentially to probe the network capacity. Such a fast growth can exponentially to probe the network capacity. Such a fast growth can
lead to overshooting the ideal sending rate and cause significant lead to overshooting the ideal sending rate and cause significant
skipping to change at page 2, line 47 skipping to change at page 2, line 45
TCP has several mechanisms for loss recovery, but they are only TCP has several mechanisms for loss recovery, but they are only
effective for moderate loss. When these techniques are unable to effective for moderate loss. When these techniques are unable to
recover lost packets, a last-resort retransmission timeout (RTO) is recover lost packets, a last-resort retransmission timeout (RTO) is
used to trigger packet recovery. In most operating systems, the used to trigger packet recovery. In most operating systems, the
minimum RTO is set to a large value (200 msec or 300 msec) to prevent minimum RTO is set to a large value (200 msec or 300 msec) to prevent
spurious timeouts. This results in a long idle time which spurious timeouts. This results in a long idle time which
drastically impairs flow completion times. drastically impairs flow completion times.
HyStart++ adds delay increase as a signal to exit slow start before HyStart++ adds delay increase as a signal to exit slow start before
any packet loss occurs. This is one of two algorithms specified in any packet loss occurs. This is one of two algorithms specified in
[HyStart]. After the HyStart delay algorithm finds an exit point, [HyStart]. After the HyStart delay algorithm finds an exit point, a
LSS is used in conjunction with congestion avoidance for further Conservative Slow Start (CSS) phase is used to determine if the slow
congestion window increases until the first packet loss is detected. start exit was spurious. This provides protection against jitter and
HyStart++ reduces packet loss and retransmissions, and improves prevents performance problems that result from early slow start exit
goodput in lab measurements as well as real world deployments. due to false positives. HyStart++ reduces packet loss and
retransmissions, and improves goodput in lab measurements as well as
real world deployments.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
3. Definitions 3. Definitions
We repeat here some definitions from [RFC5681] to aid the reader. We repeat here some definitions from [RFC5681] to aid the reader.
skipping to change at page 3, line 37 skipping to change at page 3, line 37
RECEIVER WINDOW (rwnd): The most recently advertised receiver window. RECEIVER WINDOW (rwnd): The most recently advertised receiver window.
CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount
of data a TCP can send. At any given time, a TCP MUST NOT send data of data a TCP can send. At any given time, a TCP MUST NOT send data
with a sequence number higher than the sum of the highest with a sequence number higher than the sum of the highest
acknowledged sequence number and the minimum of cwnd and rwnd. acknowledged sequence number and the minimum of cwnd and rwnd.
4. HyStart++ Algorithm 4. HyStart++ Algorithm
4.1. Use of HyStart Delay Increase and Limited Slow Start 4.1. Summary
[HyStart] specifies two algorithms (a "Delay Increase" algorithm and [HyStart] specifies two algorithms (a "Delay Increase" algorithm and
an "Inter-Packet Arrival" algorithm) to be run in parallel to detect an "Inter-Packet Arrival" algorithm) to be run in parallel to detect
that the sending rate has reached capacity. In practice, the Inter- that the sending rate has reached capacity. In practice, the Inter-
Packet Arrival algorithm does not perform well and is not able to Packet Arrival algorithm does not perform well and is not able to
detect congestion early, primarily due to ACK compression. The idea detect congestion early, primarily due to ACK compression. The idea
of the Delay Increase algorithm is to look for RTT spikes, which of the Delay Increase algorithm is to look for RTT spikes, which
suggest that the bottleneck buffer is filling up. suggest that the bottleneck buffer is filling up.
After the HyStart "Delay Increase" algorithm triggers an exit from In HyStart++, a TCP sender uses traditional slow start and then uses
slow start, LSS (described in [RFC3742]) is used to increase Cwnd the "Delay Increase" algorithm to trigger an exit from slow start.
until congestion is observed. LSS is used because the HyStart exit But instead of using a congestion avoidance algorithm, the sender
is often premature as a result of RTT fluctuations or transient queue uses a Conservative Slow Start (CSS) algorithm to determine if the
buildup. LSS grows the cwnd fast but much slower than traditional exit was spurious. If the exit is determined to be spurious, slow
slow start. LSS helps avoid massive packet losses and subsequent start is resumed. If the exit is determined to be not spurious, the
time spent in loss recovery or retransmission timeout. sender enters congestion avoidance.
4.2. Algorithm Details 4.2. Algorithm Details
We assume that Appropriate Byte Counting (as described in [RFC3465]) We assume that Appropriate Byte Counting (as described in [RFC3465])
is in use and L is the cwnd increase limit. The choice of value of L is in use and L is the cwnd increase limit. The choice of value of L
is up to the implementation. is up to the implementation.
A round is chosen to be approximately the Round-Trip Time (RTT). A round is chosen to be approximately the Round-Trip Time (RTT).
Round can be approximated using sequence numbers as follows: Round can be approximated using sequence numbers as follows:
Define windowEnd as a sequence number initialized to SND.UNA Define windowEnd as a sequence number initialized to SND.UNA
When windowEnd is ACKed, the current round ends and windowEnd is When windowEnd is ACKed, the current round ends and windowEnd is
set to SND.NXT set to SND.NXT
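The round approximation above can be sketched in a few lines. This is an illustrative sketch only, not text from either draft revision: the class name RoundTracker and its on_ack signature are hypothetical, "windowEnd is ACKed" is interpreted here as the cumulative ACK number moving past windowEnd, and sequence number wraparound is ignored.

```python
class RoundTracker:
    """Approximates RTT rounds from sequence numbers (hypothetical helper).

    Assumption: plain integers stand in for TCP sequence numbers, so
    32-bit wraparound is not handled.
    """

    def __init__(self, snd_una):
        # windowEnd is initialized to SND.UNA.
        self.window_end = snd_una

    def on_ack(self, ack_seq, snd_nxt):
        """Return True if this ACK ends the current round."""
        # Assumption: windowEnd counts as ACKed once the cumulative
        # ACK number has moved past it.
        if ack_seq > self.window_end:
            # The round ends and windowEnd is set to SND.NXT.
            self.window_end = snd_nxt
            return True
        return False
```

A caller would invoke on_ack for every arriving ACK and reset its per-round RTT state whenever the method returns True.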
At the start of each round during slow start: At the start of each round during normal slow start and CSS:
lastRoundMinRTT = currentRoundMinRTT lastRoundMinRTT = currentRoundMinRTT
currentRoundMinRTT = infinity currentRoundMinRTT = infinity
rttSampleCount = 0 rttSampleCount = 0
For each arriving ACK in slow start, where N is the number of For each arriving ACK in slow start, where N is the number of
previously unacknowledged bytes acknowledged in the arriving ACK and previously unacknowledged bytes acknowledged in the arriving ACK:
w:
Update the cwnd Update the cwnd
cwnd = cwnd + min (N, L * SMSS) cwnd = cwnd + min (N, L * SMSS)
Keep track of minimum observed RTT Keep track of minimum observed RTT
currentRoundMinRTT = min(currentRoundMinRTT, currRTT) currentRoundMinRTT = min(currentRoundMinRTT, currRTT)
where currRTT is the measured RTT based on the incoming ACK where currRTT is the RTT sampled from the incoming ACK
rttSampleCount += 1 rttSampleCount += 1
For rounds where cwnd is at or higher than LOW_CWND and For rounds where cwnd is at or higher than LOW_CWND and
N_RTT_SAMPLE RTT samples have been obtained, check if delay N_RTT_SAMPLE RTT samples have been obtained, check if delay
increase triggers slow start exit increase triggers slow start exit
if (cwnd >= (LOW_CWND * SMSS) AND rttSampleCount >= if (cwnd >= (LOW_CWND * SMSS) AND rttSampleCount >=
N_RTT_SAMPLE) N_RTT_SAMPLE)
RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8,
MAX_RTT_THRESH) MAX_RTT_THRESH)
if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh))
ssthresh = cwnd cssBaselineMinRtt = currentRoundMinRTT
exit slow start and enter LSS exit slow start and enter CSS
For each arriving ACK in LSS, where N is the number of previously CSS lasts CSS_ROUNDS rounds. If the transition into CSS happens in
the middle of a round, that partial round counts towards the limit.
For each arriving ACK in CSS, where N is the number of previously
unacknowledged bytes acknowledged in the arriving ACK: unacknowledged bytes acknowledged in the arriving ACK:
K = cwnd / (LSS_DIVISOR * ssthresh) Update the cwnd
cwnd = max(cwnd + (min (N, L * SMSS) / K), CA_cwnd()) cwnd = cwnd + (min (N, L * SMSS) / CSS_GROWTH_DIVISOR)
CA_cwnd() denotes the cwnd that a congestion control algorithm would Keep track of minimum observed RTT
have increased to if congestion avoidance started instead of LSS.
LSS grows cwnd very fast but for long-lived flows in high BDP
networks, the congestion avoidance algorithm could increase cwnd much
faster. For example, CUBIC congestion avoidance [RFC8312] in convex
region can ramp up cwnd rapidly. Taking the max can help improve
performance when exiting slow start prematurely.
HyStart++ ends when congestion is observed. currentRoundMinRTT = min(currentRoundMinRTT, currRTT)
where currRTT is the sampled RTT from the incoming ACK
rttSampleCount += 1
For CSS rounds where N_RTT_SAMPLE RTT samples have been obtained,
check if current round's minRTT drops below baseline indicating
that HyStart exit was spurious.
if (currentRoundMinRTT < cssBaselineMinRtt)
cssBaselineMinRtt = infinity
resume slow start including HyStart++
If CSS_ROUNDS rounds are complete, enter congestion avoidance.
ssthresh = cwnd
If congestion is observed anytime during slow start or CSS, enter
congestion avoidance.
ssthresh = cwnd
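The right-hand (-02) version of the algorithm can be read as a small state machine with three phases: slow start, CSS, and congestion avoidance. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class and method names are hypothetical, round boundaries are reported by the caller, RTTs are in seconds, cwnd is in bytes, the defaults for SMSS, initial cwnd, and L are arbitrary, and the CSS increase uses integer division.

```python
import math

# Recommended constants from Section 4.3 (RTT thresholds in seconds).
LOW_CWND = 16
MIN_RTT_THRESH = 0.004
MAX_RTT_THRESH = 0.016
N_RTT_SAMPLE = 8
CSS_GROWTH_DIVISOR = 4
CSS_ROUNDS = 5

SLOW_START, CSS, CONG_AVOID = "slow_start", "css", "congestion_avoidance"


class HyStartPP:
    """Hypothetical sketch of the -02 slow start / CSS logic."""

    def __init__(self, smss=1460, initial_cwnd=14600, L=8):
        self.smss, self.L = smss, L      # L is the ABC increase limit
        self.cwnd = initial_cwnd         # bytes
        self.ssthresh = math.inf
        self.phase = SLOW_START
        self.last_round_min_rtt = math.inf
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0
        self.css_baseline_min_rtt = math.inf
        self.css_rounds = 0

    def on_round_boundary(self):
        """Caller signals that windowEnd was ACKed."""
        if self.phase == CONG_AVOID:
            return
        if self.phase == CSS:
            self.css_rounds += 1         # a partial first round counts
            if self.css_rounds >= CSS_ROUNDS:
                self._enter_congestion_avoidance()
                return
        self.last_round_min_rtt = self.current_round_min_rtt
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0

    def on_ack(self, acked_bytes, rtt):
        if self.phase == CONG_AVOID:
            return
        increase = min(acked_bytes, self.L * self.smss)
        if self.phase == CSS:
            increase //= CSS_GROWTH_DIVISOR
        self.cwnd += increase
        self.current_round_min_rtt = min(self.current_round_min_rtt, rtt)
        self.rtt_sample_count += 1
        if self.rtt_sample_count < N_RTT_SAMPLE:
            return
        if self.phase == SLOW_START:
            if self.cwnd >= LOW_CWND * self.smss:
                thresh = min(max(self.last_round_min_rtt / 8,
                                 MIN_RTT_THRESH), MAX_RTT_THRESH)
                # In the first round lastRoundMinRTT is infinity, so
                # this comparison cannot succeed.
                if self.current_round_min_rtt >= self.last_round_min_rtt + thresh:
                    self.css_baseline_min_rtt = self.current_round_min_rtt
                    self.css_rounds = 0
                    self.phase = CSS
        elif self.current_round_min_rtt < self.css_baseline_min_rtt:
            # RTT fell below the baseline: the exit was spurious.
            self.css_baseline_min_rtt = math.inf
            self.phase = SLOW_START

    def on_congestion(self):
        if self.phase != CONG_AVOID:
            self._enter_congestion_avoidance()

    def _enter_congestion_avoidance(self):
        self.ssthresh = self.cwnd
        self.phase = CONG_AVOID
```

In this sketch the caller feeds every ACK to on_ack, calls on_round_boundary when windowEnd is ACKed, and calls on_congestion on loss or any other congestion signal. Growth after the transition to congestion avoidance is left to the underlying congestion control algorithm.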
4.3. Tuning constants 4.3. Tuning constants
It is RECOMMENDED that a HyStart++ implementation use the following It is RECOMMENDED that a HyStart++ implementation use the following
constants: constants:
LOW_CWND = 16 LOW_CWND = 16
MIN_RTT_THRESH = 4 msec MIN_RTT_THRESH = 4 msec
MAX_RTT_THRESH = 16 msec MAX_RTT_THRESH = 16 msec
LSS_DIVISOR = 0.25
N_RTT_SAMPLE = 8 N_RTT_SAMPLE = 8
CSS_GROWTH_DIVISOR = 4
CSS_ROUNDS = 5
These constants have been determined with lab measurements and real These constants have been determined with lab measurements and real
world deployments. An implementation MAY tune them for different world deployments. An implementation MAY tune them for different
network characteristics. network characteristics.
Using smaller values of LOW_CWND will cause the algorithm to kick in Using smaller values of LOW_CWND will cause the algorithm to kick in
before the last round RTT can be measured, particularly if the before the last round RTT can be measured, particularly if the
implementation uses an initial cwnd of 10 MSS. Higher values will implementation uses an initial cwnd of 10 MSS. Higher values will
delay the detection of delay increase and reduce the ability of delay the detection of delay increase and reduce the ability of
HyStart++ to prevent overshoot problems. HyStart++ to prevent overshoot problems.
skipping to change at page 6, line 18 skipping to change at page 6, line 43
MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious
exits from slow start. Larger values of MAX_RTT_THRESH may result in exits from slow start. Larger values of MAX_RTT_THRESH may result in
slow start not exiting until loss is encountered for connections on slow start not exiting until loss is encountered for connections on
large RTT paths. large RTT paths.
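To make the effect of the two bounds concrete, the RttThresh computation from Section 4.2 can be evaluated for a few path RTTs. The helper name rtt_thresh_ms and the use of milliseconds are assumptions made for this sketch. The defaults are the recommended MIN_RTT_THRESH and MAX_RTT_THRESH.

```python
def rtt_thresh_ms(last_round_min_rtt_ms, lo=4.0, hi=16.0):
    """RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, MAX_RTT_THRESH)."""
    return max(lo, min(last_round_min_rtt_ms / 8, hi))


# Short path: 20 / 8 = 2.5 msec is raised to the 4 msec floor.
short_path = rtt_thresh_ms(20)
# Medium path: 80 / 8 = 10 msec lies inside the bounds and is used as is.
medium_path = rtt_thresh_ms(80)
# Long path: 200 / 8 = 25 msec is capped at the 16 msec ceiling.
long_path = rtt_thresh_ms(200)
```

With the recommended bounds, one eighth of the previous round's minimum RTT is used directly only for RTTs between 32 and 128 msec. Below that range the floor applies, and above it the ceiling applies.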
A TCP implementation is required to take at least one RTT sample each A TCP implementation is required to take at least one RTT sample each
round. Using lower values of N_RTT_SAMPLE will lower the accuracy of round. Using lower values of N_RTT_SAMPLE will lower the accuracy of
the measured RTT for the round; higher values will improve accuracy the measured RTT for the round; higher values will improve accuracy
at the cost of more processing. at the cost of more processing.
The maximum value of LSS_DIVISOR SHOULD NOT exceed 0.5, which is the The minimum value of CSS_GROWTH_DIVISOR SHOULD be at least 2.
value recommended in [RFC3742]. Otherwise the cwnd growth could Otherwise the cwnd growth could again become too aggressive and cause
again become too aggressive and cause ideal send rate overshoot. ideal send rate overshoot. Values larger than 4 will cause the
Smaller values will cause the algorithm to be less aggressive and may algorithm to be less aggressive and may be less performant.
leave some cwnd growth on the table.
Smaller values of CSS_ROUNDS may miss detecting jitter and larger
values may limit performance.
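A rough calculation shows how CSS_GROWTH_DIVISOR and CSS_ROUNDS interact in the -02 text. It is a simplification: it assumes every byte of cwnd is ACKed once per round and that no single ACK exceeds L * SMSS, so slow start doubles cwnd each round while CSS multiplies it by 1 + 1/CSS_GROWTH_DIVISOR. The function names are hypothetical.

```python
def css_growth_per_round(divisor):
    """Per-round cwnd multiplier in CSS under the stated assumptions."""
    return 1 + 1 / divisor


def growth_over_css(divisor=4, rounds=5):
    """Total cwnd multiplier if CSS runs for all of its rounds."""
    return css_growth_per_round(divisor) ** rounds


per_round_recommended = css_growth_per_round(4)   # 1.25x per round
per_round_minimum = css_growth_per_round(2)       # 1.5x per round
total_recommended = growth_over_css()             # 1.25 ** 5
```

Under these assumptions, the recommended values let cwnd grow by roughly a factor of three across a full CSS phase, compared with a factor of 32 for five rounds of unmodified slow start.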
An implementation SHOULD use HyStart++ only for the initial slow An implementation SHOULD use HyStart++ only for the initial slow
start and fall back to using traditional slow start for the remainder start (when ssthresh is at its initial value of arbitrarily high per
of the connection lifetime. This is acceptable because subsequent
slow starts will use the discovered ssthresh value to exit slow [RFC5681]) and fall back to using traditional slow start for the
start. An implementation MAY use HyStart++ to grow the restart remainder of the connection lifetime. This is acceptable because
window ([RFC5681]) after a long idle period. subsequent slow starts will use the discovered ssthresh value to exit
slow start and avoid the overshoot problem. An implementation MAY
use HyStart++ to grow the restart window ([RFC5681]) after a long
idle period.
5. Deployments and Performance Evaluations 5. Deployments and Performance Evaluations
As of the time of writing, HyStart++ has been default enabled for all As of the time of writing, HyStart++ has been default enabled for all
TCP connections in Windows for two years. The original HyStart has TCP connections in Windows for two years. The original HyStart has
been default-enabled for all TCP connections in Linux TCP for a been default-enabled for all TCP connections in Linux TCP for a
decade. decade.
In lab measurements with Windows TCP, HyStart++ shows both goodput In lab measurements with Windows TCP, HyStart++ shows both goodput
improvements as well as reductions in packet loss and improvements as well as reductions in packet loss and
skipping to change at page 7, line 14 skipping to change at page 7, line 43
6. Security Considerations 6. Security Considerations
HyStart++ enhances slow start and inherits the general security HyStart++ enhances slow start and inherits the general security
considerations discussed in [RFC5681]. considerations discussed in [RFC5681].
7. IANA Considerations 7. IANA Considerations
This document has no actions for IANA. This document has no actions for IANA.
8. Acknowledgements 8. References
Neal Cardwell suggested the idea of using the maximum of cwnd value
computed by LSS and congestion avoidance after exiting slow start.
9. References
9.1. Normative References 8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
2003, <https://www.rfc-editor.org/info/rfc3465>. 2003, <https://www.rfc-editor.org/info/rfc3465>.
[RFC3742] Floyd, S., "Limited Slow-Start for TCP with Large
Congestion Windows", RFC 3742, DOI 10.17487/RFC3742, March
2004, <https://www.rfc-editor.org/info/rfc3742>.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<https://www.rfc-editor.org/info/rfc5681>. <https://www.rfc-editor.org/info/rfc5681>.
9.2. Informative References 8.2. Informative References
[HyStart] Ha, S. and I. Rhee, "Hybrid Slow Start for High-Bandwidth [HyStart] Ha, S. and I. Rhee, "Hybrid Slow Start for High-Bandwidth
and Long-Distance Networks", and Long-Distance Networks",
DOI 10.1145/1851182.1851192, International Workshop on DOI 10.1145/1851182.1851192, International Workshop on
Protocols for Fast Long-Distance Networks, 2008, Protocols for Fast Long-Distance Networks, 2008,
<https://pdfs.semanticscholar.org/25e9/ <https://pdfs.semanticscholar.org/25e9/
ef3f03315782c7f1cbcd31b587857adae7d1.pdf>. ef3f03315782c7f1cbcd31b587857adae7d1.pdf>.
[RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
RFC 8312, DOI 10.17487/RFC8312, February 2018,
<https://www.rfc-editor.org/info/rfc8312>.
Authors' Addresses Authors' Addresses
Praveen Balasubramanian Praveen Balasubramanian
Microsoft Microsoft
One Microsoft Way One Microsoft Way
Redmond, WA 98052 Redmond, WA 98052
USA USA
Phone: +1 425 538 2782 Phone: +1 425 538 2782
Email: pravb@microsoft.com Email: pravb@microsoft.com
 End of changes. 29 change blocks. 
74 lines changed or deleted 86 lines changed or added
