< draft-fairhurst-tsvwg-cc-00.txt   draft-fairhurst-tsvwg-cc-01.txt >
Internet Engineering Task Force G. Fairhurst Internet Engineering Task Force G. Fairhurst
Internet-Draft University of Aberdeen Internet-Draft University of Aberdeen
Intended status: Standards Track July 01, 2019 Intended status: Standards Track July 05, 2019
Expires: January 2, 2020 Expires: January 6, 2020
Guidelines for Internet Congestion Control at Endpoints Guidelines for Internet Congestion Control at Endpoints
draft-fairhurst-tsvwg-cc-00 draft-fairhurst-tsvwg-cc-01
Abstract Abstract
This document provides guidance on the design of methods to avoid This document provides guidance on the design of methods to avoid
congestion collapse and to provide congestion control. congestion collapse and to provide congestion control.
Recommendations and requirements on this topic are distributed across Recommendations and requirements on this topic are distributed across
many documents in the RFC series. It seeks to gather and consolidate many documents in the RFC series. It seeks to gather and consolidate
these recommendations. This is intended to provide input to the these recommendations. This is intended to provide input to the
design of new congestion control methods in protocols, such as IETF design of new congestion control methods in protocols, such as IETF
QUIC. QUIC.
skipping to change at page 1, line 38 skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 2, 2020. This Internet-Draft will expire on January 6, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 16 skipping to change at page 2, line 16
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Principles of Congestion Control . . . . . . . . . . . . . . 3 3. Principles of Congestion Control . . . . . . . . . . . . . . 3
3.1. A Diversity of Path Characteristics . . . . . . . . . . . 3 3.1. A Diversity of Path Characteristics . . . . . . . . . . . 3
3.2. Flow Multiplexing and Congestion . . . . . . . . . . . . 4 3.2. Flow Multiplexing and Congestion . . . . . . . . . . . . 4
3.3. Avoiding Congestion Collapse . . . . . . . . . . . . . . 6 3.3. Avoiding Congestion Collapse . . . . . . . . . . . . . . 5
4. Guidelines for performing Congestion Control . . . . . . . . 6 4. Guidelines for Performing Congestion Control . . . . . . . . 6
4.1. Connection Initialization . . . . . . . . . . . . . . . . 6 4.1. Connection Initialization . . . . . . . . . . . . . . . . 6
4.2. Timers and Retranmission . . . . . . . . . . . . . . . . 8 4.2. Using Path Capacity . . . . . . . . . . . . . . . . . . . 7
4.3. Using Path Capacity . . . . . . . . . . . . . . . . . . . 9 4.3. Timers and Retransmission . . . . . . . . . . . . . . . . 9
4.4. Responding to Potential Congestion . . . . . . . . . . . 10 4.4. Responding to Potential Congestion . . . . . . . . . . . 10
4.5. Using More Capacity . . . . . . . . . . . . . . . . . . . 11 4.5. Using More Capacity . . . . . . . . . . . . . . . . . . . 11
4.6. Network Signals . . . . . . . . . . . . . . . . . . . . . 12 4.6. Network Signals . . . . . . . . . . . . . . . . . . . . . 12
4.7. Protection of Protocol Mechanisms . . . . . . . . . . . . 13 4.7. Protection of Protocol Mechanisms . . . . . . . . . . . . 13
5. IETF guidelines on evaluation of Congestion Control . . . . . 13 5. IETF Guidelines on Evaluation of Congestion Control . . . . . 13
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
8. Security Considerations . . . . . . . . . . . . . . . . . . . 14 8. Security Considerations . . . . . . . . . . . . . . . . . . . 14
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 14
9.1. Normative References . . . . . . . . . . . . . . . . . . 14 9.1. Normative References . . . . . . . . . . . . . . . . . . 14
9.2. Informative References . . . . . . . . . . . . . . . . . 15 9.2. Informative References . . . . . . . . . . . . . . . . . 15
Appendix A. Revision Notes . . . . . . . . . . . . . . . . . . . 17 Appendix A. Revision Notes . . . . . . . . . . . . . . . . . . . 17
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 17 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 17
1. Introduction 1. Introduction
The IETF has specified Internet transports (e.g., TCP [ID.ietf-tcpm- The IETF has specified Internet transports (e.g., TCP [ID.ietf-tcpm-
skipping to change at page 3, line 11 skipping to change at page 3, line 11
many documents in the RFC series. This document seeks to gather and many documents in the RFC series. This document seeks to gather and
consolidate these recommendations. This is intended to provide input consolidate these recommendations. This is intended to provide input
to the design of new congestion control methods in protocols. to the design of new congestion control methods in protocols.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
Other terminology is directly copied the cited RFCs. The path between endpoints (sometimes called "Internet Hosts")
consists of the endpoint protocol stack at the sender and receiver
(which implements the transport service), and a succession of links
and network devices (routers or middleboxes) that provide
connectivity across the network. The set of network devices forming
the path is not usually fixed, and it should generally be assumed
that this set can change over arbitrary lengths of time.
Other terminology is directly copied from the cited RFCs.
3. Principles of Congestion Control 3. Principles of Congestion Control
This section describes principles for providing congestion control. This section summarises the principles for providing congestion
control, and provides the background for Section 4.
3.1. A Diversity of Path Characteristics 3.1. A Diversity of Path Characteristics
The path between endpoints (sometimes called "internet Hosts")
consists of the endpoint protocol stack (implementing the transport)
and a succession of links and network devices (routers or
middleboxes) that provide connectivity across the network.
Internet transports do not usually rely upon prior reservation of Internet transports do not usually rely upon prior reservation of
capacity along the path they use. In the absence of such a resource capacity along the path they use. In the absence of such a resource
reservation, endpoints are unable to determine a safe rate start or reservation, endpoints are unable to determine a safe rate at which
continue their transmission. The use of an Internet path therefore to start or continue their transmission. The use of an Internet path
requires a combination of end-to-end transport mechanisms to detect therefore requires a combination of end-to-end transport mechanisms
and respond to changes in the capacity available across the network to detect and respond to changes in the capacity available across the
path. Buffering (an increase in latency) or loss (discard of a network path. Buffering (an increase in latency) or loss (discard of
packet) arises when the traffic arriving at a link or network exceed a packet) arises when the traffic arriving at a link or network
the resources available. A network device that does not support exceeds the resources available.
Active Queue Management (AQM) [RFC7567] typically uses a drop-tail
policy to drop excess IP packets when its queue becomes full.
Although losses are not always due to congestion (loss may be due to
link corruption, receiver overrun, etc. [RFC3819]), endpoints have
to conservatively presume that loss is potentially due to congestion
and reduce the sending rate of their flows to reflect the available
capacity.
A path that is not congested can still experience an increased A network device that does not support Active Queue Management (AQM)
latency when the path multiplexes the traffic of multiple flows, and/ [RFC7567] typically uses a drop-tail policy to drop excess IP packets
or when the level of traffic is (transiently) higher than the when its queue becomes full. Although losses are not always due to
currently available capacity. As with loss, latency can also be congestion (loss may be due to link corruption, receiver overrun,
incurred for other reasons [RFC3819] (e.g. Quality of service etc. [RFC3819]), endpoint congestion control has to conservatively
scheduling, link radio resource management/bandwdith on demand, assume that any loss is potentially due to congestion and then reduce
transient outages, link retransmission, and connection/resource setup the sending rate of their flows to reflect the available capacity.
below the IP layer).
The use of a path impacts any flows (possibly from or to other The use of a path to send packets impacts any flows (possibly from or
endpoints) that share (multiplex their data) over a common network to other endpoints) that share the capacity (i.e., multiplex packets)
device or link. using a common network device or link. Even when a path is not
congested, flows can still experience an increased latency when the
path multiplexes traffic belonging to multiple flows. As with loss,
latency can also be incurred for other reasons [RFC3819] (Quality of
Service link scheduling, link radio resource management/bandwidth on
demand, transient outages, link retransmission, and connection/
resource setup below the IP layer, etc).
Principles include: Principles include:
o The design of a congestion controller needs to consider the wide o The design is REQUIRED to be robust to a change in the set of devices
range of path characteristics presented by the variety of Internet on the network path. A reconfiguration, reset or other event
paths. Transports MUST be designed such that they operate safely could interrupt the path or trigger a change in the set of network
and effectively over common paths. devices forming the path.
o An endpoint cannot assume that a particular packet header is
passed transparently by the path or a particular forwarding
treatment applies. The supported set of packet headers and
forwarding can also change once a flow has commenced.
o A design MUST assume path characteristics can change over o Transports are REQUIRED to operate safely over the wide range of
relatively short intervals of time (i.e. characteristics path characteristics presented by Internet paths.
discovered do not necessarily remain valid for multiple Round Trip
Times, RTTs). In particular, they need to measure and adapt to
the path(s) they use path capacity, and the estimated RTT used for
timers.
o A design MUST assume that the set of network devices encountered o The path characteristics can change over relatively short
along a path can change with time. This MUST be robust to intervals of time (i.e., characteristics discovered do not
reconfiguration of network devices, reset of devices necessarily remain valid for multiple Round Trip Times, RTTs). In
particular, a sender SHOULD measure and adapt to the
characteristics of the path(s) they use.
3.2. Flow Multiplexing and Congestion 3.2. Flow Multiplexing and Congestion
It is normal to observe some perturbation in latency or loss to It is normal to observe some perturbation in latency or loss to
traffic when it shares a common network bottleneck with other traffic when it shares a common network bottleneck with other
traffic. This impact needs to be considered and Internet flows ought traffic. This impact needs to be considered and Internet flows ought
to implement appropriate safeguards to avoid inappropriate impact on to implement appropriate safeguards to avoid inappropriate impact on
other flows that share the resources along a path. Congestion other flows that share the resources along a path. Congestion
control methods satisfy this requirement and therefore also avoid control methods satisfy this requirement and therefore avoid
congestion collapse [Briscoe, B., "Flow Rate Fairness: Dismantling a congestion collapse [Briscoe, B., "Flow Rate Fairness: Dismantling a
Religion", ACM CCR, 2007]. Religion", ACM CCR, 2007].
An endpoint can become aware of congestion by various means. A
signal that indicates congestion on the end-to-end network path must
result in a congestion control reaction by the transport to reduce
the maximum rate permitted by the sending endpoint [RFC8087].
Internet transports should react to avoid congestion that impacts Internet transports should react to avoid congestion that impacts
other flows sharing the path, and need to be designed to avoid other flows sharing a path, and need to be designed to avoid starving
starving other flows of capacity. This could include methods seeking other flows of capacity. This could include methods seeking to
to equally distribute resources between sharing flows, but this is equally distribute resources between sharing flows, but this is
explicitly not a requirement for design of network devices. explicitly not a requirement for a network device.
The Requirements for Internet Hosts [RFC1122] formally mandates that The Requirements for Internet Hosts [RFC1122] formally mandates that
endpoints perform congestion control. "Because congestion control is endpoints perform congestion control. "Because congestion control is
critical to the stable operation of the Internet, applications and critical to the stable operation of the Internet, applications and
other protocols that choose to use UDP as an Internet transport must other protocols that choose to use UDP as an Internet transport must
employ mechanisms to prevent congestion collapse and to establish employ mechanisms to prevent congestion collapse and to establish
some degree of fairness with concurrent traffic [RFC2914]. They may some degree of fairness with concurrent traffic [RFC2914]. They may
also need to implement additional mechanisms, depending on how they also need to implement additional mechanisms, depending on how they
use UDP" [RFC8085]. [RFC2309] also discussed the dangers of use UDP" [RFC8085]. [RFC2309] also discussed the dangers of
congestion-unresponsive flows and states that "all UDP-based congestion-unresponsive flows, and states that "all UDP-based
streaming applications should incorporate effective congestion streaming applications should incorporate effective congestion
avoidance mechanisms." [RFC7567] and [RFC8085] reaffirm this. avoidance mechanisms." [RFC7567] and [RFC8085] reaffirm this.
An endpoint can become aware of congestion by various means. A The general recommendation in the UDP Guidelines [RFC8085] is that
signal that indicates congestion on the end-to-end network path, must applications SHOULD leverage existing congestion control techniques,
result in a congestion control reaction by the transport to reduce such as those defined for TCP [RFC5681], TFRC [RFC5348], SCTP
the maximum rate permitted by the sending endpoint [RFC8087]. [RFC4960], and other IETF-defined transports. This is because there
are many trade offs and details that can have a serious impact on the
The general recommendation in the UDP Guidelines [RFC8085] is performance of congestion control for the application they support
therefore that applications SHOULD leverage existing congestion and other traffic that seeks to share the resources along the path
control techniques, such as those defined for TCP [RFC5681], TFRC over which they communicate.
[RFC5348], SCTP [RFC4960], and other IETF-defined transports. This
is because there are many trade offs and details that can have a
serious impact on the performance of congestion control for the
application they support and other traffic that seeks to share the
resources along the path over which they communicate.
Section 3.6 notes that by default, IETF specifications target Experience has shown that successful protocols developed in a
deployment on the general Internet. Experience has however shown specific context or for a particular application tend to also become
that successful protocols developed in one specific context or for a used in a wider range of contexts. Therefore, IETF specifications by
particular application tend to become used in a wider range of default target deployment on the general Internet, or need to be
contexts. Experience has however shown that successful protocols defined for use only within a controlled environment.
developed in one specific context or for a particular application
tend to become used in a wider range of contexts.
Principles include: Principles include:
o [RFC1122] mandates that endpoints perform congestion control. o [RFC1122] mandates that endpoints perform congestion control.
o Transports need to avoid inducing flow starvation to other flows o Transports MUST avoid inducing flow starvation to other flows that
sharing resources along the path they use. share resources along the path they use.
o "If an application or protocol chooses not to use a congestion- o "If an application or protocol chooses not to use a congestion-
controlled transport protocol, it SHOULD control the rate at which controlled transport protocol, it SHOULD control the rate at which
it sends UDP datagrams to a destination host, in order to fulfil it sends UDP datagrams to a destination host, in order to fulfil
the requirements of [RFC2914]", as stated in [RFC8085]. the requirements of [RFC2914]", as stated in [RFC8085].
o Transports that do not target Internet deployment need to be o Transports that do not target Internet deployment need to be
constrained to only operate in a controlled environment (e.g. see constrained to only operate in a controlled environment (e.g. see
Section 3.6 of [RFC8085]) and provide appropriate mechanisms to Section 3.6 of [RFC8085]) and provide appropriate mechanisms to
prevent traffic accidentally leaving the controlled environment prevent traffic accidentally leaving the controlled environment
skipping to change at page 6, line 16 skipping to change at page 6, line 5
A significant pathology can arise when a poorly designed transport A significant pathology can arise when a poorly designed transport
creates congestion. This can result in severe service degradation or creates congestion. This can result in severe service degradation or
"Internet meltdown". This phenomenon was first observed during the "Internet meltdown". This phenomenon was first observed during the
early growth phase of the Internet in the mid 1980s [RFC896] early growth phase of the Internet in the mid 1980s [RFC896]
[RFC970]; it is technically called "congestion collapse" and was a [RFC970]; it is technically called "congestion collapse" and was a
key focus of [RFC2309]. key focus of [RFC2309].
o Endpoints MUST control their flows to avoid Congestion Collapse. o Endpoints MUST control their flows to avoid Congestion Collapse.
o Endpoints MUST employ exponential backoff to their traffic when o A sender SHOULD measure and adapt the protocol timers to the
they detect persistent congestion. measured path RTT.
o When an endpoint detects persistent congestion, it MUST employ
exponential backoff to the maximum rate (congestion window).
o Endpoints MUST treat a loss of all feedback (e.g., RTO expiry) as o Endpoints MUST treat a loss of all feedback (e.g., RTO expiry) as
a tentative indication of congestion collapse, reacting until the an indication of persistent congestion (an indication of potential
path characteristics can again be confirmed. congestion collapse), until the path characteristics can again be
confirmed.
o Network devices should provide mechanisms to avoid congestion o Network devices can provide mechanisms to mitigate the impact of
collapse (e.g., priority forwarding of control information, and congestion collapse by transport flows (e.g., priority forwarding
starvation detection and protection [RFC7567]). of control information, and starvation detection) to mitigate the
impact of non-conformant and malicious flows [RFC7567].
4. Guidelines for performing Congestion Control 4. Guidelines for Performing Congestion Control
This section provides guidance for designers of a new transport This section provides guidance for designers of a new transport
protocol that decide to implement congestion control and its protocol that decide to implement congestion control and its
associated mechanisms. associated mechanisms.
4.1. Connection Initialization 4.1. Connection Initialization
When a connection or flow to a new destination is established, the When a connection or flow to a new destination is established, the
endpoints have little information about the characteristics of the endpoints have little information about the characteristics of the
network path. This section describes how a flow starts transmission network path. This section describes how a flow starts transmission
over such a path. over such a path.
Flow Start: A new flow between a local and a remote endpoint cannot Flow Start: A new flow between two endpoints cannot assume that
assume that capacity is available at the start of the flow, unless capacity is available at the start of the flow, unless it uses a
it uses a mechanism to explicitly reserve capacity. In the mechanism to explicitly reserve capacity. In the absence of a
absence of a capacity signal, a flow MUST therefore start slowly. capacity signal, a flow MUST therefore start slowly.
The slow-start algorithm is the accepted standard for flow startup The slow-start algorithm is the accepted standard for flow startup
[RFC5681]. TCP uses the notion of an Initial Window (IW [RFC3390] [RFC5681]. TCP uses the notion of an Initial Window (IW [RFC3390]
updated by [RFC6928]) to define the initial volume of data that updated by [RFC6928]) to define the initial volume of data that
can be sent on a path. This is not the smallest burst, or the can be sent on a path. This is not the smallest burst, or the
smallest window - it is considered a safe starting point for a smallest window - it is considered a safe starting point for a
network that is not suffering persistent congestion, and network that is not suffering persistent congestion, and
applicable until feedback about the path is received. This applicable until feedback about the path is received. This
initial sending rate needs to be viewed as tentative until the initial sending rate needs to be viewed as tentative until the
capacity is confirmed to be available. capacity is confirmed to be available.
Initial RTO: When a flow sends the first packet it typically has no Initial RTO Interval: When a flow sends the first packet it
way to know the actual RTT of the path it uses. The values used typically has no way to know the actual RTT of the path it uses.
to initialise the Retransmission Timeout (RTO) is therefore a The initial value used for the Retransmission Timeout (RTO) is
trade off that has important consequences on the overall Internet therefore a trade off that has important consequences on the
stability [RFC6928] [RFC8085]. In the absence of any knowledge overall Internet stability [RFC6928] [RFC8085]. In the absence of
about the latency of a path, the RTO MUST be conservatively set to any knowledge about the latency of a path, the RTO MUST be
no less than 1 second. Values shorter than 1 second can be conservatively set to no less than 1 second. Values shorter than
problematic (see the appendix of [RFC6298]). 1 second can be problematic (see the appendix of [RFC6298]).
Initial RTO Expiry: If the RTO timer expires while awaiting Initial RTO Expiry: If the RTO timer expires while awaiting
completion of the connection setup (in TCP, the ACK of a SYN completion of the connection setup (in TCP, the ACK of a SYN
segment), and the implementation is using an RTO less than 3 segment), and the implementation is using an RTO less than 3
seconds, the sender can resend the connection setup. The RTO MUST seconds, the local endpoint can resend the connection setup. The
then be re-initialized to increase it to 3 seconds when data RTO MUST then be re-initialized to increase it to 3 seconds when
transmission begins (i.e., after the three-way handshake data transmission begins (i.e., after the three-way handshake
completes) [RFC6298] [RFC8085]. This conservative increase is completes) [RFC6298] [RFC8085]. This conservative increase is
necessary to avoid congestion collapse when many flows retransmit necessary to avoid congestion collapse when many flows retransmit
across a shared bottleneck with restricted capacity. across a shared bottleneck with restricted capacity.
Initial Measured RTO: Once an RTT measurement is available (e.g., Initial Measured RTO: Once an RTT measurement is available (e.g.,
through reception of an acknowledgement), this value must be through reception of an acknowledgement), this value must be
adjusted, and MUST take into account the RTT variance. For the adjusted, and MUST take into account the RTT variance. For the
first (sample this variance cannot be determined, and a sender first sample this variance cannot be determined, and a local
must therefore initialise the variance to RTT/2 (see equation 2.2 endpoint must therefore initialise the variance to RTT/2 (see
of [RFC6928] and related text for UDP in section 3.1.1 of equation 2.2 of [RFC6928] and related text for UDP in section
[RFC8085]). 3.1.1 of [RFC8085]).
Current State: A congestion controller MAY assume that recently used Current State: A congestion controller MAY assume that recently used
capacity between a pair of endpoint addresses is an indication of capacity between a pair of endpoint addresses is an indication of
capacity available in the next RTT between the same endpoints (and capacity available in the next RTT between the same endpoints (and
react accordingly if this is not confirmed to be true). react accordingly if this is not confirmed to be true).
Cached State: An endpoint that has recently used the same path Cached State: A congestion controller that recently used a path
between a local and remote endpoint could also have additional could use additional state that lets a flow take-over the capacity
state that lets a flow take-over utilising the capacity that was that was previously consumed by another flow (e.g., in the last
previously consumed (e.g., in the last RTT) by another flow. In RTT). In TCP, this mechanism is referred to as TCP Control Block
TCP, this mechanism is referred to as TCP Control Block (TCB) (TCB) sharing [RFC2140] [ID.ietf-tcpm-2140bis]. This and other
sharing [RFC2140] [ID.ietf-tcpm-2140bis]. This and other
information can be used to suggest a faster initial sending rate, information can be used to suggest a faster initial sending rate,
but MUST be viewed as tentative until the capacity is confirmed to but MUST be viewed as tentative until the capacity is confirmed to
be available. A sender MUST reduce its rate if the actual used be available. A sender MUST reduce its rate if this capacity is
capacity is not confirmed within the current RTT interval. not confirmed within the current RTO interval.
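The RTO initialisation rules in this section can be sketched as follows. This is a minimal illustration, not part of either draft revision: the class and method names are hypothetical, the arithmetic follows the computation in Section 2 of RFC 6298 (K = 4), and keeping the 1 second floor after the first measurement is an assumption based on that RFC's rounding recommendation.

```python
# Illustrative sketch only: class and method names are hypothetical.
INITIAL_RTO = 1.0      # seconds; floor used when the path RTT is unknown
SETUP_RETRY_RTO = 3.0  # seconds; applied after a timed-out connection setup
K = 4                  # variance multiplier from RFC 6298


class RtoState:
    def __init__(self, clock_granularity=0.0):
        # clock_granularity is an assumed timer resolution, in seconds.
        self.clock_granularity = clock_granularity
        self.srtt = None      # smoothed RTT, unknown until the first sample
        self.rttvar = None    # RTT variance, unknown until the first sample
        self.rto = INITIAL_RTO
        self.setup_timed_out = False

    def on_setup_timeout(self):
        """Record that the RTO expired while awaiting connection setup."""
        self.setup_timed_out = True

    def on_data_start(self):
        """Raise the RTO to 3 seconds if setup timed out with a smaller RTO."""
        if self.setup_timed_out and self.rto < SETUP_RETRY_RTO:
            self.rto = SETUP_RETRY_RTO

    def on_first_rtt_sample(self, rtt):
        """Seed the estimator; the variance is initialised to RTT/2."""
        self.srtt = rtt
        self.rttvar = rtt / 2
        candidate = self.srtt + max(self.clock_granularity, K * self.rttvar)
        # Assumption: keep the conservative 1 second floor after measurement.
        self.rto = max(INITIAL_RTO, candidate)
```

With a first RTT sample of 0.5 seconds, this sketch sets the variance to 0.25 seconds and the RTO to 1.5 seconds.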
4.2. Timers and Retransmission
This section describes mechanisms to detect and provide
retransmission, and to protect the network in the absence of timely
feedback.
Loss Detection: Loss detection occurs after a sender determines
there is no delivery confirmation within an expected period of
time. Retransmission mechanisms MAY utilise a measure of the RTT
of a path to detect loss before the period specified by the RTO
[RFC8085].
Detection can also be performed using the time-ordering of
transmission (as in TCP DupACK), or a combination of using a timer
and ordering information to trigger retransmission of data
[ID.ietf-tcpm-rack-05].
Retransmission: Retransmission of lost packets or messages is a
common reliability mechanism. When a loss is detected, the sender
can choose to retransmit the lost data, ignore the loss, or send
other data. Any transmission consumes network capacity, therefore
retransmissions MUST NOT increase the network load in response to
congestion loss (which worsens that congestion) [RFC8085]. Any
method that sends additional data following loss is responsible
for congestion control of the retransmissions (and any other
packets sent) as well as the original traffic.
Maintaining the RTO: Once an endpoint is communicating with its peer,
the RTO MUST be adjusted by measuring the RTT and its variance
(see equation 2.3 of [RFC6928]). The RTO SHOULD be set based on
recent observations [RFC8530].
RTO Expiry: Persistent lack of feedback detected by the RTO (or
other means) must be used as an indication of potential congestion.
A failure to receive any specific response within an RTO interval
could potentially be a result of an RTT change, change of path,
excessive loss, or even congestion collapse.
If there is no response within the timeout period (often called
the RTO interval), TCP collapses the congestion window to one
segment [RFC5681]. Other transports must similarly respond when
they detect loss of feedback.
An RTO expiry requires an exponential increase in the size of the
timeout interval [RFC8085]. When the retransmission timer
expires, the RTO MUST be set to RTO * 2 ("back off the timer")
[RFC6298] [RFC8085]. A maximum value MAY be placed on the RTO.
This maximum RTO MUST NOT be less than 60 seconds [RFC6298].
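The timer back-off described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the congestion window is counted in whole segments, and the optional cap defaults to the 60 second minimum permitted by the text.

```python
# Illustrative sketch only: function names are hypothetical.
MIN_MAX_RTO = 60.0  # seconds; any upper bound on the RTO must be at least this


def backed_off_rto(rto, max_rto=MIN_MAX_RTO):
    """Double the RTO after an expiry, honouring an optional upper bound.

    Pass max_rto=None to leave the RTO unbounded.
    """
    if max_rto is not None and max_rto < MIN_MAX_RTO:
        raise ValueError("maximum RTO must not be less than 60 seconds")
    doubled = rto * 2
    return doubled if max_rto is None else min(doubled, max_rto)


def on_rto_expiry(rto, cwnd_segments, max_rto=MIN_MAX_RTO):
    """Return the new (rto, cwnd) pair after the retransmission timer expires.

    The congestion window collapses to one segment regardless of its
    previous size, as TCP does when no feedback arrives within the RTO.
    """
    return backed_off_rto(rto, max_rto), 1
```

Starting from a 1 second RTO, successive expiries in this sketch give 2, 4, 8, 16 and 32 seconds, and then 60 seconds once the default cap applies.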
4.3. Using Path Capacity 4.2. Using Path Capacity
This section describes how a sender needs to regulate the maximum This section describes how a sender needs to regulate the maximum
volume of data in flight over the interval of the current RTT, and volume of data in flight over the interval of the current RT, and how
how it manages transmission of the capacity that it perceives is it manages transmission of the capacity that it perceives is
available. available.
Congestion Management: The capacity available to a flow could be Congestion Management: The capacity available to a flow could be
expressed as the number of bytes in flight, the sending rate or a expressed as the number of bytes in flight, the sending rate or a
limit on the number of unacknowledged segments. In steady-state limit on the number of unacknowledged segments. In steady-state
this congestion window reflects a safe limit to the sending rate this congestion window reflects a safe limit to the sending rate
that has not resulted in persistent congestion. A sender that has not resulted in persistent congestion. A sender
performing congestion management will usually optimise performance performing congestion management will usually optimise performance
for its application by avoiding excessive loss or delay. for its application by avoiding excessive loss or delay.
One common model views the path between two endpoints as a pipe, One common model views the path between two endpoints as a pipe.
new packets enter the pipe at the sender, older one leaves at the New packets enter the pipe at the sending endpoint, older ones
receiver. The rate of data that leaves the pipe indicates the leave at the receiving endpoint, and are usually acknowledged to
share of the capacity utilised by a flow. If on average (over an the sender. The rate that data leaves the pipe indicates the
RTT the sending rate equals the sending rate, it indicates the share of the capacity that has been utilised by the flow. If, on
capacity can safely be used in the next RTT. If the average average (over an RT), the sending rate equals the receiving rate,
receiving rate is less than the sending rate, then the path is this indicates that this capacity can be safely used again in the
either queuing packets, the RTT/path has changed, or there is next RT. If the average receiving rate is less than the sending
packet loss. rate, then the path is either queuing packets, the RTT/path has
changed, or there is packet loss.
Transient Path: Path capacity information is transient. A sender Transient Path: Path capacity information is transient. A sender
that fails to use capacity has no understanding whether that that fails to use capacity has no understanding whether that
capacity remains available to use - or whether it has disappeared capacity remains available to use - or whether it has disappeared
(e.g., to a change to a path with a smaller bottleneck, or more (e.g., to a change to a path with a smaller bottleneck, or more
traffic has emerged that has consumed the previously available traffic has emerged that has consumed the previously available
capacity). For this reason, a sender that is limited by the capacity). For this reason, a sender that is limited by the
volume of application data available to send MUST NOT continue to volume of application data available to send MUST NOT continue to
grow the congestion window [RFC5681]. grow the congestion window [RFC5681].
Standard TCP states that a TCP sender SHOULD set the congestion Standard TCP states that a TCP sender SHOULD set the congestion
window to no more than the Restart Window (RW) before beginning window to no more than the Restart Window (R) before beginning
transmission if the TCP sender has not sent data in an interval transmission if the TCP sender has not sent data in an interval
that exceeds the current retransmission timeout, i.e., when an that exceeds the current retransmission timeout, i.e., when an
application becomes idle [RFC5681]. Experimental specifications application becomes idle [RFC5681]. Experimental specifications
permit TCP senders to tentatively maintain a congestion window permit TCP senders to tentatively maintain a congestion window
when application-limited, provided that they appropriately and when application-limited, provided that they appropriately and
rapidly collapse the window when potential congestion is detected rapidly collapse the window when potential congestion is detected
[RFC7661]. This mechanism is called Congestion Window Validation [RFC7661]. This mechanism is called Congestion Window Validation
(CWV). (CWV).
Burst Mitigation: Even in the absence of congestion, statistical Burst Mitigation: Even in the absence of congestion, statistical
multiplexing of flows can result in transient effects for flows multiplexing of flows can result in transient effects for flows
sharing common resources. A sender therefore SHOULD avoid sharing common resources. A sender therefore SHOULD avoid
inducing excessive congestion to other flows (collateral damage), inducing excessive congestion to other flows (collateral damage),
or patterns of loss that result in denying a reasonable access to or patterns of loss that result in denying a reasonable access to
the available capacity (sometimes called flow starvation). While the available capacity (sometimes called flow starvation).
a congestion controller ought to limit sending at the granularity
of the current RTT, this can be insufficient to satisfy the goals While a congestion controller ought to limit sending at the
of preventing starvation and mitigating collateral damage. This granularity of the current RTT, this can be insufficient to
requires moderating the burst rate of the sender to avoid satisfy the goals of preventing starvation and mitigating
significant periods where a flow(s) consume all buffer capacity at collateral damage. This requires moderating the burst rate of the
the path bottleneck, which would otherwise prevent other flows sender to avoid significant periods where a flow(s) consume all
from gaining a reasonable share. buffer capacity at the path bottleneck, which would otherwise
prevent other flows from gaining a reasonable share.
Endpoints SHOULD provide mechanisms to regulate the bursts of Endpoints SHOULD provide mechanisms to regulate the bursts of
transmission that the application/protocol sends to the network transmission that the application/protocol sends to the network
(section 3.1.6. of [RFC8085]). ACK-Clocking [RFC5681] can help (section 3.1.6 of [RFC8085]). ACK-Clocking [RFC5681] can help
mitigate bursts for protocols that receive continuous feedback of mitigate bursts for protocols that receive continuous feedback of
reception (such as TCP). Sender pacing can mitigate this reception (such as TCP). Sender pacing can mitigate this
[RFC8085], (See Section 4.6 of [RFC3449], and has been recommended [RFC8085], (See Section 4.6 of [RFC3449]), and has been
for TCP in conditions where ACK-clocking is not effective, (e.g., recommended for TCP in conditions where ACK-clocking is not
[RFC3742], [RFC7661]). SCTP [RFC4960] defines a maximum burst effective, (e.g., [RFC3742], [RFC7661]). SCTP [RFC4960] defines a
length (Max.Burst) with a recommended value of 4 segments to limit maximum burst length (Max.Burst) with a recommended value of 4
the SCTP burst size. segments to limit the SCTP burst size.
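
The burst mitigation guidance above can be illustrated with a minimal pacing sketch. It is not taken from the draft: the function name `pacing_schedule` and its parameters are hypothetical, and the default burst limit of 4 segments simply reuses the SCTP Max.Burst value quoted above. It assumes the sender wants to spread one congestion window evenly across one RTT.

```python
def pacing_schedule(cwnd_bytes, rtt_s, mss_bytes, max_burst=4):
    """Spread one congestion window over one RTT in small bursts.

    Returns (burst_segments, gap_seconds): send at most `burst_segments`
    back-to-back, then wait `gap_seconds` before the next burst.
    All names here are illustrative assumptions, not part of any RFC.
    """
    if cwnd_bytes <= 0 or rtt_s <= 0 or mss_bytes <= 0 or max_burst <= 0:
        raise ValueError("all arguments must be positive")
    # Number of full-sized segments the window allows (at least one).
    segments = max(1, cwnd_bytes // mss_bytes)
    # Never send more than max_burst segments back-to-back.
    burst = min(max_burst, segments)
    # Ceiling division: how many bursts are needed to send the window.
    bursts_per_rtt = -(-segments // burst)
    # Evenly space the bursts across the RTT.
    gap = rtt_s / bursts_per_rtt
    return burst, gap
```

For example, a 40-segment window over a 100 ms RTT would be sent as ten bursts of four segments, one every 10 ms, rather than as a single 40-segment burst.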
4.3. Timers and Retransmission
This section describes mechanisms to detect and provide
retransmission, and to protect the network in the absence of timely
feedback.
Loss Detection: Loss detection occurs after a sender determines
there is no delivery confirmation within an expected period of
time. Loss detection can be performed observing the time-ordering
of the reception of ACKs (as in TCP DupACK) or can utilise a timer
to detect loss before the expiry of the RTO [RFC8085] [ID.ietf-
tcpm-rack] or a combination of using a timer and ordering
information to trigger retransmission of data.
Retransmission: Retransmission of lost packets or messages is a
common reliability mechanism. When a loss is detected, the sender
can choose to retransmit the lost data, ignore the loss, or send
other data. Any transmission consumes network capacity, therefore
retransmissions MUST NOT increase the network load in response to
congestion loss (which worsens that congestion) [RFC8085]. Any
method that sends additional data following loss is responsible
for congestion control of the retransmissions (and any other
packets sent) as well as the original traffic.
Measuring the RTT: Once an endpoint has started communicating with
its peer, the RTT MUST be adjusted by measuring the actual path RTT
and its variance (see equation 2.3 of [RFC6928]).
Maintaining the RTO: The RTO SHOULD be set based on recent RTT
observations [RFC8530].
RTO Expiry: Persistent lack of feedback detected by the RTO timer
(or other means) MUST be treated as an indication of potential
congestion. A failure to receive any specific response within an
RTO interval could potentially be a result of an RTT change, change
of path, excessive loss, or even congestion collapse. If there is
no response within the RTO interval, TCP collapses the congestion
window to one segment [RFC5681]. Other transports must similarly
respond when they detect loss of feedback.
An endpoint needs to exponentially back off the RTO interval
[RFC8085] each time the RTO expires. That is, the RTO interval
MUST be set to RTO * 2 [RFC6298] [RFC8085].
Maximum RTO: A maximum value MAY be placed on the RTO interval. The
maximum limit to the RTO interval MUST NOT be less than 60 seconds
[RFC6298].
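
The timer rules in this section can be sketched as follows. This is an illustrative example rather than text from the draft: the class name `RtoEstimator` and its parameters are hypothetical, and the smoothing constants (alpha = 1/8, beta = 1/4, K = 4), the 1 second initial and minimum RTO, and the first-sample initialisation are assumed from the standard TCP retransmission timer computation in RFC 6298.

```python
class RtoEstimator:
    """Sketch of an RTO maintained from RTT samples with exponential backoff."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variance
    K = 4           # variance multiplier

    def __init__(self, granularity_s=0.001, min_rto_s=1.0, max_rto_s=60.0):
        # The text above requires that any maximum is not below 60 seconds.
        if max_rto_s < 60.0:
            raise ValueError("maximum RTO must not be less than 60 seconds")
        self.granularity_s = granularity_s
        self.min_rto_s = min_rto_s
        self.max_rto_s = max_rto_s
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # assumed initial RTO before any RTT sample

    def on_rtt_sample(self, r):
        """Update the RTO from a new RTT measurement r (seconds)."""
        if self.srtt is None:
            # First measurement initialises both estimators.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # The variance is updated using the old SRTT, then SRTT itself.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity_s, self.K * self.rttvar)
        self.rto = min(self.max_rto_s, max(self.min_rto_s, rto))

    def on_timeout(self):
        """Back off the timer: double the RTO, up to the maximum."""
        self.rto = min(self.max_rto_s, self.rto * 2)
```

A new RTT sample replaces a backed-off RTO with a freshly computed one, so the backoff persists only while feedback remains absent.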
4.4. Responding to Potential Congestion 4.4. Responding to Potential Congestion
Internet flows SHOULD implement appropriate safeguards to avoid Internet flows SHOULD implement appropriate safeguards to avoid
inappropriate impact on other flows that share the resources along a inappropriate impact on other flows that share the resources along a
path. The safety and responsiveness of new proposals need to be path. The safety and responsiveness of new proposals need to be
evaluated [RFC5166]. In determining an appropriate congestion evaluated [RFC5166]. In determining an appropriate congestion
response, designs could take into consideration the size of the response, designs could take into consideration the size of the
packets that experience congestion [RFC4828]. packets that experience congestion [RFC4828].
Congestion Response: An endpoint MUST reduce the rate of Congestion Response: An endpoint MUST reduce the rate of
transmission when it detects loss (or some other indicator of transmission when it detects loss (or some other indicator of
congestion) [RFC2914]. (i.e. a reduction needs to not depend on congestion) [RFC2914]. Prompt reaction should follow a signal
reception of a signal from the remote endpoint, considering that from the remote endpoint indicating congestion (or inference of
congestion indications could themselves be lost under congestion). that, e.g., through detecting packet loss).
TCP Reno established a method that relies on multiplicative- TCP Reno established a method that relies on multiplicative-
decrease to halve the sending rate while congestion is detected. decrease to halve the sending rate while congestion is detected.
This response to loss is considered sufficient for safe Internet This response to loss is considered sufficient for safe Internet
operation, but other decrease factors have also been published in operation, but other decrease factors have also been published in
the RFC series [RFC8312]. the RFC Series [RFC8312].
ECN Response: A congestion control design should provide the ECN Response: A congestion control design should provide the
necessary mechanisms to support Explicit Congestion Notification necessary mechanisms to support Explicit Congestion Notification
(ECN) [RFC3168] [RFC6679], as described in section 3.1.7. of (ECN) [RFC3168] [RFC6679], as described in section 3.1.7. of
[RFC8085]. This can help determine an appropriate [RFC8085]. This can help determine an appropriate
congestion window when supported by routers on the path [RFC7567] congestion window when supported by routers on the path [RFC7567]
to enable rapid early indication of incipient congestion. to enable rapid early indication of incipient congestion.
The early detection of incipient congestion justifies a different The early detection of incipient congestion justifies a different
reaction to that for loss [RFC8311] [RFC8087]. Simple feedback reaction to the reaction to packet loss [RFC8311] [RFC8087].
of congestion experienced by ECN-marked packets [RFC3168] Simple feedback of received Congestion Experienced (CE) marks
[RFC8511], relies only on an indication that congestion has been [RFC3168], relies only on an indication that congestion has been
experienced within the last RTT. The reaction for traffic marked experienced within the last RT, appropriate for using ECT(0). The
with ECT(0) when using this simple feedback of congestion was reaction to reception of this indication was modified in TCP ABE
modified [RFC8511].
Further detail of the ECN marking can be obtained by providing [RFC8511]. Further detail about the received CE-marking can be
more accurate receiver feedback [ID.-ietf-tcpm-accurate-ecn], obtained by using more accurate receiver feedback [ID.-ietf-tcpm-
enabling a faster reaction reducing the queuing latency accurate-ecn]. This more detailed feedback provides an
[RFC8087]. Current work in progress [ID.ietf-tsvwg-l4s-arch] opportunity for a finer-granularity of congestion response.
defines a reaction for packets marked with ECT(1), building on the
style of feedback provided by [ID.-ietf-tcpm-accurate-ecn].
Protection from Path Change: Congestion control, like loss recovery, Current work-in-progress [ID.ietf-tsvwg-l4s-arch] defines a
requires timely feedback. Congestion control MUST NOT solely rely reaction for packets marked with ECT(1), building on the style of
on the presence of feedback to perform safely. The only way to detailed feedback provided by [ID.-ietf-tcpm-accurate-ecn] and a
surely confirm that a local endpoint has successfully communicated modified marking system [ID.ietf-tsvwg-aqm-dualq-coupled].
with a remote endpoint is to utilise a timer Section 4.2 to detect
a lack of response that could result from a change in the path or
the path characteristics. Congestion controllers that are unable
to react one (or at most a few RTT) after a congestion indication
should observe the guidance in 3.3 of the UDP Guidelines
[RFC8085].
Persistent Congestion: Endpoints MUST reduce the rate further below Robustness to Path Change: The detection of congestion and the
that reflected by the restart window, if the RTO continues to resulting reduction MUST NOT solely depend upon reception of a
expire. signal from the remote endpoint, because congestion indications
could themselves be lost under persistent congestion.
Persistent congestion can result in congestion collapse and MUST The only way to surely confirm that a sending endpoint has
be aggressively avoided [RFC2914]. [RFC8085] provides guidelines successfully communicated with a remote endpoint is to utilise a
for a sender that does not or is unable to adapt the congestion timer (see Section 4.3) to detect a lack of response that could
window. A suitable method (e.g., TFRC) continues to reduce the result from a change in the path or the path characteristics
sending rate under persistent congestion, to one packet per round- (usually called the RTO). Congestion controllers that are unable
trip time and then exponentially backs off the time between single to react after one (or at most a few) RTTs after receiving a
packet transmissions if congestion continues to persist [RFC2914]. congestion indication should observe the guidance in section 3.3
of the UDP Guidelines [RFC8085].
Persistent Congestion: Persistent congestion can result in
congestion collapse. This MUST be aggressively avoided [RFC2914].
Endpoints that experience persistent congestion and have already
exponentially reduced their congestion window to the restart
window (e.g., 1 packet), MUST further reduce the rate if the RTO
timer continues to expire. For example, TCP-Friendly Rate Control
(TFRC) [RFC5348] continues to reduce its sending rate under
persistent congestion to one packet per RT, and then exponentially
backs off the time between single packet transmissions if the
congestion continues to persist [RFC2914].
[RFC8085] provides guidelines for a sender that does not, or is
unable to, adapt the congestion window.
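
The responses described in this section can be sketched as a small state machine. This is a hedged illustration and not the draft's algorithm: the class `CongestionResponse`, its attribute names, the restart window of one segment, and the cap on the back-off interval are all assumptions made for the example. It halves the window on a congestion indication (Reno-style), collapses it to the restart window on RTO expiry, and then doubles the time between single packets while the RTO keeps expiring.

```python
class CongestionResponse:
    """Sketch of a sender's reaction to congestion and persistent congestion."""

    def __init__(self, cwnd=10, rtt_s=0.1, restart_window=1, max_interval_s=64.0):
        self.cwnd = cwnd                      # congestion window, in segments
        self.rtt_s = rtt_s                    # current RTT estimate, seconds
        self.restart_window = restart_window  # assumed to be 1 segment
        self.max_interval_s = max_interval_s  # hypothetical cap on back-off
        self.interval_s = None                # None: sending is window-limited

    def on_congestion(self):
        """Loss or another congestion indication: multiplicative decrease."""
        self.cwnd = max(self.restart_window, self.cwnd // 2)

    def on_rto_expiry(self):
        """No feedback within the RTO interval."""
        if self.cwnd > self.restart_window:
            # First expiry: collapse to the restart window, one packet per RTT.
            self.cwnd = self.restart_window
            self.interval_s = self.rtt_s
        else:
            # Persistent congestion: exponentially back off single packets.
            base = self.interval_s if self.interval_s is not None else self.rtt_s
            self.interval_s = min(base * 2, self.max_interval_s)
```

Keeping the window and the inter-packet interval as separate variables lets the rate continue to fall after the window has reached its floor.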
4.5. Using More Capacity 4.5. Using More Capacity
In the absence of persistent congestion, endpoints are permitted to In the absence of persistent congestion, an endpoint is permitted to
increase their congestion window and hence their sending rate, increase its congestion window and hence the sending rate. This
providing that there is (or is expected to be) additional data increase should only occur when there is additional data available to
available to send across the path. send across the path (i.e., the sender will utilise the additional
capacity in the next RT).
TCP Reno [RFC5681] defines an algorithm, known as the AIMD (additive- TCP Reno [RFC5681] defines an algorithm, known as the AIMD (additive-
increase/ multiplicative-decrease) that allows a sender to increase/ multiplicative-decrease) that allows a sender to
exponentially increase the congestion window each RTT from the exponentially increase the congestion window each RTT from the
initial window to the first detected congestion event. This is initial window to the first detected congestion event. This is
designed to allow new flows to rapidly acquire a suitable congestion designed to allow new flows to rapidly acquire a suitable congestion
window. Where the bandwidth delay product (BDP) is large it can take window. Where the bandwidth delay product (BDP) is large, it can
many RTTs to find a suitable share of the path capacity, such paths take many RTTs to determine a suitable share of the path capacity.
benefit from methods that more rapidly increase the congestion window Such high BDP paths benefit from methods that more rapidly increase
(e.g., TCP Cubic [RFC8312]), but need to be designed to also react the congestion window, but in compensation these need to be designed
rapidly to any detected congestion. to also react rapidly to any detected congestion (e.g., TCP Cubic
[RFC8312]).
Increasing Congestion Window: A sender SHOULD stop increasing its Increasing Congestion Window: A sender MUST NOT continue to increase
congestion window as soon as it receives indication of congestion its rate for more than an RTT after a congestion indication is
and MUST NOT continue to increase its rate for more than a RTT received. It SHOULD stop increasing its congestion window as soon
after a congestion indication is received. When increasing the as it receives indication of congestion to avoid excessive
congestion window, a sender can transmit faster than the last "overshoot".
known safe rate.
Any increase above the last confirmed rate needs to be regarded as While the sender is increasing the congestion window, a sender can
tentative and the sender reduce their rate below the last transmit faster than the last known safe rate. Any increase above
confirmed safe rate when they experience congestion. the last confirmed rate needs to be regarded as tentative and the
sender reduce their rate below the last confirmed safe rate when
congestion is experienced (a congestion event).
Congestion: An endpoint MUST utilise a method that assures the Congestion: An endpoint MUST utilise a method that assures the
sender will keep the rate below the previously confirmed safe rate sender will keep the rate below the previously confirmed safe rate
for multiple RTTs after an observed congestion event. In TCP this for multiple RTTs after an observed congestion event. In TCP,
is performed by using linear increase from a slow start threshold this is performed by using a linear increase from a slow start
that is re-initialised when congestion is experienced. threshold that is re-initialised when congestion is experienced.
Avoiding Overshoot: Overshoot of the congestion window beyond the Avoiding Overshoot: Overshoot of the congestion window beyond the
point of congestion can significantly impact other flows sharing point of congestion can significantly impact other flows sharing
resources along a path. It is important to note that as endpoints resources along a path. It is important to note that as endpoints
experience more paths with a large BDP and a wider range of experience more paths with a large BDP and a wider range of
potential path RTT, that variability or changes in the path can potential path RT, that variability or changes in the path can
have very significant impacts on appropriate dynamics for have very significant impacts on appropriate dynamics for
increasing the congestion window (see also burst mitigation increasing the congestion window (see also burst mitigation
Section 4.3). Section 4.2).
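
The rules for using more capacity can be sketched as a per-RTT window update. This is an illustrative simplification, not the draft's specification: the function `next_cwnd` and its arguments are hypothetical, windows are counted in segments, and the floor of two segments on the threshold is assumed from standard TCP behaviour. The window doubles each RTT below the slow start threshold, grows by one segment per RTT above it, is held when the sender is application-limited, and is reduced with a re-initialised threshold after a congestion event.

```python
def next_cwnd(cwnd, ssthresh, app_limited, congestion_seen):
    """Return (cwnd, ssthresh) to use in the next RTT, in segments."""
    if congestion_seen:
        # Re-initialise the threshold and drop below the last safe rate.
        ssthresh = max(2, cwnd // 2)
        return ssthresh, ssthresh
    if app_limited:
        # Unused capacity is unconfirmed: do not grow the window.
        return cwnd, ssthresh
    if cwnd < ssthresh:
        # Exponential increase, without overshooting the threshold.
        return min(cwnd * 2, ssthresh), ssthresh
    # Linear increase: one segment per RTT.
    return cwnd + 1, ssthresh
```

Because the threshold is reset on each congestion event, the subsequent linear increase keeps the rate below the previously confirmed safe rate for multiple RTTs.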
4.6. Network Signals 4.6. Network Signals
An endpoint can utilise signals from the network to help determine An endpoint can utilise signals from the network to help determine
how to control the traffic it sends. how to regulate the traffic it sends.
Network Signals: The assumptions that network devices on path may Network Signals: Mechanisms MUST NOT solely rely on messages or
change motivates the use of soft-state when designing protocols other specific signalling messages to perform safely (See section
that interact with network devices (e.g., ECN). To protect from 5.2 of [RFC8085] describing use of ICMP messages). The path
changes in the path. characteristics can change at any time. Transport mechanisms need
to be robust to potential black-holing of any signals (i.e., it
needs to be robust to loss or modification of packets).
Transport mechanisms need to be robust to potential black-holing A mechanism that utilises signals originating in the network (e.g.
of signals, as it must also be robust to loss of packets. RSVP, NSIS, Quick-Start, ECN), must assume that the set of network
devices on the path can change. This motivates the use of soft-
state when designing protocols that interact with signals
originating from network devices (e.g., ECN). This can include
context-sensitive treatment of "soft" signals provided to the
endpoint [RFC5461].
Mechanisms MUST NOT solely rely on ICMP messages or other specific 4.7. Protection of Protocol Mechanisms
signalling messages to perform safely when these are not received
(section 5.2 of [RFC8085]). This can include context-sensitive
treatment of "soft" signals provided to the endpoint [RFC5461].
Validation of ICMP: ICMP messages [RFC0792] MUST be validated An endpoint needs to provide protection from attacks on the traffic
before they are used. Other path signals must similarly be it generates, or attacks that increase the capacity it consumes
validated to protect from malicious use. (impacting other traffic that shared a bottleneck).
4.7. Protection of Protocol Mechanisms Off Path Attack: A design MUST protect from off-path attack to the
protocol [RFC8085]. An attack on the congestion control can lead
to a DoS vulnerability for the flow being controlled and/or other
flows that share network resources along the path.
Off Path Attack: A design MUST protect from off-path attack Validation of Signals: Network signalling and control messages
[RFC8085] where an attack on the congestion control can lead to a (e.g., ICMP [RFC0792]) MUST be validated before they are used
DoS vulnerability for the flow being controlled and/or other flows to protect from malicious use. This MUST at least include
that share network resources along the path. protection from off-path attack [RFC8085].
On Path Attack: A protocol can be designed to protect from on- On Path Attack: A protocol can be designed to protect from on-path
path attacks, but this requires more complexity and the use of attacks, but this requires more complexity and the use of
encryption/authentication mechanisms (e.g., IPsec [RFC4301], QUIC encryption/authentication mechanisms (e.g., IPsec [RFC4301], QUIC
[I-D.ietf-quic-transport]). [I-D.ietf-quic-transport]).
5. IETF guidelines on evaluation of Congestion Control 5. IETF Guidelines on Evaluation of Congestion Control
The IETF has provided guidance [RFC5033] for considering alternate The IETF has provided guidance [RFC5033] for considering alternate
congestion control algorithms. The IRTF has described set of metrics congestion control algorithms. The IRTF has described a set of
and related trade-off between metrics that can be used to compare, metrics and related trade-off between metrics that can be used to
contrast, and evaluate congestion control techniques [RFC5166]. compare, contrast, and evaluate congestion control techniques
[RFC5166].
6. Acknowledgements 6. Acknowledgements
Nicholas Kuhn helped develop the first draft of these guidelines. Nicholas Kuhn helped develop the first draft of these guidelines.
Tom Jones reviewed the first version of this draft. Gorry Fairhurst Tom Jones reviewed the first version of this draft. Gorry Fairhurst
and Tom Jones were funded at the University of Aberdeen by the and Tom Jones were funded at the University of Aberdeen by the
European Space Agency. European Space Agency. Ana Custura helped review the text.
The views expressed are solely those of the author(s). The views expressed are solely those of the author(s).
7. IANA Considerations 7. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
RFC Editor Note: If there are no requirements for IANA, the section RFC Editor Note: If there are no requirements for IANA, the section
will be removed during conversion into an RFC by the RFC Editor. will be removed during conversion into an RFC by the RFC Editor.
skipping to change at page 17, line 20 skipping to change at page 17, line 30
Appendix A. Revision Notes Appendix A. Revision Notes
Note to RFC-Editor: please remove this entire section prior to Note to RFC-Editor: please remove this entire section prior to
publication. publication.
Individual draft -00: Individual draft -00:
o Comments and corrections are welcome directly to the authors or o Comments and corrections are welcome directly to the authors or
via the IETF TSVWG, working group mailing list. via the IETF TSVWG, working group mailing list.
Individual draft -01:
o
o This update is proposed for initial WG comments. o This update is proposed for initial WG comments.
o If there is interest in progressing this document, the next o If there is interest in progressing this document, the next
version will include more complete referencing to cited material. version will include more complete referencing to cited material.
Author's Address Author's Address
Godred Fairhurst Godred Fairhurst
University of Aberdeen University of Aberdeen
School of Engineering School of Engineering