draft-ietf-tsvwg-l4s-arch-09.txt   draft-ietf-tsvwg-l4s-arch-10.txt 
Transport Area Working Group B. Briscoe, Ed. Transport Area Working Group B. Briscoe, Ed.
Internet-Draft Independent Internet-Draft Independent
Intended status: Informational K. De Schepper Intended status: Informational K. De Schepper
Expires: November 22, 2021 Nokia Bell Labs Expires: January 2, 2022 Nokia Bell Labs
M. Bagnulo Braun M. Bagnulo Braun
Universidad Carlos III de Madrid Universidad Carlos III de Madrid
G. White G. White
CableLabs CableLabs
May 21, 2021 July 1, 2021
Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service: Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service:
Architecture Architecture
draft-ietf-tsvwg-l4s-arch-09 draft-ietf-tsvwg-l4s-arch-10
Abstract Abstract
This document describes the L4S architecture, which enables Internet This document describes the L4S architecture, which enables Internet
applications to achieve Low queuing Latency, Low Loss, and Scalable applications to achieve Low queuing Latency, Low Loss, and Scalable
throughput (L4S). The insight on which L4S is based is that the root throughput (L4S). The insight on which L4S is based is that the root
cause of queuing delay is in the congestion controllers of senders, cause of queuing delay is in the congestion controllers of senders,
not in the queue itself. The L4S architecture is intended to enable not in the queue itself. The L4S architecture is intended to enable
_all_ Internet applications to transition away from congestion _all_ Internet applications to transition away from congestion
control algorithms that cause queuing delay, to a new class of control algorithms that cause queuing delay, to a new class of
skipping to change at page 2, line 20 skipping to change at page 2, line 20
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 22, 2021. This Internet-Draft will expire on January 2, 2022.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 43 skipping to change at page 2, line 43
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. L4S Architecture Overview . . . . . . . . . . . . . . . . . . 5 2. L4S Architecture Overview . . . . . . . . . . . . . . . . . . 5
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
4. L4S Architecture Components . . . . . . . . . . . . . . . . . 7 4. L4S Architecture Components . . . . . . . . . . . . . . . . . 7
4.1. Protocol Mechanisms . . . . . . . . . . . . . . . . . . . 7
4.2. Network Components . . . . . . . . . . . . . . . . . . . 9
4.3. Host Mechanisms . . . . . . . . . . . . . . . . . . . . . 11
5. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.1. Why These Primary Components? . . . . . . . . . . . . . . 12 5.1. Why These Primary Components? . . . . . . . . . . . . . . 12
5.2. What L4S adds to Existing Approaches . . . . . . . . . . 14 5.2. What L4S adds to Existing Approaches . . . . . . . . . . 15
6. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 17 6. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 18
6.1. Applications . . . . . . . . . . . . . . . . . . . . . . 17 6.1. Applications . . . . . . . . . . . . . . . . . . . . . . 18
6.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 19 6.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 19
6.3. Applicability with Specific Link Technologies . . . . . . 20 6.3. Applicability with Specific Link Technologies . . . . . . 20
6.4. Deployment Considerations . . . . . . . . . . . . . . . . 20 6.4. Deployment Considerations . . . . . . . . . . . . . . . . 21
6.4.1. Deployment Topology . . . . . . . . . . . . . . . . . 21 6.4.1. Deployment Topology . . . . . . . . . . . . . . . . . 21
6.4.2. Deployment Sequences . . . . . . . . . . . . . . . . 22 6.4.2. Deployment Sequences . . . . . . . . . . . . . . . . 22
6.4.3. L4S Flow but Non-ECN Bottleneck . . . . . . . . . . . 25 6.4.3. L4S Flow but Non-ECN Bottleneck . . . . . . . . . . . 25
6.4.4. L4S Flow but Classic ECN Bottleneck . . . . . . . . . 25 6.4.4. L4S Flow but Classic ECN Bottleneck . . . . . . . . . 26
6.4.5. L4S AQM Deployment within Tunnels . . . . . . . . . . 26 6.4.5. L4S AQM Deployment within Tunnels . . . . . . . . . . 26
7. IANA Considerations (to be removed by RFC Editor) . . . . . . 26 7. IANA Considerations (to be removed by RFC Editor) . . . . . . 26
8. Security Considerations . . . . . . . . . . . . . . . . . . . 26 8. Security Considerations . . . . . . . . . . . . . . . . . . . 26
8.1. Traffic Rate (Non-)Policing . . . . . . . . . . . . . . . 26 8.1. Traffic Rate (Non-)Policing . . . . . . . . . . . . . . . 26
8.2. 'Latency Friendliness' . . . . . . . . . . . . . . . . . 27 8.2. 'Latency Friendliness' . . . . . . . . . . . . . . . . . 27
8.3. Interaction between Rate Policing and L4S . . . . . . . . 29 8.3. Interaction between Rate Policing and L4S . . . . . . . . 29
8.4. ECN Integrity . . . . . . . . . . . . . . . . . . . . . . 29 8.4. ECN Integrity . . . . . . . . . . . . . . . . . . . . . . 30
8.5. Privacy Considerations . . . . . . . . . . . . . . . . . 30 8.5. Privacy Considerations . . . . . . . . . . . . . . . . . 30
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 31 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 31
10. Informative References . . . . . . . . . . . . . . . . . . . 31 10. Informative References . . . . . . . . . . . . . . . . . . . 31
Appendix A. Standardization items . . . . . . . . . . . . . . . 38 Appendix A. Standardization items . . . . . . . . . . . . . . . 39
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41
1. Introduction 1. Introduction
It is increasingly common for _all_ of a user's applications at any It is increasingly common for _all_ of a user's applications at any
one time to require low delay: interactive Web, Web services, voice, one time to require low delay: interactive Web, Web services, voice,
conversational video, interactive video, interactive remote presence, conversational video, interactive video, interactive remote presence,
instant messaging, online gaming, remote desktop, cloud-based instant messaging, online gaming, remote desktop, cloud-based
applications and video-assisted remote control of machinery and applications and video-assisted remote control of machinery and
industrial processes. In the last decade or so, much has been done industrial processes. In the last decade or so, much has been done
to reduce propagation delay by placing caches or servers closer to to reduce propagation delay by placing caches or servers closer to
skipping to change at page 4, line 40 skipping to change at page 4, line 42
bottom of every saw-tooth. bottom of every saw-tooth.
It has been demonstrated that if the sending host replaces a Classic It has been demonstrated that if the sending host replaces a Classic
congestion control with a 'Scalable' alternative, when a suitable AQM congestion control with a 'Scalable' alternative, when a suitable AQM
is deployed in the network the performance under load of all the is deployed in the network the performance under load of all the
above interactive applications can be significantly improved. For above interactive applications can be significantly improved. For
instance, queuing delay under heavy load with the example DCTCP/DualQ instance, queuing delay under heavy load with the example DCTCP/DualQ
solution cited below on a DSL or Ethernet link is roughly 1 to 2 solution cited below on a DSL or Ethernet link is roughly 1 to 2
milliseconds at the 99th percentile without losing link milliseconds at the 99th percentile without losing link
utilization [DualPI2Linux], [DCttH15] (for other link types, see utilization [DualPI2Linux], [DCttH15] (for other link types, see
Section 6.3). This compares with 5 to 20 ms on _average_ with a Section 6.3). This compares with 5-20 ms on _average_ with a Classic
Classic congestion control and current state-of-the-art AQMs such as congestion control and current state-of-the-art AQMs such as FQ-
FQ-CoDel [RFC8290], PIE [RFC8033] or DOCSIS PIE [RFC8034] and about CoDel [RFC8290], PIE [RFC8033] or DOCSIS PIE [RFC8034] and about
20-30 ms at the 99th percentile [DualPI2Linux]. 20-30 ms at the 99th percentile [DualPI2Linux].
It has also been demonstrated [DCttH15], [DualPI2Linux] that it is It has also been demonstrated [DCttH15], [DualPI2Linux] that it is
possible to deploy such an L4S service alongside the existing best possible to deploy such an L4S service alongside the existing best
efforts service so that all of a user's applications can shift to it efforts service so that all of a user's applications can shift to it
when their stack is updated. Access networks are typically designed when their stack is updated. Access networks are typically designed
with one link as the bottleneck for each site (which might be a home, with one link as the bottleneck for each site (which might be a home,
small enterprise or mobile device), so deployment at each end of this small enterprise or mobile device), so deployment at each end of this
link should give nearly all the benefit in each direction. The L4S link should give nearly all the benefit in each direction. The L4S
approach also requires component mechanisms at the endpoints to approach also requires component mechanisms at the endpoints to
fulfill its goal. This document presents the L4S architecture, by fulfill its goal. This document presents the L4S architecture, by
describing the different components and how they interact to provide describing the different components and how they interact to provide
the scalable, low latency, low loss Internet service. the scalable, low latency, low loss Internet service.
2. L4S Architecture Overview 2. L4S Architecture Overview
There are three main components to the L4S architecture: There are three main components to the L4S architecture; the AQM in
the network, the congestion control on the host, and the protocol
between them:
1) Network: L4S traffic needs to be isolated from the queuing 1) Network: L4S traffic needs to be isolated from the queuing
latency of Classic traffic. One queue per application flow (FQ) latency of Classic traffic. One queue per application flow (FQ)
is one way to achieve this, e.g. FQ-CoDel [RFC8290]. However, is one way to achieve this, e.g. FQ-CoDel [RFC8290]. However,
just two queues is sufficient and does not require inspection of just two queues is sufficient and does not require inspection of
transport layer headers in the network, which is not always transport layer headers in the network, which is not always
possible (see Section 5.2). With just two queues, it might seem possible (see Section 5.2). With just two queues, it might seem
impossible to know how much capacity to schedule for each queue impossible to know how much capacity to schedule for each queue
without inspecting how many flows at any one time are using each. without inspecting how many flows at any one time are using each.
And it would be undesirable to arbitrarily divide access network And it would be undesirable to arbitrarily divide access network
skipping to change at page 5, line 51 skipping to change at page 6, line 7
and deployed in Windows Server Editions (since 2012), in Linux and and deployed in Windows Server Editions (since 2012), in Linux and
in FreeBSD. Although DCTCP as-is 'works' well over the public in FreeBSD. Although DCTCP as-is 'works' well over the public
Internet, most implementations lack certain safety features that Internet, most implementations lack certain safety features that
will be necessary once it is used outside controlled environments will be necessary once it is used outside controlled environments
like data centres (see Section 6.4.3 and Appendix A). Scalable like data centres (see Section 6.4.3 and Appendix A). Scalable
congestion control will also need to be implemented in protocols congestion control will also need to be implemented in protocols
other than TCP (QUIC, SCTP, RTP/RTCP, RMCAT, etc.). Indeed, other than TCP (QUIC, SCTP, RTP/RTCP, RMCAT, etc.). Indeed,
between the present document being drafted and published, the between the present document being drafted and published, the
following scalable congestion controls were implemented: TCP following scalable congestion controls were implemented: TCP
Prague [PragueLinux], QUIC Prague, an L4S variant of the RMCAT Prague [PragueLinux], QUIC Prague, an L4S variant of the RMCAT
SCReAM controller [RFC8298] and the L4S ECN part of SCReAM controller [SCReAM] and the L4S ECN part of BBRv2 [BBRv2]
BBRv2 [I-D.cardwell-iccrg-bbr-congestion-control] intended for TCP intended for TCP and QUIC transports.
and QUIC transports.
3. Terminology 3. Terminology
Classic Congestion Control: A congestion control behaviour that can Classic Congestion Control: A congestion control behaviour that can
co-exist with standard TCP Reno [RFC5681] without causing co-exist with standard Reno [RFC5681] without causing
significantly negative impact on its flow rate [RFC5033]. With significantly negative impact on its flow rate [RFC5033]. With
Classic congestion controls, as flow rate scales, the number of Classic congestion controls, such as Reno or Cubic, because flow
round trips between congestion signals (losses or ECN marks) rises rate has scaled since TCP congestion control was first designed in
with the flow rate. So it takes longer and longer to recover 1988, it now takes hundreds of round trips (and growing) to
after each congestion event. Therefore control of queuing and recover after a congestion signal (whether a loss or an ECN mark)
utilization becomes very slack, and the slightest disturbance as shown in the examples in Section 5.1 and [RFC3649]. Therefore
prevents a high rate from being attained [RFC3649]. control of queuing and utilization becomes very slack, and the
slightest disturbances (e.g. from new flows starting) prevent a
For instance, with 1500 byte packets and an end-to-end round trip high rate from being attained.
time (RTT) of 36 ms, over the years, as Reno flow rate scales from
2 to 100 Mb/s the number of round trips taken to recover from a
congestion event rises proportionately, from 4 to 200.
Cubic [RFC8312] was developed to be less unscalable, but it is
approaching its scaling limit; with the same RTT of 36 ms, at
100Mb/s it takes about 106 round trips to recover, and at 800 Mb/s
its recovery time triples to over 340 round trips, or still more
than 12 seconds (Reno would take 57 seconds).
Scalable Congestion Control: A congestion control where the average Scalable Congestion Control: A congestion control where the average
time from one congestion signal to the next (the recovery time) time from one congestion signal to the next (the recovery time)
remains invariant as the flow rate scales, all other factors being remains invariant as the flow rate scales, all other factors being
equal. This maintains the same degree of control over queueing equal. This maintains the same degree of control over queueing
and utilization whatever the flow rate, as well as ensuring that and utilization whatever the flow rate, as well as ensuring that
high throughput is more robust to disturbances (e.g. from new high throughput is more robust to disturbances. For instance,
flows starting). For instance, DCTCP averages 2 congestion DCTCP averages 2 congestion signals per round-trip whatever the
signals per round-trip whatever the flow rate, as do other flow rate, as do other recently developed scalable congestion
recently developed scalable congestion controls, e.g. Relentless controls, e.g. Relentless TCP [Mathis09], TCP Prague
TCP [Mathis09], TCP Prague [PragueLinux] and the L4S variant of [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux],
SCReAM for real-time media [RFC8298]). See Section 4.3 of BBRv2 [BBRv2] and the L4S variant of SCReAM for real-time
media [SCReAM], [RFC8298]). See Section 4.3 of
[I-D.ietf-tsvwg-ecn-l4s-id] for more explanation. [I-D.ietf-tsvwg-ecn-l4s-id] for more explanation.
Classic service: The Classic service is intended for all the Classic service: The Classic service is intended for all the
congestion control behaviours that co-exist with Reno [RFC5681] congestion control behaviours that co-exist with Reno [RFC5681]
(e.g. Reno itself, Cubic [RFC8312], (e.g. Reno itself, Cubic [RFC8312],
Compound [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]). The term Compound [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]). The term
'Classic queue' means a queue providing the Classic service. 'Classic queue' means a queue providing the Classic service.
Low-Latency, Low-Loss Scalable throughput (L4S) service: The 'L4S' Low-Latency, Low-Loss Scalable throughput (L4S) service: The 'L4S'
service is intended for traffic from scalable congestion control service is intended for traffic from scalable congestion control
algorithms, such as Data Center TCP [RFC8257]. The L4S service is algorithms, such as the Prague congestion control
for more general traffic than just DCTCP--it allows the set of [I-D.briscoe-iccrg-prague-congestion-control], which was derived
congestion controls with similar scaling properties to DCTCP to from DCTCP [RFC8257]. The L4S service is for more general
evolve, such as the examples listed above (Relentless, Prague, traffic than just TCP Prague--it allows the set of congestion
SCReAM). The term 'L4S queue' means a queue providing the L4S controls with similar scaling properties to Prague to evolve, such
service. as the examples listed above (Relentless, SCReAM). The term 'L4S
queue' means a queue providing the L4S service.
The terms Classic or L4S can also qualify other nouns, such as The terms Classic or L4S can also qualify other nouns, such as
'queue', 'codepoint', 'identifier', 'classification', 'packet', 'queue', 'codepoint', 'identifier', 'classification', 'packet',
'flow'. For example: an L4S packet means a packet with an L4S 'flow'. For example: an L4S packet means a packet with an L4S
identifier sent from an L4S congestion control. identifier sent from an L4S congestion control.
Both Classic and L4S services can cope with a proportion of Both Classic and L4S services can cope with a proportion of
unresponsive or less-responsive traffic as well, as long as it unresponsive or less-responsive traffic as well, but in the L4S
does not build a queue (e.g. DNS, VoIP, game sync datagrams, etc). case its rate has to be smooth enough or low enough not to build a
queue (e.g. DNS, VoIP, game sync datagrams, etc).
Reno-friendly: The subset of Classic traffic that excludes Reno-friendly: The subset of Classic traffic that is friendly to the
unresponsive traffic and excludes experimental congestion controls standard Reno congestion control defined for TCP in [RFC5681].
intended to coexist with Reno but without always being strictly Reno-friendly is used in place of 'TCP-friendly', given the latter
friendly to it (as allowed by [RFC5033]). Reno-friendly is used has become imprecise, because the TCP protocol is now used with so
in place of 'TCP-friendly', given that friendliness is a property many different congestion control behaviours, and Reno is used in
of the congestion controller (Reno), not the wire protocol (TCP), non-TCP transports such as QUIC.
which is used with many different congestion control behaviours.
Classic ECN: The original Explicit Congestion Notification (ECN) Classic ECN: The original Explicit Congestion Notification (ECN)
protocol [RFC3168], which requires ECN signals to be treated as protocol [RFC3168], which requires ECN signals to be treated as
equivalent to drops, both when generated in the network and when equivalent to drops, both when generated in the network and when
responded to by the sender. responded to by the sender.
The names used for the four codepoints of the 2-bit IP-ECN field For L4S, the names used for the four codepoints of the 2-bit IP-
are as defined in [RFC3168]: Not ECT, ECT(0), ECT(1) and CE, where ECN field are unchanged from those defined in [RFC3168]: Not ECT,
ECT stands for ECN-Capable Transport and CE stands for Congestion ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable Transport
Experienced. and CE stands for Congestion Experienced. A packet marked with
the CE codepoint is termed 'ECN-marked' or sometimes just 'marked'
where the context makes ECN obvious.
Site: A home, mobile device, small enterprise or campus, where the Site: A home, mobile device, small enterprise or campus, where the
network bottleneck is typically the access link to the site. Not network bottleneck is typically the access link to the site. Not
all network arrangements fit this model but it is a useful, widely all network arrangements fit this model but it is a useful, widely
applicable generalization. applicable generalization.
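The Reno recovery figures quoted under 'Classic Congestion Control' can be reproduced with a short calculation. The sketch below is not taken from the draft. It assumes an idealized Reno sawtooth in which the window halves on each congestion signal and then grows by one packet per round trip, so the average window is 3/4 of the peak. The function names and parameters are illustrative only.

```python
def reno_recovery_round_trips(rate_bps, rtt_ms=36, packet_bytes=1500):
    """Round trips an idealized Reno flow needs to regain its peak window.

    Assumption (not stated in the draft): the window oscillates between
    half the peak and the peak, so the average window is 3/4 of the peak.
    """
    # Average window in packets needed to sustain the given rate.
    avg_window = rate_bps * rtt_ms / 1000 / (packet_bytes * 8)
    peak_window = avg_window * 4 / 3
    # After halving, Reno adds one packet per round trip, so it needs
    # peak/2 round trips to climb back to the peak.
    return peak_window / 2


def scalable_recovery_round_trips(signals_per_rtt=2):
    """Average round trips between congestion signals for a scalable control.

    The default of 2 signals per round trip is the DCTCP figure quoted in
    the text; the result does not depend on the flow rate.
    """
    return 1 / signals_per_rtt


if __name__ == "__main__":
    for rate_bps in (2e6, 100e6, 800e6):
        round_trips = reno_recovery_round_trips(rate_bps)
        seconds = round_trips * 36 / 1000
        print(f"{rate_bps / 1e6:5.0f} Mb/s: Reno {round_trips:6.0f} round trips "
              f"({seconds:.1f} s), scalable "
              f"{scalable_recovery_round_trips():.1f} round trips")
```

Under these assumptions the calculation gives 4 round trips at 2 Mb/s, 200 at 100 Mb/s, and 1600 round trips (about 57 seconds) at 800 Mb/s, consistent with the Reno numbers in the text. The Cubic figures are not modelled here.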
4. L4S Architecture Components 4. L4S Architecture Components
The L4S architecture is composed of the following elements. The L4S architecture is composed of the elements in the following
three subsections.
Protocols: The L4S architecture encompasses two identifier changes 4.1. Protocol Mechanisms
(an unassignment and an assignment) and optional further identifiers:
The L4S architecture involves: a) unassignment of an identifier; b)
reassignment of the same identifier; and c) optional further
identifiers:
a. An essential aspect of a scalable congestion control is the use a. An essential aspect of a scalable congestion control is the use
of explicit congestion signals rather than losses, because the of explicit congestion signals. 'Classic' ECN [RFC3168] requires
signals need to be sent frequently and immediately. In contrast, an ECN signal to be treated as equivalent to drop, both when it
'Classic' ECN [RFC3168] requires an ECN signal to be treated as is generated in the network and when it is responded to by hosts.
equivalent to drop, both when it is generated in the network and L4S needs networks and hosts to support a more fine-grained
when it is responded to by hosts. L4S needs networks and hosts meaning for each ECN signal that is less severe than a drop, so
to support a different meaning for ECN: that the L4S signals:
* much more frequent signals--too often to require an equivalent * can be much more frequent;
excessive degree of drop from non-ECN flows;
* immediately tracking every fluctuation of the queue--too soon * can be signalled immediately, without the significant delay
to warrant dropping packets from non-ECN flows. required to smooth out fluctuations in the queue.
So the standards track [RFC3168] has had to be updated to allow To enable L4S, the standards track [RFC3168] has had to be
L4S packets to depart from the 'same as drop' constraint. updated to allow L4S packets to depart from the 'equivalent to
[RFC8311] is a standards track update to relax specific drop' constraint. [RFC8311] is a standards track update to relax
requirements in RFC 3168 (and certain other standards track specific requirements in RFC 3168 (and certain other standards
RFCs), which clears the way for the experimental changes proposed track RFCs), which clears the way for the experimental changes
for L4S. [RFC8311] also reclassifies the original experimental proposed for L4S. [RFC8311] also reclassifies the original
assignment of the ECT(1) codepoint as an ECN nonce [RFC3540] as experimental assignment of the ECT(1) codepoint as an ECN
historic. nonce [RFC3540] as historic.
b. [I-D.ietf-tsvwg-ecn-l4s-id] recommends ECT(1) is used as the b. [I-D.ietf-tsvwg-ecn-l4s-id] recommends ECT(1) is used as the
identifier to classify L4S packets into a separate treatment from identifier to classify L4S packets into a separate treatment from
Classic packets. This satisfies the requirements for identifying Classic packets. This satisfies the requirements for identifying
an alternative ECN treatment in [RFC4774]. an alternative ECN treatment in [RFC4774].
The CE codepoint is used to indicate Congestion Experienced by The CE codepoint is used to indicate Congestion Experienced by
both L4S and Classic treatments. This raises the concern that a both L4S and Classic treatments. This raises the concern that a
Classic AQM earlier on the path might have marked some ECT(0) Classic AQM earlier on the path might have marked some ECT(0)
packets as CE. Then these packets will be erroneously classified packets as CE. Then these packets will be erroneously classified
skipping to change at page 8, line 46 skipping to change at page 8, line 50
retransmission. retransmission.
c. A network operator might wish to include certain unresponsive, c. A network operator might wish to include certain unresponsive,
non-L4S traffic in the L4S queue if it is deemed to be smoothly non-L4S traffic in the L4S queue if it is deemed to be smoothly
enough paced and low enough rate not to build a queue. For enough paced and low enough rate not to build a queue. For
instance, VoIP, low rate datagrams to sync online games, instance, VoIP, low rate datagrams to sync online games,
relatively low rate application-limited traffic, DNS, LDAP, etc. relatively low rate application-limited traffic, DNS, LDAP, etc.
This traffic would need to be tagged with specific identifiers, This traffic would need to be tagged with specific identifiers,
e.g. a low latency Diffserv Codepoint such as Expedited e.g. a low latency Diffserv Codepoint such as Expedited
Forwarding (EF [RFC3246]), Non-Queue-Building Forwarding (EF [RFC3246]), Non-Queue-Building
(NQB [I-D.white-tsvwg-nqb]), or operator-specific identifiers. (NQB [I-D.ietf-tsvwg-nqb]), or operator-specific identifiers.
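The identifiers in items a to c can be illustrated with a minimal classifier sketch. It is not the normative classifier from [I-D.ietf-tsvwg-ecn-l4s-id]. It assumes the 2-bit ECN field occupies the two least-significant bits of the IPv4 TOS or IPv6 Traffic Class octet, with the codepoint values from [RFC3168], and that CE packets are placed in the L4S queue alongside ECT(1). The `extra_l4s_dscps` parameter is a hypothetical stand-in for the operator-specific identifiers in item c.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """The four codepoints of the 2-bit IP-ECN field, as in RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


EF_DSCP = 46  # Expedited Forwarding codepoint (RFC 3246)


def classify(tos_byte, extra_l4s_dscps=frozenset()):
    """Return "L4S" or "Classic" for a packet's TOS / Traffic Class octet.

    extra_l4s_dscps is a hypothetical operator policy: DSCP values whose
    traffic is admitted to the L4S queue regardless of the ECN field.
    """
    ecn = Ecn(tos_byte & 0b11)   # low 2 bits carry the ECN field
    dscp = (tos_byte >> 2) & 0x3F  # high 6 bits carry the DSCP
    if ecn in (Ecn.ECT_1, Ecn.CE):
        return "L4S"
    if dscp in extra_l4s_dscps:
        return "L4S"
    return "Classic"


if __name__ == "__main__":
    for tos in (0x00, 0x01, 0x02, 0x03):
        print(Ecn(tos & 0b11).name, "->", classify(tos))
    # An EF-marked, Not-ECT packet under a policy that admits EF.
    print("EF ->", classify(EF_DSCP << 2, extra_l4s_dscps={EF_DSCP}))
```

Because CE is shared by both treatments, a classifier of this shape cannot tell whether a CE packet started out as ECT(0) or ECT(1), which is the concern raised in item b.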
Network components: The L4S architecture aims to provide low latency 4.2. Network Components
without the _need_ for per-flow operations in network components.
Nonetheless, the architecture does not preclude per-flow solutions - The L4S architecture aims to provide low latency without the _need_
it encompasses the following combinations: for per-flow operations in network components. Nonetheless, the
architecture does not preclude per-flow solutions--it encompasses the
following combinations:
a. The Dual Queue Coupled AQM (illustrated in Figure 1) achieves the a. The Dual Queue Coupled AQM (illustrated in Figure 1) achieves the
'semi-permeable' membrane property mentioned earlier as follows. 'semi-permeable' membrane property mentioned earlier as follows.
The obvious part is that using two separate queues isolates the The obvious part is that using two separate queues isolates the
queuing delay of one from the other. The less obvious part is queuing delay of one from the other. The less obvious part is
how the two queues act as if they are a single pool of bandwidth how the two queues act as if they are a single pool of bandwidth
without the scheduler needing to decide between them. This is without the scheduler needing to decide between them. This is
achieved by having the Classic AQM provide a congestion signal to achieved by having the Classic AQM provide a congestion signal to
both queues in a manner that ensures a consistent response from both queues in a manner that ensures a consistent response from
the two types of congestion control. In other words, the Classic the two types of congestion control. In other words, the Classic
skipping to change at page 9, line 29 skipping to change at page 9, line 36
leave the right amount of capacity for the Classic traffic (as leave the right amount of capacity for the Classic traffic (as
they would if they were the same type of traffic sharing the same they would if they were the same type of traffic sharing the same
queue). Then the scheduler can serve the L4S queue with queue). Then the scheduler can serve the L4S queue with
priority, because the L4S traffic isn't offering up enough priority, because the L4S traffic isn't offering up enough
traffic to use all the priority that it is given. Therefore, on traffic to use all the priority that it is given. Therefore, on
short time-scales (sub-round-trip) the prioritization of the L4S short time-scales (sub-round-trip) the prioritization of the L4S
queue protects its low latency by allowing bursts to dissipate queue protects its low latency by allowing bursts to dissipate
quickly; but on longer time-scales (round-trip and longer) the quickly; but on longer time-scales (round-trip and longer) the
Classic queue creates an equal and opposite pressure against the Classic queue creates an equal and opposite pressure against the
L4S traffic to ensure that neither has priority when it comes to L4S traffic to ensure that neither has priority when it comes to
bandwidth. The tension between prioritizing L4S and coupling bandwidth. The tension between prioritizing L4S and coupling the
marking from Classic results in per-flow fairness. To protect marking from the Classic AQM results in approximate per-flow
against unresponsive traffic in the L4S queue taking advantage of fairness. To protect against unresponsive traffic in the L4S
the prioritization and starving the Classic queue, it is queue taking advantage of the prioritization and starving the
advisable not to use strict priority, but instead to use a Classic queue, it is advisable not to use strict priority, but
weighted scheduler (see Appendix A of instead to use a weighted scheduler (see Appendix A of
[I-D.ietf-tsvwg-aqm-dualq-coupled]). [I-D.ietf-tsvwg-aqm-dualq-coupled]).
When there is no Classic traffic, the L4S queue's AQM comes into When there is no Classic traffic, the L4S queue's AQM comes into
play, and it sets an appropriate marking rate to maintain very play. It starts congestion marking with a very shallow queue, so
low queuing delay. L4S traffic maintains very low queuing delay.
The Dual Queue Coupled AQM has been specified as generically as The Dual Queue Coupled AQM has been specified as generically as
possible [I-D.ietf-tsvwg-aqm-dualq-coupled] without specifying possible [I-D.ietf-tsvwg-aqm-dualq-coupled] without specifying
the particular AQMs to use in the two queues so that designers the particular AQMs to use in the two queues so that designers
are free to implement diverse ideas. Informational appendices in are free to implement diverse ideas. Informational appendices in
that draft give pseudocode examples of two different specific AQM that draft give pseudocode examples of two different specific AQM
approaches: one called DualPI2 (pronounced Dual PI approaches: one called DualPI2 (pronounced Dual PI
Squared) [DualPI2Linux] that uses the PI2 variant of PIE, and a Squared) [DualPI2Linux] that uses the PI2 variant of PIE, and a
zero-config variant of RED called Curvy RED. A DualQ Coupled AQM zero-config variant of RED called Curvy RED. A DualQ Coupled AQM
based on PIE has also been specified and implemented for Low based on PIE has also been specified and implemented for Low
skipping to change at page 10, line 23 skipping to change at page 10, line 28
________ / |Classifier| : /|scheduler| / ________ / |Classifier| : /|scheduler| /
|Classic |/ |__________|\ --------. ___:__ / |_________| |Classic |/ |__________|\ --------. ___:__ / |_________|
| sender | \_\ || | |||___\_| mark/|/ | sender | \_\ || | |||___\_| mark/|/
|________| / || | ||| / | drop | |________| / || | ||| / | drop |
Classic --------' |______| Classic --------' |______|
Figure 1: Components of an L4S Solution: 1) Isolation in separate Figure 1: Components of an L4S Solution: 1) Isolation in separate
network queues; 2) Packet Identification Protocol; and 3) Scalable network queues; 2) Packet Identification Protocol; and 3) Scalable
Sending Host Sending Host
b. A scheduler with per-flow queues can be used for L4S. It is b. A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can
simple to modify an existing design such as FQ-CoDel or FQ-PIE. be used for L4S. For instance within each queue of an FQ-CoDel
For instance within each queue of an FQ-CoDel system, as well as system, as well as a CoDel AQM, there is typically also ECN
a CoDel AQM, immediate (unsmoothed) shallow threshold ECN marking marking at an immediate (unsmoothed) shallow threshold to support
has been added (see Sec.5.2.7 of [RFC8290]). Then the Classic use in data centres (see Sec.5.2.7 of [RFC8290]). This can be
AQM such as CoDel or PIE is applied to non-ECN or ECT(0) packets, modified so that the shallow threshold is solely applied to
while the shallow threshold is applied to ECT(1) packets, to give ECT(1) packets. Then if there is a flow of non-ECN or ECT(0)
sub-millisecond average queue delay. packets in the per-flow-queue, the Classic AQM (e.g. CoDel) is
applied; while if there is a flow of ECT(1) packets in the queue,
the shallower (typically sub-millisecond) threshold is applied.
In addition, ECT(0) and not-ECT packets could potentially be
classified into a separate flow-queue from ECT(1) and CE packets
to avoid them mixing if they share a common flow-identifier (e.g.
in a VPN).
c. It would also be possible to use dual queues for isolation, but c. It should also be possible to use dual queues for isolation, but
with per-flow marking to control flow-rates (instead of the with per-flow marking to control flow-rates (instead of the
coupled per-queue marking of the Dual Queue Coupled AQM). One of coupled per-queue marking of the Dual Queue Coupled AQM). One of
the two queues would be for isolating L4S packets, which would be the two queues would be for isolating L4S packets, which would be
classified by the ECN codepoint. Flow rates could be controlled classified by the ECN codepoint. Flow rates could be controlled
by flow-specific marking. The policy goal of the marking could by flow-specific marking. The policy goal of the marking could
be to differentiate flow rates (e.g. [Nadas20], which requires be to differentiate flow rates (e.g. [Nadas20], which requires
additional signalling of a per-flow 'value'), or to equalize additional signalling of a per-flow 'value'), or to equalize
flow-rates (perhaps in a similar way to Approx Fair CoDel [AFCD], flow-rates (perhaps in a similar way to Approx Fair CoDel [AFCD],
[I-D.morton-tsvwg-codel-approx-fair], but with two queues not [I-D.morton-tsvwg-codel-approx-fair], but with two queues not
one). one).
Note that whenever the term 'DualQ' is used loosely without Note that whenever the term 'DualQ' is used loosely without
saying whether marking is per-queue or per-flow, it means a dual saying whether marking is per-queue or per-flow, it means a dual
queue AQM with per-queue marking. queue AQM with per-queue marking.
Host mechanisms: The L4S architecture includes two main mechanisms in 4.3. Host Mechanisms
the end host that we enumerate next:
a. Scalable Congestion Control: Data Center TCP is the most widely The L4S architecture includes two main mechanisms in the end host
used example. It has been documented as an informational record that we enumerate next:
of the protocol currently in use in controlled
environments [RFC8257]. A draft list of safety and performance a. Scalable Congestion Control at the sender: Data Center TCP is the
improvements for a scalable congestion control to be usable on most widely used example. It has been documented as an
the public Internet has been drawn up (the so-called 'Prague L4S informational record of the protocol currently in use in
requirements' in Appendix A of [I-D.ietf-tsvwg-ecn-l4s-id]). The controlled environments [RFC8257]. A draft list of safety and
subset that involve risk of harm to others have been captured as performance improvements for a scalable congestion control to be
normative requirements in Section 4 of usable on the public Internet has been drawn up (the so-called
[I-D.ietf-tsvwg-ecn-l4s-id]. TCP Prague has been implemented in 'Prague L4S requirements' in Appendix A of
Linux as a reference implementation to address these requirements [I-D.ietf-tsvwg-ecn-l4s-id]). The subset that involve risk of
[PragueLinux]. harm to others have been captured as normative requirements in
Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id]. TCP
Prague [I-D.briscoe-iccrg-prague-congestion-control] has been
implemented in Linux as a reference implementation to address
these requirements [PragueLinux].
Transport protocols other than TCP use various congestion Transport protocols other than TCP use various congestion
controls that are designed to be friendly with Reno. Before they controls that are designed to be friendly with Reno. Before they
can use the L4S service, it will be necessary to implement can use the L4S service, they will need to be updated to
scalable variants of each of these congestion control behaviours. implement a scalable congestion response, which they will have to
They will eventually need to be updated to implement a scalable indicate by using the ECT(1) codepoint. Scalable variants are
congestion response, which they will have to indicate by using under consideration for more recent transport protocols,
the ECT(1) codepoint. Scalable variants are under consideration e.g. QUIC, and the L4S ECN part of BBRv2 [BBRv2] is a scalable
for some new transport protocols that are themselves under
development, e.g. QUIC. Also the L4S ECN part of
BBRv2 [I-D.cardwell-iccrg-bbr-congestion-control] is a scalable
congestion control intended for the TCP and QUIC transports, congestion control intended for the TCP and QUIC transports,
amongst others. Also an L4S variant of the RMCAT SCReAM amongst others. Also an L4S variant of the RMCAT SCReAM
controller [RFC8298] has been implemented for media transported controller [RFC8298] has been implemented [SCReAM] for media
over RTP. transported over RTP.
b. ECN feedback is sufficient for L4S in some transport protocols b. The ECN feedback in some transport protocols is already
(specifically DCCP [RFC4340] and QUIC [I-D.ietf-quic-transport]). sufficiently fine-grained for L4S (specifically DCCP [RFC4340]
But others either require update or are in the process of being and QUIC [RFC9000]). But others either require update or are in
updated: the process of being updated:
* For the case of TCP, the feedback protocol for ECN embeds the * For the case of TCP, the feedback protocol for ECN embeds the
assumption from Classic ECN [RFC3168] that an ECN mark is assumption from Classic ECN [RFC3168] that an ECN mark is
equivalent to a drop, making it unusable for a scalable TCP. equivalent to a drop, making it unusable for a scalable TCP.
Therefore, the implementation of TCP receivers will have to be Therefore, the implementation of TCP receivers will have to be
upgraded [RFC7560]. Work to standardize and implement more upgraded [RFC7560]. Work to standardize and implement more
accurate ECN feedback for TCP (AccECN) is in accurate ECN feedback for TCP (AccECN) is in
progress [I-D.ietf-tcpm-accurate-ecn], [PragueLinux]. progress [I-D.ietf-tcpm-accurate-ecn], [PragueLinux].
* ECN feedback is only roughly sketched in an appendix of the * ECN feedback is only roughly sketched in an appendix of the
SCTP specification [RFC4960]. A fuller specification has been SCTP specification [RFC4960]. A fuller specification has been
proposed in a long-expired draft [I-D.stewart-tsvwg-sctpecn], proposed in a long-expired draft [I-D.stewart-tsvwg-sctpecn],
which would need to be implemented and deployed before SCTP which would need to be implemented and deployed before SCTP
could support L4S. could support L4S.
* For RTP, sufficient ECN feedback was defined in [RFC6679], but * For RTP, sufficient ECN feedback was defined in [RFC6679], but
[I-D.ietf-avtcore-cc-feedback-message] defines the latest [RFC8888] defines the latest standards track improvements.
standards track improvements.
5. Rationale 5. Rationale
5.1. Why These Primary Components? 5.1. Why These Primary Components?
Explicit congestion signalling (protocol): Explicit congestion Explicit congestion signalling (protocol): Explicit congestion
signalling is a key part of the L4S approach. In contrast, use of signalling is a key part of the L4S approach. In contrast, use of
drop as a congestion signal creates a tension because drop is both drop as a congestion signal creates a tension because drop is both
an impairment (less would be better) and a useful signal (more an impairment (less would be better) and a useful signal (more
would be better): would be better):
* Explicit congestion signals can be used many times per round * Explicit congestion signals can be used many times per round
trip, to keep tight control, without any impairment. Under trip, to keep tight control, without any impairment. Under
heavy load, even more explicit signals can be applied so the heavy load, even more explicit signals can be applied so the
queue can be kept short whatever the load. Whereas state-of- queue can be kept short whatever the load. In contrast,
the-art AQMs have to introduce very high packet drop at high Classic AQMs have to introduce very high packet drop at high
load to keep the queue short. Further, when using ECN, the load to keep the queue short. By using ECN, an L4S congestion
congestion control's sawtooth reduction can be smaller and control's sawtooth reduction can be smaller and therefore
therefore return to the operating point more often, without return to the operating point more often, without worrying that
worrying that this causes more signals (one at the top of each more sawteeth will cause more signals. The consequent smaller
smaller sawtooth). The consequent smaller amplitude sawteeth amplitude sawteeth fit between an empty queue and a very
fit between a very shallow marking threshold and an empty shallow marking threshold (~1 ms in the public Internet), so
queue, so queue delay variation can be very low, without risk queue delay variation can be very low, without risk of under-
of under-utilization. utilization.
* Explicit congestion signals can be sent immediately to track * Explicit congestion signals can be emitted immediately to track
fluctuations of the queue. L4S shifts smoothing from the fluctuations of the queue. L4S shifts smoothing from the
network (which doesn't know the round trip times of all the network to the host. The network doesn't know the round trip
flows) to the host (which knows its own round trip time). times of all the flows. So if the network is responsible for
Previously, the network had to smooth to keep a worst-case smoothing (as in the Classic approach), it has to assume a
round trip stable, which delayed congestion signals by 100-200 worst case RTT, otherwise long RTT flows would become unstable.
ms. This delays Classic congestion signals by 100-200 ms. In
contrast, each host knows its own round trip time. So, in the
L4S approach, the host can smooth each flow over its own RTT,
introducing no more smoothing delay than strictly necessary
(usually only a few milliseconds). A host can also choose not
to introduce any smoothing delay if appropriate, e.g. during
flow start-up.
All the above makes it clear that explicit congestion signalling Neither of the above is feasible if explicit congestion
is only advantageous for latency if it does not have to be signalling has to be considered 'equivalent to drop' (as was
considered 'equivalent to' drop (as was required with Classic required with Classic ECN [RFC3168]), because drop is an
ECN [RFC3168]). Therefore, in an L4S AQM, the L4S queue uses a impairment as well as a signal. So drop cannot be excessively
new L4S variant of ECN that is not equivalent to drop (see section frequent, and drop cannot be immediate, otherwise too many drops
5.2 of [I-D.ietf-tsvwg-ecn-l4s-id]), while the Classic queue uses would turn out to have been due to only a transient fluctuation in
either classic ECN [RFC3168] or drop, which are equivalent to each the queue that would not have warranted dropping a packet in
other. hindsight. Therefore, in an L4S AQM, the L4S queue uses a new L4S
variant of ECN that is not equivalent to drop (see section 5.2 of
[I-D.ietf-tsvwg-ecn-l4s-id]), while the Classic queue uses either
Classic ECN [RFC3168] or drop, which are equivalent to each other.
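The shift of smoothing from the network to the host, described in
the second bullet above, can be illustrated with the moving average
that DCTCP [RFC8257] maintains at the sender. The sketch below is
not taken from this draft, which does not mandate a particular
smoothing algorithm; it assumes the DCTCP form, with the gain g =
1/16 recommended in RFC 8257, and the function names are
hypothetical. Because the update runs once per round trip, the
smoothing timescale follows the flow's own RTT rather than a
worst-case RTT assumed by the network.

```python
def update_alpha(alpha, marked_bytes, acked_bytes, g=1 / 16):
    """Update the smoothed marking fraction once per round trip.

    alpha is an exponentially weighted moving average of the fraction
    of acknowledged bytes that carried a congestion mark.
    """
    fraction = marked_bytes / acked_bytes if acked_bytes else 0.0
    return (1 - g) * alpha + g * fraction


def reduced_cwnd(cwnd, alpha):
    """Reduce the window in proportion to the smoothed marking level."""
    return cwnd * (1 - alpha / 2)
```

A sender that wants no smoothing delay, e.g. during flow start-up,
could respond to the unsmoothed fraction directly instead of alpha.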
Before Classic ECN was standardized, there were various proposals Before Classic ECN was standardized, there were various proposals
to give an ECN mark a different meaning from drop. However, there to give an ECN mark a different meaning from drop. However, there
was no particular reason to agree on any one of the alternative was no particular reason to agree on any one of the alternative
meanings, so 'equivalent to drop' was the only compromise that meanings, so 'equivalent to drop' was the only compromise that
could be reached. RFC 3168 contains a statement that: could be reached. RFC 3168 contains a statement that:
"An environment where all end nodes were ECN-Capable could "An environment where all end nodes were ECN-Capable could
allow new criteria to be developed for setting the CE allow new criteria to be developed for setting the CE
codepoint, and new congestion control mechanisms for end-node codepoint, and new congestion control mechanisms for end-node
skipping to change at page 13, line 32 skipping to change at page 13, line 50
determine their rate, packet by packet, rather than be overridden determine their rate, packet by packet, rather than be overridden
by a network scheduler. An alternative is for a network scheduler by a network scheduler. An alternative is for a network scheduler
to control the rate of each application flow (see discussion in to control the rate of each application flow (see discussion in
Section 5.2). Section 5.2).
L4S packet identifier (protocol): Once there are at least two L4S packet identifier (protocol): Once there are at least two
treatments in the network, hosts need an identifier at the IP treatments in the network, hosts need an identifier at the IP
layer to distinguish which treatment they intend to use. layer to distinguish which treatment they intend to use.
Scalable congestion notification: A scalable congestion control in Scalable congestion notification: A scalable congestion control in
the host keeps the signalling frequency from the network high so the host keeps the signalling frequency from the network high
that rate variations can be small when signalling is stable, and whatever the flow rate, so that queue delay variations can be
rate can track variations in available capacity as rapidly as small when conditions are stable, and rate can track variations in
possible otherwise. available capacity as rapidly as possible otherwise.
Low loss: Latency is not the only concern of L4S. The "Low Loss" Low loss: Latency is not the only concern of L4S. The "Low Loss"
part of the name denotes that L4S generally achieves zero part of the name denotes that L4S generally achieves zero
congestion loss due to its use of ECN. Otherwise, loss would congestion loss due to its use of ECN. Otherwise, loss would
itself cause delay, particularly for short flows, due to itself cause delay, particularly for short flows, due to
retransmission delay [RFC2884]. retransmission delay [RFC2884].
Scalable throughput: The "Scalable throughput" part of the name Scalable throughput: The "Scalable throughput" part of the name
denotes that the per-flow throughput of scalable congestion denotes that the per-flow throughput of scalable congestion
controls should scale indefinitely, avoiding the imminent scaling controls should scale indefinitely, avoiding the imminent scaling
problems with Reno-friendly congestion control problems with Reno-friendly congestion control
algorithms [RFC3649]. It was known when TCP congestion avoidance algorithms [RFC3649]. It was known when TCP congestion avoidance
was first developed that it would not scale to high bandwidth- was first developed in 1988 that it would not scale to high
delay products (see footnote 6 in [TCP-CA]). Today, regular bandwidth-delay products (see footnote 6 in [TCP-CA]). Today,
broadband bit-rates over WAN distances are already beyond the regular broadband flow rates over WAN distances are already beyond
scaling range of Classic Reno congestion control. So `less the scaling range of Classic Reno congestion control. So `less
unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp] unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp]
variants of TCP have been successfully deployed. However, these variants of TCP have been successfully deployed. However, these
are now approaching their scaling limits. As the examples in are now approaching their scaling limits.
Section 3 demonstrate, as flow rate scales Classic congestion
controls like Reno or Cubic induce a congestion signal more and For instance, we will consider a scenario with a maximum RTT of
more infrequently (hundreds of round trips at today's flow rates 30 ms at the peak of each sawtooth. As Reno packet rate scales 8x
and growing), which makes dynamic control very sloppy. In from 1,250 to 10,000 packet/s (from 15 to 120 Mb/s with 1500 B
contrast on average a scalable congestion control like DCTCP or packets), the time to recover from a congestion event rises
TCP Prague induces 2 congestion signals per round trip, which proportionately by 8x as well, from 422 ms to 3.38 s. It is
remains invariant for any flow rate, keeping dynamic control very clearly problematic for a congestion control to take multiple
tight. seconds to recover from each congestion event. Cubic [RFC8312]
was developed to be less unscalable, but it is approaching its
scaling limit; with the same max RTT of 30 ms, at 120 Mb/s the
Linux implementation of Cubic is still in its Reno-friendly mode,
so it takes about 2.3 s to recover. However, once the flow rate
scales by 8x again to 960 Mb/s it enters true Cubic mode, with a
recovery time of 10.6 s. From then on, each further scaling by 8x
doubles Cubic's recovery time (because the cube root of 8 is 2),
e.g. at 7.68 Gb/s the recovery time is 21.3 s. In contrast a
scalable congestion control like DCTCP or TCP Prague induces 2
congestion signals per round trip on average, which remains
invariant for any flow rate, keeping dynamic control very tight.
Although work on scaling congestion controls tends to start with Although work on scaling congestion controls tends to start with
TCP as the transport, the above is not intended to exclude other TCP as the transport, the above is not intended to exclude other
transports (e.g. SCTP, QUIC) or less elastic algorithms transports (e.g. SCTP, QUIC) or less elastic algorithms
(e.g. RMCAT), which all tend to adopt the same or similar (e.g. RMCAT), which all tend to adopt the same or similar
developments. developments.
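The Reno recovery times quoted in the scaling example above (422 ms
at 15 Mb/s and 3.38 s at 120 Mb/s) can be reproduced with a short
calculation. The draft does not state the formula it used, so the
following is a reconstruction that happens to match those numbers,
under two assumptions: the window at the peak of the sawtooth is the
packet rate times the maximum RTT, and the average RTT during
recovery is 3/4 of the maximum because the RTT grows with the window
from half to full. The function names are hypothetical.

```python
def packet_rate_pps(bit_rate_bps, packet_bytes=1500):
    """Convert a bit rate into a packet rate for fixed-size packets."""
    return bit_rate_bps / (8 * packet_bytes)


def reno_recovery_time_s(rate_pps, max_rtt_s):
    """Estimate the time for Reno to regain its window after one halving."""
    w_max = rate_pps * max_rtt_s   # packets in flight at the sawtooth peak
    round_trips = w_max / 2        # additive increase: one packet per RTT
    avg_rtt_s = 0.75 * max_rtt_s   # assumption: RTT rises from 1/2 to full
    return round_trips * avg_rtt_s


for mbps in (15, 120):
    rate = packet_rate_pps(mbps * 1e6)
    print(mbps, "Mb/s:", reno_recovery_time_s(rate, 0.030), "s")
```

Under these assumptions the recovery time is proportional to the
flow rate, which is why scaling the rate by 8x scales the recovery
time by 8x as well.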
5.2. What L4S adds to Existing Approaches 5.2. What L4S adds to Existing Approaches
All the following approaches address some part of the same problem All the following approaches address some part of the same problem
skipping to change at page 16, line 25 skipping to change at page 17, line 7
while taking account of the needs of others via congestion while taking account of the needs of others via congestion
signals. They maintain that this has allowed applications signals. They maintain that this has allowed applications
with interesting rate behaviours to evolve, for instance, with interesting rate behaviours to evolve, for instance,
variable bit-rate video that varies around an equal share variable bit-rate video that varies around an equal share
rather than being forced to remain equal at every instant, or rather than being forced to remain equal at every instant, or
scavenger services that use less than an equal share of scavenger services that use less than an equal share of
capacity [LEDBAT_AQM]. capacity [LEDBAT_AQM].
The L4S architecture does not require the IETF to commit to The L4S architecture does not require the IETF to commit to
one approach over the other, because it supports both, so that one approach over the other, because it supports both, so that
the market can decide. Nonetheless, in the spirit of 'Do one the 'market' can decide. Nonetheless, in the spirit of 'Do
thing and do it well' [McIlroy78], the DualQ option provides one thing and do it well' [McIlroy78], the DualQ option
low delay without prejudging the issue of flow-rate control. provides low delay without prejudging the issue of flow-rate
Then, flow rate policing can be added separately if desired. control. Then, flow rate policing can be added separately if
This allows application control up to a point, but the network desired. This allows application control up to a point, but
can still choose to set the point at which it intervenes to the network can still choose to set the point at which it
prevent one flow completely starving another. intervenes to prevent one flow completely starving another.
Note: Note:
1. It might seem that self-inflicted queuing delay within a per- 1. It might seem that self-inflicted queuing delay within a per-
flow queue should not be counted, because if the delay wasn't flow queue should not be counted, because if the delay wasn't
in the network it would just shift to the sender. However, in the network it would just shift to the sender. However,
modern adaptive applications, e.g. HTTP/2 [RFC7540] or some modern adaptive applications, e.g. HTTP/2 [RFC7540] or some
interactive media applications (see Section 6.1), can keep low interactive media applications (see Section 6.1), can keep low
latency objects at the front of their local send queue by latency objects at the front of their local send queue by
shuffling priorities of other objects dependent on the shuffling priorities of other objects dependent on the
skipping to change at page 17, line 18 skipping to change at page 17, line 48
(BBR [I-D.cardwell-iccrg-bbr-congestion-control]) controls queuing (BBR [I-D.cardwell-iccrg-bbr-congestion-control]) controls queuing
delay end-to-end without needing any special logic in the network, delay end-to-end without needing any special logic in the network,
such as an AQM. So it works pretty-much on any path (although it such as an AQM. So it works pretty-much on any path (although it
has not been without problems, particularly capacity sharing in has not been without problems, particularly capacity sharing in
BBRv1). BBR keeps queuing delay reasonably low, but perhaps not BBRv1). BBR keeps queuing delay reasonably low, but perhaps not
quite as low as with state-of-the-art AQMs such as PIE or FQ- quite as low as with state-of-the-art AQMs such as PIE or FQ-
CoDel, and certainly nowhere near as low as with L4S. Queuing CoDel, and certainly nowhere near as low as with L4S. Queuing
delay is also not consistently low, due to BBR's regular bandwidth delay is also not consistently low, due to BBR's regular bandwidth
probing spikes and its aggressive flow start-up phase. probing spikes and its aggressive flow start-up phase.
L4S complements BBR. Indeed BBRv2 uses L4S ECN and a scalable L4S L4S complements BBR. Indeed BBRv2 [BBRv2] uses L4S ECN and a
congestion control behaviour in response to any ECN signalling scalable L4S congestion control behaviour in response to any ECN
from the path. The L4S ECN signal complements the delay based signalling from the path. The L4S ECN signal complements the
congestion control aspects of BBR with an explicit indication that delay based congestion control aspects of BBR with an explicit
hosts can use, both to converge on a fair rate and to keep below a indication that hosts can use, both to converge on a fair rate and
shallow queue target set by the network. Without L4S ECN, both to keep below a shallow queue target set by the network. Without
these aspects need to be assumed or estimated. L4S ECN, both these aspects need to be assumed or estimated.
6. Applicability 6. Applicability
6.1. Applications 6.1. Applications
A transport layer that solves the current latency issues will provide A transport layer that solves the current latency issues will provide
new service, product and application opportunities. new service, product and application opportunities.
With the L4S approach, the following existing applications also With the L4S approach, the following existing applications also
experience significantly better quality of experience under load: experience significantly better quality of experience under load:
skipping to change at page 24, line 17 skipping to change at page 24, line 37
1. Here, the immediate benefit of a single AQM deployment can be 1. Here, the immediate benefit of a single AQM deployment can be
seen, but limited to a controlled trial or controlled deployment. seen, but limited to a controlled trial or controlled deployment.
In this example downstream deployment is first, but in other In this example downstream deployment is first, but in other
scenarios the upstream might be deployed first. If no AQM at all scenarios the upstream might be deployed first. If no AQM at all
was previously deployed for the downstream access, an L4S AQM was previously deployed for the downstream access, an L4S AQM
greatly improves the Classic service (as well as adding the L4S greatly improves the Classic service (as well as adding the L4S
service). If an AQM was already deployed, the Classic service service). If an AQM was already deployed, the Classic service
will be unchanged (and L4S will add an improvement on top). will be unchanged (and L4S will add an improvement on top).
2. In this stage, the name 'TCP Prague' [PragueLinux] is used to 2. In this stage, the name 'TCP
Prague' [I-D.briscoe-iccrg-prague-congestion-control] is used to
represent a variant of DCTCP that is safe to use in a production represent a variant of DCTCP that is safe to use in a production
Internet environment. If the application is primarily Internet environment. If the application is primarily
unidirectional, 'TCP Prague' at one end will provide all the unidirectional, 'TCP Prague' at one end will provide all the
benefit needed. For TCP transports, Accurate ECN feedback benefit needed. For TCP transports, Accurate ECN feedback
(AccECN) [I-D.ietf-tcpm-accurate-ecn] is needed at the other end, (AccECN) [I-D.ietf-tcpm-accurate-ecn] is needed at the other end,
but it is a generic ECN feedback facility that is already planned but it is a generic ECN feedback facility that is already planned
to be deployed for other purposes, e.g. DCTCP, BBR. The two ends to be deployed for other purposes, e.g. DCTCP, BBR. The two ends
can be deployed in either order, because, in TCP, an L4S can be deployed in either order, because, in TCP, an L4S
congestion control only enables itself if it has negotiated the congestion control only enables itself if it has negotiated the
use of AccECN feedback with the other end during the connection use of AccECN feedback with the other end during the connection
handshake. Thus, deployment of TCP Prague on a server enables handshake. Thus, deployment of TCP Prague on a server enables
L4S trials to move to a production service in one direction, L4S trials to move to a production service in one direction,
wherever AccECN is deployed at the other end. This stage might wherever AccECN is deployed at the other end. This stage might
be further motivated by the performance improvements of TCP be further motivated by the performance improvements of TCP
Prague relative to DCTCP (see Appendix A.2 of Prague relative to DCTCP (see Appendix A.2 of
[I-D.ietf-tsvwg-ecn-l4s-id]). [I-D.ietf-tsvwg-ecn-l4s-id]).
Unlike TCP, from the outset, QUIC ECN Unlike TCP, from the outset, QUIC ECN feedback [RFC9000] has
feedback [I-D.ietf-quic-transport] has supported L4S. Therefore, supported L4S. Therefore, if the transport is QUIC, one-ended
if the transport is QUIC, one-ended deployment of a Prague deployment of a Prague congestion control at this stage is simple
congestion control at this stage is simple and sufficient. and sufficient.
3. This is a two-move stage to enable L4S upstream. An L4S AQM or 3. This is a two-move stage to enable L4S upstream. An L4S AQM or
TCP Prague can be deployed in either order as already explained. TCP Prague can be deployed in either order as already explained.
To motivate the first of two independent moves, the deferred To motivate the first of two independent moves, the deferred
benefit of enabling new services after the second move has to be benefit of enabling new services after the second move has to be
worth it to cover the first mover's investment risk. As worth it to cover the first mover's investment risk. As
explained already, the potential for new interactive services explained already, the potential for new interactive services
provides this motivation. An L4S AQM also improves the upstream provides this motivation. An L4S AQM also improves the upstream
Classic service - significantly if no other AQM has already been Classic service - significantly if no other AQM has already been
deployed. deployed.
skipping to change at page 31, line 7 skipping to change at page 31, line 23
or classes within that broad set to be distinguishable in any way or classes within that broad set to be distinguishable in any way
while traversing networks. This removes much of the ability to while traversing networks. This removes much of the ability to
correlate between the delay requirements of traffic and other correlate between the delay requirements of traffic and other
identifying features [RFC6973]. There may be some types of traffic identifying features [RFC6973]. There may be some types of traffic
that prefer not to use L4S, but the coarse binary categorization of that prefer not to use L4S, but the coarse binary categorization of
traffic reveals very little that could be exploited to compromise traffic reveals very little that could be exploited to compromise
privacy. privacy.
9. Acknowledgements 9. Acknowledgements
Thanks to Richard Scheffenegger, Wes Eddy, Karen Nielsen, David Black Thanks to Richard Scheffenegger, Wes Eddy, Karen Nielsen, David
and Jake Holland for their useful review comments. Black, Jake Holland and Vidhi Goel for their useful review comments.
Bob Briscoe and Koen De Schepper were part-funded by the European Bob Briscoe and Koen De Schepper were part-funded by the European
Community under its Seventh Framework Programme through the Reducing Community under its Seventh Framework Programme through the Reducing
Internet Transport Latency (RITE) project (ICT-317700). Bob Briscoe Internet Transport Latency (RITE) project (ICT-317700). Bob Briscoe
was also part-funded by the Research Council of Norway through the was also part-funded by the Research Council of Norway through the
TimeIn project, partly by CableLabs and partly by the Comcast TimeIn project, partly by CableLabs and partly by the Comcast
Innovation Fund. The views expressed here are solely those of the Innovation Fund. The views expressed here are solely those of the
authors. authors.
10. Informative References 10. Informative References
[AFCD] Xue, L., Kumar, S., Cui, C., Kondikoppa, P., Chiu, C-H., [AFCD] Xue, L., Kumar, S., Cui, C., Kondikoppa, P., Chiu, C-H.,
and S-J. Park, "Towards fair and low latency next and S-J. Park, "Towards fair and low latency next
generation high speed networks: AFCD queuing", Journal of generation high speed networks: AFCD queuing", Journal of
Network and Computer Applications 70:183--193, July 2016. Network and Computer Applications 70:183--193, July 2016.
[BBRv2] Cardwell, N., "TCP BBR v2 Alpha/Preview Release", github
repository; Linux congestion control module,
<https://github.com/google/bbr/blob/v2alpha/README.md>.
[DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I. [DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I.
Tsang, "`Data Centre to the Home': Ultra-Low Latency for Tsang, "`Data Centre to the Home': Ultra-Low Latency for
All", RITE project Technical Report , 2015, All", RITE project Technical Report , 2015,
<http://riteproject.eu/publications/>. <http://riteproject.eu/publications/>.
[DOCSIS3.1] [DOCSIS3.1]
CableLabs, "MAC and Upper Layer Protocols Interface CableLabs, "MAC and Upper Layer Protocols Interface
(MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable
Service Interface Specifications DOCSIS(R) 3.1 Version i17 Service Interface Specifications DOCSIS(R) 3.1 Version i17
or later, January 2019, <https://specification- or later, January 2019, <https://specification-
skipping to change at page 32, line 15 skipping to change at page 32, line 35
[I-D.briscoe-conex-policing] [I-D.briscoe-conex-policing]
Briscoe, B., "Network Performance Isolation using Briscoe, B., "Network Performance Isolation using
Congestion Policing", draft-briscoe-conex-policing-01 Congestion Policing", draft-briscoe-conex-policing-01
(work in progress), February 2014. (work in progress), February 2014.
[I-D.briscoe-docsis-q-protection] [I-D.briscoe-docsis-q-protection]
Briscoe, B. and G. White, "Queue Protection to Preserve Briscoe, B. and G. White, "Queue Protection to Preserve
Low Latency", draft-briscoe-docsis-q-protection-00 (work Low Latency", draft-briscoe-docsis-q-protection-00 (work
in progress), July 2019. in progress), July 2019.
[I-D.briscoe-iccrg-prague-congestion-control]
Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague
Congestion Control", draft-briscoe-iccrg-prague-
congestion-control-00 (work in progress), March 2021.
[I-D.briscoe-tsvwg-l4s-diffserv] [I-D.briscoe-tsvwg-l4s-diffserv]
Briscoe, B., "Interactions between Low Latency, Low Loss, Briscoe, B., "Interactions between Low Latency, Low Loss,
Scalable Throughput (L4S) and Differentiated Services", Scalable Throughput (L4S) and Differentiated Services",
draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress),
November 2018. November 2018.
[I-D.cardwell-iccrg-bbr-congestion-control] [I-D.cardwell-iccrg-bbr-congestion-control]
Cardwell, N., Cheng, Y., Yeganeh, S. H., and V. Jacobson, Cardwell, N., Cheng, Y., Yeganeh, S. H., and V. Jacobson,
"BBR Congestion Control", draft-cardwell-iccrg-bbr- "BBR Congestion Control", draft-cardwell-iccrg-bbr-
congestion-control-00 (work in progress), July 2017. congestion-control-00 (work in progress), July 2017.
[I-D.ietf-avtcore-cc-feedback-message]
Sarker, Z., Perkins, C., Singh, V., and M. A. Ramalho,
"RTP Control Protocol (RTCP) Feedback for Congestion
Control", draft-ietf-avtcore-cc-feedback-message-09 (work
in progress), November 2020.
[I-D.ietf-quic-transport]
Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
and Secure Transport", draft-ietf-quic-transport-34 (work
in progress), January 2021.
[I-D.ietf-tcpm-accurate-ecn] [I-D.ietf-tcpm-accurate-ecn]
Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More
Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-
ecn-14 (work in progress), February 2021. ecn-14 (work in progress), February 2021.
[I-D.ietf-tcpm-generalized-ecn] [I-D.ietf-tcpm-generalized-ecn]
Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
Congestion Notification (ECN) to TCP Control Packets", Congestion Notification (ECN) to TCP Control Packets",
draft-ietf-tcpm-generalized-ecn-07 (work in progress), draft-ietf-tcpm-generalized-ecn-07 (work in progress),
February 2021. February 2021.
skipping to change at page 33, line 17 skipping to change at page 33, line 34
Congestion Notification to Protocols that Encapsulate IP", Congestion Notification to Protocols that Encapsulate IP",
draft-ietf-tsvwg-ecn-encap-guidelines-15 (work in draft-ietf-tsvwg-ecn-encap-guidelines-15 (work in
progress), March 2021. progress), March 2021.
[I-D.ietf-tsvwg-ecn-l4s-id] [I-D.ietf-tsvwg-ecn-l4s-id]
Schepper, K. D. and B. Briscoe, "Explicit Congestion Schepper, K. D. and B. Briscoe, "Explicit Congestion
Notification (ECN) Protocol for Ultra-Low Queuing Delay Notification (ECN) Protocol for Ultra-Low Queuing Delay
(L4S)", draft-ietf-tsvwg-ecn-l4s-id-14 (work in progress), (L4S)", draft-ietf-tsvwg-ecn-l4s-id-14 (work in progress),
March 2021. March 2021.
[I-D.ietf-tsvwg-nqb]
White, G. and T. Fossati, "A Non-Queue-Building Per-Hop
Behavior (NQB PHB) for Differentiated Services", draft-
ietf-tsvwg-nqb-05 (work in progress), March 2021.
[I-D.ietf-tsvwg-rfc6040update-shim] [I-D.ietf-tsvwg-rfc6040update-shim]
Briscoe, B., "Propagating Explicit Congestion Notification Briscoe, B., "Propagating Explicit Congestion Notification
Across IP Tunnel Headers Separated by a Shim", draft-ietf- Across IP Tunnel Headers Separated by a Shim", draft-ietf-
tsvwg-rfc6040update-shim-13 (work in progress), March tsvwg-rfc6040update-shim-13 (work in progress), March
2021. 2021.
[I-D.morton-tsvwg-codel-approx-fair] [I-D.morton-tsvwg-codel-approx-fair]
Morton, J. and P. G. Heist, "Controlled Delay Approximate Morton, J. and P. G. Heist, "Controlled Delay Approximate
Fairness AQM", draft-morton-tsvwg-codel-approx-fair-01 Fairness AQM", draft-morton-tsvwg-codel-approx-fair-01
(work in progress), March 2020. (work in progress), March 2020.
skipping to change at page 33, line 39 skipping to change at page 34, line 16
Sridharan, M., Tan, K., Bansal, D., and D. Thaler, Sridharan, M., Tan, K., Bansal, D., and D. Thaler,
"Compound TCP: A New TCP Congestion Control for High-Speed "Compound TCP: A New TCP Congestion Control for High-Speed
and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 and Long Distance Networks", draft-sridharan-tcpm-ctcp-02
(work in progress), November 2008. (work in progress), November 2008.
[I-D.stewart-tsvwg-sctpecn] [I-D.stewart-tsvwg-sctpecn]
Stewart, R. R., Tuexen, M., and X. Dong, "ECN for Stream Stewart, R. R., Tuexen, M., and X. Dong, "ECN for Stream
Control Transmission Protocol (SCTP)", draft-stewart- Control Transmission Protocol (SCTP)", draft-stewart-
tsvwg-sctpecn-05 (work in progress), January 2014. tsvwg-sctpecn-05 (work in progress), January 2014.
[I-D.white-tsvwg-nqb]
White, G. and T. Fossati, "Identifying and Handling Non
Queue Building Flows in a Bottleneck Link", draft-white-
tsvwg-nqb-02 (work in progress), June 2019.
[L4Sdemo16] [L4Sdemo16]
Bondarenko, O., De Schepper, K., Tsang, I., and B. Bondarenko, O., De Schepper, K., Tsang, I., and B.
Briscoe, "Ultra-Low Delay for All: Live Experience, Live Briscoe, "Ultra-Low Delay for All: Live Experience, Live
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016,
<http://dl.acm.org/citation.cfm?doid=2910017.2910633 <http://dl.acm.org/citation.cfm?doid=2910017.2910633
(videos of demos: (videos of demos:
https://riteproject.eu/dctth/#1511dispatchwg )>. https://riteproject.eu/dctth/#1511dispatchwg )>.
[LEDBAT_AQM] [LEDBAT_AQM]
Al-Saadi, R., Armitage, G., and J. But, "Characterising Al-Saadi, R., Armitage, G., and J. But, "Characterising
skipping to change at page 38, line 15 skipping to change at page 38, line 35
[RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of
Pervasive Encryption on Operators", RFC 8404, Pervasive Encryption on Operators", RFC 8404,
DOI 10.17487/RFC8404, July 2018, DOI 10.17487/RFC8404, July 2018,
<https://www.rfc-editor.org/info/rfc8404>. <https://www.rfc-editor.org/info/rfc8404>.
[RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, [RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
"TCP Alternative Backoff with ECN (ABE)", RFC 8511, "TCP Alternative Backoff with ECN (ABE)", RFC 8511,
DOI 10.17487/RFC8511, December 2018, DOI 10.17487/RFC8511, December 2018,
<https://www.rfc-editor.org/info/rfc8511>. <https://www.rfc-editor.org/info/rfc8511>.
[RFC8888] Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP
Control Protocol (RTCP) Feedback for Congestion Control",
RFC 8888, DOI 10.17487/RFC8888, January 2021,
<https://www.rfc-editor.org/info/rfc8888>.
[RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
Multiplexed and Secure Transport", RFC 9000,
DOI 10.17487/RFC9000, May 2021,
<https://www.rfc-editor.org/info/rfc9000>.
[SCReAM] Johansson, I., "SCReAM", github repository; ,
<https://github.com/EricssonResearch/scream/blob/master/
README.md>.
[TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and [TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and
Control", Laurence Berkeley Labs Technical Report , Control", Laurence Berkeley Labs Technical Report ,
November 1988, <http://ee.lbl.gov/papers/congavoid.pdf>. November 1988, <http://ee.lbl.gov/papers/congavoid.pdf>.
[TCP-sub-mss-w] [TCP-sub-mss-w]
Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion
Window for Small Round Trip Times", BT Technical Report Window for Small Round Trip Times", BT Technical Report
TR-TUB8-2015-002, May 2015, TR-TUB8-2015-002, May 2015,
<http://www.bobbriscoe.net/projects/latency/sub-mss- <http://www.bobbriscoe.net/projects/latency/sub-mss-
w.pdf>. w.pdf>.
 End of changes. 56 change blocks. 
212 lines changed or deleted 255 lines changed or added