draft-ietf-aqm-fq-codel-06.txt   rfc8290.txt 
AQM working group T. Hoeiland-Joergensen Internet Engineering Task Force (IETF) T. Hoeiland-Joergensen
Internet-Draft Karlstad University Request for Comments: 8290 Karlstad University
Intended status: Experimental P. McKenney Category: Experimental P. McKenney
Expires: September 19, 2016 IBM Linux Technology Center ISSN: 2070-1721 IBM Linux Technology Center
D. Taht D. Taht
Teklibre Teklibre
J. Gettys J. Gettys
E. Dumazet E. Dumazet
Google, Inc. Google, Inc.
March 18, 2016 January 2018
The FlowQueue-CoDel Packet Scheduler and Active Queue Management The Flow Queue CoDel Packet Scheduler and
Algorithm Active Queue Management Algorithm
draft-ietf-aqm-fq-codel-06
Abstract Abstract
This memo presents the FQ-CoDel hybrid packet scheduler/Active Queue This memo presents the FQ-CoDel hybrid packet scheduler and Active
Management algorithm, a powerful tool for fighting bufferbloat and Queue Management (AQM) algorithm, a powerful tool for fighting
reducing latency. bufferbloat and reducing latency.
FQ-CoDel mixes packets from multiple flows and reduces the impact of FQ-CoDel mixes packets from multiple flows and reduces the impact of
head of line blocking from bursty traffic. It provides isolation for head-of-line blocking from bursty traffic. It provides isolation for
low-rate traffic such as DNS, web, and videoconferencing traffic. It low-rate traffic such as DNS, web, and videoconferencing traffic. It
improves utilisation across the networking fabric, especially for improves utilisation across the networking fabric, especially for
bidirectional traffic, by keeping queue lengths short; and it can be bidirectional traffic, by keeping queue lengths short, and it can be
implemented in a memory- and CPU-efficient fashion across a wide implemented in a memory- and CPU-efficient fashion across a wide
range of hardware. range of hardware.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This document is not an Internet Standards Track specification; it is
provisions of BCP 78 and BCP 79. published for examination, experimental implementation, and
evaluation.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document defines an Experimental Protocol for the Internet
and may be updated, replaced, or obsoleted by other documents at any community. This document is a product of the Internet Engineering
time. It is inappropriate to use Internet-Drafts as reference Task Force (IETF). It represents the consensus of the IETF
material or to cite them other than as "work in progress." community. It has received public review and has been approved for
publication by the Internet Engineering Steering Group (IESG). Not
all documents approved by the IESG are a candidate for any level of
Internet Standard; see Section 2 of RFC 7841.
This Internet-Draft will expire on September 19, 2016. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8290.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Conventions used in this document . . . . . . . . . . . . 4 1.1. Conventions Used in This Document . . . . . . . . . . . . 4
1.2. Terminology and concepts . . . . . . . . . . . . . . . . 4 1.2. Terminology and Concepts . . . . . . . . . . . . . . . . 5
1.3. Informal summary of FQ-CoDel . . . . . . . . . . . . . . 5 1.3. Informal Summary of FQ-CoDel . . . . . . . . . . . . . . 5
2. CoDel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2. CoDel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3. Flow Queueing . . . . . . . . . . . . . . . . . . . . . . . . 6 3. Flow Queueing . . . . . . . . . . . . . . . . . . . . . . . . 7
4. The FQ-CoDel scheduler . . . . . . . . . . . . . . . . . . . 7 4. The FQ-CoDel Scheduler . . . . . . . . . . . . . . . . . . . 8
4.1. Enqueue . . . . . . . . . . . . . . . . . . . . . . . . . 7 4.1. Enqueue . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.1.1. Alternative classification schemes . . . . . . . . . 8 4.1.1. Alternative Classification Schemes . . . . . . . . . 9
4.2. Dequeue . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2. Dequeue . . . . . . . . . . . . . . . . . . . . . . . . . 10
5. Implementation considerations . . . . . . . . . . . . . . . . 10 5. Implementation Considerations . . . . . . . . . . . . . . . . 11
5.1. Data structures . . . . . . . . . . . . . . . . . . . . . 10 5.1. Data Structures . . . . . . . . . . . . . . . . . . . . . 11
5.2. Parameters . . . . . . . . . . . . . . . . . . . . . . . 11 5.2. Parameters . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.1. Interval . . . . . . . . . . . . . . . . . . . . . . 11 5.2.1. Interval . . . . . . . . . . . . . . . . . . . . . . 12
5.2.2. Target . . . . . . . . . . . . . . . . . . . . . . . 11 5.2.2. Target . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.3. Packet limit . . . . . . . . . . . . . . . . . . . . 11 5.2.3. Packet Limit . . . . . . . . . . . . . . . . . . . . 13
5.2.4. Quantum . . . . . . . . . . . . . . . . . . . . . . . 12 5.2.4. Quantum . . . . . . . . . . . . . . . . . . . . . . . 13
5.2.5. Flows . . . . . . . . . . . . . . . . . . . . . . . . 12 5.2.5. Flows . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2.6. Explicit Congestion Notification (ECN) . . . . . . . 12 5.2.6. Explicit Congestion Notification (ECN) . . . . . . . 14
5.2.7. CE threshold . . . . . . . . . . . . . . . . . . . . 13 5.2.7. CE Threshold . . . . . . . . . . . . . . . . . . . . 14
5.3. Probability of hash collisions . . . . . . . . . . . . . 13 5.3. Probability of Hash Collisions . . . . . . . . . . . . . 14
5.4. Memory Overhead . . . . . . . . . . . . . . . . . . . . . 13 5.4. Memory Overhead . . . . . . . . . . . . . . . . . . . . . 15
5.5. Per-Packet Timestamping . . . . . . . . . . . . . . . . . 14 5.5. Per-Packet Timestamping . . . . . . . . . . . . . . . . . 16
5.6. Limiting queueing in lower layers . . . . . . . . . . . . 15 5.6. Limiting Queueing in Lower Layers . . . . . . . . . . . . 16
5.7. Other forms of "Fair Queueing" . . . . . . . . . . . . . 15 5.7. Other Forms of Fair Queueing . . . . . . . . . . . . . . 17
5.8. Differences between CoDel and FQ-CoDel behaviour . . . . 15 5.8. Differences between CoDel and FQ-CoDel Behaviour . . . . 17
6. Limitations of flow queueing . . . . . . . . . . . . . . . . 16 6. Limitations of Flow Queueing . . . . . . . . . . . . . . . . 18
6.1. Fairness between things other than flows . . . . . . . . 16 6.1. Fairness between Things Other Than Flows . . . . . . . . 18
6.2. Flow bunching by opaque encapsulation . . . . . . . . . . 17 6.2. Flow Bunching by Opaque Encapsulation . . . . . . . . . . 18
6.3. Low-priority congestion control algorithms . . . . . . . 17 6.3. Low-Priority Congestion Control Algorithms . . . . . . . 19
7. Deployment status and future work . . . . . . . . . . . . . . 18 7. Deployment Status and Future Work . . . . . . . . . . . . . . 19
8. Security Considerations . . . . . . . . . . . . . . . . . . . 18 8. Security Considerations . . . . . . . . . . . . . . . . . . . 20
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 19 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 21
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 19 10.1. Normative References . . . . . . . . . . . . . . . . . . 21
11.1. Normative References . . . . . . . . . . . . . . . . . . 19 10.2. Informative References . . . . . . . . . . . . . . . . . 21
11.2. Informative References . . . . . . . . . . . . . . . . . 21 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 24
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25
1. Introduction 1. Introduction
The FlowQueue-CoDel (FQ-CoDel) algorithm is a combined packet The Flow Queue CoDel (FQ-CoDel) algorithm is a combined packet
scheduler and Active Queue Management (AQM) [RFC3168] algorithm scheduler and Active Queue Management (AQM) [RFC3168] algorithm
developed as part of the bufferbloat-fighting community effort developed as part of the bufferbloat-fighting community effort
[BLOATWEB]. It is based on a modified Deficit Round Robin (DRR) [BLOATWEB]. It is based on a modified Deficit Round Robin (DRR)
queue scheduler [DRR][DRRPP], with the CoDel AQM [I-D.ietf-aqm-codel] queue scheduler [DRR] [DRRPP] with the CoDel AQM [RFC8289] algorithm
algorithm operating on each queue. This document describes the operating on each queue. This document describes the combined
combined algorithm; reference implementations are available for the algorithm; reference implementations are available for the ns-2 [NS2]
ns2 [NS2] and ns3 [NS3] network simulators, and it is included in the and ns-3 [NS3] network simulators, and the algorithm is included in
mainline Linux kernel as the fq_codel queueing discipline [LINUXSRC]. the mainline Linux kernel as the fq_codel queueing discipline
[LINUXSRC].
FQ-CoDel is a general, efficient, nearly parameterless queue FQ-CoDel is a general, efficient, nearly parameterless queue
management approach combining flow queueing with CoDel. It is a management approach combining flow queueing with CoDel. It is a
powerful tool for solving bufferbloat [BLOAT], and we believe it to powerful tool for solving bufferbloat [BLOAT] and has already been
be safe to turn on by default, as has already happened in a number of turned on by default in a number of Linux distributions. In this
Linux distributions. In this document we document the Linux document, we describe the Linux implementation in sufficient detail
implementation in sufficient detail for an independent for others to independently implement the algorithm for deployment
implementation, to enable deployment outside of the Linux ecosystem. outside the Linux ecosystem.
Since the FQ-CoDel algorithm was originally developed in the Linux Since the FQ-CoDel algorithm was originally developed in the Linux
kernel, that implementation is still considered canonical. This kernel, that implementation is still considered canonical. This
document strives to describe the algorithm in the abstract in the document describes the algorithm in the abstract in Sections 1-4 and
first sections and separate out most implementation details in separates out most implementation details in subsequent sections;
subsequent sections, but does use the Linux implementation as however, the Linux implementation is used as a reference for default
reference for default behaviour in the algorithm description itself. behaviour in the abstract algorithm description.
The rest of this document is structured as follows: This section This document is structured as follows. This section gives some
gives some concepts and terminology used in the rest of the document, concepts and terminology used in the rest of the document and gives a
and gives a short informal summary of the FQ-CoDel algorithm. short informal summary of the FQ-CoDel algorithm. Section 2 gives an
Section 2 gives an overview of the CoDel algorithm. Section 3 covers overview of the CoDel algorithm. Section 3 covers the flow hashing
the flow hashing and DRR portion. Section 4 then describes the and DRR portion. Section 4 then describes the working of the
working of the algorithm in detail, while Section 5 describes algorithm in detail, while Section 5 describes implementation details
implementation details and considerations. Section 6 lists some of and considerations. Section 6 lists some of the limitations of using
the limitations of using flow queueing. Finally, Section 7 outlines flow queueing. Section 7 outlines the current status of FQ-CoDel
the current status of FQ-CoDel deployment and lists some possible deployment and lists some possible future areas of inquiry. Finally,
future areas of inquiry, and Section 8 reiterates some important Section 8 reiterates some important security points that must be
security points that must be observed in the implementation. observed in the implementation.
1.1. Conventions used in this document 1.1. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
document are to be interpreted as described in [RFC2119]. "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
In this document, these words will appear with that interpretation capitals, as shown here.
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying [RFC2119] significance.
1.2. Terminology and concepts 1.2. Terminology and Concepts
Flow: A flow is typically identified by a 5-tuple of source IP, Flow: A flow is typically identified by a 5-tuple of source IP
destination IP, source port, destination port, and protocol number. address, destination IP address, source port number, destination
It can also be identified by a superset or subset of those port number, and protocol number. It can also be identified by a
parameters, or by media access control (MAC) address, or other means. superset or subset of those parameters, by Media Access Control
FQ-CoDel hashes flows into a configurable number of buckets to assign (MAC) address, or by other means. FQ-CoDel hashes flows into a
packets to internal Queues. configurable number of buckets to assign packets to internal
queues.
Queue: A queue of packets represented internally in FQ-CoDel. In Queue: A queue of packets represented internally in FQ-CoDel. In
most instances each flow gets its own queue; however because of the most instances, each flow gets its own queue; however, because of
possibility of hash collisions, this is not always the case. In an the possibility of hash collisions, this is not always the case.
attempt to avoid confusion, the word 'queue' is used to refer to the In an attempt to avoid confusion, the word "queue" is used to
internal data structure, and 'flow' to refer to the actual stream of refer to the internal data structure, and "flow" is used to refer
packets being delivered to the FQ-CoDel algorithm. to the actual stream of packets being delivered to the FQ-CoDel
algorithm.
Scheduler: A mechanism to select which queue a packet is dequeued Scheduler: A mechanism to select which queue a packet is dequeued
from. from.
CoDel AQM: The Active Queue Management algorithm employed by FQ-CoDel CoDel AQM: The Active Queue Management algorithm employed by
[I-D.ietf-aqm-codel]. FQ-CoDel as described in [RFC8289].
DRR: Deficit round-robin scheduling [DRR]. DRR: Deficit Round Robin scheduling [DRR].
Quantum: The maximum amount of bytes to be dequeued from a queue at Quantum: The maximum amount of bytes to be dequeued from a queue at
once. once.
Interval: Characteristic time period used by the control loop of Interval: Characteristic time period used by the control loop of
CoDel to detect when a persistent Queue is developing (see CoDel to detect when a persistent queue is developing (see
Section 4.3 of [I-D.ietf-aqm-codel]). Section 4.2 of [RFC8289]).
Target: Setpoint value of the minimum sojourn time of packets in a Target: Setpoint value of the minimum sojourn time of packets in a
Queue used as the target of the control loop in CoDel (see queue used as the target of the control loop in CoDel (see
Section 4.4 of [I-D.ietf-aqm-codel]). Section 4.3 of [RFC8289]).
1.3. Informal summary of FQ-CoDel 1.3. Informal Summary of FQ-CoDel
FQ-CoDel is a _hybrid_ of DRR [DRR] and CoDel [I-D.ietf-aqm-codel], FQ-CoDel is a hybrid of DRR [DRR] and CoDel [RFC8289], with an
with an optimisation for sparse flows similar to Shortest Queue First optimisation for sparse flows similar to Shortest Queue First (SQF)
(SQF) [SQF] and DRR++ [DRRPP]. We call this "Flow Queueing" rather [SQF] and DRR++ [DRRPP]. We call this "flow queueing" rather than
than "Fair Queueing" as flows that build a queue are treated "fair queueing", as flows that build a queue are treated differently
differently from flows that do not. from flows that do not.
By default, FQ-CoDel stochastically classifies incoming packets into By default, FQ-CoDel stochastically classifies incoming packets into
different queues by hashing the 5-tuple of IP protocol number and different queues by hashing the 5-tuple of protocol number, source
source and destination IP and port numbers, perturbed with a random and destination IP addresses, and source and destination port
number selected at initiation time (although other flow numbers, perturbed with a random number selected at initiation time
classification schemes can optionally be configured instead; see (although other flow classification schemes can optionally be
Section 4.1.1). Each queue is managed by the CoDel AQM algorithm configured instead; see Section 4.1.1). Each queue is managed by the
[CODEL]. Packet ordering within a queue is preserved, since queues CoDel AQM algorithm [CODEL] [RFC8289]. Packet ordering within a
have FIFO ordering. queue is preserved, since queues have FIFO ordering.
The FQ-CoDel algorithm consists of two logical parts: the scheduler The FQ-CoDel algorithm consists of two logical parts: (1) the
which selects which queue to dequeue a packet from, and the CoDel AQM scheduler, which selects which queue to dequeue a packet from, and
which works on each of the queues. The subtleties of FQ-CoDel are (2) the CoDel AQM, which works on each of the queues. The subtleties
mostly in the scheduling part, whereas the interaction between the of FQ-CoDel are mostly in the scheduling part, whereas the
scheduler and the CoDel algorithm are fairly straight forward: interaction between the scheduler and the CoDel algorithm are fairly
straightforward.
At initialisation, each queue is set up to have a separate set of At initialisation, each queue is set up to have a separate set of
CoDel state variables. By default, 1024 queues are created. The CoDel state variables. By default, 1024 queues are created. The
Linux implementation at the time of writing supports anywhere from Linux implementation at the time of writing supports anywhere from
one to 64K separate queues, and each queue maintains the state one to 65535 separate queues, and each queue maintains the state
variables throughout its lifetime, and so acts the same as the non-FQ variables throughout its lifetime, and so acts the same as the non-FQ
CoDel variant would. This means that with only one queue, FQ-CoDel variant of CoDel would. This means that with only one queue,
behaves essentially the same as CoDel by itself. FQ-CoDel behaves essentially the same as CoDel by itself.
On dequeue, FQ-CoDel selects a queue from which to dequeue by a two- On dequeue, FQ-CoDel selects a queue from which to dequeue by a two-
tier round-robin scheme, in which each queue is allowed to dequeue up tier, round-robin scheme, in which each queue is allowed to dequeue
to a configurable quantum of bytes for each iteration. Deviations up to a configurable quantum of bytes for each iteration. Deviations
from this quantum is maintained as byte credits for the queue, which from this quantum are maintained as byte credits for the queue, which
serves to make the fairness scheme byte-based rather than packet- serves to make the fairness scheme byte-based rather than packet-
based. The two-tier round-robin mechanism distinguishes between based. The two-tier, round-robin mechanism distinguishes between
"new" queues (which don't build up a standing queue) and "old" "new" queues (which don't build up a standing queue) and "old" queues
queues, that have queued enough data to be around for more than one (which have queued enough data to be active for more than one
iteration of the round-robin scheduler. iteration of the round-robin scheduler).
This new/old queue distinction has a particular consequence for This new/old queue distinction has a particular consequence for
queues that don't build up more than a quantum of bytes before being queues that don't build up more than a quantum of bytes before being
visited by the scheduler: Such queues are removed from the list, and visited by the scheduler: such a queue will be removed from the list
then re-added as a new queue each time a packet arrives for it, and after it empties and then re-added as a new queue the next time a
so will get priority over queues that do not empty out each round packet arrives for it. This means it will effectively get priority
(except for a minor modification to protect against starvation, over queues that do not empty out each round (a minor caveat is
detailed below). Exactly how little data a flow has to send to keep required here to protect against starvation, see below). Exactly how
its queue in this state is somewhat difficult to reason about, little data a flow has to send to keep its queue in this state is
because it depends on both the egress link speed and the number of somewhat difficult to reason about, because it depends on both the
concurrent flows. However, in practice many things that are egress link speed and the number of concurrent flows. However, in
beneficial to have prioritised for typical internet use (ACKs, DNS practice, many things that are beneficial to have prioritised for
lookups, interactive SSH, HTTP requests, VoIP) _tend_ to fall in this typical internet use (ACKs, DNS lookups, interactive Secure Shell
(SSH), HTTP requests, Voice over IP (VoIP)) _tend_ to fall in this
category, which is why FQ-CoDel performs so well for many practical category, which is why FQ-CoDel performs so well for many practical
applications. However, the implicitness of the prioritisation means applications. However, the implicitness of the prioritisation means
that for applications that require guaranteed priority (for instance that for applications that require guaranteed priority (for instance,
multiplexing the network control plane over the network itself), multiplexing the network control plane over the network itself),
explicit classification is still needed. explicit classification is still needed.
This scheduling scheme has some subtlety to it, which is explained in This scheduling scheme has some subtlety to it, which is explained in
detail in the remainder of this document. detail in the remainder of this document.
2. CoDel 2. CoDel
CoDel is described in the ACM Queue paper [CODEL], and the IETF CoDel is described in the Communications of the ACM paper [CODEL] and
document [I-D.ietf-aqm-codel]. The basic idea is to control queue the IETF document [RFC8289]. The basic idea is to control queue
length, maintaining sufficient queueing to keep the outgoing link length, maintaining sufficient queueing to keep the outgoing link
busy, but avoiding building up the queue beyond that point. This is busy but avoiding building up the queue beyond that point. This is
done by preferentially dropping packets that remain in the queue for done by preferentially dropping packets that remain in the queue for
"too long". Packets are dropped by head drop, which lowers the time "too long". Packets are dropped by head drop, which lowers the time
for the drop signal to propagate back to the sender by the length of for the drop signal to propagate back to the sender by the length of
the queue, and helps trigger TCP fast retransmit sooner. the queue and helps trigger TCP fast retransmit sooner.
The CoDel algorithm itself will not be described here; instead we The CoDel algorithm itself will not be described here; instead, we
refer the reader to the CoDel draft [I-D.ietf-aqm-codel]. refer the reader to the CoDel document [RFC8289].
3. Flow Queueing 3. Flow Queueing
The intention of FQ-CoDel's scheduler is to give each _flow_ its own The intention of FQ-CoDel's scheduler is to give each flow its own
queue, hence the term _Flow Queueing_. Rather than a perfect queue, hence the term "flow queueing". Rather than a perfect
realisation of this, a hashing-based scheme is used, where flows are realisation of this, a hashing-based scheme is used, where flows are
hashed into a number of buckets which each has its own queue. The hashed into a number of buckets, each of which has its own queue.
number of buckets is configurable, and presently defaults to 1024 in The number of buckets is configurable and presently defaults to 1024
the Linux implementation. This is enough to avoid hash collisions on in the Linux implementation. This is enough to avoid hash collisions
a moderate number of flows as seen for instance in a home gateway. on a moderate number of flows as seen, for instance, in a home
Depending on the characteristics of the link, this can be tuned to gateway. Depending on the characteristics of the link, this can be
trade off memory for a lower probability of hash collisions. See tuned to trade off memory for a lower probability of hash collisions.
Section 6 for a more in-depth discussion of this. See Sections 5.3 and 5.4 for a more in-depth discussion of this.
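The trade-off between the number of buckets and the likelihood of hash collisions can be estimated with a short calculation. The following is only a sketch: it assumes that flows are hashed uniformly and independently into the buckets, which is an idealisation, and the function names are hypothetical rather than taken from either document.

```python
def prob_no_collision(flows: int, buckets: int = 1024) -> float:
    """Probability that all flows land in distinct buckets.

    Assumes an idealised hash that places each flow uniformly and
    independently into one of `buckets` buckets.
    """
    if flows > buckets:
        return 0.0  # pigeonhole: at least two flows must share a bucket
    p = 1.0
    for i in range(flows):
        p *= (buckets - i) / buckets
    return p


def prob_flow_shares_bucket(flows: int, buckets: int = 1024) -> float:
    """Probability that one given flow shares its bucket with another flow."""
    return 1.0 - (1.0 - 1.0 / buckets) ** (flows - 1)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, prob_no_collision(n), prob_flow_shares_bucket(n))
```

Raising the `buckets` argument lowers both probabilities at the cost of more per-queue state, which is the memory trade-off mentioned above.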
By default, the flow hashing is performed on the 5-tuple of source By default, the flow hashing is performed on the 5-tuple of source
and destination IP addresses and port numbers and IP protocol number. and destination IP addresses, source and destination port numbers,
While the hashing can be customised to match on arbitrary packet and protocol number. While the hashing can be customised to match on
bytes, care should be taken when doing so: Much of the benefit of the arbitrary packet bytes, care should be taken when doing so; much of
FQ-CoDel scheduler comes from this per-flow distinction. However, the benefit of the FQ-CoDel scheduler comes from this per-flow
the default hashing does have some limitations, as discussed in distinction. However, the default hashing does have some
Section 6. limitations, as discussed in Section 6.
FQ-CoDel's DRR scheduler is byte-based, employing a deficit round- FQ-CoDel's DRR scheduler is byte-based, employing a deficit round-
robin mechanism between queues. This works by keeping track of the robin mechanism between queues. This works by keeping track of the
current number _byte credits_ of each queue. This number is current number of "byte credits" of each queue. This number is
initialised to the configurable quantum; each time a queue gets a initialised to the configurable quantum; each time a queue gets a
dequeue opportunity, it gets to dequeue packets, decreasing the dequeue opportunity, it gets to dequeue packets, thus decreasing the
number of credits by the packet size for each packet. This continues number of credits by the packet size for each packet. This continues
until the value of _byte credits_ becomes zero or less, at which until the value of the byte credits counter becomes zero or less, at
point it is increased by one quantum, and the dequeue opportunity which point the counter is increased by one quantum, and the dequeue
ends. opportunity ends.
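The byte credit bookkeeping described above can be sketched as follows. This is a minimal illustration and not the Linux implementation: the names `DrrQueue` and `dequeue_opportunity` are hypothetical, packets are represented only by their size in bytes, and the sketch assumes that a dequeue opportunity also ends when the queue runs empty.

```python
from collections import deque


class DrrQueue:
    """One queue with its byte credit counter (hypothetical helper)."""

    def __init__(self, quantum: int):
        self.packets = deque()   # packet sizes in bytes, FIFO order
        self.credits = quantum   # initialised to the configurable quantum


def dequeue_opportunity(queue: DrrQueue, quantum: int) -> list:
    """Dequeue packets until the credits reach zero or less.

    Each dequeued packet decreases the credits by its size. When the
    credits become zero or less, one quantum is added and the
    opportunity ends. Assumption: an empty queue also ends it.
    """
    sent = []
    while queue.packets and queue.credits > 0:
        size = queue.packets.popleft()
        queue.credits -= size
        sent.append(size)
    if queue.credits <= 0:
        queue.credits += quantum
    return sent


q = DrrQueue(quantum=1514)
q.packets.extend([500] * 5)
first = dequeue_opportunity(q, 1514)    # four packets, credits end at 1028
second = dequeue_opportunity(q, 1514)   # the remaining packet, credits 528
```

Because the overshoot from the last packet is carried over as a negative balance before the quantum is added, a queue that sends slightly more than a quantum in one turn gets correspondingly less in the next.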
This means that if one queue contains packets of, for instance, size This means that if one queue contains packets of, for instance, size
quantum/3, and another contains quantum-sized packets, the first quantum/3, and another contains quantum-sized packets, the first
queue will dequeue three packets each time it gets a turn, whereas queue will dequeue three packets each time it gets a turn, whereas
the second only dequeues one. This means that flows that send small the second only dequeues one. This means that flows that send small
packets are not penalised by the difference in packet sizes; rather, packets are not penalised by the difference in packet sizes; rather,
the DRR scheme approximates a (single-)byte-based fairness queueing the DRR scheme approximates a byte-based fairness queueing scheme.
scheme. The size of the quantum determines the scheduling The size of the quantum determines the scheduling granularity, with
granularity, with the tradeoff from too small a quantum being the trade-off from too small a quantum being scheduling overhead.
scheduling overhead. For small bandwidths, lowering the quantum from For small bandwidths, lowering the quantum from the default MTU size
the default MTU size can be advantageous. can be advantageous.
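The quantum/3 example above can be checked with a small simulation. The sketch below assumes a quantum of 1500 bytes and plain round-robin between two queues (without the new/old list distinction); the function name `run_rounds` and the queue labels are hypothetical.

```python
from collections import deque

QUANTUM = 1500  # assumed quantum in bytes, chosen for easy arithmetic


def run_rounds(queues: dict, rounds: int, quantum: int = QUANTUM) -> list:
    """Visit each queue in turn; log (name, packets sent, bytes sent)."""
    credits = {name: quantum for name in queues}
    log = []
    for _ in range(rounds):
        for name, packets in queues.items():
            count = sent_bytes = 0
            # Dequeue until the byte credits are used up or the queue is empty.
            while packets and credits[name] > 0:
                size = packets.popleft()
                credits[name] -= size
                count += 1
                sent_bytes += size
            if credits[name] <= 0:
                credits[name] += quantum
            log.append((name, count, sent_bytes))
    return log


queues = {
    "small": deque([QUANTUM // 3] * 9),  # packets of size quantum/3
    "large": deque([QUANTUM] * 3),       # quantum-sized packets
}
log = run_rounds(queues, rounds=3)
for entry in log:
    print(entry)
```

In every round the "small" queue sends three packets and the "large" queue sends one, but both send the same number of bytes, which is the byte-based fairness described above.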
Unlike plain DRR there are two sets of flows - a "new" list for flows Unlike plain DRR, there are two sets of flows: a "new" list for flows
that have not built a queue recently, and an "old" list for queues that have not built a queue recently and an "old" list for queues
that build a backlog. This distinction is an integral part of the that build a backlog. This distinction is an integral part of the
FQ-CoDel scheduler and is described in more detail in Section 4. FQ-CoDel scheduler and is described in more detail in Section 4.
4. The FQ-CoDel scheduler 4. The FQ-CoDel Scheduler
To make its scheduling decisions, FQ-CoDel maintains two ordered To make its scheduling decisions, FQ-CoDel maintains two ordered
lists of active queues, called "new" and "old" queues. When a packet lists of active queues: new and old queues. When a packet is added
is added to a queue that is not currently active, that queue becomes to a queue that is not currently active, that queue becomes active by
active by being added to the list of new queues. Later on, it is being added to the list of new queues. Later on, it is moved to the
moved to the list of old queues, from which it is removed when it is list of old queues, from which it is removed when it is no longer
no longer active. This behaviour is the source of some subtlety in active. This behaviour is the source of some subtlety in the packet
the packet scheduling at dequeue time, explained below. scheduling at dequeue time, as explained below.
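The list membership rules in this paragraph can be sketched as a small state holder. This is an illustration only: the class and method names are hypothetical, and the conditions under which the scheduler calls `move_to_old` and `deactivate` are deliberately left out, since they belong to the dequeue logic.

```python
from collections import deque


class FlowLists:
    """Tracks which queues are on the new list and which are on the old list."""

    def __init__(self, num_queues: int = 1024):
        self.queues = [deque() for _ in range(num_queues)]
        self.new_queues = deque()   # ordered list of new queue indices
        self.old_queues = deque()   # ordered list of old queue indices

    def is_active(self, idx: int) -> bool:
        return idx in self.new_queues or idx in self.old_queues

    def enqueue(self, idx: int, packet) -> None:
        """Add a packet; an inactive queue becomes active as a new queue."""
        self.queues[idx].append(packet)
        if not self.is_active(idx):
            self.new_queues.append(idx)

    def move_to_old(self, idx: int) -> None:
        """Move a queue from the new list to the tail of the old list."""
        self.new_queues.remove(idx)
        self.old_queues.append(idx)

    def deactivate(self, idx: int) -> None:
        """Remove a queue from the old list; it is no longer active."""
        self.old_queues.remove(idx)
```

A queue that has been deactivated re-enters through the new list on its next packet, whereas a queue that is already on either list stays where it is when more packets arrive.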
4.1. Enqueue 4.1. Enqueue
The packet enqueue mechanism consists of three stages: classification The packet enqueue mechanism consists of three stages: classifying
into a queue, timestamping and bookkeeping, and optionally dropping a into a queue, timestamping and bookkeeping, and optionally dropping a
packet when the total number of enqueued packets goes over the packet when the total number of enqueued packets goes over the
maximum. maximum.
When a packet is enqueued, it is first classified into the When a packet is enqueued, it is first classified into the
appropriate queue. By default, this is done by hashing (using a appropriate queue. By default, this is done by hashing (using a
Jenkins hash function [JENKINS]) on the 5-tuple of IP protocol, and Jenkins hash function [JENKINS]) on the 5-tuple of IP protocol,
source and destination IP addresses and port numbers (if they exist), source and destination IP addresses, and source and destination port
and taking the hash value modulo the number of queues. The hash is numbers (if they exist) and then taking the hash value modulo the
salted by modulo addition of a random value selected at number of queues. The hash is salted by modulo addition of a random
initialisation time, to prevent possible DoS attacks if the hash is value selected at initialisation time to prevent possible DoS attacks
predictable ahead of time (see Section 8). The Linux kernel if the hash is predictable ahead of time (see Section 8). The Linux
implements the Jenkins hash function by mixing three 32-bit values kernel implements the Jenkins hash function by mixing three 32-bit
into a single 32-bit output value. Inputs larger than 96 bits are values into a single 32-bit output value. Inputs larger than 96 bits
reduced by additional mixing steps, 96 bits at a time. are reduced by additional mixing steps, 96 bits at a time.
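The classification step described above can be sketched as follows. This is a minimal illustration, not the Linux code: it assumes IPv4 addresses, it uses Bob Jenkins' one-at-a-time hash as a simple stand-in for the three-word mixing function used by the kernel, and the names flow_tuple, oaat_mix, oaat_final, and fq_classify are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-tuple; ports are zero for protocols without ports. */
struct flow_tuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;
};

/* Mix one 32-bit word into the hash, one byte at a time
 * (Jenkins one-at-a-time, used here as a stand-in hash). */
static uint32_t oaat_mix(uint32_t h, uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        h += (word >> (8 * i)) & 0xffu;
        h += h << 10;
        h ^= h >> 6;
    }
    return h;
}

/* Final avalanche step of the one-at-a-time hash. */
static uint32_t oaat_final(uint32_t h)
{
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Map a packet's 5-tuple to a queue index in [0, nqueues). */
uint32_t fq_classify(const struct flow_tuple *t, uint32_t salt,
                     uint32_t nqueues)
{
    assert(nqueues > 0);
    uint32_t h = 0;
    h = oaat_mix(h, t->src_ip);
    h = oaat_mix(h, t->dst_ip);
    h = oaat_mix(h, ((uint32_t)t->src_port << 16) | t->dst_port);
    h = oaat_mix(h, t->protocol);
    h = oaat_final(h);
    h += salt;            /* salting by addition modulo 2^32 */
    return h % nqueues;   /* hash value modulo the number of queues */
}
```

The salt would be drawn from a random source once, at initialisation time, and kept for the lifetime of the scheduler instance.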
Once the packet has been successfully classified into a queue, it is Once the packet has been successfully classified into a queue, it is
handed over to the CoDel algorithm for timestamping. It is then handed over to the CoDel algorithm for timestamping. It is then
added to the tail of the selected queue, and the queue's byte count added to the tail of the selected queue, and the queue's byte count
is updated by the packet size. Then, if the queue is not currently is updated by the packet size. Then, if the queue is not currently
active (i.e., if it is not in either the list of new or the list of active (i.e., if it is not in either the list of new queues or the
old queues), it is added to the end of the list of new queues, and list of old queues), it is added to the end of the list of new
its number of credits is initiated to the configured quantum. queues, and its number of credits is initiated to the configured
Otherwise, the queue is left in its current queue list. quantum. Otherwise, the queue is left in its current queue list.
Finally, the total number of enqueued packets is compared with the Finally, to protect against overload, the total number of enqueued
configured limit, and if it is _above_ this value (which can happen packets is compared with the configured limit. If the limit is
since a packet was just enqueued), a packet is dropped from the head exceeded (which can happen since a packet was just enqueued), the
of the queue with the largest current byte count. Note that this in queue with the largest current byte count is selected and half the
most cases means that the packet that gets dropped is different from number of packets from this queue (up to a maximum of 64 packets) are
the one that was just enqueued, and may even be from a different dropped from the head of that queue. Dropping several packets at
queue. once helps amortise the cost of finding the longest queue,
significantly lowering CPU usage in an overload situation.
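The overload handling can be sketched as below, following the batch-drop behaviour given in the RFC text (the earlier draft text dropped a single packet instead). The structure queue_stats and both function names are hypothetical. Dropping at least one packet when the selected queue holds only one is an assumption of this sketch, not something stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DROP_BATCH_MAX 64u   /* upper bound on packets dropped at once */

/* Hypothetical per-queue counters kept at enqueue time. */
struct queue_stats {
    uint32_t backlog_bytes;  /* current byte count of the queue */
    uint32_t packets;        /* current packet count of the queue */
};

/* Linear search for the queue with the largest current byte count. */
size_t fattest_queue(const struct queue_stats *q, size_t nqueues)
{
    assert(nqueues > 0);
    size_t best = 0;
    for (size_t i = 1; i < nqueues; i++)
        if (q[i].backlog_bytes > q[best].backlog_bytes)
            best = i;
    return best;
}

/* Number of packets to drop from the head of the selected queue:
 * half of its packets, capped at DROP_BATCH_MAX. */
uint32_t overload_drop_count(const struct queue_stats *q)
{
    uint32_t n = q->packets / 2;
    if (n == 0 && q->packets > 0)
        n = 1;               /* assumption: always free at least one slot */
    if (n > DROP_BATCH_MAX)
        n = DROP_BATCH_MAX;
    return n;
}
```

Because the search is linear in the number of queues, dropping a batch per search is what keeps the amortised per-packet cost low under sustained overload.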
4.1.1. Alternative classification schemes 4.1.1. Alternative Classification Schemes
As mentioned previously, it is possible to modify the classification As mentioned previously, it is possible to modify the classification
scheme to provide a different notion of a 'flow'. The Linux scheme to provide a different notion of a flow. The Linux
implementation provides this option in the form of the "tc filter" implementation provides this option in the form of the "tc filter"
command. While this can add capabilities (for instance, matching on command. While this can add capabilities (for instance, matching on
other possible parameters such as MAC address, diffserv code point other possible parameters such as MAC address, Diffserv code point
values, firewall rules, flow specific markings, IPv6 flow label, values, firewall rules, flow-specific markings, IPv6 flow label,
etc.), care should be taken to preserve the notion of 'flow' as much etc.), care should be taken to preserve the notion of flow because
of the benefit of the FQ-CoDel scheduler comes from keeping flows in much of the benefit of the FQ-CoDel scheduler comes from keeping
separate queues. flows in separate queues.
For protocols that do not contain a port number (such as ICMP), the For protocols that do not contain a port number (such as ICMP), the
Linux implementation simply sets the port numbers to zero and Linux implementation simply sets the port numbers to zero and
performs the hashing as usual. In practice, this results in such performs the hashing as usual. In practice, this results in such
protocols to each get their own queue (except in the case of hash protocols each getting their own queue (except in the case of hash
collisions). An implementation can perform other classifications for collisions). An implementation can perform other classifications for
protocols that have their own notion of a flow, but SHOULD fall back protocols that have their own notion of a flow but SHOULD fall back
to simply hashing on source and destination IP address and IP to simply hashing on source and destination IP address and protocol
protocol number in the absence of other information. number in the absence of other information.
The default classification scheme can additionally be improved by The default classification scheme can additionally be improved by
performing decapsulation of tunnelled packets prior to hashing on the performing decapsulation of tunnelled packets prior to hashing on the
5-tuple in the encapsulated payload. The Linux implementation does 5-tuple in the encapsulated payload. The Linux implementation does
this for common encapsulations known to the kernel, such as 6in4 this for common encapsulations known to the kernel, such as 6in4
[RFC4213], IP-in-IP [RFC2003] and GRE (Generic Routing Encapsulation) [RFC4213], IP-in-IP [RFC2003], and Generic Routing Encapsulation
[RFC2890]. This helps to distinguish between flows that share the (GRE) [RFC2890]. This helps to distinguish between flows that share
same (outer) 5-tuple, but of course is limited to unencrypted tunnels the same (outer) 5-tuple but, of course, is limited to unencrypted
(see Section 6.2). tunnels (see Section 6.2 for a discussion of encrypted tunnels).
4.2. Dequeue 4.2. Dequeue
Most of FQ-CoDel's work is done at packet dequeue time. It consists Most of FQ-CoDel's work is done at packet dequeue time. It consists
of three parts: selecting a queue from which to dequeue a packet, of three parts: selecting a queue from which to dequeue a packet,
actually dequeuing it (employing the CoDel algorithm in the process), actually dequeueing it (employing the CoDel algorithm in the
and some final bookkeeping. process), and some final bookkeeping.
For the first part, the scheduler first looks at the list of new For the first part, the scheduler first looks at the list of new
queues; for the queue at the head of that list, if that queue has a queues; for the queue at the head of that list, if that queue has a
negative number of credits (i.e., it has already dequeued at least a negative number of credits (i.e., it has already dequeued at least a
quantum of bytes), it is given an additional quantum of credits, the quantum of bytes), it is given an additional quantum of credits, the
queue is put onto _the end of_ the list of old queues, and the queue is put onto _the end of_ the list of old queues, and the
routine selects the next queue and starts again. routine selects the next queue and starts again.
Otherwise, that queue is selected for dequeue. If the list of new Otherwise, that queue is selected for dequeue. If the list of new
queues is empty, the scheduler proceeds down the list of old queues queues is empty, the scheduler proceeds down the list of old queues
in the same fashion (checking the credits, and either selecting the in the same fashion (checking the credits and either selecting the
queue for dequeuing, or adding credits and putting the queue back at queue for dequeueing or adding credits and putting the queue back at
the end of the list). the end of the list).
After having selected a queue from which to dequeue a packet, the After having selected a queue from which to dequeue a packet, the
CoDel algorithm is invoked on that queue. This applies the CoDel CoDel algorithm is invoked on that queue. This applies the CoDel
control law, which is the mechanism CoDel uses to determine when to control law, which is the mechanism CoDel uses to determine when to
drop packets (see [I-D.ietf-aqm-codel]). As a result of this, one or drop packets (see [RFC8289]). As a result of this, one or more
more packets may be discarded from the head of the selected queue, packets may be discarded from the head of the selected queue before
before the packet that should be dequeued is returned (or nothing is the packet that should be dequeued is returned (or nothing is
returned if the queue is or becomes empty while being handled by the returned if the queue is or becomes empty while being handled by the
CoDel algorithm). CoDel algorithm).
Finally, if the CoDel algorithm does not return a packet, then the Finally, if the CoDel algorithm does not return a packet, then the
queue must be empty, and the scheduler does one of two things: if the queue must be empty, and the scheduler does one of two things. If
queue selected for dequeue came from the list of new queues, it is the queue selected for dequeue came from the list of new queues, it
moved to _the end of_ the list of old queues. If instead it came is moved to _the end of_ the list of old queues. If instead it came
from the list of old queues, that queue is removed from the list, to from the list of old queues, that queue is removed from the list, to
be added back (as a new queue) the next time a packet arrives that be added back (as a new queue) the next time a packet arrives that
hashes to that queue. Then (since no packet was available for hashes to that queue. Then (since no packet was available for
dequeue), the whole dequeue process is restarted from the beginning. dequeue), the whole dequeue process is restarted from the beginning.
If, instead, the scheduler _did_ get a packet back from the CoDel If, instead, the scheduler _did_ get a packet back from the CoDel
algorithm, it subtracts the size of the packet from the byte credits algorithm, it subtracts the size of the packet from the byte credits
for the selected queue and returns the packet as the result of the for the selected queue and returns the packet as the result of the
dequeue operation. dequeue operation.
The step that moves an empty queue from the list of new queues to The step that moves an empty queue from the list of new queues to the
_the end of_ the list of old queues before it is removed is crucial end of the list of old queues before it is removed is crucial to
to prevent starvation. Otherwise the queue could reappear (the next prevent starvation. Otherwise, the queue could reappear (the next
time a packet arrives for it) before the list of old queues is time a packet arrives for it) before the list of old queues is
visited; this can go on indefinitely even with a small number of visited; this can go on indefinitely, even with a small number of
active flows, if the flow providing packets to the queue in question active flows, if the flow providing packets to the queue in question
transmits at just the right rate. This is prevented by first moving transmits at just the right rate. This is prevented by first moving
the queue to _the end of_ the list of old queues, forcing a pass the queue to the end of the list of old queues, forcing the scheduler
through that, and thus preventing starvation. Moving it to the end to service all old queues before the empty queue is removed and thus
of the list, rather than the front, is crucial for this to work. preventing starvation.
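The queue selection and bookkeeping described in this section can be sketched as follows. This is a toy model under stated assumptions, not the Linux implementation: packets are represented only by their lengths, CoDel is replaced by a stub that never drops, and all names (fq_queue, fq_sched, fq_enqueue, fq_dequeue, and so on) are hypothetical. The sketch treats zero credits as exhausted, in line with the "at least a quantum of bytes" wording above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PKTS 16

enum list_id { LIST_NONE, LIST_NEW, LIST_OLD };

struct fq_queue {
    int pkt_len[MAX_PKTS];   /* FIFO ring of packet sizes in bytes */
    int head, count;
    int credits;             /* byte credits remaining in this round */
    enum list_id on;         /* which list the queue is currently on */
    struct fq_queue *next;
};

struct fq_list { struct fq_queue *head, *tail; };
struct fq_sched { struct fq_list new_q, old_q; int quantum; };

static void push_tail(struct fq_list *l, struct fq_queue *q, enum list_id id)
{
    q->next = NULL;
    q->on = id;
    if (l->tail) l->tail->next = q; else l->head = q;
    l->tail = q;
}

static void pop_head(struct fq_list *l)
{
    struct fq_queue *q = l->head;
    l->head = q->next;
    if (!l->head) l->tail = NULL;
    q->next = NULL;
    q->on = LIST_NONE;
}

/* Enqueue: an inactive queue joins the end of the new list
 * with its credits set to the quantum. */
void fq_enqueue(struct fq_sched *s, struct fq_queue *q, int len)
{
    assert(s->quantum > 0 && len > 0 && q->count < MAX_PKTS);
    q->pkt_len[(q->head + q->count) % MAX_PKTS] = len;
    q->count++;
    if (q->on == LIST_NONE) {
        q->credits = s->quantum;
        push_tail(&s->new_q, q, LIST_NEW);
    }
}

/* Stand-in for CoDel: pops the head packet and never drops.
 * Returns 0 when the queue is empty. */
static int codel_stub(struct fq_queue *q)
{
    if (q->count == 0) return 0;
    int len = q->pkt_len[q->head];
    q->head = (q->head + 1) % MAX_PKTS;
    q->count--;
    return len;
}

/* Dequeue: returns the packet length and sets *from,
 * or returns 0 if no packet is queued anywhere. */
int fq_dequeue(struct fq_sched *s, struct fq_queue **from)
{
    for (;;) {
        struct fq_list *l = s->new_q.head ? &s->new_q : &s->old_q;
        struct fq_queue *q = l->head;
        if (!q) return 0;
        if (q->credits <= 0) {            /* credits exhausted */
            q->credits += s->quantum;
            pop_head(l);
            push_tail(&s->old_q, q, LIST_OLD);
            continue;
        }
        int len = codel_stub(q);
        if (len == 0) {                   /* queue turned out to be empty */
            int was_new = (l == &s->new_q);
            pop_head(l);
            if (was_new)                  /* new -> end of old, not removed */
                push_tail(&s->old_q, q, LIST_OLD);
            continue;                     /* old -> removed; restart */
        }
        q->credits -= len;
        *from = q;
        return len;
    }
}
```

In this model, a sparse queue that empties while on the list of new queues is parked at the end of the list of old queues, and it only becomes inactive after the scheduler has passed over it there. That is the anti-starvation step discussed above.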
The resulting migration of queues between the different states is The resulting migration of queues between the different states is
summarised in the following state diagram: summarised in the state diagram shown in Figure 1. Note that both
the new and old queue states can additionally have arrival and
dequeue events that do not change the state; these are omitted in the
figure.
+-----------------+ +------------------+ +-----------------+ +------------------+
| | Empty | | | | Empty | |
| Empty |<---------------+ Old +----+ | Empty |<---------------+ Old +----+
| | | | | | | | | |
+-------+---------+ +------------------+ | +-------+---------+ +------------------+ |
| ^ ^ |Credits | ^ ^ |Credits
|Arrival | | |Exhausted |Arrival | | |Exhausted
v | | | v | | |
+-----------------+ | | | +-----------------+ | | |
| | Empty or | | | | | Empty or | | |
| New +-------------------+ +-------+ | New +-------------------+ +-------+
| | Credits exhausted | | Credits Exhausted
+-----------------+ +-----------------+
Figure 1: Partial state diagram for queues between different states. Figure 1: Partial State Diagram for Queues between Different States
Both the new and old queue states can additionally have arrival and
dequeue events that do not change the state; these are omitted here.
5. Implementation considerations 5. Implementation Considerations
This section contains implementation details for the FQ-CoDel This section contains implementation details for the FQ-CoDel
algorithm. This includes the data structures and parameters used in algorithm. This includes the data structures and parameters used in
the Linux implementation, as well as discussion of some required the Linux implementation, as well as discussion of some required
features of the target platform and other considerations. features of the target platform and other considerations.
5.1. Data structures 5.1. Data Structures
The main data structure of FQ-CoDel is the array of queues, which is The main data structure of FQ-CoDel is the array of queues, which is
instantiated with the number of queues specified by the _flows_ instantiated with the number of queues specified by the "flows"
parameter at instantiation time. Each queue consists simply of an parameter at instantiation time. Each queue consists simply of an
ordered list of packets with FIFO semantics, two state variables ordered list of packets with FIFO semantics, two state variables
tracking the queue credits and total number of bytes enqueued, and tracking the queue credits and total number of bytes enqueued, and
the set of CoDel state variables. Other state variables to track the set of CoDel state variables. Other state variables to track
queue statistics can also be included: for instance, the Linux queue statistics can also be included; for instance, the Linux
implementation keeps a count of dropped packets. implementation keeps a count of dropped packets.
In addition to the queue structures themselves, FQ-CoDel maintains In addition to the queue structures themselves, FQ-CoDel maintains
two ordered lists containing references to the subset of queues that two ordered lists containing references to the subset of queues that
are currently active. These are the list of 'new' queues and the are currently active. These are the lists of new and old queues, as
list of 'old' queues, as explained in Section 4 above. explained in Section 4 above.
In the Linux implementation, queue space is shared: there's a global In the Linux implementation, queue space is shared: there's a global
limit on the number of packets the queues can hold, but not one per limit on the number of packets the queues can hold, but not a limit
queue. for each queue.
5.2. Parameters 5.2. Parameters
The following are the user configuration parameters exposed by the The following are the user configuration parameters exposed by the
Linux implementation of FQ-CoDel. Linux implementation of FQ-CoDel.
5.2.1. Interval 5.2.1. Interval
The _interval_ parameter has the same semantics as CoDel and is used The "interval" parameter has the same semantics as CoDel and is used
to ensure that the minimum sojourn time of packets in a queue used as to ensure that the minimum sojourn time of packets in a queue used as
an estimator by the CoDel control algorithm is a relatively up-to- an estimator by the CoDel control algorithm is a relatively up-to-
date value. That is, CoDel only reacts to delay experienced in the date value. That is, CoDel only reacts to delay experienced in the
last epoch of length interval. It SHOULD be set to be on the order last epoch of length interval. It SHOULD be set to be on the order
of the worst-case RTT through the bottleneck to give end-points of the worst-case RTT through the bottleneck to give end points
sufficient time to react. sufficient time to react.
The default interval value is 100 ms. The default interval value is 100 ms.
5.2.2. Target 5.2.2. Target
The _target_ parameter has the same semantics as CoDel. It is the The "target" parameter has the same semantics as CoDel. It is the
acceptable minimum standing/persistent queue delay for each FQ-CoDel acceptable minimum standing/persistent queue delay for each FQ-CoDel
Queue. This minimum delay is identified by tracking the local queue. This minimum delay is identified by tracking the local
minimum queue delay that packets experience. minimum queue delay that packets experience.
The default target value is 5 ms, but this value should be tuned to The default target value is 5 ms, but this value should be tuned to
be at least the transmission time of a single MTU-sized packet at the be at least the transmission time of a single MTU-sized packet at the
prevalent egress link speed (which for, e.g., 1Mbps and MTU 1500 is prevalent egress link speed (which, for example, is ~15 ms for 1 Mbps
~15ms), to prevent CoDel from being too aggressive at low bandwidths. and MTU 1500). This prevents CoDel from being too aggressive at low
It should otherwise be set to on the order of 5-10% of the configured bandwidths. It should otherwise be set to 5-10% of the configured
interval. interval.
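As a worked example of the lower bound on the target, the following sketch computes the serialisation time of one MTU-sized packet and uses it as a floor for the default target. The function names are hypothetical. Note that the pure serialisation time of 1500 bytes at 1 Mbps is 12 ms; the ~15 ms figure quoted above presumably leaves some headroom for framing overhead, but that reading is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Time, in microseconds (rounded up), needed to put one packet of
 * mtu_bytes on a link running at link_bps bits per second. */
uint64_t mtu_tx_time_us(uint32_t mtu_bytes, uint64_t link_bps)
{
    assert(link_bps > 0);
    uint64_t bit_us = (uint64_t)mtu_bytes * 8u * 1000000u;
    return (bit_us + link_bps - 1) / link_bps;
}

/* The 5 ms default target, raised to the MTU transmission time
 * when the link is slow enough for that to be the larger value. */
uint64_t suggested_target_us(uint32_t mtu_bytes, uint64_t link_bps)
{
    const uint64_t default_target_us = 5000;
    uint64_t tx = mtu_tx_time_us(mtu_bytes, link_bps);
    return tx > default_target_us ? tx : default_target_us;
}
```

At 100 Mbps, the same packet takes 120 microseconds to transmit, so the default of 5 ms is left unchanged.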
5.2.3. Packet limit 5.2.3. Packet Limit
Routers do not have infinite memory, so some packet limit MUST be Routers do not have infinite memory, so some packet limit MUST be
enforced. enforced.
The _limit_ parameter is the hard limit on the real queue size, The "limit" parameter is the hard limit on the real queue size,
measured in number of packets. This limit is a global limit on the measured in number of packets. This limit is a global limit on the
number of packets in all queues; each individual queue does not have number of packets in all queues; each individual queue does not have
an upper limit. When the limit is reached and a new packet arrives an upper limit. When the limit is reached and a new packet arrives
for enqueue, a packet is dropped from the head of the largest queue for enqueue, packets are dropped from the head of the largest queue
(measured in bytes) to make room for the new packet. (measured in bytes) to make room for the new packet.
In Linux, the default packet limit is 10240 packets, which is In Linux, the default packet limit is 10240 packets, which is
suitable for up to 10 Gigabit Ethernet speeds. In practice, the hard suitable for up to 10-Gigabit Ethernet speeds. In practice, the hard
limit is rarely, if ever, hit, as drops are performed by the CoDel limit is rarely (if ever) hit, as drops are performed by the CoDel
algorithm long before the limit is hit. For platforms that are algorithm long before the limit is hit. For platforms that are
severely memory constrained, a lower limit can be used. severely memory constrained, a lower limit can be used.
5.2.4. Quantum 5.2.4. Quantum
The _quantum_ parameter is the number of bytes each queue gets to The "quantum" parameter is the number of bytes each queue gets to
dequeue on each round of the scheduling algorithm. The default is dequeue on each round of the scheduling algorithm. The default is
set to 1514 bytes which corresponds to the Ethernet MTU plus the set to 1514 bytes, which corresponds to the Ethernet MTU plus the
hardware header length of 14 bytes. hardware header length of 14 bytes.
In systems employing TCP Segmentation Offload (TSO), where a "packet" In systems employing TCP Segmentation Offload (TSO), where a "packet"
consists of an offloaded packet train, it can presently be as large consists of an offloaded packet train, it can presently be as large
as 64K bytes. In systems using Generic Receive Offload (GRO), they as 64 kilobytes. In systems using Generic Receive Offload (GRO),
can be up to 17 times the TCP max segment size (or 25K bytes). These they can be up to 17 times the TCP max segment size (or 25
mega-packets severely impact FQ-CoDel's ability to schedule traffic, kilobytes). These mega-packets severely impact FQ-CoDel's ability to
and hurt latency needlessly. There is ongoing work in Linux to make schedule traffic, and they hurt latency needlessly. There is ongoing
smarter use of offload engines. work in Linux to make smarter use of offload engines.
5.2.5. Flows 5.2.5. Flows
The _flows_ parameter sets the number of queues into which the The "flows" parameter sets the number of queues into which the
incoming packets are classified. Due to the stochastic nature of incoming packets are classified. Due to the stochastic nature of
hashing, multiple flows may end up being hashed into the same slot. hashing, multiple flows may end up being hashed into the same slot.
This parameter can be set only at initialisation time in the current This parameter can be set only at initialisation time in the current
implementation, since memory has to be allocated for the hash table. implementation, since memory has to be allocated for the hash table.
The default value is 1024 in the current Linux implementation. The default value is 1024 in the current Linux implementation.
5.2.6. Explicit Congestion Notification (ECN) 5.2.6. Explicit Congestion Notification (ECN)
ECN is _enabled_ by default. Rather than do anything special with ECN [RFC3168] is enabled by default. Rather than do anything special
misbehaved ECN flows, FQ-CoDel relies on the packet scheduling system with misbehaved ECN flows, FQ-CoDel relies on the packet scheduling
to minimise their impact, thus the number of unresponsive packets in system to minimise their impact; thus, the number of unresponsive
a flow being marked with ECN can grow to the overall packet limit, packets in a flow being marked with ECN can grow to the overall
but will not otherwise affect the performance of the system. packet limit but will not otherwise affect the performance of the
system.
It can be disabled by specifying the _noecn_ parameter. ECN can be disabled by specifying the "noecn" parameter.
5.2.7. CE threshold 5.2.7. CE Threshold
This parameter enables Date Centre TCP (DCTCP)-like processing This parameter enables DCTCP-like processing resulting in Congestion
resulting in CE (Congestion Encountered) marking on ECN-Capable Encountered (CE) marking on ECN-Capable Transport (ECT) packets
Transport (ECT) packets [RFC3168] starting at a lower sojourn delay [RFC3168] starting at a lower sojourn delay setpoint than the default
setpoint than the default CoDel Target. Details of DCTCP can be CoDel target. Details of Data Center TCP (DCTCP) can be found in
found in [I-D.ietf-tcpm-dctcp]. [RFC8257].
The parameter, _ce_threshold_, is disabled by default and can be set The "ce_threshold" parameter is disabled by default; it can be
to a number of microseconds to enable. enabled by setting it to a number of microseconds.
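The marking decision can be sketched as follows. The ECN codepoint values are those defined in [RFC3168]; everything else is an assumption of the sketch, including the function name, the use of a sentinel value to represent "disabled", and the strict comparison against the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ECN field codepoints from RFC 3168. */
#define ECN_NOT_ECT 0x0u
#define ECN_ECT_1   0x1u
#define ECN_ECT_0   0x2u
#define ECN_CE      0x3u

/* Hypothetical sentinel meaning "ce_threshold is not configured". */
#define CE_THRESHOLD_DISABLED UINT32_MAX

/* Set CE on an ECN-capable packet whose sojourn time exceeds the
 * threshold.  Returns true if the packet carries CE afterwards as a
 * result of this check (including a packet that already carried CE). */
bool ce_threshold_mark(uint8_t *ecn, uint32_t sojourn_us,
                       uint32_t ce_threshold_us)
{
    assert(ecn != NULL && *ecn <= ECN_CE);
    if (ce_threshold_us == CE_THRESHOLD_DISABLED)
        return false;
    if (sojourn_us <= ce_threshold_us)
        return false;
    if (*ecn == ECN_NOT_ECT)
        return false;        /* not ECN-capable: never marked here */
    *ecn = ECN_CE;
    return true;
}
```

This check is independent of the CoDel control law, which continues to operate against the configured target.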
5.3. Probability of hash collisions 5.3. Probability of Hash Collisions
Since the Linux FQ-CoDel implementation by default uses 1024 hash Since the Linux FQ-CoDel implementation by default uses 1024 hash
buckets, the probability that (say) 100 flows will all hash to the buckets, the probability that (say) 100 flows will all hash to the
same bucket is something like ten to the power of minus 300. Thus, same bucket is something like ten to the power of minus 300. Thus,
at least one of the flows will almost certainly hash to some other at least one of the flows will almost certainly hash to some other
queue. queue.
Expanding on this, based on analytical equations for hash collision Expanding on this, based on analytical equations for hash collision
probabilities, for 100 flows, the probability of no collision is probabilities, for 100 flows, the probability of no collision is
90.78%; the probability that no more than two of the 100 flows will 90.78%; the probability that no more than two of the 100 flows will
be involved in any given collision = 99.57%; and the probability that be involved in any given collision is 99.57%; and the probability
no more than three of the 100 flows will be involved in any given that no more than three of the 100 flows will be involved in any
collision = 99.99%. These probabilities assume a hypothetical given collision is 99.99%. These probabilities assume a hypothetical
perfect hashing function, so in practice they may be a bit lower. We perfect hashing function, so in practice, they may be a bit lower.
have not found this difference to matter in practice. We have not found this difference to matter in practice.
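The three percentages above can be reproduced with a short calculation if they are read per flow: for one particular flow, the number of other flows that land in its bucket is binomially distributed over the remaining 99 flows with probability 1/1024 each. This per-flow reading and the uniform-hash model are assumptions of the sketch below, and the function name is hypothetical.

```c
#include <assert.h>

/* Probability that a given flow shares its bucket with at most k of
 * the other (flows - 1) flows, assuming a uniform hash over 'buckets'
 * buckets.  Evaluates the binomial distribution term by term. */
double prob_at_most_sharers(unsigned flows, unsigned buckets, unsigned k)
{
    assert(flows >= 1 && buckets >= 2);
    unsigned n = flows - 1;        /* the other flows */
    double p = 1.0 / buckets;      /* chance one of them hits our bucket */
    double q = 1.0 - p;

    double term = 1.0;             /* term for j = 0 is q^n */
    for (unsigned i = 0; i < n; i++)
        term *= q;

    double sum = term;
    for (unsigned j = 1; j <= k && j <= n; j++) {
        /* C(n,j) p^j q^(n-j) derived from the previous term */
        term *= (double)(n - j + 1) / j * (p / q);
        sum += term;
    }
    return sum;
}
```

With flows = 100 and buckets = 1024, k = 0 gives approximately 0.9078, k = 1 gives approximately 0.9957, and k = 2 gives approximately 0.9999, matching the figures quoted above.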
These probabilities can be improved upon by using set-associative These probabilities can be improved upon by using set-associative
hashing, a technique used in the Cake algorithm currently being hashing, a technique used in the Cake algorithm currently being
developed as a further development upon the FQ-CoDel principles. For developed as a further refinement of the FQ-CoDel principles [CAKE].
a 4-way associative hash with the same number of total queues, the For a 4-way associative hash with the same number of total queues,
probability of no collisions for 100 flows is 99.93%, while for an the probability of no collisions for 100 flows is 99.93%, while for
8-way associative hash it is ~100%. an 8-way associative hash, it is ~100%.
5.4. Memory Overhead 5.4. Memory Overhead
FQ-CoDel can be implemented with a low memory footprint (less than 64 FQ-CoDel can be implemented with a low memory footprint (less than 64
bytes per queue on 64 bit systems). These are the data structures bytes per queue on 64-bit systems). These are the data structures
used in the Linux implementation: used in the Linux implementation:
<CODE BEGINS> <CODE BEGINS>
struct codel_vars { struct codel_vars {
u32 count; /* number of dropped packets */ u32 count; /* number of dropped packets */
u32 lastcount; /* count entry to dropping state */ u32 lastcount; /* count entry to dropping state */
bool dropping; /* currently dropping? */ bool dropping; /* currently dropping? */
u16 rec_inv_sqrt; /* reciprocal sqrt computation */ u16 rec_inv_sqrt; /* reciprocal sqrt computation */
codel_time_t first_above_time; /* when delay above target */ codel_time_t first_above_time; /* when delay above target */
codel_time_t drop_next; /* next time to drop */ codel_time_t drop_next; /* next time to drop */
codel_time_t ldelay; /* sojourn time of last dequeued packet */ codel_time_t ldelay; /* sojourn time of last dequeued packet */
}; };
skipping to change at page 15, line 4 skipping to change at page 16, line 31
struct list_head old_flows; /* list of old flows */ struct list_head old_flows; /* list of old flows */
}; };
<CODE ENDS> <CODE ENDS>
5.5. Per-Packet Timestamping 5.5. Per-Packet Timestamping
The CoDel portion of the algorithm requires per-packet timestamps be The CoDel portion of the algorithm requires per-packet timestamps be
stored along with the packet. While this approach works well for stored along with the packet. While this approach works well for
software-based routers, it may be impossible to retrofit devices that software-based routers, it may be impossible to retrofit devices that
do most of their processing in silicon and lack space or mechanism do most of their processing in silicon and lack the space or
for timestamping. mechanism for timestamping.
Also, while perfect resolution is not needed, timestamp resolution Also, while perfect resolution is not needed, timestamp resolution
finer than the CoDel target setting is necessary. Furthermore, finer than the CoDel target setting is necessary. Furthermore,
timestamping functions in the core OS need to be efficient as they timestamping functions in the core OS need to be efficient, as they
are called at least once on each packet enqueue and dequeue. are called at least once on each packet enqueue and dequeue.
5.6. Limiting queueing in lower layers 5.6. Limiting Queueing in Lower Layers
When deploying a queue management algorithm such as FQ-CoDel, it is When deploying a queue management algorithm such as FQ-CoDel, it is
important to ensure that the algorithm actually runs in the right important to ensure that the algorithm actually runs in the right
place to control the queue. In particular lower layers of the place to control the queue. In particular, lower layers of the
operating system networking stack can have queues of their own, as operating system networking stack can have queues of their own, as
can device drivers and hardware. Thus, it is desirable that the can device drivers and hardware. Thus, it is desirable that the
queue management algorithm runs as close to the hardware as possible. queue management algorithm runs as close to the hardware as possible.
However, scheduling such complexity at interrupt time is difficult, However, scheduling such complexity at interrupt time is difficult,
so a small standing queue between the algorithm and the wire is often so a small standing queue between the algorithm and the wire is often
needed at higher transmit rates. needed at higher transmit rates.
In Linux, the mechanism to ensure these different needs are balanced In Linux, the mechanism to ensure these different needs are balanced
is called "Byte Queue Limits" [BQL], which controls the device driver is called "Byte Queue Limits" [BQL]; it controls the device driver
ring buffer (for physical line rates). For cases where this ring buffer (for physical line rates). For cases where this
functionality is not available, the queue can be controlled by means functionality is not available, the queue can be controlled by means
of a software rate limiter such as Hierarchical Token Bucket [HTB] or of a software rate limiter such as Hierarchical Token Bucket [HTB] or
Hierarchical Fair-Service Curve [HFSC]. The Cake algorithm [CAKE] Hierarchical Fair-Service Curve [HFSC]. The Cake algorithm [CAKE]
integrates a software rate limiter for this purpose. integrates a software rate limiter for this purpose.
Other issues with queues at lower layers are described in [CODEL]. Other issues with queues at lower layers are described in [CODEL].
5.7. Other forms of "Fair Queueing" 5.7. Other Forms of Fair Queueing
Much of the scheduling portion of FQ-CoDel is derived from DRR and is Much of the scheduling portion of FQ-CoDel is derived from DRR and is
substantially similar to DRR++. Versions based on Stochastic Fair substantially similar to DRR++. Versions based on Stochastic Fair
Queueing [SFQ] have also been produced and tested in ns2. Other Queueing [SFQ] have also been produced and tested in ns2. Other
forms of Fair Queueing, such as Weighted Fair Queueing [WFQ] or Quick forms of fair queueing, such as Weighted Fair Queueing [WFQ] or Quick
Fair Queueing [QFQ], have not been thoroughly explored, but there's Fair Queueing [QFQ], have not been thoroughly explored, but there's
no a priori reason why the round-robin scheduling of FQ-CoDel no a priori reason why the round-robin scheduling of FQ-CoDel
couldn't be replaced with something else. couldn't be replaced with something else.
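For readers unfamiliar with DRR, a minimal sketch of classic Deficit Round Robin is shown here in Python. It deliberately omits the new/old queue lists and the CoDel dequeue step that FQ-CoDel adds, and the function and variable names are illustrative assumptions rather than anything taken from an implementation.

```python
from collections import deque


def drr_schedule(queues, quantum):
    """Dequeue packets (given as byte lengths) from `queues` in DRR order.

    `queues` maps a flow name to a list of packet lengths; `quantum` must be
    positive. Returns the (flow, length) pairs in transmission order.
    """
    backlog = {flow: deque(pkts) for flow, pkts in queues.items() if pkts}
    active = deque(backlog)                 # round-robin list of backlogged flows
    deficit = {flow: 0 for flow in backlog}
    sent = []
    while active:
        flow = active.popleft()
        deficit[flow] += quantum            # one quantum of credit per visit
        q = backlog[flow]
        # Send packets while the head packet fits in the accumulated deficit.
        while q and q[0] <= deficit[flow]:
            length = q.popleft()
            deficit[flow] -= length
            sent.append((flow, length))
        if q:
            active.append(flow)             # still backlogged: go to the tail
        else:
            deficit[flow] = 0               # an emptied queue forfeits its credit
    return sent


order = drr_schedule({"bulk": [1500, 1500, 1500], "voip": [200, 200]}, quantum=1500)
```

In this run, both small "voip" packets are sent after the first "bulk" packet and before the remaining two, which is the byte-based interleaving that the round-robin portion provides.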
For a comprehensive discussion of fairness queueing algorithms and For a comprehensive discussion of fairness queueing algorithms and
their combination with AQM, see [I-D.ietf-aqm-fq-implementation]. their combination with AQM, see [RFC7806].
5.8. Differences between CoDel and FQ-CoDel behaviour 5.8. Differences between CoDel and FQ-CoDel Behaviour
CoDel can be applied to a single queue system as a straight AQM, CoDel can be applied to a single queue system as a straight AQM,
where it converges towards an "ideal" drop rate (i.e., one that where it converges towards an "ideal" drop rate (i.e., one that
minimises delay while keeping a high link utilisation), and then minimises delay while keeping a high link utilisation) and then
optimises around that control point. optimises around that control point.
The scheduling of FQ-CoDel mixes packets of competing flows, which The scheduling of FQ-CoDel mixes packets of competing flows, which
acts to pace bursty flows to better fill the pipe. Additionally, a acts to pace bursty flows to better fill the pipe. Additionally, a
new flow gets substantial leeway over other flows until CoDel finds new flow gets substantial leeway over other flows until CoDel finds
an ideal drop rate for it. However, for a new flow that exceeds the an ideal drop rate for it. However, for a new flow that exceeds the
configured quantum, more time passes before all of its data is configured quantum, more time passes before all of its data is
delivered (as packets from it, too, are mixed across the other delivered (as packets from it, too, are mixed across the other
existing queue-building flows). Thus, FQ-CoDel takes longer (as existing queue-building flows). Thus, FQ-CoDel takes longer (as
measured in time) to converge towards an ideal drop rate for a given measured in time) to converge towards an ideal drop rate for a given
new flow, but does so within fewer delivered _packets_ from that new flow but does so within fewer delivered _packets_ from that flow.
flow.
Finally, the flow isolation FQ-CoDel provides means that the CoDel Finally, the flow isolation provided by FQ-CoDel means that the CoDel
drop mechanism operates on the flows actually building queues, which drop mechanism operates on the flows actually building queues; this
results in packets being dropped more accurately from the largest results in packets being dropped more accurately from the largest
flows than CoDel alone manages. Additionally, flow isolation flows than when only CoDel is used. Additionally, flow isolation
radically improves the transient behaviour of the network when radically improves the transient behaviour of the network when
traffic or link characteristics change (e.g., when new flows start up traffic or link characteristics change (e.g., when new flows start up
or the link bandwidth changes); while CoDel itself can take a while or the link bandwidth changes); while CoDel itself can take a while
to respond, FQ-CoDel reacts almost immediately. to respond, FQ-CoDel reacts almost immediately.
6. Limitations of flow queueing 6. Limitations of Flow Queueing
While FQ-CoDel has been shown in many scenarios to offer significant While FQ-CoDel has been shown in many scenarios to offer significant
performance gains compared to alternative queue management performance gains compared to alternative queue management
strategies, there are some scenarios where the scheduling algorithm strategies, there are some scenarios where the scheduling algorithm
in particular is not a good fit. This section documents some of the in particular is not a good fit. This section documents some of the
known cases which either may require tweaking the default behaviour, known cases in which either the default behaviour may require
or where alternatives to flow queueing should be considered. tweaking or alternatives to flow queueing should be considered.
6.1. Fairness between things other than flows 6.1. Fairness between Things Other Than Flows
In some parts of the network, enforcing flow-level fairness may not In some parts of the network, enforcing flow-level fairness may not
be desirable, or some other form of fairness may be more important. be desirable, or some other form of fairness may be more important.
An example of this can be an Internet Service Provider that may be Some examples of this include an ISP that may be more interested in
more interested in ensuring fairness between customers than between ensuring fairness between customers than between flows or a hosting
flows. Or a hosting or transit provider that wishes to ensure or transit provider that wishes to ensure fairness between connecting
fairness between connecting Autonomous Systems or networks. Another Autonomous Systems or networks. Another issue can be that the number
issue can be that the number of simultaneous flows experienced at a of simultaneous flows experienced at a particular link can be too
particular link can be too high for flow-based fairness queueing to high for flow-based fairness queueing to be effective.
be effective.
Whatever the reason, in a scenario where fairness between flows is Whatever the reason, in a scenario where fairness between flows is
not desirable, reconfiguring FQ-CoDel to match on a different not desirable, reconfiguring FQ-CoDel to match on a different
characteristic can be a way forward. The implementation in Linux can characteristic can be a way forward. The implementation in Linux can
leverage the packet matching mechanism of the _tc_ subsystem to use leverage the packet matching mechanism of the "tc" subsystem to use
any available packet field to partition packets into virtual queues, any available packet field to partition packets into virtual queues,
to for instance match on address or subnet source/destination pairs, for instance, to match on address or subnet source/destination pairs,
application layer characteristics, etc. application-layer characteristics, etc.
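The effect of matching on a different characteristic can be sketched as follows. This Python example is not the "tc" mechanism itself; the packet dictionaries, the function names, the queue count, and the use of CRC32 as a hash are all assumptions chosen for illustration.

```python
import zlib

NUM_QUEUES = 1024  # hypothetical number of queues for this example


def queue_index(key_fields):
    """Map a tuple of packet fields to a queue index (CRC32 is a stand-in hash)."""
    data = "|".join(str(f) for f in key_fields).encode()
    return zlib.crc32(data) % NUM_QUEUES


def flow_key(pkt):
    # Per-flow classification: the usual 5-tuple.
    return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])


def customer_key(pkt):
    # Per-customer classification: only the source address is used.
    return (pkt["src"],)


a = {"src": "192.0.2.10", "dst": "198.51.100.1", "proto": 6, "sport": 40000, "dport": 443}
b = {"src": "192.0.2.10", "dst": "203.0.113.7", "proto": 17, "sport": 5004, "dport": 5004}

# Two different flows from the same source address land in the same queue
# once the key is reduced to the source address.
same_customer_queue = queue_index(customer_key(a)) == queue_index(customer_key(b))
```

With the 5-tuple key, the two packets belong to different flows and are usually hashed to different queues; with the source-address key, they always share one, so fairness is enforced between sources rather than between flows.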
Furthermore, as commonly deployed today, FQ-CoDel is used with three Furthermore, as commonly deployed today, FQ-CoDel is used with three
or more tiers of service classification: priority, best effort and or more tiers of service classification, based on Diffserv markings:
background, based on diffserv markings. Some products do more priority, best effort, and background. Some products do more
detailed classification, including deep packet inspection and detailed classification, including deep packet inspection and
destination-specific filters to achieve their desired result. destination-specific filters to achieve their desired result.
6.2. Flow bunching by opaque encapsulation 6.2. Flow Bunching by Opaque Encapsulation
Where possible, FQ-CoDel will attempt to decapsulate packets before Where possible, FQ-CoDel will attempt to decapsulate packets before
matching on the header fields for the flow hashing. However, for matching on the header fields for the flow hashing. However, for
some encapsulation techniques, most notably encrypted VPNs, this is some encapsulation techniques, most notably encrypted VPNs, this is
not possible. If several flows are bunched into one such not possible. If several flows are bunched into one such
encapsulated tunnel, they will be seen as one flow by the FQ-CoDel encapsulated tunnel, they will be seen as one flow by the FQ-CoDel
algorithm. This means that they will share a queue, and drop algorithm. This means that they will share a queue and drop
behaviour, and so flows inside the encapsulation will not benefit behaviour, so flows inside the encapsulation will not benefit from
from the implicit prioritisation of FQ-CoDel, but will continue to the implicit prioritisation of FQ-CoDel but will continue to benefit
benefit from the reduced overall queue length from the CoDel from the reduced overall queue length from the CoDel algorithm
algorithm operating on the queue. In addition, when such an operating on the queue. In addition, when such an encapsulated bunch
encapsulated bunch competes against other flows, it will count as one competes against other flows, it will count as one flow and will not
flow, and will not be assigned a share of the bandwidth based on how many be assigned a share of the bandwidth based on how many flows are inside
flows are inside the encapsulation. the encapsulation.
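The bandwidth consequence can be made concrete with a small calculation. The Python sketch below assumes that every flow is continuously backlogged and that all queues use the same quantum; the function name and the example numbers are hypothetical.

```python
from fractions import Fraction


def tunnel_share(flows_in_tunnel, other_flows):
    """Return (share the tunnel gets, share it would get if counted per inner flow)."""
    # The scheduler sees the whole tunnel as a single flow among other_flows + 1.
    seen_as_one = Fraction(1, other_flows + 1)
    # If each inner flow were visible, the tunnel's flows would hold this fraction.
    per_inner_flow = Fraction(flows_in_tunnel, other_flows + flows_in_tunnel)
    return seen_as_one, per_inner_flow


# Four flows inside one encrypted tunnel competing against four plain flows.
actual, hypothetical = tunnel_share(flows_in_tunnel=4, other_flows=4)
```

Under these assumptions, the tunnel receives 1/5 of the link, whereas its four inner flows would together receive 1/2 if the scheduler could tell them apart.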
Depending on the application, this may or may not be desirable Depending on the application, this may or may not be desirable
behaviour. In cases where it is not, changing FQ-CoDel's matching to behaviour. In cases where it is not, changing FQ-CoDel's matching to
not be flow-based (as detailed in the previous subsection) can not be flow-based (as detailed in the previous subsection) can
be a mitigation. Going forward, having some mechanism for opaque be a mitigation. Going forward, having some mechanism for opaque
encapsulations to express to the outer layer which flow a packet encapsulations to express to the outer layer which flow a packet
belongs to, could be a way to mitigate this. Naturally, care needs belongs to could be a way to mitigate this. Naturally, care needs to
to be taken when designing such a mechanism to ensure no new privacy be taken when designing such a mechanism to ensure no new privacy and
and security issues are raised by exposing information from inside security issues are raised by exposing information from inside the
the encapsulation to the outside world. Keeping the extra encapsulation to the outside world. Keeping the extra information
information out-of-band and dropping it before it hits the network out of band and dropping it before it hits the network could be one
could be one way to achieve this. way to achieve this.
6.3. Low-priority congestion control algorithms 6.3. Low-Priority Congestion Control Algorithms
In the presence of queue management schemes that limit latency under In the presence of queue management schemes that limit latency under
load, low-priority congestion control algorithms such as LEDBAT load, low-priority congestion control algorithms such as Low Extra
[RFC6817] (or, in general, algorithms that try to voluntarily use up Delay Background Transport (LEDBAT) [RFC6817] (or, in general,
less than their fair share of bandwidth) experiences little added algorithms that try to voluntarily use up less than their fair share
latency when the link is congested. Thus, they lack the signal to of bandwidth) experience little added latency when the link is
back off that added latency previously afforded them. This effect is congested. Thus, they lack the signal to back off that added latency
seen with FQ-CoDel as well as with any effective AQM [GONG2014]. previously afforded them. This effect is seen with FQ-CoDel as well
as with any effective AQM [GONG2014].
As such, these delay-based algorithms tend to revert to loss-based As such, these delay-based algorithms tend to revert to loss-based
congestion control, and will consume the fair share of bandwidth congestion control and will consume the fair share of bandwidth
afforded to them by the FQ-CoDel scheduler. However, low-priority afforded to them by the FQ-CoDel scheduler. However, low-priority
congestion control mechanisms may be able to take steps to continue congestion control mechanisms may be able to take steps to continue
to be low priority, for instance by taking into account the vastly to be low priority, for instance, by taking into account the vastly
reduced level of delay afforded by an AQM, or by using a coupled reduced level of delay afforded by an AQM or by using a coupled
approach to observing the behaviour of multiple flows. approach to observing the behaviour of multiple flows.
7. Deployment status and future work 7. Deployment Status and Future Work
The FQ-CoDel algorithm as described in this document has been shipped The FQ-CoDel algorithm as described in this document has been shipped
as part of the Linux kernel since version 3.5, released on the 21st as part of the Linux kernel since version 3.5 (released on the 21st
of July, 2012, with the ce_threshold being added in version 4.2. The of July, 2012), with the ce_threshold being added in version 4.2.
algorithm has seen widespread testing in a variety of contexts and is The algorithm has seen widespread testing in a variety of contexts
configured as the default queueing discipline in a number of mainline and is configured as the default queueing discipline in a number of
Linux distributions (as of this writing at least OpenWRT, Arch Linux mainline Linux distributions (as of this writing, at least OpenWRT,
and Fedora). We believe it to be a safe default and encourage people Arch Linux, and Fedora). In addition, a BSD implementation is
running Linux to turn it on: It is a massive improvement over the available. All data resulting from these trials have shown FQ-CoDel
previous default FIFO queue. to be a massive improvement over the previous default FIFO queue, and
people are encouraged to turn it on.
Of course there is always room for improvement, and this document has Of course, there is always room for improvement, and this document
listed some of the known limitations of the algorithm. As such, we has listed some of the known limitations of the algorithm. As such,
encourage further research into algorithm refinements and addressing we encourage further research into algorithm refinements and
of limitations. One such effort is undertaken by the bufferbloat addressing of limitations. One such effort has been undertaken by
community in the form of the Cake queue management scheme [CAKE]. In the bufferbloat community in the form of the Cake queue management
addition to this we believe the following (non-exhaustive) list of scheme [CAKE]. In addition to this, we believe the following
issues to be worthy of further enquiry: (non-exhaustive) list of issues to be worthy of further enquiry:
o Variations on the flow classification mechanism to fit different o Variations on the flow classification mechanism to fit different
notions of flows. For instance, an ISP might want to deploy per- notions of flows. For instance, an ISP might want to deploy per-
subscriber scheduling, while in other cases several flows can subscriber scheduling, while in other cases, several flows can
share a 5-tuple, as exemplified by the RTCWEB QoS recommendations share a 5-tuple, as exemplified by the RTCWEB QoS recommendations
[I-D.ietf-tsvwg-rtcweb-qos]. [WEBRTC-QOS].
o Interactions between flow queueing and delay-based congestion o Interactions between flow queueing and delay-based congestion
control algorithms and scavenger protocols. control algorithms and scavenger protocols.
o Other scheduling mechanisms to replace the DRR portion of the o Other scheduling mechanisms to replace the DRR portion of the
algorithm, e.g., QFQ or WFQ. algorithm, e.g., QFQ or WFQ.
o Sensitivity of parameters, most notably the number of queues and o Sensitivity of parameters, most notably, the number of queues and
the CoDel parameters. the CoDel parameters.
8. Security Considerations 8. Security Considerations
There are no specific security exposures associated with FQ-CoDel There are no specific security exposures associated with FQ-CoDel
that are not also present in current FIFO systems. On the contrary, that are not also present in current FIFO systems. On the contrary,
some vulnerabilities of FIFO systems are reduced with FQ-CoDel (e.g., some vulnerabilities of FIFO systems are reduced with FQ-CoDel (e.g.,
simple-minded packet floods). However, some care is needed in the simple-minded packet floods). However, some care is needed in the
implementation to ensure this is the case. These are included in the implementation to ensure this is the case. These are included in the
description above, however we reiterate them here: description above, but we reiterate them here:
o To prevent packets in the new queues from starving old queues, it o To prevent packets in the new queues from starving old queues, it
is important that when a queue on the list of new queues empties, is important that when a queue on the list of new queues empties,
it is moved to _the end of_ the list of old queues. This is it is moved to _the end of_ the list of old queues. This is
described at the end of Section 4.2. described at the end of Section 4.2.
o To prevent an attacker targeting a specific flow for a denial of o To prevent an attacker targeting a specific flow for a denial-of-
service attack, the hash that maps packets to queues should not be service attack, the hash that maps packets to queues should not be
predictable. To achieve this, FQ-CoDel salts the hash, as predictable. To achieve this, FQ-CoDel salts the hash, as
described in the beginning of Section 4.1. The size of the salt described in the beginning of Section 4.1. The size of the salt
and the strength of the hash function are obviously a tradeoff and the strength of the hash function are obviously a trade-off
between performance and security. The Linux implementation uses a between performance and security. The Linux implementation uses a
32 bit random value as the salt and a Jenkins hash function. This 32-bit random value as the salt and a Jenkins hash function. This
makes it possible to achieve high throughput, and we consider it makes it possible to achieve high throughput, and we consider it
sufficient to ward off the most obvious attacks. sufficient to ward off the most obvious attacks.
o Packet fragments without a layer 4 header can be hashed into o Packet fragments without a Layer 4 header can be hashed into
different bins than the first fragment with the header intact. different bins than the first fragment with the header intact.
This can cause reordering and/or adversely affect the performance This can cause reordering and/or adversely affect the performance
of the flow. Keeping state to match the fragments to the of the flow. Keeping state to match the fragments to the
beginning of the packet, or simply putting all packet fragments beginning of the packet or simply putting all packet fragments
(including the first fragment of each fragmented packet) into the (including the first fragment of each fragmented packet) into the
same queue, are two ways to alleviate this. same queue are two ways to alleviate this.
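The salted hash and the simpler of the two fragment mitigations can be sketched together. This Python example is an assumption-laden illustration, not the Linux code: keyed BLAKE2 stands in for the salted Jenkins hash, and the packet dictionaries, the "fragment" flag, the function name, and the queue count are all hypothetical.

```python
import hashlib
import os

NUM_QUEUES = 1024            # hypothetical queue count
SALT = os.urandom(4)         # 32-bit random salt, chosen once at start-up


def classify(pkt, salt=SALT):
    """Return the queue index for a packet represented as a dict."""
    if pkt.get("fragment"):
        # Fragments may lack the Layer 4 header, so every fragment (including
        # the first) is keyed on addresses and protocol only.
        fields = (pkt["src"], pkt["dst"], pkt["proto"])
    else:
        fields = (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
    data = "|".join(str(f) for f in fields).encode()
    # Keyed BLAKE2 is a stand-in for the salted Jenkins hash used by Linux.
    digest = hashlib.blake2b(data, key=salt, digest_size=4).digest()
    return int.from_bytes(digest, "big") % NUM_QUEUES


first = {"src": "192.0.2.1", "dst": "198.51.100.2", "proto": 17,
         "sport": 5000, "dport": 53, "fragment": True}
rest = {"src": "192.0.2.1", "dst": "198.51.100.2", "proto": 17, "fragment": True}

# Both fragments map to the same queue, whatever the salt happens to be.
fragments_share_queue = classify(first) == classify(rest)
```

Because the salt is drawn at start-up, an outside observer cannot precompute which queue a given flow will occupy, while fragments of one packet still stay together and are not reordered by the scheduler.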
9. IANA Considerations 9. IANA Considerations
This document has no actions for IANA. This document does not require any IANA actions.
10. Acknowledgements
Our deepest thanks to Kathie Nichols, Van Jacobson, and all the
members of the bufferbloat.net effort for all the help on developing
and testing the algorithm. In addition, our thanks to Anil Agarwal
for his help with getting the hash collision probabilities in this
document right.
11. References
11.1. Normative References
[I-D.ietf-aqm-codel]
Nichols, K., Jacobson, V., McGregor, A., and J. Iyengar,
"Controlled Delay Active Queue Management", draft-ietf-
aqm-codel-03 (work in progress), March 2016.
[I-D.ietf-aqm-fq-implementation]
Baker, F. and R. Pan, "On Queuing, Marking, and Dropping",
draft-ietf-aqm-fq-implementation-05 (work in progress),
November 2015.
[I-D.ietf-tcpm-dctcp]
Bensley, S., Eggert, L., Thaler, D., Balasubramanian, P.,
and G. Judd, "Datacenter TCP (DCTCP): TCP Congestion
Control for Datacenters", draft-ietf-tcpm-dctcp-01 (work
in progress), November 2015.
[I-D.ietf-tsvwg-rtcweb-qos] 10. References
Jones, P., Dhesikan, S., Jennings, C., and D. Druta, "DSCP
and other packet markings for WebRTC QoS", draft-ietf-
tsvwg-rtcweb-qos-14 (work in progress), March 2016.
[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 10.1. Normative References
DOI 10.17487/RFC2003, October 1996,
<http://www.rfc-editor.org/info/rfc2003>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC2890] Dommety, G., "Key and Sequence Number Extensions to GRE",
RFC 2890, DOI 10.17487/RFC2890, September 2000,
<http://www.rfc-editor.org/info/rfc2890>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC7806] Baker, F. and R. Pan, "On Queuing, Marking, and Dropping",
of Explicit Congestion Notification (ECN) to IP", RFC 7806, DOI 10.17487/RFC7806, April 2016,
RFC 3168, DOI 10.17487/RFC3168, September 2001, <https://www.rfc-editor.org/info/rfc7806>.
<http://www.rfc-editor.org/info/rfc3168>.
[RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
for IPv6 Hosts and Routers", RFC 4213, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
DOI 10.17487/RFC4213, October 2005, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
<http://www.rfc-editor.org/info/rfc4213>.
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, [RFC8289] Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
"Low Extra Delay Background Transport (LEDBAT)", RFC 6817, Iyengar, Ed., "Controlled Delay Active Queue Management",
DOI 10.17487/RFC6817, December 2012, RFC 8289, DOI 10.17487/RFC8289, January 2018,
<http://www.rfc-editor.org/info/rfc6817>. <https://www.rfc-editor.org/info/rfc8289>.
11.2. Informative References 10.2. Informative References
[BLOAT] Gettys, J., "Bufferbloat: Dark buffers in the Internet.", [BLOAT] Gettys, J. and K. Nichols, "Bufferbloat: Dark Buffers in
in IEEE Internet Comput. 15, 3, the Internet", Communications of the ACM, Volume 55, Issue
DOI http://dx.doi.org/10.1109/MIC.2011.56, 2011, 1, DOI 10.1145/2063176.2063196, January 2012.
<http://www.bufferbloat.net/attachments/27/
IC-15-03-Backspace.pdf>.
[BLOATWEB] [BLOATWEB] "Bufferbloat", <https://www.bufferbloat.net>.
"Bufferbloat web site", <https://www.bufferbloat.net>.
[BQL] Herbert, T., "Network Byte Queue Limits", August 2011, [BQL] Herbert, T., "bql: Byte Queue Limits", August 2011,
<https://lwn.net/Articles/454390/>. <https://lwn.net/Articles/454378/>.
[CAKE] "Cake comprehensive queue management system", [CAKE] "Cake - Common Applications Kept Enhanced",
<http://www.bufferbloat.net/projects/codel/wiki/Cake>. <http://www.bufferbloat.net/projects/codel/wiki/Cake>.
[CODEL] Nichols, K. and V. Jacobson, "Controlling Queue Delay", [CODEL] Nichols, K. and V. Jacobson, "Controlling Queue Delay",
July 2012, <http://queue.acm.org/detail.cfm?id=2209336>. ACM Queue, Volume 10, Issue 5,
DOI 10.1145/2208917.2209336, May 2012,
<http://queue.acm.org/detail.cfm?id=2209336>.
[DRR] Shreedhar, M. and G. Varghese, "Efficient Fair Queueing [DRR] Shreedhar, M. and G. Varghese, "Efficient Fair Queueing
Using Deficit Round Robin", in IEEE/ACM Trans. Netw. 4, 3, Using Deficit Round Robin", IEEE/ACM Transactions on
June 1996, Networking, Volume 4, Issue 3, DOI 10.1109/90.502236, June
<http://users.ece.gatech.edu/~siva/ECE4607/presentations/ 1996.
DRR.pdf>.
[DRRPP] MacGregor, M. and W. Shi, "Deficits for Bursty Latency- [DRRPP] MacGregor, M. and W. Shi, "Deficits for Bursty Latency-
critical Flows: DRR++", in Proceedings IEEE International Critical Flows: DRR++", Proceedings of the IEEE
Conference on Networks 2000 (ICON 2000), 2000, International Conference on Networks 2000 (ICON 2000),
DOI 10.1109/ICON.2000.875803, September 2000,
<http://ieeexplore.ieee.org/xpls/ <http://ieeexplore.ieee.org/xpls/
abs_all.jsp?arnumber=875803>. abs_all.jsp?arnumber=875803>.
[GONG2014] [GONG2014] Gong, Y., Rossi, D., Testa, C., Valenti, S., and D. Taht,
Gong, Y., Rossi, D., Testa, C., Valenti, S., and D. Taht, "Fighting the bufferbloat: On the coexistence of AQM and
"Fighting the bufferbloat: on the coexistence of AQM and low priority congestion control", Elsevier Computer
low priority congestion control", in 2013 IEEE Conference Networks, Volume 65, DOI 10.1016/j.bjp.2014.01.009, June
on Computer Communications Workshops (INFOCOM WKSHPS), 2014, <https://www.sciencedirect.com/science/article/pii/
July 2014, <http://perso.telecom- S1389128614000188>.
paristech.fr/~drossi/paper/rossi14comnet-b.pdf>.
[HFSC] Stoica, I., Zhang, H., and T. Eugene, "Hierarchical fair- [HFSC] Stoica, I., Zhang, H., and T. Eugene Ng, "A Hierarchical
service curve", in Sigcomm 1997 proceedings, 1997, Fair Service Curve Algorithm for Link-Sharing, Real-Time
and Priority Services", Proceedings of ACM SIGCOMM,
DOI 10.1145/263105.263175, September 1997,
<http://conferences.sigcomm.org/sigcomm/1997/papers/ <http://conferences.sigcomm.org/sigcomm/1997/papers/
p011.pdf>. p011.pdf>.
[HTB] "Hierarchical Token Bucket", [HTB] Wikipedia, "Token Bucket: Variations", October 2017,
<https://en.wikipedia.org/wiki/Token_bucket#Variations>. <https://en.wikipedia.org/w/
index.php?title=Token_bucket&oldid=803574657>.
[JENKINS] Jenkins, B., "A Hash Function for Hash Table Lookup", [JENKINS] Jenkins, B., "A Hash Function for Hash Table Lookup",
1996, <http://www.burtleburtle.net/bob/hash/doobs.html>. <http://www.burtleburtle.net/bob/hash/doobs.html>.
[LINUXSRC] [LINUXSRC] "Linux Kernel Source Tree", <https://git.kernel.org/cgit/l
"Current FQ-CoDel Linux source code", <https://git.kernel. inux/kernel/git/torvalds/linux.git/tree/net/sched/
org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/ sch_fq_codel.c>.
sched/sch_fq_codel.c>.
[NS2] "NS2 web site", <http://nsnam.sourceforge.net/wiki>. [NS2] "ns-2", December 2014, <http://nsnam.sourceforge.net/wiki/
index.php?title=Main_Page&oldid=8076>.
[NS3] "NS3 web site", <https://www.nsnam.org/wiki>. [NS3] "ns-3", February 2016, <https://www.nsnam.org/mediawiki/
index.php?title=Main_Page&oldid=9883>.
[QFQ] Checconi, F., Rizzo, L., and P. Valente, "QFQ: Efficient [QFQ] Checconi, F., Rizzo, L., and P. Valente, "QFQ: Efficient
packet scheduling with tight guarantees", in IEEE/ACM Packet Scheduling with Tight Guarantees", IEEE/ACM
Transactions on Networking (TON), 2013, Transactions on Networking (TON), Volume 21, Issue 3, pp.
802-816, DOI 10.1109/TNET.2012.2215881, June 2013,
<http://dl.acm.org/citation.cfm?id=2525552>. <http://dl.acm.org/citation.cfm?id=2525552>.
[SFQ] McKenney, P., "Stochastic fairness queueing", published as [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
technical report, 2002, DOI 10.17487/RFC2003, October 1996,
<https://web.archive.org/web/20151003174154/ <https://www.rfc-editor.org/info/rfc2003>.
http://www2.rdrop.com/~paulmck/scalability/paper/
sfq.2002.06.04.pdf>.
[SQF] Bonald, T., Muscariello, L., and N. Ostallo, "On the [RFC2890] Dommety, G., "Key and Sequence Number Extensions to GRE",
impact of TCP and per-flow scheduling on Internet RFC 2890, DOI 10.17487/RFC2890, September 2000,
Performance", in IEEE/ACM transactions on Networking, <https://www.rfc-editor.org/info/rfc2890>.
April 2012, <http://perso.telecom-
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>.
[RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
for IPv6 Hosts and Routers", RFC 4213,
DOI 10.17487/RFC4213, October 2005,
<https://www.rfc-editor.org/info/rfc4213>.
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
"Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
DOI 10.17487/RFC6817, December 2012,
<https://www.rfc-editor.org/info/rfc6817>.
[RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
October 2017, <https://www.rfc-editor.org/info/rfc8257>.
[SFQ] McKenney, P., "Stochastic Fairness Queueing", Proceedings
of IEEE INFOCOM, DOI 10.1109/INFCOM.1990.91316, June 1990,
<http://perso.telecom-
paristech.fr/~bonald/Publications_files/BMO2011.pdf>. paristech.fr/~bonald/Publications_files/BMO2011.pdf>.
[SQF] Carofiglio, G. and L. Muscariello, "On the Impact of TCP
and Per-Flow Scheduling on Internet Performance", IEEE/ACM
Transactions on Networking, Volume 20, Issue 2,
DOI 10.1109/TNET.2011.2164553, August 2011.
[WEBRTC-QOS]
Jones, P., Dhesikan, S., Jennings, C., and D. Druta, "DSCP
Packet Markings for WebRTC QoS", Work in Progress,
draft-ietf-tsvwg-rtcweb-qos-18, August 2016.
[WFQ] Demers, A., Keshav, S., and S. Shenker, "Analysis and [WFQ] Demers, A., Keshav, S., and S. Shenker, "Analysis and
simulation of a fair queueing algorithm", in SIGCOMM Simulation of a Fair Queueing Algorithm", ACM SIGCOMM
Comput. Commun. Rev., September 1989, Computer Communication Review, Volume 19, Issue 4, pp.
1-12, DOI 10.1145/75247.75248, September 1989,
<http://doi.acm.org/10.1145/75247.75248>. <http://doi.acm.org/10.1145/75247.75248>.
Acknowledgements
Our deepest thanks to Kathie Nichols, Van Jacobson, and all the
members of the bufferbloat.net effort for all the help on developing
and testing the algorithm. In addition, our thanks to Anil Agarwal
for his help with getting the hash collision probabilities in this
document right.
Authors' Addresses Authors' Addresses
Toke Hoeiland-Joergensen Toke Hoeiland-Joergensen
Karlstad University Karlstad University
Dept. of Computer Science Dept. of Computer Science
Karlstad 65188 Karlstad 65188
Sweden Sweden
Email: toke@toke.dk
Email: toke.hoiland-jorgensen@kau.se
Paul McKenney Paul McKenney
IBM Linux Technology Center IBM Linux Technology Center
1385 NW Amberglen Parkway 1385 NW Amberglen Parkway
Hillsboro, OR 97006 Hillsboro, OR 97006
USA United States of America
Email: paulmck@linux.vnet.ibm.com Email: paulmck@linux.vnet.ibm.com
URI: http://www2.rdrop.com/~paulmck/ URI: http://www2.rdrop.com/~paulmck/
Dave Taht Dave Taht
Teklibre Teklibre
2104 W First street 2104 W First street
Apt 2002 Apt 2002
FT Myers, FL 33901 FT Myers, FL 33901
USA United States of America
Email: dave.taht@gmail.com Email: dave.taht@gmail.com
URI: http://www.teklibre.com/ URI: http://www.teklibre.com/
Jim Gettys Jim Gettys
21 Oak Knoll Road 21 Oak Knoll Road
Carlisle, MA 993 Carlisle, MA 993
USA United States of America
Email: jg@freedesktop.org Email: jg@freedesktop.org
URI: https://en.wikipedia.org/wiki/Jim_Gettys URI: https://en.wikipedia.org/wiki/Jim_Gettys
Eric Dumazet Eric Dumazet
Google, Inc. Google, Inc.
1600 Amphitheater Pkwy 1600 Amphitheatre Pkwy
Mountain View, CA 94043 Mountain View, CA 94043
USA United States of America
Email: edumazet@gmail.com Email: edumazet@gmail.com
 End of changes. 177 change blocks. 
510 lines changed or deleted 517 lines changed or added
