draft-ietf-aqm-fq-codel-04.txt   draft-ietf-aqm-fq-codel-05.txt 
AQM working group T. Hoeiland-Joergensen AQM working group T. Hoeiland-Joergensen
Internet-Draft Karlstad University Internet-Draft Karlstad University
Intended status: Experimental P. McKenney Intended status: Experimental P. McKenney
Expires: August 7, 2016 IBM Linux Technology Center Expires: September 2, 2016 IBM Linux Technology Center
D. Taht D. Taht
Teklibre Teklibre
J. Gettys J. Gettys
E. Dumazet E. Dumazet
Google, Inc. Google, Inc.
February 04, 2016 March 01, 2016
FlowQueue-Codel FlowQueue-Codel
draft-ietf-aqm-fq-codel-04 draft-ietf-aqm-fq-codel-05
Abstract Abstract
This memo presents the FQ-CoDel hybrid packet scheduler/AQM This memo presents the FQ-CoDel hybrid packet scheduler/AQM
algorithm, a powerful tool for fighting bufferbloat and reducing algorithm, a powerful tool for fighting bufferbloat and reducing
latency. latency.
FQ-CoDel mixes packets from multiple flows and reduces the impact of FQ-CoDel mixes packets from multiple flows and reduces the impact of
head of line blocking from bursty traffic. It provides isolation for head of line blocking from bursty traffic. It provides isolation for
low-rate traffic such as DNS, web, and videoconferencing traffic. It low-rate traffic such as DNS, web, and videoconferencing traffic. It
skipping to change at page 1, line 47 skipping to change at page 1, line 47
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 7, 2016. This Internet-Draft will expire on September 2, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Terminology and concepts . . . . . . . . . . . . . . . . 3 1.1. Conventions used in this document . . . . . . . . . . . . 3
1.2. Informal summary of FQ-CoDel . . . . . . . . . . . . . . 4 1.2. Terminology and concepts . . . . . . . . . . . . . . . . 4
1.3. Informal summary of FQ-CoDel . . . . . . . . . . . . . . 4
2. CoDel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. CoDel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Flow Queueing . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Flow Queueing . . . . . . . . . . . . . . . . . . . . . . . . 6
4. The FQ-CoDel scheduler . . . . . . . . . . . . . . . . . . . 6 4. The FQ-CoDel scheduler . . . . . . . . . . . . . . . . . . . 7
4.1. Enqueue . . . . . . . . . . . . . . . . . . . . . . . . . 7 4.1. Enqueue . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.1.1. Alternative classification schemes . . . . . . . . . 7 4.1.1. Alternative classification schemes . . . . . . . . . 8
4.2. Dequeue . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.2. Dequeue . . . . . . . . . . . . . . . . . . . . . . . . . 8
5. Implementation considerations . . . . . . . . . . . . . . . . 9 5. Implementation considerations . . . . . . . . . . . . . . . . 9
5.1. Data structures . . . . . . . . . . . . . . . . . . . . . 9 5.1. Data structures . . . . . . . . . . . . . . . . . . . . . 10
5.2. Parameters . . . . . . . . . . . . . . . . . . . . . . . 10 5.2. Parameters . . . . . . . . . . . . . . . . . . . . . . . 10
5.2.1. Interval . . . . . . . . . . . . . . . . . . . . . . 10 5.2.1. Interval . . . . . . . . . . . . . . . . . . . . . . 10
5.2.2. Target . . . . . . . . . . . . . . . . . . . . . . . 10 5.2.2. Target . . . . . . . . . . . . . . . . . . . . . . . 10
5.2.3. Packet limit . . . . . . . . . . . . . . . . . . . . 10 5.2.3. Packet limit . . . . . . . . . . . . . . . . . . . . 11
5.2.4. Quantum . . . . . . . . . . . . . . . . . . . . . . . 11 5.2.4. Quantum . . . . . . . . . . . . . . . . . . . . . . . 11
5.2.5. Flows . . . . . . . . . . . . . . . . . . . . . . . . 11 5.2.5. Flows . . . . . . . . . . . . . . . . . . . . . . . . 11
5.2.6. ECN . . . . . . . . . . . . . . . . . . . . . . . . . 11 5.2.6. ECN . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.2.7. CE threshold . . . . . . . . . . . . . . . . . . . . 11 5.2.7. CE threshold . . . . . . . . . . . . . . . . . . . . 12
5.3. Probability of hash collisions . . . . . . . . . . . . . 12 5.3. Probability of hash collisions . . . . . . . . . . . . . 12
5.4. Memory Overhead . . . . . . . . . . . . . . . . . . . . . 12 5.4. Memory Overhead . . . . . . . . . . . . . . . . . . . . . 12
5.5. Per-Packet Timestamping . . . . . . . . . . . . . . . . . 13 5.5. Per-Packet Timestamping . . . . . . . . . . . . . . . . . 13
5.6. Other forms of "Fair Queueing" . . . . . . . . . . . . . 13 5.6. Other forms of "Fair Queueing" . . . . . . . . . . . . . 14
5.7. Differences between CoDel and FQ-CoDel behaviour . . . . 13 5.7. Differences between CoDel and FQ-CoDel behaviour . . . . 14
6. Limitations of flow queueing . . . . . . . . . . . . . . . . 14 6. Limitations of flow queueing . . . . . . . . . . . . . . . . 15
6.1. Fairness between things other than flows . . . . . . . . 14 6.1. Fairness between things other than flows . . . . . . . . 15
6.2. Flow bunching by opaque encapsulation . . . . . . . . . . 15 6.2. Flow bunching by opaque encapsulation . . . . . . . . . . 15
6.3. Low-priority congestion control algorithms . . . . . . . 15 6.3. Low-priority congestion control algorithms . . . . . . . 16
7. Deployment status and future work . . . . . . . . . . . . . . 16 7. Deployment status and future work . . . . . . . . . . . . . . 16
8. Security Considerations . . . . . . . . . . . . . . . . . . . 16 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 17 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18
11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 17 11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 18
12. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
12.1. Normative References . . . . . . . . . . . . . . . . . . 17 12.1. Normative References . . . . . . . . . . . . . . . . . . 18
12.2. Informative References . . . . . . . . . . . . . . . . . 18 12.2. Informative References . . . . . . . . . . . . . . . . . 19
12.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19
1. Introduction 1. Introduction
The FQ-CoDel algorithm is a combined packet scheduler and AQM The FQ-CoDel algorithm is a combined packet scheduler and AQM
developed as part of the bufferbloat-fighting community effort. It developed as part of the bufferbloat-fighting community effort. It
is based on a modified Deficit Round Robin (DRR) queue scheduler, is based on a modified Deficit Round Robin (DRR) queue scheduler,
with the CoDel AQM algorithm operating on each queue. This document with the CoDel AQM algorithm operating on each queue. This document
describes the combined algorithm; reference implementations are describes the combined algorithm; reference implementations are
available for ns2 and ns3 and it is included in the mainline Linux available for ns2 and ns3 and it is included in the mainline Linux
skipping to change at page 3, line 42 skipping to change at page 3, line 42
Section 2 gives an overview of the CoDel algorithm. Section 3 covers Section 2 gives an overview of the CoDel algorithm. Section 3 covers
the flow hashing and DRR portion. Section 4 then describes the the flow hashing and DRR portion. Section 4 then describes the
working of the algorithm in detail, while Section 5 describes working of the algorithm in detail, while Section 5 describes
implementation details and considerations. Section 6 lists some of implementation details and considerations. Section 6 lists some of
the limitations of using flow queueing. Section 7 outlines the the limitations of using flow queueing. Section 7 outlines the
current status of FQ-CoDel deployment and lists some possible future current status of FQ-CoDel deployment and lists some possible future
areas of inquiry, and Section 8 reiterates some important security areas of inquiry, and Section 8 reiterates some important security
points that must be observed in the implementation. Finally, points that must be observed in the implementation. Finally,
Section 11 concludes. Section 11 concludes.
1.1. Terminology and concepts 1.1. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying [RFC2119] significance.
1.2. Terminology and concepts
Flow: A flow is typically identified by a 5-tuple of source IP, Flow: A flow is typically identified by a 5-tuple of source IP,
destination IP, source port, destination port, and protocol. It can destination IP, source port, destination port, and protocol. It can
also be identified by a superset or subset of those parameters, or by also be identified by a superset or subset of those parameters, or by
mac address, or other means. mac address, or other means.
Queue: A queue of packets represented internally in FQ-CoDel. In Queue: A queue of packets represented internally in FQ-CoDel. In
most instances each flow gets its own queue; however because of the most instances each flow gets its own queue; however because of the
possibility of hash collisions, this is not always the case. In an possibility of hash collisions, this is not always the case. In an
attempt to avoid confusion, the word 'queue' is used to refer to the attempt to avoid confusion, the word 'queue' is used to refer to the
skipping to change at page 4, line 19 skipping to change at page 4, line 30
from. from.
CoDel AQM: The Active Queue Management algorithm employed by FQ- CoDel AQM: The Active Queue Management algorithm employed by FQ-
CoDel. CoDel.
DRR: Deficit round-robin scheduling. DRR: Deficit round-robin scheduling.
Quantum: The maximum amount of bytes to be dequeued from a queue at Quantum: The maximum amount of bytes to be dequeued from a queue at
once. once.
1.2. Informal summary of FQ-CoDel 1.3. Informal summary of FQ-CoDel
FQ-CoDel is a _hybrid_ of DRR [DRR] and CoDel [CODELDRAFT], with an FQ-CoDel is a _hybrid_ of DRR [DRR] and CoDel [CODELDRAFT], with an
optimisation for sparse flows similar to SQF [SQF2012] and DRR++ optimisation for sparse flows similar to SQF [SQF2012] and DRR++
[DRRPP]. We call this "Flow Queueing" rather than "Fair Queueing" as [DRRPP]. We call this "Flow Queueing" rather than "Fair Queueing" as
flows that build a queue are treated differently than flows that do flows that build a queue are treated differently than flows that do
not. not.
FQ-CoDel stochastically classifies incoming packets into different FQ-CoDel stochastically classifies incoming packets into different
queues by hashing the 5-tuple of IP protocol number and source and queues by hashing the 5-tuple of IP protocol number and source and
destination IP and port numbers, perturbed with a random number destination IP and port numbers, perturbed with a random number
skipping to change at page 10, line 18 skipping to change at page 10, line 34
5.2. Parameters 5.2. Parameters
The following are the user configuration parameters exposed by the The following are the user configuration parameters exposed by the
Linux implementation of FQ-CoDel. Linux implementation of FQ-CoDel.
5.2.1. Interval 5.2.1. Interval
The _interval_ parameter has the same semantics as CoDel and is used The _interval_ parameter has the same semantics as CoDel and is used
to ensure that the measured minimum delay does not become too stale. to ensure that the measured minimum delay does not become too stale.
The minimum delay MUST be experienced in the last epoch of length That is, CoDel only reacts to delay experienced in the last epoch of
interval. It SHOULD be set on the order of the worst-case RTT length interval. It SHOULD be set on the order of the worst-case RTT
through the bottleneck to give end-points sufficient time to react. through the bottleneck to give end-points sufficient time to react.
The default interval value is 100 ms. The default interval value is 100 ms.
5.2.2. Target 5.2.2. Target
The _target_ parameter has the same semantics as CoDel. It is the The _target_ parameter has the same semantics as CoDel. It is the
acceptable minimum standing/persistent queue delay for each FQ-CoDel acceptable minimum standing/persistent queue delay for each FQ-CoDel
Queue. This minimum delay is identified by tracking the local Queue. This minimum delay is identified by tracking the local
minimum queue delay that packets experience. minimum queue delay that packets experience.
skipping to change at page 12, line 33 skipping to change at page 12, line 45
a 4-way associative hash with the same number of total queues, the a 4-way associative hash with the same number of total queues, the
probability of no collisions for 100 flows is 99.93%, while for an probability of no collisions for 100 flows is 99.93%, while for an
8-way associative hash it is ~100%. 8-way associative hash it is ~100%.
5.4. Memory Overhead 5.4. Memory Overhead
FQ-CoDel can be implemented with a very low memory footprint (less FQ-CoDel can be implemented with a very low memory footprint (less
than 64 bytes per queue on 64 bit systems). These are the data than 64 bytes per queue on 64 bit systems). These are the data
structures used in the Linux implementation: structures used in the Linux implementation:
<CODE BEGINS>
struct codel_vars { struct codel_vars {
u32 count; u32 count;
u32 lastcount; u32 lastcount;
bool dropping; bool dropping;
u16 rec_inv_sqrt; u16 rec_inv_sqrt;
codel_time_t first_above_time; codel_time_t first_above_time;
codel_time_t drop_next; codel_time_t drop_next;
codel_time_t ldelay; codel_time_t ldelay;
}; };
struct fq_codel_flow { struct fq_codel_flow {
struct sk_buff *head; struct sk_buff *head;
struct sk_buff *tail; struct sk_buff *tail;
struct list_head flowchain; struct list_head flowchain;
int credits; /* current number of queue credits */ int credits; /* current number of queue credits */
u32 dropped; /* # of drops (or ECN marks) on flow */ u32 dropped; /* # of drops (or ECN marks) on flow */
struct codel_vars cvars; struct codel_vars cvars;
}; };
<CODE ENDS>
The master table managing all queues looks like this: The master table managing all queues looks like this:
<CODE BEGINS>
struct fq_codel_sched_data { struct fq_codel_sched_data {
struct tcf_proto *filter_list; /* optional external classifier */ struct tcf_proto *filter_list; /* optional external classifier */
struct fq_codel_flow *flows; /* Flows table [flows_cnt] */ struct fq_codel_flow *flows; /* Flows table [flows_cnt] */
u32 *backlogs; /* backlog table [flows_cnt] */ u32 *backlogs; /* backlog table [flows_cnt] */
u32 flows_cnt; /* number of flows */ u32 flows_cnt; /* number of flows */
u32 perturbation; /* hash perturbation */ u32 perturbation; /* hash perturbation */
u32 quantum; /* psched_mtu(qdisc_dev(sch)); */ u32 quantum; /* psched_mtu(qdisc_dev(sch)); */
struct codel_params cparams; struct codel_params cparams;
struct codel_stats cstats; struct codel_stats cstats;
u32 drop_overlimit; u32 drop_overlimit;
u32 new_flow_count; u32 new_flow_count;
struct list_head new_flows; /* list of new flows */ struct list_head new_flows; /* list of new flows */
struct list_head old_flows; /* list of old flows */ struct list_head old_flows; /* list of old flows */
}; };
<CODE ENDS>
5.5. Per-Packet Timestamping 5.5. Per-Packet Timestamping
The CoDel portion of the algorithm requires per-packet timestamps be The CoDel portion of the algorithm requires per-packet timestamps be
stored along with the packet. While this approach works well for stored along with the packet. While this approach works well for
software-based routers, it may be impossible to retrofit devices that software-based routers, it may be impossible to retrofit devices that
do most of their processing in silicon and lack space or mechanism do most of their processing in silicon and lack space or mechanism
for timestamping. for timestamping.
Also, while perfect resolution is not needed, timestamp resolution Also, while perfect resolution is not needed, timestamp resolution
below the target is necessary. Furthermore, timestamping functions below the target is necessary. Furthermore, timestamping functions
skipping to change at page 16, line 21 skipping to change at page 16, line 48
configured as the default queueing discipline in a number of mainline configured as the default queueing discipline in a number of mainline
Linux distributions (as of this writing at least OpenWRT, Arch Linux Linux distributions (as of this writing at least OpenWRT, Arch Linux
and Fedora). We believe it to be a safe default and encourage people and Fedora). We believe it to be a safe default and encourage people
running Linux to turn it on: It is a massive improvement over the running Linux to turn it on: It is a massive improvement over the
previous default FIFO queue. previous default FIFO queue.
Of course there is always room for improvement, and this document has Of course there is always room for improvement, and this document has
listed some of the know limitations of the algorithm. As such, we listed some of the know limitations of the algorithm. As such, we
encourage further research into algorithm refinements and addressing encourage further research into algorithm refinements and addressing
of limitations. One such effort is undertaken by the bufferbloat of limitations. One such effort is undertaken by the bufferbloat
community in the form of the Cake [1] queue management scheme. In community in the form of the Cake queue management scheme (see
addition to this we believe the following (non-exhaustive) list of http://www.bufferbloat.net/projects/codel/wiki/Cake). In addition to
issues to be worthy of further enquiry: this we believe the following (non-exhaustive) list of issues to be
worthy of further enquiry:
o Variations on the flow classification mechanism to fit different o Variations on the flow classification mechanism to fit different
notions of flows. For instance, an ISP might want to deploy per- notions of flows. For instance, an ISP might want to deploy per-
subscriber scheduling, while in other cases several flows can subscriber scheduling, while in other cases several flows can
share a 5-tuple, as exemplified by the RTCWEB QoS recommendations share a 5-tuple, as exemplified by the RTCWEB QoS recommendations
[RTCWEB-QOS]. [RTCWEB-QOS].
o Interactions between flow queueing and delay-based congestion o Interactions between flow queueing and delay-based congestion
control algorithms and scavenger protocols. control algorithms and scavenger protocols.
skipping to change at page 18, line 5 skipping to change at page 18, line 37
12. References 12. References
12.1. Normative References 12.1. Normative References
[CODELDRAFT] [CODELDRAFT]
Nichols, K., Jacobson, V., McGregor, A., and J. Iyengar, Nichols, K., Jacobson, V., McGregor, A., and J. Iyengar,
"Controlled Delay Active Queue Management", October 2014, "Controlled Delay Active Queue Management", October 2014,
<https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/>. <https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
"Low Extra Delay Background Transport (LEDBAT)", RFC 6817, "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
DOI 10.17487/RFC6817, December 2012, DOI 10.17487/RFC6817, December 2012,
<http://www.rfc-editor.org/info/rfc6817>. <http://www.rfc-editor.org/info/rfc6817>.
[RTCWEB-QOS] [RTCWEB-QOS]
Dhesikan, S., Jennings, C., Druta, D., Jones, P., and J. Dhesikan, S., Jennings, C., Druta, D., Jones, P., and J.
Polk, "DSCP and other packet markings for RTCWeb QoS", Polk, "DSCP and other packet markings for RTCWeb QoS",
December 2014, <https://datatracker.ietf.org/doc/draft- December 2014, <https://datatracker.ietf.org/doc/draft-
dhesikan-tsvwg-rtcweb-qos/>. dhesikan-tsvwg-rtcweb-qos/>.
skipping to change at page 18, line 45 skipping to change at page 19, line 34
low priority congestion control", July 2014, low priority congestion control", July 2014,
<http://perso.telecom-paristech.fr/~drossi/paper/ <http://perso.telecom-paristech.fr/~drossi/paper/
rossi14comnet-b.pdf>. rossi14comnet-b.pdf>.
[SQF2012] Bonald, T., Muscariello, L., and N. Ostallo, "On the [SQF2012] Bonald, T., Muscariello, L., and N. Ostallo, "On the
impact of TCP and per-flow scheduling on Internet impact of TCP and per-flow scheduling on Internet
Performance - IEEE/ACM transactions on Networking", April Performance - IEEE/ACM transactions on Networking", April
2012, <http://perso.telecom- 2012, <http://perso.telecom-
paristech.fr/~bonald/Publications_files/BMO2011.pdf>. paristech.fr/~bonald/Publications_files/BMO2011.pdf>.
12.3. URIs
[1] http://www.bufferbloat.net/projects/codel/wiki/Cake
Authors' Addresses Authors' Addresses
Toke Hoeiland-Joergensen Toke Hoeiland-Joergensen
Karlstad University Karlstad University
Dept. of Computer Science Dept. of Computer Science
Karlstad 65188 Karlstad 65188
Sweden Sweden
Email: toke.hoiland-jorgensen@kau.se Email: toke.hoiland-jorgensen@kau.se
 End of changes. 23 change blocks. 
36 lines changed or deleted 55 lines changed or added

This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/