draft-ietf-aqm-fq-codel-03.txt   draft-ietf-aqm-fq-codel-04.txt 
AQM working group T. Hoeiland-Joergensen AQM working group T. Hoeiland-Joergensen
Internet-Draft Karlstad University Internet-Draft Karlstad University
Intended status: Informational P. McKenney Intended status: Experimental P. McKenney
Expires: May 23, 2016 IBM Linux Technology Center Expires: August 7, 2016 IBM Linux Technology Center
D. Taht D. Taht
Teklibre Teklibre
J. Gettys J. Gettys
E. Dumazet E. Dumazet
Google, Inc. Google, Inc.
November 20, 2015 February 04, 2016
FlowQueue-Codel FlowQueue-Codel
draft-ietf-aqm-fq-codel-03 draft-ietf-aqm-fq-codel-04
Abstract Abstract
This memo presents the FQ-CoDel hybrid packet scheduler/AQM This memo presents the FQ-CoDel hybrid packet scheduler/AQM
algorithm, a powerful tool for fighting bufferbloat and reducing algorithm, a powerful tool for fighting bufferbloat and reducing
latency. latency.
FQ-CoDel mixes packets from multiple flows and reduces the impact of FQ-CoDel mixes packets from multiple flows and reduces the impact of
head of line blocking from bursty traffic. It provides isolation for head of line blocking from bursty traffic. It provides isolation for
low-rate traffic such as DNS, web, and videoconferencing traffic. It low-rate traffic such as DNS, web, and videoconferencing traffic. It
skipping to change at page 1, line 47 skipping to change at page 1, line 47
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 23, 2016. This Internet-Draft will expire on August 7, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 5, line 4 skipping to change at page 5, line 4
CoDel state variables. By default, 1024 queues are created. The CoDel state variables. By default, 1024 queues are created. The
current implementation supports anywhere from one to 64K separate current implementation supports anywhere from one to 64K separate
queues, and each queue maintains the state variables throughout its queues, and each queue maintains the state variables throughout its
lifetime, and so acts the same as the non-FQ CoDel variant would. lifetime, and so acts the same as the non-FQ CoDel variant would.
This means that with only one queue, FQ-CoDel behaves essentially the This means that with only one queue, FQ-CoDel behaves essentially the
same as CoDel by itself. same as CoDel by itself.
On dequeue, FQ-CoDel selects a queue from which to dequeue by a two- On dequeue, FQ-CoDel selects a queue from which to dequeue by a two-
tier round-robin scheme, in which each queue is allowed to dequeue up tier round-robin scheme, in which each queue is allowed to dequeue up
to a configurable quantum of bytes for each iteration. Deviations to a configurable quantum of bytes for each iteration. Deviations
from this quantum is maintained as a deficit for the queue, which from this quantum is maintained as byte credits for the queue, which
serves to make the fairness scheme byte-based rather than a packet- serves to make the fairness scheme byte-based rather than a packet-
based. The two-tier round-robin mechanism distinguishes between based. The two-tier round-robin mechanism distinguishes between
"new" queues (which don't build up a standing queue) and "old" "new" queues (which don't build up a standing queue) and "old"
queues, that have queued enough data to be around for more than one queues, that have queued enough data to be around for more than one
iteration of the round-robin scheduler. iteration of the round-robin scheduler.
This new/old queue distinction has a particular consequence for This new/old queue distinction has a particular consequence for
queues that don't build up more than a quantum of bytes before being queues that don't build up more than a quantum of bytes before being
visited by the scheduler: Such queues are removed from the list, and visited by the scheduler: Such queues are removed from the list, and
then re-added as a new queue each time a packet arrives for it, and then re-added as a new queue each time a packet arrives for it, and
skipping to change at page 6, line 21 skipping to change at page 6, line 21
By default, the flow hashing is performed on the 5-tuple of source By default, the flow hashing is performed on the 5-tuple of source
and destination IP and port numbers and IP protocol number. While and destination IP and port numbers and IP protocol number. While
the hashing can be customised to match on arbitrary packet bytes, the hashing can be customised to match on arbitrary packet bytes,
care should be taken when doing so: Much of the benefit of the FQ- care should be taken when doing so: Much of the benefit of the FQ-
CoDel scheduler comes from this per-flow distinction. However, the CoDel scheduler comes from this per-flow distinction. However, the
default hashing does have some limitations, as discussed in default hashing does have some limitations, as discussed in
Section 6. Section 6.
FQ-CoDel's DRR scheduler is byte-based, employing a deficit round- FQ-CoDel's DRR scheduler is byte-based, employing a deficit round-
robin mechanism between queues. This works by keeping track of the robin mechanism between queues. This works by keeping track of the
current byte _deficit_ of each queue. This deficit is initialised to current number _byte credits_ of each queue. This number is is
the configurable quantum; each time a queue gets a dequeue initialised to the configurable quantum; each time a queue gets a
opportunity, it gets to dequeue packets, decreasing the deficit by dequeue opportunity, it gets to dequeue packets, decreasing the
the packet size for each packet, until the deficit runs into the number of credits by the packet size for each packet. This continues
negative, at which point it is increased by one quantum, and the until the number of credits runs into the negative, at which point it
dequeue opportunity ends. is increased by one quantum, and the dequeue opportunity ends.
This means that if one queue contains packets of, for instance, size This means that if one queue contains packets of, for instance, size
quantum/3, and another contains quantum-sized packets, the first quantum/3, and another contains quantum-sized packets, the first
queue will dequeue three packets each time it gets a turn, whereas queue will dequeue three packets each time it gets a turn, whereas
the second only dequeues one. This means that flows that send small the second only dequeues one. This means that flows that send small
packets are not penalised by the difference in packet sizes; rather, packets are not penalised by the difference in packet sizes; rather,
the DRR scheme approximates a (single-)byte-based fairness queueing. the DRR scheme approximates a (single-)byte-based fairness queueing.
The size of the quantum determines the scheduling granularity, with The size of the quantum determines the scheduling granularity, with
the tradeoff from too small a quantum being scheduling overhead. For the tradeoff from too small a quantum being scheduling overhead. For
small bandwidths, lowering the quantum from the default MTU size can small bandwidths, lowering the quantum from the default MTU size can
skipping to change at page 7, line 27 skipping to change at page 7, line 27
destination IP and port numbers, permuted by a random value selected destination IP and port numbers, permuted by a random value selected
at initialisation time, and taking the hash value modulo the number at initialisation time, and taking the hash value modulo the number
of queues. of queues.
Once the packet has been successfully classified into a queue, it is Once the packet has been successfully classified into a queue, it is
handed over to the CoDel algorithm for timestamping. It is then handed over to the CoDel algorithm for timestamping. It is then
added to the tail of the selected queue, and the queue's byte count added to the tail of the selected queue, and the queue's byte count
is updated by the packet size. Then, if the queue is not currently is updated by the packet size. Then, if the queue is not currently
active (i.e. if it is not in either the list of new or the list of active (i.e. if it is not in either the list of new or the list of
old queues), it is added to the end of the list of new queues, and old queues), it is added to the end of the list of new queues, and
its deficit is initiated to the configured quantum. Otherwise, the its number of credits is initiated to the configured quantum.
queue is left in its current queue list. Otherwise, the queue is left in its current queue list.
Finally, the total number of enqueued packets is compared with the Finally, the total number of enqueued packets is compared with the
configured limit, and if it is _above_ this value (which can happen configured limit, and if it is _above_ this value (which can happen
since a packet was just enqueued), a packet is dropped from the head since a packet was just enqueued), a packet is dropped from the head
of the queue with the largest current byte count. Note that this in of the queue with the largest current byte count. Note that this in
most cases means that the packet that gets dropped is different from most cases means that the packet that gets dropped is different from
the one that was just enqueued, and may even be from a different the one that was just enqueued, and may even be from a different
queue. queue.
4.1.1. Alternative classification schemes 4.1.1. Alternative classification schemes
skipping to change at page 8, line 18 skipping to change at page 8, line 18
4.2. Dequeue 4.2. Dequeue
Most of FQ-CoDel's work is done at packet dequeue time. It consists Most of FQ-CoDel's work is done at packet dequeue time. It consists
of three parts: selecting a queue from which to dequeue a packet, of three parts: selecting a queue from which to dequeue a packet,
actually dequeuing it (employing the CoDel algorithm in the process), actually dequeuing it (employing the CoDel algorithm in the process),
and some final bookkeeping. and some final bookkeeping.
For the first part, the scheduler first looks at the list of new For the first part, the scheduler first looks at the list of new
queues; for the queue at the head of that list, if that queue has a queues; for the queue at the head of that list, if that queue has a
negative deficit (i.e. it has already dequeued at least a quantum of negative number of credits (i.e. it has already dequeued at least a
bytes), its deficit is increased by one quantum, and the queue is put quantum of bytes), it is given an additional quantum of credits, the
onto _the end of_ the list of old queues, and the routine selects the queue is put onto _the end of_ the list of old queues, and the
next queue and starts again. routine selects the next queue and starts again.
Otherwise, that queue is selected for dequeue. If the list of new Otherwise, that queue is selected for dequeue. If the list of new
queues is empty, the scheduler proceeds down the list of old queues queues is empty, the scheduler proceeds down the list of old queues
in the same fashion (checking the deficit, and either selecting the in the same fashion (checking the credits, and either selecting the
queue for dequeuing, or increasing the deficit and putting the queue queue for dequeuing, or adding credits and putting the queue back at
back at the end of the list). the end of the list).
After having selected a queue from which to dequeue a packet, the After having selected a queue from which to dequeue a packet, the
CoDel algorithm is invoked on that queue. This applies the CoDel CoDel algorithm is invoked on that queue. This applies the CoDel
control law, and may discard one or more packets from the head of control law, and may discard one or more packets from the head of
that queue, before returning the packet that should be dequeued (or that queue, before returning the packet that should be dequeued (or
nothing if the queue is or becomes empty while being handled by the nothing if the queue is or becomes empty while being handled by the
CoDel algorithm). CoDel algorithm).
Finally, if the CoDel algorithm does not return a packet, then the Finally, if the CoDel algorithm does not return a packet, then the
queue must be empty, and the scheduler does one of two things: if the queue must be empty, and the scheduler does one of two things: if the
queue selected for dequeue came from the list of new queues, it is queue selected for dequeue came from the list of new queues, it is
moved to _the end of_ the list of old queues. If instead it came moved to _the end of_ the list of old queues. If instead it came
from the list of old queues, that queue is removed from the list, to from the list of old queues, that queue is removed from the list, to
be added back (as a new queue) the next time a packet arrives that be added back (as a new queue) the next time a packet arrives that
hashes to that queue. Then (since no packet was available for hashes to that queue. Then (since no packet was available for
dequeue), the whole dequeue process is restarted from the beginning. dequeue), the whole dequeue process is restarted from the beginning.
If, instead, the scheduler _did_ get a packet back from the CoDel If, instead, the scheduler _did_ get a packet back from the CoDel
algorithm, it updates the byte deficit for the selected queue and algorithm, it subtracts the size of the packet from the the byte
moves it to the end of the list of old queues, before returning the credits for the selected queue and returns the packet as the result
packet as the result of the dequeue operation. of the dequeue operation.
The step that moves an empty queue from the list of new queues to The step that moves an empty queue from the list of new queues to
_the end of_ the list of old queues before it is removed is crucial _the end of_ the list of old queues before it is removed is crucial
to prevent starvation. Otherwise the queue could reappear (the next to prevent starvation. Otherwise the queue could reappear (the next
time a packet arrives for it) before the list of old queues is time a packet arrives for it) before the list of old queues is
visited; this can go on indefinitely even with a small number of visited; this can go on indefinitely even with a small number of
active flows, if the flow providing packets to the queue in question active flows, if the flow providing packets to the queue in question
transmits at just the right rate. This is prevented by first moving transmits at just the right rate. This is prevented by first moving
the queue to _the end of_ the list of old queues, forcing a pass the queue to _the end of_ the list of old queues, forcing a pass
through that, and thus preventing starvation. Moving it to the end through that, and thus preventing starvation. Moving it to the end
of the list, rather than the front, is crucial for this to work. of the list, rather than the front, is crucial for this to work.
The resulting migration of queues between the different states is The resulting migration of queues between the different states is
summarised in the following state diagram: summarised in the following state diagram:
+-----------------+ +------------------+ +-----------------+ +------------------+
| | Empty | | | | Empty | |
| Empty |<---------------+ Old +-----+ | Empty |<---------------+ Old +----+
| | | | | | | | | |
+-------+---------+ +------------------+ | +-------+---------+ +------------------+ |
| ^ ^ |Quantum | ^ ^ |Credits
|Arrival | | |Exceeded |Arrival | | |Exhausted
v | | | v | | |
+-----------------+ | | | +-----------------+ | | |
| | Empty or | | | | | Empty or | | |
| New +-------------------+ +--------+ | New +-------------------+ +-------+
| | Quantum exceeded | | Credits exhausted
+-----------------+ +-----------------+
5. Implementation considerations 5. Implementation considerations
This section contains implementation details for the FQ-CoDel This section contains implementation details for the FQ-CoDel
algorithm. This includes the data structures and parameters used in algorithm. This includes the data structures and parameters used in
the Linux implementation, as well as discussion of some required the Linux implementation, as well as discussion of some required
features of the target platform and other considerations. features of the target platform and other considerations.
5.1. Data structures 5.1. Data structures
The main data structure of FQ-CoDel is the array of queues, which is The main data structure of FQ-CoDel is the array of queues, which is
instantiated with the number of queues specified by the _flows_ instantiated with the number of queues specified by the _flows_
parameter at instantiation time. Each queue consists simply of an parameter at instantiation time. Each queue consists simply of an
ordered list of packets with FIFO semantics, two state variables ordered list of packets with FIFO semantics, two state variables
tracking the queue deficit and total number of bytes enqueued, and tracking the queue credits and total number of bytes enqueued, and
the set of CoDel state variables. Other state variables to track the set of CoDel state variables. Other state variables to track
queue statistics can also be included: for instance, the Linux queue statistics can also be included: for instance, the Linux
implementation keeps a count of dropped packets. implementation keeps a count of dropped packets.
In addition to the queue structures themselves, FQ-CoDel maintains In addition to the queue structures themselves, FQ-CoDel maintains
two ordered lists containing references to the subset of queues that two ordered lists containing references to the subset of queues that
are currently active. These are the list of 'new' queues and the are currently active. These are the list of 'new' queues and the
list of 'old' queues, as explained in Section 4 above. list of 'old' queues, as explained in Section 4 above.
In the Linux implementation, queue space is shared: there's a global In the Linux implementation, queue space is shared: there's a global
skipping to change at page 11, line 48 skipping to change at page 11, line 48
otherwise affect the performance of the system. otherwise affect the performance of the system.
It can be disabled by specifying the _noecn_ parameter. It can be disabled by specifying the _noecn_ parameter.
5.2.7. CE threshold 5.2.7. CE threshold
This parameter enables DCTCP-like processing to enable CE marking ECT This parameter enables DCTCP-like processing to enable CE marking ECT
packets at a lower setpoint than the default codel target. packets at a lower setpoint than the default codel target.
The parameter, _ce_threshold_, is disabled by default and can be set The parameter, _ce_threshold_, is disabled by default and can be set
to a number of usec to enable. to a number of microseconds to enable.
5.3. Probability of hash collisions 5.3. Probability of hash collisions
Since the Linux FQ-CoDel implementation by default uses 1024 hash Since the Linux FQ-CoDel implementation by default uses 1024 hash
buckets, the probability that (say) 100 flows will all hash to the buckets, the probability that (say) 100 flows will all hash to the
same bucket is something like ten to the power of minus 300. Thus, same bucket is something like ten to the power of minus 300. Thus,
the probability that at least one of the flows will hash to some the probability that at least one of the flows will hash to some
other queue is very high indeed. other queue is very high indeed.
Expanding on this, based on analytical equations for hash collision Expanding on this, based on analytical equations for hash collision
skipping to change at page 12, line 47 skipping to change at page 12, line 47
u16 rec_inv_sqrt; u16 rec_inv_sqrt;
codel_time_t first_above_time; codel_time_t first_above_time;
codel_time_t drop_next; codel_time_t drop_next;
codel_time_t ldelay; codel_time_t ldelay;
}; };
struct fq_codel_flow { struct fq_codel_flow {
struct sk_buff *head; struct sk_buff *head;
struct sk_buff *tail; struct sk_buff *tail;
struct list_head flowchain; struct list_head flowchain;
int deficit; int credits; /* current number of queue credits */
u32 dropped; /* # of drops (or ECN marks) on flow */ u32 dropped; /* # of drops (or ECN marks) on flow */
struct codel_vars cvars; struct codel_vars cvars;
}; };
The master table managing all queues looks like this: The master table managing all queues looks like this:
struct fq_codel_sched_data { struct fq_codel_sched_data {
struct tcf_proto *filter_list; /* optional external classifier */ struct tcf_proto *filter_list; /* optional external classifier */
struct fq_codel_flow *flows; /* Flows table [flows_cnt] */ struct fq_codel_flow *flows; /* Flows table [flows_cnt] */
u32 *backlogs; /* backlog table [flows_cnt] */ u32 *backlogs; /* backlog table [flows_cnt] */
skipping to change at page 13, line 52 skipping to change at page 13, line 52
5.7. Differences between CoDel and FQ-CoDel behaviour 5.7. Differences between CoDel and FQ-CoDel behaviour
CoDel can be applied to a single queue system as a straight AQM, CoDel can be applied to a single queue system as a straight AQM,
where it converges towards an "ideal" drop rate (i.e. one that where it converges towards an "ideal" drop rate (i.e. one that
minimises delay while keeping a high link utilisation), and then minimises delay while keeping a high link utilisation), and then
optimises around that control point. optimises around that control point.
The scheduling of FQ-CoDel mixes packets of competing flows, which The scheduling of FQ-CoDel mixes packets of competing flows, which
acts to pace bursty flows to better fill the pipe. Additionally, a acts to pace bursty flows to better fill the pipe. Additionally, a
new flow gets substantial "credit" over other flows until CoDel finds new flow gets substantial leeway over other flows until CoDel finds
an ideal drop rate for it. However, for a new flow that exceeds the an ideal drop rate for it. However, for a new flow that exceeds the
configured quantum, more time passes before all of its data is configured quantum, more time passes before all of its data is
delivered (as packets from it, too, are mixed across the other delivered (as packets from it, too, are mixed across the other
existing queue-building flows). Thus, FQ-CoDel takes longer (as existing queue-building flows). Thus, FQ-CoDel takes longer (as
measured in time) to converge towards an ideal drop rate for a given measured in time) to converge towards an ideal drop rate for a given
new flow, but does so within fewer delivered _packets_ from that new flow, but does so within fewer delivered _packets_ from that
flow. flow.
Finally, the flow isolation FQ-CoDel provides means that the CoDel Finally, the flow isolation FQ-CoDel provides means that the CoDel
drop mechanism operates on the flows actually building queues, which drop mechanism operates on the flows actually building queues, which
 End of changes. 16 change blocks. 
39 lines changed or deleted 39 lines changed or added

This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/