draft-ietf-rmcat-sbd-00.txt   draft-ietf-rmcat-sbd-01.txt 
RTP Media Congestion Avoidance D. Hayes, Ed. RTP Media Congestion Avoidance D. Hayes, Ed.
Techniques University of Oslo Techniques University of Oslo
Internet-Draft S. Ferlin Internet-Draft S. Ferlin
Intended status: Experimental Simula Research Laboratory Intended status: Experimental Simula Research Laboratory
Expires: November 9, 2015 M. Welzl Expires: January 2, 2016 M. Welzl
University of Oslo University of Oslo
May 8, 2015 July 1, 2015
Shared Bottleneck Detection for Coupled Congestion Control for RTP Shared Bottleneck Detection for Coupled Congestion Control for RTP
Media. Media.
draft-ietf-rmcat-sbd-00 draft-ietf-rmcat-sbd-01
Abstract Abstract
This document describes a mechanism to detect whether end-to-end data This document describes a mechanism to detect whether end-to-end data
flows share a common bottleneck. It relies on summary statistics flows share a common bottleneck. It relies on summary statistics
that are calculated by a data receiver based on continuous that are calculated by a data receiver based on continuous
measurements and regularly fed to a grouping algorithm that runs measurements and regularly fed to a grouping algorithm that runs
wherever the knowledge is needed. This mechanism complements the wherever the knowledge is needed. This mechanism complements the
coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc. coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc.
skipping to change at page 1, line 39 skipping to change at page 1, line 39
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 9, 2015. This Internet-Draft will expire on January 2, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 18 skipping to change at page 2, line 18
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. The signals . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. The signals . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1. Packet Loss . . . . . . . . . . . . . . . . . . . . . 3 1.1.1. Packet Loss . . . . . . . . . . . . . . . . . . . . . 3
1.1.2. Packet Delay . . . . . . . . . . . . . . . . . . . . . 3 1.1.2. Packet Delay . . . . . . . . . . . . . . . . . . . . . 3
1.1.3. Path Lag . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.3. Path Lag . . . . . . . . . . . . . . . . . . . . . . . 4
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Parameter Values . . . . . . . . . . . . . . . . . . . . . 5 2.1. Parameters and their Effect . . . . . . . . . . . . . . . 5
3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2. Recommended Parameter Values . . . . . . . . . . . . . . . 7
3.1. Key metrics and their calculation . . . . . . . . . . . . 7 3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1.1. Mean delay . . . . . . . . . . . . . . . . . . . . . . 7 3.1. Key metrics and their calculation . . . . . . . . . . . . 9
3.1.2. Skewness Estimate . . . . . . . . . . . . . . . . . . 8 3.1.1. Mean delay . . . . . . . . . . . . . . . . . . . . . . 9
3.1.3. Variance Estimate . . . . . . . . . . . . . . . . . . 9 3.1.2. Skewness Estimate . . . . . . . . . . . . . . . . . . 9
3.1.4. Oscillation Estimate . . . . . . . . . . . . . . . . . 9 3.1.3. Variability Estimate . . . . . . . . . . . . . . . . . 10
3.1.5. Packet loss . . . . . . . . . . . . . . . . . . . . . 10 3.1.4. Oscillation Estimate . . . . . . . . . . . . . . . . . 11
3.2. Flow Grouping . . . . . . . . . . . . . . . . . . . . . . 10 3.1.5. Packet loss . . . . . . . . . . . . . . . . . . . . . 11
3.2.1. Flow Grouping Algorithm . . . . . . . . . . . . . . . 10 3.2. Flow Grouping . . . . . . . . . . . . . . . . . . . . . . 12
3.2.2. Using the flow group signal . . . . . . . . . . . . . 12 3.2.1. Flow Grouping Algorithm . . . . . . . . . . . . . . . 12
3.3. Removing Noise from the Estimates . . . . . . . . . . . . 12 3.2.2. Using the flow group signal . . . . . . . . . . . . . 13
3.3.1. Oscillation noise . . . . . . . . . . . . . . . . . . 12 3.3. Removing Noise from the Estimates . . . . . . . . . . . . 13
3.3.2. Clock drift . . . . . . . . . . . . . . . . . . . . . 13 3.3.1. PDV noise . . . . . . . . . . . . . . . . . . . . . . 14
3.3.3. Bias in the skewness measure . . . . . . . . . . . . . 14 3.3.2. Oscillation noise . . . . . . . . . . . . . . . . . . 14
3.4. Reducing lag and Improving Responsiveness . . . . . . . . 14 3.3.3. Clock skew . . . . . . . . . . . . . . . . . . . . . . 15
3.4.1. Improving the response of the skewness estimate . . . 15 3.4. Reducing lag and Improving Responsiveness . . . . . . . . 15
3.4.2. Improving the response of the variance estimate . . . 15 3.4.1. Improving the response of the skewness estimate . . . 16
4. Measuring OWD . . . . . . . . . . . . . . . . . . . . . . . . 16 3.4.2. Improving the response of the variability estimate . . 16
4.1. Time stamp resolution . . . . . . . . . . . . . . . . . . 16 4. Measuring OWD . . . . . . . . . . . . . . . . . . . . . . . . 17
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16 4.1. Time stamp resolution . . . . . . . . . . . . . . . . . . 17
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
7. Security Considerations . . . . . . . . . . . . . . . . . . . 16 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17
8. Change history . . . . . . . . . . . . . . . . . . . . . . . . 17 7. Security Considerations . . . . . . . . . . . . . . . . . . . 17
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 8. Change history . . . . . . . . . . . . . . . . . . . . . . . . 18
9.1. Normative References . . . . . . . . . . . . . . . . . . . 17 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
9.2. Informative References . . . . . . . . . . . . . . . . . . 17 9.1. Normative References . . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 9.2. Informative References . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19
1. Introduction 1. Introduction
In the Internet, it is not normally known if flows (e.g., TCP In the Internet, it is not normally known if flows (e.g., TCP
connections or UDP data streams) traverse the same bottlenecks. Even connections or UDP data streams) traverse the same bottlenecks. Even
flows that have the same sender and receiver may take different paths flows that have the same sender and receiver may take different paths
and share a bottleneck or not. Flows that share a bottleneck link and share a bottleneck or not. Flows that share a bottleneck link
usually compete with one another for their share of the capacity. usually compete with one another for their share of the capacity.
This competition has the potential to increase packet loss and This competition has the potential to increase packet loss and
delays. This is especially relevant for interactive applications delays. This is especially relevant for interactive applications
skipping to change at page 3, line 47 skipping to change at page 3, line 47
End-to-end delay measurements include noise from every device along End-to-end delay measurements include noise from every device along
the path in addition to the delay perturbation at the bottleneck the path in addition to the delay perturbation at the bottleneck
device. The noise is often significantly increased if the round-trip device. The noise is often significantly increased if the round-trip
time is used. The cleanest signal is obtained by using One-Way-Delay time is used. The cleanest signal is obtained by using One-Way-Delay
(OWD). (OWD).
Measuring absolute OWD is difficult since it requires both the sender Measuring absolute OWD is difficult since it requires both the sender
and receiver clocks to be synchronised. However, since the and receiver clocks to be synchronised. However, since the
statistics being collected are relative to the mean OWD, a relative statistics being collected are relative to the mean OWD, a relative
OWD measurement is sufficient. Clock drift is not usually OWD measurement is sufficient. Clock skew is not usually significant
significant over the time intervals used by this SBD mechanism (see over the time intervals used by this SBD mechanism (see [RFC6817] A.2
[RFC6817] A.2 for a discussion on clock drift and OWD measurements). for a discussion on clock skew and OWD measurements). However, in
However, in circumstances where it is significant, Section 3.3.2 circumstances where it is significant, Section 3.3.3 outlines a way
outlines a way of adjusting the calculations to cater for it. of adjusting the calculations to cater for it.
Each packet arriving at the bottleneck buffer may experience very Each packet arriving at the bottleneck buffer may experience very
different queue lengths, and therefore different waiting times. A different queue lengths, and therefore different waiting times. A
single OWD sample does not, therefore, characterize the path well. single OWD sample does not, therefore, characterize the path well.
However, multiple OWD measurements do reflect the distribution of However, multiple OWD measurements do reflect the distribution of
delays experienced at the bottleneck. delays experienced at the bottleneck.
1.1.3. Path Lag 1.1.3. Path Lag
Flows that share a common bottleneck may traverse different paths, Flows that share a common bottleneck may traverse different paths,
skipping to change at page 4, line 31 skipping to change at page 4, line 31
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
Acronyms used in this document: Acronyms used in this document:
OWD -- One Way Delay OWD -- One Way Delay
PDV -- Packet Delay Variation PDV -- Packet Delay Variation
MAD -- Mean Absolute Deviation
RTT -- Round Trip Time RTT -- Round Trip Time
SBD -- Shared Bottleneck Detection SBD -- Shared Bottleneck Detection
Conventions used in this document: Conventions used in this document:
T -- the base time interval over which measurements are T -- the base time interval over which measurements are
made. made.
N -- the number of base time, T, intervals used in some N -- the number of base time, T, intervals used in some
calculations. calculations.
sum_T(...) -- summation of all the measurements of the variable sum_T(...) -- summation of all the measurements of the variable
in parentheses taken over the interval T in parentheses taken over the interval T
sum(...) -- summation of terms of the variable in parentheses
sum(...) -- summation of terms of the variable in parentheses
sum_N(...) -- summation of N terms of the variable in parentheses sum_N(...) -- summation of N terms of the variable in parentheses
sum_NT(...) -- summation of all measurements taken over the
interval N*T
E_T(...) -- the expectation or mean of the measurements of the sum_NT(...) -- summation of all measurements taken over the
variable in parentheses over T interval N*T
E_N(...) -- The expectation or mean of the last N values of the E_T(...) -- the expectation or mean of the measurements of the
variable in parentheses variable in parentheses over T
E_M(...) -- The expectation or mean of the last M values of the E_N(...) -- the expectation or mean of the last N values of the
variable in parentheses, where M <= N. variable in parentheses
E_M(...) -- the expectation or mean of the last M values of the
variable in parentheses, where M <= N.
max_T(...) -- the maximum recorded measurement of the variable in max_T(...) -- the maximum recorded measurement of the variable in
parentheses taken over the interval T parentheses taken over the interval T
min_T(...) -- the minimum recorded measurement of the variable in min_T(...) -- the minimum recorded measurement of the variable in
parentheses taken over the interval T parentheses taken over the interval T
num_T(...) -- the count of measurements of the variable in num_T(...) -- the count of measurements of the variable in
parentheses taken in the interval T parentheses taken in the interval T
num_VM(...) -- the count of valid values of the variable in num_VM(...) -- the count of valid values of the variable in
parentheses given M records parentheses given M records
PC -- a boolean variable indicating the particular flow was PC -- a boolean variable indicating the particular flow
identified as experiencing congestion in the previous was identified as experiencing congestion in the
interval T (i.e. Previously Congested) previous interval T (i.e. Previously Congested)
CD_T -- an estimate of the effect of Clock Drift on the mean skew_est -- a measure of skewness in a OWD distribution.
OWD per T
CD_Adj(...) -- Mean OWD adjusted for clock drift var_est -- a measure of variability in OWD measurements.
p_l, p_f, p_pdv, c_s, c_h, p_s, p_d, p_v -- various thresholds freq_est -- a measure of low frequency oscillation in the OWD
used in the mechanism. measurements.
N, M, and F -- number of values (calculated over T). p_l, p_f, p_pdv, p_mad, c_s, c_h, p_s, p_d, p_v -- various
thresholds used in the mechanism
2.1. Parameter Values M and F -- number of values related to N
2.1. Parameters and their Effect
T T should be long enough so that there are enough packets
received during T for a useful estimate of short term mean
OWD and variation statistics. Making T too large can limit
the efficacy of PDV and freq_est. It will also increase the
response time of the mechanism. Making T too small will make
the metrics noisier.
N & M N should be large enough provide a stable estimate of
oscillations in OWD and average PDV. Usually M=N, though
having M<N may be beneficial in certain circumstances. M*T
needs to be long enough provide stable estimates of skewness
and MAD (if used).
F F determines the number of intervals over which statistics
are considered to be equally weighted. When F=M recent and
older measurements are considered equal. Making F<M can
increase the responsiveness of the SBD mechanism. If F is
too small, statistics will be too noisy.
c_s c_s is the threshold in skew_est used for determining whether
a flow is experiencing congestion or not. It should be
slightly negative so that a very lightly loaded path does not
give a false indication. Setting c_s more negative makes the
SBD mechanism less sensitive to transient and light
congestion episodes.
c_s c_h adds hysteresis to the congestion determination. It
should be large enough to avoid constant switching in the
determination, but low enough to ensure that grouping is not
attempted when there is no congestion and the delay and loss
signals cannot be relied upon.
p_v p_v determines the sensitivity of freq_est to noise. Making
it smaller will yield higher but noisier values for freq_est.
Making it too large will render it ineffective for
determining groups.
p_* Flows are separated when the skew_est|var_est|freq_est
measure is greater than p_s|p_f|p_d|(p_pdv|p_mad). Adjusting
these is a compromise between false grouping of flows that do
not share a bottleneck and false splitting of flows that do.
Making them larger can help if the measures are very noisy,
but reducing the noise in the statistical measures by
adjusting T and N|M may be a better solution.
2.2. Recommended Parameter Values
Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1. The other Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1. The other
parameters have been tightened to reflect minor enhancements to the parameters have been tightened to reflect minor enhancements to the
algorithm outlined in Section 3.3: c_s = -0.01, p_f = p_s = p_d = algorithm outlined in Section 3.3: c_s = -0.01, p_f = p_s = p_d =
0.1, p_pdv = 0.2, p_v = 0.2. M=50, F=10, and c_h = 0.3 are 0.1, p_pdv = 0.2, p_v = 0.2 (or p_mad=0.1, p_v=0.7). M=50, F=25, and
additional parameters defined in the document. These are values that c_h = 0.3 are additional parameters defined in the document. These
seem to work well over a wide range of practical Internet conditions, are values that seem to work well over a wide range of practical
but are the subject of ongoing tests. Internet conditions.
3. Mechanism 3. Mechanism
The mechanism described in this document is based on the observation The mechanism described in this document is based on the observation
that the distribution of delay measurements of packets from flows that the distribution of delay measurements of packets that traverse
that share a common bottleneck have similar shape characteristics. a common bottleneck have similar shape characteristics. These shape
These shape characteristics are described using 3 key summary characteristics are described using 3 key summary statistics:
statistics:
variance (estimate var_est, see Section 3.1.3) variability (estimate var_est, see Section 3.1.3)
skewness (estimate skew_est, see Section 3.1.2) skewness (estimate skew_est, see Section 3.1.2)
oscillation (estimate freq_est, see Section 3.1.4) oscillation (estimate freq_est, see Section 3.1.4)
with packet loss (estimate pkt_loss, see Section 3.1.5) used as a with packet loss (estimate pkt_loss, see Section 3.1.5) used as a
supplementary statistic. supplementary statistic.
Summary statistics help to address both the noise and the path lag Summary statistics help to address both the noise and the path lag
problems by describing the general shape over a relatively long problems by describing the general shape over a relatively long
period of time. This is sufficient for their application in coupled period of time. This is sufficient for their application in coupled
congestion control for RTP Media. They can be signalled from a congestion control for RTP Media. They can be signalled from a
receiver, which measures the OWD and calculates the summary receiver, which measures the OWD and calculates the summary
statistics, to a sender, which is the entity that is transmitting the statistics, to a sender, which is the entity that is transmitting the
media stream. An RTP Media device may be both a sender and a media stream. An RTP Media device may be both a sender and a
receiver. SBD can be performed at either Sender or receiver or both. receiver. SBD can be performed at either a sender or a receiver or
both.
+----+ +----+
| H2 | | H2 |
+----+ +----+
| |
| L2 | L2
| |
+----+ L1 | L3 +----+ +----+ L1 | L3 +----+
| H1 |------|------| H3 | | H1 |------|------| H3 |
+----+ +----+ +----+ +----+
skipping to change at page 7, line 50 skipping to change at page 9, line 28
necessarily be synchronized. However, it is a base measure for the 3 necessarily be synchronized. However, it is a base measure for the 3
summary statistics. The mean delay, E_T(OWD), is the average one way summary statistics. The mean delay, E_T(OWD), is the average one way
delay measured over T. delay measured over T.
To facilitate the other calculations, the last N E_T(OWD) values will To facilitate the other calculations, the last N E_T(OWD) values will
need to be stored in a cyclic buffer along with the moving average of need to be stored in a cyclic buffer along with the moving average of
E_T(OWD): E_T(OWD):
mean_delay = E_M(E_T(OWD)) = sum_M(E_T(OWD)) / M mean_delay = E_M(E_T(OWD)) = sum_M(E_T(OWD)) / M
where M <= N. Generally M=N, setting M to be less than N allows the where M <= N. Generally M=N: setting M to be less than N allows the
mechanism to be more responsive to changes, but potentially at the mechanism to be more responsive to changes, but potentially at the
expense of a higher error rate (see Section 3.4 for a discussion on expense of a higher error rate (see Section 3.4 for a discussion on
improving the responsiveness of the mechanism.) improving the responsiveness of the mechanism.)
3.1.2. Skewness Estimate 3.1.2. Skewness Estimate
Skewness is difficult to calculate efficiently and accurately. Skewness is difficult to calculate efficiently and accurately.
Ideally it should be calculated over the entire period (M * T) from Ideally it should be calculated over the entire period (M * T) from
the mean OWD over that period. However this would require storing the mean OWD over that period. However this would require storing
every delay measurement over the period. Instead, an estimate is every delay measurement over the period. Instead, an estimate is
made over T using the previous calculation of mean_delay. made over M * T based on a calculation every T using the previous T's
Comparisons are made using the mean of M skew estimates (an calculation of mean_delay.
alternative that removes bias in the mean is given in Section 3.3.3).
The skewness is estimated using two counters, counting the number of The skewness is estimated using two counters, counting the number of
one way delay samples (OWD) above and below the mean: one way delay samples (OWD) above and below the mean:
skew_est_T = (sum_T(OWD < mean_delay) skew_base_T = sum_T(OWD < mean_delay) - sum_T(OWD > mean_delay)
- sum_T(OWD > mean_delay)) / num_T(OWD)
where where
if (OWD < mean_delay) 1 else 0 if (OWD < mean_delay) 1 else 0
if (OWD > mean_delay) 1 else 0 if (OWD > mean_delay) 1 else 0
skew_est_T is a number between -1 and 1 and mean_delay does not include the mean of the current T
interval.
skew_est = E_M(skew_est_T) = sum_M(skew_est_T) / M skew_est = sum_MT(skew_base_T)/num_MT(OWD)
For implementation ease, mean_delay does not include the mean of the where skew_est is a number between -1 and 1
current T interval.
Note: Care must be taken when implementing the comparisons to ensure Note: Care must be taken when implementing the comparisons to ensure
that rounding does not bias skew_est. It is important that the mean that rounding does not bias skew_est. It is important that the mean
is calculated with a higher precision than the samples. is calculated with a higher precision than the samples.
3.1.3. Variance Estimate 3.1.3. Variability Estimate
Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as
an estimator of the variance of the delay signal. We define PDV as an estimator of the variability of the delay signal. We define PDV
follows: as follows:
PDV = PDV_max = max_T(OWD) - E_T(OWD) PDV = PDV_max = max_T(OWD) - E_T(OWD)
var_est = E_M(PDV) = sum_M(PDV) / M var_est = E_M(PDV) = sum_M(PDV) / M
This modifies PDV as outlined in [RFC5481] to provide a summary This modifies PDV as outlined in [RFC5481] to provide a summary
statistic version that best aids the grouping decisions of the statistic version that best aids the grouping decisions of the
algorithm (see [Hayes-LCN14] section IVB). algorithm (see [Hayes-LCN14] section IVB).
The use of PDV = PDV_min = E_T(OWD) - min_T(OWD) is currently being Generally the maximum is sampled well during congestion, though it is
investigated as an alternative that is less sensitive to noise. The more sensitive to path and operating system noise. The use of PDV =
drawback of using PDV_min is that it does not distinguish between PDV_min = E_T(OWD) - min_T(OWD) would be less sensitive to this
groups of flows with similar values of skew_est as well as PDV_max noise, but is not well sampled during congestion at the bottleneck
(see [Hayes-LCN14] section IVB). and therefore not recommended.
3.1.4. Oscillation Estimate 3.1.4. Oscillation Estimate
An estimate of the low frequency oscillation of the delay signal is An estimate of the low frequency oscillation of the delay signal is
calculated by counting and normalising the significant mean, calculated by counting and normalising the significant mean,
E_T(OWD), crossings of mean_delay: E_T(OWD), crossings of mean_delay:
freq_est = number_of_crossings / N freq_est = number_of_crossings / N
Where where we define a significant mean crossing as a crossing that
we define a significant mean crossing as a crossing that
extends p_v * var_est from mean_delay. In our experiments we extends p_v * var_est from mean_delay. In our experiments we
have found that p_v = 0.2 is a good value. have found that p_v = 0.2 is a good value.
Freq_est is a number between 0 and 1. Freq_est can be approximated Freq_est is a number between 0 and 1. Freq_est can be approximated
incrementally as follows: incrementally as follows:
With each new calculation of E_T(OWD) a decision is made as to With each new calculation of E_T(OWD) a decision is made as to
whether this value of E_T(OWD) significantly crosses the current whether this value of E_T(OWD) significantly crosses the current
long term mean, mean_delay, with respect to the previous long term mean, mean_delay, with respect to the previous
significant mean crossing. significant mean crossing.
A cyclic buffer, last_N_crossings, records a 1 if there is a A cyclic buffer, last_N_crossings, records a 1 if there is a
significant mean crossing, otherwise a 0. significant mean crossing, otherwise a 0.
The counter, number_of_crossings, is incremented when there is a The counter, number_of_crossings, is incremented when there is a
significant mean crossing and subtracted from when a non-zero significant mean crossing and decremented when a non-zero value is
value is removed from the last_N_crossings. removed from the last_N_crossings.
This approximation of freq_est was not used in [Hayes-LCN14], which This approximation of freq_est was not used in [Hayes-LCN14], which
calculated freq_est every T using the current E_N(E_T(OWD)). Our calculated freq_est every T using the current E_N(E_T(OWD)). Our
tests show that this approximation of freq_est yields results that tests show that this approximation of freq_est yields results that
are almost identical to when the full calculation is performed every are almost identical to when the full calculation is performed every
T. T.
3.1.5. Packet loss 3.1.5. Packet loss
The proportion of packets lost is used as a supplementary measure: The proportion of packets lost is used as a supplementary measure:
pkt_loss = sum_NT(lost packets) / sum_NT(total packets) pkt_loss = sum_NT(lost packets) / sum_NT(total packets)
Note: When pkt_loss is small it is very variable, however, when Note: When pkt_loss is small it is very variable, however, when
pkt_loss is high it becomes a stable measure for making grouping pkt_loss is high it becomes a stable measure for making grouping
decisions. decisions..
3.2. Flow Grouping 3.2. Flow Grouping
3.2.1. Flow Grouping Algorithm 3.2.1. Flow Grouping Algorithm
The following grouping algorithm is RECOMMENDED for SBD in the RMCAT The following grouping algorithm is RECOMMENDED for SBD in the RMCAT
context and is sufficient and efficient for small to moderate numbers context and is sufficient and efficient for small to moderate numbers
of flows. For very large numbers of flows (e.g. hundreds), a more of flows. For very large numbers of flows (e.g. hundreds), a more
complex clustering algorithm may be substituted. complex clustering algorithm may be substituted.
skipping to change at page 11, line 51 skipping to change at page 13, line 30
diff(skew_est) < p_s diff(skew_est) < p_s
otherwise otherwise
diff(pkt_loss) < (p_d * pkt_loss) diff(pkt_loss) < (p_d * pkt_loss)
The threshold, (p_d * pkt_loss), is with respect to the The threshold, (p_d * pkt_loss), is with respect to the
highest value in the difference. highest value in the difference.
This procedure involves sorting estimates from highest to lowest. It This procedure involves sorting estimates from highest to lowest. It
is simple to implement, and efficient for small numbers of flows, is simple to implement, and efficient for small numbers of flows (up
such as are expected in RTCWEB. to 10-20).
3.2.2. Using the flow group signal 3.2.2. Using the flow group signal
A grouping decisions is made every T from the second T, though they A grouping decisions is made every T from the second T, though they
will not attain their full design accuracy until after the N'th T will not attain their full design accuracy until after the N'th T
interval. interval.
Network conditions, and even the congestion controllers, can cause Network conditions, and even the congestion controllers, can cause
bottlenecks to fluctuate. A coupled congestion controller MAY decide bottlenecks to fluctuate. A coupled congestion controller MAY decide
only to couple groups that remain stable, say grouped together 90% of only to couple groups that remain stable, say grouped together 90% of
skipping to change at page 12, line 26 skipping to change at page 14, line 5
coupled congestion controllers objectives. coupled congestion controllers objectives.
3.3. Removing Noise from the Estimates 3.3. Removing Noise from the Estimates
The following describe small changes to the calculation of the key The following describe small changes to the calculation of the key
metrics that help remove noise from them. Currently these "tweaks" metrics that help remove noise from them. Currently these "tweaks"
are described separately to keep the main description succinct. In are described separately to keep the main description succinct. In
future revisions of the draft these enhancements may replace the future revisions of the draft these enhancements may replace the
original key metric calculations. original key metric calculations.
3.3.1. Oscillation noise 3.3.1. PDV noise
When a path has no congestion, the PDV will be very small and the Usually during congestion the max_T(OWD) is quite well sampled as the
delay distribution is skewed toward the maximum. However max_T(OWD)
is subject to delay noise from other queues along the path as well as
the host operating system. Min_T(OWD) is less prone to noise along
the path and from the host operating system, but is not well sampled
during congestion (i.e. when there is a bottleneck). Flows with very
different packet send rates exacerbate the problem.
An alternative delay variation measure that is less sensitive to
extreme values and different send rates is Mean Absolute Deviation
(MAD). It can be implemented in an online manner as follows:
var_base_T = sum_T(|OWD - E_T(OWD)|)
where
|x| is the absolute value of x
E_T(OWD) is the mean OWD calculated in the previous T
var_est = MAD_MT = sum_MT(var_base_T)/num_MT(OWD)
For calculation of freq_est p_v=0.7 (MAD is a smaller number than
PDV)
For the grouping threshold p_mad=0.1 instead of p_pdv (MAD is less
noisy so the test can be tighter)
Note that the method for improving responsiveness of MAD_MT is the
same as that described in Section 3.4.1 for skew_est.
3.3.2. Oscillation noise
When a path has no congestion, var_est will be very small and the
recorded significant mean crossings will be the result of path noise. recorded significant mean crossings will be the result of path noise.
Thus up to N-1 meaningless mean crossings can be a source of error at Thus up to N-1 meaningless mean crossings can be a source of error at
the point a link becomes a bottleneck and flows traversing it begin the point a link becomes a bottleneck and flows traversing it begin
to be grouped. to be grouped.
To remove this source of noise from freq_est: To remove this source of noise from freq_est:
1. Set the current PDV to PDV = NaN (a value representing an invalid 1. Set the current PDV to PDV = NaN (a value representing an invalid
record, ie Not a Number) for flows that are deemed to not be record, i.e. Not a Number) for flows that are deemed to not be
experiencing congestion by the first skew_est based grouping test experiencing congestion by the first skew_est based grouping test
(see Section 3.2.1). (see Section 3.2.1).
2. Then var_est = sum_M(PDV != NaN) / num_VM(PDV) 2. Then var_est = sum_M(PDV != NaN) / num_VM(PDV)
3. For freq_est, only record a significant mean crossing if flow is 3. For freq_est, only record a significant mean crossing if flow is
experiencing congestion. experiencing congestion.
These three changes will remove the non-congestion noise from These three changes will remove the non-congestion noise from
freq_est. freq_est. A similar adjustment can be made for MAD based var_est.
3.3.2. Clock drift 3.3.3. Clock skew
Generally sender and receiver clock drift will be too small to cause Generally sender and receiver clock skew will be too small to cause
significant errors in the estimators. Skew_est is most sensitive to significant errors in the estimators. Skew_est is most sensitive to
this type of noise. In circumstances where clock drift is high, this type of noise. In circumstances where clock skew is high,
making M < N can reduce this error. making M < N can reduce this error.
A better method is to estimate the effect the clock drift is having A better method is to estimate the effect the clock skew is having on
on the E_N(E_T(OWD)), and then adjust mean_delay accordingly. A the summary statistics, and then adjust statistics accordingly. A
simple method of doing this follows: simple online method of doing this based on min_T(OWD) will be
described here in a subsequent version of the draft.
First divide the N E_T(OWD) values into two halves (N/2 in each)
-- old and new.
Calculate a mean of the old half:
Older_mean = E_old(E_T(OWD)) / N/2
Calculate a mean of the new (most recent) half:
Newer_mean = E_new(E_T(OWD)) / N/2
A linear estimate of the Clock Drift per T estimates is:
CD_T = (Newer_mean - Older_mean)/N/2
An adjusted mean estimate then is:
mean_delay = CD_Adj(E_M(E_T(OWD))) = E_M(E_T(OWD)) + CD_T *
(M/2 + 0.5)
CD_Adj can be thought of as a prediction of what the long term mean
will be in the current measurement period T. It is used as the basis
for skew_est and freq_est.
3.3.3. Bias in the skewness measure
If successive calculations of skew_est are made with very different
numbers of samples (num_T(OWD)), the simple calculation of
E_M(skew_est) used for grouping decisions will be biased by the
intervals that have few samples samples. This bias can be corrected
if necessary as follows.
skew_base_T = sum_T(OWD < mean_delay) - sum_T(OWD > mean_delay)
skew_est = sum_MT(skew_base_T)/num_MT(OWD)
This calculation requires slightly more state, since an
implementation will need to maintain two cyclic buffers storing
skew_base_T and num_T(OWD) respectively to manage the rolling
summations (note only one cyclic buffer is needed for the calculation
of skew_est outlined previously).
3.4. Reducing lag and Improving Responsiveness 3.4. Reducing lag and Improving Responsiveness
Measurement based shared bottleneck detection makes decisions in the Measurement based shared bottleneck detection makes decisions in the
present based on what has been measured in the past. This means that present based on what has been measured in the past. This means that
there is always a lag in responding to changing conditions. This there is always a lag in responding to changing conditions. This
mechanism is based on summary statistics taken over (N*T) seconds. mechanism is based on summary statistics taken over (N*T) seconds.
This mechanism can be made more responsive to changing conditions by: This mechanism can be made more responsive to changing conditions by:
1. Reducing N and/or M -- but at the expense of less accurate 1. Reducing N and/or M -- but at the expense of having less accurate
metrics, and/or metrics, and/or
2. Exploiting the fact that more recent measurements are more 2. Exploiting the fact that more recent measurements are more
valuable than older measurements and weighting them accordingly. valuable than older measurements and weighting them accordingly.
Although more recent measurements are more valuable, older Although more recent measurements are more valuable, older
measurements are still needed to gain an accurate estimate of the measurements are still needed to gain an accurate estimate of the
distribution descriptor we are measuring. Unfortunately, the simple distribution descriptor we are measuring. Unfortunately, the simple
exponentially weighted moving average weights drop off too quickly exponentially weighted moving average weights drop off too quickly
for our requirements and have an infinite tail. A simple linearly for our requirements and have an infinite tail. A simple linearly
skipping to change at page 15, line 8 skipping to change at page 16, line 8
to the most recent measurements. We propose a piecewise linear to the most recent measurements. We propose a piecewise linear
distribution of weights, such that the first section (samples 1:F) is distribution of weights, such that the first section (samples 1:F) is
flat as in a simple moving average, and the second section (samples flat as in a simple moving average, and the second section (samples
F+1:M) is linearly declining weights to the end of the averaging F+1:M) is linearly declining weights to the end of the averaging
window. We choose integer weights, which allows incremental window. We choose integer weights, which allows incremental
calculation without introducing rounding errors. calculation without introducing rounding errors.
3.4.1. Improving the response of the skewness estimate 3.4.1. Improving the response of the skewness estimate
The weighted moving average for skew_est, based on skew_est in The weighted moving average for skew_est, based on skew_est in
Section 3.3.3, can be calculated as follows: Section 3.1.2, can be calculated as follows:
skew_est = ((M-F+1)*sum(skew_base_T(1:F)) skew_est = ((M-F+1)*sum(skew_base_T(1:F))
+ sum([(M-F):1].*skew_base_T(F+1:M))) + sum([(M-F):1].*skew_base_T(F+1:M)))
/ ((M-F+1)*sum(numsampT(1:F)) / ((M-F+1)*sum(numsampT(1:F))
+ sum([(M-F):1].*numsampT(F+1:M))) + sum([(M-F):1].*numsampT(F+1:M)))
where numsampT is an array of the number of OWD samples in each T (ie where numsampT is an array of the number of OWD samples in each T
num_T(OWD)), and numsampT(1) is the most recent; skew_base_T(1) is (i.e. num_T(OWD)), and numsampT(1) is the most recent; skew_base_T(1)
the most recent calculation of skew_base_T; 1:F refers to the integer is the most recent calculation of skew_base_T; 1:F refers to the
values 1 through to F, and [(M-F):1] refers to an array of the integer values 1 through to F, and [(M-F):1] refers to an array of
integer values (M-F) declining through to 1; and ".*" is the array the integer values (M-F) declining through to 1; and ".*" is the
scalar dot product operator. array scalar dot product operator.
3.4.2. Improving the response of the variance estimate 3.4.2. Improving the response of the variability estimate
The weighted moving average for var_est can be calculated as follows: The weighted moving average for var_est can be calculated as follows:
var_est = ((M-F+1)*sum(PDV(1:F)) + sum([(M-F):1].*PDV(F+1:M))) var_est = ((M-F+1)*sum(PDV(1:F)) + sum([(M-F):1].*PDV(F+1:M)))
/ (F*(M-F+1) + sum([(M-F):1]) / (F*(M-F+1) + sum([(M-F):1])
where 1:F refers to the integer values 1 through to F, and [(M-F):1] where 1:F refers to the integer values 1 through to F, and [(M-F):1]
refers to an array of the integer values (M-F) declining through to refers to an array of the integer values (M-F) declining through to
1; and ".*" is the array scalar dot product operator. When removing 1; and ".*" is the array scalar dot product operator. When removing
oscillation noise (see Section 3.3.1) this calculation must be oscillation noise (see Section 3.3.2) this calculation must be
adjusted to allow for invalid PDV records. adjusted to allow for invalid PDV records.
4. Measuring OWD 4. Measuring OWD
This section discusses the OWD measurements required for this This section discusses the OWD measurements required for this
algorithm to detect shared bottlenecks. algorithm to detect shared bottlenecks.
The SBD mechanism described in this draft relies on differences The SBD mechanism described in this draft relies on differences
between OWD measurements to avoid the practical problems with between OWD measurements to avoid the practical problems with
measuring absolute OWD (see [Hayes-LCN14] section IIIC). Since all measuring absolute OWD (see [Hayes-LCN14] section IIIC). Since all
skipping to change at page 17, line 9 skipping to change at page 18, line 9
Non-authenticated RTCP packets carrying shared bottleneck indications Non-authenticated RTCP packets carrying shared bottleneck indications
and summary statistics could allow attackers to alter the bottleneck and summary statistics could allow attackers to alter the bottleneck
sharing characteristics for private gain or disruption of other sharing characteristics for private gain or disruption of other
parties communication. parties communication.
8. Change history 8. Change history
Changes made to this document: Changes made to this document:
02->WG-00 : Fixed missing 0.5 in 3.3.2 and missing brace in 3.3.3 WG-00->WG-01 : Moved unbiased skew section to replace skew
estimate, more robust variability estimator, the
term variance replaced with variability, clock
drift term corrected to clock skew, revision to
clock skew section with a place holder, description
of parameters.
01->02 : New section describing improvements to the key metric 02->WG-00 : Fixed missing 0.5 in 3.3.2 and missing brace in
calculations that help to remove noise, bias, and 3.3.3
reduce lag. Some revisions to the notation to make
it clearer. Some tightening of the thresholds.
00->01 : Revisions to terminology for clarity 01->02 : New section describing improvements to the key
metric calculations that help to remove noise,
bias, and reduce lag. Some revisions to the
notation to make it clearer. Some tightening of
the thresholds.
00->01 : Revisions to terminology for clarity
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2. Informative References 9.2. Informative References
 End of changes. 61 change blocks. 
175 lines changed or deleted 222 lines changed or added

This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/