draft-ietf-rmcat-sbd-00.txt | draft-ietf-rmcat-sbd-01.txt | |||
---|---|---|---|---|

RTP Media Congestion Avoidance D. Hayes, Ed. | RTP Media Congestion Avoidance D. Hayes, Ed. | |||

Techniques University of Oslo | Techniques University of Oslo | |||

Internet-Draft S. Ferlin | Internet-Draft S. Ferlin | |||

Intended status: Experimental Simula Research Laboratory | Intended status: Experimental Simula Research Laboratory | |||

Expires: November 9, 2015 M. Welzl | Expires: January 2, 2016 M. Welzl | |||

University of Oslo | University of Oslo | |||

May 8, 2015 | July 1, 2015 | |||

Shared Bottleneck Detection for Coupled Congestion Control for RTP | Shared Bottleneck Detection for Coupled Congestion Control for RTP | |||

Media. | Media. | |||

draft-ietf-rmcat-sbd-00 | draft-ietf-rmcat-sbd-01 | |||

Abstract | Abstract | |||

This document describes a mechanism to detect whether end-to-end data | This document describes a mechanism to detect whether end-to-end data | |||

flows share a common bottleneck. It relies on summary statistics | flows share a common bottleneck. It relies on summary statistics | |||

that are calculated by a data receiver based on continuous | that are calculated by a data receiver based on continuous | |||

measurements and regularly fed to a grouping algorithm that runs | measurements and regularly fed to a grouping algorithm that runs | |||

wherever the knowledge is needed. This mechanism complements the | wherever the knowledge is needed. This mechanism complements the | |||

coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc. | coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc. | |||

skipping to change at page 1, line 39 | skipping to change at page 1, line 39 | |||

Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||

Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||

working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||

Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||

Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||

and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||

time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||

material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||

This Internet-Draft will expire on November 9, 2015. | This Internet-Draft will expire on January 2, 2016. | |||

Copyright Notice | Copyright Notice | |||

Copyright (c) 2015 IETF Trust and the persons identified as the | Copyright (c) 2015 IETF Trust and the persons identified as the | |||

document authors. All rights reserved. | document authors. All rights reserved. | |||

This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||

Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||

(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||

publication of this document. Please review these documents | publication of this document. Please review these documents | |||

skipping to change at page 2, line 18 | skipping to change at page 2, line 18 | |||

described in the Simplified BSD License. | described in the Simplified BSD License. | |||

Table of Contents | Table of Contents | |||

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||

1.1. The signals . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. The signals . . . . . . . . . . . . . . . . . . . . . . . 3 | |||

1.1.1. Packet Loss . . . . . . . . . . . . . . . . . . . . . 3 | 1.1.1. Packet Loss . . . . . . . . . . . . . . . . . . . . . 3 | |||

1.1.2. Packet Delay . . . . . . . . . . . . . . . . . . . . . 3 | 1.1.2. Packet Delay . . . . . . . . . . . . . . . . . . . . . 3 | |||

1.1.3. Path Lag . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1.3. Path Lag . . . . . . . . . . . . . . . . . . . . . . . 4 | |||

2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||

2.1. Parameter Values . . . . . . . . . . . . . . . . . . . . . 5 | 2.1. Parameters and their Effect . . . . . . . . . . . . . . . 5 | |||

3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.2. Recommended Parameter Values . . . . . . . . . . . . . . . 7 | |||

3.1. Key metrics and their calculation . . . . . . . . . . . . 7 | 3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||

3.1.1. Mean delay . . . . . . . . . . . . . . . . . . . . . . 7 | 3.1. Key metrics and their calculation . . . . . . . . . . . . 9 | |||

3.1.2. Skewness Estimate . . . . . . . . . . . . . . . . . . 8 | 3.1.1. Mean delay . . . . . . . . . . . . . . . . . . . . . . 9 | |||

3.1.3. Variance Estimate . . . . . . . . . . . . . . . . . . 9 | 3.1.2. Skewness Estimate . . . . . . . . . . . . . . . . . . 9 | |||

3.1.4. Oscillation Estimate . . . . . . . . . . . . . . . . . 9 | 3.1.3. Variability Estimate . . . . . . . . . . . . . . . . . 10 | |||

3.1.5. Packet loss . . . . . . . . . . . . . . . . . . . . . 10 | 3.1.4. Oscillation Estimate . . . . . . . . . . . . . . . . . 11 | |||

3.2. Flow Grouping . . . . . . . . . . . . . . . . . . . . . . 10 | 3.1.5. Packet loss . . . . . . . . . . . . . . . . . . . . . 11 | |||

3.2.1. Flow Grouping Algorithm . . . . . . . . . . . . . . . 10 | 3.2. Flow Grouping . . . . . . . . . . . . . . . . . . . . . . 12 | |||

3.2.2. Using the flow group signal . . . . . . . . . . . . . 12 | 3.2.1. Flow Grouping Algorithm . . . . . . . . . . . . . . . 12 | |||

3.3. Removing Noise from the Estimates . . . . . . . . . . . . 12 | 3.2.2. Using the flow group signal . . . . . . . . . . . . . 13 | |||

3.3.1. Oscillation noise . . . . . . . . . . . . . . . . . . 12 | 3.3. Removing Noise from the Estimates . . . . . . . . . . . . 13 | |||

3.3.2. Clock drift . . . . . . . . . . . . . . . . . . . . . 13 | 3.3.1. PDV noise . . . . . . . . . . . . . . . . . . . . . . 14 | |||

3.3.3. Bias in the skewness measure . . . . . . . . . . . . . 14 | 3.3.2. Oscillation noise . . . . . . . . . . . . . . . . . . 14 | |||

3.4. Reducing lag and Improving Responsiveness . . . . . . . . 14 | 3.3.3. Clock skew . . . . . . . . . . . . . . . . . . . . . . 15 | |||

3.4.1. Improving the response of the skewness estimate . . . 15 | 3.4. Reducing lag and Improving Responsiveness . . . . . . . . 15 | |||

3.4.2. Improving the response of the variance estimate . . . 15 | 3.4.1. Improving the response of the skewness estimate . . . 16 | |||

4. Measuring OWD . . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.4.2. Improving the response of the variability estimate . . 16 | |||

4.1. Time stamp resolution . . . . . . . . . . . . . . . . . . 16 | 4. Measuring OWD . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||

5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16 | 4.1. Time stamp resolution . . . . . . . . . . . . . . . . . . 17 | |||

6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17 | |||

7. Security Considerations . . . . . . . . . . . . . . . . . . . 16 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 | |||

8. Change history . . . . . . . . . . . . . . . . . . . . . . . . 17 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 17 | |||

9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | 8. Change history . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||

9.1. Normative References . . . . . . . . . . . . . . . . . . . 17 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||

9.2. Informative References . . . . . . . . . . . . . . . . . . 17 | 9.1. Normative References . . . . . . . . . . . . . . . . . . . 18 | |||

Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 | 9.2. Informative References . . . . . . . . . . . . . . . . . . 18 | |||

Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 | ||||

1. Introduction | 1. Introduction | |||

In the Internet, it is not normally known if flows (e.g., TCP | In the Internet, it is not normally known if flows (e.g., TCP | |||

connections or UDP data streams) traverse the same bottlenecks. Even | connections or UDP data streams) traverse the same bottlenecks. Even | |||

flows that have the same sender and receiver may take different paths | flows that have the same sender and receiver may take different paths | |||

and share a bottleneck or not. Flows that share a bottleneck link | and share a bottleneck or not. Flows that share a bottleneck link | |||

usually compete with one another for their share of the capacity. | usually compete with one another for their share of the capacity. | |||

This competition has the potential to increase packet loss and | This competition has the potential to increase packet loss and | |||

delays. This is especially relevant for interactive applications | delays. This is especially relevant for interactive applications | |||

skipping to change at page 3, line 47 | skipping to change at page 3, line 47 | |||

End-to-end delay measurements include noise from every device along | End-to-end delay measurements include noise from every device along | |||

the path in addition to the delay perturbation at the bottleneck | the path in addition to the delay perturbation at the bottleneck | |||

device. The noise is often significantly increased if the round-trip | device. The noise is often significantly increased if the round-trip | |||

time is used. The cleanest signal is obtained by using One-Way-Delay | time is used. The cleanest signal is obtained by using One-Way-Delay | |||

(OWD). | (OWD). | |||

Measuring absolute OWD is difficult since it requires both the sender | Measuring absolute OWD is difficult since it requires both the sender | |||

and receiver clocks to be synchronised. However, since the | and receiver clocks to be synchronised. However, since the | |||

statistics being collected are relative to the mean OWD, a relative | statistics being collected are relative to the mean OWD, a relative | |||

OWD measurement is sufficient. Clock drift is not usually | OWD measurement is sufficient. Clock skew is not usually significant | |||

significant over the time intervals used by this SBD mechanism (see | over the time intervals used by this SBD mechanism (see [RFC6817] A.2 | |||

[RFC6817] A.2 for a discussion on clock drift and OWD measurements). | for a discussion on clock skew and OWD measurements). However, in | |||

However, in circumstances where it is significant, Section 3.3.2 | circumstances where it is significant, Section 3.3.3 outlines a way | |||

outlines a way of adjusting the calculations to cater for it. | of adjusting the calculations to cater for it. | |||

Each packet arriving at the bottleneck buffer may experience very | Each packet arriving at the bottleneck buffer may experience very | |||

different queue lengths, and therefore different waiting times. A | different queue lengths, and therefore different waiting times. A | |||

single OWD sample does not, therefore, characterize the path well. | single OWD sample does not, therefore, characterize the path well. | |||

However, multiple OWD measurements do reflect the distribution of | However, multiple OWD measurements do reflect the distribution of | |||

delays experienced at the bottleneck. | delays experienced at the bottleneck. | |||

1.1.3. Path Lag | 1.1.3. Path Lag | |||

Flows that share a common bottleneck may traverse different paths, | Flows that share a common bottleneck may traverse different paths, | |||

skipping to change at page 4, line 31 | skipping to change at page 4, line 31 | |||

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||

"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||

document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||

Acronyms used in this document: | Acronyms used in this document: | |||

OWD -- One Way Delay | OWD -- One Way Delay | |||

PDV -- Packet Delay Variation | PDV -- Packet Delay Variation | |||

MAD -- Mean Absolute Deviation | ||||

RTT -- Round Trip Time | RTT -- Round Trip Time | |||

SBD -- Shared Bottleneck Detection | SBD -- Shared Bottleneck Detection | |||

Conventions used in this document: | Conventions used in this document: | |||

T -- the base time interval over which measurements are | T -- the base time interval over which measurements are | |||

made. | made. | |||

N -- the number of base time, T, intervals used in some | N -- the number of base time, T, intervals used in some | |||

calculations. | calculations. | |||

sum_T(...) -- summation of all the measurements of the variable | sum_T(...) -- summation of all the measurements of the variable | |||

in parentheses taken over the interval T | in parentheses taken over the interval T | |||

sum(...) -- summation of terms of the variable in parentheses | ||||

sum(...) -- summation of terms of the variable in parentheses | ||||

sum_N(...) -- summation of N terms of the variable in parentheses | sum_N(...) -- summation of N terms of the variable in parentheses | |||

sum_NT(...) -- summation of all measurements taken over the | ||||

interval N*T | ||||

E_T(...) -- the expectation or mean of the measurements of the | sum_NT(...) -- summation of all measurements taken over the | |||

variable in parentheses over T | interval N*T | |||

E_N(...) -- The expectation or mean of the last N values of the | E_T(...) -- the expectation or mean of the measurements of the | |||

variable in parentheses | variable in parentheses over T | |||

E_M(...) -- The expectation or mean of the last M values of the | E_N(...) -- the expectation or mean of the last N values of the | |||

variable in parentheses, where M <= N. | variable in parentheses | |||

E_M(...) -- the expectation or mean of the last M values of the | ||||

variable in parentheses, where M <= N. | ||||

max_T(...) -- the maximum recorded measurement of the variable in | max_T(...) -- the maximum recorded measurement of the variable in | |||

parentheses taken over the interval T | parentheses taken over the interval T | |||

min_T(...) -- the minimum recorded measurement of the variable in | min_T(...) -- the minimum recorded measurement of the variable in | |||

parentheses taken over the interval T | parentheses taken over the interval T | |||

num_T(...) -- the count of measurements of the variable in | num_T(...) -- the count of measurements of the variable in | |||

parentheses taken in the interval T | parentheses taken in the interval T | |||

num_VM(...) -- the count of valid values of the variable in | num_VM(...) -- the count of valid values of the variable in | |||

parentheses given M records | parentheses given M records | |||

PC -- a boolean variable indicating the particular flow was | PC -- a boolean variable indicating the particular flow | |||

identified as experiencing congestion in the previous | was identified as experiencing congestion in the | |||

interval T (i.e. Previously Congested) | previous interval T (i.e. Previously Congested) | |||

CD_T -- an estimate of the effect of Clock Drift on the mean | skew_est -- a measure of skewness in a OWD distribution. | |||

OWD per T | ||||

CD_Adj(...) -- Mean OWD adjusted for clock drift | var_est -- a measure of variability in OWD measurements. | |||

p_l, p_f, p_pdv, c_s, c_h, p_s, p_d, p_v -- various thresholds | freq_est -- a measure of low frequency oscillation in the OWD | |||

used in the mechanism. | measurements. | |||

N, M, and F -- number of values (calculated over T). | p_l, p_f, p_pdv, p_mad, c_s, c_h, p_s, p_d, p_v -- various | |||

thresholds used in the mechanism | ||||

2.1. Parameter Values | M and F -- number of values related to N | |||

2.1. Parameters and their Effect | ||||

T T should be long enough so that there are enough packets | ||||

received during T for a useful estimate of short term mean | ||||

OWD and variation statistics. Making T too large can limit | ||||

the efficacy of PDV and freq_est. It will also increase the | ||||

response time of the mechanism. Making T too small will make | ||||

the metrics noisier. | ||||

N & M N should be large enough provide a stable estimate of | ||||

oscillations in OWD and average PDV. Usually M=N, though | ||||

having M<N may be beneficial in certain circumstances. M*T | ||||

needs to be long enough provide stable estimates of skewness | ||||

and MAD (if used). | ||||

F F determines the number of intervals over which statistics | ||||

are considered to be equally weighted. When F=M recent and | ||||

older measurements are considered equal. Making F<M can | ||||

increase the responsiveness of the SBD mechanism. If F is | ||||

too small, statistics will be too noisy. | ||||

c_s c_s is the threshold in skew_est used for determining whether | ||||

a flow is experiencing congestion or not. It should be | ||||

slightly negative so that a very lightly loaded path does not | ||||

give a false indication. Setting c_s more negative makes the | ||||

SBD mechanism less sensitive to transient and light | ||||

congestion episodes. | ||||

c_s c_h adds hysteresis to the congestion determination. It | ||||

should be large enough to avoid constant switching in the | ||||

determination, but low enough to ensure that grouping is not | ||||

attempted when there is no congestion and the delay and loss | ||||

signals cannot be relied upon. | ||||

p_v p_v determines the sensitivity of freq_est to noise. Making | ||||

it smaller will yield higher but noisier values for freq_est. | ||||

Making it too large will render it ineffective for | ||||

determining groups. | ||||

p_* Flows are separated when the skew_est|var_est|freq_est | ||||

measure is greater than p_s|p_f|p_d|(p_pdv|p_mad). Adjusting | ||||

these is a compromise between false grouping of flows that do | ||||

not share a bottleneck and false splitting of flows that do. | ||||

Making them larger can help if the measures are very noisy, | ||||

but reducing the noise in the statistical measures by | ||||

adjusting T and N|M may be a better solution. | ||||

2.2. Recommended Parameter Values | ||||

Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1. The other | Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1. The other | |||

parameters have been tightened to reflect minor enhancements to the | parameters have been tightened to reflect minor enhancements to the | |||

algorithm outlined in Section 3.3: c_s = -0.01, p_f = p_s = p_d = | algorithm outlined in Section 3.3: c_s = -0.01, p_f = p_s = p_d = | |||

0.1, p_pdv = 0.2, p_v = 0.2. M=50, F=10, and c_h = 0.3 are | 0.1, p_pdv = 0.2, p_v = 0.2 (or p_mad=0.1, p_v=0.7). M=50, F=25, and | |||

additional parameters defined in the document. These are values that | c_h = 0.3 are additional parameters defined in the document. These | |||

seem to work well over a wide range of practical Internet conditions, | are values that seem to work well over a wide range of practical | |||

but are the subject of ongoing tests. | Internet conditions. | |||

3. Mechanism | 3. Mechanism | |||

The mechanism described in this document is based on the observation | The mechanism described in this document is based on the observation | |||

that the distribution of delay measurements of packets from flows | that the distribution of delay measurements of packets that traverse | |||

that share a common bottleneck have similar shape characteristics. | a common bottleneck have similar shape characteristics. These shape | |||

These shape characteristics are described using 3 key summary | characteristics are described using 3 key summary statistics: | |||

statistics: | ||||

variance (estimate var_est, see Section 3.1.3) | variability (estimate var_est, see Section 3.1.3) | |||

skewness (estimate skew_est, see Section 3.1.2) | skewness (estimate skew_est, see Section 3.1.2) | |||

oscillation (estimate freq_est, see Section 3.1.4) | oscillation (estimate freq_est, see Section 3.1.4) | |||

with packet loss (estimate pkt_loss, see Section 3.1.5) used as a | with packet loss (estimate pkt_loss, see Section 3.1.5) used as a | |||

supplementary statistic. | supplementary statistic. | |||

Summary statistics help to address both the noise and the path lag | Summary statistics help to address both the noise and the path lag | |||

problems by describing the general shape over a relatively long | problems by describing the general shape over a relatively long | |||

period of time. This is sufficient for their application in coupled | period of time. This is sufficient for their application in coupled | |||

congestion control for RTP Media. They can be signalled from a | congestion control for RTP Media. They can be signalled from a | |||

receiver, which measures the OWD and calculates the summary | receiver, which measures the OWD and calculates the summary | |||

statistics, to a sender, which is the entity that is transmitting the | statistics, to a sender, which is the entity that is transmitting the | |||

media stream. An RTP Media device may be both a sender and a | media stream. An RTP Media device may be both a sender and a | |||

receiver. SBD can be performed at either Sender or receiver or both. | receiver. SBD can be performed at either a sender or a receiver or | |||

both. | ||||

+----+ | +----+ | |||

| H2 | | | H2 | | |||

+----+ | +----+ | |||

| | | | |||

| L2 | | L2 | |||

| | | | |||

+----+ L1 | L3 +----+ | +----+ L1 | L3 +----+ | |||

| H1 |------|------| H3 | | | H1 |------|------| H3 | | |||

+----+ +----+ | +----+ +----+ | |||

skipping to change at page 7, line 50 | skipping to change at page 9, line 28 | |||

necessarily be synchronized. However, it is a base measure for the 3 | necessarily be synchronized. However, it is a base measure for the 3 | |||

summary statistics. The mean delay, E_T(OWD), is the average one way | summary statistics. The mean delay, E_T(OWD), is the average one way | |||

delay measured over T. | delay measured over T. | |||

To facilitate the other calculations, the last N E_T(OWD) values will | To facilitate the other calculations, the last N E_T(OWD) values will | |||

need to be stored in a cyclic buffer along with the moving average of | need to be stored in a cyclic buffer along with the moving average of | |||

E_T(OWD): | E_T(OWD): | |||

mean_delay = E_M(E_T(OWD)) = sum_M(E_T(OWD)) / M | mean_delay = E_M(E_T(OWD)) = sum_M(E_T(OWD)) / M | |||

where M <= N. Generally M=N, setting M to be less than N allows the | where M <= N. Generally M=N: setting M to be less than N allows the | |||

mechanism to be more responsive to changes, but potentially at the | mechanism to be more responsive to changes, but potentially at the | |||

expense of a higher error rate (see Section 3.4 for a discussion on | expense of a higher error rate (see Section 3.4 for a discussion on | |||

improving the responsiveness of the mechanism.) | improving the responsiveness of the mechanism.) | |||

3.1.2. Skewness Estimate | 3.1.2. Skewness Estimate | |||

Skewness is difficult to calculate efficiently and accurately. | Skewness is difficult to calculate efficiently and accurately. | |||

Ideally it should be calculated over the entire period (M * T) from | Ideally it should be calculated over the entire period (M * T) from | |||

the mean OWD over that period. However this would require storing | the mean OWD over that period. However this would require storing | |||

every delay measurement over the period. Instead, an estimate is | every delay measurement over the period. Instead, an estimate is | |||

made over T using the previous calculation of mean_delay. | made over M * T based on a calculation every T using the previous T's | |||

Comparisons are made using the mean of M skew estimates (an | calculation of mean_delay. | |||

alternative that removes bias in the mean is given in Section 3.3.3). | ||||

The skewness is estimated using two counters, counting the number of | The skewness is estimated using two counters, counting the number of | |||

one way delay samples (OWD) above and below the mean: | one way delay samples (OWD) above and below the mean: | |||

skew_est_T = (sum_T(OWD < mean_delay) | skew_base_T = sum_T(OWD < mean_delay) - sum_T(OWD > mean_delay) | |||

- sum_T(OWD > mean_delay)) / num_T(OWD) | ||||

where | where | |||

if (OWD < mean_delay) 1 else 0 | if (OWD < mean_delay) 1 else 0 | |||

if (OWD > mean_delay) 1 else 0 | if (OWD > mean_delay) 1 else 0 | |||

skew_est_T is a number between -1 and 1 | and mean_delay does not include the mean of the current T | |||

interval. | ||||

skew_est = E_M(skew_est_T) = sum_M(skew_est_T) / M | skew_est = sum_MT(skew_base_T)/num_MT(OWD) | |||

For implementation ease, mean_delay does not include the mean of the | where skew_est is a number between -1 and 1 | |||

current T interval. | ||||

Note: Care must be taken when implementing the comparisons to ensure | Note: Care must be taken when implementing the comparisons to ensure | |||

that rounding does not bias skew_est. It is important that the mean | that rounding does not bias skew_est. It is important that the mean | |||

is calculated with a higher precision than the samples. | is calculated with a higher precision than the samples. | |||

3.1.3. Variance Estimate | 3.1.3. Variability Estimate | |||

Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as | Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as | |||

an estimator of the variance of the delay signal. We define PDV as | an estimator of the variability of the delay signal. We define PDV | |||

follows: | as follows: | |||

PDV = PDV_max = max_T(OWD) - E_T(OWD) | PDV = PDV_max = max_T(OWD) - E_T(OWD) | |||

var_est = E_M(PDV) = sum_M(PDV) / M | var_est = E_M(PDV) = sum_M(PDV) / M | |||

This modifies PDV as outlined in [RFC5481] to provide a summary | This modifies PDV as outlined in [RFC5481] to provide a summary | |||

statistic version that best aids the grouping decisions of the | statistic version that best aids the grouping decisions of the | |||

algorithm (see [Hayes-LCN14] section IVB). | algorithm (see [Hayes-LCN14] section IVB). | |||

The use of PDV = PDV_min = E_T(OWD) - min_T(OWD) is currently being | Generally the maximum is sampled well during congestion, though it is | |||

investigated as an alternative that is less sensitive to noise. The | more sensitive to path and operating system noise. The use of PDV = | |||

drawback of using PDV_min is that it does not distinguish between | PDV_min = E_T(OWD) - min_T(OWD) would be less sensitive to this | |||

groups of flows with similar values of skew_est as well as PDV_max | noise, but is not well sampled during congestion at the bottleneck | |||

(see [Hayes-LCN14] section IVB). | and therefore not recommended. | |||

3.1.4. Oscillation Estimate | 3.1.4. Oscillation Estimate | |||

An estimate of the low frequency oscillation of the delay signal is | An estimate of the low frequency oscillation of the delay signal is | |||

calculated by counting and normalising the significant mean, | calculated by counting and normalising the significant mean, | |||

E_T(OWD), crossings of mean_delay: | E_T(OWD), crossings of mean_delay: | |||

freq_est = number_of_crossings / N | freq_est = number_of_crossings / N | |||

Where | where we define a significant mean crossing as a crossing that | |||

we define a significant mean crossing as a crossing that | ||||

extends p_v * var_est from mean_delay. In our experiments we | extends p_v * var_est from mean_delay. In our experiments we | |||

have found that p_v = 0.2 is a good value. | have found that p_v = 0.2 is a good value. | |||

Freq_est is a number between 0 and 1. Freq_est can be approximated | Freq_est is a number between 0 and 1. Freq_est can be approximated | |||

incrementally as follows: | incrementally as follows: | |||

With each new calculation of E_T(OWD) a decision is made as to | With each new calculation of E_T(OWD) a decision is made as to | |||

whether this value of E_T(OWD) significantly crosses the current | whether this value of E_T(OWD) significantly crosses the current | |||

long term mean, mean_delay, with respect to the previous | long term mean, mean_delay, with respect to the previous | |||

significant mean crossing. | significant mean crossing. | |||

A cyclic buffer, last_N_crossings, records a 1 if there is a | A cyclic buffer, last_N_crossings, records a 1 if there is a | |||

significant mean crossing, otherwise a 0. | significant mean crossing, otherwise a 0. | |||

The counter, number_of_crossings, is incremented when there is a | The counter, number_of_crossings, is incremented when there is a | |||

significant mean crossing and subtracted from when a non-zero | significant mean crossing and decremented when a non-zero value is | |||

value is removed from the last_N_crossings. | removed from the last_N_crossings. | |||

This approximation of freq_est was not used in [Hayes-LCN14], which | This approximation of freq_est was not used in [Hayes-LCN14], which | |||

calculated freq_est every T using the current E_N(E_T(OWD)). Our | calculated freq_est every T using the current E_N(E_T(OWD)). Our | |||

tests show that this approximation of freq_est yields results that | tests show that this approximation of freq_est yields results that | |||

are almost identical to when the full calculation is performed every | are almost identical to when the full calculation is performed every | |||

T. | T. | |||

3.1.5. Packet loss | 3.1.5. Packet loss | |||

The proportion of packets lost is used as a supplementary measure: | The proportion of packets lost is used as a supplementary measure: | |||

pkt_loss = sum_NT(lost packets) / sum_NT(total packets) | pkt_loss = sum_NT(lost packets) / sum_NT(total packets) | |||

Note: When pkt_loss is small it is very variable, however, when | Note: When pkt_loss is small it is very variable, however, when | |||

pkt_loss is high it becomes a stable measure for making grouping | pkt_loss is high it becomes a stable measure for making grouping | |||

decisions. | decisions.. | |||

3.2. Flow Grouping | 3.2. Flow Grouping | |||

3.2.1. Flow Grouping Algorithm | 3.2.1. Flow Grouping Algorithm | |||

The following grouping algorithm is RECOMMENDED for SBD in the RMCAT | The following grouping algorithm is RECOMMENDED for SBD in the RMCAT | |||

context and is sufficient and efficient for small to moderate numbers | context and is sufficient and efficient for small to moderate numbers | |||

of flows. For very large numbers of flows (e.g. hundreds), a more | of flows. For very large numbers of flows (e.g. hundreds), a more | |||

complex clustering algorithm may be substituted. | complex clustering algorithm may be substituted. | |||

skipping to change at page 11, line 51 | skipping to change at page 13, line 30 | |||

diff(skew_est) < p_s | diff(skew_est) < p_s | |||

otherwise | otherwise | |||

diff(pkt_loss) < (p_d * pkt_loss) | diff(pkt_loss) < (p_d * pkt_loss) | |||

The threshold, (p_d * pkt_loss), is with respect to the | The threshold, (p_d * pkt_loss), is with respect to the | |||

highest value in the difference. | highest value in the difference. | |||

This procedure involves sorting estimates from highest to lowest. It | This procedure involves sorting estimates from highest to lowest. It | |||

is simple to implement, and efficient for small numbers of flows, | is simple to implement, and efficient for small numbers of flows (up | |||

such as are expected in RTCWEB. | to 10-20). | |||

3.2.2. Using the flow group signal | 3.2.2. Using the flow group signal | |||

A grouping decisions is made every T from the second T, though they | A grouping decisions is made every T from the second T, though they | |||

will not attain their full design accuracy until after the N'th T | will not attain their full design accuracy until after the N'th T | |||

interval. | interval. | |||

Network conditions, and even the congestion controllers, can cause | Network conditions, and even the congestion controllers, can cause | |||

bottlenecks to fluctuate. A coupled congestion controller MAY decide | bottlenecks to fluctuate. A coupled congestion controller MAY decide | |||

only to couple groups that remain stable, say grouped together 90% of | only to couple groups that remain stable, say grouped together 90% of | |||

skipping to change at page 12, line 26 | skipping to change at page 14, line 5 | |||

coupled congestion controllers objectives. | coupled congestion controllers objectives. | |||

3.3. Removing Noise from the Estimates | 3.3. Removing Noise from the Estimates | |||

The following describe small changes to the calculation of the key | The following describe small changes to the calculation of the key | |||

metrics that help remove noise from them. Currently these "tweaks" | metrics that help remove noise from them. Currently these "tweaks" | |||

are described separately to keep the main description succinct. In | are described separately to keep the main description succinct. In | |||

future revisions of the draft these enhancements may replace the | future revisions of the draft these enhancements may replace the | |||

original key metric calculations. | original key metric calculations. | |||

3.3.1. Oscillation noise | 3.3.1. PDV noise | |||

When a path has no congestion, the PDV will be very small and the | Usually during congestion the max_T(OWD) is quite well sampled as the | |||

delay distribution is skewed toward the maximum. However max_T(OWD) | ||||

is subject to delay noise from other queues along the path as well as | ||||

the host operating system. Min_T(OWD) is less prone to noise along | ||||

the path and from the host operating system, but is not well sampled | ||||

during congestion (i.e. when there is a bottleneck). Flows with very | ||||

different packet send rates exacerbate the problem. | ||||

An alternative delay variation measure that is less sensitive to | ||||

extreme values and different send rates is Mean Absolute Deviation | ||||

(MAD). It can be implemented in an online manner as follows: | ||||

var_base_T = sum_T(|OWD - E_T(OWD)|) | ||||

where | ||||

|x| is the absolute value of x | ||||

E_T(OWD) is the mean OWD calculated in the previous T | ||||

var_est = MAD_MT = sum_MT(var_base_T)/num_MT(OWD) | ||||

For calculation of freq_est p_v=0.7 (MAD is a smaller number than | ||||

PDV) | ||||

For the grouping threshold p_mad=0.1 instead of p_pdv (MAD is less | ||||

noisy so the test can be tighter) | ||||

Note that the method for improving responsiveness of MAD_MT is the | ||||

same as that described in Section 3.4.1 for skew_est. | ||||

3.3.2. Oscillation noise | ||||

When a path has no congestion, var_est will be very small and the | ||||

recorded significant mean crossings will be the result of path noise. | recorded significant mean crossings will be the result of path noise. | |||

Thus up to N-1 meaningless mean crossings can be a source of error at | Thus up to N-1 meaningless mean crossings can be a source of error at | |||

the point a link becomes a bottleneck and flows traversing it begin | the point a link becomes a bottleneck and flows traversing it begin | |||

to be grouped. | to be grouped. | |||

To remove this source of noise from freq_est: | To remove this source of noise from freq_est: | |||

1. Set the current PDV to PDV = NaN (a value representing an invalid | 1. Set the current PDV to PDV = NaN (a value representing an invalid | |||

record, ie Not a Number) for flows that are deemed to not be | record, i.e. Not a Number) for flows that are deemed to not be | |||

experiencing congestion by the first skew_est based grouping test | experiencing congestion by the first skew_est based grouping test | |||

(see Section 3.2.1). | (see Section 3.2.1). | |||

2. Then var_est = sum_M(PDV != NaN) / num_VM(PDV) | 2. Then var_est = sum_M(PDV != NaN) / num_VM(PDV) | |||

3. For freq_est, only record a significant mean crossing if flow is | 3. For freq_est, only record a significant mean crossing if flow is | |||

experiencing congestion. | experiencing congestion. | |||

These three changes will remove the non-congestion noise from | These three changes will remove the non-congestion noise from | |||

freq_est. | freq_est. A similar adjustment can be made for MAD based var_est. | |||

3.3.2. Clock drift | 3.3.3. Clock skew | |||

Generally sender and receiver clock drift will be too small to cause | Generally sender and receiver clock skew will be too small to cause | |||

significant errors in the estimators. Skew_est is most sensitive to | significant errors in the estimators. Skew_est is most sensitive to | |||

this type of noise. In circumstances where clock drift is high, | this type of noise. In circumstances where clock skew is high, | |||

making M < N can reduce this error. | making M < N can reduce this error. | |||

A better method is to estimate the effect the clock drift is having | A better method is to estimate the effect the clock skew is having on | |||

on the E_N(E_T(OWD)), and then adjust mean_delay accordingly. A | the summary statistics, and then adjust statistics accordingly. A | |||

simple method of doing this follows: | simple online method of doing this based on min_T(OWD) will be | |||

described here in a subsequent version of the draft. | ||||

First divide the N E_T(OWD) values into two halves (N/2 in each) | ||||

-- old and new. | ||||

Calculate a mean of the old half: | ||||

Older_mean = E_old(E_T(OWD)) / N/2 | ||||

Calculate a mean of the new (most recent) half: | ||||

Newer_mean = E_new(E_T(OWD)) / N/2 | ||||

A linear estimate of the Clock Drift per T estimates is: | ||||

CD_T = (Newer_mean - Older_mean)/N/2 | ||||

An adjusted mean estimate then is: | ||||

mean_delay = CD_Adj(E_M(E_T(OWD))) = E_M(E_T(OWD)) + CD_T * | ||||

(M/2 + 0.5) | ||||

CD_Adj can be thought of as a prediction of what the long term mean | ||||

will be in the current measurement period T. It is used as the basis | ||||

for skew_est and freq_est. | ||||

3.3.3. Bias in the skewness measure | ||||

If successive calculations of skew_est are made with very different | ||||

numbers of samples (num_T(OWD)), the simple calculation of | ||||

E_M(skew_est) used for grouping decisions will be biased by the | ||||

intervals that have few samples samples. This bias can be corrected | ||||

if necessary as follows. | ||||

skew_base_T = sum_T(OWD < mean_delay) - sum_T(OWD > mean_delay) | ||||

skew_est = sum_MT(skew_base_T)/num_MT(OWD) | ||||

This calculation requires slightly more state, since an | ||||

implementation will need to maintain two cyclic buffers storing | ||||

skew_base_T and num_T(OWD) respectively to manage the rolling | ||||

summations (note only one cyclic buffer is needed for the calculation | ||||

of skew_est outlined previously). | ||||

3.4. Reducing lag and Improving Responsiveness | 3.4. Reducing lag and Improving Responsiveness | |||

Measurement based shared bottleneck detection makes decisions in the | Measurement based shared bottleneck detection makes decisions in the | |||

present based on what has been measured in the past. This means that | present based on what has been measured in the past. This means that | |||

there is always a lag in responding to changing conditions. This | there is always a lag in responding to changing conditions. This | |||

mechanism is based on summary statistics taken over (N*T) seconds. | mechanism is based on summary statistics taken over (N*T) seconds. | |||

This mechanism can be made more responsive to changing conditions by: | This mechanism can be made more responsive to changing conditions by: | |||

1. Reducing N and/or M -- but at the expense of less accurate | 1. Reducing N and/or M -- but at the expense of having less accurate | |||

metrics, and/or | metrics, and/or | |||

2. Exploiting the fact that more recent measurements are more | 2. Exploiting the fact that more recent measurements are more | |||

valuable than older measurements and weighting them accordingly. | valuable than older measurements and weighting them accordingly. | |||

Although more recent measurements are more valuable, older | Although more recent measurements are more valuable, older | |||

measurements are still needed to gain an accurate estimate of the | measurements are still needed to gain an accurate estimate of the | |||

distribution descriptor we are measuring. Unfortunately, the simple | distribution descriptor we are measuring. Unfortunately, the simple | |||

exponentially weighted moving average weights drop off too quickly | exponentially weighted moving average weights drop off too quickly | |||

for our requirements and have an infinite tail. A simple linearly | for our requirements and have an infinite tail. A simple linearly | |||

skipping to change at page 15, line 8 | skipping to change at page 16, line 8 | |||

to the most recent measurements. We propose a piecewise linear | to the most recent measurements. We propose a piecewise linear | |||

distribution of weights, such that the first section (samples 1:F) is | distribution of weights, such that the first section (samples 1:F) is | |||

flat as in a simple moving average, and the second section (samples | flat as in a simple moving average, and the second section (samples | |||

F+1:M) is linearly declining weights to the end of the averaging | F+1:M) is linearly declining weights to the end of the averaging | |||

window. We choose integer weights, which allows incremental | window. We choose integer weights, which allows incremental | |||

calculation without introducing rounding errors. | calculation without introducing rounding errors. | |||

3.4.1. Improving the response of the skewness estimate | 3.4.1. Improving the response of the skewness estimate | |||

The weighted moving average for skew_est, based on skew_est in | The weighted moving average for skew_est, based on skew_est in | |||

Section 3.3.3, can be calculated as follows: | Section 3.1.2, can be calculated as follows: | |||

skew_est = ((M-F+1)*sum(skew_base_T(1:F)) | skew_est = ((M-F+1)*sum(skew_base_T(1:F)) | |||

+ sum([(M-F):1].*skew_base_T(F+1:M))) | + sum([(M-F):1].*skew_base_T(F+1:M))) | |||

/ ((M-F+1)*sum(numsampT(1:F)) | / ((M-F+1)*sum(numsampT(1:F)) | |||

+ sum([(M-F):1].*numsampT(F+1:M))) | + sum([(M-F):1].*numsampT(F+1:M))) | |||

where numsampT is an array of the number of OWD samples in each T (ie | where numsampT is an array of the number of OWD samples in each T | |||

num_T(OWD)), and numsampT(1) is the most recent; skew_base_T(1) is | (i.e. num_T(OWD)), and numsampT(1) is the most recent; skew_base_T(1) | |||

the most recent calculation of skew_base_T; 1:F refers to the integer | is the most recent calculation of skew_base_T; 1:F refers to the | |||

values 1 through to F, and [(M-F):1] refers to an array of the | integer values 1 through to F, and [(M-F):1] refers to an array of | |||

integer values (M-F) declining through to 1; and ".*" is the array | the integer values (M-F) declining through to 1; and ".*" is the | |||

scalar dot product operator. | array scalar dot product operator. | |||

3.4.2. Improving the response of the variance estimate | 3.4.2. Improving the response of the variability estimate | |||

The weighted moving average for var_est can be calculated as follows: | The weighted moving average for var_est can be calculated as follows: | |||

var_est = ((M-F+1)*sum(PDV(1:F)) + sum([(M-F):1].*PDV(F+1:M))) | var_est = ((M-F+1)*sum(PDV(1:F)) + sum([(M-F):1].*PDV(F+1:M))) | |||

/ (F*(M-F+1) + sum([(M-F):1]) | / (F*(M-F+1) + sum([(M-F):1]) | |||

where 1:F refers to the integer values 1 through to F, and [(M-F):1] | where 1:F refers to the integer values 1 through to F, and [(M-F):1] | |||

refers to an array of the integer values (M-F) declining through to | refers to an array of the integer values (M-F) declining through to | |||

1; and ".*" is the array scalar dot product operator. When removing | 1; and ".*" is the array scalar dot product operator. When removing | |||

oscillation noise (see Section 3.3.1) this calculation must be | oscillation noise (see Section 3.3.2) this calculation must be | |||

adjusted to allow for invalid PDV records. | adjusted to allow for invalid PDV records. | |||

4. Measuring OWD | 4. Measuring OWD | |||

This section discusses the OWD measurements required for this | This section discusses the OWD measurements required for this | |||

algorithm to detect shared bottlenecks. | algorithm to detect shared bottlenecks. | |||

The SBD mechanism described in this draft relies on differences | The SBD mechanism described in this draft relies on differences | |||

between OWD measurements to avoid the practical problems with | between OWD measurements to avoid the practical problems with | |||

measuring absolute OWD (see [Hayes-LCN14] section IIIC). Since all | measuring absolute OWD (see [Hayes-LCN14] section IIIC). Since all | |||

skipping to change at page 17, line 9 | skipping to change at page 18, line 9 | |||

Non-authenticated RTCP packets carrying shared bottleneck indications | Non-authenticated RTCP packets carrying shared bottleneck indications | |||

and summary statistics could allow attackers to alter the bottleneck | and summary statistics could allow attackers to alter the bottleneck | |||

sharing characteristics for private gain or disruption of other | sharing characteristics for private gain or disruption of other | |||

parties communication. | parties communication. | |||

8. Change history | 8. Change history | |||

Changes made to this document: | Changes made to this document: | |||

02->WG-00 : Fixed missing 0.5 in 3.3.2 and missing brace in 3.3.3 | WG-00->WG-01 : Moved unbiased skew section to replace skew | |||

estimate, more robust variability estimator, the | ||||

term variance replaced with variability, clock | ||||

drift term corrected to clock skew, revision to | ||||

clock skew section with a place holder, description | ||||

of parameters. | ||||

01->02 : New section describing improvements to the key metric | 02->WG-00 : Fixed missing 0.5 in 3.3.2 and missing brace in | |||

calculations that help to remove noise, bias, and | 3.3.3 | |||

reduce lag. Some revisions to the notation to make | ||||

it clearer. Some tightening of the thresholds. | ||||

00->01 : Revisions to terminology for clarity | 01->02 : New section describing improvements to the key | |||

metric calculations that help to remove noise, | ||||

bias, and reduce lag. Some revisions to the | ||||

notation to make it clearer. Some tightening of | ||||

the thresholds. | ||||

00->01 : Revisions to terminology for clarity | ||||

9. References | 9. References | |||

9.1. Normative References | 9.1. Normative References | |||

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||

Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||

9.2. Informative References | 9.2. Informative References | |||

End of changes. 61 change blocks. | ||||

175 lines changed or deleted | | 222 lines changed or added | ||

This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |