draft-ietf-rmcat-video-traffic-model-05.txt   draft-ietf-rmcat-video-traffic-model-06.txt 
Network Working Group X. Zhu Network Working Group X. Zhu
Internet-Draft S. Mena Internet-Draft S. Mena
Intended status: Informational Cisco Systems Intended status: Informational Cisco Systems
Expires: January 20, 2019 Z. Sarker Expires: May 7, 2019 Z. Sarker
Ericsson AB Ericsson AB
July 19, 2018 November 3, 2018
Video Traffic Models for RTP Congestion Control Evaluations Video Traffic Models for RTP Congestion Control Evaluations
draft-ietf-rmcat-video-traffic-model-05 draft-ietf-rmcat-video-traffic-model-06
Abstract Abstract
This document describes two reference video traffic models for This document describes two reference video traffic models for
evaluating RTP congestion control algorithms. The first model evaluating RTP congestion control algorithms. The first model
statistically characterizes the behavior of a live video encoder in statistically characterizes the behavior of a live video encoder in
response to changing requests on target video rate. The second model response to changing requests on target video rate. The second model
is trace-driven, and emulates the output of actual encoded video is trace-driven, and emulates the output of actual encoded video
frame sizes from a high-resolution test sequence. Both models are frame sizes from a high-resolution test sequence. Both models are
designed to strike a balance between simplicity, repeatability, and designed to strike a balance between simplicity, repeatability, and
skipping to change at page 1, line 42 skipping to change at page 1, line 42
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 20, 2019. This Internet-Draft will expire on May 7, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 24 skipping to change at page 2, line 24
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3 3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3
4. Interactions Between Synthetic Video Traffic Source and 4. Interactions Between Synthetic Video Traffic Source and
Other Components at the Sender . . . . . . . . . . . . . . . 4 Other Components at the Sender . . . . . . . . . . . . . . . 4
5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6 5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6
5.1. Time-damped response to target rate update . . . . . . . 7 5.1. Time-damped response to target rate update . . . . . . . 7
5.2. Temporary burst and oscillation during transient . . . . 8 5.2. Temporary burst and oscillation during the transient
period . . . . . . . . . . . . . . . . . . . . . . . . . 8
5.3. Output rate fluctuation at steady state . . . . . . . . . 8 5.3. Output rate fluctuation at steady state . . . . . . . . . 8
5.4. Rate range limit imposed by video content . . . . . . . . 9 5.4. Rate range limit imposed by video content . . . . . . . . 9
6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9 6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9
6.1. Choosing the video sequence and generating the traces . . 10 6.1. Choosing the video sequence and generating the traces . . 10
6.2. Using the traces in the synthetic codec . . . . . . . . . 11 6.2. Using the traces in the synthetic codec . . . . . . . . . 11
6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11 6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11
6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 13 6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 13
6.3. Varying frame rate and resolution . . . . . . . . . . . . 13 6.3. Varying frame rate and resolution . . . . . . . . . . . . 13
7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14 7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14
8. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 15
skipping to change at page 3, line 42 skipping to change at page 3, line 44
A live video encoder employs encoder rate control to meet a target A live video encoder employs encoder rate control to meet a target
rate by varying its encoding parameters, such as quantization step rate by varying its encoding parameters, such as quantization step
size, frame rate, and picture resolution, based on its estimate of size, frame rate, and picture resolution, based on its estimate of
the video content (e.g., motion and scene complexity). In practice, the video content (e.g., motion and scene complexity). In practice,
however, several factors prevent the output video rate from perfectly however, several factors prevent the output video rate from perfectly
conforming to the input target rate. conforming to the input target rate.
Due to uncertainties in the captured video scene, the output rate Due to uncertainties in the captured video scene, the output rate
typically deviates from the specified target. In the presence of a typically deviates from the specified target. In the presence of a
significant change in target rate, it sometimes takes several frames significant change in target rate, the encoder output frame sizes
before the encoder output rate converges to the new target. Finally, sometimes fluctuate for a short, transient period of time before the
while most of the frames in a live session are encoded in predictive output rate converges to the new target. Finally, while most of the
mode, the encoder can occasionally generate a large intra-coded frame frames in a live session are encoded in predictive mode, the encoder
(or a frame partially containing intra-coded blocks) in an attempt to can occasionally generate a large intra-coded frame (or a frame
recover from losses, to re-sync with the receiver, or during the partially containing intra-coded blocks) in an attempt to recover
transient period of responding to target rate or spatial resolution from losses, to re-sync with the receiver, or during the transient
changes. period of responding to target rate or spatial resolution changes.
Hence, a synthetic video source should have the following Hence, a synthetic video source should have the following
capabilities: capabilities:
o To change bitrate. This includes the ability to change framerate and/ o To change bitrate. This includes the ability to change framerate and/
or spatial resolution, or to skip frames when required. or spatial resolution, or to skip frames when required.
o To fluctuate around the target bitrate specified by the congestion o To fluctuate around the target bitrate specified by the congestion
control module. control module.
skipping to change at page 5, line 15 skipping to change at page 5, line 15
Section 6 --- follow the same set of interactions. Section 6 --- follow the same set of interactions.
The synthetic video source dynamically generates a sequence of dummy The synthetic video source dynamically generates a sequence of dummy
video frames with varying size and interval. These dummy frames are video frames with varying size and interval. These dummy frames are
processed by other modules in order to transmit the video stream over processed by other modules in order to transmit the video stream over
the network. During the lifetime of a video transmission session, the network. During the lifetime of a video transmission session,
the synthetic video source will typically be required to adapt its the synthetic video source will typically be required to adapt its
encoding bitrate, and sometimes the spatial resolution and frame encoding bitrate, and sometimes the spatial resolution and frame
rate. rate.
In our model, the synthetic video source module has a group of In this model, the synthetic video source module has a group of
incoming and outgoing interface calls that allow for interaction with incoming and outgoing interface calls that allow for interaction with
other modules. The following are some of the possible incoming other modules. The following are some of the possible incoming
interface calls --- marked as (a) in Figure 1 --- that the synthetic interface calls --- marked as (a) in Figure 1 --- that the synthetic
video traffic source may accept. The list is not exhaustive and can video traffic source may accept. The list is not exhaustive and can
be complemented by other interface calls if deemed necessary. be complemented by other interface calls if deemed necessary.
o Target rate R_v: target rate request, typically calculated by the o Target rate R_v: target rate request, typically calculated by the
congestion control module and updated dynamically over time. congestion control module and updated dynamically over time.
Depending on the congestion control algorithm in use, the update Depending on the congestion control algorithm in use, the update
requests can either be periodic (e.g., once per second), or on- requests can either be periodic (e.g., once per second), or on-
skipping to change at page 7, line 14 skipping to change at page 7, line 14
+===========+====================================+================+ +===========+====================================+================+
| Notation | Parameter Name | Example Value | | Notation | Parameter Name | Example Value |
+===========+====================================+================+ +===========+====================================+================+
| R_v | Target rate request | 1 Mbps | | R_v | Target rate request | 1 Mbps |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
| FPS | Target frame rate | 30 Hz | | FPS | Target frame rate | 30 Hz |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
| tau_v | Encoder reaction latency | 0.2 s | | tau_v | Encoder reaction latency | 0.2 s |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
| K_d | Burst duration during transient | 8 frames | | K_d | Burst duration of the transient | 8 frames |
| | period | |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
| K_B | Burst frame size during transient | 13.5 KBytes* | | K_B | Burst frame size during the | 13.5 KBytes* |
| | transient period | |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
| t0 | Reference frame interval 1/FPS | 33 ms | | t0 | Reference frame interval 1/FPS | 33 ms |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
| B0 | Reference frame size R_v/8/FPS | 4.17 KBytes | | B0 | Reference frame size R_v/8/FPS | 4.17 KBytes |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
| | Scaling parameter of the zero-mean | | | | Scaling parameter of the zero-mean | |
| | Laplacian distribution describing | | | | Laplacian distribution describing | |
| SCALE_t | deviations in normalized frame | 0.15 | | SCALE_t | deviations in normalized frame | 0.15 |
| | interval (t-t0)/t0 | | | | interval (t-t0)/t0 | |
+-----------+------------------------------------+----------------+ +-----------+------------------------------------+----------------+
skipping to change at page 8, line 7 skipping to change at page 8, line 8
5.1. Time-damped response to target rate update 5.1. Time-damped response to target rate update
While the congestion control module can update its target rate While the congestion control module can update its target rate
request R_v at any time, the statistical model dictates that the request R_v at any time, the statistical model dictates that the
encoder will only react to such changes tau_v seconds after a encoder will only react to such changes tau_v seconds after a
previous rate transition. In other words, when the encoder has previous rate transition. In other words, when the encoder has
reacted to a rate change request at time t, it will simply ignore all reacted to a rate change request at time t, it will simply ignore all
subsequent rate change requests until time t+tau_v. subsequent rate change requests until time t+tau_v.
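
The damping rule of Section 5.1 can be sketched as follows. This is a minimal illustration, not part of either draft: the class name DampedTargetRate, the method request(), and the use of bits per second are assumptions made for the example, and tau_v defaults to the example value of 0.2 s from the parameter table.

```python
class DampedTargetRate:
    """Accepts target rate requests at most once every tau_v seconds.

    Hypothetical helper: the names and units are illustrative assumptions.
    """

    def __init__(self, initial_rate_bps, tau_v=0.2):
        self.rate_bps = initial_rate_bps
        self.tau_v = tau_v            # encoder reaction latency, in seconds
        self.last_reaction = None     # time t of the last accepted request

    def request(self, now, new_rate_bps):
        """Return True if the encoder reacts to the request at time now."""
        in_damping_window = (
            self.last_reaction is not None
            and now < self.last_reaction + self.tau_v
        )
        if in_damping_window:
            # Requests arriving before t + tau_v are simply ignored.
            return False
        self.rate_bps = new_rate_bps
        self.last_reaction = now
        return True
```

A request issued 0.1 s after an accepted one is dropped, whereas a request issued 0.25 s after it is accepted.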
5.2. Temporary burst and oscillation during transient 5.2. Temporary burst and oscillation during the transient period
The output rate R_o during the period [t, t+tau_v] is considered to The output rate R_o during the period [t, t+tau_v] is considered to
be in transient. Based on observations from video encoder output be in a transient state. Based on observations from video encoder
data, the transient behavior of an encoder upon reacting to a new output data, the encoder reaction to a new target rate request can be
target rate request is modelled in the form of high variation in characterized by high variation in output frame sizes. It is assumed
output frame sizes. It is assumed that the overall average output in the model that the overall average output rate R_o during this
rate R_o during this period matches the target rate R_v. transient period matches the target rate R_v. Consequently, the
Consequently, the occasional burst of large frames is followed by occasional burst of large frames is followed by smaller-than-average
smaller-than-average encoded frames. encoded frames.
This temporary burst is characterized by two parameters: This temporary burst is characterized by two parameters:
o burst duration K_d: number of frames in the burst event; and o burst duration K_d: number of frames in the burst event; and
o burst frame size K_B: size of the initial burst frame which is o burst frame size K_B: size of the initial burst frame which is
typically significantly larger than average frame size at steady typically significantly larger than average frame size at steady
state. state.
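
One possible way to turn the two burst parameters into a frame size sequence is sketched below. The text only requires that the average output rate over the transient matches R_v. Splitting the remaining byte budget equally among the K_d-1 frames that follow the burst frame, and averaging over exactly K_d frames, are assumptions made for this illustration. The function name transient_frame_sizes is hypothetical.

```python
def transient_frame_sizes(r_v_bps, fps, k_d, k_b_bytes):
    """Return K_d frame sizes in bytes: one burst frame, then smaller frames.

    Assumption (not stated in the draft): the K_d - 1 frames after the
    burst share the remaining budget equally, so the mean equals B0.
    """
    if k_d < 2:
        raise ValueError("K_d must cover the burst frame and at least one more")
    b0 = r_v_bps / 8 / fps                 # reference frame size R_v/8/FPS
    remaining = k_d * b0 - k_b_bytes       # bytes left after the burst frame
    if remaining <= 0:
        raise ValueError("K_B is too large for the given K_d and target rate")
    small = remaining / (k_d - 1)
    return [k_b_bytes] + [small] * (k_d - 1)
```

With the example values from the parameter table (R_v = 1 Mbps, FPS = 30 Hz, K_d = 8, K_B = 13.5 KBytes taken as 13500 bytes), each of the seven follow-up frames is smaller than the reference frame size B0.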
It can be noted that these burst parameters can also be used to mimic It can be noted that these burst parameters can also be used to mimic
skipping to change at page 10, line 24 skipping to change at page 10, line 24
are representative of the target use cases for the video traffic are representative of the target use cases for the video traffic
model. For the example use case of interactive video conferencing, model. For the example use case of interactive video conferencing,
it is recommended to choose a low-motion sequence that resembles a it is recommended to choose a low-motion sequence that resembles a
"talking head", e.g. from a news broadcast or recording of an actual "talking head", e.g. from a news broadcast or recording of an actual
video conferencing call. video conferencing call.
The length of the chosen video sequence is a tradeoff. If it is too The length of the chosen video sequence is a tradeoff. If it is too
long, it will be difficult to manage the data structures containing long, it will be difficult to manage the data structures containing
the traces. If it is too short, there will be an obvious periodic the traces. If it is too short, there will be an obvious periodic
pattern in the output frame sizes, leading to biased results when pattern in the output frame sizes, leading to biased results when
evaluating congestion control performance. In our experience, a evaluating congestion control performance. It has been empirically
sequence with a length between 2 and 4 minutes is a fair tradeoff. determined that a sequence with a length between 2 and 4 minutes
strikes a fair tradeoff.
Given the chosen raw video sequence, denoted S, one can use a live Given the chosen raw video sequence, denoted S, one can use a live
encoder, e.g. some implementation of [H264] or [HEVC], to produce a encoder, e.g. some implementation of [H264] or [HEVC], to produce a
set of encoded sequences. As discussed in Section 3, the output set of encoded sequences. As discussed in Section 3, the output
bitrate of the live encoder can be achieved by tuning three input bitrate of the live encoder can be achieved by tuning three input
parameters: quantization step size, frame rate, and picture parameters: quantization step size, frame rate, and picture
resolution. In order to simplify the choice of these parameters for resolution. In order to simplify the choice of these parameters for
a given target rate, one can typically assume a fixed frame rate a given target rate, one can typically assume a fixed frame rate
(e.g. 30 fps) and a fixed resolution (e.g., 720p) when configuring (e.g. 30 fps) and a fixed resolution (e.g., 720p) when configuring
the live encoder. See Section 6.3 for a discussion on how to relax the live encoder. See Section 6.3 for a discussion on how to relax
skipping to change at page 13, line 14 skipping to change at page 13, line 14
factor = R_v / R_min factor = R_v / R_min
framesize = max(1, factor * Traces[R_min][t_current]) framesize = max(1, factor * Traces[R_min][t_current])
c) R_v >= R_max: the output frame size is calculated by scaling with c) R_v >= R_max: the output frame size is calculated by scaling with
respect to the highest bitrate R_max: respect to the highest bitrate R_max:
factor = R_v / R_max factor = R_v / R_max
framesize = factor * Traces[R_max][t_current] framesize = factor * Traces[R_max][t_current]
In case b), we set the minimum output size to 1 byte, since the value In case b), the minimum output size is set to 1 byte, since the value
of factor can be arbitrarily close to 0. of factor can be arbitrarily close to 0.
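
Cases b) and c) above can be sketched as follows. The layout of the trace table (a dict keyed by encoded bitrate, holding one list of frame sizes in bytes per bitrate) and the function name scaled_frame_size are assumptions for this illustration. The condition R_v < R_min for case b) is inferred from the scaling formula, and the interpolation case for rates between R_min and R_max is not shown in this excerpt, so it is left unimplemented here.

```python
def scaled_frame_size(traces, r_v, t_current):
    """Scale a trace frame size when R_v falls outside [R_min, R_max].

    traces: hypothetical dict mapping bitrate -> list of frame sizes (bytes).
    """
    r_min = min(traces)
    r_max = max(traces)
    if r_v < r_min:
        # Case b): scale down from the lowest-rate trace, at least 1 byte.
        factor = r_v / r_min
        return max(1, factor * traces[r_min][t_current])
    if r_v >= r_max:
        # Case c): scale up from the highest-rate trace.
        factor = r_v / r_max
        return factor * traces[r_max][t_current]
    # Rates inside the trace range are handled by a case not shown here.
    raise NotImplementedError("in-range target rates are not covered")
```

The 1-byte floor only takes effect when the target rate is a very small fraction of R_min.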
6.2.2. Notes to the main algorithm 6.2.2. Notes to the main algorithm
Note that the main algorithm as described above can be further extended Note that the main algorithm as described above can be further extended
to mimic some additional typical behaviors of a live video encoder. to mimic some additional typical behaviors of a live video encoder.
Two examples are given below: Two examples are given below:
o I-frames on demand: The synthetic codec can be extended to o I-frames on demand: The synthetic codec can be extended to
simulate the sending of I-frames on demand, e.g., as a reaction to simulate the sending of I-frames on demand, e.g., as a reaction to
skipping to change at page 14, line 42 skipping to change at page 14, line 42
whereas it is straightforward for a trace-driven model to obtain whereas it is straightforward for a trace-driven model to obtain
encoded frame size data. On the other hand, once validated, the encoded frame size data. On the other hand, once validated, the
statistical model is more flexible in mimicking a wide range of statistical model is more flexible in mimicking a wide range of
encoder/content behaviors by simply varying the corresponding encoder/content behaviors by simply varying the corresponding
parameters in the model. In this regard, a trace-driven model relies parameters in the model. In this regard, a trace-driven model relies
-- by definition -- on additional data collection efforts for -- by definition -- on additional data collection efforts for
accommodating new codecs or video contents. accommodating new codecs or video contents.
In general, the trace-driven model is more realistic for mimicking In general, the trace-driven model is more realistic for mimicking
ongoing, steady-state behavior of a video traffic source whereas the ongoing, steady-state behavior of a video traffic source whereas the
statistical model is more versatile for simulating transient events statistical model is more versatile for simulating its transient-
(e.g., when target rate changes from A to B with temporary bursts state behavior such as a sudden rate change. It is also possible to
during the transition). It is also possible to combine both models combine both methods into a hybrid model, so that the steady-state
into a hybrid approach, using traces during steady-state and behavior is driven by traces during steady-state and the transient-
statistical model during transients. state behavior is driven by the statistical model.
+---------------+ transient +---------------+
transient | Generate next | state | Generate next |
+------>| K_d transient | +------>| K_d transient |
+-------------+ / | frames | +-------------+ / | frames |
R_v | Compare | / +---------------+ R_v | Compare | / +---------------+
------->| against |/ ------->| against |/
| previous | | previous |
| target rate |\ | target rate |\
+-------------+ \ +---------------+ +-------------+ \ +---------------+
\ | Generate next | \ | Generate next |
+------>| frame from | +------>| frame from |
steady-state | trace | steady | trace |
+---------------+ state +---------------+
Figure 3: Hybrid approach for modeling video traffic Figure 3: A hybrid video traffic model
As shown in Figure 3, the video traffic model operates in transient As shown in Figure 3, the video traffic model operates in transient
state if the requested target rate R_v is substantially higher than state if the requested target rate R_v is substantially higher than
the previous target, or else it operates in steady state. During the previous target, or else it operates in steady state. During the
transient state, a total of K_d frames are generated by the transient state, a total of K_d frames are generated by the
statistical model, resulting in one (1) big burst frame with size K_B statistical model, resulting in one (1) big burst frame with size K_B
followed by K_d-1 smaller frames. When operating at steady-state, followed by K_d-1 smaller frames. When operating at steady-state,
the video traffic model simply generates a frame according to the the video traffic model simply generates a frame according to the
trace-driven model given the target rate, while modulating the frame trace-driven model given the target rate, while modulating the frame
interval according to the distribution specified by the statistical interval according to the distribution specified by the statistical
model. One example criterion for determining whether the traffic model. One example criterion for determining whether the traffic
model should operate in transient state is whether the rate increase model should operate in transient state is whether the rate increase
exceeds 10% of previous target rate. Finally, as this model follows exceeds 10% of previous target rate. Finally, as this model follows
transient state behavior dictated by the statistical model, upon a transient state behavior dictated by the statistical model, upon a
 End of changes. 18 change blocks. 
38 lines changed or deleted 42 lines changed or added
