Audio/Video Transport Working Group                              G. Hunt
                                                                P. Arden
Internet-Draft                                                        BT
Intended status: Informational                              July 7, 2008
Expires: January 8, 2009
Monitoring Architectures for RTP
draft-hunt-avt-monarch-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 8, 2009.
Hunt & Arden Expires January 8, 2009 [Page 1]
Internet-Draft RTP Monitoring Architectures July 2008
Abstract
This memo is intended to stimulate discussion on a hierarchical
monitoring architecture for RTP, including a scheme for the
definition of lower-layer metrics which are usable by a range of
applications. Systematic investigation of a monitoring architecture
for RTP/RTCP was requested at the IETF71 (Philadelphia) AVT session.
This first version of the draft is restricted to transport metrics
and to a subset of audio application metrics, but it is envisaged
that future work should extend this to other applications,
principally video.
Table of Contents
1. Requirements notation
2. Introduction
3. Transport layer metrics
  3.1. Option 1 - Monitoring every packet
  3.2. Option 2 - Real-time histogram methods
  3.3. Option 3 - Monitoring by exception
  3.4. Option 4 - Application-specific monitoring
4. RTP terminal metrics
5. Application layer metrics
  5.1. Requirements for speech quality monitoring metrics
  5.2. The audio hierarchy
  5.3. Individual network transport and terminal parameters
       affecting speech quality
  5.4. Composite objective speech quality metrics
6. Choosing transport protocols for metrics
  6.1. RTCP as a transport for metrics - advantages and
       disadvantages
    6.1.1. Advantages of RTCP
    6.1.2. Disadvantages of RTCP
7. IANA Considerations
8. Security Considerations
9. Acknowledgments
10. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Requirements notation
This memo is informative and as such contains no normative
requirements.
2. Introduction
The development of multiple metrics for transport and application
quality monitoring has been identified as a potential problem for
RTP/RTCP interoperability. The AVT group has requested work on an
architectural framework for monitoring which recognises that
different applications layered on RTP may have some monitoring
requirements in common, which should be satisfied by a common design.
When this work was initiated, the objective was to design a framework
and a small number of re-usable metrics at each appropriate layer to
reduce implementation costs and to maximise inter-operability. Since
then, work-in-progress on [GUIDELINES] has stated that RTCP should be
used primarily to provide information to peer RTP systems, whilst
information used for network management should be carried by out-of-
band protocols. By implication, AVT should not work on metrics or
their transport in RTCP unless they are motivated by RTP-system-to-
RTP-system requirements. However, metrics supporting network and
service management are still required for RTP and the applications
transported over it, to support many significant real-world
deployments.
Service providers may wish to answer some or all of the following:
o is a user experiencing a problem?
o what is the nature of the problem?
o how severe is the problem?
o what is the location of the problem?
Metrics of transport performance and application performance,
considered either on an isolated per-session basis or as a collection
of metrics for multiple sessions using a common network component,
can answer or contribute to answers to some or all of these
questions.
One example which might lead to a shared metric arises from a shared
requirement for monitoring of packet transport, which might be useful
for every media type (audio, video, text, messaging) carried over
RTP.
Another example is the set of applications all of which transmit
audio, including streaming audio speech, streaming music, two-party
conversational speech, and audio conferencing. This set of
applications might be able to share a suitably defined set of audio
metrics, e.g. for parameters such as noise floor, mean level, or
amplitude clipping. The subset of interactive speech applications
may be able to use common additional metrics related to interactivity
(e.g. media delay and echo) which are not applicable to all audio
applications. Some or all of these audio metrics may be applicable
to the audio channel(s) of a video application, such as IP TV or
conversational video.
[Editor's note: need to add a video-based view and examples]
Metrics of RTP transport performance usually relate to single packet
network segments, whilst metrics of application performance are more
likely to represent the end-to-end connection which may include
transmission over non-packet networks and/or over multiple packet
networks. Access to, and integration of, multiple sets of packet
transport metrics relevant to a single connection typically present
difficulties in current networks.
Metrics are typically measured in an RTP system but may be required
at another RTP system or at a non-RTP system. Hence transport of
metrics is often required. Metrics might be transported alongside
RTP media using the extensibility mechanism defined in [RFC3611] but
this is not an input requirement. Other methods may be used if RTCP
XR blocks are not suitable or another method offers significant
technical advantages. Following the work-in-progress in [GUIDELINES]
which restricts the usage of RTCP, the method for transporting
metrics need not be RTCP and should be chosen independently of the
metrics themselves. If the transport is not by RTCP, it is likely
that multiple transport mechanisms should be permitted, and probably
should not be restricted by AVT.
IETF and other SDOs have already defined transport metrics, and
there is a wide choice of potentially useful metrics. Some metrics
may embed arbitrary design choices, or be application-specific. It
is a goal of this work to find generic and re-usable metrics. This
may result in a preference for some of the existing metrics over
others, or in the definition of alternative metrics meeting the
architectural goals of this work.
For metrics at layers higher than transport, metrics are developed by
a variety of external SDOs, e.g. by ITU-T for voice telephony
applications.
The development of application metrics is an active field. Any
framework should be extensible to accommodate useful innovations when
there is a consensus for their adoption.
It is obviously desirable to achieve some consensus (the more, the
better) on a set of useful metrics (the fewer, the better) which may
be widely implemented, widely inter-operated, and widely understood.
Large data sets of raw measurements must be condensed into a smaller
set of metrics or statistics before any agent (human or machine) can
make decisions based on them. It has been suggested that AVT might
remain "metric-neutral" by storing and transporting raw measurement
data, rather than the condensed metrics (see Option 1 below). Even
if data volumes are sufficiently small to make this feasible, some
layer must perform the condensation and hence commit to specific
metrics.
A four-step process is suggested. The AVT community may wish to
contribute to some of these steps.
1. Choose a set of metrics which is useful for each application.
2. Classify each member of the sets of metrics according to the
architectural layer which it monitors, creating sets of
per-application, per-layer metrics.
3. Define a set of required metrics at each layer as the union of
the application-specific sets in each layer. This should include
the selection of only one from any group of metrics with
overlapping or nearly-overlapping capabilities, leading to agreed
sets of per-layer metrics. All of these metrics should be
available within the architecture, but each application may
select a subset which meets its needs. Most RTP end systems and
RTP mixers implement only a subset of possible RTP applications,
and clearly these devices need not implement any metric which is
relevant only to applications which they do not support.
4. Choose one or more transport protocols for those cases where
metrics are measured at one location but must be available at
another, e.g. to cause a reaction in an RTP system's peer, or for
network or service management purposes.
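Steps 2 and 3 above can be sketched as simple set operations. The
following Python sketch is illustrative only: the application names,
metric names and layer labels are hypothetical placeholders, not a
proposed metric set, and the selection of one metric from each group
of overlapping metrics (part of step 3) is not modelled.

```python
from collections import defaultdict

# Hypothetical output of steps 1 and 2: the metrics judged useful for each
# application, each tagged with the architectural layer it monitors.
APP_METRICS = {
    "voip": {("packet_loss", "transport"),
             ("delay_variation", "transport"),
             ("late_packets", "terminal"),
             ("echo_return_loss", "application")},
    "streaming_audio": {("packet_loss", "transport"),
                        ("late_packets", "terminal"),
                        ("noise_level", "application")},
}


def per_layer_union(app_metrics):
    """Step 3: union of the application-specific sets in each layer."""
    layers = defaultdict(set)
    for metrics in app_metrics.values():
        for name, layer in metrics:
            layers[layer].add(name)
    return dict(layers)


def subset_for(app, app_metrics):
    """Per-layer metrics needed by a device which supports only `app`."""
    layers = defaultdict(set)
    for name, layer in app_metrics[app]:
        layers[layer].add(name)
    return dict(layers)


if __name__ == "__main__":
    for layer, names in sorted(per_layer_union(APP_METRICS).items()):
        print(layer, sorted(names))
```

A device which implements only one application would then need only
the result of subset_for() for that application, in line with the
last sentence of step 3.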
The fourth step seems at first sight to be of secondary
importance ("We've chosen our metrics, now all we have to do is to
transport them") but the choice of transport protocols may be tightly
constrained, for example because the measuring point has limited
performance and/or limited access bandwidth and/or is in a different
trust domain.
Section 3 describes some options for metrics of transport
performance. This includes an initial quantitative investigation of
the feasibility of becoming "metric-neutral" by sending raw
measurement data rather than condensed metrics.
Section 5 starts the process of describing requirements for
application-layer monitoring and the metrics frameworks available to
meet them. In this first version of the draft, the description is
limited to interactive speech and takes most of its material from the
work of ITU-T.
Section 6 discusses the choice of transport protocols, including
discussion of the merits of RTCP which remains a candidate protocol.
3. Transport layer metrics
The objective is to provide a set of metrics which characterise the
three transport impairments of packet loss, packet delay, and packet
delay variation. These metrics should be usable by any application
which uses RTP transport.
3.1. Option 1 - Monitoring every packet
Most transport metrics, almost by definition, condense a large amount
of information about packet arrivals into a small number of
statistics. Usually, the aim of the statistics is to present key
features of any transport impairments in ways which are readily
understood by the operators of the network, with the minimum of
distracting additional information. Unfortunately there are multiple
ways to condense data about packet arrivals, and the "key features"
(those impairments which result in degraded application performance)
are likely to be application-dependent. Given this, it is not
surprising that there are no known provably optimal metrics for the
three transport impairments. There are instead multiple heuristic
metrics.
The aim of "monitoring every packet" is to ensure that the
information reported is not dependent on the application. In this
scheme, RTP systems will report arrival data for each individual RTP
packet. RTP (or other) systems receiving this "raw" data may use it
to calculate any preferred heuristic metrics, but such calculations
and the reporting of the results (e.g. to a session control layer or
a management layer) are outside the scope of RTP and RTCP.
Run-length encoding (RLE) is a well-known technique for compressing
per-packet information about packet loss. The efficiency of RLE
compression falls as the packet loss fraction increases, so the
volume of metrics data becomes unpredictable.
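The dependence of RLE output size on the loss pattern can be
illustrated with a minimal sketch. This is a generic run-length
coder over a received/lost sequence, not the wire format of any
existing report block; the function names and the loss patterns are
assumptions chosen for illustration.

```python
from itertools import groupby


def rle_encode(received):
    """Run-length encode a per-packet received/lost sequence.

    `received` is an iterable of booleans (True = packet arrived).
    Returns a list of (flag, run_length) pairs.
    """
    return [(flag, sum(1 for _ in run)) for flag, run in groupby(received)]


def rle_decode(runs):
    """Expand (flag, run_length) pairs back to the per-packet sequence."""
    out = []
    for flag, length in runs:
        out.extend([flag] * length)
    return out


def pattern(n_packets, lost):
    """Build a received/lost sequence with losses at the given indices."""
    lost = set(lost)
    return [i not in lost for i in range(n_packets)]


if __name__ == "__main__":
    # 250 packets, i.e. one 5 s reporting cycle at 20 ms packetisation.
    for lost in ([], [100], range(10, 250, 25), range(5, 250, 5)):
        runs = rle_encode(pattern(250, lost))
        print(len(list(lost)), "lost ->", len(runs), "runs")
```

With no loss the whole cycle is a single run, whereas isolated
losses each add two further runs, so the report size depends on the
loss pattern rather than on the number of packets alone.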
If packet round-trip delay is measured using the technique described
in [RFC3550] section 6.4.1 and [RFC3550] Figure 2, the rate of
measurement is low (at most one measurement per RTCP measurement
cycle) and the volume of data involved in reporting the result is
insignificant.
There are no obvious techniques for substantial compression of data
related to the arrival times of individual packets, but such data is
needed to compute packet delay variation. Hence it appears that an
item of data must be sent per packet, if packet delay variation is to
be calculated from "raw" data.
The following calculation estimates the volume of data needed to send
per-packet data, assuming a simple logarithmic scheme to code the
delay variation.
Consider the raw delay variation metric D(1,j) using the notation of
[RFC3550] section 6.4.1. If delay variation, relative to that of the
first packet of the connection, is measured in RTP timestamp units,
delay could be coded on a compressed "logarithmic" scale similar to
G.711 A-law, which can code with a resolution of 1 unit on the
uncompressed chord, and resolutions 2, 4, 8, 16, 32, 64 on each
successive more compressed chord to give a range of +/- 2048. This
would correspond to +/- 2048/8000s ~ +/- 250ms for 8kHz sampled
speech (enough to cover jitter), whilst using 1 byte per packet.
Modifications would be needed for other sampling rates. It might be
necessary to standardise a timing unit resolution independent of the
sampling clock. Specific reserved values could be used to indicate
that an expected packet did not arrive.
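A minimal sketch of such a one-byte code is given below. The bit
layout (sign bit, 3-bit chord, 4-bit step), the clamping of
out-of-range values, and the use of the otherwise redundant
"negative zero" code to mark a lost packet are all assumptions made
for this illustration; they are not defined by this memo or by G.711.

```python
LOST = 0x80     # assumed reserved code ("negative zero") for a lost packet
MAX_MAG = 2047  # largest codable magnitude, in RTP timestamp units


def encode_dv(d):
    """Code delay variation d (timestamp units, None if lost) in one byte.

    Assumed layout: sign bit, 3-bit chord, 4-bit step. Chords 0 and 1
    together form the linear region (resolution 1, magnitudes 0..31);
    chords 2..7 have resolutions 2, 4, 8, 16, 32 and 64.
    """
    if d is None:
        return LOST
    sign = 0x80 if d < 0 else 0x00
    m = min(abs(d), MAX_MAG)  # clamp out-of-range values
    if m < 32:
        return sign | m
    chord = m.bit_length() - 4  # 32..63 -> 2, ..., 1024..2047 -> 7
    step = (m >> (chord - 1)) & 0x0F
    return sign | (chord << 4) | step


def decode_dv(code):
    """Inverse of encode_dv: None for LOST, else a value near the bin centre."""
    if code == LOST:
        return None
    chord, step = (code >> 4) & 0x07, code & 0x0F
    if chord < 2:
        m = code & 0x7F
    else:
        shift = chord - 1
        m = ((16 + step) << shift) + (1 << (shift - 1))
    return -m if code & 0x80 else m
```

In this sketch values of magnitude below 32 units are recovered
exactly, and the reconstruction error grows to at most 32 units on
the most compressed chord.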
To estimate data volume, consider a low-bandwidth codec like G.729
with 20ms packetisation. Over a 5s RTCP cycle there will be 250
media packets and 102 bytes/packet (20ms G.729 in RTP/UDP/IP/Ethernet
including preamble and Inter-Frame Gap) for a total media layer-2
bandwidth of 25500 bytes/5s (about 40kbit/s). 1 byte per received
packet is 250 bytes "raw data" and an overhead of 82 bytes (RTP/UDP/
IP/Ethernet, same basis) - say 350 bytes total including some
identification (SSRCs etc). This is a fraction 350/25500~1.4% which
is within RTP guidelines for RTCP bandwidth. The corresponding
calculation for G.711 with 10ms packetisation is 81000 bytes/5s media
and a 600-byte "raw transport report" or 0.75%.
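The arithmetic above can be reproduced with the short sketch below.
The payload sizes (20 bytes for 20ms of G.729, 80 bytes for 10ms of
G.711) follow from the codec bit rates; the 18-byte allowance for
identification is an assumption chosen so that the totals match the
rounded figures in the text.

```python
def raw_report_overhead(payload_bytes, packet_ms, header_bytes=82,
                        ident_bytes=18, cycle_s=5):
    """Return (media_bytes, report_bytes, fraction) for one RTCP cycle.

    header_bytes is the RTP/UDP/IP/Ethernet overhead quoted in the text.
    ident_bytes (SSRCs etc.) is an assumed allowance, not a defined size.
    """
    packets = cycle_s * 1000 // packet_ms
    media = packets * (payload_bytes + header_bytes)
    report = packets * 1 + header_bytes + ident_bytes  # 1 byte per packet
    return media, report, report / media


if __name__ == "__main__":
    for name, payload, ms in (("G.729, 20ms", 20, 20), ("G.711, 10ms", 80, 10)):
        media, report, fraction = raw_report_overhead(payload, ms)
        print(f"{name}: media {media} bytes, report {report} bytes, "
              f"{100 * fraction:.2f}%")
```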
However, the use of D(i,j) [RFC3550] for estimation of packet delay
variation relies on a fixed relationship in the source RTP system
between the RTP timestamp and the transmission time of the packet
onto the wire. This fixed relationship is not guaranteed even for
audio coding and is almost certainly significantly wrong for many
video formats, where the RTP timestamp indicates the sampling instant
of a frame which may be encoded into multiple packets sent at
significantly different times throughout a frame interval. It could
be argued that the current RTP framework provides no means for
reliable estimation of packet delay variation in general, despite the
usefulness of the D(i,j) metric for simple audio streams. This could
lead to a conclusion that an RTP-based measure of packet delay
variation is not re-usable across RTP applications other than simple
VoIP codecs.
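The effect can be illustrated numerically. The sketch below computes
D(1,j) as defined in [RFC3550] section 6.4.1 for two synthetic
streams which both experience a constant network delay, so the
network adds no delay variation at all. The clock rate, frame
interval and packet spacing are assumptions chosen for illustration.

```python
def delay_variation(first, other):
    """D(i,j) of RFC 3550 section 6.4.1 for packets given as (S, R) pairs.

    S is the RTP timestamp and R the arrival time in the same units.
    """
    (s_i, r_i), (s_j, r_j) = first, other
    return (r_j - r_i) - (s_j - s_i)


def receive(packets, network_delay):
    """Attach arrival times: R = send time + a constant network delay."""
    return [(ts, sent + network_delay) for ts, sent in packets]


CLOCK = 90000        # assumed RTP clock rate, Hz
DELAY = CLOCK // 20  # constant 50 ms network delay, i.e. no network jitter

# One packet per timestamp, each sent at its sampling instant.
one_per_timestamp = [(n * 1800, n * 1800) for n in range(4)]

# Each frame (3000 ticks apart) is split into three packets which share
# one timestamp but are sent 0, 10 and 20 ms into the frame interval.
multi_packet_frames = [(f * 3000, f * 3000 + k * 900)
                       for f in range(2) for k in range(3)]

for name, stream in (("one packet per timestamp", one_per_timestamp),
                     ("multi-packet frames", multi_packet_frames)):
    rx = receive(stream, DELAY)
    print(name, [delay_variation(rx[0], p) for p in rx])
```

For the first stream every D(1,j) is zero. For the second, D(1,j)
reaches 1800 ticks (20 ms) purely because of the sender's packet
scheduling, which a receiver cannot distinguish from network delay
variation.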
Logically, digital signal processors (DSPs) would be used to
calculate metrics, including the per-packet data described above.
Current advice is that an additional overhead of 600 bytes per
channel is needed to store measurement results before periodic
transmission, so supporting this option will increase the per-channel
memory requirements on infrastructure devices. Memory in currently
deployed infrastructure gateways is sized for optimum performance,
cost and power, so adding this measurement function would reduce
channel density, which ultimately impacts cost and power. Including
additional memory in future designs has the same cost and power
impacts.
The principle that RTP systems should send per-packet reception
report data, and correspondingly that the RTP (or other) system
receiving this report data should calculate the metrics of its choice
from this data, results in a requirement for computation both at the
RTP system which sends the per-packet report and at the RTP (or
other) system which receives the report. If DSPs are used to perform
this computation in the system which receives the report, there is a
further demand on the memory of the DSP devices involved. If
general-purpose computing devices are used, then the cost of these
devices may be significant. For example, for a 16000 channel trunk
media gateway implementing the scheme above and using 10ms
packetisation, the gateway must code or decode a total of 3200000
bytes of data per second.
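One reading which reproduces this figure, stated here as an
assumption rather than taken from the text, is one byte of
per-packet data for every packet in each of the two directions of
every channel. The function name and defaults below are
hypothetical.

```python
def gateway_raw_bytes_per_second(channels, packet_ms, bytes_per_packet=1,
                                 directions=2):
    """Per-packet report data handled by a gateway each second.

    Assumes one report byte per media packet and that the gateway both
    generates (codes) and consumes (decodes) reports for every channel.
    """
    packets_per_second = 1000 // packet_ms
    return channels * packets_per_second * bytes_per_packet * directions


if __name__ == "__main__":
    print(gateway_raw_bytes_per_second(16000, 10))
```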
Note that this general method of supplying raw data from the RTP
system is the only one which gives the system which receives the data
the flexibility to calculate any chosen transport metric for upward
reporting. All other methods below either omit or condense data,
such that the RTP (or other) system receiving the report is informed
only about certain aspects of the transport performance which was
measured at the remote RTP system. However, the method does not
report on the impairment to the far-end application which the
impairment to outgoing transport caused. For example, it provides no
information about far-end jitter buffer events or late packets deemed
lost by the application. This is considered further in Section 4
below.
3.2. Option 2 - Real-time histogram methods
There are several potentially useful metrics which rely on the
accumulation of a histogram in real time, so that a packet arrival
results in a counter being incremented rather than in the creation of
a new data item. These metrics may be gathered with a low and
predictable storage requirement. Each counter corresponds to a
single class interval or "bin" of the histogram. Examples of metrics
which may be accumulated in this way include the observed
distribution of packet delay variation, and the number of packets
lost per unit time interval.
Different networks may have very different expected and achieved
levels of performance, but it may be useful to fix the number of
class intervals in the reported histogram to give a predictable
volume of data. This can be achieved by starting with small class
intervals ("bin widths") and automatically increasing the width (e.g.
by factors of two) if outliers are seen beyond the current upper
limit of the histogram. Data already accumulated may be assigned
unambiguously to the new set of bins, given some simple conditions on
the relationship between the old and new origins and bin widths.
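A minimal sketch of such a histogram follows. It assumes
non-negative samples, an origin fixed at zero, an even number of
bins and widening by a factor of two; under these conditions each
pair of adjacent old bins maps exactly onto one new bin. The class
and method names are illustrative.

```python
class AdaptiveHistogram:
    """Fixed number of bins over [0, n_bins * width); width doubles on overflow."""

    def __init__(self, n_bins=16, width=1):
        if n_bins % 2:
            raise ValueError("n_bins must be even so that bins merge in pairs")
        self.width = width
        self.counts = [0] * n_bins

    def _double_width(self):
        # Old bins 2k and 2k+1 map exactly onto new bin k because the origin
        # stays at 0 and the new width is twice the old one.
        n = len(self.counts)
        merged = [self.counts[2 * k] + self.counts[2 * k + 1]
                  for k in range(n // 2)]
        self.counts = merged + [0] * (n - n // 2)
        self.width *= 2

    def add(self, value):
        """Count one non-negative sample, widening bins if it is an outlier."""
        if value < 0:
            raise ValueError("this sketch handles non-negative samples only")
        while value >= self.width * len(self.counts):
            self._double_width()
        self.counts[int(value // self.width)] += 1
```

Storage is fixed at n_bins counters plus the current width,
whatever the number of packets observed, and no sample already
counted is lost when the bins are widened.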
A significant disadvantage of the histogram method is the loss of any
information about time-domain correlations between the samples which
build the histogram. For example, a histogram of packet delay
variation provides no indication of whether successive samples of
packet delay variation were uncorrelated, or alternatively that the
packet delay variation showed a highly-correlated low-frequency
wander.
3.3. Option 3 - Monitoring by exception
An entity which both monitors the packet stream, and has sufficient
knowledge of the application to know when transport impairments may
have degraded the application's performance, may choose to send
exception reports containing details of the transport impairments to
a receiving system. The crossing of a transport impairment
threshold, or some application-layer event, would trigger such
reports. RTP end systems and mixers are likely to contain
application implementations which may, in principle, identify this
type of exception.
It is likely that RTP translators will not contain suitable
implementations which could identify such exceptions.
On-path devices such as routers and switches are not likely to be
aware of RTP at all. Even if they are aware of RTP, they are
unlikely to be aware of the RTP-level performance required by
specific applications, and hence they are unlikely to be able to
identify the level of impairment at which exceptional transport
conditions may start to affect application performance.
This type of monitoring typically requires the storage of recent data
in a FIFO (e.g. a circular buffer) so that data relevant to the
period just before and just after the exception may be reported. It
is not usually helpful to report transport data only from the period
following an exception event detected by an application. This
imposes some storage requirement (though less than needed for Option
1). It also implies the existence of additional cross-layer
primitives or APIs to trigger the transport layer to generate and
send its exception report. Such a capability might be considered
architecturally undesirable, in that it complicates one or more
interfaces above the RTP layer.
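The storage and cross-layer trigger described above can be sketched
as follows. The class, its method names and the report structure
are hypothetical; trigger() stands for whatever cross-layer
primitive an implementation might provide.

```python
from collections import deque


class ExceptionMonitor:
    """Keep recent samples; on a trigger, report samples before and after."""

    def __init__(self, before=50, after=50):
        self.history = deque(maxlen=before)  # circular buffer of recent data
        self.after = after
        self.pending = None  # report still being filled after a trigger
        self.reports = []    # completed exception reports

    def record(self, sample):
        """Store one per-packet sample (e.g. a delay variation value)."""
        if self.pending is not None:
            self.pending["after"].append(sample)
            if len(self.pending["after"]) >= self.after:
                self.reports.append(self.pending)
                self.pending = None
        self.history.append(sample)

    def trigger(self, reason):
        """Called across layers, e.g. by the application, on an exception."""
        if self.pending is None:  # ignore triggers while a report is open
            self.pending = {"reason": reason,
                            "before": list(self.history),
                            "after": []}
```

The storage requirement is bounded by the "before" and "after"
window sizes, which is why it is smaller than that of Option 1 for
any reasonable choice of window.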
3.4. Option 4 - Application-specific monitoring
This is a business-as-usual option which suggests that the current
approach should not be changed, based on the idea that previous
application-specific approaches such as that of [RFC3611] were valid.
If a large category of RTP applications (such as VoIP) has a
requirement for a unique set of transport metrics, arising from its
different requirements of the transport, then it seems reasonable for
each application category to define its preferred set of metrics to
describe transport impairments. We expect that there will be few
such categories, probably fewer than 10.
It may be easier to achieve interworking for a well-defined set of
application-specific metrics than it would be in the case that
applications select a profile from a palette of many independent re-
usable metrics.
4. RTP terminal metrics
By "RTP terminal metrics" we mean metrics relating to the way a
terminal deals with transport impairments affecting the incident RTP
stream. These may include de-jitter buffering, packet loss
concealment, and the use of redundant streams (if any) for correction
of error or loss.
An example of such a metric is a count of packets arriving too late
to be played out at current de-jitter buffer settings.
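Such a count could be obtained as in the sketch below, which
assumes a fixed (non-adaptive) de-jitter buffer whose playout
schedule is anchored to the arrival of the first packet. The
function name and the timing model are illustrative assumptions.

```python
def count_late_packets(packets, playout_delay):
    """Count packets arriving after their scheduled playout time.

    packets: iterable of (media_time, arrival_time) pairs in the same
    units, where media_time is the packet's nominal position in the stream.
    playout_delay: fixed de-jitter delay applied to the first packet.
    """
    packets = list(packets)
    if not packets:
        return 0
    first_media, first_arrival = packets[0]
    # Offset which maps a media time onto its scheduled playout time.
    base = first_arrival + playout_delay - first_media
    return sum(1 for media, arrival in packets if arrival > base + media)
```

Re-running the count with a larger playout_delay shows how many of
the late packets a deeper buffer would have recovered, at the cost
of additional delay.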
5. Application layer metrics
5.1. Requirements for speech quality monitoring metrics
RTP transport can be used for different application types such as IP
(including public internet) and non-IP. It can also apply to
different user group sizes running over networks ranging in size from
a small closed user group through an enterprise system to national
and international networks. Engineering judgment is required to
choose the most suitable set of speech quality monitoring metrics for
the type of application and the size of the network the application
is running on. Some metrics are more suitable for monitoring service
level agreements (SLAs), others may be required for regular routine
monitoring, and still others may be required for fault diagnosis.
The resolution of the metrics may also be different for different
types of monitoring. These considerations make it difficult to
propose a "one size fits all" set of metrics. However some general
points can be made and it is also useful to propose a minimum set of
metrics.
Mean Opinion Score (MOS) speech quality metrics such as MOS-LQO for
listening quality and MOS-CQO for conversation quality (see later
section for further discussion of MOS metrics) are useful for
measuring end-to-end speech quality. However they typically require
significant time and processing power to produce a result and some
MOS-LQO test methods require test calls that consume bandwidth. This
rules out MOS metrics for frequent large-scale monitoring. Also,
methods for measuring conversational MOS are not yet mature enough
for VoIP monitoring applications, even though many vendors are
using an E-model [G.107] approach in the absence of anything else.
This leaves only MOS-LQO as an overall composite speech quality
metric and, being a listening-only metric, it does not take account
of interactive effects such as fixed delay and echo. Nevertheless,
MOS-LQO is often used for SLAs and usually provides a better estimate
of what a user actually experiences than a single network or terminal
metric or a group of such metrics. A poor MOS score by itself,
however, gives little indication of the cause of a problem, and
further metrics are required for diagnostic purposes.
A proposed minimum set of metrics with suggested resolutions is as
follows:
+----------------------------------+------------+--------------+
| Metric | Resolution | Range |
+----------------------------------+------------+--------------+
| MOS-LQO | 0.1 MOS | 1 to 5 |
| | | |
| Received speech level | 0.1 dB | -60 to +10 |
| | | |
| Received noise level | 0.5 dB | -130 to +10 |
| | | |
| Echo return loss | 0.1 dB | 6 to 40 |
| | | |
| Round trip delay | 1 ms | 1 ms to 65 s |
| | | |
| Packet delay variation or jitter | 1 ms | 1 ms to 65 s |
| | | |
| Packet loss | 1 packet | 0 to 2^24 |
+----------------------------------+------------+--------------+
Table 1
[Editor's note: More detail required here in a future draft to add
information about meaningful measurement durations and whether
measurements should include mean and peak values etc. Also require
some discussion around "second level" metrics such as jitter buffer
parameters for diagnosis of more complicated problems.]
Note that some voiceband data applications running over the same
transport network as voice applications may require much lower values
of packet loss and packet delay variation than would be required for
voice applications alone.
A reporting system for these metrics should be capable of
accommodating intermediate network and terminal parameters as well as
end-to-end quality metrics for both monitoring and diagnostic
purposes.
This minimum set of metrics should allow a wide range of problems to
be diagnosed, particularly if metrics are available at intermediate
points in the network as well as at the endpoints. Echo return loss
and delay can be used to establish whether echo is a problem (which
would not affect the MOS-LQO score as this is a listening-only
measurement). Poor MOS-LQO scores could be caused by several
factors, but individual measures of packet loss, jitter and noise
levels could be used to establish the presence or absence of these
degradations. Finally, the level of received speech gives an
indication of whether the operating point is correct and whether
possible distortion or poor signal-to-noise are causing problems.
The codec type will often be known, and this can also be very useful
for diagnostic purposes if, for example, the codec's typical MOS
scores and susceptibility to packet loss are known. Knowledge of
network topology is also very useful and can, for example, indicate
possible bandwidth bottlenecks.
5.2. The audio hierarchy
The audio hierarchy can be broadly split into listening (one-way) and
conversation (two-way, or multi-way conferencing) applications.
These categories can be further split as shown in Figure 1. In
addition, ITU-T has defined a number of bandwidth categories:
narrowband (300 to 3400 Hz), wideband (50 to 7000 Hz), super wideband
(50 to 14000 Hz) and full band (20 to 20000 Hz).
                              Audio
                                |
                  -----------------------------
                  |                           |
              Listening                 Conversation
                  |                           |
          -----------------           -----------------
          |               |           |               |
      Streaming     Non-streaming  Two-way      Conferencing
                                                      |
                                              -----------------
                                              |               |
                                         Non-spatial       Spatial

                    Figure 1: The audio hierarchy
The following sections concentrate on one-way (listening only) and
two-way (conversational) telephony applications, for which several
composite speech quality metrics exist in ITU-T Recommendations.
Similar considerations could apply to other applications such as
conferencing and this should be addressed in further drafts.
Suitable metrics for spatial conferencing are more difficult to
derive at this stage since the technology is still relatively new.
5.3. Individual network transport and terminal parameters affecting
speech quality
Parameters affecting both listening and conversation quality include:
o Listening level
o Noise (both electrical circuit noise and environmental noise)
o Distortion (including amplitude clipping and codec distortion)
o Syllable clipping
o Comfort noise and voice activity detection
o Packet delay variation and jitter buffer operation
o Packet loss
Listening levels that are either too quiet or too loud can be
unpleasant and make communication difficult.
High noise levels can make listening difficult and in a conversation,
high background noise levels may cause a speaker to raise their voice
level so that they can hear themselves above the noise.
Certain types of signal distortion such as amplitude clipping can be
very unpleasant.
Syllable clipping occurs when the speech at the start or end of a
syllable is missing and can cause words to be misunderstood.
Voice activity detection is used to sense periods of voice inactivity
and then transmit them as silence periods to reduce bandwidth.
Artificial noise (comfort noise) is then injected on the receiving
side of a connection to mask the silence caused by the voice activity
detection. Without the comfort noise injection the listener might
think that the connection had died. However, the contrast between
comfort noise and transmitted background noise may be unpleasant for
the listener if the comfort noise has not been well matched to the
background noise.
Packet delay variation caused by the underlying transport has to be
"smoothed out" by using a jitter buffer to temporarily store received
speech and then play it back at a uniform rate. Jitter buffers that
are too short or have been incorrectly implemented may cause packet
loss or "stuttering" of speech, and jitter buffer delays that are
too long add unduly to the overall delay of a connection. For speech
or music applications (not data), adaptive jitter buffers that reduce
delay as much as possible whilst minimising the risk of packet loss
are preferable. However, buffer length adaptations must be carefully
managed to ensure they are inaudible. This is usually achieved by
ensuring that such adaptations occur during silence intervals.
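As a rough illustration of the jitter buffer trade-off described above, the following Python sketch models a fixed (non-adaptive) buffer. It is not taken from any specification: the function name, the playout deadlines and the delay values are assumptions chosen for the example. A packet is treated as lost if it arrives after its playout deadline.

```python
def simulate_fixed_jitter_buffer(network_delays_ms, playout_deadline_ms):
    """Count packets that arrive too late to be played out.

    Assumption: packets are sent at a uniform rate and each packet is
    played out playout_deadline_ms after it was sent. A packet whose
    network delay exceeds that deadline is discarded by the buffer.
    """
    if not network_delays_ms:
        raise ValueError("at least one packet delay is required")
    late = sum(1 for delay in network_delays_ms if delay > playout_deadline_ms)
    return {
        "late_packets": late,
        "discard_ratio": late / len(network_delays_ms),
        "end_to_end_delay_ms": playout_deadline_ms,
    }


# Hypothetical one-way network delays for ten packets, in milliseconds.
delays = [40, 42, 41, 75, 43, 40, 90, 44, 41, 60]

short_buffer = simulate_fixed_jitter_buffer(delays, playout_deadline_ms=50)
long_buffer = simulate_fixed_jitter_buffer(delays, playout_deadline_ms=100)
```

With these example values the shorter deadline discards the three packets delayed by more than 50 ms, while the longer deadline discards none but doubles the delay before playout. An adaptive buffer tries to move between these two extremes as conditions change.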
Finally, packet loss causes temporary loss of the signal, which may
become unintelligible as a result.
In addition, a good conversational experience requires interactivity
between parties, which in turn requires low-delay, low-echo
applications. Some additional parameters affecting conversation
quality can therefore be listed as follows:
o Delay
o Talker echo
o Listener echo
o Double-talk performance
o Sidetone
Long delays affect interactivity and can cause one party to think
that the other party is being "very slow" in answering. In extreme
cases, very long delays can be very confusing and can cause one party
to talk over the other party. The only way round this problem is for
the conversation to become half duplex, where each party takes it in
turns to speak, and each makes it clear when they have finished
speaking.
Echo can be caused either by electrical reflections at a
2-wire to 4-wire converter or by acoustic or mechanical transmission
paths between microphone and earphone. The latter effect is
characterised by the terminal coupling loss. Talker echo causes the
speaker to hear an echo of their own voice and can be very confusing.
Listener echo is generally less common and occurs when the listener
hears an echo of the speaker's voice. Short echo delays cause the
signal to sound hollow or slightly reverberant, whilst longer delays
cause a distinct echo or echoes.
Echo cancellers are used to minimise echo, but can cause other
problems if not carefully designed. For example, periods of double-
talk where both parties are speaking at the same time may cause the
canceller to diverge and produce echo.
Sidetone is local feedback from the speaker's microphone to their
earpiece, which lets them know that the connection is still "live".
Without this feedback, the connection would sound "dead", which would
be confusing. The level, frequency response and distortion of the
sidetone can all affect the user's experience.
5.4. Composite objective speech quality metrics
In addition to the individual "network" or "terminal" metrics
described in the previous section, there are several composite speech
quality metrics for objectively measuring end-to-end overall speech
quality, based on a 5-point scale defined as follows:
Where
o 5 = Excellent
o 4 = Good
o 3 = Fair
o 2 = Poor
o 1 = Bad
A measurement using the scale just described results in a Mean
Opinion Score (MOS), which represents the mean of several opinions
obtained from a subjective test. Mean opinion score terminology is
defined in [P.800.1].
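The arithmetic behind a MOS on the 5-point scale above can be sketched in a few lines of Python. This is only an illustration of "the mean of several opinions": the function name and the example votes are assumptions, and a real subjective test follows the procedures of the relevant ITU-T recommendations rather than this simplified calculation.

```python
from statistics import mean

# The 5-point opinion scale listed above.
OPINION_SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(opinions):
    """Return the arithmetic mean of a list of opinion scores (1 to 5)."""
    for score in opinions:
        if score not in OPINION_SCALE:
            raise ValueError(f"opinion {score!r} is not on the 5-point scale")
    return mean(opinions)


# Hypothetical votes from eight listeners for one test condition.
votes = [4, 4, 3, 5, 4, 3, 4, 2]
mos = mean_opinion_score(votes)
```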
The composite speech quality metrics are useful for commissioning and
Service Level Agreements (SLAs), but (as previously discussed)
additional diagnostic information is required when these
metrics fall below threshold values.
Composite objective speech quality metrics can be divided into
listening quality (MOS-LQO) and conversational quality (MOS-CQO).
The ITU-T has produced several recommendations for measuring these
composite speech quality metrics [P.561], [P.562], [P.563], [P.564],
[P.862], [P.862.1], and [P.862.2]. A hierarchy of the various ITU
speech quality test methods is shown in Figure 2.
                 Objective speech quality test methods
                                   |
                    -------------------------------
                    |                             |
                Listening                   Conversation
                    |                             |
        -----------------------                   |
        |                     |                   |
    Intrusive           Non-intrusive           INMD
  Double-ended          Single-ended         P.561,P.562
        |                     |                   |
        |               -------------             |
        |               |           |             |
      PESQ            P.563       P.564         P.CQO
 P.862, P.862.1     Estimate    Estimate        under
     P.862.2        based       based on     development
  WB extension      on speech   IP n/work
        |           payload     parameters
        |
     P.OLQA
      Under
   Development
Figure 2: Hierarchy of ITU Speech quality test methods
Double-ended test methods (P.862/P.862.1/P.862.2) rely on a reference
signal that is injected at one end of the network and then captured
at the other end of the network. The reference and degraded signal
are compared and an auditory transform that models the human hearing
system is then applied to produce the final MOS value. In contrast,
single-ended systems do not require a reference signal and rely
solely on the speech payload (e.g. P.563) or on IP network parameters
(e.g. P.564). P.563 measures several individual characteristics of the
received speech signal and then combines the results to form a
MOS-LQO, which has been verified against subjectively scored degraded
speech files. P.564 uses several IP network parameters and permitted
RTCP-XR data, again to produce a MOS-LQO. In general, double-ended
methods are more accurate because they have a reference signal
against which to compare the degraded signal.
P.561 describes an In-service Non-intrusive Measurement Device (INMD)
for making in-service measurements of several voice and network
parameters, which can then be used to produce a conversational mean
opinion score as described in P.562. However, the algorithm in P.562
was originally intended for TDM rather than IP applications and
therefore can only be applied to situations where the impact of IP
impairments is negligible. The term "In-service" means that the
measurements are made during real customer calls.
In addition to the recommendations already mentioned, there is also a
planning tool called the E-Model described in another ITU-T
recommendation [G.107]. This was not designed for monitoring
applications, but has unfortunately been mis-used for this purpose by
several vendors.
Another objective measurement tool is described in an ITU-R
Recommendation [BS.1387]. Perceptual Evaluation of Audio Quality
(PEAQ) has generally been optimised for the assessment of music
signals rather than speech and is applicable to high-quality coded
audio systems as used by broadcasters for example.
The listening quality methods already mentioned (P.862/P.862.1/
P.862.2, P.563 and P.564) all produce MOS-LQO values as their primary
outputs and require either speech as an input or, in the case of
P.564, individual network parameters. Each can be used at
intermediate points or end-points of the network provided that
appropriate interfaces are available. Except in the case of P.564,
these methods require either computational power at the measurement
point, or the speech file to be captured and sent to a server for
processing. In the latter case, the speech file is too large for
transport by RTCP. By contrast, a P.564 MOS-LQO calculation relies
only on packet header information and permitted information from
RTCP-XR, i.e. relatively lightweight data.
P.561/P.562 is the only ITU conversational monitoring method
(although P.CQO is under development) and it requires the following
parameters to be measured:
o Active speech level
o Noise level (psophometrically weighted)
o Speech activity factor
o Speech echo path delay
And at least one of
o Echo loss
o Echo path loss
o Speech echo path loss
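The rule that four parameters are always measured, plus at least one of the three echo-related parameters, could be captured in a record such as the following Python sketch. The class and field names are hypothetical, and units are deliberately left unspecified because they are not stated here; P.561 and P.562 remain the authority on how each parameter is defined and measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InmdMeasurements:
    """Hypothetical container for the inputs listed above."""

    # Always required.
    active_speech_level: float
    noise_level: float  # psophometrically weighted
    speech_activity_factor: float
    speech_echo_path_delay: float
    # At least one of these three must be present.
    echo_loss: Optional[float] = None
    echo_path_loss: Optional[float] = None
    speech_echo_path_loss: Optional[float] = None

    def has_required_echo_measure(self) -> bool:
        """True if at least one echo-related parameter was measured."""
        echo_measures = (
            self.echo_loss,
            self.echo_path_loss,
            self.speech_echo_path_loss,
        )
        return any(value is not None for value in echo_measures)


# Illustrative values only; they are not real measurements.
complete = InmdMeasurements(-20.0, -60.0, 0.4, 30.0, echo_loss=45.0)
incomplete = InmdMeasurements(-20.0, -60.0, 0.4, 30.0)
```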
Class D INMDs [P.561] for IP applications are required to implement
the following functions:
o De-jitter buffer
o Voice decoder
o Comfort noise generator
o Error concealment process
and are required to measure packet delay variation and IP packet loss
ratio.
P.562 uses these input parameters to calculate a MOS-CQO score.
However, as already mentioned, the algorithm is at present suitable
only for situations where the impact of IP impairments is negligible.
6. Choosing transport protocols for metrics
Metrics related to RTP sessions are measured by RTP systems but may
be conveyed by any convenient transport mechanism "horizontally" to
other RTP systems or "northbound" to session control or management
systems, e.g. RTCP XR [RFC3611], SNMP [RFC3410], SIP [RFC3261] headers
or attachments, or TR-069 mechanisms [DSLF-TR-069].
6.1. RTCP as a transport for metrics - advantages and disadvantages
RTCP XR remains at least a candidate transport protocol for
metrics, though note that [GUIDELINES] states explicitly that "The
amount of information going into RTCP reports should primarily target
the peer (and thus include information that can be meaningfully
reacted upon). Gathering and reporting statistics beyond this is not
an RTCP task and should be addressed by out-of-band protocols".
If RTCP is used, AVT need define only a generic means to transport
arbitrary payloads. Such a means is already available in the form of
RTCP XR block types [RFC3611]. If the data is self-describing, e.g.
based on ASN.1 [X.680] or XML [XML], or if usage is standardised in
profiles, it would be possible to transmit many different collections
of data whilst using only a small number of codepoints from the
limited namespace of XR report block types. As a minimum, only one
XR block type codepoint need be allocated per SDO, with delegation to
the SDO to manage a namespace defined by a type field in the payload.
The measurements of round-trip delay and packet loss could still use
the established mechanisms from RFC 3550.
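The idea of one XR block type per SDO, with an SDO-managed type field inside the payload, might look like the following Python sketch. The 4-octet block header (block type, type-specific octet, and block length in 32-bit words minus one) follows the generic XR report block framing of [RFC3611]. Everything else is an assumption made for illustration: the block type value 200 is not an allocated codepoint, and the 2-octet SDO type field and the function names are hypothetical.

```python
import struct

# Assumption: placeholder value, NOT an IANA-allocated XR block type.
HYPOTHETICAL_SDO_BLOCK_TYPE = 200


def pack_sdo_xr_block(sdo_metric_type: int, payload: bytes) -> bytes:
    """Build one XR report block carrying an SDO-defined payload.

    Assumed body layout: a 2-octet type field managed by the SDO,
    followed by the payload, zero-padded to a 32-bit boundary.
    """
    body = struct.pack("!H", sdo_metric_type) + payload
    body += b"\x00" * (-len(body) % 4)
    # Block length counts 32-bit words including the header, minus one,
    # which equals the number of 32-bit words in the body.
    block_length = len(body) // 4
    header = struct.pack("!BBH", HYPOTHETICAL_SDO_BLOCK_TYPE, 0, block_length)
    return header + body


def unpack_sdo_xr_block(block: bytes):
    """Return (block_type, sdo_metric_type, payload including any padding)."""
    block_type, _type_specific, block_length = struct.unpack("!BBH", block[:4])
    body = block[4:4 + 4 * block_length]
    (sdo_metric_type,) = struct.unpack("!H", body[:2])
    return block_type, sdo_metric_type, body[2:]


example_block = pack_sdo_xr_block(7, b"\x01\x02\x03\x04\x05\x06")
```

Because the body is padded, a receiver cannot recover the exact payload length from the block alone. This is one reason why a self-describing payload, or a profile that fixes the layout for each type, would be needed in practice.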
This approach is analogous to the definition of codec payload formats
for RTP. A specification could define how metrics payloads are
carried in RTCP, and how SDP (including offer/answer) is used to
request an RTP system to send a metrics payload. The approach
decouples the RTCP base protocol (transport format, routing, and
transmission rate rules, and RTCP's base metrics) from less generic
use cases.
6.1.1. Advantages of RTCP
RTCP uses the same transport as the RTP media path and hence if media
may be transmitted, it is likely that RTCP may also be transmitted -
although for connections not using [RTPRTCPMUX], this is subject to
possible difficulties with NAT and firewall devices which may
sometimes not open a port for RTCP.
RTCP uses the same transport as the RTP media path so will normally
experience the same transport performance as that experienced by the
RTP media packets. Firstly this allows an RTCP-based mechanism to
make a representative measurement of round-trip delay. Secondly, if
QoS mechanisms such as expedited forwarding (EF) have been
implemented in support of the RTP media traffic, the transport is
likely to be low-delay and possibly also low-loss, compared with a
best-efforts class.
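The round-trip delay measurement referred to above is the one defined in [RFC3550], where the receiver of a reception report subtracts the last SR timestamp (LSR) and the delay since last SR (DLSR) from the arrival time, all expressed in units of 1/65536 seconds. A minimal Python sketch follows; the function name is an assumption, and the input values are those of the worked example in RFC 3550.

```python
def rtcp_round_trip_seconds(arrival_ntp_mid32: int, lsr: int, dlsr: int) -> float:
    """Round-trip delay = A - LSR - DLSR, in units of 1/65536 s.

    arrival_ntp_mid32 is the middle 32 bits of the NTP timestamp at
    which the reception report arrived. The subtraction is done modulo
    2**32 so that wrap-around of the 32-bit values is handled.
    """
    rtt_units = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


# A = 0xb710:8000, LSR = 0xb705:2000, DLSR = 0x0005:4000 (5.250 s)
rtt = rtcp_round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000)
```

For these inputs the result is 6.125 seconds, matching the RFC 3550 example. Such a large value is only illustrative; delays on a real media path would normally be far smaller.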
Existing transport devices (for example, SBCs, BGWs, NAT) have often
been implemented to allow RTCP to transit transparently on the next
higher UDP port. The devices are unlikely to pass another protocol
for the transport of metrics without modification. This would make
it harder to introduce any non-RTCP protocol for transport of
metrics.
6.1.2. Disadvantages of RTCP
RTCP is usually carried over an unreliable UDP/IP transport. Any
monitoring scheme using RTCP as its transport must be designed to
tolerate message loss and duplication.
Bandwidth for the transport of RTCP may be limited. [RFC3550]
recommends that the bandwidth consumed by RTCP traffic be limited to
5% of the session bandwidth. Even without this limitation, the
volume of traffic which is allowed access to EF queues may be
policed, such that large fractions of RTCP traffic might result in
high loss for both the RTCP traffic and for RTP media.
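To give a feel for how small the RTCP budget can be, the following Python sketch applies the 5% figure to a session bandwidth. The function name, the 64 kbit/s session and the 5-second reporting interval are assumptions chosen for the example, and the result is the total budget shared by all participants, not a per-sender allowance.

```python
# Fraction of the session bandwidth allowed for RTCP (the 5% figure).
RTCP_FRACTION = 0.05


def rtcp_budget(session_bandwidth_bps: float, report_interval_s: float):
    """Return (RTCP bit rate, octets available per reporting interval)."""
    rtcp_bps = session_bandwidth_bps * RTCP_FRACTION
    octets_per_interval = rtcp_bps * report_interval_s / 8
    return rtcp_bps, octets_per_interval


# Hypothetical narrowband voice session: 64 kbit/s, reports every 5 s.
rtcp_bps, octets_per_interval = rtcp_budget(64_000, 5.0)
```

With these assumed figures the RTCP budget is about 3.2 kbit/s, or roughly 2000 octets per 5-second interval for all RTCP packets from all participants, so any metrics payload has to be compact.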
7. IANA Considerations
None.
8. Security Considerations
This document itself contains no normative text and hence should not
give rise to any new security considerations (to be confirmed).
[Editor's note - should this section consider security merits/
demerits of proposals for alternative protocols to RTCP?]
9. Acknowledgments
This document was originally motivated by ideas from Colin Perkins.
The authors would like to thank Graeme Gibbs at BT, and Debbie
Greenstreet and her TI colleagues for their review comments.
10. Informative References
[BS.1387] ITU-R, "Recommendation BS.1387. Method for objective
measurements of perceived audio quality", November 2001.
[DSLF-TR-069]
DSL Forum, "TR-069 CPE WAN Management Protocol v1.1",
December 2007.
[G.107] ITU-T, "Recommendation G.107. The E-model, a
computational model for use in transmission planning",
March 2005.
[GUIDELINES]
Ott, J., "Guidelines for Extending the RTP Control
Protocol (RTCP)", ID draft-ott-avt-rtcp-guidelines-01,
June 2008.
[P.561] ITU-T, "Recommendation P.561. In-service non-intrusive
measurement device - Voice service measurements",
July 2002.
[P.562] ITU-T, "Recommendation P.562. Analysis and interpretation
of INMD voice-service measurements", May 2004.
[P.563] ITU-T, "Recommendation P.563. Single-ended method for
objective speech quality assessment in narrow-band
telephony applications", May 2004.
[P.564] ITU-T, "Recommendation P.564. Conformance testing for
narrowband voice over IP transmission quality assessment
models", November 2007.
[P.800.1] ITU-T, "Recommendation P.800.1. Mean Opinion Score (MOS)
terminology", July 2006.
[P.862] ITU-T, "Recommendation P.862. Perceptual evaluation of
speech quality (PESQ): An objective method for end-to-end
speech quality assessment of narrow-band telephone
networks and speech codecs", February 2001.
[P.862.1] ITU-T, "Recommendation P.862.1. Mapping function for
transforming P.862 raw result scores to MOS-LQO",
November 2003.
[P.862.2] ITU-T, "Recommendation P.862.2. Wideband extension to
Recommendation P.862 for the assessment of wideband
telephone networks and speech codecs", November 2007.
[RFC3261] Rosenberg, J., "SIP: Session Initiation Protocol",
RFC 3261, June 2002.
[RFC3410] Case, J., "Introduction and Applicability Statements for
Internet Standard Management Framework", RFC 3410,
December 2002.
[RFC3550] Schulzrinne, H., "RTP: A Transport Protocol for Real-Time
Applications", RFC 3550, July 2003.
[RFC3611] Friedman, T., "RTP Control Protocol Extended Reports (RTCP
XR)", RFC 3611, November 2003.
[RTPRTCPMUX]
Perkins, C., "Multiplexing RTP Data and Control Packets on
a Single Port", ID draft-ietf-avt-rtp-and-rtcp-mux-07,
August 2007.
[X.680] ITU-T, "Recommendation X.680. Abstract Syntax Notation One
(ASN.1): Specification of basic notation", July 2002.
[XML] W3C, "Extensible Markup Language (XML) 1.0 (Fourth
Edition)", September 2006.
Authors' Addresses
Geoff Hunt
BT
Orion 1 PP9
Adastral Park
Martlesham Heath
Ipswich, Suffolk IP5 3RE
United Kingdom
Phone: +44 1473 608325
Email: geoff.hunt@bt.com
Philip Arden
BT
Orion 3/7 PP4
Adastral Park
Martlesham Heath
Ipswich, Suffolk IP5 3RE
United Kingdom
Phone: +44 1473 644192
Email: philip.arden@bt.com
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.