draft-ietf-avtext-client-to-mixer-audio-level-02.txt   draft-ietf-avtext-client-to-mixer-audio-level-03.txt 
AVT J. Lennox, Ed. AVT J. Lennox, Ed.
Internet-Draft Vidyo Internet-Draft Vidyo
Intended status: Standards Track E. Ivov Intended status: Standards Track E. Ivov
Expires: December 4, 2011 Jitsi Expires: January 6, 2012 Jitsi
E. Marocco E. Marocco
Telecom Italia Telecom Italia
June 2, 2011 July 5, 2011
A Real-Time Transport Protocol (RTP) Header Extension for Client-to- A Real-Time Transport Protocol (RTP) Header Extension for Client-to-
Mixer Audio Level Indication Mixer Audio Level Indication
draft-ietf-avtext-client-to-mixer-audio-level-02 draft-ietf-avtext-client-to-mixer-audio-level-03
Abstract Abstract
This document defines a mechanism by which packets of Real-Time This document defines a mechanism by which packets of Real-Time
Transport Protocol (RTP) audio streams can indicate, in an RTP header Transport Protocol (RTP) audio streams can indicate, in an RTP header
extension, the audio level of the audio sample carried in the RTP extension, the audio level of the audio sample carried in the RTP
packet. In large conferences, this can reduce the load on an audio packet. In large conferences, this can reduce the load on an audio
mixer or other middlebox which wants to forward only a few of the mixer or other middlebox which wants to forward only a few of the
loudest audio streams, without requiring it to decode and measure loudest audio streams, without requiring it to decode and measure
every stream that is received. every stream that is received.
skipping to change at page 1, line 40 skipping to change at page 1, line 40
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 4, 2011. This Internet-Draft will expire on January 6, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 13 skipping to change at page 3, line 13
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Audio Levels . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Audio Levels . . . . . . . . . . . . . . . . . . . . . . . . . 4
4. Signaling (Setup) Information . . . . . . . . . . . . . . . . 6 4. Signaling (Setup) Information . . . . . . . . . . . . . . . . 6
5. Considerations on Use . . . . . . . . . . . . . . . . . . . . 6 5. Considerations on Use . . . . . . . . . . . . . . . . . . . . 6
6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8
8.1. Normative References . . . . . . . . . . . . . . . . . . . 8 8.1. Normative References . . . . . . . . . . . . . . . . . . . 8
8.2. Informative References . . . . . . . . . . . . . . . . . . 8 8.2. Informative References . . . . . . . . . . . . . . . . . . 8
Appendix A. Changes From Earlier Versions . . . . . . . . . . . . 9 Appendix A. Changes From Earlier Versions . . . . . . . . . . . . 9
A.1. Changes From Draft -01 . . . . . . . . . . . . . . . . . . 9 A.1. Changes From Draft -02 . . . . . . . . . . . . . . . . . . 9
A.2. Changes From Draft -00 . . . . . . . . . . . . . . . . . . 9 A.2. Changes From Draft -01 . . . . . . . . . . . . . . . . . . 9
A.3. Changes From Individual Submission Draft -01 . . . . . . . 10 A.3. Changes From Draft -00 . . . . . . . . . . . . . . . . . . 10
A.4. Changes From Individual Submission Draft -00 . . . . . . . 10 A.4. Changes From Individual Submission Draft -01 . . . . . . . 10
A.5. Changes From Individual Submission Draft -00 . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10
1. Introduction 1. Introduction
In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio
conference, an audio mixer or forwarder receives audio streams from conference, an audio mixer or forwarder receives audio streams from
many or all of the conference participants. It then selectively many or all of the conference participants. It then selectively
forwards some of them to other participants in the conference. In forwards some of them to other participants in the conference. In
large conferences, it is possible that such a server might be large conferences, it is possible that such a server might be
receiving a large number of streams, of which only a few should be receiving a large number of streams, of which only a few should be
skipping to change at page 4, line 41 skipping to change at page 4, line 41
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119] and document are to be interpreted as described in RFC 2119 [RFC2119] and
indicate requirement levels for compliant implementations. indicate requirement levels for compliant implementations.
3. Audio Levels 3. Audio Levels
Draft -02 text:

The audio level header extension element carries the level of the
audio in the RTP payload of the packet it is associated with, and
also an indication as to whether voice activity has been detected in
the packet. This information is carried in an RTP header extension
element as defined by [RFC5285].

The payload of the audio level header extension element is as
follows:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID   | len=0 |V|    level    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1

The length field takes the value 0 to indicate that 1 byte follows.

The two-byte header defined in RFC 5285 [RFC5285] may also be used.

The magnitude of the audio level is packed into the seven least
significant bits of the single byte of the header extension, shown in
Figure 1. The least significant bit of the audio level magnitude is
packed into the least significant bit of the byte. The most
significant bit of the byte is used as a separate flag bit "V",
defined below.

Draft -03 text:

The audio level header extension carries the level of the audio in
the RTP payload of the packet it is associated with. This
information is carried in an RTP header extension element as defined
by the "General Mechanism for RTP Header Extensions" [RFC5285].

The payload of the audio level header extension element can be
encoded using the one or the two-byte header defined in [RFC5285].
Figure 1 and Figure 2 show sample audio level encodings with each of
them.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID   | len=0 |V|    level    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Sample audio level encoding using the one-byte header format

Figure 1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ID       |     len=1     |V|    level    |    0 (pad)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Sample audio level encoding using the two-byte header format

Figure 2

Note that, as indicated in [RFC5285] length field in the one-byte
header format takes the value 0 to indicate that 1 byte follows. In
the two-byte header format on the other hand it takes the value of 1.

The magnitude of the audio level itself is packed into the seven
least significant bits of the single byte of the header extension,
shown in Figure 1 and Figure 2. The least significant bit of the
audio level magnitude is packed into the least significant bit of the
byte. The most significant bit of the byte is used as a separate
flag bit "V", defined below.
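
The bit packing described in the -03 text can be sketched in Python as follows. This is an illustrative sketch only and is not part of either draft: the function names are hypothetical, and the ID ranges (1 to 14 for the one-byte header, 1 to 255 for the two-byte header) are taken from RFC 5285 as understood here rather than from the text above. The trailing pad byte shown in Figure 2 is omitted, since padding belongs to the enclosing extension block.

```python
from typing import Tuple


def pack_level_byte(level: int, voice_activity: bool) -> int:
    """Pack the V flag (most significant bit) and the 7-bit level magnitude."""
    if not 0 <= level <= 127:
        raise ValueError("level must be 0..127 (representing 0 to -127 dBov)")
    return (0x80 if voice_activity else 0x00) | level


def unpack_level_byte(value: int) -> Tuple[int, bool]:
    """Return (level, voice_activity) from the single payload byte."""
    return value & 0x7F, bool(value & 0x80)


def one_byte_element(ext_id: int, level: int, voice_activity: bool) -> bytes:
    """One-byte header form: 4-bit ID, 4-bit len=0 (meaning 1 data byte follows)."""
    if not 1 <= ext_id <= 14:  # assumed valid ID range per RFC 5285
        raise ValueError("one-byte header IDs are assumed to be 1..14")
    return bytes([(ext_id << 4) | 0x0, pack_level_byte(level, voice_activity)])


def two_byte_element(ext_id: int, level: int, voice_activity: bool) -> bytes:
    """Two-byte header form: 8-bit ID, 8-bit len=1 (1 data byte follows)."""
    if not 1 <= ext_id <= 255:  # assumed valid ID range per RFC 5285
        raise ValueError("two-byte header IDs are assumed to be 1..255")
    return bytes([ext_id, 1, pack_level_byte(level, voice_activity)])
```

For example, under these assumptions a level of 40 (-40 dBov) with voice activity and extension ID 1 would be sent as the two bytes 0x10 0xA8 in the one-byte header format.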
The audio level is expressed in -dBov, with values from 0 to 127 The audio level is expressed in -dBov, with values from 0 to 127
representing 0 to -127 dBov. dBov is the level, in decibels, relative representing 0 to -127 dBov. dBov is the level, in decibels, relative
to the overload point of the system, i.e. the maximum-amplitude to the overload point of the system, i.e. the maximum-amplitude
signal that can be handled by the system without clipping. (Note: signal that can be handled by the system without clipping. (Note:
Representation relative to the overload point of a system is Representation relative to the overload point of a system is
particularly useful for digital implementations, since one does not particularly useful for digital implementations, since one does not
need to know the relative calibration of the analog circuitry.) For need to know the relative calibration of the analog circuitry.) For
example, in the case of u-law (audio/pcmu) audio [ITU.G711.1988], the example, in the case of u-law (audio/pcmu) audio [ITU.G711.1988], the
0 dBov reference would be a square wave with values +/- 8031. (This 0 dBov reference would be a square wave with values +/- 8031. (This
translates to 6.18 dBm0, relative to u-law's dBm0 definition in Table translates to 6.18 dBm0, relative to u-law's dBm0 definition in Table
6 of G.711.) 6 of G.711.)
The audio level for digital silence, for example for a muted audio The audio level for digital silence, for example for a muted audio
source, MAY be represented as 127 (-127 dBov), regardless of the source, MUST be represented as 127 (-127 dBov), regardless of the
dynamic range of the encoded audio format. dynamic range of the encoded audio format.
Implementations MAY choose to measure audio levels prior to encoding
them in the payload carried in the RTP payload, e.g. on raw linear
PCM input.
The audio level header extension only carries the level of the audio The audio level header extension only carries the level of the audio
in the RTP payload of the packet it is associated with, with no long- in the RTP payload of the packet it is associated with, with no long-
term averaging or smoothing applied. term averaging or smoothing applied. That level is measured as a
root mean square of all the samples in the measured range.
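
As a rough illustration of the root mean square measurement added in -03, the sketch below derives a level value from one packet's worth of raw linear PCM samples. It is not the Java calculator referenced by the draft. The function name, the 16-bit overload point of 32768, rounding to the nearest integer, and the treatment of an empty sample list as silence are all assumptions made for this example.

```python
import math
from typing import Sequence


def audio_level(samples: Sequence[int], overload: float = 32768.0) -> int:
    """Return the level as -dBov in the range 0..127 for one packet of PCM.

    `overload` is the assumed maximum amplitude the system handles
    without clipping (here: 16-bit linear PCM).
    """
    if not samples:
        return 127  # assumption: no samples is treated like digital silence
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return 127  # digital silence is represented as 127 (-127 dBov)
    rms = math.sqrt(mean_square)
    dbov = 20.0 * math.log10(rms / overload)  # 0 at overload, negative below it
    # The wire value is the negated dBov figure, clamped to the 7-bit range.
    return max(0, min(127, round(-dbov)))
```

With these assumptions, a full-scale square wave yields 0, a square wave at half amplitude yields 6, and a packet of all-zero samples yields 127. No averaging across packets is applied, which matches the per-packet scope stated above.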
To simplify implementation of the encoding procedures described here, To simplify implementation of the encoding procedures described here,
the reference implementation section in the reference implementation section in
[I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java [I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java
implementation of an audio level calculator that helps obtain such implementation of an audio level calculator that helps obtain such
values from raw linear PCM audio samples. values from raw linear PCM audio samples.
In addition, a flag bit (labeled V) indicates whether the encoder In addition, a flag bit (labeled V) indicates whether the encoder
believes the audio packet contains voice activity (1) or does not believes the audio packet contains voice activity (1) or does not
(0). The voice activity detection algorithm is unspecified and left (0). The voice activity detection algorithm is unspecified and left
skipping to change at page 9, line 22 skipping to change at page 9, line 28
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004. RFC 3711, March 2004.
Appendix A. Changes From Earlier Versions Appendix A. Changes From Earlier Versions
Note to the RFC-Editor: please remove this section prior to Note to the RFC-Editor: please remove this section prior to
publication as an RFC. publication as an RFC.
A.1. Changes From Draft -01 A.1. Changes From Draft -02
o Changed encoding related text so that it would cover both the one-
byte and the two-byte header formats.
o Clarified use of root mean square for dBov calculation
o Other minor editorial changes.
A.2. Changes From Draft -01
o Changed the URI for declaring this header extension from o Changed the URI for declaring this header extension from
"urn:ietf:params:rtp-hdrext:audio-level" to "urn:ietf:params:rtp-hdrext:audio-level" to
"urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with "urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with
[I-D.ietf-avtext-mixer-to-client-audio-level]. [I-D.ietf-avtext-mixer-to-client-audio-level].
o Removed the "Limitations" section; it was discussing a potential o Removed the "Limitations" section; it was discussing a potential
extension that consensus indicated was out of scope of this extension that consensus indicated was out of scope of this
document. document.
o Closed the P.56 open issue. It was agreed on IETF 80 that P.56 is o Closed the P.56 open issue. It was agreed on IETF 80 that P.56 is
mostly about speech levels and the levels transported by the mostly about speech levels and the levels transported by the
extension defined here should also be able to serve as an extension defined here should also be able to serve as an
indication for noise. indication for noise.
o Closed the open issue about transmitting noise floor information. o Closed the open issue about transmitting noise floor information.
Noise floor is (loosely) inferrable by observing the per-packet Noise floor is (loosely) inferrable by observing the per-packet
level information over a period of time, so the additional level information over a period of time, so the additional
complexity seemed unnecessary. complexity seemed unnecessary.
o Editorial changes for consistency with o Editorial changes for consistency with
[I-D.ietf-avtext-mixer-to-client-audio-level]. [I-D.ietf-avtext-mixer-to-client-audio-level].
o Moved several descriptions of normative items that previously had o Moved several descriptions of normative items that previously had
only been described in informative sections of the text. only been described in informative sections of the text.
o Other editorial clarifications. o Other editorial clarifications.
A.2. Changes From Draft -00 A.3. Changes From Draft -00
o Added references to the sample level calculator in o Added references to the sample level calculator in
[I-D.ietf-avtext-mixer-to-client-audio-level]. [I-D.ietf-avtext-mixer-to-client-audio-level].
o Changed affiliation for Emil Ivov. o Changed affiliation for Emil Ivov.
A.3. Changes From Individual Submission Draft -01 A.4. Changes From Individual Submission Draft -01
o This version is primarily a document refresh. o This version is primarily a document refresh.
o Emil Ivov and Enrico Marocco have been added as co-authors. o Emil Ivov and Enrico Marocco have been added as co-authors.
o Additional open issues listed. o Additional open issues listed.
A.4. Changes From Individual Submission Draft -00 A.5. Changes From Individual Submission Draft -00
o The draft name has been changed to clarify that this document o The draft name has been changed to clarify that this document
defines Client-To-Mixer Audio Levels, to more clearly distinguish defines Client-To-Mixer Audio Levels, to more clearly distinguish
it from [I-D.ietf-avtext-mixer-to-client-audio-level]. it from [I-D.ietf-avtext-mixer-to-client-audio-level].
o The header extension format has been changed from a two-byte to a o The header extension format has been changed from a two-byte to a
one-byte payload, eliminating the 7 reserved bits and the one one-byte payload, eliminating the 7 reserved bits and the one
must-be-zero bit. must-be-zero bit.
o The sections Considerations on Use (Section 5) and Limitations o The sections Considerations on Use (Section 5) and Limitations
have been added. have been added.
o It has been noted that senders MAY indicate -127 dBov for digital o It has been noted that senders MAY indicate -127 dBov for digital
 End of changes. 21 change blocks. 
35 lines changed or deleted 54 lines changed or added
