draft-ietf-avtext-client-to-mixer-audio-level-05.txt   draft-ietf-avtext-client-to-mixer-audio-level-06.txt 
AVT J. Lennox, Ed. AVT J. Lennox, Ed.
Internet-Draft Vidyo Internet-Draft Vidyo
Intended status: Standards Track E. Ivov Intended status: Standards Track E. Ivov
Expires: March 23, 2012 Jitsi Expires: May 17, 2012 Jitsi
E. Marocco E. Marocco
Telecom Italia Telecom Italia
September 20, 2011 November 14, 2011
A Real-Time Transport Protocol (RTP) Header Extension for Client-to- A Real-Time Transport Protocol (RTP) Header Extension for Client-to-
Mixer Audio Level Indication Mixer Audio Level Indication
draft-ietf-avtext-client-to-mixer-audio-level-05 draft-ietf-avtext-client-to-mixer-audio-level-06
Abstract Abstract
This document defines a mechanism by which packets of Real-Time This document defines a mechanism by which packets of Real-Time
Transport Protocol (RTP) audio streams can indicate, in an RTP header Transport Protocol (RTP) audio streams can indicate, in an RTP header
extension, the audio level of the audio sample carried in the RTP extension, the audio level of the audio sample carried in the RTP
packet. In large conferences, this can reduce the load on an audio packet. In large conferences, this can reduce the load on an audio
mixer or other middlebox which wants to forward only a few of the mixer or other middlebox which wants to forward only a few of the
loudest audio streams, without requiring it to decode and measure loudest audio streams, without requiring it to decode and measure
every stream that is received. every stream that is received.
skipping to change at page 1, line 40 skipping to change at page 1, line 40
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 23, 2012. This Internet-Draft will expire on May 17, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 24 skipping to change at page 2, line 24
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Audio Levels . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Audio Levels . . . . . . . . . . . . . . . . . . . . . . . . . 3
4. Signaling (Setup) Information . . . . . . . . . . . . . . . . 5 4. Signaling (Setup) Information . . . . . . . . . . . . . . . . 5
5. Considerations on Use . . . . . . . . . . . . . . . . . . . . 6 5. Considerations on Use . . . . . . . . . . . . . . . . . . . . 6
6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8
8.1. Normative References . . . . . . . . . . . . . . . . . . . 8 8.1. Normative References . . . . . . . . . . . . . . . . . . . 8
8.2. Informative References . . . . . . . . . . . . . . . . . . 8 8.2. Informative References . . . . . . . . . . . . . . . . . . 8
Appendix A. Changes From Earlier Versions . . . . . . . . . . . . 9 Appendix A. Changes From Earlier Versions . . . . . . . . . . . . 9
A.1. Changes From Draft -03 . . . . . . . . . . . . . . . . . . 9 A.1. Changes From Draft -05 . . . . . . . . . . . . . . . . . . 9
A.2. Changes From Draft -02 . . . . . . . . . . . . . . . . . . 9 A.2. Changes From Draft -04 . . . . . . . . . . . . . . . . . . 9
A.3. Changes From Draft -01 . . . . . . . . . . . . . . . . . . 9 A.3. Changes From Draft -03 . . . . . . . . . . . . . . . . . . 9
A.4. Changes From Individual Submission Draft -01 . . . . . . . 10 A.4. Changes From Draft -02 . . . . . . . . . . . . . . . . . . 10
A.5. Changes From Individual Submission Draft -00 . . . . . . . 10 A.5. Changes From Draft -01 . . . . . . . . . . . . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 A.6. Changes From Individual Submission Draft -01 . . . . . . . 10
A.7. Changes From Individual Submission Draft -00 . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11
1. Introduction 1. Introduction
In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio
conference, an audio mixer or forwarder receives audio streams from conference, an audio mixer or forwarder receives audio streams from
many or all of the conference participants. It then selectively many or all of the conference participants. It then selectively
forwards some of them to other participants in the conference. In forwards some of them to other participants in the conference. In
large conferences, it is possible that such a server might be large conferences, it is possible that such a server might be
receiving a large number of streams, of which only a few are intended receiving a large number of streams, of which only a few are intended
to be forwarded to the other conference participants. to be forwarded to the other conference participants.
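The selection step described above can be sketched as follows. This is a minimal illustration, not part of either draft: the function name `loudest_streams`, the SSRC-to-level dictionary, and the choice of `n` are all assumptions made for the example. Levels are in -dBov, so a smaller number means a louder stream.

```python
import heapq
from typing import Dict, List


def loudest_streams(levels: Dict[int, int], n: int) -> List[int]:
    """Return the SSRCs of the n loudest streams.

    `levels` maps a (hypothetical) SSRC to the most recent audio level
    reported in the header extension, expressed in -dBov (0 = loudest,
    127 = digital silence), so the loudest streams have the smallest values.
    """
    return heapq.nsmallest(n, levels, key=levels.get)
```

For example, `loudest_streams({1: 30, 2: 127, 3: 12, 4: 45}, 2)` picks streams 3 and 1 without decoding any audio.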
skipping to change at page 4, line 38 skipping to change at page 4, line 38
The magnitude of the audio level itself is packed into the seven The magnitude of the audio level itself is packed into the seven
least significant bits of the single byte of the header extension, least significant bits of the single byte of the header extension,
shown in Figure 1 and Figure 2. The least significant bit of the shown in Figure 1 and Figure 2. The least significant bit of the
audio level magnitude is packed into the least significant bit of the audio level magnitude is packed into the least significant bit of the
byte. The most significant bit of the byte is used as a separate byte. The most significant bit of the byte is used as a separate
flag bit "V", defined below. flag bit "V", defined below.
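The byte layout described above can be illustrated with a short sketch. The helper names `pack_audio_level` and `unpack_audio_level` are hypothetical and do not appear in the draft; only the bit positions (V in the most significant bit, level magnitude in the seven least significant bits) come from the text.

```python
from typing import Tuple


def pack_audio_level(level: int, voice_activity: bool = False) -> int:
    """Pack a level (0..127, in -dBov) and the V flag into one byte."""
    if not 0 <= level <= 127:
        raise ValueError("level must be in the range 0..127 (-dBov)")
    # V occupies the most significant bit; the level fills the low 7 bits.
    return (0x80 if voice_activity else 0x00) | level


def unpack_audio_level(byte: int) -> Tuple[bool, int]:
    """Split one extension byte into (V flag, level in -dBov)."""
    return bool(byte & 0x80), byte & 0x7F
```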
The audio level is expressed in -dBov, with values from 0 to 127 The audio level is expressed in -dBov, with values from 0 to 127
representing 0 to -127 dBov. dBov is the level, in decibels, relative representing 0 to -127 dBov. dBov is the level, in decibels, relative
to the overload point of the system, i.e. the maximum-amplitude to the overload point of the system, i.e. the highest-intensity
signal that can be handled by the system without clipping. (Note: signal encodable by the payload format. (Note: Representation
Representation relative to the overload point of a system is relative to the overload point of a system is particularly useful for
particularly useful for digital implementations, since one does not digital implementations, since one does not need to know the relative
need to know the relative calibration of the analog circuitry.) For calibration of the analog circuitry.) For example, in the case of
example, in the case of u-law (audio/pcmu) audio [ITU.G711.1988], the u-law (audio/pcmu) audio [ITU.G711.1988], the 0 dBov reference would
0 dBov reference would be a square wave with values +/- 8031. (This be a square wave with values +/- 8031. (This translates to 6.18
translates to 6.18 dBm0, relative to u-law's dBm0 definition in Table dBm0, relative to u-law's dBm0 definition in Table 6 of G.711.)
6 of G.711.)
The audio level for digital silence, for example for a muted audio The audio level for digital silence, for example for a muted audio
source, MUST be represented as 127 (-127 dBov), regardless of the source, MUST be represented as 127 (-127 dBov), regardless of the
dynamic range of the encoded audio format. dynamic range of the encoded audio format.
The audio level header extension only carries the level of the audio The audio level header extension only carries the level of the audio
in the RTP payload of the packet it is associated with, with no long- in the RTP payload of the packet it is associated with, with no long-
term averaging or smoothing applied. That level is measured as a term averaging or smoothing applied. For payload formats that
root mean square of all the samples in the measured range. contain extra error-correction bits or loss-concealment information,
the level corresponds only to the data that would result from the
payload's normal decoding process, not what it would produce under
error or packet loss concealment. The level is measured as a root
mean square of all the samples in the audio encoded by the packet.
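As a rough sketch of the measurement described above, the following computes a level from 16-bit linear PCM samples. It is not the Java reference implementation cited in the draft. The function name `audio_level_dbov`, the default overload value of 32767 (the amplitude of a full-scale square wave for 16-bit linear PCM), and the rounding to the nearest integer are assumptions made for this example.

```python
import math
from typing import Sequence


def audio_level_dbov(samples: Sequence[int], overload: float = 32767.0) -> int:
    """Return the level of one packet's samples as a value in 0..127 (-dBov)."""
    if not samples:
        return 127
    # Root mean square over all samples in the packet, with no smoothing.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        # Digital silence is always reported as 127 (-127 dBov).
        return 127
    dbov = 20.0 * math.log10(rms / overload)
    # The extension carries -dBov, clamped to the 7-bit range.
    return min(127, max(0, round(-dbov)))
```

Under these assumptions a full-scale square wave yields 0, a square wave at half amplitude yields about 6, and an all-zero packet yields 127.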
To simplify implementation of the encoding procedures described here, To simplify implementation of the encoding procedures described here,
the reference implementation section in the reference implementation section in
[I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java [I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java
implementation of an audio level calculator that helps obtain such implementation of an audio level calculator that helps obtain such
values from raw linear PCM audio samples. values from raw linear PCM audio samples.
In addition, a flag bit (labeled V) optionally indicates whether the In addition, a flag bit (labeled V) optionally indicates whether the
encoder believes the audio packet contains voice activity. If the V encoder believes the audio packet contains voice activity. If the V
bit is in use, the value 1 indicates that the encoder believes the bit is in use, the value 1 indicates that the encoder believes the
skipping to change at page 7, line 15 skipping to change at page 7, line 18
levels (average audio levels during non-speech). levels (average audio levels during non-speech).
6. Security Considerations 6. Security Considerations
A malicious endpoint could choose to set the values in this header A malicious endpoint could choose to set the values in this header
extension falsely, so as to falsely claim that audio or voice is or extension falsely, so as to falsely claim that audio or voice is or
is not present. It is not clear what could be gained by falsely is not present. It is not clear what could be gained by falsely
claiming that audio is not present, but an endpoint falsely claiming claiming that audio is not present, but an endpoint falsely claiming
that audio is present could perform a denial-of-service attack on an that audio is present could perform a denial-of-service attack on an
audio conference, so as to send silence to suppress other conference audio conference, so as to send silence to suppress other conference
members' audio. Thus, if a device relys on audio level data from members' audio, or could dominate a conference (by seizing its
untrusted endpoints, it SHOULD periodically audit the level speaker-selection algorithm) without actually speaking. Thus, if a
information transmitted, taking appropriate corrective action against device relies on audio level data from untrusted endpoints, it SHOULD
endpoints that appear to be sending incorrect data. (However, as it periodically audit the level information transmitted, taking
is valid for an endpoint to choose to measure audio levels prior to appropriate corrective action against endpoints that appear to be
encoding, some degree of discrepancy could be present. This would sending incorrect data. (However, as it is valid for an endpoint to
not indicate that an endpoint is malicious.) choose to measure audio levels prior to encoding, some degree of
discrepancy could be present. This would not indicate that an
endpoint is malicious.)
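The periodic audit suggested above might look like the following sketch. Neither draft specifies an audit procedure; the function name `level_is_plausible` and the 10 dB tolerance are assumptions chosen only for illustration. The tolerance exists because, as the text notes, an honest endpoint may measure levels before encoding.

```python
def level_is_plausible(claimed: int, measured: int, tolerance_db: int = 10) -> bool:
    """Compare a claimed level with one the mixer measured after decoding.

    Both values are in -dBov (0..127). A mismatch larger than the
    (hypothetical) tolerance suggests the endpoint may be misreporting.
    """
    return abs(claimed - measured) <= tolerance_db
```

A mixer could decode an occasional packet from each untrusted endpoint, measure its level, and take corrective action only after repeated failures of this check.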
In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
header extensions are authenticated but not encrypted. When this header extensions are authenticated but not encrypted. When this
header extension is used, audio levels are therefore visible on a header extension is used, audio levels are therefore visible on a
packet-by-packet basis to an attacker passively observing the audio packet-by-packet basis to an attacker passively observing the audio
stream. As discussed in [I-D.perkins-avt-srtp-vbr-audio], such an stream. As discussed in [I-D.ietf-avtcore-srtp-vbr-audio], such an
attacker might be able to infer information about the conversation, attacker might be able to infer information about the conversation,
possibly with phoneme-level resolution. In scenarios where this is a possibly with phoneme-level resolution. In scenarios where this is a
concern, additional mechanisms SHOULD be used to protect the concern, additional mechanisms MUST be used to protect the
confidentiality of the header extension. This mechanism could be confidentiality of the header extension. This mechanism could be
header extension encryption header extension encryption
[I-D.ietf-avtcore-srtp-encrypted-header-ext], or a lower-level [I-D.ietf-avtcore-srtp-encrypted-header-ext], or a lower-level
security and authentication mechanism. security and authentication mechanism such as IPsec [RFC4301].
7. IANA Considerations 7. IANA Considerations
This document defines a new extension URI to the RTP Compact Header This document defines a new extension URI to the RTP Compact Header
Extensions subregistry of the Real-Time Transport Protocol (RTP) Extensions subregistry of the Real-Time Transport Protocol (RTP)
Parameters registry, according to the following data: Parameters registry, according to the following data:
Extension URI: urn:ietf:params:rtp-hdrext:ssrc-audio-level Extension URI: urn:ietf:params:rtp-hdrext:ssrc-audio-level
Description: Audio Level Description: Audio Level
Contact: jonathan@vidyo.com Contact: jonathan@vidyo.com
skipping to change at page 8, line 21 skipping to change at page 8, line 29
[RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse- Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
September 1997. September 1997.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003. Applications", STD 64, RFC 3550, July 2003.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP
Header Extensions", RFC 5285, July 2008. Header Extensions", RFC 5285, July 2008.
8.2. Informative References 8.2. Informative References
[I-D.ietf-avtcore-srtp-encrypted-header-ext] [I-D.ietf-avtcore-srtp-encrypted-header-ext]
Lennox, J., "Encryption of Header Extensions in the Secure Lennox, J., "Encryption of Header Extensions in the Secure
Real-Time Transport Protocol (SRTP)", Real-Time Transport Protocol (SRTP)",
draft-ietf-avtcore-srtp-encrypted-header-ext-00 (work in draft-ietf-avtcore-srtp-encrypted-header-ext-01 (work in
progress), June 2011. progress), October 2011.
[I-D.ietf-avtcore-srtp-vbr-audio]
Perkins, C. and J. Valin, "Guidelines for the use of
Variable Bit Rate Audio with Secure RTP",
draft-ietf-avtcore-srtp-vbr-audio-03 (work in progress),
July 2011.
[I-D.ietf-avtext-mixer-to-client-audio-level] [I-D.ietf-avtext-mixer-to-client-audio-level]
Ivov, E., Marocco, E., and J. Lennox, "A Real-Time Ivov, E., Marocco, E., and J. Lennox, "A Real-Time
Transport Protocol (RTP) Header Extension for Mixer-to- Transport Protocol (RTP) Header Extension for Mixer-to-
Client Audio Level Indication", Client Audio Level Indication",
draft-ietf-avtext-mixer-to-client-audio-level-05 (work in draft-ietf-avtext-mixer-to-client-audio-level-05 (work in
progress), September 2011. progress), September 2011.
[I-D.perkins-avt-srtp-vbr-audio]
Perkins, C. and J. Valin, "Guidelines for the use of
Variable Bit Rate Audio with Secure RTP",
draft-perkins-avt-srtp-vbr-audio-05 (work in progress),
December 2010.
[ITU.G711.1988] [ITU.G711.1988]
International Telecommunications Union, "Pulse Code International Telecommunications Union, "Pulse Code
Modulation (PCM) of Voice Frequencies", ITU- Modulation (PCM) of Voice Frequencies", ITU-
T Recommendation G.711, November 1988. T Recommendation G.711, November 1988.
[RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for [RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for
Comfort Noise (CN)", RFC 3389, September 2002. Comfort Noise (CN)", RFC 3389, September 2002.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004. RFC 3711, March 2004.
Appendix A. Changes From Earlier Versions Appendix A. Changes From Earlier Versions
Note to the RFC-Editor: please remove this section prior to Note to the RFC-Editor: please remove this section prior to
publication as an RFC. publication as an RFC.
A.1. Changes From Draft -03 A.1. Changes From Draft -05
o Added an informative reference to RFC 4301 (IPsec). (Brought up
by Stephen Farrell)
o Clarified the meaning of "overload point of the system". (Brought
up by Robert Sparks).
o Clarified that levels correspond only to the audio carried in the
normal decoding process, not error or packet loss concealment.
(Brought up by Robert Sparks).
o Added security consideration that false audio levels could be used
to seize a speaker-selection algorithm (Brought up by Robert
Sparks and Stewart Bryant).
o Updated reference to [I-D.ietf-avtcore-srtp-vbr-audio].
A.2. Changes From Draft -04
o Adjusted IPR header.
A.3. Changes From Draft -03
o Added vad extension attribute to negotiate use of the V bit. o Added vad extension attribute to negotiate use of the V bit.
o Addressed editorial comments made on the mailing list. o Addressed editorial comments made on the mailing list.
A.2. Changes From Draft -02 A.4. Changes From Draft -02
o Changed encoding related text so that it would cover both the one- o Changed encoding related text so that it would cover both the one-
byte and the two-byte header formats. byte and the two-byte header formats.
o Clarified use of root mean square for dBov calculation o Clarified use of root mean square for dBov calculation
o Added references to the sample level calculator in o Added references to the sample level calculator in
[I-D.ietf-avtext-mixer-to-client-audio-level]. [I-D.ietf-avtext-mixer-to-client-audio-level].
o Changed affiliation for Emil Ivov. o Changed affiliation for Emil Ivov.
o Other minor editorial changes. o Other minor editorial changes.
A.3. Changes From Draft -01 A.5. Changes From Draft -01
o Changed the URI for declaring this header extension from o Changed the URI for declaring this header extension from
"urn:ietf:params:rtp-hdrext:audio-level" to "urn:ietf:params:rtp-hdrext:audio-level" to
"urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with "urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with
[I-D.ietf-avtext-mixer-to-client-audio-level]. [I-D.ietf-avtext-mixer-to-client-audio-level].
o Removed the "Limitations" section; it was discussing a potential o Removed the "Limitations" section; it was discussing a potential
extension that consensus indicated was out of scope of this extension that consensus indicated was out of scope of this
document. document.
o Closed the P.56 open issue. It was agreed at IETF 80 that P.56 is o Closed the P.56 open issue. It was agreed at IETF 80 that P.56 is
mostly about speech levels and the levels transported by the mostly about speech levels and the levels transported by the
skipping to change at page 10, line 4 skipping to change at page 10, line 36
extension defined here should also be able to serve as an extension defined here should also be able to serve as an
indication for noise. indication for noise.
o Closed the open issue about transmitting noise floor information. o Closed the open issue about transmitting noise floor information.
Noise floor is (loosely) inferrable by observing the per-packet Noise floor is (loosely) inferrable by observing the per-packet
level information over a period of time, so the additional level information over a period of time, so the additional
complexity seemed unnecessary. complexity seemed unnecessary.
o Editorial changes for consistency with o Editorial changes for consistency with
[I-D.ietf-avtext-mixer-to-client-audio-level]. [I-D.ietf-avtext-mixer-to-client-audio-level].
o Moved several descriptions of normative items that previously had o Moved several descriptions of normative items that previously had
only been described in informative sections of the text. only been described in informative sections of the text.
o Other editorial clarifications. o Other editorial clarifications.
A.4. Changes From Individual Submission Draft -01 A.6. Changes From Individual Submission Draft -01
o This version is primarily a document refresh. o This version is primarily a document refresh.
o Emil Ivov and Enrico Marocco have been added as co-authors. o Emil Ivov and Enrico Marocco have been added as co-authors.
o Additional open issues listed. o Additional open issues listed.
A.5. Changes From Individual Submission Draft -00 A.7. Changes From Individual Submission Draft -00
o The draft name has been changed to clarify that this document o The draft name has been changed to clarify that this document
defines Client-To-Mixer Audio Levels, to more clearly distinguish defines Client-To-Mixer Audio Levels, to more clearly distinguish
it from [I-D.ietf-avtext-mixer-to-client-audio-level]. it from [I-D.ietf-avtext-mixer-to-client-audio-level].
o The header extension format has been changed from a two-byte to a o The header extension format has been changed from a two-byte to a
one-byte payload, eliminating the 7 reserved bits and the one one-byte payload, eliminating the 7 reserved bits and the one
must-be-zero bit. must-be-zero bit.
o The sections Considerations on Use (Section 5) and Limitations o The sections Considerations on Use (Section 5) and Limitations
have been added. have been added.
o It has been noted that senders MAY indicate -127 dBov for digital o It has been noted that senders MAY indicate -127 dBov for digital
silence, and that level measurement MAY be done prior to encoding silence, and that level measurement MAY be done prior to encoding
audio. audio.
o A reference to [I-D.ietf-avtcore-srtp-encrypted-header-ext] has o A reference to [I-D.ietf-avtcore-srtp-encrypted-header-ext] has
been added to the security considerations. been added to the security considerations.
o The term "header extension" is now used consistently throughout o The term "header extension" is now used consistently throughout
the document (as opposed to "extension header"). the document (as opposed to "extension header").
 End of changes. 21 change blocks. 
45 lines changed or deleted 73 lines changed or added
