draft-ietf-avtext-client-to-mixer-audio-level-06.txt   rfc6464.txt 
AVT J. Lennox, Ed. Internet Engineering Task Force (IETF) J. Lennox, Ed.
Internet-Draft Vidyo Request for Comments: 6464 Vidyo
Intended status: Standards Track E. Ivov Category: Standards Track E. Ivov
Expires: May 17, 2012 Jitsi ISSN: 2070-1721 Jitsi
E. Marocco E. Marocco
Telecom Italia Telecom Italia
November 14, 2011 December 2011
A Real-Time Transport Protocol (RTP) Header Extension for Client-to- A Real-time Transport Protocol (RTP) Header Extension for
Mixer Audio Level Indication Client-to-Mixer Audio Level Indication
draft-ietf-avtext-client-to-mixer-audio-level-06
Abstract Abstract
This document defines a mechanism by which packets of Real-Time This document defines a mechanism by which packets of Real-time
Transport Protocol (RTP) audio streams can indicate, in an RTP header Transport Protocol (RTP) audio streams can indicate, in an RTP header
extension, the audio level of the audio sample carried in the RTP extension, the audio level of the audio sample carried in the RTP
packet. In large conferences, this can reduce the load on an audio packet. In large conferences, this can reduce the load on an audio
mixer or other middlebox which wants to forward only a few of the mixer or other middlebox that wants to forward only a few of the
loudest audio streams, without requiring it to decode and measure loudest audio streams, without requiring it to decode and measure
every stream that is received. every stream that is received.
Status of this Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering This is an Internet Standards Track document.
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
This Internet-Draft will expire on May 17, 2012. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6464.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction ....................................................2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology .....................................................3
3. Audio Levels . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Audio Levels ....................................................3
4. Signaling (Setup) Information . . . . . . . . . . . . . . . . 5 4. Signaling (Setup) Information ...................................5
5. Considerations on Use . . . . . . . . . . . . . . . . . . . . 6 5. Considerations on Use ...........................................6
6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 6. Security Considerations .........................................6
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations .............................................7
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 8. References ......................................................7
8.1. Normative References . . . . . . . . . . . . . . . . . . . 8 8.1. Normative References .......................................7
8.2. Informative References . . . . . . . . . . . . . . . . . . 8 8.2. Informative References .....................................8
Appendix A. Changes From Earlier Versions . . . . . . . . . . . . 9
A.1. Changes From Draft -05 . . . . . . . . . . . . . . . . . . 9
A.2. Changes From Draft -04 . . . . . . . . . . . . . . . . . . 9
A.3. Changes From Draft -03 . . . . . . . . . . . . . . . . . . 9
A.4. Changes From Draft -02 . . . . . . . . . . . . . . . . . . 10
A.5. Changes From Draft -01 . . . . . . . . . . . . . . . . . . 10
A.6. Changes From Individual Submission Draft -01 . . . . . . . 10
A.7. Changes From Individual Submission Draft -00 . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11
1. Introduction 1. Introduction
In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio In a centralized Real-time Transport Protocol (RTP) [RFC3550] audio
conference, an audio mixer or forwarder receives audio streams from conference, an audio mixer or forwarder receives audio streams from
many or all of the conference participants. It then selectively many or all of the conference participants. It then selectively
forwards some of them to other participants in the conference. In forwards some of them to other participants in the conference. In
large conferences, it is possible that such a server might be large conferences, it is possible that such a server might be
receiving a large number of streams, of which only a few are intended receiving a large number of streams, of which only a few are intended
to be forwarded to the other conference participants. to be forwarded to the other conference participants.
In such a scenario, in order to pick the audio streams to forward, a In such a scenario, in order to pick the audio streams to forward, a
centralized server needs to decode, measure audio levels, and centralized server needs to decode, measure audio levels, and
possibly perform voice activity detection on audio data from a large possibly perform voice activity detection on audio data from a large
number of streams. The need for such processing limits the size or number of streams. The need for such processing limits the size or
number of conferences such a server can support. number of conferences such a server can support.
As an alternative, this document defines an RTP header extension As an alternative, this document defines an RTP header extension
[RFC5285] through which senders of audio packets can indicate the [RFC5285] through which senders of audio packets can indicate the
audio level of the packets' payload, reducing the processing load for audio level of the packets' payload, reducing the processing load for
a server. a server.
The header extension in this draft is different than, but The header extension in this document is different than, but
complementary with, the one defined in complementary with, the one defined in [RFC6465], which defines a
[I-D.ietf-avtext-mixer-to-client-audio-level], which defines a
mechanism by which audio mixers can indicate to clients the levels of mechanism by which audio mixers can indicate to clients the levels of
the contributing sources that made up the mixed audio. the contributing sources that made up the mixed audio.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119] and document are to be interpreted as described in RFC 2119 [RFC2119] and
indicate requirement levels for compliant implementations. indicate requirement levels for compliant implementations.
3. Audio Levels 3. Audio Levels
The audio level header extension carries the level of the audio in The audio level header extension carries the level of the audio in
the RTP [RFC3550] payload of the packet it is associated with. This the RTP [RFC3550] payload of the packet with which it is associated.
information is carried in an RTP header extension element as defined This information is carried in an RTP header extension element as
by the "General Mechanism for RTP Header Extensions" [RFC5285]. defined by "A General Mechanism for RTP Header Extensions" [RFC5285].
The payload of the audio level header extension element can be The payload of the audio level header extension element can be
encoded using the one-byte or the two-byte header defined in encoded using either the one-byte or two-byte header defined in
[RFC5285]. Figure 1 and Figure 2 show sample audio level encodings [RFC5285]. Figures 1 and 2 show sample audio level encodings with
with each of them. each of these header formats.
    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID   | len=0 |V|    level    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Sample audio level encoding using the one-byte header format Figure 1: Sample Audio Level Encoding Using the
Figure 1 One-Byte Header Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ID       |     len=1     |V|    level    |    0 (pad)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Sample audio level encoding using the two-byte header format Figure 2: Sample Audio Level Encoding Using the
Figure 2 Two-Byte Header Format
Note that, as indicated in [RFC5285] length field in the one-byte Note that, as indicated in [RFC5285], the length field in the one-
header format takes the value 0 to indicate that 1 byte follows. In byte header format takes the value 0 to indicate that 1 byte follows.
the two-byte header format on the other hand it takes the value of 1. In the two-byte header format, on the other hand, the length field
takes the value of 1.
The magnitude of the audio level itself is packed into the seven The magnitude of the audio level itself is packed into the seven
least significant bits of the single byte of the header extension, least significant bits of the single byte of the header extension,
shown in Figure 1 and Figure 2. The least significant bit of the shown in Figures 1 and 2. The least significant bit of the audio
audio level magnitude is packed into the least significant bit of the level magnitude is packed into the least significant bit of the byte.
byte. The most significant bit of the byte is used as a separate The most significant bit of the byte is used as a separate flag bit
flag bit "V", defined below. "V", defined below.
The audio level is expressed in -dBov, with values from 0 to 127 The audio level is expressed in -dBov, with values from 0 to 127
representing 0 to -127 dBov. dBov is the level, in decibels, relative representing 0 to -127 dBov. dBov is the level, in decibels, relative
to the overload point of the system, i.e. the highest-intensity to the overload point of the system, i.e., the highest-intensity
signal encodable by the payload format. (Note: Representation signal encodable by the payload format. (Note: Representation
relative to the overload point of a system is particularly useful for relative to the overload point of a system is particularly useful for
digital implementations, since one does not need to know the relative digital implementations, since one does not need to know the relative
calibration of the analog circuitry.) For example, in the case of calibration of the analog circuitry.) For example, in the case of
u-law (audio/pcmu) audio [ITU.G711.1988], the 0 dBov reference would u-law (audio/pcmu) audio [ITU.G711], the 0 dBov reference would be a
be a square wave with values +/- 8031. (This translates to 6.18 square wave with values +/- 8031. (This translates to 6.18 dBm0,
dBm0, relative to u-law's dBm0 definition in Table 6 of G.711.) relative to u-law's dBm0 definition in Table 6 of [ITU.G711].)
The audio level for digital silence, for example for a muted audio The audio level for digital silence -- for a muted audio source, for
source, MUST be represented as 127 (-127 dBov), regardless of the example -- MUST be represented as 127 (-127 dBov), regardless of the
dynamic range of the encoded audio format. dynamic range of the encoded audio format.
The audio level header extension only carries the level of the audio The audio level header extension only carries the level of the audio
in the RTP payload of the packet it is associated with, with no long- in the RTP payload of the packet with which it is associated, with no
term averaging or smoothing applied. For payload formats that long-term averaging or smoothing applied. For payload formats that
contain extra error-correction bits or loss-concealment information, contain extra error-correction bits or loss-concealment information,
the level corresponds only to the data that would result from the the level corresponds only to the data that would result from the
payload's normal decoding process, not what it would produce under payload's normal decoding process, not what it would produce under
error or packet loss concealment. The level is measured as a root error or packet loss concealment. The level is measured as a root
mean square of all the samples in the audio encoded by the packet. mean square of all the samples in the audio encoded by the packet.
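As a rough illustration of the measurement described above, the following Python sketch computes a level in -dBov from one packet's worth of decoded samples. It is not the reference implementation mentioned in the text. The function name `audio_level_dbov`, the assumption of signed 16-bit linear PCM input, the choice of 32768 as the overload amplitude, and the treatment of an empty packet as silence are all assumptions made for this example.

```python
import math


def audio_level_dbov(samples, overload=32768.0):
    """Return the level in -dBov (0..127) for one packet of linear PCM.

    Assumptions (not taken from the specification): samples are signed
    16-bit integers, and the 0 dBov reference is a full-scale square
    wave, whose RMS equals its amplitude `overload`.
    """
    if not samples:
        return 127  # assumption: treat an empty packet as silence
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return 127  # digital silence is represented as 127 (-127 dBov)
    # Power ratio relative to the overload point, in decibels (<= 0).
    db = 10.0 * math.log10(mean_square / (overload * overload))
    # The field carries the negated value, clamped to seven bits.
    return max(0, min(127, round(-db)))
```

With these assumptions, a square wave at half of full scale has one quarter of the reference power and is reported as 6, since 10*log10(0.25) is about -6.02 dB.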
To simplify implementation of the encoding procedures described here, To simplify implementation of the encoding procedures described here,
the reference implementation section in Appendix A of [RFC6465] provides a sample Java implementation of an
[I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java audio level calculator that helps obtain such values from raw linear
implementation of an audio level calculator that helps obtain such Pulse Code Modulation (PCM) audio samples.
values from raw linear PCM audio samples.
In addition, a flag bit (labeled V) optionally indicates whether the In addition, a flag bit (labeled "V") optionally indicates whether
encoder believes the audio packet contains voice activity. If the V the encoder believes the audio packet contains voice activity. If
bit is in use, the value 1 indicates that the encoder believes the the V bit is in use, the value 1 indicates that the encoder believes
audio packet contains voice activity, and the value 0 indicates that the audio packet contains voice activity, and the value 0 indicates
the encoder believes it does not. (The voice activity detection that the encoder believes it does not. (The voice activity detection
algorithm is unspecified and left implementation-specific.) If the V algorithm is unspecified and left implementation-specific.) If the V
bit is not in use, its value is unspecified and MUST be ignored by bit is not in use, its value is unspecified and MUST be ignored by
receivers. The use of the V bit is signaled using the extension receivers. The use of the V bit is signaled using the extension
attribute "vad", discussed in Section 4. attribute "vad", discussed in Section 4.
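The bit layout described in this section can be sketched in Python as follows. The function names are hypothetical, and the ID range checks reflect the author's reading of the one-byte and two-byte formats of [RFC5285]. Padding the header extension block to a 32-bit boundary, as shown by the pad byte in Figure 2, is left to the caller.

```python
def pack_level_byte(level, voice):
    """Pack the V flag (most significant bit) and a 7-bit level."""
    if not 0 <= level <= 127:
        raise ValueError("level must be in 0..127")
    return (0x80 if voice else 0x00) | level


def unpack_level_byte(byte, vad_in_use=True):
    """Return (level, voice); voice is None when the V bit is not in use."""
    level = byte & 0x7F
    voice = bool(byte & 0x80) if vad_in_use else None
    return level, voice


def one_byte_element(ext_id, level, voice):
    """One-byte header: 4-bit ID, then 4-bit length field set to 0."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are 1..14")
    return bytes([(ext_id << 4) | 0, pack_level_byte(level, voice)])


def two_byte_element(ext_id, level, voice):
    """Two-byte header: 8-bit ID, then 8-bit length field set to 1."""
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte header IDs are 1..255")
    return bytes([ext_id, 1, pack_level_byte(level, voice)])
```

A receiver that negotiated "vad=off" would call `unpack_level_byte(b, vad_in_use=False)` so that the V bit is ignored rather than interpreted.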
When this header extension is used with RTP data sent using the RTP When this header extension is used with RTP data sent using the RTP
Payload for Redundant Audio Data [RFC2198], the header's data Payload for Redundant Audio Data [RFC2198], the header's data
describes the contents of the primary encoding. describes the contents of the primary encoding.
Note: This audio level is defined in the same manner as is audio Note: This audio level is defined in the same manner as is audio
noise level in the RTP Payload Comfort Noise specification [RFC3389]. noise level in the RTP Payload Comfort Noise specification
In the comfort noise specification, the overall magnitude of the [RFC3389]. In [RFC3389], the overall magnitude of the noise level
noise level in comfort noise is encoded into the first byte of the in comfort noise is encoded into the first byte of the payload,
payload, with spectral information about the noise in subsequent with spectral information about the noise in subsequent bytes.
bytes. This specification's audio level parameter is defined so as This specification's audio level parameter is defined so as to be
to be identical to the comfort noise payload's noise-level byte. identical to the comfort noise payload's noise-level byte.
4. Signaling (Setup) Information 4. Signaling (Setup) Information
The URI for declaring this header extension in an extmap attribute is The URI for declaring this header extension in an extmap attribute is
"urn:ietf:params:rtp-hdrext:ssrc-audio-level". "urn:ietf:params:rtp-hdrext:ssrc-audio-level".
It has a single extension attribute, named "vad". It takes the form It has a single extension attribute, named "vad". It takes the form
"vad=on" or "vad=off". If the header extension element is signaled "vad=on" or "vad=off". If the header extension element is signaled
with "vad=on", the "V" bit described in Section 3 is in use, and MUST with "vad=on", the V bit described in Section 3 is in use, and MUST
be set by senders. If the header extension element is signaled with be set by senders. If the header extension element is signaled with
"vad=off", the "V" bit is not in use, and its value MUST be ignored "vad=off", the V bit is not in use, and its value MUST be ignored by
by receivers. If the "vad" extension attribute is not specified, the receivers. If the vad extension attribute is not specified, the
default is "vad=on". default is "vad=on".
An example attribute line in the SDP, for a conference might hence An example attribute line in the Session Description Protocol (SDP)
be: for a conference might hence be:
a=extmap:6 urn:ietf:params:rtp-hdrext:ssrc-audio-level vad=on a=extmap:6 urn:ietf:params:rtp-hdrext:ssrc-audio-level vad=on
The "vad" extension attribute only controls the semantics of this The vad extension attribute only controls the semantics of this
header extension attribute, and does not make any statement about header extension attribute, and does not make any statement about
whether the sender is using any other voice activity detection whether the sender is using any other voice activity detection
features such as discontinuous transmission, comfort noise, or features, such as discontinuous transmission, comfort noise, or
silence suppression. silence suppression.
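A receiver might interpret the attribute line along the following lines. This is a minimal Python sketch with a hypothetical function name. It assumes the extmap syntax of [RFC5285] (an identifier, an optional "/direction" suffix, the URI, then extension attributes) and applies the "vad=on" default described above.

```python
AUDIO_LEVEL_URI = "urn:ietf:params:rtp-hdrext:ssrc-audio-level"


def parse_audio_level_extmap(line):
    """Return (ext_id, vad_in_use) for an audio level extmap line.

    Returns None if the line is not an extmap attribute for this URI.
    """
    prefix = "a=extmap:"
    if not line.startswith(prefix):
        return None
    fields = line[len(prefix):].split()
    if len(fields) < 2 or fields[1] != AUDIO_LEVEL_URI:
        return None
    # Drop an optional "/direction" suffix such as "/sendonly".
    ext_id = int(fields[0].split("/")[0])
    vad_in_use = True  # default when the vad attribute is absent
    for attr in fields[2:]:
        if attr == "vad=off":
            vad_in_use = False
        elif attr == "vad=on":
            vad_in_use = True
    return ext_id, vad_in_use
```

For the example line shown above, this sketch yields the identifier 6 with the V bit in use.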
Using the mechanisms of [RFC5285], an endpoint MAY signal multiple Using the mechanisms of [RFC5285], an endpoint MAY signal multiple
instances of the header extension element, with different values of instances of the header extension element, with different values of
the vad attribute, so long as these instances use different values the vad attribute, so long as these instances use different values
for the extension identifier. However, again following the rules of for the extension identifier. However, again following the rules of
[RFC5285], the semantics chosen for a header extension element [RFC5285], the semantics chosen for a header extension element
(including its vad setting) for a particular extension identifier (including its vad setting) for a particular extension identifier
value MUST NOT be changed within an RTP session. value MUST NOT be changed within an RTP session.
skipping to change at page 6, line 45 skipping to change at page 6, line 22
server, as would be done in the absence of this information. This server, as would be done in the absence of this information. This
section discusses several issues that mixers and forwarders may wish section discusses several issues that mixers and forwarders may wish
to take into account. (Note that this section provides design to take into account. (Note that this section provides design
guidance only, and is not normative.) guidance only, and is not normative.)
First of all, audio levels generally ought to be measured over longer First of all, audio levels generally ought to be measured over longer
intervals than that of a single audio packet. In order to avoid intervals than that of a single audio packet. In order to avoid
false-positives for short bursts of sound (such as a cough or a false-positives for short bursts of sound (such as a cough or a
dropped microphone), it is often useful to require that a dropped microphone), it is often useful to require that a
participant's audio level be maintained for some period of time participant's audio level be maintained for some period of time
before considering it to be "real", i.e. some type of low-pass filter before considering it to be "real"; i.e., some type of low-pass
ought to be applied to the audio levels. Note, though, that such filter ought to be applied to the audio levels. Note, though, that
filtering must be balanced with the need to avoid clipping of the such filtering must be balanced with the need to avoid clipping of
beginning of a speaker's speech. the beginning of a speaker's speech.
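One simple form of such a low-pass filter is an exponentially weighted moving average over the per-packet levels. The Python sketch below is only an illustration of the idea. The class name, the smoothing factor, and the choice to average directly in the -dBov domain are assumptions, not recommendations from this document.

```python
class LevelSmoother:
    """Exponentially weighted moving average of reported -dBov levels.

    Smaller values mean louder audio, so the state starts at 127
    (silence). A larger alpha reacts faster, which reduces clipping of
    the start of speech but lets short bursts through more easily.
    """

    def __init__(self, alpha=0.2, initial=127.0):
        self.alpha = alpha
        self.value = initial

    def update(self, level):
        # Move the estimate a fraction alpha toward the new sample.
        self.value += self.alpha * (level - self.value)
        return self.value
```

With alpha set to 0.5, a single loud packet reported as 27 after silence moves the smoothed level only to 77, and a second such packet moves it to 52. A mixer comparing the smoothed value against a threshold would therefore not switch speakers on an isolated burst.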
Additionally, different participants may have their audio input set Additionally, different participants may have their audio input set
differently. It may be useful to apply some sort of automatic gain differently. It may be useful to apply some sort of automatic gain
control to the audio levels. There are a number of possible control to the audio levels. There are a number of possible
approaches to acheiving this, e.g. by measuring peak audio levels, by approaches to achieving this, e.g., by measuring peak audio levels,
average audio levels during speech, or by measuring background audio by average audio levels during speech, or by measuring background
levels (average audio level levels during non-speech). audio levels (average audio levels during non-speech).
6. Security Considerations 6. Security Considerations
A malicious endpoint could choose to set the values in this header A malicious endpoint could choose to set the values in this header
extension falsely, so as to falsely claim that audio or voice is or extension falsely, so as to falsely claim that audio or voice is or
is not present. It is not clear what could be gained by falsely is not present. It is not clear what could be gained by falsely
claiming that audio is not present, but an endpoint falsely claiming claiming that audio is not present, but an endpoint falsely claiming
that audio is present could perform a denial-of-service attack on an that audio is present, or falsely exaggerating its reported levels,
audio conference, so as to send silence to suppress other conference could perform a denial-of-service attack on an audio conference, so
members' audio, or could dominate a conference (by seizing its as to send silence to suppress other conference members' audio, or
speaker-selection algorithm) without actually speaking. Thus, if a could dominate a conference by seizing its speaker-selection
device relies on audio level data from untrusted endpoints, it SHOULD algorithm. Thus, if a device relies on audio level data from
periodically audit the level information transmitted, taking untrusted endpoints, it SHOULD periodically audit the level
appropriate corrective action against endpoints that appear to be information transmitted, taking appropriate corrective action against
sending incorrect data. (However, as it is valid for an endpoint to endpoints that appear to be sending incorrect data. (However, as it
choose to measure audio levels prior to encoding, some degree of is valid for an endpoint to choose to measure audio levels prior to
discrepancy could be present. This would not indicate that an encoding, some degree of discrepancy could be present. This would
endpoint is malicous.) not indicate that an endpoint is malicious.)
In the Secure Real-time Transport Protocol (SRTP) [RFC3711], RTP
In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
header extensions are authenticated but not encrypted. When this header extensions are authenticated but not encrypted. When this
header extension is used, audio levels are therefore visible on a header extension is used, audio levels are therefore visible on a
packet-by-packet basis to an attacker passively observing the audio packet-by-packet basis to an attacker passively observing the audio
stream. As discussed in [I-D.ietf-avtcore-srtp-vbr-audio], such an stream. As discussed in [SRTP-VBR-AUDIO], such an attacker might be
attacker might be able to infer information about the conversation, able to infer information about the conversation, possibly with
possibly with phoneme-level resolution. In scenarios where this is a phoneme-level resolution. In scenarios where this is a concern,
concern, additional mechanisms MUST be used to protect the additional mechanisms MUST be used to protect the confidentiality of
confidentiality of the header extension. This mechanism could be the header extension. This mechanism could be header extension
header extension encryption encryption [SRTP-ENCR-HDR], or a lower-level security and
[I-D.ietf-avtcore-srtp-encrypted-header-ext], or a lower-level authentication mechanism such as IPsec [RFC4301].
security and authentication mechanism such as IPsec [RFC4301].
7. IANA Considerations 7. IANA Considerations
This document defines a new extension URI to the RTP Compact Header This document defines a new extension URI in the RTP Compact Header
Extensions subregistry of the Real-Time Transport Protocol (RTP) Extensions subregistry of the Real-Time Transport Protocol (RTP)
Parameters registry, according to the following data: Parameters registry, according to the following data:
Extension URI: urn:ietf:params:rtp-hdrext:ssrc-audio-level Extension URI: urn:ietf:params:rtp-hdrext:ssrc-audio-level
Description: Audio Level Description: Audio Level
Contact: jonathan@vidyo.com Contact: jonathan@vidyo.com
Reference: RFC XXXX Reference: RFC 6464
Note to RFC Editor: please replace "RFC XXXX" with the number of this
RFC.
8. References 8. References
8.1. Normative References 8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse- Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
September 1997. September 1997.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003. Applications", STD 64, RFC 3550, July 2003.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP
Header Extensions", RFC 5285, July 2008. Header Extensions", RFC 5285, July 2008.
8.2. Informative References 8.2. Informative References
[I-D.ietf-avtcore-srtp-encrypted-header-ext] [ITU.G711] International Telecommunication Union, "Pulse Code
Lennox, J., "Encryption of Header Extensions in the Secure Modulation (PCM) of Voice Frequencies",
Real-Time Transport Protocol (SRTP)", ITU-T Recommendation G.711, November 1988.
draft-ietf-avtcore-srtp-encrypted-header-ext-01 (work in
progress), October 2011.
[I-D.ietf-avtcore-srtp-vbr-audio]
Perkins, C. and J. Valin, "Guidelines for the use of
Variable Bit Rate Audio with Secure RTP",
draft-ietf-avtcore-srtp-vbr-audio-03 (work in progress),
July 2011.
[I-D.ietf-avtext-mixer-to-client-audio-level]
Ivov, E., Marocco, E., and J. Lennox, "A Real-Time
Transport Protocol (RTP) Header Extension for Mixer-to-
Client Audio Level Indication",
draft-ietf-avtext-mixer-to-client-audio-level-05 (work in
progress), September 2011.
[ITU.G711.1988]
International Telecommunications Union, "Pulse Code
Modulation (PCM) of Voice Frequencies", ITU-
T Recommendation G.711, November 1988.
[RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for [RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for
Comfort Noise (CN)", RFC 3389, September 2002. Comfort Noise (CN)", RFC 3389, September 2002.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004. RFC 3711, March 2004.
Appendix A. Changes From Earlier Versions [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
Note to the RFC-Editor: please remove this section prior to
publication as an RFC.
A.1. Changes From Draft -05
o Added an informative reference to RFC 4301 (IPsec). (Brought up
by Stephen Farrell)
o Clarified the meaning of "overload point of the system". (Brought
up by Robert Sparks).
o Clarified that levels correspond only to the audio carried in the
normal decoding process, not error or packet loss concealment.
(Brought up by Robert Sparks).
o Added security consideration that false audio levels could be used
to seize a speaker-selection algorithm (Brought up by Robert
Sparks and Stewart Bryant).
o Updated reference to [I-D.ietf-avtcore-srtp-vbr-audio].
A.2. Changes From Draft -04
o Adjusted IPR header.
A.3. Changes From Draft -03
o Added vad extension attribute to negotiate use of the V bit.
o Addressed editorial comments made on the mailing list.
A.4. Changes From Draft -02
o Changed encoding related text so that it would cover both the one-
byte and the two-byte header formats.
o Clarified use of root mean square for dBov calculation
o Added references to the sample level calculator in
[I-D.ietf-avtext-mixer-to-client-audio-level].
o Changed affiliation for Emil Ivov.
o Other minor editorial changes.
A.5. Changes From Draft -01
o Changed the URI for declaring this header extension from
"urn:ietf:params:rtp-hdrext:audio-level" to
"urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with
[I-D.ietf-avtext-mixer-to-client-audio-level].
o Removed the "Limitations" section; it was discussing a potential
extension that consensus indicated was out of scope of this
document.
o Closed the P.56 open issue. It was agreed on IETF 80 that P.56 is
mostly about speech levels and the levels transported by the
extension defined here should also be able to serve as an
indication for noise.
o Closed the open issue about transmitting noise floor information.
Noise floor is (loosely) inferrable by observing the per-packet
level information over a period of time, so the additional
complexity seemed unnecessary.
o Editorial changes for consistency with
[I-D.ietf-avtext-mixer-to-client-audio-level].
o Moved several descriptions of normative items that previously had
only been described in informative sections of the text.
o Other editorial clarifications.
A.6. Changes From Individual Submission Draft -01
o This version is primarily a document refresh.
o Emil Ivov and Enrico Marocco have been added as co-authors.
o Additional open issues listed.
A.7. Changes From Individual Submission Draft -00 [RFC6465] Ivov, E., Ed., Marocco, E., Ed., and J. Lennox,
"A Real-time Transport Protocol (RTP) Header Extension for
Mixer-to-Client Audio Level Indication", RFC 6465,
December 2011.
o The draft name has been changed to clarify that this document [SRTP-ENCR-HDR]
defines Client-To-Mixer Audio Levels, to more clearly distinguish Lennox, J., "Encryption of Header Extensions in the Secure
it from [I-D.ietf-avtext-mixer-to-client-audio-level]. Real-Time Transport Protocol (SRTP)", Work in Progress,
o The header extension format has been changed from a two-byte to a October 2011.
one-byte payload, eliminating the 7 reserved bits and the one
must-be-zero bit.
o The sections Considerations on Use (Section 5) and Limitations [SRTP-VBR-AUDIO]
have been added. Perkins, C. and JM. Valin, "Guidelines for the use of
o It has been noted that senders MAY indicate -127 dBov for digital Variable Bit Rate Audio with Secure RTP", Work
silence, and that level measurement MAY be done prior to encoding in Progress, July 2011.
audio.
o A reference to [I-D.ietf-avtcore-srtp-encrypted-header-ext] has
been added to the security considerations.
o The term "header extension" is now used consistentenly throughout
the document (as opposed to "extension header").
Authors' Addresses Authors' Addresses
Jonathan Lennox (editor) Jonathan Lennox (editor)
Vidyo, Inc. Vidyo, Inc.
433 Hackensack Avenue 433 Hackensack Avenue
Seventh Floor Seventh Floor
Hackensack, NJ 07601 Hackensack, NJ 07601
US US
Email: jonathan@vidyo.com EMail: jonathan@vidyo.com
Emil Ivov Emil Ivov
Jitsi Jitsi
Strasbourg 67000 Strasbourg 67000
France France
Email: emcho@jitsi.org EMail: emcho@jitsi.org
Enrico Marocco Enrico Marocco
Telecom Itialia Telecom Italia
Via G. Reiss Romoli, 274 Via G. Reiss Romoli, 274
Turin 10148 Turin 10148
Italy Italy
Email: enrico.marocco@telecomitalia.it EMail: enrico.marocco@telecomitalia.it
 End of changes. 49 change blocks. 
254 lines changed or deleted 142 lines changed or added
