draft-ietf-payload-rtp-sbc-01.txt   draft-ietf-payload-rtp-sbc-02.txt 
Working Group PAYLOAD C. Hoene Working Group PAYLOAD C. Hoene
Internet Draft University of Tuebingen Internet Draft University of Tuebingen
Intended status: Standards Track F. de Bont Intended status: Standards Track F. de Bont
Expires: June 2012 Philips Electronics Expires: July 2012 Philips Electronics
December 15, 2011 January 4, 2012
RTP Payload Format for Bluetooth's SBC audio codec RTP Payload Format for Bluetooth's SBC Audio Codec
draft-ietf-payload-rtp-sbc-01 draft-ietf-payload-rtp-sbc-02
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 32 skipping to change at page 1, line 32
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on December 15, 2011. This Internet-Draft will expire on July 4, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided without
skipping to change at page 2, line 22 skipping to change at page 2, line 22
Profile (A2DP) Specification written by the Bluetooth(r) Special Profile (A2DP) Specification written by the Bluetooth(r) Special
Interest Group (SIG). The payload format is designed to be able to Interest Group (SIG). The payload format is designed to be able to
interoperate with existing Bluetooth A2DP devices, to provide high interoperate with existing Bluetooth A2DP devices, to provide high
streaming audio quality, interactive audio transmission over the streaming audio quality, interactive audio transmission over the
internet, and ultra-low delay coding for jam sessions on the internet, and ultra-low delay coding for jam sessions on the
internet. This document also contains a media type registration internet. This document also contains a media type registration
which specifies the use of the RTP payload format. which specifies the use of the RTP payload format.
Table of Contents Table of Contents
1. Introduction ................................................ 3 1. Introduction ................................................. 3
2. Conventions used in this document ........................... 3 2. Conventions used in this Document ............................ 3
3. Background .................................................. 3 3. Background ................................................... 3
4. Usage Scenarios ............................................. 5 3.1. SBC Media Payload Format ................................ 5
4.1. Scenario 1: Interconnection of A2DP devices ............ 5 3.2. SBC Fragmentation ....................................... 5
4.2. Scenario 2: High quality interactive audio transmissions 6 3.3. Media Payload Format Header ............................. 6
4.3. Scenario 3: Ensembles performing over a network ........ 6 3.4. SBC Frame Structure ..................................... 7
5. Header Usage ................................................ 7 3.5. Frame Header ............................................ 7
6. Payload Format .............................................. 8 3.6. Remaining Frame Part .................................... 9
6.1. Media payload format header ............................ 9 4. Usage Scenarios .............................................. 9
6.2. SBC Frame Structure .................................... 9 4.1. Scenario 1: Interconnection of A2DP Devices ............ 10
6.3. Frame header .......................................... 10 4.2. Scenario 2: High Quality Interactive Audio Transmissions 10
6.4. Remaining frame........................................ 12 4.3. Scenario 3: Ensembles performing over a Network ........ 11
7. Payload Format Parameters .................................. 12 5. Header Usage ................................................ 11
7.1. SBC Media Type Registration ........................... 12 6. Payload Format .............................................. 12
7.1.1. Capabilities: A2DP modes ......................... 14 7. Payload Format Parameters ................................... 12
7.1.2. Capabilities: other modes ........................ 15 7.1. Media Type Registration for SBC ........................ 13
7.2. Mapping to SDP Parameters ............................. 15 7.1.1. Capabilities: A2DP Modes .......................... 14
7.2.1. Offer-Answer Model Considerations ................ 16 7.1.2. Capabilities: Other Modes ......................... 15
7.2.2. Declarative SDP Considerations ................... 18 7.2. Mapping to SDP Parameters .............................. 15
8. Congestion Control ......................................... 18 7.2.1. Offer-Answer Model Considerations ................. 16
9. Packet loss concealment .................................... 19 7.2.2. Declarative SDP Considerations .................... 18
10. Security Considerations ................................... 19 8. Congestion Control .......................................... 18
11. IANA Considerations........................................ 20 9. Packet Loss Concealment ..................................... 19
12. References ................................................ 21 10. Security Considerations .................................... 19
12.1. Normative References ................................. 21 11. IANA Considerations......................................... 20
12.2. Informative References ............................... 21 12. References ................................................. 21
13. Acknowledgments ........................................... 23 12.1. Normative References .................................. 21
12.2. Informative References ................................ 21
13. Acknowledgments ............................................ 23
1. Introduction 1. Introduction
The Bluetooth(r) Special Interest Group (SIG) specifies in the The Bluetooth(r) Special Interest Group (SIG) specifies in the
Advanced Audio Distribution Profile (A2DP) [A2DPV10] a mono and Advanced Audio Distribution Profile (A2DP) [A2DPV12] a mono and
stereo high quality audio subband codec (SBC). This document stereo high quality audio subband codec (SBC). This document
specifies the payload format for the encapsulation of SBC encoded specifies the payload format for the encapsulation of SBC encoded
audio frames into the Real-time Transport Protocol (RTP). audio frames into the Real-time Transport Protocol (RTP).
SBC has a low computational complexity at modest compression rates. SBC has a low computational complexity at modest compression rates.
Its bit rate can be controlled widely. Recommended operational modes Its bit rate can be controlled widely. Recommended operational modes
range from 127 to 345 kb/s, for mono and stereo audio signals. SBC's range from 127 to 345 kb/s, for mono and stereo audio signals. SBC's
algorithmic delay can be as low as 16 samples making it ideal for algorithmic delay can be as low as 16 samples making it ideal for
ensembles playing music over the network requiring ultra low ensembles playing music over the network requiring ultra low
acoustic delays. acoustic delays.
2. Conventions used in this document 2. Conventions used in this Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119]. document are to be interpreted as described in RFC-2119 [RFC2119].
The following acronyms are used in this document: The following acronyms are used in this document:
A2DP - Advanced Audio Distribution Profile A2DP - Advanced Audio Distribution Profile
AAC - Advanced Audio Coding AAC - Advanced Audio Coding
ATRAC - Adaptive Transform Acoustic Coding ATRAC - Adaptive Transform Acoustic Coding
DCCP - Datagram Congestion Control Protocol DCCP - Datagram Congestion Control Protocol
MP3 - MPEG-1 Audio Layer 3 MP3 - MPEG-1 Audio Layer 3
SBC - SubBand Codec SBC - SubBand Codec
SIG - Special Interest Group SIG - Special Interest Group
3. Background 3. Background
The A2DP specification is intended for streaming of music content to The A2DP specification [A2DPV12] is intended for streaming of music
headphones, headsets, or speakers over Bluetooth wireless channels. content to headphones, headsets, or speakers over Bluetooth wireless
A2DP supports multiple audio coding including MP3, AAC, ATRAC, which channels. A2DP supports multiple audio coding including MP3, AAC,
are all non-mandatory. To ensure interoperability, the SBC codec has ATRAC, which are all non-mandatory. To ensure interoperability, the
been specified, which shall be included into all A2DP Bluetooth SBC codec has been specified, in appendix B of the A2DP
specification, which shall be included into all A2DP Bluetooth
devices. devices.
The SBC is a low complexity subband codec based on earlier work SBC is a low complexity subband codec based on earlier work
presented in [Bon1995] and [Rault1989]. It has a moderate presented in [Bon1995] and [Rault1989]. It has a moderate
compression ratio. The SBC encoder has filter banks splitting the compression ratio. The SBC encoder has filter banks splitting the
audio signal into 4 or 8 subbands. Then the codec decides with how audio signal into 4 or 8 subbands. Then the codec decides with how
many bits each subband is encoded and finally quantizes the subband many bits each subband is encoded and finally quantizes the subband
signals blockwise. An SBC frame can have different block sizes. The signals blockwise. An SBC frame can have different block sizes. The
size of a block can be 4, 8, 12 or 16. Both decoder and encoder size of a block can be 4, 8, 12 or 16. Both decoder and encoder
shall support all four block sizes. shall support all four block sizes.
SBC can operate at four different sampling frequencies. The sampling SBC can operate at four different sampling frequencies. The sampling
frequency can be selected from a set of 16, 32, 44.1, and 48 kHz. It frequency can be selected from a set of 16, 32, 44.1, and 48 kHz. It
skipping to change at page 4, line 27 skipping to change at page 4, line 30
mode. mode.
SBC can use four or eight subbands. The decoder shall support both; SBC can use four or eight subbands. The decoder shall support both;
the encoder shall support at least 8 subbands. the encoder shall support at least 8 subbands.
The bit allocation modes of SBC can be either based on signal to The bit allocation modes of SBC can be either based on signal to
noise ratio or on loudness. The decoder shall support both modes; noise ratio or on loudness. The decoder shall support both modes;
the encoder shall support at least the loudness mode. the encoder shall support at least the loudness mode.
The SBC encoder reduces one block to a given number of bits. The The SBC encoder reduces one block to a given number of bits. The
bit-pool variable defines how many bits are used per block. A2DP bit-pool variable defines how many bits are used per block. The A2DP
devices define the range of valid bit-pool values by providing profile defines the range of valid bit-pool values by providing
minimum and maximum bit-pool values. The bit-pool values shall range minimum and maximum bit-pool values. The bit-pool values shall range
from 2 to 250 but shall not be larger than the number of subbands from 2 to 250 but shall not be larger than the number of subbands
times 16 for the mono and dual channel modes, or times 32 for the times 16 for the mono and dual channel modes, or times 32 for the
stereo and joint-stereo channel modes. stereo and joint-stereo channel modes.
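The bit-pool limits stated above can be sketched in Python as follows. This is only an illustration: the function names and the channel mode strings are assumptions made for this example, and the additional maximum bit rate restriction is not checked here.

```python
def max_bitpool(subbands: int, channel_mode: str) -> int:
    """Upper bit-pool bound: at most 250, and at most 16 or 32 times
    the number of subbands, depending on the channel mode."""
    if subbands not in (4, 8):
        raise ValueError("SBC uses 4 or 8 subbands")
    if channel_mode in ("mono", "dual"):
        factor = 16
    elif channel_mode in ("stereo", "joint_stereo"):
        factor = 32
    else:
        raise ValueError("unknown channel mode: " + channel_mode)
    return min(250, factor * subbands)


def is_valid_bitpool(bitpool: int, subbands: int, channel_mode: str) -> bool:
    """True if the bit-pool value lies within the range described above."""
    return 2 <= bitpool <= max_bitpool(subbands, channel_mode)
```

For 8 subbands in a stereo or joint-stereo mode the product (256) exceeds 250, so the general upper limit of 250 applies.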
SBC encoders inside A2DP devices may be capable of changing the bit- SBC encoders according to the A2DP profile may be capable of
pool parameter dynamically during the encoding process. For example, changing the bit-pool parameter dynamically during the encoding
algorithms were invented that change the number of bits depending on process. For example, algorithms were invented that change the
the current acoustic content [Pilati2008]. number of bits depending on the current acoustic content
[Pilati2008].
The decoder shall support all possible bit-pool values that do not An SBC decoder according to the A2DP profile shall support all
result in excess of maximum bit rate, which is 320kb/s for mono and possible bit-pool values that do not result in excess of maximum bit
512kb/s for two-channel modes. The encoder is required to support at rate, which is 320kb/s for mono and 512kb/s for two-channel modes.
least one possible bit-pool value. The A2DP specification recommends The encoder is required to support at least one possible bit-pool
the encoding parameters given in Table 1. value. The A2DP profile recommends the encoding parameters given in
Table 1.
+------------------------------------------------------------+ +------------------------------------------------------------+
| SBC encoder settings at Medium Quality | | SBC encoder settings at Medium Quality |
+--------------------------------+-------------+-------------+ +--------------------------------+-------------+-------------+
| | Mono | Joint Stereo| | | Mono | Joint Stereo|
| Sampling frequency (kHz) | 44.1 | 48 | 44.1 | 48 | | Sampling frequency (kHz) | 44.1 | 48 | 44.1 | 48 |
| Bitpool value | 19 | 18 | 35 | 33 | | Bitpool value | 19 | 18 | 35 | 33 |
| Resulting frame length (bytes) | 46 | 44 | 83 | 79 | | Resulting frame length (bytes) | 46 | 44 | 83 | 79 |
| Resulting bit rate (kb/s) | 127 | 132 | 229 | 237 | | Resulting bit rate (kb/s) | 127 | 132 | 229 | 237 |
+--------------------------------+------+------+------+------+ +--------------------------------+------+------+------+------+
skipping to change at page 5, line 26 skipping to change at page 5, line 26
| | Mono | Joint Stereo| | | Mono | Joint Stereo|
| Sampling frequency (kHz) | 44.1 | 48 | 44.1 | 48 | | Sampling frequency (kHz) | 44.1 | 48 | 44.1 | 48 |
| Bitpool value | 31 | 29 | 53 | 51 | | Bitpool value | 31 | 29 | 53 | 51 |
| Resulting frame length (bytes) | 70 | 66 | 119 | 115 | | Resulting frame length (bytes) | 70 | 66 | 119 | 115 |
| Resulting bit rate (kb/s) | 193 | 198 | 328 | 345 | | Resulting bit rate (kb/s) | 193 | 198 | 328 | 345 |
+--------------------------------+------+------+------+------+ +--------------------------------+------+------+------+------+
| Other settings: Block length = 16, loudness, subbands = 8 | | Other settings: Block length = 16, loudness, subbands = 8 |
+------------------------------------------------------------+ +------------------------------------------------------------+
Table 1: Recommended sets of SBC parameters in the SRC device as Table 1: Recommended sets of SBC parameters in the SRC device as
given in [A2DPV10] given in [A2DPV12]
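The frame lengths and bit rates in Table 1 can be reproduced with the frame length and bit rate formulas of the A2DP specification. The following Python sketch restates those formulas as recalled from that specification, not from this document. The function names and channel mode strings are illustrative assumptions.

```python
def sbc_frame_length(subbands: int, blocks: int, bitpool: int,
                     channel_mode: str) -> int:
    """SBC frame length in bytes (formula as recalled from A2DP)."""
    if channel_mode not in ("mono", "dual", "stereo", "joint_stereo"):
        raise ValueError("unknown channel mode: " + channel_mode)
    channels = 1 if channel_mode == "mono" else 2
    # 4 header bytes plus 4 bits of scale factor per subband and channel.
    length = 4 + (4 * subbands * channels) // 8
    if channel_mode in ("mono", "dual"):
        audio_bits = blocks * channels * bitpool
    else:
        # Joint stereo adds one JOIN bit per subband.
        join = 1 if channel_mode == "joint_stereo" else 0
        audio_bits = join * subbands + blocks * bitpool
    return length + (audio_bits + 7) // 8  # round up to whole bytes


def sbc_bit_rate(frame_length: int, sampling_hz: int,
                 subbands: int, blocks: int) -> float:
    """Bit rate in bit/s; one frame covers subbands * blocks samples."""
    return 8 * frame_length * sampling_hz / (subbands * blocks)
```

For example, the medium quality mono setting at 44.1 kHz (bitpool 19, 16 blocks, 8 subbands) gives a frame length of 46 bytes and a bit rate of about 127 kb/s, matching Table 1.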
The A2DP V1.0 specification describes a media payload format, which
we adopt in this document one-to-one without any change.
4. Usage Scenarios
Compared to many other encoding schemes, SBC is general enough to
support multiple, quite diverse usage scenarios. It might therefore
be required to change the behavior of the encoding and transmission
to achieve good performance for a given usage scenario. We list
three main scenarios and describe their quality requirements and
their impact on the encoding and transmission.
4.1. Scenario 1: Interconnection of A2DP devices
In this scenario it is intended to interconnect Bluetooth A2DP
devices. RTP frames generated by an A2DP device can be transmitted
directly via this RTP profile. Vice versa, an A2DP device should be
able to receive the RTP profile by default. Thus, the payload format
described in this RFC MUST be fully interoperable with any A2DP
device.
The transmission between two A2DP devices has a constant frame rate
with a sender-controlled bit rate. It is not anticipated that the
transmission is adapted to congestion and bandwidth variation.
4.2. Scenario 2: High quality interactive audio transmissions
In the second scenario we consider a telephone call having a very
good audio quality at modest acoustic one-way latencies ranging from
50 to 150 ms [ITUG107], so that music can be listened to over the
telephone while two persons talk together interactively.
In addition, the reliability of the audio transmission should be
high, even in cases of low and varying bandwidth.
This second scenario assumes that the SBC transmission is used on
top of a transport protocol that implements a congestion control
algorithm. Using the SBC encoding, the sampling, bit, and frame
rates should be controlled to cope with congestion. For example, if
the available transmission bandwidth is too low to allow SBC to
transmit audio at a high quality, the application can lower the
sampling, bit, or frame rate of the stream at the cost of higher
algorithmic delay or a degraded audio quality. In this case,
changing the sampling or frame rate may cause a short acoustic
artifact because SBC's internal filters must be reset.
The A2DP media format does not allow a dynamic change of the
encoding parameters besides the bit-pool value. The encoding
parameters can only be altered with the "Change Parameters"
procedure, which is defined in [GAVDPV12]. Such a change will cause
an audible interruption and thus shall be avoided.
If an application using RTP wants to switch between different sets
of encoding parameters, then these sets of parameters can either be
negotiated beforehand (as described in Section 7.2) or a
renegotiation similar to the "Change Parameters" procedure can take
place. An application MUST NOT change the sampling frequency, block
length, encoding mode or the number of subbands within one RTP
session having the same RTP payload identifier.
4.3. Scenario 3: Ensembles performing over a network
In some usage scenarios, users want to act simultaneously and not
just interactively. For example, if persons sing in a chorus, if
musicians jam, or if e-sportsmen play computer games in a team
together, they need to acoustically communicate.
In these scenarios, the latency requirements are much stricter than
for interactive usages. For example, if two musicians are placed
more than 10 meters apart, they can hardly keep synchronized.
Empirical studies [Gurevich2004] have shown that for ensembles
playing over networks, the optimal acoustic latency is around 11.5
ms with a target range from 10 to 25 ms.
To fulfill such requirements, it might be necessary to further
reduce the algorithmic coding delay by varying the block length
parameter. The default value of the block length parameter is chosen
such that the coding efficiency is maximized. For example, at 44.1
kHz and using 8 subbands and a block length of 16, the algorithmic
delay is 4.72 ms (208 samples). The value of the block length
parameter can be decreased, at the expense of a higher bit rate or
lower quality, to lower the latency to fulfill the very stringent
latency requirements of this scenario.
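The delay figures above follow from simple arithmetic, sketched here in Python. The function names are illustrative assumptions. The frame duration uses the fact that one SBC frame covers block length times subbands samples per channel.

```python
def samples_to_ms(samples: int, sampling_hz: int) -> float:
    """Convert a delay given in samples to milliseconds."""
    return 1000.0 * samples / sampling_hz


def frame_duration_ms(subbands: int, blocks: int, sampling_hz: int) -> float:
    """Duration of one SBC frame, which covers subbands * blocks samples."""
    return samples_to_ms(subbands * blocks, sampling_hz)
```

With these helpers, samples_to_ms(208, 44100) evaluates to roughly 4.72 ms. Reducing the block length from 16 to 4 at 8 subbands shortens the frame from 128 to 32 samples, that is, from about 2.9 ms to about 0.73 ms at 44.1 kHz.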
Still, given the speed of light as the fundamental limit on the
speed of information exchange, distributed ensembles can perform
only regionally if a latency budget of 25 ms must be kept.
Typically, an optical fiber has a refractive index of 1.46 and thus
in an optical fiber bits travel about 5136 km one-way in 25 ms.
5. Header Usage
The format of the RTP header is specified in [RFC3550]. The payload
format defined in this document uses the fields of the header in a
manner fully consistent with that specification.
marker (M): In accordance with [A2DPV10] the marker bit MUST be set
to zero.
payload type (PT): The assignment of an RTP payload type for this
packet format is outside the scope of the document, and
will not be specified here. It is expected that the RTP
profile under which this payload format is being used will
assign a payload type for this codec or specify that the
payload type is to be bound dynamically (see Section 6.2).
timestamp (TS): The RTP timestamp clock frequency MUST be the same 3.1. SBC Media Payload Format
as the sampling frequency, which has been negotiated for
the current RTP session (see Section 6.2). If a media
payload consists of multiple SBC frames, the TS of the
media packet header represents the TS of the first SBC
frame. The TS of the following SBC frames MUST be
calculated using the sampling rate and the number of
samples per frame per channel. A change in sampling
frequency MUST NOT occur within one media packet.
A SBC frame may be fragmented into multiple media packets
to reduce the packetisation delay. Then, all packets that
make up a fragmented SBC frame MUST use the same TS.
6. Payload Format The A2DP V1.2 specification describes a media payload format, which
is adopted in this document one-to-one without any change. For the
sake of clarity, the payload format definition is repeated below.
The format of the payload MUST follow exactly the description given 3.2. SBC Fragmentation
in the appendix of [A2DPV10]. In the following, for the sake of
clarity, we repeat the payload format definition.
The payload MUST consist of one media payload format header The payload MUST consist of one media payload format header
described in Section 5.2 and SBC frames described in Section 5.3. described in Section 3.3 and SBC frames described in Section 3.4.
Either an integral number of SBC frames or one fragment of an SBC Either an integral number of SBC frames or one fragment of an SBC
frame can be transmitted: frame can be transmitted:
(a) When the payload contains an integral number of SBC frames (a) When the payload contains an integral number of SBC frames
+--------+-----------+----------- -+ +--------+-----------+----------- -+
| Header | SBC frame | SBC frame ... | | Header | SBC frame | SBC frame ... |
+--------+-----------+----------- -+ +--------+-----------+----------- -+
(b) When the SBC frame is fragmented (b) When the SBC frame is fragmented
+--------+---------------------------------------+ +--------+---------------------------------------+
| Header | First fragment of SBC frame | | Header | First fragment of SBC frame |
+--------+---------------------------------------+ +--------+---------------------------------------+
+--------+---------------------------------------+ +--------+---------------------------------------+
| Header | Subsequent fragments of the SBC frame | | Header | Subsequent fragments of the SBC frame |
+--------+---------------------------------------+ +--------+---------------------------------------+
A media payload always starts with an 8-bit header, which is placed A media payload always starts with an 8-bit header, which is placed
before the SBC data. before the SBC data.
The SBC frame can be fragmented across several media payloads. All An SBC frame can be fragmented across several media payloads. All
fragmented packets, except the last one, MUST have the same total fragmented packets, except the last one, MUST have the same total
data packet size. data packet size.
This payload fragmentation can be preferred over the This payload fragmentation can be preferred over the
fragmentation mechanisms of lower layers (e.g., IP) because the fragmentation mechanisms of lower layers (e.g., IP) because the
packetisation delay and thus the acoustic latency are reduced and packetisation delay and thus the acoustic latency are reduced and
the error robustness is increased because parts of the SBC frame can the error robustness is increased because parts of the SBC frame can
be considered for decoding. be considered for decoding.
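The fragmentation rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the fragment_size parameter are hypothetical, and each returned pair carries the value of the 4-bit remaining-fragments counter from the media payload header together with the fragment data. The other header bits are not modeled here.

```python
def fragment_sbc_frame(frame: bytes,
                       fragment_size: int) -> list[tuple[int, bytes]]:
    """Split one SBC frame into equally sized fragments (last may be shorter).

    Returns (remaining, chunk) pairs, where `remaining` counts the
    fragments still to come including the current one.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    if not frame:
        raise ValueError("frame must not be empty")
    count = (len(frame) + fragment_size - 1) // fragment_size
    if count > 15:
        raise ValueError("more fragments than a 4-bit counter can hold")
    fragments = []
    for i in range(count):
        chunk = frame[i * fragment_size:(i + 1) * fragment_size]
        fragments.append((count - i, chunk))
    return fragments
```

Splitting a 119-byte frame with a fragment size of 50 bytes gives three fragments of 50, 50 and 19 bytes with counter values 3, 2 and 1.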
6.1. Media payload format header 3.3. Media Payload Format Header
The following figure shows the format of the media payload header, The following figure shows the format of the media payload header,
which which
consists of one byte. consists of one byte.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+---+-+-+-+-+ +-+-+-+---+-+-+-+-+
|F|S|L|RFA|#frames| |F|S|L|RFA|#frames|
+-+-+-+---+-+-+-+-+ +-+-+-+---+-+-+-+-+
F bit - Set to 1 if the SBC frame is fragmented, otherwise set to 0. F bit - Set to 1 if the SBC frame is fragmented, otherwise set to 0.
skipping to change at page 9, line 39 skipping to change at page 7, line 13
RFA - SHOULD be zero, reserved for future addition. RFA - SHOULD be zero, reserved for future addition.
#frames (4 bits) - If the F bit is set to 0, this field indicates #frames (4 bits) - If the F bit is set to 0, this field indicates
the number of frames contained in this packet. If the F the number of frames contained in this packet. If the F
bit is set to 1, this field indicates the number of bit is set to 1, this field indicates the number of
remaining fragments, including the current fragment. Thus remaining fragments, including the current fragment. Thus
the last counter value MUST be one. For example, if there the last counter value MUST be one. For example, if there
are three fragments then the counter has value 3, 2 and 1 are three fragments then the counter has value 3, 2 and 1
for subsequent fragments. for subsequent fragments.
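The one-byte header layout shown in the figure above can be packed and parsed as in the following Python sketch. It assumes that bit 0 in the figure is the most significant bit of the byte, as is usual in IETF diagrams. The type and function names are illustrative, and the S and L bits are carried as opaque flags.

```python
from typing import NamedTuple


class MediaPayloadHeader(NamedTuple):
    fragmented: bool  # F bit
    s_bit: bool       # S bit, kept as an opaque flag
    l_bit: bool       # L bit, kept as an opaque flag
    rfa: int          # reserved bit, SHOULD be zero
    count: int        # 4-bit #frames / remaining-fragments field


def parse_header(byte: int) -> MediaPayloadHeader:
    """Decode the media payload header from one byte (bit 0 = MSB)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("header must fit in one byte")
    return MediaPayloadHeader(
        fragmented=bool(byte & 0x80),
        s_bit=bool(byte & 0x40),
        l_bit=bool(byte & 0x20),
        rfa=(byte >> 4) & 0x1,
        count=byte & 0x0F,
    )


def pack_header(header: MediaPayloadHeader) -> int:
    """Encode the header fields back into one byte."""
    if not 0 <= header.count <= 0x0F:
        raise ValueError("count must fit in 4 bits")
    return (
        (int(header.fragmented) << 7)
        | (int(header.s_bit) << 6)
        | (int(header.l_bit) << 5)
        | ((header.rfa & 0x1) << 4)
        | header.count
    )
```

A header byte of 0x03, for instance, describes an unfragmented payload carrying three SBC frames.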
6.2. SBC Frame Structure 3.4. SBC Frame Structure
The complete SBC frame consists of a frame header, scale factors, An SBC frame consists of a frame header, scale factors, audio
audio samplings, and padding bits. The following diagram shows the samples, and padding bits. The following diagram shows the general
general SBC frame format layout: SBC frame format layout:
+--------------+---------------+---------------+---------+ +--------------+---------------+---------------+---------+
| frame_header | scale_factors | audio_samples | padding | | frame_header | scale_factors | audio_samples | padding |
+--------------+---------------+---------------+---------+ +--------------+---------------+---------------+---------+
The following sections describe the audio format, which consists of The following sections describe the audio format, which consists of
bits stored in a bandwidth-efficient, compact mode. bits stored in a bandwidth-efficient, compact mode.
6.3. Frame header 3.5. Frame Header
The frame header consists of fields defined in [A2DPV10], which are The frame header consists of fields defined in [A2DPV12], which are
SYNCWORD, SAMPLING_FREQUENCY, BLOCKS, CHANNEL_MODE, SYNCWORD, SAMPLING_FREQUENCY, BLOCKS, CHANNEL_MODE,
ALLOCATION_METHOD, SUBBANDS, BITPOOL, CRC_CHECK, optionally JOIN bit ALLOCATION_METHOD, SUBBANDS, BITPOOL, CRC_CHECK, optionally JOIN bit
fields and a RFA. The layout of the first four bytes of the frame fields and a RFA. The layout of the first four bytes of the frame
header is given in the following table. header is given in the following table.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SYNCWORD |SF.|BL.|CM.|A|S|BITPOOL |CRC_CHECK | | SYNCWORD |SF.|BL.|CM.|A|S|BITPOOL |CRC_CHECK |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
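The field widths in the layout above (8-bit SYNCWORD, 2-bit SF, BL and CM, 1-bit A and S, 8-bit BITPOOL and CRC_CHECK) can be extracted as in the following Python sketch. It assumes most-significant-bit-first ordering within each byte and returns the raw field codes without interpreting them. The function name and dictionary keys are illustrative.

```python
def parse_sbc_frame_header(data: bytes) -> dict:
    """Extract the raw fields of the first four SBC frame header bytes."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes of frame header")
    second = data[1]
    return {
        "syncword": data[0],
        "sampling_frequency": (second >> 6) & 0x3,  # SF, 2 bits
        "blocks": (second >> 4) & 0x3,              # BL, 2 bits
        "channel_mode": (second >> 2) & 0x3,        # CM, 2 bits
        "allocation_method": (second >> 1) & 0x1,   # A, 1 bit
        "subbands": second & 0x1,                   # S, 1 bit
        "bitpool": data[2],
        "crc_check": data[3],
    }
```

Mapping the raw codes to sampling frequencies, block counts and channel modes requires the code tables of the A2DP specification, which are not reproduced in this sketch.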
skipping to change at page 12, line 28 skipping to change at page 9, line 41
subbands for the STEREO and JOINT_STEREO channel modes. subbands for the STEREO and JOINT_STEREO channel modes.
The bitpool value MAY change from one SBC frame to the next. The bitpool value MAY change from one SBC frame to the next.
In addition, the bitpool value MUST be restricted such In addition, the bitpool value MUST be restricted such
that it does not result in excess of maximum bit rate, that it does not result in excess of maximum bit rate,
which is 320kb/s for mono and 512kb/s for two-channel which is 320kb/s for mono and 512kb/s for two-channel
modes. modes.
The remaining part of the header consists of CRC_CHECK, optionally The remaining part of the header consists of CRC_CHECK, optionally
JOIN bit fields and a RFA. JOIN bit fields and a RFA.
6.4. Remaining frame 3.6. Remaining Frame Part
The remaining part of the frame includes scale factors and audio The remaining part of the frame includes scale factors and audio
sample data, which are processed by the codec as described in sample data, which are processed by the codec as described in
[A2DPV10]. [A2DPV12].
4. Usage Scenarios
Compared to many other encoding schemes, the SBC codec is general
enough to support multiple, quite diverse usage scenarios. It might
therefore be required to change the behavior of the encoding and
transmission to achieve good performance for a given usage scenario.
Three main scenarios are listed below, together with their quality
requirements and their impact on encoding and transmission.
4.1. Scenario 1: Interconnection of A2DP Devices
This scenario is intended for interconnecting Bluetooth A2DP
devices. RTP frames generated by an A2DP device can be transmitted
directly via this RTP profile. Vice versa, an A2DP device should be
able to receive the RTP profile by default. Thus, the payload format
described in this RFC MUST be fully interoperable with any A2DP
device.
The transmission between two A2DP devices has a constant frame rate
with a sender-controlled bit rate. It is not anticipated that the
transmission is adapted to congestion and bandwidth variation.
4.2. Scenario 2: High Quality Interactive Audio Transmissions
The second scenario considers a telephone call with very good audio
quality at modest acoustic one-way latencies ranging from 50 to
150 ms [ITUG107], so that music can be listened to over the
telephone while two persons talk together interactively.
In addition, the reliability of the audio transmission should be
high, even in cases of low and varying bandwidth.
This second scenario assumes that the SBC transmission is used on
top of a transport protocol that implements a congestion control
algorithm. Using the SBC encoding, the sampling, bit, and frame
rates should be controlled to cope with congestion. For example, if
the available transmission bandwidth is too low to allow SBC to
transmit audio at a high quality, the application can lower the
sampling, bit, or frame rate of the stream at the cost of higher
algorithmic delay or a degraded audio quality. In this case,
changing the sampling or frame rate may cause a short acoustic
artifact because SBC's internal filters must be reset.
The A2DP media format does not allow a dynamic change of the
encoding parameters other than the bit-pool value. The encoding
parameters can only be altered with the "Change Parameters"
procedure, which is defined in [GAVDPV12]. Such a change will cause
an audible interruption and thus shall be avoided.
If an application using RTP wants to switch between different sets
of encoding parameters, then these sets of parameters can either be
negotiated beforehand (as described in Section 7.2) or a
renegotiation similar to the "Change Parameters" procedure can take
place. An application MUST NOT change the sampling frequency, block
length, encoding mode or the number of subbands within one RTP
session having the same RTP payload identifier.
4.3. Scenario 3: Ensembles performing over a Network
In some usage scenarios, users want to act simultaneously and not
just interactively. For example, if persons sing in a chorus, if
musicians jam, or if e-sportsmen play computer games in a team
together, they need to acoustically communicate.
In these scenarios, the latency requirements are much stricter than
for interactive usages. For example, if two musicians are placed
more than 10 meters apart, they can hardly stay synchronized.
Empirical studies [Gurevich2004] have shown that for ensembles
playing over networks, the optimal acoustic latency is around 11.5
ms with a targeted range from 10 to 25 ms.
To fulfill such requirements, it might be necessary to further
reduce the algorithmic coding delay by varying the block length
parameter. The default value of the block length parameter is chosen
such that the coding efficiency is maximized. For example, at 44.1
kHz and using 8 subbands and a block length of 16, the algorithmic
delay is 4.72 ms (208 samples). The value of the block length
parameter can be decreased, at the expense of a higher bit rate or
lower quality, to lower the latency to fulfill the very stringent
latency requirements of this scenario.
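The delay figure quoted above can be checked with a small,
non-normative sketch. It only converts a delay given in samples into
milliseconds; the function name is illustrative and not part of any
specification, and the sample count of 208 is taken from the text
rather than derived from the codec parameters.

```python
def samples_to_ms(samples: int, sampling_rate_hz: int) -> float:
    """Convert a delay expressed in samples into milliseconds."""
    return 1000.0 * samples / sampling_rate_hz


# 208 samples at 44.1 kHz, as in the example above.
delay_ms = samples_to_ms(208, 44100)
print(f"{delay_ms:.2f} ms")  # prints "4.72 ms"
```

The same conversion can be applied to the sample delay of any other
block length to see how it compares with the 10 to 25 ms target range.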
Still, given the speed of light as the fundamental limit on the speed
of information exchange, distributed ensembles can perform only
regionally if a latency budget of 25 ms must be kept. Typically, an
optical fiber has a refractive index of 1.46 and thus in an optical
fiber bits travel about 5136 km one-way in 25 ms.
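As a rough, non-normative check of this distance, the propagation
speed in fiber is the vacuum speed of light divided by the refractive
index. The sketch below uses illustrative names and assumes that the
whole 25 ms budget is spent on propagation, which ignores coding,
packetisation and queuing delay.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum


def fiber_reach_km(latency_s: float, refractive_index: float = 1.46) -> float:
    """One-way distance a signal covers in fiber within latency_s seconds."""
    return SPEED_OF_LIGHT_KM_S / refractive_index * latency_s


reach = fiber_reach_km(0.025)
print(f"about {reach:.0f} km")
```

With the exact value of c this gives a little over 5130 km; rounding c
to 300000 km/s reproduces the approximate figure given in the text. In
practice the usable distance is smaller, because part of the budget is
consumed by the algorithmic delay discussed above.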
5. Header Usage
The format of the RTP header is specified in [RFC3550]. The payload
format defined in this document uses the fields of the header in a
manner fully consistent with that specification.
marker (M): In accordance with [A2DPV12] the marker bit MUST be set
to zero.
payload type (PT): The assignment of an RTP payload type for this
packet format is outside the scope of the document, and
will not be specified here. It is expected that the RTP
profile under which this payload format is being used will
assign a payload type for this codec or specify that the
payload type is to be bound dynamically (see Section 6.2).
timestamp (TS): The RTP timestamp clock frequency MUST be the same
as the sampling frequency, which has been negotiated for
the current RTP session (see Section 6.2). If a media
payload consists of multiple SBC frames, the TS of the
media packet header represents the TS of the first SBC
frame. The TS of the following SBC frames MUST be
calculated using the sampling rate and the number of
samples per frame per channel. A change in sampling
frequency MUST NOT occur within one media packet.
An SBC frame may be fragmented into multiple media packets
to reduce the packetisation delay. Then, all packets that
make up a fragmented SBC frame MUST use the same TS.
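The timestamp rule for packets carrying several SBC frames can be
sketched as follows. This is a non-normative illustration: the
function name is hypothetical, and it assumes that the number of
samples per frame per channel equals the number of blocks multiplied
by the number of subbands. Because the RTP clock runs at the sampling
frequency, each frame advances the timestamp by exactly that number.

```python
def sbc_frame_timestamps(first_ts: int, frames_in_packet: int,
                         blocks: int, subbands: int) -> list[int]:
    """Derive the RTP timestamp of every SBC frame in one media packet."""
    # Assumption: one frame carries blocks * subbands samples per channel.
    samples_per_frame = blocks * subbands
    # RTP timestamps are 32-bit unsigned values and wrap around.
    return [(first_ts + i * samples_per_frame) % 2**32
            for i in range(frames_in_packet)]


print(sbc_frame_timestamps(1000, 3, blocks=16, subbands=8))
# prints [1000, 1128, 1256]
```

For a fragmented frame no such calculation is needed, since every
fragment carries the timestamp of the frame it belongs to.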
6. Payload Format
The format of the payload MUST follow exactly the description given
in appendix B of [A2DPV12], which is repeated in Section 3. If
appendix B of [A2DPV12] and the description in Section 3 differ, the
former is normative.
If the payload format parameters have been negotiated and a
restricted set of encoding and decoding modes has been selected,
then any SBC frame that describes a coding mode that has not been
chosen MUST be ignored.
7. Payload Format Parameters 7. Payload Format Parameters
This section defines the parameters that MAY be used to configure This section defines the parameters that MAY be used to configure
optional features in the SBC payload format over RTP transmission. optional features in the SBC payload format over RTP transmission.
The parameters are defined here as part of the media subtype The parameters are defined here as part of the media subtype
registrations for the SBC. A mapping of the parameters into the registrations for the SBC codec. A mapping of the parameters into
Session Description Protocol (SDP) [RFC4566] is also provided for the Session Description Protocol (SDP) [RFC4566] is also provided
those applications that use SDP. In control protocols that do not for those applications that use SDP. In control protocols that do
use MIME or SDP, the media type parameters must be mapped to the not use MIME or SDP, the media type parameters must be mapped to the
appropriate format used with that control protocol. appropriate format used with that control protocol.
7.1. SBC Media Type Registration 7.1. Media Type Registration for SBC
[Note to RFC Editor: Please replace all occurrences of RFC XXXX by [Note to RFC Editor: Please replace all occurrences of RFC XXXX by
the RFC number assigned to this document] the RFC number assigned to this document]
This registration is done using the template defined in [RFC4288] This registration is done using the template defined in [RFC4288]
and following [RFC4855]. and following [RFC4855].
MIME media type name: audio Media type name: audio
MIME subtype name: SBC Subtype name: SBC
Required parameters: none Required parameters:
Rate: The RTP timestamp clock rate. See Section 5 for usage
details.
Optional parameters: Optional parameters:
Capabilities: The capabilities of the encoder and decoder are Channels: Specifies the number of audio channels: 2 for stereo
described by a parameter string that MUST start with an (refer to RFC 4566 [RFC4566]) and 1 for mono,
octet written as two hexadecimal digits. This octet is accordingly the SBC channel mode. If one channel is
called VERSION and MUST be identical to the SYNCWORD that used, this parameter can be omitted.
will be used in the SBC frames. It is used to distinguish
different negotiation procedures. Capabilities: The capabilities of the encoder and decoder are
The interpretation of the following characters depends on described by a parameter string that MUST start with an
the value of the VERSION octet. Refer to Section 7.1.1. octet written as two hexadecimal digits. This octet is
and Section 7.1.2. to find a description. called VERSION and MUST be identical to the SYNCWORD
that will be used in the SBC frames. It is used to
distinguish different negotiation procedures.
The interpretation of the following characters depends
on the value of the VERSION octet. Refer to Section
7.1.1. and Section 7.1.2. to find a description.
Encoding considerations: This media type is framed and contains Encoding considerations: This media type is framed and contains
binary data; see Section 4.8 of RFC 4288. binary data; see Section 4.8 of RFC 4288.
Security considerations: See Section 9 of RFC XXXX Security considerations: See Section 9 of RFC XXXX
Interoperability considerations: none Interoperability considerations: none
Published specification: RFC XXXX Published specification: RFC XXXX
Applications which use this media type: Audio and video conferencing Applications which use this media type: Audio and video conferencing
tools, distributed orchestras tools, distributed orchestras
Additional information: none Additional information: none
Person & email address to contact for further information: Christian Person & email address to contact for further information:
Hoene, hoene@uni-tuebingen.org See Authors' Addresses at the end of RFC XXXX
Intended usage: COMMON Intended usage: COMMON
Restrictions on usage: none Restrictions on usage: none
Author: Christian Hoene, Frans de Bont Author: See Authors' Addresses at the end of RFC XXXX
Change controller: IETF Audio/Video Transport working group Change controller: IETF Audio/Video Transport Payloads working group
delegated from the IESG delegated from the IESG
7.1.1. Capabilities: A2DP modes 7.1.1. Capabilities: A2DP Modes
The capabilities of the encoder and decoder MUST start with the The capabilities of the encoder and decoder MUST start with the
hexadecimal value of 9C, followed by a comma and four comma- hexadecimal value of 9C, followed by a comma and four comma-
separated hexadecimal octets. These four octets called Octet 1, 2, separated hexadecimal octets. These four octets called Octet 1, 2,
3, and 4 share a similar meaning as those defined in Section 4.3.2 3, and 4 share a similar meaning as those defined in Section 4.3.2
of [A2DPV10]. However, because sampling frequency and number of of [A2DPV12]. However, because sampling frequency and number of
channels are already given in the SDP parameter "a=rtpmap", bit 0 up channels are already given in the SDP parameter "a=rtpmap", bit 0 up
to and including bit 3 of Octet 1 MUST be ignored if received. The to and including bit 3 of Octet 1 MUST be ignored if received. The
meaning of the bits and the octets are described in the following meaning of the bits and the octets are described in the following
enumeration. The bit numbering follows the network bit order having enumeration. The bit numbering follows the network bit order having
the highest bit first. the highest bit first.
o Octet 1: Bit 0 (aka 2^7): If one, then the sampling frequency o Octet 1: Bit 0 (aka 2^7): If one, then the sampling frequency
16000 Hz is supported (ignored during SDP negotiations but SHOULD 16000 Hz is supported (ignored during SDP negotiations but SHOULD
be set if the clock rate is 16000 and CAN be cleared otherwise). be set if the clock rate is 16000 and MUST be cleared otherwise).
o Octet 1: Bit 1: If one, then the sampling frequency 32000 Hz is o Octet 1: Bit 1: If one, then the sampling frequency 32000 Hz is
supported (ignored during SDP negotiations but SHOULD be set if supported (ignored during SDP negotiations but SHOULD be set if
the clock rate is 32000 and CAN be cleared otherwise). the clock rate is 32000 and MUST be cleared otherwise).
o Octet 1: Bit 2: If one, then the sampling frequency 44100 Hz is o Octet 1: Bit 2: If one, then the sampling frequency 44100 Hz is
supported (ignored during SDP negotiations but SHOULD be set if supported (ignored during SDP negotiations but SHOULD be set if
the clock rate is 44100 and CAN be cleared otherwise). the clock rate is 44100 and MUST be cleared otherwise).
o Octet 1: Bit 3: If one, then the sampling frequency 48000 Hz is o Octet 1: Bit 3: If one, then the sampling frequency 48000 Hz is
supported (ignored during SDP negotiations but SHOULD be set if supported (ignored during SDP negotiations but SHOULD be set if
the clock rate is 48000 and CAN be cleared otherwise). the clock rate is 48000 and MUST be cleared otherwise).
o Octet 1: Bit 4: If one, then the channel mode MONO is supported o Octet 1: Bit 4: If one, then the channel mode MONO is supported
(ignored during SDP negotiations but SHOULD be set if the number (ignored during SDP negotiations but SHOULD be set if the number
of channels is one and CAN be cleared otherwise). of channels is one and MUST be cleared otherwise).
o Octet 1: Bit 5: If one, then the channel mode DUAL_CHANNEL is o Octet 1: Bit 5: If one, then the channel mode DUAL_CHANNEL is
supported (*). supported (*).
o Octet 1: Bit 6: If one, then the channel mode STEREO is supported o Octet 1: Bit 6: If one, then the channel mode STEREO is supported
(*). (*).
o Octet 1: Bit 7 (aka 2^0): If one, then the channel mode o Octet 1: Bit 7 (aka 2^0): If one, then the channel mode
JOINT_STEREO is supported (*). JOINT_STEREO is supported (*).
skipping to change at page 15, line 29 skipping to change at page 15, line 42
o Octet 3: Unsigned integer: The minimal bit-pool value that the o Octet 3: Unsigned integer: The minimal bit-pool value that the
device supports. MUST be greater than or equal to 2 and less than or device supports. MUST be greater than or equal to 2 and less than or
equal to the maximal bit-pool value. equal to the maximal bit-pool value.
o Octet 4: Unsigned integer: The maximal bit-pool value that the o Octet 4: Unsigned integer: The maximal bit-pool value that the
device supports. MUST be less than or equal to 250. device supports. MUST be less than or equal to 250.
(*) At least one of the bits 5, 6 or 7 of Octet 1 MUST be set if the (*) At least one of the bits 5, 6 or 7 of Octet 1 MUST be set if the
number of channels is set to two in the SDP parameter "a=rtpmap". number of channels is set to two in the SDP parameter "a=rtpmap".
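A minimal, non-normative sketch of how a receiver might decode such a
capabilities string is given below. The function and field names are
hypothetical, and the sample string "9C,0F,FF,02,FA" is made up for
illustration. Octet 2 is kept as a raw value because its bit layout is
not reproduced in this excerpt. Bit 0 is treated as the most
significant bit, following the network bit order described above.

```python
A2DP_SYNCWORD = 0x9C
# Octet 1 channel-mode bits; bit 0 is the most significant bit (2^7).
CHANNEL_MODE_BITS = {4: "MONO", 5: "DUAL_CHANNEL",
                     6: "STEREO", 7: "JOINT_STEREO"}


def bit_is_set(octet: int, bit_number: int) -> bool:
    """Test a bit using network bit order (bit 0 = 0x80)."""
    return bool(octet & (0x80 >> bit_number))


def parse_capabilities(value: str):
    """Return a dict for A2DP capabilities, or None for an unknown VERSION."""
    tokens = [t.strip() for t in value.split(",")]
    if int(tokens[0], 16) != A2DP_SYNCWORD:
        return None  # unknown VERSION: capabilities are ignored
    if len(tokens) != 5:
        raise ValueError("expected VERSION plus four octets")
    octet1, octet2, min_bitpool, max_bitpool = (int(t, 16) for t in tokens[1:])
    if not all(0 <= o <= 0xFF for o in (octet1, octet2)):
        raise ValueError("octet out of range")
    if not 2 <= min_bitpool <= max_bitpool <= 250:
        raise ValueError("invalid bit-pool range")
    # Bits 0 to 3 of Octet 1 (sampling frequencies) are ignored on receipt.
    modes = [name for bit, name in CHANNEL_MODE_BITS.items()
             if bit_is_set(octet1, bit)]
    return {"channel_modes": modes, "octet2": octet2,
            "min_bitpool": min_bitpool, "max_bitpool": max_bitpool}


print(parse_capabilities("9C,0F,FF,02,FA")["channel_modes"])
```

Returning None for an unrecognised VERSION keeps the "ignore" rule for
other modes separate from malformed A2DP capabilities, which raise an
error in this sketch.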
7.1.2. Capabilities: other modes 7.1.2. Capabilities: Other Modes
If the value of the VERSION octet is not equal to a known SYNCWORD If the value of the VERSION octet is not equal to a known SYNCWORD
value, then the capabilities MUST be ignored. value, then the capabilities MUST be ignored.
7.2. Mapping to SDP Parameters 7.2. Mapping to SDP Parameters
The information carried in the media type specification has a The information carried in the media type specification has a
specific mapping to fields in the Session Description Protocol (SDP) specific mapping to fields in the Session Description Protocol (SDP)
[RFC4566], which is commonly used to describe RTP sessions. When SDP [RFC4566], which is commonly used to describe RTP sessions. When SDP
is used to specify sessions employing the SBC codec, the mapping is is used to specify sessions employing the SBC codec, the mapping is
as follows: as follows:
o The media type ("audio") goes in SDP "m=" as the media name. o The media type ("audio") goes in SDP "m=" as the media name.
o The media subtype ("SBC") goes in SDP "a=rtpmap" as the encoding o The media subtype ("SBC") goes in SDP "a=rtpmap" as the encoding
name. name.
o The RTP <clock rate> in "a=rtpmap" MUST be set to the selected o The required parameter "rate" goes in SDP "a=rtpmap" as the RTP
sampling frequency. <clock rate>.
o The RTP <encoding parameters> in "a=rtpmap" specifies the number o The optional parameter "channels", if present, goes in SDP as the
of audio channels: 2 for stereo material (refer to RFC 4566 "a=rtpmap" RTP <encoding parameters>.
[RFC4566]) and 1 for mono. If one channel is used, the encoding
parameter can be omitted.
o The parameter "capabilities" goes in the SDP "a=fmtp" by the o The optional parameter "capabilities", if present, goes in the SDP
capabilities description as described in Section 7.1. "a=fmtp" by the capabilities description as described in Section
7.1.
7.2.1. Offer-Answer Model Considerations 7.2.1. Offer-Answer Model Considerations
The Bluetooth standard document [AVDTPV12] describes how an A2DP The Bluetooth standard document [AVDTPV12] describes how an A2DP
source and an A2DP sink negotiate their capabilities. Prior to the source and an A2DP sink negotiate their capabilities. Prior to the
establishment of the audio stream, one A2DP device can query the establishment of the audio stream, one A2DP device can query the
service capabilities of the other device using the "Get Capabilities service capabilities of the other device using the "Get Capabilities
Procedure". In any case, the coding mode is set using the "Set Procedure". In any case, the coding mode is set using the "Set
Configuration" procedure. Only after a successful configuration, the Configuration" procedure. Only after a successful configuration, the
stream connection can be established. stream connection can be established.
skipping to change at page 16, line 43 skipping to change at page 17, line 7
by the declaring entity. If the capabilities were supplied in the by the declaring entity. If the capabilities were supplied in the
offer, the answerer MUST return either the same mode-set or a offer, the answerer MUST return either the same mode-set or a
subset of this mode-set. If no capabilities were supplied in the subset of this mode-set. If no capabilities were supplied in the
offer, the answerer MAY return capabilities to restrict the offer, the answerer MAY return capabilities to restrict the
possible modes. In any case, the capabilities in the answer then possible modes. In any case, the capabilities in the answer then
apply for both offerer and answerer. The offerer MUST NOT send apply for both offerer and answerer. The offerer MUST NOT send
frames of a mode that has been removed by the answerer. The frames of a mode that has been removed by the answerer. The
negotiation is finished if the offerer and the answerer have negotiation is finished if the offerer and the answerer have
agreed upon explicit capabilities for each payload type number. agreed upon explicit capabilities for each payload type number.
The number of blocks and subbands and the kind of allocation The number of blocks and subbands and the kind of allocation
method and channel mode MUST haven been negotiated unambiguously. method and channel mode MUST have been negotiated unambiguously.
o Any unknown parameter in an offer MUST be ignored by the receiver o Any unknown parameter in an offer MUST be ignored by the receiver
and MUST NOT be included in the answer. and MUST NOT be included in the answer.
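The subset rule for answers can be sketched as a bitmask check. This
is a non-normative illustration under the assumption that each
capability octet is treated as a set of mode bits; the helper names
are hypothetical. An answer is acceptable in this sense if it sets no
bit that the offer left cleared.

```python
def is_mode_subset(offer_mask: int, answer_mask: int) -> bool:
    """True if every mode bit in the answer was also present in the offer."""
    return (answer_mask & ~offer_mask) == 0


def selects_single_mode(mask: int) -> bool:
    """True if exactly one bit is set, i.e. the choice is unambiguous."""
    return bin(mask).count("1") == 1


offered_channel_modes = 0x07   # DUAL_CHANNEL, STEREO and JOINT_STEREO bits
answered_channel_modes = 0x01  # JOINT_STEREO only
print(is_mode_subset(offered_channel_modes, answered_channel_modes))  # True
print(selects_single_mode(answered_channel_modes))                    # True
```

An offerer could run the first check on each answer and repeat the
exchange until the second check holds for every group of mode bits.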
Below are some example parts of SDP offer-answer exchanges. Below are some example parts of SDP offer-answer exchanges.
o Example 1 o Example 1
Offer: SBC all A2DP modes Offer: SBC all A2DP modes
m=audio 54874 RTP/AVP 96 m=audio 54874 RTP/AVP 96
a=rtpmap:96 SBC/48000/2 a=rtpmap:96 SBC/48000/2
skipping to change at page 19, line 19 skipping to change at page 19, line 30
5. If the packet loss rate is very high, the session shall be 5. If the packet loss rate is very high, the session shall be
terminated because the quality of the audio transmission is too terminated because the quality of the audio transmission is too
bad to be useful [Widmer2002]. bad to be useful [Widmer2002].
Because the SBC encoding can be tuned with many parameters, it is Because the SBC encoding can be tuned with many parameters, it is
especially useful for rate adaptive transport protocols such as DCCP especially useful for rate adaptive transport protocols such as DCCP
[RFC4340] or TCP [RFC4571]. The report [Hoene2009] describes which [RFC4340] or TCP [RFC4571]. The report [Hoene2009] describes which
SBC coding mode gives the best speech and audio quality under known SBC coding mode gives the best speech and audio quality under known
bandwidth and time constraints. bandwidth and time constraints.
9. Packet loss concealment 9. Packet Loss Concealment
In order to cope with packet losses, the SBC decoder SHOULD be In order to cope with packet losses, the SBC decoder SHOULD be
extended by a packet loss concealment algorithm. The packet loss extended by a packet loss concealment algorithm. The packet loss
concealment algorithm SHOULD provide a good audio quality in case of concealment algorithm SHOULD provide a good audio quality in case of
losses. Otherwise, the congestion control algorithm can not trade losses. Otherwise, the congestion control algorithm can not trade
off well the quality impairment due to packet losses versus the off well the quality impairment due to packet losses versus the
quality impairment caused by different encoding modes. It is quality impairment caused by different encoding modes. It is
RECOMMENDED that at least the reverse order replicated pitch RECOMMENDED that at least the reverse order replicated pitch
periods (RORPP) algorithm as defined in [Hoene2009] or any better is periods (RORPP) algorithm as defined in [Hoene2009] or any better is
used. used.
skipping to change at page 21, line 9 skipping to change at page 21, line 9
11. IANA Considerations 11. IANA Considerations
It is requested that one new media subtype (audio/SBC) and one It is requested that one new media subtype (audio/SBC) and one
optional parameter for this media subtype ("capabilities") are optional parameter for this media subtype ("capabilities") are
registered by IANA, see Section 5.1 and Section 5.2. registered by IANA, see Section 5.1 and Section 5.2.
12. References 12. References
12.1. Normative References 12.1. Normative References
[A2DPV10] Bluetooth SIG, "Advanced Audio Distribution Profile", [A2DPV12] Bluetooth SIG, "Advanced Audio Distribution Profile",
Audio Video WG, adopted specification, revision V1.0, May Audio Video WG, adopted specification, revision V1.2,
22th, 2003. April 16th, 2007.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3264] Rosenberg, J. and Schulzrinne, H., "An Offer/Answer [RFC3264] Rosenberg, J. and Schulzrinne, H., "An Offer/Answer
Model with Session Description Protocol (SDP)", RFC 3264, Model with Session Description Protocol (SDP)", RFC 3264,
June 2002. June 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
skipping to change at page 24, line 19 skipping to change at page 24, line 19
Wilhelm-Schickard-Institute Wilhelm-Schickard-Institute
Sand 13 Sand 13
72076 Tuebingen 72076 Tuebingen
DE DE
Phone: +49 7071 29 70532 Phone: +49 7071 29 70532
Email: hoene@uni-tuebingen.de Email: hoene@uni-tuebingen.de
Frans de Bont Frans de Bont
Philips Electronics Philips Electronics
High Tech Campus 5 High Tech Campus 36
5656 AE Eindhoven 5656 AE Eindhoven
NL NL
Phone: +31 40 2740234 Phone: +31 40 2740234
Email: frans.de.bont@philips.com Email: frans.de.bont@philips.com
 End of changes. 51 change blocks. 
231 lines changed or deleted 256 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/