Versions: 00 draft-ietf-avt-rtp-g719

Network Working Group                                      M. Westerlund
Internet-Draft                                              I. Johansson
Intended status: Standards Track                             Ericsson AB
Expires: December 18, 2008                                 June 16, 2008

                      RTP Payload format for G.719

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on December 18, 2008.


Abstract

   This document specifies the payload format for packetization of the
   G.719 full-band codec encoded audio signals into the Real-time
   Transport Protocol (RTP).  The payload format supports transmission
   of multiple channels, multiple frames per payload, and interleaving.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Westerlund & Johansson  Expires December 18, 2008               [Page 1]

Internet-Draft        RTP Payload format for G.719             June 2008

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Conventions, Definitions and Acronyms  . . . . . . . . . . . .  3
   3.  G.719 Description  . . . . . . . . . . . . . . . . . . . . . .  3
   4.  Payload format Capabilities  . . . . . . . . . . . . . . . . .  4
     4.1.  Multi-rate Encoding and Rate Adaptation  . . . . . . . . .  4
     4.2.  Support for Multi-Channel Sessions . . . . . . . . . . . .  4
     4.3.  Robustness against Packet Loss . . . . . . . . . . . . . .  5
       4.3.1.  Use of Forward Error Correction (FEC)  . . . . . . . .  5
       4.3.2.  Use of Frame Interleaving  . . . . . . . . . . . . . .  6
   5.  Payload format . . . . . . . . . . . . . . . . . . . . . . . .  7
     5.1.  RTP Header Usage . . . . . . . . . . . . . . . . . . . . .  8
     5.2.  Payload Structure  . . . . . . . . . . . . . . . . . . . .  8
       5.2.1.  Basic ToC element  . . . . . . . . . . . . . . . . . .  8
     5.3.  Basic mode . . . . . . . . . . . . . . . . . . . . . . . . 10
     5.4.  Interleaved mode . . . . . . . . . . . . . . . . . . . . . 10
     5.5.  Audio Data . . . . . . . . . . . . . . . . . . . . . . . . 11
     5.6.  Implementation Considerations  . . . . . . . . . . . . . . 11
       5.6.1.  Receiving Redundant Frames . . . . . . . . . . . . . . 11
       5.6.2.  Interleaving . . . . . . . . . . . . . . . . . . . . . 11
       5.6.3.  Decoding Validation  . . . . . . . . . . . . . . . . . 13
   6.  Payload Examples . . . . . . . . . . . . . . . . . . . . . . . 13
     6.1.  3 mono frames with 2 different bitrates  . . . . . . . . . 13
     6.2.  2 stereo frame-blocks of the same bitrate  . . . . . . . . 14
     6.3.  4 mono frames interleaved  . . . . . . . . . . . . . . . . 14
   7.  Payload Format Parameters  . . . . . . . . . . . . . . . . . . 15
     7.1.  Media Type Definition  . . . . . . . . . . . . . . . . . . 15
     7.2.  Mapping to SDP . . . . . . . . . . . . . . . . . . . . . . 17
       7.2.1.  Offer/Answer Considerations  . . . . . . . . . . . . . 18
       7.2.2.  Declarative SDP Considerations . . . . . . . . . . . . 19
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 20
   9.  Congestion Control . . . . . . . . . . . . . . . . . . . . . . 20
   10. Security Considerations  . . . . . . . . . . . . . . . . . . . 20
     10.1. Confidentiality  . . . . . . . . . . . . . . . . . . . . . 20
     10.2. Authentication and Integrity . . . . . . . . . . . . . . . 21
   11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21
   12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21
     12.1. Informative References . . . . . . . . . . . . . . . . . . 21
     12.2. Normative References . . . . . . . . . . . . . . . . . . . 22
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22
   Intellectual Property and Copyright Statements . . . . . . . . . . 24

1.  Introduction

   This document specifies the payload format for packetization of
   G.719 full-band (FB) codec encoded audio signals into the Real-time
   Transport Protocol (RTP) [RFC3550].  The payload format supports
   transmission of multiple channels, multiple frames per payload, and
   packet loss robustness methods using redundancy or interleaving.

   This document starts with conventions, a brief description of the
   codec, and the payload format's capabilities.  The payload format is
   specified in Section 5.  Examples can be found in Section 6.  The
   media type, its mapping to SDP, and its usage in SDP offer/answer
   are then specified.  The document ends with considerations around
   congestion control and security.

2.  Conventions, Definitions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The term "frame-block" is used in this document to describe the time-
   synchronized set of audio frames in a multi-channel audio session.
   In particular, in an N-channel session, a frame-block will contain N
   audio frames, one from each of the channels, and all N audio frames
   represent exactly the same time period.

3.  G.719 Description

   The ITU-T G.719 full-band codec is a transform coder based on the
   Modulated Lapped Transform (MLT).  G.719 is a low-complexity full-
   bandwidth codec for conversational speech and audio coding.  The
   encoder input and decoder output are sampled at 48 kHz.  The codec
   enables full bandwidth, from 20 Hz to 20 kHz, encoding of speech,
   music and general audio content at rates from 32 kbit/s up to 128
   kbit/s.  The codec operates on 20 ms frames and has an algorithmic
   delay of 40 ms.

   The codec provides excellent quality for speech, music and other
   types of audio.  Some of the applications for which this coder is
   suitable are:

   o  Real-time communications such as video conferencing and telephony.

   o  Streaming audio

   o  Archival and messaging

   The encoding and decoding algorithm can change the bit rate at any
   20 ms frame boundary.  The encoder receives audio sampled at 48 kHz.
   Other sampling rates can be supported by re-sampling the input
   signal to the codec's sampling rate, i.e., 48 kHz; however, this
   functionality is not part of the standard.

   The encoding is performed on equally sized frames.  For each frame,
   the encoder decides between two encoding modes, a transient mode and
   a stationary mode.  The decision is based on statistics derived from
   the input signal.  The stationary mode uses a long MLT that leads to
   a spectrum of 960 coefficients, while the transient encoding mode
   uses a short MLT (higher time resolution transform) that results in
   4 spectra (4 x 240 = 960 coefficients).  The encoding of the spectrum
   is done in two steps.  First, the spectral envelope is computed,
   quantized and Huffman encoded.  The envelope is computed on a non-
   uniform frequency subdivision.  From the coded spectral envelope, a
   weighted spectral envelope is derived and used for bit allocation.
   This process is repeated at the decoder, so only the spectral
   envelope is transmitted.  The output of the bit allocation is used
   to quantize the spectra.  In addition, for stationary frames the
   encoder estimates the noise level.  The decoder applies the reverse
   operation upon reception of the bitstream.  The non-coded
   coefficients (i.e., those with no bits allocated) are replaced by
   entries of a noise codebook that is built from the decoded
   coefficients.

4.  Payload format Capabilities

   This payload format has a number of capabilities, which this section
   discusses in some detail.

4.1.  Multi-rate Encoding and Rate Adaptation

   G.719 supports multi-rate encoding, which allows the encoding rate
   to vary on a per-frame basis.  This enables support for bit-rate
   adaptation and congestion control.  The possibility to aggregate
   multiple audio frames into a single RTP payload is another dimension
   of adaptation.  Aggregation reduces the RTP and payload format
   overhead at the cost of increased delay and reduced packet-loss
   robustness.
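
   As a rough illustration of this trade-off, the sketch below computes
   the fraction of each packet that is not audio data for different
   aggregation levels.  The 40-byte IPv4/UDP/RTP header size (no IP
   options, no CSRC entries) and the function name are assumptions made
   for illustration only; the 2-byte ToC entry corresponds to the basic
   mode of Section 5.3 with a single ToC entry per payload.

```python
# Sketch: header overhead versus number of aggregated mono frames.
# Assumption: IPv4 (20) + UDP (8) + RTP (12) header bytes, no options/CSRCs.
HEADER_BYTES = 20 + 8 + 12
TOC_ENTRY_BYTES = 2  # basic ToC element + #frames field (basic mode)


def overhead_fraction(frame_bytes: int, frames_per_packet: int) -> float:
    """Fraction of the packet that is not audio data (single ToC entry)."""
    audio = frame_bytes * frames_per_packet
    total = HEADER_BYTES + TOC_ENTRY_BYTES + audio
    return (total - audio) / total


if __name__ == "__main__":
    for n in (1, 2, 4):
        # 80-byte frames correspond to 32 kbit/s; every additional
        # aggregated frame adds 20 ms of packetization delay.
        print(n, round(overhead_fraction(80, n), 3))
```

   The overhead fraction shrinks as more frames are aggregated, while
   each additional frame adds 20 ms of delay and increases the amount
   of audio lost with a single packet.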

4.2.  Support for Multi-Channel Sessions

   The RTP payload format defined in this document supports multi-
   channel audio content (e.g. stereophonic or surround audio sessions).
   Although the G.719 codec itself does not support encoding of multi-
   channel audio content into a single bit stream, it can be used to
   separately encode and decode each of the individual channels.  To
   transport (or store) the separately encoded multi-channel content,
   the audio frames for all channels that are framed and encoded for the
   same 20 ms period are logically collected in a "frame-block".

   At the session setup, out-of-band signaling must be used to indicate
   the number of channels in the session and the order of the audio
   frames from different channels in each frame-block.  When using SDP
   for signaling, the number of channels is specified in the rtpmap
   attribute, and the order of channels carried in each frame-block is
   implied by the number of channels as specified in Section 4.1 of
   [RFC3551].

4.3.  Robustness against Packet Loss

   The payload format supports several means, including forward error
   correction (FEC) and frame interleaving, to increase robustness
   against packet loss.

4.3.1.  Use of Forward Error Correction (FEC)

   Generic forward error correction within RTP is defined, for example,
   in RFC 5109 [RFC5109].  Audio redundancy coding is defined in RFC
   2198 [RFC2198].  Either scheme can be used to add redundant
   information to the RTP packet stream and make it more resilient to
   packet losses, at the expense of a higher bit rate.  Please see
   either RFC for a discussion of the implications of the higher bit
   rate to network congestion.

   In addition to these media-unaware mechanisms, this memo specifies an
   optional G.719 specific form of audio redundancy coding, which may be
   beneficial in terms of packetization overhead.  Conceptually,
   previously transmitted transport frames are aggregated together with
   new ones.  A sliding window is used to group the frames to be sent in
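   each payload.  Figure 1 below shows an example.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--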

      <---- p(n-1) ---->
               <----- p(n) ----->
                        <---- p(n+1) ---->
                                 <---- p(n+2) ---->
                                          <---- p(n+3) ---->
                                                   <---- p(n+4) ---->

              Figure 1: An example of redundant transmission

   Here, each frame is retransmitted once in the following RTP payload
   packet.  f(n-2)...f(n+4) denote a sequence of audio frames, and
   p(n-1)...p(n+4) a sequence of payload packets.
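
   The grouping shown in Figure 1 can be sketched as a sliding window
   over the frame sequence.  The function name and the "window"
   parameter are hypothetical and are not defined by this payload
   format; a window of 2 reproduces the figure, where each payload
   carries the previous frame and the new one.

```python
# Sketch: sliding-window redundancy as in Figure 1.
# With window=2, payload p(k) carries frames f(k-1) and f(k).
def redundant_payloads(frame_indices, window=2):
    """Group frame indices into overlapping payloads of `window` frames."""
    frames = list(frame_indices)
    payloads = []
    for end in range(window, len(frames) + 1):
        payloads.append(frames[end - window:end])
    return payloads


# Frames f(n-2)..f(n+4), written relative to n = 0, give the six
# payloads p(n-1)..p(n+4) of Figure 1.
payloads = redundant_payloads(range(-2, 5))
```

   With this grouping a single lost packet never removes a frame from
   the stream, since every frame is also present in a neighbouring
   payload.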

   The mechanism described does not require signaling at the session
   setup.  In other words, the audio sender can choose to use this
   scheme without consulting the receiver.  For a certain timestamp, the
   receiver may receive multiple copies of a frame containing encoded
   audio data, even at different encoding rates.  The cost of this
   scheme is bandwidth and the receiver delay necessary to allow the
   redundant copy to arrive.

   This redundancy scheme provides a functionality similar to the one
   described in RFC 2198, but it works only if both original frames and
   redundant representations are G.719 frames.  When the use of other
   media coding schemes is desirable, one has to resort to RFC 2198.

   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel conditions, e.g., in
   the RTP Control Protocol (RTCP) [RFC3550] receiver reports.  The
   sender is also responsible for avoiding congestion, which may be
   exacerbated by redundancy (see Section 9 for more details).

4.3.2.  Use of Frame Interleaving

   To decrease protocol overhead, the payload design allows several
   audio transport frames to be encapsulated into a single RTP packet.
   One of the drawbacks of such an approach is that in case of packet
   loss several consecutive frames are lost.  Consecutive frame loss
   normally renders error concealment less efficient and usually causes
   clearly audible and annoying distortions in the reconstructed audio.
   Interleaving of transport frames can improve the audio quality in
   such cases by distributing the consecutive losses into a number of
   isolated frame losses, which are easier to conceal.  However,
   interleaving and bundling several frames per payload also increases
   end-to-end delay and sets higher buffering requirements.  Therefore,
   interleaving is not appropriate for all use cases or devices.
   Streaming applications should most likely be able to exploit
   interleaving to improve audio quality in lossy transmission
   conditions.

   Note that this payload design supports the use of frame interleaving
   as an option.  The usage of this feature needs to be negotiated in
   the session setup.

   The interleaving supported by this format is rather flexible.  For
   example, a continuous pattern can be defined, as depicted in
   Figure 2.

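   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--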

              [ P(n)   ]
     [ P(n+1) ]                 [ P(n+1) ]
                       [ P(n+2) ]                 [ P(n+2) ]
                                         [ P(n+3) ]
                                                           [ P(n+4) ]

   Figure 2: An example of interleaving pattern that has constant delay

   In Figure 2 the consecutive frames, denoted f(n-2) to f(n+4), are
   aggregated into packets P(n) to P(n+4), each packet carrying two
   frames.  This approach provides an interleaving pattern that allows
   for constant delay in both the interleaving and de-interleaving
   processes.  The de-interleaving buffer needs to have room for at
   least three frames, including the one that is ready to be consumed.
   The storage space for three frames is needed, for example, when f(n)
   is the next frame to be decoded: since frame f(n) was received in
   packet P(n+2), which also carried frame f(n+3), both these frames are
   stored in the buffer.  Furthermore, frame f(n+1) received in the
   previous packet, P(n+1), is also in the de-interleaving buffer.  Note
   also that in this example the buffer occupancy varies: when frame
   f(n+1) is the next one to be decoded, there are only two frames,
   f(n+1) and f(n+3), in the buffer.

5.  Payload format

   The main purpose of the payload design for G.719 is to exploit the
   full potential of the codec with as little overhead as possible.
   Both basic and interleaved modes are included in the design, as the
   codec is suitable both for conversational and other low-delay
   applications and for streaming, where more delay is acceptable.

   The main structural difference between the basic and interleaved
   modes is the extension of the table of content entries with frame
   displacement fields in the interleaved mode.  The basic mode supports
   aggregation of multiple consecutive frames in a payload.  The
   interleaved mode supports aggregation of multiple frames that are
   non-consecutive in time.  In both modes it is possible to have frames
   encoded with different frame types in the same payload.

   The payload format also supports the usage of G.719 for carrying
   multi-channel content using one discrete encoder per channel, all
   using the same bit-rate.  In this case a complete frame-block with
   data from all channels is included in the RTP payload.  The data is
   the concatenation of all the encoded audio frames in the order
   specified for that number of included channels.  Interleaving is
   also done on complete frame-blocks rather than individual audio
   frames.

5.1.  RTP Header Usage

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame-block in the packet.  The
   timestamp clock frequency SHALL be 48000 Hz.  The timestamp is thus
   also used to recover the correct decoding order of the frame-blocks.

   The RTP header marker bit (M) SHALL be set to 1 whenever the first
   frame-block carried in the packet is the first frame-block in a
   talkspurt (see definition of the talkspurt in section 4.1 of
   [RFC3551]).  For all other packets the marker bit SHALL be set to
   zero (M=0).

   The assignment of an RTP payload type for the format defined in this
   memo is outside the scope of this document.  The RTP profiles
   currently in use mandate binding the payload type dynamically for
   this payload format.  This is necessary because the payload type
   expresses the configuration of the payload itself, i.e., basic or
   interleaved mode and the number of channels carried.

   The remaining RTP header fields are used as specified in RFC 3550
   [RFC3550].

5.2.  Payload Structure

   The payload consists of one or more table of contents (ToC) entries
   followed by the audio data corresponding to the ToC entries.  The
   following sections describe both the basic mode and the interleaved
   mode.  Each ToC entry MUST be padded to a byte boundary to ensure
   octet alignment.  The rules regarding maximum payload size given in
   [I-D.ietf-tsvwg-udp-guidelines] SHOULD be followed.

5.2.1.  Basic ToC element

   All the different formats and modes in this draft use a common basic
   ToC which may be extended in the different options described below.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|    L    |R|R|
   +-+-+-+-+-+-+-+-+

                        Figure 3: Basic TOC element

   F (1 bit):  If set to 1, indicates that this ToC entry is followed by
      another ToC entry; if set to 0, indicates that this ToC entry is
      the last one in the ToC.

   L (5 bits):  A field that gives the frame length of each individual
      frame within the frame-block.

        L          length(bytes)
        0           0 NO_DATA
        1-7         N/A (reserved)
        8-22        80+10*(L-8)
       23-27        240+20*(L-23)
       28-31        N/A (reserved)

                Figure 4: How to map L values to frame lengths

      L=0 is used to indicate an empty frame.  This may be useful if
      frames are missing, e.g., at re-packetization.
      The value ranges [1..7] and [28..31] inclusive are reserved for
      future use in this draft version.  If these values occur in a ToC,
      the entire packet SHOULD be treated as invalid and discarded.
      A few examples are given below where the frame size and the
      corresponding codec bitrate are computed based on the value L.

         L    Bytes    Bitrate(kbps)
         8      80        32
         9      90        36
        10     100        40
        12     120        48
        16     160        64
        22     220        88
        23     240        96
        25     280       112
        27     320       128

        Figure 5: Examples of L values and corresponding frame lengths

      This encoding yields a granularity of 4 kbps between 32 and 88
      kbps and a granularity of 8 kbps between 88 and 128 kbps, with a
      defined range of 32-128 kbps.

   R (2 bits):  Reserved bits.  SHALL be set to 0 on sending and SHALL
      be ignored on reception.
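
   The mapping of Figure 4 can be written down directly.  The following
   is a minimal sketch; the function names are hypothetical, and raising
   an exception for reserved values is one possible way of letting the
   caller discard the packet as described above.

```python
def frame_length_bytes(l_value: int) -> int:
    """Map the 5-bit L field to the frame length in bytes (Figure 4)."""
    if l_value == 0:
        return 0  # NO_DATA
    if 8 <= l_value <= 22:
        return 80 + 10 * (l_value - 8)
    if 23 <= l_value <= 27:
        return 240 + 20 * (l_value - 23)
    # 1..7 and 28..31 are reserved; the packet should be discarded.
    raise ValueError(f"reserved L value: {l_value}")


def bitrate_kbps(l_value: int) -> float:
    """Codec bit-rate implied by L, given 20 ms frames."""
    # bytes * 8 bits / 20 ms gives bits per millisecond, i.e. kbit/s.
    return frame_length_bytes(l_value) * 8 / 20
```

   For instance, frame_length_bytes(12) returns 120 and
   bitrate_kbps(12) returns 48.0, matching Figure 5.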

5.3.  Basic mode

   The basic ToC element (Figure 3) is extended with a number of frame-
   blocks field (#frames) to form the ToC entry.  The frame-blocks field
   tells how many frame-blocks of the same length the ToC entry relates
   to.

      +-+-+-+-+-+-+-+-+
      |    #frames    |
      +-+-+-+-+-+-+-+-+

                  Figure 6: Number of frame-blocks field

5.4.  Interleaved mode

   The basic ToC is extended with a number of frame-blocks field
   (#frames) and the DIS fields to form a ToC entry in interleaved mode.
   The frame-blocks field tells how many frame-blocks of the same length
   the ToC relates to.  The DIS fields, one for each frame-block
   indicated by the #frames field, express the interleaving distance
   between audio frames carried in the payload.  If necessary to achieve
   octet alignment, a 4-bit padding is added.

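      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    #frames    | DIS1  |  ...  | DISi  |  ...  | DISn  | Padd  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+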

            Figure 7: Number of frame-block + interleave fields

   DIS1...DISn (4 bits each):  A list of n (n=#frames) displacement
      fields indicating the displacement of the i:th (i=1..n) audio
      frame-block relative to the preceding frame-block in the payload,
      in units of 20 ms long audio frame-blocks.  The four-bit unsigned
      integer displacement values may be between 0 and 15, indicating
      the number of audio frame-blocks in decoding order between the
      (i-1):th and the i:th frame-block in the payload.  Note that for
      the first ToC entry of the payload the value of DIS1 is
      meaningless.  It SHALL be set to zero by a sender, and SHALL be
      ignored by a receiver.  This frame-block's location in the
      decoding order is uniquely defined by the RTP timestamp.  Note
      also that for subsequent ToC entries DIS1 indicates the number of
      frame-blocks between the last frame-block of the previous group
      and the first frame-block of this group.

   Padd (4 bits):  To ensure octet alignment, four padding bits SHALL be
      included at the end of the ToC entry in case there is an odd
      number of frame-blocks in the group referenced by this ToC entry.
      These bits SHALL be set to zero by the sender and SHALL be ignored
      by the receiver.  If a group containing an even number of frame-
      blocks is referenced by this ToC entry, these padding bits SHALL
      NOT be included in the payload.
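
   As an illustration of the interleaved ToC entry layout, the sketch
   below builds one entry and derives the RTP timestamps of the frame-
   blocks it references.  The function names are hypothetical, input
   range checking is omitted, and the timestamp helper only handles the
   first ToC entry of a payload; the 960-tick step follows from 20 ms
   frames and the 48000 Hz timestamp clock of Section 5.1.

```python
FRAME_TICKS = 960  # 20 ms at the 48000 Hz RTP clock


def pack_interleaved_toc(follow: bool, l_value: int, dis: list) -> bytes:
    """Build one interleaved-mode ToC entry; len(dis) is the #frames value."""
    # First octet: F in the most significant bit, then L (5 bits), R = 00.
    out = bytearray([(int(follow) << 7) | (l_value << 2), len(dis)])
    # Append the 4 Padd bits only when the number of DIS fields is odd.
    nibbles = list(dis) + ([0] if len(dis) % 2 else [])
    for high, low in zip(nibbles[0::2], nibbles[1::2]):
        out.append((high << 4) | low)
    return bytes(out)


def frame_block_timestamps(rtp_timestamp: int, dis: list) -> list:
    """RTP timestamps of the frame-blocks of the first ToC entry."""
    stamps = [rtp_timestamp]  # DIS1 of the first entry is ignored
    for d in dis[1:]:
        # d frame-blocks lie between two carried ones, so advance d + 1 slots.
        stamps.append((stamps[-1] + (d + 1) * FRAME_TICKS) % 2**32)
    return stamps
```

   With DIS values 0, 4, 4, 4 and an RTP timestamp corresponding to
   frame-block 13, the helper yields the timestamps of frame-blocks 13,
   18, 23 and 28.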

5.5.  Audio Data

   The audio data part follows the table of contents.  All the octets
   comprising an audio frame SHALL be appended to the payload as a unit.
   For each frame-block the audio frames are concatenated in the order
   indicated by the table in Section 4.1 of [RFC3551] for the number of
   channels configured for the payload type in use.  So the first
   channel (left-most) indicated comes first, followed by the next
   channel.  The audio frame-blocks are packetized in increasing
   timestamp order within each group of frame-blocks (per ToC entry),
   i.e. oldest frame-block first.  The groups of frame-blocks are
   packetized in the same order as their corresponding ToC entries.

   The audio frames are specified in ITU recommendation [ITU-T-G719].

5.6.  Implementation Considerations

   An application implementing this payload format MUST understand all
   the payload parameters.  Any mapping of the parameters to a signaling
   protocol MUST support all parameters.  So an implementation of this
   payload format in an application using SDP is required to understand
   all the payload parameters in their SDP-mapped form.  This
   requirement ensures that an implementation always can decide whether
   it is capable of communicating.

   Basic mode SHALL be implemented and the interleaved mode SHOULD be
   implemented.  The implementation burden of both is rather small, and
   supporting both ensures interoperability.  However, interleaving is
   not mandated as it has limited applicability for conversational
   applications that require tight delay boundaries.

5.6.1.  Receiving Redundant Frames

   The reception of redundant audio frames, i.e. more than one audio
   frame from the same source for the same time slot, MUST be supported
   by the implementation.  In the case that the receiver gets multiple
   audio frames in different bit-rates for the same time slot it is
   RECOMMENDED that the receiver keeps the one with the highest bit-
   rate.
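
   A receiver might apply this recommendation as sketched below.  The
   function name and the (timestamp, frame) tuple representation are
   assumptions for illustration; the sketch relies on the fact that a
   longer G.719 frame corresponds to a higher bit-rate.

```python
def keep_best_frames(received):
    """Keep one frame per time slot, preferring the highest bit-rate.

    `received` is an iterable of (timestamp, frame_bytes) pairs from one
    source; the result maps each timestamp to the frame that is kept.
    """
    best = {}
    for timestamp, frame in received:
        # A longer frame means a higher bit-rate for the same 20 ms slot.
        if timestamp not in best or len(frame) > len(best[timestamp]):
            best[timestamp] = frame
    return best
```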

5.6.2.  Interleaving

   The use of interleaving requires further considerations.  As
   presented in the example in Section 4.3.2, a given interleaving
   pattern requires a certain amount of de-interleaving buffer space.
   This buffer space, expressed in a number of transport frame slots, is
   indicated by the "interleaving" media type parameter.  The number of
   frame slots needed can be converted into actual memory requirements
   by considering the 320 bytes per frame used by the highest bit-rate
   of G.719.

   The information about the frame buffer size is not always sufficient
   to determine when it is appropriate to start consuming frames from
   the interleaving buffer.  Additional information is needed when the
   interleaving pattern changes.  The "int-delay" media type parameter
   is defined to convey this information.  It allows a sender to
   indicate the minimal media time that needs to be present in the
   buffer before the decoder can start consuming frames from the buffer.
   Because the sender has full control over the interleaving pattern, it
   can calculate this value.  In certain cases (for example, if joining
   a multicast session with interleaving mid-session), a receiver may
   initially receive only part of the packets in the interleaving
   pattern.  This initial partial reception (in frame sequence order) of
   frames can yield too few frames for acceptable quality from the audio
   decoding.  This problem also arises when using encryption for access
   control, and the receiver does not have the previous key.  Although
   G.719 is robust and thus tolerant to a high random frame erasure
   rate, it would have difficulties handling consecutive frame losses at
   startup.  Thus, some special implementation considerations are
   described.

   In order to handle this type of startup efficiently, decoding can
   start provided that:

   1.  There are at least two consecutive frames available.

   2.  At least half of the frames are available in the time period
       between the point where decoding was planned to start and the
       most forward received frame.
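
   One possible reading of these two conditions is sketched below.  The
   function name, the use of integer frame indices in decoding order,
   and the choice to look for the two consecutive frames anywhere inside
   the considered period are assumptions, not requirements of this
   document.

```python
def can_start_decoding(available, planned_start):
    """Check the two startup conditions for a set of received frames.

    `available` is a set of received frame indices in decoding order;
    `planned_start` is the index at which decoding was planned to start.
    """
    have = {i for i in available if i >= planned_start}
    if not have:
        return False
    # Period from the planned start up to the most forward received frame.
    span = max(have) - planned_start + 1
    # Condition 1: at least two consecutive frames are available.
    consecutive = any(i + 1 in have for i in have)
    # Condition 2: at least half of the frames in the period are available.
    return consecutive and 2 * len(have) >= span
```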

   After receiving a number of packets, in the worst case as many
   packets as the interleaving pattern covers, the previously described
   effects disappear and normal decoding is resumed.  Similar issues
   arise when a receiver leaves a session or has lost access to the
   stream.  If the receiver leaves the session, this would be a minor
   issue since playout is normally stopped.  The sender can avoid this
   type of problem in many sessions by starting and ending interleaving
   patterns correctly when risks of losses occur.  One such example is a
   key-change done for access control to encrypted streams.  If only
   some keys are provided to clients and there is a risk of their
   receiving content for which they do not have the key, it is
   recommended that interleaving patterns do not overlap key changes.

5.6.3.  Decoding Validation

   If the receiver finds a mismatch between the size of a received
   payload and the size indicated by the ToC of the payload, the
   receiver SHOULD discard the packet.  This is recommended because
   decoding a frame parsed from a payload based on erroneous ToC data
   could severely degrade the audio quality.
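
   The size check can be sketched for basic mode as follows.  This is a
   minimal illustration, not a complete depacketizer: the function name
   is hypothetical, the payload is assumed to have had any RTP padding
   removed, and the number of channels is assumed to be known from the
   session setup.

```python
def parse_basic_payload(payload: bytes, channels: int = 1):
    """Split a basic-mode payload into frame-blocks.

    Returns a list of frame-blocks, each a list of per-channel frames.
    Raises ValueError when the packet should be discarded.
    """
    groups, pos = [], 0
    while True:  # read 2-octet ToC entries until F == 0
        if pos + 2 > len(payload):
            raise ValueError("truncated ToC")
        first, count = payload[pos], payload[pos + 1]
        pos += 2
        l_value = (first >> 2) & 0x1F
        if l_value == 0:
            length = 0  # NO_DATA
        elif 8 <= l_value <= 22:
            length = 80 + 10 * (l_value - 8)
        elif 23 <= l_value <= 27:
            length = 240 + 20 * (l_value - 23)
        else:
            raise ValueError("reserved L value")
        groups.append((length, count))
        if not first & 0x80:
            break
    # Compare the size implied by the ToC with the received payload size.
    expected = pos + sum(length * count * channels for length, count in groups)
    if expected != len(payload):
        raise ValueError("payload size does not match ToC")
    blocks = []
    for length, count in groups:
        for _ in range(count):
            block = [payload[pos + c * length: pos + (c + 1) * length]
                     for c in range(channels)]
            pos += length * channels
            blocks.append(block)
    return blocks
```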

6.  Payload Examples

   This section gives a few examples to illustrate the payload format.

6.1.  3 mono frames with 2 different bitrates

   The first example is a payload consisting of 3 mono frames where the
   first 2 frames correspond to a bitrate of 32 kbps (80 bytes/frame)
   and the last to 48 kbps (120 bytes/frame).

      The first 32 bits are ToC fields.
      Bit 0 is '1' as another ToC field follows.
      Bits 1..5 are 01000 = 80 bytes/frame
      Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame
      Bit 16 is '0', no more ToC follows
      Bits 17..21 are 01100 = 120 bytes/frame
      Bits 24..31 are 00000001 = 1 frame-block with 120 bytes/frame

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      |1|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0|0|0 1 1 0 0|0 0|0 0 0 0 0 0 0 1|
      |d(0)   frame 1                                                 |
      .                                                               .
      |                                                         d(639)|
      |d(0)   frame 2                                                 |
      .                                                               .
      |                                                         d(639)|
      |d(0)   frame 3                                                 |
      .                                                               .
      |                                                         d(959)|
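
   The ToC octets of this example can be reproduced with the short
   sketch below.  The helper name is hypothetical and the audio data is
   represented by zero-filled placeholders rather than real G.719
   frames.

```python
def basic_toc_entry(follow: bool, l_value: int, n_blocks: int) -> bytes:
    """One basic-mode ToC entry: F, L, two reserved bits, then #frames."""
    return bytes([(int(follow) << 7) | (l_value << 2), n_blocks])


# Two frame-blocks with L=8 (80 bytes), then one with L=12 (120 bytes).
toc = basic_toc_entry(True, 8, 2) + basic_toc_entry(False, 12, 1)
payload = toc + bytes(80) * 2 + bytes(120)  # placeholder audio data

# toc.hex() == "a0023001" and len(payload) == 284
```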

6.2.  2 stereo frame-blocks of the same bitrate

   A payload consisting of 2 stereo frame-blocks corresponding to a
   bitrate of 32 kbps (80 bytes/frame) per channel.  The receiver
   calculates the number of audio frames in the payload by multiplying
   the value of the channels parameter (2) with the #frames field value
   (2) to derive that there are 4 audio frames in the payload.

      The first 16 bits are the ToC field.
      Bit 0 is '0' as no ToC field follows.
      Bits 1..5 are 01000 = 80 bytes/frame
      Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      |0|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0| d(0) frame 1 left ch.         |
      .                                                               .
      |                         d(639)| d(0) frame 1 right ch.        |
      .                                                               .
      |                         d(639)| d(0) frame 2 left ch.         |
      .                                                               .
      |                         d(639)| d(0) frame 2 right ch.        |
      |                         d(639)|
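
   The frame count and the byte position of each frame in this example
   can be checked with a short, non-normative Python sketch.  The
   function name is hypothetical, and the sketch assumes a single 16-bit
   ToC field and no interleaving, as in the figure above.  Channel 1 is
   the left channel and channel 2 the right channel.

```python
from typing import List, Tuple


def frame_layout(channels: int, frame_blocks: int, frame_bytes: int,
                 toc_bytes: int = 2) -> List[Tuple[int, int, int, int]]:
    """List (frame_block, channel, start, end) byte ranges in the payload."""
    layout = []
    offset = toc_bytes  # audio data starts right after the ToC field
    for block in range(1, frame_blocks + 1):
        for channel in range(1, channels + 1):
            layout.append((block, channel, offset, offset + frame_bytes))
            offset += frame_bytes
    return layout


layout = frame_layout(channels=2, frame_blocks=2, frame_bytes=80)
for block, channel, start, end in layout:
    print(f"frame {block}, channel {channel}: bytes {start}..{end - 1}")
```

   The list has channels x frame_blocks = 4 entries, one per audio frame
   in the payload.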

6.3.  4 mono frames interleaved

   This payload consists of 4 mono frames corresponding to a bitrate of
   32 kbps (80 bytes/frame), interleaved.  The listing below shows one
   interleaving pattern that gives constant delay when aggregating 4
   frames per packet.  The packet illustrated is packet n.

      Packet n-3:  1,  6, 11, 16
      Packet n-2:  5, 10, 15, 20
      Packet n-1:  9, 14, 19, 24
      Packet   n: 13, 18, 23, 28
      Packet n+1: 17, 22, 27, 32
      Packet n+2: 21, 26, 31, 36

      The first 16 bits are the ToC field.
      Bit 0 is '0' as no ToC field follows.
      Bits 1..5 are 01000 = 80 bytes/frame
      Bits 8..15 are 00000100 = 4 frame-blocks with 80 bytes/frame

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      |0|0 1 0 0 0|0 0|0 0 0 0 0 1 0 0|0 0 0 0|0 1 0 0|0 1 0 0|0 1 0 0|
      | d(0) frame 13                                                 |
      .                                                               .
      |                                                         d(639)|
      | d(0) frame 18                                                 |
      .                                                               .
      |                                                         d(639)|
      | d(0) frame 23                                                 |
      .                                                               .
      |                                                         d(639)|
      | d(0) frame 28                                                 |
      .                                                               .
      |                                                         d(639)|
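
   The packet listing in this example follows a regular rule: the first
   frame of each packet advances by 4, and the frames within a packet
   are 5 apart.  A minimal, non-normative Python sketch of that rule is
   given below.  The function name and the 0-based packet index are
   assumptions made for illustration only.

```python
from typing import List


def interleaved_packet(index: int, frames_per_packet: int = 4,
                       stride: int = 5) -> List[int]:
    """Frame numbers carried by the packet with the given 0-based index."""
    first = 1 + frames_per_packet * index
    return [first + stride * j for j in range(frames_per_packet)]


# Index 0 corresponds to packet n-3 in the listing, index 3 to packet n.
pattern = [interleaved_packet(i) for i in range(6)]
for offset, frames in zip(range(-3, 3), pattern):
    print(f"Packet n{offset:+d}: {frames}")
```

   Index 3 yields frames 13, 18, 23 and 28, which are the frames shown
   in the figure above.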

7.  Payload Format Parameters

   This RTP payload format is identified using the media type audio/g719
   which is registered in accordance with [RFC4855] and using the
   template of [RFC4288].

7.1.  Media Type Definition

   The media type for the G.719 codec is allocated from the IETF tree
   since G.719 has the potential to become a widely used audio codec in
   general VoIP, teleconferencing and streaming applications.
   This media type registration covers real-time transfer via RTP.

   Note, any unspecified parameter MUST be ignored by the receiver to
   ensure that additional parameters can be added in the future.

   Type name: audio

   Subtype name: g719

   Required parameters: none

   Optional parameters:

   interleaving:  Indicates that interleaved mode SHALL be used for the
      payload.  The parameter specifies the number of transport frame
      slots required in a de-interleaving buffer (including the frame
      that is ready to be consumed).  Its value is equal to one plus the
      maximum number of frames that precede any frame in transmission
      order and follow the frame in RTP timestamp order.  The value MUST
      be greater than zero.  If this parameter is not present,
      interleaved mode SHALL NOT be used.

   int-delay:  The minimal media time delay in RTP timestamp ticks that
      is needed in the de-interleaving buffer, i.e., the difference in
      RTP timestamp ticks between the earliest and latest audio frame
      present in the de-interleaving buffer.

   max-red:  The maximum duration in milliseconds that elapses between
      the primary (first) transmission of a frame and any redundant
      transmission that the sender will use.  This parameter allows a
      receiver to have a bounded delay when redundancy is used.  Allowed
      values are between 0 (no redundancy will be used) and 65535.  If
      the parameter is omitted, no limitation on the use of redundancy
      is present.

   channels:  The number of audio channels.  The possible values (1-6)
      and their respective channel orders are specified in Section 4.1
      of [RFC3551].  If omitted, it has the default value of 1.

   CBR:  Constant Bit Rate (CBR), indicates the exact codec-bitrate in
      bits per second (not including the overhead from packetization,
      RTP header or lower layers) that the codec MUST use.  CBR is to be
      used when dynamic rate cannot be supported (one case is e.g GW to

   ptime:  see [RFC4566].

   maxptime:  see [RFC4566].

   Encoding considerations:

      This media type is framed and binary; see Section 4.8 of
      [RFC4288].

   Security considerations:

      See Section 10 of RFCXXXX.

   Interoperability considerations:

   Published specification:

      RFC XXXX

   Applications that use this media type:

      Real-time audio applications like voice over IP and
      teleconference, and multi-media streaming.

   Additional information: none

   Person & email address to contact for further information:

      Payload format: Ingemar Johansson
      <ingemar.s.johansson@ericsson.com>

      Codec spec.: Anisse Taleb <anisse<dot>taleb<AT>ericsson<dot>com>

   Intended usage: COMMON

   Restrictions on usage:

      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC3550].  Transport within other framing
      protocols is not defined at this time.

   Author:

      Ingemar Johansson <ingemar.s.johansson@ericsson.com>

      Magnus Westerlund <magnus.westerlund@ericsson.com>

   Change controller:

      IETF Audio/Video Transport working group delegated from the IESG.
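
   The definition of the "interleaving" parameter in the registration
   above can be read as a small computation over the order in which
   frames are transmitted.  The following non-normative Python sketch
   applies that definition to the six packets listed in Section 6.3.
   The function name is hypothetical, and frames are identified by their
   position in RTP timestamp order.

```python
from typing import List


def interleaving_value(transmission_order: List[int]) -> int:
    """One plus the largest number of later frames sent ahead of a frame."""
    worst = 0
    for position, frame in enumerate(transmission_order):
        # Frames sent earlier that follow this frame in timestamp order.
        ahead = sum(1 for earlier in transmission_order[:position]
                    if earlier > frame)
        worst = max(worst, ahead)
    return worst + 1


# Transmission order of the six packets listed in Section 6.3.
order = [1 + 4 * p + 5 * j for p in range(6) for j in range(4)]
value = interleaving_value(order)
print(value)
```

   For that pattern the sketch gives 7: up to six frames with later
   timestamps precede a frame in transmission order, plus one slot for
   the frame that is ready to be consumed.  Without interleaving the
   result is 1.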

7.2.  Mapping to SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC4566], which is commonly used to describe RTP sessions.  When SDP
   is used to specify sessions employing the G.719 codec, the mapping is
   as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.  The RTP clock rate in "a=rtpmap" MUST be
      48000, and the encoding parameters (number of channels) MUST
      either be explicitly set to N or omitted, implying a default value
      of 1.  The values of N that are allowed are specified in Section
      4.1 in [RFC3551].

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the media type parameter string as a
      semicolon-separated list of parameter=value pairs.
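
   The mapping rules above can be illustrated with a short,
   non-normative Python sketch that assembles the SDP lines for one
   payload type.  The function name, the port (49920), the payload type
   (97), the RTP/AVP profile and the parameter values are all
   assumptions chosen for illustration only.

```python
from typing import Dict, List, Optional


def g719_sdp(port: int, payload_type: int, channels: int = 1,
             ptime: Optional[int] = None, maxptime: Optional[int] = None,
             fmtp: Optional[Dict[str, int]] = None) -> List[str]:
    """Build the SDP media-level lines for one G.719 payload type."""
    lines = [f"m=audio {port} RTP/AVP {payload_type}"]
    rtpmap = f"a=rtpmap:{payload_type} g719/48000"  # clock rate is 48000
    if channels != 1:  # an omitted channel count implies 1
        rtpmap += f"/{channels}"
    lines.append(rtpmap)
    if fmtp:
        # Remaining parameters: semicolon-separated parameter=value pairs.
        pairs = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines


sdp = g719_sdp(port=49920, payload_type=97, channels=2, ptime=20,
               maxptime=80, fmtp={"interleaving": 7, "max-red": 0})
print("\n".join(sdp))
```

   Note that "channels", "ptime" and "maxptime" never appear in the
   "a=fmtp" line, since they map to "a=rtpmap", "a=ptime" and
   "a=maxptime" respectively.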

7.2.1.  Offer/Answer Considerations

   The following considerations apply when using SDP Offer-Answer
   procedures to negotiate the use of G.719 payload in RTP:

   o  Each combination of the RTP payload transport format configuration
      parameters (interleaving and channels) is unique in its bit-
      pattern and not compatible with any other combination.  When
      creating an offer in an application desiring to use the more
      advanced features (interleaving, or more than one channel), the
      offerer is RECOMMENDED to also offer a payload type containing
      only the configuration with a single channel.  If multiple
      configurations are of interest to the application, they may all be
      offered; however, care should be taken not to offer too many
      payload types.  An SDP answerer MUST include, in the SDP answer
      for a payload type, the following parameters unmodified from the
      SDP offer (unless it removes the payload type): "interleaving";
      and "channels".  The SDP offerer and answerer MUST generate G.719
      packets as described by these parameters.

   o  The "int-delay" parameter is declarative.  For streams declared as
      sendrecv or recvonly, the value indicates the maximum initial
      delay the receiver will accept in the de-interleaving buffer.  For
      sendonly streams, the value is the amount of media time the sender
      desires to use.  The value SHOULD be copied into any response.

   o  In most cases, the parameters "maxptime" and "ptime" will not
      affect interoperability; however, the setting of the parameters
      can affect the performance of the application.  The SDP offer-
      answer handling of the "ptime" parameter is described in
      [RFC3264].  The "maxptime" parameter MUST be handled in the same
      way.

   o  The parameter "max-red" is a stream property parameter.  For
      sendonly or sendrecv unicast media streams, the parameter declares
      the limitation on redundancy that the stream sender will use.  For
      recvonly streams, it indicates the desired value for the stream
      sent to the receiver.  The answerer MAY change the value, but is
      RECOMMENDED to use the same limitation as the offer declares.  In
      the case of multicast, the offerer MAY declare a limitation; this
      SHALL be answered using the same value.  A media sender using this
      payload format is RECOMMENDED to always include the "max-red"
      parameter.  This information is likely to simplify the media
      stream handling in the receiver.  This is especially true if no
      redundancy will be used, in which case "max-red" is set to 0.

   o  Any unknown parameter in an offer SHALL be removed in the answer.

   o  The b= SDP parameter SHOULD be used to negotiate the maximum
      bandwidth to be used for the audio stream.  The offerer may offer
      a maximum rate and the answer may contain a lower rate.  If no b=
      parameter is present in the offer or answer, it implies a rate up
      to 128 kbps.

   o  The parameter "CBR" is a receiver capability.  For recvonly and
      sendrecv streams, it indicates the desired constant bit rate that
      the receiver wants to accept.  A sender MUST be able to send a
      constant bit rate stream, since this is a subset of the variable
      bit rate capability.  If the offer includes this parameter, the
      answerer SHOULD send at the constant bit rate if it is within the
      allowed call bit rate (b= parameter).  The answerer MAY change the
      value of CBR to a lower rate but it is RECOMMENDED to use the same
      rate.  The answerer MAY add this parameter if it wants to receive
      at a constant bit rate even if the offer did not include the CBR
      parameter.  In this case, the offerer SHOULD send at the constant
      bit rate but SHOULD be able to accept media at variable bit rate.
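
   A rough, non-normative Python sketch of how an answerer might derive
   the media type parameters of its answer from an offered payload type
   is given below.  It covers only a subset of the rules above: unknown
   parameters are removed, "interleaving" and "channels" are returned
   unmodified, and "CBR" may be lowered or added.  The function name and
   the dictionary representation of the parameters are assumptions made
   for illustration; "ptime", "maxptime" and the b= line are not
   handled.

```python
from typing import Dict, Optional

# Media type parameters known to this sketch; all others are unknown.
KNOWN = {"interleaving", "int-delay", "max-red", "channels", "CBR"}


def answer_params(offer: Dict[str, int],
                  cbr: Optional[int] = None) -> Dict[str, int]:
    """Derive the answer parameters for one accepted payload type."""
    # Unknown parameters are removed.  Known ones are copied as offered:
    # required for "interleaving" and "channels", and the recommended
    # behaviour for "int-delay", "max-red" and "CBR".
    answer = {name: value for name, value in offer.items() if name in KNOWN}
    if cbr is not None:
        offered = offer.get("CBR")
        # The answerer may add CBR, or lower it, but this sketch never
        # raises it above the offered value.
        answer["CBR"] = cbr if offered is None else min(cbr, offered)
    return answer


offer = {"interleaving": 7, "channels": 2, "max-red": 0,
         "CBR": 64000, "x-example": 1}
print(answer_params(offer))
print(answer_params(offer, cbr=48000))
```

   Here "x-example" stands for any parameter the answerer does not
   recognize; it is dropped from the answer.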

7.2.2.  Declarative SDP Considerations

   In declarative usage, like SDP in RTSP [RFC2326] or SAP [RFC2974],
   the parameters SHALL be interpreted as follows:

   o  The payload format configuration parameters (interleaving and
      channels) are all declarative, and a participant MUST use the
      configuration(s) that is provided for the session.  More than one
      configuration may be provided if necessary by declaring multiple
      RTP payload types; however, the number of types should be kept

   o  Any "maxptime" and "ptime" values should be selected with care to
      ensure that the session's participants can achieve reasonable

8.  IANA Considerations

   One media type (audio/g719) has been defined and needs registration
   in the media types registry; see Section 7.1.

9.  Congestion Control

   The general congestion control considerations for transporting RTP
   data apply; see RTP [RFC3550] and any applicable RTP profile like AVP
   [RFC3551].  However, the multi-rate capability of G.719 audio coding
   provides a mechanism that may help to control congestion, since the
   bandwidth demand can be adjusted (within the limits of the codec) by
   selecting a different encoding bit-rate.

   The number of frames encapsulated in each RTP payload highly
   influences the overall bandwidth of the RTP stream due to header
   overhead constraints.  Packetizing more frames in each RTP payload
   can reduce the number of packets sent and hence the header overhead,
   at the expense of increased delay and reduced error robustness.  If
   forward error correction (FEC) is used, the amount of FEC-induced
   redundancy needs to be regulated such that the use of FEC itself does
   not cause a congestion problem.
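
   The effect of packetization on bandwidth can be estimated with a
   short, non-normative Python sketch.  It assumes 40 bytes of
   IPv4/UDP/RTP headers per packet, a single 16-bit ToC field, and
   20 ms frames (which follows from 80 bytes per frame at 32 kbps).  The
   names used are hypothetical.

```python
FRAME_MS = 20                # 80 bytes per frame at 32 kbps implies 20 ms
HEADER_BYTES = 20 + 8 + 12   # assumption: IPv4 + UDP + RTP, no extensions
TOC_BYTES = 2                # assumption: one 16-bit ToC field per packet


def stream_bitrate(frame_bytes: int, frames_per_packet: int) -> float:
    """Total bit rate in bits per second, including header overhead."""
    packet_bytes = HEADER_BYTES + TOC_BYTES + frame_bytes * frames_per_packet
    packet_ms = frames_per_packet * FRAME_MS
    return packet_bytes * 8 * 1000 / packet_ms


rates = {n: stream_bitrate(80, n) for n in (1, 2, 4)}
for n, rate in rates.items():
    print(f"{n} frame(s) per packet: {rate / 1000:.1f} kbps")
```

   Under these assumptions a 32 kbps stream occupies 48.8 kbps with one
   frame per packet and 36.2 kbps with four frames per packet, at the
   cost of 60 ms of additional packetization delay.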

10.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in RTP
   [RFC3550] and any applicable profile such as AVP [RFC3551] or SAVP
   [RFC3711].  As this format transports encoded audio, the main
   security issues include confidentiality, integrity protection, and
   data origin authentication of the audio itself.  The payload format
   itself does not have any built-in security mechanisms.  Any suitable
   external mechanisms, such as SRTP [RFC3711], MAY be used.

   This payload format and the G.719 decoder do not exhibit any
   significant non-uniformity in the receiver-side computational
   complexity for packet processing, and thus are unlikely to pose a
   denial-of-service threat due to the receipt of pathological data.

10.1.  Confidentiality

   In order to ensure confidentiality of the encoded audio, all audio
   data bits MUST be encrypted.  There is less need to encrypt the
   payload header or the table of contents since they only carry
   information about the frame type.  This information could also be
   useful to a third party, for example, for quality monitoring.

   The use of interleaving in conjunction with encryption can have a
   negative impact on confidentiality, for a short period of time.
   Consider the following packets (in brackets) containing frame numbers
   as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a popular
   continuous diagonal interleaving pattern).  The originator wishes to
   deny some participants the ability to hear material starting at time
   16.  Simply changing the key on the packet with the timestamp at or
   after 16, and denying that new key to those participants, does not
   achieve this; frames 17, 18, and 21 have been supplied in prior
   packets under the prior key, and error concealment may make the audio
   intelligible at least as far as frame 18 or 19, and possibly further.
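
   The frames exposed in this example can be derived mechanically, as
   the following non-normative Python sketch shows.  It assumes that
   each packet carries the timestamp of its earliest frame; the function
   name is hypothetical.

```python
from typing import List


def leaked_frames(packets: List[List[int]], rekey_time: int) -> List[int]:
    """Frames at or after rekey_time that were sent under the old key."""
    leaked = []
    for frames in packets:
        timestamp = min(frames)  # assumption: stamped with earliest frame
        if timestamp < rekey_time:
            # This packet still uses the prior key.
            leaked.extend(f for f in frames if f >= rekey_time)
    return sorted(leaked)


packets = [[10, 14, 18], [13, 17, 21], [16, 20, 24]]
leaked = leaked_frames(packets, rekey_time=16)
print(leaked)
```

   The result is frames 17, 18 and 21, matching the text above.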

10.2.  Authentication and Integrity

   To authenticate the sender of the audio-stream, an external mechanism
   MUST be used.  It is RECOMMENDED that such a mechanism protects both
   the complete RTP header and the payload (audio and data bits).  Data
   tampering by a man-in-the-middle attacker could replace audio content
   and also result in erroneous depacketization/decoding that could
   lower the audio quality.

11.  Acknowledgements

   The authors would like to thank Roni Even and Anisse Taleb for their
   help with this draft.

12.  References

12.1.  Informative References

              Eggert, L. and G. Fairhurst, "Guidelines for Application
              Designers on Using Unicast UDP",
              draft-ietf-tsvwg-udp-guidelines-08 (work in progress),
              June 2008.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
              September 1997.

   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
              Streaming Protocol (RTSP)", RFC 2326, April 1998.

   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
              Announcement Protocol", RFC 2974, October 2000.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December 2005.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, February 2007.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

12.2.  Normative References

              ITU-T, "Specification : ITU-T G.719 extension for 20 kHz
              fullband audio", April 2008.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

Authors' Addresses

   Magnus Westerlund
   Ericsson AB
   Torshamnsgatan 21-23
   SE-164 83 Stockholm

   Phone: +46 8 7190000
   Email: magnus.westerlund@ericsson.com

   Ingemar Johansson
   Ericsson AB
   Laboratoriegrand 11
   SE-971 28 Lulea

   Phone: +46 73 0783289
   Email: ingemar.s.johansson@ericsson.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
