Internet Engineering Task Force                       M. Reha Civanlar
INTERNET-DRAFT                                           Glenn L. Cash
File: draft-civanlar-bmpeg-01.txt                      Barry G. Haskell
Expires in six months                                AT&T Labs-Research
                                                         February, 1997

                  RTP Payload Format for Bundled MPEG
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).
Distribution of this memo is unlimited.
Abstract
This document describes a payload type for bundled, MPEG-2 encoded
video and audio data to be used with RTP, version 2. Bundling has
some advantages for this payload type, particularly when it is used
for video-on-demand applications. This payload type is to be used
when its advantages are important enough to sacrifice the modularity
of having separate audio and video streams.

A technique to improve packet loss resilience, based on "out-of-band"
transmission of vital, MPEG-2 specific information, is also described.
1. Introduction
This document describes a bundled packetization scheme for MPEG-2
encoded audio and video streams using the Real-time Transport Protocol
(RTP), version 2 [1].
The MPEG-2 International standard consists of three layers: audio,
video and systems [2]. The audio and the video layers define the
syntax and semantics of the corresponding "elementary streams." The
systems layer supports synchronization and interleaving of multiple
compressed streams, buffer initialization and management, and time
identification. RFC 2038 [3] describes packetization techniques for
transporting individual audio and video elementary streams, as well as
the transport stream defined at the systems layer, over RTP.
The bundled packetization scheme is needed because it has several
advantages over other schemes for some important applications,
including video-on-demand (VOD), where audio and video are always used
together. Its advantages over independent packetization of audio and
video are:
1. Uses a single port per "program" (i.e. bundled A/V).
This may increase the number of streams that can be served,
e.g., from a VOD server. Also, it eliminates the performance
penalty incurred on the client side when separate ports are
used for the audio and video streams.
2. Provides implicit synchronization of audio and video.
The server need not do anything else (e.g. generate
RTCP packets) for this purpose. This is particularly
convenient when the A/V data is stored in an interleaved
format at the server and no stream other than the bundled
A/V is to be transmitted during the session.
3. Reduces the header overhead. Since large packets
increase the effects of losses and delay, audio-only
packets would need to be small, which increases the
relative header overhead. A bundled A/V format can
provide about a 1% overall overhead reduction.
Considering the high bitrates used for MPEG-2 encoded
material, e.g. 4 Mbps, the bits saved, e.g. 40 Kbps, may
provide a noticeable audio or video quality improvement.
4. May reduce overall receiver buffer size. Audio and
video streams may experience different delays when
transmitted separately. The receiver buffers need to be
designed for the longest of these delays. For example,
let's assume that using two buffers, each with a size B,
is sufficient with probability P when each stream is
transmitted individually. The probability that the same
buffer size will be sufficient when both streams need to
be received is P times the conditional probability of B
being sufficient for the second stream given that it was
sufficient for the first one. This conditional probability is
generally less than one, requiring a larger buffer size to achieve
the same probability level.
And, the advantages over packetization of the MPEG-2 transport stream
are:

1. Reduced overhead. The bundled packets do not carry systems
layer information, which is redundant for RTP (essentially
the two address similar issues).
2. Easier error recovery. Because the packetization is
structured consistently with the application level framing
(ALF) principle, loss concealment and error recovery can be
made simpler and more effective.
2. Encapsulation of Bundled MPEG Video and Audio
Video encapsulation follows the rules described in [3] with the addition
of the following:
   each packet must contain an integral number of video slices.
The video data is followed by a sufficient number of complete audio
frames to cover the duration of the video segment included in the
packet. For example, if the first packet contains three slices of
video, each 1/900 seconds long, and Layer I audio coding is used at a
44.1 kHz sampling rate, only one audio frame covering 384/44100
seconds of audio need be included in this packet. Since the length of
this audio frame (8.71 msec) is longer than that of the video segment
contained in this packet (3.33 msec), the next few packets may not
contain any audio frames until a packet is reached in which the
covered video time extends beyond the end of the previously
transmitted audio frames. Alternatively, this proposal allows the
latest audio frame to be repeated in "no-audio" packets for packet
loss resilience.
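
As an illustration only (not part of the payload format), the short
Python sketch below computes how many whole audio frames to append to
a packet, assuming MPEG Layer I audio (384 samples per frame) at a
44.1 kHz sampling rate as in the example above; the function name and
bookkeeping are hypothetical.

   # Illustrative sketch only; not part of the payload specification.
   # Assumes MPEG Layer I audio (384 samples per frame) at 44.1 kHz.
   AUDIO_SAMPLES_PER_FRAME = 384
   AUDIO_SAMPLE_RATE = 44100            # Hz

   def audio_frames_to_bundle(video_covered_s, audio_covered_s):
       """Number of whole audio frames to append to this packet so
       that the transmitted audio covers at least the transmitted
       video."""
       frame_dur = AUDIO_SAMPLES_PER_FRAME / AUDIO_SAMPLE_RATE  # ~8.71 ms
       frames = 0
       while audio_covered_s + frames * frame_dur < video_covered_s:
           frames += 1
       return frames

   # First packet of the example above: 3 slices of 1/900 s video
   # (3.33 ms) and no audio sent yet -> one audio frame is bundled.
   print(audio_frames_to_bundle(3 / 900.0, 0.0))   # prints 1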
2.1. RTP Fixed Header for BMPEG Encapsulation
The following RTP header fields are used:
Payload Type: A distinct payload type number should be assigned to
BMPEG.
M bit: Set for packets containing the end of a picture.
Timestamp: A 32-bit, 90 kHz timestamp representing the transmission
time of the MPEG picture; it is monotonically increasing and is the
same for all packets belonging to the same picture. For packets that
contain only a sequence, extension and/or GOP header, the timestamp is
that of the subsequent picture.
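
The fragment below, a sketch only, shows how these fixed-header fields
might be filled for a BMPEG packet; the packet object, its field names,
and the payload type number 96 are assumptions made for illustration
and are not taken from this specification or any particular RTP
implementation.

   BMPEG_PAYLOAD_TYPE = 96        # assumed dynamic payload type number

   def fill_rtp_fixed_header(pkt, picture_time_s, end_of_picture):
       pkt.payload_type = BMPEG_PAYLOAD_TYPE
       pkt.marker = 1 if end_of_picture else 0     # M bit
       # 90 kHz clock; identical for all packets of the same picture.
       pkt.timestamp = int(picture_time_s * 90000) & 0xFFFFFFFF
       return pkt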
2.2. BMPEG Specific Header:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|MBZ|R|N| P |   Audio Length    |         Audio Offset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
MBZ: (2 bits) Reserved for future use. Must be set to zero in this
version.

R: (1 bit) Redundant audio. Set if the audio frame contained in the
packet is a repetition of the last audio frame.

N: (1 bit) Header data changed. Set if any part of the video sequence,
extension, GOP or picture header data differs from that of the
previously sent headers. It is reset when all of the header data is
repeated.

P: (2 bits) Picture type. I (0), P (1), B (2).

Audio Length: (10 bits) Length of the audio data in this packet, in
bytes.

Audio Offset: (16 bits) The offset between the start of the audio
frame and the start of the video segment in this packet, in number of
audio samples.
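
As a non-normative illustration, the sketch below packs and unpacks
the 32-bit BMPEG-specific header defined above using the bit layout
MBZ:2, R:1, N:1, P:2, Audio Length:10, Audio Offset:16; the function
names are invented for this example.

   import struct

   def pack_bmpeg_header(r, n, p, audio_length, audio_offset):
       # MBZ (2 bits) is left as zero; fields follow the layout above.
       assert 0 <= p <= 2
       assert 0 <= audio_length < (1 << 10)
       assert 0 <= audio_offset < (1 << 16)
       word = (r << 29) | (n << 28) | (p << 26) \
              | (audio_length << 16) | audio_offset
       return struct.pack("!I", word)            # network byte order

   def unpack_bmpeg_header(data):
       (word,) = struct.unpack("!I", data[:4])
       return {
           "R": (word >> 29) & 0x1,              # redundant audio
           "N": (word >> 28) & 0x1,              # header data changed
           "P": (word >> 26) & 0x3,              # picture type I/P/B
           "audio_length": (word >> 16) & 0x3FF, # bytes of audio
           "audio_offset": word & 0xFFFF,        # in audio samples
       }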
3. Out-of-band Transmission of the "High Priority" Information
In MPEG encoded video, loss of the header information, which includes
the sequence, GOP, and picture headers and the corresponding
extensions, causes severe degradation in the decoded video. When possible,
dependable transmission of the header information to the receivers can
improve the loss resiliency of MPEG video significantly [4]. RFC 2038
describes a payload type where the header information can be repeated in
each RTP packet. Although this is a straightforward approach, it may
increase the overhead.
The "data partitioning" method in MPEG-2 defines the syntax and
semantics for partitioning an MPEG-2 encoded video bitstream into "high
priority" and "low priority" parts. If the "high priority" (HP) part is
selected to contain only the header information, it is less than two
percent of the video data and can be transmitted before the start of the
real-time transmission using a reliable protocol. In order to
synchronize the HP data with the corresponding real-time stream, the
initial value of the timestamp for the real-time stream may be inserted
at the beginning of the HP data.
Alternatively, the HP data may be transmitted along with the A/V data
using layered multimedia transmission techniques for RTP [5].
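
A minimal sketch of the synchronization step described above is shown
below, assuming the HP bitstream is simply prefixed with the initial
32-bit RTP timestamp before it is sent over a reliable channel; the
framing and function names are assumptions made for illustration.

   import struct

   def frame_hp_data(initial_rtp_timestamp, hp_bitstream):
       # Prefix the HP data with the initial timestamp of the
       # real-time stream so the receiver can line the two up.
       return struct.pack("!I", initial_rtp_timestamp & 0xFFFFFFFF) \
              + hp_bitstream

   def parse_hp_data(blob):
       (initial_ts,) = struct.unpack("!I", blob[:4])
       return initial_ts, blob[4:]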
Appendix 1. Error Recovery
Packet losses can be detected from a combination of the sequence number
and the timestamp fields of the RTP fixed header. The extent of the loss
can be determined from the timestamp, the slice number and the
horizontal location of the first slice in the packet. The slice number
and the horizontal location can be determined from the slice header and
the first macroblock address increment, which are located at fixed bit
positions.
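
The detection step can be sketched as follows; the receiver state and
function name below are illustrative and not defined by this payload
format.

   def classify_loss(prev_seq, prev_ts, seq, ts):
       # RTP sequence numbers live in a 16-bit space.
       gap = (seq - prev_seq) & 0xFFFF
       if gap <= 1:
           return "no_loss"
       # Same timestamp: the lost packets belonged to the current
       # picture; otherwise the loss crossed a picture boundary.
       if ts == prev_ts:
           return "loss_within_current_picture"
       return "loss_crossing_picture_boundary"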
If the lost data consists of slices all from the same picture, the new
data following the loss can simply be given to the video decoder, which
will normally repeat the missing pixels from a previous picture. The
next audio frame must be delayed by the duration of the lost video
segment.
If the new data received after a loss is from the next picture and the
N bit is not set, the previously received headers for the particular
picture type (determined from the P bits) can be given to the video
decoder, followed by the new data. If N is set, discarding data until a
new picture start code is advisable, unless headers are available from
previously received HP data. In both cases the audio needs to be
delayed appropriately.
If data for more than one picture is lost and HP data is not available,
resynchronization to a new video sequence header is advisable.
In all cases of large packet losses, if the HP data is available,
appropriate portions of it can be given to the video decoder and the
received data can be used irrespective of the N bit value or the number
of lost pictures.
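
The recovery choices described in this appendix can be summarized,
purely as an illustration, by the following decision sketch; the action
strings and function name are invented for this example, and feeding
headers or data to the decoder is left to the application.

   def recovery_action(pictures_lost, n_bit, hp_available):
       if hp_available:
           # With HP data the received data can be used regardless of
           # the N bit value or the number of lost pictures.
           return "replay_hp_headers_then_use_received_data"
       if pictures_lost == 0:
           # Loss confined to one picture: the decoder conceals it.
           return "pass_new_data_to_decoder"
       if pictures_lost == 1 and n_bit == 0:
           return "replay_stored_headers_for_picture_type"
       if pictures_lost == 1 and n_bit == 1:
           return "discard_until_next_picture_start_code"
       # More than one picture lost and no HP data available.
       return "resynchronize_at_next_sequence_header"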
Appendix 2. Resynchronization
As described in [3], the use of frequent video sequence headers makes
it possible to join a program at arbitrary times. It also reduces the
resynchronization time after severe losses.
References:
[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications,"
RFC 1889, January 1996.
[2] ISO/IEC International Standard 13818, "Generic coding of moving
pictures and associated audio information," November 1994.
[3] D. Hoffman, G. Fernando, S. Kleiman, V. Goyal, "RTP Payload Format
for MPEG1/MPEG2 Video," RFC 2038, October 1996.
[4] M. R. Civanlar, G. L. Cash, "A practical system for MPEG-2 based
video-on-demand over ATM packet networks and the WWW," Signal
Processing: Image Communication, no. 8, pp. 221-227, Elsevier, 1996.
[5] M. F. Speer, S. McCanne, "RTP Usage with Layered Multimedia
Streams," Internet Draft, draft-speer-avt-layered-video-02.txt,
December 1996.
Authors' Addresses:
M. Reha Civanlar
Glenn L. Cash
Barry G. Haskell
AT&T Labs-Research
101 Crawfords Corner Road
Holmdel, NJ 07733
USA
e-mail: civanlar|glenn|bgh@research.att.com