Network Working Group J. Lennox
Internet-Draft Vidyo
Intended status: Informational K. Gross
Expires: October 13, 2013 AVA
April 11, 2013

A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources


The terminology about, and associations among, Real-Time Transport Protocol (RTP) sources can be complex and somewhat opaque. This document describes a number of existing and proposed relationships among RTP sources, and attempts to define common terminology for discussing protocol entities and their relationships.

This document is still very rough, but is submitted in the hopes of making future discussion productive.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on October 13, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents ( in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction

Terminology and understanding about sources in RTP, and how they relate, is very confused. Different protocols use the same terms differently, and complexities addressed at one layer are often glossed over or ignored at another.

This document attempts to provide some clarity by reviewing the semantics of various aspects of sources in RTP. As an organizing mechanism, it approaches this by describing various ways that RTP sources can be grouped and associated together.

2. RTP Terminology

2.1. Synchronization Source

In RTP, the smallest unit of media transport is the "synchronization source" (SSRC). Defined in [RFC3550], a synchronization source is the source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header of every RTP packet. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback. In the most common cases, a synchronization source is a single flow of encoded media that can be fed to a single decoding process.

In some more complex cases, a synchronization source is also used to encode repair flows and sub-flows of a layered encoding. See Section 4.2.2 and Section 4.2.3 below.

(Note: as the definition above mentions, "synchronization source" in [RFC3550] technically refers to the *source* of the stream of RTP packets, not the packets themselves. A better term for the stream could be helpful, and the authors are open to one; however, finding a good term that doesn't collide with terms used in other standards is very difficult. In particular, both "stream" and "flow" are overloaded enough that their use would probably confuse more than it clarifies.)

2.2. RTP Session

The RTP protocol transmits sources in an "RTP session". An RTP session is an association among a group of participants communicating with RTP. It is a group communications channel which carries a number of synchronization sources. Within an RTP session, every participant finds out meta-data and control information (over RTCP) about all the synchronization sources in the RTP session, all synchronization source identifiers are unique in the RTP session, and the bandwidth of the RTCP control channel is shared by all the synchronization sources in the session.

2.3. Contributing Source

In an RTP session, in some cases, the media flows of some synchronization sources are not forwarded by RTP middleboxes (known as mixers and translators). In the case of mixers, each of the synchronization sources that are used to create a mixed stream of media becomes visible instead as a "contributing source" (CSRC), whose media is no longer directly visibile in some parts of the RTP session, but whose meta-data is still known by all participants through RTCP.

2.4. Multimedia Session

A group of RTP sessions can be grouped into a "multimedia session", a group of concurrent, associated RTP sessions among a common group of participants. A multimedia session defines logical relationships among sources that appear in multiple RTP sessions.

2.5. Media Source

At a higher level than this is the "media source", a single piece of media that can be indepentently rendered. While in the simplest cases a synchronization source uniquely encodes a media source, many more complicated cases, discussed in Section 4.2.2 and Section 4.2.3 below, associate multiple synchronization sources together to provide various methods and options for encoding a media source. (In some of these cases, these multiple synchronization sources are even sent in separate RTP sessions, though always in the same multimedia session.)

Unlike the other terminology defined in this section, "media source" is not a well-established term; existing protocols use a number of different terms to define similar concepts, described below. ("Media source" may not be the best term in for this concept; the authors of this document are open to better ones.)

3. Terminology used other protocols and groups

3.1. SDP

The Session Description Protocol [RFC4566] provides a mechanism for describing -- and, with SDP Offer/Answer [RFC3264], negotiation -- multimedia sessions.

3.1.1. SDP Multimedia Session

SDP uses the term "multimedia session" in essentially the same manner as RTP. Media Stream

"Media stream" is perhaps one of the most confusing terms in SDP. It is never explicitly defined in [RFC4566]; its meaning can only be inferred from descriptions of its properties, the SDP syntax used to describe it, and its usage in other protocols.

In SDP, a media stream is described my a "media description", a block of SDP starting with an "m=" line and followed by other SDP fields which are scoped to that media description.

When SDP is used to describe a media stream that uses RTP, the attributes used (in the core SDP protocol) all define attributes of an RTP session. Furthermore, [RFC4566] does note that "Media streams can be many-to-many." Thus, when SDP describes a media stream that uses RTP, the RTP-level protocol element that is described is an RTP session, which can carry any number of synchronization sources (and thus media sources).

However, for a number of reasons, many implementations of SDP, notably SIP devices, instead interpret SDP media streams as containing only a single media source in each direction (assuming unicast flows). There are a number of reasons this implementation decision was chosen, including the lack of definition of "media stream" in the SDP specification; the poorly chosen name (the term "media stream" seems like something that should refer to what this document calls a "media source"); the fact that core SDP, and offer/answer, provide no description or negotiation mechanisms for anything more fine-grained than a media source; and the fact that most existing implementations have no need for more than one media source and received of each media type.

Other than in discussions of other existing terminology, this document attempts to avoid the use of the word "stream".

3.1.2. CLUE

The CLUE working group is ongoing work in the IETF to define protocols and mechanism for interoperable telepresence systems. Its architecture and terms are defined in [I-D.ietf-clue-framework]. CLUE Capture Scene

In CLUE, a "Capture Scene" is a structure representing a spatial region containing one or more Capture Devices, each capturing media representing a portion of the region. The spatial region represented by a scene may or may not correspond to a real region in physical space, such as a room. A capture scene includes attributes and one or more capture scene entries, with each entry including one or more media captures. CLUE Capture Scene Entry

In CLUE, a "Capture Scene Entry" is a list of Media Captures of the same media type that together form one way to represent the entire Capture scene. CLUE Capture

In CLUE, a "Media Capture" is a source of Media, such as from one or more Capture Devices or constructed from other Media streams.

The "Capture" is CLUE's term that is most closely equivalent to what this document calls a "media source"; however, some usages (such as "dynamic sources" of "switched captures") may be more complex, and are still actively being defined. CLUE Capture Encoding

In CLUE, a "Capture Encoding" is a specific encoding of a Media Capture, to be sent by a Media Provider to a Media Consumer via RTP.

In simulcast scenearios (see Section 4.2.2), multiple capture encodings can be used simultaneously to transmit a single capture.

3.1.3. WebRTC

WebRTC is ongoing work in the IETF and W3C to define mechanisms and APIs to allow Javascript applications in web browsers to participate in multimedia sessions. An overview can be found in [I-D.ietf-rtcweb-overview]. WebRTC RtcMediaStreamTrack


An "RtcMediaStreamTrack" is an object in WebRTC that maps most closely to what this document calls a "media source", though some of the details remain to be defined. However, as WebRTC is defining an API, the RtcMediaStreamTrack object also includes a good deal of meta-data about the media source, such as an ID, enabled/disabled state, and so forth. WebRTC RtcMediaStream

An "RtcMediaStream" in WebRTC is an object that groups a set of RtcMediaStreamTracks that are synchronized with each other.

4. Grouping of RTP sources

Synchronization sources can have a large variety of relationships among them. These relationships can apply both between sources within a single session, and between sources that occur in multiple sessons.

Ways of relating synchronization sources typically involve groups: a set of synchronizaiton sources has some relationship that applies to all those in the group, and no others. (Relationships that involve arbitrary non-grouping associations among synchronization sources, such that e.g., A relates to B and B to C, but A and C are unrelated, are uncommon if not nonexistant.)

In many cases, the semantics of groups are not simply that the sources form an undifferentiated group, but rather that members of the group have certain roles.

4.1. Overview of hierarchical relationships among groups

Many (though not all) associations among groups can be described hierarchically.

These hierarchichal groups can be divided into three large categories: associations among independent media sources; alternative representations of media sources; and mechanisms for robustness and repair of a particular representation of a media source.

4.2. Existing and proposed source group semantics

We can divide group semantics into several broad categories.

(TODO: each of the items below needs to be described in detail.)

4.2.1. Relationships among media sources

Synchronization Contexts -- things identified by a CNAME.

CLUE Scenes (and their substructures, down to Captures.).

WebRTC MediaStreams.

Clock source signaling.

4.2.2. Alternative representations of media sources


Multi-Stream Transmission of Layered Codecs.

The original FID description in RFC 5888 (grouping) Section 8.

4.2.3. Robustness and Repair

Forward Error Correction (FEC). (But in some cases this can also be across multiple media sources!)


4.2.4. Reporting Associations

RTP also uses SSRC values to identify receivers of media, the source of feedback about synchronization sources (and thus, also media sources). In some cases these receivers are also grouped. However, these associations are not associations of senders of media, and thus are not the same type of relationship as the ones described above.

Inter-Domain Media Synchronization

Reporting groups.

5. Existing and proposed mechanisms for describing groups

SDP groups

SDP source groups




Application-specific SDP attributes

SDP "number of ports" field to create multiple sessions with SSRC alignment.

6. Security Considerations

Different for each way of grouping. Is there anything we can generalize?

Hopefully having a well-defined common terminology and understanding of the complexities of the RTP architecture will help lead us to better standards, avoiding security problems.

7. Open Issues

Much of the terminology is still a matter of dispute.

It might be useful to distinguish between a single endpoint's view of a source, or RTP session, or multimedia session, versus the full set of sessions and every endpoint that's communicating in them, with the signaling that established them.

(Sure to be many more...)

8. IANA Considerations

This document makes no request of IANA.

9. Informative References

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.
[RFC4566] Handley, M., Jacobson, V. and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.
[RFC6222] Begen, A., Perkins, C. and D. Wing, "Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)", RFC 6222, April 2011.
[I-D.ietf-clue-framework] Duckworth, M., Pepperell, A. and S. Wenger, "Framework for Telepresence Multi-Streams", Internet-Draft draft-ietf-clue-framework-09, February 2013.
[I-D.ietf-rtcweb-overview] Alvestrand, H., "Overview: Real Time Protocols for Brower-based Applications", Internet-Draft draft-ietf-rtcweb-overview-06, February 2013.

Appendix A. RTP terminology

(TODO: This list of definitions needs to be cleaned up and expanded, if it's useful. Possibly its content should simply be merged into the appropriate sections of the main document, instead.)

A.1. Capture

"Capture" is the term used in the IETF CLUE Telepresence Framework to refer to what this document calls a "media source".


An RTP canonical name (RTP CNAME or CNAME) is a persistent transport-level identifier for an RTP endpoint [RTPendpoint]. CNAMEs must be unique within an RTP session [RTPsession]. The RTCP CNAME can be either persistent across different RTP sessions for an RTP endpoint or unique per session, meaning that an RTP endpoint chooses a different CNAME for each session.[RFC6222] receivers [Receiver] require the CNAME to keep track of each participant. Receivers may also require the CNAME to associate multiple data streams from a given participant in a set of related RTP sessions, for example to synchronize audio and video.

A.3. End system

An application that generates the content to be sent in RTP packets and/or consumes the content of received RTP packets. An end system can produce one or more synchronization sources [SSRC] in a particular RTP session [RTPsession].[RFC3550]

A.4. Group

A.5. Multimedia session

A set of concurrent, associated RTP sessions [RTPsession] among a common group of participants.[RFC3550]

A.6. Multi-stream transmission

Multi-stream transmission (MST) is a mechanism by which different portions of a layered encoding of a media stream are sent using separate synchronization sources (sometimes in separate RTP sessons). MSTs are useful for receiver control of layered media.

A.7. Receiver

A.8. Repair stream

A repair stream is a secondary stream used for error recovery. The repair mecahnisms available include

A.9. RTP endpoint

A.10. RTP session

An RTP session or simply, session is an association among a set of participants communicating with RTP.[RFC3550]

A.11. Scene

A scene is associated with IETF CLUE and describes a collection of captures [Capture] and the metadata required to make associations (e.g. relative spacial orintation of cameras) between the captures.

A.12. Sender

A.13. Simulcast

A.14. Source

A synchronization source (SSRC) is the source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback.[RFC3550]

A.15. Media stream

A.16. Switched capture

A switched capture is used in teleconference appliations and is a stream [AppMediastream] whose content is selected from one of a selection of sources. The source is selected based on an algorithm in the equipment producing the stream for example, a closeup of the person currently speaking in a conference room.

A.17. Track

A track is associated with W3C WebRTC and IETF RTCWeb projects. A track is defined by SSRC and CNAME. A track is the smallest media entity that may be independently rendered. A single audio channel is a track.

Authors' Addresses

Jonathan Lennox Vidyo, Inc. 433 Hackensack Avenue Seventh Floor Hackensack, NJ 07601 US EMail:
Kevin Gross AVA Networks, LLC Boulder, CO US EMail: