draft-ietf-avtext-rtp-grouping-taxonomy-00.txt   draft-ietf-avtext-rtp-grouping-taxonomy-01.txt 
Network Working Group J. Lennox Network Working Group J. Lennox
Internet-Draft Vidyo Internet-Draft Vidyo
Intended status: Informational K. Gross Intended status: Informational K. Gross
Expires: May 10, 2014 AVA Expires: August 18, 2014 AVA
S. Nandakumar S. Nandakumar
G. Salgueiro G. Salgueiro
Cisco Systems Cisco Systems
B. Burman B. Burman
Ericsson Ericsson
November 06, 2013 February 14, 2014
A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
Protocol (RTP) Sources Protocol (RTP) Sources
draft-ietf-avtext-rtp-grouping-taxonomy-00 draft-ietf-avtext-rtp-grouping-taxonomy-01
Abstract Abstract
The terminology about, and associations among, Real-Time Transport The terminology about, and associations among, Real-Time Transport
Protocol (RTP) sources can be complex and somewhat opaque. This Protocol (RTP) sources can be complex and somewhat opaque. This
document describes a number of existing and proposed relationships document describes a number of existing and proposed relationships
among RTP sources, and attempts to define common terminology for among RTP sources, and attempts to define common terminology for
discussing protocol entities and their relationships. discussing protocol entities and their relationships.
Status of This Memo Status of This Memo
skipping to change at page 1, line 41 skipping to change at page 1, line 41
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 10, 2014. This Internet-Draft will expire on August 18, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Media Chain . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Media Chain . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1. Physical Stimulus . . . . . . . . . . . . . . . . . . 7 2.1.1. Physical Stimulus . . . . . . . . . . . . . . . . . . 8
2.1.2. Media Capture . . . . . . . . . . . . . . . . . . . . 7 2.1.2. Media Capture . . . . . . . . . . . . . . . . . . . . 8
2.1.3. Raw Stream . . . . . . . . . . . . . . . . . . . . . 7 2.1.3. Raw Stream . . . . . . . . . . . . . . . . . . . . . 8
2.1.4. Media Source . . . . . . . . . . . . . . . . . . . . 8 2.1.4. Media Source . . . . . . . . . . . . . . . . . . . . 9
2.1.5. Source Stream . . . . . . . . . . . . . . . . . . . . 9 2.1.5. Source Stream . . . . . . . . . . . . . . . . . . . . 10
2.1.6. Media Encoder . . . . . . . . . . . . . . . . . . . . 9 2.1.6. Media Encoder . . . . . . . . . . . . . . . . . . . . 10
2.1.7. Encoded Stream . . . . . . . . . . . . . . . . . . . 10 2.1.7. Encoded Stream . . . . . . . . . . . . . . . . . . . 11
2.1.8. Dependent Stream . . . . . . . . . . . . . . . . . . 10 2.1.8. Dependent Stream . . . . . . . . . . . . . . . . . . 11
2.1.9. Media Packetizer . . . . . . . . . . . . . . . . . . 10 2.1.9. Media Packetizer . . . . . . . . . . . . . . . . . . 12
2.1.10. Packet Stream . . . . . . . . . . . . . . . . . . . . 11 2.1.10. Packet Stream . . . . . . . . . . . . . . . . . . . . 12
2.1.11. Media Redundancy . . . . . . . . . . . . . . . . . . 12 2.1.11. Media Redundancy . . . . . . . . . . . . . . . . . . 13
2.1.12. Redundancy Packet Stream . . . . . . . . . . . . . . 12 2.1.12. Redundancy Packet Stream . . . . . . . . . . . . . . 14
2.1.13. Media Transport . . . . . . . . . . . . . . . . . . . 12 2.1.13. Media Transport . . . . . . . . . . . . . . . . . . . 14
2.1.14. Received Packet Stream . . . . . . . . . . . . . . . 15 2.1.14. Received Packet Stream . . . . . . . . . . . . . . . 16
2.1.15. Received Redundancy Packet Stream . . . . . . . . . . 15 2.1.15. Received Redundancy Packet Stream . . . . . . . . . . 16
2.1.16. Media Repair . . . . . . . . . . . . . . . . . . . . 15 2.1.16. Media Repair . . . . . . . . . . . . . . . . . . . . 16
2.1.17. Repaired Packet Stream . . . . . . . . . . . . . . . 15 2.1.17. Repaired Packet Stream . . . . . . . . . . . . . . . 17
2.1.18. Media Depacketizer . . . . . . . . . . . . . . . . . 15 2.1.18. Media Depacketizer . . . . . . . . . . . . . . . . . 17
2.1.19. Received Encoded Stream . . . . . . . . . . . . . . . 15 2.1.19. Received Encoded Stream . . . . . . . . . . . . . . . 17
2.1.20. Media Decoder . . . . . . . . . . . . . . . . . . . . 16 2.1.20. Media Decoder . . . . . . . . . . . . . . . . . . . . 17
2.1.21. Received Source Stream . . . . . . . . . . . . . . . 16 2.1.21. Received Source Stream . . . . . . . . . . . . . . . 18
2.1.22. Media Sink . . . . . . . . . . . . . . . . . . . . . 16 2.1.22. Media Sink . . . . . . . . . . . . . . . . . . . . . 18
2.1.23. Received Raw Stream . . . . . . . . . . . . . . . . . 16 2.1.23. Received Raw Stream . . . . . . . . . . . . . . . . . 18
2.1.24. Media Render . . . . . . . . . . . . . . . . . . . . 16 2.1.24. Media Render . . . . . . . . . . . . . . . . . . . . 18
2.2. Communication Entities . . . . . . . . . . . . . . . . . 17 2.2. Communication Entities . . . . . . . . . . . . . . . . . 19
2.2.1. End Point . . . . . . . . . . . . . . . . . . . . . . 17 2.2.1. End Point . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2. RTP Session . . . . . . . . . . . . . . . . . . . . . 17 2.2.2. RTP Session . . . . . . . . . . . . . . . . . . . . . 19
2.2.3. Participant . . . . . . . . . . . . . . . . . . . . . 18 2.2.3. Participant . . . . . . . . . . . . . . . . . . . . . 20
2.2.4. Multimedia Session . . . . . . . . . . . . . . . . . 19 2.2.4. Multimedia Session . . . . . . . . . . . . . . . . . 20
2.2.5. Communication Session . . . . . . . . . . . . . . . . 19 2.2.5. Communication Session . . . . . . . . . . . . . . . . 21
3. Relations at Different Levels . . . . . . . . . . . . . . . . 20 3. Relations at Different Levels . . . . . . . . . . . . . . . . 22
3.1. Media Source Relations . . . . . . . . . . . . . . . . . 20 3.1. Media Source Relations . . . . . . . . . . . . . . . . . 22
3.1.1. Synchronization Context . . . . . . . . . . . . . . . 20 3.1.1. Synchronization Context . . . . . . . . . . . . . . . 22
3.1.2. End Point . . . . . . . . . . . . . . . . . . . . . . 21 3.1.2. End Point . . . . . . . . . . . . . . . . . . . . . . 23
3.1.3. Participant . . . . . . . . . . . . . . . . . . . . . 22 3.1.3. Participant . . . . . . . . . . . . . . . . . . . . . 24
3.1.4. WebRTC MediaStream . . . . . . . . . . . . . . . . . 22 3.1.4. WebRTC MediaStream . . . . . . . . . . . . . . . . . 24
3.2. Packetization Time Relations . . . . . . . . . . . . . . 22 3.2. Packetization Time Relations . . . . . . . . . . . . . . 24
3.2.1. Single Stream Transport of SVC . . . . . . . . . . . 23 3.2.1. Single and Multi-Session Transmission of SVC . . . . 24
3.2.2. Multi-Channel Audio . . . . . . . . . . . . . . . . . 23 3.2.2. Multi-Channel Audio . . . . . . . . . . . . . . . . . 25
3.2.3. Redundancy Format . . . . . . . . . . . . . . . . . . 23 3.2.3. Redundancy Format . . . . . . . . . . . . . . . . . . 25
3.3. Packet Stream Relations . . . . . . . . . . . . . . . . . 24 3.3. Packet Stream Relations . . . . . . . . . . . . . . . . . 26
3.3.1. Simulcast . . . . . . . . . . . . . . . . . . . . . . 24 3.3.1. Simulcast . . . . . . . . . . . . . . . . . . . . . . 27
3.3.2. Layered Multi-Stream Transmission . . . . . . . . . . 25 3.3.2. Layered Multi-Stream . . . . . . . . . . . . . . . . 28
3.3.3. Robustness and Repair . . . . . . . . . . . . . . . . 26 3.3.3. Robustness and Repair . . . . . . . . . . . . . . . . 29
3.3.4. Packet Stream Separation . . . . . . . . . . . . . . 29 3.3.4. Packet Stream Separation . . . . . . . . . . . . . . 32
3.4. Multiple RTP Sessions over one Media Transport . . . . . 30 3.4. Multiple RTP Sessions over one Media Transport . . . . . 33
4. Topologies and Communication Entities . . . . . . . . . . . . 30 4. Topologies and Communication Entities . . . . . . . . . . . . 33
4.1. Point-to-Point Communication . . . . . . . . . . . . . . 31 4.1. Point-to-Point Communication . . . . . . . . . . . . . . 33
4.2. Central Conferencing . . . . . . . . . . . . . . . . . . 32 4.2. Centralized Conferencing . . . . . . . . . . . . . . . . 34
4.3. Full Mesh Conferencing . . . . . . . . . . . . . . . . . 33 4.3. Full Mesh Conferencing . . . . . . . . . . . . . . . . . 37
4.4. Source-Specific Multicast . . . . . . . . . . . . . . . . 36 4.4. Source-Specific Multicast . . . . . . . . . . . . . . . . 39
5. Security Considerations . . . . . . . . . . . . . . . . . . . 37 5. Security Considerations . . . . . . . . . . . . . . . . . . . 41
6. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 38 6. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 41
7. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 38 7. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 41
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 38 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 41
9.1. Normative References . . . . . . . . . . . . . . . . . . 38 9.1. Normative References . . . . . . . . . . . . . . . . . . 42
9.2. Informative References . . . . . . . . . . . . . . . . . 38 9.2. Informative References . . . . . . . . . . . . . . . . . 42
Appendix A. Changes From Earlier Versions . . . . . . . . . . . 40 Appendix A. Changes From Earlier Versions . . . . . . . . . . . 44
A.1. Modifications Between Version -02 and -03 . . . . . . . . 40 A.1. Modifications Between WG Version -00 and -03 . . . . . . 44
A.2. Modifications Between Version -01 and -02 . . . . . . . . 40 A.2. Modifications Between Version -02 and -03 . . . . . . . . 44
A.3. Modifications Between Version -00 and -01 . . . . . . . . 40 A.3. Modifications Between Version -01 and -02 . . . . . . . . 44
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 A.4. Modifications Between Version -00 and -01 . . . . . . . . 44
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 44
1. Introduction 1. Introduction
The existing taxonomy of sources in RTP is often regarded as The existing taxonomy of sources in RTP is often regarded as
confusing and inconsistent. Consequently, a deep understanding of confusing and inconsistent. Consequently, a deep understanding of
how the different terms relate to each other becomes a real how the different terms relate to each other becomes a real
challenge. Frequently cited examples of this confusion are (1) how challenge. Frequently cited examples of this confusion are (1) how
different protocols that make use of RTP use the same terms to different protocols that make use of RTP use the same terms to
signify different things and (2) how the complexities addressed at signify different things and (2) how the complexities addressed at
one layer are often glossed over or ignored at another. one layer are often glossed over or ignored at another.
skipping to change at page 4, line 18 skipping to change at page 4, line 20
transformations and streams in a given RTP usage. For each concept transformations and streams in a given RTP usage. For each concept
an attempt is made to list any alternate definitions and usages that an attempt is made to list any alternate definitions and usages that
co-exist today along with various characteristics that further co-exist today along with various characteristics that further
describe the concept. These concepts are divided into two describe the concept. These concepts are divided into two
categories, one related to the chain of streams and transformations categories, one related to the chain of streams and transformations
that media can be subject to, the other for entities involved in the that media can be subject to, the other for entities involved in the
communication. communication.
2.1. Media Chain 2.1. Media Chain
This section contains the concepts that can be involved in taking a In the context of this memo, Media is a sequence of synthetic or
sequence of physical world stimulus (sound waves, photons, key- Physical Stimulus (Section 2.1.1) (sound waves, photons, key-
strokes) at a sender side and transport them to a receiver, which may strokes), represented in digital form. Synthesized Media is
typically generated directly in the digital domain.
This section contains the concepts that can be involved in taking
Media at a sender side and transporting it to a receiver, which may
recover a sequence of physical stimulus. This chain of concepts is recover a sequence of physical stimulus. This chain of concepts is
of two main types, streams and transformations. Streams are time- of two main types, streams and transformations. Streams are time-
based sequences of samples of the physical stimulus in various based sequences of samples of the physical stimulus in various
representations, while transformations change the representation of representations, while transformations change the representation of
the streams in some way. the streams in some way.
The below examples are basic ones and it is important to keep in mind The below examples are basic ones and it is important to keep in mind
that this conceptual model enables more complex usages. Some will be that this conceptual model enables more complex usages. Some will be
further discussed in later sections of this document. In general the further discussed in later sections of this document. In general the
following applies to this model: following applies to this model:
skipping to change at page 5, line 10 skipping to change at page 5, line 18
o There are no formal limitations on how streams are connected to o There are no formal limitations on how streams are connected to
transformations, this may include loops if required by a transformations, this may include loops if required by a
particular transformation. particular transformation.
It is also important to remember that this is a conceptual model. It is also important to remember that this is a conceptual model.
Thus real-world implementations may look different and have different Thus real-world implementations may look different and have different
structure. structure.
To provide a basic understanding of the relationships in the chain we To provide a basic understanding of the relationships in the chain we
below first introduces the concepts for the sender side (Figure 1). below first introduce the concepts for the sender side (Figure 1).
This covers physical stimulus until media packets are emitted onto This covers physical stimulus until media packets are emitted onto
the network. the network.
Physical Stimulus Physical Stimulus
| |
V V
+--------------------+ +--------------------+
| Media Capture | | Media Capture |
+--------------------+ +--------------------+
| |
Raw stream Raw Stream
V V
+--------------------+ +--------------------+
| Media Source |<- Synchronization Timing | Media Source |<- Synchronization Timing
+--------------------+ +--------------------+
| |
Source Stream Source Stream
V V
+--------------------+ +--------------------+
| Media Encoder | | Media Encoder |
+--------------------+ +--------------------+
| |
Encoded Stream +-----------+ Encoded Stream +-----------+
V | V V | V
+--------------------+ | +--------------------+ +--------------------+ | +--------------------+
| Media Packetizer | | | Media Redundancy | | Media Packetizer | | | Media Redundancy |
+--------------------+ | +--------------------+ +--------------------+ | +--------------------+
| | | | | |
+------------+ Redundancy Packet Stream +------------+ Redundancy Packet Stream
Source Packet Stream | Source Packet Stream |
V V V V
+--------------------+ +--------------------+ +--------------------+ +--------------------+
| Media Transport | | Media Transport | | Media Transport | | Media Transport |
+--------------------+ +--------------------+ +--------------------+ +--------------------+
Figure 1: Sender Side Concepts in the Media Chain Figure 1: Sender Side Concepts in the Media Chain
In Figure 1 we have included a branched chain to cover the concepts In Figure 1 we have included a branched chain to cover the concepts
for using redundancy to improve the reliability of the transport. for using redundancy to improve the reliability of the transport.
The Media Transport concept is an aggregate that is decomposed below The Media Transport concept is an aggregate that is decomposed below
in Section 2.1.13.2. in Section 2.1.13.
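As an illustration only, the sender side chain of Figure 1 can be sketched as a small graph of transformations and the streams they consume and produce. The dictionary layout and the function name below are assumptions made for this sketch, and the inputs listed for Media Redundancy reflect one reading of the branch drawn in the figure; none of this is defined by the draft itself.

```python
# Hypothetical encoding of Figure 1 (not part of the draft):
# transformation name -> (input streams, output streams).
SENDER_CHAIN = {
    "Media Capture": (["Physical Stimulus"], ["Raw Stream"]),
    "Media Source": (["Raw Stream"], ["Source Stream"]),
    "Media Encoder": (["Source Stream"], ["Encoded Stream"]),
    "Media Packetizer": (["Encoded Stream"], ["Source Packet Stream"]),
    # Assumption: the branch in Figure 1 lets Media Redundancy work from
    # either the Encoded Stream or the Source Packet Stream.
    "Media Redundancy": (
        ["Encoded Stream", "Source Packet Stream"],
        ["Redundancy Packet Stream"],
    ),
}


def streams_produced_from(start):
    """Return every stream that can be derived from the stream `start`."""
    produced = {start}
    changed = True
    while changed:
        changed = False
        for inputs, outputs in SENDER_CHAIN.values():
            # A transformation can run once any of its inputs exists.
            if any(stream in produced for stream in inputs):
                new = set(outputs) - produced
                if new:
                    produced |= new
                    changed = True
    return produced


if __name__ == "__main__":
    for name in sorted(streams_produced_from("Physical Stimulus")):
        print(name)
```

Starting from Physical Stimulus, the walk reaches both the Source Packet Stream and the Redundancy Packet Stream, which are the two streams handed to Media Transport in the figure.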
Below we review a receiver media chain (Figure 2) matching the sender Below we review a receiver media chain (Figure 2) matching the sender
side to look at the inverse transformations and their attempts to side to look at the inverse transformations and their attempts to
recover possibly identical streams as in the sender chain. Note that recover possibly identical streams as in the sender chain. Note that
the streams out of a reverse transformation, like the Source Stream the streams out of a reverse transformation, like the Source Stream
out of the Media Decoder are in many cases not the same as the out of the Media Decoder are in many cases not the same as the
corresponding ones on the sender side, thus they are prefixed with a corresponding ones on the sender side, thus they are prefixed with a
"Received" to denote a potentially modified version. The reason for "Received" to denote a potentially modified version. The reason for
not being the same lies in the transformations that can be of not being the same lies in the transformations that can be of
irreversible type. For example, lossy source coding in the Media irreversible type. For example, lossy source coding in the Media
Encoder prevents the Source Stream out of the Media Decoder from being the Encoder prevents the Source Stream out of the Media Decoder from being the
same as the one fed into the Media Encoder. Other reasons include same as the one fed into the Media Encoder. Other reasons include
packet loss or late loss in the Media Transport transformation that packet loss or late loss in the Media Transport transformation that
even Media Repair, if used, fails to repair. It should be noted that even Media Repair, if used, fails to repair. It should be noted that
some transformations are not always present, like Media Repair that some transformations are not always present, like Media Repair that
cannot operate without Redundancy Packet Streams. cannot operate without Redundancy Packet Streams.
+--------------------+ +--------------------+ +--------------------+ +--------------------+
| Media Transport | | Media Transport | | Media Transport | | Media Transport |
+--------------------+ +--------------------+ +--------------------+ +--------------------+
| | | |
Received Packet Stream Received Redundancy PS Received Packet Stream Received Redundancy PS
| | | |
| +-------------------+ | +-------------------+
V V V V
+--------------------+ +--------------------+
| Media Repair | | Media Repair |
+--------------------+ +--------------------+
| |
Repaired Packet Stream Repaired Packet Stream
V V
+--------------------+ +--------------------+
| Media Depacketizer | | Media Depacketizer |
+--------------------+ +--------------------+
| |
Received Encoded Stream Received Encoded Stream
V V
+--------------------+ +--------------------+
| Media Decoder | | Media Decoder |
+--------------------+ +--------------------+
| |
Received Source Stream Received Source Stream
V V
+--------------------+ +--------------------+
| Media Sink |--> Synchronization Information | Media Sink |--> Synchronization Information
+--------------------+ +--------------------+
| |
Received Raw Stream Received Raw Stream
V V
+--------------------+ +--------------------+
| Media Renderer | | Media Renderer |
+--------------------+ +--------------------+
| |
V V
Physical Stimulus Physical Stimulus
Figure 2: Receiver Side Concepts of the Media Chain Figure 2: Receiver Side Concepts of the Media Chain
2.1.1. Physical Stimulus 2.1.1. Physical Stimulus
The physical stimulus is a physical event that can be captured and The physical stimulus is a physical event that can be measured and
provided as media to a receiver. This includes sound waves making up converted to digital form by an appropriate sensor or transducer.
audio, photons in a light field that is visible, or other excitations This includes sound waves making up audio, photons in a light field
or interactions with sensors, like keystrokes on a keyboard. that is visible, or other excitations or interactions with sensors,
like keystrokes on a keyboard.
2.1.2. Media Capture 2.1.2. Media Capture
The process of transforming the Physical Stimulus (Section 2.1.1) Media Capture is the process of transforming the Physical Stimulus
into captured media. The Media Capture performs a digital sampling (Section 2.1.1) into digital Media using an appropriate sensor or
of the physical stimulus, usually periodically, and outputs this in transducer. The Media Capture performs a digital sampling of the
some representation as a Raw Stream (Section 2.1.3). This data is physical stimulus, usually periodically, and outputs this in some
due to its periodical sampling, or at least being timed asynchronous representation as a Raw Stream (Section 2.1.3). This data is due to
events, some form of a stream of media data. The Media Capture is its periodical sampling, or at least being timed asynchronous events,
normally instantiated in some type of device, i.e. media capture some form of a stream of media data. The Media Capture is normally
device. Examples of different types of media capturing devices are instantiated in some type of device, i.e. media capture device.
digital cameras, microphones connected to A/D converters, or Examples of different types of media capturing devices are digital
keyboards. cameras, microphones connected to A/D converters, or keyboards.
2.1.2.1. Alternate Usages Alternate usages:
The CLUE WG uses the term "Capture Device" to identify a physical o The CLUE WG uses the term "Capture Device" to identify a physical
capture device. capture device.
WebRTC WG uses the term "Recording Device" to refer to the locally o WebRTC WG uses the term "Recording Device" to refer to the locally
available capture devices in an end-system. available capture devices in an end-system.
2.1.2.2. Characteristics Characteristics:
o A Media Capture is identified either by hardware/manufacturer ID o A Media Capture is identified either by hardware/manufacturer ID
or via a session-scoped device identifier as mandated by the or via a session-scoped device identifier as mandated by the
application usage. application usage.
o A Media Capture can generate an Encoded Stream (Section 2.1.7) if o A Media Capture can generate an Encoded Stream (Section 2.1.7) if
the capture device supports such a configuration. the capture device supports such a configuration.
2.1.3. Raw Stream 2.1.3. Raw Stream
The time progressing stream of digitally sampled information, usually The time progressing stream of digitally sampled information, usually
periodically sampled, provided by a Media Capture (Section 2.1.2). periodically sampled and provided by a Media Capture (Section 2.1.2).
A Raw Stream can also contain synthesized Media that may not require
any explicit Media Capture, since it is already in an appropriate
digital form.
2.1.4. Media Source 2.1.4. Media Source
A Media Source is the logical source of a reference clock A Media Source is the logical source of a reference clock
synchronized, time progressing, digital media stream, called a Source synchronized, time progressing, digital media stream, called a Source
Stream (Section 2.1.5). This transformation takes one or more Raw Stream (Section 2.1.5). This transformation takes one or more Raw
Streams (Section 2.1.3) and provides a Source Stream as output. This Streams (Section 2.1.3) and provides a Source Stream as output. This
output has been synchronized with some reference clock, even if just output has been synchronized with some reference clock, even if just
a system local wall clock. a system local wall clock.
The output can be of different types. One type is directly The output can be of different types. One type is directly
associated with a particular Media Capture's Raw Stream. Others are associated with a particular Media Capture's Raw Stream. Others are
more conceptual sources, like an audio mix of multiple Raw Streams more conceptual sources, like an audio mix of multiple Raw Streams
(Figure 3), a mixed selection of the three loudest inputs regarding (Figure 3), a mixed selection of the three loudest inputs regarding
speech activity, a selection of a particular video based on the speech activity, a selection of a particular video based on the
current speaker, i.e. typically based on other Media Sources. current speaker, i.e. typically based on other Media Sources.
Raw Raw Raw Raw Raw Raw
Stream Stream Stream Stream Stream Stream
| | | | | |
V V V V V V
+--------------------------+ +--------------------------+
| Media Source |<-- Reference Clock | Media Source |<-- Reference Clock
| Mixer | | Mixer |
+--------------------------+ +--------------------------+
| |
V V
Source Stream Source Stream
Figure 3: Conceptual Media Source in form of Audio Mixer Figure 3: Conceptual Media Source in form of Audio Mixer
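The conceptual mixer of Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions: Raw Streams are modelled as equal-rate lists of numeric samples, mixing is a plain average, and the reference clock is a simple counter. The function name `mix_raw_streams` and its parameters are hypothetical and do not come from the draft.

```python
def mix_raw_streams(raw_streams, clock_start=0, clock_step=1):
    """Mix several Raw Streams into one Source Stream.

    Each Raw Stream is assumed to be a list of numeric samples taken at
    the same rate. The result is a list of (timestamp, sample) pairs,
    where the timestamp comes from a hypothetical reference clock that
    starts at `clock_start` and advances by `clock_step` per sample.
    """
    if not raw_streams:
        return []
    # Only mix the span that every input covers.
    length = min(len(stream) for stream in raw_streams)
    source_stream = []
    for i in range(length):
        mixed = sum(stream[i] for stream in raw_streams) / len(raw_streams)
        source_stream.append((clock_start + i * clock_step, mixed))
    return source_stream


if __name__ == "__main__":
    print(mix_raw_streams([[0, 2, 4], [2, 2, 2]]))
```

The point of the sketch is that the output is a single Source Stream tied to one reference clock, regardless of how many Raw Streams were fed in.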
2.1.4.1. Alternate Usages
The CLUE WG uses the term "Media Capture" for this purpose. A CLUE The CLUE WG uses the term "Media Capture" for this purpose. A CLUE
Media Capture is identified via indexed notation. The terms Audio Media Capture is identified via indexed notation. The terms Audio
Capture and Video Capture are used to identify Audio Sources and Capture and Video Capture are used to identify Audio Sources and
Video Sources respectively. Concepts such as "Capture Scene", Video Sources respectively. Concepts such as "Capture Scene",
"Capture Scene Entry" and "Capture" provide a flexible framework to "Capture Scene Entry" and "Capture" provide a flexible framework to
represent media captured spanning spatial regions. represent media captured spanning spatial regions.
The WebRTC WG defines the term "RtcMediaStreamTrack" to refer to a The WebRTC WG defines the term "RtcMediaStreamTrack" to refer to a
Media Source. An "RtcMediaStreamTrack" is identified by the ID Media Source. An "RtcMediaStreamTrack" is identified by the ID
attribute. attribute.
Typically a Media Source is mapped to a single m=line via the Session Typically a Media Source is mapped to a single m=line via the Session
Description Protocol (SDP) [RFC4566] unless mechanisms such as Description Protocol (SDP) [RFC4566] unless mechanisms such as
Source-Specific attributes are in place [RFC5576]. In the latter Source-Specific attributes are in place [RFC5576]. In the latter
cases, an m=line can represent either multiple Media Sources, cases, an m=line can represent either multiple Media Sources,
multiple Packet Streams (Section 2.1.10), or both. multiple Packet Streams (Section 2.1.10), or both.
2.1.4.2. Characteristics Characteristics:
o At any point, it can represent a physical captured source or o At any point, it can represent a physical captured source or
conceptual source. conceptual source.
2.1.5. Source Stream 2.1.5. Source Stream
A time progressing stream of digital samples that has been A time progressing stream of digital samples that has been
synchronized with a reference clock and comes from a particular Media synchronized with a reference clock and comes from a particular Media
Source (Section 2.1.4). Source (Section 2.1.4).
skipping to change at page 9, line 45 skipping to change at page 10, line 40
Scalable Media Encoders need special mentioning as they produce Scalable Media Encoders need special mentioning as they produce
multiple outputs that are potentially of different types. A scalable multiple outputs that are potentially of different types. A scalable
Media Encoder takes one input Source Stream and encodes it into Media Encoder takes one input Source Stream and encodes it into
multiple output streams of two different types; at least one Encoded multiple output streams of two different types; at least one Encoded
Stream that is independently decodable and one or more Dependent Stream that is independently decodable and one or more Dependent
Streams (Section 2.1.8) that require at least one Encoded Stream and Streams (Section 2.1.8) that require at least one Encoded Stream and
zero or more Dependent Streams to be possible to decode. A Dependent zero or more Dependent Streams to be possible to decode. A Dependent
Stream's dependency is one of the grouping relations this document Stream's dependency is one of the grouping relations this document
discusses further in Section 3.3.2. discusses further in Section 3.3.2.
Source Stream Source Stream
| |
V V
+--------------------------+ +--------------------------+
| Scalable Media Encoder | | Scalable Media Encoder |
+--------------------------+ +--------------------------+
| | ... | | | ... |
V V V V V V
Encoded Dependent Dependent Encoded Dependent Dependent
Stream Stream Stream Stream Stream Stream
Figure 4: Scalable Media Encoder Input and Outputs Figure 4: Scalable Media Encoder Input and Outputs
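The dependency relation between the outputs of a scalable Media Encoder can be illustrated with a small decodability check. This is a sketch only: the stream names ("base", "enh1", "enh2"), the `depends_on` mapping, and the function `is_decodable` are hypothetical, and the dependency graph is assumed to contain no cycles.

```python
def is_decodable(stream, available, depends_on):
    """Return True if `stream` can be decoded from the `available` streams.

    `depends_on` maps a stream name to the streams it directly requires.
    An Encoded Stream has no dependencies; a Dependent Stream lists at
    least one. Dependencies are followed transitively.
    """
    if stream not in available:
        return False
    return all(
        is_decodable(dep, available, depends_on)
        for dep in depends_on.get(stream, ())
    )


if __name__ == "__main__":
    # Hypothetical layering: one Encoded Stream and two Dependent Streams.
    deps = {"base": [], "enh1": ["base"], "enh2": ["base", "enh1"]}
    print(is_decodable("enh1", {"base", "enh1"}, deps))
    print(is_decodable("enh2", {"base", "enh2"}, deps))
```

In the second call, "enh2" is received but its dependency "enh1" is missing, so the Dependent Stream cannot be decoded even though the independently decodable "base" stream is present.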
2.1.6.1. Alternate Usages There are also other variants of encoders, like so-called Multiple
Description Coding (MDC). Such Media Encoders produce multiple
independent and thus individually decodable Encoded Streams that are
possible to combine into a Received Source Stream that is somehow a
better representation of the original Source Stream than using only a
single Encoded Stream.
Within the SDP usage, an SDP media description (m=line) describes Alternate usages:
part of the necessary configuration required for encoding purposes.
CLUE's "Capture Encoding" provides specific encoding configuration o Within the SDP usage, an SDP media description (m=line) describes
for this purpose. part of the necessary configuration required for encoding
purposes.
2.1.6.2. Characteristics o CLUE's "Capture Encoding" provides specific encoding configuration
for this purpose.
Characteristics:
o A Media Source can be multiply encoded by different Media Encoders o A Media Source can be multiply encoded by different Media Encoders
to provide various encoded representations. to provide various encoded representations.
2.1.7. Encoded Stream 2.1.7. Encoded Stream
A stream of time synchronized encoded media that can be independently A stream of time synchronized encoded media that can be independently
decoded. decoded.
2.1.7.1. Characteristics Characteristics:
o Due to temporal dependencies, an Encoded Stream may have o Due to temporal dependencies, an Encoded Stream may have
limitations in where decoding can be started. These entry points, limitations in where decoding can be started. These entry points,
for example Intra frames from a video encoder, may require for example Intra frames from a video encoder, may require
identification and their generation may be event based or identification and their generation may be event based or
configured to occur periodically. configured to occur periodically.
2.1.8. Dependent Stream 2.1.8. Dependent Stream
A stream of time synchronized encoded media fragments that are A stream of time synchronized encoded media fragments that are
dependent on one or more Encoded Streams (Section 2.1.7) and zero or dependent on one or more Encoded Streams (Section 2.1.7) and zero or
more Dependent Streams to be decodable. more Dependent Streams to be decodable.
2.1.8.1. Characteristics Characteristics:
o Each Dependent Stream has a set of dependencies. These o Each Dependent Stream has a set of dependencies. These
dependencies must be understood by the parties in a multi-media dependencies must be understood by the parties in a multi-media
session that intend to use a Dependent Stream. session that intend to use a Dependent Stream.
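The dependency relation described above can be sketched as a small check of whether a stream is decodable given what has been received. This is only an illustrative sketch: the stream names and the `depends_on` mapping are hypothetical, and the dependency graph is assumed to be acyclic.

```python
# Hypothetical model: stream names and the depends_on mapping are
# illustrative assumptions, not taken from any payload format.
def decodable(stream, depends_on, received):
    """Return True if `stream` and all of its dependencies were received.

    Assumes the dependency graph is acyclic.
    """
    if stream not in received:
        return False
    return all(decodable(dep, depends_on, received)
               for dep in depends_on.get(stream, ()))


depends_on = {
    "base": [],                # Encoded Stream: independently decodable
    "enh1": ["base"],          # Dependent Stream requiring the Encoded Stream
    "enh2": ["base", "enh1"],  # Dependent Stream requiring both of the above
}
```

In this sketch, a Dependent Stream whose dependencies are missing is reported as not decodable even when its own data arrived.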
2.1.9. Media Packetizer 2.1.9. Media Packetizer
The transformation of taking one or more Encoded (Section 2.1.7) or The transformation of taking one or more Encoded (Section 2.1.7) or
Dependent Streams (Section 2.1.8) and putting their content into one or Dependent Streams (Section 2.1.8) and putting their content into one or
more sequences of packets, normally RTP packets, and outputting Source more sequences of packets, normally RTP packets, and outputting Source
Packet Streams (Section 2.1.10). This step includes generating both Packet Streams (Section 2.1.10). This step includes generating both
RTP payloads and RTP packets. RTP payloads and RTP packets.
The Media Packetizer can use multiple inputs when producing a single The Media Packetizer can use multiple inputs when producing a single
Packet Stream. One such example is the packetization when using SVC, Packet Stream. One such example is SST packetization when using SVC
as in Single Stream Transport (SST) usage of the payload format both (Section 3.2.1).
an Encoded Stream as well as Dependent Streams are packetized in a
single Source Packet Stream using a single SSRC.
The Media Packetizer can also produce multiple Packet Streams, for The Media Packetizer can also produce multiple Packet Streams, for
example when Encoded and/or Dependent Streams are distributed over example when Encoded and/or Dependent Streams are distributed over
multiple Packet Streams, possibly in different RTP sessions. multiple Packet Streams. One example of this is MST packetization
when using SVC (Section 3.2.1).
2.1.9.1. Alternate Usages Alternate usages:
An RTP sender is part of the Media Packetizer. o An RTP sender is part of the Media Packetizer.
2.1.9.2. Characteristics Characteristics:
o The Media Packetizer selects which Synchronization source(s) o The Media Packetizer selects which Synchronization source(s)
(SSRC) [RFC3550] are used, and in which RTP sessions. (SSRC) [RFC3550] are used, and in which RTP sessions.
o Media Packetizer can combine multiple Encoded or Dependent Streams o Media Packetizer can combine multiple Encoded or Dependent Streams
into one or more Packet Streams. into one or more Packet Streams.
2.1.10. Packet Stream 2.1.10. Packet Stream
A stream of RTP packets containing media data, source or redundant. A stream of RTP packets containing media data, source or redundant.
The Packet Stream is identified by an SSRC belonging to a particular The Packet Stream is identified by an SSRC belonging to a particular
RTP session. The RTP session is identified as discussed in RTP session. The RTP session is identified as discussed in
Section 2.2.2. Section 2.2.2.
A Source Packet Stream is a packet stream containing at least some A Source Packet Stream is a packet stream containing at least some
content from an Encoded Stream. Source material is any media content from an Encoded Stream. Source material is any media
material that is produced for transport over RTP without any material that is produced for transport over RTP without any
additional redundancy applied to cope with network transport losses. additional redundancy applied to cope with network transport losses.
Compare this with the Redundancy Packet Stream (Section 2.1.12). Compare this with the Redundancy Packet Stream (Section 2.1.12).
2.1.10.1. Alternate Usages Alternate usages:
The term "Stream" is used by the CLUE WG to define an encoded Media o The term "Stream" is used by the CLUE WG to define an encoded
Source sent via RTP. "Capture Encoding", "Encoding Groups" are Media Source sent via RTP. "Capture Encoding", "Encoding Groups"
defined to capture specific details of the encoding scheme. are defined to capture specific details of the encoding scheme.
RFC3550 [RFC3550] uses the terms media stream, audio stream, video o RFC3550 [RFC3550] uses the terms media stream, audio stream, video
stream and streams of (RTP) packets interchangeably. It defines the stream and streams of (RTP) packets interchangeably. It defines
SSRC as "The source of a stream of RTP packets, ..." the SSRC as "The source of a stream of RTP packets, ...".
The equivalent mapping of a Packet Stream in SDP [RFC4566] is defined
per usage. For example, each Media Description (m=line) and
associated attributes can describe one Packet Stream OR properties
for multiple Packet Streams OR for an RTP session (via [RFC5576]
mechanisms for example).
2.1.10.2. Characteristics o The equivalent mapping of a Packet Stream in SDP [RFC4566] is
defined per usage. For example, each Media Description (m=line)
and associated attributes can describe one Packet Stream OR
properties for multiple Packet Streams OR for an RTP session (via
[RFC5576] mechanisms for example).
Characteristics:
o Each Packet Stream is identified by a unique Synchronization o Each Packet Stream is identified by a unique Synchronization
source (SSRC) [RFC3550] that is carried in every RTP and RTP source (SSRC) [RFC3550] that is carried in every RTP and RTP
Control Protocol (RTCP) packet header in a specific RTP session Control Protocol (RTCP) packet header in a specific RTP session
context. context.
o At any given point in time, a Packet Stream can have one and only o At any given point in time, a Packet Stream can have one and only
one SSRC. one SSRC. SSRC collision is a valid reason to change SSRC for a
Packet Stream, since the Packet Stream itself is not changed in
any way, only the identifying SSRC number.
o Each Packet Stream defines a unique RTP sequence numbering and o Each Packet Stream defines a unique RTP sequence numbering and
timing space. timing space.
o Several Packet Streams may map to a single Media Source via the o Several Packet Streams may map to a single Media Source via the
source transformations. source transformations.
o Several Packet Streams can be carried over a single RTP Session. o Several Packet Streams can be carried over a single RTP Session.
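The characteristics above (one SSRC at a time, a per-stream sequence number space, and an SSRC change on collision that leaves the stream itself untouched) can be sketched as follows. The class and method names are hypothetical, and the sketch covers only the renumbering step, not the rest of the collision handling in [RFC3550].

```python
import random
from dataclasses import dataclass


@dataclass
class PacketStream:
    """Hypothetical model of a Packet Stream's identifying state."""
    ssrc: int          # exactly one SSRC at any given point in time
    next_seq: int = 0  # sequence number space owned by this stream

    def resolve_collision(self, ssrcs_in_use, rng=random):
        """Pick a new random 32-bit SSRC that is not already in use.

        Only the identifier changes; sequence state is left untouched.
        """
        new_ssrc = self.ssrc
        while new_ssrc == self.ssrc or new_ssrc in ssrcs_in_use:
            new_ssrc = rng.getrandbits(32)
        self.ssrc = new_ssrc
        return new_ssrc
```

Keeping the sequence state outside the renumbering step mirrors the statement that only the identifying SSRC number changes.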
2.1.11. Media Redundancy 2.1.11. Media Redundancy
skipping to change at page 13, line 15 skipping to change at page 14, line 25
(Section 2.1.10) are subjected to by the end-to-end transport from (Section 2.1.10) are subjected to by the end-to-end transport from
one RTP sender to one specific RTP receiver (an RTP session may one RTP sender to one specific RTP receiver (an RTP session may
contain multiple RTP receivers per sender). Each Media Transport is contain multiple RTP receivers per sender). Each Media Transport is
defined by a transport association that is identified by a 5-tuple defined by a transport association that is identified by a 5-tuple
(source address, source port, destination address, destination port, (source address, source port, destination address, destination port,
transport protocol). Each transport association normally contains transport protocol). Each transport association normally contains
only a single RTP session, although a proposal exists for sending only a single RTP session, although a proposal exists for sending
multiple RTP sessions over one transport association multiple RTP sessions over one transport association
[I-D.westerlund-avtcore-transport-multiplexing]. [I-D.westerlund-avtcore-transport-multiplexing].
2.1.13.1. Characteristics Characteristics:
o Media Transport transmits Packet Streams of RTP Packets from a o Media Transport transmits Packet Streams of RTP Packets from a
source transport address to a destination transport address. source transport address to a destination transport address.
2.1.13.2. Media Stream Decomposition
The Media Transport concept sometimes needs to be decomposed into The Media Transport concept sometimes needs to be decomposed into
more steps to enable discussion of what a sender emits that gets more steps to enable discussion of what a sender emits that gets
transformed by the network before it is received by the receiver. transformed by the network before it is received by the receiver.
Thus we also provide this Media Transport decomposition (Figure 5). Thus we also provide this Media Transport decomposition (Figure 5).
Packet Stream Packet Stream
| |
V V
+--------------------------+ +--------------------------+
| Media Transport Sender | | Media Transport Sender |
+--------------------------+ +--------------------------+
| |
Sent Packet Stream Sent Packet Stream
V V
+--------------------------+ +--------------------------+
| Network Transport | | Network Transport |
+--------------------------+ +--------------------------+
| |
Transported Packet Stream Transported Packet Stream
V V
+--------------------------+ +--------------------------+
| Media Transport Receiver | | Media Transport Receiver |
+--------------------------+ +--------------------------+
| |
V V
Received Packet Stream Received Packet Stream
Figure 5: Decomposition of Media Transport Figure 5: Decomposition of Media Transport
2.1.13.2.1. Media Transport Sender 2.1.13.1. Media Transport Sender
The first transformation within the Media Transport (Section 2.1.13) The first transformation within the Media Transport (Section 2.1.13)
is the Media Transport Sender, where the sending End-Point is the Media Transport Sender, where the sending End-Point
(Section 2.2.1) takes a Packet Stream and emits the packets onto the (Section 2.2.1) takes a Packet Stream and emits the packets onto the
network using the transport association established for this Media network using the transport association established for this Media
Transport thus creating a Sent Packet Stream (Section 2.1.13.2.2). Transport thus creating a Sent Packet Stream (Section 2.1.13.2). In
In this process it transforms the Packet Stream in several ways. this process it transforms the Packet Stream in several ways. First,
First, it gains the necessary protocol headers for the transport it gains the necessary protocol headers for the transport
association, for example IP and UDP headers, thus forming IP/UDP/RTP association, for example IP and UDP headers, thus forming IP/UDP/RTP
packets. In addition, the Media Transport Sender may queue, pace or packets. In addition, the Media Transport Sender may queue, pace or
otherwise affect how the packets are emitted onto the network, thus otherwise affect how the packets are emitted onto the network, thus
adding delay, jitter and inter-packet spacings that characterize the adding delay, jitter and inter-packet spacings that characterize the
Sent Packet Stream. Sent Packet Stream.
2.1.13.2.2. Sent Packet Stream 2.1.13.2. Sent Packet Stream
The Sent Packet Stream is the Packet Stream as entering the first hop The Sent Packet Stream is the Packet Stream as entering the first hop
of the network path to its destination. The Sent Packet Stream is of the network path to its destination. The Sent Packet Stream is
identified using network transport addresses; for IP/UDP this is the identified using network transport addresses; for IP/UDP this is the
5-tuple (source IP address, source port, destination IP address, 5-tuple (source IP address, source port, destination IP address,
destination port, and protocol (UDP)). destination port, and protocol (UDP)).
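As an illustration of identifying a Sent Packet Stream by its 5-tuple, the following sketch groups packets by transport association. The type and function names are hypothetical, and the addresses used in any example are assumed to be documentation addresses.

```python
from typing import NamedTuple


class TransportAssociation(NamedTuple):
    """Hypothetical 5-tuple key for an IP/UDP transport association."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


def group_by_association(packets):
    """Group (association, payload) pairs by their 5-tuple."""
    flows = {}
    for association, payload in packets:
        flows.setdefault(association, []).append(payload)
    return flows
```

Because a NamedTuple is hashable and compares by value, two packets with equal 5-tuples land in the same flow without any extra bookkeeping.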
2.1.13.2.3. Network Transport 2.1.13.3. Network Transport
Network Transport is the transformation that the Sent Packet Stream Network Transport is the transformation that the Sent Packet Stream
(Section 2.1.13.2.2) is subjected to by traveling from the source to (Section 2.1.13.2) is subjected to by traveling from the source to
the destination through the network. These transformations include the destination through the network. These transformations include
loss of some packets, varying delay on a per packet basis, packet loss of some packets, varying delay on a per packet basis, packet
duplication, and packet header or data corruption. These duplication, and packet header or data corruption. These
transformations produce a Transported Packet Stream transformations produce a Transported Packet Stream
(Section 2.1.13.2.4) at the exit of the network path. (Section 2.1.13.4) at the exit of the network path.
2.1.13.2.4. Transported Packet Stream 2.1.13.4. Transported Packet Stream
The Packet Stream that is emitted out of the network path at the The Packet Stream that is emitted out of the network path at the
destination, subjected to the Network Transport's transformation destination, subjected to the Network Transport's transformation
(Section 2.1.13.2.3). (Section 2.1.13.3).
2.1.13.2.5. Media Transport Receiver 2.1.13.5. Media Transport Receiver
The receiver End-Point's (Section 2.2.1) transformation of the The receiver End-Point's (Section 2.2.1) transformation of the
Transported Packet Stream (Section 2.1.13.2.4) by its reception Transported Packet Stream (Section 2.1.13.4) by its reception process
process that results in the Received Packet Stream (Section 2.1.14). that results in the Received Packet Stream (Section 2.1.14). This
This transformation includes transport checksums being verified and transformation includes transport checksums being verified and if
if non-matching, causing discarding of the corrupted packet. Other non-matching, causing discarding of the corrupted packet. Other
transformations can include delay variations in receiving a packet on transformations can include delay variations in receiving a packet on
the network interface and providing it to the application. the network interface and providing it to the application.
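The checksum step of the Media Transport Receiver can be sketched as below. This is a simplified illustration, not the actual transport behaviour: CRC32 from Python's zlib module stands in for the real transport checksum (UDP uses a different, 16-bit checksum), and the function names and 4-byte prefix framing are assumptions made for the example.

```python
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Prefix the payload with a CRC32, standing in for a transport checksum."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def receive(datagrams):
    """Yield payloads whose checksum verifies; discard corrupted ones."""
    for datagram in datagrams:
        checksum, payload = datagram[:4], datagram[4:]
        if zlib.crc32(payload).to_bytes(4, "big") == checksum:
            yield payload
```

Note that duplicated packets pass this check unchanged, so duplication remains visible in the resulting stream, while corrupted packets appear as losses.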
2.1.14. Received Packet Stream 2.1.14. Received Packet Stream
The Packet Stream (Section 2.1.10) resulting from the Media The Packet Stream (Section 2.1.10) resulting from the Media
Transport's transformation, i.e. subjected to packet loss, packet Transport's transformation, i.e. subjected to packet loss, packet
corruption, packet duplication and varying transmission delay from corruption, packet duplication and varying transmission delay from
sender to receiver. sender to receiver.
skipping to change at page 15, line 44 skipping to change at page 17, line 21
to try to re-create the Packet Stream (Section 2.1.10) as it was to try to re-create the Packet Stream (Section 2.1.10) as it was
before Media Transport (Section 2.1.13). before Media Transport (Section 2.1.13).
2.1.18. Media Depacketizer 2.1.18. Media Depacketizer
A Media Depacketizer takes one or more Packet Streams A Media Depacketizer takes one or more Packet Streams
(Section 2.1.10) and depacketizes them and attempts to reconstitute (Section 2.1.10) and depacketizes them and attempts to reconstitute
the Encoded Streams (Section 2.1.7) or Dependent Streams the Encoded Streams (Section 2.1.7) or Dependent Streams
(Section 2.1.8) present in those Packet Streams. (Section 2.1.8) present in those Packet Streams.
It should be noted that in practical implementations, the Media
Depacketizer and the Media Decoder may be tightly coupled and share
information to improve or optimize the overall decoding process in
various ways. It is however not expected that there would be any
benefit in defining a taxonomy for those detailed (and likely very
implementation-dependent) steps.
2.1.19. Received Encoded Stream 2.1.19. Received Encoded Stream
The received version of an Encoded Stream (Section 2.1.7). The received version of an Encoded Stream (Section 2.1.7).
2.1.20. Media Decoder 2.1.20. Media Decoder
A Media Decoder is a transformation that is responsible for decoding A Media Decoder is a transformation that is responsible for decoding
Encoded Streams (Section 2.1.7) and any Dependent Streams Encoded Streams (Section 2.1.7) and any Dependent Streams
(Section 2.1.8) into a Source Stream (Section 2.1.5). (Section 2.1.8) into a Source Stream (Section 2.1.5).
2.1.20.1. Alternate Usages It should be noted that in practical implementations, the Media
Decoder and the Media Depacketizer may be tightly coupled and share
information to improve or optimize the overall decoding process in
various ways. It is however not expected that there would be any
benefit in defining a taxonomy for those detailed (and likely very
implementation-dependent) steps.
Within the context of SDP, an m=line describes the necessary Alternate usages:
configuration and identification (RTP Payload Types) required to
decode either one or more incoming Media Streams.
2.1.20.2. Characteristics o Within the context of SDP, an m=line describes the necessary
configuration and identification (RTP Payload Types) required to
decode either one or more incoming Media Streams.
Characteristics:
o A Media Decoder is the entity that will have to deal with any o A Media Decoder is the entity that will have to deal with any
errors in the encoded streams that resulted from corruptions or errors in the encoded streams that resulted from corruptions or
failures to repair packet losses. This is because a Media Decoder failures to repair packet losses. This is because a Media Decoder
is generally forced to produce some output periodically. It thus is generally forced to produce some output periodically. It thus
commonly includes concealment methods. commonly includes concealment methods.
2.1.21. Received Source Stream 2.1.21. Received Source Stream
The received version of a Source Stream (Section 2.1.5). The received version of a Source Stream (Section 2.1.5).
skipping to change at page 16, line 40 skipping to change at page 18, line 26
The Media Sink receives a Source Stream (Section 2.1.5) that The Media Sink receives a Source Stream (Section 2.1.5) that
contains, usually periodically, sampled media data together with contains, usually periodically, sampled media data together with
associated synchronization information. Depending on application, associated synchronization information. Depending on application,
this Source Stream then needs to be transformed into a Raw Stream this Source Stream then needs to be transformed into a Raw Stream
(Section 2.1.3) that is sent in synchronization with the output from (Section 2.1.3) that is sent in synchronization with the output from
other Media Sinks to a Media Render (Section 2.1.24). The media sink other Media Sinks to a Media Render (Section 2.1.24). The media sink
may also be connected with a Media Source (Section 2.1.4) and be used may also be connected with a Media Source (Section 2.1.4) and be used
as part of a conceptual Media Source. as part of a conceptual Media Source.
2.1.22.1. Characteristics Characteristics:
o The media sink can further transform the source stream into a o The Media Sink can further transform the Source Stream into a
representation that is suitable for rendering on the Media Render representation that is suitable for rendering on the Media Render
as defined by the application or system-wide configuration. This as defined by the application or system-wide configuration. This
includes sample scaling, level adjustments etc. includes sample scaling, level adjustments etc.
2.1.23. Received Raw Stream 2.1.23. Received Raw Stream
The received version of a Raw Stream (Section 2.1.3). The received version of a Raw Stream (Section 2.1.3).
2.1.24. Media Render 2.1.24. Media Render
A Media Render takes a Raw Stream (Section 2.1.3) and converts it A Media Render takes a Raw Stream (Section 2.1.3) and converts it
into Physical Stimulus (Section 2.1.1) that a human user can into Physical Stimulus (Section 2.1.1) that a human user can
perceive. Examples of such devices are screens, D/A converters perceive. Examples of such devices are screens, D/A converters
connected to amplifiers and loudspeakers. connected to amplifiers and loudspeakers.
2.1.24.1. Characteristics Characteristics:
o An End Point can potentially have multiple Media Renders for each o An End Point can potentially have multiple Media Renders for each
media type. media type.
2.2. Communication Entities 2.2. Communication Entities
This section contains concepts for entities involved in the This section contains concepts for entities involved in the
communication. communication.
2.2.1. End Point 2.2.1. End Point
A single addressable entity sending or receiving RTP packets. It may A single addressable entity sending or receiving RTP packets. It may
be decomposed into several functional blocks, but as long as it be decomposed into several functional blocks, but as long as it
behaves as a single RTP stack entity it is classified as a single behaves as a single RTP stack entity it is classified as a single
"End Point". "End Point".
2.2.1.1. Alternate Usages Alternate usages:
The CLUE Working Group (WG) uses the terms "Media Provider" and o The CLUE Working Group (WG) uses the terms "Media Provider" and
"Media Consumer" to describe aspects of an End Point pertaining to "Media Consumer" to describe aspects of an End Point pertaining to
sending and receiving functionalities. sending and receiving functionalities.
2.2.1.2. Characteristics Characteristics:
End Points can be identified in several different ways. While RTCP o End Points can be identified in several different ways. While
Canonical Names (CNAMEs) [RFC3550] provide a globally unique and RTCP Canonical Names (CNAMEs) [RFC3550] provide a globally unique
stable identification mechanism for the duration of the Communication and stable identification mechanism for the duration of the
Session (see Section 2.2.5), their validity applies exclusively Communication Session (see Section 2.2.5), their validity applies
within a Synchronization Context (Section 3.1.1). Thus one End Point exclusively within a Synchronization Context (Section 3.1.1).
can have multiple CNAMEs. Therefore, mechanisms outside the scope of Thus one End Point can have multiple CNAMEs. Therefore,
RTP, such as application defined mechanisms, must be used to ensure mechanisms outside the scope of RTP, such as application defined
End Point identification when outside this Synchronization Context. mechanisms, must be used to ensure End Point identification when
outside this Synchronization Context.
2.2.2. RTP Session 2.2.2. RTP Session
An RTP session is an association among a group of participants An RTP session is an association among a group of participants
communicating with RTP. It is a group communications channel which communicating with RTP. It is a group communications channel which
can potentially carry a number of Packet Streams. Within an RTP can potentially carry a number of Packet Streams. Within an RTP
session, every participant can find meta-data and control information session, every participant can find meta-data and control information
(over RTCP) about all the Packet Streams in the RTP session. The (over RTCP) about all the Packet Streams in the RTP session. The
bandwidth of the RTCP control channel is shared between all bandwidth of the RTCP control channel is shared between all
participants within an RTP Session. participants within an RTP Session.
2.2.2.1. Alternate Usages Alternate usages:
Within the context of SDP, a single m=line can map to a single RTP o Within the context of SDP, a single m=line can map to a single RTP
Session or multiple m=lines can map to a single RTP Session. The Session or multiple m=lines can map to a single RTP Session. The
latter is enabled via multiplexing schemes such as BUNDLE latter is enabled via multiplexing schemes such as BUNDLE
[I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which allows [I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which
mapping of multiple m=lines to a single RTP Session. allows mapping of multiple m=lines to a single RTP Session.
2.2.2.2. Characteristics Characteristics:
o Typically, an RTP Session can carry one or more Packet Streams. o Typically, an RTP Session can carry one or more Packet Streams.
o An RTP Session shares a single SSRC space as defined in RFC3550 o An RTP Session shares a single SSRC space as defined in RFC3550
[RFC3550]. That is, the End Points participating in an RTP [RFC3550]. That is, the End Points participating in an RTP
Session can see an SSRC identifier transmitted by any of the other Session can see an SSRC identifier transmitted by any of the other
End Points. An End Point can receive an SSRC either as SSRC or as End Points. An End Point can receive an SSRC either as SSRC or as
a Contributing source (CSRC) in RTP and RTCP packets, as defined a Contributing source (CSRC) in RTP and RTCP packets, as defined
by the endpoints' network interconnection topology. by the endpoints' network interconnection topology.
skipping to change at page 18, line 44 skipping to change at page 20, line 36
[I-D.westerlund-avtcore-transport-multiplexing]. [I-D.westerlund-avtcore-transport-multiplexing].
o Multiple RTP Sessions can be related. o Multiple RTP Sessions can be related.
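The SSRC and CSRC identifiers mentioned in the characteristics above are carried in the RTP fixed header defined in [RFC3550]. A minimal sketch of extracting them from a received packet follows; the function name is hypothetical, and header extensions, padding and payload are deliberately ignored.

```python
import struct


def parse_rtp_sources(packet: bytes):
    """Return (ssrc, [csrc, ...]) from an RTP packet's header.

    Layout per RFC 3550: a 12-byte fixed header whose last 32 bits are
    the SSRC, followed by CC 32-bit CSRC identifiers.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the RTP fixed header")
    first, _m_pt, _seq, _timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = first & 0x0F  # CC field: low 4 bits of the first octet
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", packet[12:end]))
    return ssrc, csrcs
```

An End Point behind a mixer would typically see the mixer's SSRC in the fixed header and the original sources in the CSRC list, which is one way the topology determines how an identifier is received.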
2.2.3. Participant 2.2.3. Participant
A participant is an entity reachable by a single signaling address, A participant is an entity reachable by a single signaling address,
and is thus related more to the signaling context than to the media and is thus related more to the signaling context than to the media
context. context.
2.2.3.1. Characteristics Characteristics:
o A single signaling-addressable entity, using an application- o A single signaling-addressable entity, using an application-
specific signaling address space, for example a SIP URI. specific signaling address space, for example a SIP URI.
o A participant can have several Multimedia Sessions o A participant can have several Multimedia Sessions
(Section 2.2.4). (Section 2.2.4).
o A participant can have several associated transport flows, o A participant can have several associated transport flows,
including several separate local transport addresses for those including several separate local transport addresses for those
transport flows. transport flows.
2.2.4. Multimedia Session 2.2.4. Multimedia Session
A multimedia session is an association among a group of participants A multimedia session is an association among a group of participants
engaged in the communication via one or more RTP Sessions engaged in the communication via one or more RTP Sessions
(Section 2.2.2). It defines logical relationships among Media (Section 2.2.2). It defines logical relationships among Media
Sources (Section 2.1.4) that appear in multiple RTP Sessions. Sources (Section 2.1.4) that appear in multiple RTP Sessions.
2.2.4.1. Alternate Usages Alternate usages:
RFC4566 [RFC4566] defines a multimedia session as a set of multimedia o RFC4566 [RFC4566] defines a multimedia session as a set of
senders and receivers and the data streams flowing from senders to multimedia senders and receivers and the data streams flowing from
receivers. senders to receivers.
RFC3550 [RFC3550] defines it as a set of concurrent RTP sessions among o RFC3550 [RFC3550] defines it as a set of concurrent RTP sessions
a common group of participants. For example, a video conference among a common group of participants. For example, a video
(which is a multimedia session) may contain an audio RTP session and conference (which is a multimedia session) may contain an audio
a video RTP session. RTP session and a video RTP session.
2.2.4.2. Characteristics Characteristics:
o A Multimedia Session can be composed of several parallel RTP o A Multimedia Session can be composed of several parallel RTP
Sessions with potentially multiple Packet Streams per RTP Session. Sessions with potentially multiple Packet Streams per RTP Session.
o Each participant in a Multimedia Session can have a multitude of o Each participant in a Multimedia Session can have a multitude of
Media Captures and Media Rendering devices. Media Captures and Media Rendering devices.
2.2.5. Communication Session 2.2.5. Communication Session
A Communication Session is an association among a group of participants A Communication Session is an association among a group of participants
communicating with each other via a set of Multimedia Sessions. communicating with each other via a set of Multimedia Sessions.
2.2.5.1. Alternate Usages Alternate usages:
The Session Description Protocol (SDP) [RFC4566] defines a multimedia o The Session Description Protocol (SDP) [RFC4566] defines a
session as a set of multimedia senders and receivers and the data multimedia session as a set of multimedia senders and receivers
streams flowing from senders to receivers. In that definition it is and the data streams flowing from senders to receivers. In that
however not clear if a multimedia session includes both the sender's definition it is however not clear if a multimedia session
and the receiver's view of the same RTP Packet Stream. includes both the sender's and the receiver's view of the same RTP
Packet Stream.
2.2.5.2. Characteristics Characteristics:
o Each participant in a Communication Session is identified via an o Each participant in a Communication Session is identified via an
application-specific signaling address. application-specific signaling address.
o A Communication Session is composed of at least one Multimedia o A Communication Session is composed of at least one Multimedia
Session per participant, involving one or more parallel RTP Session per participant, involving one or more parallel RTP
Sessions with potentially multiple Packet Streams per RTP Session. Sessions with potentially multiple Packet Streams per RTP Session.
For example, in a full mesh communication, the Communication Session For example, in a full mesh communication, the Communication Session
consists of a set of separate Multimedia Sessions between each pair consists of a set of separate Multimedia Sessions between each pair
skipping to change at page 21, line 37 skipping to change at page 23, line 28
the one defined by the usage of CNAME source descriptions. the one defined by the usage of CNAME source descriptions.
3.1.1.3. CLUE Scenes 3.1.1.3. CLUE Scenes
In CLUE "Capture Scene", "Capture Scene Entry" and "Captures" define In CLUE "Capture Scene", "Capture Scene Entry" and "Captures" define
an implied Synchronization Context. an implied Synchronization Context.
3.1.1.4. Implicitly via RtcMediaStream 3.1.1.4. Implicitly via RtcMediaStream
The WebRTC WG defines "RtcMediaStream" with one or more The WebRTC WG defines "RtcMediaStream" with one or more
"RtcMediaStreamTracks". All tracks in a "RTCMediaStream" are "RtcMediaStreamTracks". All tracks in a "RtcMediaStream" are
intended to be possible to synchronize when rendered. intended to be possible to synchronize when rendered.
3.1.1.5. Explicitly via SDP Mechanisms 3.1.1.5. Explicitly via SDP Mechanisms
RFC5888 [RFC5888] defines an m=line grouping mechanism called "Lip RFC5888 [RFC5888] defines an m=line grouping mechanism called "Lip
Synchronization (LS)" for establishing the synchronization Synchronization (LS)" for establishing the synchronization
requirement across m=lines when they map to individual sources. requirement across m=lines when they map to individual sources.
RFC5576 [RFC5576] extends the above mechanism when multiple media RFC5576 [RFC5576] extends the above mechanism when multiple media
sources are described by a single m=line. sources are described by a single m=line.
skipping to change at page 23, line 5 skipping to change at page 24, line 42
3.2. Packetization Time Relations 3.2. Packetization Time Relations
At RTP Packetization time, there exists a possibility for a number of At RTP Packetization time, there exists a possibility for a number of
different types of relationships between Encoded Streams different types of relationships between Encoded Streams
(Section 2.1.7), Dependent Streams (Section 2.1.8) and Packet Streams (Section 2.1.7), Dependent Streams (Section 2.1.8) and Packet Streams
(Section 2.1.10). These are caused by grouping together or (Section 2.1.10). These are caused by grouping together or
distributing these different types of streams into Packet Streams. distributing these different types of streams into Packet Streams.
This section will look at such relationships. This section will look at such relationships.
3.2.1. Single Stream Transport of SVC 3.2.1. Single and Multi-Session Transmission of SVC
Scalable Video Coding [RFC6190] has a mode of operation where Encoded Scalable Video Coding [RFC6190] has a mode of operation called Single
Streams and Dependent Streams from the SVC Media Encoder is grouped Session Transmission (SST), where Encoded Streams and Dependent
together in a single Source Packet Stream using the SVC RTP Payload Streams from the SVC Media Encoder are sent in a single RTP Session
format. (Section 2.2.2) using the SVC RTP Payload format. There is another
mode of operation where Encoded Streams and Dependent Streams are
distributed across multiple RTP Sessions, called Multi-Session
Transmission (MST). Regardless of whether SST or MST is used, as they are
defined, each of those RTP Sessions may contain one or more Packet
Streams (SSRC) per Media Source.
To elaborate, what could be called SST-SingleStream (SST-SS) uses a
single Packet Stream in a single RTP Session to send all Encoded and
Dependent Streams. Similarly, SST-MultiStream (SST-MS) uses multiple
Packet Streams in a single RTP Session to send the Encoded and
Dependent Streams. MST-SS uses a single Packet Stream in each of
multiple RTP Sessions and MST-MS uses multiple Packet Streams in each
of the multiple RTP Sessions:
+-----------------------+--------------------+----------------------+
| | Single RTP Session | Multiple RTP |
| | | Sessions |
+-----------------------+--------------------+----------------------+
| Single Packet Stream | SST-SS | MST-SS |
| Multiple Packet | SST-MS | MST-MS |
| Streams | | |
+-----------------------+--------------------+----------------------+
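The two-by-two classification in the table above can be sketched as a small function. This is only an illustration: the function name and its two integer parameters are hypothetical, and it assumes every RTP Session carries the same number of Packet Streams for the Media Source in question.

```python
def svc_transmission_mode(rtp_sessions: int, packet_streams_per_session: int) -> str:
    """Return the informal label (SST-SS, SST-MS, MST-SS, MST-MS) for an SVC setup.

    Both parameters are hypothetical inputs for this sketch:
    rtp_sessions               -- number of RTP Sessions carrying the layers
    packet_streams_per_session -- Packet Streams (SSRCs) per RTP Session
    """
    if rtp_sessions < 1 or packet_streams_per_session < 1:
        raise ValueError("counts must be at least 1")
    # One RTP Session means Single Session Transmission, otherwise Multi-Session.
    session_part = "SST" if rtp_sessions == 1 else "MST"
    # One Packet Stream per session is "SingleStream", otherwise "MultiStream".
    stream_part = "SS" if packet_streams_per_session == 1 else "MS"
    return f"{session_part}-{stream_part}"
```

For instance, `svc_transmission_mode(2, 1)` yields the MST-SS cell of the table.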
3.2.2. Multi-Channel Audio 3.2.2. Multi-Channel Audio
There exist a number of RTP payload formats that can carry multi- There exist a number of RTP payload formats that can carry multi-
channel audio, despite the codec being a mono encoder. Multi-channel channel audio, despite the codec being a mono encoder. Multi-channel
audio can be viewed as multiple Media Sources sharing a common audio can be viewed as multiple Media Sources sharing a common
Synchronization Context. These are then independently encoded by a Synchronization Context. These are independently encoded by a Media
Media Encoder and the different Encoded Streams are then packetized Encoder and the different Encoded Streams are then packetized
together in a time synchronized way into a single Source Packet together in a time synchronized way into a single Source Packet
Stream using the codec's RTP Payload format. Examples of such Stream using the codec's RTP Payload format. Examples of such
codecs are PCMA and PCMU [RFC3551], AMR [RFC4867], and G.719 codecs are PCMA and PCMU [RFC3551], AMR [RFC4867], and G.719
[RFC5404]. [RFC5404].
3.2.3. Redundancy Format 3.2.3. Redundancy Format
The RTP Payload for Redundant Audio Data [RFC2198] defines how one The RTP Payload for Redundant Audio Data [RFC2198] defines how one
can transport redundant audio data together with primary data in the can transport redundant audio data together with primary data in the
same RTP payload. The redundant data can be a time delayed version same RTP payload. The redundant data can be a time delayed version
of the primary or another time delayed Encoded stream using a of the primary or another time delayed Encoded Stream using a
different Media Encoder to encode the same Media Source as the different Media Encoder to encode the same Media Source as the
primary, as depicted below in Figure 6. primary, as depicted below in Figure 6.
+--------------------+ +--------------------+
| Media Source | | Media Source |
+--------------------+ +--------------------+
| |
Source Stream Source Stream
| |
+------------------------+ +------------------------+
| | | |
V V V V
+--------------------+ +--------------------+ +--------------------+ +--------------------+
| Media Encoder | | Media Encoder | | Media Encoder | | Media Encoder |
+--------------------+ +--------------------+ +--------------------+ +--------------------+
| | | |
| +------------+ | +------------+
Encoded Stream | Time Delay | Encoded Stream | Time Delay |
| +------------+ | +------------+
| | | |
| +------------------+ | +------------------+
V V V V
+--------------------+ +--------------------+
| Media Packetizer | | Media Packetizer |
+--------------------+ +--------------------+
| |
V V
Packet Stream Packet Stream
Figure 6: Concept for usage of Audio Redundancy with different Media Figure 6: Concept for usage of Audio Redundancy with different Media
Encoders Encoders
The Redundancy format is thus providing the necessary meta The Redundancy format is thus providing the necessary meta
information to correctly relate different parts of the same Encoded information to correctly relate different parts of the same Encoded
Stream, or in the case depicted above (Figure 6) relate the Received Stream, or in the case depicted above (Figure 6) relate the Received
Source Stream fragments coming out of different Media Decoders to be Source Stream fragments coming out of different Media Decoders to be
able to combine them together into a less erroneous Source Stream. able to combine them together into a less erroneous Source Stream.
skipping to change at page 24, line 43 skipping to change at page 27, line 17
A Media Source represented as multiple independent Encoded Streams A Media Source represented as multiple independent Encoded Streams
constitutes a simulcast of that Media Source. Figure 7 below constitutes a simulcast of that Media Source. Figure 7 below
represents an example of a Media Source that is encoded into three represents an example of a Media Source that is encoded into three
separate and different Simulcast streams, that are in turn sent on separate and different Simulcast streams, that are in turn sent on
the same Media Transport flow. When using Simulcast, the Packet the same Media Transport flow. When using Simulcast, the Packet
Streams may be sharing RTP Session and Media Transport, or be Streams may be sharing RTP Session and Media Transport, or be
separated on different RTP Sessions and Media Transports, or be any separated on different RTP Sessions and Media Transports, or be any
combination of these two. It is other considerations that affect combination of these two. It is other considerations that affect
which usage is desirable, as discussed in Section 3.3.4. which usage is desirable, as discussed in Section 3.3.4.
+----------------+ +----------------+
| Media Source | | Media Source |
+----------------+ +----------------+
Source Stream | Source Stream |
+----------------------+----------------------+ +----------------------+----------------------+
| | | | | |
v v v v v v
+------------------+ +------------------+ +------------------+ +------------------+ +------------------+ +------------------+
| Media Encoder | | Media Encoder | | Media Encoder | | Media Encoder | | Media Encoder | | Media Encoder |
+------------------+ +------------------+ +------------------+ +------------------+ +------------------+ +------------------+
| Encoded | Encoded | Encoded | Encoded | Encoded | Encoded
| Stream | Stream | Stream | Stream | Stream | Stream
v v v v v v
+------------------+ +------------------+ +------------------+ +------------------+ +------------------+ +------------------+
| Media Packetizer | | Media Packetizer | | Media Packetizer | | Media Packetizer | | Media Packetizer | | Media Packetizer |
+------------------+ +------------------+ +------------------+ +------------------+ +------------------+ +------------------+
| Source | Source | Source | Source | Source | Source
| Packet | Packet | Packet | Packet | Packet | Packet
| Stream | Stream | Stream | Stream | Stream | Stream
+-----------------+ | +-----------------+ +-----------------+ | +-----------------+
| | | | | |
V V V V V V
+-------------------+ +-------------------+
| Media Transport | | Media Transport |
+-------------------+ +-------------------+
Figure 7: Example of Media Source Simulcast Figure 7: Example of Media Source Simulcast
The simulcast relation between the Packet Streams is the common Media The simulcast relation between the Packet Streams is the common Media
Source. In addition, to be able to identify the common Media Source, Source. In addition, to be able to identify the common Media Source,
a receiver of the Packet Stream may need to know which configuration a receiver of the Packet Stream may need to know which configuration
or encoding goals lay behind the produced Encoded Stream and its or encoding goals lay behind the produced Encoded Stream and its
properties. This enables selection of the stream that is most properties. This enables selection of the stream that is most
useful in the application at that moment. useful in the application at that moment.
3.3.2. Layered Multi-Stream Transmission 3.3.2. Layered Multi-Stream
Multi-stream transmission (MST) is a mechanism by which different Layered Multi-Stream (LMS) is a mechanism by which different portions
portions of a layered encoding of a Source Stream are sent using of a layered encoding of a Source Stream are sent using separate
separate Packet Streams (sometimes in separate RTP sessions). MSTs Packet Streams (sometimes in separate RTP Sessions). LMSs are useful
are useful for receiver control of layered media. for receiver control of layered media.
A Media Source represented as an Encoded Stream and multiple A Media Source represented as an Encoded Stream and multiple
Dependent Streams constitutes a Media Source that has layered Dependent Streams constitutes a Media Source that has layered
dependency. The figure below represents an example of a Media Source dependencies. The figure below represents an example of a Media
that is encoded into three dependent layers, where two layers are Source that is encoded into three dependent layers, where two layers
sent on the same Media Transport using different Packet Streams, i.e. are sent on the same Media Transport using different Packet Streams,
SSRCs, and the third layer is sent on a separate Media Transport, i.e. SSRCs, and the third layer is sent on a separate Media
i.e. a different RTP Session. Transport, i.e. a different RTP Session.
+----------------+ +----------------+
| Media Source | | Media Source |
+----------------+ +----------------+
| |
| |
V V
+---------------------------------------------------------+ +---------------------------------------------------------+
| Media Encoder | | Media Encoder |
+---------------------------------------------------------+ +---------------------------------------------------------+
skipping to change at page 26, line 25 skipping to change at page 28, line 48
| | | | | |
+------+ +------+ | +------+ +------+ |
| | | | | |
V V V V V V
+-----------------+ +-----------------+ +-----------------+ +-----------------+
| Media Transport | | Media Transport | | Media Transport | | Media Transport |
+-----------------+ +-----------------+ +-----------------+ +-----------------+
Figure 8: Example of Media Source Layered Dependency Figure 8: Example of Media Source Layered Dependency
The SVC MST relation needs to identify the common Media Encoder As an example, the SVC MST (Section 3.2.1) relation needs to identify
origin for the Encoded and Dependent Streams. The SVC RTP Payload the common Media Encoder origin for the Encoded and Dependent
RFC is not particularly explicit about how this relation is to be Streams. The SVC RTP Payload RFC is not particularly explicit about
implemented. When using different RTP Sessions, thus different Media how this relation is to be implemented. When using different RTP
Transports, and as long as there is only one Packet Stream per Media Sessions, thus different Media Transports, and as long as there is
Encoder and a single Media Source in each RTP Session, common SSRC only one Packet Stream per Media Encoder and a single Media Source in
and CNAMEs can be used to identify the common Media Source. When each RTP Session (MST-SS (Section 3.2.1)), common SSRC and CNAMEs can
multiple Packet Streams are sent from one Media Encoder in the same be used to identify the common Media Source. When multiple Packet
RTP Session, then CNAME is the only currently specified RTP Streams are sent from one Media Encoder in the same RTP Session (SST-
identifier that can be used. In cases where multiple Media Encoders MS), then CNAME is the only currently specified RTP identifier that
use multiple Media Sources sharing Synchronization Context, and thus can be used. In cases where multiple Media Encoders use multiple
having a common CNAME, additional heuristics need to be applied to Media Sources sharing Synchronization Context, and thus having a
create the MST relationship between the Packet Streams. common CNAME, additional heuristics need to be applied to create the
MST relationship between the Packet Streams.
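The identifier-based association described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a mechanism defined by the SVC payload format: the `PacketStream` record and both helper functions are hypothetical names. The first helper assumes the MST-SS case (one Packet Stream per Media Encoder and a single Media Source in each RTP Session); the second falls back to CNAME alone, which cannot separate multiple Media Sources that share a Synchronization Context.

```python
from collections import defaultdict
from typing import NamedTuple


class PacketStream(NamedTuple):
    """Hypothetical record of what a receiver knows about one Packet Stream."""
    rtp_session: str
    ssrc: int
    cname: str


def group_by_ssrc_and_cname(streams):
    """MST-SS case: a common SSRC and CNAME across RTP Sessions identify one source."""
    groups = defaultdict(list)
    for stream in streams:
        groups[(stream.cname, stream.ssrc)].append(stream)
    return dict(groups)


def group_by_cname(streams):
    """Fallback when SSRCs differ: CNAME is the only specified RTP identifier.

    Streams from different Media Sources sharing a CNAME land in the same
    group, so extra heuristics (not modeled here) would still be required.
    """
    groups = defaultdict(list)
    for stream in streams:
        groups[stream.cname].append(stream)
    return dict(groups)
```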
3.3.3. Robustness and Repair 3.3.3. Robustness and Repair
Packet Streams may be protected by Redundancy Packet Streams during Packet Streams may be protected by Redundancy Packet Streams during
transport. Several approaches listed below can achieve the same transport. Several approaches listed below can achieve the same
result: result:
o Duplication of the original Packet Stream o Duplication of the original Packet Stream
o Duplication of the original Packet Stream with a time offset, o Duplication of the original Packet Stream with a time offset,
skipping to change at page 30, line 9 skipping to change at page 32, line 37
On the other hand, when Packet Streams that are related are sent On the other hand, when Packet Streams that are related are sent
in the context of different RTP Sessions to achieve separation, it is in the context of different RTP Sessions to achieve separation, it is
known as RTP Session-based separation. This is commonly used when known as RTP Session-based separation. This is commonly used when
the different Packet Streams are intended for different Media the different Packet Streams are intended for different Media
Transports. Transports.
Several mechanisms that use RTP Session-based separation rely on it Several mechanisms that use RTP Session-based separation rely on it
to enable an implicit grouping mechanism expressing the relationship. to enable an implicit grouping mechanism expressing the relationship.
The solutions have been based on using the same SSRC value in the The solutions have been based on using the same SSRC value in the
different RTP Sessions to implicitly indicate their relation. That different RTP Sessions to implicitly indicate their relation. That
way, no explicit RTP level mechanism has been needed, only signalling way, no explicit RTP level mechanism has been needed, only signaling
level relations have been established using semantics from Grouping level relations have been established using semantics from Grouping
of Media lines framework [RFC5888]. Examples of this are RTP of Media lines framework [RFC5888]. Examples of this are RTP
Retransmission [RFC4588], SVC Multi Stream Transmission [RFC6190] and Retransmission [RFC4588], SVC Multi-Session Transmission [RFC6190]
XOR Based FEC [RFC5109]. RTCP CNAME explicitly relates Packet and XOR Based FEC [RFC5109]. RTCP CNAME explicitly relates Packet
Streams across different RTP Sessions, as explained in the previous Streams across different RTP Sessions, as explained in the previous
section. Such a relationship can be used to perform inter-media section. Such a relationship can be used to perform inter-media
synchronization. synchronization.
Packet Streams that are related and need to be associated can be part Packet Streams that are related and need to be associated can be part
of different Multimedia Sessions, rather than just different RTP of different Multimedia Sessions, rather than just different RTP
sessions within the same Multimedia Session context. This puts sessions within the same Multimedia Session context. This puts
further demand on the scope of the mechanism(s) and its handling of further demand on the scope of the mechanism(s) and its handling of
identifiers used for expressing the relationships. identifiers used for expressing the relationships.
skipping to change at page 30, line 43 skipping to change at page 33, line 24
However, Multiple RTP Sessions over one Media Transport makes it However, Multiple RTP Sessions over one Media Transport makes it
clear that a single Media Transport 5-tuple is not sufficient to clear that a single Media Transport 5-tuple is not sufficient to
express which RTP Session context a particular Packet Stream exists express which RTP Session context a particular Packet Stream exists
in. Complexities in the relationship between Media Transports and in. Complexities in the relationship between Media Transports and
RTP Session already exist as one RTP Session contains multiple Media RTP Session already exist as one RTP Session contains multiple Media
Transports, e.g. even a Peer-to-Peer RTP Session with RTP/RTCP Transports, e.g. even a Peer-to-Peer RTP Session with RTP/RTCP
Multiplexing requires two Media Transports, one in each direction. Multiplexing requires two Media Transports, one in each direction.
The relationship between Media Transports and RTP Sessions as well as The relationship between Media Transports and RTP Sessions as well as
additional levels of identifiers need to be considered in both additional levels of identifiers need to be considered in both
signalling design and when defining terminology. signaling design and when defining terminology.
4. Topologies and Communication Entities 4. Topologies and Communication Entities
This Section reviews some communication topologies and looks at the This section reviews some communication topologies and looks at the
relationship among the communication entities that are defined in relationship among the communication entities that are defined in
Section 2.2. This section doesn't deal with discussions about the Section 2.2. It does not deal with discussions about the streams and
streams and their relation to the transport. Instead, it covers the their relation to the transport. Instead, it covers the aspects that
aspects that enable the transport of those streams. For example, the enable the transport of those streams. For example, the Media
Media Transports (Section 2.1.13) that exist between the End Points Transports (Section 2.1.13) that exist between the End Points
(Section 2.2.1) that are part of an RTP session (Section 2.2.2) and (Section 2.2.1) that are part of an RTP session (Section 2.2.2) and
their relationship to the Multi-Media Session (Section 2.2.4) between their relationship to the Multi-Media Session (Section 2.2.4) between
Participants (Section 2.2.3) and the established Communication Participants (Section 2.2.3) and the established Communication
session (Section 2.2.5) are explained. session (Section 2.2.5) are explained.
The text provided below is neither any exhaustive listing of possible
topologies, nor does it cover all topologies described in
[I-D.ietf-avtcore-rtp-topologies-update].
4.1. Point-to-Point Communication 4.1. Point-to-Point Communication
Figure 11 shows a very basic point-to-point communication session Figure 11 shows a very basic point-to-point communication session
between A and B. It uses two different audio and video RTP sessions between A and B. It uses two different audio and video RTP sessions
between A's and B's end points. Assume that the Multi-media session between A's and B's end points. Assume that the Multi-media session
shared by the participants is established using SIP (i.e., there is a shared by the participants is established using SIP (i.e., there is a
SIP Dialog between A and B). The high level representation of this SIP Dialog between A and B). The high level representation of this
communication scenario can be demonstrated using Figure 11. communication scenario can be demonstrated using Figure 11.
+---+ +---+ +---+ +---+
| A |<------->| B | | A |<------->| B |
+---+ +---+ +---+ +---+
Figure 11: Point to Point Communication Figure 11: Point to Point Communication
However, this picture gets slightly more complex when redrawn using However, this picture gets slightly more complex when redrawn using
the communication entities concepts defined earlier in this document. the communication entities concepts defined earlier in this document.
+-----------------------------------------------------------+ +-----------------------------------------------------------+
| Communication Session | | Communication Session |
| | | |
| +----------------+ +----------------+ | | +----------------+ +----------------+ |
| | Participant A | +-------------+ | Participant B | | | | Participant A | +-------------+ | Participant B | |
| | | | Multi-Media | | | | | | | | Multi-Media | | | |
| | +-------------+|<=>| Session |<=>|+-------------+ | | | | +-------------+|<=>| Session |<=>|+-------------+ | |
| | | End Point A || |(SIP Dialog) | || End Point B | | | | | | End Point A || |(SIP Dialog) | || End Point B | | |
| | | || +-------------+ || | | | | | | || +-------------+ || | | |
| | | +-----------++---------------------++-----------+ | | | | | | +-----------++---------------------++-----------+ | | |
| | | | RTP Session| | | | | | | | | | RTP Session| | | | | |
| | | | Audio |---Media Transport-->| | | | | | | | | Audio |---Media Transport-->| | | | |
| | | | |<--Media Transport---| | | | | | | | | |<--Media Transport---| | | | |
| | | +-----------++---------------------++-----------+ | | | | | | +-----------++---------------------++-----------+ | | |
| | | || || | | | | | | || || | | |
| | | +-----------++---------------------++-----------+ | | | | | | +-----------++---------------------++-----------+ | | |
| | | | RTP Session| | | | | | | | | | RTP Session| | | | | |
| | | | Video |---Media Transport-->| | | | | | | | | Video |---Media Transport-->| | | | |
| | | | |<--Media Transport---| | | | | | | | | |<--Media Transport---| | | | |
| | | +-----------++---------------------++-----------+ | | | | | | +-----------++---------------------++-----------+ | | |
| | +-------------+| |+-------------+ | | | | +-------------+| |+-------------+ | |
| +----------------+ +----------------+ | | +----------------+ +----------------+ |
+-----------------------------------------------------------+ +-----------------------------------------------------------+
Figure 12: Point to Point Communication Session with two RTP Sessions Figure 12: Point to Point Communication Session with two RTP Sessions
Figure 12 shows the two RTP Sessions only exist between the two End Figure 12 shows the two RTP Sessions only exist between the two End
Points A and B and over their respective Media Transports. The Points A and B and over their respective Media Transports. The
Multi-Media Session establishes the association between the two Multi-Media Session establishes the association between the two
Participants and configures these RTP sessions and the Media Participants and configures these RTP sessions and the Media
Transports that are used. Transports that are used.
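The containment shown in Figure 12 can be sketched as a small data model. All class, field, and function names here are hypothetical and chosen only for illustration; the sketch assumes one Media Transport per direction for each RTP Session, as drawn in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class MediaTransport:
    """One direction of packet flow between two End Points."""
    sender: str
    receiver: str


@dataclass
class RtpSession:
    """An RTP Session for one media type, running over its Media Transports."""
    media: str
    transports: list = field(default_factory=list)


@dataclass
class MultimediaSession:
    """Association between Participants that configures the RTP Sessions."""
    signaling: str
    participants: tuple
    rtp_sessions: list = field(default_factory=list)


def point_to_point(a: str, b: str) -> MultimediaSession:
    """Build the audio plus video arrangement of Figure 12 for End Points a and b."""
    session = MultimediaSession("SIP Dialog", (a, b))
    for media in ("audio", "video"):
        session.rtp_sessions.append(
            RtpSession(media, [MediaTransport(a, b), MediaTransport(b, a)])
        )
    return session
```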
4.2. Central Conferencing 4.2. Centralized Conferencing
This section looks at the central conferencing communication This section looks at the centralized conferencing communication
topology, where a number of participants, like A, B, C, and D in topology, where a number of participants, like A, B, C, and D in
Figure 13, communicate using an RTP mixer. Figure 13, communicate using an RTP mixer.
+---+ +------------+ +---+ +---+ +------------+ +---+
| A |<---->| |<---->| B | | A |<---->| |<---->| B |
+---+ | | +---+ +---+ | | +---+
| Mixer | | Mixer |
+---+ | | +---+ +---+ | | +---+
| C |<---->| |<---->| D | | C |<---->| |<---->| D |
+---+ +------------+ +---+ +---+ +------------+ +---+
skipping to change at page 33, line 35 skipping to change at page 36, line 49
| | | || || | | | | | | | | || || | | | | |
| | | +-----------++-----------------------------++------+ | | | | | | | +-----------++-----------------------------++------+ | | | |
| | | | RTP Session| | | | | | | | | | RTP Session| | | | | |
| | | | Audio |-------Media Transport------>| | | | | | | | | Audio |-------Media Transport------>| | | | |
| | | | |<------Media Transport-------| | | | | | | | | |<------Media Transport-------| | | | |
| | | +-----------++-----------------------------++-----------+ | | | | | | +-----------++-----------------------------++-----------+ | | |
| | +-------------+| |+-------------+ | | | | +-------------+| |+-------------+ | |
| +----------------+ +----------------+ | | +----------------+ +----------------+ |
+-------------------------------------------------------------------+ +-------------------------------------------------------------------+
Figure 14: Central Conferencing with Two Participants A and B Figure 14: Centralized Conferencing with Two Participants A and B
communicating over a Conference Bridge communicating over a Conference Bridge
It is important to stress that in the case of Figure 14, it might It is important to stress that in the case of Figure 14, it might
appear that the the Multi-Media Sessions context is scoped between A appear that the Multi-Media Sessions context is scoped between A and
and B over M. This might not always be true and they can have B over M. This might not always be true and they can have contexts
contexts that extend further. In this case the RTP session and its that extend further. In this case the RTP session and its common SSRC
common SSRC space go beyond what occurs between A and M and B and M space go beyond what occurs between A and M and B and M
respectively. respectively.
4.3. Full Mesh Conferencing 4.3. Full Mesh Conferencing
This section looks at the case where the three Participants (A, B and This section looks at the case where the three Participants (A, B and
C) wish to communicate. They establish individual Multi-Media C) wish to communicate. They establish individual Multi-Media
Sessions and RTP sessions between themselves and the other two peers. Sessions and RTP sessions between themselves and the other two peers.
Thus, each provides two copies of their media, one to every other Thus, each provides two copies of their media, one to every other
participant. Figure 15 shows a high level representation of such a participant. Figure 15 shows a high level representation of such a
topology. topology.
+---+ +---+ +---+ +---+
| A |<---->| B | | A |<---->| B |
+---+ +---+ +---+ +---+
^ ^ ^ ^
\ / \ /
\ / \ /
v v v v
+---+ +---+
| C | | C |
+---+ +---+
Figure 15: Full Mesh Conferencing with three Participants A, B and C Figure 15: Full Mesh Conferencing with three Participants A, B and C
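The pairwise structure of a full mesh can be sketched in a few lines. This is an illustration only, assuming one Multi-Media Session per pair of Participants; the function names are hypothetical.

```python
from itertools import combinations


def full_mesh_sessions(participants):
    """Return one (peer, peer) pair per Multi-Media Session in a full mesh."""
    return list(combinations(participants, 2))


def copies_sent_per_participant(participants):
    """Each Participant sends its media once to every other Participant."""
    return len(participants) - 1
```

With the three Participants A, B and C this gives three pairwise sessions, and each Participant sends two copies of its media.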
In this particular case there are two aspects worth noting. The In this particular case there are two aspects worth noting. The
first is there will be multiple Multi-Media Sessions per first is there will be multiple Multi-Media Sessions per
Communication Session between the participants. This, however, Communication Session between the participants. This, however,
hasn't been true in the earlier examples; the Centralized hasn't been true in the earlier examples; the Centralized
Conferencing in Section 4.2 being the exception. The second aspect is Conferencing in Section 4.2 being the exception. The second aspect is
consideration of whether one needs to maintain relationships between consideration of whether one needs to maintain relationships between
entities and concepts, for example MediaSources, between these entities and concepts, for example Media Sources, between these
different Multi-Media Sessions and between Packet Streams in the different Multi-Media Sessions and between Packet Streams in the
independent RTP sessions configured by those Multi-Media Sessions. independent RTP sessions configured by those Multi-Media Sessions.
+-----------------------------------------+ +-----------------------------------------+
| Participant A | | Participant A |
+----------+ | +--------------------------------------+| +----------+ | +--------------------------------------+|
| Multi- | | | End Point A || | Multi- | | | End Point A ||
| Media |<======>| | || | Media |<======>| | ||
| Session | | |+-------+ +-------+ +-------+ || | Session | | |+-------+ +-------+ +-------+ ||
| 1 | | || RTP 1 |<----| MS A1 |---->| RTP 2 | || | 1 | | || RTP 1 |<----| MS A1 |---->| RTP 2 | ||
+----------+ | || | +-------+ | | || +----------+ | || | +-------+ | | ||
^^ | +|-------|-------------------|-------|-+| ^^ | +|-------|-------------------|-------|-+|
|| +--|-------|-------------------|-------|--+ || +--|-------|-------------------|-------|--+
|| | | ^^ | | || | | ^^ | |
VV | | || | | VV | | || | |
+-------------------------|-------|----+ || | | +-------------------------|-------|----+ || | |
| Participant B | | | VV | | | Participant B | | | VV | |
| +-----------------------|-------|---+| +----------+ | | | +-----------------------|-------|---+| +----------+ | |
| | End Point B +----->| | || | Multi- | | | | | End Point B +----->| | || | Multi- | | |
| | | +-------+ || | Media | | | | | | +-------+ || | Media | | |
| | +-------+ | +-------+ || | Session | | | | | +-------+ | +-------+ || | Session | | |
| | | MS B1 |------+----->| RTP 3 | || | 2 | | | | | | MS B1 |------+----->| RTP 3 | || | 2 | | |
| | +-------+ | | || +----------+ | | | | +-------+ | | || +----------+ | |
| +-----------------------|-------|---+| ^^ | | | +-----------------------|-------|---+| ^^ | |
+-------------------------|-------|----+ || | | +-------------------------|-------|----+ || | |
^^ | | || | | ^^ | | || | |
|| | | VV | | || | | VV | |
|| +--|-------|-------------------|-------|--+ || +--|-------|-------------------|-------|--+
VV | | | Participant C | | | VV | | | Participant C | | |
+----------+ | +|-------|-------------------|-------|-+| +----------+ | +|-------|-------------------|-------|-+|
| Multi- | | || | End Point C | | || | Multi- | | || | End Point C | | ||
| Media |<======>| |+-------+ +-------+ || | Media |<======>| |+-------+ +-------+ ||
| Session | | | ^ +-------+ ^ || | Session | | | ^ +-------+ ^ ||
| 3 | | | +---------| MS C1 |---------+ || | 3 | | | +---------| MS C1 |---------+ ||
+----------+ | | +-------+ || +----------+ | | +-------+ ||
| +--------------------------------------+| | +--------------------------------------+|
+-----------------------------------------+ +-----------------------------------------+
Figure 16: Full Mesh Conferencing between three Participants A, B and Figure 16: Full Mesh Conferencing between three Participants A, B and
C C
For the sake of clarity, Figure 16 above does not include all these For the sake of clarity, Figure 16 above does not include all these
concepts. The Media Sources (MS) from a given End Point are sent to concepts. The Media Sources (MS) from a given End Point are sent to
the two peers. This requires encoding and Media Packetization to the two peers. This requires encoding and Media Packetization to
enable the Packet Streams to be sent over Media Transports in the enable the Packet Streams to be sent over Media Transports in the
context of the RTP sessions depicted. The RTP sessions 1, 2, and 3 context of the RTP sessions depicted. The RTP sessions 1, 2, and 3
are independent, and established in the context of each of the Multi- are independent, and established in the context of each of the Multi-
skipping to change at page 38, line 15 skipping to change at page 41, line 32
6. Acknowledgement 6. Acknowledgement
This document has many concepts borrowed from several documents such This document has many concepts borrowed from several documents such
as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework], as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
Multiplexing Architecture Multiplexing Architecture
[I-D.westerlund-avtcore-transport-multiplexing]. The authors would [I-D.westerlund-avtcore-transport-multiplexing]. The authors would
like to thank all the authors of each of those documents. like to thank all the authors of each of those documents.
The authors would also like to acknowledge the insights, guidance and The authors would also like to acknowledge the insights, guidance and
contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin
Perkins, Keith Drage, and Harald Alvestrand. Perkins, Keith Drage, Harald Alvestrand, and Alex Eleftheriadis.
7. Contributors 7. Contributors
Magnus Westerlund has contributed the concept model for the media Magnus Westerlund has contributed the concept model for the media
chain using transformations and streams model, including rewriting chain using transformations and streams model, including rewriting
pre-existing concepts into this model and adding missing concepts. pre-existing concepts into this model and adding missing concepts.
The first proposal for updating the relationships and the topologies The first proposal for updating the relationships and the topologies
based on this concept was also performed by Magnus. based on this concept was also performed by Magnus.
8. IANA Considerations 8. IANA Considerations
This document makes no request of IANA. This document makes no request of IANA.
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003. Applications", STD 64, RFC 3550, July 2003.
[UML] Object Management Group, "OMG Unified Modeling Language
(OMG UML), Superstructure, V2.2", OMG formal/2009-02-02,
February 2009.
9.2. Informative References 9.2. Informative References
[I-D.ietf-avtcore-clksrc] [I-D.ietf-avtcore-clksrc]
Williams, A., Gross, K., Brandenburg, R., and H. Stokking, Williams, A., Gross, K., Brandenburg, R., and H. Stokking,
"RTP Clock Source Signalling", draft-ietf-avtcore- "RTP Clock Source Signalling", draft-ietf-avtcore-
clksrc-07 (work in progress), October 2013. clksrc-09 (work in progress), December 2013.
[I-D.ietf-avtcore-rtp-topologies-update]
Westerlund, M. and S. Wenger, "RTP Topologies", draft-
ietf-avtcore-rtp-topologies-update-01 (work in progress),
October 2013.
[I-D.ietf-clue-framework] [I-D.ietf-clue-framework]
Duckworth, M., Pepperell, A., and S. Wenger, "Framework Duckworth, M., Pepperell, A., and S. Wenger, "Framework
for Telepresence Multi-Streams", draft-ietf-clue- for Telepresence Multi-Streams", draft-ietf-clue-
framework-12 (work in progress), October 2013. framework-14 (work in progress), February 2014.
[I-D.ietf-mmusic-sdp-bundle-negotiation] [I-D.ietf-mmusic-sdp-bundle-negotiation]
Holmberg, C., Alvestrand, H., and C. Jennings, Holmberg, C., Alvestrand, H., and C. Jennings,
"Multiplexing Negotiation Using Session Description "Multiplexing Negotiation Using Session Description
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
bundle-negotiation-05 (work in progress), October 2013. bundle-negotiation-05 (work in progress), October 2013.
[I-D.ietf-rtcweb-overview] [I-D.ietf-rtcweb-overview]
Alvestrand, H., "Overview: Real Time Protocols for Brower- Alvestrand, H., "Overview: Real Time Protocols for Brower-
based Applications", draft-ietf-rtcweb-overview-08 (work based Applications", draft-ietf-rtcweb-overview-08 (work
skipping to change at page 40, line 35 skipping to change at page 44, line 9
May 2011. May 2011.
[RFC6222] Begen, A., Perkins, C., and D. Wing, "Guidelines for [RFC6222] Begen, A., Perkins, C., and D. Wing, "Guidelines for
Choosing RTP Control Protocol (RTCP) Canonical Names Choosing RTP Control Protocol (RTCP) Canonical Names
(CNAMEs)", RFC 6222, April 2011. (CNAMEs)", RFC 6222, April 2011.
Appendix A. Changes From Earlier Versions Appendix A. Changes From Earlier Versions
NOTE TO RFC EDITOR: Please remove this section prior to publication. NOTE TO RFC EDITOR: Please remove this section prior to publication.
A.1. Modifications Between Version -02 and -03 A.1. Modifications Between WG Version -00 and -03
o WG version -00 text is identical to individual draft -03
o Amended description of SVC SST and MST encodings with respect to
concepts defined in this text
o Removed UML as normative reference, since the text no longer uses
any UML notation
o Removed a number of level 4 sections and moved out text to the
level above
A.2. Modifications Between Version -02 and -03
o Section 4 rewritten (and new communication topologies added) to o Section 4 rewritten (and new communication topologies added) to
reflect the major updates to Sections 1-3 reflect the major updates to Sections 1-3
o Section 8 removed (carryover from initial -00 draft) o Section 8 removed (carryover from initial -00 draft)
o General clean up of text, grammar and nits o General clean up of text, grammar and nits
A.2. Modifications Between Version -01 and -02 A.3. Modifications Between Version -01 and -02
o Section 2 rewritten to add both streams and transformations in the o Section 2 rewritten to add both streams and transformations in the
media chain. media chain.
o Section 3 rewritten to focus on exposing relationships. o Section 3 rewritten to focus on exposing relationships.
A.3. Modifications Between Version -00 and -01 A.4. Modifications Between Version -00 and -01
o Too many to list o Too many to list
o Added new authors o Added new authors
o Updated content organization and presentation o Updated content organization and presentation
Authors' Addresses Authors' Addresses
Jonathan Lennox Jonathan Lennox
Vidyo, Inc. Vidyo, Inc.
433 Hackensack Avenue 433 Hackensack Avenue
Seventh Floor Seventh Floor
Hackensack, NJ 07601 Hackensack, NJ 07601
US US
Email: jonathan@vidyo.com Email: jonathan@vidyo.com
Kevin Gross Kevin Gross
 End of changes. 112 change blocks. 
483 lines changed or deleted 557 lines changed or added
