draft-ietf-avtext-rtp-grouping-taxonomy-01.txt | draft-ietf-avtext-rtp-grouping-taxonomy-02.txt | |||
---|---|---|---|---|
Network Working Group J. Lennox | Network Working Group J. Lennox | |||
Internet-Draft Vidyo | Internet-Draft Vidyo | |||
Intended status: Informational K. Gross | Intended status: Informational K. Gross | |||
Expires: August 18, 2014 AVA | Expires: December 29, 2014 AVA | |||
S. Nandakumar | S. Nandakumar | |||
G. Salgueiro | G. Salgueiro | |||
Cisco Systems | Cisco Systems | |||
B. Burman | B. Burman | |||
Ericsson | Ericsson | |||
February 14, 2014 | June 27, 2014 | |||
A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport | A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport | |||
Protocol (RTP) Sources | Protocol (RTP) Sources | |||
draft-ietf-avtext-rtp-grouping-taxonomy-01 | draft-ietf-avtext-rtp-grouping-taxonomy-02 | |||
Abstract | Abstract | |||
The terminology about, and associations among, Real-Time Transport | The terminology about, and associations among, Real-Time Transport | |||
Protocol (RTP) sources can be complex and somewhat opaque. This | Protocol (RTP) sources can be complex and somewhat opaque. This | |||
document describes a number of existing and proposed relationships | document describes a number of existing and proposed relationships | |||
among RTP sources, and attempts to define common terminology for | among RTP sources, and attempts to define common terminology for | |||
discussing protocol entities and their relationships. | discussing protocol entities and their relationships. | |||
Status of This Memo | Status of This Memo | |||
skipping to change at page 1, line 41 | skipping to change at page 1, line 41 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on August 18, 2014. | This Internet-Draft will expire on December 29, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2014 IETF Trust and the persons identified as the | Copyright (c) 2014 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.1. Media Chain . . . . . . . . . . . . . . . . . . . . . . . 4 | 2.1. Media Chain . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.1.1. Physical Stimulus . . . . . . . . . . . . . . . . . . 8 | 2.1.1. Physical Stimulus . . . . . . . . . . . . . . . . . . 8 | |||
2.1.2. Media Capture . . . . . . . . . . . . . . . . . . . . 8 | 2.1.2. Media Capture . . . . . . . . . . . . . . . . . . . . 8 | |||
2.1.3. Raw Stream . . . . . . . . . . . . . . . . . . . . . 8 | 2.1.3. Raw Stream . . . . . . . . . . . . . . . . . . . . . 8 | |||
2.1.4. Media Source . . . . . . . . . . . . . . . . . . . . 9 | 2.1.4. Media Source . . . . . . . . . . . . . . . . . . . . 8 | |||
2.1.5. Source Stream . . . . . . . . . . . . . . . . . . . . 10 | 2.1.5. Source Stream . . . . . . . . . . . . . . . . . . . . 9 | |||
2.1.6. Media Encoder . . . . . . . . . . . . . . . . . . . . 10 | 2.1.6. Media Encoder . . . . . . . . . . . . . . . . . . . . 9 | |||
2.1.7. Encoded Stream . . . . . . . . . . . . . . . . . . . 11 | 2.1.7. Encoded Stream . . . . . . . . . . . . . . . . . . . 10 | |||
2.1.8. Dependent Stream . . . . . . . . . . . . . . . . . . 11 | 2.1.8. Dependent Stream . . . . . . . . . . . . . . . . . . 11 | |||
2.1.9. Media Packetizer . . . . . . . . . . . . . . . . . . 12 | 2.1.9. Media Packetizer . . . . . . . . . . . . . . . . . . 11 | |||
2.1.10. Packet Stream . . . . . . . . . . . . . . . . . . . . 12 | 2.1.10. RTP Stream . . . . . . . . . . . . . . . . . . . . . 11 | |||
2.1.11. Media Redundancy . . . . . . . . . . . . . . . . . . 13 | 2.1.11. Media Redundancy . . . . . . . . . . . . . . . . . . 12 | |||
2.1.12. Redundancy Packet Stream . . . . . . . . . . . . . . 14 | 2.1.12. Redundancy RTP Stream . . . . . . . . . . . . . . . . 12 | |||
2.1.13. Media Transport . . . . . . . . . . . . . . . . . . . 14 | 2.1.13. Media Transport . . . . . . . . . . . . . . . . . . . 13 | |||
2.1.14. Received Packet Stream . . . . . . . . . . . . . . . 16 | 2.1.14. Media Transport Sender . . . . . . . . . . . . . . . 14 | |||
2.1.15. Received Redundancy Packet Stream . . . . . . . . . . 16 | 2.1.15. Sent RTP Stream . . . . . . . . . . . . . . . . . . . 14 | |||
2.1.16. Media Repair . . . . . . . . . . . . . . . . . . . . 16 | 2.1.16. Network Transport . . . . . . . . . . . . . . . . . . 14 | |||
2.1.17. Repaired Packet Stream . . . . . . . . . . . . . . . 17 | 2.1.17. Transported RTP Stream . . . . . . . . . . . . . . . 14 | |||
2.1.18. Media Depacketizer . . . . . . . . . . . . . . . . . 17 | 2.1.18. Media Transport Receiver . . . . . . . . . . . . . . 14 | |||
2.1.19. Received Encoded Stream . . . . . . . . . . . . . . . 17 | 2.1.19. Received RTP Stream . . . . . . . . . . . . . . . . . 15 | |||
2.1.20. Media Decoder . . . . . . . . . . . . . . . . . . . . 17 | 2.1.20. Received Redundancy RTP Stream . . . . . . . . . . . 15 | |||
2.1.21. Received Source Stream . . . . . . . . . . . . . . . 18 | 2.1.21. Media Repair . . . . . . . . . . . . . . . . . . . . 15 | |||
2.1.22. Media Sink . . . . . . . . . . . . . . . . . . . . . 18 | 2.1.22. Repaired RTP Stream . . . . . . . . . . . . . . . . . 15 | |||
2.1.23. Received Raw Stream . . . . . . . . . . . . . . . . . 18 | 2.1.23. Media Depacketizer . . . . . . . . . . . . . . . . . 15 | |||
2.1.24. Media Render . . . . . . . . . . . . . . . . . . . . 18 | 2.1.24. Received Encoded Stream . . . . . . . . . . . . . . . 16 | |||
2.2. Communication Entities . . . . . . . . . . . . . . . . . 19 | 2.1.25. Media Decoder . . . . . . . . . . . . . . . . . . . . 16 | |||
2.2.1. End Point . . . . . . . . . . . . . . . . . . . . . . 19 | 2.1.26. Received Source Stream . . . . . . . . . . . . . . . 16 | |||
2.2.2. RTP Session . . . . . . . . . . . . . . . . . . . . . 19 | 2.1.27. Media Sink . . . . . . . . . . . . . . . . . . . . . 16 | |||
2.2.3. Participant . . . . . . . . . . . . . . . . . . . . . 20 | 2.1.28. Received Raw Stream . . . . . . . . . . . . . . . . . 17 | |||
2.1.29. Media Render . . . . . . . . . . . . . . . . . . . . 17 | ||||
2.2. Communication Entities . . . . . . . . . . . . . . . . . 17 | ||||
2.2.1. End Point . . . . . . . . . . . . . . . . . . . . . . 18 | ||||
2.2.2. RTP Session . . . . . . . . . . . . . . . . . . . . . 18 | ||||
2.2.3. Participant . . . . . . . . . . . . . . . . . . . . . 19 | ||||
2.2.4. Multimedia Session . . . . . . . . . . . . . . . . . 20 | 2.2.4. Multimedia Session . . . . . . . . . . . . . . . . . 20 | |||
2.2.5. Communication Session . . . . . . . . . . . . . . . . 21 | 2.2.5. Communication Session . . . . . . . . . . . . . . . . 20 | |||
3. Relations at Different Levels . . . . . . . . . . . . . . . . 22 | ||||
3.1. Media Source Relations . . . . . . . . . . . . . . . . . 22 | 3. Relations at Different Levels . . . . . . . . . . . . . . . . 21 | |||
3.1.1. Synchronization Context . . . . . . . . . . . . . . . 22 | 3.1. Synchronization Context . . . . . . . . . . . . . . . . . 22 | |||
3.1.2. End Point . . . . . . . . . . . . . . . . . . . . . . 23 | 3.1.1. RTCP CNAME . . . . . . . . . . . . . . . . . . . . . 22 | |||
3.1.3. Participant . . . . . . . . . . . . . . . . . . . . . 24 | 3.1.2. Clock Source Signaling . . . . . . . . . . . . . . . 22 | |||
3.1.4. WebRTC MediaStream . . . . . . . . . . . . . . . . . 24 | 3.1.3. Implicitly via RtcMediaStream . . . . . . . . . . . . 22 | |||
3.2. Packetization Time Relations . . . . . . . . . . . . . . 24 | 3.1.4. Explicitly via SDP Mechanisms . . . . . . . . . . . . 22 | |||
3.2.1. Single and Multi-Session Transmission of SVC . . . . 24 | 3.2. End Point . . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
3.2.2. Multi-Channel Audio . . . . . . . . . . . . . . . . . 25 | 3.3. Participant . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
3.2.3. Redundancy Format . . . . . . . . . . . . . . . . . . 25 | 3.4. RtcMediaStream . . . . . . . . . . . . . . . . . . . . . 23 | |||
3.3. Packet Stream Relations . . . . . . . . . . . . . . . . . 26 | 3.5. Single- and Multi-Session Transmission of SVC . . . . . . 23 | |||
3.3.1. Simulcast . . . . . . . . . . . . . . . . . . . . . . 27 | 3.6. Multi-Channel Audio . . . . . . . . . . . . . . . . . . . 24 | |||
3.3.2. Layered Multi-Stream . . . . . . . . . . . . . . . . 28 | 3.7. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
3.3.3. Robustness and Repair . . . . . . . . . . . . . . . . 29 | 3.8. Layered Multi-Stream . . . . . . . . . . . . . . . . . . 25 | |||
3.3.4. Packet Stream Separation . . . . . . . . . . . . . . 32 | 3.9. RTP Stream Duplication . . . . . . . . . . . . . . . . . 27 | |||
3.4. Multiple RTP Sessions over one Media Transport . . . . . 33 | 3.10. Redundancy Format . . . . . . . . . . . . . . . . . . . . 27 | |||
4. Topologies and Communication Entities . . . . . . . . . . . . 33 | 3.11. RTP Retransmission . . . . . . . . . . . . . . . . . . . 28 | |||
4.1. Point-to-Point Communication . . . . . . . . . . . . . . 33 | 3.12. Forward Error Correction . . . . . . . . . . . . . . . . 29 | |||
4.2. Centralized Conferencing . . . . . . . . . . . . . . . . 34 | 3.13. RTP Stream Separation . . . . . . . . . . . . . . . . . . 31 | |||
4.3. Full Mesh Conferencing . . . . . . . . . . . . . . . . . 37 | 3.14. Multiple RTP Sessions over one Media Transport . . . . . 32 | |||
4.4. Source-Specific Multicast . . . . . . . . . . . . . . . . 39 | 4. Mapping from Existing Terms . . . . . . . . . . . . . . . . . 32 | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 41 | 4.1. Audio Capture . . . . . . . . . . . . . . . . . . . . . . 32 | |||
6. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 41 | 4.2. Capture Device . . . . . . . . . . . . . . . . . . . . . 32 | |||
7. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 41 | 4.3. Capture Encoding . . . . . . . . . . . . . . . . . . . . 32 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41 | 4.4. Capture Scene . . . . . . . . . . . . . . . . . . . . . . 33 | |||
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 41 | 4.5. Endpoint . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
9.1. Normative References . . . . . . . . . . . . . . . . . . 42 | 4.6. Individual Encoding . . . . . . . . . . . . . . . . . . . 33 | |||
9.2. Informative References . . . . . . . . . . . . . . . . . 42 | 4.7. Multipoint Control Unit (MCU) . . . . . . . . . . . . . . 33 | |||
Appendix A. Changes From Earlier Versions . . . . . . . . . . . 44 | 4.8. Media Capture . . . . . . . . . . . . . . . . . . . . . . 33 | |||
A.1. Modifications Between WG Version -00 and -03 . . . . . . 44 | 4.9. Media Consumer . . . . . . . . . . . . . . . . . . . . . 33 | |||
A.2. Modifications Between Version -02 and -03 . . . . . . . . 44 | 4.10. Media Description . . . . . . . . . . . . . . . . . . . . 33 | |||
A.3. Modifications Between Version -01 and -02 . . . . . . . . 44 | 4.11. Media Provider . . . . . . . . . . . . . . . . . . . . . 34 | |||
A.4. Modifications Between Version -00 and -01 . . . . . . . . 44 | 4.12. Media Stream . . . . . . . . . . . . . . . . . . . . . . 34 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 44 | 4.13. Multimedia Session . . . . . . . . . . . . . . . . . . . 34 | |||
4.14. Recording Device . . . . . . . . . . . . . . . . . . . . 34 | ||||
4.15. RtcMediaStream . . . . . . . . . . . . . . . . . . . . . 34 | ||||
4.16. RtcMediaStreamTrack . . . . . . . . . . . . . . . . . . . 35 | ||||
4.17. RTP Sender . . . . . . . . . . . . . . . . . . . . . . . 35 | ||||
4.18. RTP Session . . . . . . . . . . . . . . . . . . . . . . . 35 | ||||
4.19. SSRC . . . . . . . . . . . . . . . . . . . . . . . . . . 35 | ||||
4.20. Stream . . . . . . . . . . . . . . . . . . . . . . . . . 35 | ||||
4.21. Video Capture . . . . . . . . . . . . . . . . . . . . . . 35 | ||||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 35 | ||||
6. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 36 | ||||
7. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 36 | ||||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36 | ||||
9. Informative References . . . . . . . . . . . . . . . . . . . 36 | ||||
Appendix A. Changes From Earlier Versions . . . . . . . . . . . 38 | ||||
A.1. Modifications Between WG Version -01 and -02 . . . . . . 38 | ||||
A.2. Modifications Between WG Version -00 and -01 . . . . . . 39 | ||||
A.3. Modifications Between Version -02 and -03 . . . . . . . . 40 | ||||
A.4. Modifications Between Version -01 and -02 . . . . . . . . 40 | ||||
A.5. Modifications Between Version -00 and -01 . . . . . . . . 40 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 | ||||
1. Introduction | 1. Introduction | |||
The existing taxonomy of sources in RTP is often regarded as | The existing taxonomy of sources in RTP is often regarded as | |||
confusing and inconsistent. Consequently, a deep understanding of | confusing and inconsistent. Consequently, a deep understanding of | |||
how the different terms relate to each other becomes a real | how the different terms relate to each other becomes a real | |||
challenge. Frequently cited examples of this confusion are (1) how | challenge. Frequently cited examples of this confusion are (1) how | |||
different protocols that make use of RTP use the same terms to | different protocols that make use of RTP use the same terms to | |||
signify different things and (2) how the complexities addressed at | signify different things and (2) how the complexities addressed at | |||
one layer are often glossed over or ignored at another. | one layer are often glossed over or ignored at another. | |||
skipping to change at page 4, line 41 | skipping to change at page 5, line 16 | |||
the streams in some way. | the streams in some way. | |||
The examples below are basic, and it is important to keep in mind | The examples below are basic, and it is important to keep in mind | |||
that this conceptual model enables more complex usages, some of which | that this conceptual model enables more complex usages, some of which | |||
are discussed further in later sections of this document. In general, | are discussed further in later sections of this document. In general, | |||
the following applies to this model: | the following applies to this model: | |||
o A transformation may have zero or more inputs and one or more | o A transformation may have zero or more inputs and one or more | |||
outputs. | outputs. | |||
o A Stream is of some type. | o A stream is of some type. | |||
o A Stream has one source transformation and one or more sink | o A stream has one source transformation and one or more sink | |||
transformation (with the exception of Physical Stimulus | transformations (with the exception of Physical Stimulus | |||
(Section 2.1.1) that can have no source or sink transformation). | (Section 2.1.1), which may lack a source or sink transformation). | |||
o Streams can be forwarded from a transformation output to any | o Streams can be forwarded from a transformation output to any | |||
number of inputs on other transformations that support that type. | number of inputs on other transformations that support that type. | |||
o If the output of a transformation is sent to multiple | o If the output of a transformation is sent to multiple | |||
transformations, those streams will be identical; it takes a | transformations, those streams will be identical; it takes a | |||
transformation to make them different. | transformation to make them different. | |||
o There are no formal limitations on how streams are connected to | o There are no formal limitations on how streams are connected to | |||
transformations; this may include loops if required by a | transformations; this may include loops if required by a | |||
skipping to change at page 6, line 5 | skipping to change at page 6, line 5 | |||
It is also important to remember that this is a conceptual model. | It is also important to remember that this is a conceptual model. | |||
Thus real-world implementations may look different and have different | Thus real-world implementations may look different and have different | |||
structure. | structure. | |||
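The model rules listed earlier in this section (typed streams, one source transformation per stream, fan-out to any number of sinks that accept the type, identical copies on every branch) can be sketched as a small typed graph. This is a minimal sketch under stated assumptions: the classes `Transformation` and `Stream` and the string type tags such as "raw" and "source" are hypothetical illustrations, not definitions from the draft.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Transformation:
    """Hypothetical media chain node, e.g. "Media Encoder"."""
    name: str
    accepts: set[str]  # stream types this node supports as input
    inputs: list["Stream"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Stream:
    """Hypothetical typed stream: exactly one source, any number of sinks."""
    stream_type: str                  # e.g. "raw", "source", "encoded"
    source: Optional[Transformation]  # None only for Physical Stimulus
    sinks: list[Transformation] = field(default_factory=list, repr=False)

    def connect(self, sink: Transformation) -> None:
        # A stream may only be forwarded to inputs that support its type.
        if self.stream_type not in sink.accepts:
            raise TypeError(f"{sink.name} does not accept {self.stream_type!r}")
        # Every sink receives the same stream object, i.e. identical copies.
        self.sinks.append(sink)
        sink.inputs.append(self)


capture = Transformation("Media Capture", accepts={"stimulus"})
stimulus = Stream("stimulus", source=None)  # Physical Stimulus: no source
stimulus.connect(capture)

raw = Stream("raw", source=capture)
media_source = Transformation("Media Source", accepts={"raw"})
raw.connect(media_source)

encoder = Transformation("Media Encoder", accepts={"source"})
try:
    raw.connect(encoder)  # type mismatch: a Raw Stream cannot feed an encoder
except TypeError as err:
    print(err)  # Media Encoder does not accept 'raw'
```

Identity comparison (`eq=False`) is used deliberately: streams and transformations reference each other, so structural equality or a full `repr` would recurse through the graph.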
To provide a basic understanding of the relationships in the chain, | To provide a basic understanding of the relationships in the chain, | |||
we first introduce the concepts for the sender side (Figure 1). This | we first introduce the concepts for the sender side (Figure 1). This | |||
covers everything from the physical stimulus until media packets are | covers everything from the physical stimulus until media packets are | |||
emitted onto the network. | emitted onto the network. | |||
Physical Stimulus | Physical Stimulus | |||
| | | | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Capture | | | Media Capture | | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
Raw Stream | Raw Stream | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Source |<- Synchronization Timing | | Media Source |<- Synchronization Timing | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
Source Stream | Source Stream | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Encoder | | | Media Encoder | | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
Encoded Stream +-----------+ | Encoded Stream +-----------+ | |||
V | V | V | V | |||
+--------------------+ | +--------------------+ | +--------------------+ | +--------------------+ | |||
| Media Packetizer | | | Media Redundancy | | | Media Packetizer | | | Media Redundancy | | |||
+--------------------+ | +--------------------+ | +--------------------+ | +--------------------+ | |||
| | | | | | | | |||
+------------+ Redundancy Packet Stream | +------------+ Redundancy RTP Stream | |||
Source Packet Stream | | Source RTP Stream | | |||
V V | V V | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| Media Transport | | Media Transport | | | Media Transport | | Media Transport | | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
Figure 1: Sender Side Concepts in the Media Chain | Figure 1: Sender Side Concepts in the Media Chain | |||
In Figure 1 we have included a branched chain to cover the concepts | In Figure 1 we have included a branched chain to cover the concepts | |||
for using redundancy to improve the reliability of the transport. | for using redundancy to improve the reliability of the transport. | |||
The Media Transport concept is an aggregate that is decomposed below | The Media Transport concept is an aggregate that is decomposed below | |||
in Section 2.1.13. | in Section 2.1.13. | |||
Below we review a receiver media chain (Figure 2) matching the sender | Below we review a receiver media chain (Figure 2) matching the sender | |||
side to look at the inverse transformations and their attempts to | side to look at the inverse transformations and their attempts to | |||
skipping to change at page 7, line 9 | skipping to change at page 7, line 9 | |||
out the Media Decoder are in many cases not the same as the | out the Media Decoder are in many cases not the same as the | |||
corresponding ones on the sender side, so they are prefixed with | corresponding ones on the sender side, so they are prefixed with | |||
"Received" to denote a potentially modified version. They differ | "Received" to denote a potentially modified version. They differ | |||
because some transformations are irreversible. For example, lossy | because some transformations are irreversible. For example, lossy | |||
source coding in the Media Encoder prevents the Source Stream output | source coding in the Media Encoder prevents the Source Stream output | |||
by the Media Decoder from being identical to the one fed into the | by the Media Decoder from being identical to the one fed into the | |||
Media Encoder. Other reasons include packet loss, or late loss | Media Encoder. Other reasons include packet loss, or late loss | |||
(packets arriving too late to be used), in the Media Transport | (packets arriving too late to be used), in the Media Transport | |||
transformation that even Media Repair, if used, fails to repair. | transformation that even Media Repair, if used, fails to repair. | |||
Note that some transformations are not always present; for example, | Note that some transformations are not always present; for example, | |||
Media Repair cannot operate without Redundancy Packet Streams. | Media Repair cannot operate without Redundancy RTP Streams. | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| Media Transport | | Media Transport | | | Media Transport | | Media Transport | | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| | | | | | |||
Received Packet Stream Received Redundancy PS | Received RTP Stream Received Redundancy RTP Stream | |||
| | | | | | |||
| +-------------------+ | | +-------------------+ | |||
V V | V V | |||
+--------------------+ | +--------------------+ | |||
| Media Repair | | | Media Repair | | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
Repaired Packet Stream | Repaired RTP Stream | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Depacketizer | | | Media Depacketizer | | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
Received Encoded Stream | Received Encoded Stream | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Decoder | | | Media Decoder | | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
Received Source Stream | Received Source Stream | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Sink |--> Synchronization Information | | Media Sink |--> Synchronization Information | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
Received Raw Stream | Received Raw Stream | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Renderer | | | Media Renderer | | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
V | V | |||
Physical Stimulus | Physical Stimulus | |||
Figure 2: Receiver Side Concepts of the Media Chain | Figure 2: Receiver Side Concepts of the Media Chain | |||
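The dependency of Media Repair on a redundancy stream can be illustrated with a toy single-parity scheme. This is an assumed example only; the draft does not define any particular redundancy format. In this sketch, the sender-side Media Redundancy emits one XOR parity payload per block of equal-length source payloads. On the receiver side, Media Repair can then rebuild at most one missing payload per block. Without the parity payload, received packets simply pass through unrepaired. The function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length payloads byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(block: list[bytes]) -> bytes:
    """Toy Media Redundancy: one XOR parity payload per block of payloads."""
    return reduce(xor_bytes, block)


def media_repair(received: list[Optional[bytes]],
                 parity: Optional[bytes]) -> list[Optional[bytes]]:
    """Toy Media Repair: recover a single lost payload (None) from the parity.

    With no redundancy payload, or with more than one loss in the block,
    the received packets pass through unchanged.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if parity is None or len(missing) != 1:
        return list(received)
    present = [p for p in received if p is not None]
    # parity XOR all surviving payloads leaves exactly the lost payload
    recovered = reduce(xor_bytes, present, parity)
    repaired = list(received)
    repaired[missing[0]] = recovered
    return repaired


source_block = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(source_block)      # carried by the redundancy stream
received = [b"AAAA", None, b"CCCC"]     # one packet lost in transport
print(media_repair(received, parity))   # [b'AAAA', b'BBBB', b'CCCC']
print(media_repair(received, None))     # [b'AAAA', None, b'CCCC']
```

A real scheme also needs to signal which source packets each parity payload protects and to handle unequal payload lengths. The sketch assumes both are already settled.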
2.1.1. Physical Stimulus | 2.1.1. Physical Stimulus | |||
The physical stimulus is a physical event that can be measured and | The physical stimulus is a physical event that can be measured and | |||
converted to digital form by an appropriate sensor or transducer. | converted to digital form by an appropriate sensor or transducer. | |||
This includes sound waves making up audio, photons in a visible | This includes sound waves making up audio, photons in a visible | |||
light field, or other excitations of or interactions with sensors, | light field, or other excitations of or interactions with sensors, | |||
such as keystrokes on a keyboard. | such as keystrokes on a keyboard. | |||
skipping to change at page 8, line 26 | skipping to change at page 8, line 26 | |||
(Section 2.1.1) into digital Media using an appropriate sensor or | (Section 2.1.1) into digital Media using an appropriate sensor or | |||
transducer. The Media Capture performs a digital sampling of the | transducer. The Media Capture performs a digital sampling of the | |||
physical stimulus, usually periodically, and outputs this in some | physical stimulus, usually periodically, and outputs this in some | |||
representation as a Raw Stream (Section 2.1.3). Because this data | representation as a Raw Stream (Section 2.1.3). Because this data | |||
is sampled periodically, or at least consists of timed asynchronous | is sampled periodically, or at least consists of timed asynchronous | |||
events, it forms a stream of media data. The Media Capture is normally | events, it forms a stream of media data. The Media Capture is normally | |||
instantiated in some type of device, i.e., a media capture device. | instantiated in some type of device, i.e., a media capture device. | |||
Examples of different types of media capturing devices are digital | Examples of different types of media capturing devices are digital | |||
cameras, microphones connected to A/D converters, or keyboards. | cameras, microphones connected to A/D converters, or keyboards. | |||
Alternate usages: | ||||
o The CLUE WG uses the term "Capture Device" to identify a physical | ||||
capture device. | ||||
o WebRTC WG uses the term "Recording Device" to refer to the locally | ||||
available capture devices in an end-system. | ||||
Characteristics: | Characteristics: | |||
o A Media Capture is identified either by hardware/manufacturer ID | o A Media Capture is identified either by hardware/manufacturer ID | |||
or via a session-scoped device identifier as mandated by the | or via a session-scoped device identifier as mandated by the | |||
application usage. | application usage. | |||
o A Media Capture can generate an Encoded Stream (Section 2.1.7) if | o A Media Capture can generate an Encoded Stream (Section 2.1.7) if | |||
the capture device supports such a configuration. | the capture device supports such a configuration. | |||
2.1.3. Raw Stream | 2.1.3. Raw Stream | |||
skipping to change at page 9, line 21 | skipping to change at page 9, line 12 | |||
output has been synchronized with some reference clock, even if just | output has been synchronized with some reference clock, even if just | |||
a system-local wall clock. | a system-local wall clock. | |||
The output can be of different types. One type is directly | The output can be of different types. One type is directly | |||
associated with a particular Media Capture's Raw Stream. Others are | associated with a particular Media Capture's Raw Stream. Others are | |||
more conceptual sources, like an audio mix of multiple Raw Streams | more conceptual sources, like an audio mix of multiple Raw Streams | |||
(Figure 3), a mix of the three loudest inputs in terms of speech | (Figure 3), a mix of the three loudest inputs in terms of speech | |||
activity, or a selection of a particular video based on the current | activity, or a selection of a particular video based on the current | |||
speaker, i.e., sources typically based on other Media Sources. | speaker, i.e., sources typically based on other Media Sources. | |||
Raw Raw Raw | Raw Raw Raw | |||
Stream Stream Stream | Stream Stream Stream | |||
| | | | | | | | |||
V V V | V V V | |||
+--------------------------+ | +--------------------------+ | |||
| Media Source |<-- Reference Clock | | Media Source |<-- Reference Clock | |||
| Mixer | | | Mixer | | |||
+--------------------------+ | +--------------------------+ | |||
| | | | |||
V | V | |||
Source Stream | Source Stream | |||
Figure 3: Conceptual Media Source in form of Audio Mixer | Figure 3: Conceptual Media Source in form of Audio Mixer | |||
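The audio-mixer form of a conceptual Media Source (Figure 3) can be sketched as sample-wise summation of time-aligned Raw Streams. The sketch also covers the "three loudest inputs" selection mentioned above. It is a minimal sketch that assumes 16-bit signed PCM frames of equal length, already aligned to the same reference clock. Loudness is approximated by frame energy, which is a simplification and not a real speech-activity detector. The function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix(frames: list[list[int]]) -> list[int]:
    """Sum time-aligned int16 frames sample by sample, saturating on overflow."""
    return [max(INT16_MIN, min(INT16_MAX, sum(samples)))
            for samples in zip(*frames)]


def energy(frame: list[int]) -> int:
    """Crude loudness proxy: sum of squared samples."""
    return sum(s * s for s in frame)


def mix_loudest(frames: list[list[int]], n: int = 3) -> list[int]:
    """Mix only the n input frames with the highest energy."""
    loudest = sorted(frames, key=energy, reverse=True)[:n]
    return mix(loudest)


inputs = [
    [100, 200, 300],
    [1000, -1000, 500],
    [-50, 0, 50],
    [1, 1, 1],  # quietest input, dropped by mix_loudest
]
print(mix(inputs))                                 # [1051, -799, 851]
print(mix_loudest(inputs))                         # [1050, -800, 850]
print(mix([[30000, -30000], [10000, -10000]]))     # [32767, -32768]
```

Saturating rather than wrapping on overflow avoids the loud artifacts that integer wrap-around would produce. Production mixers usually apply gain control instead of relying on clipping.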
The CLUE WG uses the term "Media Capture" for this purpose. A CLUE | ||||
Media Capture is identified via indexed notation. The terms Audio | ||||
Capture and Video Capture are used to identify Audio Sources and | ||||
Video Sources respectively. Concepts such as "Capture Scene", | ||||
"Capture Scene Entry" and "Capture" provide a flexible framework to | ||||
represent media captured spanning spatial regions. | ||||
The WebRTC WG defines the term "RtcMediaStreamTrack" to refer to a | ||||
Media Source. An "RtcMediaStreamTrack" is identified by the ID | ||||
attribute. | ||||
Typically a Media Source is mapped to a single m=line via the Session | ||||
Description Protocol (SDP) [RFC4566] unless mechanisms such as | ||||
Source-Specific attributes are in place [RFC5576]. In the latter | ||||
cases, an m=line can represent either multiple Media Sources, | ||||
multiple Packet Streams (Section 2.1.10), or both. | ||||
Characteristics: | Characteristics: | |||
o At any point, it can represent a physically captured source or a | o At any point, it can represent a physically captured source or a | |||
conceptual source. | conceptual source. | |||
2.1.5. Source Stream | 2.1.5. Source Stream | |||
A time-progressing stream of digital samples that has been | A time-progressing stream of digital samples that has been | |||
synchronized with a reference clock and comes from a particular Media | synchronized with a reference clock and comes from a particular Media | |||
Source (Section 2.1.4). | Source (Section 2.1.4). | |||
skipping to change at page 10, line 38 | skipping to change at page 10, line 13 | |||
parameters. | parameters. | |||
Scalable Media Encoders need special mention as they produce | Scalable Media Encoders need special mention as they produce | |||
multiple outputs that are potentially of different types. A scalable | multiple outputs that are potentially of different types. A scalable | |||
Media Encoder takes one input Source Stream and encodes it into | Media Encoder takes one input Source Stream and encodes it into | |||
multiple output streams of two different types: at least one Encoded | multiple output streams of two different types: at least one Encoded | |||
Stream that is independently decodable, and one or more Dependent | Stream that is independently decodable, and one or more Dependent | |||
Streams (Section 2.1.8) that require at least one Encoded Stream and | Streams (Section 2.1.8) that require at least one Encoded Stream and | |||
zero or more Dependent Streams in order to be decoded. A Dependent | zero or more Dependent Streams in order to be decoded. A Dependent | |||
Stream's dependency is one of the grouping relations this document | Stream's dependency is one of the grouping relations this document | |||
discusses further in Section 3.3.2. | discusses further in Section 3.8. | |||
Source Stream | Source Stream | |||
| | | | |||
V | V | |||
+--------------------------+ | +--------------------------+ | |||
| Scalable Media Encoder | | | Scalable Media Encoder | | |||
+--------------------------+ | +--------------------------+ | |||
| | ... | | | | ... | | |||
V V V | V V V | |||
Encoded Dependent Dependent | Encoded Dependent Dependent | |||
Stream Stream Stream | Stream Stream Stream | |||
Figure 4: Scalable Media Encoder Input and Outputs | Figure 4: Scalable Media Encoder Input and Outputs | |||
There are also other variants of encoders, like so-called Multiple | There are also other variants of encoders, like so-called Multiple | |||
Description Coding (MDC). Such Media Encoders produce multiple | Description Coding (MDC). Such Media Encoders produce multiple | |||
independent and thus individually decodable Encoded Streams that can be | independent and thus individually decodable Encoded Streams that can be | |||
combined into a Received Source Stream that is a | combined into a Received Source Stream that is a | |||
better representation of the original Source Stream than using only a | better representation of the original Source Stream than using only a | |||
single Encoded Stream. | single Encoded Stream. | |||
Alternate usages: | ||||
o Within the SDP usage, an SDP media description (m=line) describes | ||||
part of the necessary configuration required for encoding | ||||
purposes. | ||||
o CLUE's "Capture Encoding" provides specific encoding configuration | ||||
for this purpose. | ||||
Characteristics: | Characteristics: | |||
o A Media Source can be multiply encoded by different Media Encoders | o A Media Source can be multiply encoded by different Media Encoders | |||
to provide various encoded representations. | to provide various encoded representations. | |||
2.1.7. Encoded Stream | 2.1.7. Encoded Stream | |||
A stream of time synchronized encoded media that can be independently | A stream of time synchronized encoded media that can be independently | |||
decoded. | decoded. | |||
skipping to change at page 12, line 10 | skipping to change at page 11, line 22 | |||
o Each Dependent Stream has a set of dependencies. These | o Each Dependent Stream has a set of dependencies. These | |||
dependencies must be understood by the parties in a multi-media | dependencies must be understood by the parties in a multi-media | |||
session that intend to use a Dependent Stream. | session that intend to use a Dependent Stream. | |||
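The dependency relation between Encoded and Dependent Streams can be illustrated with a small check of whether a stream is decodable from what a receiver actually has. This is a minimal sketch. The stream names and the three-layer dependency table are hypothetical, and the sketch assumes the dependency graph is acyclic.

```python
from typing import Dict, Set

# Hypothetical dependency table: each stream maps to the streams it
# directly requires. Every stream must appear as a key. An Encoded
# Stream (independently decodable) has no dependencies.
DEPENDENCIES: Dict[str, Set[str]] = {
    "base": set(),             # Encoded Stream
    "enh1": {"base"},          # Dependent Stream on the base layer
    "enh2": {"base", "enh1"},  # Dependent Stream on base and enh1
}


def is_decodable(stream: str, available: Set[str],
                 deps: Dict[str, Set[str]] = DEPENDENCIES) -> bool:
    """Return True if `stream` and all of its transitive dependencies
    are present in `available` (assumes an acyclic dependency graph)."""
    if stream not in available:
        return False
    return all(is_decodable(d, available, deps) for d in deps[stream])
```

For example, `is_decodable("enh2", {"base", "enh2"})` is False because `enh1` is missing. Losing an Encoded Stream makes every Dependent Stream built on it undecodable.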
2.1.9. Media Packetizer | 2.1.9. Media Packetizer | |||
The transformation that takes one or more Encoded Streams | The transformation that takes one or more Encoded Streams | |||
(Section 2.1.7) or Dependent Streams (Section 2.1.8), puts their | (Section 2.1.7) or Dependent Streams (Section 2.1.8), puts their | |||
content into one or more sequences of packets, normally RTP packets, | content into one or more sequences of packets, normally RTP packets, | |||
and outputs Source Packet Streams (Section 2.1.10). This step | and outputs Source RTP Streams (Section 2.1.10). This step | |||
includes generating both RTP payloads and RTP packets. | includes generating both RTP payloads and RTP packets. | |||
The Media Packetizer can use multiple inputs when producing a single | The Media Packetizer can use multiple inputs when producing a single | |||
Packet Stream. One such example is SST packetization when using SVC | RTP Stream. One such example is SST packetization when using SVC | |||
(Section 3.2.1). | (Section 3.5). | |||
The Media Packetizer can also produce multiple Packet Streams, for | The Media Packetizer can also produce multiple RTP Streams, for | |||
example when Encoded and/or Dependent Streams are distributed over | example when Encoded and/or Dependent Streams are distributed over | |||
multiple Packet Streams. One example of this is MST packetization | multiple RTP Streams. One example of this is MST packetization when | |||
when using SVC (Section 3.2.1). | using SVC (Section 3.5). | |||
Alternate usages: | ||||
o An RTP sender is part of the Media Packetizer. | ||||
Characteristics: | Characteristics: | |||
o The Media Packetizer selects which Synchronization source(s) | o The Media Packetizer selects which Synchronization source(s) | |||
(SSRC) [RFC3550], in which RTP sessions, are used. | (SSRC) [RFC3550], in which RTP sessions, are used. | |||
o The Media Packetizer can combine multiple Encoded or Dependent | o The Media Packetizer can combine multiple Encoded or Dependent | |||
Streams into one or more Packet Streams. | Streams into one or more RTP Streams. | |||
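The SST and MST packetization cases described above differ in how a Media Packetizer maps its inputs onto RTP Streams. The following minimal sketch shows that mapping.

It makes several assumptions:

* The session labels and SSRC values are arbitrary, hypothetical placeholders.
* In SST, all layers share a single RTP Stream.
* In MST, each layer is carried in its own RTP Stream in its own RTP session.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RtpStreamId:
    session: str  # hypothetical RTP session label
    ssrc: int     # 32-bit SSRC (placeholder values below)


def packetizer_mapping(layers: List[str], mode: str,
                       base_ssrc: int = 0x1000) -> Dict[str, RtpStreamId]:
    """Map each Encoded/Dependent Stream (layer) to the RTP Stream that
    carries it. SST: every layer goes into one RTP Stream. MST: each
    layer gets its own RTP Stream in a separate RTP session."""
    if mode == "SST":
        stream = RtpStreamId("session-0", base_ssrc)
        return {layer: stream for layer in layers}
    if mode == "MST":
        return {layer: RtpStreamId(f"session-{i}", base_ssrc + i)
                for i, layer in enumerate(layers)}
    raise ValueError(f"unknown mode: {mode}")
```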
2.1.10. Packet Stream | 2.1.10. RTP Stream | |||
A stream of RTP packets containing media data, source or redundant. | A stream of RTP packets containing media data, source or redundant. | |||
The Packet Stream is identified by an SSRC belonging to a particular | The RTP Stream is identified by an SSRC belonging to a particular RTP | |||
RTP session. The RTP session is identified as discussed in | session. The RTP session is identified as discussed in | |||
Section 2.2.2. | Section 2.2.2. | |||
A Source Packet Stream is a packet stream containing at least some | A Source RTP Stream is an RTP Stream containing at least some content | |||
content from an Encoded Stream. Source material is any media | from an Encoded Stream. Source material is any media material that | |||
material that is produced for transport over RTP without any | is produced for transport over RTP without any additional redundancy | |||
additional redundancy applied to cope with network transport losses. | applied to cope with network transport losses. Compare this with the | |||
Compare this with the Redundancy Packet Stream (Section 2.1.12). | Redundancy RTP Stream (Section 2.1.12). | |||
Alternate usages: | ||||
o The term "Stream" is used by the CLUE WG to define an encoded | ||||
Media Source sent via RTP. "Capture Encoding" and "Encoding Groups" | ||||
are defined to capture specific details of the encoding scheme. | ||||
o RFC3550 [RFC3550] uses the terms media stream, audio stream, video | ||||
stream and streams of (RTP) packets interchangeably. It defines | ||||
the SSRC as "The source of a stream of RTP packets, ...". | ||||
o The equivalent mapping of a Packet Stream in SDP [RFC4566] is | ||||
defined per usage. For example, each Media Description (m=line) | ||||
and associated attributes can describe one Packet Stream OR | ||||
properties for multiple Packet Streams OR for an RTP session (via | ||||
[RFC5576] mechanisms for example). | ||||
Characteristics: | Characteristics: | |||
o Each Packet Stream is identified by a unique Synchronization | o Each RTP Stream is identified by a unique Synchronization source | |||
source (SSRC) [RFC3550] that is carried in every RTP and RTP | (SSRC) [RFC3550] that is carried in every RTP and RTP Control | |||
Control Protocol (RTCP) packet header in a specific RTP session | Protocol (RTCP) packet header in a specific RTP session context. | |||
context. | ||||
o At any given point in time, a Packet Stream can have one and only | o At any given point in time, an RTP Stream can have one and only one | |||
one SSRC. SSRC collision is a valid reason to change SSRC for a | SSRC. SSRC collision and clock rate change [RFC7160] are examples | |||
Packet Stream, since the Packet Stream itself is not changed in | of valid reasons to change SSRC for an RTP Stream, since the RTP | |||
any way, only the identifying SSRC number. | Stream itself is not changed in any significant way, only the | |||
identifying SSRC number. | ||||
o Each Packet Stream defines a unique RTP sequence numbering and | o Each RTP Stream defines a unique RTP sequence numbering and timing | |||
timing space. | space. | |||
o Several Packet Streams may map to a single Media Source via the | o Several RTP Streams may map to a single Media Source via the | |||
source transformations. | source transformations. | |||
o Several Packet Streams can be carried over a single RTP Session. | o Several RTP Streams can be carried over a single RTP Session. | |||
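An RTP Stream is identified by an SSRC within a particular RTP session. A receiver can therefore demultiplex one session's packets into RTP Streams by reading the SSRC from the RTP fixed header (RFC 3550, Section 5.1). This is a minimal sketch. It assumes every input packet is an RTP (not RTCP) packet, and it identifies the session by a caller-supplied label.

```python
import struct
from collections import defaultdict
from typing import Dict, List, Tuple

# Key identifying an RTP Stream: (RTP session label, SSRC).
StreamKey = Tuple[str, int]


def parse_rtp_header(packet: bytes) -> Tuple[int, int, int]:
    """Return (sequence number, timestamp, SSRC) from the 12-byte RTP
    fixed header; all fields are in network byte order."""
    if len(packet) < 12:
        raise ValueError("packet shorter than RTP fixed header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    seq, ts, ssrc = struct.unpack("!HII", packet[2:12])
    return seq, ts, ssrc


def demux(session: str, packets: List[bytes]) -> Dict[StreamKey, List[int]]:
    """Group one RTP session's packets into RTP Streams by SSRC,
    recording each stream's sequence numbers in arrival order."""
    streams: Dict[StreamKey, List[int]] = defaultdict(list)
    for pkt in packets:
        seq, _ts, ssrc = parse_rtp_header(pkt)
        streams[(session, ssrc)].append(seq)
    return dict(streams)
```

Each key maps to its own sequence number list. This reflects that each RTP Stream defines its own sequence numbering space.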
2.1.11. Media Redundancy | 2.1.11. Media Redundancy | |||
Media redundancy is a transformation that generates redundant or | Media redundancy is a transformation that generates redundant or | |||
repair packets sent out as a Redundancy Packet Stream to mitigate | repair packets sent out as a Redundancy RTP Stream to mitigate | |||
network transport impairments, like packet loss and delay. | network transport impairments, like packet loss and delay. | |||
Media Redundancy exists in many flavors: it may generate | Media Redundancy exists in many flavors: it may generate | |||
independent Repair Streams that are used in addition to the Source | independent Repair Streams that are used in addition to the Source | |||
Stream (RTP Retransmission [RFC4588] and some FEC [RFC5109]), it | Stream (RTP Retransmission [RFC4588] and some FEC [RFC5109]), it | |||
may generate a new Source Stream by combining redundancy information | may generate a new Source Stream by combining redundancy information | |||
with source information (using XOR FEC [RFC5109] as a redundancy | with source information (using XOR FEC [RFC5109] as a redundancy | |||
payload [RFC2198]), or it may completely replace the source | payload [RFC2198]), or it may completely replace the source | |||
information with only redundancy packets. | information with only redundancy packets. | |||
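XOR-based FEC is one of the redundancy flavors listed above. The core idea can be sketched in a few lines: the repair data is the XOR of a group of source payloads, and it can recover any single lost payload in that group. This is a simplified sketch of the parity idea only. It omits the header and length protection that RFC 5109 also specifies, so payloads of unequal length come back zero-padded.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\0"), b.ljust(n, b"\0")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads: List[bytes]) -> bytes:
    """Repair data: the XOR of all source payloads in the group."""
    return reduce(xor_bytes, payloads)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Recover at most one missing payload (marked None) using parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    out = list(received)
    if missing:
        present = [p for p in received if p is not None]
        out[missing[0]] = reduce(xor_bytes, present, parity)
    return out
```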
2.1.12. Redundancy Packet Stream | 2.1.12. Redundancy RTP Stream | |||
A Packet Stream (Section 2.1.10) that contains no original source | An RTP Stream (Section 2.1.10) that contains no original source data, | |||
data, only redundant data that may be combined with one or more | only redundant data that may be combined with one or more Received | |||
Received Packet Streams (Section 2.1.14) to produce Repaired Packet | RTP Streams (Section 2.1.19) to produce Repaired RTP Streams | |||
Streams (Section 2.1.17). | (Section 2.1.22). | |||
2.1.13. Media Transport | 2.1.13. Media Transport | |||
A Media Transport defines the transformation that the Packet Streams | A Media Transport defines the transformation that the RTP Streams | |||
(Section 2.1.10) are subjected to by the end-to-end transport from | (Section 2.1.10) are subjected to by the end-to-end transport from | |||
one RTP sender to one specific RTP receiver (an RTP session may | one RTP sender to one specific RTP receiver (an RTP session may | |||
contain multiple RTP receivers per sender). Each Media Transport is | contain multiple RTP receivers per sender). Each Media Transport is | |||
defined by a transport association that is identified by a 5-tuple | defined by a transport association that is identified by a 5-tuple | |||
(source address, source port, destination address, destination port, | (source address, source port, destination address, destination port, | |||
transport protocol). Each transport association normally contains | transport protocol). Each transport association normally contains | |||
only a single RTP session, although a proposal exists for sending | only a single RTP session, although a proposal exists for sending | |||
multiple RTP sessions over one transport association | multiple RTP sessions over one transport association | |||
[I-D.westerlund-avtcore-transport-multiplexing]. | [I-D.westerlund-avtcore-transport-multiplexing]. | |||
Characteristics: | Characteristics: | |||
o Media Transport transmits Packet Streams of RTP Packets from a | o Media Transport transmits RTP Streams of RTP Packets from a source | |||
source transport address to a destination transport address. | transport address to a destination transport address. | |||
The Media Transport concept sometimes needs to be decomposed into | The Media Transport concept sometimes needs to be decomposed into | |||
more steps to enable discussion of what a sender emits that gets | more steps to enable discussion of what a sender emits that gets | |||
transformed by the network before it is received by the receiver. | transformed by the network before it is received by the receiver. | |||
Thus we also provide this Media Transport decomposition (Figure 5). | Thus we also provide this Media Transport decomposition (Figure 5). | |||
Packet Stream | RTP Stream | |||
| | | | |||
V | V | |||
+--------------------------+ | +--------------------------+ | |||
| Media Transport Sender | | | Media Transport Sender | | |||
+--------------------------+ | +--------------------------+ | |||
| | | | |||
Sent Packet Stream | Sent RTP Stream | |||
V | V | |||
+--------------------------+ | +--------------------------+ | |||
| Network Transport | | | Network Transport | | |||
+--------------------------+ | +--------------------------+ | |||
| | | | |||
Transported Packet Stream | Transported RTP Stream | |||
V | V | |||
+--------------------------+ | +--------------------------+ | |||
| Media Transport Receiver | | | Media Transport Receiver | | |||
+--------------------------+ | +--------------------------+ | |||
| | | | |||
V | V | |||
Received Packet Stream | Received RTP Stream | |||
Figure 5: Decomposition of Media Transport | Figure 5: Decomposition of Media Transport | |||
2.1.13.1. Media Transport Sender | 2.1.14. Media Transport Sender | |||
The first transformation within the Media Transport (Section 2.1.13) | The first transformation within the Media Transport (Section 2.1.13) | |||
is the Media Transport Sender, where the sending End-Point | is the Media Transport Sender, where the sending End-Point | |||
(Section 2.2.1) takes a Packet Stream and emits the packets onto the | (Section 2.2.1) takes an RTP Stream and emits the packets onto the | |||
network using the transport association established for this Media | network using the transport association established for this Media | |||
Transport thus creating a Sent Packet Stream (Section 2.1.13.2). In | Transport thus creating a Sent RTP Stream (Section 2.1.15). In this | |||
this process it transforms the Packet Stream in several ways. First, | process it transforms the RTP Stream in several ways. First, it | |||
it gains the necessary protocol headers for the transport | gains the necessary protocol headers for the transport association, | |||
association, for example IP and UDP headers, thus forming IP/UDP/RTP | for example IP and UDP headers, thus forming IP/UDP/RTP packets. In | |||
packets. In addition, the Media Transport Sender may queue, pace or | addition, the Media Transport Sender may queue, pace or otherwise | |||
otherwise affect how the packets are emitted onto the network, thus | affect how the packets are emitted onto the network, thus adding | |||
adding delay, jitter and inter-packet spacing that characterize the | delay, jitter and inter-packet spacing that characterize the Sent | |||
Sent Packet Stream. | RTP Stream. | |||
2.1.13.2. Sent Packet Stream | 2.1.15. Sent RTP Stream | |||
The Sent Packet Stream is the Packet Stream as it enters the first hop | The Sent RTP Stream is the RTP Stream as it enters the first hop of | |||
of the network path to its destination. The Sent Packet Stream is | the network path to its destination. The Sent RTP Stream is | |||
identified using network transport addresses, such as, for IP/UDP, the | identified using network transport addresses, such as, for IP/UDP, the | |||
5-tuple (source IP address, source port, destination IP address, | 5-tuple (source IP address, source port, destination IP address, | |||
destination port, and protocol (UDP)). | destination port, and protocol (UDP)). | |||
2.1.13.3. Network Transport | 2.1.16. Network Transport | |||
Network Transport is the transformation that the Sent Packet Stream | Network Transport is the transformation that the Sent RTP Stream | |||
(Section 2.1.13.2) is subjected to by traveling from the source to | (Section 2.1.15) is subjected to by traveling from the source to the | |||
the destination through the network. These transformations include | destination through the network. These transformations include loss | |||
loss of some packets, varying delay on a per-packet basis, packet | of some packets, varying delay on a per-packet basis, packet | |||
duplication, and packet header or data corruption. These | duplication, and packet header or data corruption. These | |||
transformations produce a Transported Packet Stream | transformations produce a Transported RTP Stream (Section 2.1.17) at | |||
(Section 2.1.13.4) at the exit of the network path. | the exit of the network path. | |||
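To make the Network Transport transformation concrete, the following hypothetical model applies loss, duplication, and per-packet delay to a Sent RTP Stream. It then returns the Transported RTP Stream in arrival order. This is a minimal sketch under several assumptions:

* Packets are represented only by their sequence numbers.
* One packet is sent per time unit.
* Corruption is not modeled.
* The probability parameters are illustrative.

```python
import random
from typing import List


def network_transport(sent: List[int], loss: float, dup: float,
                      max_delay: float, seed: int = 1) -> List[int]:
    """Hypothetical Network Transport model: each packet of the Sent RTP
    Stream may be lost, duplicated, or delayed by a random amount. The
    result (the Transported RTP Stream) is ordered by arrival time, so
    random delays can reorder packets."""
    rng = random.Random(seed)
    arrivals = []
    for i, seq in enumerate(sent):
        if rng.random() < loss:
            continue  # packet lost in the network
        copies = 2 if rng.random() < dup else 1
        for _ in range(copies):
            # Send time i plus a random per-packet network delay.
            arrivals.append((i + rng.uniform(0, max_delay), seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]
```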
2.1.13.4. Transported Packet Stream | 2.1.17. Transported RTP Stream | |||
The Packet Stream that is emitted out of the network path at the | The RTP Stream that is emitted out of the network path at the | |||
destination, subjected to the Network Transport's transformation | destination, subjected to the Network Transport's transformation | |||
(Section 2.1.13.3). | (Section 2.1.16). | |||
2.1.13.5. Media Transport Receiver | 2.1.18. Media Transport Receiver | |||
The receiver End-Point's (Section 2.2.1) transformation of the | The receiver End-Point's (Section 2.2.1) transformation of the | |||
Transported Packet Stream (Section 2.1.13.4) by its reception process | Transported RTP Stream (Section 2.1.17) by its reception process that | |||
that results in the Received Packet Stream (Section 2.1.14). This | results in the Received RTP Stream (Section 2.1.19). This | |||
transformation includes verifying transport checksums and | transformation includes verifying transport checksums and | |||
discarding any corrupted packet whose checksum does not match. Other | discarding any corrupted packet whose checksum does not match. Other | |||
transformations can include delay variations in receiving a packet on | transformations can include delay variations in receiving a packet on | |||
the network interface and providing it to the application. | the network interface and providing it to the application. | |||
2.1.14. Received Packet Stream | 2.1.19. Received RTP Stream | |||
The Packet Stream (Section 2.1.10) resulting from the Media | The RTP Stream (Section 2.1.10) resulting from the Media Transport's | |||
Transport's transformation, i.e. subjected to packet loss, packet | transformation, i.e. subjected to packet loss, packet corruption, | |||
corruption, packet duplication and varying transmission delay from | packet duplication and varying transmission delay from sender to | |||
sender to receiver. | receiver. | |||
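A receiver can observe the loss and duplication in a Received RTP Stream from its 16-bit RTP sequence numbers, which wrap around modulo 2^16. The following simplified sketch handles that wraparound. It assumes the first packet received is the earliest one of interest and that the run is far shorter than 2^15 packets. RFC 3550, Appendix A.1 describes a more complete algorithm, which this sketch does not reproduce.

```python
from typing import Iterable, List, Set, Tuple


def seq_newer(a: int, b: int) -> bool:
    """True if 16-bit RTP sequence number a is ahead of b (mod 2**16)."""
    return a != b and ((a - b) & 0xFFFF) < 0x8000


def analyze(received: Iterable[int]) -> Tuple[int, int, List[int]]:
    """Scan a non-empty sequence of received sequence numbers and return
    (highest seq seen, duplicate count, list of missing seqs)."""
    it = iter(received)
    first = next(it)
    highest = first
    seen: Set[int] = {first}
    dups = 0
    for seq in it:
        if seq in seen:
            dups += 1
            continue
        seen.add(seq)
        if seq_newer(seq, highest):
            highest = seq
    span = ((highest - first) & 0xFFFF) + 1
    missing = [(first + k) & 0xFFFF for k in range(span)
               if ((first + k) & 0xFFFF) not in seen]
    return highest, dups, missing
```

For example, `analyze([65534, 65535, 1, 1, 2])` reports that sequence number 0 was lost and 1 was duplicated, across the wrap.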
2.1.15. Received Redundandy Packet Stream | 2.1.20. Received Redundancy RTP Stream | |||
The Redundancy Packet Stream (Section 2.1.12) resulting from the | The Redundancy RTP Stream (Section 2.1.12) resulting from the Media | |||
Media Transport's transformation, i.e. subjected to packet loss, | Transport transformation, i.e. subjected to packet loss, packet | |||
packet corruption, and varying transmission delay from sender to | corruption, and varying transmission delay from sender to receiver. | |||
receiver. | ||||
2.1.16. Media Repair | 2.1.21. Media Repair | |||
A transformation that takes as input one or more Source Packet | A transformation that takes as input one or more Source RTP Streams | |||
Streams (Section 2.1.10) as well as Redundancy Packet Streams | (Section 2.1.10) as well as Redundancy RTP Streams (Section 2.1.12) | |||
(Section 2.1.12) and attempts to combine them to counter the | and attempts to combine them to counter the transformations | |||
transformations introduced by the Media Transport (Section 2.1.13) to | introduced by the Media Transport (Section 2.1.13) to minimize the | |||
minimize the difference between the Source Stream (Section 2.1.5) and | difference between the Source Stream (Section 2.1.5) and the Received | |||
the Received Source Stream (Section 2.1.21) after Media Decoder | Source Stream (Section 2.1.26) after Media Decoder (Section 2.1.25). | |||
(Section 2.1.20). The output is a Repaired Packet Stream | The output is a Repaired RTP Stream (Section 2.1.22). | |||
(Section 2.1.17). | ||||
2.1.17. Repaired Packet Stream | 2.1.22. Repaired RTP Stream | |||
A Received Packet Stream (Section 2.1.14) for which Received | A Received RTP Stream (Section 2.1.19) for which Received Redundancy | |||
Redundancy Packet Stream (Section 2.1.15) information has been used | RTP Stream (Section 2.1.20) information has been used to try to re- | |||
to try to re-create the Packet Stream (Section 2.1.10) as it was | create the RTP Stream (Section 2.1.10) as it was before Media | |||
before Media Transport (Section 2.1.13). | Transport (Section 2.1.13). | |||
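Media Repair using RTP Retransmission [RFC4588] can be sketched as filling gaps in a Received RTP Stream from a Received Redundancy RTP Stream. In the RFC 4588 payload format, the first two octets carry the original sequence number (OSN) in network byte order, followed by the original payload. This minimal sketch has several limitations:

* It assumes the RTP headers have already been stripped.
* It assumes the association between the retransmission stream and its original stream is already known.
* It repairs only payloads, not the other header fields that RFC 4588 restores.

```python
import struct
from typing import Dict, List


def repair_with_rtx(received: Dict[int, bytes],
                    rtx_payloads: List[bytes]) -> Dict[int, bytes]:
    """Build a Repaired RTP Stream (seq -> payload) from a Received RTP
    Stream plus retransmission payloads whose first two octets are the
    OSN. Originally received packets are kept; only gaps are filled."""
    repaired = dict(received)
    for p in rtx_payloads:
        if len(p) < 2:
            continue  # malformed retransmission payload; ignore it
        (osn,) = struct.unpack("!H", p[:2])
        repaired.setdefault(osn, p[2:])
    return repaired
```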
2.1.18. Media Depacketizer | 2.1.23. Media Depacketizer | |||
A Media Depacketizer takes one or more Packet Streams | A Media Depacketizer takes one or more RTP Streams (Section 2.1.10) | |||
(Section 2.1.10) and depacketizes them and attempts to reconstitute | and depacketizes them and attempts to reconstitute the Encoded | |||
the Encoded Streams (Section 2.1.7) or Dependent Streams | Streams (Section 2.1.7) or Dependent Streams (Section 2.1.8) present | |||
(Section 2.1.8) present in those Packet Streams. | in those RTP Streams. | |||
It should be noted that in practical implementations, the Media | It should be noted that in practical implementations, the Media | |||
Depacketizer and the Media Decoder may be tightly coupled and share | Depacketizer and the Media Decoder may be tightly coupled and share | |||
information to improve or optimize the overall decoding process in | information to improve or optimize the overall decoding process in | |||
various ways. It is however not expected that there would be any | various ways. It is however not expected that there would be any | |||
benefit in defining a taxonomy for those detailed (and likely very | benefit in defining a taxonomy for those detailed (and likely very | |||
implementation-dependent) steps. | implementation-dependent) steps. | |||
2.1.19. Received Encoded Stream | 2.1.24. Received Encoded Stream | |||
The received version of an Encoded Stream (Section 2.1.7). | The received version of an Encoded Stream (Section 2.1.7). | |||
2.1.20. Media Decoder | 2.1.25. Media Decoder | |||
A Media Decoder is a transformation that is responsible for decoding | A Media Decoder is a transformation that is responsible for decoding | |||
Encoded Streams (Section 2.1.7) and any Dependent Streams | Encoded Streams (Section 2.1.7) and any Dependent Streams | |||
(Section 2.1.8) into a Source Stream (Section 2.1.5). | (Section 2.1.8) into a Source Stream (Section 2.1.5). | |||
It should be noted that in practical implementations, the Media | It should be noted that in practical implementations, the Media | |||
Decoder and the Media Depacketizer may be tightly coupled and share | Decoder and the Media Depacketizer may be tightly coupled and share | |||
information to improve or optimize the overall decoding process in | information to improve or optimize the overall decoding process in | |||
various ways. It is however not expected that there would be any | various ways. It is however not expected that there would be any | |||
benefit in defining a taxonomy for those detailed (and likely very | benefit in defining a taxonomy for those detailed (and likely very | |||
implementation-dependent) steps. | implementation-dependent) steps. | |||
Alternate usages: | ||||
o Within the context of SDP, an m=line describes the necessary | ||||
configuration and identification (RTP Payload Types) required to | ||||
decode one or more incoming Media Streams. | ||||
Characteristics: | Characteristics: | |||
o A Media Decoder is the entity that will have to deal with any | o A Media Decoder is the entity that will have to deal with any | |||
errors in the encoded streams that resulted from corruptions or | errors in the encoded streams that resulted from corruptions or | |||
failures to repair packet losses. This is because a media decoder | failures to repair packet losses. This is because a media decoder | |||
is generally forced to produce some output periodically; it thus | is generally forced to produce some output periodically; it thus | |||
commonly includes concealment methods. | commonly includes concealment methods. | |||
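Because a Media Decoder must produce output periodically even when data is missing, it needs some form of concealment. The simplest form can be sketched as repeating the last good frame. This is a minimal illustration only. The per-tick frame indexing and the `silence` placeholder are hypothetical, and real decoders typically use codec-specific concealment.

```python
from typing import Dict, List, Optional


def playout(frames: Dict[int, bytes], ticks: int,
            silence: bytes = b"") -> List[bytes]:
    """Emit one output per playout tick. A missing frame is concealed by
    repeating the last good frame, or `silence` before any frame arrives."""
    out: List[bytes] = []
    last: Optional[bytes] = None
    for t in range(ticks):
        frame = frames.get(t)
        if frame is not None:
            last = frame
            out.append(frame)
        else:
            out.append(last if last is not None else silence)
    return out
```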
2.1.21. Received Source Stream | 2.1.26. Received Source Stream | |||
The received version of a Source Stream (Section 2.1.5). | The received version of a Source Stream (Section 2.1.5). | |||
2.1.22. Media Sink | 2.1.27. Media Sink | |||
The Media Sink receives a Source Stream (Section 2.1.5) that | The Media Sink receives a Source Stream (Section 2.1.5) that | |||
contains, usually periodically, sampled media data together with | contains, usually periodically, sampled media data together with | |||
associated synchronization information. Depending on the application, | associated synchronization information. Depending on the application, | |||
this Source Stream then needs to be transformed into a Raw Stream | this Source Stream then needs to be transformed into a Raw Stream | |||
(Section 2.1.3) that is sent in synchronization with the output from | (Section 2.1.3) that is sent in synchronization with the output from | |||
other Media Sinks to a Media Render (Section 2.1.24). The media sink | other Media Sinks to a Media Render (Section 2.1.29). The media sink | |||
may also be connected with a Media Source (Section 2.1.4) and be used | may also be connected with a Media Source (Section 2.1.4) and be used | |||
as part of a conceptual Media Source. | as part of a conceptual Media Source. | |||
Characteristics: | Characteristics: | |||
o The Media Sink can further transform the Source Stream into a | o The Media Sink can further transform the Source Stream into a | |||
representation that is suitable for rendering on the Media Render | representation that is suitable for rendering on the Media Render | |||
as defined by the application or system-wide configuration. This | as defined by the application or system-wide configuration. This | |||
includes sample scaling, level adjustments, etc. | includes sample scaling, level adjustments, etc. | |||
2.1.23. Received Raw Stream | 2.1.28. Received Raw Stream | |||
The received version of a Raw Stream (Section 2.1.3). | The received version of a Raw Stream (Section 2.1.3). | |||
2.1.24. Media Render | 2.1.29. Media Render | |||
A Media Render takes a Raw Stream (Section 2.1.3) and converts it | A Media Render takes a Raw Stream (Section 2.1.3) and converts it | |||
into Physical Stimulus (Section 2.1.1) that a human user can | into Physical Stimulus (Section 2.1.1) that a human user can | |||
perceive. Examples of such devices are screens, D/A converters | perceive. Examples of such devices are screens, D/A converters | |||
connected to amplifiers and loudspeakers. | connected to amplifiers and loudspeakers. | |||
Characteristics: | Characteristics: | |||
o An End Point can potentially have multiple Media Renders for each | o An End Point can potentially have multiple Media Renders for each | |||
media type. | media type. | |||
2.2. Communication Entities | 2.2. Communication Entities | |||
This section contains concepts for entities involved in the | This section contains concepts for entities involved in the | |||
communication. | communication. | |||
+----------------------------------------------------------+ | ||||
| Communication Session | | ||||
| | | ||||
| +----------------+ +----------------+ | | ||||
| | Participant A | +------------+ | Participant B | | | ||||
| | | | Multimedia | | | | | ||||
| | +-------------+|<=>| Session |<=>|+-------------+ | | | ||||
| | | End Point A || | | || End Point B | | | | ||||
| | | || +------------+ || | | | | ||||
| | | +-----------++--------------------++-----------+ | | | | ||||
| | | | RTP Session| | | | | | | ||||
| | | | Audio |--Media Transport-->| | | | | | ||||
| | | | |<--Media Transport--| | | | | | ||||
| | | +-----------++--------------------++-----------+ | | | | ||||
| | | || || | | | | ||||
| | | +-----------++--------------------++-----------+ | | | | ||||
| | | | RTP Session| | | | | | | ||||
| | | | Video |--Media Transport-->| | | | | | ||||
| | | | |<--Media Transport--| | | | | | ||||
| | | +-----------++--------------------++-----------+ | | | | ||||
| | +-------------+| |+-------------+ | | | ||||
| +----------------+ +----------------+ | | ||||
+----------------------------------------------------------+ | ||||
Figure 6: Example Point to Point Communication Session with two RTP | ||||
Sessions | ||||
The figure above shows a high-level example representation of a very | ||||
basic point-to-point Communication Session between Participants A and | ||||
B. It uses two RTP Sessions, one for audio and one for video, | ||||
between A's and B's End Points, using separate Media Transports for those RTP | ||||
Sessions. The Multimedia Session shared by the participants can for | ||||
example be established using SIP (i.e., there is a SIP Dialog between | ||||
A and B). The terms used in that figure are further elaborated in | ||||
the sub-sections below. | ||||
2.2.1. End Point | 2.2.1. End Point | |||
Editor's note: Consider if a single word, "Endpoint", is | ||||
preferable | ||||
A single addressable entity sending or receiving RTP packets. It may | A single addressable entity sending or receiving RTP packets. It may | |||
be decomposed into several functional blocks, but as long as it | be decomposed into several functional blocks, but as long as it | |||
behaves as a single RTP stack entity it is classified as a single | behaves as a single RTP stack entity it is classified as a single | |||
"End Point". | "End Point". | |||
Alternate usages: | ||||
o The CLUE Working Group (WG) uses the terms "Media Provider" and | ||||
"Media Consumer" to describe aspects of an End Point pertaining to | ||||
sending and receiving functionalities. | ||||
Characteristics: | Characteristics: | |||
o End Points can be identified in several different ways. While | o End Points can be identified in several different ways. While | |||
RTCP Canonical Names (CNAMEs) [RFC3550] provide a globally unique | RTCP Canonical Names (CNAMEs) [RFC3550] provide a globally unique | |||
and stable identification mechanism for the duration of the | and stable identification mechanism for the duration of the | |||
Communication Session (see Section 2.2.5), their validity applies | Communication Session (see Section 2.2.5), their validity applies | |||
exclusively within a Synchronization Context (Section 3.1.1). | exclusively within a Synchronization Context (Section 3.1). Thus | |||
Thus one End Point can have multiple CNAMEs. Therefore, | one End Point can handle multiple CNAMEs, each of which can be | |||
mechanisms outside the scope of RTP, such as application defined | shared among a set of End Points belonging to the same Participant | |||
mechanisms, must be used to ensure End Point identification when | (Section 2.2.3). Therefore, mechanisms outside the scope of RTP, | |||
outside this Synchronization Context. | such as application defined mechanisms, must be used to ensure End | |||
Point identification when outside this Synchronization Context. | ||||
o An End Point can be associated with at most one Participant | ||||
(Section 2.2.3) at any single point in time. | ||||
o In some contexts, an End Point would typically correspond to a | ||||
single "host". | ||||
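As noted above, a CNAME scopes a Synchronization Context rather than identifying an End Point. A receiver can group the SSRCs it has learned from RTCP SDES CNAME items into those contexts. This is a minimal sketch that assumes the SSRC-to-CNAME mapping has already been collected from RTCP. Consistent with the characteristics above, the result must not be read as an End Point identification.

```python
from collections import defaultdict
from typing import Dict, Set


def sync_contexts(ssrc_to_cname: Dict[int, str]) -> Dict[str, Set[int]]:
    """Group SSRCs by the RTCP SDES CNAME they report. SSRCs sharing a
    CNAME belong to one Synchronization Context; this does not tell which
    End Point sent them, since one End Point can use several CNAMEs."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for ssrc, cname in ssrc_to_cname.items():
        groups[cname].add(ssrc)
    return dict(groups)
```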
2.2.2. RTP Session | 2.2.2. RTP Session | |||
Editor's note: Reconsider whether this is really a Communication | ||||
Entity, or whether it is rather an existing concept that should be | ||||
described in Section 4. | ||||
An RTP session is an association among a group of participants | An RTP session is an association among a group of participants | |||
communicating with RTP. It is a group communications channel which | communicating with RTP. It is a group communications channel which | |||
can potentially carry a number of Packet Streams. Within an RTP | can potentially carry a number of RTP Streams. Within an RTP | |||
session, every participant can find meta-data and control information | session, every participant can find meta-data and control information | |||
(over RTCP) about all the Packet Streams in the RTP session. The | (over RTCP) about all the RTP Streams in the RTP session. The | |||
bandwidth of the RTCP control channel is shared between all | bandwidth of the RTCP control channel is shared between all | |||
participants within an RTP Session. | participants within an RTP Session. | |||
Alternate usages: | ||||
o Within the context of SDP, a single m=line can map to a single RTP | ||||
Session, or multiple m=lines can map to a single RTP Session. The | ||||
latter is enabled via multiplexing schemes such as BUNDLE | ||||
[I-D.ietf-mmusic-sdp-bundle-negotiation], which allows mapping of | ||||
multiple m=lines to a single RTP Session. | ||||
Characteristics: | Characteristics: | |||
o Typically, an RTP Session can carry one or more Packet Streams. | o Typically, an RTP Session can carry one or more RTP Streams. | |||
o An RTP Session shares a single SSRC space as defined in RFC3550 | o An RTP Session shares a single SSRC space as defined in RFC3550 | |||
[RFC3550]. That is, the End Points participating in an RTP | [RFC3550]. That is, the End Points participating in an RTP | |||
Session can see an SSRC identifier transmitted by any of the other | Session can see an SSRC identifier transmitted by any of the other | |||
End Points. An End Point can receive an SSRC either as SSRC or as | End Points. An End Point can receive an SSRC either as SSRC or as | |||
a Contributing source (CSRC) in RTP and RTCP packets, as defined | a Contributing source (CSRC) in RTP and RTCP packets, as defined | |||
by the endpoints' network interconnection topology. | by the endpoints' network interconnection topology. | |||
o An RTP Session uses at least two Media Transports | o An RTP Session uses at least two Media Transports | |||
(Section 2.1.13), one for sending and one for receiving. | (Section 2.1.13), one for sending and one for receiving. | |||
skipping to change at page 20, line 32 | skipping to change at page 19, line 35 | |||
more than one RTP Session, unless a solution for multiplexing | more than one RTP Session, unless a solution for multiplexing | |||
multiple RTP sessions over a single Media Transport is used. One | multiple RTP sessions over a single Media Transport is used. One | |||
example of such a scheme is Multiple RTP Sessions on a Single | example of such a scheme is Multiple RTP Sessions on a Single | |||
Lower-Layer Transport | Lower-Layer Transport | |||
[I-D.westerlund-avtcore-transport-multiplexing]. | [I-D.westerlund-avtcore-transport-multiplexing]. | |||
o Multiple RTP Sessions can be related. | o Multiple RTP Sessions can be related. | |||
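An End Point can see SSRC identifiers from other End Points either as the SSRC of a received packet or inside its CSRC list. The following is a minimal sketch, assuming Python 3.9 or later, that reads both fields from the RTP fixed header laid out in RFC 3550, Section 5.1. The names `RtpSources` and `parse_rtp_sources` are hypothetical. Header extensions, padding and any validation beyond the version check are deliberately left out.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpSources:
    ssrc: int          # sender of this packet (for example, a mixer)
    csrcs: list[int]   # contributing sources listed by that sender


def parse_rtp_sources(packet: bytes) -> RtpSources:
    """Read the SSRC and CSRC list from an RTP fixed header (RFC 3550, 5.1)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    version = packet[0] >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    csrc_count = packet[0] & 0x0F  # CC field: number of CSRC identifiers
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("truncated CSRC list")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    csrcs = list(struct.unpack_from(f"!{csrc_count}I", packet, 12))
    return RtpSources(ssrc=ssrc, csrcs=csrcs)


if __name__ == "__main__":
    # V=2, CC=2, PT=0, seq=1, ts=160, SSRC of a mixer, then two CSRCs
    pkt = struct.pack("!BBHII", 0x82, 0, 1, 160, 0x11111111)
    pkt += struct.pack("!II", 0xAAAA0001, 0xAAAA0002)
    print(parse_rtp_sources(pkt))
```

In the usage example, a receiver sees the mixer as the SSRC and the two original senders only as CSRCs, which is the situation the characteristics above describe.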
2.2.3. Participant | 2.2.3. Participant | |||
A participant is an entity reachable by a single signaling address, | A Participant is an entity reachable by a single signaling address, | |||
and is thus related more to the signaling context than to the media | and is thus related more to the signaling context than to the media | |||
context. | context. | |||
Characteristics: | Characteristics: | |||
o A single signaling-addressable entity, using an application- | o A single signaling-addressable entity, using an application- | |||
specific signaling address space, for example a SIP URI. | specific signaling address space, for example a SIP URI. | |||
o A participant can have several Multimedia Sessions | o A Participant can have several Multimedia Sessions | |||
(Section 2.2.4). | (Section 2.2.4). | |||
o A participant can have several associated transport flows, | o A Participant can have several associated End Points | |||
including several separate local transport addresses for those | (Section 2.2.1). | |||
transport flows. | ||||
2.2.4. Multimedia Session | 2.2.4. Multimedia Session | |||
A multimedia session is an association among a group of participants | A multimedia session is an association among a group of participants | |||
engaged in communication via one or more RTP Sessions | engaged in communication via one or more RTP Sessions | |||
(Section 2.2.2). It defines logical relationships among Media | (Section 2.2.2). It defines logical relationships among Media | |||
Sources (Section 2.1.4) that appear in multiple RTP Sessions. | Sources (Section 2.1.4) that appear in multiple RTP Sessions. | |||
Alternate usages: | ||||
o RFC4566 [RFC4566] defines a multimedia session as a set of | ||||
multimedia senders and receivers and the data streams flowing from | ||||
senders to receivers. | ||||
o RFC3550 [RFC3550] defines it as a set of concurrent RTP sessions | ||||
among a common group of participants. For example, a video | ||||
conference (which is a multimedia session) may contain an audio | ||||
RTP session and a video RTP session. | ||||
Characteristics: | Characteristics: | |||
o A Multimedia Session can be composed of several parallel RTP | o A Multimedia Session can be composed of several parallel RTP | |||
Sessions with potentially multiple Packet Streams per RTP Session. | Sessions with potentially multiple RTP Streams per RTP Session. | |||
o Each participant in a Multimedia Session can have a multitude of | o Each participant in a Multimedia Session can have a multitude of | |||
Media Captures and Media Rendering devices. | Media Captures and Media Rendering devices. | |||
o A single Multimedia Session can contain media from one or more | ||||
Synchronization Contexts (Section 3.1). An example of that is a | ||||
Multimedia Session containing one set of audio and video for | ||||
communication purposes belonging to one Synchronization Context, | ||||
and another set of audio and video for presentation purposes (like | ||||
playing a video file) with a separate Synchronization Context that | ||||
has no strong timing relationship and need not be strictly | ||||
synchronized with the audio and video used for communication. | ||||
2.2.5. Communication Session | 2.2.5. Communication Session | |||
A Communication Session is an association among a group of participants | A Communication Session is an association among a group of participants | |||
communicating with each other via a set of Multimedia Sessions. | communicating with each other via a set of Multimedia Sessions. | |||
Alternate usages: | ||||
o The Session Description Protocol (SDP) [RFC4566] defines a | ||||
multimedia session as a set of multimedia senders and receivers | ||||
and the data streams flowing from senders to receivers. In that | ||||
definition, however, it is not clear whether a multimedia session | ||||
includes both the sender's and the receiver's view of the same RTP | ||||
Packet Stream. | ||||
Characteristics: | Characteristics: | |||
o Each participant in a Communication Session is identified via an | o Each participant in a Communication Session is identified via an | |||
application-specific signaling address. | application-specific signaling address. | |||
o A Communication Session is composed of at least one Multimedia | o A Communication Session is composed of at least one Multimedia | |||
Session per participant, involving one or more parallel RTP | Session per participant, involving one or more parallel RTP | |||
Sessions with potentially multiple Packet Streams per RTP Session. | Sessions with potentially multiple RTP Streams per RTP Session. | |||
For example, in a full mesh communication, the Communication Session | For example, in a full mesh communication, the Communication Session | |||
consists of a set of separate Multimedia Sessions between each pair | consists of a set of separate Multimedia Sessions between each pair | |||
of Participants. Another example is a centralized conference, where | of Participants. Another example is a centralized conference, where | |||
the Communication Session consists of a set of Multimedia Sessions | the Communication Session consists of a set of Multimedia Sessions | |||
between each Participant and the conference handler. | between each Participant and the conference handler. | |||
3. Relations at Different Levels | 3. Relations at Different Levels | |||
This section uses the concepts from the previous section and looks at | This section uses the concepts from the previous section and looks at | |||
different types of relationships among them. These relationships | different types of relationships among them. These relationships | |||
occur at different levels and for different purposes. The section is | occur at different levels and for different purposes. The section is | |||
organized so as to look at the level where a relation is required. | organized so as to look at the level where a relation is required. | |||
The reason for the relationship may exist at another step in the | The reason for the relationship may exist at another step in the | |||
media handling chain. For example, using Simulcast (discussed in | media handling chain. For example, using Simulcast (discussed in | |||
Section 3.3.1) needs to determine relations at Packet Stream level, | Section 3.7) needs to determine relations at RTP Stream level, | |||
however the reason to relate Packet Streams is that multiple Media | however the reason to relate RTP Streams is that multiple Media | |||
Encoders use the same Media Source, i.e. to be able to identify a | Encoders use the same Media Source, i.e. to be able to identify a | |||
common Media Source. | common Media Source. | |||
3.1. Media Source Relations | ||||
Media Sources (Section 2.1.4) are commonly grouped and related to an | Media Sources (Section 2.1.4) are commonly grouped and related to an | |||
End Point (Section 2.2.1) or a Participant (Section 2.2.3). This | End Point (Section 2.2.1) or a Participant (Section 2.2.3). This | |||
occurs for several reasons, due both to application logic and to media | occurs for several reasons, due both to application logic and to | |||
handling purposes. These cases are further discussed below. | media handling purposes. | |||
3.1.1. Synchronization Context | At RTP Packetization time, there exists a possibility for a number of | |||
different types of relationships between Encoded Streams | ||||
(Section 2.1.7), Dependent Streams (Section 2.1.8) and RTP Streams | ||||
(Section 2.1.10). These are caused by grouping together or | ||||
distributing these different types of streams into RTP Streams. | ||||
The resulting RTP Streams will thus also have relations. This is a | ||||
common relation to handle in RTP, because RTP Streams are separate | ||||
and have their own SSRC, implying independent sequence numbers and | ||||
timestamp spaces. The underlying reasons for the RTP Stream | ||||
relationships are different, as can be seen in the sub-sections | ||||
below. | ||||
RTP Streams may be protected by Redundancy RTP Streams during | ||||
transport. Several approaches listed below can be used to create | ||||
Redundancy RTP Streams: | ||||
o Duplication of the original RTP Stream, | ||||
o Duplication of the original RTP Stream with a time offset, | ||||
o Forward Error Correction (FEC) techniques, and | ||||
o Retransmission of lost packets (either globally or selectively). | ||||
The different RTP Streams can be transported within the same RTP | ||||
Session or in different RTP Sessions to accomplish different | ||||
transport goals. This explicit separation of RTP Streams is further | ||||
discussed in Section 3.13. | ||||
3.1. Synchronization Context | ||||
A Synchronization Context defines a requirement on a strong timing | A Synchronization Context defines a requirement on a strong timing | |||
relationship between the Media Sources, typically requiring alignment | relationship between the Media Sources, typically requiring alignment | |||
of clock sources. Such relationship can be identified in multiple | of clock sources. Such relationship can be identified in multiple | |||
ways as listed below. A single Media Source can only belong to a | ways as listed below. A single Media Source can only belong to a | |||
single Synchronization Context, since it is assumed that a single | single Synchronization Context, since it is assumed that a single | |||
Media Source can only have a single media clock and requiring | Media Source can only have a single media clock and requiring | |||
alignment to several Synchronization Contexts (and thus reference | alignment to several Synchronization Contexts (and thus reference | |||
clocks) will effectively merge those into a single Synchronization | clocks) will effectively merge those into a single Synchronization | |||
Context. | Context. | |||
A single Multimedia Session can contain media from one or more | 3.1.1. RTCP CNAME | |||
Synchronization Contexts. An example of that is a Multimedia Session | ||||
containing one set of audio and video for communication purposes | ||||
belonging to one Synchronization Context, and another set of audio | ||||
and video for presentation purposes (like playing a video file) with | ||||
a separate Synchronization Context that has no strong timing | ||||
relationship and need not be strictly synchronized with the audio and | ||||
video used for communication. | ||||
3.1.1.1. RTCP CNAME | ||||
RFC3550 [RFC3550] describes Inter-media synchronization between RTP | RFC3550 [RFC3550] describes Inter-media synchronization between RTP | |||
Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP) | Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP) | |||
[RFC5905] formatted timestamps of a reference clock. As indicated in | [RFC5905] formatted timestamps of a reference clock. As indicated in | |||
[I-D.ietf-avtcore-clksrc], despite using NTP format timestamps, it is | [I-D.ietf-avtcore-clksrc], despite using NTP format timestamps, it is | |||
not required that the clock be synchronized to an NTP source. | not required that the clock be synchronized to an NTP source. | |||
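The CNAME-based mechanism can be sketched as follows, assuming the application has already parsed the RTCP SDES CNAME items and the timing fields of each Sender Report (SR). SSRCs reporting the same CNAME form one Synchronization Context. Each SR pairs an NTP-format reference timestamp with the RTP timestamp of the same instant, so RTP timestamps from streams with different media clocks can be mapped onto one reference clock. The names `SenderReport`, `rtp_to_reference_time` and `group_by_cname` are hypothetical, and jitter, clock drift and playout delay are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SenderReport:
    """Timing fields of an RTCP SR (RFC 3550, Section 6.4.1)."""
    ntp_seconds: int    # integer part of the NTP-format reference timestamp
    ntp_fraction: int   # fractional part, in units of 1/2**32 s
    rtp_timestamp: int  # RTP timestamp corresponding to the same instant


def rtp_to_reference_time(rtp_ts: int, sr: SenderReport, clock_rate: int) -> float:
    """Map an RTP timestamp onto the sender's reference clock, in seconds."""
    # Signed 32-bit difference: handles RTP timestamp wraparound and lets
    # timestamps slightly older than the SR map to earlier times.
    delta = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 2**31:
        delta -= 2**32
    return sr.ntp_seconds + sr.ntp_fraction / 2**32 + delta / clock_rate


def group_by_cname(cname_of: dict[int, str]) -> dict[str, set[int]]:
    """Group SSRCs by the CNAME they report in RTCP SDES."""
    groups: dict[str, set[int]] = defaultdict(set)
    for ssrc, cname in cname_of.items():
        groups[cname].add(ssrc)
    return dict(groups)


if __name__ == "__main__":
    audio_sr = SenderReport(1000, 0, 8000)        # 8 kHz audio media clock
    video_sr = SenderReport(1000, 2**31, 90000)   # 90 kHz video, SR 0.5 s later
    # Both packets were sampled at the same reference time
    print(rtp_to_reference_time(16000, audio_sr, 8000))
    print(rtp_to_reference_time(135000, video_sr, 90000))
    print(group_by_cname({0x1111: "alice@host", 0x2222: "alice@host"}))
```

Because only differences against the SR pair are used, the sketch works whether or not the reference clock is actually synchronized to an NTP source.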
3.1.1.2. Clock Source Signaling | 3.1.2. Clock Source Signaling | |||
[I-D.ietf-avtcore-clksrc] provides a mechanism to signal the clock | [I-D.ietf-avtcore-clksrc] provides a mechanism to signal the clock | |||
source in SDP both for the reference clock as well as the media | source in SDP both for the reference clock as well as the media | |||
clock, thus allowing a Synchronization Context to be defined beyond | clock, thus allowing a Synchronization Context to be defined beyond | |||
the one defined by the usage of CNAME source descriptions. | the one defined by the usage of CNAME source descriptions. | |||
3.1.1.3. CLUE Scenes | 3.1.3. Implicitly via RtcMediaStream | |||
In CLUE "Capture Scene", "Capture Scene Entry" and "Captures" define | ||||
an implied Synchronization Context. | ||||
3.1.1.4. Implicitly via RtcMediaStream | ||||
The WebRTC WG defines "RtcMediaStream" with one or more | The WebRTC WG defines "RtcMediaStream" with one or more | |||
"RtcMediaStreamTracks". All tracks in a "RtcMediaStream" are | "RtcMediaStreamTracks". All tracks in a "RtcMediaStream" are | |||
intended to be possible to synchronize when rendered. | intended to be possible to synchronize when rendered. | |||
3.1.1.5. Explicitly via SDP Mechanisms | 3.1.4. Explicitly via SDP Mechanisms | |||
RFC5888 [RFC5888] defines an m=line grouping mechanism called "Lip | RFC5888 [RFC5888] defines an m=line grouping mechanism called "Lip | |||
Synchronization (LS)" for establishing the synchronization | Synchronization (LS)" for establishing the synchronization | |||
requirement across m=lines when they map to individual sources. | requirement across m=lines when they map to individual sources. | |||
RFC5576 [RFC5576] extends the above mechanism to the case where | RFC5576 [RFC5576] extends the above mechanism to the case where | |||
multiple media sources are described by a single m=line. | multiple media sources are described by a single m=line. | |||
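To illustrate the LS semantics, the hypothetical helpers below collect the identification tags of each "a=group:LS" line and map each "a=mid" value to the media type of its m=line. In the example SDP, the audio m=line and the first video m=line are lip-synchronized, while the second video m=line (for instance, a slide stream) is not part of the group. The sketch handles only session-level RFC 5888 grouping, not the source-level extension of RFC 5576.

```python
def lip_sync_groups(sdp: str) -> list[list[str]]:
    """Return the identification tags of each "a=group:LS" line (RFC 5888)."""
    groups: list[list[str]] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=group:"):
            fields = line[len("a=group:"):].split()
            if fields and fields[0] == "LS":  # semantics token, then mids
                groups.append(fields[1:])
    return groups


def media_by_mid(sdp: str) -> dict[str, str]:
    """Map each "a=mid:" value to the media type of the m= line it follows."""
    result: dict[str, str] = {}
    media = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            media = fields[0] if fields else None
        elif line.startswith("a=mid:") and media is not None:
            result[line[len("a=mid:"):]] = media
    return result


EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:LS 1 2
m=audio 30000 RTP/AVP 0
a=mid:1
m=video 30002 RTP/AVP 31
a=mid:2
m=video 30004 RTP/AVP 31
a=mid:3
"""

if __name__ == "__main__":
    mids = media_by_mid(EXAMPLE_SDP)
    for group in lip_sync_groups(EXAMPLE_SDP):
        print("lip-sync:", [(tag, mids.get(tag)) for tag in group])
```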
3.1.2. End Point | 3.2. End Point | |||
Some applications require knowledge of which Media Sources originate | Some applications require knowledge of which Media Sources originate | |||
from a particular End Point (Section 2.2.1). This knowledge can | from a particular End Point (Section 2.2.1). This knowledge can | |||
support decisions such as packet routing between parts of the | support decisions such as packet routing between parts of the | |||
topology, based on the End Point origin of the Packet Streams. | topology, based on the End Point origin of the RTP Streams. | |||
In RTP, this identification has been overloaded with the | In RTP, this identification has been overloaded with the | |||
Synchronization Context through the usage of the source description | Synchronization Context (Section 3.1) through the usage of the RTCP | |||
CNAME item. This works for some usages, but sometimes it breaks | source description CNAME (Section 3.1.1) item. This works for some | |||
down. For example, an End Point can have two sets of Media Sources | usages, but sometimes it breaks down. For example, an End Point can | |||
with different Synchronization Contexts, such as the audio and video | have two sets of Media Sources with different Synchronization | |||
of the human participant and a set of audio and video Media Sources | Contexts, such as the audio and video of the human participant and | |||
for a shared movie. Thus, an End Point may have multiple CNAMEs. | a set of audio and video Media Sources for a shared movie. Thus, an | |||
The CNAMEs or the Media Sources themselves can be related to the | End Point may have multiple CNAMEs. The CNAMEs or the Media Sources | |||
End Point. | themselves can be related to the End Point. | |||
3.1.3. Participant | 3.3. Participant | |||
In communication scenarios, it is commonly necessary to know which | In communication scenarios, it is commonly necessary to know which | |||
Media Sources originate from which Participant (Section 2.2.3), for | Media Sources originate from which Participant (Section 2.2.3), for | |||
example to enable the application to display Participant Identity | example to enable the application to display Participant Identity | |||
information correctly associated with the Media Sources. This | information correctly associated with the Media Sources. This | |||
association is currently handled through the signaling solution to | association is currently handled through the signaling solution to | |||
point at a specific Multimedia Session where the Media Sources may be | point at a specific Multimedia Session where the Media Sources may be | |||
explicitly or implicitly tied to a particular End Point. | explicitly or implicitly tied to a particular End Point. | |||
Participant information becomes more problematic due to Media Sources | Participant information becomes more problematic due to Media Sources | |||
that are generated through mixing or other conceptual processing of | that are generated through mixing or other conceptual processing of | |||
Raw Streams or Source Streams that originate from different | Raw Streams or Source Streams that originate from different | |||
Participants. Such Media Sources can thus have a dynamically varying | Participants. Such Media Sources can thus have a dynamically varying | |||
set of origins and Participants. RTP contains the concept of | set of origins and Participants. RTP contains the concept of | |||
Contributing Sources (CSRC) that carry such information about the | Contributing Sources (CSRC) that carry such information about the | |||
previous-step origin of the included media content at the RTP level. | previous-step origin of the included media content at the RTP level. | |||
3.1.4. WebRTC MediaStream | 3.4. RtcMediaStream | |||
An RtcMediaStream, in addition to requiring a single Synchronization | ||||
Context as discussed above, is also an explicit grouping of a set of | ||||
Media Sources, as identified by RtcMediaStreamTracks, within the | ||||
RtcMediaStream. | ||||
3.2. Packetization Time Relations | ||||
At RTP Packetization time, there exists a possibility for a number of | An RtcMediaStream in WebRTC is an explicit grouping of a set of Media | |||
different types of relationships between Encoded Streams | Sources (RtcMediaStreamTracks) that share a common identifier and a | |||
(Section 2.1.7), Dependent Streams (Section 2.1.8) and Packet Streams | single Synchronization Context (Section 3.1). | |||
(Section 2.1.10). These are caused by grouping together or | ||||
distributing these different types of streams into Packet Streams. | ||||
This section will look at such relationships. | ||||
3.2.1. Single and Multi-Session Transmission of SVC | 3.5. Single- and Multi-Session Transmission of SVC | |||
Scalable Video Coding [RFC6190] has a mode of operation called Single | Scalable Video Coding [RFC6190] has a mode of operation called Single | |||
Session Transmission (SST), where Encoded Streams and Dependent | Session Transmission (SST), where Encoded Streams and Dependent | |||
Streams from the SVC Media Encoder are sent in a single RTP Session | Streams from the SVC Media Encoder are sent in a single RTP Session | |||
(Section 2.2.2) using the SVC RTP Payload format. There is another | (Section 2.2.2) using the SVC RTP Payload format. There is another | |||
mode of operation where Encoded Streams and Dependent Streams are | mode of operation where Encoded Streams and Dependent Streams are | |||
distributed across multiple RTP Sessions, called Multi-Session | distributed across multiple RTP Sessions, called Multi-Session | |||
Transmission (MST). Regardless of whether SST or MST is used, as | Transmission (MST). SST denotes one or more RTP Streams (SSRC) per | |||
defined, each of those RTP Sessions may contain one or more Packet | Media Source in a single RTP Session. MST denotes one or more RTP | |||
Streams (SSRC) per Media Source. | Streams (SSRC) per Media Source in each of multiple RTP Sessions. | |||
This is not always clear from the SVC payload format text [RFC6190], | ||||
but is what existing deployments of that RFC have implemented. | ||||
To elaborate, what could be called SST-SingleStream (SST-SS) uses a | To elaborate, what could be called SST-SingleStream (SST-SS) uses a | |||
single Packet Stream in a single RTP Session to send all Encoded and | single RTP Stream in a single RTP Session to send all Encoded and | |||
Dependent Streams. Similarly, SST-MultiStream (SST-MS) uses multiple | Dependent Streams from a single Media Source. Similarly, SST- | |||
Packet Streams in a single RTP Session to send the Encoded and | MultiStream (SST-MS) uses multiple RTP Streams per Media Source in a | |||
Dependent Streams. MST-SS uses a single Packet Stream in each of | single RTP Session to send the Encoded and Dependent Streams. MST-SS | |||
multiple RTP Sessions and MST-MS uses multiple Packet Streams in each | uses a single RTP Stream in each of multiple RTP Sessions, where each | |||
of the multiple RTP Sessions: | RTP Stream can originate from any one of possibly multiple Media | |||
Sources. Finally, MST-MS uses multiple RTP Streams in each of the | ||||
multiple RTP Sessions, where each RTP Stream can originate from any | ||||
one of possibly multiple Media Sources. This is summarized below: | ||||
+-----------------------+--------------------+----------------------+ | +--------------------------+------------------+---------------------+ | |||
| | Single RTP Session | Multiple RTP | | | RTP Streams per Media | Single RTP | Multiple RTP | | |||
| | | Sessions | | | Source | Session | Sessions | | |||
+-----------------------+--------------------+----------------------+ | +--------------------------+------------------+---------------------+ | |||
| Single Packet Stream | SST-SS | MST-SS | | | Single | SST-SS | MST-SS | | |||
| Multiple Packet | SST-MS | MST-MS | | | Multiple | SST-MS | MST-MS | | |||
| Streams | | | | +--------------------------+------------------+---------------------+ | |||
+-----------------------+--------------------+----------------------+ | ||||
3.2.2. Multi-Channel Audio | Table 1: SST / MST Summary | |||
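The SST/MST summary table can be read as a function of two properties of a single Media Source: how many RTP Sessions carry its Encoded and Dependent Streams, and how many RTP Streams (SSRCs) it uses in each of those sessions. The function below is a hypothetical sketch of that reading. The table does not cover a Media Source that uses one RTP Stream in one session and several in another; the sketch assumes such a mix counts as multi-stream.

```python
def svc_transmission_mode(streams_per_session: dict[str, int]) -> str:
    """Classify how one Media Source's SVC layers are sent (SST/MST table).

    streams_per_session maps an RTP Session label to the number of RTP
    Streams (SSRCs) in that session carrying this Media Source's Encoded
    and Dependent Streams.
    """
    used = {name: n for name, n in streams_per_session.items() if n > 0}
    if not used:
        raise ValueError("the Media Source is not sent in any RTP Session")
    session_part = "SST" if len(used) == 1 else "MST"
    # Assumption: any session carrying more than one RTP Stream makes it "-MS".
    stream_part = "SS" if all(n == 1 for n in used.values()) else "MS"
    return f"{session_part}-{stream_part}"


if __name__ == "__main__":
    print(svc_transmission_mode({"video": 1}))              # one session, one SSRC
    print(svc_transmission_mode({"base": 1, "enh": 1}))     # one SSRC per session
    print(svc_transmission_mode({"base": 2, "enh": 2}))     # several SSRCs per session
```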
3.6. Multi-Channel Audio | ||||
There exist a number of RTP payload formats that can carry multi- | There exist a number of RTP payload formats that can carry multi- | |||
channel audio, despite the codec being a mono encoder. Multi-channel | channel audio, despite the codec being a mono encoder. Multi-channel | |||
audio can be viewed as multiple Media Sources sharing a common | audio can be viewed as multiple Media Sources sharing a common | |||
Synchronization Context. These are independently encoded by a Media | Synchronization Context. These are independently encoded by a Media | |||
Encoder and the different Encoded Streams are then packetized | Encoder and the different Encoded Streams are then packetized | |||
together in a time-synchronized way into a single Source Packet | together in a time-synchronized way into a single Source RTP Stream | |||
Stream using the codec's RTP Payload format. Examples of such | using the codec's RTP Payload format. Examples of such codecs are | |||
codecs are PCMA and PCMU [RFC3551], AMR [RFC4867], and G.719 | PCMA and PCMU [RFC3551], AMR [RFC4867], and G.719 [RFC5404]. | |||
[RFC5404]. | ||||
3.2.3. Redundancy Format | ||||
The RTP Payload for Redundant Audio Data [RFC2198] defines how one | ||||
can transport redundant audio data together with primary data in the | ||||
same RTP payload. The redundant data can be a time delayed version | ||||
of the primary or another time delayed Encoded Stream using a | ||||
different Media Encoder to encode the same Media Source as the | ||||
primary, as depicted below in Figure 6. | ||||
+--------------------+ | ||||
| Media Source | | ||||
+--------------------+ | ||||
| | ||||
Source Stream | ||||
| | ||||
+------------------------+ | ||||
| | | ||||
V V | ||||
+--------------------+ +--------------------+ | ||||
| Media Encoder | | Media Encoder | | ||||
+--------------------+ +--------------------+ | ||||
| | | ||||
| +------------+ | ||||
Encoded Stream | Time Delay | | ||||
| +------------+ | ||||
| | | ||||
| +------------------+ | ||||
V V | ||||
+--------------------+ | ||||
| Media Packetizer | | ||||
+--------------------+ | ||||
| | ||||
V | ||||
Packet Stream | ||||
Figure 6: Concept for usage of Audio Redundancy with different Media | ||||
Encoders | ||||
The Redundancy format thus provides the necessary meta-information to | ||||
correctly relate different parts of the same Encoded Stream or, in | ||||
the case depicted above (Figure 6), to relate the Received Source | ||||
Stream fragments coming out of different Media Decoders so that they | ||||
can be combined into a less erroneous Source Stream. | ||||
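The RFC 2198 payload layout can be sketched as follows. Each redundant block gets a 4-byte header (F bit set, 7-bit block payload type, 14-bit timestamp offset, 10-bit block length). The primary block gets a 1-byte header with the F bit cleared, and the data blocks follow in header order. The function name `build_red_payload` is hypothetical. The usage example pairs a PCMU primary (static payload type 0) with a GSM block (static payload type 3) delayed by one 20 ms frame (160 samples at 8 kHz), loosely mirroring the two Media Encoders and the Time Delay in Figure 6.

```python
import struct


def build_red_payload(primary_pt: int, primary: bytes,
                      redundant: list[tuple[int, int, bytes]]) -> bytes:
    """Build an RFC 2198 redundant-audio payload.

    redundant holds (block payload type, timestamp offset, data) tuples in
    the order the blocks should appear; the primary block always goes last.
    """
    if not 0 <= primary_pt < 128:
        raise ValueError("payload type must fit in 7 bits")
    headers = bytearray()
    for pt, ts_offset, data in redundant:
        if not (0 <= pt < 128 and 0 <= ts_offset < 2**14 and len(data) < 2**10):
            raise ValueError("field too large for an RFC 2198 block header")
        # F=1 | block PT (7 bits) | timestamp offset (14 bits) | length (10 bits)
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
    headers.append(primary_pt)  # final 1-byte header: F=0 | block PT
    return bytes(headers) + b"".join(d for _, _, d in redundant) + primary


if __name__ == "__main__":
    gsm_previous_frame = b"\xaa" * 33   # placeholder bytes, one GSM frame
    pcmu_current_frame = b"\xff" * 160  # placeholder bytes, 20 ms of PCMU
    payload = build_red_payload(0, pcmu_current_frame, [(3, 160, gsm_previous_frame)])
    print(payload[:5].hex(), len(payload))
```

A receiver reverses this by reading headers until it finds one with the F bit cleared, then slicing the data blocks by the recorded lengths.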
3.3. Packet Stream Relations | ||||
This section discusses various cases of relationships among Packet | ||||
Streams. This is a common relation to handle in RTP, because | ||||
Packet Streams are separate and have their own SSRC, implying | ||||
independent sequence numbers and timestamp spaces. The underlying | ||||
reasons for the Packet Stream relationships are different, as can be | ||||
seen in the cases below. The different Packet Streams can be handled | ||||
within the same RTP Session or different RTP Sessions to accomplish | ||||
different transport goals. This separation of Packet Streams is | ||||
further discussed in Section 3.3.4. | ||||
3.3.1. Simulcast | 3.7. Simulcast | |||
A Media Source represented as multiple independent Encoded Streams | A Media Source represented as multiple independent Encoded Streams | |||
constitutes a simulcast of that Media Source. Figure 7 below | constitutes a simulcast of that Media Source. Figure 7 below | |||
represents an example of a Media Source that is encoded into three | represents an example of a Media Source that is encoded into three | |||
separate and different Simulcast streams, that are in turn sent on | separate and different Simulcast streams, that are in turn sent on | |||
the same Media Transport flow. When using Simulcast, the Packet | the same Media Transport flow. When using Simulcast, the RTP Streams | |||
Streams may be sharing RTP Session and Media Transport, or be | may be sharing RTP Session and Media Transport, or be separated on | |||
separated on different RTP Sessions and Media Transports, or be any | different RTP Sessions and Media Transports, or be any combination of | |||
combination of these two. Other considerations determine which | these two. Other considerations determine which usage is | |||
usage is desirable, as discussed in Section 3.3.4. | desirable, as discussed in Section 3.13. | |||
+----------------+ | +----------------+ | |||
| Media Source | | | Media Source | | |||
+----------------+ | +----------------+ | |||
Source Stream | | Source Stream | | |||
+----------------------+----------------------+ | +----------------------+----------------------+ | |||
| | | | | | | | |||
v v v | V V V | |||
+------------------+ +------------------+ +------------------+ | +------------------+ +------------------+ +------------------+ | |||
| Media Encoder | | Media Encoder | | Media Encoder | | | Media Encoder | | Media Encoder | | Media Encoder | | |||
+------------------+ +------------------+ +------------------+ | +------------------+ +------------------+ +------------------+ | |||
| Encoded | Encoded | Encoded | | Encoded | Encoded | Encoded | |||
| Stream | Stream | Stream | | Stream | Stream | Stream | |||
v v v | V V V | |||
+------------------+ +------------------+ +------------------+ | +------------------+ +------------------+ +------------------+ | |||
| Media Packetizer | | Media Packetizer | | Media Packetizer | | | Media Packetizer | | Media Packetizer | | Media Packetizer | | |||
+------------------+ +------------------+ +------------------+ | +------------------+ +------------------+ +------------------+ | |||
| Source | Source | Source | | Source | Source | Source | |||
| Packet | Packet | Packet | | RTP | RTP | RTP | |||
| Stream | Stream | Stream | | Stream | Stream | Stream | |||
+-----------------+ | +-----------------+ | +-----------------+ | +-----------------+ | |||
| | | | | | | | |||
V V V | V V V | |||
+-------------------+ | +-------------------+ | |||
| Media Transport | | | Media Transport | | |||
+-------------------+ | +-------------------+ | |||
Figure 7: Example of Media Source Simulcast | Figure 7: Example of Media Source Simulcast | |||
The simulcast relation between the Packet Streams is the common Media | The simulcast relation between the RTP Streams is the common Media | |||
Source. In addition, to be able to identify the common Media Source, | Source. In addition, to be able to identify the common Media Source, | |||
a receiver of the Packet Stream may need to know which configuration | a receiver of the RTP Stream may need to know which configuration or | |||
or encoding goals lie behind the produced Encoded Stream and its | encoding goals lie behind the produced Encoded Stream and its | |||
properties, to enable selection of the stream that is most | properties, to enable selection of the stream that is most | |||
useful in the application at that moment. | useful in the application at that moment. | |||
3.3.2. Layered Multi-Stream | 3.8. Layered Multi-Stream | |||
Layered Multi-Stream (LMS) is a mechanism by which different portions | Layered Multi-Stream (LMS) is a mechanism by which different portions | |||
of a layered encoding of a Source Stream are sent using separate | of a layered encoding of a Source Stream are sent using separate RTP | |||
Packet Streams (sometimes in separate RTP Sessions). LMSs are useful | Streams (sometimes in separate RTP Sessions). LMSs are useful for | |||
for receiver control of layered media. | receiver control of layered media. | |||
A Media Source represented as an Encoded Stream and multiple | A Media Source represented as an Encoded Stream and multiple | |||
Dependent Streams constitutes a Media Source that has layered | Dependent Streams constitutes a Media Source that has layered | |||
dependencies. The figure below represents an example of a Media | dependencies. The figure below represents an example of a Media | |||
Source that is encoded into three dependent layers, where two layers | Source that is encoded into three dependent layers, where two layers | |||
are sent on the same Media Transport using different Packet Streams, | are sent on the same Media Transport using different RTP Streams, | |||
i.e. SSRCs, and the third layer is sent on a separate Media | i.e. SSRCs, and the third layer is sent on a separate Media | |||
Transport, i.e. a different RTP Session. | Transport, i.e. a different RTP Session. | |||
+----------------+ | +----------------+ | |||
| Media Source | | | Media Source | | |||
+----------------+ | +----------------+ | |||
| | | | |||
| | | | |||
V | V | |||
+---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| Media Encoder | | | Media Encoder | | |||
+---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| | | | | | | | |||
Encoded Stream Dependent Stream Dependent Stream | Encoded Stream Dependent Stream Dependent Stream | |||
| | | | | | | | |||
V V V | V V V | |||
+----------------+ +----------------+ +----------------+ | +----------------+ +----------------+ +----------------+ | |||
|Media Packetizer| |Media Packetizer| |Media Packetizer| | |Media Packetizer| |Media Packetizer| |Media Packetizer| | |||
+----------------+ +----------------+ +----------------+ | +----------------+ +----------------+ +----------------+ | |||
| | | | | | | | |||
Packet Stream Packet Stream Packet Stream | RTP Stream RTP Stream RTP Stream | |||
| | | | | | | | |||
+------+ +------+ | | +------+ +------+ | | |||
| | | | | | | | |||
V V V | V V V | |||
+-----------------+ +-----------------+ | +-----------------+ +-----------------+ | |||
| Media Transport | | Media Transport | | | Media Transport | | Media Transport | | |||
+-----------------+ +-----------------+ | +-----------------+ +-----------------+ | |||
Figure 8: Example of Media Source Layered Dependency | Figure 8: Example of Media Source Layered Dependency | |||
As an example, the SVC MST (Section 3.2.1) relation needs to identify | As an example, the SVC MST (Section 3.5) relation needs to identify | |||
the common Media Encoder origin for the Encoded and Dependent | the common Media Encoder origin for the Encoded and Dependent | |||
Streams. The SVC RTP Payload RFC is not particularly explicit about | Streams. The SVC RTP Payload RFC is not particularly explicit about | |||
how this relation is to be implemented. When using different RTP | how this relation is to be implemented. When using different RTP | |||
Sessions, thus different Media Transports, and as long as there is | Sessions, thus different Media Transports, and as long as there is | |||
only one Packet Stream per Media Encoder and a single Media Source in | only one RTP Stream per Media Encoder and a single Media Source in | |||
each RTP Session (MST-SS (Section 3.2.1)), common SSRC and CNAMEs can | each RTP Session (MST-SS (Section 3.5)), common SSRC and CNAMEs can | |||
be used to identify the common Media Source. When multiple Packet | be used to identify the common Media Source. When multiple RTP | |||
Streams are sent from one Media Encoder in the same RTP Session (SST- | Streams are sent from one Media Encoder in the same RTP Session (SST- | |||
MS), then CNAME is the only currently specified RTP identifier that | MS), then CNAME is the only currently specified RTP identifier that | |||
can be used. In cases where multiple Media Encoders use multiple | can be used. In cases where multiple Media Encoders use multiple | |||
Media Sources sharing Synchronization Context, and thus having a | Media Sources sharing Synchronization Context, and thus having a | |||
common CNAME, additional heuristics need to be applied to create the | common CNAME, additional heuristics need to be applied to create the | |||
MST relationship between the Packet Streams. | MST relationship between the RTP Streams. | |||
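To make the MST-SS and SST-MS cases above concrete, here is a minimal Python sketch. The `ObservedStream` record and the session labels are hypothetical; a real receiver would populate them from RTP headers and RTCP SDES packets.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class ObservedStream(NamedTuple):
    session: str  # hypothetical label for the RTP Session, e.g. "base"
    ssrc: int
    cname: str    # learned from the RTCP SDES CNAME item


def group_mst_ss(streams: List[ObservedStream]) -> Dict[Tuple[str, int], List[str]]:
    """MST-SS: one RTP Stream per Media Encoder in each RTP Session, so a
    shared (CNAME, SSRC) pair across sessions points to one Media Source."""
    groups: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for s in streams:
        groups[(s.cname, s.ssrc)].append(s.session)
    return dict(groups)


def group_by_cname(streams: List[ObservedStream]) -> Dict[str, Set[int]]:
    """SST-MS: only CNAME is available. This yields the SSRCs sharing one
    Synchronization Context, which may still span several Media Sources,
    so further application-specific heuristics are needed."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for s in streams:
        groups[s.cname].add(s.ssrc)
    return dict(groups)


observed = [
    ObservedStream("base", 0xA1, "user@example.com"),
    ObservedStream("enhancement", 0xA1, "user@example.com"),
    ObservedStream("base", 0xB2, "user@example.com"),
]
print(group_mst_ss(observed)[("user@example.com", 0xA1)])  # ['base', 'enhancement']
```

Note that `group_by_cname` lumps 0xA1 and 0xB2 together even though they may come from different Media Encoders, which is exactly the gap the additional heuristics have to fill.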
3.3.3. Robustness and Repair | 3.9. RTP Stream Duplication | |||
Packet Streams may be protected by Redundancy Packet Streams during | RTP Stream Duplication [RFC7198], using the same or different Media | |||
transport. Several approaches listed below can achieve the same | Transports, and optionally also delaying the duplicate [RFC7197], | |||
result: | offers a simple way to protect media flows from packet loss in some | |||
cases. It is a specific type of redundancy: all but one of the RTP | ||||
Streams are effectively Redundancy RTP Streams (Section 2.1.12), but | ||||
since the Source RTP Stream (Section 2.1.10) and the Redundancy RTP | ||||
Streams are identical, it does not matter which is which. This can | ||||
also be seen as a specific type of Simulcast (Section 3.7) that | ||||
transmits the same Encoded Stream (Section 2.1.7) multiple times. | ||||
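With duplication, the receiver has to discard the extra copies. As a minimal Python sketch, assuming each copy carries the same SSRC and RTP sequence number as the original, a copy can be recognized by its sequence number within one RTP Stream. The `DuplicateFilter` class and its default window of 1024 packets are assumptions: the window must cover the largest expected offset between a packet and its (possibly delayed, [RFC7197]) duplicate.

```python
from collections import deque
from typing import Deque, Set


class DuplicateFilter:
    """Hypothetical per-SSRC filter that drops repeated RTP packets by
    sequence number, remembering only the most recent `window` numbers."""

    def __init__(self, window: int = 1024) -> None:
        if not 0 < window < 65536:
            raise ValueError("window must be between 1 and 65535")
        self._window = window
        self._order: Deque[int] = deque()
        self._seen: Set[int] = set()

    def accept(self, seq: int) -> bool:
        """Return True for the first copy of `seq`, False for a duplicate."""
        seq &= 0xFFFF  # RTP sequence numbers are 16 bits
        if seq in self._seen:
            return False
        self._seen.add(seq)
        self._order.append(seq)
        if len(self._order) > self._window:
            # Forget the oldest sequence number so wraparound stays safe.
            self._seen.discard(self._order.popleft())
        return True


f = DuplicateFilter()
arrivals = [1, 2, 1, 3, 2, 65535, 0, 65535]
print([seq for seq in arrivals if f.accept(seq)])  # [1, 2, 3, 65535, 0]
```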
o Duplication of the original Packet Stream | +----------------+ | |||
| Media Source | | ||||
+----------------+ | ||||
Source Stream | | ||||
V | ||||
+----------------+ | ||||
| Media Encoder | | ||||
+----------------+ | ||||
Encoded Stream | | ||||
+-----------+-----------+ | ||||
| | | ||||
V V | ||||
+------------------+ +------------------+ | ||||
| Media Packetizer | | Media Packetizer | | ||||
+------------------+ +------------------+ | ||||
Source | RTP Stream Source | RTP Stream | ||||
| V | ||||
| +-------------+ | ||||
| | Delay (opt) | | ||||
| +-------------+ | ||||
| | | ||||
+-----------+-----------+ | ||||
| | ||||
V | ||||
+-------------------+ | ||||
| Media Transport | | ||||
+-------------------+ | ||||
o Duplication of the original Packet Stream with a time offset, | Figure 9: Example of RTP Stream Duplication | |||
o Forward Error Correction (FEC) techniques, and | 3.10. Redundancy Format | |||
o Retransmission of lost packets (either globally or selectively). | The RTP Payload for Redundant Audio Data [RFC2198] defines how one | |||
can transport redundant audio data together with primary data in the | ||||
same RTP payload. The redundant data can be a time delayed version | ||||
of the primary or another time delayed Encoded Stream using a | ||||
different Media Encoder to encode the same Media Source as the | ||||
primary, as depicted below in Figure 10. | ||||
3.3.3.1. RTP Retransmission | +--------------------+ | |||
| Media Source | | ||||
+--------------------+ | ||||
| | ||||
Source Stream | ||||
| | ||||
+------------------------+ | ||||
| | | ||||
V V | ||||
+--------------------+ +--------------------+ | ||||
| Media Encoder | | Media Encoder | | ||||
+--------------------+ +--------------------+ | ||||
| | | ||||
| +------------+ | ||||
Encoded Stream | Time Delay | | ||||
| +------------+ | ||||
| | | ||||
| +------------------+ | ||||
V V | ||||
+--------------------+ | ||||
| Media Packetizer | | ||||
+--------------------+ | ||||
| | ||||
V | ||||
RTP Stream | ||||
The figure below (Figure 9) represents an example where a Media | Figure 10: Concept for usage of Audio Redundancy with different Media | |||
Source's Source Packet Stream is protected by a retransmission (RTX) | Encoders | |||
flow [RFC4588]. In this example the Source Packet Stream and the | ||||
Redundancy Packet Stream share the same Media Transport. | The Redundancy format thus provides the metadata needed | |||
to correctly relate different parts of the same Encoded Stream or, | ||||
in the case depicted above (Figure 10), to relate the Received Source | ||||
Stream fragments coming out of different Media Decoders so that they | ||||
can be combined into a less erroneous Source Stream. | ||||
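[RFC2198] carries this metadata in per-block headers: each redundant block is described by a 4-byte header holding an F bit set to 1, a 7-bit payload type, a 14-bit timestamp offset and a 10-bit block length, and a final 1-byte header (F=0, payload type) describes the primary block, whose data runs to the end of the payload. A minimal Python sketch of splitting such a payload; the `RedBlock` type and function name are illustrative, not an existing API.

```python
from typing import List, NamedTuple, Optional, Tuple


class RedBlock(NamedTuple):
    payload_type: int
    timestamp_offset: int  # subtract from the RTP timestamp; 0 for primary
    data: bytes
    primary: bool


def parse_red_payload(payload: bytes) -> List[RedBlock]:
    """Split an RFC 2198 redundant-audio payload into its blocks."""
    headers: List[Tuple[int, int, Optional[int]]] = []
    pos = 0
    while True:
        if pos >= len(payload):
            raise ValueError("truncated RED header")
        if payload[pos] & 0x80 == 0:  # F=0: final 1-byte header (primary)
            headers.append((payload[pos] & 0x7F, 0, None))
            pos += 1
            break
        if pos + 4 > len(payload):
            raise ValueError("truncated RED header")
        word = int.from_bytes(payload[pos:pos + 4], "big")
        # Bits: F(1) | block PT(7) | timestamp offset(14) | block length(10)
        headers.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    blocks = []
    for pt, ts_offset, length in headers:
        if length is None:  # primary block: rest of the payload
            data = payload[pos:]
        else:
            if pos + length > len(payload):
                raise ValueError("truncated RED block")
            data = payload[pos:pos + length]
        pos += len(data)
        blocks.append(RedBlock(pt, ts_offset, data, length is None))
    return blocks


# One redundant block (PT 0, offset 160, 3 bytes) followed by the primary (PT 0).
example = bytes([0x80, 0x02, 0x80, 0x03, 0x00]) + b"old" + b"new!"
for block in parse_red_payload(example):
    print(block)
```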
3.11. RTP Retransmission | ||||
The figure below (Figure 11) represents an example where a Media | ||||
Source's Source RTP Stream is protected by a retransmission (RTX) | ||||
flow [RFC4588]. In this example the Source RTP Stream and the | ||||
Redundancy RTP Stream share the same Media Transport. | ||||
+--------------------+ | +--------------------+ | |||
| Media Source | | | Media Source | | |||
+--------------------+ | +--------------------+ | |||
| | | | |||
V | V | |||
+--------------------+ | +--------------------+ | |||
| Media Encoder | | | Media Encoder | | |||
+--------------------+ | +--------------------+ | |||
| Retransmission | | Retransmission | |||
Encoded Stream +--------+ +---- Request | Encoded Stream +--------+ +---- Request | |||
V | V V | V | V V | |||
+--------------------+ | +--------------------+ | +--------------------+ | +--------------------+ | |||
| Media Packetizer | | | RTP Retransmission | | | Media Packetizer | | | RTP Retransmission | | |||
+--------------------+ | +--------------------+ | +--------------------+ | +--------------------+ | |||
| | | | | | | | |||
+------------+ Redundancy Packet Stream | +------------+ Redundancy RTP Stream | |||
Source Packet Stream | | Source RTP Stream | | |||
| | | | | | |||
+---------+ +---------+ | +---------+ +---------+ | |||
| | | | | | |||
V V | V V | |||
+-----------------+ | +-----------------+ | |||
| Media Transport | | | Media Transport | | |||
+-----------------+ | +-----------------+ | |||
Figure 9: Example of Media Source Retransmission Flows | Figure 11: Example of Media Source Retransmission Flows | |||
The RTP Retransmission example (Figure 9) helps illustrate that this | ||||
mechanism works purely on the Source Packet Stream. The RTP | ||||
Retransmission transform buffers the sent Source Packet Stream and | ||||
upon requests emits a retransmitted packet with some extra payload | ||||
header as a Redundancy Packet Stream. The RTP Retransmission | ||||
mechanism [RFC4588] is specified so that there is a one to one | ||||
relation between the Source Packet Stream and the Redundancy Packet | ||||
Stream. Thus a Redundancy Packet Stream needs to be associated with | ||||
its Source Packet Stream upon being received. This is done based on | ||||
CNAME selectors and heuristics to match requested packets for a given | ||||
Source Packet Stream with the original sequence number in the payload | ||||
of any new Redundancy Packet Stream using the RTX payload format. In | ||||
cases where the Redundancy Packet Stream is sent in a separate RTP | ||||
Session from the Source Packet Stream, these sessions are related, | ||||
e.g. using the SDP Media Grouping's [RFC5888] FID semantics. | ||||
3.3.3.2. Forward Error Correction | The RTP Retransmission example (Figure 11) helps illustrate that this | |||
mechanism works purely on the Source RTP Stream. The RTP | ||||
Retransmission transform buffers the sent Source RTP Stream and upon | ||||
requests emits a retransmitted packet with some extra payload header | ||||
as a Redundancy RTP Stream. The RTP Retransmission mechanism | ||||
[RFC4588] is specified so that there is a one to one relation between | ||||
the Source RTP Stream and the Redundancy RTP Stream. Thus a | ||||
Redundancy RTP Stream needs to be associated with its Source RTP | ||||
Stream upon being received. This is done based on CNAME selectors | ||||
and heuristics to match requested packets for a given Source RTP | ||||
Stream with the original sequence number in the payload of any new | ||||
Redundancy RTP Stream using the RTX payload format. In cases where | ||||
the Redundancy RTP Stream is sent in a separate RTP Session from the | ||||
Source RTP Stream, these sessions are related, e.g. using the SDP | ||||
Media Grouping's [RFC5888] FID semantics. | ||||
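An RTX payload [RFC4588] starts with the 2-byte original sequence number (OSN), followed by the original payload. The Python sketch below illustrates the CNAME-plus-OSN heuristic for RTX streams sent in the same RTP Session: it binds an RTX SSRC to a Source RTP Stream only when exactly one source with the same CNAME has an outstanding retransmission request for that OSN, and returns None when the match is missing or ambiguous. The `RtxAssociator` class and its bookkeeping are illustrative assumptions; RTP header parsing and RTCP handling are left out.

```python
from typing import Dict, Optional, Set, Tuple


def parse_rtx_payload(payload: bytes) -> Tuple[int, bytes]:
    """Split an RTX payload into (original sequence number, original payload)."""
    if len(payload) < 2:
        raise ValueError("RTX payload too short")
    return int.from_bytes(payload[:2], "big"), payload[2:]


class RtxAssociator:
    """Hypothetical receiver-side helper binding RTX SSRCs to source SSRCs."""

    def __init__(self) -> None:
        self.cname_of: Dict[int, str] = {}       # SSRC -> CNAME (from RTCP SDES)
        self.pending: Dict[int, Set[int]] = {}   # source SSRC -> requested seq numbers
        self.rtx_to_source: Dict[int, int] = {}  # learned RTX SSRC -> source SSRC

    def on_nack_sent(self, source_ssrc: int, seq: int) -> None:
        self.pending.setdefault(source_ssrc, set()).add(seq)

    def on_rtx_packet(self, rtx_ssrc: int,
                      payload: bytes) -> Optional[Tuple[int, int, bytes]]:
        """Return (source SSRC, OSN, original payload), or None if the
        RTX packet cannot (yet) be associated unambiguously."""
        osn, original = parse_rtx_payload(payload)
        source = self.rtx_to_source.get(rtx_ssrc)
        if source is None:
            cname = self.cname_of.get(rtx_ssrc)
            if cname is None:
                return None  # CNAME not yet known for this RTX SSRC
            candidates = [ssrc for ssrc, seqs in self.pending.items()
                          if osn in seqs and self.cname_of.get(ssrc) == cname]
            if len(candidates) != 1:
                return None  # no match, or ambiguous between sources
            source = candidates[0]
            self.rtx_to_source[rtx_ssrc] = source
        self.pending.get(source, set()).discard(osn)
        return source, osn, original


rx = RtxAssociator()
rx.cname_of.update({0x1000: "user@example.com", 0x2000: "user@example.com",
                    0x9000: "user@example.com"})
rx.on_nack_sent(0x1000, 42)
rx.on_nack_sent(0x2000, 7)
print(rx.on_rtx_packet(0x9000, (42).to_bytes(2, "big") + b"media"))  # (4096, 42, b'media')
```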
The figure below (Figure 10) represents an example where two Media | 3.12. Forward Error Correction | |||
Sources' Source Packet Streams are protected by FEC. Source Packet | ||||
Stream A has a Media Redundancy transformation in FEC Encoder 1. | ||||
This produces a Redundancy Packet Stream 1 that is only related to | The figure below (Figure 12) represents an example where two Media | |||
Source Packet Stream A. FEC Encoder 2, however, takes two Source | Sources' Source RTP Streams are protected by FEC. Source RTP Stream | |||
Packet Streams (A and B) and produces a Redundancy Packet Stream 2 | A has a Media Redundancy transformation in FEC Encoder 1. This | |||
that protects them together, i.e. Redundancy Packet Stream 2 relates | produces a Redundancy RTP Stream 1 that is only related to Source | |||
to two Source Packet Streams (a FEC group). FEC decoding, when | RTP Stream A. FEC Encoder 2, however, takes two Source RTP | |||
needed due to packet loss or packet corruption at the receiver, | Streams (A and B) and produces a Redundancy RTP Stream 2 that | |||
requires knowledge about which Source Packet Streams the FEC | protects them together, i.e. Redundancy RTP Stream 2 relates to two | |||
encoding was based on. | Source RTP Streams (a FEC group). FEC decoding, when needed due to | |||
packet loss or packet corruption at the receiver, requires knowledge | ||||
about which Source RTP Streams the FEC encoding was based on. | ||||
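To illustrate why the decoder must know the FEC group, here is a simplified XOR parity sketch in Python. It follows the general idea of XOR-based FEC (XOR of the zero-padded payloads plus XOR of their lengths) but is not the [RFC5109] packet format: RTP header recovery and FEC headers are omitted, and the `(stream, sequence number)` packet identifiers are hypothetical. Recovery works only because the receiver knows exactly which packets from Source RTP Streams A and B were protected together.

```python
from functools import reduce
from typing import Dict, Tuple

PacketId = Tuple[str, int]  # hypothetical: (Source RTP Stream label, sequence number)


def _xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings after zero-padding the shorter one."""
    n = max(len(a), len(b))
    return bytes(x ^ y for x, y in zip(a.ljust(n, b"\0"), b.ljust(n, b"\0")))


def fec_encode(group: Dict[PacketId, bytes]) -> Tuple[Tuple[PacketId, ...], int, bytes]:
    """Return (protected packet ids, XOR of lengths, XOR of payloads)."""
    ids = tuple(sorted(group))
    length_xor = reduce(lambda acc, pid: acc ^ len(group[pid]), ids, 0)
    parity = reduce(_xor, (group[pid] for pid in ids), b"")
    return ids, length_xor, parity


def fec_recover(protected: Tuple[PacketId, ...], length_xor: int, parity: bytes,
                received: Dict[PacketId, bytes]) -> Tuple[PacketId, bytes]:
    """Rebuild the single missing packet of a FEC group."""
    missing = [pid for pid in protected if pid not in received]
    if len(missing) != 1:
        raise ValueError("XOR parity can recover exactly one lost packet")
    length, data = length_xor, parity
    for pid in protected:
        if pid in received:
            length ^= len(received[pid])
            data = _xor(data, received[pid])
    return missing[0], data[:length]


group = {("A", 10): b"hello", ("A", 11): b"rtp", ("B", 500): b"stream-b!"}
protected, length_xor, parity = fec_encode(group)
received = {pid: p for pid, p in group.items() if pid != ("A", 11)}
print(fec_recover(protected, length_xor, parity, received))  # (('A', 11), b'rtp')
```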
In Figure 10 all Packet Streams are sent on the same Media Transport. | In Figure 12 all RTP Streams are sent on the same Media Transport. | |||
This is however not the only possible choice. Numerous combinations | This is however not the only possible choice. Numerous combinations | |||
exist for spreading these Packet Streams over different Media | exist for spreading these RTP Streams over different Media Transports | |||
Transports to achieve the communication application's goal. | to achieve the communication application's goal. | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| Media Source A | | Media Source B | | | Media Source A | | Media Source B | | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| | | | | | |||
V V | V V | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| Media Encoder A | | Media Encoder B | | | Media Encoder A | | Media Encoder B | | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| | | | | | |||
Encoded Stream Encoded Stream | Encoded Stream Encoded Stream | |||
V V | V V | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| Media Packetizer A | | Media Packetizer B | | | Media Packetizer A | | Media Packetizer B | | |||
+--------------------+ +--------------------+ | +--------------------+ +--------------------+ | |||
| | | | | | |||
Source Packet Stream A Source Packet Stream B | Source RTP Stream A Source RTP Stream B | |||
| | | | | | |||
+-----+-------+-------------+ +-------+------+ | +-----+---------+-------------+ +---+---+ | |||
| V V V | | | V V V | | |||
| +---------------+ +---------------+ | | | +---------------+ +---------------+ | | |||
| | FEC Encoder 1 | | FEC Encoder 2 | | | | | FEC Encoder 1 | | FEC Encoder 2 | | | |||
| +---------------+ +---------------+ | | | +---------------+ +---------------+ | | |||
| | | | | | Redundancy | Redundancy | | | |||
| Redundancy PS 1 Redundancy PS 2 | | | RTP Stream 1 | RTP Stream 2 | | | |||
V V V V | V V V V | |||
+----------------------------------------------------------+ | +----------------------------------------------------------+ | |||
| Media Transport | | | Media Transport | | |||
+----------------------------------------------------------+ | +----------------------------------------------------------+ | |||
Figure 10: Example of FEC Flows | Figure 12: Example of FEC Flows | |||
As FEC Encoding exists in various forms, the methods for relating FEC | As FEC Encoding exists in various forms, the methods for relating FEC | |||
Redundancy Packet Streams with their source information in Source | Redundancy RTP Streams with their source information in Source RTP | |||
Packet Streams are many. The XOR-based RTP FEC Payload format | Streams are many. The XOR-based RTP FEC Payload format [RFC5109] is | |||
defined in such a way that a Redundancy RTP Stream has a one-to-one | ||||
[RFC5109] is defined in such a way that a Redundancy Packet Stream | relation with a Source RTP Stream. In fact, the RFC requires the | |||
has a one-to-one relation with a Source Packet Stream. In fact, the | Redundancy RTP Stream to use the same SSRC as the Source RTP Stream. | |||
RFC requires the Redundancy Packet Stream to use the same SSRC as the | This requires either using a separate RTP session or using the | |||
Source Packet Stream. This requires either using a separate RTP | Redundancy RTP Payload format [RFC2198]. The underlying relation | |||
session or using the Redundancy RTP Payload format [RFC2198]. The | requirement for this FEC format and a particular Redundancy RTP | |||
underlying relation requirement for this FEC format and a particular | Stream is to know the related Source RTP Stream, including its SSRC. | |||
Redundancy Packet Stream is to know the related Source Packet Stream, | ||||
including its SSRC. | ||||
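In [RFC5109], an FEC packet identifies the media packets it protects only by sequence number: the FEC header carries an SN base, and each protection level carries a 16-bit mask (48 bits when the L bit is set) in which bit i, counted from the most significant bit, marks sequence number SN base + i. No SSRC appears in that mapping, which is why the receiver must already know the related Source RTP Stream. A minimal Python sketch of expanding the mask; header parsing is omitted and the function name is illustrative.

```python
from typing import List


def protected_sequence_numbers(sn_base: int, mask: int,
                               long_mask: bool = False) -> List[int]:
    """Sequence numbers (within the related Source RTP Stream) covered by
    one FEC protection level, given the SN base and that level's mask."""
    bits = 48 if long_mask else 16
    if mask >> bits:
        raise ValueError("mask must fit in %d bits" % bits)
    # The most significant mask bit corresponds to offset 0 from SN base;
    # sequence numbers wrap around at 16 bits.
    return [(sn_base + i) & 0xFFFF
            for i in range(bits)
            if mask & (1 << (bits - 1 - i))]


print(protected_sequence_numbers(65534, 0xA001))  # [65534, 0, 13]
```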
3.3.4. Packet Stream Separation | 3.13. RTP Stream Separation | |||
Packet Streams can be separated exclusively based on their SSRCs or | RTP Streams can be separated exclusively based on their SSRCs, at the | |||
at the RTP Session level or at the Multi-Media Session level as | RTP Session level, or at the Multi-Media Session level. | |||
explained below. | ||||
When the Packet Streams that have a relationship are all sent in the | When the RTP Streams that have a relationship are all sent in the | |||
same RTP Session and are uniquely identified based on their SSRC | same RTP Session and are uniquely identified based on their SSRC | |||
only, it is termed an SSRC-Only Based Separation. Such streams can | only, it is termed an SSRC-Only Based Separation. Such streams can | |||
be related via RTCP CNAME to identify that the streams belong to the | be related via RTCP CNAME to identify that the streams belong to the | |||
same End Point. [RFC5576]-based approaches, when used, can | same End Point. [RFC5576]-based approaches, when used, can | |||
explicitly relate various such Packet Streams. | explicitly relate various such RTP Streams. | |||
On the other hand, when Packet Streams that are related but are sent | On the other hand, when RTP Streams that are related but are sent in | |||
in the context of different RTP Sessions to achieve separation, it is | the context of different RTP Sessions to achieve separation, it is | |||
known as RTP Session-based separation. This is commonly used when | known as RTP Session-based separation. This is commonly used when | |||
the different Packet Streams are intended for different Media | the different RTP Streams are intended for different Media | |||
Transports. | Transports. | |||
Several mechanisms that use RTP Session-based separation rely on it | Several mechanisms that use RTP Session-based separation rely on it | |||
to enable an implicit grouping mechanism expressing the relationship. | to enable an implicit grouping mechanism expressing the relationship. | |||
The solutions have been based on using the same SSRC value in the | The solutions have been based on using the same SSRC value in the | |||
different RTP Sessions to implicitly indicate their relation. That | different RTP Sessions to implicitly indicate their relation. That | |||
way, no explicit RTP level mechanism has been needed, only signaling | way, no explicit RTP level mechanism has been needed, only signaling | |||
level relations have been established using semantics from the | level relations have been established using semantics from the | |||
Grouping of Media Lines framework [RFC5888]. Examples of this are RTP | Grouping of Media Lines framework [RFC5888]. Examples of this are RTP | |||
Retransmission [RFC4588], SVC Multi-Session Transmission [RFC6190] | Retransmission [RFC4588], SVC Multi-Session Transmission [RFC6190] | |||
and XOR Based FEC [RFC5109]. RTCP CNAME explicitly relates Packet | and XOR Based FEC [RFC5109]. RTCP CNAME explicitly relates RTP | |||
Streams across different RTP Sessions, as explained in the previous | Streams across different RTP Sessions, as explained in the previous | |||
section. Such a relationship can be used to perform inter-media | section. Such a relationship can be used to perform inter-media | |||
synchronization. | synchronization. | |||
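At the signaling level, [RFC5888] expresses such a relation with a session-level `a=group:` line that lists the `a=mid:` identification tags of the grouped media descriptions. A minimal Python sketch that resolves `FID` groups to their `m=` lines; it handles only this attribute pattern and is not a general SDP parser. The example SDP, with an RTX stream in a separate RTP Session as in [RFC4588], is illustrative.

```python
from typing import Dict, List, Optional


def sdp_groups(sdp: str, semantics: str = "FID") -> List[List[str]]:
    """Map each session-level a=group:<semantics> line to the m= lines of
    the media descriptions whose a=mid values it lists."""
    group_lines: List[List[str]] = []
    mid_to_mline: Dict[str, str] = {}
    current_mline: Optional[str] = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            current_mline = line
        elif line.startswith("a=mid:") and current_mline is not None:
            mid_to_mline[line[len("a=mid:"):]] = current_mline
        elif line.startswith("a=group:") and current_mline is None:
            tokens = line[len("a=group:"):].split()
            if tokens and tokens[0] == semantics:
                group_lines.append(tokens[1:])
    return [[mid_to_mline[mid] for mid in mids if mid in mid_to_mline]
            for mids in group_lines]


SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:FID 1 2
m=video 49170 RTP/AVPF 96
a=rtpmap:96 H264/90000
a=mid:1
m=video 49172 RTP/AVPF 97
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96;rtx-time=3000
a=mid:2
"""
# Prints the two m= lines that the FID group relates.
print(sdp_groups(SDP))
```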
Packet Streams that are related and need to be associated can be part | RTP Streams that are related and need to be associated can be part of | |||
of different Multimedia Sessions, rather than just different RTP | different Multimedia Sessions, rather than just different RTP | |||
sessions within the same Multimedia Session context. This puts | sessions within the same Multimedia Session context. This puts | |||
further demand on the scope of the mechanism(s) and its handling of | further demand on the scope of the mechanism(s) and its handling of | |||
identifiers used for expressing the relationships. | identifiers used for expressing the relationships. | |||
3.4. Multiple RTP Sessions over one Media Transport | 3.14. Multiple RTP Sessions over one Media Transport | |||
[I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism | [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism | |||
that allows several RTP Sessions to be carried over a single | that allows several RTP Sessions to be carried over a single | |||
underlying Media Transport. The main reasons for doing this are | underlying Media Transport. The main reasons for doing this are | |||
related to the impact of using one or more Media Transports, and | related to the impact of using one or more Media Transports, and | |||
thus a common network path or potentially different ones. | thus a common network path or potentially different ones. | |||
There is reduced need for NAT/FW traversal resources and no need for | There is reduced need for NAT/FW traversal resources and no need for | |||
flow based QoS. | flow based QoS. | |||
However, Multiple RTP Sessions over one Media Transport makes it | However, Multiple RTP Sessions over one Media Transport makes it | |||
clear that a single Media Transport 5-tuple is not sufficient to | clear that a single Media Transport 5-tuple is not sufficient to | |||
express which RTP Session context a particular Packet Stream exists | express which RTP Session context a particular RTP Stream exists in. | |||
in. Complexities in the relationship between Media Transports and | Complexities in the relationship between Media Transports and RTP | |||
RTP Sessions already exist, as one RTP Session contains multiple Media | Sessions already exist, as one RTP Session contains multiple Media | |||
Transports, e.g. even a Peer-to-Peer RTP Session with RTP/RTCP | Transports, e.g. even a Peer-to-Peer RTP Session with RTP/RTCP | |||
Multiplexing requires two Media Transports, one in each direction. | Multiplexing requires two Media Transports, one in each direction. | |||
The relationship between Media Transports and RTP Sessions as well as | The relationship between Media Transports and RTP Sessions as well as | |||
additional levels of identifiers need to be considered in both | additional levels of identifiers need to be considered in both | |||
signaling design and when defining terminology. | signaling design and when defining terminology. | |||
4. Topologies and Communication Entities | 4. Mapping from Existing Terms | |||
This section reviews some communication topologies and looks at the | This section describes a selected set of terms from some relevant | |||
relationship among the communication entities that are defined in | IETF RFCs and Internet-Drafts (at the time of writing), using the | |||
Section 2.2. It does not deal with discussions about the streams and | concepts from previous sections. | |||
their relation to the transport. Instead, it covers the aspects that | ||||
enable the transport of those streams. For example, the Media | ||||
Transports (Section 2.1.13) that exist between the End Points | ||||
(Section 2.2.1) that are part of an RTP session (Section 2.2.2) and | ||||
their relationship to the Multi-Media Session (Section 2.2.4) between | ||||
Participants (Section 2.2.3) and the established Communication | ||||
session (Section 2.2.5) are explained. | ||||
The text provided below is neither an exhaustive listing of possible | 4.1. Audio Capture | |||
topologies, nor does it cover all topologies described in | ||||
[I-D.ietf-avtcore-rtp-topologies-update]. | ||||
4.1. Point-to-Point Communication | Telepresence specifications from CLUE WG use this term to describe | |||
an audio Media Source (Section 2.1.4). | ||||
Figure 11 shows a very basic point-to-point communication session | 4.2. Capture Device | |||
between A and B. It uses two different audio and video RTP sessions | ||||
between A's and B's end points. Assume that the Multi-media session | ||||
shared by the participants is established using SIP (i.e., there is a | ||||
SIP Dialog between A and B). The high level representation of this | ||||
communication scenario can be demonstrated using Figure 11. | ||||
+---+ +---+ | Telepresence specifications from CLUE WG use this term to identify a | |||
| A |<------->| B | | physical entity performing a Media Capture (Section 2.1.2) | |||
+---+ +---+ | transformation. | |||
Figure 11: Point to Point Communication | 4.3. Capture Encoding | |||
However, this picture gets slightly more complex when redrawn using | Telepresence specifications from CLUE WG use this term to describe | |||
the communication entities concepts defined earlier in this document. | an Encoded Stream (Section 2.1.7) related to CLUE specific semantic | |||
information. | ||||
+-----------------------------------------------------------+ | 4.4. Capture Scene | |||
| Communication Session | | ||||
| | | ||||
| +----------------+ +----------------+ | | ||||
| | Participant A | +-------------+ | Participant B | | | ||||
| | | | Multi-Media | | | | | ||||
| | +-------------+|<=>| Session |<=>|+-------------+ | | | ||||
| | | End Point A || |(SIP Dialog) | || End Point B | | | | ||||
| | | || +-------------+ || | | | | ||||
| | | +-----------++---------------------++-----------+ | | | | ||||
| | | | RTP Session| | | | | | | ||||
| | | | Audio |---Media Transport-->| | | | | | ||||
| | | | |<--Media Transport---| | | | | | ||||
| | | +-----------++---------------------++-----------+ | | | | ||||
| | | || || | | | | ||||
| | | +-----------++---------------------++-----------+ | | | | ||||
| | | | RTP Session| | | | | | | ||||
| | | | Video |---Media Transport-->| | | | | | ||||
| | | | |<--Media Transport---| | | | | | ||||
| | | +-----------++---------------------++-----------+ | | | | ||||
| | +-------------+| |+-------------+ | | | ||||
| +----------------+ +----------------+ | | ||||
+-----------------------------------------------------------+ | ||||
Figure 12: Point to Point Communication Session with two RTP Sessions | Telepresence specifications from CLUE WG use this term to describe a | |||
set of spatially related Media Sources (Section 2.1.4). | ||||
Figure 12 shows that the two RTP Sessions exist only between the two End | 4.5. Endpoint | |||
Points A and B and over their respective Media Transports. The | ||||
Multi-Media Session establishes the association between the two | ||||
Participants and configures these RTP sessions and the Media | ||||
Transports that are used. | ||||
4.2. Centralized Conferencing | Telepresence specifications from CLUE WG use this term to describe | |||
exactly one Participant (Section 2.2.3) and one or more End Points | ||||
(Section 2.2.1). | ||||
This section looks at the centralized conferencing communication | 4.6. Individual Encoding | |||
topology, where a number of participants, like A, B, C, and D in | ||||
Figure 13, communicate using an RTP mixer. | ||||
+---+ +------------+ +---+ | Telepresence specifications from CLUE WG use this term to describe | |||
| A |<---->| |<---->| B | | the configuration information needed to perform a Media Encoder | |||
+---+ | | +---+ | (Section 2.1.6) transformation. | |||
| Mixer | | ||||
+---+ | | +---+ | ||||
| C |<---->| |<---->| D | | ||||
+---+ +------------+ +---+ | ||||
Figure 13: Centralized Conferencing using an RTP Mixer | 4.7. Multipoint Control Unit (MCU) | |||
In this case each of the Participants establishes its Multi-Media | This term is commonly used to describe the central node in any type | |||
session with the Conference Bridge. Thus, negotiation for the | of star topology [I-D.ietf-avtcore-rtp-topologies-update] conference. | |||
establishment of the used RTP sessions and their configuration | It describes a device that includes one Participant (Section 2.2.3) | |||
happens between these entities. The participants have their End | (usually corresponding to a so-called conference focus) and one or | |||
Points (A, B, C, D) and the Conference Bridge has the host running | more related End Points (Section 2.2.1) (sometimes one or more per | |||
the RTP mixer, referred to as End Point M in Figure 14. However, | conference participant). | |||
despite the individual establishment of four Multi-Media Sessions and | ||||
the corresponding Media Transports for each of the RTP sessions | ||||
between the respective End Points and the Conference Bridge, there are | ||||
actually only two RTP sessions, one for audio and one for video, as | ||||
these RTP sessions are, in this topology, shared between all the | ||||
Participants. | ||||
+-------------------------------------------------------------------+ | 4.8. Media Capture | |||
| Communication Session | | ||||
| | | ||||
| +----------------+ +----------------+ | | ||||
| | Participant A | +-------------+ | Conference | | | ||||
| | | | Multi-Media | | Bridge | | | ||||
| | +-------------+|<=====>| Session A |<=====>|+-------------+ | | | ||||
| | | End Point A || |(SIP Dialog) | || End Point M | | | | ||||
| | | || +-------------+ || | | | | ||||
| | | +-----------++-----------------------------++-----------+ | | | | ||||
| | | | RTP Session| | | | | | | ||||
| | | | Audio |-------Media Transport------>| | | | | | ||||
| | | | |<------Media Transport-------| | | | | | ||||
| | | +-----------++-----------------------------++------+ | | | | | ||||
| | | || || | | | | | | ||||
| | | +-----------++-----------------------------++----+ | | | | | | ||||
| | | | RTP Session| | | | | | | | | ||||
| | | | Video |-------Media Transport------>| | | | | | | | ||||
| | | | |<------Media Transport-------| | | | | | | | ||||
| | | +-----------++-----------------------------++ | | | | | | | ||||
| | +-------------+| || | | | | | | | ||||
| +----------------+ || | | | | | | | ||||
| || | | | | | | | ||||
| +----------------+ || | | | | | | | ||||
| | Participant B | +-------------+ || | | | | | | | ||||
| | | | Multi-Media | || | | | | | | | ||||
| | +-------------+|<=====>| Session B |<=====>|| | | | | | | | ||||
| | | End Point B || |(SIP Dialog) | || | | | | | | | ||||
| | | || +-------------+ || | | | | | | | ||||
| | | +-----------++-----------------------------++ | | | | | | | ||||
| | | | RTP Session| | | | | | | | | ||||
| | | | Video |-------Media Transport------>| | | | | | | | ||||
| | | | |<------Media Transport-------| | | | | | | | ||||
| | | +-----------++-----------------------------++----+ | | | | | | ||||
| | | || || | | | | | | ||||
| | | +-----------++-----------------------------++------+ | | | | | ||||
| | | | RTP Session| | | | | | | ||||
| | | | Audio |-------Media Transport------>| | | | | | ||||
| | | | |<------Media Transport-------| | | | | | ||||
| | | +-----------++-----------------------------++-----------+ | | | | ||||
| | +-------------+| |+-------------+ | | | ||||
| +----------------+ +----------------+ | | ||||
+-------------------------------------------------------------------+ | ||||
Figure 14: Centralized Conferencing with Two Participants A and B | Telepresence specifications from CLUE WG use this term to describe | |||
communicating over a Conference Bridge | either a Media Capture (Section 2.1.2) or a Media Source | |||
(Section 2.1.4), depending on the context in which the term is used. | ||||
It is important to stress that in the case of Figure 14, it might | 4.9. Media Consumer | |||
appear that the Multi-Media Sessions context is scoped between A and | ||||
B over M. This might not always be true, and they can have contexts | ||||
that extend further. In this case, the RTP session and its common SSRC | ||||
space go beyond what occurs between A and M and between B and M, | ||||
respectively. | ||||
4.3. Full Mesh Conferencing | Telepresence specifications from CLUE WG use this term to describe | |||
the media receiving part of an End Point (Section 2.2.1). | ||||
This section looks at the case where the three Participants (A, B and | 4.10. Media Description | |||
C) wish to communicate. They establish individual Multi-Media | ||||
Sessions and RTP sessions between themselves and the other two peers. | ||||
Thus, each Participant provides a copy of its media to each of the | ||||
other two participants. Figure 15 shows a high level representation | ||||
of such a topology. | ||||
+---+ +---+ | A single Session Description Protocol (SDP) [RFC4566] media | |||
| A |<---->| B | | description (or media block; an m-line and all subsequent lines until | |||
+---+ +---+ | the next m-line or the end of the SDP) describes part of the | |||
^ ^ | necessary configuration and identification information needed for a | |||
\ / | Media Encoder transformation, as well as the necessary configuration | |||
\ / | and identification information for the Media Decoder to be able to | |||
v v | correctly interpret a received RTP Stream. | |||
+---+ | ||||
| C | | ||||
+---+ | ||||
Figure 15: Full Mesh Conferencing with three Participants A, B and C | A Media Description typically relates to a single Media Source. This | |||
is, for example, an explicit restriction in WebRTC. However, nothing | ||||
prevents the same Media Description (and the same RTP Session) from | ||||
being reused for multiple Media Sources | ||||
[I-D.ietf-avtcore-rtp-multi-stream]. It can thus describe properties | ||||
of one or more RTP Streams, and can also describe properties valid | ||||
for an entire RTP Session (via [RFC5576] mechanisms, for example). | ||||
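The m-line boundary rule above (a media description runs from one m-line to the next m-line or the end of the SDP) can be illustrated with a small parser. This is a minimal sketch, assuming well-formed SDP and doing no grammar validation; the function names split_media_descriptions and ssrcs_in_block and the example addresses are hypothetical. The video block also shows a single media description carrying two RTP Streams, declared with RFC 5576 a=ssrc lines.

```python
"""Split an SDP body into its session-level part and its media
descriptions (one per m-line). Simplified sketch: assumes well-formed
SDP and performs no validation against the RFC 4566 grammar."""


def split_media_descriptions(sdp: str):
    session_lines = []  # lines before the first m-line
    media_blocks = []   # each block is a list of lines starting with an m-line
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            media_blocks.append([line])    # a new m-line opens a new media description
        elif media_blocks:
            media_blocks[-1].append(line)  # belongs to the current media description
        else:
            session_lines.append(line)     # session-level line
    return session_lines, media_blocks


def ssrcs_in_block(block):
    """Return the distinct SSRCs declared with a=ssrc (RFC 5576) in one block."""
    found = []
    for line in block:
        if line.startswith("a=ssrc:"):
            ssrc = int(line[len("a=ssrc:"):].split()[0])
            if ssrc not in found:
                found.append(ssrc)
    return found


# Hypothetical example offer (documentation address range).
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=ssrc:1111 cname:user@example.com
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
a=ssrc:2222 cname:user@example.com
a=ssrc:3333 cname:user@example.com
"""

session, media = split_media_descriptions(EXAMPLE_SDP)
for block in media:
    print(block[0], ssrcs_in_block(block))
```

Splitting purely on m-lines keeps the sketch close to the definition in the text; real SDP handling would also need to apply session-level defaults (such as the c= line) to each media description.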
In this particular case there are two aspects worth noting. The | 4.11. Media Provider | |||
first is that there will be multiple Multi-Media Sessions per | ||||
Communication Session between the participants. This has not been | ||||
true in the earlier examples, with the exception of the Centralized | ||||
Conferencing in Section 4.2. The second aspect is | ||||
consideration of whether one needs to maintain relationships between | ||||
entities and concepts, for example Media Sources, between these | ||||
different Multi-Media Sessions and between Packet Streams in the | ||||
independent RTP sessions configured by those Multi-Media Sessions. | ||||
+-----------------------------------------+ | Telepresence specifications from CLUE WG use this term to describe | |||
| Participant A | | the media sending part of an End Point (Section 2.2.1). | |||
+----------+ | +--------------------------------------+| | ||||
| Multi- | | | End Point A || | ||||
| Media |<======>| | || | ||||
| Session | | |+-------+ +-------+ +-------+ || | ||||
| 1 | | || RTP 1 |<----| MS A1 |---->| RTP 2 | || | ||||
+----------+ | || | +-------+ | | || | ||||
^^ | +|-------|-------------------|-------|-+| | ||||
|| +--|-------|-------------------|-------|--+ | ||||
|| | | ^^ | | | ||||
VV | | || | | | ||||
+-------------------------|-------|----+ || | | | ||||
| Participant B | | | VV | | | ||||
| +-----------------------|-------|---+| +----------+ | | | ||||
| | End Point B +----->| | || | Multi- | | | | ||||
| | | +-------+ || | Media | | | | ||||
| | +-------+ | +-------+ || | Session | | | | ||||
| | | MS B1 |------+----->| RTP 3 | || | 2 | | | | ||||
| | +-------+ | | || +----------+ | | | ||||
| +-----------------------|-------|---+| ^^ | | | ||||
+-------------------------|-------|----+ || | | | ||||
^^ | | || | | | ||||
|| | | VV | | | ||||
|| +--|-------|-------------------|-------|--+ | ||||
VV | | | Participant C | | | | ||||
+----------+ | +|-------|-------------------|-------|-+| | ||||
| Multi- | | || | End Point C | | || | ||||
| Media |<======>| |+-------+ +-------+ || | ||||
| Session | | | ^ +-------+ ^ || | ||||
| 3 | | | +---------| MS C1 |---------+ || | ||||
+----------+ | | +-------+ || | ||||
| +--------------------------------------+| | ||||
+-----------------------------------------+ | ||||
Figure 16: Full Mesh Conferencing between three Participants A, B and | 4.12. Media Stream | |||
C | ||||
For the sake of clarity, Figure 16 above does not include all these | RTP [RFC3550] uses media stream, audio stream, video stream, and | |||
concepts. The Media Sources (MS) from a given End Point are sent to | stream of (RTP) packets interchangeably, which are all RTP Streams. | |||
the two peers. This requires encoding and Media Packetization to | ||||
enable the Packet Streams to be sent over Media Transports in the | ||||
context of the RTP sessions depicted. The RTP sessions 1, 2, and 3 | ||||
are independent, and established in the context of each of the Multi- | ||||
Media Sessions 1, 2 and 3. The joint communication session the full | ||||
figure represents (not shown here as it was Figure 14 in order to | ||||
save space), however, combines the received representations of the | ||||
peers' Media Sources and plays them back. | ||||
It is noteworthy that the full mesh conferencing topologies described | 4.13. Multimedia Session | |||
here have the potential for creating loops. For example, compare the | ||||
above full mesh with the mixing three-party communication session | ||||
depicted in Figure 17. In this example, A's Media Source | ||||
A1 is sent to B over a Multi-Media Session (A-B). In B the Media | ||||
Source A1 is mixed with Media Source B1 and the resulting Media | ||||
Source (MS AB) is sent to C over a Multi-Media Session (B-C). If C | ||||
and A would establish a Multi-Media Session (A-C) and C would act in | ||||
the same role as B, then A would receive a Media Source from C that | ||||
contains a mix of A, B and C's individual Media Sources. This would | ||||
result in A playing out a time-delayed version of its own signal (i.e., | ||||
the system has created an echo path). | ||||
+--------------+ +--------------+ +--------------+ | SDP [RFC4566] defines a multimedia session as a set of multimedia | |||
| A | | B +-------+ | | C | | senders and receivers and the data streams flowing from senders to | |||
| | | | MS B1 | | | | | receivers, which would correspond to a set of End Points and the RTP | |||
| | | +-------+ | | | | Streams that flow between them. In this memo, Multimedia Session | |||
| +-------+ | | | | | | | also assumes those End Points belong to a set of Participants that | |||
| | MS A1 |----|--->|-----+ MS AB -|--->| | | are engaged in communication via a set of related RTP Streams. | |||
| +-------+ | | | | | | ||||
+--------------+ +--------------+ +--------------+ | ||||
Figure 17: Mixing Three Party Communication Session | RTP [RFC3550] defines a multimedia session as a set of concurrent RTP | |||
Sessions among a common group of participants. For example, a video | ||||
conference may contain an audio RTP Session and a video RTP Session. | ||||
This would correspond to a group of Participants (each using one or | ||||
more End Points) sharing a set of concurrent RTP Sessions. In this | ||||
memo, Multimedia Session also defines those RTP Sessions to have some | ||||
relation and be part of a communication among the Participants. | ||||
The looping issue can be avoided, detected or prevented using two | 4.14. Recording Device | |||
general methods. The first method is to use great care when setting | ||||
up and establishing the communication session if participants have | ||||
any mixing or forwarding capacity, so that one doesn't end up getting | ||||
back a partial or full representation of one's own media believing it | ||||
is someone else's. The other method is to maintain some unique | ||||
identifiers at the communication session level for all Media Sources | ||||
and ensure that any Packet Streams received identify those Media | ||||
Sources that contributed to the content of the Packet Stream. | ||||
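The second method, tagging every stream with communication-session-level identifiers of its contributing Media Sources, can be sketched as follows. This is a minimal illustration under stated assumptions: the identifiers are abstract string labels, the Stream, mix and is_echo names are hypothetical, and how such identifiers would be carried in RTP or signalling is not specified here. The example replays Figure 17 extended with the A-C session described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """A stream labelled with the (hypothetical) communication-session-level
    identifiers of every Media Source that contributed to its content."""
    label: str
    contributors: frozenset


def mix(label, local_source_ids, inputs):
    """Hypothetical mixer: the output's contributor set is the union of the
    local Media Source IDs and the contributors of every mixed input."""
    contributors = set(local_source_ids)
    for stream in inputs:
        contributors |= stream.contributors
    return Stream(label, frozenset(contributors))


def is_echo(stream, own_source_ids):
    """True if any of the receiver's own Media Sources contributed to the stream."""
    return not stream.contributors.isdisjoint(own_source_ids)


# Figure 17 plus an A-C session: A1 goes to B, B mixes it with B1 and
# sends MS AB to C, and C mixes AB with C1 and sends the result to A.
a1 = Stream("A1", frozenset({"A1"}))
ab = mix("AB", {"B1"}, [a1])
abc = mix("ABC", {"C1"}, [ab])

print(is_echo(abc, {"A1"}))  # A detects its own media in what C sent
print(is_echo(ab, {"C1"}))   # C receives nothing of its own from B
```

Propagating the union of contributor sets through each mixer is what lets the final receiver detect the loop, even though the echo path crosses several independent Multi-Media Sessions.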
4.4. Source-Specific Multicast | WebRTC specifications use this term to refer to locally available | |||
entities performing a Media Capture (Section 2.1.2) transformation. | ||||
In one-to-many media distribution cases (e.g., IPTV), where one Media | 4.15. RtcMediaStream | |||
Sender or a set of Media Senders is allowed to send Packet Streams on | ||||
a particular Source-Specific Multicast (SSM) group to many receivers | ||||
(R), there are some different aspects to consider. Figure 18 | ||||
presents a high level SSM system for RTP/RTCP defined in [RFC5760]. | ||||
In this case, several Media Senders send their Packet Streams to the | ||||
Distribution Source, which is the only one allowed to send to the SSM | ||||
group. The Receivers joining the SSM group can provide RTCP feedback | ||||
on their reception by sending unicast feedback to a Feedback Target | ||||
(FT). | ||||
+--------+ +-----+ | A WebRTC RtcMediaStream is a set of Media Sources | |||
|Media | | | Source-Specific | (Section 2.1.4) sharing the same Synchronization Context | |||
|Sender 1|<----->| D S | Multicast (SSM) | (Section 3.1). | |||
+--------+ | I O | +--+----------------> R(1) | ||||
| S U | | | | | ||||
+--------+ | T R | | +-----------> R(2) | | ||||
|Media |<----->| R C |->+ | : | | | ||||
|Sender 2| | I E | | +------> R(n-1) | | | ||||
+--------+ | B | | | | | | | ||||
: | U | +--+--> R(n) | | | | ||||
: | T +-| | | | | | ||||
: | I | |<---------+ | | | | ||||
+--------+ | O |F|<---------------+ | | | ||||
|Media | | N |T|<--------------------+ | | ||||
|Sender M|<----->| | |<-------------------------+ | ||||
+--------+ +-----+ RTCP Unicast | ||||
FT = Feedback Target | 4.16. RtcMediaStreamTrack | |||
Figure 18: Source-Specific Multicast Communication Topology | A WebRTC RtcMediaStreamTrack is a Media Source (Section 2.1.4). | |||
Here the Media Transport from the Distribution Source to all the SSM | 4.17. RTP Sender | |||
receivers (R) has the same 5-tuple, but in reality follows different | ||||
paths. Also, the Multi-Media Sessions between the Distribution | ||||
Source and the individual receivers are normally identical. This is | ||||
due to the one-way communication of configuration information from | ||||
the Distribution Source to the receivers. This information is typically | ||||
embedded in Electronic Program Guides (EPGs), distributed by the | ||||
Session Announcement Protocol (SAP) [RFC2974] or other one-way | ||||
protocols. In some cases, load balancing occurs, for example, by | ||||
providing the receiver with a set of Feedback Targets, from which it | ||||
randomly selects one. | ||||
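The Feedback Target load balancing mentioned above can be sketched as each receiver drawing one target from the advertised set. The text only says the selection is random; uniform selection, the pick_feedback_target and simulate_receivers names, and the documentation-range addresses are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical Feedback Target transport addresses (documentation range).
FEEDBACK_TARGETS = [("192.0.2.10", 5001), ("192.0.2.11", 5001), ("192.0.2.12", 5001)]


def pick_feedback_target(targets, rng=random):
    """Select one Feedback Target uniformly at random (an assumed policy)."""
    if not targets:
        raise ValueError("no Feedback Targets advertised")
    return rng.choice(list(targets))


def simulate_receivers(n_receivers, targets, seed=0):
    """Count how many of n independent receivers pick each Feedback Target."""
    rng = random.Random(seed)
    return Counter(pick_feedback_target(targets, rng) for _ in range(n_receivers))


load = simulate_receivers(3000, FEEDBACK_TARGETS)
for target, count in sorted(load.items()):
    print(target, count)
```

Because each receiver decides independently, no coordination is needed at the Distribution Source; the RTCP feedback load simply spreads across the advertised targets on average.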
This scenario varies significantly from previously described | RTP [RFC3550] uses this term, which can be seen as the RTP protocol | |||
communication topologies due to the asymmetric nature of the RTP | part of a Media Packetizer (Section 2.1.9). | |||
Session context across the Distribution Source. The Distribution | ||||
Source forms a focal point in collecting the unicasted RTCP feedback | ||||
from the receivers and then re-distributing it to the Media Senders. | ||||
Each Media Sender and the Distribution Source establish their own | ||||
Multi-Media Session Context for the underlying RTP Sessions but with | ||||
shared RTCP context across all the receivers. | ||||
To improve readability, Figure 18 intentionally hides the details | 4.18. RTP Session | |||
of the various entities. Expanding on this, one can think of Media | ||||
Senders being part of one or more Multi-Media Sessions grouped under | Within the context of SDP, a single m=line can map to a single RTP | |||
a Communication Session. The Media Sender in this scenario refers to | Session or multiple m=lines can map to a single RTP Session. The | |||
the Media Packetizer transformation (Section 2.1.9). The Packet Stream | latter is enabled via multiplexing schemes such as BUNDLE | |||
generated by such a Media Sender can be part of its own RTP Session | [I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which allows | |||
or can be multiplexed with other Packet Streams within an End Point. | mapping of multiple m=lines to a single RTP Session. | |||
The latter case requires careful consideration since the re- | ||||
distributed RTCP packets now correspond to a single RTP Session | Editor's note: Consider if the contents of Section 2.2.2 should be | |||
Context across all the Media Senders. | moved here, or if this section should be kept and refer to the | |||
above. | ||||
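The m=line to RTP Session mapping described above can be sketched by reading a=mid values and a session-level a=group:BUNDLE line (RFC 5888 grouping syntax). This is a simplified illustration, not the BUNDLE negotiation procedure: it assumes a final, already-negotiated SDP, ignores rejected m-lines and offer/answer rules, and the function name and session keys are hypothetical.

```python
def map_mlines_to_rtp_sessions(sdp: str):
    """Return one RTP Session key per m-line, in m-line order.

    Bundled m-lines share one key; every other m-line gets its own.
    """
    mids = []     # a=mid value of each m-line (None if absent)
    bundles = []  # mid lists from session-level a=group:BUNDLE lines
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            mids.append(None)
        elif line.startswith("a=mid:") and mids:
            mids[-1] = line[len("a=mid:"):]
        elif line.startswith("a=group:BUNDLE") and not mids:
            bundles.append(line.split()[1:])
    sessions = []
    for index, mid in enumerate(mids):
        group = next((g for g, members in enumerate(bundles) if mid in members), None)
        sessions.append(f"bundle-{group}" if group is not None else f"m-{index}")
    return sessions


# Hypothetical negotiated SDP: audio and video bundled, slides not bundled.
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:BUNDLE audio video
m=audio 49170 RTP/AVP 0
a=mid:audio
m=video 49170 RTP/AVP 96
a=mid:video
m=video 51372 RTP/AVP 97
a=mid:slides
"""

sessions = map_mlines_to_rtp_sessions(EXAMPLE_SDP)
print(sessions, "->", len(set(sessions)), "RTP Sessions")
```

Three m=lines thus yield two RTP Sessions: one shared by the bundled audio and video m=lines, and one for the unbundled m=line.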
4.19. SSRC | ||||
RTP [RFC3550] defines this as "the source of a stream of RTP | ||||
packets", which indicates that an SSRC is not only a unique | ||||
identifier for the Encoded Stream (Section 2.1.7) carried in those | ||||
packets, but is also effectively used as a term to denote a Media | ||||
Packetizer (Section 2.1.9). | ||||
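Because the SSRC identifies the RTP Stream carried in a packet, a receiver can demultiplex one RTP Session into its RTP Streams by reading the SSRC field of the 12-byte RTP fixed header (bytes 8 to 11, per RFC 3550). The sketch below assumes plain RTP packets and ignores padding and header extensions; parse_rtp_header, demux_by_ssrc and make_packet are hypothetical helper names.

```python
import struct
from collections import defaultdict

# RTP fixed header (RFC 3550): V/P/X/CC, M/PT, sequence number, timestamp, SSRC.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed header and CSRC list; padding/extensions are ignored."""
    if len(packet) < RTP_FIXED_HEADER.size:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = RTP_FIXED_HEADER.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = b0 & 0x0F  # number of CSRC identifiers that follow
    if len(packet) < RTP_FIXED_HEADER.size + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack_from(f"!{cc}I", packet, RTP_FIXED_HEADER.size))
    return {"marker": bool(b1 >> 7), "pt": b1 & 0x7F, "seq": seq,
            "timestamp": ts, "ssrc": ssrc, "csrcs": csrcs}


def demux_by_ssrc(packets):
    """Group sequence numbers by SSRC, i.e. by RTP Stream."""
    streams = defaultdict(list)
    for pkt in packets:
        hdr = parse_rtp_header(pkt)
        streams[hdr["ssrc"]].append(hdr["seq"])
    return dict(streams)


def make_packet(ssrc, seq, pt=96, ts=0):
    """Build a minimal RTP packet (version 2, no CSRCs) for testing."""
    return RTP_FIXED_HEADER.pack(0x80, pt, seq, ts, ssrc) + b"payload"


packets = [make_packet(0x1111, 1), make_packet(0x2222, 7), make_packet(0x1111, 2)]
for ssrc, seqs in demux_by_ssrc(packets).items():
    print(f"SSRC 0x{ssrc:08x}: sequence numbers {seqs}")
```

Keying on the SSRC alone is only valid within a single RTP Session, since SSRC values are scoped to the session's SSRC space.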
4.20. Stream | ||||
Telepresence specifications from CLUE WG use this term to describe an | ||||
RTP Stream (Section 2.1.10). | ||||
4.21. Video Capture | ||||
Telepresence specifications from CLUE WG use this term to describe a | ||||
video Media Source (Section 2.1.4). | ||||
5. Security Considerations | 5. Security Considerations | |||
This document simply tries to clarify the confusion prevalent in RTP | This document simply tries to clarify the confusion prevalent in RTP | |||
taxonomy because of inconsistent usage by multiple technologies and | taxonomy because of inconsistent usage by multiple technologies and | |||
protocols making use of the RTP protocol. It does not introduce any | protocols making use of the RTP protocol. It does not introduce any | |||
new security considerations beyond those already well documented in | new security considerations beyond those already well documented in | |||
the RTP protocol [RFC3550] and each of the many respective | the RTP protocol [RFC3550] and each of the many respective | |||
specifications of the various protocols making use of it. | specifications of the various protocols making use of it. | |||
skipping to change at page 41, line 46 | skipping to change at page 36, line 33 | |||
Magnus Westerlund has contributed the concept model for the media | Magnus Westerlund has contributed the concept model for the media | |||
chain using transformations and streams model, including rewriting | chain using transformations and streams model, including rewriting | |||
pre-existing concepts into this model and adding missing concepts. | pre-existing concepts into this model and adding missing concepts. | |||
The first proposal for updating the relationships and the topologies | The first proposal for updating the relationships and the topologies | |||
based on this concept was also performed by Magnus. | based on this concept was also performed by Magnus. | |||
8. IANA Considerations | 8. IANA Considerations | |||
This document makes no request of IANA. | This document makes no request of IANA. | |||
9. References | 9. Informative References | |||
9.1. Normative References | ||||
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. | ||||
Jacobson, "RTP: A Transport Protocol for Real-Time | ||||
Applications", STD 64, RFC 3550, July 2003. | ||||
9.2. Informative References | ||||
[I-D.ietf-avtcore-clksrc] | [I-D.ietf-avtcore-clksrc] | |||
Williams, A., Gross, K., Brandenburg, R., and H. Stokking, | Williams, A., Gross, K., Brandenburg, R., and H. Stokking, | |||
"RTP Clock Source Signalling", draft-ietf-avtcore- | "RTP Clock Source Signalling", draft-ietf-avtcore- | |||
clksrc-09 (work in progress), December 2013. | clksrc-11 (work in progress), March 2014. | |||
[I-D.ietf-avtcore-rtp-multi-stream] | ||||
Lennox, J., Westerlund, M., Wu, W., and C. Perkins, | ||||
"Sending Multiple Media Streams in a Single RTP Session", | ||||
draft-ietf-avtcore-rtp-multi-stream-04 (work in progress), | ||||
May 2014. | ||||
[I-D.ietf-avtcore-rtp-topologies-update] | [I-D.ietf-avtcore-rtp-topologies-update] | |||
Westerlund, M. and S. Wenger, "RTP Topologies", draft- | Westerlund, M. and S. Wenger, "RTP Topologies", draft- | |||
ietf-avtcore-rtp-topologies-update-01 (work in progress), | ietf-avtcore-rtp-topologies-update-02 (work in progress), | |||
October 2013. | May 2014. | |||
[I-D.ietf-clue-framework] | [I-D.ietf-clue-framework] | |||
Duckworth, M., Pepperell, A., and S. Wenger, "Framework | Duckworth, M., Pepperell, A., and S. Wenger, "Framework | |||
for Telepresence Multi-Streams", draft-ietf-clue- | for Telepresence Multi-Streams", draft-ietf-clue- | |||
framework-14 (work in progress), February 2014. | framework-15 (work in progress), May 2014. | |||
[I-D.ietf-mmusic-sdp-bundle-negotiation] | [I-D.ietf-mmusic-sdp-bundle-negotiation] | |||
Holmberg, C., Alvestrand, H., and C. Jennings, | Holmberg, C., Alvestrand, H., and C. Jennings, | |||
"Multiplexing Negotiation Using Session Description | "Negotiating Media Multiplexing Using the Session | |||
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- | Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle- | |||
bundle-negotiation-05 (work in progress), October 2013. | negotiation-07 (work in progress), April 2014. | |||
[I-D.ietf-rtcweb-overview] | [I-D.ietf-rtcweb-overview] | |||
Alvestrand, H., "Overview: Real Time Protocols for Brower- | Alvestrand, H., "Overview: Real Time Protocols for | |||
based Applications", draft-ietf-rtcweb-overview-08 (work | Browser-based Applications", draft-ietf-rtcweb-overview-10 | |||
in progress), September 2013. | (work in progress), June 2014. | |||
[I-D.westerlund-avtcore-transport-multiplexing] | [I-D.westerlund-avtcore-transport-multiplexing] | |||
Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP | Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP | |||
Sessions onto a Single Lower-Layer Transport", draft- | Sessions onto a Single Lower-Layer Transport", draft- | |||
westerlund-avtcore-transport-multiplexing-07 (work in | westerlund-avtcore-transport-multiplexing-07 (work in | |||
progress), October 2013. | progress), October 2013. | |||
[RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., | [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., | |||
Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse- | Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse- | |||
Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, | Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, | |||
September 1997. | September 1997. | |||
[RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session | [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. | |||
Announcement Protocol", RFC 2974, October 2000. | Jacobson, "RTP: A Transport Protocol for Real-Time | |||
Applications", STD 64, RFC 3550, July 2003. | ||||
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model | ||||
with Session Description Protocol (SDP)", RFC 3264, June | ||||
2002. | ||||
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and | [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and | |||
Video Conferences with Minimal Control", STD 65, RFC 3551, | Video Conferences with Minimal Control", STD 65, RFC 3551, | |||
July 2003. | July 2003. | |||
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session | [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session | |||
Description Protocol", RFC 4566, July 2006. | Description Protocol", RFC 4566, July 2006. | |||
[RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. | [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. | |||
Hakenberg, "RTP Retransmission Payload Format", RFC 4588, | Hakenberg, "RTP Retransmission Payload Format", RFC 4588, | |||
skipping to change at page 43, line 35 | skipping to change at page 38, line 15 | |||
[RFC5109] Li, A., "RTP Payload Format for Generic Forward Error | [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error | |||
Correction", RFC 5109, December 2007. | Correction", RFC 5109, December 2007. | |||
[RFC5404] Westerlund, M. and I. Johansson, "RTP Payload Format for | [RFC5404] Westerlund, M. and I. Johansson, "RTP Payload Format for | |||
G.719", RFC 5404, January 2009. | G.719", RFC 5404, January 2009. | |||
[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific | [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific | |||
Media Attributes in the Session Description Protocol | Media Attributes in the Session Description Protocol | |||
(SDP)", RFC 5576, June 2009. | (SDP)", RFC 5576, June 2009. | |||
[RFC5760] Ott, J., Chesterfield, J., and E. Schooler, "RTP Control | ||||
Protocol (RTCP) Extensions for Single-Source Multicast | ||||
Sessions with Unicast Feedback", RFC 5760, February 2010. | ||||
[RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description | [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description | |||
Protocol (SDP) Grouping Framework", RFC 5888, June 2010. | Protocol (SDP) Grouping Framework", RFC 5888, June 2010. | |||
[RFC5905] Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network | [RFC5905] Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network | |||
Time Protocol Version 4: Protocol and Algorithms | Time Protocol Version 4: Protocol and Algorithms | |||
Specification", RFC 5905, June 2010. | Specification", RFC 5905, June 2010. | |||
[RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, | [RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, | |||
"RTP Payload Format for Scalable Video Coding", RFC 6190, | "RTP Payload Format for Scalable Video Coding", RFC 6190, | |||
May 2011. | May 2011. | |||
[RFC6222] Begen, A., Perkins, C., and D. Wing, "Guidelines for | [RFC7160] Petit-Huguenin, M. and G. Zorn, "Support for Multiple | |||
Choosing RTP Control Protocol (RTCP) Canonical Names | Clock Rates in an RTP Session", RFC 7160, April 2014. | |||
(CNAMEs)", RFC 6222, April 2011. | ||||
[RFC7197] Begen, A., Cai, Y., and H. Ou, "Duplication Delay | ||||
Attribute in the Session Description Protocol", RFC 7197, | ||||
April 2014. | ||||
[RFC7198] Begen, A. and C. Perkins, "Duplicating RTP Streams", RFC | ||||
7198, April 2014. | ||||
Appendix A. Changes From Earlier Versions | Appendix A. Changes From Earlier Versions | |||
NOTE TO RFC EDITOR: Please remove this section prior to publication. | NOTE TO RFC EDITOR: Please remove this section prior to publication. | |||
A.1. Modifications Between WG Version -00 and -03 | A.1. Modifications Between WG Version -01 and -02 | |||
o Major re-structure | ||||
o Moved media chain Media Transport detailing up one section level | ||||
o Collapsed level 2 sub-sections of section 3 and thus moved level 3 | ||||
sub-sections up one level, gathering some introductory text into | ||||
the beginning of section 3 | ||||
o Added that not only an SSRC collision but also a clock rate change | ||||
[RFC7160] is a valid reason to change the SSRC value of an RTP stream | ||||
o Added a sub-section on clock source signaling | ||||
o Added a sub-section on RTP stream duplication | ||||
o Elaborated a bit in section 2.2.1 on the relation between End | ||||
Points, Participants and CNAMEs | ||||
o Elaborated a bit in section 2.2.4 on Multimedia Session and | ||||
synchronization contexts | ||||
o Removed the section on CLUE scenes defining an implicit | ||||
synchronization context, since it was incorrect | ||||
o Clarified text on SVC SST and MST according to list discussions | ||||
o Removed the entire topology section to avoid possible | ||||
inconsistencies or duplications with draft-ietf-avtcore-rtp- | ||||
topologies-update, but saved one example overview figure of | ||||
Communication Entities into that section | ||||
o Added a section 4 on mapping from existing terms with one sub- | ||||
section per term, mainly by moving text from sections 2 and 3 | ||||
o Changed all occurrences of Packet Stream to RTP Stream | ||||
o Moved all normative references to informative, since this is an | ||||
informative document | ||||
o Added references to RFC 7160, RFC 7197 and RFC 7198, and removed | ||||
unused references | ||||
A.2. Modifications Between WG Version -00 and -01 | ||||
o WG version -00 text is identical to individual draft -03 | o WG version -00 text is identical to individual draft -03 | |||
o Amended description of SVC SST and MST encodings with respect to | o Amended description of SVC SST and MST encodings with respect to | |||
concepts defined in this text | concepts defined in this text | |||
o Removed UML as normative reference, since the text no longer uses | o Removed UML as normative reference, since the text no longer uses | |||
any UML notation | any UML notation | |||
o Removed a number of level 4 sections and moved out text to the | o Removed a number of level 4 sections and moved out text to the | |||
level above | level above | |||
A.2. Modifications Between Version -02 and -03 | A.3. Modifications Between Version -02 and -03 | |||
o Section 4 rewritten (and new communication topologies added) to | o Section 4 rewritten (and new communication topologies added) to | |||
reflect the major updates to Sections 1-3 | reflect the major updates to Sections 1-3 | |||
o Section 8 removed (carryover from initial -00 draft) | o Section 8 removed (carryover from initial -00 draft) | |||
o General clean up of text, grammar and nits | o General clean up of text, grammar and nits | |||
A.3. Modifications Between Version -01 and -02 | A.4. Modifications Between Version -01 and -02 | |||
o Section 2 rewritten to add both streams and transformations in the | o Section 2 rewritten to add both streams and transformations in the | |||
media chain. | media chain. | |||
o Section 3 rewritten to focus on exposing relationships. | o Section 3 rewritten to focus on exposing relationships. | |||
A.4. Modifications Between Version -00 and -01 | A.5. Modifications Between Version -00 and -01 | |||
o Too many to list | o Too many to list | |||
o Added new authors | o Added new authors | |||
o Updated content organization and presentation | o Updated content organization and presentation | |||
Authors' Addresses | Authors' Addresses | |||
Jonathan Lennox | Jonathan Lennox | |||
Vidyo, Inc. | Vidyo, Inc. | |||
433 Hackensack Avenue | 433 Hackensack Avenue | |||
Seventh Floor | Seventh Floor | |||
Hackensack, NJ 07601 | Hackensack, NJ 07601 | |||
US | US | |||
Email: jonathan@vidyo.com | Email: jonathan@vidyo.com | |||
Kevin Gross | Kevin Gross | |||
skipping to change at page 45, line 38 | skipping to change at page 41, line 22 | |||
Gonzalo Salgueiro | Gonzalo Salgueiro | |||
Cisco Systems | Cisco Systems | |||
7200-12 Kit Creek Road | 7200-12 Kit Creek Road | |||
Research Triangle Park, NC 27709 | Research Triangle Park, NC 27709 | |||
US | US | |||
Email: gsalguei@cisco.com | Email: gsalguei@cisco.com | |||
Bo Burman | Bo Burman | |||
Ericsson | Ericsson | |||
Farogatan 6 | Kistavagen 25 | |||
SE-164 80 Kista | SE-164 80 Kista | |||
Sweden | Sweden | |||
Phone: +46 10 714 13 11 | Phone: +46 10 714 13 11 | |||
Email: bo.burman@ericsson.com | Email: bo.burman@ericsson.com | |||
End of changes. 203 change blocks. | ||||
954 lines changed or deleted | 828 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |