AVTEXT Working Group                                              J. Xia
Internet-Draft                                                    Huawei
Intended status: Informational                              July 9, 2012
Expires: January 10, 2013


                   Content Splicing for RTP Sessions
                 draft-ietf-avtext-splicing-for-rtp-08

Abstract

   Content splicing is a process that replaces the content of a main
   multimedia stream with other multimedia content, and delivers the
   substitutive multimedia content to the receivers for a period of
   time.  Splicing is commonly used for local advertisement insertion
   by cable operators, replacing a national advertisement content with
   a local advertisement.

   This memo describes some use cases for content splicing and a set of
   requirements for splicing content delivered by RTP.  It provides
   concrete guidelines for how an RTP mixer can be used to handle
   content splicing.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 10, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  System Model and Terminology
   3.  Requirements for RTP Splicing
   4.  Content Splicing for RTP Sessions
     4.1.  RTP Processing in RTP Mixer
     4.2.  RTCP Processing in RTP Mixer
     4.3.  Media Clipping Considerations
     4.4.  Congestion Control Considerations
     4.5.  Processing Splicing in User Invisibility Case
   5.  Implementation Considerations
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  Appendix - Why Mixer Is Chosen
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   This document outlines how content splicing can be used in RTP
   sessions.  Splicing, in general, is a process where part of a
   multimedia content is replaced with other multimedia content, and
   delivered to the receivers for a period of time.  The substitutive
   content can be provided, for example, via another stream or via
   local media file storage.  One representative use case for splicing
   is local advertisement insertion, allowing content providers to
   replace the national advertising content with their own regional
   advertising content prior to delivering the regional advertising
   content to the receivers.

   Besides the advertisement insertion use case, there are other use
   cases in which splicing technology can be applied.  For example,
   splicing a recorded video into a video conferencing session, or
   implementing a playlist server that stitches pieces of video
   together.

   Content splicing is a well-defined operation in MPEG-based cable TV
   systems.  Indeed, the Society for Cable Telecommunications Engineers
   (SCTE) has created two standards, [SCTE30] and [SCTE35], to
   standardize the MPEG2-TS splicing procedure.  SCTE 30 creates a
   standardized method for communication between the advertisement
   server and the splicer, and SCTE 35 supports splicing of MPEG2
   transport streams.

   When using multimedia splicing in the Internet, the media may be
   transported by RTP.  In this
   case the original media content and substitutive media content will
   use the same time period, but may contain different numbers of RTP
   packets due to different media codecs and entropy coding.  This
   mismatch may require some adjustments of the RTP header sequence
   number to maintain consistency.  [RFC3550] provides the tools to
   enable seamless content splicing in RTP sessions, but to date there
   have been no clear guidelines on how to use these tools.

   This memo outlines the requirements for content splicing in RTP
   sessions and describes how an RTP mixer can be used to meet these
   requirements.

2.  System Model and Terminology

   In this document, an intermediary network element, the Splicer,
   handles RTP splicing.  The Splicer can receive main content and
   substitutive content simultaneously, but will send only one of them
   at any point in time.

   When RTP splicing begins, the Splicer sends the substitutive content
   to the RTP receiver instead of the main content for a period of
   time.  When RTP splicing ends, the Splicer switches back to sending
   the main content to the RTP receiver.

   A simplified RTP splicing diagram is depicted in Figure 1, in which
   only one main content flow and one substitutive content flow are
   given.  In practice, the Splicer can handle multiple splicing for
   multiple RTP sessions simultaneously.  RTP splicing may happen more
   than once in multiple time slots during the lifetime of the main RTP
   stream.  How the Splicer learns when to start and end the splicing
   is out of scope for this document.

       +---------------+
       |               | Main Content +-----------+
       | Main RTP      |------------->|           | Output Content
       | Content       |              | Splicer   |--------------->
       +---------------+   ---------->|           |
                           |          +-----------+
                           |
                           | Substitutive Content
                           |
                           |
       +-----------------------+
       | Substitutive RTP      |
       | Content               |
       | or                    |
       | Local File Storage    |
       +-----------------------+

                   Figure 1: RTP Splicing Architecture

   This document uses the following terminology.

   Output RTP Stream

      The RTP stream that the RTP receiver is currently receiving.  The
      content of the output RTP stream can be either main content or
      substitutive content.

   Main Content

      The multimedia content that is conveyed in the main RTP stream.
      Main content will be replaced by the substitutive content during
      splicing.

   Main RTP Stream

      The RTP stream that the Splicer is receiving.  The content of the
      main RTP stream can be replaced by substitutive content for a
      period of time.

   Main RTP Sender

      The sender of RTP packets carrying the main RTP stream.

   Substitutive Content

      The multimedia content that replaces the main content during
      splicing.  The substitutive content can, for example, be
      contained in an RTP stream from a media sender or fetched from
      local media file storage.

   Substitutive RTP Stream

      An RTP stream that carries the substitutive content.  The
      substitutive RTP stream and the main RTP stream are two separate
      streams.

   Substitutive RTP Sender

      The sender of RTP packets carrying the substitutive RTP stream.

   Splicing In Point

      A virtual point in the RTP stream, suitable for substitutive
      content entry, typically at the boundary between two
      independently decodable frames.

   Splicing Out Point

      A virtual point in the RTP stream, suitable for substitutive
      content exit, typically at the boundary between two independently
      decodable frames.

   Splicer

      An intermediary node that inserts substitutive content into the
      main RTP stream.  The Splicer sends substitutive content to the
      RTP receiver instead of main content during splicing.  It is also
      responsible for processing RTCP traffic between the RTP sender
      and the RTP receiver.

3.  Requirements for RTP Splicing

   In order to allow seamless content splicing at the RTP layer, the
   following requirements must be met.  Meeting these will also allow,
   but not require, seamless content splicing at layers above RTP.

   REQ-1:  The splicer should be agnostic about the network and
      transport layer protocols used to deliver the RTP streams.

   REQ-2:  The splicing operation at the RTP layer must allow splicing
      at any point required by the media content, and must not
      constrain when splicing in or splicing out operations can take
      place.

   REQ-3:  Splicing of RTP content must be backward compatible with the
      RTP/RTCP protocol, associated profiles, payload formats, and
      extensions.

   REQ-4:  A content splicer will modify the content of RTP packets,
      and thereby break end-to-end security, e.g., data integrity and
      source authentication.  If the Splicer is designated to insert
      substitutive content, it must be trusted, i.e., be in the same
      security context as the main RTP sender, the substitutive RTP
      sender, and the receivers.  If encryption is employed, the
      Splicer must be able to decrypt the inbound RTP packets and
      re-encrypt the outbound RTP packets after splicing.

   REQ-5:  The splicer should rewrite as necessary and forward RTCP
      messages (e.g., including packet loss, jitter, etc.) sent from
      the downstream receiver to the main RTP sender or the
      substitutive RTP sender, and thus allow the main RTP sender or
      substitutive RTP sender to learn the performance of the
      downstream receiver when its content is being passed to the RTP
      receiver.  In addition, the splicer should rewrite RTCP messages
      from the main RTP sender or substitutive RTP sender to the
      receiver.

   REQ-6:  The splicer must not affect other RTP sessions running
      between the RTP sender and the RTP receiver, and must be
      transparent for the RTP sessions it does not splice.

   REQ-7:  The content splicer should be able to modify the RTP stream
      across a splicing in or splicing out point such that the splicing
      point is not easy to detect in the RTP stream.  For the
      advertisement insertion use case, it is important to make the
      splicing point difficult for the receiver to detect.  Ensuring
      the splicing point is not visible in the media content may be
      easy with some codecs, but extremely difficult with others; in
      the worst case, the splicer may need to perform full media
      transcoding if it has to hide the splicing point in the media
      content.  This memo only focuses on making the splicing invisible
      at the RTP layer.  How (or if) the splicing is made invisible in
      the media stream is outside the scope of this memo.

4.  Content Splicing for RTP Sessions

   The RTP specification [RFC3550] defines two types of middlebox: RTP
   translators and RTP mixers.  Splicing is best viewed as a mixing
   operation.
   The splicer generates a new RTP stream that is a mix of the main RTP
   stream and the substitutive RTP stream.  An RTP mixer is therefore
   an appropriate model for a content splicer.

   The next four subsections (4.1 to 4.4) analyze how the mixer handles
   RTP splicing and how it satisfies the general requirements listed in
   Section 3.  Subsection 4.5 looks at REQ-7, i.e., how to hide the
   fact that splicing takes place.

4.1.  RTP Processing in RTP Mixer

   A content splicer should be implemented as a mixer that receives the
   main RTP stream and the substitutive content (possibly via a
   substitutive RTP stream), and sends a single output RTP stream to
   the receiver(s).  That output RTP stream will contain either the
   main content or the substitutive content.  The output RTP stream
   will come from the mixer, and will have the SSRC of the mixer rather
   than that of the main RTP sender or the substitutive RTP sender.

   The mixer uses its own SSRC, sequence number space and timing model
   when generating the output stream.  Moreover, the mixer may insert
   the SSRC of the main RTP stream into the CSRC list in the output
   media stream.
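
   As a rough illustration of the header handling described for the
   mixer's output stream, the following Python sketch wraps payloads
   from either input under a single SSRC with one contiguous sequence
   number space and one timing model.  The class and field names
   (RtpPacket, SplicingMixer, emit) are hypothetical and not taken
   from this memo or from any RTP library, and the sketch assumes,
   for simplicity, one payload per packet with a known duration in
   timestamp units.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RtpPacket:
    # Only the header fields relevant to splicing are modelled here.
    ssrc: int
    seq: int            # 16-bit sequence number
    timestamp: int      # 32-bit media timestamp
    payload: bytes
    csrc: List[int] = field(default_factory=list)


class SplicingMixer:
    """Hypothetical model of a mixer that forwards one input at a time."""

    def __init__(self, mixer_ssrc: int, first_seq: int = 0, first_ts: int = 0):
        self.ssrc = mixer_ssrc
        self.next_seq = first_seq & 0xFFFF
        self.next_ts = first_ts & 0xFFFFFFFF

    def emit(self, payload: bytes, duration: int,
             source_ssrc: Optional[int] = None) -> RtpPacket:
        """Wrap a payload in an output packet under the mixer's identity.

        duration is the media duration of the payload in timestamp units.
        source_ssrc is the SSRC of the input stream, or None when the
        content comes from local file storage (CSRC list left blank).
        """
        csrc = [source_ssrc] if source_ssrc is not None else []
        pkt = RtpPacket(self.ssrc, self.next_seq, self.next_ts, payload, csrc)
        # Sequence numbers and timestamps advance in the mixer's own
        # spaces, independent of which input supplied the payload.
        self.next_seq = (self.next_seq + 1) & 0xFFFF
        self.next_ts = (self.next_ts + duration) & 0xFFFFFFFF
        return pkt


if __name__ == "__main__":
    mixer = SplicingMixer(mixer_ssrc=0x1111, first_seq=65534, first_ts=1000)
    out = [
        mixer.emit(b"main-1", 3000, source_ssrc=0xAAAA),  # before splicing
        mixer.emit(b"ad-1", 3000, source_ssrc=0xBBBB),    # splicing in
        mixer.emit(b"ad-2", 3000),                        # from local file
        mixer.emit(b"main-2", 3000, source_ssrc=0xAAAA),  # splicing out
    ]
    for p in out:
        print(hex(p.ssrc), p.seq, p.timestamp, [hex(c) for c in p.csrc])
```

   In this sketch the receiver sees a single source throughout, and the
   sequence numbers stay contiguous (including across the 16-bit
   wrap) regardless of which input is active.  Populating the CSRC
   list is optional in the text above; it is shown here only to make
   the switch between inputs visible.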

   At the splicing in point, when the substitutive content becomes
   active, the mixer chooses the substitutive RTP stream as the input
   stream and extracts the payload data (i.e., substitutive content).
   If the substitutive content comes from local media file storage, the
   mixer fetches the substitutive content directly.  After that, the
   mixer encapsulates substitutive content instead of main content as
   the payload of the output media stream, and then sends the output
   RTP media stream to the receiver.  The mixer may insert the SSRC of
   the substitutive RTP stream into the CSRC list in the output media
   stream.  If the substitutive content comes from local media file
   storage, the mixer should leave the CSRC list blank.

   At the splicing out point, when the substitutive content ends, the
   mixer returns to the main RTP stream as the input stream and
   extracts the payload data (i.e., main content).  After that, the
   mixer encapsulates main content instead of
   substitutive content as the payload of the output media stream, and
   then sends the output media stream to the receivers.  Moreover, the
   mixer may insert the SSRC of the main RTP stream into the CSRC list
   in the output media stream as before.

   Note that if the content is too large to fit into the RTP packets
   sent to the RTP receiver, the mixer needs to transcode or perform
   application-layer fragmentation.  Usually the mixer is deployed as
   part of a managed system and the MTU will be carefully managed by
   this system.  This document does not raise any new MTU related
   issues compared to a standard mixer described in [RFC3550].

   Splicing may occur more than once during the lifetime of the main
   RTP stream.  This means the mixer needs to send main content and
   substitutive content in turn with its own SSRC identifier.  From the
   receiver's point of view, the only source of the output stream is
   the mixer, regardless of where the content is coming from.

4.2.  RTCP Processing in RTP Mixer

   By monitoring available bandwidth and buffer levels and by computing
   network metrics such as packet loss, network jitter, and delay, the
   RTP receiver can learn the network performance and communicate this
   to the RTP sender via RTCP reception reports.

   According to the description in Section 7.3 of [RFC3550], the mixer
   splits the RTCP flow between sender and receiver into two separate
   RTCP loops, so the RTP sender has no direct knowledge of the
   situation on the receiver.  However, splicing is a process in which
   the mixer selects one media stream from multiple streams rather than
   mixing them, so the mixer can leave the SSRC identifier in the RTCP
   report intact (i.e., the SSRC of the downstream receiver).  This
   enables the main RTP sender or the substitutive RTP sender to learn
   the situation on the receiver.

   When the RTCP report corresponds to a time interval that is entirely
   main content or entirely substitutive content, the number of
   output RTP packets containing substitutive content is equal to the
   number of input substitutive RTP packets (from the substitutive RTP
   stream) during splicing.  In the same manner, the number of output
   RTP packets containing main content is equal to the number of input
   main RTP packets (from the main RTP stream) during non-splicing,
   unless the mixer fragments the input RTP packets.  This means that
   the mixer does not need to modify the packet loss fields in
   reception report blocks in RTCP reports.  But if the mixer fragments
   the input RTP packets, it may need to modify the packet loss fields
   to compensate for the fragmentation.  Whether the input RTP packets
   are fragmented or not, the mixer still needs to change the SSRC
   field in the report block to the SSRC identifier of the main RTP
   sender or the substitutive RTP sender, and rewrite the extended
   highest sequence number field to the corresponding original extended
   highest sequence number before forwarding the RTCP report to the
   main RTP sender or the substitutive RTP sender.

   When the RTCP report spans the splicing in point or the splicing out
   point, it reflects the characteristics of the combination of main
   RTP packets and substitutive RTP packets.  In this case, the mixer
   needs to divide the RTCP report into two separate RTCP reports and
   send them to their original RTP senders respectively.  For each RTCP
   report, the mixer also needs to make the corresponding changes to
   the packet loss fields in the report block, besides the SSRC field
   and the extended highest sequence number field.

   When the mixer receives an RTCP extended report (XR) block, it
   should rewrite the XR report block in a similar way to the reception
   report block in the RTCP report.

   The mixer can also inform the main RTP sender or the substitutive
   RTP sender of the reception quality with which its content reaches
   the mixer during the time when that content is not sent to the RTP
   receiver.  This is done by the mixer generating RTCP reports for the
   main RTP stream and/or the substitutive RTP stream.  These RTCP
   reports use the SSRC of the mixer.  If the substitutive content
   comes from local media file storage, the mixer does not need to
   generate RTCP reports for the substitutive stream.
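
   The report block rewriting described in this subsection can be
   sketched in Python as follows.  This is only an illustration under
   stated assumptions: the names ReportBlock, SeqMap, record and
   rewrite are hypothetical, the mixer is assumed to remember which
   input packet each output packet carried, and only the SSRC and
   extended highest sequence number fields are modelled.  Adjusting
   the packet loss fields when a report is split is deliberately left
   out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ReportBlock:
    ssrc: int             # SSRC of the stream this block reports on
    ext_highest_seq: int  # extended highest sequence number received


class SeqMap:
    """Hypothetical per-session record of output-to-input packet mapping."""

    def __init__(self) -> None:
        # output extended seq -> (input sender SSRC, input extended seq)
        self._map: Dict[int, Tuple[int, int]] = {}

    def record(self, out_ext_seq: int, src_ssrc: int, src_ext_seq: int) -> None:
        self._map[out_ext_seq] = (src_ssrc, src_ext_seq)

    def rewrite(self, block: ReportBlock,
                prev_ext_highest: int) -> List[ReportBlock]:
        """Translate a block about the mixer's stream into per-sender blocks.

        The reporting interval is taken to be the output packets in
        (prev_ext_highest, block.ext_highest_seq].  One block is
        returned per input sender that contributed to that interval,
        so a report spanning a splicing point yields two blocks.
        Output packets with no recorded input (e.g. local file
        content) produce no block, i.e. the report ends at the mixer.
        """
        latest: Dict[int, int] = {}
        for out_seq in range(prev_ext_highest + 1, block.ext_highest_seq + 1):
            if out_seq in self._map:
                src_ssrc, src_seq = self._map[out_seq]
                latest[src_ssrc] = src_seq  # keep the highest seen so far
        return [ReportBlock(ssrc, seq) for ssrc, seq in latest.items()]


if __name__ == "__main__":
    seq_map = SeqMap()
    for i in range(3):                      # main content: outputs 100..102
        seq_map.record(100 + i, 0xAAAA, 500 + i)
    for i in range(2):                      # substitutive: outputs 103..104
        seq_map.record(103 + i, 0xBBBB, 40 + i)
    # A report whose interval crosses the splicing in point is split.
    print(seq_map.rewrite(ReportBlock(0x1111, 104), prev_ext_highest=101))
```

   Each returned block would be forwarded only to the sender whose SSRC
   it names.  A real implementation would additionally have to handle
   sequence number wrap, fragmentation, and the loss fields discussed
   above.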

   Based on the above RTCP operating mechanism, the RTP sender whose
   content is being passed to the receiver will see the reception
   quality of its stream as received by the mixer, and the reception
   quality of the spliced stream as received by the receiver.  The RTP
   sender whose content is not being passed to the receiver will only
   see the reception quality of its stream as received by the mixer.

   The mixer must forward RTCP SDES and BYE packets from the receiver
   to the sender, and may forward them in the inverse direction as
   defined in Section 7.3 of [RFC3550].

   Once the mixer receives an RTP/AVPF [RFC4585] transport layer
   feedback packet, it must handle it carefully, as the feedback packet
   may contain information about content that comes from different RTP
   senders.  In this case the mixer needs to divide the
Either network overload or buffer overflow would induce network congestion and congestion-causedfeedback packetloss. To be robust to network congestioninto two separate feedback packets andpacket loss, mixer must continuously monitor the network situation by means of a variety of manners: 1. RTCP receiver reports indicate packet loss [RFC3550]. 2. RTCP NACKs for lost packet recovery [RFC4585]. 3. RTCP ECN Feedback information [I-D.ietf-avtcore-ecn-for-rtp]. Upon detection of above three types of RTCP reports during splicing, mixer will treat them with three different manners as following: 1. If mixer receives the RTCP receiver reports with packet loss indication, it willprocessthem asthedescription giveninformation insection 7.3 of [RFC3550]. 2. If mixer receivestheRTCP NACK packets definedfeedback control information (FCI) in[RFC4585] from RTP receiver for packet loss recovery, it first identifiesthecontent category of lost packets to whichtwo feedback packets, just as theNACK corresponds. Then, mixer will generate newRTCPNACK for the lost packets with its own SSRC, and make corresponding changes to their sequence numbers to match original, pre-spliced, packets.report process described above. If thelostsubstitutive content comes from local media filestorage,storage (i.e., the mixeractingcan be regarded assubstitutive media source will directly fetchthelostsubstitutivecontent and retransmit it toRTPreceiver. It is somewhat complex that the lost packets requested in a singlesender), any RTCPNACK message not only contain the main content but alsopackets received from downstream relate to the substitutivecontent. To address this, mixercontent mustdividebe terminated on theRTCP NACK packet into two separate RTCP NACK packets: onemixer without any further processing. 4.3. Media Clipping Considerations This section provides informative guideline about how media clipping is shaped and how the mixer deal with the media clipping. 
If the time slot for substitutive content mismatches (is shorter or longer than) the duration of the main content to be replaced, then media clipping may occur at the splicing point.

If the substitutive content has a shorter duration than the main content, then there will be a gap in the output RTP stream. The RTP sequence number will be contiguous across this gap, but there will be an unexpected jump in the RTP timestamp. This gap will cause the receiver to have nothing to play. This is unavoidable, unless the mixer adjusts the splice in or splice out point to compensate, sending more of the main RTP stream in place of the shorter substitutive stream, or unless the mixer can vary the length of the substitutive content. It is the responsibility of the higher layer protocols to ensure that the substitutive content is of the same duration as the main content to be replaced.

If the insertion duration is longer than the reserved gap duration, there will be an overlap between the substitutive RTP stream and the main RTP stream at the splicing out point. One straightforward approach is that the mixer takes an ungraceful action, terminating the splicing and switching back to the main RTP stream even if this may cause media stuttering on the receiver. Alternatively, the splicer may transcode the substitutive content to play at a faster rate than normal, to adjust it to the length of the gap in the main content, and generate a new RTP stream for the transcoded content. This is a complex operation, and very specific to the content and media codec used.

4.4. Congestion Control Considerations

If the substitutive content has somewhat different characteristics from the main content it replaces, or if the substitutive content is encoded with a different codec or has a different encoding bitrate, it might overload the network and might cause network congestion on the path between the mixer and the RTP receiver(s) that would not have been caused by the main content.
To be robust to network congestion and packet loss, a mixer that is performing splicing must continuously monitor the status of the downstream network by monitoring any of the following RTCP reports that are used:

1. RTCP receiver reports indicating packet loss [RFC3550].

2. RTCP NACKs for lost packet recovery [RFC4585].

3. RTCP ECN feedback information [I-D.ietf-avtcore-ecn-for-rtp].

Once the mixer detects congestion on its downstream link, it will treat these reports as follows:

1. If the mixer receives RTCP receiver reports with a packet loss indication, it will forward the reports to the substitutive RTP sender or the main RTP sender as described in section 4.2.

2. If the mixer receives RTCP NACK packets defined in [RFC4585] from the RTP receiver for packet loss recovery, it first identifies the content category of the lost packets to which the NACK corresponds. Then, the mixer will generate a new RTCP NACK for the lost packets with its own SSRC, and make corresponding changes to their sequence numbers to match the original, pre-spliced, packets. If the lost substitutive content comes from local media file storage, the mixer acting as substitutive RTP sender will directly fetch the lost substitutive content and retransmit it to the RTP receiver. The mixer may buffer the sent RTP packets and do the retransmission. A complication is that the lost packets requested in a single RTCP NACK message may contain not only main content but also substitutive content. To address this, the mixer must divide the RTCP NACK packet into two separate RTCP NACK packets: one requesting the lost main content, and another requesting the lost substitutive content.
3. In [I-D.ietf-avtcore-ecn-for-rtp], two RTCP extensions are defined for ECN feedback: the RTP/AVPF transport layer ECN feedback packet for urgent ECN information, and the RTCP XR ECN summary report block for regular reporting of the ECN marking information. If an ECN-aware mixer receives either kind of RTCP ECN feedback (RTCP ECN feedback packets or RTCP XR summary reports) from the RTP receiver, it must process it in a similar way to the RTP/AVPF feedback packet or RTCP XR process described in section 4.2 of this memo.

These three methods require the mixer to run a congestion control loop and bitrate adaptation between itself and the RTP receiver. The mixer can thin or transcode the main RTP stream or the substitutive RTP stream, but such operations are very inefficient and difficult, and bring undesirable delay. Fortunately, as described in this memo, the mixer acting as splicer can rewrite the RTCP packets sent from the RTP receiver and forward them to the RTP sender, letting the RTP sender know that congestion is being experienced on the path between the mixer and the RTP receiver. Then, the RTP sender applies its congestion control algorithm and reduces the media bitrate to a value that is in compliance with congestion control principles for the slowest link. The congestion control algorithm may be a TCP-friendly bitrate adaptation algorithm specified in [RFC5348], or a DCCP congestion control algorithm defined in [RFC5762].

If the substitutive content comes from local media file storage, the mixer must directly reduce the bitrate as if it were the substitutive RTP sender.
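The division of a single RTCP NACK into one request per original sender, described in item 2 of the list above, can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it handles only the generic NACK FCI of [RFC4585] (a 16-bit packet ID plus a 16-bit bitmask of following lost packets), assumes the mixer keeps a lookup table from output sequence number to the source and original sequence number, and ignores sequence number wraparound when re-encoding. The function names and the mapping structure are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup kept by the mixer:
# output sequence number -> (source label, original sequence number)
SeqMap = Dict[int, Tuple[str, int]]


def decode_nack(pid: int, blp: int) -> List[int]:
    """Expand one generic NACK FCI entry (PID, BLP) into lost sequence numbers."""
    lost = [pid]
    for i in range(16):
        if blp & (1 << i):
            lost.append((pid + i + 1) & 0xFFFF)
    return lost


def encode_nack(seqs: List[int]) -> List[Tuple[int, int]]:
    """Pack sequence numbers into (PID, BLP) pairs; wraparound is not handled."""
    pairs: List[Tuple[int, int]] = []
    remaining = sorted(set(seqs))
    while remaining:
        pid, blp, rest = remaining[0], 0, []
        for s in remaining[1:]:
            d = s - pid
            if 1 <= d <= 16:
                blp |= 1 << (d - 1)   # bit d-1 marks packet pid + d as lost
            else:
                rest.append(s)        # too far from pid, needs its own entry
        pairs.append((pid, blp))
        remaining = rest
    return pairs


def split_nack(mapping: SeqMap, pid: int, blp: int) -> Dict[str, List[Tuple[int, int]]]:
    """Divide one NACK entry into per-source entries in original numbering."""
    per_source: Dict[str, List[int]] = {}
    for out_seq in decode_nack(pid, blp):
        entry = mapping.get(out_seq)
        if entry is None:
            continue  # the mixer has no record of this output packet
        source, original_seq = entry
        per_source.setdefault(source, []).append(original_seq)
    return {src: encode_nack(seqs) for src, seqs in per_source.items()}
```

Each per-source result would be carried in a separate RTCP NACK packet that uses the mixer's own SSRC as packet sender, or served locally by the mixer when the substitutive content comes from local media file storage.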
From the above analysis, to reduce the risk of congestion and keep the bandwidth consumption stable over time, it is recommended that the substitutive RTP stream be encoded at an appropriate bitrate to match that of the main RTP stream. If the substitutive RTP stream comes from the substitutive RTP sender, this sender should have some knowledge about the media encoding bitrate of the main content in advance. How it obtains that knowledge is out of scope for this draft.

4.5. Processing Splicing in User Invisibility Case

If it is desirable to prevent receivers from detecting that splicing has occurred at the RTP layer, the mixer must not include a CSRC list in outgoing RTP packets, and must not forward RTCP from the main RTP sender or from the substitutive RTP sender. Due to the absence of a CSRC list in the output RTP stream, the RTP receiver only initiates SDES, BYE and APP packets to the mixer without any knowledge of the main RTP sender and the substitutive RTP sender.

The CSRC list identifies the contributing sources; the SSRC identifiers of these contributing sources are kept globally unique for each RTP session. The uniqueness of the SSRC identifier is used to resolve collisions and to detect RTP-level forwarding loops as defined in section 8.2 of [RFC3550].
The absence of a CSRC list in this case creates a danger that loops involving those contributing sources could not be detected. So non-RTP means must be used to detect and resolve loops if the splicer does not add a CSRC list.

5. Implementation Considerations

When the mixer is used to handle RTP splicing, the RTP receiver does not need any RTP/RTCP extension for splicing. As a trade-off, additional overhead could be induced on the mixer, which uses its own sequence number space and timing model. So the mixer will rewrite the RTP sequence number and timestamp whether splicing is active or not, and generate RTCP flows for both sides. In case the mixer serves multiple main RTP streams simultaneously, this may lead to more overhead on the mixer.

If the User Invisibility Requirement applies, the CSRC list is not included in outgoing RTP packets. This brings a potential issue with loop detection, as briefly described in section 4.5.

6. Security Considerations

The splicing application is subject to the general security considerations of the RTP specification [RFC3550].

The mixer acting as splicer replaces some content with other content in RTP packets, thus breaking the end-to-end security, such as integrity protection and source authentication. Its behavior looks like a middleman attack. SRTP [RFC3711] can be used to authenticate the mixer, and to provide integrity protection on the path between the mixer and the receivers, but the receiver cannot (and is not supposed to be able to) determine what content comes from the main RTP sender and what comes from the substitutive RTP sender by looking at the RTP layer.

The RTP receiver does not communicate directly with the main RTP sender or the substitutive RTP sender, and does not have an end-to-end security relationship with them at the RTP layer. The nature of this RTP service offered by a network operator employing a content splicer is that the RTP layer security relationship is between the receiver and the mixer, and between the senders and the mixer, and not end-to-end. The network operator must delegate authority to the mixer in exchange for the ability to perform RTP splicing inside the network.
If encryption is employed, the mixer must be able to decrypt the inbound RTP packets and re-encrypt the outbound RTP packets. If any payload internal security mechanisms (e.g., ISMACryp [ISMACryp]) are used, only the RTP sender and the RTP receiver can learn the security keying material generated by such an internal security mechanism, in which case any middlebox (e.g., mixer) between the RTP sender and the RTP receiver cannot get such keying material, and thus fails to perform splicing.

7. IANA Considerations

No IANA actions are required.

8. Acknowledgments

The following individuals have reviewed the earlier versions of this specification and provided very valuable comments: Colin Perkins, Magnus Westerlund, Roni Even, Tom Van Caenegem, Joerg Ott, David R Oran, Cullen Jennings, Ali C Begen, Charles Eckel and Ning Zong.

9. Appendix - Why Mixer Is Chosen

Translator and mixer can both realize splicing by changing a set of RTP parameters.

A translator has no SSRC, hence it is transparent to the RTP sender and receiver. Therefore, the RTP sender sees the full path to the receiver when the translator is passing its content. When the translator inserts the substitutive content, the RTP sender could get a report on the path up to the translator itself. Additionally, if user detectability is not required, the translator does not need to rewrite RTP headers, so the overhead on the translator can be avoided.

If a mixer is used to do splicing, it can also allow the RTP sender to learn the situation of its content on the receiver or on the mixer just like a translator does, which is specified in section 4.2. Compared to a translator, the mixer's outstanding benefit is that it is pretty straightforward to do bit-rate adaptation to handle varying network conditions. A translator needs more considerations and its implementation is more complex.

From the above analysis, both translator and mixer have their own advantages: less overhead or less complexity on handling RTCP. After long and detailed discussion, the avtext WG members preferred less complexity over less overhead and were inclined to use a mixer to do splicing.

If one chooses a mixer as splicer, the overhead on the mixer must be taken into account. If one chooses a translator as splicer, the complex RTCP processing on the translator must be taken into account.

10. References

10.1. Normative References

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.

[I-D.ietf-avtcore-ecn-for-rtp] Westerlund, M., "Explicit Congestion Notification (ECN) for RTP over UDP", draft-ietf-avtcore-ecn-for-rtp-08 (work in progress), May 2012.

10.2. Informative References

[RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, September 2008.

[RFC5762] Perkins, C., "RTP and the Datagram Congestion Control Protocol (DCCP)", RFC 5762, April 2010.

[SCTE30] Society of Cable Telecommunications Engineers (SCTE), "Digital Program Insertion Splicing API", 2009.

[SCTE35] Society of Cable Telecommunications Engineers (SCTE), "Digital Program Insertion Cueing Message for Cable", 2011.

[ISMACryp] Internet Streaming Media Alliance (ISMA), "ISMA Encryption and Authentication Specification 2.0", November 2007.

Author's Address

Jinwei Xia
Huawei
Software No.101
Nanjing, Yuhuatai District 210012
China

Phone: +86-025-86622310
Email: xiajinwei@huawei.com