STRAW Working Group L. Miniero
Internet-Draft Meetecho
Intended status: Standards Track S. Garcia Murillo
Expires: April 27, 2015 Medooze
V. Pascual
October 24, 2014

Guidelines to support RTCP end-to-end in Back-to-Back User Agents (B2BUAs)


Abstract

SIP Back-to-Back User Agents (B2BUAs) are often envisaged to also be on the media path, rather than just intercepting signalling. This means that B2BUAs often implement an RTP/RTCP stack as well, whether to act as media transcoders or to just pass the media through, thus leading to separate multimedia sessions that the B2BUA correlates and bridges together. If not disciplined, though, this behaviour can severely impact the communication experience, especially when statistics and feedback information contained in RTCP packets get lost because of mismatches in the reported data.

This document defines the proper behaviour B2BUAs should follow when also acting on the signalling/media plane in order to preserve the end-to-end functionality of RTCP.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 27, 2015.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

Session Initiation Protocol [RFC3261] Back-to-Back User Agents (B2BUAs) are SIP entities that can act as a logical combination of both a User Agent Server (UAS) and a User Agent Client (UAC). As such, their behaviour is not always completely adherent to the standards, and can lead to unexpected situations the IETF is trying to address. [RFC7092] presents a taxonomy of the most deployed B2BUA implementations, describing how they differ in terms of the functionality and features they provide.

Such components often do not only act on the signalling plane, that is intercepting and possibly modifying SIP messages, but also on the media plane. This means that, when on the signalling path between two or more participants willing to communicate, such components also manipulate the session description [RFC4566] in order to have all RTP and RTCP [RFC3550] pass through it as well within the context of an SDP offer/answer [RFC3264]. The reasons for such a behaviour can be different: the B2BUA may want, for instance, to provide transcoding functionality for participants with incompatible codecs, or it may need the traffic to be directly handled for different reasons like billing, lawful interception, session recording and so on. This can lead to several different topologies for RTP-based communication, as documented in [RFC5117]. These topologies are currently being updated to address new commonly encountered scenarios as well [I-D.ietf-avtcore-rtp-topologies-update].

Whatever the reason, such a behaviour does not come without a cost. In fact, whenever a media-aware component is placed on the path between two or more participants that want to communicate by means of RTP/RTCP, the end-to-end nature of such protocols is broken, and their effectiveness may be affected as a consequence. While this may not be a problem for RTP packets, which from a protocol point of view just contain opaque media packets and as such can be quite easily relayed, it definitely can cause serious issues for RTCP packets, which carry important information and feedback on the communication quality the participants are experiencing. In fact, RTCP packets make use of specific ways to address the media they are referring to. Consider, for instance, the simple scenario only involving two participants and a single RTP session depicted in Figure 1:

+--------+              +---------+              +---------+
|        |=== SSRC1 ===>|         |=== SSRC3 ===>|         |
| Alice  |              |  B2BUA  |              |   Bob   |
|        |<=== SSRC2 ===|         |<=== SSRC4 ===|         |
+--------+              +---------+              +---------+

Figure 1: B2BUA modifying RTP headers

In this common scenario, a participant (Alice) is communicating with another participant (Bob) as a result of a signalling session managed by a B2BUA: this B2BUA is also on the media path between the two, and is acting as a media relay. This means that two separate RTP sessions are involved (one per side), each carrying two RTP streams (one per media direction). As part of this process, though, the B2BUA is also rewriting some of the RTP header information on the way, for instance because that's how its RTP relaying stack works: in this example, just the SSRC of the incoming RTP audio streams is changed, but more information may be changed as well (e.g., sequence numbers, timestamps, etc.). In particular, whenever Alice sends an audio RTP packet, she sets her SSRC (SSRC1) in the RTP header of her RTP source stream; the B2BUA rewrites the SSRC (SSRC3) before relaying the packet to Bob. At the same time, RTP packets sent by Bob (SSRC4) get their SSRC rewritten as well (SSRC2) before being relayed to Alice.

Assume now that Alice needs to inform Bob that she has lost several audio packets in the last few seconds, maybe because of network congestion. To do so, she would place the SSRC of the received RTP stream she is aware of (SSRC2), together with her own (SSRC1), in RTCP Reports and/or NACKs, hoping for a retransmission or for Bob to slow down. Since the B2BUA is making use of different SSRCs for the RTP streams in the RTP session it established with each participant, a blind relaying of the RTCP packets to Bob would in this case result, from Bob's perspective, in unknown SSRCs being addressed, thus resulting in the precious information being dropped. In fact, Bob is only aware of SSRC4 (the one his source RTP stream uses) and SSRC3 (the one he's receiving from the B2BUA in the received RTP stream), and knows nothing about SSRC1 and SSRC2 in the RTCP packets he would receive instead. As a consequence of the feedback being dropped, Bob, unaware of the issue, may continue to flood Alice with even more media packets and/or fail to retransmit the packets she missed, which may easily lead to a very bad communication experience, if not eventually to an unwanted termination of the communication itself.
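As a purely illustrative, non-normative sketch, the scenario of Figure 1 can be modelled in a few lines of Python. The numeric SSRC values, the dictionary layout of the report and the helper names (bob_accepts, translate) are all assumptions made for this example and do not come from any specification. The sketch only shows that a blindly relayed report addresses SSRCs Bob does not know, while a report translated with the B2BUA's own mapping does not.

```python
# Hypothetical SSRC values for the four RTP streams of Figure 1.
SSRC1, SSRC2, SSRC3, SSRC4 = 0x11111111, 0x22222222, 0x33333333, 0x44444444

# Bob only knows his own source stream (SSRC4) and the one he receives (SSRC3).
bob_known = {SSRC3, SSRC4}

# Simplified view of Alice's report: she (SSRC1) reports on the stream she receives (SSRC2).
alice_report = {"sender_ssrc": SSRC1, "source_ssrc": SSRC2, "fraction_lost": 64}

def bob_accepts(report):
    """Bob can only use a report whose SSRCs he recognises."""
    return report["sender_ssrc"] in bob_known and report["source_ssrc"] in bob_known

# Mapping the B2BUA would apply to RTCP travelling from Alice to Bob.
alice_to_bob = {SSRC1: SSRC3, SSRC2: SSRC4}

def translate(report, mapping):
    """Return a copy of the report with both SSRCs mapped to the other side."""
    out = dict(report)
    out["sender_ssrc"] = mapping[report["sender_ssrc"]]
    out["source_ssrc"] = mapping[report["source_ssrc"]]
    return out

translated = translate(alice_report, alice_to_bob)
```

Here bob_accepts(alice_report) is false, whereas bob_accepts(translated) is true, which is the whole point of the guidelines that follow.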

This is just a trivial example that, together with additional scenarios, will be addressed in the following sections. Nevertheless, it is a valid example of how such a trivial mishandling of precious information may lead to serious consequences, especially considering that more complex scenarios may involve several participants at the same time, multiple RTP sessions (e.g., a video stream alongside audio) rather than a single one, redundancy RTP streams, SSRC multiplexing and so on. Considering how common B2BUA deployments are, it is very important for them to properly address such feedback, in order to be sure that their activities on the media plane do not break anything they're not supposed to.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

In addition, this document uses, where relevant, the RTP-related terminology defined in [I-D.ietf-avtext-rtp-grouping-taxonomy].

3. Signalling/Media Plane B2BUAs

As anticipated in the introductory section, it's very common for B2BUA deployments to also act on the media plane, rather than just signalling alone. In particular, [RFC7092] describes three different categories of such B2BUAs, according to the level of activities performed on the media plane: a B2BUA, in fact, may act as a simple media relay (1), effectively unaware of anything that is transported; it may be a media-aware relay (2), also inspecting and/or modifying RTP and RTCP packets as they flow by; or it may be a full-fledged media termination entity (3), terminating and generating RTP and RTCP packets as needed.

While [RFC3550] and [RFC5117] already mandate some specific behaviours when specific topologies are deployed, not all deployments strictly adhere to the specifications and as such it's not rare to encounter issues that may be avoided with a more disciplined behaviour in that regard. For this reason, the following subsections will describe the proper behaviour B2BUAs, whichever of the above categories they fall into, should follow in order to avoid, or at least minimize, any impact on end-to-end RTCP effectiveness.

3.1. Media Relay

A media relay as identified in [RFC7092] basically just forwards, from an application level point of view, all RTP and RTCP packets it receives, without either inspecting or modifying them. Using the RTP Topologies terminology, this can be seen as an RTP Transport Translator. As such, B2BUAs acting as media relays are not aware of what traffic they're handling, meaning that not only the packet payloads are opaque to them, but headers as well. Many Session Border Controllers (SBCs) implement this kind of behaviour, e.g., when acting as a bridge between an inner and outer network.

Considering all headers and identifiers in both RTP and RTCP are left untouched, issues like the SSRC mismatch described in the previous section would not occur. Similar problems could occur, though, should the session description end up providing incorrect information about the media flowing (e.g., if the SDP on either side contains 'ssrc' [RFC5576] attributes that don't match the actual SSRC being advertised on the media plane) or about the supported RTCP mechanisms (e.g., in case the B2BUA advertised support for NACK because it implements it, but the original INVITE didn't). Such an issue might occur, for instance, in case the B2BUA acting as a media relay is generating a new session description when bridging an incoming call, rather than taking into account the original session description in the first place. This may cause the participants to find a mismatch between the SSRCs advertised in SDP and the ones actually observed in RTP and RTCP packets (which may indeed change during a multimedia session anyway, but having them synced during setup would help nonetheless), or to either ignore or generate RTCP feedback packets that were not explicitly advertised as supported.

In order to prevent such an issue, a media-relay B2BUA SHOULD forward all the SSRC- and RTCP-related SDP attributes when handling a multimedia session setup between interested participants: this includes attributes like 'ssrc' [RFC5576], 'rtcp-fb' [RFC4585], 'rtcp-xr' [RFC3611] and others. It SHOULD NOT, though, blindly forward all SDP attributes, as some of them (e.g., candidates, fingerprints, crypto, etc.) may lead to call failures for different reasons that are out of scope for this document. One notable example is the 'rtcp' [RFC3605] attribute that a UAC may make use of to explicitly state the port it is willing to use for RTCP: considering the B2BUA would relay RTCP packets, the port as seen by the other UAC involved in the communication would differ from the one negotiated originally, and as such it MUST be rewritten accordingly.
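The following non-normative Python sketch illustrates one possible way to apply the above rule to the attribute lines of a media description. The function name relay_attributes, the relay_rtcp_port parameter and the contents of the NOT_FORWARDED set are assumptions made for illustration only: the set simply mirrors the examples given in the text, and a real B2BUA would typically replace those attributes with its own values rather than just drop them.

```python
# Attributes that are not blindly forwarded (taken from the examples in the text).
NOT_FORWARDED = {"candidate", "fingerprint", "crypto"}

def relay_attributes(attr_lines, relay_rtcp_port):
    """Filter the 'a=' lines of one media description for a media-relay B2BUA.

    'ssrc', 'rtcp-fb', 'rtcp-xr' and other unlisted attributes are forwarded
    as they are; 'rtcp' is rewritten with the port the relay listens on.
    """
    out = []
    for line in attr_lines:
        # "a=name:value" or "a=name" -> "name"
        name = line[2:].split(":", 1)[0]
        if name == "rtcp":
            out.append(f"a=rtcp:{relay_rtcp_port}")
        elif name not in NOT_FORWARDED:
            out.append(line)
    return out

offer = [
    "a=rtpmap:0 PCMU/8000",
    "a=ssrc:1234 cname:alice@example.com",
    "a=rtcp-fb:* nack",
    "a=rtcp:5005",
    "a=candidate:1 1 UDP 2130706431 192.0.2.1 5004 typ host",
    "a=fingerprint:sha-256 AA:BB",
]
relayed = relay_attributes(offer, 40001)
```

In this example the 'ssrc' and 'rtcp-fb' lines survive untouched, the 'rtcp' line now advertises port 40001, and the candidate and fingerprint lines are not forwarded.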

Besides, it is worth mentioning that, leaving RTCP packets untouched, a media relay may also let through information that, according to policies, may be best left hidden or masqueraded, e.g., domain names in CNAME items. Nevertheless, that information cannot break the end-to-end RTCP behaviour.

3.2. Media-aware Relay

A Media-aware relay, unlike the Media Relay addressed in the previous section, is actually aware of the media traffic it is handling. As such, it is able to inspect RTP and RTCP packets flowing by, and may even be able to modify the headers in any of them before forwarding them. Using the RFC3550 terminology, this can be seen as an RTP Translator. A B2BUA implementing this role would typically not, though, inspect the RTP payloads as well, which would be opaque to it: this means that the actual media would not be manipulated (e.g., transcoded).

This makes them quite different from the Media Relay previously discussed, especially in terms of the potential issues that may occur at the RTCP level. In fact, being able to modify the RTP and RTCP headers, such B2BUAs may end up modifying RTP-related information like SSRC (and hence CSRC lists, that must of course be updated accordingly), sequence numbers, timestamps and the like in an RTP stream, before forwarding the modified packets to the other interested participants in the multimedia sessions on the RTP streams they're using to receive the media. This means that, if not properly disciplined, such a behaviour may easily lead to issues like the one described in the introductory section. As such, it is very important for a B2BUA modifying RTP-related information across two related RTP streams to also modify the same information in RTCP packets as well, and in a coherent way, so as not to confuse any of the participants involved in a communication.

It is worthwhile to point out that such a B2BUA would not necessarily forward all the packets it is receiving, though: Selective Forwarding Units (SFU) [I-D.ietf-avtcore-rtp-topologies-update], for instance, could aggregate or drop incoming RTCP messages, while at the same time originating new ones on their own. For the messages that are forwarded and/or aggregated, though, it's important to make sure the information is coherent.

Besides the behaviour already mandated for RTCP translators in Section 7.2 of [RFC3550], a media-aware B2BUA MUST also handle the incoming RTCP messages it forwards according to the following guidelines:

SR [RFC3550]:
If the B2BUA has changed any SSRC in any direction, it MUST update the SSRC-related information in the incoming SR packet before forwarding it. This includes the sender SSRC, which MUST be rewritten with the one the B2BUA uses in the RTP stream used to receive RTP packets from each participant, and the SSRC information in all the blocks, which MUST be rewritten using the related sender participant(s) SSRC. If the B2BUA has also changed the base RTP sequence number when forwarding RTP packets, then this change needs to be properly addressed in the 'extended highest sequence number received' field in the Report Blocks.
RR [RFC3550]:
The same guidelines given for SR apply to RR as well.
SDES [RFC3550]:
If the B2BUA has changed any SSRC in any direction, it MUST update the SSRC-related information in all the chunks in the incoming SDES packet before forwarding it.
BYE [RFC3550]:
If the B2BUA has changed any SSRC in any direction, it MUST update the SSRC in the BYE message.
APP [RFC3550]:
If the B2BUA has changed any SSRC in any direction, it MUST update the SSRC in the APP message. Should the B2BUA be aware of any specific APP message format that contains additional information related to SSRCs, it SHOULD update them as well.
Extended Reports (XR) [RFC3611]:
If the B2BUA has changed any SSRC in any direction, it MUST update the SSRC-related information in the incoming XR message header before forwarding it. This includes the source SSRC, which MUST be rewritten with the one the B2BUA uses to send RTP packets to each sender participant, and the SSRC information in all the block types that include it, which MUST be rewritten using the related sender participant(s) SSRC. If the B2BUA has also changed the base RTP sequence number when forwarding RTP packets, then this change needs to be properly addressed in the 'begin_seq' and 'end_seq' fields that are available in most of the Report Block types that are part of the XR specification.
Receiver Summary Information (RSI) [RFC5760]:
If the B2BUA has changed any SSRC in any direction, it MUST update the SSRC-related information in the incoming RSI message header before forwarding it. This includes the distribution source SSRC, which MUST be rewritten with the one the B2BUA uses to send RTP packets to each sender participant, the summarized SSRC and, in case a Collision Sub-Report Block is available, the SSRCs in the related list.
Port Mapping (TOKEN) [RFC6284]:
If the B2BUA has changed any SSRC in any direction, it MUST update the SSRC-related information in the incoming TOKEN message before forwarding it. This includes the Packet Sender SSRC, which MUST be rewritten with the one the B2BUA uses to send RTP packets to each sender participant, and the Requesting Client SSRC in case the message is a response, which MUST be rewritten using the related sender participant(s) SSRC.
Feedback messages [RFC4585]:
All Feedback messages have a common packet format, which includes the SSRC of the packet sender and the one of the media source the feedback is related to. Just as described for the previous messages, these SSRC identifiers MUST be updated if the B2BUA has changed any SSRC in any direction. It MUST NOT, though, change a media source SSRC that was originally set to zero. Besides, considering that many feedback messages also include additional data as part of their specific Feedback Control Information (FCI), a media-aware B2BUA MUST take care of them accordingly, if it can parse and regenerate them, according to the following guidelines.
NACK [RFC4585]:
Besides the common packet format management for feedback messages, a media-aware B2BUA MUST also properly rewrite the Packet ID (PID) of all addressed lost packets in the NACK FCI if it changed the RTP sequence numbers before forwarding a packet.
TMMBR/TMMBN/FIR/TSTR/TSTN/VBCM [RFC5104]:
Besides the common packet format management for feedback messages, a media-aware B2BUA MUST also properly rewrite the additional SSRC identifier all those messages envisage as part of their specific FCI if it changed the related RTP SSRC of the media sender.
REMB [I-D.alvestrand-rmcat-remb]:
Besides the common packet format management for feedback messages, a media-aware B2BUA MUST also properly rewrite the additional SSRC identifier(s) REMB packets envisage as part of their specific FCI if it changed the related RTP SSRC of the media sender.
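As a non-normative illustration of the SR and RR guideline above, the following Python sketch rewrites the sender SSRC, the SSRC of each Report Block and the 'extended highest sequence number received' field of a single (non-compound) SR or RR packet laid out as in [RFC3550]. The function name rewrite_report, the ssrc_map dictionary and the seq_offset parameter are assumptions made for this example. In particular, the sketch assumes the B2BUA shifted RTP sequence numbers by a constant offset, and it does not try to handle every corner case of sequence number wrap-around.

```python
import struct

RTCP_SR, RTCP_RR = 200, 201

def rewrite_report(packet, ssrc_map, seq_offset=0):
    """Return a copy of one SR/RR packet with SSRCs and sequence numbers mapped.

    ssrc_map:   SSRC as seen on the incoming side -> SSRC used on the outgoing side.
    seq_offset: constant the B2BUA added to the RTP sequence numbers of the
                reported stream; it is subtracted again from the report.
    """
    first, ptype, _length = struct.unpack_from("!BBH", packet, 0)
    if ptype not in (RTCP_SR, RTCP_RR):
        raise ValueError("not an SR or RR packet")
    out = bytearray(packet)
    (sender,) = struct.unpack_from("!I", out, 4)
    struct.pack_into("!I", out, 4, ssrc_map.get(sender, sender))
    # Report Blocks start after the 20-byte sender info in an SR.
    offset = 8 + (20 if ptype == RTCP_SR else 0)
    for _ in range(first & 0x1F):  # reception report count
        (source,) = struct.unpack_from("!I", out, offset)
        struct.pack_into("!I", out, offset, ssrc_map.get(source, source))
        (ehsn,) = struct.unpack_from("!I", out, offset + 8)
        struct.pack_into("!I", out, offset + 8, (ehsn - seq_offset) & 0xFFFFFFFF)
        offset += 24  # each Report Block is 24 bytes long
    return bytes(out)

# Example: an RR from SSRC 0x1111 with one block about SSRC 0x2222.
rr = struct.pack("!BBHI", 0x81, RTCP_RR, 7, 0x1111)
rr += struct.pack("!IIIIII", 0x2222, 0x10000005, 1100, 0, 0, 0)
rewritten = rewrite_report(rr, {0x1111: 0x3333, 0x2222: 0x4444}, seq_offset=100)
```

The remaining fields of each Report Block (loss counters, jitter, LSR and DLSR) are left untouched by this sketch.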

Apart from the generic guidelines related to Feedback messages, no additional modifications are needed for PLI, SLI and RPSI feedback messages.
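The handling of the common feedback packet format and of the NACK FCI can be sketched, again in a non-normative way, as follows. The function name rewrite_feedback and its parameters are hypothetical; the sketch assumes a single feedback packet without padding, laid out as in [RFC4585], and a constant sequence number offset applied by the B2BUA. Since the bitmask that follows each PID in a Generic NACK is relative to that PID, only the PID itself needs to be shifted under this assumption.

```python
import struct

RTPFB, PSFB = 205, 206   # transport layer and payload-specific feedback
FMT_GENERIC_NACK = 1

def rewrite_feedback(packet, ssrc_map, seq_offset=0):
    """Return a copy of one feedback packet with SSRCs (and NACK PIDs) mapped."""
    first, ptype, _length = struct.unpack_from("!BBH", packet, 0)
    if ptype not in (RTPFB, PSFB):
        raise ValueError("not a feedback packet")
    out = bytearray(packet)
    sender, media = struct.unpack_from("!II", out, 4)
    struct.pack_into("!I", out, 4, ssrc_map.get(sender, sender))
    if media != 0:
        # A media source SSRC originally set to zero is never changed.
        struct.pack_into("!I", out, 8, ssrc_map.get(media, media))
    if ptype == RTPFB and (first & 0x1F) == FMT_GENERIC_NACK:
        # Each FCI entry is a 16-bit PID followed by a 16-bit bitmask (BLP).
        for off in range(12, len(out), 4):
            (pid,) = struct.unpack_from("!H", out, off)
            struct.pack_into("!H", out, off, (pid - seq_offset) & 0xFFFF)
    return bytes(out)

ssrc_map = {0x1111: 0x3333, 0x2222: 0x4444}

# A Generic NACK for packet 1105 and the two packets following it.
nack = struct.pack("!BBHIIHH", 0x81, RTPFB, 3, 0x1111, 0x2222, 1105, 0b11)
nack_out = rewrite_feedback(nack, ssrc_map, seq_offset=100)

# A payload-specific message whose media source SSRC is zero.
psfb = struct.pack("!BBHII", 0x8F, PSFB, 2, 0x1111, 0)
psfb_out = rewrite_feedback(psfb, ssrc_map)
```

Message-specific SSRCs carried inside the FCI of other feedback types are deliberately not covered by this sketch.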

Of course, the same considerations about the need for SDP and RTP/RTCP information to be coherent also apply to media-aware B2BUAs. This means that, if a B2BUA is going to change any SSRC, it SHOULD update the related 'ssrc' attributes if they were present in the original description before sending it to the recipient, just as it MUST rewrite the 'rtcp' attribute if provided. At the same time, the ability for a media-aware B2BUA to inspect/modify RTCP packets may also mean such a B2BUA may choose to drop RTCP packets it can't parse: in that case, a media-aware B2BUA SHOULD also advertise its RTCP level of support in the SDP in a coherent way, in order to prevent, for instance, a UAC from making use of NACK messages that would never reach the intended recipients.
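A minimal, non-normative sketch of the 'ssrc' attribute update could look as follows. The function name rewrite_ssrc_attributes and the ssrc_map dictionary are assumptions made for this example, which only covers 'a=ssrc' lines [RFC5576] and leaves every other line, including 'ssrc-group', untouched.

```python
import re

SSRC_LINE = re.compile(r"^a=ssrc:(\d+)(.*)$")

def rewrite_ssrc_attributes(sdp_lines, ssrc_map):
    """Replace the SSRC in every 'a=ssrc' line according to ssrc_map."""
    out = []
    for line in sdp_lines:
        match = SSRC_LINE.match(line)
        if match:
            ssrc = int(match.group(1))
            line = f"a=ssrc:{ssrc_map.get(ssrc, ssrc)}{match.group(2)}"
        out.append(line)
    return out

original = ["a=rtpmap:0 PCMU/8000", "a=ssrc:4369 cname:bob@example.com"]
updated = rewrite_ssrc_attributes(original, {4369: 8738})
```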

3.3. Media Terminator

A Media Terminator B2BUA, unlike simple relays and media-aware ones, is also able to terminate media itself, that is taking care of RTP payloads as well and not only headers. This means that such components, for instance, can act as media transcoders and/or originate specific RTP media. Using the RTP Topologies terminology, this can be seen as an RTP Media Translator. Such a capability makes them quite different from the previously introduced B2BUA types, as this means they are going to terminate RTCP as well: in fact, since the media is terminated by themselves, the related statistics and feedback functionality can be taken care of directly by the B2BUA, and does not need to be relayed to the other participants in the multimedia session.

For this reason, no specific guideline is needed to ensure a proper end-to-end RTCP behaviour in such scenarios, mostly because most of the time there would be no end-to-end RTCP interaction among the involved participants at all, as the B2BUA would terminate them all and take care of them accordingly. Nevertheless, should any RTCP packet actually need to be forwarded to another participant in the multimedia session, the same guidelines provided for the media-aware B2BUA case apply.

4. Media Path Security

The discussion made in the previous sections on the management of RTCP messages by a B2BUA has so far mostly worked under the assumption that the B2BUA actually has access to the RTP/RTCP information itself. This is indeed true if we assume that plain RTP and RTCP are being handled, but this may not be true once any security is enforced on RTP packets and RTCP messages by means of SRTP [RFC3711], whether the keying is done using Security Descriptions [RFC4568] or DTLS-SRTP [RFC5764].

While typically not an issue in the Media Relay case, where RTP and RTCP packets are forwarded without any modification no matter whether security is involved or not, this could definitely have an impact on Media-aware Relays and Media Terminator B2BUAs. To make a simple example, if we think of an SRTP/SRTCP session across a B2BUA where the B2BUA itself has no access to the keys used to secure the session, there would be no way to manipulate SRTP headers without invalidating the packet's authentication tag; at the same time, there would be no way to rewrite the RTCP information accordingly either, as most of the packet (especially when RTCP compound packets are involved) would be encrypted.

For this reason, it is important to point out that the operations described in the previous sections are only possible if the B2BUA has a way to effectively manipulate the packets and messages flowing by. This means that, in case media security is involved, the B2BUA willing to act as either a Media-aware Relay or a Media Terminator can only do so when acting as an intermediary with respect to the secure sessions. As such, different secure sessions would need to be negotiated (either via Security Descriptions or DTLS-SRTP) with the involved participants, in order to be able to have access to the unencrypted packets and, if needed, modify them before encrypting them again and forwarding them.

Of course, this is only a viable solution if all the involved participants trust the B2BUA. In fact, it is very important to point out that a B2BUA acting as an intermediary would break any end-to-end security mechanism that may be in place, as all the involved participants would have a secure communication up to the B2BUA and would have to rely on the B2BUA actually encrypting the communication on the other end(s) as well. This means that the participants involved in a multimedia session through a B2BUA have to trust the B2BUA to secure the session on the other end(s), taking care of any validation and protection that may be required as part of the process.

It is worth noting that some additional care may be needed, with respect to RTCP, when acting as an intermediary between two secure sessions. Specifically, issues may arise when relaying NACK feedback originated by a user who failed to receive some RTP packets that were instead received by the B2BUA: such an issue might occur when the packet loss happens between the user and the B2BUA itself, and not between the RTP packet sender and the B2BUA. Such a situation might result in the RTP media sender retransmitting the encrypted packet, which would then be rejected by the B2BUA as a replayed one. However, this issue is well known and addressed in [RFC4588], which both the B2BUA and the involved participants in the communication SHOULD make use of to prevent it from happening. This mechanism allows for a Redundancy RTP Stream to be used for the purpose, which would prevent the replay error. Of course, all recommendations given previously with respect to managing SSRCs across the B2BUA still apply here as well: in fact, such a redundant RTP stream would make use of a different SSRC, that would need to be taken care of both at the RTP and the SDP level. If [RFC4588] is not supported, the B2BUA SHOULD handle NACK packets directly, and only forward feedback on lost packets it does not have access to.
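The fallback described in the last sentence, where the B2BUA answers NACKs itself and only forwards feedback for packets it never received, could be sketched as follows. This is a non-normative example: the RetransmissionCache class, its capacity and its method names are assumptions, and sequence numbers are assumed to be the ones used on the leg towards the participant that sent the NACK.

```python
from collections import OrderedDict

class RetransmissionCache:
    """Keeps the most recently relayed RTP packets, indexed by sequence number."""

    def __init__(self, capacity=512):
        self.capacity = capacity
        self.packets = OrderedDict()

    def store(self, seq, packet):
        """Remember a relayed packet, evicting the oldest one when full."""
        self.packets[seq] = packet
        self.packets.move_to_end(seq)
        while len(self.packets) > self.capacity:
            self.packets.popitem(last=False)

    def handle_nack(self, lost_seqs):
        """Split a NACK into packets to resend locally and losses to forward."""
        resend = [self.packets[s] for s in lost_seqs if s in self.packets]
        forward = [s for s in lost_seqs if s not in self.packets]
        return resend, forward

cache = RetransmissionCache(capacity=2)
cache.store(10, b"packet-10")
cache.store(11, b"packet-11")
cache.store(12, b"packet-12")          # evicts sequence number 10
resend, forward = cache.handle_nack([10, 11, 13])
```

In this example the B2BUA can retransmit packet 11 on its own, while the losses of packets 10 and 13 still need to be reported to the original media sender.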

5. IANA Considerations

This document makes no request of IANA.

6. Security Considerations

TBD. Not any additional consideration to what the standards already give? Probably this section will need a few words about how NOT following the guidelines can lead to security issues: e.g., not properly translating REMB messages can cause an increasing flow of media packets, that may be seen as attacks to devices that can't handle the amount of data.

7. Change Summary

Note to RFC Editor: Please remove this whole section.

The following are the major changes between the 01 and the 02 versions of the draft:

The following are the major changes between the 00 and the 01 versions of the draft:

8. Acknowledgements


9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
[RFC4566] Handley, M., Jacobson, V. and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.
[RFC7092] Kaplan, H. and V. Pascual, "A Taxonomy of Session Initiation Protocol (SIP) Back-to-Back User Agents", RFC 7092, December 2013.

9.2. Informative References

[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008.
[I-D.ietf-avtcore-rtp-topologies-update] Westerlund, M. and S. Wenger, "RTP Topologies", Internet-Draft draft-ietf-avtcore-rtp-topologies-update-00, April 2013.
[I-D.ietf-avtext-rtp-grouping-taxonomy] Lennox, J., Gross, K., Nandakumar, S. and G. Salgueiro, "A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", Internet-Draft draft-ietf-avtext-rtp-grouping-taxonomy-02, June 2014.
[I-D.alvestrand-rmcat-remb] Alvestrand, H., "RTCP message for Receiver Estimated Maximum Bitrate", Internet-Draft draft-alvestrand-rmcat-remb-03, October 2013.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C. and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M. and B. Burman, "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, February 2008.
[RFC5576] Lennox, J., Ott, J. and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, June 2009.
[RFC3605] Huitema, C., "Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)", RFC 3605, October 2003.
[RFC3611] Friedman, T., Caceres, R. and A. Clark, "RTP Control Protocol Extended Reports (RTCP XR)", RFC 3611, November 2003.
[RFC5760] Ott, J., Chesterfield, J. and E. Schooler, "RTP Control Protocol (RTCP) Extensions for Single-Source Multicast Sessions with Unicast Feedback", RFC 5760, February 2010.
[RFC6284] Begen, A., Wing, D. and T. Van Caenegem, "Port Mapping between Unicast and Multicast RTP Sessions", RFC 6284, June 2011.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E. and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.
[RFC4568] Andreasen, F., Baugher, M. and D. Wing, "Session Description Protocol (SDP) Security Descriptions for Media Streams", RFC 4568, July 2006.
[RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.
[RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V. and R. Hakenberg, "RTP Retransmission Payload Format", RFC 4588, July 2006.

Authors' Addresses

Lorenzo Miniero Meetecho EMail:
Sergio Garcia Murillo Medooze EMail:
Victor Pascual Quobis EMail: