
Network Working Group                                         H. Kaplan
Internet Draft                                              Acme Packet
Intended status: Informational                         October 24, 2011
Expires: April 24, 2012



                   Requirements for Interworking RTCWeb
                       with Current SIP Deployments
           draft-kaplan-rtcweb-sip-interworking-requirements-00


Abstract

   The IETF RTCWEB WG has been discussing how to interwork with
   deployed SIP equipment and domains.  Doing so may require an
   Interworking Function middlebox in the media-plane.  This document
   lists some RTCWeb-to-SIP use-cases, the RTCWeb requirements to
   support such, and the complexity involved in interworking if the
   requirements cannot be met.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 24, 2012.

Copyright and License Notice





Kaplan, et al           Expires April 24, 2012                [Page 1]


Internet-Draft   RTCWeb-SIP Interworking Requirements     October 2011


   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Table of Contents

   1. Terminology...................................................3
   2. Introduction..................................................3
   3. Existing SIP/RTP Devices......................................4
      3.1. SIP/RTP Devices in Enterprises...........................4
      3.2. SIP/RTP Devices in Service Providers.....................5
      3.3. The Need for an Interworking Function....................5
   4. RTCWeb-SIP Interworking Architecture..........................6
      4.1. Interworking Function Goal: Lower Cost...................7
      4.2. Potential Interworking Functions and Complexity..........7
         4.2.1 ICE Termination......................................7
         4.2.2 SRTP Termination.....................................8
         4.2.3 RTP/RTCP Stream Multiplexing.........................8
         4.2.4 Multi-media Stream Multiplexing......................8
         4.2.5 RFC-4733 DTMF Generation.............................8
         4.2.6 RTCP Generation......................................9
         4.2.7 Transcoding and Transrating..........................9
   5. RTCWeb-SIP Interworking Use-cases............................10
      5.1. Basic Audio-Telephony Call..............................10
      5.2. Secure Basic Calls......................................10
      5.3. Conference Call in SIP Domain...........................11
      5.4. Call Hold and Mute in RTCWeb and SIP Domains............12
         5.4.1 Legacy Call-Hold Devices Impacting RTCP..............12
         5.4.2 RTP Generation when on Hold or Mute..................12
         5.4.3 Clipping with Off-hold/off-mute......................13
      5.5. Call Transfer in SIP Domain.............................13
      5.6. Audio/Video Call Transfer...............................14
      5.7. Find-Me-Follow-Me in SIP Domain.........................15
      5.8. Video in SIP Domain.....................................16
         5.8.1 Video and SIP/SDP....................................16
         5.8.2 Video Codec Compatibility............................16
         5.8.3 Separate Video RTP Stream............................16
         5.8.4 Video RTP Packet Size................................16
   6. Signaling-plane Interworking Requirements....................17
   7. Media-plane Interworking Requirements........................18
   8. Security Considerations......................................20
   9. IANA Considerations..........................................20
   10. Acknowledgments.............................................21
   11. References..................................................21
      11.1. Informative References..................................21
   Author's Address.................................................22


1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.  The
   terminology in this document conforms to RFC 2828, "Internet
   Security Glossary".

   Browser: an Internet World-Wide-Web/HTTP browser capable of
   executing JavaScript/ECMAScript, with an RTCWeb RTP Library and
   associated WebRTC API.

   Web-Server: an HTTP/S server capable of serving JavaScript to
   Browsers, as well as executing local code (e.g., PHP).

   RTCWeb Client: the combination of Browser and JavaScript on the
   user's host system.

   RTP-Peer: another device communicating RTP/RTCP directly with the
   local Client.

2. Introduction

   One of the desired use-cases for the RTCWeb architecture is to be
   able to communicate from RTCWeb applications to existing deployed
   SIP/RTP-based Voice/Video-over-IP devices in the signaling and
   media-planes.  This document assumes such deployed devices
   communicate using SIP at a signaling layer, but other protocols,
   such as XMPP or H.323, may also be possible.

   For the signaling layer, it is assumed the Web-Server will have to
   play a role in interworking with the SIP world, either using an
   integrated Web Server module or separate signaling gateway.  In
   either case it should be possible to communicate with deployed SIP
   devices at a SIP and SDP layer.

   For the media-plane, however, the preference expressed thus far in
   this WG is that direct communication at an IP layer between the
   Browser and existing SIP devices be possible, without requiring a
   media-plane gateway.  Doing so with most deployed SIP devices might
   be impossible, depending on what requirements are imposed on RTCWeb
   Browsers.  An Interworking Function in the media-plane might be
   required, deployed by either the RTCWeb domain or the SIP domain.

   The goal of this document is to summarize the use-cases for
   communicating with deployed SIP devices and domains, and capture the
   requirements necessary to do so without using an Interworking
   Function, or to minimize its cost/complexity.  The impact and
   difficulty of each potential Interworking Function is also
   discussed, in order to help minimize the cost and complexity of
   using one.

   For those readers wishing to skip the background, the requirements
   can be found in sections 6 and 7.  Note that some of the
   requirements are already documented and achieved in current IETF
   RTCWEB and W3C WEBRTC Working Group drafts; some are likely
   unachievable.  This document simply lists what must be done, so that
   the Working Groups can discuss and decide if and how they can be
   done.

3. Existing SIP/RTP Devices

   This document covers two large groups of existing SIP and RTP
   devices that the WG should focus on communicating with: those in
   Enterprises, and those in Service Providers.

   It is extremely difficult, and undoubtedly contentious, to
   generalize existing SIP devices as having a common set of
   capabilities - they do not.  Some SIP devices implement ICE and
   iLBC, for example, while others do not even generate RTCP and only
   support G.711.  Likewise, several software-based SIP User Agents
   (i.e., softphones) implement ICE, but virtually no PSTN/TDM
   Gateways do, and very few PBXs or media servers do.

3.1. SIP/RTP Devices in Enterprises

   The Enterprise market includes PBXs, desk-phones, conference
   bridges, conference phones, soft-phones, PRI gateways, voicemail
   servers, IVR systems, and recording systems.  There are millions of
   RTP devices already deployed in Enterprises today; some are
   upgradeable, some are not.

   Even for those devices that are upgradeable, it is difficult to
   require upgrading them all at once; or require upgrading devices
   that are already working today, simply in order to communicate with
   RTCWeb-enabled Browsers.  An Enterprise that itself uses
   RTCWeb-based Web Applications would have more incentive to upgrade,
   or would be willing to deploy an Interworking Function instead, but
   not an Enterprise that just happens to be the far-end peer for a
   voice/video call of an RTCWeb Application provided by someone else.

   If an Interworking Function is required to communicate with deployed
   Enterprise SIP devices, it is likely that the Enterprises that
   deploy RTCWeb-enabled applications, or RTCWeb Application providers
   wishing to communicate with SIP Enterprises, will be the ones to
   deploy the Interworking Functions - not the SIP Enterprises with
   deployed SIP devices.  Therefore, it is beneficial for the RTCWEB
   WG to minimize the cost of such Interworking Functions, or not need
   any to begin with.

3.2. SIP/RTP Devices in Service Providers

   The SIP Service Provider market represents an enormous population of
   users and applications reachable through SIP and RTP.  There are
   over 100 Million deployed RTP devices in Service Providers, but more
   importantly approximately 5 Billion mobile phones, 1.5 Billion
   landlines, and an untold number of PRI PBX trunks, all reachable
   through SIP/RTP gateways or hosts in SIP Service Providers.  When
   compared to only about 2 Billion IP hosts on the public Internet, it
   becomes clear why connecting to existing RTP devices through SIP
   Service Providers is desirable.

   Unfortunately, many of the deployed RTP devices are not upgradeable
   to change behavior to match RTCWeb: some of them are from
   manufacturers that no longer exist or have stopped providing
   enhancements for them; some are incapable of supporting new codecs,
   ICE, or RTCP due to hardware limitations; and in many cases a SIP
   call will transit through the Service Provider to another Provider
   or to an Enterprise, and the final RTP endpoint is not under the
   local Service Provider's control to upgrade.

   If an Interworking Function is required to communicate with deployed
   Service Provider SIP devices, it is likely that the Service
   Providers that deploy RTCWeb-enabled applications, or RTCWeb
   Application providers wishing to communicate with SIP Service
   Providers, will be the ones to deploy the Interworking Functions -
   not the SIP Service Providers with deployed SIP devices.  Therefore,
   it is beneficial for the RTCWEB WG to minimize the cost of such
   Interworking Functions, or not need any to begin with.


3.3. The Need for an Interworking Function

   While the best-case scenario is one in which no Interworking
   Function is needed, it is likely one will be needed for many SIP
   deployments based on the current requirements and limitations in
   both RTCWeb and SIP-based devices.


   For example, because the JavaScript in Browsers cannot be fully
   trusted, a means of peer-consent must be used in the media-plane
   before the Browser can be allowed to send RTP packets.  The
   currently proposed means of establishing such peer-consent is ICE
   using the STUN connectivity checks, whereby the STUN responses
   implicitly prove peer consent.  An RTCWeb Browser cannot allow
   session media to be used unless the peer uses ICE.  Since many SIP-
   based devices do not support ICE, and will not be upgraded to do so
   for the reasons described previously, an ICE-interworking device is
   needed.
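
   The consent rule above can be sketched as follows.  This is an
   illustrative sketch only, not an RTCWeb API: the ConsentGate class
   and its method names are hypothetical, and a peer is assumed to be
   identified by its (address, port) pair.

```python
class ConsentGate:
    """Allow RTP toward a peer only after it answered a STUN check."""

    def __init__(self):
        self._consenting_peers = set()

    def on_stun_success_response(self, peer):
        # A valid STUN response implicitly proves the peer's consent.
        self._consenting_peers.add(peer)

    def may_send_rtp(self, peer):
        return peer in self._consenting_peers
```

   A SIP device without ICE never produces the STUN response, so in
   this model media toward it stays blocked unless an Interworking
   Function answers the connectivity checks on its behalf.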

4. RTCWeb-SIP Interworking Architecture

   Due to the issues described in section 3, there will likely be a
   need for an interworking function in the signaling or media-plane.
   Therefore, this document assumes an RTCWeb-SIP interworking
   architecture similar to Figure 1 below:

          RTCWeb domain           |             SIP domain
          +-----------+     +-----------+     +-----------+
          |           |     |           |     |           |
          |   Web     | SIP |    SIP    | SIP |   SIP     |
          |           |-----|   Inter-  |-----|           |
          |  Server   |     |  working  |     | User-Agent|
          |           |     |  Function |     |           |
          +-----------+     +-----------+     +-----------+
               /                  |                 \
              /                   |                  \
             /                    |                   \
            /                     |                    \
           / Proprietary over     | Logical or          \Logical or
          / HTTP/Websockets       | Physical API         \Physical API
         /                        |                       \
   +-----------+                  |                        \
   |JS/HTML/CSS|                  |                         \
   +-----------+                  |                          \
   +-----------+            +-----------+             +-----------+
   |           |            |  Media-   |             |           |
   |           |            |   plane   |             |   Media   |
   |  Browser  | -----------|  Inter-   |-------------|   Agent   |
   |           |            |  working  |             |           |
   |           |            |  Function |             |           |
   +-----------+            +-----------+             +-----------+

            Figure 1: RTCWeb-SIP Interworking Architecture

   Note that the "SIP Interworking Function" is a logical function; it
   may be a separate physical device, or it may be built into the Web
   Server or the SIP User Agent (UA).  Likewise, "Media-plane
   Interworking Function" is a logical function which may be a physical
   device or built into the Media Agent, and the vertical lines may be
   logical internal APIs or external physical protocols.

   The SIP and Media-plane Interworking Functions may be deployed by
   the RTCWeb domain administrator or the SIP domain administrator.

4.1. Interworking Function Goal: Lower Cost

   One of the main goals of this document is to provide requirements
   for interworking based on the desire for the least cost and
   complexity.  Determining cost is difficult because it depends to a
   large extent on device implementation specifics and other cost
   factors that are not universally applicable.  Even seemingly
   unrelated costs, such as cost of space or power, have an impact on
   the costs of interworking RTCWeb and SIP.  This document, however,
   makes assumptions regarding cost that the author believes to be
   generally accurate, based on the assumption that the more
   complicated a function is, the more it costs.

   Even if one uses free software to perform all of the interworking
   functions, there is a cost burden tied to CPU, memory, and
   potentially bandwidth usage.  If a function takes more CPU
   instructions to perform, for example, then it will take more CPUs to
   perform it for the same number of sessions.  Thus it is more
   expensive.


4.2. Potential Interworking Functions and Complexity

   It is impossible to document the relative monetary costs of the
   different interworking functions that may need to occur, because
   they differ by manufacturer and system architecture.  This section
   highlights some of the complexities involved with the different
   interworking functions that may need to be used, because complexity
   usually translates to cost (though not always).

4.2.1     ICE Termination

   If the Interworking Function has to terminate ICE (i.e., be an ICE
   agent on behalf of the real SIP endpoint), this involves following
   the procedures in [ICE], including calculating the HMAC-SHA1
   message integrity for each STUN message, checking every UDP packet
   received during the lifetime of the session to see if it is a STUN
   request or indication rather than an RTP, RTCP, or other message,
   and responding to STUN requests during
   ICE restarts.  Being an ICE-Lite agent is often simpler than being
   an ICE-Full agent, however, because of the simpler logic and lack of
   timers.
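
   As a rough illustration of the per-packet work involved, the
   following Python sketch tests whether a received UDP payload looks
   like a STUN message and computes an HMAC-SHA1 value.  The function
   names are hypothetical.  The header checks follow the STUN header
   layout in RFC 5389 (two leading zero bits, length field, magic
   cookie).  The caller is assumed to pass exactly the bytes covered
   by the integrity check, keyed with the ICE password.

```python
import hashlib
import hmac
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389


def looks_like_stun(packet: bytes) -> bool:
    """Heuristic check that a UDP payload is STUN, not RTP or RTCP."""
    if len(packet) < 20:            # the STUN header is 20 bytes
        return False
    if packet[0] & 0xC0:            # two most significant bits are zero
        return False
    msg_len, cookie = struct.unpack_from("!HI", packet, 2)
    return cookie == STUN_MAGIC_COOKIE and msg_len % 4 == 0


def message_integrity(covered: bytes, password: bytes) -> bytes:
    """HMAC-SHA1 over the covered bytes; returns a 20-byte value."""
    return hmac.new(password, covered, hashlib.sha1).digest()
```

   Both steps run for every candidate packet on every flow, which is
   why the cost grows with the number of sessions.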


4.2.2     SRTP Termination

   If the Interworking Function has to terminate SRTP (i.e.,
   encrypt/decrypt SRTP on behalf of the real SIP endpoint), this
   involves performing encryption/decryption and authentication
   algorithms on every RTP/RTCP packet in both send/receive directions.
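
   To give a sense of the per-packet authentication work, the sketch
   below computes and verifies an SRTP authentication tag.  It assumes
   the AES_CM_128_HMAC_SHA1_80 suite from RFC 3711, in which the tag
   is HMAC-SHA1 over the packet followed by the 32-bit rollover
   counter (ROC), truncated to 80 bits.  Key derivation, encryption,
   replay protection, and SRTCP are omitted, and the function names
   are hypothetical.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit tag, as in AES_CM_128_HMAC_SHA1_80


def srtp_auth_tag(auth_key: bytes, packet: bytes, roc: int) -> bytes:
    """Tag over the RTP header and encrypted payload, plus the ROC."""
    data = packet + struct.pack("!I", roc)
    return hmac.new(auth_key, data, hashlib.sha1).digest()[:TAG_LEN]


def verify_srtp_packet(auth_key: bytes, srtp_packet: bytes,
                       roc: int) -> bool:
    """Check the trailing tag of a received packet (no MKI assumed)."""
    body, tag = srtp_packet[:-TAG_LEN], srtp_packet[-TAG_LEN:]
    return hmac.compare_digest(srtp_auth_tag(auth_key, body, roc), tag)
```

   An Interworking Function terminating SRTP would run this in both
   directions, in addition to the cipher itself.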

   It should be noted that if SRTP is required to be used for every
   call by RTCWeb but the [SDES] key exchange model cannot be used on
   the RTCWeb side, then the Interworking Function likely has to
   terminate SRTP from RTCWeb even if the SIP-domain supports SRTP,
   because [SDES] is the most commonly used form of key exchange in SIP
   today.

4.2.3     RTP/RTCP Stream Multiplexing

   If the Interworking Function has to multiplex/de-multiplex RTP and
   RTCP on the same 5-tuple, this involves checking every received
   packet for the RTP vs. RTCP header format and de-multiplexing them
   onto separate 5-tuple flows, and in the other direction taking
   packets from two 5-tuple flows and sending them on the same 5-tuple
   set.
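
   The header check described above can be sketched as follows.  The
   sketch assumes the convention from RFC 5761 that a second byte in
   the range 192-223 marks an RTCP packet; the function names and the
   use of Python lists to stand in for the two outgoing flows are
   hypothetical.

```python
def classify_rtp_or_rtcp(packet: bytes) -> str:
    """Return "rtcp", "rtp", or "other" for a packet on a muxed flow."""
    if len(packet) < 2 or (packet[0] >> 6) != 2:   # version must be 2
        return "other"
    # Second byte: RTCP packet type, or marker bit plus RTP payload type.
    return "rtcp" if 192 <= packet[1] <= 223 else "rtp"


def demux(packet: bytes, rtp_flow: list, rtcp_flow: list) -> None:
    """Forward a packet to the RTP flow or the RTCP flow."""
    kind = classify_rtp_or_rtcp(packet)
    if kind == "rtp":
        rtp_flow.append(packet)
    elif kind == "rtcp":
        rtcp_flow.append(packet)
```

   The reverse direction needs no inspection at all: packets from the
   two SIP-side flows are simply sent on the single RTCWeb-side flow.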

   In some interworking system architectures, such a mux/demux function
   would be trivial, or even simpler to do than not to do, due to the
   reduction in the number of ICE flows to terminate.  Therefore this
   document recommends it be possible to perform such muxing separately
   from the media-type muxing described in section 4.2.4.

4.2.4     Multi-media Stream Multiplexing

   If the Interworking Function has to multiplex/de-multiplex RTP/RTCP
   for audio and video streams on the same 5-tuple, the behavior
   depends on how such multiplexing is defined.  If the 5-tuple
   multiplexing means they're all part of the same RTP session, then
   de-multiplexing them is very complicated; if multiplexing means
   they're all separate RTP/RTCP sessions and use some fixed header-
   field mode of separation, then mux/demux is likely far simpler.

4.2.5     RFC-4733 DTMF Generation

   If the Interworking Function has to generate [RFC4733] DTMF event
   RTP packets to the SIP-domain side, this involves keeping track of
   RTP timestamps and sequence numbers, and inserting the appropriate
   sequence of [RFC4733] packets, etc.  If SRTP is also used, then the
   Interworking Function has to terminate SRTP to be able to insert
   [RFC4733] events.
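
   The packet sequence described above might look like the following
   sketch.  It assumes payload type 101 for telephone-events (a common
   choice, but in practice negotiated in SDP), an 8000 Hz clock, and a
   20 ms update interval; the function name and its parameters are
   hypothetical.  Per [RFC4733], all packets of one event share the
   same RTP timestamp, the duration field grows, and the final packet
   carries the end (E) bit and is sent three times.

```python
import struct


def dtmf_event_packets(event, seq, timestamp, ssrc, payload_type=101,
                       step=160, total=800, volume=10):
    """Build the RTP packets for one RFC 4733 telephone-event."""
    packets = []
    durations = list(range(step, total + 1, step))
    for i, duration in enumerate(durations):
        last = i == len(durations) - 1
        for copy in range(3 if last else 1):   # end packet sent 3 times
            marker = 0x80 if (i == 0 and copy == 0) else 0x00
            header = struct.pack("!BBHII", 0x80, marker | payload_type,
                                 seq & 0xFFFF, timestamp, ssrc)
            end_bit = 0x80 if last else 0x00
            payload = struct.pack("!BBH", event,
                                  end_bit | (volume & 0x3F), duration)
            packets.append(header + payload)
            seq += 1                           # every packet gets a new seq
    return packets
```

   For example, dtmf_event_packets(5, seq=100, timestamp=16000,
   ssrc=0x1234) describes digit "5" held for 100 ms.  The sequence
   numbers consumed here must be reconciled with the audio stream,
   which is part of the bookkeeping burden noted above.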



4.2.6     RTCP Generation

   Because some SIP audio-only RTP endpoints do not generate RTCP, if
   RTCWeb requires receiving RTCP for calls to continue, then the
   Interworking Function has to generate RTCP on behalf of them.  This
   is only a known issue for audio calls.

   Unfortunately, generating fake RTCP is more complicated than most
   people realize.  The SDP in SIP does not indicate whether an
   endpoint will generate RTCP - it is implicitly assumed in the AVP
   profile.  Therefore, the Interworking Function will have to check
   every packet from the SIP-domain side to detect an RTCP message; if
   it does not see one for a certain period of time, it will need to
   generate one.  The RTCP messages it generates will need to appear to
   be true RTCP messages, and thus contain information for both sender
   and receiver reports, DLSR, SSRCs, etc.  It will need to continue to
   check every packet throughout the call and use expiration timers,
   because the call could be silently transferred as described in
   section 5.6, resulting in a new RTP endpoint that does generate RTCP
   on its own.
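
   The detection logic described above can be sketched as a simple
   watchdog.  The class name, method names, and the 10-second timeout
   are assumptions made for illustration; this document does not
   specify a timeout value.  Building the synthetic report itself is
   not shown.

```python
class RtcpWatchdog:
    """Tracks whether the SIP-side peer generates its own RTCP."""

    def __init__(self, timeout_s: float = 10.0):
        self.timeout_s = timeout_s   # assumed value, not from the draft
        self.started_at = None       # time the first packet was seen
        self.last_rtcp_at = None     # time of the last real RTCP packet

    def on_packet(self, packet: bytes, now: float) -> None:
        """Inspect every packet received from the SIP-domain side."""
        if self.started_at is None:
            self.started_at = now
        is_rtcp = (len(packet) >= 2 and packet[0] >> 6 == 2
                   and 192 <= packet[1] <= 223)
        if is_rtcp:
            self.last_rtcp_at = now

    def must_generate_rtcp(self, now: float) -> bool:
        """True once no real RTCP has been seen for timeout_s seconds."""
        if self.started_at is None:
            return False
        if self.last_rtcp_at is not None:
            reference = self.last_rtcp_at
        else:
            reference = self.started_at
        return now - reference >= self.timeout_s
```

   Because the timer is re-armed by any real RTCP packet, the watchdog
   stops requesting synthetic reports if a transfer brings in an
   endpoint that generates RTCP on its own.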

   Furthermore, it will have to terminate SRTP as well even if the SIP-
   domain side supports SRTP, in order to be able to generate the fake
   RTCP messages.  Even though it may appear unlikely that an RTP
   endpoint that would support SRTP does not support RTCP, as far as
   the Interworking Function knows that could be the case.  In fact,
   it's not unlikely to be the case, because middleboxes perform SRTP
   on behalf of endpoints today, without generating RTCP on their
   behalf.  For example, the call may be from an RTCWeb Browser to the
   Interworking Function deployed by the RTCWeb domain owner, to a
   Service Provider with an SBC performing SRTP termination, and then
   on to a PSTN gateway that does not generate RTCP (and some don't).

   It is also possible that generating RTCP might actually require
   transcoding in some system architectures, which would not only be
   prohibitively expensive but also increase delay for RTP.

4.2.7     Transcoding and Transrating

   If the Interworking Function has to perform transcoding, it is
   likely the most expensive function described in this document.
   Transcoding is typically performed in DSPs, which are expensive,
   consume significant power, and generate significant heat at large
   scale.  DSP technology has improved over the years in terms of cost
   and density, but it is still one of the most expensive components of
   interworking.  Transcoding also affects call quality at the audio
   level and introduces delay at the RTP level.  Video transcoding
   DSPs exist, of course, but scale far worse than audio transcoding
   DSPs.



   Transrating (converting from one packetization rate to another) is
   typically simpler and cheaper than transcoding, but still requires
   terminating SRTP, RTP, and typically RTCP.  It can sometimes be done
   without using DSP technology, however, reducing the cost.
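
   As an example of transrating without a DSP, the sketch below
   regroups G.711 payloads into a different packetization interval.
   It relies on G.711 carrying one byte per sample at 8000 Hz, i.e.,
   8 bytes per millisecond.  The function name is hypothetical, and
   the rewriting of RTP headers (sequence numbers, timestamps) and
   RTCP is omitted.

```python
def repacketize_g711(payloads, out_ms, bytes_per_ms=8):
    """Regroup G.711 payload bytes into packets of out_ms each."""
    out_size = out_ms * bytes_per_ms
    buffered = b"".join(payloads)
    # Only whole output packets are emitted; a real implementation
    # would keep the remainder buffered for the next call.
    whole = len(buffered) - len(buffered) % out_size
    return [buffered[i:i + out_size] for i in range(0, whole, out_size)]
```

   For instance, four 20 ms payloads of 160 bytes each become two
   40 ms payloads of 320 bytes each.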

5. RTCWeb-SIP Interworking Use-cases

   Although [draft-use-cases] covers general use-cases, it has no
   specific use-cases which drive requirements for interworking with
   already-deployed SIP domains and their RTP endpoints.  This section
   provides such use-cases.

5.1. Basic Audio-Telephony Call

   An RTCWeb domain user should be able to generate and receive audio-
   based sessions with currently deployed SIP Enterprise and Service
   Provider domains.  The author assumes the SIP aspects for a basic
   call will "just work" or be easily inter-workable, but the media-
   plane issues are as follows:
   1) Most RTP endpoints do not support ICE.
   2) Many RTP endpoints do not generate RTCP.
   3) Most RTCP-capable endpoints only support RTCP on a separate UDP
     port (i.e., the RTP port number plus one, which is odd).
   4) Most RTP endpoints do not support SRTP.
   5) Most SRTP-capable endpoints only support [SDES] key exchange.

   The above limitations drive some of the requirements in section 7,
   although it may not be possible to meet all of the requirements due
   to RTCWeb security issues.

5.2. Secure Basic Calls

   An RTCWeb domain user should be able to generate and receive calls
   with protection from eavesdropping and impersonation, to/from
   currently deployed SIP Enterprise and Service Provider domains.  For
   example an RTCWeb user should not be concerned about eavesdropping
   or impersonation when using their laptop in public WiFi networks, or
   at an IETF meeting, if their call goes to/from a SIP domain;
   likewise a SIP-based user should not be concerned about it if their
   call goes to/from an RTCWeb domain.

   Although most deployed RTP endpoints do not support SRTP (issue (4)
   in section 5.1), the majority of those that do are SIP devices used
   from outside of the Enterprise or Service Provider's physical
   network, such as software-clients.  Within the physical network (or
   VPN), most Enterprises and Service Providers feel that eavesdropping
   and impersonation are difficult enough that the benefits of not
   using SIP/TLS and SRTP outweigh the risks; beyond their own or their
   trusted partners' physical networks or VPNs, they do not.

   Therefore, SIP Enterprises and Service Providers may well *require*
   SRTP be used in basic call scenarios with other RTCWeb-application
   domains.  The way they handle such calls today, however, is by using
   middleboxes to terminate SRTP and [SDES] based keying through secure
   signaling (either SIP/TLS or SIP over IPsec).  If [DTLS-SRTP] is
   required to be used, then the RTCWeb's Interworking Function will
   have to interwork that to SRTP using [SDES], which will then likely
   be terminated somewhere on the SIP Service Provider or Enterprise
   side.  This would be expensive for the RTCWeb provider, and provide
   dubious additional security beyond simply doing [SDES] in RTCWeb.
   In order to provide [SDES] in the Browser in a useful manner,
   however, it needs to be secured with HTTPS to the Web Server.
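
   For readers unfamiliar with [SDES], the key travels in the SDP
   itself as an "a=crypto" attribute, which is why the signaling path
   must be secured.  The sketch below parses such an attribute.  It
   assumes an AES_CM_128 suite, whose inline value is a base64-encoded
   16-byte master key followed by a 14-byte master salt; the function
   name and the returned dictionary layout are hypothetical.

```python
import base64


def parse_sdes_crypto(line: str) -> dict:
    """Parse an SDP "a=crypto" attribute into tag, suite, key, salt."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    tag, suite, key_params = line[len(prefix):].split()[:3]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("unsupported key method")
    # Drop the optional "|lifetime|MKI:length" fields after the key.
    material = base64.b64decode(key_info.split("|")[0])
    return {"tag": int(tag), "suite": suite,
            "master_key": material[:16], "master_salt": material[16:30]}
```

   Anything that can read this SDP line can decrypt the media, so the
   attribute is only as safe as the HTTPS and SIP/TLS hops carrying
   it.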

5.3. Conference Call in SIP Domain

   An RTCWeb domain user should be able to call a conference bridge or
   IVR service reachable through a SIP Enterprise or Service Provider,
   make credit-card-based toll calls, and access such things as their
   voicemail, when the media server is in an Enterprise or Service
   Provider's SIP domain.  Typically such services are based on DTMF
   event indications.

   One means of generating DTMF events is using SIP messages, such as
   KPML [RFC4730] or SIP INFO messages, and it is assumed that such
   mechanisms would be possible in an RTCWeb context without new
   requirements.  Many deployed SIP/RTP systems, however, rely on DTMF
   events to be indicated in RTP using [RFC4733] event packets.

   The ability to interwork SIP-based DTMF indications, including KPML,
   to [RFC4733] DTMF events is already supported by some interworking
   manufacturers, but it adds complexity.  For example if SRTP is used,
   handling DTMF interworking will require the Interworking Function to
   also perform SRTP termination.  An alternative solution is to
   provide the means for both a JavaScript-driven signaling-plane
   indication (which likely already exists) and a JavaScript-driven
   media-plane [RFC4733] method in the Browser.

   It should be noted that some deployed systems only use DTMF in-band
   as tones in G.711 audio.  Far fewer deployed media servers than
   clients are limited in this way, however, and thus the author
   believes it may not be an issue for RTCWeb.  In other words,
   most servers that need to process received DTMF events also support
   [RFC4733], whereas some endpoints can only generate DTMF in-band;
   since the use-case involves RTCWeb Browsers generating DTMF to
   deployed SIP media servers, rather than deployed SIP endpoints
   generating DTMF to RTCWeb Browsers, this is likely a non-issue.


5.4. Call Hold and Mute in RTCWeb and SIP Domains

   An RTCWeb domain user should be able to call to, or be called by, a
   SIP Enterprise or Service Provider and put their call on hold or
   mute, and un-hold/un-mute it at any time; and have their call put on
   hold/mute by the SIP side.

   This use-case may seem obvious and non-problematic, since SDP has
   direction attributes to indicate inactive/sendonly/recvonly for such
   things.  A call-hold case, for example, is often performed by
   sending an SDP offer with a sendonly direction attribute and muting
   the local inputs.  There are subtle issues, however, depending on
   whether RTCP is required, as well as depending on the RTCWeb/WebRTC
   API design and architecture.

5.4.1     Legacy Call-Hold Devices Impacting RTCP

   From a legacy deployment perspective, there are still SIP devices
   which generate SDP with a connection address of 0.0.0.0 to indicate
   call hold, and expect to receive such to be put on-hold.  SIP B2BUA
   middleboxes already interwork such cases to/from an SDP sendonly or
   inactive direction mode, but the device receiving the SDP connection
   address of 0.0.0.0 will not generate RTCP until the call is taken
   off hold.  Therefore, if RTCWeb requires Browsers to receive RTCP as
   a consent-refresh to continue the call, the call will fail if it is
   put on hold too long.  To avoid the call failure, the Interworking
   Function may have to generate RTCP, which is complicated and thus
   expensive.
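
   The two hold conventions can be told apart by inspecting the SDP,
   as in the sketch below.  The function name and the three returned
   labels are hypothetical, and the sketch treats the SDP as a single
   media description for simplicity.

```python
def sdp_hold_state(sdp: str) -> str:
    """Classify SDP as "active", "hold-legacy", or "hold-direction"."""
    direction = "sendrecv"        # default when no attribute is present
    legacy_hold = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c=") and line.split()[-1] == "0.0.0.0":
            legacy_hold = True    # legacy 0.0.0.0 connection address
        elif line in ("a=sendrecv", "a=sendonly",
                      "a=recvonly", "a=inactive"):
            direction = line[2:]
    if legacy_hold:
        return "hold-legacy"
    if direction in ("sendonly", "inactive"):
        return "hold-direction"
    return "active"
```

   An Interworking Function could use the "hold-legacy" result as the
   signal that the SIP peer will stay silent on RTCP until the call is
   resumed.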

5.4.2     RTP Generation when on Hold or Mute

   Another potential issue depends on what the Browser does when
   JavaScript tells it to put the session on mute (i.e., disable the
   microphone/camera inputs), or full hold (i.e., also stop rendering
   received media).  If the Browser stops generating RTP, but does not
   send SDP to the SIP domain indicating such, the call may fail.

   The reason for this is that many SIP Enterprises and Service
   Providers have middleboxes in various locations, which detect an
   absence of RTP packets for a sendrecv-mode call as a call failure,
   and will tear the call down by issuing BYEs.  Therefore, if an
   RTCWeb user puts a call on mute or hold by no longer generating RTP
   but does not send SDP to the SIP domain indicating the appropriate
   direction attribute, the call will be terminated eventually by the
   SIP domain.

   One way to avoid this is to offer the ability for the JavaScript to
   tell the Browser to turn off the microphone/camera inputs, while
   still generating RTP packets.

5.4.3     Clipping with Off-hold/off-mute

   Another issue is the clipping that can occur when taking a call off
   hold or off mute.  If the SIP user puts a call on hold, and a new
   SDP Offer is sent with a direction attribute of sendonly, and some
   time later the user takes the call off hold, it will take some time
   to get a new SDP Offer to the RTCWeb side Browser; the extra time it
   takes may cause clipping: the RTCWeb user will be able to hear/see
   but not speak/be-seen for a bit.  Likewise for the reverse
   direction: if the RTCWeb user puts the call on/off-hold.

   In SIP, this generally doesn't take too long because the signaling
   is over UDP, on managed networks, going through tightly managed
   servers.  In RTCWeb, it will likely be over lossy access mediums,
   over TCP, across the public Internet, and through Web Servers
   performing a lot of other functions.  A clever Web-Application
   developer, therefore, might realize that clipping can be avoided by
   not notifying the Browser of any direction change when the call is
   put on hold from SIP; such a developer could have the Javascript
   change the SDP Offer before giving it to the Browser, to be
   sendrecv.  What's needed, then, is the ability to tell the Browser
   not to render media received from the on-hold Browser and not send
   it to the peer, so the peer never stops sending RTP to the on-hold
   Browser; or the developer could be even too clever and send the
   direction information separately in a direct data channel, for
   example.

5.5. Call Transfer in SIP Domain

   An RTCWeb domain user should be able to call to, or be called by, an
   Enterprise or Service Provider and have their call transferred to
   another user in the same or different Enterprise or Service
   Provider.

   In the SIP signaling architecture model, this should either require
   the SIP domain to issue a REFER request to the RTCWeb domain's
   logical SIP UA, to tell the logical UA to generate an INVITE to the
   new party; or it should require the SIP domain to issue an INVITE
   with Replaces header to the RTCWeb domain's logical SIP UA, to
   replace the original dialog.  In the former case, it requires the
   RTCWeb application to issue a new SDP Offer for a new session; in
   the latter case it causes the RTCWeb application to receive an SDP
   Offer for a new session.  In both cases, however, the general
   expectation of users is that the media impacts are minimal or non-
   existent: they may hear a short-duration click or nothing at all
   when the audio party changes.  Likewise they would probably expect
   to see the new transferred-to party in the same video window.

   In practice for audio-only calls, it is quite common for the SIP
   transfer to occur without the transferred UA being aware of it, by
   having the REFER and INVITE signaling from the
   transferor/transferred-to be locally processed by B2BUAs, such as a
   PBX, Application Server or SBC.  It is not very common, for example,
   to send REFER or INVITE with Replaces-header SIP Requests across SIP
   Enterprise-to-Service-Provider trunks or between Service Providers.
   In practice, therefore, SIP and SDP signaling may not be sent to the
   RTCWeb domain for this call transfer use-case.

   The RTP media source will change inside the Enterprise or Service
   Provider, of course, but the change is hidden by the transfer-
   processing B2BUA, at least at an IP:port transport layer.  At an
   audio codec and RTP layer, however, the change is frequently not
   hidden, and the result is the transferred party suddenly starts
   receiving RTP/RTCP packets from a new SSRC, sequence number space,
   timestamp, CNAME, etc.  The same Payload Type and codec is used, of
   course.  Naturally, this assumes SRTP is not used or not used end-
   to-end (i.e., it may be terminated at the transfer-processing
   B2BUA).

   From an RTCWeb interworking perspective, what this means is that the
   Browser has to be able to receive a new SSRC and timestamp/sequence
   number space from the Interworking Function, without receiving a new
   SDP Offer, without changing SRTP keys, and without ICE re-
   negotiation.
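
   A minimal sketch of the receive-side behavior this implies is shown
   below: when a packet arrives with an unfamiliar SSRC, the receiver
   adopts the new source and discards its old sequence-number state
   instead of treating the jump as loss.  The class and helper names
   are hypothetical, and SRTP, RTCP, and jitter-buffer handling are
   deliberately left out.

```python
import struct


def rtp_packet(seq, timestamp, ssrc, payload=b""):
    """Build a minimal RTP packet (version 2, payload type 0)."""
    header = struct.pack("!BBHII", 0x80, 0, seq, timestamp, ssrc)
    return header + payload


class ReceiveStream:
    """Receive state that survives an unsignaled SSRC change."""

    def __init__(self):
        self.ssrc = None          # SSRC currently being rendered
        self.highest_seq = None   # highest sequence number seen from it
        self.switches = 0         # count of mid-stream source changes

    def on_packet(self, packet):
        """Update state; return True if a new source replaced an old one."""
        if len(packet) < 12 or packet[0] >> 6 != 2:
            raise ValueError("not an RTP version 2 packet")
        seq, _timestamp, ssrc = struct.unpack("!HII", packet[2:12])
        if ssrc != self.ssrc:
            # New source: restart sequence tracking rather than
            # interpreting the discontinuity as loss or reordering.
            switched = self.ssrc is not None
            self.ssrc = ssrc
            self.highest_seq = seq
            if switched:
                self.switches += 1
            return switched
        # Same source: advance, allowing for 16-bit wraparound.
        if (seq - self.highest_seq) & 0xFFFF < 0x8000:
            self.highest_seq = seq
        return False


stream = ReceiveStream()
stream.on_packet(rtp_packet(65535, 1000, 0x1111))
stream.on_packet(rtp_packet(0, 1160, 0x1111))
changed = stream.on_packet(rtp_packet(40000, 5, 0x2222))
print(changed, stream.switches)
```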

   Note that this use-case describes Call Transfer cases, but similar
   media-plane behavior sometimes occurs in Call Park and Pickup, Find-
   Me-Follow-Me, Call Hunting, Rich-Ringtone, and Voicemail fallback
   cases.

5.6. Audio/Video Call Transfer

   An RTCWeb domain user should be able to call to, or be called by, an
   Enterprise or Service Provider and transfer their RTCWeb call to
   another user in the same or different RTCWeb domain, SIP Enterprise
   or Service Provider.  This is similar to the previous use-case but
   the RTCWeb user is now the transferor.

   In the SIP signaling architecture model, this should either require
   the RTCWeb domain to issue a REFER request to the SIP domain, to
   tell the logical UA to generate an INVITE to the new party; or it
   should require the SIP domain to issue an INVITE with Replaces
   header to the RTCWeb domain's logical SIP UA, to replace the
   original dialog.  In the former case, it requires the RTCWeb
   application to issue a new SDP Offer for a new session; in the
   latter case it causes the RTCWeb application to receive an SDP Offer
   for a new session.  In both cases, however, the general expectation
   of users is that the media impacts are minimal or non-existent: they
   may hear a short-duration click or nothing at all when the audio
   party changes, and they likely expect the video rendering to replace
   the previous video in the same window, even though the incoming SDP
   Offer is for a new logical session.

5.7. Find-Me-Follow-Me in SIP Domain

   An RTCWeb domain user should be able to call to a SIP Enterprise or
   Service Provider and have their call find the target user in the
   same or different Enterprise or Service Provider, with a SIP Find-
   Me-Follow-Me service (FMFM).  FMFM service is similar to Call
   Hunting and Call Forwarding services, but with the caller hearing a
   "Please wait while we try to locate your party" type announcement
   message.  (Note that Call Hunting and Call Forwarding services
   sometimes do this as well, in which case they're the same as FMFM.)

   A common method of providing FMFM is for the SIP INVITE to be
   logically or physically forked to a media server that generates the
   announcement; the media server sends back an 18x response with an
   initial SDP Answer, and then when the final UAS is reached the UAS
   sends a 200 response with a final SDP Answer.  To the SIP UAC (i.e.,
   the Web Server), it often appears as a parallel-forked call case.
   Therefore the RTCWeb model must support forked SIP calls, with two
   or more SDP Answers for a given Offer.  It is likely that Web-
   Application developers will want this type of behavior as well, even
   for RTCWeb uses that do not go to SIP.

   From an SDP offer/answer perspective, this means RTCWeb needs to
   support multiple, provisional SDP Answers.  How it does so is beyond
   the scope of this document.
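
   Purely as an illustration of the bookkeeping involved on the SIP
   side, and not as a proposal for how RTCWeb should do it, the sketch
   below tracks several SDP Answers for one Offer, keyed by the SIP
   To-tag of each early dialog.  The class name, method names, and the
   rule that the most recent provisional Answer is the active one
   until a 2xx arrives are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ForkedOffer:
    """One SDP Offer and the Answers received on its forked dialogs."""

    offer_sdp: str
    answers: Dict[str, str] = field(default_factory=dict)
    latest_tag: Optional[str] = None   # dialog of the newest Answer
    final_tag: Optional[str] = None    # dialog that sent the 2xx

    def on_response(self, status, to_tag, sdp=None):
        """Record a SIP response for this Offer."""
        if self.final_tag is not None:
            return  # a 2xx already completed the exchange
        if sdp is not None:
            self.answers[to_tag] = sdp
            self.latest_tag = to_tag
        if 200 <= status < 300:
            self.final_tag = to_tag

    def active_answer(self):
        """Return the Answer that media should currently follow."""
        tag = self.final_tag or self.latest_tag
        return self.answers.get(tag) if tag is not None else None


call = ForkedOffer("offer-sdp")
call.on_response(183, "announcement", "answer-from-media-server")
call.on_response(180, "callee")
call.on_response(200, "callee", "answer-from-callee")
print(call.active_answer())
```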

   From a media perspective, this means the RTCWeb Browser needs to be
   able to receive and render media from different IP/RTP peers on the
   same local listen IP:port at different times, without having
   generated or received a new SDP Offer in-between.

   Note that this use-case describes FMFM cases, but similar media-
   plane behavior sometimes occurs in Call Park and Pickup, Call
   Hunting, Rich-Ringtone, and Voicemail fallback cases.

   It should also be noted that some media servers generate the
   announcement message without sending a provisional 18x response with
   SDP Answer.  Such servers won't function correctly with UAs behind
   NATs anyway, since an SDP Answer has to be sent to perform either
   ICE or SBC-type Latching; and many PSTN Gateways won't accept media
   until they get an SDP Answer either.  Therefore, such media servers
   have issues even in SIP, and can be effectively ignored for the
   purposes of this document.

5.8. Video in SIP Domain

   An RTCWeb domain user should be able to make a video call to, or be
   called by, a SIP Enterprise or Service Provider.  While video is not
   nearly as ubiquitous in SIP as audio-only calls, it does exist and
   is a growing market, particularly now that most video-conferencing
   vendors (both terminals and MCUs) have shifted from H.323 to SIP.

5.8.1     Video and SIP/SDP

   From a SIP perspective there is nothing unique about this use-case;
   but from an SDP perspective some video MCUs use the [SDP-CAP-NEG]
   SDP capability negotiation mechanism.  The author believes this
   should not pose a problem for RTCWeb, as [SDP-CAP-NEG] is
   backwards-compatible with basic [RFC4566] SDP and falls back to it.
   [Note: what are the impacts for video-conf calls if SDP-CAP-NEG is
   not used?  Video MCU vendors need to be consulted]

5.8.2     Video Codec Compatibility

   Codec compatibility is a concern because transcoding video codecs in
   the Interworking Function would be prohibitively expensive: DSPs
   don't scale well for video, and are very expensive.  If the
   currently used video codecs in SIP are all encumbered by royalties,
   then the author recognizes this may not be a solvable problem for
   Browsers.

5.8.3     Separate Video RTP Stream

   SIP-based video terminals/MCUs use separate RTP sessions, on
   separate UDP ports, for video vs. audio media.  Furthermore,
   some use separate video RTP sessions for separate cameras/screens,
   while some use the same one and de-multiplex using SSRC.
   [Note: this latter use is believed but not known by the author]

5.8.4     Video RTP Packet Size

   Video-codec RTP packet size is a concern if IP-layer fragmentation
   occurs, because many NATs and middleboxes discard IP fragments;
   otherwise they would have to re-assemble them to correctly process
   the whole UDP packet, and such re-assembly is processing intensive.
   Carrier Grade NATs (CGNs), consumer NATs, and Firewalls, have
   similar behavior, and thus this is an issue for RTCWeb video usage
   in general on the public Internet.

   In particular, although video codecs can "fragment" themselves at
   the codec layer, in deployed SIP and H.323 uses it has been found
   that some devices don't do so, resulting in IP-fragmented packets
   that get dropped along the way.  Other devices constrain themselves
   to an IP MTU of 1500 bytes, without leaving overhead space for
   packet growth on the path, as can be caused by IPv4-to-IPv6
   conversion, IPsec tunneling/VPNs, SSL-VPNs, etc.  Unfortunately,
   path MTU discovery is not supported or used in practice.  Therefore,
   the Browser's maximum codec packet size needs to be carefully
   thought out.
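
   As a worked example of the arithmetic, the sketch below starts from
   the 1460-byte IP-layer limit given in requirement A2-11 and
   subtracts fixed header sizes to find how many codec bytes fit in
   one packet.  The 10-byte SRTP authentication tag and the absence of
   IP options, RTP CSRCs, and RTP header extensions are assumptions;
   the naive splitter also ignores the per-fragment headers that real
   video payload formats add.

```python
IP_PACKET_LIMIT = 1460   # bytes at the IP layer, per requirement A2-11
IPV4_HEADER = 20         # no IP options
IPV6_HEADER = 40         # no extension headers
UDP_HEADER = 8
RTP_HEADER = 12          # no CSRC list, no header extension
SRTP_AUTH_TAG = 10       # assumed 80-bit authentication tag


def max_rtp_payload(ipv6=False, srtp=True):
    """Return the largest RTP payload that stays within the limit."""
    overhead = (IPV6_HEADER if ipv6 else IPV4_HEADER)
    overhead += UDP_HEADER + RTP_HEADER
    if srtp:
        overhead += SRTP_AUTH_TAG
    return IP_PACKET_LIMIT - overhead


def split_frame(frame, limit):
    """Cut an encoded frame into chunks of at most `limit` bytes."""
    return [frame[i:i + limit] for i in range(0, len(frame), limit)]


budget = max_rtp_payload()
chunks = split_frame(bytes(3000), budget)
print(budget, [len(c) for c in chunks])
```

   The 40 bytes between 1460 and a 1500-byte MTU are what remain for
   packet growth on the path, such as tunneling overhead.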


6. Signaling-plane Interworking Requirements

      REQ-ID          DESCRIPTION
      ----------------------------------------------------------------
      A1-1            RTCWeb MUST provide a means for a sent
                      SIP SDP Offer to be forked and receive
                      multiple SDP Answers; how RTCWeb accomplishes
                      this internally is up to the RTCWEB WG,
                      and need not require SDP be used in RTCWeb.
      ----------------------------------------------------------------
      A1-2            RTCWeb MUST provide a means for a received
                      SIP SDP Offer to be Answered to a completion
                      state; i.e., that the SIP-side can know to
                      send a final SDP Answer back to the SIP domain,
                      either in a 200 OK or reliable provisional
                      response.
      ----------------------------------------------------------------
      A1-3            RTCWeb MUST provide a means for a session
                      request to be received without an SDP Offer,
                      and for RTCWeb to send an SDP Offer back to
                      the SIP side; i.e., that the SIP-side can
                      receive a SIP INVITE without SDP, and be able
                      to send back an SDP Offer in a response.
      ----------------------------------------------------------------
      A1-4            RTCWeb MUST provide a means for the
                      Browser to indicate SRTP [SDES], [DTLS-SRTP],
                      or RTP optionally in SDP.  In other words
                      either [SDP-CAP-NEG] or some similar
                      mechanism, such as [draft-best-effort-srtp],
                      in order to make an SDP Offer that offers
                      both plaintext RTP and both types of SRTP key
                      exchanges.
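
   To make requirement A1-4 concrete, the sketch below generates the
   media-level lines of an SDP Offer that uses [SDP-CAP-NEG] to offer
   plaintext RTP as the actual configuration, with [DTLS-SRTP] and
   [SDES] as potential configurations.  The function name, the choice
   of PCMU, and the preference order are assumptions for illustration;
   a complete Offer would carry further attributes not shown here.

```python
import base64
import os


def best_effort_srtp_media(port, dtls_fingerprint):
    """Return media-level SDP lines offering RTP, SDES and DTLS-SRTP."""
    # 16-byte master key plus 14-byte master salt, base64 encoded,
    # as used by the AES_CM_128_HMAC_SHA1_80 crypto-suite.
    key_salt = base64.b64encode(os.urandom(30)).decode("ascii")
    return [
        # Actual configuration: plaintext RTP with G.711 PCMU.
        "m=audio %d RTP/AVP 0" % port,
        # Transport capabilities 1 (SDES) and 2 (DTLS-SRTP).
        "a=tcap:1 RTP/SAVP UDP/TLS/RTP/SAVP",
        # Attribute capabilities needed by each secure transport.
        "a=acap:1 crypto:1 AES_CM_128_HMAC_SHA1_80 inline:" + key_salt,
        "a=acap:2 fingerprint:" + dtls_fingerprint,
        # Potential configurations, most preferred first.
        "a=pcfg:1 t=2 a=2",
        "a=pcfg:2 t=1 a=1",
    ]


example_fingerprint = "sha-256 " + ":".join(["AB"] * 32)
for line in best_effort_srtp_media(49170, example_fingerprint):
    print(line)
```

   A peer that does not understand the capability attributes ignores
   them and answers the plain RTP/AVP media line.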

7. Media-plane Interworking Requirements

      REQ-ID          DESCRIPTION
      ----------------------------------------------------------------
      A2-1            RTCWeb MUST provide a means for the
                      Browser to generate and receive RTP
                      and RTCP using UDP transport.
      ----------------------------------------------------------------
      A2-2            RTCWeb Browsers MUST support the ability
                      to use separate, distinct RTP sessions
                      on separate UDP ports for separate media
                      streams, such as audio vs. video.
      ----------------------------------------------------------------
      A2-3            RTCWeb Browsers SHOULD support the ability
                      to use the same UDP port for RTP and
                      RTCP of the same media type, without
                      needing to also multiplex media types
                      on the same UDP port.
      ----------------------------------------------------------------
      A2-4            RTCWeb SHOULD provide a means for the
                      Browser to generate and receive RTP
                      without having to perform ICE.
      ----------------------------------------------------------------
      A2-5            RTCWeb MUST provide a means for the
                      Browser to generate and receive RTP
                      with an ICE-Lite peer.
      ----------------------------------------------------------------
      A2-6            RTCWeb Browsers MUST support the
                      G.711 PCMU and PCMA codecs for 10,
                      20, and 30ms packetization times.
      ----------------------------------------------------------------
      A2-7            RTCWeb Browsers MUST support the
                      G.729, G.722, G.722.1, AMR, and AMR-WB codecs.
      ----------------------------------------------------------------
      A2-8            RTCWeb Browsers MUST support the
                      H.263 and H.263+ codecs.
      ----------------------------------------------------------------
      A2-9            RTCWeb Browsers MUST support the
                      H.264-AVC and SVC codecs for Baseline
                      profile.
      ----------------------------------------------------------------
      A2-10           RTCWeb Browsers MUST support a
                      minimum of QCIF, QSIF, CIF, and SIF
                      resolutions, and optionally higher.
      ----------------------------------------------------------------
      A2-11           RTCWeb Browsers MUST NOT generate
                      RTP or RTCP packets larger than 1460
                      bytes at an IP layer using UDP transport.
      ----------------------------------------------------------------
      A2-12           RTCWeb MUST provide a means for the
                      Browser to generate and receive RTP
                      without receiving RTCP, for at least the
                      G.711 PCMU and PCMA codecs.
      ----------------------------------------------------------------
      A2-13           RTCWeb MUST provide a means for the
                      Browser to generate [RFC4733] DTMF RTP
                      Events for at least the events 0-15, in
                      an audio-type RTP packet stream.
      ----------------------------------------------------------------
      A2-14           RTCWeb MAY provide a means for the
                      Browser to receive [RFC4733] DTMF RTP
                      Events for the events 0-15.
      ----------------------------------------------------------------
      A2-15           RTCWeb MUST provide a means for the
                      Javascript application to invoke [RFC4733]
                      DTMF events to be generated, and their
                      duration, with a default duration of 50ms.
                      In other words, the Javascript should be
                      able to tell the Browser to generate event
                      "0" for 50ms based on a button click, for
                      example.
      ----------------------------------------------------------------
      A2-16           RTCWeb MUST provide a means for the
                      Javascript application to enable or
                      disable [RFC4733] use, per session.
      ----------------------------------------------------------------
      A2-17           RTCWeb MUST provide a means for the
                      Browser to generate and receive RTP
                      and RTCP over UDP without using SRTP.
      ----------------------------------------------------------------
      A2-18           RTCWeb MUST provide a means for the
                      Browser to generate and receive SRTP
                      using [SDES]; at least if the Web-Server
                      connection is HTTPS.
      ----------------------------------------------------------------
      A2-19           RTCWeb MUST provide a means for the
                      Browser to receive RTP/RTCP from a different
                      peer RTP stack instance, over the same
                      IP and port 5-tuple, at any time.  In other
                      words, the SSRC, timestamp, sequence number
                      space, etc., may change during the lifetime
                      of receiving a remote stream, without the
                      remote IP:port or SRTP key changing, and
                      without ICE restarting.
      ----------------------------------------------------------------
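
   As an illustration of requirements A2-13 and A2-15, the sketch
   below encodes the 4-byte [RFC4733] telephone-event payload for a
   key press.  The function name, the default volume of 10, and the
   8000 Hz clock rate are assumptions; a real sender would also emit
   repeated packets with a growing duration and set the end bit on the
   final ones.

```python
import struct

# Events 0-15: digits 0-9, then "*", "#", and A-D.
EVENT_CODES = {key: code for code, key in enumerate("0123456789*#ABCD")}


def dtmf_payload(key, duration_ms=50, end=False, volume=10,
                 clock_rate=8000):
    """Encode one telephone-event payload for the given key."""
    event = EVENT_CODES[key]
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    duration = duration_ms * clock_rate // 1000  # timestamp units
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second byte: E (end) bit, R (reserved, zero) bit, 6-bit volume.
    flags_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags_volume, duration)


payload = dtmf_payload("0")
print(payload.hex())
```

   With the 50ms default at 8000 Hz the duration field is 400
   timestamp units, which is what a button click for "0" would
   produce in this sketch.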

8. Security Considerations

   From a SIP-signaling perspective, this document makes no
   requirements which impact SIP-signaling security.  SIP over TLS may
   be used, or not, depending on what the RTCWeb domain and SIP
   Enterprise or Service Provider supports, with the usual security
   issues and implications.

   If [RFC4474] is used, the Interworking Function would likely need to
   change SDP and thus break the signature, and would have to verify
   and re-sign the request using a certificate it owns.  Or the
   Interworking Function could also be the trusted signer and verifier
   for a domain to begin with, in which case it signs and verifies only
   once.  In practice, [RFC4474] is not used by most SIP Service
   Providers and Enterprises, so it does not matter.

   From a media-plane perspective, the difficulty of communicating with
   deployed SIP devices using SRTP is discussed in section 5.2.  The
   idea of not requiring SRTP be used for all sessions is
   controversial, but the author believes if the RTCWeb Web-Server and
   Browser are not using HTTPS but only plaintext HTTP, then a user
   should not expect the session to be secure; thus, at least in this
   case, SRTP should be optional.  When HTTPS is being used, the idea
   of not using SRTP becomes less appealing as the user likely expects
   the session to be secure; but in such a case optionally using [SDES]
   would also seem more reasonable than only allowing [DTLS-SRTP].

   Technically, [SDES] is less secure than [DTLS-SRTP] in the sense
   that the RTCWeb Web-Server and Javascript can view the keys; and
   with [DTLS-SRTP] the user could verify the session is secure end-to-
   end by manually checking the fingerprint and asking the far-end user
   if they sent it.  Unless the user actually performs the manual
   inspection and verification, however, [DTLS-SRTP] proves no more
   than [SDES] does, since the Javascript could have maliciously sent
   the call through a Man-in-the-Middle that terminated the DTLS-key-
   based SRTP.  In fact, in order to interwork with deployed SIP
   devices it would have to use a middleman: the Interworking Function
   itself.  Therefore, there is little to gain by not just supporting
   [SDES] as well as [DTLS-SRTP]; those users who wish to verify the
   security can still do so, in exactly the same way they would verify
   [DTLS-SRTP] fingerprints, and see there is no fingerprint to verify,
   with appropriate text explaining why.

9.   IANA Considerations

   This document makes no request of IANA.

10.  Acknowledgments

   Thanks to Xavier Marjou and Victor Pascual for input and feedback.
   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).

11.  References

11.1.     Informative References

   [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
        A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
        "SIP: Session Initiation Protocol", RFC 3261, June 2002.

   [RFC4566] Handley, M., Jacobson, V., Perkins, C., "SDP: Session
        Description Protocol", RFC 4566, July 2006.

   [ICE] Rosenberg, J., "Interactive Connectivity Establishment (ICE):
        A Protocol for Network Address Translator (NAT) Traversal for
        Offer/Answer Protocols", RFC 5245, April 2010.

   [SDES] Andreasen, F., Baugher, M., and D. Wing, "Session Description
        Protocol (SDP) Security Descriptions for Media Streams", RFC
        4568, July 2006.

   [RFC4733] Schulzrinne, H., Taylor, T., "RTP Payload for DTMF Digits,
        Telephony Tones, and Telephony Signals", RFC 4733, December
        2006.

   [KPML] Burger, E., Dolly, M., "A Session Initiation Protocol (SIP)
        Event Package for Key Press Stimulus (KPML)", RFC 4730,
        November 2006.

   [DTLS-SRTP] McGrew, D., Rescorla, E., "Datagram Transport Layer
        Security (DTLS) Extension to Establish Keys for the Secure
        Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

   [RFC4474] Peterson, J., Jennings, C., "Enhancements for
        Authenticated Identity Management in the Session Initiation
        Protocol (SIP)", RFC 4474, August 2006.

   [SDP-CAP-NEG] Andreasen, F., "Session Description Protocol (SDP)
        Capability Negotiation", RFC 5939, September 2010.

   [draft-best-effort-srtp] Kaplan, H., Audet, F., "Session Description
        Protocol (SDP) Offer/Answer Negotiation For Best-Effort Secure
        Real-Time Transport Protocol", draft-kaplan-mmusic-best-effort-
        srtp-01, October 2006.

   [draft-use-cases] Holmberg, C., Hakansson, S., Eriksson, G., "Web
        Real-Time Communication Use-cases and Requirements", draft-
        ietf-rtcweb-use-cases-and-requirements-06, October 4, 2011.



Author's Address

   Hadriel Kaplan
   Acme Packet
   Email: hkaplan@acmepacket.com