Network Working Group                                          S. Proust
Internet-Draft                                                    Orange
Intended status: Informational                                 E. Berger
Expires: June 4, 2016                                              Cisco
                                                               B. Feiten
                                                        Deutsche Telekom
                                                               B. Burman
                                                                Ericsson
                                                             K. Bogineni
                                                        Verizon Wireless
                                                                  M. Lei
                                                                  Huawei
                                                              E. Marocco
                                                          Telecom Italia
                                                        December 2, 2015

          Additional WebRTC audio codecs for interoperability.
             draft-ietf-rtcweb-audio-codecs-for-interop-03

Abstract

   To ensure a baseline level of interoperability between WebRTC
   clients, [I-D.ietf-rtcweb-audio] specifies a minimum set of required
   codecs.  However, to maximize the possibility of establishing a
   session without the need for audio transcoding, it is also
   recommended to include in the offer other suitable audio codecs that
   are available to the browser.

   This document provides some guidelines on the suitable codecs to be
   considered for WebRTC clients to address the most relevant
   interoperability use cases.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 4, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and abbreviations
   3.  Rationale for additional WebRTC codecs
   4.  Additional suitable codecs for WebRTC
     4.1.  AMR-WB
       4.1.1.  AMR-WB General description
       4.1.2.  WebRTC relevant use case for AMR-WB
       4.1.3.  Guidelines for AMR-WB usage and implementation with
               WebRTC
     4.2.  AMR
       4.2.1.  AMR General description
       4.2.2.  WebRTC relevant use case for AMR
       4.2.3.  Guidelines for AMR usage and implementation with
               WebRTC
     4.3.  G.722
       4.3.1.  G.722 General description
       4.3.2.  WebRTC relevant use case for G.722
       4.3.3.  Guidelines for G.722 usage and implementation
     4.4.  Other codecs
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative references
     8.2.  Informative references
   Authors' Addresses

1.  Introduction

   As indicated in [I-D.ietf-rtcweb-overview], it has been anticipated
   that WebRTC will not remain an isolated island and that some WebRTC
   endpoints will need to communicate with devices used in other
   existing networks with the help of a gateway.  Therefore, in order to
   maximize the possibility to establish the session without the need
   for audio transcoding, it is recommended in [I-D.ietf-rtcweb-audio]
   to include in the offer other suitable audio codecs that are
   available to the browser.  This document provides some guidelines on
   the suitable codecs to be considered for WebRTC clients to address
   the most relevant interoperability use cases.

   The codecs considered in this document are recommended to be
   supported and included in the offer only for WebRTC clients for which
   interoperability with other non-WebRTC endpoints and non-WebRTC based
   services is relevant, as described in Section 4.1.2, Section 4.2.2,
   and Section 4.3.2.  Other use cases may justify offering other
   additional codecs to avoid transcoding.  It is the intent of this
   document to inventory and document any other additional
   interoperability use cases and codecs if needed.

2.  Definitions and abbreviations

   o  Legacy networks: In this document, legacy networks encompass the
      conversational networks that are already deployed, like the PSTN,
      the PLMN, and the IP/IMS networks offering VoIP services,
      including the 3GPP "4G" Evolved Packet System [TS23.002]
      supporting voice over LTE radio access (VoLTE) [IR.92].

   o  AMR: Adaptive Multi-Rate

   o  AMR-WB: Adaptive Multi-Rate WideBand

   o  CAT-iq: Cordless Advanced Technology-internet and quality

   o  DECT: Digital Enhanced Cordless Telecommunications

   o  IMS: IP Multimedia Subsystem

   o  LTE: Long Term Evolution (3GPP "4G" wireless data transmission
      standard)

   o  MOS: Mean Opinion Score

   o  PSTN: Public Switched Telephone Network

   o  PLMN: Public Land Mobile Network

   o  VoLTE: Voice Over LTE

3.  Rationale for additional WebRTC codecs

   The mandatory implementation of OPUS [RFC6716] in WebRTC clients can
   guarantee codec interoperability (without transcoding) at state of
   the art voice quality (better than narrow band "PSTN" quality)
   between WebRTC clients.  The WebRTC technology is, however, also
   expected to be used to communicate with other types of clients using
   other technologies.  It can be used, for instance, as an access
   technology to VoLTE services (Voice over LTE as specified in [IR.92])
   or to interoperate with fixed or mobile Circuit Switched or VoIP
   services, like mobile Circuit Switched voice over 3GPP 2G/3G mobile
   networks [TS23.002] or DECT based VoIP telephony [EN300175-1].
   Consequently, a significant number of calls are likely to occur
   between terminals supporting WebRTC clients and other terminals, like
   mobile handsets, fixed VoIP terminals, and DECT terminals, that
   neither support WebRTC clients nor implement OPUS.  As a consequence,
   these calls are likely to be either of low narrow band PSTN quality
   using G.711 [G.711] at both ends or affected by transcoding
   operations.  The drawbacks of such transcoding operations are
   recalled below:

   o  Degraded user experience with respect to voice quality: voice
      quality is significantly degraded by transcoding.  For instance,
      the degradation is around 0.2 to 0.3 MOS for most transcoding use
      cases with the AMR-WB codec (Section 4.1) at 12.65 kbit/s and in
      the same range for other wideband transcoding cases.  It should be
      stressed that if G.711 is used as a fallback codec for
      interoperation, wideband voice quality will be lost.  Such a
      bandwidth reduction down to narrow band clearly degrades the user
      perceived quality of service, leading to shorter and less frequent
      calls.  Such a switch to G.711 is a less than desirable or
      acceptable choice for customers.  If transcoding is performed
      between OPUS and any other wideband codec, wideband communication
      could be maintained but with degraded quality (MOS scores of
      transcoding between AMR-WB at 12.65 kbit/s and OPUS at 16 kbit/s
      in both directions are significantly lower than those of AMR-WB at
      12.65 kbit/s or OPUS at 16 kbit/s).  Furthermore, in degraded
      conditions, the addition of defects, like audio artifacts due to
      packet losses, and the audio effects resulting from the cascading
      of different packet loss recovery algorithms may result in a
      quality below the acceptable limit for the customers.

   o  Degraded user experience with respect to conversational
      interactivity: the degradation of conversational interactivity is
      due to the increase of end to end latency in both directions that
      is introduced by the transcoding operations.  Transcoding requires
      full de-packetization for decoding of the media stream (including
      mechanisms of de-jitter buffering and packet loss recovery), then
      re-encoding, re-packetization and re-sending.  The delays produced
      by all these operations are additive and may increase the end to
      end delay beyond acceptable limits, for instance to more than 1 s
      of end to end latency.

   o  Additional costs in networks: transcoding places significant
      additional costs on network gateways, mainly related to codec
      implementation, codec licenses, deployment, testing and
      validation.  It must be noted that wideband to wideband
      transcoding would require more CPU processing and be more costly
      than transcoding between narrowband codecs.

4.  Additional suitable codecs for WebRTC

   The following codecs are considered as relevant suitable codecs with
   respect to the general purpose described in Section 3.  This list
   reflects the current status of WebRTC foreseen use cases.  It is not
   exhaustive and is open to the inclusion of other codecs for which
   relevant use cases can be identified.  These additional codecs are
   recommended to be included in the offer in addition to OPUS and G.711
   according to the foreseen interoperability cases to be addressed.

4.1.  AMR-WB

4.1.1.  AMR-WB General description

   The Adaptive Multi-Rate WideBand (AMR-WB) is a 3GPP defined speech
   codec that is mandatory to implement in any 3GPP terminal that
   supports wideband speech communication.  It is being used in circuit
   switched mobile telephony services and new multimedia telephony
   services over IP/IMS and 4G/VoLTE, such as voice over LTE as
   specified by GSMA in [IR.92].  More detailed information on AMR-WB
   can be found in [IR.36].  References for all 3GPP AMR-WB related
   specifications, including detailed codec description and source
   code, are in [TS26.171], [TS26.173], [TS26.190], [TS26.204].

4.1.2.  WebRTC relevant use case for AMR-WB

   The market of personal voice communication is driven by mobile
   terminals.  AMR-WB is now implemented in several hundred device
   models and in more than 145 HD mobile networks in 85 countries, with
   a customer base of more than 450 million.  A high number of calls
   are consequently likely to occur between WebRTC clients and mobile
   3GPP terminals.  The use of AMR-WB by WebRTC clients would
   consequently allow transcoding free interoperation with all mobile
   3GPP wideband terminals.  Besides, WebRTC clients running on mobile
   terminals (smartphones) may reuse the AMR-WB codec already
   implemented on these devices.

4.1.3.  Guidelines for AMR-WB usage and implementation with WebRTC

   The payload format to be used for AMR-WB is described in [RFC4867]
   with bandwidth efficient format and one speech frame encapsulated in
   each RTP packet.  Further guidelines for implementing and using
   AMR-WB and ensuring interoperability with 3GPP mobile services can be
   found in [TS26.114].  In order to ensure interoperability with
   4G/VoLTE as specified by GSMA, the more specific IMS profile for
   voice in [IR.92], which is derived from [TS26.114], should be
   considered.  In order to maximize the possibility of successful call
   establishment for WebRTC clients offering AMR-WB, it is important
   that the WebRTC client:

   o  Offer AMR in addition to AMR-WB, with AMR-WB, being a wideband
      codec, listed first as the preferred payload type with respect to
      narrow band codecs (AMR, G.711, ...) and with the Bandwidth
      Efficient payload format preferred.

   o  Be capable of operating AMR-WB with any subset of the nine codec
      modes and source controlled rate operation.  Offer at least one
      AMR-WB configuration with parameter settings as defined in
      Table 6.1 of [TS26.114].  In order to maximize interoperability
      and quality, this offer does not restrict the codec modes
      offered.  Restrictions in the use of codec modes may be included
      in the answer.
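
   The two points above can be illustrated with a short Python sketch
   that assembles the audio part of an SDP offer.  It is only an
   illustration under stated assumptions: the function name, the port,
   the dynamic payload type numbers (96, 97, 111) and the position of
   OPUS ahead of AMR-WB are hypothetical choices that this document
   does not specify, and the fmtp values shown are assumed and should
   be checked against Table 6.1 of [TS26.114].  What the sketch does
   reflect is that AMR-WB precedes the narrow band codecs, that the
   bandwidth efficient format is requested (in [RFC4867] this is the
   default when "octet-align" is absent), and that no "mode-set" is
   present, so that the codec modes are not restricted.

```python
# Assumed fmtp parameters; verify against Table 6.1 of [TS26.114].
# No "mode-set" (all codec modes offered) and no "octet-align"
# (bandwidth efficient payload format, the RFC 4867 default).
AMR_FMTP = "mode-change-capability=2; max-red=220"


def build_audio_offer(port=9):
    """Return the lines of an illustrative audio m-section (hypothetical helper)."""
    # (payload type, rtpmap encoding, fmtp or None), in order of preference.
    codecs = [
        (111, "opus/48000/2", None),        # mandatory WebRTC codec; position assumed
        (96, "AMR-WB/16000/1", AMR_FMTP),   # wideband, listed before narrow band codecs
        (97, "AMR/8000/1", AMR_FMTP),       # offered in addition to AMR-WB
        (0, "PCMU/8000", None),             # G.711 mu-law, static payload type
        (8, "PCMA/8000", None),             # G.711 A-law, static payload type
    ]
    payload_types = " ".join(str(pt) for pt, _, _ in codecs)
    lines = [f"m=audio {port} UDP/TLS/RTP/SAVPF {payload_types}"]
    for pt, encoding, fmtp in codecs:
        lines.append(f"a=rtpmap:{pt} {encoding}")
        if fmtp:
            lines.append(f"a=fmtp:{pt} {fmtp}")
    return lines


if __name__ == "__main__":
    print("\n".join(build_audio_offer()))
```

   The order of payload types on the "m=" line carries the preference,
   which is why AMR-WB is placed before AMR and G.711 in the list.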

4.2.  AMR

4.2.1.  AMR General description

   Adaptive Multi-Rate (AMR) is a 3GPP defined speech codec that is
   mandatory to implement in any 3GPP terminal that supports voice
   communication, i.e., several hundred million terminals.  This
   includes both mobile phone calls using GSM and 3G cellular systems
   and multimedia telephony services over IP/IMS and 4G/VoLTE, such as
   the GSMA voice IMS profile for VoLTE in [IR.92].  In addition to
   avoiding the transcoding impacts listed in Section 3, support of AMR
   can avoid degrading the high efficiency over mobile radio access.
   References for AMR related specifications, including detailed codec
   description and source code, are in [TS26.071], [TS26.073],
   [TS26.090], [TS26.104].

4.2.2.  WebRTC relevant use case for AMR

   A user of a WebRTC endpoint on a device integrating an AMR module
   wants to communicate with another user that can only be reached on a
   mobile device that only supports AMR.  Although more and more
   terminal devices are now "HD voice" and support AMR-WB, a high number
   of legacy terminals supporting only AMR (terminals with no wideband /
   HD Voice capabilities) are still in use.  The use of AMR by WebRTC
   clients would consequently allow transcoding free interoperation with
   all mobile 3GPP terminals.  Besides, WebRTC clients running on mobile
   terminals (smartphones) may reuse the AMR codec already implemented
   on these devices.

4.2.3.  Guidelines for AMR usage and implementation with WebRTC

   The payload format to be used for AMR is described in [RFC4867] with
   bandwidth efficient format and one speech frame encapsulated in each
   RTP packet.  Further guidelines for implementing and using AMR in
   order to ensure interoperability with 3GPP mobile services can be
   found in [TS26.114].  In order to ensure interoperability with
   4G/VoLTE as specified by GSMA, the more specific IMS profile for
   voice in [IR.92], which is derived from [TS26.114], should be
   considered.  In order to maximize the possibility of successful call
   establishment for WebRTC clients offering AMR, it is important that
   the WebRTC client:

   o  Be capable of operating AMR with any subset of the eight codec
      modes and source controlled rate operation.

   o  Offer at least one configuration with parameter settings as
      defined in Table 6.1 and Table 6.2 of [TS26.114].  In order to
      maximize the interoperability and quality this offer shall not
      restrict AMR codec modes offered.  Restrictions in the use of
      codec modes may be included in the answer.
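
   Since the answer may restrict the codec modes, a client has to be
   ready to operate with whatever subset it receives.  The Python
   sketch below shows one possible way to read the "mode-set" parameter
   of [RFC4867] from an fmtp parameter string, treating its absence as
   "all modes allowed".  The function names are hypothetical, and the
   bit rate tables are the nominal AMR and AMR-WB codec mode rates,
   which are not listed in this document.

```python
# Nominal codec mode bit rates in kbit/s, indexed by mode number.
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85,
                     23.05, 23.85]


def allowed_modes(fmtp, mode_count):
    """Return the sorted mode indices permitted by an fmtp string.

    A missing "mode-set" parameter means that every mode is allowed.
    """
    for param in fmtp.split(";"):
        name, _, value = param.strip().partition("=")
        if name == "mode-set":
            modes = sorted({int(m) for m in value.split(",")})
            if any(m < 0 or m >= mode_count for m in modes):
                raise ValueError(f"mode-set out of range: {value}")
            return modes
    return list(range(mode_count))


def highest_allowed_rate(fmtp, rate_table):
    """Return the highest bit rate (kbit/s) permitted by the fmtp string."""
    return rate_table[allowed_modes(fmtp, len(rate_table))[-1]]


if __name__ == "__main__":
    answer_fmtp = "mode-set=0,2,4,7; mode-change-capability=2"
    print(allowed_modes(answer_fmtp, len(AMR_MODES_KBPS)))
    print(highest_allowed_rate("mode-set=0,1,2", AMR_WB_MODES_KBPS))
```

   The same helper serves both codecs: eight modes for AMR and nine for
   AMR-WB, matching the mode counts given in the guidelines above.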

4.3.  G.722

4.3.1.  G.722 General description

   G.722 [G.722] is an ITU-T defined wideband speech codec.  G.722 was
   approved by ITU-T in 1988.  It is a royalty free codec that is common
   in a wide range of terminals and endpoints supporting wideband speech
   and requiring low complexity.  The complexity of G.722 is estimated
   at 10 MIPS [EN300175-8], which is 2.5 to 3 times lower than AMR-WB.
   In particular, G.722 has been chosen by ETSI DECT as the mandatory
   wideband codec for New Generation DECT in order to greatly increase
   the voice quality by extending the bandwidth from narrow band to
   wideband.  G.722 is the wideband codec required for CAT-iq DECT
   certified terminals, and V2.0 of the CAT-iq specifications has been
   approved by GSMA as minimum requirements for HD voice logo usage on
   "fixed" devices, i.e., broadband connections using the G.722 codec.

4.3.2.  WebRTC relevant use case for G.722

   G.722 is the wideband codec required for DECT CAT-iq terminals.  The
   market for DECT cordless phones including DECT chipsets is more than
   150 million per year, and CAT-iq is a registered trademark in 47
   countries worldwide.  G.722 has also been specified by ETSI in
   [TS181005] as the mandatory wideband codec for IMS multimedia
   telephony communication service and supplementary services using
   fixed broadband access.  The support of G.722 would consequently
   allow transcoding free IP interoperation between WebRTC clients and
   fixed VoIP terminals, including DECT / CAT-iq terminals supporting
   G.722.  Besides, WebRTC clients running on fixed terminals
   implementing G.722 may reuse the G.722 codec already implemented on
   these devices.

4.3.3.  Guidelines for G.722 usage and implementation

   The payload format to be used for G.722 is defined in [RFC3551], with
   each octet of the stream of octets produced by the codec to be
   octet-aligned in an RTP packet.  The sampling frequency for G.722 is
   16 kHz, but the RTP clock rate is set to 8000 Hz in SDP to stay
   backward compatible with an erroneous definition in the original
   version of the RTP A/V profile.  Further guidelines for implementing
   and using G.722 in order to ensure interoperability with Multimedia
   Telephony services over IMS can be found in Section 7 of [TS26.114].
   Additional information on G.722 implementation in DECT can be found
   in [EN300175-8], and the full codec description and C source code
   are in [G.722].
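
   The mismatch between the 16 kHz sampling frequency and the 8000 Hz
   RTP clock rate is a common source of implementation errors, so a
   small worked example may help.  The Python sketch below is an
   illustration only: the function name and the 20 ms packetization
   interval are assumptions, while payload type 9 and the "G722/8000"
   encoding name are the static assignment of [RFC3551] and the 64
   kbit/s rate is the one named in [G.722].

```python
G722_SAMPLING_HZ = 16000    # actual audio sampling frequency
G722_RTP_CLOCK_HZ = 8000    # clock rate signalled in SDP (historical value)
G722_BITRATE_BPS = 64000    # G.722 bit rate carried in RTP

# Static payload type 9 from the RTP A/V profile [RFC3551].
G722_RTPMAP = "a=rtpmap:9 G722/8000"


def g722_packet_figures(ptime_ms=20):
    """Return (audio samples, RTP timestamp step, payload octets) per packet.

    ptime_ms is the packetization interval; 20 ms is an assumed value.
    """
    samples = G722_SAMPLING_HZ * ptime_ms // 1000
    timestamp_step = G722_RTP_CLOCK_HZ * ptime_ms // 1000
    payload_octets = G722_BITRATE_BPS * ptime_ms // 1000 // 8
    return samples, timestamp_step, payload_octets


if __name__ == "__main__":
    samples, step, octets = g722_packet_figures()
    print(f"{samples} samples, timestamp +{step}, {octets} octets")
```

   With 20 ms packets, each packet covers 320 audio samples but the RTP
   timestamp advances by only 160, which equals the number of payload
   octets.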

4.4.  Other codecs

   Other interoperability use cases may justify the use of other codecs.

5.  Security Considerations

   Security considerations for WebRTC Audio Codec and Processing
   Requirements can be found in [I-D.ietf-rtcweb-audio].  Implementors
   making use of the additional codecs considered in this document are
   advised to also refer more specifically to the "Security
   Considerations" sections of [RFC4867] (for AMR and AMR-WB) and
   [RFC3551] (for G.722).

6.  IANA Considerations

   None.

7.  Acknowledgements

   Special thanks to Espen Berger, Bernhard Feiten, Bo Burman, Kalyani
   Bogineni, Miao Lei, Enrico Marocco, who co-authored the initial
   document.  Thanks, as well, to Magnus Westerlund and Barry Dingle who
   carefully reviewed the document and helped to improve it.

8.  References

8.1.  Normative references

   [G.722]    ITU, "Recommendation ITU-T G.722 (2012): 7 kHz audio-
              coding within 64 kbit/s", 2012-09.

   [I-D.ietf-rtcweb-audio]
              Valin, J. and C. Bran, "WebRTC Audio Codec and Processing
              Requirements", draft-ietf-rtcweb-audio-09 (work in
              progress), November 2015.

   [IR.92]    GSMA, "IMS Profile for Voice and SMS V9.0", April 2015.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              DOI 10.17487/RFC3551, July 2003,
              <http://www.rfc-editor.org/info/rfc3551>.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
              "RTP Payload Format and File Storage Format for the
              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
              (AMR-WB) Audio Codecs", RFC 4867, DOI 10.17487/RFC4867,
              April 2007, <http://www.rfc-editor.org/info/rfc4867>.

   [TS26.071]
              3GPP, "3GPP TS 26.071 v12.0.0: Mandatory Speech Codec
              speech processing functions; AMR Speech CODEC; General
              description", 2014-09.

   [TS26.073]
              3GPP, "3GPP TS 26.073 v12.0.0: ANSI C code for the
              Adaptive Multi Rate (AMR) speech codec", 2014-09.

   [TS26.090]
              3GPP, "3GPP TS 26.090 v12.0.0: Mandatory Speech Codec
              speech processing functions; Adaptive Multi-Rate (AMR)
              speech codec; Transcoding functions.", 2014-09.

   [TS26.104]
              3GPP, "3GPP TS 26.104 v12.0.0: ANSI C code for the
              floating-point Adaptive Multi Rate (AMR) speech codec.",
              2014-09.

   [TS26.114]
              3GPP, "IP Multimedia Subsystem (IMS); Multimedia
              telephony; Media handling and interaction V13.0.0", June
              2015.

   [TS26.171]
              3GPP, "3GPP TS 26.171 v12.0.0: Speech codec speech
              processing functions; Adaptive Multi-Rate - Wideband
              (AMR-WB) speech codec; General description", 2014-09.

   [TS26.173]
              3GPP, "3GPP TS 26.173 v12.1.0: ANSI-C code for the
              Adaptive Multi-Rate - Wideband (AMR-WB) speech codec.",
              2015-03.

   [TS26.190]
              3GPP, "3GPP TS 26.190 v12.0.0: Speech codec speech
              processing functions; Adaptive Multi-Rate - Wideband
              (AMR-WB) speech codec; Transcoding functions.", 2014-09.

   [TS26.204]
              3GPP, "3GPP TS 26.204 v12.1.0: Speech codec speech
              processing functions; Adaptive Multi-Rate - Wideband
              (AMR-WB) speech codec; ANSI-C code.", 2015-03.

8.2.  Informative references

   [EN300175-1]
              ETSI, "ETSI EN 300 175-1, Digital Enhanced Cordless
              Telecommunications (DECT); Common Interface (CI); Part 1:
              Overview v2.5.1", 2009.

   [EN300175-8]
              ETSI, "ETSI EN 300 175-8, v2.5.1: Digital Enhanced
              Cordless Telecommunications (DECT); Common Interface (CI);
              Part 8: Speech and audio coding and transmission", 2009.

   [G.711]    ITU, "Recommendation ITU-T G.711 (2012): Pulse code
              modulation (PCM) of voice frequencies", 1988-11.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for
              Browser-based Applications", draft-ietf-rtcweb-overview-14
              (work in progress), June 2015.

   [IR.36]    GSMA, "Adaptive Multirate Wide Band V3.0", September 2014.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012, <http://www.rfc-editor.org/info/rfc6716>.

   [RFC7478]  Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
              Time Communication Use Cases and Requirements", RFC 7478,
              DOI 10.17487/RFC7478, March 2015,
              <http://www.rfc-editor.org/info/rfc7478>.

   [TS181005]
              ETSI, "Telecommunications and Internet converged Services
              and Protocols for Advanced Networking (TISPAN); Service
              and Capability Requirements V3.3.1 (2009-12)", 2009.

   [TS23.002]
              3GPP, "3GPP TS 23.002 v13.3.0: Network architecture",
              2015-09.

Authors' Addresses

   Stephane Proust
   Orange
   2, avenue Pierre Marzin
   Lannion  22307
   France

   Email: stephane.proust@orange.com

   Espen Berger
   Cisco

   Email: espeberg@cisco.com

   Bernhard Feiten
   Deutsche Telekom

   Email: Bernhard.Feiten@telekom.de

   Bo Burman
   Ericsson

   Email: bo.burman@ericsson.com

   Kalyani Bogineni
   Verizon Wireless

   Email: Kalyani.Bogineni@VerizonWireless.com

   Miao Lei
   Huawei

   Email: lei.miao@huawei.com

   Enrico Marocco
   Telecom Italia

   Email: enrico.marocco@telecomitalia.it