SIPPING WG                                                       R. Mahy
Internet-Draft                                               Plantronics
Intended status: Informational                                 R. Sparks
Expires: September 6, 2009                                       Tekelec
                                                            J. Rosenberg
                                                           Cisco Systems
                                                               D. Petrie
                                                                  SIP EZ
                                                        A. Johnston, Ed.
                                                                   Avaya
                                                           March 5, 2009

     A Call Control and Multi-party usage framework for the Session
                       Initiation Protocol (SIP)
                   draft-ietf-sipping-cc-framework-11

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 6, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document defines a framework and requirements for call control
   and multi-party usage of SIP.  To enable discussion of multi-party
   features and applications we define an abstract call model for
   describing the media relationships required by many of these.  The
   model and actions described here are specifically chosen to be
   independent of the SIP signaling and/or mixing approach chosen to
   actually setup the media relationships.  In addition to its dialog
   manipulation aspect, this framework includes requirements for
   communicating related information and events such as conference and
   session state, and session history.  This framework also describes
   other goals that embody the spirit of SIP applications as used on the
   Internet.

Table of Contents

   1.  Motivation and Background
   2.  Key Concepts
     2.1.  "Conversation Space" Model
     2.2.  Relationship Between Conversation Space, SIP Dialogs,
           and SIP Sessions
     2.3.  Signaling Models
     2.4.  Mixing Models
       2.4.1.  Tightly Coupled
       2.4.2.  Loosely Coupled
     2.5.  Conveying Information and Events
     2.6.  Componentization and Decomposition
       2.6.1.  Media Intermediaries
       2.6.2.  Mixer
       2.6.3.  Transcoder
       2.6.4.  Media Relay
       2.6.5.  Queue Server
       2.6.6.  Parking Place
       2.6.7.  Announcements and Voice Dialogs
     2.7.  Use of URIs
       2.7.1.  Naming Users in SIP
       2.7.2.  Naming Services with SIP URIs
     2.8.  Invoker Independence
     2.9.  Billing issues
   3.  Catalog of call control actions and sample features
     3.1.  Remote Call Control Actions on Early Dialogs
       3.1.1.  Remote Answer
       3.1.2.  Remote Forward or Put
       3.1.3.  Remote Busy or Error Out
     3.2.  Remote Call Control Actions on Single Dialogs
       3.2.1.  Remote Dial
       3.2.2.  Remote On and Off Hold
       3.2.3.  Remote Hangup
     3.3.  Call Control Actions on Multiple Dialogs
       3.3.1.  Transfer
       3.3.2.  Take
       3.3.3.  Add
       3.3.4.  Local Join
       3.3.5.  Insert
       3.3.6.  Split
       3.3.7.  Near-fork
       3.3.8.  Far fork
   4.  Security Considerations
   5.  IANA Considerations
   6.  Appendix A: Example Features
     6.1.  Attended Transfer
     6.2.  Auto Answer
     6.3.  Automatic Callback
     6.4.  Barge-in
     6.5.  Blind Transfer
     6.6.  Call Forwarding
     6.7.  Call Monitoring
     6.8.  Call Park
     6.9.  Call Pickup
     6.10. Call Return
     6.11. Call Waiting
     6.12. Click-to-Dial
     6.13. Conference Call
     6.14. Consultative Transfer
     6.15. Distinctive Ring
     6.16. Do Not Disturb
     6.17. Find-Me
     6.18. Hotline
     6.19. IM Conference Alerts
     6.20. Inbound Call Screening
     6.21. Intercom
     6.22. Message Waiting
     6.23. Music on Hold
     6.24. Outbound Call Screening
     6.25. Pre-paid Calling
     6.26. Presence-Enabled Conferencing
     6.27. Single Line Extension/Multiple Line Appearance
     6.28. Speakerphone Paging
     6.29. Speed Dial
     6.30. Voice Message Screening
     6.31. Voice Portal
     6.32. Voicemail
     6.33. Whispered Call Waiting
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses

1.  Motivation and Background

   The Session Initiation Protocol [RFC3261] (SIP) was defined for the
   initiation, maintenance, and termination of sessions or calls between
   one or more users.  However, despite its origins as a large-scale
   multiparty conferencing protocol, SIP is used today primarily for
   point to point calls.  This two-party configuration is the focus of
   the SIP specification and most of its extensions.

   This document defines a framework and requirements for call control
   and multi-party usage of SIP.  Most multi-party operations manipulate
   SIP dialogs (also known as call legs) or SIP conference media policy
   to cause participants in a conversation to perceive specific media
   relationships.  In other protocols that deal with the concept of
   calls, this manipulation is known as call control.  In addition to
   its dialog or policy manipulation aspect, "call control" also
   includes communicating information and events related to manipulating
   calls, including information and events dealing with session state
   and history, conference state, user state, and even message state.

   Based on input from the SIP community, the authors compiled the
   following set of goals for SIP call control and multiparty
   applications:
   o  Define Primitives, Not Services.  Allow for a handful of robust
      yet simple mechanisms that can be combined to deliver features and
      services.  Throughout this document we refer to these simple
      mechanisms as "primitives".  Primitives should be sufficiently
      robust so that when they are combined with each other, they can be
      used to build many services.  However, the goal is not to
      define a provably complete set of primitives.  Note that while the
      IETF will NOT standardize behavior or services, it may define
      example services for informational purposes, as in service
      examples [I-D.ietf-sipping-service-examples].
   o  Participant oriented.  The primitives should be designed to
      provide services that are oriented around the experience of the
      participants.  The authors observe that end users of features and
      services usually don't care how a media relationship is set up.
      Their ultimate experience is based only on the resulting media and
      other externally visible characteristics.
   o  Signaling Model independent: Support both a central control and a
      peer-to-peer feature invocation model (and combinations of the
      two).  Baseline SIP already supports a centralized control model
      described in 3pcc [RFC3725], and the SIP community has expressed a
      great deal of interest in peer-to-peer or distributed call control
      using primitives such as those defined in REFER [RFC3515],
      Replaces [RFC3891], and Join [RFC3911].

   o  Mixing Model independent: The bulk of interesting multiparty
      applications involve mixing or combining media from multiple
      participants.  This mixing can be performed by one or more of the
      participants, or by a centralized mixing resource.  The experience
      of the participants should not depend on the mixing model used.
      While most examples in this document refer to audio mixing, the
      framework applies to any media type.  In this context a "mixer"
      refers to combining media of the same type in an appropriate,
      media-specific way.  This is consistent with the model described
      in the SIP conferencing framework.
   o  Invoker oriented.  Only the user who invokes a feature or a
      service needs to know exactly which service is invoked or why.
      This is good because it allows new services to be created without
      requiring new primitives from all the participants; and it allows
      for much simpler feature authorization policies, for example, when
      participation spans organizational boundaries.  As discussed in
      Section 2.8, this also avoids exponential state explosion when
      combining features.  The invoker only has to manage a user
      interface or API to prevent local feature interactions.  All the
      other participants simply need to manage the feature interactions
      of a much smaller number of primitives.
   o  Primitives make full use of URIs.  URIs are a very powerful
      mechanism for describing users and services.  They represent a
      plentiful resource that can be extremely expressive and easily
      routed, translated, and manipulated--even across organizational
      boundaries.  URIs can contain special parameters and informational
      headers that need only be relevant to the owner of the namespace
      (domain) of the URI.  Just as a user who selects an http: URL need
      not understand the significance and organization of the web site
      it references, a user may encounter a SIP URI that translates into
      an email-style group alias, that plays a pre-recorded message, or
      runs some complex call-handling logic.  Note that while this may
      seem paradoxical to the previous goal, both goals can be satisfied
      by the same model.
   o  Make use of SIP headers and SIP event packages to provide SIP
      entities with information about their environment.  These should
      include information about the status / handling of dialogs on
      other user agents, information about the history of other contacts
      attempted prior to the current contact, the status of
      participants, the status of conferences, user presence
      information, and the status of messages.
   o  Encourage service decomposition, and design to make use of
      standard components using well-defined, simple interfaces.  Sample
      components include a SIP mixer, recording service, announcement
      server, and voice dialog server.  (This is not an exhaustive
      list).

   o  Include authentication, authorization, policy, logging, and
      accounting mechanisms to allow these primitives to be used safely
      among mutually untrusted participants.  Some of these mechanisms
      may be used to assist in billing, but no specific billing system
      will be endorsed.
   o  Permit graceful fallback to baseline SIP.  Definitions for new SIP
      call control extensions/primitives must describe a graceful way to
      fallback to baseline SIP behavior.  Support for one primitive must
      not imply support for another primitive.
   o  There is no desire or goal to reinvent traditional models, such as
      the model used by the H.450 family of protocols, JTAPI, or the CSTA
      call model, as these other models do not share the design goals
      presented in this document.

2.  Key Concepts

   This section introduces a number of key concepts which will be used
   to describe and explain various call control operations and services
   in the remainder of this document.  This includes the conversation
   space model, signaling and mixing models, common components, and the
   use of URIs.

2.1.  "Conversation Space" Model

   This document introduces the concept of an abstract "conversation
   space" as a set of participants who believe they are all
   communicating among one another.  Each conversation space contains
   one or more participants.

   Participants are SIP User Agents that send original media to or
   terminate and receive media from other members of the conversation
   space.  Logically, every participant in the conversation space has
   access to all the media generated in that space (this is strictly
   true if all participants share a common media type).  A SIP User
   Agent that does not contribute or consume any media is NOT a
   participant; nor is a user agent that merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space.  [Note that a conversation space consists of zero or more SIP
   calls or SIP conferences.  A conversation space is similar to the
   definition of a "call" in some other call models.]

   Participants may represent human users or non-human users (referred
   to as robots or automatons in this document).  Some participants may
   be hidden within a conversation space.  Some examples of hidden
   participants include: robots that generate tones, images, or
   announcements during a conference to announce users arriving and
   departing, a human call center supervisor monitoring a conversation
   between a trainee and a customer, and robots that record media for
   training or archival purposes.

   Participants may also be active or passive.  Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate.  (An attentive human
   participant is obviously active.)  Some robotic participants (such as
   a voice messaging system, an instant messaging agent, or a voice
   dialog system) may be active participants if they can leave the
   conversation space when there is no human interaction.  Other robots
   (for example our tone generating robot from the previous example) are
   passive participants.  A human participant "on-hold" is passive.

   A conversation space can be drawn as a "bubble" or oval, or written
   as a "set" in curly or square brace notation.  Each set, oval, or
   "bubble" represents a conversation space.  Hidden participants are
   shown in lowercase letters.

   Note that while the term "conversation" usually applies to oral
   exchange of information, we apply the conversation space model to any
   media exchange between participants.

   { A , B }                   [ A , b, C, D ]

      .-.                 .---.
     /   \               /     \
    /  A  \             / A   b \
   (       )           (         )
    \  B  /             \ C   D /
     \   /               \     /
      '-'                 '---'
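
   The set notation above can be illustrated with a small data model.
   The following is only a sketch: the class names Participant and
   ConversationSpace and the "hidden" and "active" fields are
   hypothetical names chosen for this illustration, not part of any SIP
   specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Participant:
    """A media source or sink in a conversation space (hypothetical model)."""
    name: str
    hidden: bool = False   # e.g. a recording robot or a silent supervisor
    active: bool = True    # able to leave the space on its own initiative

    def label(self) -> str:
        # Hidden participants are written in lowercase, as in the diagrams.
        return self.name.lower() if self.hidden else self.name.upper()


@dataclass
class ConversationSpace:
    """A set of participants who believe they are all communicating."""
    participants: List[Participant] = field(default_factory=list)

    def notation(self) -> str:
        # Render the space in the curly-brace "set" notation.
        return "{ " + " , ".join(p.label() for p in self.participants) + " }"

    def visible(self) -> List[str]:
        # Labels of the participants that are not hidden.
        return [p.label() for p in self.participants if not p.hidden]


# The second diagram: A, C, and D are visible; b is a hidden, passive robot.
space = ConversationSpace([
    Participant("A"),
    Participant("B", hidden=True, active=False),
    Participant("C"),
    Participant("D"),
])
print(space.notation())
```

   Note that user agents that only forward, transcode, mix, or select
   media would not appear in such a model at all, since they are not
   participants.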

2.2.  Relationship Between Conversation Space, SIP Dialogs, and SIP
      Sessions

   In SIP, a call is "an informal term that refers to some communication
   between peers, generally set up for the purposes of a multimedia
   conversation."  Obviously we cannot discuss normative behavior based
   on such an intentionally vague definition.  The concept of a
   conversation space is needed because the SIP definition of call is
   not sufficiently precise for the purpose of describing the user
   experience of multiparty features.

   Do any other definitions convey the correct meaning?  SIP and SDP
   [RFC4566] both define a conference as "a multimedia session
   identified by a common session description."  A session is defined as
   "a set of multimedia senders and receivers and the data streams
   flowing from senders to receivers."  Both of these definitions are
   heavily oriented toward multicast sessions with little
   differentiation among participants.  As such, neither is particularly
   useful for our purposes.  In fact, the definition of "call" in some
   call models is more similar to our definition of a conversation
   space.

   Some examples of the relationship between conversation spaces, SIP
   dialogs, and SIP sessions are listed below.  In each example, a human
   user will perceive that there is a single call.
   o  A simple two-party call is a single conversation space, a single
      session, and a single dialog.
   o  A locally mixed three-way call is two sessions and two dialogs.
      It is also a single conversation space.
   o  A simple dial-in audio conference is a single conversation space,
      but is represented by as many dialogs and sessions as there are
      human participants.
   o  A multicast conference is a single conversation space, a single
      session, and as many dialogs as participants.
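
   The four examples above can be tabulated in a few lines of code.
   This is a minimal sketch that restates the list; the scenario names
   and the Counts type are hypothetical labels introduced only for this
   illustration.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    conversation_spaces: int
    sessions: int
    dialogs: int


def counts(scenario: str, participants: int = 2) -> Counts:
    """Return the counts a single perceived call maps to in each example."""
    if scenario == "two-party":
        return Counts(1, 1, 1)
    if scenario == "local-three-way":
        # The mixing participant holds one dialog and session per peer.
        return Counts(1, 2, 2)
    if scenario == "dial-in-conference":
        # One dialog and one session per human participant.
        return Counts(1, participants, participants)
    if scenario == "multicast-conference":
        # A single shared session, but one dialog per participant.
        return Counts(1, 1, participants)
    raise ValueError(f"unknown scenario: {scenario}")


for name, n in [("two-party", 2), ("local-three-way", 3),
                ("dial-in-conference", 5), ("multicast-conference", 5)]:
    print(name, counts(name, n))
```

   In every case the number of conversation spaces is one, which is
   why the conversation space, rather than the dialog or session, is
   the unit that matches what the user perceives.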

2.3.  Signaling Models

   Obviously, to make changes to a conversation space, you must be able
   to use SIP signaling to cause these changes.  Specifically, there
   must be a way to manipulate SIP dialogs (call legs) to move
   participants into and out of conversation spaces.  Although this is
   not as obvious, there also must be a way to manipulate SIP dialogs to
   include non-participant user agents that are otherwise involved in a
   conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
   transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using a centralized control model.  One
   common way to implement this using SIP is known as 3rd Party Call
   Control (3pcc) and is described in 3pcc [RFC3725].  The 3pcc approach
   relies on only the following 3 primitive operations:
   o  Create a new dialog (INVITE)
   o  Modify a dialog (reINVITE)
   o  Destroy a dialog (BYE)
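
   As an illustration of how little a controller needs, the following
   sketch models the three primitives as bookkeeping operations.  It
   performs no networking and ignores the offer/answer ordering that a
   real 3pcc flow must respect; the Controller class, its method names,
   and the free-form "media" labels are assumptions made for this
   example only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Controller:
    """Toy 3pcc controller: one dialog per user agent, no real signaling."""
    dialogs: Dict[str, str] = field(default_factory=dict)  # UA -> media label
    log: List[str] = field(default_factory=list)           # requests "sent"

    def create(self, ua: str, media: str) -> None:
        # Primitive 1: create a new dialog (INVITE).
        self.dialogs[ua] = media
        self.log.append(f"INVITE {ua}")

    def modify(self, ua: str, media: str) -> None:
        # Primitive 2: modify an existing dialog (re-INVITE).
        if ua not in self.dialogs:
            raise KeyError(f"no dialog with {ua}")
        self.dialogs[ua] = media
        self.log.append(f"re-INVITE {ua}")

    def destroy(self, ua: str) -> None:
        # Primitive 3: destroy a dialog (BYE).
        del self.dialogs[ua]
        self.log.append(f"BYE {ua}")


ctl = Controller()
ctl.create("A", "audio with B")      # controller calls A
ctl.create("B", "audio with A")      # controller calls B
ctl.modify("A", "audio on hold")     # mid-call feature, driven centrally
ctl.destroy("A")
ctl.destroy("B")
print(ctl.log)
```

   The end systems in this sketch need nothing beyond baseline SIP; all
   feature logic lives in the controller.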

   The main advantage of the 3pcc approach is that it only requires very
   basic SIP support from end systems to support call control features.
   As such, third-party call control is a natural way to handle protocol
   conversion and mid-call features.  It also has the advantage and
   disadvantage that new features can/must be implemented in one place
   only (the controller), and neither requires enhanced client
   functionality, nor takes advantage of it.

   In addition, a peer-to-peer approach is discussed at length in this
   draft.  The primary drawback of the peer-to-peer model is additional
   complexity in the end system and authentication and management
   models.  The benefits of the peer-to-peer model include:
   o  state remains at the edges
   o  call signaling need only go through participants involved (there
      are no additional points of failure)
   o  peers can take advantage of end-to-end message integrity or
      encryption

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here.
   o  Replace an existing dialog
   o  Join a new dialog with an existing dialog
   o  Locally perform media forking (multi-unicast)
   o  Ask another UA to send a request on your behalf

   The peer-to-peer approach also results in only a single SIP dialog,
   directly between the two UAs.  The 3pcc approach results in two SIP
   dialogs, between each UA and the controller.  As a result, the SIP
   features and extensions that will be used during the dialog are
   limited to those understood by the controller.  Consequently, in a
   situation where both the UAs support an advanced SIP feature but the
   controller does not, the feature cannot be used.
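
   The limitation described above amounts to a set intersection.  The
   sketch below is illustrative only: the function name usable_features
   is hypothetical, and the feature names are simply example labels
   standing in for whatever SIP extensions each element supports.

```python
from typing import Iterable, Optional, Set


def usable_features(ua_a: Iterable[str],
                    ua_b: Iterable[str],
                    controller: Optional[Iterable[str]] = None) -> Set[str]:
    """Features usable on a call between two UAs.

    Peer-to-peer: one dialog, so both UAs must support the feature.
    3pcc: two dialogs that both terminate on the controller, so the
    controller must support the feature as well.
    """
    common = set(ua_a) & set(ua_b)
    if controller is None:
        return common
    return common & set(controller)


a = {"replaces", "join", "100rel"}
b = {"replaces", "join"}
basic_controller = {"100rel"}

print(sorted(usable_features(a, b)))                    # peer-to-peer
print(sorted(usable_features(a, b, basic_controller)))  # via 3pcc controller
```

   With the example sets shown, both UAs share two features, but none
   survive once a controller that understands neither is placed in the
   signaling path.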

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

2.4.  Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly.  This topic is discussed more thoroughly in the SIP
   conferencing framework [RFC4353] and cc-conferencing [RFC4579].  SIP
   supports both tightly-coupled and loosely-coupled conferencing,
   although more sophisticated behavior is available in tightly-coupled
   conferences.  In a tightly-coupled conference, a single SIP user
   agent (called the focus) has a direct dialog relationship with each
   participant (and may control non-participant user agents as well).
   In a loosely-coupled conference there are no coordinated signaling
   relationships among the participants.

   For brevity, only the two most popular conferencing models are
   significantly discussed in this document (local and centralized
   mixing).  Applications of the conversation spaces model to
   loosely-coupled multicast and distributed full unicast mesh
   conferences are left as an exercise for the reader.  Note that a
   distributed full mesh conference can be used for basic conferences,
   but does not easily allow for more complex conferencing actions like
   splitting, merging, and sidebars.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating).  The actual heuristics used to release
   calls are beyond the scope of this document, but may depend on
   properties in the conversation space, such as the number of active,
   passive, or hidden participants; and the send-only, receive-only, or
   send-and-receive orientation of various participants.

2.4.1.  Tightly Coupled

   Tightly coupled conferences utilize a central point for signaling messages.
   o  Encourage service decomposition, and
   authentication known as design to make use of
      standard components using well-defined, simple interfaces.  Sample
      components include a focus [RFC4353].  The actual media can be
   centrally mixed or distributed.

2.4.1.1.  (Single) End System Mixing

   The first model we call "end system mixing".  In this model, user A
   calls user B, SIP mixer, recording service, announcement
      server, and they have a conversation.  At some point later, A
   decides voice dialog server.  (This is not an exhaustive
      list).

   o  Include authentication, authorization, policy, logging, and
      accounting mechanisms to conference allow these primitives to be used safely
      among mutually untrusted participants.  Some of these mechanisms
      may be used to assist in user C. To do this, A calls C, using a
   completely separate billing, but no specific billing system
      will be endorsed.
   o  Permit graceful fallback to baseline SIP.  Definitions for new SIP call.  This
      call uses control extensions/primitives must describe a different Call-ID,
   different tags, etc. graceful way to
      fallback to baseline SIP behavior.  Support for one primitive must
      not imply support for another primitive.
   o  There is no call set up directly between B and
   C. No SIP extension desire or external signaling is needed.  A merely
   decides to locally join two dialogs.

      B     C
       \   /
        \ /
         A

   A receives media streams from both B and C, and mixes them.  A sends
   a stream containing A's and C's streams to B, and a stream containing
   A's and B's streams goal to C. Basically, user A handles both signaling
   and media mixing.

2.4.1.2.  Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with reinvent traditional models, such as
      the mixer.  Common applications model used the H.450 family of
   centralized mixing include ad-hoc conferences and scheduled dial-in protocols, JTAPI, or dial-out conferences.  In the figure below, the mixer M receives
   and sends media to participants A, B, C, D, and E.

      B     C
       \   /
        \ /
         M --- A
        / \
       /   \
      D     E

2.4.1.3.  Centralized Signaling, Distributed Media

   In this conferencing CSTA
      call model, there is a centralized controller, as in
   the dial-in and dial-out cases.  However, these other models do not share the centralized server
   handles signaling only.  The media is still sent directly between
   participants, using either multicast or multi-unicast.  Participants
   perform their own mixing.  Multi-unicast is when a user sends
   multiple packets (one for each recipient, addressed to that
   recipient). design goals
      presented in this document.

2.  Key Concepts

   This section introduces a number of key concepts which will be used
   to describe and explain various call control operations and services
   in the remainder of this document.  This includes the conversation
   space model, signaling and mixing models, common components, and the
   use of URIs.

2.1.  "Conversation Space" Model

   This document introduces the concept of an abstract "conversation
   space" as a set of participants who believe they are all
   communicating among one another.  Each conversation space contains
   one or more participants.

   Participants are SIP User Agents that send original media to or
   terminate and receive media from other members of the conversation
   space.  Logically, every participant in the conversation space has
   access to all the media generated in that space (this is strictly
   true if all participants share a common media type).  A SIP User
   Agent that does not contribute or consume any media is NOT a
   participant; nor is a user agent that merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space.
      Note that a conversation space consists of zero or more SIP calls
      or SIP conferences.  A conversation space is similar to the
      definition of a "call" in some other call models.

   Participants may represent human users or non-human users (referred
   to as robots or automatons in this document).  Some participants may
   be hidden within a conversation space.  Some examples of hidden
   participants include: robots that generate tones, images, or
   announcements during a conference to announce users arriving and
   departing, a human call center supervisor monitoring a conversation
   between a trainee and a customer, and robots that record media for
   training or archival purposes.

   Participants may also be active or passive.  Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate.  (An attentive human
   participant is obviously active.)  Some robotic participants (such as
   a voice messaging system, an instant messaging agent, or a voice
   dialog system) may be active participants if they can leave the
   conversation space when there is no human interaction.  Other robots
   (for example our tone generating robot from the previous example) are
   passive participants.  A human participant "on-hold" is passive.

   An example diagram of a conversation space can be shown as a "bubble"
   or ovals, or as a "set" in curly or square brace notation.  Each set,
   oval, or "bubble" represents a conversation space.  Hidden
   participants are shown in lowercase letters.  Examples are given in
   Figure 1.

   Note that while the term "conversation" usually applies to oral
   exchange of information, we apply the conversation space model to any
   media exchange between participants.

   { A , B }                   [ A , b, C, D ]

      .-.                 .---.
     /   \               /     \
    /  A  \             / A   b \
   (       )           (         )
    \  B  /             \ C   D /
     \   /               \     /
      '-'                 '---'

   Figure 1.  Conversation Spaces.
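
   The conversation space model can be pictured as a simple data
   structure.  The following Python sketch is illustrative only: the
   class and field names (Participant, ConversationSpace, hidden,
   active) are assumptions made for this example and are not defined by
   SIP or by this document.  It reproduces the set notation of Figure 1,
   with hidden participants shown in lowercase.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    HUMAN = "human"
    ROBOT = "robot"  # "automaton" in the document's terminology


@dataclass(frozen=True)
class Participant:
    # All names here are hypothetical, chosen for illustration.
    name: str
    kind: Kind = Kind.HUMAN
    hidden: bool = False  # e.g. a recorder or a monitoring supervisor
    active: bool = True   # active participants leave on their own

    def label(self) -> str:
        # Figure 1 convention: hidden participants are lowercase.
        return self.name.lower() if self.hidden else self.name


@dataclass
class ConversationSpace:
    participants: list[Participant] = field(default_factory=list)

    def notation(self) -> str:
        inner = " , ".join(p.label() for p in self.participants)
        return "{ " + inner + " }"


space = ConversationSpace([
    Participant("A"),
    Participant("B", kind=Kind.ROBOT, hidden=True, active=False),
    Participant("C"),
    Participant("D"),
])
print(space.notation())  # { A , b , C , D }
```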

2.2.  Relationship Between Conversation Space, SIP Dialogs, and SIP
      Sessions

   In SIP, a call is "an informal term that refers to some communication
   between peers, generally set up for the purposes of a multimedia
   conversation."  The concept of a conversation space is needed because
   the SIP definition of call is not sufficiently precise for the
   purpose of describing the user experience of multiparty features.

   Do any other definitions convey the correct meaning?  SIP, and SDP
   [RFC4566] both define a conference as "a multimedia session
   identified by a common session description."  A session is defined as
   "a set of multimedia senders and receivers and the data streams
   flowing from senders to receivers."  The definition of "call" in some
   call models is more similar to our definition of a conversation
   space.

   Some examples of the relationship between conversation spaces, SIP
   dialogs, and SIP sessions are listed below.  In each example, a human
   user will perceive that there is a single call.
   o  A simple two-party call is a single conversation space, a single
      session, and a single dialog.
   o  A locally mixed three-way call is two sessions and two dialogs.
      It is also a single conversation space.
   o  A simple dial-in audio conference is a single conversation space,
      but is represented by as many dialogs and sessions as there are
      human participants.
   o  A multicast conference is a single conversation space, a single
      session, and as many dialogs as participants.
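
   The examples above can be restated as a small lookup table.  The
   Python sketch below is illustrative only; the scenario names and the
   function perceived_call are hypothetical labels invented for this
   example.  In every case the user perceives one call, which
   corresponds to one conversation space.

```python
def perceived_call(scenario: str, participants: int) -> dict[str, int]:
    """Return the counts listed in the examples above.

    `scenario` keys are hypothetical labels, not SIP terminology.
    """
    table = {
        # scenario: (sessions, dialogs)
        "two-party": (1, 1),
        "local-three-way": (2, 2),
        "dial-in": (participants, participants),
        "multicast": (1, participants),
    }
    sessions, dialogs = table[scenario]
    return {"conversation_spaces": 1,
            "sessions": sessions,
            "dialogs": dialogs}


three_way = perceived_call("local-three-way", 3)
dial_in = perceived_call("dial-in", 5)
```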

2.3.  Signaling Models

   Obviously to make changes to a conversation space, you must be able
   to use SIP signaling to cause these changes.  Specifically there must
   be a way to manipulate SIP dialogs (call legs) to move participants
   into and out of conversation spaces.  Although this is not as
   obvious, there also must be a way to manipulate SIP dialogs to
   include non-participant user agents that are otherwise involved in a
   conversation space (ex: B2BUAs, 3pcc controllers, mixers,
   transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using a centralized control model.  One
   common way to implement this using SIP is known as 3rd Party Call
   Control (3pcc) and is described in 3pcc [RFC3725].  The 3pcc approach
   relies on only the following 3 primitive operations:
   o  Create a new dialog (INVITE)
   o  Modify a dialog (reINVITE)
   o  Destroy a dialog (BYE)
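
   The three primitives listed above can be sketched as a toy
   controller.  This Python fragment is not a SIP stack: the class name
   ThirdPartyController, its methods, and the example URI are
   assumptions for illustration, and it only records which request a
   real controller would send for each operation.

```python
from itertools import count


class ThirdPartyController:
    """Toy model: tracks dialogs and logs the request per primitive."""

    def __init__(self):
        self.dialogs = {}         # dialog id -> {"ua": ..., "sdp": ...}
        self.sent = []            # requests that would be sent
        self._next_id = count(1)  # local bookkeeping, not a Call-ID

    def create(self, ua_uri, sdp):
        dialog_id = next(self._next_id)
        self.dialogs[dialog_id] = {"ua": ua_uri, "sdp": sdp}
        self.sent.append(f"INVITE {ua_uri}")
        return dialog_id

    def modify(self, dialog_id, sdp):
        dialog = self.dialogs[dialog_id]
        dialog["sdp"] = sdp
        self.sent.append(f"re-INVITE {dialog['ua']}")

    def destroy(self, dialog_id):
        dialog = self.dialogs.pop(dialog_id)
        self.sent.append(f"BYE {dialog['ua']}")


ctl = ThirdPartyController()
leg = ctl.create("sip:a@example.com", "initial offer")
ctl.modify(leg, "updated offer")
ctl.destroy(leg)
```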

   The main advantage of the 3pcc approach is that it only requires very
   basic SIP support from end systems to support call control features.
   As such, third-party call control is a natural way to handle protocol
   conversion and mid-call features.  It also has the advantage and
   disadvantage that new features can/must be implemented in one place
   only (the controller), and neither requires enhanced client
   functionality, nor takes advantage of it.

   In addition, a peer-to-peer approach is discussed at length in this
   draft.  The primary drawback of the peer-to-peer model is additional
   complexity in the end system and authentication and management
   models.  The benefits of the peer-to-peer model include:
   o  state remains at the edges
   o  call signaling need only go through participants involved (there
      are no additional points of failure)
   o  peers can take advantage of end-to-end message integrity or
      encryption

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here.
   o  Replace an existing dialog
   o  Join a new dialog with an existing dialog
   o  Locally perform media forking (multi-unicast)
   o  Ask another UA to send a request on your behalf

   The peer-to-peer approach also only results in a single SIP dialog,
   directly between the two UAs.  The 3pcc approach results in two SIP
   dialogs, between each UA and the controller.  As a result, the SIP
   features and extensions that will be used during the dialog are
   limited to those understood by the controller.  Consequently, when
   both UAs support an advanced SIP feature but the controller does not,
   the feature cannot be used.
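
   The effect of the controller on available features can be expressed
   as a set intersection.  The Python sketch below is a simplification
   under stated assumptions: the function usable_features is
   hypothetical, and the feature names are used only as example labels.

```python
def usable_features(ua_a, ua_b, controller=None):
    """Features usable on a call between two UAs.

    Peer-to-peer: one dialog, so only the two UAs must agree.
    3pcc: two dialogs through the controller, so the controller
    must understand the feature as well.
    """
    common = set(ua_a) & set(ua_b)
    if controller is not None:
        common &= set(controller)
    return common


# Example labels only; any feature names would do.
ua_a = {"replaces", "join", "100rel"}
ua_b = {"replaces", "join"}
controller = {"100rel"}

peer_to_peer = usable_features(ua_a, ua_b)
third_party = usable_features(ua_a, ua_b, controller)
```

   In this example both UAs support "replaces" and "join", but neither
   survives when the controller is in the signaling path.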

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

2.4.  Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly.  This topic is discussed more thoroughly in the SIP
   conferencing framework [RFC4353] and [RFC4579].  SIP supports both
   tightly-coupled and loosely-coupled conferencing, although more
   sophisticated behavior is available in tightly-coupled conferences.
   In a tightly-coupled conference, a single SIP user agent (called the
   focus) has a direct dialog relationship with each participant (and
   may control non participant user agents as well).  The focus can
   authoritatively publish information about the character and
   participants in a conference.  In a loosely-coupled conference there
   are no coordinated signaling relationships among the participants.

   For brevity, only the two most popular conferencing models are
   significantly discussed in this document (local and centralized
   mixing).  Applications of the conversation spaces model to loosely-
   coupled multicast and distributed full unicast mesh conferences are
   left as an exercise for the reader.  Note that a distributed full
   mesh conference can be used for basic conferences, but does not
   easily allow for more complex conferencing actions like splitting,
   merging, and sidebars.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating).  The actual heuristics used to release
   calls are beyond the scope of this document, but may depend on
   properties in the conversation space, such as the number of active,
   passive, or hidden participants; and the send-only, receive-only, or
   send-and-receive orientation of various participants.

2.4.1.  Tightly Coupled

   Tightly coupled conferences utilize a central point for signaling and
   authentication known as a focus [RFC4353].  The actual media can be
   centrally mixed or distributed.

2.4.1.1.  (Single) End System Mixing

   The first model we call "end system mixing".  In this model, user A
   calls user B, and they have a conversation.  At some point later, A
   decides to conference in user C.  To do this, A calls C, using a
   completely separate SIP call.  This call uses a different Call-ID,
   different tags, etc.  There is no call set up directly between B and
   C.  No SIP extension or external signaling is needed.  A merely
   decides to locally join two dialogs.

      B     C
       \   /
        \ /
         A

   Figure 2.  End System mixing Example.

   In Figure 2, A receives media streams from both B and C, and mixes
   them.  A sends a stream containing A's and C's streams to B, and a
   stream containing A's and B's streams to C.  Basically, user A
   handles both signaling and media mixing.
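
   The mixing that A performs in end system mixing can be sketched as
   follows.  This Python fragment is a simplification under stated
   assumptions: streams are modeled as short lists of integer samples,
   the function name mix_minus is hypothetical, and a real mixer would
   also handle timing, codecs, and clipping.

```python
def mix_minus(streams):
    """For each listener, sum every stream except the listener's own.

    `streams` maps a party name to a list of integer samples; all
    lists are assumed to have the same length.
    """
    mixed = {}
    for listener in streams:
        others = [s for name, s in streams.items() if name != listener]
        mixed[listener] = [sum(frame) for frame in zip(*others)]
    return mixed


streams = {
    "A": [1, 1, 1],
    "B": [10, 20, 30],
    "C": [100, 200, 300],
}
mixed = mix_minus(streams)
# A sends mixed["B"] (A + C) to B and mixed["C"] (A + B) to C;
# mixed["A"] (B + C) is what A renders locally.
```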

2.4.1.2.  Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with the mixer.  Common applications of
   centralized mixing include ad-hoc conferences and scheduled dial-in
   or dial-out conferences.  In Figure 3 below, the mixer M receives and
   sends media to participants A, B, C, D, and E.

      B     C
       \   /
        \ /
         M --- A
        / \
       /   \
      D     E

   Figure 3.  Centralized Mixing Example.

2.4.1.3.  Centralized Signaling, Distributed Media

   In this conferencing model, there is a centralized controller, as in
   the dial-in and dial-out cases.  However, the centralized server
   handles signaling only.  The media is still sent directly between
   participants, using either multicast or multi-unicast.  Participants
   perform their own mixing.  Multi-unicast is when a user sends
   multiple packets (one for each recipient, addressed to that
   recipient).  This is referred to as a "Decentralized Multipoint
   Conference" in [H.323].  Full mesh media with centralized mixing is
   another approach.

2.4.2.  Loosely Coupled

   In these models, there is no point of central control of SIP
   signaling.  As in the "Centralized Signaling, Distributed Media" case
   above, all endpoints send media to all other endpoints.  Consequently
   every endpoint mixes their own media from all the other sources, and
   sends their own media to every other participant.

2.4.2.1.  Large-Scale Multicast Conferences

   Large-scale multicast conferences were the original motivation for
   both the Session Description Protocol (SDP) [RFC4566] and SIP.  In a
   large-scale multicast conference, one or more multicast addresses
   are allocated to the conference.  Each participant joins those
   multicast groups, and sends their media to those groups.  Signaling
   is not sent to the multicast groups.  The sole purpose of the
   signaling is to inform participants of which multicast groups to
   join.  Large-scale multicast conferences are usually pre-arranged,
   with specific start and stop times.  However, multicast conferences
   do not need to be pre-arranged, so long as a mechanism exists to
   dynamically obtain a multicast address.

2.4.2.2.  Full Distributed Unicast Conferencing

   In this conferencing model, each participant has both a pairwise
   media relationship and a pairwise signaling relationship with every
   other participant (a full mesh).  This model requires a mechanism to
   maintain a consistent view of distributed state across the group.
   This is a classic hard problem in computer science.  Also, this model
   does not scale well for large numbers of participants, because for
   <n> participants the number of media and signaling relationships is
   approximately n-squared.  As a result, this model is not generally
   available in commercial implementations; on the contrary, it is
   primarily the topic of research or experimental implementations.
   Note that this model assumes peer-to-peer signaling.
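
   The scaling difference between a full mesh and a centralized model
   can be made concrete.  In the Python sketch below, each pair of
   participants is counted once, giving n(n-1)/2 pairwise
   relationships in a full mesh, which grows on the order of
   n-squared.  The function names are hypothetical and the count
   treats one signaling plus media pairing as one relationship.

```python
def full_mesh_relationships(n: int) -> int:
    """Pairwise relationships when every participant pairs with
    every other participant (each pair counted once)."""
    return n * (n - 1) // 2


def centralized_relationships(n: int) -> int:
    """Pairwise relationships when every participant pairs only
    with a central mixer or focus."""
    return n


for n in (3, 5, 100):
    print(n, full_mesh_relationships(n), centralized_relationships(n))
# 3 3 3
# 5 10 5
# 100 4950 100
```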

2.5.  Conveying Information and Events

   Participants should have access to information about the other
   participants in a conversation space, so that this information can be
   rendered to a human user or processed by an automaton.  Although some
   of this information may be available from the Request-URI or To,
   From, Contact, or other SIP headers, another mechanism of reporting
   this information is necessary.

   Many applications are driven by knowledge about the progress of calls
   and conferences.  In general these types of events allow for the
   construction of distributed applications, where the application
   requires information on dialog and conference state, but is not
   necessarily co-resident with an endpoint user agent or conference
   server.  For example, a focus involved in a conversation space may
   wish to provide URIs for conference status, and/or conference/floor
   control.

   The SIP Events [RFC3265] architecture defines general mechanisms for
   subscription to and notification of events within SIP networks.  It
   introduces the notion of a package that is a specific "instantiation"
   of the events mechanism for a well-defined set of events.

   Event packages are needed to provide the status of a user's dialogs,
   provide the status of conferences and their participants, provide
   user presence information, provide the status of registrations, and
   provide the status of user's messages.  While this is not an
   exhaustive list, these are sufficient to enable the sample features
   described in this document.

   The conference event package [RFC4575] allows users to subscribe to
   information about an entire tightly-coupled SIP conference.
   Notifications convey information about the participants such as: the
   SIP URI identifying each user, their status in the space (active,
   declined, departed), URIs to invoke other features (such as sidebar
   conversations), links to other relevant information (such as floor
   control policies), and if floor control policies are in place, the
   user's floor control status.  For conversation spaces created from
   cascaded conferences, conversation state can be gathered from
   relevant foci and merged into a cohesive set of state.

   The dialog package [RFC4235] provides information about all the
   dialogs the target user is maintaining, what conversations the user
   is participating in, and how these are correlated.  Likewise the
   registration package [RFC3680] provides notifications when contacts
   have changed for a specific address-of-record.  The combination of
   these allows a user agent to learn about all conversations occurring
   for the entire registered contact set for an address-of-record.
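
   A subscriber selects one of these event packages by naming it in
   the Event header of a SUBSCRIBE request.  The Python sketch below
   builds only a fragment of such a request: the helper
   subscribe_fragment, the example URI, and the default expiry are
   assumptions for illustration, and mandatory headers such as Via,
   From, To, Call-ID, and CSeq are deliberately omitted.

```python
# Event package name -> (defining document, default body type).
EVENT_PACKAGES = {
    "conference": ("RFC4575", "application/conference-info+xml"),
    "dialog": ("RFC4235", "application/dialog-info+xml"),
    "reg": ("RFC3680", "application/reginfo+xml"),
    "presence": ("RFC3856", "application/pidf+xml"),
}


def subscribe_fragment(target_uri, package, expires=3600):
    """Return the request line and the package-related headers only.

    This is not a complete SIP request; it omits the headers that
    a real user agent must add.
    """
    _defined_in, body_type = EVENT_PACKAGES[package]
    return [
        f"SUBSCRIBE {target_uri} SIP/2.0",
        f"Event: {package}",
        f"Accept: {body_type}",
        f"Expires: {expires}",
    ]


fragment = subscribe_fragment("sip:alice@example.com", "dialog")
print("\n".join(fragment))
```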

   Note that user presence in SIP [RFC3856] has a close relationship
   with these latter two event packages.  It is fundamental to the
   presence model that the information used to obtain user presence is
   constructed from any number of different input sources.  Examples of
   other such sources include calendaring information and uploads of
   presence documents.  These two packages can be considered another
   mechanism that allows a presence agent to determine the presence
   state of the user.  Specifically, a user presence server can act as a
   subscriber for the dialog and registration packages to obtain
   additional information that can be used to construct a presence
   document.

   The multi-party architecture may also need to provide a mechanism to
   get information about the status/handling of a dialog (for example,
   information about the history of other contacts attempted prior to
   the current contact).  Finally, the architecture should provide ample
   opportunities to present informational URIs that relate to calls,
   conversations, or dialogs in some way.  For example, consider the SIP
   Call-Info header, or Contact headers returned in a 300-class
   response.  Frequently additional information about a call or dialog
   can be fetched via non-SIP URIs.  For example, consider a web page
   for package tracking when calling a delivery company, or a web page
   with related documentation when joining a dial-in conference.  The
   use of URIs in the multiparty framework is discussed in more detail
   in Section 3.7.

   Finally, the interaction of SIP with stimulus-signaling-based
   applications, which allow a user agent to interact with an
   application without knowledge of the semantics of that application,
   is discussed in the SIP application interaction framework
   [I-D.ietf-sipping-app-interaction-framework].  Stimulus signaling can
   occur to a user interface running locally with the client, or to a
   remote user interface, through media streams.  Stimulus signaling
   encompasses a wide range of mechanisms, ranging from clicking on
   hyperlinks, to pressing buttons, to traditional Dual Tone Multi
   Frequency (DTMF) input.  In all cases, stimulus signaling is
   supported through the use of markup languages, which play a key role
   in that framework.

2.6.  Componentization and Decomposition

   This framework proposes a decomposed component architecture with a
   very loose coupling of services and components.  This means that a
   service (such as a conferencing server or an auto-attendant) need not
   be implemented as an actual server.  Rather, these services can be
   built by combining a few basic components in straightforward or
   arbitrarily complex ways.

   Since the components are easily deployed on separate boxes, by
   separate vendors, or even with separate providers, we achieve a
   separation of function that allows each piece to be developed in
   complete isolation.  We can also reuse existing components for new
   applications.  This allows rapid service creation, and the ability
   for services to be distributed across organizational domains anywhere
   in the Internet.

   For many of these components it is also desirable to discover their
   capabilities, for example querying the ability of a mixer to host a
   10 dialog conference, or to reserve resources for a specific time.
   These actions could be provided in the form of URIs, provided there
   is an a priori means of understanding their semantics.  For example,
   if there is a published dictionary of operations and a way to query
   the service for the available operations and the associated URIs, the
   URI can be the interface for providing these service operations.
   This concept is described in more detail in the context of dialog
   operations in Section 3.

2.6.1.  Media Intermediaries

   Media Intermediaries are not participants in any conversation space,
   although an entity that is also a media translator may also have a
   co-located participant component (for example a mixer that also
   announces the arrival of a new participant; the announcement portion
   is a participant, but the mixer itself is not).  Media intermediaries
   should be as transparent as possible to the end users, offering a
   useful, fundamental service without getting in the way of new
   features implemented by participants.  Some common media
   intermediaries are described below.

2.6.2.  Mixer

   A SIP mixer is a component that combines media from all dialogs in
   the same conversation in a media specific way.  For example, the
   default combining for an audio conference might be an N-1
   configuration, while a text mixer might interleave text messages on a
   per-line basis.  More details about how to manipulate the media
   policy used by mixers are being discussed in [I-D.ietf-xcon-ccmp].
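
   The N-1 configuration mentioned above can be illustrated with a
   minimal sketch.  It assumes each participant contributes one frame
   of signed 16-bit linear PCM samples of equal length; the function
   name and the data layout are hypothetical and are not defined by
   this framework.

```python
def mix_n_minus_1(frames):
    """Return, for each participant, the sum of everyone else's audio.

    frames maps a participant name to a list of signed 16-bit samples;
    all lists are assumed to have the same length (illustrative only).
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum all contributions once, then subtract each participant's own.
    total = [0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample

    def clip(value):
        # Keep the result inside the signed 16-bit range.
        return max(-32768, min(32767, value))

    return {
        who: [clip(total[i] - samples[i]) for i in range(length)]
        for who, samples in frames.items()
    }
```

   Each participant therefore hears the other N-1 participants but not
   an echo of its own media.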

2.6.3.  Transcoder

   A transcoder translates media from one encoding or format to another
   (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
   text/plain), or from one media type to another (for example text to
   speech).  A more thorough discussion of transcoding can be found in
   SIP transcoding services invocation [RFC5369].

2.6.4.  Media Relay

   A media relay terminates media and simply forwards it to a new
   destination without changing the content in any way.  Sometimes media
   relays are used to provide source IP address anonymity, to facilitate
   middlebox traversal, or to provide a trusted entity where media can
   be forcefully disconnected.

2.6.5.  Queue Server

   A queue server is a location where calls can be entered into one of
   several FIFO (first-in, first-out) queues.  A queue server would
   subscribe to the presence of groups or individuals who are interested
   in its queues.  When detecting that a user is available to service a
   queue, the server redirects or transfers the last call in the
   relevant queue to the available user.  On a queue-by-queue basis,
   authorized users could also subscribe to the call state (dialog
   information) of calls within a queue.  Authorized users could use
   this information to effectively pluck (take) a call out of the queue
   (for example by sending an INVITE with a Replaces header to one of
   the user agents in the queue).
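
   As a rough sketch of the bookkeeping a queue server might keep, the
   following models several named FIFO queues, delivery of a call when
   a user becomes available, and the "pluck" operation.  The class and
   method names are hypothetical, calls are represented only by opaque
   identifiers, and the sketch assumes the longest-waiting call is the
   one delivered; no SIP signaling is shown.

```python
from collections import deque


class QueueServer:
    """Toy model of a queue server holding several FIFO call queues."""

    def __init__(self):
        self.queues = {}

    def enqueue(self, queue_name, call_id):
        # A new call joins the back of the named queue.
        self.queues.setdefault(queue_name, deque()).append(call_id)

    def user_available(self, queue_name):
        # Assumption: the longest-waiting call is handed to the user.
        queue = self.queues.get(queue_name)
        return queue.popleft() if queue else None

    def call_state(self, queue_name):
        # What an authorized subscriber to the queue's dialog state sees.
        return list(self.queues.get(queue_name, ()))

    def pluck(self, queue_name, call_id):
        # An authorized user takes a specific call out of the queue.
        queue = self.queues.get(queue_name)
        if queue and call_id in queue:
            queue.remove(call_id)
            return True
        return False
```

   In a real deployment the delivery step would be a redirect or
   transfer, and the pluck would be an INVITE with a Replaces header as
   described above.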

2.6.6.  Parking Place

   A parking place is a location where calls can be terminated
   temporarily and then retrieved later.  While a call is "parked", it
   can receive media "on-hold" such as music, announcements, or
   advertisements.  Such a service could be further decomposed such that
   announcements or music are handled by a separate component.

2.6.7.  Announcements and Voice Dialogs

   An announcement server is a server that can play digitized media
   (frequently audio), such as music or recorded speech.  These servers
   are typically accessible via SIP, HTTP, or RTSP.  An analogous
   service is a recording service that stores digitized media.  A
   convention for specifying announcements in SIP URIs is described in
   [RFC4240].  Likewise the same server could easily provide a service
   that records digitized media.

   A "voice dialog" is a model of spoken interactive behavior between a
   human and an automaton that can include synthesized speech, digitized
   audio, recognition of spoken and DTMF key input, recording of spoken
   input, and interaction with call control.  Voice dialogs frequently
   consist of forms or menus.  Forms present information and gather
   input; menus offer choices of what to do next.

   Spoken dialogs are a basic building block of applications that use
   voice.  Consider for example that a voice mail system, the
   conference-id and passcode collection system for a conferencing
   system, and complicated voice portal applications all require a voice
   dialog component.

2.6.7.1.  Text-to-Speech and Automatic Speech Recognition

   Text-to-Speech (TTS) is a service that converts text into digitized
   audio.  TTS is frequently integrated into other applications, but
   when separated as a component, it provides greater opportunity for
   broad reuse.  Automatic Speech Recognition (ASR) is a service that
   attempts to decipher digitized speech based on a proposed grammar.
   Like TTS, ASR services can be embedded, or exposed so that many
   applications can take advantage of such services.  A standardized
   (decomposed) interface to access standalone TTS and ASR services is
   currently being developed in [RFC4313].

2.6.7.2.  VoiceXML

   VoiceXML is a W3C recommendation that was designed to give authors
   control over the spoken dialog between users and applications.  The
   application and user take turns speaking: the application prompts the
   user, and the user in turn responds.  Its major goal is to bring the
   advantages of web-based development and content delivery to
   interactive voice response applications.  We believe that VoiceXML
   represents the ideal partner for SIP in the development of
   distributed IVR servers.  VoiceXML is an XML based scripting language
   for describing IVR services at an abstract level.  VoiceXML supports
   DTMF recognition, speech recognition, text-to-speech, and playing out
   of recorded media files.  The results of the data collected from the
   user are passed to a controlling entity through an HTTP POST
   operation.  The controller can then return another script, or
   terminate the interaction with the IVR server.

   A VoiceXML server also need not be implemented as a monolithic
   server.  Figure 4 shows a diagram of a VoiceXML browser that is split
   into media and non-media handling parts.  The VoiceXML interpreter
   handles SIP dialog state and state within a VoiceXML document, and
   sends requests to the media component over another protocol.

                       +-------------+
                       |             |
                       | VoiceXML    |
                       | Interpreter |
                       | (signaling) |
                       +-------------+
                         ^          ^
                         |          |
                     SIP |          | RTSP
                         |          |
                         |          |
                         v          v
            +-------------+        +-------------+
            |             |        |             |
            |  SIP UA     |   RTP  | RTSP Server |
            |             |<------>|   (media)   |
            |             |        |             |
            +-------------+        +-------------+

   Figure 4.  Decomposed VoiceXML Server.

2.7.  Use of URIs

   All naming in SIP uses URIs.  URIs in SIP are used in a plethora of
   contexts: the Request-URI; Contact, To, From, and *-Info headers;
   application/uri bodies; and embedded in email, web pages, instant
   messages, and ENUM records.  The Request-URI identifies the user or
   service that the call is destined for.

   SIP URIs embedded in informational SIP headers, SIP bodies, and
   non-SIP content can also specify methods, special parameters,
   headers, and even bodies.  For example:

   sip:bob@b.example.com;method=REFER?Refer-To=http://example.com/~alice

   Throughout this draft we discuss call control primitive operations.
   One of the biggest problems is defining how these operations may be
   invoked.  There are a number of ways to do this.  One way is to
   define the primitives in the protocol itself such that SIP methods
   (for example REFER) or SIP headers (for example Replaces) indicate a
   specific call control action.  Another way to invoke call control
   primitives is to define a specific Request-URI naming convention.
   Either these conventions must be shared between the client (the
   invoker) and the server, or published by or on behalf of the server.
   The former involves defining URI construction techniques (e.g., URI
   parameters and/or token conventions) as proposed in [RFC4240].  The
   latter technique usually involves discovering the URI via a SIP event
   package, a web page, a business card, or an Instant Message.  Yet
   another means to acquire the URIs is to define a dictionary of
   primitives with well-defined semantics and provide a means to query
   the named primitives and corresponding URIs that may be invoked on
   the service or dialogs.
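
   To make the structure of such an embedded URI concrete, the sketch
   below splits the example URI from this section into its user part,
   host, URI parameters (such as "method"), and embedded headers (such
   as "Refer-To").  It is a deliberately simplified, hypothetical
   helper, not a complete SIP URI parser: it assumes that ";" and "?"
   appear only as delimiters and ignores passwords.

```python
from urllib.parse import unquote


def parse_sip_uri(uri):
    """Split a SIP/SIPS URI into its main components (simplified)."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() not in ("sip", "sips"):
        raise ValueError("not a SIP or SIPS URI")
    # Embedded headers follow the first '?'.
    address_and_params, _, header_part = rest.partition("?")
    # URI parameters follow the address, separated by ';'.
    address, *param_items = address_and_params.split(";")
    user, at_sign, hostport = address.rpartition("@")
    params = {}
    for item in param_items:
        name, _, value = item.partition("=")
        params[name.lower()] = unquote(value)
    headers = {}
    if header_part:
        for item in header_part.split("&"):
            name, _, value = item.partition("=")
            headers[unquote(name)] = unquote(value)
    return {
        "scheme": scheme.lower(),
        "user": user if at_sign else None,
        "hostport": hostport,
        "params": params,
        "headers": headers,
    }
```

   Applied to the example, the "method" parameter yields REFER and the
   embedded Refer-To header yields the HTTP URI, which is the
   information an invoker needs to construct the request.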

2.7.1.  Naming Users in SIP

   An address-of-record, or public SIP address, is a SIP (or SIPS) URI
   that points to a domain with a location service that can map the URI
   to a set of Contact URIs where the user might be available.
   Typically the Contact URIs are populated via registration.

   Address of Record               Contacts

   sip:bob@biloxi.example.com -> sip:bob@babylon.biloxi.example.com:5060
                                 sip:bbrown@mailbox.provider.example.net
                                 sip:+1.408.555.6789@mobile.example.net

   Callee Capabilities [RFC3840] defines a set of additional parameters
   to the Contact header that define the characteristics of the user
   agent at the specified URI.  For example, there is a mobility
   parameter that indicates whether the UA is fixed or mobile.  When a
   user agent registers, it places these parameters in the Contact
   headers to characterize the URIs it is registering.  This allows a
   proxy for that domain to have information about the contact addresses
   for that user.

   When a caller sends a request, it can optionally request Caller
   Preferences [RFC3841], by including the Accept-Contact,
   Request-Disposition, and Reject-Contact headers that request certain
   handling by the proxy in the target domain.  These headers contain
   preferences that describe the set of desired URIs to which the caller
   would like their request routed.  The proxy in the target domain
   matches these preferences with the Contact characteristics originally
   registered by the target user.  The target user can also choose to
   run arbitrarily complex "Find-me" feature logic on a proxy in the
   target domain.
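
   The matching step can be pictured with the following much-simplified
   sketch.  It is not the matching algorithm of [RFC3841]; it only
   assumes that each registered Contact carries a flat dictionary of
   feature values, that a Reject-Contact preference removes any Contact
   matching all of its features, and that an Accept-Contact preference
   moves matching Contacts to the front.  All function and parameter
   names are hypothetical.

```python
def matches(features, wanted):
    """True if every wanted feature is present with the same value."""
    return all(features.get(name) == value for name, value in wanted.items())


def select_contacts(registered, accept=None, reject=None):
    """Order registered contacts according to simplified preferences.

    registered is a list of (contact_uri, features) pairs in
    registration order; accept and reject are feature dictionaries.
    """
    ranked = []
    for uri, features in registered:
        if reject and matches(features, reject):
            continue  # the caller asked to avoid this kind of contact
        preferred = bool(accept) and matches(features, accept)
        ranked.append((0 if preferred else 1, uri))
    # list.sort is stable, so registration order is kept within a rank.
    ranked.sort(key=lambda entry: entry[0])
    return [uri for _, uri in ranked]
```

   For the address-of-record shown earlier, a caller that rejects
   mobile contacts would be routed only to the fixed contacts, while a
   caller that prefers them would have the mobile contact tried first.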

   There is a strong asymmetry in how preferences for callers and
   callees can be presented to the network.  While a caller takes an
   active role by initiating the request, the callee takes a passive
   role in waiting for requests.  This motivates the use of
   callee-supplied scripts and caller preferences included in the call
   request.  This asymmetry is also reflected in the appropriate
   relationship between caller and callee preferences.  A server for a
   callee should respect the wishes of the caller to avoid certain
   locations, while the preference among locations has to be the
   callee's choice, as it determines where, for example, the phone rings
   and whether the callee incurs mobile telephone charges for incoming
   calls.

   SIP User Agent implementations are encouraged to make intelligent
   decisions based on the type of participants (active/passive, hidden,
   human/robot) in a conversation space.  This information is conveyed
   via the dialog package or in a SIP header parameter communicated
   using an appropriate SIP header.  For example, a music on hold
   service may take the sensible approach that if there are two or more
   unhidden participants, it should not provide hold music; or that it
   will not send hold music to robots.
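
   The music on hold policy just described could be expressed as the
   small decision function sketched below.  The participant records,
   their "hidden" and "robot" flags, and the function name are
   hypothetical; the sketch also assumes that the participant list is
   what the service currently sees in the conversation space, not
   counting the service itself.

```python
def should_send_hold_music(target, participants):
    """Decide whether to send hold music to one participant.

    target and each entry of participants are dicts with boolean
    "hidden" and "robot" keys (an illustrative representation only).
    """
    # Hold music is of no use to an automaton.
    if target["robot"]:
        return False
    # With two or more unhidden participants, stay silent.
    unhidden = sum(1 for p in participants if not p["hidden"])
    return unhidden < 2
```

   How a user agent learns these flags (the dialog package or a header
   parameter) is outside the scope of the sketch.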

   Multiple participants in the same conversation space may represent
   the same human user.  For example, the user may use one participant
   device for video, chat, and whiteboard media on a PC and another for
   audio media on a SIP phone.  In this case, the address-of-record is
   the same for both user agents, but the Contacts are different, and
   there is really only one human participant.  In addition, human users
   may add robot participants that act on their behalf (for example a
   call recording service, or a calendar announcement reminder).  Call
   control features in SIP should continue to function as expected in
   such an environment.

2.7.2.  Naming Services with SIP URIs

   A critical piece of defining a session level service that can be
   accessed by SIP is defining the naming of the resources within that
   service.  This point cannot be overstated.

   In the context of SIP control of application components, we take
   advantage of the fact that the left-hand-side of a standard SIP URI
   is a user part.  Most services may be thought of as user automatons
   that participate in SIP sessions.  It naturally follows that the user
   part should be utilized as a service indicator.

   For example, media servers commonly offer multiple services at a
   single host address.  Use of the user part as a service indicator
   enables service consumers to direct their requests without ambiguity.
   It has the added benefit of enabling media services to register their
   availability with SIP Registrars just as any "real" SIP user would.
   This maintains consistency and provides enhanced flexibility in the
   deployment of media services in the network.

   There has been much discussion about the potential for confusion if
   media services URIs are not readily distinguishable from other types
   of SIP UAs.  The use of a service namespace provides a mechanism to
   unambiguously identify standard interfaces while not constraining the
   development of private or experimental services.

   In SIP, the Request-URI identifies the user or service that the call
   is destined for.  The great advantage of using URIs (specifically,
   the SIP Request-URI) as a service identifier comes from the
   combination of two facts.  First, unlike in the PSTN, where the
   namespace (dialable telephone numbers) is limited, URIs come from an
   infinite space.  They are plentiful, and they are free.  Secondly,
   the primary function of SIP is call routing through manipulations of
   the Request-URI.  In the traditional SIP application, this URI
   represents a person.  However, the URI can also represent a service,
   as we propose here.  This means we can apply the routing services SIP
   provides to routing of calls to services.  The result is that the
   problem of service invocation and service location becomes a routing
   problem, for which SIP provides a scalable and flexible solution.
   Since there is such a vast namespace of services, we can explicitly
   name each service in a finely granular way.  This allows the
   distribution of services across the network.  For further discussion
   about services and SIP URIs, see RFC 3087 [RFC3087].

   Consider a conferencing service in which the names of ad-hoc
   conferences are kept separate from those of scheduled conferences.
   We can program proxies to route calls for ad-hoc conferences to one
   set of servers, and calls for scheduled ones to another, possibly
   even in a different provider.  In fact, since each conference itself
   is given a URI, we can distribute conferences across servers, and
   easily guarantee that calls for the same conference always get routed
   to the same server.  This is in stark contrast to conferences in the
   telephone network, where the equivalent of the URI, the phone number,
   is scarce.  An entire conferencing provider generally has one or two
   numbers.  Conference IDs must be obtained through IVR interactions
   with the caller, or through a human attendant.  This makes it
   difficult to distribute conferences across servers all over the
   network, since the PSTN routing only knows about the dialed number.
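
   As a rough illustration of the routing argument above, the following
   Python sketch maps conference URIs onto servers.  The server names,
   the "adhoc-" user-part prefix, and the hash-based server selection
   are all assumptions made for this example; the framework does not
   define any of them.

```python
import hashlib

# Hypothetical server pools; these names are illustrative only.
ADHOC_SERVERS = ["adhoc1.example.com", "adhoc2.example.com"]
SCHEDULED_SERVERS = ["sched1.example.net", "sched2.example.net"]


def route_conference(request_uri: str) -> str:
    """Pick a server for a conference URI such as sip:adhoc-42@example.com."""
    user = request_uri.split(":", 1)[1].split("@", 1)[0]
    # Assumed convention: the provider names ad-hoc conferences "adhoc-...".
    pool = ADHOC_SERVERS if user.startswith("adhoc-") else SCHEDULED_SERVERS
    # A stable hash of the whole URI keeps one conference on one server.
    digest = hashlib.sha256(request_uri.encode("utf-8")).digest()
    return pool[digest[0] % len(pool)]
```

   Because the choice depends only on the Request-URI, every call for a
   given conference reaches the same server, which is the property the
   paragraph above relies on.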

   For more examples, consider the URI conventions of RFC 4240 [RFC4240]
   for media servers and RFC 4458 [RFC4458] for voicemail and IVR
   systems.

   In practical applications, it is important that an invoker does not
   necessarily apply semantic rules to various URIs it did not create.
   Instead, it should allow any arbitrary string to be provisioned, and
   map the string to the desired behavior.  The administrator of a
   service may choose to provision specific conventions or mnemonic
   strings, but the application should not require it.  In any large
   installation, the system owner is likely to have pre-existing rules
   for mnemonic URIs, and any attempt by an application to define its
   own rules may create a conflict.  Implementations should allow an
   arbitrary mix of URIs from these schemes, or any other scheme that
   renders valid SIP URIs to be provisioned, rather than enforce only
   one particular scheme.
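
   One way to read this guidance is as an exact-match provisioning
   table.  The sketch below is a minimal illustration under that
   assumption; the class name, the handler functions, and the example
   URIs are hypothetical and are not part of any SIP specification.

```python
from typing import Callable, Dict


# Hypothetical handlers standing in for real service logic.
def join_conference(uri: str) -> str:
    return f"join conference for {uri}"


def play_announcement(uri: str) -> str:
    return f"play announcement for {uri}"


class ServiceTable:
    """Maps provisioned URIs to behaviors without parsing them for meaning."""

    def __init__(self) -> None:
        self._table: Dict[str, Callable[[str], str]] = {}

    def provision(self, uri: str, handler: Callable[[str], str]) -> None:
        # Any string the administrator chooses is accepted as-is.
        self._table[uri] = handler

    def invoke(self, uri: str) -> str:
        # Exact-match lookup only; no naming convention is assumed.
        handler = self._table.get(uri)
        if handler is None:
            raise KeyError(f"no service provisioned for {uri}")
        return handler(uri)
```

   The application never inspects the URI for a prefix or keyword, so an
   administrator's existing mnemonic rules cannot conflict with it.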

   As we have shown, SIP URIs represent an ideal, flexible mechanism for
   describing and naming service resources, regardless of whether the
   resources are queues, conferences, voice dialogs, announcements,
   voicemail treatments, or phone features.

2.8.  Invoker Independence

   With functional signaling, only the invoker of features in SIP needs
   to know exactly which feature they are invoking.  One of the primary
   benefits of this approach is that combinations of functional features
   work in SIP call control without requiring complex feature
   interaction matrices.  For example, let us examine the combination of
   a "transfer" of a call that is "conferenced".

   Alice calls Bob.  Alice silently "conferences in" her robotic
   assistant Albert as a hidden party.  Bob transfers Alice to Carol.
   If Bob asks Alice to Replace her leg with a new one to Carol, then
   both Alice and Albert should be communicating with Carol
   (transparently).

   Using the peer-to-peer model, this combination of features works fine
   if A is doing local mixing (Alice replaces Bob's dialog with
   Carol's), or if A is using a central mixer (the mixer replaces Bob's
   dialog with Carol's).  A clever implementation using the 3pcc model
   can generate similar results.

   New extensions to the SIP Call Control Framework should attempt to
   preserve this property.

2.9.  Billing issues

   Billing in the PSTN is typically based on who initiated a call.  At
   the moment billing in a SIP network is neither consistent with
   itself, nor with the PSTN.  (A billing model for SIP should allow for
   both PSTN-style billing, and non-PSTN billing.)  The example below
   demonstrates one such inconsistency.

   Alice places a call to Bob.  Alice then blind transfers Bob to Carol
   through a PSTN gateway.  In current usage of REFER, Bob may be billed
   for a call he did not initiate (his UA originated the outgoing dialog
   however).  This is not necessarily a terrible thing, but it
   demonstrates a security concern (Bob must have appropriate local
   policy to prevent fraud).  Also, Alice may wish to pay for Bob's
   session with Carol.  There should be a way to signal this in SIP.

   Likewise a Replacement call may maintain the same billing
   relationship as a Replaced call, so if Alice first calls Carol, then
   asks Bob to Replace this call, Alice may continue to receive a bill.

   Further work in SIP billing should define a way to set or discover
   the direction of billing.

3.  Catalog of call control actions and sample features

   Call control actions can be categorized by the dialogs upon which
   they operate.  The actions may involve a single or multiple dialogs.
   These dialogs can be early or established.  Multiple dialogs may be
   related in a conversation space to form a conference or other
   interesting media topologies.

   It should be noted that it is desirable to provide a means by which a
   party can discover the actions that may be performed on a dialog.
   The interested party may be independent or related to the dialogs.
   One means of accomplishing this is through the ability to define and
   obtain URIs for these actions as described in section .

   Below are listed several call control "actions" that establish or
   modify dialogs and relate the participants in a conversation space.
   The names of the actions listed are for descriptive purposes only
   (they are not normative).  This list of actions is not meant to be
   exhaustive.

   In the examples, all actions are initiated by the user "Alice"
   represented by UA "A".

3.1.  Remote Call Control Actions on Early Dialogs

   The following are a set of actions that may be performed on a single
   early dialog.  These actions can be thought of as a set of remote
   control operations.  For example an automaton might perform the
   operation on behalf of a user.  Alternatively a user might use the
   remote control in the form of an application to perform the action on
   the early dialog of a UA that may be out of reach.  All of these
   actions correspond to telling the UA how to respond to a request to
   establish an early dialog.  These actions provide useful
   functionality for PDA, PC and server based applications that desire
   the ability to control a UA.  A proposed mechanism for this type of
   functionality is described in Remote Call Control
   [I-D.audet-sipping-feature-ref].

3.1.1.  Remote Answer

   A dialog is in some early dialog state such as 180 Ringing.  It may
   be desirable to tell the UA to answer the dialog.  That is, tell it
   to send a 200 OK response to establish the dialog.

3.1.2.  Remote Forward or Put

   It may be desirable to tell the UA to respond with a 3xx class
   response to forward an early dialog to another UA.

3.1.3.  Remote Busy or Error Out

   It may be desirable to instruct the UA to send an error response such
   as 486 Busy Here.

3.2.  Remote Call Control Actions on Single Dialogs

   There is another useful set of actions that operate on a single
   established dialog.  These operations are useful in building
   productivity applications for aiding users to control their phone.
   For example, a Customer Relationship Management (CRM) application
   that sets up calls for a user eliminates the need for the user to
   actually enter an address.  These operations can also be thought of
   as remote control actions.  A proposed mechanism for this type of
   functionality is described in Remote Call Control
   [I-D.audet-sipping-feature-ref].

3.2.1.  Remote Dial

   This action instructs the UA to initiate a dialog.  This action can
   be performed using the REFER method.

3.2.2.  Remote On and Off Hold

   This action instructs the UA to put an established dialog on hold.
   Though this operation can conceptually be performed with the REFER
   method, there are no semantics defined as to what the referred party
   should do with the SDP.  There is no way to distinguish between the
   desire to go on or off hold on a per media stream basis.

3.2.3.  Remote Hangup

   This action instructs the UA to terminate an early or established
   dialog.  A REFER request with the following Refer-To URI and Target-
   Dialog header field [RFC4538] performs this action.  Note: this
   example does not show the full set of header fields.

   REFER sip:carol@client.chicago.net SIP/2.0
   Refer-To: sip:bob@babylon.biloxi.example.com;method=BYE
   Target-Dialog: 13413098;local-tag=879738;remote-tag=023214
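
   For illustration only, the partial request above could be assembled
   as in the Python sketch below.  The function name and its parameters
   are hypothetical, and like the example above the output omits the
   other header fields a complete REFER request requires.

```python
def build_remote_hangup(target_ua: str, peer_uri: str, call_id: str,
                        local_tag: str, remote_tag: str) -> str:
    """Return the start line and the two header fields shown in the text."""
    lines = [
        f"REFER {target_ua} SIP/2.0",
        # method=BYE asks the recipient to send a BYE rather than an INVITE.
        f"Refer-To: {peer_uri};method=BYE",
        # Target-Dialog [RFC4538] names the dialog the request applies to.
        f"Target-Dialog: {call_id};local-tag={local_tag}"
        f";remote-tag={remote_tag}",
    ]
    # SIP lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```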

3.3.  Call Control Actions on Multiple Dialogs

   These actions apply to a set of related dialogs.

3.3.1.  Transfer

   This section describes how call transfer can be achieved using
   centralized (3pcc) and peer-to-peer (REFER) approaches.

   The conversation space changes as follows:

    before            after
   { A , B }  -->   { C , B }

   A replaces itself with C.

   To make this happen using the peer-to-peer approach, "A" would send
   two SIP requests.  A shorthand for those requests is shown below:

   REFER B  Refer-To:C
   BYE B

   To make this happen instead using the 3pcc approach, the controller
   sends requests represented by the shorthand below:

   INVITE C (w/SDP of B)
   reINVITE B (w/SDP of C)
   BYE A

   Features enabled by this action:

   - blind transfer
   - transfer to a central mixer (some type of conference or forking)
   - transfer to park server (park)
   - transfer to music on hold or announcement server
   - transfer to a "queue"
   - transfer to a service (such as Voice Dialogs service)
   - transition from local mixer to central mixer

   This action is frequently referred to as "completing an attended
   transfer".  It is described in more detail in
   [I-D.ietf-sipping-cc-transfer].

   Note that if a transfer requires URI hiding or privacy, then the 3pcc
   approach can more easily implement this.  For example, if the URI of
   C needs to be hidden from B, then the use of 3pcc helps accomplish
   this.
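
   The transfer action can be modeled as a small operation on a set of
   participants.  The Python sketch below is only an illustration: the
   function names are hypothetical, and the returned strings reproduce
   the shorthand used in this section rather than real SIP messages.

```python
from typing import FrozenSet, List


def transfer(space: FrozenSet[str], old: str, new: str) -> FrozenSet[str]:
    """Replace participant `old` with `new`: { A , B } --> { C , B }."""
    if old not in space:
        raise ValueError(f"{old} is not in the conversation space")
    return (space - {old}) | {new}


def peer_to_peer_requests(other: str, target: str) -> List[str]:
    """Shorthand for the requests the transferor sends (REFER approach)."""
    return [f"REFER {other}  Refer-To:{target}", f"BYE {other}"]


def third_party_requests(transferor: str, other: str,
                         target: str) -> List[str]:
    """Shorthand for the requests the controller sends (3pcc approach)."""
    return [
        f"INVITE {target} (w/SDP of {other})",
        f"reINVITE {other} (w/SDP of {target})",
        f"BYE {transferor}",
    ]
```

   Both request lists lead to the same resulting conversation space;
   they differ only in which entity sends the requests.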

3.3.2.  Take

   The conversation space changes as follows:

   { B , C } --> { B , A }

   A forcibly replaces C with itself.  In most uses of this primitive, A
   is just "un-replacing" itself.

   Using the peer-to-peer approach, "A" sends:

    INVITE B  Replaces: <dialog between B and C>

   Using the 3pcc approach (all requests sent from controller)

    INVITE A (w/SDP of B)
    reINVITE B (w/SDP of A)
    BYE C

   Features enabled by this action:

   - transferee completes an attended transfer
   - retrieve from central mixer (not recommended)
   - retrieve from music on hold or park
   - retrieve from queue
   - call center take
   - voice portal resuming ownership of a call it originated
   - answering-machine style screening (pickup)
   - pickup of a ringing call (i.e. early dialog)

   Note that picking up a ringing call has some interesting additional
   requirements.  First, it is an early dialog as opposed to an
   established dialog.  Second, the party that is to pick up the call
   may wish to do so only while it is an early dialog.  That is, in the
   race condition where the ringing UA accepts just before it receives
   signaling from the party wishing to take the call, the taking party
   wishes to yield or cancel the take.  The goal is to avoid yanking an
   answered call from the called party.

   This action is described in Replaces [RFC3891] and in
   [I-D.ietf-sipping-cc-transfer].
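
   RFC 3891 defines an "early-only" parameter on the Replaces header
   field that addresses the pickup race described above.  The Python
   sketch below shows how a taking party might form that header field;
   the function name and the example dialog identifiers are
   hypothetical.

```python
def replaces_header(call_id: str, to_tag: str, from_tag: str,
                    early_only: bool = False) -> str:
    """Build a Replaces header field identifying the dialog to take."""
    value = f"Replaces: {call_id};to-tag={to_tag};from-tag={from_tag}"
    if early_only:
        # Ask the recipient to replace the dialog only while it is early,
        # so an already answered call is not taken from the called party.
        value += ";early-only"
    return value
```

   A pickup application would set early_only, while retrieving a parked
   or held call would not, since that dialog is already established.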

3.3.3.  Add

   Note that the following 4 actions are described in [RFC4579].

   This is merely adding a participant to a SIP conference.  The
   conversation space changes as follows:

   { A , B } --> { A , B , C }

   A adds C to the conversation.

   Using the peer-to-peer approach, adding a party using local mixing
   requires no signaling.  To transition from a 2-party call or a
   locally mixed conference to centrally mixing A could send the
   following requests:

    REFER B  Refer-To: conference-URI
    INVITE conference-URI
    BYE B

   To add a party to a conference:

    REFER C  Refer-To: conference-URI
                   or
    REFER conference-URI  Refer-To: C

   Using the 3pcc approach to transition to centrally mixed, the
   controller would send:

    INVITE mixer leg 1 (w/SDP of A)
    INVITE mixer leg 2 (w/SDP of B)
    INVITE C (late SDP)
    reINVITE A (w/SDP of mixer leg 1)
    reINVITE B (w/SDP of mixer leg 2)
    INVITE mixer leg3 (w/SDP of C)

   To add a party to a SIP conference:

    INVITE C (late SDP)
    INVITE conference-URI (w/SDP of C)

   Features enabled:

   Features enabled:

   - standard conference feature
   - call recording
   - answering-machine style screening (screening)

3.3.4.  Local Join

   The conversation space changes like this:

   { A , B } , { A , C }  -->  { A , B , C }

           or like this

   { A , B } , { C , D }  -->  { A , B , C , D }

   A takes two conversation spaces and joins them together into a single
   space.

   Using the peer-to-peer approach, A can mix locally, or REFER the
   participants of both conversation spaces to the same central mixer
   (as in 3.3.5).

   For the 3pcc approach, the call flows for inserting participants, and
   joining and splitting conversation spaces are tedious yet
   straightforward, so these are left as an exercise for the reader.

   Features enabled:

   - standard conference feature
   - leaving a sidebar to rejoin a larger conference

3.3.5.  Insert

   The conversation space changes like this:

   { B , C } --> { A , B , C }

   A inserts itself into a conversation space.

   A proposed mechanism for signaling this using the peer-to-peer
   approach is to send a new header in an INVITE with "joining"
   [RFC3911] semantics.  For example:

   INVITE B Join: <dialog id of B and C>

   If B accepted the INVITE, B would accept responsibility to setup the
   dialogs and mixing necessary (for example: to mix locally or to
   transfer the participants to a central mixer)

   Features enabled:

   - barge-in
   - call center monitoring
   - call recording

3.3.6.  Split

   { A , B , C , D } --> { A , B } , { C , D }

   If using a central conference with peer-to-peer

    REFER C  Refer-To: conference-URI (new URI)
    REFER D  Refer-To: conference-URI (new URI)
    BYE C
    BYE D

   Features enabled:

   - sidebar conversations during a larger conference
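
   Local Join (Section 3.3.4) and Split can be viewed as inverse
   operations on conversation spaces.  The Python sketch below models
   them with sets of participant names; it is only an illustration, the
   function names are hypothetical, and no signaling is represented.

```python
from typing import FrozenSet, Tuple

Space = FrozenSet[str]


def local_join(first: Space, second: Space) -> Space:
    """Merge two spaces: { A , B } , { C , D } --> { A , B , C , D }."""
    return first | second


def split(space: Space, sidebar: Space) -> Tuple[Space, Space]:
    """Divide a space: { A , B , C , D } --> { A , B } , { C , D }."""
    if not sidebar <= space:
        raise ValueError("sidebar members must already be in the space")
    # The first element is the remaining space, the second the sidebar.
    return space - sidebar, sidebar
```

   Joining { A , B } and { A , C } yields { A , B , C }, because a
   participant common to both spaces appears only once in the result.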

3.3.7.  Near-fork

   A participates in two conversation spaces simultaneously:

   { A, B } --> { B , A } & { A , C }

   A is a participant in two conversation spaces such that A sends the
   same media to both spaces, and renders media from both spaces,
   presumably by mixing or rendering the media from both.  We can define
   that A is the "anchor" point for both forks, each of which is a
   separate conversation space.

   This action is purely local implementation (it requires no special
   signaling).  Local features such as switching calls between the
   background and foreground are possible using this media relationship.

3.3.8.  Far fork

   The conversation space diagram...

   { A, B } --> { A , B } & { B , C }

   A requests B to be the "anchor" of two conversation spaces.

   This is easily setup by creating a conference with two sub-
   conferences and setting the media policy appropriately such that B is
   a participant in both.  Media forking can also be setup using 3pcc as
   described in Section 5.1 of RFC3264 [RFC3264] (an offer/answer model
   for SDP).  The session descriptions for forking are quite complex.
   Controllers should verify that endpoints can handle forked-media, for
   example using prior configuration.

   Features enabled:

   - barge-in
   - voice portal services
   - whisper
   - key word detection
   - sending DTMF somewhere else

4.  Security Considerations

   Call Control primitives provide a powerful set of features that can
   be dangerous in the hands of an attacker.  To complicate matters,
   call control primitives are likely to be automatically authorized
   without direct human oversight.

   The class of attacks that are possible using these tools includes the
   ability to eavesdrop on calls, disconnect calls, redirect calls,
   render irritating content (including ringing) at a user agent, cause
   an action that has billing consequences, subvert billing (theft-of-
   service), and obtain private information.  Call control extensions
   must take extra care to describe how these attacks will be prevented.

   We can also make some general observations about authorization and
   trust with respect to call control.  The security model is
   dramatically dependent on the signaling model chosen (see Section
   3.2).

   Let us first examine the security model used in the 3pcc approach.
   All signaling goes through the controller, which is a trusted entity.
   Traditional SIP authentication and hop-by-hop encryption and message
   integrity work fine in this environment, but end-to-end encryption
   and message integrity may not be possible.

   When using the peer-to-peer approach, call control actions and
   primitives can be legitimately initiated by a) an existing
   participant in the conversation space, b) a former participant in the
   conversation space, or c) an entity trusted by one of the
   participants.  For example, a participant always initiates a
   transfer; a retrieve from Park (a take) is initiated on behalf of a
   former participant; and a barge-in (insert or far-fork) is initiated
   by a trusted entity (an operator for example).

   Authenticating requests by an existing participant or a trusted
   entity can be done with baseline SIP mechanisms.  In the case of
   features initiated by a former participant, these should be protected
   against replay attacks by using a unique name or identifier per
   invocation.  The Replaces header exhibits this behavior as a
   by-product of its operation (once a Replaces operation is successful,
   the dialog being Replaced no longer exists).  For other requests, a
   "one-time" Request-URI may be provided to the feature invoker.

   To authorize call control primitives that trigger special behavior
   (such as an INVITE with Replaces or Join semantics), the receiving
   user agent may have trouble finding appropriate credentials with
   which to challenge or authorize the request, as the sender may be
   completely unknown to the receiver, except through the introduction
   of a third party.  These credentials need to be passed transitively
   in some way or fetched in an event body, for example.
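
   The replay protection suggested above for requests from a former
   participant (a unique identifier per invocation, or a "one-time"
   Request-URI) can be illustrated with a small sketch.  This is only
   one possible arrangement under stated assumptions: the class name
   OneTimeUriRegistry, the "feature-token" layout of the user part, and
   the example domain are all hypothetical and are not defined by this
   document or by SIP.

```python
import secrets


class OneTimeUriRegistry:
    """Hypothetical helper: issues Request-URIs valid for one invocation."""

    def __init__(self, domain):
        self.domain = domain
        self._pending = {}  # token -> feature name still awaiting use

    def issue(self, feature):
        # An unguessable token gives each invocation a unique name.
        # Assumption: the feature name itself contains no "-".
        token = secrets.token_urlsafe(16)
        self._pending[token] = feature
        return f"sip:{feature}-{token}@{self.domain}"

    def redeem(self, request_uri):
        # Return the feature on first use; None on replay or unknown URI.
        if not request_uri.startswith("sip:"):
            return None
        user = request_uri[4:].split("@", 1)[0]
        _feature, _sep, token = user.partition("-")
        return self._pending.pop(token, None)
```

   Because the token is removed when it is first redeemed, a second
   request that presents the same Request-URI is rejected, which mirrors
   the way a dialog can be Replaced only once.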

5.  IANA Considerations

   This document requires no action by IANA.

6.  Appendix A: Example Features

   Primitives are defined in terms of their ability to provide features.
   These example features should require an amply robust set of services
   to demonstrate a useful set of primitives.  They are described here
   briefly.  Note that the descriptions of these features are
   non-normative.  Note also that this document describes a mixture of
   both features originating in the world of telephones, and features
   that are clearly Internet oriented.

6.1.  Attended Transfer

   In Attended Transfer [I-D.ietf-sipping-cc-transfer] the transferring
   party establishes a session with the transfer target before
   completing the transfer.

6.2.  Auto Answer

   In Auto Answer, calls to a certain address or URI answer immediately
   via a speakerphone.  The Answer-Mode [RFC5373] header field can be
   used for this feature.

6.3.  Automatic Callback

   In Automatic Callback [RFC5359], Alice calls Bob, but Bob is busy.
   Alice would like Bob to call her automatically when he is available.
   When Bob hangs up, Alice's phone rings.  When Alice answers, Bob's
   phone rings.  Bob answers and they talk.

6.4.  Barge-in

   In Barge-in, Carol interrupts Alice who has a call in progress with
   Bob.  In some variations, Alice forcibly joins a new conversation
   with Carol, in other variations, all three parties are placed in the
   same conversation (basically a 3-way conference).  Barge-in works the
   same as call monitoring except that it must indicate that the send
   media stream is to be mixed so that all of the other parties can hear
   the stream from the UA which is barging in.

6.5.  Blind Transfer

   In Blind Transfer [I-D.ietf-sipping-cc-transfer], Alice is in a
   conversation with Bob. Alice asks Bob to contact Carol, but makes no
   attempt to contact Carol independently.  In many implementations,
   Alice does not verify Bob's success or failure in contacting Carol.

6.6.  Call Forwarding

   In call forwarding [RFC5359], before a dialog is accepted it is
   redirected to another location, for example, because the originally
   intended recipient is busy, does not answer, is disconnected from the
   network, or has configured all requests to go somewhere else.

6.7.  Call Monitoring

   Call monitoring is a Join [RFC3911] operation.  For example, a call
   center supervisor joins an in-progress call for monitoring purposes.
   The monitoring UA sends a Join to the dialog it wants to listen to.
   It is able to discover the dialog via the dialog state on the
   monitored UA.  The monitoring UA sends SDP in the INVITE that
   indicates receive-only media.  Because the UA is only monitoring, it
   does not matter whether the UA indicates that it wishes the send
   stream to be mixed or point-to-point.
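
   As an illustration of the monitoring request described above, the
   following sketch assembles the text of an INVITE that carries a Join
   header and receive-only SDP.  It is deliberately abbreviated: headers
   that a real request must carry (Via, Call-ID, CSeq, Max-Forwards,
   Contact) are omitted, and the function name, addresses, port, and
   tag values are hypothetical.

```python
def build_monitor_invite(target_uri, monitor_uri, call_id, to_tag, from_tag):
    """Return the text of an abbreviated, listen-only INVITE with Join."""
    sdp = "\r\n".join([
        "v=0",
        "o=monitor 0 0 IN IP4 192.0.2.10",
        "s=-",
        "c=IN IP4 192.0.2.10",
        "t=0 0",
        "m=audio 49170 RTP/AVP 0",
        "a=recvonly",  # the monitoring UA only receives media
    ]) + "\r\n"
    headers = [
        f"INVITE {target_uri} SIP/2.0",
        f"To: <{target_uri}>",
        f"From: <{monitor_uri}>;tag=mon-1",
        # Join names the dialog to be joined by its Call-ID and both tags.
        f"Join: {call_id};to-tag={to_tag};from-tag={from_tag}",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp
```

   The Call-ID and tags passed to the function would come from the
   dialog state that the monitoring UA obtained from the monitored UA.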

6.8.  Call Park

   In Call Park [RFC5359], a participant parks a call (essentially puts
   the call on hold), and then retrieves it at a later time (typically
   from another location).  Call park requires the ability to: put a
   dialog some place, advertise it to users in a pickup group and to
   uniquely identify it in a means that can be communicated (including
   human voice).  The dialog can be held locally on the UA parking the
   dialog or alternatively transferred to the park service for the
   pickup group.  The parked dialog then needs to be labeled (e.g. orbit
   12) in a way that can be communicated to the party that is to pick up
   the call.  The UAs in the pickup group discover the parked dialog(s)
   via the dialog package from the park service.  If the dialog is
   parked locally the park service merely aggregates the parked call
   states from the set of UAs in the pickup group.

6.9.  Call Pickup

   There are two different features that are called Call Pickup
   [RFC5359].  The first is the pickup of a parked dialog.  The UA from
   which the dialog is to be picked up subscribes to the dialog state of
   the park service or the UA that has locally parked the dialog.
   Dialogs that are parked should be labeled with an identifier.  The
   labels are used by the UA to allow the user to indicate which dialog
   is to be picked up.  The UA picking up the call invokes the URI in
   the call state that is labeled as replace-remote.

   The other call pickup feature involves picking up an early dialog
   (typically ringing).  A party picks up a call that was ringing at
   another location.  One variation allows the caller to choose which
   location, another variation just picks up any call in that user's
   "pickup group".  This feature uses some of the same primitives as the
   pick up of a parked call.  The call state of the ringing UA is
   advertised using the dialog package.  The UA that is to pick up the
   early dialog subscribes either directly to the ringing UA or to a
   service aggregating the states for UAs in the pickup group.  The call
   state identifies early dialogs.  The UA uses the call state(s) to
   help the user choose which early dialog is to be picked up.  The UA
   then invokes the URI in the call state labeled as replace-remote.
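
   The step in which the UA uses call state to find candidate early
   dialogs can be sketched as follows.  The example assumes that the
   call state arrives as a dialog-info XML document of the kind carried
   by the dialog package; the function name early_dialogs is
   hypothetical, and only the dialog identifier and Call-ID are
   extracted.  Locating and invoking the replace-remote URI is left out.

```python
import xml.etree.ElementTree as ET

# XML namespace assumed for dialog-info documents.
NS = {"di": "urn:ietf:params:xml:ns:dialog-info"}


def early_dialogs(dialog_info_xml):
    """Return (dialog id, call-id) pairs for dialogs in the early state."""
    root = ET.fromstring(dialog_info_xml)
    found = []
    for dialog in root.findall("di:dialog", NS):
        state = dialog.find("di:state", NS)
        if state is not None and (state.text or "").strip() == "early":
            found.append((dialog.get("id"), dialog.get("call-id")))
    return found
```

   A UA could present the returned list to the user, labeled in whatever
   way local policy chooses, so that the user can pick the call to
   answer.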

6.10.  Call Return

   In Call Return, Alice calls Bob.  Bob misses the call or is
   disconnected before he is finished talking to Alice.  Bob invokes
   Call Return, which calls Alice, even if Alice did not provide her
   real identity or location to Bob.

6.11.  Call Waiting

   In Call Waiting, Alice is in a call, then receives another call.
   Alice can place the first call on hold, and talk with the other
   caller.  She can typically switch back and forth between the callers.

6.12.  Click-to-Dial

   In Click-to-Dial [RFC5359], Alice looks in her company directory for
   Bob.  When she finds Bob, she clicks on a URI to call him.  Her phone
   rings (or possibly answers automatically), and when she answers,
   Bob's phone rings.  The application or server that hosts the
   Click-to-Dial application captures the URI to be dialed and can setup
   the call using 3pcc or can send a REFER request to the UA that is to
   dial the address.  As users sometimes change their mind or wish to
   give up listening to a ringing or voicemail-answered phone, this
   application illustrates the need to also have the ability to remotely
   hang up a call.

6.13.  Conference Call

   In a Conference Call [RFC4579], there are three or more active,
   visible participants in the same conversation space.

6.14.  Consultative Transfer

   In Consultative Transfer [I-D.ietf-sipping-cc-transfer], the
   transferring party establishes a session with the target and mixes
   both sessions together so that all three parties can participate,
   then disconnects leaving the transferee and transfer target with an
   active session.

6.15.  Distinctive Ring

   In Distinctive Ring, incoming calls have different ring cadences or
   sample sounds depending on the From party, the To party, or other
   factors.  The target UA either makes a local decision based on
   information in an incoming INVITE (To, From, Contact, Request-URI) or
   trusts an Alert-Info [RFC3261] header provided by the caller or
   inserted by a trusted proxy.  In the latter case, the UA fetches the
   content described in the URI (typically via http) and renders it to
   the user.
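
   The two alternatives above (a local decision versus a trusted
   Alert-Info header) can be sketched as a single selection function.
   The function name choose_ring, the shape of the local_rules table,
   and the "default" ring name are hypothetical; header values are
   assumed to have been reduced to bare strings by an earlier parsing
   step, and fetching the Alert-Info content is not shown.

```python
def choose_ring(headers, local_rules, trust_alert_info=False):
    """Pick a ring for an incoming INVITE.

    headers: dict of header name -> value (simplified, already parsed).
    local_rules: dict of (header name, value) -> local ring name.
    """
    alert = headers.get("Alert-Info")
    if trust_alert_info and alert:
        # Keep only the URI between the angle brackets; the UA would
        # then fetch and render the content it identifies.
        return alert.split(">", 1)[0].lstrip("<")
    for field in ("From", "To"):
        ring = local_rules.get((field, headers.get(field)))
        if ring:
            return ring
    return "default"
```

   Whether Alert-Info is trusted is left as an explicit parameter
   because, as noted above, it should be honored only when it comes
   from the caller or a proxy that the UA trusts.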

6.16.  Do Not Disturb

   In Do Not Disturb, Alice selects the Do Not Disturb option.  Calls to
   her either ring briefly or not at all and are forwarded elsewhere.
   Some variations allow specially authorized callers to override this
   feature and ring Alice anyway.  Do Not Disturb is best implemented in
   SIP using presence [RFC3856].

6.17.  Find-Me

   In Find-Me, Alice sets up complicated rules for how she can be
   reached (possibly using CPL (Call Processing Language) [RFC3880],
   presence [RFC3856], or other factors).  When Bob calls Alice, his
   call is eventually routed to a temporary Contact where Alice happens
   to be available.

6.18.  Hotline

   In Hotline, Alice picks up a phone and is immediately connected to
   the technical support hotline, for example.  Hotline is also
   sometimes known as a Ringdown line.

6.19.  IM Conference Alerts

   In IM Conference Alerts, a user receives a notification as an Instant
   Message whenever someone joins a conference they are also in.

6.20.  Inbound Call Screening

   In Inbound Call Screening, Alice doesn't want to receive calls from
   Matt.  Inbound Screening prevents Matt from disturbing Alice.  In
   some variations this works even if Matt hides his identity.

6.21.  Intercom

   In Intercom, Alice typically presses a button on a phone that
   immediately connects to another user or phone and causes that phone
   to play her voice over its speaker.  Some variations immediately
   setup two-way communications, other variations require another button
   to be pressed to enable a two-way conversation.  The UA initiates a
   dialog using INVITE and the Answer-Mode: Auto header field as
   described in [RFC5373].  The called UA accepts the INVITE with a 200
   OK and automatically enables the speakerphone.

   Alternatively this can be a local decision for the UA to auto answer
   based upon called party identification.
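
   The decision made by the called UA can be sketched as a small policy
   check.  This is an illustration only: the function name
   should_auto_answer and the trusted_callers set are hypothetical, the
   From value is assumed to have been reduced to a bare URI, and any
   parameters following the Answer-Mode value are simply ignored.

```python
def should_auto_answer(headers, trusted_callers):
    """Decide whether a called UA answers an INVITE without user action.

    headers: dict of header name -> value (simplified, already parsed).
    trusted_callers: set of caller URIs allowed to request auto answer.
    """
    value = headers.get("Answer-Mode", "")
    mode = value.split(";", 1)[0].strip().lower()
    caller = headers.get("From", "")
    # Honor a request for automatic answer only from callers that local
    # policy trusts; every other call alerts the user normally.
    return mode == "auto" and caller in trusted_callers
```

   Restricting automatic answer to a configured set of callers keeps an
   arbitrary caller from turning the phone's speaker on.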

6.22.  Message Waiting

   In Message Waiting [RFC3842], Bob calls Alice when she has stepped
   away from her phone.  When she returns, a visible or audible
   indicator conveys that someone has left her a voicemail message.  The
   message waiting indication may also convey how many messages are
   waiting, from whom, what time, and other useful pieces of
   information.

6.23.  Music on Hold

   In Music on Hold [RFC5359], when Alice places a call with Bob on
   hold, it replaces its audio with streaming content such as music,
   announcements, or advertisements.  Music on hold can be implemented
   in a number of ways.  One way is to transfer the held call to a
   holding service.  When the UA wishes to take the call off hold it
   basically performs a take on the call from the holding service.  This
   involves subscribing to call state on the holding service and then
   invoking the URI in the call state labeled as replace-remote.

   Alternatively music on hold can be performed as a local mixing
   operation.  The UA holding the call can mix in the music from the
   music service via RTP (i.e. an additional dialog) or RTSP or other
   streaming media source.  This approach is simpler from a protocol
   perspective (i.e. the held dialog does not move so there is less
   chance of losing it), however it does use more LAN bandwidth and
   resources on the UA.

6.24.  Outbound Call Screening

   In Outbound Call Screening, Alice is paged and unknowingly calls a
   PSTN pay-service telephone number in the Caribbean, but local policy
   blocks her call, and possibly informs her why.

6.25.  Pre-paid Calling

   In Pre-paid Calling, Alice pays for a certain currency or unit amount
   of calling value.  When she places a call, she provides her account
   number somehow.  If her account runs out of calling value during a
   call her call is disconnected or redirected to a service where she
   can purchase more calling value.

   For prepaid calling, the user's media always passes through a device
   that is trusted by the pre-paid provider.  This may be the other
   endpoint (for example a PSTN gateway).  In either case, an
   intermediary proxy or B2BUA can periodically verify the amount of
   time available on the pre-paid account, and use the session-timer
   extension to cause the trusted endpoint (gateway) or intermediary
   (media relay) to send a reINVITE before that time runs out.  During
   the reINVITE, the SIP intermediary can re-verify the account and
   insert another session-timer header.
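
   The periodic re-verification described above amounts to choosing,
   at each reINVITE, how long the next session interval may be.  A
   minimal sketch follows.  The function name, the 90-second floor, and
   the 1800-second ceiling are assumptions chosen for illustration, and
   the balance is assumed to be already expressed in seconds of calling
   time.

```python
MIN_SESSION_EXPIRES = 90  # seconds; assumed floor for a session interval


def next_session_interval(balance_seconds, max_interval=1800):
    """Pick the session interval to insert for a pre-paid call.

    Returns None when the remaining balance is too small to grant
    another interval, meaning the call should be ended or redirected
    to a service where more calling value can be purchased.
    """
    if balance_seconds < MIN_SESSION_EXPIRES:
        return None
    return min(balance_seconds, max_interval)
```

   With these assumed values, an account holding 600 seconds of credit
   is granted a 600-second interval, while an account holding 5000
   seconds is granted 1800 seconds and is checked again at the next
   refresh.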

   Note that while most pre-paid systems on the PSTN use an IVR to
   collect the account number and destination, this isn't strictly
   necessary for a SIP-originated prepaid call.  SIP requests and SIP
   URIs are sufficiently expressive to convey the final destination, the
   provider of the prepaid service, the location from which the user is
   calling, and the prepaid account they want to use.  If a pre-paid IVR
   is used, the mechanism described below (Voice Portals) can be
   combined as well.

6.26.  Presence-Enabled Conferencing

   In Presence-Enabled Conferencing, Alice wants to set up a conference
   call with Bob and Cathy when they all happen to be available (rather
   than scheduling a predefined time).  The server providing the
   application monitors their status, and calls all three when they are
   all "online", not idle, and not in another call.  This could be
   implemented using conferencing [RFC4579] and presence [RFC3856]
   primitives.

6.27.  Single Line Extension/Multiple Line Appearance

   In Single Line Extension/Multiple Line Appearances, a group of phones
   are all treated as "extensions" of a single line or AOR.  A call for
   one rings them all.  As soon as one answers, the others stop ringing.
   If any extension is actively in a conversation, another extension can
   "pick up" and immediately join the conversation.  This emulates the
   behavior of a home telephone line with multiple phones.  Incoming
   calls ring all the extensions through basic parallel forking.  Each
   extension subscribes to dialog events from each other extension.
   While one user has an active call, any other UA extension can insert
   itself into that conversation (it already knows the dialog
   information) in the same way as barge-in.

   When implemented using SIP, this feature is known as Shared
   Appearances of an AOR [I-D.ietf-bliss-shared-appearances].
   Extensions to the dialog package are used to convey appearance
   numbers (line numbers).

6.28.  Speakerphone Paging

   In Speakerphone Paging, Alice calls the paging address and speaks.
   Her voice is played on the speaker of every idle phone in a
   preconfigured group of phones.  Speakerphone paging can be
   implemented using either multicast or through a simple multipoint
   mixer.  In the multicast solution the paging UA sends a multicast
   INVITE with send-only media in the SDP (see also RFC3264).  The
   automatic answer and enabling of the speakerphone is a locally
   configured decision on the paged UAs.  The paging UA sends RTP via
   the multicast address indicated in the SDP.

   The multipoint solution is accomplished by sending an INVITE to the
   multipoint mixer.  The mixer is configured to automatically answer
   the dialog.  The paging UA then sends REFER requests for each of the
   UAs that are to become paging speakers (The UA is likely to send out
   a single REFER that is parallel forked by the proxy server).  The UAs
   performing as paging speakers are configured to automatically answer
   based upon caller identification (e.g.  To field, URI or Referred-To
   headers).

   Finally as a third option, the user agent can send a mass-invitation
   request to a conference server, which would create a conference and
   send INVITEs containing the Answer-Mode: Auto header field to all
   user agents in the paging group.

6.29.  Speed Dial

   In Speed Dial, Alice dials an abbreviated number, or enters an alias,
   or presses a special speed dial button representing Bob. Her action
   is interpreted as if she specified the full address of Bob.

6.30.  Voice Message Screening

   In Voice Message Screening, Bob calls Alice.  Alice is screening her
   calls, so Bob hears Alice's voicemail greeting.  Alice can hear Bob
   leave his message.  If she decides to talk to Bob, she can take the
   call back from the voicemail system, otherwise she can let Bob leave
   a message.  This emulates the behavior of a home telephone answering
   machine.

   At first, this is the same as Call Monitoring (Section 6.7).  In this
   case the voicemail service is one of the UAs.  The UA screening the
   message monitors the call on the voicemail service, and also
   subscribes to dialog information.  If the user screening their
   messages decides to answer, they perform a Take from the voicemail
   system (for example, send an INVITE with Replaces to the UA leaving
   the message).

6.31.  Voice Portal

   Voice Portal is a service that allows users to access a portal site
   using spoken dialog interaction.  For example, Alice needs to
   schedule a working dinner with her co-worker Carol.  Alice uses a
   voice portal to check Carol's flight schedule, find a restaurant near
   her hotel, make a reservation, get directions there, and page Carol
   with this information.  A voice portal is essentially a complex
   collection of voice dialogs used to access interesting content.  One
   of the most desirable call control features of a Voice Portal is the
   ability to start a new outgoing call from within the context of the
   Portal (to make a restaurant reservation, or return a voicemail
   message for example).  Once the new call is over, the user should be
   able to return to the Portal by pressing a special key, using some
   DTMF sequence (ex: a very long pound or hash tone), or by speaking a
   key word (ex: "Main Menu").

   In order to accomplish this, the Voice Portal starts with the
   following media relationship:

   { User , Voice Portal }

   The user then asks to make an outgoing call.  The Voice Portal asks
   the User to perform a Far-Fork.  In other words the Voice Portal
   wants the following media relationship:

           { Target , User }  &  { User , Voice Portal }

   The Voice Portal is now just listening for a key word or the
   appropriate DTMF.  As soon as the user indicates they are done, the
   Voice Portal takes the call from the old Target, and we are back to
   the original media relationship.

   This feature can also be used by the account number and phone number
   collection menu in a pre-paid calling service.  A user can press a
   DTMF sequence that presents them with the appropriate menu again.

6.32.  Voicemail

   In Voicemail, Alice calls Bob who does not answer or is not
   available.  The call forwards to a voicemail server which plays Bob's
   greeting and records Alice's message for Bob. An indication is sent
   to Bob that a new message is waiting, and he retrieves the message at
   a later date.  This feature is implemented using features such as
   Call Forwarding (Section 6.6) and the History-Info [RFC4244] header
   field or voicemail URI [RFC4458] convention and Message Waiting
   [RFC3842] features.

6.33.  Whispered Call Waiting

   In Whispered Call Waiting, Alice is in a conversation with Bob. Carol
   calls Alice.  Either Carol can "whisper" to Alice directly ("Can you
   get lunch in 15 minutes?"), or an automaton whispers to Alice
   informing her that Carol is trying to reach her.

7.  Acknowledgements

   The authors would like to acknowledge Ben Campbell for his
   contributions to the document and thank AC Mahendran, John Elwell,
   and Xavier Marjou for their detailed Working Group review of the
   document.

8.  Informative References

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3265]  Roach, A., "Session Initiation Protocol (SIP)-Specific
              Event Notification", RFC 3265, June 2002.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC5359]  Johnston, A., Sparks, R., Cunningham, C., Donovan, S., and
              K. Summers, "Session Initiation Protocol Service
              Examples", BCP 144, RFC 5359, October 2008.

   [RFC3725]  Rosenberg, J., Peterson, J., Schulzrinne, H., and G.
              Camarillo, "Best Current Practices for Third Party Call
              Control (3pcc) in the Session Initiation Protocol (SIP)",
              BCP 85, RFC 3725, April 2004.

   [RFC3515]  Sparks, R., "The Session Initiation Protocol (SIP) Refer
              Method", RFC 3515, April 2003.

   [RFC3891]  Mahy, R., Biggs, B., and R. Dean, "The Session Initiation
              Protocol (SIP) "Replaces" Header", RFC 3891,
              September 2004.

   [RFC3911]  Mahy, R. and D. Petrie, "The Session Initiation Protocol
              (SIP) "Join" Header", RFC 3911, October 2004.

   [RFC4235]  Rosenberg, J., Schulzrinne, H., and R. Mahy, "An INVITE-
              Initiated Dialog Event Package for the Session Initiation
              Protocol (SIP)", RFC 4235, November 2005.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC3680]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
              Package for Registrations", RFC 3680, March 2004.

   [RFC3856]  Rosenberg, J., "A Presence Event Package for the Session
              Initiation Protocol (SIP)", RFC 3856, August 2004.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353,
              February 2006.

   [I-D.ietf-sipping-app-interaction-framework]
              Rosenberg, J., "A Framework for Application Interaction in
              the Session Initiation Protocol (SIP)",
              draft-ietf-sipping-app-interaction-framework-05 (work in
              progress), July 2005.

   [RFC5369]  Camarillo, G., "Framework for Transcoding with the Session
              Initiation Protocol (SIP)", RFC 5369, October 2008.

   [I-D.ietf-xcon-ccmp]
              Barnes, M., Boulton, C., Romano, S., and H. Schulzrinne,
              "Centralized Conferencing Manipulation Protocol",
              draft-ietf-xcon-ccmp-01 (work in progress),
              November 2008.

   [I-D.ietf-sipping-cc-transfer]
              Sparks, R. and A. Johnston, "Session Initiation Protocol
              Call Control - Transfer",
              draft-ietf-sipping-cc-transfer-12 (work in progress),
              March 2009.

   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
              (SIP) Call Control - Conferencing for User Agents",
              BCP 119, RFC 4579, August 2006.

   [RFC3840]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat,
              "Indicating User Agent Capabilities in the Session
              Initiation Protocol (SIP)", RFC 3840, August 2004.

   [RFC3841]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
              Preferences for the Session Initiation Protocol (SIP)",
              RFC 3841, August 2004.

   [RFC3087]  Campbell, B. and R. Sparks, "Control of Service Context
              using SIP Request-URI", RFC 3087, April 2001.

   [I-D.audet-sipping-feature-ref]
              Audet, F., Johnston, A., Mahy, R., and C. Jennings,
              "Feature Referral in the Session Initiation Protocol
              (SIP)", draft-audet-sipping-feature-ref-00 (work in
              progress), February 2008.

   [RFC4240]  Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network
              Media Services with SIP", RFC 4240, December 2005.

   [RFC4458]  Jennings, C., Audet, F., and J. Elwell, "Session
              Initiation Protocol (SIP) URIs for Applications such as
              Voicemail and Interactive Voice Response (IVR)", RFC 4458,
              April 2006.

   [RFC4538]  Rosenberg, J., "Request Authorization through Dialog
              Identification in the Session Initiation Protocol (SIP)",
              RFC 4538, June 2006.

   [RFC3880]  Lennox, J., Wu, X., and H. Schulzrinne, "Call Processing
              Language (CPL): A Language for User Control of Internet
              Telephony Services", RFC 3880, October 2004.

   [RFC5373]  Willis, D. and A. Allen, "Requesting Answering Modes for
              the Session Initiation Protocol (SIP)", RFC 5373,
              November 2008.

   [RFC3842]  Mahy, R., "A Message Summary and Message Waiting
              Indication Event Package for the Session Initiation
              Protocol (SIP)", RFC 3842, August 2004.

   [I-D.ietf-bliss-shared-appearances]
              Johnston, A., Soroushnejad, M., and V. Venkataramanan,
              "Shared Appearances of a Session Initiation Protocol (SIP)
              Address of Record (AOR)",
              draft-ietf-bliss-shared-appearances-01 (work in progress),
              November 2008.

   [RFC4244]  Barnes, M., "An Extension to the Session Initiation
              Protocol (SIP) for Request History Information", RFC 4244,
              November 2005.

   [RFC4313]  Oran, D., "Requirements for Distributed Control of
              Automatic Speech Recognition (ASR), Speaker
              Identification/Speaker Verification (SI/SV), and Text-to-
              Speech (TTS) Resources", RFC 4313, December 2005.

Authors' Addresses

   Rohan Mahy
   Plantronics
   345 Encinal Street
   Santa Cruz, CA
   USA

   Email: rohan@ekabal.com

   Robert Sparks
   Tekelek

   Email: rjsparks@nostrum.com

   Jonathan Rosenberg
   Cisco Systems

   Email: jdrosen@cisco.com

   Dan Petrie
   SIP EZ

   Email: dpetrie@sipez.com

   Alan Johnston (editor)
   Avaya

   Email: alan@sipstation.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.