SIPPING Working Group                                        Mahy/Cisco
Internet Draft                                     Campbell/dynamicsoft
Document: draft-ietf-sipping-cc-framework-01.txt      Johnston/Worldcom
June 2002                                                Petrie/Pingtel
                                                   Rosenberg/dynamicsoft
Expires: December 2002                               Sparks/dynamicsoft

                 A Multi-party Application Framework for SIP

    Status of this Memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026 [RFC2026].

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
    other groups may also distribute working documents as Internet-
    Drafts. Internet-Drafts are draft documents valid for a maximum of
    six months and may be updated, replaced, or obsoleted by other
    documents at any time. It is inappropriate to use Internet-Drafts
    as reference material or to cite them other than as "work in
    progress."
    The list of current Internet-Drafts can be accessed at
       http://www.ietf.org/ietf/1id-abstracts.txt
    The list of Internet-Draft Shadow Directories can be accessed at
       http://www.ietf.org/shadow.html.

1 Abstract

    This document defines a framework and requirements for multi-party
    applications in SIP.  To enable discussion of multi-party
    applications we define an abstract call model for describing the
    media relationships required by many of these applications.  The
    model and actions described here are specifically chosen to be
    independent of the SIP signaling and/or mixing approach chosen to
    actually set up the media relationships.  In addition to its dialog
    manipulation aspect, this framework includes requirements for
    communicating related information and events such as conference and
    session state, and session history.  This framework also describes
    other goals which embody the spirit of SIP applications as used on
    the Internet.

2 Conventions used in this document

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in RFC-2119 [RFC2119].

                        SIP Multiparty Framework

    Table of Contents
    1   Abstract
    2   Conventions used in this document
    3   Motivation and Background
    3.1   Goals
    3.2   Example Features
    4   Key Concepts
    4.1   "Conversation Space" Model
    4.1.1   Comparison with Related Definitions
    4.2   Signaling Models
    4.3   Mixing Models
    4.3.1   (Single) End System Mixing
    4.3.2   Centralized Mixing
    4.3.3   Multicast and Multi-unicast conferences
    4.4   Conveying Information and Events
    4.5   Componentization and Decomposition
    4.5.1   Media Intermediaries
    4.5.2   Queue Server
    4.5.3   Parking Place
    4.5.4   Announcements and Voice Dialogs
    4.6   Use of URIs
    4.6.1   Naming Users in SIP
    4.6.2   Naming Services with SIP URIs
    4.7   Invoker Independence
    4.8   Billing issues
    5   Catalog of call control actions and sample features
    5.1   Early Dialog Actions
    5.1.1   Remote Answer
    5.1.2   Remote Forward or Put
    5.1.3   Remote Busy or Error Out
    5.2   Single Dialog Actions
    5.2.1   Remote Dial
    5.2.2   Remote On and Off Hold
    5.2.3   Remote Hangup
    5.3   Multi-dialog actions
    5.3.1   Transfer
    5.3.2   Take
    5.3.3   Add
    5.3.4   Local Join
    5.3.5   Insert
    5.3.6   Split
    5.3.7   Near-fork
    5.3.8   Far fork
    6   Putting it all together
    6.1   Feature Solutions
    6.1.1   Call Park
    6.1.2   Call Pickup
    6.1.3   Music on Hold
    6.1.4   Call Monitoring
    6.1.5   Barge-in
    6.1.6   Intercom
    6.1.7   Speakerphone paging
    6.1.8   Distinctive ring
    6.1.9   Voice message screening
    6.1.10  Single Line Extension
    6.1.11  Click-to-dial
    6.1.12  Pre-paid calling
    6.1.13  Voice Portal
    7   Security Considerations
    8   References
    9   Acknowledgments
    10  Author's Addresses

3 Motivation and Background

    The Session Initiation Protocol [SIP] was defined for the
    initiation, maintenance, and termination of sessions or calls
    between one or more users.  However, despite its origins as a large-
    scale multiparty conferencing protocol, SIP is used today primarily
    for point to point calls.  This two-party configuration is the focus
    of the SIP specification and most of its extensions.

    This document defines a framework and requirements for multi-party
    applications in SIP.  Most multi-party applications manipulate SIP
    dialogs (also known as call legs) to cause participants in a
    conversation to perceive specific media relationships.  In other
    protocols that deal with the concept of calls, this manipulation is
    known as call control.  In addition to its dialog manipulation
    aspect, "call control" also includes communicating information and
    events related to manipulating calls, including information and
    events dealing with session state and history, conference state,
    user state, and even message state.

3.1 Goals
    Based on input from the SIP community, the authors compiled the
    following set of goals for SIP call control and multiparty
    applications:

    - Define Primitives, Not Services.  Allow for a handful of robust
    yet simple mechanisms which can be combined to deliver features and
    services.  Throughout this document we refer to these simple
    mechanisms as "primitives".  Primitives should be sufficiently
    robust that when they are combined they can be used to build lots of
    services.  However, the goal is not to define a provably complete
    set of primitives. Note that while the IETF will NOT standardize
    behavior or services, it may define example services for
    informational purposes, as in [service examples].

    - Participant oriented.  The primitives should be designed to
    provide services which are oriented around the experience of the
    participants.  The authors observe that end users of features and
    services usually don't care how a media relationship is set up.
    Their ultimate experience is based only on the resulting media and
    other externally visible characteristics.

    - Signaling Model independent: Support both a central control and a
    peer-to-peer feature invocation model (and combinations of the two).
    Baseline SIP already supports a centralized control model described
    in [3pcc], and the SIP community has expressed a great deal of
    interest in peer-to-peer or distributed call control.  Some such
    primitives are already defined in [REFER] and [Replaces].

    - Mixing Model independent: The bulk of interesting multiparty
    applications involve mixing or combining media from multiple
    participants.  This mixing can be performed by one or more of the
    participants, or by a centralized mixing resource.  The experience
    of the participants should not depend on the mixing model used.
    While most examples in this document refer to audio mixing, the
    framework applies to any media type.  In this context a "mixer" is
    any entity that combines media in an appropriate, media-specific
    way.

    - Invoker oriented. Only the user who invokes a feature or a service
    needs to know exactly which service is invoked or why.  This is good
    because it allows new services to be created without requiring new
    primitives from all the participants; and it allows for much simpler
    feature authorization policies, for example, when participation
    spans organizational boundaries.  As discussed in section 4.7, this
    also avoids exponential state explosion when combining features.
    The invoker only has to manage a user interface or API to prevent
    local feature interactions.  All the other participants simply need
    to manage the feature interactions of a much smaller number of
    primitives.

    - Primitives make full use of URIs.  URIs are a very powerful
    mechanism for describing users and services.  They represent a
    plentiful resource which can be extremely expressive and easily
    routed, translated, and manipulated--even across organizational
    boundaries.  URIs can contain special parameters and informational
    headers which need only be relevant to the owner of the namespace
    (domain) of the URI.  Just as a user who selects an http: URL need
    not understand the significance and organization of the web site it
    references, a user may encounter a SIP URL which translates into an
    email-style group alias, which plays a pre-recorded message, or runs
    some complex call-handling logic.

    - Make use of SIP headers and SIP event packages to provide SIP
    entities with information about their environment.  These should
    include information about the status / handling of dialogs on other
    user agents, information about the history of other contacts
    attempted prior to the current contact, the status of participants,
    the status of conferences, user presence information, and the status
    of messages.

    - Encourage service decomposition, and design to make use of
    standard components using well-defined, simple interfaces.  Sample
    components include a SIP mixer, recording service, announcement
    server, and voice dialog server.  (This is not an exhaustive list).

    - Include authentication, authorization, policy, logging, and
    accounting mechanisms to allow these primitives to be used safely
    among mutually untrusted participants.  Some of these mechanisms may
    be used to assist in billing, but no specific billing system will be
    endorsed.

    - Permit graceful fallback to baseline SIP.  Definitions for new SIP
    call control extensions/primitives MUST describe a graceful way to
    fall back to baseline SIP behavior. Support for one primitive MUST
    NOT imply support for another primitive.

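    - There is no desire or goal to reinvent traditional models, such as
    the model used by the H.450 [H.450] family of protocols, JTAPI
    [JTAPI], or the CSTA [CSTA] call model.  In the opinion of the
    authors, these models share more characteristics with the
    traditional telephone network than with SIP.  As these other models
    do not share the design goals presented in this document, it would
    be a disservice to these other protocols and to SIP to try to
    shoehorn our new design goals into an existing model.

3.2 Example Features

    Primitives are defined in terms of their ability to provide
    features.  These example features should require an amply robust set
    of services to demonstrate a useful set of primitives.  They are
    described here briefly. Note that the descriptions of these features
    are non-normative.  Some of these features are used as examples in
    section 6 to demonstrate how some features may require certain media
    relationships.  Note also that this document describes a mixture of
    both features originating in the world of telephones, and features
    which are clearly Internet oriented.

    Example Features:

    Call Waiting - Alice is in a call, then receives another call.
    Alice can place the first call on hold, and talk with the other
    caller.  She can typically switch back and forth between the
    callers.

    Blind Transfer - Alice is in a conversation with Bob.  Alice asks
    Bob to contact Carol, but makes no attempt to contact Carol
    independently.  In many implementations, Alice does not verify Bob's
    success or failure in contacting Carol.

    Attended Transfer - The transferring party establishes a session
    with the transfer target before completing the transfer.

    Consultative Transfer - The transferring party establishes a session
    with the target and mixes both sessions together so that all three
    parties can participate, then disconnects, leaving the transferee
    and transfer target with an active session.

    Conference Call - Three or more active, visible participants in the
    same conversation space.

    Call Park - A call participant parks a call (essentially puts the
    call on hold), and then retrieves it at a later time (typically from
    another location).

    Call Pickup - A party picks up a call that was ringing at another
    location.  One variation allows the caller to choose which location,
    another variation just picks up any call in that user's "pickup
    group".

    Music on Hold - When Alice places a call with Bob on hold, it
    replaces its audio with streaming content such as music,
    announcements, or advertisements.

    Call Monitoring - A call center supervisor joins an in-progress
    call for monitoring purposes.

    Barge-in - Carol interrupts Alice who has an in-progress call with
    Bob.  In some variations, Alice forcibly joins a new conversation
    with Carol, in other variations, all three parties are placed in the
    same conversation (basically a 3-way conference).

    Hotline - Alice picks up a phone and is immediately connected to the
    technical support hotline, for example.

    Autoanswer - Calls to a certain address or location answer
    immediately via a speakerphone.

    Intercom - Alice typically presses a button on a phone which
    immediately connects to another user or phone and causes that phone
    to play her voice over its speaker.  Some variations immediately
    set up two-way communications, other variations require another
    button to be pressed to enable a two-way conversation.

    Speakerphone paging - Alice calls the paging address and speaks.
    Her voice is played on the speaker of every idle phone in a
    preconfigured group of phones.

    Speed dial - Alice dials an abbreviated number, or enters an alias,
    or presses a special speed dial button representing Bob.  Her action
    is interpreted as if she specified the full address of Bob.

    Call Return - Alice calls Bob.  Bob misses the call or is
    disconnected before he is finished talking to Alice.  Bob invokes
    Call Return which calls Alice, even if Alice did not provide her
    real identity or location to Bob.

    Inbound Call Screening - Alice doesn't want to receive calls from
    Matt.  Inbound Screening prevents Matt from disturbing Alice.  In
    some variations this works even if Matt hides his identity.

    Outbound Call Screening - Alice is paged and unknowingly calls a
    PSTN pay-service telephone number in the Caribbean, but local policy
    blocks her call, and possibly informs her why.

    Call Forwarding - Before a call is accepted it is redirected to
    another location, for example, because the originally intended
    recipient is busy, does not answer, is disconnected from the
    network, or configured all requests to go somewhere else.

    Message Waiting - Bob calls Alice when she steps away from her
    phone; when she returns, a visible or audible indicator conveys that
    someone has left her a voicemail message.  The message waiting
    indication may also convey how many messages are waiting, from whom,
    what time, and other useful pieces of information.

    Do Not Disturb - Alice selects the Do Not Disturb option.  Calls to
    her either ring briefly or not at all and are forwarded elsewhere.
    Some variations allow specially authorized callers to override this
    feature and ring Alice anyway.

    Distinctive ring - Incoming calls have different ring cadences or
    sample sounds depending on the From party, the To party, or other
    factors.

    Automatic Callback - Alice calls Bob, but Bob is busy.  Alice would
    like Bob to call her automatically when he is available.  When Bob
    hangs up, Alice's phone rings. When Alice answers, Bob's phone
    rings.  Bob answers and they talk.

    Find-Me - Alice sets up complicated rules for how she can be reached
    (possibly using [CPL], [presence] or other factors).  When Bob calls
    Alice, his call is eventually routed to a temporary Contact where
    Alice happens to be available.

    Whispered call waiting - Alice is in a conversation with Bob.  Carol
    calls Alice.  Either Carol can "whisper" to Alice directly ("Can you
    get lunch in 15 minutes?"), or an automaton whispers to Alice
    informing her that Carol is trying to reach her.

    Voice message screening - Bob calls Alice.  Alice is screening her
    calls, so Bob hears Alice's voicemail greeting.  Alice can hear Bob
    leave his message.  If she decides to talk to Bob, she can take the
    call back from the voicemail system, otherwise she can let Bob leave
    a message. This emulates the behavior of a home telephone answering
    machine.

    Presence-Enabled Conferencing - Alice wants to set up a conference
    call with Bob and Cathy when they all happen to be available (rather
    than scheduling a predefined time).  The server providing the
    application monitors their status, and calls all three when they are
    all "online", not idle, and not in another call.

    IM Conference Alerts - A user receives a notification as an Instant
    Message whenever someone joins a conference they are also in.

    Single Line Extension - A group of phones are all treated as
    "extensions" of a single line.  A call for one rings them all.  As
    soon as one answers, the others stop ringing.  If any extension is
    actively in a conversation, another extension can "pick up" and
    immediately join the conversation.  This emulates the behavior of a
    home telephone line with multiple phones.

    Click-to-dial - Alice looks in her company directory for Bob.  When
    she finds Bob, she clicks on a URL to call him.  Her phone rings (or
    possibly answers automatically), and when she answers, Bob's phone
    rings.

    Pre-paid calling - Alice pays for a certain currency or unit amount
    of calling value.  When she places a call, she provides her account
    number somehow.  If her account runs out of calling value during a
    call, her call is disconnected or redirected to a service where she
    can purchase more calling value.

    Voice Portal - A service that allows users to access a portal site
    using spoken dialog interaction.  For example, Alice needs to
    schedule a working dinner with her co-worker Carol. Alice uses a
    voice portal to check Carol's flight schedule, find a restaurant
    near her hotel, make a reservation, get directions there, and page
    Carol with this information.

4 Key Concepts

4.1 "Conversation Space" Model

    This document introduces the concept of an abstract "conversation
    space" (essentially a set of participants who believe they are
    all communicating among one another).  Each conversation space
    contains one or more participants.

    Participants are SIP User Agents which send original media to or
    terminate and receive media from other members of the conversation
    space.  Logically, every participant in the conversation space has
    access to all the media generated in that space (this is strictly
    true if all participants share a common media type).  A SIP User
    Agent which does not contribute or consume any media is NOT a
    participant; nor is a user agent which merely forwards, transcodes,
    mixes, or selects media originating elsewhere in the conversation
    space.  [Note that a conversation space consists of zero or more SIP
    calls or SIP conferences.  A conversation space is similar to the
    definition of a "call" in some other call models.]

    Participants may represent human users or non-human users (referred
    to as robots or automatons in this document).  Some participants may
    be hidden within a conversation space. Some examples of hidden
    participants include: robots which generate tones, images, or
    announcements during a conference to announce users arriving and
    departing, a human call center supervisor monitoring a conversation
    between a trainee and a customer, and robots which record media for
    training or archival purposes.

    Participants may also be active or passive.  Active participants are
    expected to be intelligent enough to leave a conversation space when
    they no longer desire to participate.  (An attentive human
    participant is obviously active.)  Some robotic participants (such
    as a voice messaging system, an instant messaging agent, or a voice
    dialog system) may be active participants if they can leave the
    conversation space when there is no human interaction.  Other robots
    (for example our tone generating robot from the previous example)
    are passive participants.  A human participant "on-hold" is passive.

    An example diagram of a conversation space can be shown as a
    "bubble" or oval, or as a "set" in curly or square brace notation.
    Each set, oval, or "bubble" represents a conversation space. Hidden
    participants are shown in lowercase letters.

    { A , B }          [ A , B ]

       .-.                 .---.
      /   \               /     \
     /  A  \             / A   b \
    (       )           (         )
     \  B  /             \ C   D /
      \   /               \     /
       '-'                 '---'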
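
    The participant attributes and the set notation described above can
    be sketched in code.  The following Python fragment is only an
    illustration under stated assumptions: the class names Participant
    and ConversationSpace and their fields are hypothetical and are not
    defined by this framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """A user agent that sources or sinks media (hypothetical model)."""
    name: str
    hidden: bool = False   # e.g. a recording robot or a monitoring supervisor
    active: bool = True    # able to leave on its own when no longer interested


@dataclass
class ConversationSpace:
    """Hypothetical container for the participants of one conversation space."""
    participants: list = field(default_factory=list)

    def notation(self) -> str:
        # Hidden participants are written in lowercase, visible ones in uppercase.
        labels = [p.name.lower() if p.hidden else p.name.upper()
                  for p in self.participants]
        return "{ " + " , ".join(labels) + " }"

    def visible(self) -> list:
        # Participants that the other members are aware of.
        return [p for p in self.participants if not p.hidden]


space = ConversationSpace([
    Participant("A"),
    Participant("B", hidden=True, active=False),  # passive tone-generating robot
    Participant("C"),
    Participant("D"),
])
print(space.notation())  # { A , b , C , D }
```

    The printed set corresponds to the second bubble in the diagram,
    where the hidden participant is shown as a lowercase "b".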

4.1.1 Comparison with Related Definitions

    In SIP, a call is "an informal term that refers to some
    communication between peers, generally set up for the purposes of a
    multimedia conversation."  Obviously we cannot discuss normative
    behavior based on an intentionally vague definition.  The
    concept of a conversation space is needed because the SIP definition
    of call is not sufficiently precise for the purpose of describing
    the user experience of multiparty features.

    Do any other definitions convey the correct meaning?  SIP and [SDP]
    both define a conference as "a multimedia session identified by a
    common session description."  A session is defined as "a set of
    multimedia senders and receivers and the data streams flowing from
    senders to receivers."  Both of these definitions are heavily
    oriented toward multicast sessions with little differentiation among
    participants.  As such, neither is particularly useful for our
    purposes.  In fact, the definition of "call" in some call models is
    more similar to our definition of a conversation space.

    Some examples of the relationship between conversation spaces, SIP
    call legs, and SIP sessions are listed below.  In each example, a
    human user will perceive that there is a single call.

        A simple two-party call is a single conversation space, a single
        session, and a single call-leg.

        A locally mixed three-way call is two sessions and two call-
        legs.  It is also a single conversation space.

        A simple dial-in audio conference is a single conversation
        space, but is represented by as many call-legs and sessions as
        there are human participants.

        A multicast conference is a single conversation space, a single
        session, and as many call-legs as participants.
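
    The four examples above can be tabulated programmatically.  The
    sketch below is illustrative only: the function name and the model
    labels are hypothetical, and the "local-mix" row generalizes the
    three-way case (two sessions, two call-legs) to n - 1, which is an
    assumption not stated in this document.

```python
def relationship_counts(model: str, participants: int) -> dict:
    """Count conversation spaces, sessions, and call-legs for one example.

    `model` is one of the hypothetical labels used as keys below.
    """
    if participants < 2:
        raise ValueError("need at least two participants")
    n = participants
    table = {
        # model label: (sessions, call-legs)
        "two-party": (1, 1),
        "local-mix": (n - 1, n - 1),  # assumption: extends "2 and 2" for n = 3
        "dial-in": (n, n),            # one call-leg and session per participant
        "multicast": (1, n),          # one shared session, one call-leg each
    }
    sessions, call_legs = table[model]
    # In every example the user perceives exactly one conversation space.
    return {"conversation_spaces": 1, "sessions": sessions, "call_legs": call_legs}


for model, n in [("two-party", 2), ("local-mix", 3), ("dial-in", 5), ("multicast", 5)]:
    print(model, relationship_counts(model, n))
```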

4.2 Signaling Models

    Obviously to make changes to a conversation space, you must be able
    to use SIP signaling to cause these changes.  Specifically there
    must be a way to manipulate SIP dialogs (call legs) to move
    participants into and out of conversation spaces.  Although this is
    not as obvious, there also must be a way to manipulate SIP dialogs
    to include non-participant user agents which are otherwise involved
    in a conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
    transcoders, translators, or relays).

    Implementations may set up the media relationships described in the
    conversation space model using the approach described in [3pcc]. The
    3pcc approach relies on only the following 3 primitive operations:

        Create a new call-leg  (INVITE)
        Modify a call-leg      (reINVITE)
        Destroy a call-leg     (BYE)

    The main advantage of the 3pcc approach is that it only requires
    very basic SIP support from end systems to support call control
    features.  As such, third-party call control is a natural way to
    handle protocol conversion and mid-call features.  It also has the
    advantage and disadvantage that new features can/must be implemented
    in one place only (the controller), and neither requires enhanced
    client functionality, nor takes advantage of it.

    In addition, a peer-to-peer approach is discussed at length in this
    draft.  The primary drawback of the peer-to-peer model is additional
    end system complexity.  The benefits of the peer-to-peer model
    include:
    - state remains at the edges
    - call signaling need only go through participants involved
      (there are no additional points of failure)
    - peers can take advantage of end-to-end message integrity or
      encryption
    - setup time is shorter (fewer messages and round trips
      are required)

    The peer-to-peer approach relies on additional "primitive"
    operations, some of which are identified here.

        Replace an existing dialog
        Join a new dialog with an existing dialog [Join]
        Fork a new dialog with an existing dialog
        Locally do media forking (multi-unicast)
        Ask another UA to send a request on your behalf

    Many of the features, primitives, and actions described in this
    document also require some type of media mixing, combining, or
    selection as described in the next section.
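
    As a rough illustration of the three 3pcc primitives listed earlier
    in this section, the sketch below models a controller that only
    creates, modifies, and destroys call-legs.  It sends no real SIP
    messages; the Controller class, its method names, and the example
    URI are hypothetical and serve only to show how each operation maps
    to a SIP method.

```python
from itertools import count


class Controller:
    """Toy third-party call controller (hypothetical, no network I/O)."""

    def __init__(self):
        self.legs = {}          # leg id -> {"target": ..., "media": ...}
        self.log = []           # (SIP method, target) pairs, in order
        self._ids = count(1)

    def create_leg(self, target: str, media: str) -> int:
        # Create a new call-leg: corresponds to INVITE.
        leg_id = next(self._ids)
        self.legs[leg_id] = {"target": target, "media": media}
        self.log.append(("INVITE", target))
        return leg_id

    def modify_leg(self, leg_id: int, media: str) -> None:
        # Modify an existing call-leg: corresponds to reINVITE.
        self.legs[leg_id]["media"] = media
        self.log.append(("reINVITE", self.legs[leg_id]["target"]))

    def destroy_leg(self, leg_id: int) -> None:
        # Destroy a call-leg: corresponds to BYE.
        leg = self.legs.pop(leg_id)
        self.log.append(("BYE", leg["target"]))


ctl = Controller()
leg = ctl.create_leg("sip:bob@example.com", "audio")
ctl.modify_leg(leg, "audio (on hold)")
ctl.destroy_leg(leg)
print(ctl.log)
```

    All feature logic lives in the controller, which reflects the
    trade-off noted above: end systems need only baseline SIP, but
    every new feature must be implemented in that one place.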

4.3 Mixing Models

    SIP permits a variety of mixing models, which are discussed here
    briefly.  This topic is discussed more thoroughly in [conf-models].
    For brevity, only the two most popular conferencing models are
    significantly discussed in this document (local and centralized
    mixing).  Applications of the conversation spaces model to multicast
    and multi-unicast (full unicast mesh) conferences are left as an
    exercise for the reader.  Note that a distributed full mesh
    conference can be used for basic conferences, but does not easily
    allow for more complex conferencing actions like splitting, joining,
    and forking.

    Call control features should be designed to allow a mixer (local or
    centralized) to decide when to reduce a conference back to a 2-party
    call, or drop all the participants (for example if only two
    automatons are communicating).  The actual heuristics used to
    release calls are beyond the scope of this document, but may depend
    on properties in the conversation space, such as the number of
    active, passive, or hidden participants; and the send-only, receive-
    only, or send-and-receive orientation of various participants.

4.3.1  (Single) End System Mixing

    The first model we call "end system mixing". In this model, user A
    calls user B, and they have a conversation. At some point later, A
    decides to conference in user C. To do this, A calls C, using a
    completely separate SIP call. This call uses a different Call-ID,
    different tags, etc. There is no call set up directly between B and
    C.  No SIP extension or external signaling is needed.  A merely
    decides to locally join two call-legs.

       B     C
        \   /
         \ /
          A

    A receives media streams from both B and C, and mixes them. A sends
    a stream containing A's and C's streams to B, and a stream
    containing A's and B's streams to C. Basically, user A handles both
    signaling and media mixing.

4.3.2 Centralized Mixing

    In a restauraunt
   near her hotel, make centralized mixing model, all participants have a reservation, get directions there, pairwise SIP
    and page
   Carol media relationship with this information.

4 Key Concepts

4.1 "Conversation Space" Model

   This document introduces the concept mixer.  Three applications of an abstract "conversation
   space" (essentially
    centralized mixing are also discussed below.

    [diagram]

4.3.2.1 Dial-In Conference Servers

    Dial-In conference servers closely mirror dial-in conference bridges
    in the traditional PSTN. A dial-in conference server acts as a set of participants who believe they are
   all communicating among one another).  Each conversation space
   contains one or more participants.

   Participants are
    normal SIP User Agents which send original media to or
   terminate UA. Users call it, and receive the server maintains point to
    point SIP relationships with each user that calls in. The server
    takes the media from other members of the conversation
   space.  Logically, every participant in users who dial into the conversation space has
   access to all same conference,
    mixes them, and sends out the media generated appropriate mixed stream to each
    participant separately.

    As in that space (this other applications of centralized mixing, the conference is strictly
   true if all participants share
    identified by the request URI of the calls from each participant.
    This provides numerous advantages from a common media type).  A SIP User
   Agent which does not contribute or consume any media is NOT a
   participant; nor is a user agent which merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space.  [Note that a conversation space consists services and routing point
    of zero or more SIP
   calls or view. For example, one conference on the server might be known as
                        SIP Multiparty Framework

    sip:conference34@servers.com. All users who call
    sip:conference34@servers.com are mixed together. Dial-In conference
    servers are usually associated with pre-arranged conferences.  A conversation space
    However, the same model applies to ad-hoc conferences. An ad-hoc
    conference server creates the conference state when the first user
    joins, and destroys it when the last one leaves. The SIP interface
    is similar identical to the
   definition of pre-arranged case.

4.3.2.2 Ad-hoc Centralized Conferences

    In an ad-hoc centralized conference, two users A and B start with a "call" in
    normal SIP call. At some other call models.]

   Participants may represent human users or non-human users (referred point later, they decide to as robots or automatons in this document).  Some participants may
   be hidden within add a conversation space. Some examples third
    party. Instead of hidden
   participants include: robots which generate tones, images, or
   announcements during using end system mixing, they would prefer to use
    a conference central SIP mixer. Initially, A calls B. At some point, B decides
    to announce users arriving add user C to the call, and
   departing, begins the transition to a human call center supervisor monitoring conference
    server. The first step in this process is the discovery of a conversation
   between
    conference server that supports ad-hoc conferences. This can be done
    through static configuration, or through any of a trainee and a customer, and robots which record media for
   training or archival purposes.

   Participants may also be active or passive.  Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate.  (An attentive human
   participant is obviously active.)  Some robotic participants (such number of standard
    service discovery protocols, such as the Service Location Protocol
    [SLP]. Once the server is discovered, a voice messaging system, conference ID is chosen. The
    first participant to send an instant messaging agent, or a voice
   dialog system) may be active participants if they can leave INVITE to this URL creates the
   conversation space when there is no human interaction.  Other robots
   (for example our tone generating robot from initial
    conference state in the previous example) server.  SIP dialogs are passive participants.  A human manipulated (using
    any combination of 3pcc or peer-to-peer signaling) so that each
    participant "on-hold" is passive.

                       SIP Multiparty Framework

   An example diagram of sending media to the conference server. It is also
    possible to transition from a conversation space can be shown as end system mixed conference (even one
    with a
   "bubble" or ovals, or as complex connection topology), to a "set" in curly or square brace notation.
   Each set, oval, or "bubble" represents centralized conference
    server.

4.3.2.3 Dial-Out Conferences

    Dial-out conferences are a conversation space. Hidden
   participants simple variation on dial-in conferences.
    Instead of the users joining the conference by sending an INVITE to
    the server, the server chooses the users who are shown in lowercase letters.

   { A , B }          [ A , B ]

      .-.                 .---.
     /   \               /     \
    /  A  \             / A   b \
   (       )           (         )
    \  B  /             \ C   D /
     \   /               \     /
      '-'                 '---'

    to be members of the conference, and then sends them the INVITE.
    Typically dial out conferences are pre-arranged, with specific start
    times and an initial group membership list. However, there are other
    means for the dial-out server to determine the list of participants,
    including user presence [13].  Once the users accept or reject the
    call from the dial out server, the behavior of this system is
    identical to the dial-in server case.

4.3.3 Multicast and Multi-unicast conferences

    In these models, all endpoints send media to all other endpoints.
    Consequently every endpoint mixes their own media from all the other
    sources, and sends their own media to every other participant.

    [diagrams]

4.3.3.1 Large-Scale Multicast Conferences

    Large-scale multicast conferences were the original motivation for
    both the Session Description Protocol [SDP] and SIP. In a
    large-scale multicast conference, one or more multicast addresses
    are allocated to the conference. Each participant joins those
    multicast groups, and sends their media to those groups. Signaling
    is not sent to the multicast groups. The sole purpose of the
    signaling is to inform participants of which multicast groups to
    join. Large-scale multicast conferences are usually pre-arranged,
    with specific start and stop times.  However, multicast conferences
    do not need to be pre-arranged, so long as a mechanism exists to
    dynamically obtain a multicast address.

4.3.3.2 Centralized Signaling, Distributed Media

    In this conferencing model, there is a centralized controller, as in
    the dial-in and dial-out cases. However, the centralized server
    handles signaling only. The media is still sent directly between
    participants, using either multicast or multi-unicast. Multi-unicast
    is when a user sends multiple packets (one for each recipient,
    addressed to that recipient). This is referred to as a
    "Decentralized Multipoint Conference" in [H.323].

4.1.1 Comparison with Related Definitions

   In SIP, a call is "an informal term that refers to some
   communication between peers, generally set up for the purposes of a
   multimedia conversation."  Obviously we cannot discuss normative
   behavior based on such an intentionally vague definition.  The
   concept of a conversation space is needed because the SIP definition
   of call is not sufficiently precise for the purpose of describing
   the user experience of multiparty features.

   Do any other definitions convey the correct meaning?  SIP and [SDP]
   both define a conference as "a multimedia session identified by a
   common session description."  A session is defined as "a set of
   multimedia senders and receivers and the data streams flowing from
   senders to receivers."  Both of these definitions are heavily
   oriented toward multicast sessions with little differentiation among
   participants.  As such, neither is particularly useful for our
   purposes.  In fact, the definition of "call" in some call models is
   more similar to our definition of a conversation space.

   Some examples of the relationship between conversation spaces, SIP
   call legs, and SIP sessions are listed below.  In each example, a
   human user will perceive that there is a single call.

       A simple two-party call is a single conversation space, a single
       session, and a single call-leg.

       A locally mixed three-way call is two sessions and two
       call-legs.  It is also a single conversation space.

       A simple dial-in audio conference is a single conversation
       space, but is represented by as many call-legs and sessions as
       there are human participants.

       A large-scale multicast conference is a single conversation
       space, a single session, and as many call-legs as participants.

4.2 Signaling Models

   Obviously to make changes to a conversation space, you must be able
   to use SIP to cause these changes.  Specifically there must be a way
   to manipulate SIP dialogs (call legs) to move participants into and
   out of conversation spaces.  Although this is not as obvious, there
   also must be a way to manipulate SIP dialogs to include
   non-participant user agents which are otherwise involved in a
   conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
   transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using the approach described in [3pcc]. The
   3pcc approach relies on only the following 3 primitive operations:

       Create a new call-leg  (INVITE)
       Modify a call-leg      (reINVITE)
       Destroy a call-leg     (BYE)

   The main advantage of the 3pcc approach is that it only requires
   very basic SIP support from end systems to support call control
   features.  As such, third-party call control is a natural way to
   handle protocol conversion and mid-call features.  It also has the
   advantage and disadvantage that new features can/must be implemented
   in one place only (the controller), and neither requires enhanced
   client functionality, nor takes advantage of it.

4.3.3.3 Full Distributed Unicast Conferencing

    In this conferencing model, each participant has both a pairwise
    media relationship and a pairwise SIP relationship with every other
    participant (a full mesh).  This model requires
    a mechanism to maintain a consistent view of distributed state
    across the group.  This is a classic hard problem in computer
    science.  Also, this model does not scale well for large numbers of
    participants, because for <n> participants the number of media and
    SIP relationships is approximately n-squared.  As a result, this
    model is not generally available in commercial implementations; to
    the contrary it is primarily the topic of research or experimental
    implementations.  Note that this model assumes peer-to-peer
    signaling.

4.4 Conveying Information and Events

    Participants should have access to information about the other
    participants in a conversation space, so that this information can
    be rendered to a human user or processed by an automaton.  Although
    some of this information may be available from the Request-URI or
    To, From, Contact, or other SIP headers, another mechanism of
    reporting this information is necessary.

    Many applications are driven by knowledge about the progress of
    calls and conferences.  In general these types of events allow for
    the construction of distributed applications, where the application
    requires information on dialog and conference state, but is not
    necessarily co-resident with an endpoint user agent or conference
    server.  For example, a mixer involved in a conversation space may
    wish to provide URLs for conference status, and/or conference/floor
    control.

    The SIP [Events] architecture defines general mechanisms for
    subscription to and notification of events within SIP networks.  It
    introduces the notion of a package which is a specific
    "instantiation" of the events mechanism for a well-defined set of
    events.

   In addition, a peer-to-peer approach is discussed at length in this
   draft.  The primary drawback of the peer-to-peer model is additional
   end system complexity.  The benefits of the peer-to-peer model
   include:
   - state remains at the edges
   - call signaling need only go through participants involved
     (there are no additional points of failure)
   - peers can take advantage of end-to-end message integrity or
     encryption
   - setup time is shorter (fewer messages and round trips
     are required)

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here.

       Replace an existing dialog
       Join a new dialog with an existing dialog [Join]
       Fork a new dialog with an existing dialog
       Locally do media forking (multi-unicast)
       Ask another UA to send a request on your behalf

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

4.3 Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly.  This topic is discussed more thoroughly in [conf-models].
   For brevity, only the two most popular conferencing models are
   significantly discussed in this document (local and centralized
   mixing).  Applications of the conversation spaces model to multicast
   and multi-unicast (full unicast mesh) conferences are left as an
   exercise for the reader.  Note that a distributed full mesh
   conference can be used for basic conferences, but does not easily
   allow for more complex conferencing actions like splitting, joining,
   and forking.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating).  The actual heuristics used to
   release calls are beyond the scope of this document, but may depend
   on properties in the conversation space, such as the number of
   active, passive, or hidden participants; and the send-only,
   receive-only, or send-and-receive orientation of various
   participants.

    New event packages should be able to provide the status of a user's
    call-legs (dialogs), provide the status of conferences and their
    participants, provide user presence information, and provide the
    status of
    a user's messages.  While this is not an exhaustive list, these are
    sufficient to enable the sample features described in this
    document.

4.3.1  (Single) End System Mixing

   The first model we call "end system mixing". In this model, user A
   calls user B, and they have a conversation. At some point later, A
   decides to conference in user C. To do this, A calls C, using a
   completely separate SIP call. This call uses a different Call-ID,
   different tags, etc. There is no call set up directly between B and
   C.  No SIP extension or external signaling is needed.  A merely
   decides to locally join two call-legs.

   [diagram]

   A receives media streams from both B and C, and mixes them. A sends
   a stream containing A's and C's streams to B, and a stream
   containing A's and B's streams to C. Basically, user A handles both
   signaling and media mixing. B and C are unaware of the multi-party
   call, from a SIP perspective at least. From an RTP perspective, A is
   a mixer, and so the RTCP reports from A will contain SDES
   information that indicates the existence of an additional party in
   the media stream.

    A conference event package allows users to subscribe to information
    about an entire conference or conversation space.  This conference
    state could be provided by a conference server or mixing component
    (described in a later section) if centralized mixing is used, or
    gathered from relevant peers and merged into a cohesive set of
    state.  Notifications would convey information about the
    participants such as: the URL identifying each user, their
    status in the space (active, declined, departed), URLs to invoke
    other features (such as sidebar conversations), links to other
    relevant information (such as floor control policies), and if floor
    control policies are in place, the user's floor control status.  A
    dialog event package would provide information about all the dialogs
    the
    target user is maintaining, what conversations the user is
    participating in, and how these are correlated.  Concrete proposals
    for conference events and dialog events are described in [conf-pkg]
    and [dialog-pkg] respectively.

4.3.2 Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with the mixer.  Three applications of
   centralized mixing are also discussed below.

   [diagram]

4.3.2.1 Dial-In Conference Servers

   Dial-In conference servers closely mirror dial-in conference bridges
   in the traditional PSTN. A dial-in conference server acts as a
   normal SIP UA. Users call it, and the server maintains point to
   point SIP relationships with each user that calls in. The server
   takes the media from the users who dial into the same conference,
   mixes them, and sends out the appropriate mixed stream to each
   participant separately. The model is depicted in Figure 3.
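
   As a rough illustration of how a dial-in conference server might
   build a separate mixed stream for each participant, the following
   Python sketch computes, for every participant, the sum of all the
   other participants' audio samples (an "N-1" mix).  The function
   name mix_minus, the dictionary layout, and the use of plain integer
   sample lists are assumptions made for this example only; they are
   not defined by this framework, and a real mixer would also have to
   deal with codecs, timing, and clipping.

```python
from typing import Dict, List


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, per participant, the sum of all OTHER participants' samples.

    `frames` maps a participant name to one frame of audio samples.
    All frames are assumed to have the same length (hypothetical layout).
    """
    if not frames:
        return {}

    length = len(next(iter(frames.values())))

    # Sum every participant's samples once.
    total = [0] * length
    for samples in frames.values():
        if len(samples) != length:
            raise ValueError("all frames must have the same length")
        for i, sample in enumerate(samples):
            total[i] += sample

    # Each participant receives the total minus their own contribution.
    return {
        name: [t - s for t, s in zip(total, samples)]
        for name, samples in frames.items()
    }
```

   With three participants A, B, and C, the frame returned for A
   contains only B's and C's samples, so A does not hear its own
   audio reflected back.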

   Note that each UA (A,B,C,D) has a point to point SIP and RTP
   relationship with the conference server. Each call has a different
   Call-ID. Each user sends their own media to the server. The media
   delivered to user A by the server is the media mixed from users B, C
   and D. The media delivered to user B by the server is the media
   mixed from users A, C and D. The media delivered to user C by the
   server is the media mixed from users A, B and D. The media delivered
   to user D is the media mixed from users A, B and C (this is also
   known as a mix-minus configuration).

   As in other applications of centralized mixing, the conference is
   identified by the request URI of the calls from each participant.
   This provides numerous advantages from a services and routing point
   of view [9]. For example, one conference on the server might be
   known as sip:conference34@servers.com. All users who call
   sip:conference34@servers.com are mixed together. Dial-In conference
   servers are usually associated with pre-arranged conferences.
   However, the same model applies to ad-hoc conferences. An ad-hoc
   conference server creates the conference state when the first user
   joins, and destroys it when the last one leaves. The SIP and RTP
   interfaces are identical to the pre-arranged case.

    Note that user presence has a close relationship with these two
    proposed event packages. It is fundamental to the presence model
    that the information used to obtain user presence is constructed
    from any number of different input sources. Examples of such sources
    include SIP REGISTER requests and uploads of presence documents.
    These two packages can be considered another mechanism that allows a
    presence agent to determine the presence state of the user.
    Specifically, a user presence server can act as a subscriber for the
    call-leg and conference packages to obtain additional information
    that can be used to construct a presence document.

    The multi-party architecture should also provide a mechanism to get
    information about the status/handling of a dialog (for example,
    information about the history of other contacts attempted prior to
    the current contact).  Finally, the
    architecture should provide ample opportunities to present
    informational URIs which relate to calls, conversations, or dialogs
    in some way.  For example, consider the SIP Call-Info header, or
    Contact headers returned in a 300-class response.  Frequently
    additional information about a call or dialog can be fetched via
    non-SIP URIs.  For example, consider a web page for package tracking
    when calling a delivery company, or a web page with related
    documentation when joining a dial-in conference.  The use of URIs in
    the multiparty framework is discussed in more detail in Section 4.6.

4.5 Componentization and Decomposition

    This framework proposes a decomposed component architecture with a
    very loose coupling of services and components.  This means that a
    service (such as a conferencing server or an auto-attendant) need
    not be implemented as an actual server.  Rather, these services can
    be built by combining a few basic components in straightforward or
    arbitrarily complex ways.

    Since the components are easily deployed on separate boxes, by
    separate vendors, or even with separate providers, we achieve a
    separation of function that allows each piece to be developed in
    complete isolation.  We can also reuse existing components for new
    applications.  This allows rapid service creation, and the ability
    for services to be distributed across organizational domains
    anywhere in the Internet.

4.3.2.2 Ad-hoc Centralized Conferences

   In an ad-hoc centralized conference, two users A and B start with a
   normal SIP call. At some point later, they decide to add a third
   party. Instead of using end system mixing, they would prefer to use
   a conference server. Initially, A calls B. At some point, B decides
   to add user C to the call, and begins the transition to a conference
   server. The first step in this process is the discovery of a
   conference server that supports ad-hoc conferences. This can be done
   through static configuration, or through any of a number of standard
   service discovery protocols, such as the Service Location Protocol
   [SLP]. Once the server is discovered, a conference ID is chosen.
   This ID must be globally unique. The conference ID is then prepended
   to the server, and a SIP URL for the ad-hoc conference is formed.
   For example, if the server "a.servers.com" is used, and the unique
   ID is "a7hytaskp09878a", the SIP URL for this conference is
   sip:a7hytaskp09878a@a.servers.com. The first participant to send an
   INVITE to this URL creates the initial conference state in the
   server.  SIP dialogs are manipulated (using any combination of 3pcc
   or peer-to-peer signaling) so that each participant is sending media
   to the conference server. It is also possible to transition from an
   end system mixed conference (even one with a complex connection
   topology), to a centralized conference server.

    For many of these components it is also desirable to discover their
    capabilities, for example querying the ability of a mixer to host a
    10-dialog conference, or to reserve resources for a
    specific time.  These actions could be provided in the form of
    URLs, provided there is an a priori means of understanding their
    semantics.  For example if there is a published dictionary of
    operations, a way to query the service for the available operations
    and the associated URLs, the URL can be the interface for providing
    these service operations.  This concept is described in more detail
    in the context of dialog operations in section 4.6.

4.3.2.3 Dial-Out Conferences

   Dial-out conferences are a simple variation on dial-in conferences.
   Instead of the users joining the conference by sending an INVITE to
   the server, the server chooses the users who are to be members of
   the conference, and then sends them the INVITE. Typically dial out
   conferences are pre-arranged, with specific start times and an
   initial group membership list. However, there are other means for
   the dial-out server to determine the list of participants, including
   user presence [13]. The model in no way limits the means by which
   the server determines the set of users. Once the users accept or
   reject the call from the dial out server, the behavior of this
   system is identical to the dial-in server case of Section 4. Thus, a
   dial-out conference server will generally need to support dial-in
   access for the same conference, if it wishes to allow joining after
   the conference begins. Note that, from the participants'
   perspective, they will learn the conference identity (the URL) from
   the From field in the INVITE messages received from the server.

4.5.1 Media Intermediaries

    Media Intermediaries are not participants in any conversation space,
    although an entity which is also a media translator may also have a
    colocated participant component (for example a mixer which also
    announces the arrival of a new participant; the announcement portion
    is a participant, but the mixer itself is not).  Media
    intermediaries should be as transparent as possible to the end
    users--offering a useful, fundamental service; without getting in
    the
    way of new features implemented by participants.  Some common media
    intermediaries are described below.

4.5.1.1 Mixer

    A SIP mixer is a component that combines media from all dialogs in
    the same conversation in a media-specific way.  For example, the
    default combining for an audio conference would be an N-1
    configuration, while the same mixer might interleave text messages
    on a per-line basis.

    Conventions for specifying a mixing or conferencing service in a SIP
    URI are proposed in [ms-uri].

4.5.1.2 Transcoder

    A transcoder translates media from one encoding or format to another
    (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
    text/plain).

4.5.1.3 Media Relay

    A media relay terminates media and simply forwards it to a new
    destination without changing the content in any way.  Sometimes
    media relays are used to provide source IP address anonymity, to
    facilitate middlebox traversal, or to provide a trusted entity where
    media can be forcefully disconnected.

4.3.3 Multicast and Multi-unicast conferences

   In these models, all endpoints send media to all other endpoints.
   Consequently every endpoint mixes their own media from all the other
   sources, and sends their own media to every other participant.

   [diagrams]

4.3.3.1 Large-Scale Multicast Conferences

   Large-scale multicast conferences were the original motivation for
   both the Session Description Protocol [SDP] and SIP. In a
   large-scale multicast conference, one or more multicast addresses
   are allocated to the conference (more than one may be needed if
   layered encodings are in use). Each participant joins those
   multicast groups, and sends their media to those groups. Signaling
   is not sent to the multicast groups. The sole purpose of the
   signaling is to inform participants of which multicast groups to
   join. Large-scale multicast conferences are usually pre-arranged,
   with specific start and stop times (which is why this information
   exists in SDP).  Protocols such as the Session Announcement Protocol
   [SAP] are used to announce these conferences. However, multicast
   conferences do not need to be pre-arranged, so long as a mechanism
   exists to dynamically obtain a multicast address. So, if there are N
   participants, there will be point-to-point SIP relationships with
   pairs of participants. Each participant sends a single media stream
   to the group, and receives up to N-1 streams at any time. Note that
   the number of streams a user will receive depends on who is
   actually sending at any given time. If the stream is audio, and
   silence suppression is utilized, the number of streams a user will
   receive at any given time is equal to the number of users talking at
   any given time. Even for very large conferences, this is usually
   just a small number of users.

4.5.2 Queue Server

    A queue server is a location where calls can be entered into one of
    several FIFO (first-in, first-out) queues.  A queue server would
    subscribe to the presence of groups or individuals who are
    interested in its queues.  When detecting that a user is available
    to service a queue, the server redirects or transfers the last call
    in the relevant queue to the available user.  On a queue-by-queue
    basis, authorized users could also subscribe to the call state
    (dialog information) of calls within a queue.  Authorized users
    could use this information to effectively pluck (take) a call out of
    the queue (for example by sending an INVITE with a Replaces header
    to one of the user agents in the queue).

4.5.3 Parking Place

    A parking place is a location where calls can be terminated
    temporarily and then retrieved later.  While a call is "parked", it
    can receive media "on-hold" such as music, announcements, or
    advertisements.  Such a service could be further decomposed such
    that announcements or music are handled by a separate component.

4.5.4 Announcements and Voice Dialogs

    An announcement server is a server which can play digitized media
    (frequently audio), such as music or recorded speech.  These servers
    are typically accessible via SIP, HTTP, or RTSP.  An analogous
    service is a recording service which stores digitized media.  A
    convention for specifying announcements in SIP URIs is described in
    [ms-uri].  Likewise the same server could easily provide a service
    which records digitized media.

4.3.3.2 Centralized Signaling, Distributed Media

   In this conferencing model, there is a centralized controller, as in
   the dial-in and dial-out cases. However, the centralized server
   handles signaling only. The media is still sent directly between
   participants, using either multicast or multi-unicast. Multi-unicast
   is when a user sends multiple packets (one for each recipient,
   addressed to that recipient). This is referred to as a
   "Decentralized Multipoint Conference" in [H.323].

4.3.3.3 Full Distributed Unicast Conferencing

   In this conferencing model, each participant has both a pairwise
   media relationship and a pairwise SIP relationship with every other
   participant (a full mesh).  This model requires a mechanism to
   maintain a consistent view of distributed state across the group.
   This is a classic hard problem in computer science.  Also, this
   model does not scale well for large numbers of participants,
   because for <n> participants the number of media and SIP
   relationships is approximately n-squared.  As a result, this model
   is not generally available in commercial implementations; to the
   contrary it is primarily the topic of research or experimental
   implementations.  Note that this model assumes peer-to-peer
   signaling.

    A "voice dialog" is a model
    A "voice dialog" is a model of spoken interactive behavior between
    a human and an automaton which can include synthesized speech,
    digitized audio, recognition of spoken and DTMF key input,
    recording of spoken input, and interaction with call control.
    Dialogs frequently consist of forms or menus.  Forms present
    information and gather input; menus offer choices of what to do
    next.

    Spoken dialogs are a basic building block of applications which use
    voice.  Consider for example that a voice mail system, the
    conference-id and passcode collection system for a conferencing
    system, and complicated voice portal applications all require a
    voice dialog component.

4.5.4.1. Text-to-Speech and Automatic Speech Recognition

    Text-to-Speech (TTS) is a service which converts text into
    digitized audio.  TTS is frequently integrated into other
    applications, but when separated as a component, it provides
    greater opportunity for broad reuse.  Various interfaces to access
    standalone TTS services via HTTP, [CATS], and SIP
    ([app-components] and [ms-uri]) have been proposed.

    Automatic Speech Recognition (ASR) is a service which attempts to
    decipher digitized speech based on a proposed grammar.  Like TTS,
    ASR services can be embedded, or exposed so that many applications
    can take advantage of such services.  Various IP interfaces to ASR,
    such as CATS, have been proposed.

4.5.4.2. VoiceXML

    [VoiceXML] is a W3C recommendation that was designed to give
    authors control over the spoken dialog between users and
    applications.  The application and user take turns speaking: the
    application prompts the user, and the user in turn responds.  Its
    major goal is to bring the advantages of web-based development and
    content delivery to interactive voice response applications.  We
    believe that VoiceXML represents the ideal partner for SIP in the
    development of distributed IVR servers.  VoiceXML is an XML based
    scripting language for describing IVR services at an abstract
    level.  VoiceXML supports DTMF recognition, speech recognition,
    text-to-speech, and playing out of recorded media files.  The
    results of the data collected from the user are passed to a
    controlling entity through an HTTP POST operation.  The controller
    can then return another script, or terminate the interaction with
    the IVR server.

    A VoiceXML server also need not be implemented as a monolithic
    server.  Below is a diagram of a VoiceXML browser which is split
    into media and non-media handling parts.  The VoiceXML interpreter
    handles SIP dialog state and state within a VoiceXML document, and
    sends requests to the media component over another protocol (for
    example [RTSP] or CATS).

                        +-------------+
                        |             |
                        | VoiceXML    |
                        | Interpreter |
                        | (signaling) |
                        +-------------+
                          ^          ^
                          |          |
                      SIP |          | RTSP
                          |          |
                          |          |
                          v          v
             +-------------+        +-------------+
             |             |        |             |
             |  SIP UA     |   RTP  | RTSP Server |
             |             |<------>|   (media)   |
             |             |        |             |
             +-------------+        +-------------+

                Figure : Decomposed VoiceXML Server

    More details about the integration of SIP with VoiceXML are
    provided in [sip-vxml].

4.4 Conveying Information and Events

    Participants should have access to information about the other
    participants in a conversation space, so that this information can
    be rendered to a human user or processed by an automaton.  Although
    some of this information may be available from the Request-URI or
    To, From, Contact, or other SIP headers, another mechanism of
    reporting this information is necessary.  Note that the data
    reported by RTCP is insufficient for these purposes, as deletions
    and additions are not detectable in real-time, and SIP may set up
    sessions which do not involve RTP media.

    Many applications are driven by knowledge about the progress of
    calls and conferences.  In general these types of events allow for
    the construction of distributed applications, where the application
    requires information on dialog and conference state, but is not
    necessarily co-resident with an endpoint user agent or conference
    server.  For example, a mixer involved in a conversation space may
    wish to provide URLs for conference status, and/or conference/floor
    control.

    The SIP [Events] architecture defines general mechanisms for
    subscription to and notification of events within SIP networks.  It
    introduces the notion of a package which is a specific
    "instantiation" of the events mechanism for a well-defined set of
    events.

    New event packages should be able to provide the status of a user's
    call-legs (dialogs), provide the status of conferences and their
    participants, provide user presence information, and provide the
    status of a user's messages.  While this is not an exhaustive list,
    these are sufficient to enable the sample features described in
    this document.

    A conference event package allows users to subscribe to information
    about an entire conference or conversation space.  This conference
    state could be provided by a conference server or mixing component
    (described in Section 4.5) if centralized mixing is used, or
    gathered from relevant peers and merged into a cohesive set of
    state.  Notifications would convey information about the
    participants such as: the SIP URL identifying each user, their
    status in the space (active, declined, departed), URLs to invoke
    other features (such as sidebar conversations), links to other
    relevant information (such as floor control policies), and if floor
    control policies are in place, the user's floor control status.  A
    "call-leg" event package would provide information about all the
    dialogs the target user is maintaining, what conversations the user
    is participating in, and how these are correlated.  A concrete
    proposal for both conference events and call-leg events is
    described in [call-pkg].

    Note that user presence has a close relationship with these two
    proposed event packages.  It is fundamental to the presence model
    that the information used to obtain user presence is constructed
    from any number of different input sources.  Examples of such
    sources include SIP REGISTER requests and uploads of presence
    documents.  These two packages can be considered another mechanism
    that allows a presence agent to determine the presence state of the
    user.  Specifically, a user presence server can act as a subscriber
    for the call-leg and conference packages to obtain additional
    information that can be used to construct a presence document.

    The multi-party architecture should also provide a mechanism to get
    information about the status/handling of a dialog (for example,
    information about the history of other contacts attempted prior to
    the current contact).  Finally, the architecture should provide
    ample opportunities to present informational URIs which relate to
    calls, conversations, or dialogs in some way.  For example,
    consider the SIP Call-Info header, or Contact headers returned in a
    300-class response.  Frequently additional information about a call
    or dialog can be fetched via non-SIP URIs.  For example, consider a
    web page for package tracking when calling a delivery company, or a
    web page with related documentation when joining a dial-in
    conference.  The use of URIs in the multiparty framework is
    discussed in more detail in Section 4.6.

4.6 Use of URIs

    All naming in SIP uses URIs.  URIs in SIP are used in a plethora of
    contexts: the Request-URI; Contact, To, From, and *-Info headers;
    application/uri bodies; and embedded in email, web pages, instant
    messages, and ENUM records.  The request-URI identifies the user or
    service that the call is destined for.

    SIP URIs embedded in informational SIP headers, SIP bodies, and
    non-SIP content can also specify methods, special parameters,
    headers, and even bodies.  For example:

    sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098
      &To=<sip:bob@biloxi.com>;tag=879738
      &From=<sip:alice@atlanta.com>;tag=023214

    sip:bob@babylon.biloxi.com;method=REFER?
      Refer-To=<http://www.atlanta.com/~alice>

    Throughout this draft we discuss call control primitive operations.
    One of the biggest problems is defining how these operations may be
    invoked.  There are a number of ways to do this.  One way is to
    define the primitives in the protocol itself such that SIP methods
    (for example REFER) or SIP headers (for example Replaces) indicate
    a specific call control action.  Another way to invoke call control
    primitives is to define a specific Request-URI naming convention.
    Either these conventions must be shared between the client (the
    invoker) and the server, or published by or on behalf of the
    server.  The former involves defining URL construction techniques
    (e.g. URL parameters and/or token conventions) as proposed in
    [ms-uri].  The latter technique usually involves discovering the
    URI via a SIP event package, a web page, a business card, or an
    Instant Message.  Yet another means to acquire the URLs is to
    define a dictionary of primitives with well-defined semantics and
    provide a means to query the named primitives and corresponding
    URLs that may be invoked on the service or dialogs.

4.5 Componentization and Decomposition

    This framework proposes a decomposed component architecture with a
    very loose coupling of services and components.  This means that a
    service (such as a conferencing server or an auto-attendant) need
    not be implemented as an actual server.  Rather, these services can
    be built by combining a few basic components in straightforward or
    arbitrarily complex ways.

    Since the components are easily deployed on separate boxes, by
    separate vendors, or even with separate providers, we achieve a
    separation of function that allows each piece to be developed in
    complete isolation.  We can also reuse existing components for new
    applications.  This allows rapid service creation, and the ability
    for services to be distributed across organizational domains
    anywhere in the Internet.  For many of these components it is also
    desirable to discover their capabilities, for example querying the
    ability of a mixer to host a 10 dialog conference, or to reserve
    resources for a specific time.  These actions could be provided in
    the form of URLs, provided there is an a priori means of
    understanding their semantics.  For example if there is a published
    dictionary of operations, a way to query the service for the
    available operations and the associated URLs, the URL can be the
    interface for providing these service operations.  This concept is
    described in more detail in the context of dialog operations in
    section 4.6.

4.5.1 Media Intermediaries

    Media Intermediaries are not participants in any conversation
    space, although an entity which is also a media translator may also
    have a colocated participant component (for example a mixer which
    also announces the arrival of a new participant; the announcement
    portion is a participant, but the mixer itself is not).  Media
    intermediaries should be as transparent as possible to the end
    users--offering a useful, fundamental service; without getting in
    the way of new features implemented by participants.  Some common
    media intermediaries are described below.

4.5.1.1 Mixer

    A mixer is a component that combines media from all call-legs in
    the same conversation in a media specific way.  For example, the
    default combining for an audio conference would be an N-1
    configuration.  In other words, each user receives a mixed media
    stream that represents the combined audio of all the users except
    himself or herself.

    For reference, the RTP definition of a mixer is included below.
    Note that SIP multiparty applications may deal with media which is
    not carried by RTP (for example Instant Messages).  A mixer, as
    defined above, can still combine these messages in a media specific
    way and act as a SIP mixing component.

         "Mixer: An intermediate system that receives RTP packets from
         one or more sources, ... combines the packets in some manner
         and then forwards a new RTP packet.  Since the timing across
         multiple input sources will not generally be synchronized, the
         mixer will make timing adjustments among the streams and
         generate its own timing for the combined stream.  Thus all
         data packets originating from a mixer will be identified as
         having the mixer as their synchronization source."

    Conventions for specifying a mixing or conferencing service in a
    SIP URI are proposed in [ms-uri].

4.5.1.2 Media Translator

    RTP also defines an entity called a translator.  Like a mixer, this
    concept is useful outside of the context of RTP and can be applied
    to most other media types.

         "Translator: An intermediate system that forwards RTP packets
         with their synchronization source identifier intact.  Examples
         of translators include devices that convert encodings without
         mixing, replicators from multicast to unicast, and
         application-level firewalls."

4.5.1.3 Transcoder

    A transcoder translates media from one encoding to another (for
    example, GSM voice to G.711, or MPEG2 to H.261).  A transcoder for
    RTP media is a type of RTP translator.

4.5.1.4 Media Relay

    A media relay terminates media and simply forwards it to a new
    destination without changing the content in any way.  Sometimes
    media relays are used to provide source IP address anonymity, to
    facilitate middlebox traversal, or to provide a trusted entity
    where media can be forcefully disconnected.  A media relay for RTP
    is also a type of RTP Translator.

4.5.2 Queue Server

    A queue server is a location where calls can be entered into one of
    several FIFO (first-in, first-out) queues.  A queue server would
    subscribe to the presence of groups or individuals who are
    interested in its queues.  When detecting that a user is available
    to service a queue, the server redirects or transfers the last call
    in the relevant queue to the available user.  On a queue-by-queue
    basis, authorized users could also subscribe to the call state
    (dialog information) of calls within a queue.  Authorized users
    could use this information to effectively pluck (take) a call out
    of the queue (for example by sending an INVITE with a Replaces
    header to one of the user agents in the queue).

4.6.1 Naming Users in SIP

    An address-of-record, or public SIP address, is a SIP (or SIPS) URI
    that points to a domain with a location server that can map the URI
    to a set of Contact URIs where the user might be available.
    Typically the Contact URIs are populated via registration.

         Address of Record        Contacts

         sip:bob@biloxi.com   ->  sip:bob@babylon.biloxi.com:5060
                                  sip:bbrown@mailbox.provider.net
                                  sip:+1.408.555.6789@mobile.net

    [Caller-prefs] defines a set of additional parameters to the
    Contact header that define the characteristics of the user agent at
    the specified URI.  For example, there is a mobility parameter
    which indicates whether the UA is fixed or mobile.  When a user
    agent registers, it places these parameters in the Contact headers
    to characterize the URIs it is registering.  This allows a proxy
    for that domain to have information about the contact addresses for
    that user.

    When a caller sends a request, it can optionally include the
    Accept-Contact and Reject-Contact headers which request certain
    handling by the proxy in the target domain.  These headers contain
    preferences that describe the set of desired URIs to which the
    caller would like their request routed.  The proxy in the target
    domain matches these preferences with the Contact characteristics
    originally registered by the target user.  The target user can also
    choose to run arbitrarily complex "Find-me" feature logic on a
    proxy in the target domain.

    There is a strong asymmetry in how preferences for callers and
    callees can be presented to the network.  While a caller takes an
    active role by initiating the request, the callee takes a passive
    role in waiting for requests.  This motivates the use of
    callee-supplied scripts and caller preferences included in the call
    request.  This asymmetry is also reflected in the appropriate
    relationship between caller and callee preferences.  A server for a
    callee should respect the wishes of the caller to avoid certain
    locations, while the preferences among locations have to be the
    callee's choice, as it determines where, for example, the phone
    rings and whether the callee incurs mobile telephone charges for
    incoming calls.

    SIP User Agent implementations are encouraged to make intelligent
    decisions based on the type of participants (active/passive,
    hidden, human/robot) in a conversation space.  This information is
    conveyed in a SIP URI parameter and communicated using an
    appropriate SIP header or event body.  For example, a music on hold
    service may take the sensible approach that if there are two or
    more unhidden participants, it should not provide hold music; or
    that it will not send hold music to robots.

    Multiple participants in the same conversation space may represent
    the same human user.  For example, the user may use one participant
    for video, chat, and whiteboard media on a PC and another for audio
    media on a SIP phone.  In this case, the address-of-record is the
    same for both user agents, but the Contacts are different.  In
    addition, human users may add robot participants which act on their
    behalf (for example a call recording service, or a calendar
    reminder).  Call Control features in SIP should continue to
    function as expected in such an environment.
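
    The relationship between an address-of-record, its registered
    Contacts, and caller preferences can be illustrated with a minimal
    sketch.  It is not taken from [Caller-prefs]: the q values, the
    "fixed"/"mobile" labels, and the names LOCATION_DB and
    select_targets are assumptions made for illustration.  The sketch
    only shows the asymmetry discussed in this section: a caller's
    wish to avoid a kind of location is applied as a filter, while the
    ordering among the remaining locations stays with the callee.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    uri: str
    q: float        # callee-assigned preference; higher is tried first (assumed values)
    mobility: str   # "fixed" or "mobile" (labels assumed for illustration)


# Hypothetical registrations for the address-of-record used in this section.
LOCATION_DB = {
    "sip:bob@biloxi.com": [
        Contact("sip:bob@babylon.biloxi.com:5060", 1.0, "fixed"),
        Contact("sip:bbrown@mailbox.provider.net", 0.5, "fixed"),
        Contact("sip:+1.408.555.6789@mobile.net", 0.8, "mobile"),
    ],
}


def select_targets(aor, reject_mobility=None):
    """Return the Contact URIs to try for an address-of-record."""
    contacts = LOCATION_DB.get(aor, [])
    if reject_mobility is not None:
        # The caller's wish to avoid certain locations is a hard filter.
        contacts = [c for c in contacts if c.mobility != reject_mobility]
    # The ordering among the remaining locations is the callee's choice.
    ordered = sorted(contacts, key=lambda c: c.q, reverse=True)
    return [c.uri for c in ordered]
```

    For instance, select_targets("sip:bob@biloxi.com",
    reject_mobility="mobile") leaves only the two fixed Contacts, in
    the callee's preferred order.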

4.6.2 Naming Services with SIP URIs.

    A critical piece of defining a session level service that can be
    accessed by SIP is defining the naming of the resources within that
    service.  This point cannot be overstated.

    In the context of SIP control of application components, we take
    advantage of the fact that the standard SIP URI has a user part.
    Most services may be thought of as user automatons that participate
    in SIP sessions.  It naturally follows that the user address, or
    the left-hand-side of the URI, should be utilized as a service
    indicator.

    For example, media servers commonly offer multiple services at a
    single host address.  Use of the user part as a service indicator
    enables service consumers to direct their requests without
    ambiguity.  It has the added benefit of enabling media services to
    register their availability with SIP Registrars just as any "real"
    SIP user would.  This maintains consistency and provides enhanced
    flexibility in the deployment of media services in the network.

    There has been much discussion about the potential for confusion if
    media services URIs are not readily distinguishable from other
    types of SIP UAs.  The use of a service namespace provides a
    mechanism to unambiguously identify standard interfaces while not
    constraining the development of private or experimental services.
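
    As a rough illustration of using the user part as a service
    indicator, the sketch below dispatches on the left-hand side of a
    SIP URI.  The parser is deliberately minimal and does not implement
    the full SIP URI grammar.  Only the "annc" user part appears in
    this document; the other service names, and the names SERVICES,
    split_sip_uri and service_for, are hypothetical.

```python
# Hypothetical mapping from user part to service; only "annc" appears in
# this document, the other names are invented for illustration.
SERVICES = {
    "annc": "announcement",
    "conf": "conference",
    "record": "recording",
}


def split_sip_uri(uri):
    """Split a simple sip:/sips: URI into (user part, host part).

    Deliberately minimal: URI parameters and headers are discarded and
    the full SIP URI grammar is not handled.
    """
    scheme, _, rest = uri.partition(":")
    if scheme.lower() not in ("sip", "sips"):
        raise ValueError("not a SIP URI: " + uri)
    addr = rest.split("?", 1)[0].split(";", 1)[0]
    user, at, host = addr.rpartition("@")
    return (user if at else None), host


def service_for(uri):
    """Treat the user part as a service indicator; anything else is a user."""
    user, _host = split_sip_uri(uri)
    return SERVICES.get(user, "user")
```

    Because the indicator is an ordinary user part, a media server
    could register each of these names with a registrar exactly as a
    user agent would.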

    In SIP, the request-URI identifies the user or service that the
    call is destined for.  The great advantage of using URIs
    (specifically, the SIP request URI) as a service identifier comes
    because of the combination of two facts.  First, unlike in the
    PSTN, where the namespace (dialable telephone numbers) is limited,
    URIs come from an infinite space.  They are plentiful, and they are
    free.  Secondly, the primary function of SIP is call routing
    through manipulations of the request URI.  In the traditional SIP
    application, this URI represents people.  However, the URI can also
    represent services, as we propose here.  This means we can apply
    the routing services SIP provides to the routing of calls to
    services.  The result: the problem of service invocation and
    service location becomes a routing problem, for which SIP provides
    a scalable and flexible solution.

    Since there is such a vast namespace of services, we can explicitly
    name each service in a finely granular way.  This allows the
    distribution of services across the network.

    Consider a conferencing service where we have separated the names
    of ad-hoc conferences from scheduled conferences.  We can program
    proxies to route calls for ad-hoc conferences to one set of
    servers, and calls for scheduled ones to another, possibly even in
    a different provider.  In fact, since each conference itself is
    given a URI, we can distribute conferences across servers, and
    easily guarantee that calls for the same conference always get
    routed to the same server.  This is in stark contrast to
    conferences in the telephone network, where the equivalent of the
    URI - the phone number - is scarce.  An entire conferencing
    provider generally has one or two numbers.  Conference IDs must be
    obtained through IVR interactions with the caller, or through a
    human attendant.  This makes it difficult to distribute conferences
    across servers all over the network, since the PSTN routing only
    knows about the dialed number.
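
    One way a proxy could realize this kind of routing is sketched
    below.  The "adhoc-" and "sched-" prefixes, the server names, and
    the use of a hash to pick a server are all assumptions made for
    illustration; this document does not define a naming convention
    for conferences.  The point of the sketch is only that a
    per-conference URI is enough to send every call for the same
    conference to the same server.

```python
import hashlib

# Hypothetical naming convention and server pools, for illustration only.
POOLS = {
    "adhoc-": ["sip:mix1.example.com", "sip:mix2.example.com"],
    "sched-": ["sip:conf1.example.net", "sip:conf2.example.net",
               "sip:conf3.example.net"],
}


def route_conference(request_uri):
    """Pick a server for a conference URI.

    The same URI always maps to the same server, because the choice
    depends only on a digest of the URI itself.
    """
    user = request_uri.split(":", 1)[1].split("@", 1)[0]
    for prefix, servers in POOLS.items():
        if user.startswith(prefix):
            digest = hashlib.sha256(request_uri.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % len(servers)
            return servers[index]
    return None  # not a conference name under this assumed convention
```

    A deployed proxy would more likely use provisioned routes than a
    hash, but either approach relies on the conference having its own
    URI, which a dialed PSTN number cannot offer.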

    In the case of a dialog server, the voice dialog itself is the
    target for the call.  As such, the request URI should contain the
    identifier for this spoken dialog.  This is consistent with the
    Request-URI service invocation model of RFC 3087.  This URL can be
    in one of two formats.  In the first, the VoiceXML script is
    identified directly by an HTTP URL.  In the second, the script is
    not specified.  Rather, the dialog server uses its configuration to
    map the incoming request to a specific script.

    Since the request URI could indicate a request for a variety of
    different services, of which a dialog server is only one type, this
    example request URI first begins with a service identifier that
    indicates the basic service required.  For VoiceXML scripts, this
    identification information is a URL-encoded version of the URL
    which references the script to execute, or if not present, the
    dialog server uses server-specific configuration to determine which
    script to execute.

       Examples of URLs that invoke VoiceXML dialogs are:
       (line folding for clarity only)

       sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml
        @vxmlservers.com

       sip:dialog.vxml@vxmlservers.com

    The first of these indicates that the dialog server (located at
    vxmlservers.com) should invoke a VoiceXML script fetched from
    http://dialogs.server.com/script32.vxml.  Since the user part of
    the SIP URL cannot contain the : character, this must be escaped to
    %3a.

    From a naming perspective, a critical issue when using VoiceXML is
    how a request URI is associated with a script to invoke when the
    call is answered.  We see three primary mechanisms: 1) There is a
    one-to-one binding of the address in the request URI to a script.
    These bindings are published by the provider of the IVR service.
    2) The initial script to execute is actually carried as content in
    the body of the SIP INVITE request.  The request URI indicates that
    the desired service is execution of content in the request (i.e.,
    sip:executebody@servers.com).  3) The initial script to execute is
    fetched by the VoiceXML server; the URL to fetch it from is passed
    in the SIP INVITE message that initiates the IVR session.  This can
    be accomplished either with the application/uri MIME type body, or
    using the *-Info headers defined in SIP which provide references to
    content to fetch.  We believe that the third approach is probably
    the best one.  SIP is not the ideal transfer mechanism.  Passing a
    URI allows a far better transfer tool, for example HTTP, to be used
    to actually fetch the script back from the controller.  HTTP is
    then also used to pass back form data from the IVR to the
    controller.  The results of the HTTP POST can also contain
    additional VoiceXML scripts to execute.

    These types of conventions are not limited to application component
    servers.  An ordinary SIP User Agent can have special URIs as well,
    for example, one which is automatically answered by a speakerphone.
    Since URIs are so plentiful, using a separate URI for this service
    does not exhaust a valuable resource.  The requested service is
    clear to the user agent receiving the request.  This URI can also
    be included as part of another feature (for example, the Intercom
    feature described in Section 6.1.6).  This feature can be specified
    with a SIP user parameter, since such parameters are part of the
    userpart of a SIP URI.
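
    The dialog URI convention in the VoiceXML examples can be sketched
    as a pair of helper functions.  This is an illustration under
    stated assumptions, not a normative encoding: the function names
    are hypothetical, and Python's urllib.parse.quote emits the escape
    as "%3A" rather than the "%3a" used in the example (the two are
    equivalent percent-encodings).

```python
from urllib.parse import quote, unquote

SERVICE_ID = "dialog.vxml"   # service identifier taken from the examples


def make_dialog_uri(host, script_url=None):
    """Build a dialog URI; without a script URL the server's own
    configuration is expected to choose the script."""
    if script_url is None:
        return "sip:%s@%s" % (SERVICE_ID, host)
    # Escape everything except "/" so that ":" cannot appear in the user part.
    escaped = quote(script_url, safe="/")
    return "sip:%s.%s@%s" % (SERVICE_ID, escaped, host)


def script_from_dialog_uri(uri):
    """Return the script URL carried in a dialog URI, or None if the
    URI leaves the choice of script to server configuration."""
    user = uri.split(":", 1)[1].rsplit("@", 1)[0]
    if user == SERVICE_ID:
        return None
    prefix = SERVICE_ID + "."
    if not user.startswith(prefix):
        raise ValueError("not a VoiceXML dialog URI: " + uri)
    return unquote(user[len(prefix):])
```

    Decoding accepts either case of the escape, so the first example
    URI, written on one line, yields
    http://dialogs.server.com/script32.vxml.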

    Likewise a Request URI can fully describe an announcement service
    through the use of the user part of the address and additional URI
    parameters.  In our example, the user portion of the address,
    "annc", specifies the announcement service on the media server.
    The two URI parameters "play=" and "early=" specify the audio
    resource to play and whether early media is desired.

        sip:annc@ms2.carrier.net;
         play=http://audio.carrier.net/allcircuitsbusy.au;early=yes

        sip:annc@ms2.carrier.net;
         play=file://fileserver.carrier.net/geminii/yourHoroscope.wav
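
    How a media server might read such a URI can be sketched as
    follows.  The parser takes the example URIs as written (unfolded
    onto one line) and splits on ";" and "="; it makes no attempt at
    the full SIP URI grammar, and the function names are hypothetical.
    Treating a missing "early" parameter as "no early media" is also
    an assumption.

```python
def parse_annc_uri(uri):
    """Return (user, host, params) for a URI such as
    sip:annc@host;play=<url>;early=yes."""
    rest = uri.split(":", 1)[1]          # drop the "sip:" scheme
    addr, *raw_params = rest.split(";")
    user, _, host = addr.partition("@")
    params = {}
    for item in raw_params:
        name, _, value = item.partition("=")
        params[name] = value
    return user, host, params


def wants_early_media(params):
    """Assumed default: early media only when early=yes is present."""
    return params.get("early") == "yes"
```

    Applied to the first example, the user part is "annc", the "play"
    parameter carries the audio URL, and early media is requested; the
    second example carries no "early" parameter.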

    In practical applications, it is important that an invoker does not
    necessarily apply semantic rules to various URIs it did not create.
    Instead, it should allow any arbitrary string to be provisioned,
    and map the string to the desired behavior.  The administrator of a
    service may choose to provision specific conventions or mnemonic
    strings, but the application should not require it.  In any large
    installation, the system owner is likely to have pre-existing rules
    for mnemonic URIs, and any attempt by an application to define its
    own rules may create a conflict.  Implementations should allow an
    arbitrary mix of URLs from these schemes, or any other scheme that
    renders valid SIP URIs to be provisioned, rather than enforce only
    one particular scheme.

    For example, a domain with a location server that voicemail application can map the URI
   to set of Contact URIs where the user might be available.  Typically
   the Contact URIs are populated via registration.

        Address built using very
    different sets of Record        Contacts

        sip:bob@biloxi.com   ->  sip:bob@babylon.biloxi.com:5060
                                 sip:bbrown@mailbox.provider.net
                                 sip:+1.408.555.6789@mobile.net URI conventions, as illustrated below:

         URI Identity       Example Scheme 1
                                 Example Scheme 2
                                      Example Scheme 3

         Deposit with       sip:sub-rjs-deposit@vm.wcom.com
         standard greeting       sip:677283@vm.wcom.com
                                      sip:rjs@vm.wcom.com;mode=deposit

         Deposit with on    sip:sub-rjs-deposit-busy.vm.wcom.com
         phone greeting          sip:677372@vm.wcom.com
                                      sip:rjs@vm.wcom.com;mode=3991243

         Deposit with       sip:sub-rjs-deposit-sg@vm.wcom.com
         special greeting        sip:677384@vm.wcom.com
                                      sip:rjs@vm.wcom.com;mode=sg

         Retrieve - SIP     sip:sub-rjs-retrieve@vm.wcom.com
         authentication          sip:677405@vm.wcom.com
                                      sip:rjs@vm.wcom.com;mode=retrieve

         Retrieve - prompt  sip:sub-rjs-retrieve-inpin.vm.wcom.com
         for PIN in-band         sip:677415@vm.wcom.com
                                      sip:rjs@vm.wcom.com;mode=inpin

    As we have shown, SIP URIs represent an ideal, flexible mechanism
    for describing and naming service resources, be they queues,
    conferences, voice dialogs, announcements, voicemail treatments, or
    phone features.
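
    The provisioning approach described in this section can be sketched
    as a plain lookup table.  The Python fragment below is only an
    illustration: the behavior names and the lookup_behavior helper are
    hypothetical, and the URIs are taken from the example schemes in
    the table above.  The point is that the application treats each
    URI as an opaque string and never parses it to infer meaning.

```python
# Hypothetical provisioning table: each URI is an opaque string that the
# administrator maps to a behavior. The behavior names are illustrative
# assumptions; the URIs come from the three example schemes.
PROVISIONED = {
    "sip:sub-rjs-deposit@vm.wcom.com": "deposit-standard-greeting",
    "sip:677384@vm.wcom.com": "deposit-special-greeting",
    "sip:rjs@vm.wcom.com;mode=retrieve": "retrieve-sip-auth",
    "sip:rjs@vm.wcom.com;mode=inpin": "retrieve-prompt-for-pin",
}


def lookup_behavior(request_uri):
    """Return the provisioned behavior for a Request-URI, or None."""
    return PROVISIONED.get(request_uri)
```

    Because the table mixes all three schemes, an installation with
    pre-existing naming rules could provision its own strings without
    any change to the application logic.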

4.7 Invoker Independence

    Only the invoker of features in SIP needs to know exactly which
    feature it is invoking.  One of the primary benefits of this
    approach is that combinations of features should work in SIP call
    control.  For example, let us examine the combination of a
    "transfer" of a call which is "conferenced".

    Alice calls Bob.  Alice silently "conferences in" her robotic
    assistant Albert as a hidden party.  Bob transfers Alice to Carol.
    If Bob asks Alice to Replace her leg with a new one to Carol then
    both Alice and Albert should be communicating with Carol
    (transparently).

    Using the peer-to-peer model, this combination of features works
    fine if A is doing local mixing (Alice replaces Bob's call-leg with
    Carol's), or if A is using a central mixer (the mixer replaces Bob's
    call leg with Carol's).  A clever implementation using the 3pcc
    model can generate similar results.

    New extensions to the SIP Call Control Framework should attempt to
    preserve this property.

4.8 Billing issues

    Billing in the PSTN is typically based on who initiated a call.  At
    the moment billing in a SIP network is neither consistent with
    itself, nor with the PSTN.  (A billing model for SIP should allow
    for both PSTN-style billing, and non-PSTN billing.)  The example
    below demonstrates one such inconsistency.

    Alice places a call to Bob.  Alice then blind transfers Bob to Carol
    through a PSTN gateway.  In current usage of REFER and BYE/Also, Bob
    may be billed for a call he did not initiate (his UA originated the
    outgoing call leg, however).  This is not necessarily a terrible
    thing, but it demonstrates a security concern (Bob must have
    appropriate local policy to prevent fraud).  Also, Alice may wish to
    pay for Bob's session with Carol.  There should be a way to signal
    this in SIP.

    Likewise a Replacement call may maintain the same billing
    relationship as a Replaced call, so if Alice first calls Carol, then
    asks Bob to Replace this call, Alice may continue to receive a bill.

    Further work in SIP billing should define a way to set or discover
    the direction of billing.

5 Catalog of call control actions and sample features

    Call control actions can be categorized by the dialogs upon which
    they operate.  The actions may involve a single or multiple dialogs.
    These dialogs can be early or established.  Multiple dialogs may be
    related in a conversation space to form a conference or other
    interesting media topologies.

    It should be noted that it is desirable to provide a means by which
    a party can discover the actions which may be performed on a dialog.
    The interested party may be independent or related to the dialogs.
    One means of accomplishing this is through the ability to define and
    obtain URLs for these actions as described in section 4.6.

    Below are listed several call control "actions" which establish or
    modify dialogs and relate the participants in a conversation space.
    The names of the actions listed are for descriptive purposes only
    (they are not normative).  This list of actions is not meant to be
    exhaustive.

    In the examples, all actions are initiated by the user "Alice"
    represented by UA "A".

5.1 Early Dialog Actions

    The following are a set of actions that may be performed on a single
    early dialog.  These actions can be thought of as a set of remote
    control operations.  For example, an automaton might perform the
    operation on behalf of a user.  Alternatively, a user might use the
    remote control in the form of an application to perform the action
    on the early dialog of a UA which may be out of reach.  All of these
    actions correspond to telling the UA how to respond to a request to
    establish an early dialog.  These actions provide useful
    functionality for PDA, PC and server based applications which desire
    the ability to control a UA.

5.1.1 Remote Answer

    A dialog is in some early dialog state such as 180 Ringing.  It may
    be desirable to tell the UA to answer the dialog.  That is, tell it
    to send a 200 OK response to establish the dialog.

5.1.2 Remote Forward or Put

    It may be desirable to tell the UA to respond with a 3xx class
    response to forward an early dialog to another UA.

5.1.3 Remote Busy or Error Out

    It may be desirable to instruct the voice dialog itself UA to send an error response
    such as 486 Busy Here.
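
    The three early dialog actions above each correspond to a class of
    SIP response.  A minimal sketch of that correspondence in Python
    follows.  The action names, the response_for helper, and the choice
    of 302 as the particular 3xx response are assumptions made for
    illustration; this document does not define how the instruction
    reaches the UA.

```python
# Illustrative mapping from the remote-control actions of section 5.1 to
# the SIP response a UA would send. The 302 status is one possible 3xx
# choice (an assumption); the text only says "a 3xx class response".
EARLY_DIALOG_RESPONSES = {
    "remote-answer": (200, "OK"),
    "remote-forward": (302, "Moved Temporarily"),
    "remote-busy": (486, "Busy Here"),
}


def response_for(action, forward_to=None):
    """Return (status, reason, extra_headers) for an early-dialog action."""
    status, reason = EARLY_DIALOG_RESPONSES[action]
    headers = {}
    if action == "remote-forward":
        if forward_to is None:
            raise ValueError("remote-forward needs a target URI")
        # A redirect response carries the new target in a Contact header.
        headers["Contact"] = "<%s>" % forward_to
    return status, reason, headers
```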

5.2 Single Dialog Actions

    There is another useful set of actions which operate on a single
    established dialog.  These operations are useful in building
    productivity applications for aiding users to control their phone.
    For example, a CRM application can set up calls for a user,
    eliminating the need for the user to actually enter an address.
    These operations can also be thought of as remote control actions.

5.2.1 Remote Dial

    This action instructs the UA to initiate a dialog.  This action can
    be performed using the REFER method.

5.2.2 Remote On and Off Hold

    This action instructs the UA to put an established dialog on hold.
    Though this operation can conceptually be performed with the
    REFER method, there are no semantics defined as to what the referred
    party should do with the SDP.  There is no way to distinguish
    between the desire to go on or off hold.

5.2.3 Remote Hangup

    This action instructs the UA to terminate an early or established
    dialog.  A REFER request with the following Refer-To URI performs
    this action.  Note: this URL is not properly escaped.

    sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098
      &To=<sip:bob@biloxi.com>;tag=879738
      &From=<sip:alice@atlanta.com>;tag=023214
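
    Since the URL above is shown unescaped for readability, the
    following Python sketch shows one way the escaped form might be
    produced.  The build_refer_to helper is hypothetical.  The set of
    characters left unescaped reflects our reading of the SIP URI
    header grammar in RFC 3261 and should be checked against that
    specification before use.

```python
from urllib.parse import quote

# Characters allowed unescaped in SIP URI header names/values, per our
# reading of RFC 3261 (hnv-unreserved plus the "mark" characters).
# Alphanumerics and "-", "_", ".", "~" are never escaped by quote().
_HNV_SAFE = "[]/?:+$!*'()"


def build_refer_to(target, method, headers):
    """Build a SIP URI with a method parameter and escaped URI headers."""
    pairs = [
        "%s=%s" % (quote(name, safe=_HNV_SAFE), quote(value, safe=_HNV_SAFE))
        for name, value in headers
    ]
    return "%s;method=%s?%s" % (target, method, "&".join(pairs))


uri = build_refer_to(
    "sip:bob@babylon.biloxi.com",
    "BYE",
    [
        ("Call-ID", "13413098"),
        ("To", "<sip:bob@biloxi.com>;tag=879738"),
        ("From", "<sip:alice@atlanta.com>;tag=023214"),
    ],
)
print(uri)
```

    In the result, characters such as "<", ">", "@", ";" and "=" inside
    the To and From values are percent-encoded, so they cannot be
    confused with the delimiters of the enclosing URI.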

5.3 Multi-dialog actions

    These actions apply to a set of related dialogs.

5.3.1 Transfer

    The conversation space changes as follows:

         before            after
         { A , B }  -->   { C , B }

    A replaces itself with C.

    To make this happen using the peer-to-peer approach, "A" would send
    two SIP requests.  A shorthand for those requests is shown below:
         REFER B  Refer-To:C
         BYE B

    To make this happen instead using the 3pcc approach, the controller
    sends requests represented by the shorthand below:
         INVITE C (w/SDP of B)
         reINVITE B (w/SDP of C)
         BYE A

    Features enabled by this action:
    - blind transfer
    - transfer to a central mixer (some type of conference or forking)
    - transfer to park server (park)
    - transfer to music on hold or announcement server
    - transfer to a "queue"
    - transfer to a service (such as Voice Dialogs service)
    - transition from local mixer to central mixer

5.3.2 Take

    The conversation space changes as follows:

         { B , C }  -->   { B , A }

    A forcibly replaces C with itself.  In most uses of this primitive,
    A is just "un-replacing" itself.

    Using the peer-to-peer approach, "A" sends:
         INVITE B  Replaces: <call leg between B and C>

    Using the 3pcc approach (all requests sent from controller):
         INVITE A (w/SDP of B)
         reINVITE B (w/SDP of A)
         BYE C

    Features enabled by this action:
    - transferee completes an attended transfer
    - retrieve from central mixer (not recommended)
    - retrieve from music on hold or park
    - retrieve from queue
    - call center take
    - voice portal resuming ownership of a call it originated
    - answering-machine style screening (pickup)
    - pickup of a ringing call (i.e. early dialog)

    Note that pickup of a ringing call has some interesting additional
    requirements.  First, it operates on an early dialog as opposed to
    an established dialog.  Second, the party which is to pick up the
    call may wish to do so only while it is an early dialog.  That is,
    in the race condition where the ringing UA accepts just before it
    receives signaling from the party wishing to take the call, the
    taking party wishes to yield or cancel the take.  The goal is to
    avoid yanking an answered call from the called party.

5.3.3 Add

    The conversation space changes as follows:

         { A , B } -->    { A, B, C }

    A adds C to the conversation.

    Using the peer-to-peer approach, adding a party using local mixing
    requires no signaling.  To transition from a 2-party call or a
    locally mixed conference to central mixing, A could send the
    following requests:
         REFER B  Refer-To: mixer
         INVITE mixer
         BYE B

    To add a party to a central mixer:
         REFER C  Refer-To: mixer
                or
         REFER mixer  Refer-To: C

    Using the 3pcc approach to transition to centrally mixed, the
    controller would send:
         INVITE mixer leg 1 (w/SDP of A)
         INVITE mixer leg 2 (w/SDP of B)
         INVITE C (late SDP)
         reINVITE A (w/SDP of mixer leg 1)
         reINVITE B (w/SDP of mixer leg 2)
         INVITE mixer leg 3 (w/SDP of C)

    To add a party to a central mixer:
         INVITE C (late SDP)
         INVITE mixer (w/SDP of C)

    Features enabled:
    - standard conference feature
    - call recording
    - answering-machine style screening (screening)

5.3.4 Local Join

    The conversation space changes like this:

         { A, B}  , {A, C}  -->  {A, B, C}

                or like this

         { A, B}  , {C, D}  -->  {A, B, C, D}

    A takes two conversation spaces and joins them together into a
    single space.

    Using the peer-to-peer approach, A can mix locally, or REFER the
    participants of both conversation spaces to the same central mixer
    (as in 5.3).

    For the 3pcc approach, the call flows for inserting participants,
    and joining and splitting conversation spaces are tedious yet
    straightforward, so these are left as an exercise for the reader.

    Features enabled:
    - standard conference feature
    - leaving a sidebar to rejoin a larger conference

5.3.5 Insert

    The conversation space changes like this:

         { B , C }  -->  {A, B, C }

    A inserts itself into a conversation space.

    A proposed mechanism for signaling this using the peer-to-peer
    approach is to send a new header in an INVITE with "joining"
    semantics.  For example:
         INVITE B  Join: <call id of B and C>

    If B accepted the INVITE, B would accept responsibility to set up
    the call legs and mixing necessary (for example: to mix locally or
    to transfer the participants to a central mixer).

    Features enabled:
    - barge-in
    - call center monitoring
    - call recording

5.3.6 Split

         { A, B, C, D } --> { A, B } , { C, D }

    If using a central mixer with peer-to-peer:
         REFER C  Refer-To: mixer (new URI)
         REFER D  Refer-To: mixer (new URI)
         BYE C
         BYE D

    Features enabled:
    - sidebar conversations during a larger conference
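
    The conversation space diagrams in sections 5.3.1 through 5.3.6
    can be read as simple set operations.  The Python sketch below
    models only those before/after diagrams, using a frozenset of
    participant names per conversation space.  The function names are
    hypothetical and no SIP signaling is represented.

```python
# A conversation space is modelled as a frozenset of participant names.
# This only models the "before/after" diagrams, not any SIP signaling.

def transfer(space, a, c):
    """5.3.1: A replaces itself with C."""
    return (space - {a}) | {c}


def take(space, c, a):
    """5.3.2: A forcibly replaces C with itself."""
    return (space - {c}) | {a}


def add(space, c):
    """5.3.3: a participant adds C to the conversation."""
    return space | {c}


def join(space1, space2):
    """5.3.4: two conversation spaces are joined into one."""
    return space1 | space2


def insert(space, a):
    """5.3.5: A inserts itself into an existing space."""
    return space | {a}


def split(space, sidebar):
    """5.3.6: split one space into the remainder and a sidebar."""
    sidebar = frozenset(sidebar)
    if not sidebar < space:
        raise ValueError("sidebar must be a proper subset of the space")
    return space - sidebar, sidebar
```

    In this model Add and Insert produce the same resulting set; they
    differ only in who initiates the action, which is exactly the
    distinction the signaling has to carry.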

5.3.7 Near-fork

    A participates in two conversation spaces simultaneously:

         { A, B } --> { B , A } & { A , C }

    A is a participant in two conversation spaces such that A sends the
    same media to both spaces, and renders media from both spaces,
    presumably by mixing or rendering the media from both.  We can
    define that A is the "anchor" point for both forks, each of which is
    a separate conversation space.

    This action is purely a matter of local implementation (it requires
    no special signaling).  Local features such as switching calls
    between the background and foreground are possible using this media
    relationship.

5.3.8 Far fork

    The conversation space diagram...

         { A, B } --> { A ,  B } & { B , C }

    A requests B to be the "anchor" of two conversation spaces.

    For an example of using 3pcc to set up media forking, see [Media
    forking].  The session descriptions for forking are quite complex.
    Controllers should verify that endpoints can handle forked media, by
    using some type of Require header token.

    Two ways to set up this media relationship using peer-to-peer call
    control have been proposed:
    - the anchor receives a REFER that requires forked-media (implicit)
    - the anchor receives an INVITE with an explicit header (explicit)

    Features enabled:
    - barge-in
    - voice portal services
    - whisper
    - hotword detection
    - sending DTMF somewhere else

6 Security Considerations

    Call Control primitives provide a powerful set of features that can
    be dangerous in the hands of an attacker.  To complicate matters,
    call control primitives are likely to be automatically authorized
    without direct human oversight.

    The class of attacks which are possible using these tools includes
    the ability to eavesdrop on calls, disconnect calls, redirect calls,
    render irritating content (including ringing) at a user agent, cause
    an action that has billing consequences, subvert billing
    (theft-of-service), and obtain private information.  Call control
    extensions must take extra care to describe how these attacks will
    be prevented.

    We can also make some general observations about authorization and
    trust with respect to call control.  The security model is
    dramatically dependent on the signaling model chosen (see section
    4.2).

    Let us first examine the security model used in the 3pcc approach.
    All signaling goes through the controller, which is a trusted
    entity.  Traditional SIP authentication and hop-by-hop encryption
    and message integrity work fine in this environment, but end-to-end
    encryption and message integrity may not be possible.

    When using the peer-to-peer approach, call control actions and
    primitives can be legitimately initiated by a) an existing
    participant in the conversation space, b) a former participant in
    the conversation space, or c) an entity trusted by one of the
    participants.  For example, a participant always initiates a
    transfer; a retrieve from Park (a take) is initiated on behalf of a
    former participant; and a barge-in (insert or far-fork) is initiated
    by a trusted entity (an operator for example).

    Authenticating requests by an existing participant or a trusted
    entity can be done with baseline SIP mechanisms.  In the case of
    features initiated by a former participant, these should be
    protected against replay attacks by using a unique name or
    identifier per invocation.  The Replaces header exhibits this
    behavior as a by-product of its operation (once a Replaces operation
    is successful, the call-leg being Replaced no longer exists).  For
    other requests, a "one-time" Request-URI may be provided to the
    feature invoker.

    To authorize call control primitives that trigger special behavior
    (such as an INVITE with Replace, Join, or Fork semantics), the SDP. There is no way
    receiving user agent may have trouble finding appropriate
    credentials with which to distinguish between challenge or authorize the desire request, as the
    sender may be completely unknown to go on or off hold.

5.2.3 Remote Hangup

   This action instructs the UA receiver, except through the
    introduction of a third party.  These credentials need to terminate an early be passed
    transitively in some way or established
   dialog. A REFER request with the following Refer-To URI performs
   this action.  Note: this URL is not properly escaped.

   sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098
     &To=<sip:bob@biloxi.com>;tag=879738
     &From=<sip:alice@atlanta.com>;tag=023214

5.3 Multi-dialog actions fetched in an event body, for example.
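
    The "one-time" Request-URI idea mentioned above for replay
    protection can be illustrated with a minimal sketch.  This is not
    a mechanism defined by this document: the class name, the URI
    layout (a random token in the user part), the host name and the
    lifetime are all assumptions made for illustration.  The only
    point shown is that a URI is forgotten as soon as it is redeemed,
    so a replayed request fails.

```python
import secrets
import time


class OneTimeUriRegistry:
    """Hypothetical helper: issues single-use Request-URIs for a feature."""

    def __init__(self, host, lifetime_seconds=300.0, clock=time.monotonic):
        self._host = host
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._pending = {}  # token -> (feature name, expiry time)

    def issue(self, feature):
        """Return a fresh URI that authorizes one invocation of `feature`."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (feature, self._clock() + self._lifetime)
        return f"sip:{token}@{self._host}"

    def redeem(self, request_uri, feature):
        """Return True at most once per issued URI; replays return False."""
        prefix, suffix = "sip:", "@" + self._host
        if not (request_uri.startswith(prefix) and request_uri.endswith(suffix)):
            return False
        token = request_uri[len(prefix):len(request_uri) - len(suffix)]
        # pop() removes the token, so a second attempt finds nothing.
        entry = self._pending.pop(token, None)
        if entry is None:
            return False
        issued_for, expires_at = entry
        return issued_for == feature and self._clock() <= expires_at
```

    A park service could, for example, hand such a URI to the party
    that parked a call and accept a later take only if redeem()
    succeeds.  Authentication of the invoker is a separate concern and
    is not covered by this sketch.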

7 Appendix A: Example Features

    Primitives are defined in terms of their ability to provide
    features.  These example features should require an amply robust set
    of services to demonstrate a useful set of primitives.  They are
    described here briefly.  Note that the descriptions of these features
    are non-normative.  Some of these features are used as examples in
    section 6 to demonstrate how some features may require certain media
    relationships.  Note also that this document describes a mixture of
    both features originating in the world of telephones, and features
    which are clearly Internet oriented.

7.1 Example Feature Definitions:

    Call Waiting - Alice is in a call, then receives another call.
    Alice can place the first call on hold, and talk with the other
    caller.  She can typically switch back and forth between the
    callers.

    Blind Transfer - Alice is in a conversation with Bob.  Alice asks
    Bob to contact Carol, but makes no attempt to contact Carol
    independently.  In many implementations, Alice does not verify Bob's
    success or failure in contacting Carol.

    Attended Transfer - The transferring party establishes a session
    with the transfer target before completing the transfer.

    Consultative transfer - The transferring party establishes a session
    with the target and mixes both sessions together so that all three
    parties can participate, then disconnects leaving the transferee and
    transfer target with an active session.

    Conference Call - Three or more active, visible participants in the
    same conversation space.

    Call Park - A participant parks a call (essentially puts the
    call on hold), and then retrieves it at a later time (typically from
    another location).

    Call Pickup - A party picks up a call that was ringing at another
    location.  One variation allows the caller to choose which location,
    another variation just picks up any call in that user's "pickup
    group".

    Music on Hold - When Alice places a call with Bob on hold, it
    replaces its audio with streaming content such as music,
    announcements, or advertisements.

    Call Monitoring - A call center supervisor joins an in-progress call
    for monitoring purposes.

    Barge-in - Carol interrupts Alice who has an in-progress call
    with Bob.  In some variations, Alice forcibly joins a new
    conversation with Carol, in other variations, all three parties are
    placed in the same conversation (basically a 3-way conference).

    Hotline - Alice picks up a phone and is immediately connected to the
    technical support hotline, for example.

    Autoanswer - Calls to a certain address or location answer
    immediately via a speakerphone.

    Intercom - Alice typically presses a button on a phone which
    immediately connects to another user or phone and causes that phone
    to play her voice over its speaker.  Some variations immediately
    setup two-way communications, other variations require another
    button to be pressed to enable a two-way conversation.

    Speakerphone paging - Alice calls the paging address and speaks.
    Her voice is played on the speaker of every idle phone in a
    preconfigured group of phones.

    Speed dial - standard conference feature Alice dials an abbreviated number, or enters an alias,
    or presses a special speed dial button representing Bob.  Her action
    is interpreted as if she specified the full address of Bob.

    Call Return - Alice calls Bob.  Bob misses the call or is
    disconnected before he is finished talking to Alice.  Bob invokes
    Call return which calls Alice, even if Alice did not provide her
    real identity or location to Bob.

    Inbound Call Screening - Alice doesn't want to receive calls from
    Matt.  Inbound Screening prevents Matt from disturbing Alice.  In
    some variations this works even if Matt hides his identity.

    Outbound Call Screening - Alice is paged and unknowingly calls a
    PSTN pay-service telephone number in the Caribbean, but local policy
    blocks her call, and possibly informs her why.

    Call Forwarding - Before a call-leg is accepted it is redirected to
    another location, for example, because the originally intended
    recipient is busy, does not answer, is disconnected from the
    network, or has configured all requests to go somewhere else.

    Message Waiting - Bob calls Alice when she steps away from her
    phone.  When she returns, a visible or audible indicator conveys
    that someone has left her a voicemail message.  The message waiting
    indication may also convey how many messages are waiting, from whom,
    what time, and other useful pieces of information.

    Do Not Disturb - Alice selects the Do Not Disturb option.  Calls to
    her either ring briefly or not at all and are forwarded elsewhere.
    Some variations allow specially authorized callers to override this
    feature and ring Alice anyway.

    Distinctive ring - Incoming calls have different ring cadences or
    sample sounds depending on the From party, the To party, or other
    factors.

    Automatic Callback - Alice calls Bob, but Bob is busy.  Alice would
    like Bob to call her automatically when he is available.  When Bob
    hangs up, Alice's phone rings.  When Alice answers, Bob's phone
    rings.  Bob answers and they talk.

    Find-Me - Alice sets up complicated rules for how she can be reached
    (possibly using [CPL], [presence] or other factors).  When Bob calls
    Alice, his call is eventually routed to a temporary Contact where
    Alice happens to be available.

    Whispered call waiting - Alice is in a conversation with Bob.  Carol
    calls Alice.  Either Carol can "whisper" to Alice directly ("Can you
    get lunch in 15 minutes?"), or an automaton whispers to Alice
    informing her that Carol is trying to reach her.

    Voice message screening - Bob calls Alice.  Alice is screening her
    calls, so Bob hears Alice's voicemail greeting.  Alice can hear Bob
    leave his message.  If she decides to talk to Bob, she can take the
    call back from the voicemail system, otherwise she can let Bob leave
    a message.  This emulates the behavior of a home telephone answering
    machine.

    Presence-Enabled Conferencing - Alice wants to set up a conference
    call with Bob and Cathy when they all happen to be available (rather
    than scheduling a predefined time).  The server providing the
    application monitors their status, and calls all three when they are
    all "online", not idle, and not in another call.

    IM Conference Alerts - A user receives a notification as an Instant
    Message whenever someone joins a conference they are also in.

    Single Line Extension - A group of phones are all treated as
    "extensions" of a single line.  A call for one rings them all.  As
    soon as one answers, the others stop ringing.  If any extension is
    actively in a conversation, another extension can "pick up" and
    immediately join the conversation.  This emulates the behavior of a
    home telephone line with multiple phones.

    Click-to-dial - Alice looks in her company directory for Bob.  When
    she finds Bob, she clicks on a URL to call him.  Her phone rings (or
    possibly answers automatically), and when she answers, Bob's phone
    rings.

    Pre-paid calling - Alice pays for a certain currency or unit amount
    of calling value.  When she places a call, she provides her account
    number somehow.  If her account runs out of calling value during a
    call, her call is disconnected or redirected to a service where she
    can purchase more calling value.

    Voice Portal - A service that allows users to access a portal site
    using spoken dialog interaction.  For example, Alice needs to
    schedule a working dinner with her co-worker Carol.  Alice uses a
    voice portal to check Carol's flight schedule, find a restaurant
    near her hotel, make a reservation, get directions there, and page
    Carol with this information.

7.2 Implementation of these features

    A summary of these features is listed below.  Implementation of
    features marked with an asterisk (*) is described briefly in the
    subsections that follow.

    Example Features:
    Call Hold                   [Offer/Answer] for SIP
    Call Waiting                Local Implementation
    Blind Transfer              [cc-transfer]
    Attended Transfer           [cc-transfer]
    Consultative transfer       [cc-transfer]
    Conference Call             [conf-models]
    Call Park                   *[examples]
    Call Pickup                 *[examples]
    Music on Hold               *[examples]
    Call Monitoring             *Insert
    Barge-in                    *Insert or Far-Fork
    Hotline                     Local Implementation
    Autoanswer                  Local URI convention
    Speed dial                  Local Implementation
    Intercom                    *Speed dial + autoanswer
    Speakerphone paging         *Speed dial + autoanswer
    Call Return                 Proxy feature
    Inbound Call Screening      Proxy or Local implementation
    Outbound Call Screening     Proxy feature
    Call Forwarding             Proxy or Local implementation
    Message Waiting             [msg-waiting]
    Do Not Disturb              [presence]
    Distinctive ring            *Proxy or Local implementation
    Automatic Callback          2 person presence-based conference
    Find-Me                     Proxy service based on presence
    Whispered call waiting      Local implementation
    Voice message screening     *
    Presence-based Conferencing *call when presence = available
    IM Conference Alerts        subscribe to conference status
    Single Line Extension       *
    Click-to-dial               *
    Pre-paid calling            *
    Voice Portal                *

7.2.1 Call Park

    Call park requires the ability to: put a dialog some place,
    advertise it to users in a pickup group and to uniquely identify it
    by a means that can be communicated (including human voice).  The
    dialog can be held locally on the UA parking the dialog or
    alternatively transferred to the park service for the pickup group.
    The parked dialog then needs to be labeled (e.g. orbit 12) in a way
    that can be communicated to the party that is to pick up the call.
    The UAs in the pickup group discover the parked dialog(s) via
    [call-leg] from the park service.  If the dialog is parked locally
    the park service merely aggregates the parked call states from the
    set of UAs in the pickup group.

7.2.2 Call Pickup

    There are two different features which are called call pickup.  The
    first is the pickup of a parked dialog.  The UA from which the
    dialog is to be picked up subscribes to the call state [call-leg] of
    the park service or the UA which has locally parked the dialog.
    Dialogs which are parked should be labeled with an identifier.  The
    labels are used by the UA to allow the user to indicate which dialog
    is to be picked up.  The UA picking up the call invokes the URL in
    the call state which is labeled as replace-remote.

    The other call pickup feature involves picking up an early dialog
    (typically ringing).  This feature uses some of the same primitives
    as the pick up of a parked call.  The call state of the ringing UA
    is advertised using [call-leg].  The UA which is to pick up the
    early dialog subscribes either directly to the ringing UA or to a
    service aggregating the states for UAs in the pickup group.  The
    call state identifies early dialogs.  The UA uses the call state(s)
    to help the user choose which early dialog is to be picked up.
    The UA then invokes the URL in the call state labeled as
    replace-remote.
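
    The selection step common to both pickup variants can be sketched
    as follows.  The [call-leg] event body format is not shown in this
    section, so the record below is a hypothetical stand-in: the field
    names, the state values "early" and "confirmed", and the example
    URLs are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DialogState:
    """Hypothetical view of one dialog reported in call state."""
    label: str           # identifier shown to the user, e.g. a park orbit
    state: str           # assumed values: "early" or "confirmed"
    replace_remote: str  # URL to invoke in order to pick up the dialog


def pickup_candidates(states: List[DialogState], early: bool) -> List[DialogState]:
    """Return ringing dialogs (early=True) or parked ones (early=False)."""
    wanted = "early" if early else "confirmed"
    return [s for s in states if s.state == wanted]


def url_for_label(states: List[DialogState], label: str) -> Optional[str]:
    """Return the replace-remote URL of the dialog the user selected."""
    for s in states:
        if s.label == label:
            return s.replace_remote
    return None


# Example call state as an aggregating service might report it.
reported = [
    DialogState("orbit 12", "confirmed", "sip:orbit12@park.example.com"),
    DialogState("desk 4 ringing", "early", "sip:desk4-take@example.com"),
]
ringing = pickup_candidates(reported, early=True)
chosen_url = url_for_label(ringing, "desk 4 ringing")
```

    The UA would then send its request to chosen_url.  How that URL
    encodes the dialog to be replaced is left to the call state
    mechanism and is not modelled here.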

7.2.3 Music on Hold

    Music on hold can be implemented a number of ways.  One way is to
    transfer the held call to a holding service.  When the UA wishes to
    take the call off hold it basically performs a take on the call from
    the holding service.  This involves subscribing to call state on the
    holding service and then invoking the URL in the call state labeled
    as replace-remote.

    Alternatively music on hold can be performed as a local mixing
    operation.  The UA holding the call can mix in the music from the
    music service via RTP (i.e. an additional dialog) or RTSP or other
    streaming media source.  This approach is simpler from a protocol
    perspective (i.e. the held dialog does not move so there is less
    chance of losing it), however it does use more LAN bandwidth and
    resources on the UA.

7.2.4 Call Monitoring

    Call monitoring is a [join] operation.  The monitoring UA sends a
    Join to the dialog it wants to listen to.  It is able to discover
    the dialog via the call state [call-leg] on the monitored UA.  The
    monitoring UA sends SDP in the INVITE which indicates receive only
    media [offer/answer].  In addition the monitoring UA should indicate
    that it wants to receive a mix.  As the UA is monitoring only it
    does not matter whether the UA indicates it wishes the send stream
    to be mixed or point to point.

7.2.5 Barge-in

    Barge-in works the same as call monitoring except that it must
    indicate that the send media stream is to be mixed so that all of
    the other parties can hear the stream from the UA barging in.
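
    The difference in media direction between call monitoring and
    barge-in can be sketched with the standard SDP direction
    attributes.  This is only an illustration: the function name, the
    address and the port are hypothetical, the offer is reduced to a
    single PCMU audio stream, and the indication of "mix" versus
    "point to point" is deliberately not modelled because no syntax
    for it is given here.

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def audio_offer(address: str, port: int, direction: str) -> str:
    """Build a minimal SDP audio offer carrying one direction attribute."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {address}",
        "s=-",
        f"c=IN IP4 {address}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",  # payload type 0 is PCMU
        f"a={direction}",
    ]
    return "\r\n".join(lines) + "\r\n"


# A monitoring UA only listens; a UA barging in both sends and receives.
monitor_sdp = audio_offer("192.0.2.10", 49170, "recvonly")
barge_in_sdp = audio_offer("192.0.2.10", 49170, "sendrecv")
```

    Either body would be carried in the INVITE that joins the dialog.
    Moving from silent monitoring to barge-in would then amount to a
    new offer with a different direction attribute.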

7.2.6 Intercom

    The UA initiates a dialog using INVITE in the ordinary way [bis].
    The calling UA then signals the paged UA to answer the call.  The
    calling UA may discover the URL to answer the call via the call
    state [call-leg] of the called UA.  The called UA accepts the
    INVITE with a 200 OK and automatically enables the speakerphone.

    Alternatively, this can be a local decision for the UA to answer
    based upon called party identification.
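
    The local auto-answer decision can be sketched as a simple policy
    check.  This is only an illustration: the function name and the
    idea of a configured set of intercom addresses are assumptions, and
    a real UA would apply the SIP URI comparison rules rather than
    plain string equality.

```python
def should_auto_answer(to_uri, intercom_uris):
    """Return True when the called address is configured for intercom.

    Hypothetical helper: `intercom_uris` is a locally configured
    collection of addresses for which the UA answers automatically.
    """
    # Plain string equality after trimming whitespace; this is a
    # simplification of proper SIP URI comparison.
    return to_uri.strip() in set(intercom_uris)
```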

7.2.7 Speakerphone paging

    Speakerphone paging can be implemented using either multicast or
    through a simple multipoint mixer.  In the multicast solution the
    paging UA sends a multicast INVITE [bis] with send-only media in the
    [SDP] (see also [offer/answer]).  The automatic answer and enabling
    of the speakerphone is a locally configured decision on the paged
    UAs.  The paging UA sends RTP via the multicast address indicated in
    the SDP.

    The multipoint solution is accomplished by sending an INVITE to the
    multipoint mixer.  The mixer is configured to automatically answer
    the dialog.  The paging UA then sends [REFER] requests for each of
    the UAs that are to become paging speakers (the UA is likely to
    send out a single REFER which is parallel forked by the proxy
    server).  The UAs performing as paging speakers are configured to
    automatically answer based upon caller identification (e.g. To
    field, URI or Referred-To headers).
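
    As a rough sketch of the multipoint solution, the Python fragment
    below lists the REFER requests a paging UA might generate, one per
    paging speaker, each referring the speaker to the mixer.  The
    function name and the dictionary layout are assumptions for this
    example.  Only the request line and the Refer-To header are shown;
    the other headers a complete SIP request needs are omitted, and the
    single forked REFER mentioned above is not modeled.

```python
def paging_refers(mixer_uri, speaker_uris):
    """Describe one REFER per paging speaker, pointing at the mixer.

    Hypothetical helper: returns only the request line and the
    Refer-To header for each request, not a complete SIP message.
    """
    refers = []
    for speaker in speaker_uris:
        refers.append({
            # The REFER is addressed to the UA that should become a speaker.
            "request_line": f"REFER {speaker} SIP/2.0",
            # The speaker is asked to contact the multipoint mixer.
            "refer_to": f"Refer-To: <{mixer_uri}>",
        })
    return refers
```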

7.2.8 Distinctive ring

    The target UA either makes a local decision based on information in
    an incoming INVITE (To, From, Contact, Request-URI) or trusts an
    Alert-Info header provided by the caller or inserted by a trusted
    proxy.  In the latter case, the UA fetches the content described in
    the URI (typically via http) and renders it to the user.
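
    The two alternatives can be sketched as follows.  This is an
    illustration under stated assumptions: the function name, the shape
    of the local rule table, and the choice to consult local rules
    before Alert-Info are not taken from this document.

```python
def select_ring(headers, local_rings, trust_alert_info,
                default_ring="default"):
    """Pick a ring for an incoming INVITE.

    Hypothetical helper.  `headers` maps header names to values and
    `local_rings` maps (header name, value) pairs to a local ring name.
    Returns ("local", ring) or ("fetch", uri).
    """
    # Local decision based on information in the INVITE.
    for name in ("To", "From", "Contact", "Request-URI"):
        ring = local_rings.get((name, headers.get(name)))
        if ring is not None:
            return ("local", ring)
    # Otherwise use Alert-Info, but only if its source is trusted.
    alert_info = headers.get("Alert-Info", "")
    start, end = alert_info.find("<"), alert_info.find(">")
    if trust_alert_info and 0 <= start < end:
        return ("fetch", alert_info[start + 1:end])
    return ("local", default_ring)
```

    A "fetch" result means the UA would retrieve the content at the
    returned URI and render it to the user.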

7.2.9 Voice message screening

    At first, this is the same as call monitoring.  In this case the
    voicemail service is one of the UAs.  The UA screening the message
    monitors the call on the voicemail service, and also subscribes to
    call-leg information.  If the user screening their messages decides
    to answer, they perform a Take from the voicemail system (for
    example, send an INVITE with Replaces to the UA leaving the
    message).

7.2.10 Single Line Extension

    Incoming calls ring all the extensions through basic parallel
    forking [bis].  Each extension subscribes to call-leg events from
    each other extension.  While one user has an active call, any other
    UA extension can insert itself into that conversation (it already
    knows the call-leg information) in the same way as barge-in.

7.2.11 Click-to-dial

    The application or server which hosts the click-to-dial application
    captures the URL to be dialed and can set up the call using 3pcc or
    can send a [REFER] request to the UA which is to dial the address.
    As users sometimes change their mind or wish to give up listening
    to a ringing or voicemail-answered phone, this application
    illustrates the need to also have the ability to remotely hang up a
    call.

7.2.12 Pre-paid calling

    For prepaid calling, the user's media always passes through a device
    which is trusted by the pre-paid provider.  This may be the other
    endpoint (for example a PSTN gateway).  In either case, an
    intermediary proxy or B2BUA can periodically verify the amount of
    time available on the pre-paid account, and use the session-timer
    extension to cause the trusted endpoint (gateway) or intermediary
    (media relay) to send a reINVITE before that time runs out.  During
    the reINVITE, the SIP intermediary can reverify the account and
    insert another session-timer header.

    Note that while most pre-paid systems on the PSTN use an IVR to
    collect the account number and destination, this isn't strictly
    necessary for a SIP-originated prepaid call.  SIP requests and SIP
    URIs are sufficiently expressive to convey the final destination,
    the provider of the prepaid service, the location from which the
    user is calling, and the prepaid account they want to use.  If a
    pre-paid IVR is used, the mechanism described below (Voice Portals)
    can be combined as well.

7.2.13 Voice Portal

    A voice portal is essentially a complex collection of voice dialogs
    used to access interesting content.  One of the most desirable call
    control features of a Voice Portal is the ability to start a new
    outgoing call from within the context of the Portal (to make a
    restaurant reservation, or return a voicemail message for example).
    Once the new call is over, the user should be able to return to the
    Portal by pressing a special key, using some DTMF sequence (ex: a
    very long pound or hash tone), or by speaking a hotword (ex: "Main
    Menu").

    In order to accomplish this, the Voice Portal starts with the
    following media relationship:

        { User , Voice Portal }

    The user then asks to make an outgoing call.  The Voice Portal asks
    the User to perform a Far-Fork.  In other words the Voice Portal
    wants the following media relationship:

        { Target , User }  &  { User , Voice Portal }

    The Voice Portal is now just listening for a hotword or the
    appropriate DTMF.  As soon as the user indicates they are done, the
    Voice Portal Takes the call from the old Target, and we are back to
    the original media relationship.

    This feature can also be used by the account number and phone number
    collection menu in a pre-paid calling service.  A user can press a
    DTMF sequence which presents them with the a

7 Security Considerations

    Call Control primitives provide a powerful set of features that can
    be dangerous in the hands of an attacker.  To complicate matters,
    call control primitives are likely to be automatically authorized
    without direct human oversight.

    The class of attacks which are possible using these tools includes
    the ability to eavesdrop on calls, disconnect calls, redirect calls,
    render irritating content (including ringing) at a user agent, cause
    an action that has billing consequences, subvert billing (theft-of-
    service), and obtain private information.  Call control extensions
    must take extra care to describe how these attacks will be
    prevented.

    We can also make some general observations about authorization and
    trust with respect to call control.  The security model is
    dramatically dependent on the signaling model chosen (see section
    4.2).

    Let us first examine the security model used in the 3pcc approach.
    All signaling goes through the controller, which is a trusted
    entity.  Traditional SIP authentication and hop-by-hop encryption
    and message integrity work fine in this environment, but end-to-end
    encryption and message integrity may not be possible.

    When using the peer-to-peer approach, call control actions and
    primitives can be legitimately initiated by a) an existing
    participant in the conversation space, b) a former participant in
    the conversation space, or c) an entity trusted by one of the
    participants.  For example, a participant always initiates a
    transfer; a retrieve from Park (a take) is initiated on behalf of a
    former participant; and a barge-in (insert or far-fork) is initiated
    by a trusted entity (an operator for example).

    Authenticating requests by an existing participant or a trusted
    entity can be done with baseline SIP mechanisms.  In the case of
    features initiated by a former participant, these should be
    protected against replay attacks by using a unique name or
    identifier per invocation.  The Replaces header exhibits this
    behavior as a by-product of its operation (once a Replaces operation
    is successful, the call-leg being Replaced no longer exists).  For
    other requests, a "one-time" Request-URI may be provided to the
    feature invoker.

    To authorize call control primitives that trigger special behavior
    (such as an INVITE with Replace, Join, or Fork semantics), the
    receiving user agent may have trouble finding appropriate
    credentials with which to challenge or authorize the request, as the
    sender may be completely unknown to the receiver, except through the
    introduction of a third party.  These credentials need to be passed
    transitively in some way or fetched in an event body, for example.
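
    The "one-time" Request-URI idea from the security discussion can be
    sketched as below.  This is an illustration only: the class name,
    the in-memory token store and the "sip:token@host" URI layout are
    assumptions for this example and are not defined by this framework.

```python
import secrets


class OneTimeUriIssuer:
    """Issue Request-URIs that are accepted at most once.

    Hypothetical helper illustrating replay protection through a
    unique identifier per invocation.
    """

    def __init__(self, host):
        self.host = host
        self._outstanding = set()

    def issue(self):
        # A fresh random token makes every issued URI unique.
        token = secrets.token_urlsafe(16)
        self._outstanding.add(token)
        return f"sip:{token}@{self.host}"

    def redeem(self, request_uri):
        prefix, suffix = "sip:", f"@{self.host}"
        if not (request_uri.startswith(prefix)
                and request_uri.endswith(suffix)):
            return False
        token = request_uri[len(prefix):-len(suffix)]
        if token in self._outstanding:
            # Forget the token so that a replayed request is rejected.
            self._outstanding.discard(token)
            return True
        return False
```

    A request that reuses an already redeemed URI is rejected, which is
    the property the Replaces header provides as a by-product of its
    operation.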

8 References

    [SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session
    Initiation Protocol", RFC2543, Internet Engineering Task Force,
    Nov 1998.

    [RFC2026] S. Bradner, "The Internet Standards Process -- Revision
    3", RFC2026 (BCP), IETF, October 1996.

    [RFC2119] S. Bradner, "Key words for use in RFCs to indicate
    requirement levels," Request for Comments (Best Current Practice)
    2119, Internet Engineering Task Force, Mar. 1997.

    [REFER] R. Sparks, "The Refer Method", Internet Draft <draft-ietf-
    sip-refer-02>, IETF, October 30, 2001, Work in progress.

    [3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G. Camarillo,
    "Third Party Call Control in SIP", Internet Draft <draft-rosenberg-
    sip-3pcc-02.txt>, IETF;  March 2001.  Work in progress

    [transfer] R. Sparks, "SIP Call Control - Transfer", Internet Draft
    <draft-ietf-sip-cc-transfer-04.txt>, IETF; Feb. 2001. Work in
    progress.

    [Replaces] B. Biggs, R. Dean, R. Mahy, "The SIP Replaces Header",
    Internet Draft <draft-ietf-sip-replaces-00.txt>, IETF, Nov. 2001.
    Work in progress.

    [conf-models]  J. Rosenberg, H. Schulzrinne, "Models for Multi Party
    Conferencing in SIP", Internet Draft <draft-rosenberg-sip-
    conferencing-models-00.txt>, IETF; Nov. 2000. Work in progress.

    [service examples] A. Johnston, R. Sparks, C. Cunningham, S.
    Donovan, K. Summers, "SIP Service Examples" Internet Draft <draft-
    ietf-sip-service-examples-03.txt>, IETF, June 2002, Work in
    progress.

    [Join] R. Mahy, D. Petrie, "The SIP Join and Fork Headers", Internet
    Draft <draft-mahy-sipping-join-and-fork-00.txt>, IETF, November
    2001, Work in progress.

    [RTP] H. Schulzrinne , S. Casner , R. Frederick , V. Jacobson ,
    "RTP: A Transport Protocol for Real-Time Applications", Request for
    Comments (Standards Track)1889, IETF, January 1996

    [SDP] H. Schulzrinne M. Handley, V. Jacobson, "SDP: Session
    Description Protocol", Request for Comments (Standards Track) 2327,
    Internet Engineering Task Force, April 1998

    [events] A. Roach, "SIP-Specific Event Notification",Internet Draft
    <draft-ietf-sip-events-03.txt>, IETF, February 2002, Work in
    progress.

    [offer/answer] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model
    with SDP", Internet Draft <draft-ietf-mmusic-sdp-offer-answer-
    01.txt>, IETF, February 21, 2002, Work in progress.

    [caller prefs] J. Rosenberg, "SIP Caller Preferences and Callee
    Capabilities",Internet Draft <draft-ietf-sip-callerprefs-05.txt>,
    IETF, November 21, 2001, Work in progress.

    [msg waiting] R. Mahy, I. Slain, "Message Waiting in SIP",Internet
    Draft <draft-mahy-sip-message-waiting-02.txt>, IETF, July 2001, Work
    in progress.

    [Presence] Rosenberg et al., "SIP Extensions for Presence", Internet
    Draft <draft-ietf-simple-presence-04.txt>, IETF, November 21, 2001,
    Work in progress.

    [visited] D. Oran, H. Schulzrinne, "The Visited Header",Internet
    Draft <>, IETF, date, Work in progress.

    [app components] , "",Internet Draft <>, IETF, date, Work in
    progress.

    [ms-uri] J. Van Dyke, E. Burger, "SIP URI Conventions for Media
    Servers",Internet Draft <draft-burger-sipping-msuri-01.txt>, IETF,
    November 21, 2001, Work in progress.

    [call-pkg] J. Rosenberg, H. Schulzrinne, "SIP Event Packages for
    Call Leg and Conference State", Internet Draft <draft-rosenberg-sip-
    call-package-00.txt>, IETF, July 13, 2001, Work in progress.

    [enum] , "",Internet Draft <>, IETF, date, Work in progress.

    [http]  R. Fielding et al, "Hypertext Transfer Protocol --
    HTTP/1.1", Request for Comments (Standards Track) 2616, Internet
    Engineering Task Force, June 1999

    [rtsp] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
    Protocol (RTSP)", Request for Comments (Standards Track) 2326,
    Internet Engineering Task Force, April 1998

    [mrcp] S. Shanmugham, P. Monaco, B. Eberman, "MRCP: Media Resource
    Control Protocol", Internet Draft <draft-shanmugham-mrcp-01.txt>,
    IETF, November 20, 2001, Work in progress.

    [VoiceXML] S. McGlashan et al, "Voice Extensible Markup Language
    (VoiceXML) Version 2.0", W3C Working Draft, 23 October 2001, Work in
    progress.

    [H.323]

    [tel URL]

    [caller-prefs]

    [session timer]

    [service context]

    [avt tones]

    [GSM]

    [MPEG2]

    [G.711]

    [H.261]

    [H.450]

    [JTAPI]

    [CSTA]

    [mrcp-sip] , "",Internet Draft <draft-robinson-mrcp-sip-00.txt>,
    IETF, date, Work in progress.

    [distributed full mesh conf]

    [Media forking] M. Shankar, "SIP Forked Media", Internet Draft
    <draft-shankar-sip-forked-media-00.txt>, IETF, Feb. 2001.  Work in
    progress.

    [PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for
    Remote Phone Control", Internet Draft <draft-dean-phonectl-03.txt>,
    IETF, Jan. 2001.  Work in progress.

9 Changes since -00

    - Removed many media-specific references.

    - Condensed discussion on mixing models, and VoiceXML discussion.

    - Moved the sample feature discussion to an Appendix

10 To Do

    - Add diagrams to section 4.3.1, 4.3.2, 4.3.2 and 4.3.3

    - Convert to XML

    - Fix references

    - Propose to move Appendix A (sample features) to service flows

    - Align terminology with conferencing drafts

    - Show roadmap for related drafts

      Other frameworks and requirements
         Conferencing framework
         Conferencing models
         Framework for markup

      Extensions
         REFER
         Replaces
         Join
         Caller prefs

      Packages
         conference-package
         dialog package

      Usage Drafts
         3pcc
         cc-transfer

      Informational Drafts
         Service flows

    - Define some semantics for authorization rules.  For example one
    could define a dictionary of primitives and/or perhaps define sets
    or classes of these primitives, then configure who is allowed to use
    them

11 Acknowledgments

    Thanks to all who attended the SIP interim meeting in February 2001
    for their support of the ideas behind this document.

12 Author's Addresses

    Rohan Mahy
    Cisco Systems
    170 West Tasman Dr, MS: SJC-21/3/3
    Phone: +1 408 526 8570
    Email: rohan@cisco.com

    Ben Campbell
    dynamicsoft
    5100 Tennyson Parkway
    Suite 1200
    Plano, Texas 75024
    Email: bcampbell@dynamicsoft.com

    Alan Johnston
    WorldCom
    100 S. 4th Street
    St. Louis, Missouri 63104
    Email: alan.johnston@wcom.com

    Daniel G. Petrie
    Pingtel Corp.
    400 W. Cummings Park
    Suite 2200
    Woburn, MA 01801
    Phone: +1 781 938 5306
    Email: dpetrie@pingtel.com

    Jonathan Rosenberg
    dynamicsoft
    72 Eagle Rock Avenue
    First Floor
    East Hanover, NJ 07936
    Email: jdrosen@dynamicsoft.com

    Robert J. Sparks
    dynamicsoft
    5100 Tennyson Parkway
    Suite 1200
    Plano, TX  75024
    Email: rsparks@dynamicsoft.com

    Full Copyright Statement

    "Copyright (C) The Internet Society (date). All Rights Reserved.
    This document and translations of it may be copied and furnished to
    others, and derivative works that comment on or otherwise explain it
    or assist in its implementation may be prepared, copied, published
    and distributed, in whole or in part, without restriction of any
    kind, provided that the above copyright notice and this paragraph
    are included on all such copies and derivative works. However, this
    document itself may not be modified in any way, such as by removing
    the copyright notice or references to the Internet Society or other
    Internet organizations, except as needed for the purpose of
    developing Internet standards in which case the procedures for
    copyrights defined in the Internet Standards process must be
    followed, or as required to translate it into languages other than
    English.

    The limited permissions granted above are perpetual and will not be
    revoked by the Internet Society or its successors or assigns.
    This document and the information contained herein is provided on an
    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
