draft-ietf-sipping-cc-framework-00.txt   draft-ietf-sipping-cc-framework-01.txt 
SIPPING Working Group Mahy/Cisco SIPPING Working Group Mahy/Cisco
Internet Draft Campbell/dynamicsoft Internet Draft Campbell/dynamicsoft
Document: draft-ietf-sipping-cc-framework-00.txt Johnston/Worldcom Document: draft-ietf-sipping-cc-framework-01.txt Johnston/Worldcom
February 2002 Petrie/Pingtel June 2002 Petrie/Pingtel
Rosenberg/dynamicsoft Rosenberg/dynamicsoft
Expires: August 2002 Sparks/dynamicsoft Expires: December 2002 Sparks/dynamicsoft
A Multi-party Application Framework for SIP A Multi-party Application Framework for SIP
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 [RFC2026]. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in as reference material or to cite them other than as "work in
progress." progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
skipping to change at page 2, line 12 skipping to change at page 2, line 12
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119]. document are to be interpreted as described in RFC-2119 [RFC2119].
SIP Multiparty Framework SIP Multiparty Framework
Table of Contents Table of Contents
1 Abstract.......................................................1 1 Abstract.......................................................1
2 Conventions used in this document..............................1 2 Conventions used in this document..............................1
3 Motivation and Background......................................4 3 Motivation and Background......................................4
3.1 Goals........................................................4 3.1 Goals........................................................4
3.2 Example Features.............................................6 3.2 Example Features............................................28
4 Key Concepts...................................................9 4 Key Concepts...................................................6
4.1 "Conversation Space" Model...................................9 4.1 "Conversation Space" Model...................................6
4.1.1 Comparison with Related Definitions.......................10 4.1.1 Comparison with Related Definitions........................7
4.2 Signaling Models............................................11 4.2 Signaling Models.............................................7
4.3 Mixing Models...............................................12 4.3 Mixing Models................................................8
4.3.1 (Single) End System Mixing................................12 4.3.1 (Single) End System Mixing.................................9
4.3.2 Centralized Mixing........................................12 4.3.2 Centralized Mixing.........................................9
4.3.3 Multicast and Multi-unicast conferences...................14 4.3.3 Multicast and Multi-unicast conferences...................10
4.4 Conveying Information and Events............................15 4.4 Conveying Information and Events............................11
4.5 Componentization and Decomposition..........................16 4.5 Componentization and Decomposition..........................13
4.5.1 Media Intermediaries......................................17 4.5.1 Media Intermediaries......................................13
4.5.2 Queue Server..............................................18 4.5.2 Queue Server..............................................14
4.5.3 Parking Place.............................................18 4.5.3 Parking Place.............................................14
4.5.4 Announcements and Voice Dialogs...........................19 4.5.4 Announcements and Voice Dialogs...........................14
4.6 Use of URIs.................................................21 4.6 Use of URIs.................................................16
4.6.1 Naming Users in SIP.......................................21 4.6.1 Naming Users in SIP.......................................17
4.6.2 Naming Services with SIP URIs.............................23 4.6.2 Naming Services with SIP URIs.............................18
4.7 Invoker Independence........................................26 4.7 Invoker Independence........................................21
4.8 Billing issues..............................................26 4.8 Billing issues..............................................21
5 Catalog of call control actions and sample features............26 5 Catalog of call control actions and sample features............22
5.1 Early Dialog Actions........................................27 5.1 Early Dialog Actions........................................22
5.1.1 Remote Answer.............................................27 5.1.1 Remote Answer.............................................22
5.1.2 Remote Forward or Put.....................................27 5.1.2 Remote Forward or Put.....................................22
5.1.3 Remote Busy or Error Out..................................27 5.1.3 Remote Busy or Error Out..................................23
5.2 Single Dialog Actions.......................................27 5.2 Single Dialog Actions.......................................23
5.2.1 Remote Dial...............................................28 5.2.1 Remote Dial...............................................23
5.2.2 Remote On and Off Hold....................................28 5.2.2 Remote On and Off Hold....................................23
5.2.3 Remote Hangup.............................................28 5.2.3 Remote Hangup.............................................23
5.3 Multi-dialog actions........................................28 5.3 Multi-dialog actions........................................23
5.3.1 Transfer..................................................28 5.3.1 Transfer..................................................23
5.3.2 Take......................................................29 5.3.2 Take......................................................24
5.3.3 Add.......................................................29 5.3.3 Add.......................................................25
5.3.4 Local Join................................................30 5.3.4 Local Join................................................25
5.3.5 Insert....................................................30 5.3.5 Insert....................................................26
5.3.6 Split.....................................................31 5.3.6 Split.....................................................26
5.3.7 Near-fork.................................................31 5.3.7 Near-fork.................................................26
5.3.8 Far fork..................................................31 5.3.8 Far fork..................................................27
6 Putting it all together.......................................33 6 Putting it all together.............Error! Bookmark not defined.
6.1 Feature Solutions...........................................34 6.1 Feature Solutions.................Error! Bookmark not defined.
6.1.1 Call Park.................................................34 6.1.1 Call Park.................................................32
6.1.2 Call Pickup...............................................34 6.1.2 Call Pickup...............................................32
6.1.3 Music on Hold.............................................35 6.1.3 Music on Hold.............................................33
6.1.4 Call Monitoring...........................................35 6.1.4 Call Monitoring...........................................33
6.1.5 Barge-in..................................................35 6.1.5 Barge-in..................................................33
6.1.6 Intercom..................................................35 6.1.6 Intercom..................................................33
6.1.7 Speakerphone paging.......................................36 6.1.7 Speakerphone paging.......................................34
6.1.8 Distinctive ring..........................................36 6.1.8 Distinctive ring..........................................34
6.1.9 Voice message screening...................................36 6.1.9 Voice message screening...................................34
6.1.10 Single Line Extension.....................................36 6.1.10 Single Line Extension.....................................34
6.1.11 Click-to-dial.............................................36 6.1.11 Click-to-dial.............................................34
6.1.12 Pre-paid calling..........................................37 6.1.12 Pre-paid calling..........................................35
6.1.13 Voice Portal..............................................37 6.1.13 Voice Portal..............................................35
7 Security Considerations.......................................38 7 Security Considerations.......................................27
8 References....................................................39 8 References....................................................36
9 Acknowledgments...............................................41 9 Acknowledgments...............................................39
10 Author's Addresses...........................................41 10 Author's Addresses...........................................39
3 Motivation and Background 3 Motivation and Background
The Session Initiation Protocol [SIP] was defined for the The Session Initiation Protocol [SIP] was defined for the
initiation, maintenance, and termination of sessions or calls initiation, maintenance, and termination of sessions or calls
between one or more users. However, despite its origins as a large- between one or more users. However, despite its origins as a large-
scale multiparty conferencing protocol, SIP is used today primarily scale multiparty conferencing protocol, SIP is used today primarily
for point to point calls. This two-party configuration is the focus for point to point calls. This two-party configuration is the focus
of the SIP specification and most of its extensions. of the SIP specification and most of its extensions.
skipping to change at page 4, line 28 skipping to change at page 4, line 28
conversation to perceive specific media relationships. In other conversation to perceive specific media relationships. In other
protocols that deal with the concept of calls, this manipulation is protocols that deal with the concept of calls, this manipulation is
known as call control. In addition to its dialog manipulation known as call control. In addition to its dialog manipulation
aspect, "call control" also includes communicating information and aspect, "call control" also includes communicating information and
events related to manipulating calls, including information and events related to manipulating calls, including information and
events dealing with session state and history, conference state, events dealing with session state and history, conference state,
user state, and even message state. user state, and even message state.
3.1 Goals 3.1 Goals
Based on input from the SIP community, the authors compiled the Based on input from the SIP community, the authors compiled the
following set of goals for SIP call control: following set of goals for SIP call control and multiparty
applications:
- Define Primitives, Not Services. Allow for a handful of robust - Define Primitives, Not Services. Allow for a handful of robust
yet simple mechanisms which can be combined to deliver features and yet simple mechanisms which can be combined to deliver features and
services. Throughout this document we refer to these simple services. Throughout this document we refer to these simple
mechanisms as "primitives". Primitives should be sufficiently mechanisms as "primitives". Primitives should be sufficiently
robust that when they are combined they can be used to build lots of robust that when they are combined they can be used to build lots of
services. However, the goal is not to define a provably complete services. However, the goal is not to define a provably complete
set of primitives. Note that while the IETF will NOT standardize set of primitives. Note that while the IETF will NOT standardize
behavior or services, it may define example services for behavior or services, it may define example services for
informational purposes, as in [service examples]. informational purposes, as in [service examples].
skipping to change at page 4, line 57 skipping to change at page 5, line 4
- Signaling Model independent: Support both a central control and a - Signaling Model independent: Support both a central control and a
peer-to-peer feature invocation model (and combinations of the two). peer-to-peer feature invocation model (and combinations of the two).
Baseline SIP already supports a centralized control model described Baseline SIP already supports a centralized control model described
in [3pcc], and the SIP community has expressed a great deal of in [3pcc], and the SIP community has expressed a great deal of
interest in peer-to-peer or distributed call control. Some such interest in peer-to-peer or distributed call control. Some such
primitives are already defined in [REFER] and [Replaces]. primitives are already defined in [REFER] and [Replaces].
- Mixing Model independent: The bulk of interesting multiparty - Mixing Model independent: The bulk of interesting multiparty
applications involve mixing or combining media from multiple applications involve mixing or combining media from multiple
participants. This mixing can be performed by one or more of the participants. This mixing can be performed by one or more of the
participants, or by a centralized mixing resource. The experience participants, or by a centralized mixing resource. The experience
of the participants should not depend on the mixing model used. of the participants should not depend on the mixing model used.
While most examples in this document refer to audio mixing, the While most examples in this document refer to audio mixing, the
framework applies to any media type. In this context a "mixer" framework applies to any media type. In this context a "mixer"
refers to combining media in an appropriate, media-specific way. refers to combining media in an appropriate, media-specific way.
- Invoker oriented. Only the user who invokes a feature or a service - Invoker oriented. Only the user who invokes a feature or a service
needs to know exactly which service is invoked or why. This is good needs to know exactly which service is invoked or why. This is good
because it allows new services to be created without requiring new because it allows new services to be created without requiring new
primitives from all the participants; and it allows for much simpler primitives from all the participants; and it allows for much simpler
feature authorization policies, for example, when participation feature authorization policies, for example, when participation
skipping to change at page 5, line 45 skipping to change at page 5, line 46
- Make use of SIP headers and SIP event packages to provide SIP - Make use of SIP headers and SIP event packages to provide SIP
entities with information about their environment. These should entities with information about their environment. These should
include information about the status / handling of dialogs on other include information about the status / handling of dialogs on other
user agents, information about the history of other contacts user agents, information about the history of other contacts
attempted prior to the current contact, the status of participants, attempted prior to the current contact, the status of participants,
the status of conferences, user presence information, and the status the status of conferences, user presence information, and the status
of messages. of messages.
- Encourage service decomposition, and design to make use of - Encourage service decomposition, and design to make use of
standard components using well-defined, simple interfaces. Sample standard components using well-defined, simple interfaces. Sample
components include a media mixer, recording service, announcement components include a SIP mixer, recording service, announcement
server, and voice dialog server. (This is not an exhaustive list). server, and voice dialog server. (This is not an exhaustive list).
- Include authentication, authorization, policy, logging, and - Include authentication, authorization, policy, logging, and
accounting mechanisms to allow these primitives to be used safely accounting mechanisms to allow these primitives to be used safely
among mutually untrusted participants. Some of these mechanisms may among mutually untrusted participants. Some of these mechanisms may
be used to assist in billing, but no specific billing system will be be used to assist in billing, but no specific billing system will be
endorsed. endorsed.
- Permit graceful fallback to baseline SIP. Definitions for new SIP - Permit graceful fallback to baseline SIP. Definitions for new SIP
call control extensions/primitives MUST describe a graceful way to call control extensions/primitives MUST describe a graceful way to
fall back to baseline SIP behavior. Support for one primitive MUST fall back to baseline SIP behavior. Support for one primitive MUST
NOT imply support for another primitive. NOT imply support for another primitive.
- Do not reinvent traditional models, such as the model used by the - There is no desire or goal to reinvent traditional models, such as
H.450 family of protocols, JTAPI, or the CSTA call model. In the the model used by the [H.450] family of protocols, [JTAPI], or the
opinion of the authors, these models share more characteristics with [CSTA] call model, as these other models do not share the design
the traditional telephone network than with SIP. As these other goals presented in this document.
models do not share the design goals presented in this document, it
would be a disservice to these other protocols and SIP to try to
shoehorn our new design goals into an existing model.
3.2 Example Features
Primitives are defined in terms of their ability to provide
features. These example features should require an amply robust set
of services to demonstrate a useful set of primitives. They are
described here briefly. Note that the descriptions of these features
are non-normative. Some of these features are used as examples in
section 6 to demonstrate how some features may require certain media
relationships. Note also that this document describes a mixture of
both features originating in the world of telephones, and features
which are clearly Internet oriented.
Example Features:
Call Waiting - Alice is in a call, then receives another call.
Alice can place the first call on hold, and talk with the other
caller. She can typically switch back and forth between the
callers.
Blind Transfer - Alice is in a conversation with Bob. Alice asks
Bob to contact Carol, but makes no attempt to contact Carol
independently. In many implementations, Alice does not verify Bob's
success or failure in contacting Carol.
Attended Transfer - The transferring party establishes a session
with the transfer target before completing the transfer.
Consultative transfer - The transferring party establishes a session
with the target and mixes both sessions together so that all three
parties can participate, then disconnects leaving the transferee and
transfer target with an active session.
Conference Call - Three or more active, visible participants in the
same conversation space.
Call Park - A call participant parks a call (essentially puts the
call on hold), and then retrieves it at a later time (typically from
another location).
Call Pickup - A party picks up a call that was ringing at another
location. One variation allows the caller to choose which location,
another variation just picks up any call in that user's "pickup
group".
Music on Hold - When Alice places her call with Bob on hold, Bob
hears streaming content such as music, announcements, or
advertisements in place of Alice's audio.
Call Monitoring - A call center supervisor joins an in-progress call
for monitoring purposes.
Barge-in - Carol interrupts Alice, who has a call in progress
with Bob. In some variations, Alice forcibly joins a new
conversation with Carol; in other variations, all three parties are
placed in the same conversation (basically a 3-way conference).
Hotline - Alice picks up a phone and is immediately connected to the
technical support hotline, for example.
Autoanswer - Calls to a certain address or location answer
immediately via a speakerphone.
Intercom - Alice typically presses a button on a phone which
immediately connects to another user or phone and causes that phone
to play her voice over its speaker. Some variations immediately
set up two-way communications; other variations require another
button to be pressed to enable a two-way conversation.
Speakerphone paging - Alice calls the paging address and speaks.
Her voice is played on the speaker of every idle phone in a
preconfigured group of phones.
Speed dial - Alice dials an abbreviated number, or enters an alias,
or presses a special speed dial button representing Bob. Her action
is interpreted as if she specified the full address of Bob.
Call Return - Alice calls Bob. Bob misses the call or is
disconnected before he is finished talking to Alice. Bob invokes
Call Return, which calls Alice, even if Alice did not provide her
real identity or location to Bob.
Inbound Call Screening - Alice doesn't want to receive calls from
Matt. Inbound Screening prevents Matt from disturbing Alice. In
some variations this works even if Matt hides his identity.
Outbound Call Screening - Alice is paged and unknowingly calls a
PSTN pay-service telephone number in the Caribbean, but local policy
blocks her call, and possibly informs her why.
Call Forwarding - Before a call-leg is accepted, it is redirected to
another location, for example, because the originally intended
recipient is busy, does not answer, is disconnected from the
network, or has configured all requests to go somewhere else.
Message Waiting - Bob calls Alice while she is away from her
phone. When she returns, a visible or audible indicator conveys that
someone has left her a voicemail message. The message waiting
indication may also convey how many messages are waiting, from whom,
at what time, and other useful pieces of information.
Do Not Disturb - Alice selects the Do Not Disturb option. Calls to
her either ring briefly or not at all and are forwarded elsewhere.
Some variations allow specially authorized callers to override this
feature and ring Alice anyway.
Distinctive ring - Incoming calls have different ring cadences or
sample sounds depending on the From party, the To party, or other
factors.
Automatic Callback: Alice calls Bob, but Bob is busy. Alice would
like Bob to call her automatically when he is available. When Bob
hangs up, Alice's phone rings. When Alice answers, Bob's phone
rings. Bob answers and they talk.
Find-Me - Alice sets up complicated rules for how she can be reached
(possibly using [CPL], [presence] or other factors). When Bob calls
Alice, his call is eventually routed to a temporary Contact where
Alice happens to be available.
Whispered call waiting - Alice is in a conversation with Bob. Carol
calls Alice. Either Carol can "whisper" to Alice directly ("Can you
get lunch in 15 minutes?"), or an automaton whispers to Alice
informing her that Carol is trying to reach her.
Voice message screening - Bob calls Alice. Alice is screening her
calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob
leave his message. If she decides to talk to Bob, she can take the
call back from the voicemail system, otherwise she can let Bob leave
a message. This emulates the behavior of a home telephone answering
machine.
Presence-Enabled Conferencing: Alice wants to set up a conference
call with Bob and Cathy when they all happen to be available (rather
than scheduling a predefined time). The server providing the
application monitors their status, and calls all three when they are
all "online", not idle, and not in another call.
IM Conference Alerts: A user receives a notification as an Instant
Message whenever someone joins a conference they are also in.
Single Line Extension -- A group of phones are all treated as
"extensions" of a single line. A call for one rings them all. As
soon as one answers, the others stop ringing. If any extension is
actively in a conversation, another extension can "pick up" and
immediately join the conversation. This emulates the behavior of a
home telephone line with multiple phones.
Click-to-dial - Alice looks in her company directory for Bob. When
she finds Bob, she clicks on a URL to call him. Her phone rings (or
possibly answers automatically), and when she answers, Bob's phone
rings.
Pre-paid calling - Alice pays for a certain currency or unit amount
of calling value. When she places a call, she provides her account
number somehow. If her account runs out of calling value during a
call, her call is disconnected or redirected to a service where she
can purchase more calling value.
Voice Portal - A service that allows users to access a portal site
using spoken dialog interaction. For example, Alice needs to
schedule a working dinner with her co-worker Carol. Alice uses a
voice portal to check Carol's flight schedule, find a restaurant
near her hotel, make a reservation, get directions there, and page
Carol with this information.
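The three transfer variants above differ mainly in which conversation
spaces (the abstraction introduced in section 4.1) exist at each step.
As a minimal sketch, assuming each conversation space is modeled as a
Python frozenset of participant names and each step as the set of
spaces that exist at that moment, the sequences below trace those
steps. The function names are hypothetical and correspond to no SIP
API; that Bob is on hold during an attended transfer is an assumption
about typical behavior.

```python
# A sketch, assuming a conversation space is a frozenset of participant
# names and each step is the set of spaces existing at that moment.
# Function names are hypothetical illustrations, not a SIP API.

def space(*names):
    return frozenset(names)

def blind_transfer(a, b, c):
    # Alice asks Bob to contact Carol and leaves; she never talks to Carol.
    return [{space(a, b)}, {space(b, c)}]

def attended_transfer(a, b, c):
    # Alice talks to Carol first (Bob typically on hold), then completes.
    return [{space(a, b)}, {space(a, b), space(a, c)}, {space(b, c)}]

def consultative_transfer(a, b, c):
    # Alice mixes both sessions so all three talk, then disconnects.
    return [
        {space(a, b)},
        {space(a, b), space(a, c)},
        {space(a, b, c)},
        {space(b, c)},
    ]

if __name__ == "__main__":
    for variant in (blind_transfer, attended_transfer, consultative_transfer):
        steps = variant("Alice", "Bob", "Carol")
        print(variant.__name__, [sorted(sorted(s) for s in step) for step in steps])
```

All three variants end in the same state, a single space containing
Bob and Carol; only the intermediate spaces differ.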
4 Key Concepts 4 Key Concepts
4.1 "Conversation Space" Model 4.1 "Conversation Space" Model
This document introduces the concept of an abstract "conversation This document introduces the concept of an abstract "conversation
space" (essentially a set of participants who believe they are space" (essentially a set of participants who believe they are
all communicating among one another). Each conversation space all communicating among one another). Each conversation space
contains one or more participants. contains one or more participants.
skipping to change at page 10, line 5 skipping to change at page 6, line 52
Participants may also be active or passive. Active participants are Participants may also be active or passive. Active participants are
expected to be intelligent enough to leave a conversation space when expected to be intelligent enough to leave a conversation space when
they no longer desire to participate. (An attentive human they no longer desire to participate. (An attentive human
participant is obviously active.) Some robotic participants (such participant is obviously active.) Some robotic participants (such
as a voice messaging system, an instant messaging agent, or a voice as a voice messaging system, an instant messaging agent, or a voice
dialog system) may be active participants if they can leave the dialog system) may be active participants if they can leave the
conversation space when there is no human interaction. Other robots conversation space when there is no human interaction. Other robots
(for example our tone generating robot from the previous example) (for example our tone generating robot from the previous example)
are passive participants. A human participant "on-hold" is passive. are passive participants. A human participant "on-hold" is passive.
An example diagram of a conversation space can be shown as a An example diagram of a conversation space can be shown as a
"bubble" or oval, or as a "set" in curly or square brace notation. "bubble" or oval, or as a "set" in curly or square brace notation.
Each set, oval, or "bubble" represents a conversation space. Hidden Each set, oval, or "bubble" represents a conversation space. Hidden
participants are shown in lowercase letters. participants are shown in lowercase letters.
{ A , B } [ A , B ] { A , B } [ A , B ]
.-. .---. .-. .---.
/ \ / \ / \ / \
/ A \ / A b \ / A \ / A b \
( ) ( ) ( ) ( )
\ B / \ C D / \ B / \ C D /
\ / \ / \ / \ /
'-' '---' '-' '---'
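The set notation above maps directly onto a data structure. A minimal
sketch, assuming a participant needs only a name plus the hidden and
active flags described in this section (the Participant class and the
render and passive functions are hypothetical), reproduces the
notation, including lowercase for hidden participants:

```python
from dataclasses import dataclass

# A sketch, assuming a participant is fully described by a name and the
# hidden/active flags of this section. All names here are hypothetical.

@dataclass(frozen=True)
class Participant:
    name: str
    hidden: bool = False   # hidden participants are drawn in lowercase
    active: bool = True    # e.g. a human "on-hold" is passive

def render(conversation_space):
    """Render a conversation space in the curly-brace notation above."""
    ordered = sorted(conversation_space, key=lambda p: p.name.upper())
    labels = [p.name.lower() if p.hidden else p.name.upper() for p in ordered]
    return "{ " + " , ".join(labels) + " }"

def passive(conversation_space):
    """Names of participants not expected to leave the space on their own."""
    return sorted(p.name for p in conversation_space if not p.active)

two_party = {Participant("A"), Participant("B")}
four_party = {Participant("A"), Participant("B", hidden=True),
              Participant("C"), Participant("D")}

print(render(two_party))   # { A , B }
print(render(four_party))  # { A , b , C , D }
```

The frozen dataclass makes each participant hashable, so a
conversation space can be an ordinary Python set.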
4.1.1 Comparison with Related Definitions 4.1.1 Comparison with Related Definitions
skipping to change at page 11, line 5 skipping to change at page 7, line 51
A locally mixed three-way call is two sessions and two call- A locally mixed three-way call is two sessions and two call-
legs. It is also a single conversation space. legs. It is also a single conversation space.
A simple dial-in audio conference is a single conversation A simple dial-in audio conference is a single conversation
space, but is represented by as many call-legs and sessions as space, but is represented by as many call-legs and sessions as
there are human participants. there are human participants.
A multicast conference is a single conversation space, a single A multicast conference is a single conversation space, a single
session, and as many call-legs as participants. session, and as many call-legs as participants.
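The comparisons above can be restated as counts of conversation
spaces, sessions, and call-legs. This is only a sketch; the Shape type
and the function names are hypothetical, and the numbers come straight
from the three examples in the text.

```python
from typing import NamedTuple

# A sketch restating the comparisons above as counts; Shape and the
# function names are hypothetical.

class Shape(NamedTuple):
    conversation_spaces: int
    sessions: int
    call_legs: int

def locally_mixed_three_way() -> Shape:
    # One UA mixes locally, with a separate leg and session to each peer.
    return Shape(conversation_spaces=1, sessions=2, call_legs=2)

def dial_in_conference(humans: int) -> Shape:
    # Every human participant has its own call-leg and its own session.
    return Shape(conversation_spaces=1, sessions=humans, call_legs=humans)

def multicast_conference(participants: int) -> Shape:
    # One shared multicast session, but a call-leg per participant.
    return Shape(conversation_spaces=1, sessions=1, call_legs=participants)

print(dial_in_conference(5))
# Shape(conversation_spaces=1, sessions=5, call_legs=5)
```

In every case there is exactly one conversation space, which is the
point of the comparison: sessions and call-legs describe signaling and
media plumbing, not what the participants perceive.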
4.2 Signaling Models 4.2 Signaling Models
Obviously to make changes to a conversation space, you must be able Obviously to make changes to a conversation space, you must be able
to use SIP signaling to cause these changes. Specifically there to use SIP signaling to cause these changes. Specifically there
must be a way to manipulate SIP dialogs (call legs) to move must be a way to manipulate SIP dialogs (call legs) to move
participants into and out of conversation spaces. Although this is participants into and out of conversation spaces. Although this is
not as obvious, there also must be a way to manipulate SIP dialogs not as obvious, there also must be a way to manipulate SIP dialogs
to include non-participant user agents which are otherwise involved to include non-participant user agents which are otherwise involved
in a conversation space (ex: B2BUAs, 3pcc controllers, mixers, in a conversation space (ex: B2BUAs, 3pcc controllers, mixers,
transcoders, translators, or relays). transcoders, translators, or relays).
Implementations may set up the media relationships described in the Implementations may set up the media relationships described in the
conversation space model using the approach described in [3pcc]. The conversation space model using the approach described in [3pcc]. The
3pcc approach relies on only the following 3 primitive operations: 3pcc approach relies on only the following 3 primitive operations:
Create a new call-leg (INVITE) Create a new call-leg (INVITE)
Modify a call-leg (reINVITE) Modify a call-leg (reINVITE)
Destroy a call-leg (BYE) Destroy a call-leg (BYE)
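The three 3pcc primitives can be modelled as operations on a controller's table of call-legs. The following Python sketch is illustrative only. The class `ThirdPartyController`, its method names, and the `leg-N` Call-ID format are hypothetical. The sketch records which SIP method each primitive would send; it does not generate real SIP messages or SDP.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class CallLeg:
    """One dialog between the controller and a remote user agent."""
    call_id: str
    remote_uri: str
    sdp: str | None = None  # last session description offered on this leg


@dataclass
class ThirdPartyController:
    """Toy 3pcc controller: every change is an INVITE, a reINVITE, or a BYE."""
    legs: dict[str, CallLeg] = field(default_factory=dict)
    sent: list[tuple[str, str]] = field(default_factory=list)  # (method, target)
    _counter: itertools.count = field(default_factory=itertools.count)

    def create(self, remote_uri: str, sdp: str | None = None) -> str:
        """Create a new call-leg by sending an initial INVITE."""
        call_id = f"leg-{next(self._counter)}"  # hypothetical Call-ID format
        self.legs[call_id] = CallLeg(call_id, remote_uri, sdp)
        self.sent.append(("INVITE", remote_uri))
        return call_id

    def modify(self, call_id: str, sdp: str) -> None:
        """Modify an existing call-leg by sending a reINVITE with new SDP."""
        leg = self.legs[call_id]
        leg.sdp = sdp
        self.sent.append(("reINVITE", leg.remote_uri))

    def destroy(self, call_id: str) -> None:
        """Destroy a call-leg by sending BYE and forgetting the dialog."""
        leg = self.legs.pop(call_id)
        self.sent.append(("BYE", leg.remote_uri))
```

In this model, moving a participant's media to a different mixer could be expressed with `modify` alone. A reINVITE carrying new SDP redirects the media without tearing down the dialog.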
skipping to change at page 12, line 5 skipping to change at page 8, line 50
Replace an existing dialog Replace an existing dialog
Join a new dialog with an existing dialog [Join] Join a new dialog with an existing dialog [Join]
Fork a new dialog with an existing dialog Fork a new dialog with an existing dialog
Locally do media forking (multi-unicast) Locally do media forking (multi-unicast)
Ask another UA to send a request on your behalf Ask another UA to send a request on your behalf
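The Replace and Join operations both name an existing dialog inside a new INVITE. The sketch below builds such a header using the syntax later standardized in RFC 3891 (Replaces) and RFC 3911 (Join); this may differ from the drafts cited here. The function name `dialog_header` is hypothetical, and the Call-ID and tag values in the usage line are made up.

```python
def dialog_header(name: str, call_id: str, to_tag: str, from_tag: str,
                  early_only: bool = False) -> str:
    """Build a Replaces or Join header that identifies an existing dialog.

    The tags are given from the point of view of the UA receiving the new
    INVITE: to-tag is matched against its local tag and from-tag against
    its remote tag.
    """
    if name not in ("Replaces", "Join"):
        raise ValueError("only Replaces and Join identify dialogs this way")
    value = f"{call_id};to-tag={to_tag};from-tag={from_tag}"
    if early_only:
        if name != "Replaces":
            raise ValueError("early-only is defined for Replaces only")
        value += ";early-only"
    return f"{name}: {value}"
```

For example, `dialog_header("Replaces", "98732@sip.example.com", "ff87ff", "r33th4x0r")` yields `Replaces: 98732@sip.example.com;to-tag=ff87ff;from-tag=r33th4x0r`.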
Many of the features, primitives, and actions described in this Many of the features, primitives, and actions described in this
document also require some type of media mixing, combining, or document also require some type of media mixing, combining, or
selection as described in the next section. selection as described in the next section.
4.3 Mixing Models 4.3 Mixing Models
SIP permits a variety of mixing models, which are discussed here SIP permits a variety of mixing models, which are discussed here
briefly. This topic is discussed more thoroughly in [conf-models]. briefly. This topic is discussed more thoroughly in [conf-models].
For brevity, only the two most popular conferencing models are For brevity, only the two most popular conferencing models are
significantly discussed in this document (local and centralized significantly discussed in this document (local and centralized
mixing). Applications of the conversation spaces model to multicast mixing). Applications of the conversation spaces model to multicast
and multi-unicast (full unicast mesh) conferences are left as an and multi-unicast (full unicast mesh) conferences are left as an
exercise for the reader. Note that a distributed full mesh exercise for the reader. Note that a distributed full mesh
conference can be used for basic conferences, but does not easily conference can be used for basic conferences, but does not easily
allow for more complex conferencing actions like splitting, joining, allow for more complex conferencing actions like splitting, joining,
and forking. and forking.
Call control features should be designed to allow a mixer (local or Call control features should be designed to allow a mixer (local or
centralized) to decide when to reduce a conference back to a 2-party centralized) to decide when to reduce a conference back to a 2-party
call, or drop all the participants (for example if only two call, or drop all the participants (for example if only two
automatons are communicating). The actual heuristics used to automatons are communicating). The actual heuristics used to
release calls are beyond the scope of this document, but may depend release calls are beyond the scope of this document, but may depend
on properties in the conversation space, such as the number of on properties in the conversation space, such as the number of
active, passive, or hidden participants; and the send-only, receive- active, passive, or hidden participants; and the send-only, receive-
skipping to change at page 12, line 39 skipping to change at page 9, line 28
4.3.1 (Single) End System Mixing 4.3.1 (Single) End System Mixing
The first model we call "end system mixing". In this model, user A The first model we call "end system mixing". In this model, user A
calls user B, and they have a conversation. At some point later, A calls user B, and they have a conversation. At some point later, A
decides to conference in user C. To do this, A calls C, using a decides to conference in user C. To do this, A calls C, using a
completely separate SIP call. This call uses a different Call-ID, completely separate SIP call. This call uses a different Call-ID,
different tags, etc. There is no call set up directly between B and different tags, etc. There is no call set up directly between B and
C. No SIP extension or external signaling is needed. A merely C. No SIP extension or external signaling is needed. A merely
decides to locally join two call-legs. decides to locally join two call-legs.
[diagram] B C
\ /
\ /
A
A receives media streams from both B and C, and mixes them. A sends A receives media streams from both B and C, and mixes them. A sends
a stream containing A's and C's streams to B, and a stream a stream containing A's and C's streams to B, and a stream
containing A's and B's streams to C. Basically, user A handles both containing A's and B's streams to C. Basically, user A handles both
signaling and media mixing. B and C are unaware of the multi-party signaling and media mixing.
call, from a SIP perspective at least. From an RTP perspective, A is
a mixer, and so the RTCP reports from A will contain SDES
information that indicates the existence of an additional party in
the media stream.
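End system mixing needs no SIP extension. A keeps two ordinary, unrelated dialogs and associates them locally. The sketch below models a dialog by its Call-ID and tags. For each remote party, it computes which sources A mixes into the stream A sends there. `DialogId` and `local_join` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogId:
    """A SIP dialog is identified by its Call-ID plus the local and remote tags."""
    call_id: str
    local_tag: str
    remote_tag: str


def local_join(me: str, legs: dict[str, DialogId]) -> dict[str, set[str]]:
    """Return, for each remote party, the sources the end system mixes for it.

    `legs` maps each remote party to its own dialog. Joining them is purely
    local: no dialog is modified and no SIP message is sent.
    """
    call_ids = [leg.call_id for leg in legs.values()]
    if len(set(call_ids)) != len(call_ids):
        # The legs are separate SIP calls, so their Call-IDs must differ.
        raise ValueError("each leg must be a separate call")
    everyone = {me, *legs}
    # Each remote party hears everyone but itself; A mixes in its own voice.
    return {peer: everyone - {peer} for peer in legs}
```

With legs to B and C, `local_join("A", legs)` gives B the sources {A, C} and gives C the sources {A, B}, matching the description above.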
4.3.2 Centralized Mixing 4.3.2 Centralized Mixing
In a centralized mixing model, all participants have a pairwise SIP In a centralized mixing model, all participants have a pairwise SIP
and media relationship with the mixer. Three applications of and media relationship with the mixer. Three applications of
centralized mixing are also discussed below. centralized mixing are also discussed below.
[diagram] [diagram]
4.3.2.1 Dial-In Conference Servers 4.3.2.1 Dial-In Conference Servers
Dial-In conference servers closely mirror dial-in conference bridges Dial-In conference servers closely mirror dial-in conference bridges
in the traditional PSTN. A dial-in conference server acts as a in the traditional PSTN. A dial-in conference server acts as a
normal SIP UA. Users call it, and the server maintains point to normal SIP UA. Users call it, and the server maintains point to
point SIP relationships with each user that calls in. The server point SIP relationships with each user that calls in. The server
takes the media from the users who dial into the same conference, takes the media from the users who dial into the same conference,
mixes them, and sends out the appropriate mixed stream to each mixes them, and sends out the appropriate mixed stream to each
participant separately. The model is depicted in Figure 3. Note that participant separately.
each UA (A,B,C,D) has a point to point SIP and RTP relationship with
the conference server. Each call has a different Call-ID. Each user
sends their own media to the server. The media delivered to user A
by the server is the media mixed from users B, C and D. The media
delivered to user B by the server is the media mixed from users A, C
and D. The media delivered to user C by the server is the media
mixed from users A, B and D. The media delivered to user D is the
media mixed from users A, B and C (this is also known as a mix-minus
configuration).
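The mix-minus arithmetic can be sketched for one frame of 16-bit linear PCM. The sketch assumes the frames are already decoded and time-aligned; jitter buffering and decoding are omitted. It sums all inputs once and subtracts each participant's own samples, which avoids recomputing N-1 separate sums. `mix_minus` is a hypothetical name. Saturating to the 16-bit range is one common choice, not something the framework specifies.

```python
def clip16(sample: int) -> int:
    """Saturate a sample to the signed 16-bit range."""
    return max(-32768, min(32767, sample))


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Compute an N-1 mix for every participant from equal-length PCM frames."""
    lengths = {len(frame) for frame in frames.values()}
    if len(lengths) > 1:
        raise ValueError("all frames must have the same number of samples")
    n = lengths.pop() if lengths else 0
    # Exact (unclipped) sum of every participant, sample by sample.
    total = [sum(frame[i] for frame in frames.values()) for i in range(n)]
    # Each participant hears the total minus its own contribution.
    return {
        user: [clip16(total[i] - own[i]) for i in range(n)]
        for user, own in frames.items()
    }
```

With inputs A=[1, 2], B=[10, 20], C=[100, 200], and D=[1000, 2000], user A receives [1110, 2220] and user D receives [111, 222].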
As in other applications of centralized mixing, the conference is As in other applications of centralized mixing, the conference is
identified by the request URI of the calls from each participant. identified by the request URI of the calls from each participant.
This provides numerous advantages from a services and routing point This provides numerous advantages from a services and routing point
of view [9]. For example, one conference on the server might be of view. For example, one conference on the server might be known as
known as sip:conference34@servers.com. All users who call
sip:conference34@servers.com. All users who call
sip:conference34@servers.com are mixed together. Dial-In conference sip:conference34@servers.com are mixed together. Dial-In conference
servers are usually associated with pre-arranged conferences. servers are usually associated with pre-arranged conferences.
However, the same model applies to ad-hoc conferences. An ad-hoc However, the same model applies to ad-hoc conferences. An ad-hoc
conference server creates the conference state when the first user conference server creates the conference state when the first user
joins, and destroys it when the last one leaves. The SIP and RTP joins, and destroys it when the last one leaves. The SIP interface
interfaces are identical to the pre-arranged case. is identical to the pre-arranged case.
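Because the conference is identified by the Request-URI, a server can key its conference state on that URI. The following sketch shows the ad-hoc behaviour described above: state is created when the first caller arrives and destroyed when the last one leaves. `ConferenceServer` and its methods are hypothetical. For simplicity the sketch compares URIs as exact strings. A real server would apply SIP's URI comparison rules (RFC 3261, Section 19.1.4).

```python
class ConferenceServer:
    """Toy dial-in/ad-hoc server: one room per Request-URI."""

    def __init__(self) -> None:
        self.rooms: dict[str, set[str]] = {}  # Request-URI -> participant URIs

    def on_invite(self, request_uri: str, caller: str) -> set[str]:
        """Add the caller (creating the room if needed); return who it will hear."""
        room = self.rooms.setdefault(request_uri, set())
        room.add(caller)
        return room - {caller}

    def on_bye(self, request_uri: str, caller: str) -> None:
        """Remove the caller and destroy the room when it becomes empty."""
        room = self.rooms.get(request_uri)
        if room is None:
            return
        room.discard(caller)
        if not room:
            del self.rooms[request_uri]
```

In this sketch, a pre-arranged conference would differ only in that its room exists before anyone calls. The SIP interface seen by callers stays the same.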
4.3.2.2 Ad-hoc Centralized Conferences 4.3.2.2 Ad-hoc Centralized Conferences
In an ad-hoc centralized conference, two users A and B start with a In an ad-hoc centralized conference, two users A and B start with a
normal SIP call. At some point later, they decide to add a third normal SIP call. At some point later, they decide to add a third
party. Instead of using end system mixing, they would prefer to use party. Instead of using end system mixing, they would prefer to use
a conference server. Initially, A calls B. At some point, B decides a central SIP mixer. Initially, A calls B. At some point, B decides
to add user C to the call, and begins the transition to a conference to add user C to the call, and begins the transition to a conference
server. The first step in this process is the discovery of a server. The first step in this process is the discovery of a
conference server that supports ad-hoc conferences. This can be done conference server that supports ad-hoc conferences. This can be done
through static configuration, or through any of a number of standard through static configuration, or through any of a number of standard
service discovery protocols, such as the Service Location Protocol service discovery protocols, such as the Service Location Protocol
[SLP]. Once the server is discovered, a conference ID is chosen. [SLP]. Once the server is discovered, a conference ID is chosen. The
This ID must be globally unique. The conference ID is then prepended first participant to send an INVITE to this URL creates the initial
to the server, and a SIP URL for the ad-hoc conference is formed. conference state in the server. SIP dialogs are manipulated (using
For example, if the server "a.servers.com" is used, and the unique any combination of 3pcc or peer-to-peer signaling) so that each
ID is "a7hytaskp09878a", the SIP URL for this conference is participant is sending media to the conference server. It is also
sip:a7hytaskp09878a@a.servers.com. The first participant to send an possible to transition from an end system mixed conference (even one
INVITE to this URL creates the initial conference state in the with a complex connection topology) to a centralized conference
server. SIP dialogs are manipulated (using any combination of 3pcc server.
or peer-to-peer signaling) so that each participant is sending media
to the conference server. It is also possible to transition from an
end system mixed conference (even one with a complex connection
topology) to a centralized conference server.
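Forming the ad-hoc conference URL is a matter of putting a unique ID in the user part of a SIP URI at the chosen server. This sketch uses a random version 4 UUID as the conference ID. That choice is an assumption. The text only requires global uniqueness, and random IDs are unique with overwhelming probability rather than by construction. The function name `adhoc_conference_uri` is hypothetical.

```python
import uuid


def adhoc_conference_uri(server_host: str) -> str:
    """Return a fresh ad-hoc conference URI such as sip:<id>@a.servers.com."""
    # uuid4().hex is 32 lowercase hex digits (122 random bits), all of which
    # are unreserved characters and safe in the user part of a SIP URI.
    conf_id = uuid.uuid4().hex
    return f"sip:{conf_id}@{server_host}"
```

The first participant to INVITE the returned URI would then cause the server to create the conference state.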
4.3.2.3 Dial-Out Conferences 4.3.2.3 Dial-Out Conferences
Dial-out conferences are a simple variation on dial-in conferences. Dial-out conferences are a simple variation on dial-in conferences.
Instead of the users joining the conference by sending an INVITE to Instead of the users joining the conference by sending an INVITE to
the server, the server chooses the users who are to be members of the server, the server chooses the users who are to be members of
the conference, and then sends them the INVITE. Typically, dial-out the conference, and then sends them the INVITE. Typically, dial-out
conferences are pre-arranged, with specific start times and an conferences are pre-arranged, with specific start times and an
initial group membership list. However, there are other means for initial group membership list. However, there are other means for
the dial-out server to determine the list of participants, including the dial-out server to determine the list of participants, including
user presence [13]. The model in no way limits the means by which user presence [13]. Once the users accept or reject the call from
the server determines the set of users. Once the users accept or the dial-out server, the behavior of this system is identical to the
reject the call from the dial-out server, the behavior of this dial-in server case.
system is identical to the dial-in server case of Section 4. Thus, a
dial-out conference server will generally need to support dial-in
access for the same conference, if it wishes to allow joining after
the conference begins. Note that, from the participants' perspective,
they will learn the conference identity (the URL) from the From
field in the INVITE messages received from the server.
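To illustrate how a participant learns the conference identity from the From field, the sketch below builds a minimal dial-out INVITE and then reads the URI back on the participant's side. The header set follows RFC 3261's mandatory request headers. Transport, SDP negotiation, and the function names `dial_out_invite` and `conference_identity` are assumptions made for illustration. Using the conference URI as the Contact is a simplification.

```python
import uuid


def dial_out_invite(conference_uri: str, participant_uri: str,
                    server_host: str, sdp: str = "") -> str:
    """Build a minimal INVITE from a dial-out conference server."""
    headers = [
        f"INVITE {participant_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {server_host};branch=z9hG4bK{uuid.uuid4().hex[:16]}",
        "Max-Forwards: 70",
        f"To: <{participant_uri}>",
        # The conference URI goes in From, so the callee can learn it there.
        f"From: <{conference_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 INVITE",
        f"Contact: <{conference_uri}>",
    ]
    if sdp:
        headers.append("Content-Type: application/sdp")
    headers.append(f"Content-Length: {len(sdp.encode('utf-8'))}")
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


def conference_identity(message: str) -> str:
    """Participant side: extract the URI from the From header (angle-bracket form)."""
    head = message.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() in ("from", "f"):  # "f" is the compact form
            return value[value.index("<") + 1:value.index(">")]
    raise ValueError("no From header")
```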
4.3.3 Multicast and Multi-unicast conferences 4.3.3 Multicast and Multi-unicast conferences
In these models, all endpoints send media to all other endpoints. In these models, all endpoints send media to all other endpoints.
Consequently, every endpoint mixes for itself the media from all the other Consequently, every endpoint mixes for itself the media from all the other
sources, and sends its own media to every other participant. sources, and sends its own media to every other participant.
[diagrams] [diagrams]
4.3.3.1 Large-Scale Multicast Conferences 4.3.3.1 Large-Scale Multicast Conferences
Large-scale multicast conferences were the original motivation for Large-scale multicast conferences were the original motivation for
both the Session Description Protocol [SDP] and SIP. In a large- both the Session Description Protocol [SDP] and SIP. In a large-
scale multicast conference, one or more multicast addresses are scale multicast conference, one or more multicast addresses are
allocated to the conference (more than one may be needed if layered
encodings are in use). Each participant joins those multicast groups,
and sends their media to those groups. Signaling is not sent to the
multicast groups. The sole purpose of the signaling is to inform
participants of which multicast groups to join. Large-scale
multicast conferences are usually pre-arranged, with specific start
and stop times (which is why this information exists in SDP).
Protocols such as the Session Announcement Protocol [SAP] are used
to announce these conferences. However, multicast conferences do not
need to be pre-arranged, so long as a mechanism exists to
dynamically obtain a multicast address. So, if there are N
participants, there will be point-to-point SIP relationships with
pairs of participants. Each participant sends a single media stream
to the group, and receives up to N-1 streams at any time. Note that
the number of streams that a user will receive depends on who is
actually sending at any given time. If the stream is audio, and
silence suppression is utilized, the number of streams a user will
receive at any given time is equal to the number of users talking at
any given time. Even for very large conferences, this is usually
just a small number of users.
allocated to the conference. Each participant joins those multicast
groups, and sends their media to those groups. Signaling is not sent
to the multicast groups. The sole purpose of the signaling is to
inform participants of which multicast groups to join. Large-scale
multicast conferences are usually pre-arranged, with specific start
and stop times. However, multicast conferences do not need to be
pre-arranged, so long as a mechanism exists to dynamically obtain a
multicast address.
4.3.3.2 Centralized Signaling, Distributed Media 4.3.3.2 Centralized Signaling, Distributed Media
In this conferencing model, there is a centralized controller, as in In this conferencing model, there is a centralized controller, as in
the dial-in and dial-out cases. However, the centralized server the dial-in and dial-out cases. However, the centralized server
handles signaling only. The media is still sent directly between handles signaling only. The media is still sent directly between
participants, using either multicast or multi-unicast. Multi-unicast participants, using either multicast or multi-unicast. Multi-unicast
is when a user sends multiple packets (one for each recipient, is when a user sends multiple packets (one for each recipient,
addressed to that recipient). This is referred to as a addressed to that recipient). This is referred to as a
"Decentralized Multipoint Conference" in [H.323]. "Decentralized Multipoint Conference" in [H.323].
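The difference between these media models shows up in how many streams a single endpoint must handle. The sketch below counts streams for one participant. It assumes the participant is itself talking, that silence suppression means only talkers send, and that a centralized mixer returns exactly one mixed stream. The function name `stream_counts` and the model labels are hypothetical.

```python
def stream_counts(model: str, n: int, others_talking: int) -> tuple[int, int]:
    """Return (streams sent, streams received) for one talking participant.

    n is the number of participants; others_talking is how many of the
    other n - 1 participants are sending media at this moment.
    """
    if n < 2 or not 0 <= others_talking <= n - 1:
        raise ValueError("need n >= 2 and 0 <= others_talking <= n - 1")
    if model == "centralized":
        return 1, 1                      # one stream to the mixer, one mix back
    if model == "multicast":
        return 1, others_talking         # one stream to the group address
    if model == "multi-unicast":
        return n - 1, others_talking     # one copy addressed to each recipient
    raise ValueError(f"unknown model: {model}")
```

For example, in a 500-party multicast conference with two other people talking, `stream_counts("multicast", 500, 2)` is `(1, 2)`. Multi-unicast would require 499 outgoing copies for the same conversation.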
skipping to change at page 15, line 39 skipping to change at page 11, line 47
implementations. Note that this model assumes peer-to-peer implementations. Note that this model assumes peer-to-peer
signaling. signaling.
4.4 Conveying Information and Events 4.4 Conveying Information and Events
Participants should have access to information about the other Participants should have access to information about the other
participants in a conversation space, so that this information can participants in a conversation space, so that this information can
be rendered to a human user or processed by an automaton. Although be rendered to a human user or processed by an automaton. Although
some of this information may be available from the Request-URI or some of this information may be available from the Request-URI or
To, From, Contact, or other SIP headers, another mechanism of To, From, Contact, or other SIP headers, another mechanism of
reporting this information is necessary. Note that the data reporting this information is necessary.
reported by RTCP is insufficient for these purposes, as deletions
and additions are not detectable in real time, and SIP may set up
sessions that do not involve RTP media.
Many applications are driven by knowledge about the progress of Many applications are driven by knowledge about the progress of
calls and conferences. In general these types of events allow for calls and conferences. In general these types of events allow for
the construction of distributed applications, where the application the construction of distributed applications, where the application
requires information on dialog and conference state, but is not requires information on dialog and conference state, but is not
necessarily co-resident with an endpoint user agent or conference necessarily co-resident with an endpoint user agent or conference
server. For example, a mixer involved in a conversation space may server. For example, a mixer involved in a conversation space may
wish to provide URLs for conference status, and/or conference/floor wish to provide URLs for conference status, and/or conference/floor
control. control.
The SIP [Events] architecture defines general mechanisms for The SIP [Events] architecture defines general mechanisms for
subscription to and notification of events within SIP networks. It subscription to and notification of events within SIP networks. It
introduces the notion of a package which is a specific introduces the notion of a package which is a specific
"instantiation" of the events mechanism for a well-defined set of "instantiation" of the events mechanism for a well-defined set of
events. events.
New event packages should be able to New event packages should be able to
provide the status of a user's call-legs (dialogs), provide the provide the status of a user's call-legs (dialogs), provide the
status of conferences and their participants, provide user presence status of conferences and their participants, provide user presence
information, and provide the status of a user's messages. While this information, and provide the status of a user's messages. While this
is not an exhaustive list, these are sufficient to enable the sample is not an exhaustive list, these are sufficient to enable the sample
features described in this document. features described in this document.
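To illustrate the event mechanism, the sketch builds a SUBSCRIBE request for each kind of status listed above. The event package names and body types are those later standardized: dialog in RFC 4235, conference in RFC 4575, presence in RFC 3856, and message-summary in RFC 3842. These postdate this draft and are assumptions here. The function `subscribe` and the keys of `PACKAGES` are hypothetical. The Contact is simplified to the subscriber's address-of-record. A real UA would use a URI at which it can be reached directly.

```python
import uuid

# Status kind -> (Event header value, Accept body type), as later standardized.
PACKAGES = {
    "dialogs": ("dialog", "application/dialog-info+xml"),
    "conference": ("conference", "application/conference-info+xml"),
    "presence": ("presence", "application/pidf+xml"),
    "messages": ("message-summary", "application/simple-message-summary"),
}


def subscribe(kind: str, target_uri: str, subscriber_uri: str,
              via_host: str, expires: int = 3600) -> str:
    """Build a minimal SUBSCRIBE for one of the event packages above."""
    event, accept = PACKAGES[kind]
    headers = [
        f"SUBSCRIBE {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK{uuid.uuid4().hex[:16]}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{subscriber_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 SUBSCRIBE",
        f"Contact: <{subscriber_uri}>",  # simplified; see the note above
        f"Event: {event}",
        f"Accept: {accept}",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    return "\r\n".join(headers) + "\r\n\r\n"
```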
A conference event package allows users to subscribe to information A conference event package allows users to subscribe to information
about an entire conference or conversation space. This conference about an entire conference or conversation space. This conference
state could be provided by a conference server or mixing component state could be provided by a conference server or mixing component
(described in Section 4.5) if centralized mixing is used, or (described in a later section) if centralized mixing is used, or
gathered from relevant peers and merged into a cohesive set of gathered from relevant peers and merged into a cohesive set of
state. Notifications would convey information about the state. Notifications would convey information about the
participants such as the SIP URL identifying each user, their participants such as the SIP URL identifying each user, their
status in the space (active, declined, departed), URLs to invoke status in the space (active, declined, departed), URLs to invoke
other features (such as sidebar conversations), links to other other features (such as sidebar conversations), links to other
relevant information (such as floor control policies), and if floor relevant information (such as floor control policies), and if floor
control policies are in place, the user's floor control status. A control policies are in place, the user's floor control status. A
"call-leg" event package would provide information about all the dialog event package would provide information about all the dialogs
dialogs the target user is maintaining, what conversations the user the target user is maintaining, what conversations the user is
is participating in, and how these are correlated. A concrete participating in, and how these are correlated. Concrete proposals
proposal for both conference events and call-leg events is described for conference events and dialog events are described in [conf-pkg]
in [call-pkg]. and [dialog-pkg] respectively.
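The text leaves open how conference state gathered from several peers is merged into one cohesive set. The sketch below shows one possible approach; it is an assumption and is not specified by this framework. Each participant entry carries a hypothetical per-participant sequence number, and the merge keeps the entry with the highest number, so the result does not depend on the order in which reports arrive. On equal numbers it keeps the first entry seen. `ParticipantInfo`, `Status`, and `merge_reports` are hypothetical names. The status values are the three named above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    DECLINED = "declined"
    DEPARTED = "departed"


@dataclass(frozen=True)
class ParticipantInfo:
    uri: str       # SIP URL identifying the participant
    status: Status
    seq: int       # hypothetical version counter for this participant's entry


def merge_reports(*reports: list[ParticipantInfo]) -> dict[str, ParticipantInfo]:
    """Merge per-peer reports, keeping the newest entry for each participant."""
    merged: dict[str, ParticipantInfo] = {}
    for report in reports:
        for info in report:
            current = merged.get(info.uri)
            if current is None or info.seq > current.seq:
                merged[info.uri] = info
    return merged
```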
Note that user presence has a close relationship with these two Note that user presence has a close relationship with these two
proposed event packages. It is fundamental to the presence model proposed event packages. It is fundamental to the presence model
that the information used to obtain user presence is constructed that the information used to obtain user presence is constructed
from any number of different input sources. Examples of such sources from any number of different input sources. Examples of such sources
include SIP REGISTER requests and uploads of presence documents. include SIP REGISTER requests and uploads of presence documents.
These two packages can be considered another mechanism that allows a These two packages can be considered another mechanism that allows a
presence agent to determine the presence state of the user. presence agent to determine the presence state of the user.
Specifically, a user presence server can act as a subscriber for the Specifically, a user presence server can act as a subscriber for the
call-leg and conference packages to obtain additional information call-leg and conference packages to obtain additional information
skipping to change at page 16, line 54 skipping to change at page 13, line 5
ample opportunities to present informational URIs which relate to ample opportunities to present informational URIs which relate to
calls, conversations, or dialogs in some way. For example, consider calls, conversations, or dialogs in some way. For example, consider
the SIP Call-Info header, or Contact headers returned in a 300-class the SIP Call-Info header, or Contact headers returned in a 300-class
response. Frequently additional information about a call or dialog response. Frequently additional information about a call or dialog
can be fetched via non-SIP URIs. For example, consider a web page can be fetched via non-SIP URIs. For example, consider a web page
for package tracking when calling a delivery company, or a web page for package tracking when calling a delivery company, or a web page
with related documentation when joining a dial-in conference. The with related documentation when joining a dial-in conference. The
use of URIs in the multiparty framework is discussed in more detail use of URIs in the multiparty framework is discussed in more detail
in Section 4.6. in Section 4.6.
4.5 Componentization and Decomposition 4.5 Componentization and Decomposition
This framework proposes a decomposed component architecture with a This framework proposes a decomposed component architecture with a
very loose coupling of services and components. This means that a very loose coupling of services and components. This means that a
service (such as a conferencing server or an auto-attendant) need service (such as a conferencing server or an auto-attendant) need
not be implemented as an actual server. Rather, these services can not be implemented as an actual server. Rather, these services can
be built by combining a few basic components in straightforward or be built by combining a few basic components in straightforward or
arbitrarily complex ways. arbitrarily complex ways.
Since the components are easily deployed on separate boxes, by Since the components are easily deployed on separate boxes, by
separate vendors, or even with separate providers, we achieve a separate vendors, or even with separate providers, we achieve a
separation of function that allows each piece to be developed in separation of function that allows each piece to be developed in
complete isolation. We can also reuse existing components for new complete isolation. We can also reuse existing components for new
applications. This allows rapid service creation, and the ability applications. This allows rapid service creation, and the ability
for services to be distributed across organizational domains for services to be distributed across organizational domains
skipping to change at page 17, line 43 skipping to change at page 13, line 49
colocated participant component (for example a mixer which also colocated participant component (for example a mixer which also
announces the arrival of a new participant; the announcement portion announces the arrival of a new participant; the announcement portion
is a participant, but the mixer itself is not). Media is a participant, but the mixer itself is not). Media
intermediaries should be as transparent as possible to the end intermediaries should be as transparent as possible to the end
users, offering a useful, fundamental service without getting in users, offering a useful, fundamental service without getting in
the way of new features implemented by participants. Some common the way of new features implemented by participants. Some common
media intermediaries are described below. media intermediaries are described below.
4.5.1.1 Mixer 4.5.1.1 Mixer
A mixer is a component that combines media from all call-legs in the A SIP mixer is a component that combines media from all dialogs in
same conversation in a media specific way. For example, the default the same conversation in a media specific way. For example, the
combining for an audio conference would be an N-1 configuration. In default combining for an audio conference would be an N-1
other words, each user receives a mixed media stream that represents configuration, while the same mixer might interleave text messages
the combined audio of all the users except himself or herself. on a per-line basis.
For reference, the RTP definition of a mixer is included below.
Note that SIP multiparty applications may deal with media which is
not carried by RTP (for example Instant Messages). A mixer, as
defined above, can still combine these messages in a media specific
way and act as a SIP mixing component.
"Mixer: An intermediate system that receives RTP packets from
one or more sources, ... combines the packets in some manner
and then forwards a new RTP packet. Since the timing across
multiple input sources will not generally be synchronized, the
mixer will make timing adjustments among the streams and
generate its own timing for the combined stream. Thus all data
packets originating from a mixer will be identified as having
the mixer as their synchronization source."
Conventions for specifying a mixing or conferencing service in a SIP Conventions for specifying a mixing or conferencing service in a SIP
URI are proposed in [ms-uri]. URI are proposed in [ms-uri].
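The right-hand text notes that a mixer might interleave text messages on a per-line basis. This sketch assumes one reading of that: the mixer never splits a line from one sender with text from another. It forwards complete lines, each prefixed with its sender, and holds back any partial line until its newline arrives. `interleave_lines` and the "sender: text" prefix are hypothetical.

```python
def interleave_lines(buffers: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Emit complete lines from each sender; return them plus leftover partials.

    `buffers` maps a sender to the text received from it so far. Lines are
    emitted whole and tagged with the sender, so no line is ever interrupted
    by another participant's text.
    """
    mixed: list[str] = []
    remaining: dict[str, str] = {}
    for sender, text in buffers.items():
        *complete, partial = text.split("\n")
        mixed.extend(f"{sender}: {line}" for line in complete)
        remaining[sender] = partial  # incomplete line, kept for the next round
    return mixed, remaining
```

An N-1 text mixer, by analogy with the audio case, would simply filter out each recipient's own lines before sending `mixed` to that recipient.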
4.5.1.2 Media Translator
RTP also defines an entity called a translator. Like a mixer, this
concept is useful outside of the context of RTP and can be applied
to most other media types.
"Translator: An intermediate system that forwards RTP packets
with their synchronization source identifier intact. Examples
of translators include devices that convert encodings without
mixing, replicators from multicast to unicast, and application-
level firewalls."
4.5.1.3 Transcoder 4.5.1.2 Transcoder
A transcoder translates media from one encoding to another (for A transcoder translates media from one encoding or format to another
example, GSM voice to G.711, or MPEG2 to H.261). A transcoder for (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
RTP media is a type of RTP translator. text/plain).
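For non-RTP media, a transcoder can be as simple as a format conversion. The sketch below converts text/html to text/plain with Python's standard `html.parser` module. It keeps only character data. As a result, it drops markup such as `<br>` without inserting line breaks, and it would also keep the contents of `<script>` or `<style>` elements. A production transcoder would need to handle those cases. The names `html_to_plain` and `_TextExtractor` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data and ignores all tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_plain(html: str) -> str:
    """Transcode a text/html body into a text/plain body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

For example, `html_to_plain("<p>Hello, <b>world</b> &amp; all</p>")` returns `"Hello, world & all"`.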
4.5.1.4 Media Relay 4.5.1.3 Media Relay
A media relay terminates media and simply forwards it to a new A media relay terminates media and simply forwards it to a new
destination without changing the content in any way. Sometimes destination without changing the content in any way. Sometimes
media relays are used to provide source IP address anonymity, to media relays are used to provide source IP address anonymity, to
facilitate middlebox traversal, or to provide a trusted entity where facilitate middlebox traversal, or to provide a trusted entity where
media can be forcefully disconnected. A media relay for RTP is also media can be forcefully disconnected.
a type of RTP Translator.
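A media relay's core is a forward-without-modification loop. The sketch handles a single datagram and assumes UDP transport, as is typical for RTP, and a single fixed destination. A real relay would loop, handle both directions, and usually relay the RTCP port as well. Because the relay re-sends the payload from its own socket, the receiver sees the relay's address rather than the original sender's. This is the source-address anonymity mentioned above. `relay_once` is a hypothetical name.

```python
import socket


def relay_once(relay: socket.socket, destination: tuple[str, int],
               bufsize: int = 2048) -> tuple[str, int]:
    """Receive one datagram on `relay` and forward its payload unchanged.

    Returns the original sender's address, which the destination never sees.
    """
    payload, source = relay.recvfrom(bufsize)
    relay.sendto(payload, destination)  # content is not inspected or altered
    return source
```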
4.5.2 Queue Server 4.5.2 Queue Server
A queue server is a location where calls can be entered into one of A queue server is a location where calls can be entered into one of
several FIFO (first-in, first-out) queues. A queue server would several FIFO (first-in, first-out) queues. A queue server would
subscribe to the presence of groups or individuals who are subscribe to the presence of groups or individuals who are
interested in its queues. When detecting that a user is available interested in its queues. When detecting that a user is available
to service a queue, the server redirects or transfers the last call to service a queue, the server redirects or transfers the last call
in the relevant queue to the available user. On a queue-by-queue in the relevant queue to the available user. On a queue-by-queue
basis, authorized users could also subscribe to the call state basis, authorized users could also subscribe to the call state
(dialog information) of calls within a queue. Authorized users (dialog information) of calls within a queue. Authorized users
could use this information to effectively pluck (take) a call out of could use this information to effectively pluck (take) a call out of
the queue (for example by sending an INVITE with a Replaces header the queue (for example by sending an INVITE with a Replaces header
to one of the user agents in the queue). to one of the user agents in the queue).
4.5.3 Parking Place 4.5.3 Parking Place
A parking place is a location where calls can be terminated A parking place is a location where calls can be terminated
temporarily and then retrieved later. While a call is "parked", it temporarily and then retrieved later. While a call is "parked", it
can receive media "on-hold" such as music, announcements, or can receive media "on-hold" such as music, announcements, or
advertisements. Such a service could be further decomposed such advertisements. Such a service could be further decomposed such
that announcements or music are handled by a separate component. that announcements or music are handled by a separate component.
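The framework does not say how a parked call is identified for retrieval. This sketch assumes a hypothetical scheme of numbered slots. To reflect the decomposition described above, the on-hold media is a separate component, referred to only by a hypothetical URI. `ParkingPlace` and its methods are illustrative names.

```python
import itertools


class ParkingPlace:
    """Toy parking place: calls are parked in numbered slots and retrieved later."""

    def __init__(self, hold_media_uri: str) -> None:
        # Separate component that plays music or announcements to parked calls.
        self.hold_media_uri = hold_media_uri
        self._slots: dict[int, str] = {}
        self._next_slot = itertools.count(1)

    def park(self, call_id: str) -> int:
        """Terminate the call here temporarily; return the slot used to retrieve it."""
        slot = next(self._next_slot)
        self._slots[slot] = call_id
        return slot

    def retrieve(self, slot: int) -> str:
        """Hand back the parked call (KeyError if nothing is parked in that slot)."""
        return self._slots.pop(slot)
```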
4.5.4 Announcements and Voice Dialogs 4.5.4 Announcements and Voice Dialogs
An announcement server is a server which can play digitized media An announcement server is a server which can play digitized media
(frequently audio), such as music or recorded speech. These servers (frequently audio), such as music or recorded speech. These servers
are typically accessible via SIP, HTTP, or RTSP. An analogous are typically accessible via SIP, HTTP, or RTSP. An analogous
service is a recording service which stores digitized media. A service is a recording service which stores digitized media. A
convention for specifying announcements in SIP URIs is described in convention for specifying announcements in SIP URIs is described in
[ms-uri]. Likewise the same server could easily provide a service [ms-uri]. Likewise the same server could easily provide a service
which records digitized media. which records digitized media.
A "voice dialog" is a model of spoken interactive behavior between a A "voice dialog" is a model of spoken interactive behavior between a
human and an automaton which can include synthesized speech, human and an automaton which can include synthesized speech,
digitized audio, recognition of spoken and DTMF key input, recording digitized audio, recognition of spoken and DTMF key input, recording
of spoken input, and interaction with call control. Dialogs of spoken input, and interaction with call control. Dialogs
frequently consist of forms or menus. Forms present information and frequently consist of forms or menus. Forms present information and
gather input; menus offer choices of what to do next. gather input; menus offer choices of what to do next.
Spoken dialogs are a basic building block of applications which use Spoken dialogs are a basic building block of applications which use
voice. Consider for example that a voice mail system, the voice. Consider for example that a voice mail system, the
conference-id and passcode collection system for a conferencing conference-id and passcode collection system for a conferencing
system, and complicated voice portal applications all require a system, and complicated voice portal applications all require a
voice dialog component. voice dialog component.
4.5.4.1. Text-to-Speech and Automatic Speech Recognition 4.5.4.1. Text-to-Speech and Automatic Speech Recognition
Text-to-Speech (TTS) is a service which converts text into digitized Text-to-Speech (TTS) is a service which converts text into digitized
audio. TTS is frequently integrated into other applications, but audio. TTS is frequently integrated into other applications, but
when separated as a component, it provides greater opportunity for when separated as a component, it provides greater opportunity for
broad reuse. Various interfaces to access standalone TTS services broad reuse. Various interfaces to access standalone TTS services
via HTTP, RTSP (in [MRCP]), and SIP ([app-components], [ms-uri] and via HTTP, [CATS], and SIP ([app-components], and [ms-uri]) have been
[MRCP-SIP]) have been proposed. proposed.
Automatic Speech Recognition (ASR) is a service which attempts to Automatic Speech Recognition (ASR) is a service which attempts to
decipher digitized speech based on a proposed grammar. Like TTS, decipher digitized speech based on a proposed grammar. Like TTS,
ASR services can be embedded, or exposed so that many applications ASR services can be embedded, or exposed so that many applications
can take advantage of such services. Various IP interfaces to ASR, can take advantage of such services. Various IP interfaces to ASR,
such as MRCP, have been proposed. such as CATS, have been proposed.
4.5.4.2. VoiceXML 4.5.4.2. VoiceXML
[VoiceXML] is a W3C recommendation that was designed to give authors [VoiceXML] is a W3C recommendation that was designed to give authors
control over the spoken dialog between users and applications. The control over the spoken dialog between users and applications. The
application and user take turns speaking: the application prompts application and user take turns speaking: the application prompts
the user, and the user in turn responds. Its major goal is to bring the user, and the user in turn responds. Its major goal is to bring
the advantages of web-based development and content delivery to the advantages of web-based development and content delivery to
interactive voice response applications. We believe that VoiceXML interactive voice response applications. We believe that VoiceXML
represents the ideal partner for SIP in the development of represents the ideal partner for SIP in the development of
distributed IVR servers. VoiceXML is an XML based scripting language distributed IVR servers. VoiceXML is an XML based scripting language
for describing IVR services at an abstract level. VoiceXML supports for describing IVR services at an abstract level. VoiceXML supports
DTMF recognition, speech recognition, text-to-speech, and playing DTMF recognition, speech recognition, text-to-speech, and playing
out of recorded media files. The results of the data collected from out of recorded media files. The results of the data collected from
the user are passed to a controlling entity through an HTTP POST the user are passed to a controlling entity through an HTTP POST
operation. The controller can then return another script, or operation. The controller can then return another script, or
terminate the interaction with the IVR server. terminate the interaction with the IVR server.
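The controller's side of this exchange can be sketched in Python. The controller receives the form data that the IVR posts over HTTP and replies with either another VoiceXML script or nothing, which ends the interaction. Several details are hypothetical:

- the form field name `choice`;
- the prompts;
- the convention that `None` means "terminate".

The HTTP server itself is not shown; only the decoding of an `application/x-www-form-urlencoded` body is.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def vxml_prompt(text: str) -> str:
    # Minimal VoiceXML document that speaks one prompt and then ends.
    return ('<?xml version="1.0"?>'
            '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
            f'<form><block><prompt>{escape(text)}</prompt></block></form>'
            '</vxml>')


def controller(fields: Dict[str, str]) -> Optional[str]:
    """Choose the next step after the IVR posts collected input.

    Returns another VoiceXML script to execute, or None to terminate the
    interaction. The field name "choice" is hypothetical.
    """
    choice = fields.get("choice")
    if choice == "0":
        return None
    if choice == "1":
        return vxml_prompt("Your balance is ready.")
    return vxml_prompt("Sorry, please try again.")


def handle_post(body: str) -> Optional[str]:
    # Decode a form-encoded POST body, keeping the first value per field.
    fields = {name: values[0] for name, values in parse_qs(body).items()}
    return controller(fields)
```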
A VoiceXML server also need not be implemented as a monolithic A VoiceXML server also need not be implemented as a monolithic
server. Below is a diagram of a VoiceXML browser which is split server. Below is a diagram of a VoiceXML browser which is split
into media and non-media handling parts. The VoiceXML interpreter into media and non-media handling parts. The VoiceXML interpreter
handles SIP dialog state and state within a VoiceXML document, and handles SIP dialog state and state within a VoiceXML document, and
sends requests to the media component over another protocol (for sends requests to the media component over another protocol (for
example RTSP). example [RTSP] or CATS).
+-------------+ +-------------+
| | | |
| VoiceXML | | VoiceXML |
| Interpreter | | Interpreter |
| (signaling) | | (signaling) |
+-------------+ +-------------+
^ ^ ^ ^
| | | |
SIP | | [RTSP] SIP | | RTSP
| | | |
| | | |
v v v v
+-------------+ +-------------+ +-------------+ +-------------+
| | | | | | | |
| SIP UA | RTP | RTSP Server | | SIP UA | RTP | RTSP Server |
| |<------>| (media) | | |<------>| (media) |
| | | | | | | |
+-------------+ +-------------+ +-------------+ +-------------+
Figure : Decomposed VoiceXML Server Figure : Decomposed VoiceXML Server
From a naming perspective, a critical issue when using VoiceXML is More details about the integration of SIP with VoiceXML are provided
how a request URI is associated with a script to invoke when the in [sip-vxml].
call is answered. We see three primary mechanisms: 1) There is a
one-to-one binding of the address in the request URI to a script to
execute. These bindings are published by the provider of the IVR
service. 2) The initial script to execute is actually carried as
content in the body of the SIP INVITE request. The request URI
indicates that the desired service is execution of content in the
request (i.e., sip:executebody@servers.com). 3) The initial script
to execute is fetched by the VoiceXML server; the URL to fetch it
from is passed in the SIP INVITE message that initiates the IVR
session. This can be accomplished either with the application/uri
MIME type as a body, or using the *-Info headers defined in SIP
which provide references to content to fetch. We believe that the
third approach is probably the best one. SIP is not an ideal
mechanism for transferring scripts. Passing a URI allows a better
transfer tool, for example HTTP, to be used to fetch the script from
the controller. HTTP is then also used to pass back form data from
the IVR to the controller. The results of the HTTP POST can also
contain additional VoiceXML scripts to execute. More details about
the integration of SIP with VoiceXML are provided in [sip-vxml].
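A VoiceXML server might choose among the three mechanisms roughly as in the Python sketch below. The sketch makes the following simplifications:

- The `Invite` structure and the binding table are hypothetical.
- The `executebody` user part is taken from the example in the text.
- Only the `application/uri` body variant of the third mechanism is modelled; the *-Info header variant is not.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Mechanism 1: one-to-one bindings published by the IVR provider
# (hypothetical entries).
BINDINGS: Dict[str, str] = {
    "sip:weather@servers.com": "http://scripts.servers.com/weather.vxml",
}


@dataclass
class Invite:
    request_uri: str
    content_type: Optional[str] = None
    body: str = ""


def user_part(uri: str) -> str:
    # "sip:executebody@servers.com" -> "executebody"
    return uri.split(":", 1)[1].split("@", 1)[0]


def resolve_script(invite: Invite) -> Tuple[str, str]:
    """Return ("inline", script) or ("fetch", url) for an incoming INVITE."""
    # Mechanism 2: the Request-URI asks the server to run the body itself.
    if user_part(invite.request_uri) == "executebody":
        return ("inline", invite.body)
    # Mechanism 3: the body carries a URI to fetch, e.g. over HTTP.
    if invite.content_type == "application/uri":
        return ("fetch", invite.body.strip())
    # Mechanism 1: a published binding from Request-URI to script.
    if invite.request_uri in BINDINGS:
        return ("fetch", BINDINGS[invite.request_uri])
    raise LookupError(f"no script for {invite.request_uri}")
```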
4.6 Use of URIs 4.6 Use of URIs
All naming in SIP uses URIs. URIs in SIP are used in a plethora of All naming in SIP uses URIs. URIs in SIP are used in a plethora of
contexts: the Request-URI; Contact, To, From, and *-Info headers; contexts: the Request-URI; Contact, To, From, and *-Info headers;
application/uri bodies; and embedded in email, web pages, instant application/uri bodies; and embedded in email, web pages, instant
messages, and ENUM records. The request-URI identifies the user or messages, and ENUM records. The request-URI identifies the user or
service that the call is destined for. service that the call is destined for.
SIP URIs embedded in informational SIP headers, SIP bodies, and non- SIP URIs embedded in informational SIP headers, SIP bodies, and non-
skipping to change at page 21, line 45 skipping to change at page 17, line 4
specific call control action. Another way to invoke call control specific call control action. Another way to invoke call control
primitives is to define a specific Request-URI naming convention. primitives is to define a specific Request-URI naming convention.
Either these conventions must be shared between the client (the Either these conventions must be shared between the client (the
invoker) and the server, or published by or on behalf of the server. invoker) and the server, or published by or on behalf of the server.
The former involves defining URL construction techniques (e.g. URL The former involves defining URL construction techniques (e.g. URL
parameters and/or token conventions) as proposed in [ms-uri]. The parameters and/or token conventions) as proposed in [ms-uri]. The
latter technique usually involves discovering the URI via a SIP latter technique usually involves discovering the URI via a SIP
event package, a web page, a business card, or an Instant Message. event package, a web page, a business card, or an Instant Message.
Yet another means to acquire the URLs is to define a dictionary of Yet another means to acquire the URLs is to define a dictionary of
primitives with well-defined semantics and provide a means to query primitives with well-defined semantics and provide a means to query
the named primitives and corresponding URLs that may be invoked on the named primitives and corresponding URLs that may be invoked on
the service or dialogs. the service or dialogs.
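The dictionary-of-primitives approach can be sketched in Python as a per-dialog lookup. An invoker first asks which named primitives are available and then obtains the URL for the one it wants. No such standard dictionary is implied: every primitive name, dialog identifier, and URL below is hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical dictionary: for each dialog, the named primitives it supports
# and the URL that invokes each one. Names and URLs are illustrative only.
PRIMITIVES: Dict[str, Dict[str, str]] = {
    "dialog-1234": {
        "answer": "sip:answer-1234@ua.example.com",
        "hold": "sip:hold-1234@ua.example.com",
    },
}


def list_primitives(dialog_id: str) -> List[str]:
    """Query which primitives may be invoked on a dialog."""
    return sorted(PRIMITIVES.get(dialog_id, {}))


def url_for(dialog_id: str, primitive: str) -> Optional[str]:
    """Return the URL that invokes a primitive, or None if unsupported."""
    return PRIMITIVES.get(dialog_id, {}).get(primitive)
```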
4.6.1 Naming Users in SIP 4.6.1 Naming Users in SIP
An address-of-record, or public SIP address, is a SIP (or SIPS) URI An address-of-record, or public SIP address, is a SIP (or SIPS) URI
that points to a domain with a location server that can map the URI that points to a domain with a location server that can map the URI
to a set of Contact URIs where the user might be available. Typically to a set of Contact URIs where the user might be available. Typically
the Contact URIs are populated via registration. the Contact URIs are populated via registration.
Address of Record Contacts Address of Record Contacts
sip:bob@biloxi.com -> sip:bob@babylon.biloxi.com:5060 sip:bob@biloxi.com -> sip:bob@babylon.biloxi.com:5060
sip:bbrown@mailbox.provider.net sip:bbrown@mailbox.provider.net
sip:+1.408.555.6789@mobile.net sip:+1.408.555.6789@mobile.net
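The address-of-record to Contact mapping can be sketched as a minimal in-memory location service in Python, populated by registration and reproducing the example above. Registration expiry, q-values, and authentication are deliberately omitted, and the class name is hypothetical.

```python
from typing import Dict, List


class LocationService:
    """Maps addresses-of-record to the Contact URIs registered for them."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[str]] = {}

    def register(self, aor: str, contact: str) -> None:
        # Registering the same Contact twice keeps a single binding.
        contacts = self._bindings.setdefault(aor, [])
        if contact not in contacts:
            contacts.append(contact)

    def lookup(self, aor: str) -> List[str]:
        # Unknown AORs yield no contacts.
        return list(self._bindings.get(aor, []))


ls = LocationService()
for contact in ("sip:bob@babylon.biloxi.com:5060",
                "sip:bbrown@mailbox.provider.net",
                "sip:+1.408.555.6789@mobile.net"):
    ls.register("sip:bob@biloxi.com", contact)
```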
[Caller-prefs] defines a set of additional parameters to the Contact [Caller-prefs] defines a set of additional parameters to the Contact
header that define the characteristics of the user agent at the header that define the characteristics of the user agent at the
specified URI. For example, there is a mobility parameter which specified URI. For example, there is a mobility parameter which
indicates whether the UA is fixed or mobile. When a user agent indicates whether the UA is fixed or mobile. When a user agent
registers, it places these parameters in the Contact headers to registers, it places these parameters in the Contact headers to
characterize the URIs it is registering. This allows a proxy for characterize the URIs it is registering. This allows a proxy for
that domain to have information about the contact addresses for that that domain to have information about the contact addresses for that
user. user.
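A proxy might use such Contact parameters to narrow a target set, as in the Python sketch below. The sketch assumes:

- the `mobility` parameter is carried as a header parameter after an angle-bracketed URI;
- its values are `fixed` or `mobile`, optionally quoted.

The exact syntax in [Caller-prefs] may differ, and the helper names are hypothetical.

```python
from typing import Dict, List


def header_params(contact_value: str) -> Dict[str, str]:
    """Parse header-level parameters from a Contact value such as
    '<sip:bob@babylon.biloxi.com:5060>;mobility=fixed'. URI parameters
    inside the angle brackets are ignored."""
    _, _, rest = contact_value.partition(">")
    params = {}
    for item in rest.split(";")[1:]:
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip().strip('"')
    return params


def select(contacts: List[str], **wanted: str) -> List[str]:
    # Keep only contacts whose parameters match every requested value.
    return [c for c in contacts
            if all(header_params(c).get(k) == v for k, v in wanted.items())]
```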
skipping to change at page 22, line 44 skipping to change at page 18, line 4
callee's choice, as it determines where, for example, the phone callee's choice, as it determines where, for example, the phone
rings and whether the callee incurs mobile telephone charges for rings and whether the callee incurs mobile telephone charges for
incoming calls. incoming calls.
SIP User Agent implementations are encouraged to make intelligent SIP User Agent implementations are encouraged to make intelligent
decisions based on the type of participants (active/passive, hidden, decisions based on the type of participants (active/passive, hidden,
human/robot) in a conversation space. This information is conveyed human/robot) in a conversation space. This information is conveyed
in a SIP URI parameter and communicated using an appropriate SIP in a SIP URI parameter and communicated using an appropriate SIP
header or event body. For example, a music on hold service may take header or event body. For example, a music on hold service may take
the sensible approach that if there are two or more unhidden the sensible approach that if there are two or more unhidden
participants, it should not provide hold music; or that it will not participants, it should not provide hold music; or that it will not
send hold music to robots. send hold music to robots.
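The music-on-hold policy described above can be sketched in Python. If two or more unhidden participants remain, they can talk to each other, so no music is played. Otherwise music goes to everyone except robots. The participant flags and names are hypothetical, and how they are learned from URI parameters or event bodies is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Participant:
    uri: str
    hidden: bool = False
    robot: bool = False


def hold_music_targets(participants: List[Participant]) -> List[Participant]:
    """Participants that should hear hold music, given everyone else in the
    conversation space (the music service itself excluded)."""
    unhidden = [p for p in participants if not p.hidden]
    if len(unhidden) >= 2:
        return []  # people can still talk to each other
    return [p for p in participants if not p.robot]
```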
Multiple participants in the same conversation space may represent Multiple participants in the same conversation space may represent
the same human user. For example, the user may use one participant the same human user. For example, the user may use one participant
for video, chat, and whiteboard media on a PC and another for audio for video, chat, and whiteboard media on a PC and another for audio
media on a SIP phone. In this case, the address-of-record is the media on a SIP phone. In this case, the address-of-record is the
same for both user agents, but the Contacts are different. In same for both user agents, but the Contacts are different. In
addition, human users may add robot participants which act on their addition, human users may add robot participants which act on their
behalf (for example a call recording service, or a calendar behalf (for example a call recording service, or a calendar
reminder). Call Control features in SIP should continue to function reminder). Call Control features in SIP should continue to function
as expected in such an environment. as expected in such an environment.
4.6.2 Naming Services with SIP URIs 4.6.2 Naming Services with SIP URIs
A critical piece of defining a session level service that can be A critical piece of defining a session level service that can be
accessed by SIP is defining the naming of the resources within that accessed by SIP is defining the naming of the resources within that
service. This point cannot be overstated. service. This point cannot be overstated.
In the context of SIP control of application components, we take In the context of SIP control of application components, we take
advantage of the fact that the standard SIP URI has a user part. advantage of the fact that the standard SIP URI has a user part.
Most services may be thought of as user automatons that participate Most services may be thought of as user automatons that participate
in SIP sessions. It naturally follows that the user address, or the in SIP sessions. It naturally follows that the user address, or the
skipping to change at page 23, line 47 skipping to change at page 19, line 4
combination of two facts. First, unlike in the PSTN, where the combination of two facts. First, unlike in the PSTN, where the
namespace (dialable telephone numbers) is limited, URIs come from namespace (dialable telephone numbers) is limited, URIs come from
an infinite space. They are plentiful, and they are free. Secondly, an infinite space. They are plentiful, and they are free. Secondly,
the primary function of SIP is call routing through manipulations of the primary function of SIP is call routing through manipulations of
the request URI. In the traditional SIP application, this URI the request URI. In the traditional SIP application, this URI
represents people. However, the URI can also represent services, as represents people. However, the URI can also represent services, as
we propose here. This means we can apply the routing services SIP we propose here. This means we can apply the routing services SIP
provides to routing of calls to services. As a result, the problem provides to routing of calls to services. As a result, the problem
of service invocation and service location becomes a routing of service invocation and service location becomes a routing
problem, for which SIP provides a scalable and flexible solution. problem, for which SIP provides a scalable and flexible solution.
Since there is such a vast namespace of services, we can explicitly Since there is such a vast namespace of services, we can explicitly
name each service in a finely granular way. This allows the name each service in a finely granular way. This allows the
distribution of services across the network. distribution of services across the network.
Consider a conferencing service in which we have separated the names Consider a conferencing service in which we have separated the names
of ad-hoc conferences from those of scheduled conferences. We can program of ad-hoc conferences from those of scheduled conferences. We can program
proxies to route calls for ad-hoc conferences to one set of servers, proxies to route calls for ad-hoc conferences to one set of servers,
and calls for scheduled ones to another, possibly even in a and calls for scheduled ones to another, possibly even in a
different provider. In fact, since each conference itself is given a different provider. In fact, since each conference itself is given a
URI, we can distribute conferences across servers, and easily URI, we can distribute conferences across servers, and easily
guarantee that calls for the same conference always get routed to guarantee that calls for the same conference always get routed to
the same server. This is in stark contrast to conferences in the the same server. This is in stark contrast to conferences in the
telephone network, where the equivalent of the URI, the phone telephone network, where the equivalent of the URI, the phone
number, is scarce. An entire conferencing provider generally has number, is scarce. An entire conferencing provider generally has
one or two numbers. Conference IDs must be obtained through IVR one or two numbers. Conference IDs must be obtained through IVR
interactions with the caller, or through a human attendant. This interactions with the caller, or through a human attendant. This
makes it difficult to distribute conferences across servers all over makes it difficult to distribute conferences across servers all over
the network, since the PSTN routing only knows about the dialed the network, since the PSTN routing only knows about the dialed
number. number.
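Proxy routing for such a service can be sketched in Python under a hypothetical naming convention: the user part starts with `adhoc-` or `sched-` and is followed by a conference identifier. The prefix selects a pool of servers. A stable hash of the full user part then picks one server, so every call for the same conference lands on the same server. The pool contents are hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical server pools, possibly run by different providers.
POOLS: Dict[str, List[str]] = {
    "adhoc": ["sip:conf1.example.net", "sip:conf2.example.net"],
    "sched": ["sip:bridge.partner.example.org"],
}


def route(request_uri: str) -> str:
    """Pick the conference server for a Request-URI such as
    sip:adhoc-8812@conf.example.net (hypothetical naming convention)."""
    user = request_uri.split(":", 1)[1].split("@", 1)[0]
    kind = user.split("-", 1)[0]
    pool = POOLS.get(kind)
    if not pool:
        raise LookupError(f"no pool for {request_uri}")
    # hashlib is stable across processes, unlike Python's salted hash().
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]
```

Plain modulo hashing reshuffles conferences whenever a pool changes size. A consistent-hashing scheme would limit that movement, at the cost of more code.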
In the case of a dialog server, the voice dialog itself is the In the case of a dialog server, the voice dialog itself is the
target for the call. As such, the request URI should contain the target for the call. As such, the request URI should contain the
identifier for this spoken dialog. This is consistent with the identifier for this spoken dialog. This is consistent with the
Request-URI service invocation model of RFC 3087. This URL can be in Request-URI service invocation model of RFC 3087. This URL can be in
skipping to change at page 24, line 46 skipping to change at page 20, line 4
sip:dialog.vxml@vxmlservers.com sip:dialog.vxml@vxmlservers.com
The first of these indicates that the dialog server (located at The first of these indicates that the dialog server (located at
vxmlservers.com) should invoke a VoiceXML script fetched from vxmlservers.com) should invoke a VoiceXML script fetched from
http://dialogs.server.com/script32.vxml. Since the user part of the http://dialogs.server.com/script32.vxml. Since the user part of the
SIP URL cannot contain the : character, this must be escaped to %3a. SIP URL cannot contain the : character, this must be escaped to %3a.
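The escaping rule can be sketched in Python, assuming the RFC 3261 grammar for the user part: letters, digits, the "mark" characters, and "&=+$,;?/" are allowed unescaped, and everything else is percent-encoded. Lowercase hex is used to match the "%3a" in the text. The helper name is hypothetical.

```python
import string

# Characters RFC 3261 permits unescaped in the user part of a SIP URI:
# alphanum / mark / user-unreserved.
_MARK = "-_.!~*'()"
_USER_UNRESERVED = "&=+$,;?/"
_ALLOWED = set(string.ascii_letters + string.digits + _MARK + _USER_UNRESERVED)


def escape_user(text: str) -> str:
    """Percent-encode every byte that may not appear unescaped in a user part."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in _ALLOWED else f"%{byte:02x}")
    return "".join(out)


dialog_uri = "sip:" + escape_user("http://dialogs.server.com/script32.vxml") + "@vxmlservers.com"
```

Only the ":" needs escaping in this script URL, because "/" and "." are permitted in the user part.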
These types of conventions are not limited to application component These types of conventions are not limited to application component
servers. An ordinary SIP User Agent can have special URIs as servers. An ordinary SIP User Agent can have special URIs as
well, for example, one which is automatically answered by a well, for example, one which is automatically answered by a
speakerphone. Since URIs are so plentiful, using a separate URI for speakerphone. Since URIs are so plentiful, using a separate URI for
this service does not exhaust a valuable resource. The requested this service does not exhaust a valuable resource. The requested
service is clear to the user agent receiving the request. This URI service is clear to the user agent receiving the request. This URI
can also be included as part of another feature (for example, the can also be included as part of another feature (for example, the
Intercom feature described in Section 6.1.6). This feature can be Intercom feature described in Section 6.1.6). This feature can be
specified with a SIP user parameter, since such parameters are part specified with a SIP user parameter, since such parameters are part
of the user part of a SIP URI. of the user part of a SIP URI.
Likewise a Request URI can fully describe an announcement service Likewise a Request URI can fully describe an announcement service
through the use of the user part of the address and additional URI through the use of the user part of the address and additional URI
parameters. In our example, the user portion of the address, parameters. In our example, the user portion of the address,
"annc", specifies the announcement service on the media server. "annc", specifies the announcement service on the media server.
The two URI parameters "play=" and "early=" specify the audio The two URI parameters "play=" and "early=" specify the audio
resource to play and whether early media is desired. resource to play and whether early media is desired.
sip:annc@ms2.carrier.net; sip:annc@ms2.carrier.net;
play=http://audio.carrier.net/allcircuitsbusy.au;early=yes play=http://audio.carrier.net/allcircuitsbusy.au;early=yes
sip:annc@ms2.carrier.net; sip:annc@ms2.carrier.net;
play=file://fileserver.carrier.net/geminii/yourHoroscope.wav play=file://fileserver.carrier.net/geminii/yourHoroscope.wav
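A media server might build and parse announcement URIs of this form as in the Python sketch below. The sketch makes three assumptions:

- Leaving out `early=` means no early media.
- The `play` value contains no ";" or "=". Those characters are not allowed unescaped in SIP URI parameter values.
- The function names are hypothetical.

```python
from typing import Dict, Tuple


def annc_uri(host: str, play: str, early: bool = False) -> str:
    # Assumption: early media is requested only when early=yes is present.
    uri = f"sip:annc@{host};play={play}"
    if early:
        uri += ";early=yes"
    return uri


def parse_annc(uri: str) -> Tuple[str, Dict[str, str]]:
    # Split off URI parameters; each "name=value" becomes a dict entry.
    base, *params = uri.split(";")
    return base, dict(p.partition("=")[::2] for p in params)


busy = annc_uri("ms2.carrier.net",
                "http://audio.carrier.net/allcircuitsbusy.au", early=True)
```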
In practical applications, it is important that an invoker not In practical applications, it is important that an invoker not
assume semantic rules for URIs it did not create. assume semantic rules for URIs it did not create.
Instead, it should allow any arbitrary string to be provisioned, and Instead, it should allow any arbitrary string to be provisioned, and
map the string to the desired behavior. The administrator of a map the string to the desired behavior. The administrator of a
skipping to change at page 25, line 44 skipping to change at page 21, line 4
standard greeting sip:677283@vm.wcom.com standard greeting sip:677283@vm.wcom.com
sip:rjs@vm.wcom.com;mode=deposit sip:rjs@vm.wcom.com;mode=deposit
Deposit with on sip:sub-rjs-deposit-busy@vm.wcom.com Deposit with on sip:sub-rjs-deposit-busy@vm.wcom.com
phone greeting sip:677372@vm.wcom.com phone greeting sip:677372@vm.wcom.com
sip:rjs@vm.wcom.com;mode=3991243 sip:rjs@vm.wcom.com;mode=3991243
Deposit with sip:sub-rjs-deposit-sg@vm.wcom.com Deposit with sip:sub-rjs-deposit-sg@vm.wcom.com
special greeting sip:677384@vm.wcom.com special greeting sip:677384@vm.wcom.com
sip:rjs@vm.wcom.com;mode=sg sip:rjs@vm.wcom.com;mode=sg
Retrieve - SIP sip:sub-rjs-retrieve@vm.wcom.com Retrieve - SIP sip:sub-rjs-retrieve@vm.wcom.com
authentication sip:677405@vm.wcom.com authentication sip:677405@vm.wcom.com
sip:rjs@vm.wcom.com;mode=retrieve sip:rjs@vm.wcom.com;mode=retrieve
Retrieve - prompt sip:sub-rjs-retrieve-inpin@vm.wcom.com Retrieve - prompt sip:sub-rjs-retrieve-inpin@vm.wcom.com
for PIN in-band sip:677415@vm.wcom.com for PIN in-band sip:677415@vm.wcom.com
sip:rjs@vm.wcom.com;mode=inpin sip:rjs@vm.wcom.com;mode=inpin
As we have shown, SIP URIs represent an ideal, flexible mechanism As we have shown, SIP URIs represent an ideal, flexible mechanism
for describing and naming service resources, be they queues, for describing and naming service resources, be they queues,
conferences, voice dialogs, announcements, voicemail treatments, or conferences, voice dialogs, announcements, voicemail treatments, or
phone features. phone features.
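The provisioning approach can be sketched in Python: the invoker keeps an administrator-provisioned table from treatments to opaque Request-URIs and never parses or constructs them. The URIs are taken from rows of the voicemail table. The `Treatment` enum and function name are hypothetical.

```python
from enum import Enum
from typing import Dict


class Treatment(Enum):
    # Labels follow rows of the voicemail table; the enum is hypothetical.
    DEPOSIT_SPECIAL_GREETING = "deposit with special greeting"
    RETRIEVE_SIP_AUTH = "retrieve, SIP authentication"
    RETRIEVE_PIN_IN_BAND = "retrieve, prompt for PIN in-band"


# Provisioned by an administrator. Any string the voicemail server accepts
# may be stored here; the invoker treats each value as opaque.
PROVISIONED: Dict[Treatment, str] = {
    Treatment.DEPOSIT_SPECIAL_GREETING: "sip:sub-rjs-deposit-sg@vm.wcom.com",
    Treatment.RETRIEVE_SIP_AUTH: "sip:rjs@vm.wcom.com;mode=retrieve",
    Treatment.RETRIEVE_PIN_IN_BAND: "sip:677415@vm.wcom.com",
}


def request_uri_for(treatment: Treatment) -> str:
    """Look up the opaque Request-URI provisioned for a treatment."""
    return PROVISIONED[treatment]
```

Switching the voicemail server to a different naming style, for example numeric user parts, only changes the provisioned values. The invoker's code stays the same.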
4.7 Invoker Independence 4.7 Invoker Independence
Only the invoker of features in SIP know exactly which feature they Only the invoker of features in SIP needs to know exactly which
are invoking. One of the primary benefits of this approach is that feature they are invoking. One of the primary benefits of this
combinations of features should work in SIP call control. For approach is that combinations of features should work in SIP call
example, let us examine the combination of a "transfer" of a call control. For example, let us examine the combination of a
which is "conferenced". "transfer" of a call which is "conferenced".
Alice calls Bob. Alice silently "conferences in" her robotic Alice calls Bob. Alice silently "conferences in" her robotic
assistant Albert as a hidden party. Bob transfers Alice to Carol. assistant Albert as a hidden party. Bob transfers Alice to Carol.
If Bob asks Alice to Replace her leg with a new one to Carol then If Bob asks Alice to Replace her leg with a new one to Carol then
both Alice and Albert should be communicating with Carol both Alice and Albert should be communicating with Carol
(transparently). (transparently).
Using the peer-to-peer model, this combination of features works Using the peer-to-peer model, this combination of features works
fine if A is doing local mixing (Alice replaces Bob's call-leg with fine if A is doing local mixing (Alice replaces Bob's call-leg with
Carol's), or if A is using a central mixer (the mixer replaces Bob's Carol's), or if A is using a central mixer (the mixer replaces Bob's
skipping to change at page 26, line 47 skipping to change at page 22, line 5
Alice places a call to Bob. Alice then blind transfers Bob to Carol Alice places a call to Bob. Alice then blind transfers Bob to Carol
through a PSTN gateway. In current usage of REFER and BYE/Also, Bob through a PSTN gateway. In current usage of REFER and BYE/Also, Bob
may be billed for a call he did not initiate (his UA originated the may be billed for a call he did not initiate (his UA originated the
outgoing call leg however). This is not necessarily a terrible outgoing call leg however). This is not necessarily a terrible
thing, but it demonstrates a security concern (Bob must have thing, but it demonstrates a security concern (Bob must have
appropriate local policy to prevent fraud). Also, Alice may wish to appropriate local policy to prevent fraud). Also, Alice may wish to
pay for Bob's session with Carol. There should be a way to signal pay for Bob's session with Carol. There should be a way to signal
this in SIP. this in SIP.
Likewise a Replacement call may maintain the same billing Likewise a Replacement call may maintain the same billing
relationship as a Replaced call, so if Alice first calls Carol, then relationship as a Replaced call, so if Alice first calls Carol, then
asks Bob to Replace this call, Alice may continue to receive a bill. asks Bob to Replace this call, Alice may continue to receive a bill.
Further work in SIP billing should define a way to set or discover Further work in SIP billing should define a way to set or discover
the direction of billing. the direction of billing.
5 Catalog of call control actions and sample features 5 Catalog of call control actions and sample features
Call control actions can be categorized by the dialogs upon which Call control actions can be categorized by the dialogs upon which
they operate. The actions may involve a single or multiple dialogs. they operate. The actions may involve a single or multiple dialogs.
These dialogs can be early or established. Multiple dialogs may be These dialogs can be early or established. Multiple dialogs may be
related in a conversation space to form a conference or other related in a conversation space to form a conference or other
interesting media topologies. interesting media topologies.
It should be noted that it is desirable to provide a means by which It should be noted that it is desirable to provide a means by which
a party can discover the actions which may be performed on a dialog. a party can discover the actions which may be performed on a dialog.
The interested party may be independent or related to the dialogs. The interested party may be independent or related to the dialogs.
One means of accomplishing this is through the ability to define and One means of accomplishing this is through the ability to define and
obtain URLs for these actions as described in section 4.6. obtain URLs for these actions as described in section 4.6.
Below are listed several call control "actions" which establish or Below are listed several call control "actions" which establish or
skipping to change at page 27, line 44 skipping to change at page 23, line 4
functionality for PDA, PC and server based applications which desire functionality for PDA, PC and server based applications which desire
the ability to control a UA. the ability to control a UA.
5.1.1 Remote Answer 5.1.1 Remote Answer
A dialog is in some early dialog state such as 180 Ringing. It may A dialog is in some early dialog state such as 180 Ringing. It may
be desirable to tell the UA to answer the dialog. That is, tell it be desirable to tell the UA to answer the dialog. That is, tell it
to send a 200 OK response to establish the dialog. to send a 200 OK response to establish the dialog.
5.1.2 Remote Forward or Put 5.1.2 Remote Forward or Put
It may be desirable to tell the UA to respond with a 3xx class It may be desirable to tell the UA to respond with a 3xx class
response to forward an early dialog to another UA. response to forward an early dialog to another UA.
5.1.3 Remote Busy or Error Out 5.1.3 Remote Busy or Error Out
It may be desirable to instruct the UA to send an error response It may be desirable to instruct the UA to send an error response
such as 486 Busy Here. such as 486 Busy Here.
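The three early-dialog actions can be sketched in Python as a mapping to the response the controlled UA would send. How the UA is instructed to act is outside this sketch. The choice of 302 for "forward" is an assumption, since the text only requires a 3xx class response. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class EarlyDialogAction(Enum):
    REMOTE_ANSWER = "answer"
    REMOTE_FORWARD = "forward"
    REMOTE_BUSY = "busy"


def response_for(action: EarlyDialogAction,
                 forward_to: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Status line and extra headers the controlled UA would send on the
    early dialog."""
    if action is EarlyDialogAction.REMOTE_ANSWER:
        return "SIP/2.0 200 OK", {}
    if action is EarlyDialogAction.REMOTE_FORWARD:
        if forward_to is None:
            raise ValueError("forwarding needs a target URI")
        # Any 3xx class response would do; 302 is chosen for illustration.
        return "SIP/2.0 302 Moved Temporarily", {"Contact": f"<{forward_to}>"}
    if action is EarlyDialogAction.REMOTE_BUSY:
        return "SIP/2.0 486 Busy Here", {}
    raise ValueError(f"unsupported action: {action}")
```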
5.2 Single Dialog Actions 5.2 Single Dialog Actions
There is another useful set of actions which operate on a single There is another useful set of actions which operate on a single
established dialog. These operations are useful in building established dialog. These operations are useful in building
productivity applications for aiding users to control their phone. productivity applications for aiding users to control their phone.
For example, a CRM application could set up calls for a user, For example, a CRM application could set up calls for a user,
eliminating the need for the user to actually enter an address. eliminating the need for the user to actually enter an address.
These operations can also be thought of as remote control actions. These operations can also be thought of as remote control actions.
5.2.1 Remote Dial 5.2.1 Remote Dial
This action instructs the UA to initiate a dialog. This action can This action instructs the UA to initiate a dialog. This action can
be performed using the REFER method. be performed using the REFER method.
5.2.2 Remote On and Off Hold 5.2.2 Remote On and Off Hold
skipping to change at page 28, line 45 skipping to change at page 24, line 5
5.3.1 Transfer 5.3.1 Transfer
The conversation space changes as follows: The conversation space changes as follows:
before after before after
{ A , B } --> { C , B } { A , B } --> { C , B }
A replaces itself with C. A replaces itself with C.
To make this happen using the peer-to-peer approach, "A" would send To make this happen using the peer-to-peer approach, "A" would send
two SIP requests. A shorthand for those requests is shown below: two SIP requests. A shorthand for those requests is shown below:
REFER B Refer-To:C REFER B Refer-To:C
BYE B BYE B
To make this happen instead using the 3pcc approach, the controller To make this happen instead using the 3pcc approach, the controller
sends requests represented by the shorthand below: sends requests represented by the shorthand below:
INVITE C (w/SDP of B) INVITE C (w/SDP of B)
reINVITE B (w/SDP of C) reINVITE B (w/SDP of C)
BYE A BYE A
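The two transfer flows above can be written down as a small Python sketch that produces the same shorthand and applies the conversation space change. The function names are hypothetical and the strings simply reproduce the shorthand in the text; this is not a SIP stack.

```python
from typing import FrozenSet, List


def transfer_requests(transferor: str, transferee: str, target: str,
                      approach: str) -> List[str]:
    """Shorthand requests that replace `transferor` with `target` (section 5.3.1)."""
    if approach == "peer-to-peer":
        # The transferor asks the transferee to contact the target, then leaves.
        return [f"REFER {transferee} Refer-To:{target}", f"BYE {transferee}"]
    if approach == "3pcc":
        # The controller bridges the media of the transferee and the target,
        # then releases the transferor's dialog.
        return [f"INVITE {target} (w/SDP of {transferee})",
                f"reINVITE {transferee} (w/SDP of {target})",
                f"BYE {transferor}"]
    raise ValueError(f"unknown approach: {approach!r}")


def apply_transfer(space: FrozenSet[str], transferor: str,
                   target: str) -> FrozenSet[str]:
    """Conversation space change: { A , B } --> { C , B }."""
    return (space - {transferor}) | {target}
```

Modeling the conversation space as an immutable set makes the before/after diagrams directly comparable.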
Features enabled by this action: Features enabled by this action:
- blind transfer - blind transfer
- transfer to a central mixer (some type of conference or forking) - transfer to a central mixer (some type of conference or forking)
- transfer to park server (park) - transfer to park server (park)
- transfer to music on hold or announcement server - transfer to music on hold or announcement server
- transfer to a "queue" - transfer to a "queue"
- transfer to a service (such as Voice Dialogs service) - transfer to a service (such as Voice Dialogs service)
- transition from local mixer to central mixer - transition from local mixer to central mixer
5.3.2 Take 5.3.2 Take
The conversation space changes as follows: The conversation space changes as follows:
skipping to change at page 29, line 45 skipping to change at page 25, line 4
- voice portal resuming ownership of a call it originated - voice portal resuming ownership of a call it originated
- answering-machine style screening (pickup) - answering-machine style screening (pickup)
- pickup of a ringing call (i.e. early dialog) - pickup of a ringing call (i.e. early dialog)
Note that pickup of a ringing call has some additional Note that pickup of a ringing call has some additional
requirements. First, it operates on an early dialog rather than requirements. First, it operates on an early dialog rather than
an established dialog. Second, the party picking up the call may an established dialog. Second, the party picking up the call may
wish to do so only while the dialog is still early. That is, in wish to do so only while the dialog is still early. That is, in
the race condition where the ringing UA accepts the call just the race condition where the ringing UA accepts the call just
before it receives signaling from the party wishing to take it, before it receives signaling from the party wishing to take it,
the taking party wishes to yield or cancel the take. The goal is the taking party wishes to yield or cancel the take. The goal is
to avoid yanking an answered call away from the called party. to avoid yanking an answered call away from the called party.
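One way to resolve this race is to let the take request carry an "early only" condition that is evaluated by the UA owning the dialog, where the dialog state is authoritative. The Python sketch below is a hypothetical illustration; its status codes mirror the behavior RFC 3891 later specified for the early-only flag of the Replaces header, which postdates this draft, and the names are invented for the example.

```python
from enum import Enum


class DialogState(Enum):
    EARLY = "early"          # only provisional responses (e.g. 180 Ringing) so far
    CONFIRMED = "confirmed"  # a 2xx has been sent or received
    TERMINATED = "terminated"


def handle_take(state: DialogState, early_only: bool) -> str:
    """Response from the UA owning the dialog that a take request targets."""
    if state is DialogState.TERMINATED:
        return "481 Call/Transaction Does Not Exist"  # nothing left to take
    if state is DialogState.CONFIRMED and early_only:
        return "486 Busy Here"  # answered first: yield instead of yanking the call
    return "200 OK"  # still early, or the taker accepts replacing an answered call
```

Because the check runs in one place, the taking party never needs to know whether the ringing UA answered a moment earlier.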
5.3.3 Add 5.3.3 Add
The conversation space changes as follows: The conversation space changes as follows:
{ A , B } --> { A, B, C } { A , B } --> { A, B, C }
A adds C to the conversation. A adds C to the conversation.
Using the peer-to-peer approach, adding a party using local mixing Using the peer-to-peer approach, adding a party using local mixing
requires no signaling. To transition from a 2-party call or a requires no signaling. To transition from a 2-party call or a
locally mixed conference to central mixing, A could send the locally mixed conference to central mixing, A could send the
following requests: following requests:
REFER B Refer-To: mixer REFER B Refer-To: mixer
INVITE mixer INVITE mixer
BYE B BYE B
To add a party to a central mixer: To add a party to a central mixer:
REFER C Refer-To: mixer REFER C Refer-To: mixer
or or
REFER mixer Refer-To: C REFER mixer Refer-To: C
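The peer-to-peer shorthand for moving to a central mixer and for adding a party can be sketched in Python as follows. The function names are hypothetical, and extending the two-party case to several locally mixed peers is an assumption made for illustration.

```python
from typing import List


def move_to_central_mixer(peers: List[str], mixer: str = "mixer") -> List[str]:
    """Shorthand for A moving its call(s) onto a central mixer.

    With peers == ["B"] this reproduces the two-party example in the text.
    """
    requests = [f"REFER {p} Refer-To: {mixer}" for p in peers]  # peers dial the mixer
    requests.append(f"INVITE {mixer}")                           # A joins the mixer too
    requests += [f"BYE {p}" for p in peers]                      # drop the direct dialogs
    return requests


def add_party(new_party: str, mixer: str = "mixer", via_mixer: bool = False) -> str:
    """Either of the two shorthands for adding a party to a central mixer."""
    if via_mixer:
        return f"REFER {mixer} Refer-To: {new_party}"  # the mixer dials out
    return f"REFER {new_party} Refer-To: {mixer}"      # the new party dials in
```

The two `add_party` forms differ only in which side originates the new dialog with the mixer.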
Using the 3pcc approach to transition to centrally mixed, the Using the 3pcc approach to transition to centrally mixed, the
skipping to change at page 30, line 47 skipping to change at page 26, line 5
{ A, B} , {A, C} --> {A, B, C} { A, B} , {A, C} --> {A, B, C}
or like this or like this
{ A, B} , {C, D} --> {A, B, C, D} { A, B} , {C, D} --> {A, B, C, D}
A takes two conversation spaces and joins them together into a A takes two conversation spaces and joins them together into a
single space. single space.
Using the peer-to-peer approach, A can mix locally, or REFER the Using the peer-to-peer approach, A can mix locally, or REFER the
participants of both conversation spaces to the same central mixer participants of both conversation spaces to the same central mixer
(as in 5.3.3). (as in 5.3.3).
For the 3pcc approach, the call flows for inserting participants, For the 3pcc approach, the call flows for inserting participants,
and joining and splitting conversation spaces are tedious yet and joining and splitting conversation spaces are tedious yet
straightforward, so these are left as an exercise for the reader. straightforward, so these are left as an exercise for the reader.
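The join operation amounts to a set union of conversation spaces; with the peer-to-peer approach, every other participant is then referred to one mixer. The Python sketch below is a hypothetical illustration; it models only the REFER step from the text plus A's own INVITE to the mixer (as in the Add flow), not the cleanup of A's previous dialogs.

```python
from typing import FrozenSet, Iterable, List, Tuple


def join_spaces(spaces: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """{A, B} , {C, D} --> {A, B, C, D}: the joined space is the union."""
    joined: FrozenSet[str] = frozenset()
    for space in spaces:
        joined |= space
    return joined


def join_via_mixer(initiator: str, spaces: Iterable[FrozenSet[str]],
                   mixer: str = "mixer") -> Tuple[FrozenSet[str], List[str]]:
    """REFER every participant other than the initiator to one central mixer."""
    joined = join_spaces(spaces)
    others = sorted(joined - {initiator})  # sorted only for a deterministic order
    requests = [f"REFER {p} Refer-To: {mixer}" for p in others]
    requests.append(f"INVITE {mixer}")  # the initiator joins the mixer as well
    return joined, requests
```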
Features enabled: Features enabled:
- standard conference feature - standard conference feature
- leaving a sidebar to rejoin a larger conference - leaving a sidebar to rejoin a larger conference
5.3.5 Insert 5.3.5 Insert
The conversation space changes like this: The conversation space changes like this:
{ B , C } --> {A, B, C } { B , C } --> {A, B, C }
A inserts itself into a conversation space. A inserts itself into a conversation space.
A proposed mechanism for signaling this using the peer-to-peer A proposed mechanism for signaling this using the peer-to-peer
approach is to send a new header in an INVITE with "joining" approach is to send a new header in an INVITE with "joining"
semantics. For example: semantics. For example:
skipping to change at page 31, line 43 skipping to change at page 27, line 4
BYE D BYE D
Features enabled: Features enabled:
- sidebar conversations during a larger conference - sidebar conversations during a larger conference
5.3.7 Near-fork 5.3.7 Near-fork
A participates in two conversation spaces simultaneously: A participates in two conversation spaces simultaneously:
{ A, B } --> { B , A } & { A , C } { A, B } --> { B , A } & { A , C }
A is a participant in two conversation spaces such that A sends the A is a participant in two conversation spaces such that A sends the
same media to both spaces, and renders media from both spaces, same media to both spaces, and renders media from both spaces,
presumably by mixing or rendering the media from both. We can presumably by mixing or rendering the media from both. We can
define that A is the "anchor" point for both forks, each of which is define that A is the "anchor" point for both forks, each of which is
a separate conversation space. a separate conversation space.
This action is implemented purely locally (it requires no special This action is implemented purely locally (it requires no special
signaling). Local features such as switching calls between the signaling). Local features such as switching calls between the
background and foreground are possible using this media background and foreground are possible using this media
relationship. relationship.
5.3.8 Far fork 5.3.8 Far fork
The conversation space changes as follows: The conversation space changes as follows:
{ A, B } --> { A , B } & { B , C } { A, B } --> { A , B } & { B , C }
A requests B to be the "anchor" of two conversation spaces. A requests B to be the "anchor" of two conversation spaces.
For an example of using 3pcc to setup media forking, see [Media For an example of using 3pcc to setup media forking, see [Media
forking]. The session descriptions for forking are quite complex. forking]. The session descriptions for forking are quite complex.
Controllers should verify that endpoints can handle forked media, for Controllers should verify that endpoints can handle forked media, for
example by using a Require header option tag. example by using a Require header option tag.
Two ways to set up this media relationship using peer-to-peer call Two ways to set up this media relationship using peer-to-peer call
control have been proposed: control have been proposed:
- the anchor receives a REFER that requires forked-media (implicit) - the anchor receives a REFER that requires forked-media (implicit)
- the anchor receives an INVITE with Fork-with header (explicit) - the anchor receives an INVITE with an explicit header (explicit)
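The capability check suggested above can use ordinary SIP extension negotiation: a request carries the option tag in a Require header, and per RFC 3261 a UAS that does not understand a required tag rejects the request with 420 (Bad Extension), listing the tag in an Unsupported header. In the Python sketch below, the tag name "forked-media" is taken from the text and is not a registered option tag; the helper names are hypothetical.

```python
from typing import Dict, List

FORKED_MEDIA_TAG = "forked-media"  # option tag named in the text; not registered


def build_fork_request(method: str, request_uri: str) -> str:
    """Request line plus a Require header demanding forked-media support."""
    return f"{method} {request_uri} SIP/2.0\r\nRequire: {FORKED_MEDIA_TAG}\r\n"


def _tags(header_value: str) -> List[str]:
    """Split a comma-separated option-tag list such as an Unsupported header."""
    return [t.strip() for t in header_value.split(",") if t.strip()]


def endpoint_can_fork(status: int, headers: Dict[str, str]) -> bool:
    """Interpret the response to a request that carried the Require header."""
    if status == 420 and FORKED_MEDIA_TAG in _tags(headers.get("Unsupported", "")):
        # 420 Bad Extension: the endpoint does not understand forked-media.
        return False
    return 200 <= status < 300
```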
Features enabled: Features enabled:
- barge-in - barge-in
- voice portal services - voice portal services
- whisper - whisper
- hotword detection - hotword detection
- sending DTMF somewhere else - sending DTMF somewhere else
The above notation does not fully describe the media topology. Below 6 Security Considerations
are the four possible media topologies by which C might want to join
the A-B dialog. For some of the above listed features there is a
requirement to be able to specify any of these media topologies as
part of joining. In addition it is also a requirement that it be
possible to change the media topology after the initial setup (e.g.
in a reINVITE). An example of this is a silently monitored
conversation which is modified to become a full-fledged conference to
allow a call center supervisor to converse with the customer.
The media topology can be separated into two perspectives: the Call Control primitives provide a powerful set of features that can
send and the receive media streams for C. For each of be dangerous in the hands of an attacker. To complicate matters,
these streams C needs the ability to specify either point to point call control primitives are likely to be automatically authorized
or mixed media. This works out to the matrix below, where the "send" without direct human oversight.
column indicates what happens with the media from C at B. The
"receive" column indicates what C wants to receive (the mix or only B's
media). In the case of more than 3 parties, this could in theory be
generalized to specify the set for the mix; however, from a
pragmatic perspective the authors feel it is sufficient to constrain
the description of the sets to all or nothing for now (i.e. point to
point or mix of all).
Send Receive The class of attacks that are possible using these tools includes
1 Pt2pt mix the ability to eavesdrop on calls, disconnect calls, redirect calls,
2 mix mix render irritating content (including ringing) at a user agent, cause
3 Pt2pt Pt2pt an action that has billing consequences, subvert billing (theft-of-
4 mix Pt2pt service), and obtain private information. Call control extensions
must take extra care to describe how these attacks will be
prevented.
For the following examples:
A is the customer
B is the agent
C is the supervisor
=> and <= indicate the direction of media flow We can also make some general observations about authorization and
trust with respect to call control. The security model is
dramatically dependent on the signaling model chosen (see section
4.2).
1. Send: point to point, Receive: mix Let us first examine the security model used in the 3pcc approach.
Example application: silent monitoring or coaching All signaling goes through the controller, which is a trusted
A <= B (point to point, only B hears C) entity. Traditional SIP authentication and hop-by-hop encryption
A => B and message integrity work fine in this environment, but end-to-end
(A+B) => C (C gets mix of A + B) encryption and message integrity may not be possible.
B <= C
2. Send: mix, Receive: mix When using the peer-to-peer approach, call control actions and
Example application: Normal Conference primitives can be legitimately initiated by a) an existing
A <= (B+C) (mix, A gets mix of B+C) participant in the conversation space, b) a former participant in
A => B the conversation space, or c) an entity trusted by one of the
(A+B) => C (C gets mix of A + B) participants. For example, a participant always initiates a
B <= C transfer; a retrieve from Park (a take) is initiated on behalf of a
former participant; and a barge-in (insert or far-fork) is initiated
by a trusted entity (an operator for example).
3. Send: point to point, Receive: point to point Authenticating requests by an existing participant or a trusted
Example application: Whisper/Sidebar entity can be done with baseline SIP mechanisms. In the case of
A <= B (point to point, only B hears C) features initiated by a former participant, these should be
A => B protected against replay attacks by using a unique name or
B => C (point to point, C hears only B) identifier per invocation. The Replaces header exhibits this
B <= C behavior as a by-product of its operation (once a Replaces operation
is successful, the call-leg being Replaced no longer exists). For
other requests, a "one-time" Request-URI may be provided to the
feature invoker.
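The "one-time" Request-URI idea can be sketched in Python as follows. The class name, the token length, and the URI layout are assumptions for illustration; the property that matters is that each issued URI is accepted at most once, so a replayed request finds nothing to match.

```python
import secrets
from typing import Set


class OneTimeUriIssuer:
    """Hypothetical issuer of single-use Request-URIs for feature invocation."""

    def __init__(self, host: str) -> None:
        self.host = host
        self._outstanding: Set[str] = set()

    def issue(self) -> str:
        # token_urlsafe uses only letters, digits, '-' and '_', all of which
        # may appear unescaped in the user part of a SIP URI.
        token = secrets.token_urlsafe(16)
        self._outstanding.add(token)
        return f"sip:{token}@{self.host}"

    def redeem(self, uri: str) -> bool:
        """Accept a URI once; any later replay of the same URI is refused."""
        if not uri.startswith("sip:") or "@" not in uri:
            return False
        token = uri[len("sip:"):].split("@", 1)[0]
        if token in self._outstanding:
            self._outstanding.discard(token)
            return True
        return False
```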
4. Send: mix, Receive: point to point To authorize call control primitives that trigger special behavior
Example application: Recorded Conversation (such as an INVITE with Replace, Join, or Fork semantics), the
C is the voice recorder receiving user agent may have trouble finding appropriate
A <= B (point to point, only B hears C) credentials with which to challenge or authorize the request, as the
A => B sender may be completely unknown to the receiver, except through the
(A+B) => C (C gets mix of A + B) introduction of a third party. These credentials need to be passed
B <= C transitively in some way or fetched in an event body, for example.
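The send/receive matrix for C joining the A-B dialog can be encoded as a small Python sketch that computes whose media each party renders. It follows the definitions of the Send column (what B does with C's media) and the Receive column (whether C gets the A+B mix or only B's media); for row 4 it therefore follows those definitions rather than the flow drawn for the recorded conversation. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

# (send, receive) -> example application, from the matrix in the text.
TOPOLOGIES: Dict[Tuple[str, str], str] = {
    ("pt2pt", "mix"): "silent monitoring or coaching",
    ("mix", "mix"): "normal conference",
    ("pt2pt", "pt2pt"): "whisper/sidebar",
    ("mix", "pt2pt"): "recorded conversation",
}


def who_hears(send: str, receive: str) -> Dict[str, Set[str]]:
    """Media each party renders when C joins the A-B dialog anchored at B."""
    if (send, receive) not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {send}/{receive}")
    hears = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    if send == "mix":
        hears["A"].add("C")  # B mixes C's media into what it sends to A
    if receive == "mix":
        hears["C"].add("A")  # C receives the mix of A + B
    return hears
```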
6 Putting it all together 7 Appendix A: Example Features
These example features should require an amply robust set of Primitives are defined in terms of their ability to provide
services to demonstrate a useful set of primitives. A summary of features. These example features should require an amply robust set
these features is listed below. Implementations of features marked of services to demonstrate a useful set of primitives. They are
with an asterisk (*) are described briefly in Section 6.1. described here briefly. Note that the descriptions of these features
are non-normative. Some of these features are used as examples in
section 6 to demonstrate how some features may require certain media
relationships. Note also that this document describes a mixture of
both features originating in the world of telephones, and features
which are clearly Internet oriented.
7.1 Example Feature Definitions:
Call Waiting - Alice is in a call, then receives another call.
Alice can place the first call on hold, and talk with the other
caller. She can typically switch back and forth between the
callers.
Blind Transfer - Alice is in a conversation with Bob. Alice asks
Bob to contact Carol, but makes no attempt to contact Carol
independently. In many implementations, Alice does not verify Bob's
success or failure in contacting Carol.
Attended Transfer - The transferring party establishes a session
with the transfer target before completing the transfer.
Consultative transfer - The transferring party establishes a session
with the target and mixes both sessions together so that all three
parties can participate, then disconnects leaving the transferee and
transfer target with an active session.
Conference Call - Three or more active, visible participants in the
same conversation space.
Call Park - A call participant parks a call (essentially puts the
call on hold), and then retrieves it at a later time (typically from
another location).
Call Pickup - A party picks up a call that was ringing at another
location. One variation allows the caller to choose which location,
another variation just picks up any call in that user's "pickup
group".
Music on Hold - When Alice places a call with Bob on hold, the audio
Bob hears is replaced with streaming content such as music,
announcements, or advertisements.
Call Monitoring - A call center supervisor joins an in-progress call
for monitoring purposes.
Barge-in - Carol interrupts Alice, who has an in-progress call
with Bob. In some variations, Alice forcibly joins a new
conversation with Carol; in other variations, all three parties are
placed in the same conversation (basically a 3-way conference).
Hotline - Alice picks up a phone and is immediately connected to the
technical support hotline, for example.
Autoanswer - Calls to a certain address or location answer
immediately via a speakerphone.
Intercom - Alice typically presses a button on a phone which
immediately connects to another user or phone and causes that phone
to play her voice over its speaker. Some variations immediately
setup two-way communications, other variations require another
button to be pressed to enable a two-way conversation.
Speakerphone paging - Alice calls the paging address and speaks.
Her voice is played on the speaker of every idle phone in a
preconfigured group of phones.
Speed dial - Alice dials an abbreviated number, or enters an alias,
or presses a special speed dial button representing Bob. Her action
is interpreted as if she specified the full address of Bob.
Call Return - Alice calls Bob. Bob misses the call or is
disconnected before he is finished talking to Alice. Bob invokes
Call Return, which calls Alice, even if Alice did not provide her
real identity or location to Bob.
Inbound Call Screening - Alice doesn't want to receive calls from
Matt. Inbound Screening prevents Matt from disturbing Alice. In
some variations this works even if Matt hides his identity.
Outbound Call Screening - Alice is paged and unknowingly calls a
PSTN pay-service telephone number in the Caribbean, but local policy
blocks her call, and possibly informs her why.
Call Forwarding - Before a call-leg is accepted it is redirected to
another location, for example, because the originally intended
recipient is busy, does not answer, is disconnected from the
network, or has configured all requests to go somewhere else.
Message Waiting - Bob calls Alice while she is away from her
phone. When she returns, a visible or audible indicator conveys that
someone has left her a voicemail message. The message waiting
indication may also convey how many messages are waiting, from whom,
what time, and other useful pieces of information.
Do Not Disturb - Alice selects the Do Not Disturb option. Calls to
her either ring briefly or not at all and are forwarded elsewhere.
Some variations allow specially authorized callers to override this
feature and ring Alice anyway.
Distinctive ring - Incoming calls have different ring cadences or
sample sounds depending on the From party, the To party, or other
factors.
Automatic Callback - Alice calls Bob, but Bob is busy. Alice would
like Bob to call her automatically when he is available. When Bob
hangs up, Alice's phone rings. When Alice answers, Bob's phone
rings. Bob answers and they talk.
Find-Me - Alice sets up complicated rules for how she can be reached
(possibly using [CPL], [presence] or other factors). When Bob calls
Alice, his call is eventually routed to a temporary Contact where
Alice happens to be available.
Whispered call waiting - Alice is in a conversation with Bob. Carol
calls Alice. Either Carol can "whisper" to Alice directly ("Can you
get lunch in 15 minutes?"), or an automaton whispers to Alice
informing her that Carol is trying to reach her.
Voice message screening - Bob calls Alice. Alice is screening her
calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob
leave his message. If she decides to talk to Bob, she can take the
call back from the voicemail system, otherwise she can let Bob leave
a message. This emulates the behavior of a home telephone answering
machine.
Presence-Enabled Conferencing - Alice wants to set up a conference
call with Bob and Cathy when they all happen to be available (rather
than scheduling a predefined time). The server providing the
application monitors their status, and calls all three when they are
all "online", not idle, and not in another call.
IM Conference Alerts - A user receives a notification as an Instant
Message whenever someone joins a conference they are also in.
Single Line Extension - A group of phones are all treated as
"extensions" of a single line. A call for one rings them all. As
soon as one answers, the others stop ringing. If any extension is
actively in a conversation, another extension can "pick up" and
immediately join the conversation. This emulates the behavior of a
home telephone line with multiple phones.
Click-to-dial - Alice looks in her company directory for Bob. When
she finds Bob, she clicks on a URL to call him. Her phone rings (or
possibly answers automatically), and when she answers, Bob's phone
rings.
Pre-paid calling - Alice pays for a certain currency or unit amount
of calling value. When she places a call, she provides her account
number somehow. If her account runs out of calling value during a
call, her call is disconnected or redirected to a service where she
can purchase more calling value.
Voice Portal - A service that allows users to access a portal site
using spoken dialog interaction. For example, Alice needs to
schedule a working dinner with her co-worker Carol. Alice uses a
voice portal to check Carol's flight schedule, find a restaurant
near her hotel, make a reservation, get directions there, and page
Carol with this information.
7.2 Implementation of these features
Example Features: Example Features:
Call Hold [Offer/Answer] for SIP Call Hold [Offer/Answer] for SIP
Call Waiting Local Implementation Call Waiting Local Implementation
Blind Transfer [cc-transfer] Blind Transfer [cc-transfer]
Attended Transfer [cc-transfer] Attended Transfer [cc-transfer]
Consultative transfer [cc-transfer] Consultative transfer [cc-transfer]
Conference Call [conf-models] Conference Call [conf-models]
Call Park *[examples] Call Park *[examples]
Call Pickup *[examples] Call Pickup *[examples]
Music on Hold *[examples] Music on Hold *[examples]
Call Monitoring *Insert Call Monitoring *Insert
Barge-in *Insert or Far-Fork Barge-in *Insert or Far-Fork
Hotline Local Implementation Hotline Local Implementation
Autoanswer Local URI convention Autoanswer Local URI convention
Speed dial Local Implementation Speed dial Local Implementation
Intercom *Speed dial + autoanswer Intercom *Speed dial + autoanswer
Speakerphone paging *Speed dial + autoanswer Speakerphone paging *Speed dial + autoanswer
Call Return Proxy feature Call Return Proxy feature
Inbound Call Screening Proxy or Local implementation Inbound Call Screening Proxy or Local implementation
Outbound Call Screening Proxy feature Outbound Call Screening Proxy feature
Call Forwarding Proxy or Local implementation Call Forwarding Proxy or Local implementation
Message Waiting [msg-waiting] Message Waiting [msg-waiting]
Do Not Disturb [presence] Do Not Disturb [presence]
Distinctive ring *Proxy or Local implementation Distinctive ring *Proxy or Local implementation
Automatic Callback 2 person presence-based conference Automatic Callback 2 person presence-based conference
Find-Me Proxy service based on presence Find-Me Proxy service based on presence
Whispered call waiting Local implementation Whispered call waiting Local implementation
Voice message screening * Voice message screening *
Presence-based Conferencing *call when presence = available Presence-based Conferencing *call when presence = available
IM Conference Alerts subscribe to conference status IM Conference Alerts subscribe to conference status
Single Line Extension * Single Line Extension *
Click-to-dial * Click-to-dial *
Pre-paid calling * Pre-paid calling *
Voice Portal * Voice Portal *
6.1 Feature Solutions 7.2.1 Call Park
The following sections illustrate how some of the primitives can be
put together to build some powerful and interesting features.
6.1.1 Call Park
Call park requires the ability to put a dialog somewhere, advertise Call park requires the ability to put a dialog somewhere, advertise
it to users in a pickup group, and uniquely identify it in a way it to users in a pickup group, and uniquely identify it in a way
that can be communicated (including by human voice). The that can be communicated (including by human voice). The
dialog can be held locally on the UA parking the dialog or dialog can be held locally on the UA parking the dialog or
alternatively transferred to the park service for the pickup group. alternatively transferred to the park service for the pickup group.
The parked dialog then needs to be labeled (e.g. orbit 12) in a way The parked dialog then needs to be labeled (e.g. orbit 12) in a way
that can be communicated to the party that is to pick up the call. that can be communicated to the party that is to pick up the call.
The UAs in the pickup group discover the parked dialog(s) via The UAs in the pickup group discover the parked dialog(s) via
[call-leg] from the park service. If the dialog is parked locally, [call-leg] from the park service. If the dialog is parked locally,
the park service merely aggregates the parked call states from the the park service merely aggregates the parked call states from the
set of UAs in the pickup group. set of UAs in the pickup group.
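The labeling and aggregation described above can be sketched as a small Python park service for one pickup group. Everything here is hypothetical: the lowest-free-orbit policy, the field names, and the URI format are assumptions, and the real call state would be delivered through [call-leg] notifications rather than a method call.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParkedDialog:
    orbit: int               # human-readable label, e.g. "orbit 12"
    holder: str              # UA holding the dialog locally, or the park service
    replace_remote_uri: str  # URI a picking-up UA would invoke (format assumed)


class ParkService:
    """Hypothetical park service aggregating parked dialogs for one pickup group."""

    def __init__(self, first_orbit: int = 10) -> None:
        self.first_orbit = first_orbit
        self._parked: Dict[int, ParkedDialog] = {}

    def park(self, holder: str, replace_remote_uri: str) -> int:
        """Label a parked dialog with the lowest free orbit number."""
        orbit = self.first_orbit
        while orbit in self._parked:
            orbit += 1
        self._parked[orbit] = ParkedDialog(orbit, holder, replace_remote_uri)
        return orbit

    def call_state(self) -> List[ParkedDialog]:
        """Aggregated view that a subscriber in the pickup group would be sent."""
        return [self._parked[o] for o in sorted(self._parked)]

    def retrieve(self, orbit: int) -> str:
        """Remove the dialog from its orbit and return the URI to invoke."""
        return self._parked.pop(orbit).replace_remote_uri
```

Reusing the lowest free orbit keeps labels short enough to read out loud, which matters when the orbit is communicated by voice.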
6.1.2 Call Pickup 7.2.2 Call Pickup
There are two different features which are called call pickup. The There are two different features which are called call pickup. The
first is the pickup of a parked dialog. The UA that is to pick up the first is the pickup of a parked dialog. The UA that is to pick up the
dialog subscribes to the call state [call-leg] of dialog subscribes to the call state [call-leg] of
the park service or the UA which has locally parked the dialog. the park service or the UA which has locally parked the dialog.
Dialogs which are parked should be labeled with an identifier. The Dialogs which are parked should be labeled with an identifier. The
labels are used by the UA to allow the user to indicate which dialog labels are used by the UA to allow the user to indicate which dialog
is to be picked up. The UA picking up the call invokes the URL in is to be picked up. The UA picking up the call invokes the URL in
the call state which is labeled as replace-remote. the call state which is labeled as replace-remote.
The other call pickup feature involves picking up an early dialog The other call pickup feature involves picking up an early dialog
(typically ringing). This feature uses some of the same primitives (typically ringing). This feature uses some of the same primitives
as the pickup of a parked call. The call state of the ringing UA as the pickup of a parked call. The call state of the ringing UA
is advertised using [call-leg]. The UA which is to pick up the is advertised using [call-leg]. The UA which is to pick up the
early dialog subscribes either directly to the ringing UA or to a early dialog subscribes either directly to the ringing UA or to a
service aggregating the states for UAs in the pickup group. The service aggregating the states for UAs in the pickup group. The
call state identifies early dialogs. The UA uses the call state(s) call state identifies early dialogs. The UA uses the call state(s)
to help the user choose which early dialog is to be picked up. to help the user choose which early dialog is to be picked up.
The UA then invokes the URL in the call state labeled as replace- The UA then invokes the URL in the call state labeled as replace-
remote. remote.
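The selection step for picking up an early dialog can be sketched in Python as below. The `DialogInfo` record is a hypothetical stand-in for one entry of a [call-leg] notification; the field names and state strings are assumptions, not the package's actual format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DialogInfo:
    """One entry of a hypothetical call-state notification."""
    label: str               # e.g. the ringing extension shown to the user
    state: str               # "early", "confirmed", or "terminated"
    replace_remote_uri: str  # URI labeled replace-remote in the call state


def pickable(states: List[DialogInfo]) -> List[DialogInfo]:
    """Early dialogs the user could choose to pick up."""
    return [d for d in states if d.state == "early"]


def pick_up(states: List[DialogInfo], label: str) -> Optional[str]:
    """Return the replace-remote URI to invoke, or None if no early dialog matches."""
    for d in pickable(states):
        if d.label == label:
            return d.replace_remote_uri
    return None
```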
6.1.3 Music on Hold 7.2.3 Music on Hold
Music on hold can be implemented in a number of ways. One way is to Music on hold can be implemented in a number of ways. One way is to
transfer the held call to a holding service. When the UA wishes to transfer the held call to a holding service. When the UA wishes to
take the call off hold, it performs a take on the call from take the call off hold, it performs a take on the call from
the holding service. This involves subscribing to call state on the the holding service. This involves subscribing to call state on the
holding service and then invoking the URL in the call state labeled holding service and then invoking the URL in the call state labeled
as replace-remote. as replace-remote.
Alternatively music on hold can be performed as a local mixing Alternatively music on hold can be performed as a local mixing
operation. The UA holding the call can mix in the music from the operation. The UA holding the call can mix in the music from the
music service via RTP (i.e. an additional dialog) or RTSP or other music service via RTP (i.e. an additional dialog) or RTSP or other
streaming media source. This approach is simpler from a protocol streaming media source. This approach is simpler from a protocol
perspective (the held dialog does not move, so there is less chance perspective (the held dialog does not move, so there is less chance
of losing it); however, it does use more LAN bandwidth and of losing it); however, it does use more LAN bandwidth and
resources on the UA. resources on the UA.
6.1.4 Call Monitoring 7.2.4 Call Monitoring
Call monitoring is a [join] operation. The monitoring UA sends a Call monitoring is a [join] operation. The monitoring UA sends a
Join to the dialog it wants to listen to. It is able to discover Join to the dialog it wants to listen to. It is able to discover
the dialog via the call state [call-leg] on the monitored UA. The the dialog via the call state [call-leg] on the monitored UA. The
monitoring UA sends SDP in the INVITE which indicates receive-only monitoring UA sends SDP in the INVITE which indicates receive-only
media [offer/answer]. In addition, the monitoring UA should indicate media [offer/answer]. In addition, the monitoring UA should indicate
that it wants to receive a mix. Because the UA is only monitoring, that it wants to receive a mix. Because the UA is only monitoring,
it does not matter whether it asks for its send stream to be mixed it does not matter whether it asks for its send stream to be mixed
or point to point. or point to point.
6.1.5 Barge-in 7.2.5 Barge-in
Barge-in works the same as call monitoring except that it must Barge-in works the same as call monitoring except that it must
indicate that its send media stream is to be mixed so that all of the indicate that its send media stream is to be mixed so that all of the
other parties can hear the stream from the UA barging in. other parties can hear the stream from the UA barging in.
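For monitoring and barge-in, the visible difference in the SDP offer is the media direction attribute. The Python sketch below builds a minimal offer with `a=recvonly` for monitoring and `a=sendrecv` for barge-in. The addresses, port, and PCMU codec are assumptions, and the request to receive a mix (or to have the sent stream mixed) is not expressed, because the text defines no SDP syntax for it.

```python
def join_sdp(ip: str, port: int, session_id: int, barge_in: bool = False) -> str:
    """Minimal SDP offer for a monitoring (recvonly) or barge-in (sendrecv) INVITE."""
    direction = "sendrecv" if barge_in else "recvonly"
    lines = [
        "v=0",
        f"o=- {session_id} {session_id} IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",  # static payload type 0 is PCMU
        "a=rtpmap:0 PCMU/8000",
        f"a={direction}",
    ]
    return "\r\n".join(lines) + "\r\n"
```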
6.1.6 Intercom 7.2.6 Intercom
The UA initiates a dialog using INVITE in the ordinary way [bis]. The UA initiates a dialog using INVITE in the ordinary way [bis].
The calling UA then signals the paged UA to answer the call. The The calling UA then signals the paged UA to answer the call. The
calling UA may discover the URL to answer the call via the call calling UA may discover the URL to answer the call via the call
state [call-leg] of the called UA. The called UA accepts the INVITE state [call-leg] of the called UA. The called UA accepts the INVITE
with a 200 OK and automatically enables the speakerphone. with a 200 OK and automatically enables the speakerphone.
SIP Multiparty Framework
Alternatively, answering can be a local decision made by the UA Alternatively, answering can be a local decision made by the UA
based upon called party identification. based upon called party identification.
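The local-decision variant of intercom can be sketched as a small policy function. This Python sketch keys the decision on the called identity, meaning the Request-URI of the incoming INVITE. The intercom identity, the AnswerPolicy type and the exact-string comparison are all hypothetical simplifications. A real UA would normalize URIs before comparing them and could also take the caller's identity into account.

```python
from dataclasses import dataclass

# Hypothetical local configuration: identities of this UA that mean
# "intercom" when they appear as the Request-URI of an incoming INVITE.
INTERCOM_IDENTITIES = {"sip:intercom-101@example.com"}


@dataclass(frozen=True)
class AnswerPolicy:
    auto_answer: bool   # send 200 OK without user action
    speakerphone: bool  # enable the speaker when answering


def intercom_policy(request_uri: str) -> AnswerPolicy:
    """Decide locally how to treat an incoming INVITE.

    Calls addressed to an intercom identity are answered immediately
    with the speakerphone enabled; anything else rings normally.
    """
    if request_uri in INTERCOM_IDENTITIES:
        return AnswerPolicy(auto_answer=True, speakerphone=True)
    return AnswerPolicy(auto_answer=False, speakerphone=False)
```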
6.1.7 Speakerphone paging 7.2.7 Speakerphone paging
Speakerphone paging can be implemented using either multicast or Speakerphone paging can be implemented using either multicast or
through a simple multipoint mixer. In the multicast solution the through a simple multipoint mixer. In the multicast solution the
paging UA sends a multicast INVITE [bis] with send-only media in the paging UA sends a multicast INVITE [bis] with send-only media in the
[SDP] (see also [offer/answer]). Automatically answering and enabling [SDP] (see also [offer/answer]). Automatically answering and enabling
the speakerphone are locally configured decisions on the paged UAs. the speakerphone are locally configured decisions on the paged UAs.
The paging UA sends RTP to the multicast address indicated in the SDP. The paging UA sends RTP to the multicast address indicated in the SDP.
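For the multicast paging variant, the session description is the main ingredient. The Python sketch below builds a minimal SDP body with one a=sendonly audio stream on a multicast group. SDP requires a TTL suffix on an IPv4 multicast connection address, so the sketch includes one. The group address, port, TTL and origin values are hypothetical, and the sketch does not send the INVITE or the RTP.

```python
import ipaddress


def paging_offer(group: str, ttl: int, port: int, origin: str) -> str:
    """Minimal SDP for a multicast page: one send-only audio stream.

    `group` must be an IPv4 multicast address; SDP expects a /ttl suffix
    on the connection address for IPv4 multicast.
    """
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    if not 0 <= ttl <= 255:
        raise ValueError("TTL must be in the range 0-255")
    lines = [
        "v=0",
        f"o=pager 1 1 IN IP4 {origin}",
        "s=Page",
        f"c=IN IP4 {group}/{ttl}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",
        "a=sendonly",  # the paging UA only talks; paged UAs only listen
    ]
    return "\r\n".join(lines) + "\r\n"
```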
The multipoint solution is accomplished by sending an INVITE to the The multipoint solution is accomplished by sending an INVITE to the
multipoint mixer. The mixer is configured to automatically answer multipoint mixer. The mixer is configured to automatically answer
the dialog. The paging UA then sends [REFER] requests for each of the dialog. The paging UA then sends [REFER] requests for each of
the UAs that are to become paging speakers (The UA is likely to send the UAs that are to become paging speakers (The UA is likely to send
out a single REFER which is parallel forked by the proxy server). out a single REFER which is parallel forked by the proxy server).
The UAs performing as paging speakers are configured to The UAs performing as paging speakers are configured to
automatically answer based upon caller identification (e.g. To automatically answer based upon caller identification (e.g. To
field, URI or Referred-To headers). field, URI or Referred-To headers).
6.1.8 Distinctive ring 7.2.8 Distinctive ring
The target UA either makes a local decision based on information in The target UA either makes a local decision based on information in
an incoming INVITE (To, From, Contact, Request-URI) or trusts an an incoming INVITE (To, From, Contact, Request-URI) or trusts an
Alert-Info header provided by the caller or inserted by a trusted Alert-Info header provided by the caller or inserted by a trusted
proxy. In the latter case, the UA fetches the content described in proxy. In the latter case, the UA fetches the content described in
the URI (typically via HTTP) and renders it to the user. the URI (typically via HTTP) and renders it to the user.
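The two distinctive-ring paths can be sketched as a single selection function. In this Python sketch, local_default stands for the ring the UA chose locally from To, From, Contact or Request-URI. An Alert-Info URI is used only if it passes a trust check. The text bases trust on whether the caller or a trusted proxy supplied the header. This sketch substitutes a hypothetical allow list of content hosts, which is an assumption and not the document's mechanism. Fetching and rendering the content are not shown.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical local policy: hosts whose ring content this UA will fetch.
TRUSTED_RING_HOSTS = {"ringtones.example.com"}

# Alert-Info values carry URIs in angle brackets, possibly with parameters.
_BRACKETED_URI = re.compile(r"<([^>]*)>")


def choose_ring(alert_info: Optional[str], local_default: str) -> str:
    """Pick the ring to render for an incoming INVITE.

    Uses the first http(s) URI in Alert-Info whose host is trusted;
    otherwise falls back to the locally chosen ring.
    """
    if alert_info:
        for uri in _BRACKETED_URI.findall(alert_info):
            parsed = urlparse(uri)
            if parsed.scheme in ("http", "https") and parsed.hostname in TRUSTED_RING_HOSTS:
                return uri
    return local_default
```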
6.1.9 Voice message screening 7.2.9 Voice message screening
At first, this is the same as call monitoring. In this case the At first, this is the same as call monitoring. In this case the
voicemail service is one of the UAs. The UA screening the message voicemail service is one of the UAs. The UA screening the message
monitors the call on the voicemail service, and also subscribes to monitors the call on the voicemail service, and also subscribes to
call-leg information. If the user screening their messages decides call-leg information. If the user screening their messages decides
to answer, they perform a Take from the voicemail system (for to answer, they perform a Take from the voicemail system (for
example, send an INVITE with Replaces to the UA leaving the message). example, send an INVITE with Replaces to the UA leaving the message).
6.1.10 Single Line Extension 7.2.10 Single Line Extension
Incoming calls ring all the extensions through basic parallel Incoming calls ring all the extensions through basic parallel
forking [bis]. Each extension subscribes to call-leg events from forking [bis]. Each extension subscribes to call-leg events from
each other extension. While one user has an active call, any other each other extension. While one user has an active call, any other
UA extension can insert itself into that conversation (it already UA extension can insert itself into that conversation (it already
knows the call-leg information) in the same way as barge-in. knows the call-leg information) in the same way as barge-in.
6.1.11 Click-to-dial 7.2.11 Click-to-dial
The application or server which hosts the click-to-dial application The application or server which hosts the click-to-dial application
captures the URL to be dialed and can set up the call using 3pcc or captures the URL to be dialed and can set up the call using 3pcc or
can send a [REFER] request to the UA which is to dial the address. can send a [REFER] request to the UA which is to dial the address.
As users sometimes change their mind or wish to give up listening to a As users sometimes change their mind or wish to give up listening to a
ringing or voicemail-answered phone, this application illustrates ringing or voicemail-answered phone, this application illustrates
the need for the ability to hang up a call remotely. the need for the ability to hang up a call remotely.
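The REFER variant of click-to-dial can be sketched as follows. This Python sketch builds the text of a REFER that asks the user's UA to call the clicked address, which goes in Refer-To. It uses the RFC 3261 "z9hG4bK" branch prefix and random tags and Call-IDs. All URIs and host names are hypothetical, and reusing the application's URI as its Contact is a simplification. The sketch shows neither the 3pcc alternative nor the remote hang-up.

```python
import uuid


def click_to_dial_refer(dialer_uri: str, target_uri: str,
                        app_uri: str, app_host: str) -> str:
    """Build a REFER asking the UA at `dialer_uri` to call `target_uri`."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 branch magic cookie
    from_tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{app_host}"
    lines = [
        f"REFER {dialer_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {app_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{dialer_uri}>",
        f"From: <{app_uri}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 REFER",
        f"Contact: <{app_uri}>",       # simplification: app's own URI
        f"Refer-To: <{target_uri}>",   # the address the user clicked
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"
```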
6.1.12 Pre-paid calling 7.2.12 Pre-paid calling
For prepaid calling, the user's media always passes through a device For prepaid calling, the user's media always passes through a device
trusted by the pre-paid provider. This device may be an intermediary trusted by the pre-paid provider. This device may be an intermediary
(such as a media relay) or the other endpoint (for example, a PSTN (such as a media relay) or the other endpoint (for example, a PSTN
gateway). In either case, an gateway). In either case, an
intermediary proxy or B2BUA can periodically verify the amount of intermediary proxy or B2BUA can periodically verify the amount of
time available on the pre-paid account, and use the session-timer time available on the pre-paid account, and use the session-timer
extension to cause the trusted endpoint (gateway) or intermediary extension to cause the trusted endpoint (gateway) or intermediary
(media relay) to send a re-INVITE before that time runs out. During (media relay) to send a re-INVITE before that time runs out. During
the re-INVITE, the SIP intermediary can reverify the account and the re-INVITE, the SIP intermediary can reverify the account and
insert another session-timer header. insert another session-timer header.
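The timer arithmetic behind this pre-paid scheme is simple, and a sketch helps show it. The Python sketch below assumes the session-timer extension as later published in RFC 4028. That RFC sets a 90-second floor on the session interval and recommends that the refresher send its re-INVITE once half the interval has elapsed. The 1800-second cap is a hypothetical provider setting. When the remaining balance is below the floor, the timer cannot express it, so the intermediary would have to end the call some other way.

```python
from typing import Optional

MIN_SE = 90           # smallest session interval RFC 4028 permits, in seconds
MAX_INTERVAL = 1800   # hypothetical provider cap between account checks


def session_expires(balance_seconds: int) -> Optional[int]:
    """Session-Expires value (seconds) for an account with this balance.

    Returns None when the balance is below the session-timer floor; the
    intermediary must then end the call by other means (for example, by
    sending BYE itself when the balance is exhausted).
    """
    if balance_seconds < MIN_SE:
        return None
    return min(balance_seconds, MAX_INTERVAL)


def refresh_after(interval: int) -> int:
    """Seconds after which the refresher sends its re-INVITE
    (half the interval, as RFC 4028 recommends)."""
    return interval // 2
```

With a 300-second balance, the intermediary would insert a 300-second interval. The re-INVITE would then arrive after about 150 seconds, and at that point the account is checked again.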
Note that while most pre-paid systems on the PSTN use an IVR to Note that while most pre-paid systems on the PSTN use an IVR to
collect the account number and destination, this isn't strictly collect the account number and destination, this isn't strictly
necessary for a SIP-originated prepaid call. SIP requests and SIP necessary for a SIP-originated prepaid call. SIP requests and SIP
URIs are sufficiently expressive to convey the final destination, URIs are sufficiently expressive to convey the final destination,
the provider of the prepaid service, the location from which the the provider of the prepaid service, the location from which the
user is calling, and the prepaid account they want to use. If a user is calling, and the prepaid account they want to use. If a
pre-paid IVR is used, the mechanism described below (Voice Portals) pre-paid IVR is used, the mechanism described below (Voice Portals)
can be combined as well. can be combined as well.
6.1.13 Voice Portal 7.2.13 Voice Portal
A voice portal is essentially a complex collection of voice dialogs A voice portal is essentially a complex collection of voice dialogs
used to access interesting content. One of the most desirable call used to access interesting content. One of the most desirable call
control features of a Voice Portal is the ability to start a new control features of a Voice Portal is the ability to start a new
outgoing call from within the context of the Portal (to make a outgoing call from within the context of the Portal (to make a
restaurant reservation, or return a voicemail message for example). restaurant reservation, or return a voicemail message for example).
Once the new call is over, the user should be able to return to the Once the new call is over, the user should be able to return to the
Portal by pressing a special key, using some DTMF sequence (e.g., a Portal by pressing a special key, using some DTMF sequence (e.g., a
very long pound or hash tone), or by speaking a hotword (e.g., "Main very long pound or hash tone), or by speaking a hotword (e.g., "Main
Menu"). Menu").
skipping to change at page 37, line 56 skipping to change at page 36, line 5
the User to perform a Far-Fork. In other words the Voice Portal the User to perform a Far-Fork. In other words the Voice Portal
wants the following media relationship: wants the following media relationship:
{ Target , User } & { User , Voice Portal } { Target , User } & { User , Voice Portal }
The Voice Portal is now just listening for a hotword or the The Voice Portal is now just listening for a hotword or the
appropriate DTMF. As soon as the user indicates they are done, the appropriate DTMF. As soon as the user indicates they are done, the
Voice Portal Takes the call from the old Target, and we are back to Voice Portal Takes the call from the old Target, and we are back to
the original media relationship. the original media relationship.
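The brace notation can be made concrete in a few lines. This Python sketch adopts one reading of it: each braced set is a bidirectional media relationship, and "&" lists relationships that exist at the same time. That reading, and the peers helper, are assumptions made for illustration. The sketch computes who exchanges media with whom before and during the far-fork. After the Take, the relationship list returns to the original one.

```python
from typing import FrozenSet, List, Set

Relationship = FrozenSet[str]


def peers(relationships: List[Relationship], party: str) -> Set[str]:
    """Everyone `party` exchanges media with, treating each braced set
    as a bidirectional media relationship (an assumed reading)."""
    result: Set[str] = set()
    for rel in relationships:
        if party in rel:
            result |= rel - {party}
    return result


# { User , Voice Portal } before the call, then the far-fork:
# { Target , User } & { User , Voice Portal }
original = [frozenset({"User", "Voice Portal"})]
far_fork = [frozenset({"Target", "User"}), frozenset({"User", "Voice Portal"})]
```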
This feature can also be used by the account number and phone number This feature can also be used by the account number and phone number
collection menu in a pre-paid calling service. A user can press a collection menu in a pre-paid calling service. A user can press a
DTMF sequence which presents them with the a DTMF sequence which presents them with the a
7 Security Considerations
Call Control primitives provide a powerful set of features that can
be dangerous in the hands of an attacker. To complicate matters,
call control primitives are likely to be automatically authorized
without direct human oversight.
The class of attacks which are possible using these tools include
the ability to eavesdrop on calls, disconnect calls, redirect calls,
render irritating content (including ringing) at a user agent, cause
an action that has billing consequences, subvert billing (theft-of-
service), and obtain private information. Call control extensions
must take extra care to describe how these attacks will be
prevented.
We can also make some general observations about authorization and
trust with respect to call control. The security model depends
heavily on the signaling model chosen (see section 4.2).
Let us first examine the security model used in the 3pcc approach.
All signaling goes through the controller, which is a trusted
entity. Traditional SIP authentication and hop-by-hop encryption
and message integrity work fine in this environment, but end-to-end
encryption and message integrity may not be possible.
When using the peer-to-peer approach, call control actions and
primitives can be legitimately initiated by a) an existing
participant in the conversation space, b) a former participant in
the conversation space, or c) an entity trusted by one of the
participants. For example, a participant always initiates a
transfer; a retrieve from Park (a take) is initiated on behalf of a
former participant; and a barge-in (insert or far-fork) is initiated
by a trusted entity (an operator for example).
Authenticating requests by an existing participant or a trusted
entity can be done with baseline SIP mechanisms. In the case of
features initiated by a former participant, these should be
protected against replay attacks by using a unique name or
identifier per invocation. The Replaces header exhibits this
behavior as a by-product of its operation (once a Replaces operation
is successful, the call-leg being Replaced no longer exists). For
other requests, a "one-time" Request-URI may be provided to the
feature invoker.
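The one-time Request-URI idea can be sketched as a small registry. In this Python sketch, the feature server issues URIs whose user part is an unguessable random token and accepts each one exactly once. A replayed request therefore fails, in the same way that a second Replaces fails once the call-leg is gone. The class name and host are hypothetical. Authentication, expiry and persistence of unused tokens are not shown.

```python
import secrets
from typing import Set


class OneTimeRequestUris:
    """Issue Request-URIs that are accepted exactly once (replay protection)."""

    def __init__(self, host: str) -> None:
        self._host = host
        self._unused: Set[str] = set()

    def issue(self) -> str:
        token = secrets.token_urlsafe(16)  # unguessable user part
        self._unused.add(token)
        return f"sip:{token}@{self._host}"

    def redeem(self, request_uri: str) -> bool:
        """True the first time an issued URI is presented, False afterwards."""
        if not request_uri.startswith("sip:"):
            return False
        user = request_uri[len("sip:"):].split("@", 1)[0]
        if user in self._unused:
            self._unused.remove(user)
            return True
        return False
```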
To authorize call control primitives that trigger special behavior
(such as an INVITE with Replaces, Join, or Fork semantics), the
receiving user agent may have trouble finding appropriate
credentials with which to challenge or authorize the request, as the
sender may be completely unknown to the receiver, except through the
introduction of a third party. These credentials need to be passed
transitively in some way or fetched in an event body, for example.
8 References 8 References
[SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session [SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session
Initiation Protocol", RFC2543, Internet Engineering Task Force, Initiation Protocol", RFC2543, Internet Engineering Task Force,
Nov 1998. Nov 1998.
[RFC2026] S. Bradner, "The Internet Standards Process -- Revision 3",
RFC2026 (BCP), IETF, October 1996.
[RFC2119] S. Bradner, "Key words for use in RFCs to indicate [RFC2119] S. Bradner, "Key words for use in RFCs to indicate
requirement levels," Request for Comments (Best Current requirement levels," Request for Comments (Best Current
Practice) 2119, Internet Engineering Task Force, Mar. 1997. Practice) 2119, Internet Engineering Task Force, Mar. 1997.
[REFER] R. Sparks, "The Refer Method", Internet Draft <draft-ietf- [REFER] R. Sparks, "The Refer Method", Internet Draft <draft-ietf-
sip-refer-02>, IETF, October 30, 2001, Work in progress. sip-refer-02>, IETF, October 30, 2001, Work in progress.
[3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G. Camarillo, [3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G. Camarillo,
"Third Party Call Control in SIP", Internet Draft <draft-rosenberg- "Third Party Call Control in SIP", Internet Draft <draft-rosenberg-
sip-3pcc-02.txt>, IETF; March 2001. Work in progress sip-3pcc-02.txt>, IETF; March 2001. Work in progress
skipping to change at page 39, line 53 skipping to change at page 37, line 4
Draft <draft-mahy-sipping-join-and-fork-00.txt>, IETF, November Draft <draft-mahy-sipping-join-and-fork-00.txt>, IETF, November
2001, Work in progress. 2001, Work in progress.
[RTP] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, [RTP] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", Request for "RTP: A Transport Protocol for Real-Time Applications", Request for
Comments (Standards Track) 1889, IETF, January 1996 Comments (Standards Track) 1889, IETF, January 1996
[SDP] H. Schulzrinne M. Handley, V. Jacobson, "SDP: Session [SDP] H. Schulzrinne M. Handley, V. Jacobson, "SDP: Session
Description Protocol", Request for Comments (Standards Track) 2327, Description Protocol", Request for Comments (Standards Track) 2327,
Internet Engineering Task Force, April 1998 Internet Engineering Task Force, April 1998
[events] A. Roach, "SIP-Specific Event Notification", Internet Draft [events] A. Roach, "SIP-Specific Event Notification", Internet Draft
<draft-ietf-sip-events-03.txt>, IETF, February 2002, Work in <draft-ietf-sip-events-03.txt>, IETF, February 2002, Work in
progress. progress.
[offer/answer] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model [offer/answer] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model
with SDP", Internet Draft <draft-ietf-mmusic-sdp-offer-answer- with SDP", Internet Draft <draft-ietf-mmusic-sdp-offer-answer-
01.txt>, IETF, February 21, 2002, Work in progress. 01.txt>, IETF, February 21, 2002, Work in progress.
[caller prefs] J. Rosenberg, "SIP Caller Preferences and Callee [caller prefs] J. Rosenberg, "SIP Caller Preferences and Callee
Capabilities", Internet Draft <draft-ietf-sip-callerprefs-05.txt>, Capabilities", Internet Draft <draft-ietf-sip-callerprefs-05.txt>,
IETF, November 21, 2001, Work in progress. IETF, November 21, 2001, Work in progress.
[msg waiting] R. Mahy, I. Slain, "Message Waiting in SIP", Internet [msg waiting] R. Mahy, I. Slain, "Message Waiting in SIP", Internet
Draft <draft-mahy-sip-message-waiting-02.txt>, IETF, July 2001, Work Draft <draft-mahy-sip-message-waiting-02.txt>, IETF, July 2001, Work
skipping to change at page 40, line 51 skipping to change at page 37, line 54
Engineering Task Force, June 1999 Engineering Task Force, June 1999
[rtsp] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming [rtsp] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
Protocol (RTSP)", Request for Comments (Standards Track) 2326, Protocol (RTSP)", Request for Comments (Standards Track) 2326,
Internet Engineering Task Force, April 1998 Internet Engineering Task Force, April 1998
[mrcp] S. Shanmugham, P. Monaco, B. Eberman, "MRCP: Media Resource [mrcp] S. Shanmugham, P. Monaco, B. Eberman, "MRCP: Media Resource
Control Protocol", Internet Draft <draft-shanmugham-mrcp-01.txt>, Control Protocol", Internet Draft <draft-shanmugham-mrcp-01.txt>,
IETF, November 20, 2001, Work in progress. IETF, November 20, 2001, Work in progress.
[VoiceXML] S. McGlashan et al, "Voice Extensible Markup Language [VoiceXML] S. McGlashan et al, "Voice Extensible Markup Language
(VoiceXML) Version 2.0", W3C Working Draft, 23 October 2001, Work in (VoiceXML) Version 2.0", W3C Working Draft, 23 October 2001, Work in
progress. progress.
[H.323] [H.323]
[tel URL] [tel URL]
[caller-prefs] [caller-prefs]
[session timer] [session timer]
[service context] [service context]
[avt tones] [avt tones]
[GSM] [GSM]
[MPEG2] [MPEG2]
skipping to change at page 41, line 39 skipping to change at page 38, line 43
[distributed full mesh conf] [distributed full mesh conf]
[Media forking] M. Shankar, "SIP Forked Media", Internet Draft [Media forking] M. Shankar, "SIP Forked Media", Internet Draft
<draft-shankar-sip-forked-media-00.txt>, IETF, Feb. 2001. Work in <draft-shankar-sip-forked-media-00.txt>, IETF, Feb. 2001. Work in
progress. progress.
[PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for [PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for
Remote Phone Control", Internet Draft <draft-dean-phonectl-03.txt>, Remote Phone Control", Internet Draft <draft-dean-phonectl-03.txt>,
IETF, Jan. 2001. Work in progress. IETF, Jan. 2001. Work in progress.
9 To Do 9 Changes since -00
- Add diagrams to section 4.3.1, 4.3.2, and 4.3.3 - Removed many media-specific references.
- Condensed discussion on mixing models, and VoiceXML discussion.
- Moved the sample feature discussion to an Appendix
10 To Do
- Add diagrams to section 4.3.2 and 4.3.3
- Convert to XML
- Fix references - Fix references
- Propose to move Appendix A (sample features to service flows)
- Align terminology with conferencing drafts
- Show roadmap for related drafts
  Other frameworks and requirements
    Conferencing framework
    Conferencing models
    Framework for markup
  Extensions
    REFER
    Replaces
    Join
    Caller prefs
  Packages
    conference-package
    dialog package
  Usage Drafts
    3pcc
    cc-transfer
  Informational Drafts
    Service flows
- Define some semantics for authorization rules. For example, one - Define some semantics for authorization rules. For example, one
could define a dictionary of primitives and/or perhaps define sets could define a dictionary of primitives and/or perhaps define sets
or classes of these primitives, then configure who is allowed to use or classes of these primitives, then configure who is allowed to use
them them
10 Acknowledgments 11 Acknowledgments
Thanks to all who attended the SIP interim meeting in February 2001 Thanks to all who attended the SIP interim meeting in February 2001
for their support of the ideas behind this document. for their support of the ideas behind this document.
11 Author's Addresses 12 Author's Addresses
Rohan Mahy Rohan Mahy
Cisco Systems Cisco Systems
170 West Tasman Dr, MS: SJC-21/3/3 170 West Tasman Dr, MS: SJC-21/3/3
Phone: +1 408 526 8570 Phone: +1 408 526 8570
Email: rohan@cisco.com Email: rohan@cisco.com
Ben Campbell Ben Campbell
dynamicsoft dynamicsoft
5100 Tennyson Parkway 5100 Tennyson Parkway
Suite 1200 Suite 1200
Plano, Texas 75024 Plano, Texas 75024
Email: bcampbell@dynamicsoft.com Email: bcampbell@dynamicsoft.com
Alan Johnston Alan Johnston
WorldCom WorldCom
100 S. 4th Street 100 S. 4th Street
St. Louis, Missouri 63104 St. Louis, Missouri 63104
Email: alan.johnston@wcom.com Email: alan.johnston@wcom.com
Daniel G. Petrie Daniel G. Petrie
Pingtel Corp. Pingtel Corp.
400 W. Cummings Park 400 W. Cummings Park
skipping to change at page 43, line 4 skipping to change at page 40, line 47
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than followed, or as required to translate it into languages other than
English. English.
The limited permissions granted above are perpetual and will not be The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns. revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
End of changes.
