draft-ietf-sipping-cc-framework-01.txt   draft-ietf-sipping-cc-framework-02.txt 
SIPPING Working Group Mahy/Cisco SIPPING WG R. Mahy
Internet Draft Campbell/dynamicsoft Internet-Draft Cisco Systems
Document: draft-ietf-sipping-cc-framework-01.txt Johnston/Worldcom Expires: September 5, 2003 B. Campbell
June 2002 Petrie/Pingtel R. Sparks
Rosenberg/dynamicsoft J. Rosenberg
Expires: December 2002 Sparks/dynamicsoft dynamicsoft
D. Petrie
Pingtel
A. Johnston
WorldCom
March 7, 2003
A Multi-party Application Framework for SIP A Call Control and Multi-party usage framework for the Session
Initiation Protocol (SIP)
draft-ietf-sipping-cc-framework-02.txt
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that other
other groups may also distribute working documents as Internet- groups may also distribute working documents as Internet-Drafts.
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other Internet-Drafts are draft documents valid for a maximum of six months
documents at any time. It is inappropriate to use Internet- Drafts and may be updated, replaced, or obsoleted by other documents at any
as reference material or to cite them other than as "work in time. It is inappropriate to use Internet-Drafts as reference
progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt The list of current Internet-Drafts can be accessed at http://
www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
1 Abstract This Internet-Draft will expire on September 5, 2003.
This document defines a framework and requirements for multi-party Copyright Notice
applications in SIP. To enable discussion of multi-party
applications we define an abstract call model for describing the
media relationships required by many of these applications. The
model and actions described here are specifically chosen to be
independent of the SIP signaling and/or mixing approach chosen to
actually set up the media relationships. In addition to its dialog
manipulation aspect, this framework includes requirements for
communicating related information and events such as conference and
session state, and session history. This framework also describes
other goals which embody the spirit of SIP applications as used on
the Internet.
2 Conventions used in this document Copyright (C) The Internet Society (2003). All Rights Reserved.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", Abstract
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" this
document are to be interpreted as described in RFC-2119 [RFC2119].
SIP Multiparty Framework This document defines a framework and requirements for multi-party
usage of SIP. To enable discussion of multi-party features and
applications we define an abstract call model for describing the
media relationships required by many of these. The model and actions
described here are specifically chosen to be independent of the SIP
signaling and/or mixing approach chosen to actually set up the media
relationships. In addition to its dialog manipulation aspect, this
framework includes requirements for communicating related information
and events such as conference and session state, and session history.
This framework also describes other goals which embody the spirit of
SIP applications as used on the Internet.
Table of Contents Table of Contents
1 Abstract.......................................................1
2 Conventions used in this document..............................1
3 Motivation and Background......................................4
3.1 Goals........................................................4
3.2 Example Features............................................28
4 Key Concepts...................................................6
4.1 "Conversation Space" Model...................................6
4.1.1 Comparison with Related Definitions........................7
4.2 Signaling Models.............................................7
4.3 Mixing Models................................................8
4.3.1 (Single) End System Mixing.................................9
4.3.2 Centralized Mixing.........................................9
4.3.3 Multicast and Multi-unicast conferences...................10
4.4 Conveying Information and Events............................11
4.5 Componentization and Decomposition..........................13
4.5.1 Media Intermediaries......................................13
4.5.2 Queue Server..............................................14
4.5.3 Parking Place.............................................14
4.5.4 Announcements and Voice Dialogs...........................14
4.6 Use of URIs.................................................16
4.6.1 Naming Users in SIP.......................................17
4.6.2 Naming Services with SIP URIs.............................18
4.7 Invoker Independence........................................21
4.8 Billing issues..............................................21
5 Catalog of call control actions and sample features............22
5.1 Early Dialog Actions........................................22
5.1.1 Remote Answer.............................................22
5.1.2 Remote Forward or Put.....................................22
5.1.3 Remote Busy or Error Out..................................23
5.2 Single Dialog Actions.......................................23
5.2.1 Remote Dial...............................................23
5.2.2 Remote On and Off Hold....................................23
5.2.3 Remote Hangup.............................................23
5.3 Multi-dialog actions........................................23
5.3.1 Transfer..................................................23
5.3.2 Take......................................................24
5.3.3 Add.......................................................25
5.3.4 Local Join................................................25
5.3.5 Insert....................................................26
5.3.6 Split.....................................................26
5.3.7 Near-fork.................................................26
5.3.8 Far fork..................................................27
6 Putting it all together
6.1 Feature Solutions
6.1.1 Call Park.................................................32
6.1.2 Call Pickup...............................................32
6.1.3 Music on Hold.............................................33
6.1.4 Call Monitoring...........................................33
6.1.5 Barge-in..................................................33
6.1.6 Intercom..................................................33
6.1.7 Speakerphone paging.......................................34
6.1.8 Distinctive ring..........................................34
6.1.9 Voice message screening...................................34 1. Conventions . . . . . . . . . . . . . . . . . . . . . . . 4
6.1.10 Single Line Extension.....................................34 2. Motivation and Background . . . . . . . . . . . . . . . . 4
6.1.11 Click-to-dial.............................................34 3. Key Concepts . . . . . . . . . . . . . . . . . . . . . . . 6
6.1.12 Pre-paid calling..........................................35 3.1 "Conversation Space" Model . . . . . . . . . . . . . . . . 6
6.1.13 Voice Portal..............................................35 3.2 Comparison with Related Definitions . . . . . . . . . . . 7
7 Security Considerations.......................................27 3.3 Signaling Models . . . . . . . . . . . . . . . . . . . . . 8
8 References....................................................36 3.4 Mixing Models . . . . . . . . . . . . . . . . . . . . . . 9
9 Acknowledgments...............................................39 3.4.1 Tightly Coupled . . . . . . . . . . . . . . . . . . . . . 10
10 Author's Addresses...........................................39 3.4.2 Loosely Coupled . . . . . . . . . . . . . . . . . . . . . 11
3.5 Conveying Information and Events . . . . . . . . . . . . . 12
3.6 Componentization and Decomposition . . . . . . . . . . . . 13
3.6.1 Media Intermediaries . . . . . . . . . . . . . . . . . . . 14
3.6.2 Mixer . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.6.3 Transcoder . . . . . . . . . . . . . . . . . . . . . . . . 14
3.6.4 Media Relay . . . . . . . . . . . . . . . . . . . . . . . 15
3.6.5 Queue Server . . . . . . . . . . . . . . . . . . . . . . . 15
3.6.6 Parking Place . . . . . . . . . . . . . . . . . . . . . . 15
3.6.7 Announcements and Voice Dialogs . . . . . . . . . . . . . 15
3.7 Use of URIs . . . . . . . . . . . . . . . . . . . . . . . 17
3.7.1 Naming Users in SIP . . . . . . . . . . . . . . . . . . . 18
3.7.2 Naming Services with SIP URIs . . . . . . . . . . . . . . 19
3.8 Invoker Independence . . . . . . . . . . . . . . . . . . . 22
3.9 Billing issues . . . . . . . . . . . . . . . . . . . . . . 23
4. Catalog of call control actions and sample features . . . 23
4.1 Early Dialog Actions . . . . . . . . . . . . . . . . . . . 24
4.1.1 Remote Answer . . . . . . . . . . . . . . . . . . . . . . 24
4.1.2 Remote Forward or Put . . . . . . . . . . . . . . . . . . 24
4.1.3 Remote Busy or Error Out . . . . . . . . . . . . . . . . . 24
4.2 Single Dialog Actions . . . . . . . . . . . . . . . . . . 25
4.2.1 Remote Dial . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.2 Remote On and Off Hold . . . . . . . . . . . . . . . . . . 25
4.2.3 Remote Hangup . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Multi-dialog actions . . . . . . . . . . . . . . . . . . . 25
4.3.1 Transfer . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3.2 Take . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.3 Add . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.4 Local Join . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.5 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.6 Split . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.7 Near-fork . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.8 Far fork . . . . . . . . . . . . . . . . . . . . . . . . . 28
5. Security Considerations . . . . . . . . . . . . . . . . . 28
6. Appendix A: Example Features . . . . . . . . . . . . . . . 29
6.1 Implementation of these features . . . . . . . . . . . . . 33
6.1.1 Call Park . . . . . . . . . . . . . . . . . . . . . . . . 34
6.1.2 Call Pickup . . . . . . . . . . . . . . . . . . . . . . . 35
6.1.3 Music on Hold . . . . . . . . . . . . . . . . . . . . . . 35
6.1.4 Call Monitoring . . . . . . . . . . . . . . . . . . . . . 35
6.1.5 Barge-in . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.1.6 Intercom . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.1.7 Speakerphone paging . . . . . . . . . . . . . . . . . . . 36
6.1.8 Distinctive ring . . . . . . . . . . . . . . . . . . . . . 36
6.1.9 Voice message screening . . . . . . . . . . . . . . . . . 37
6.1.10 Single Line Extension . . . . . . . . . . . . . . . . . . 37
6.1.11 Click-to-dial . . . . . . . . . . . . . . . . . . . . . . 37
6.1.12 Pre-paid calling . . . . . . . . . . . . . . . . . . . . . 37
6.1.13 Voice Portal . . . . . . . . . . . . . . . . . . . . . . . 38
Normative References . . . . . . . . . . . . . . . . . . . 38
Informational References . . . . . . . . . . . . . . . . . 40
Authors' Addresses . . . . . . . . . . . . . . . . . . . . 40
Intellectual Property and Copyright Statements . . . . . . 42
3 Motivation and Background 1. Conventions
The Session Initiation Protocol [SIP] was defined for the The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
initiation, maintenance, and termination of sessions or calls "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
between one or more users. However, despite its origins as a large- document are to be interpreted as described in RFC-2119 [2].
scale multiparty conferencing protocol, SIP is used today primarily
for point to point calls. This two-party configuration is the focus 2. Motivation and Background
of the SIP specification and most of its extensions.
The Session Initiation Protocol [1] (SIP) was defined for the
initiation, maintenance, and termination of sessions or calls between
one or more users. However, despite its origins as a large-scale
multiparty conferencing protocol, SIP is used today primarily for
point to point calls. This two-party configuration is the focus of
the SIP specification and most of its extensions.
This document defines a framework and requirements for multi-party This document defines a framework and requirements for multi-party
applications in SIP. Most multi-party applications manipulate SIP usage of SIP. Most multi-party operations manipulate SIP session
dialogs (also known as call legs) to cause participants in a dialogs (also known as call legs) or SIP conference media policy to
conversation to perceive specific media relationships. In other cause participants in a conversation to perceive specific media
protocols that deal with the concept of calls, this manipulation is relationships. In other protocols that deal with the concept of
known as call control. In addition to its dialog manipulation calls, this manipulation is known as call control. In addition to
aspect, "call control" also includes communicating information and its dialog or policy manipulation aspect, "call control" also
events related to manipulating calls, including information and includes communicating information and events related to manipulating
events dealing with session state and history, conference state, calls, including information and events dealing with session state
user state, and even message state. and history, conference state, user state, and even message state.
3.1 Goals
Based on input from the SIP community, the authors compiled the Based on input from the SIP community, the authors compiled the
following set of goals for SIP call control and multiparty following set of goals for SIP call control and multiparty
applications: applications:
- Define Primitives, Not Services. Allow for a handful of robust o Define Primitives, Not Services. Allow for a handful of robust
yet simple mechanisms which can be combined to deliver features and yet simple mechanisms which can be combined to deliver features
services. Throughout this document we refer to these simple and services. Throughout this document we refer to these simple
mechanisms as "primitives". Primitives should be sufficiently mechanisms as "primitives". Primitives should be sufficiently
robust that when they are combined they can be used to build lots of robust that when they are combined they can be used to build lots
services. However, the goal is not to define a provably complete of services. However, the goal is not to define a provably
set of primitives. Note that while the IETF will NOT standardize complete set of primitives. Note that while the IETF will NOT
behavior or services, it may define example services for standardize behavior or services, it may define example services
informational purposes, as in [service examples]. for informational purposes, as in service examples [6].
- Participant oriented. The primitives should be designed to o Participant oriented. The primitives should be designed to
provide services which are oriented around the experience of the provide services which are oriented around the experience of the
participants. The authors observe that end users of features and participants. The authors observe that end users of features and
services usually don't care how a media relationship is set up. services usually don't care how a media relationship is set up.
Their ultimate experience is based only on the resulting media and Their ultimate experience is based only on the resulting media and
other externally visible characteristics. other externally visible characteristics.
- Signaling Model independent: Support both a central control and a o Signaling Model independent: Support both a central control and a
peer-to-peer feature invocation model (and combinations of the two). peer-to-peer feature invocation model (and combinations of the
baseline SIP already supports a centralized control model described two). Baseline SIP already supports a centralized control model
in [3pcc], and the SIP community has expressed a great deal of described in [3pcc], and the SIP community has expressed a great
interest in peer-to-peer or distributed call control. Some such deal of interest in peer-to-peer or distributed call control using
primitives are already defined in [REFER] and [Replaces]. primitives such as those defined in REFER [8], Replaces [9], and
Join [10].
- Mixing Model independent: The bulk of interesting multiparty o Mixing Model independent: The bulk of interesting multiparty
applications involve mixing or combining media from multiple applications involve mixing or combining media from multiple
participants. This mixing can be performed by one or more of the participants. This mixing can be performed by one or more of the
participants, or by a centralized mixing resource. The experience participants, or by a centralized mixing resource. The experience
of the participants should not depend on the mixing model used. of the participants should not depend on the mixing model used.
While most examples in this document refer to audio mixing, the While most examples in this document refer to audio mixing, the
framework applies to any media type. In this context a "mixer" framework applies to any media type. In this context a "mixer"
refers to combining media in an appropriate, media-specific way. refers to combining media in an appropriate, media-specific way.
This is consistent with the model described in the SIP conferencing
framework.
- Invoker oriented. Only the user who invokes a feature or a service o Invoker oriented. Only the user who invokes a feature or a service
needs to know exactly which service is invoked or why. This is good needs to know exactly which service is invoked or why. This is
because it allows new services to be created without requiring new good because it allows new services to be created without
primitives from all the participants; and it allows for much simpler requiring new primitives from all the participants; and it allows
feature authorization policies, for example, when participation for much simpler feature authorization policies, for example, when
spans organizational boundaries. As discussed in section 4.7, this participation spans organizational boundaries. As discussed in
also avoids exponential state explosion when combining features. section 3.8, this also avoids exponential state explosion when
The invoker only has to manage a user interface or API to prevent combining features. The invoker only has to manage a user
local feature interactions. All the other participants simply need interface or API to prevent local feature interactions. All the
to manage the feature interactions of a much smaller number of other participants simply need to manage the feature interactions
primitives. of a much smaller number of primitives.
- Primitives make full use of URIs. URIs are a very powerful o Primitives make full use of URIs. URIs are a very powerful
mechanism for describing users and services. They represent a mechanism for describing users and services. They represent a
plentiful resource which can be extremely expressive and easily plentiful resource which can be extremely expressive and easily
routed, translated, and manipulated--even across organizational routed, translated, and manipulated--even across organizational
boundaries. URIs can contain special parameters and informational boundaries. URIs can contain special parameters and informational
headers which need only be relevant to the owner of the namespace headers which need only be relevant to the owner of the namespace
(domain) of the URI. Just as a user who selects an http: URL need (domain) of the URI. Just as a user who selects an http: URL need
not understand the significance and organization of the web site it not understand the significance and organization of the web site
references, a user may encounter a SIP URL which translates into an it references, a user may encounter a SIP URL which translates
email-style group alias, which plays a pre-recorded message, or runs into an email-style group alias, which plays a pre-recorded
some complex call-handling logic. message, or runs some complex call-handling logic. Note that
while this may seem to contradict the previous goal, both goals
can be satisfied by the same model.
- Make use of SIP headers and SIP event packages to provide SIP o Make use of SIP headers and SIP event packages to provide SIP
entities with information about their environment. These should entities with information about their environment. These should
include information about the status / handling of dialogs on other include information about the status / handling of dialogs on
user agents, information about the history of other contacts other user agents, information about the history of other contacts
attempted prior to the current contact, the status of participants, attempted prior to the current contact, the status of
the status of conferences, user presence information, and the status participants, the status of conferences, user presence
of messages. information, and the status of messages.
- Encourage service decomposition, and design to make use of o Encourage service decomposition, and design to make use of
standard components using well-defined, simple interfaces. Sample standard components using well-defined, simple interfaces. Sample
components include a SIP mixer, recording service, announcement components include a SIP mixer, recording service, announcement
server, and voice dialog server. (This is not an exhaustive list). server, and voice dialog server. (This is not an exhaustive
list).
- Include authentication, authorization, policy, logging, and o Include authentication, authorization, policy, logging, and
accounting mechanisms to allow these primitives to be used safely accounting mechanisms to allow these primitives to be used safely
among mutually untrusted participants. Some of these mechanisms may among mutually untrusted participants. Some of these mechanisms
be used to assist in billing, but no specific billing system will be may be used to assist in billing, but no specific billing system
endorsed. will be endorsed.
- Permit graceful fallback to baseline SIP. Definitions for new SIP o Permit graceful fallback to baseline SIP. Definitions for new SIP
call control extensions/primitives MUST describe a graceful way to call control extensions/primitives MUST describe a graceful way to
fall back to baseline SIP behavior. Support for one primitive MUST fall back to baseline SIP behavior. Support for one primitive MUST
NOT imply support for another primitive. NOT imply support for another primitive.
o There is no desire or goal to reinvent traditional models, such as
- There is no desire or goal to reinvent traditional models, such as
the model used by the [H.450] family of protocols, [JTAPI], or the the model used by the [H.450] family of protocols, [JTAPI], or the
[CSTA] call model, as these other models do not share the design [CSTA] call model, as these other models do not share the design
goals presented in this document. goals presented in this document.
4 Key Concepts 3. Key Concepts
4.1 "Conversation Space" Model 3.1 "Conversation Space" Model
This document introduces the concept of an abstract "conversation This document introduces the concept of an abstract "conversation
space" (essentially as a set of participants who believe they are space" (essentially as a set of participants who believe they are all
all communicating among one another). Each conversation space communicating among one another). Each conversation space contains
contains one or more participants. one or more participants.
Participants are SIP User Agents which send original media to or Participants are SIP User Agents which send original media to or
terminate and receive media from other members of the conversation terminate and receive media from other members of the conversation
space. Logically, every participant in the conversation space has space. Logically, every participant in the conversation space has
access to all the media generated in that space (this is strictly access to all the media generated in that space (this is strictly
true if all participants share a common media type). A SIP User true if all participants share a common media type). A SIP User
Agent which does not contribute or consume any media is NOT a Agent which does not contribute or consume any media is NOT a
participant; nor is a user agent which merely forwards, transcodes, participant; nor is a user agent which merely forwards, transcodes,
mixes, or selects media originating elsewhere in the conversation mixes, or selects media originating elsewhere in the conversation
space. [Note that a conversation space consists of zero or more SIP space. [Note that a conversation space consists of zero or more SIP
skipping to change at page 6, line 45 skipping to change at page 7, line 16
be hidden within a conversation space. Some examples of hidden be hidden within a conversation space. Some examples of hidden
participants include: robots which generate tones, images, or participants include: robots which generate tones, images, or
announcements during a conference to announce users arriving and announcements during a conference to announce users arriving and
departing, a human call center supervisor monitoring a conversation departing, a human call center supervisor monitoring a conversation
between a trainee and a customer, and robots which record media for between a trainee and a customer, and robots which record media for
training or archival purposes. training or archival purposes.
Participants may also be active or passive. Active participants are Participants may also be active or passive. Active participants are
expected to be intelligent enough to leave a conversation space when expected to be intelligent enough to leave a conversation space when
they no longer desire to participate. (An attentive human they no longer desire to participate. (An attentive human
participant is obviously active.) Some robotic participants (such participant is obviously active.) Some robotic participants (such as
as a voice messaging system, an instant messaging agent, or a voice a voice messaging system, an instant messaging agent, or a voice
dialog system) may be active participants if they can leave the dialog system) may be active participants if they can leave the
conversation space when there is no human interaction. Other robots conversation space when there is no human interaction. Other robots
(for example our tone generating robot from the previous example) (for example our tone generating robot from the previous example) are
are passive participants. A human participant "on-hold" is passive. passive participants. A human participant "on-hold" is passive.
An example diagram of a conversation space can be shown as a An example diagram of a conversation space can be shown as a "bubble"
"bubble" or ovals, or as a "set" in curly or square brace notation. or ovals, or as a "set" in curly or square brace notation. Each set,
Each set, oval, or "bubble" represents a conversation space. Hidden oval, or "bubble" represents a conversation space. Hidden
participants are shown in lowercase letters. participants are shown in lowercase letters.
{ A , B } [ A , B ] { A , B } [ A , B ]
.-. .---. .-. .---.
/ \ / \ / \ / \
/ A \ / A b \ / A \ / A b \
( ) ( ) ( ) ( )
\ B / \ C D / \ B / \ C D /
\ / \ / \ / \ /
'-' '---' '-' '---'
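The set notation above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not part of the framework: the names Participant and render are hypothetical, and the only rules taken from the text are that a conversation space is a set of participants and that hidden participants are shown in lowercase letters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """A member of a conversation space (hypothetical helper type)."""
    name: str
    hidden: bool = False  # hidden participants are drawn in lowercase


def render(space):
    """Render a conversation space in curly brace notation.

    Visible participants are shown in uppercase and hidden ones in
    lowercase, following the convention used in the diagrams.
    """
    labels = [p.name.lower() if p.hidden else p.name.upper() for p in space]
    # Sort case-insensitively so the output does not depend on set order.
    return "{ " + " , ".join(sorted(labels, key=str.lower)) + " }"


two_party = {Participant("A"), Participant("B")}
monitored = {
    Participant("A"),
    Participant("B", hidden=True),  # e.g. a recording robot
    Participant("C"),
    Participant("D"),
}

print(render(two_party))   # { A , B }
print(render(monitored))   # { A , b , C , D }
```

The two printed lines correspond to the two bubbles in the diagram: a plain two-party space, and a four-member space in which one participant is hidden.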
4.1.1 Comparison with Related Definitions 3.2 Comparison with Related Definitions
In SIP, a call is "an informal term that refers to some In SIP, a call is "an informal term that refers to some communication
communication between peers, generally set up for the purposes of a between peers, generally set up for the purposes of a multimedia
multimedia conversation." Obviously we cannot discuss normative conversation." Obviously we cannot discuss normative behavior based
behavior based on such an intentionally vague definition. The on such an intentionally vague definition. The concept of a
concept of a conversation space is needed because the SIP definition conversation space is needed because the SIP definition of call is
of call is not sufficiently precise for the purpose of describing not sufficiently precise for the purpose of describing the user
the user experience of multiparty features. experience of multiparty features.
Do any other definitions convey the correct meaning? SIP, and [SDP] Do any other definitions convey the correct meaning? SIP, and SDP
both define a conference as "a multimedia session identified by a [5] both define a conference as "a multimedia session identified by a
common session description." A session is defined as "a set of common session description." A session is defined as "a set of
multimedia senders and receivers and the data streams flowing from multimedia senders and receivers and the data streams flowing from
senders to receivers." Both of these definitions are heavily senders to receivers." Both of these definitions are heavily
oriented toward multicast sessions with little differentiation among oriented toward multicast sessions with little differentiation among
participants. As such, neither is particularly useful for our participants. As such, neither is particularly useful for our
purposes. In fact, the definition of "call" in some call models is purposes. In fact, the definition of "call" in some call models is
more similar to our definition of a conversation space. more similar to our definition of a conversation space.
Some examples of the relationship between conversation spaces, SIP Some examples of the relationship between conversation spaces, SIP
call legs, and SIP sessions are listed below. In each example, a call legs, and SIP sessions are listed below. In each example, a
human user will perceive that there is a single call. human user will perceive that there is a single call.
A simple two-party call is a single conversation space, a single o A simple two-party call is a single conversation space, a single
session, and a single call-leg. session, and a single call-leg.
A locally mixed three-way call is two sessions and two call- o A locally mixed three-way call is two sessions and two call-legs.
legs. It is also a single conversation space. It is also a single conversation space.
A simple dial-in audio conference is a single conversation o A simple dial-in audio conference is a single conversation space,
space, but is represented by as many call-legs and sessions as but is represented by as many call-legs and sessions as there are
there are human participants. human participants.
A multicast conference is a single conversation space, a single o A multicast conference is a single conversation space, a single
session, and as many call-legs as participants. session, and as many call-legs as participants.
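The four examples above can be tabulated in a short sketch. The `Topology` record and its field names are illustrative assumptions rather than terms defined by this framework, and `n` stands for the number of human participants.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Topology:
    """Hypothetical record: how one perceived call maps onto SIP objects."""
    name: str
    conversation_spaces: int
    sessions: int
    call_legs: int


def examples(n: int) -> List[Topology]:
    """Return the four example topologies for n human participants."""
    return [
        Topology("simple two-party call", 1, 1, 1),
        Topology("locally mixed three-way call", 1, 2, 2),
        Topology("dial-in audio conference", 1, n, n),
        Topology("multicast conference", 1, 1, n),
    ]
```

In every row the conversation space count is one, which matches the observation that the human user perceives a single call.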
4.2 Signaling Models 3.3 Signaling Models
Obviously to make changes to a conversation space, you must be able Obviously to make changes to a conversation space, you must be able
to use SIP signaling to cause these changes. Specifically there to use SIP signaling to cause these changes. Specifically there must
must be a way to manipulate SIP dialogs (call legs) to move be a way to manipulate SIP dialogs (call legs) to move participants
participants into and out of conversation spaces. Although this is into and out of conversation spaces. Although this is not as
not as obvious, there also must be a way to manipulate SIP dialogs obvious, there also must be a way to manipulate SIP dialogs to
to include non-participant user agents which are otherwise involved include non-participant user agents which are otherwise involved in a
conversation space (ex: B2BUAs, 3pcc controllers, mixers,
in a conversation space (ex: B2BUAs, 3pcc controllers, mixers,
transcoders, translators, or relays). transcoders, translators, or relays).
Implementations may set up the media relationships described in the Implementations may set up the media relationships described in the
conversation space model using the approach described in [3pcc]. The conversation space model using the approach described in 3pcc [7].
3pcc approach relies on only the following 3 primitive operations: The 3pcc approach relies on only the following 3 primitive
operations:
Create a new call-leg (INVITE) o Create a new call-leg (INVITE)
Modify a call-leg (reINVITE)
Destroy a call-leg (BYE)
The main advantage of the 3pcc approach is that it only requires o Modify a call-leg (reINVITE)
very basic SIP support from end systems to support call control
features. As such, third-party call control is a natural way to o Destroy a call-leg (BYE)
handle protocol conversion and mid-call features. It also has the
advantage and disadvantage that new features can/must be implemented The main advantage of the 3pcc approach is that it only requires very
in one place only (the controller), and neither requires enhanced basic SIP support from end systems to support call control features.
client functionality, nor takes advantage of it. As such, third-party call control is a natural way to handle protocol
conversion and mid-call features. It also has the advantage and
disadvantage that new features can/must be implemented in one place
only (the controller), and neither requires enhanced client
functionality, nor takes advantage of it.
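As a rough illustration of the three 3pcc primitives, the sketch below models a controller that only ever creates, modifies, or destroys call-legs. The `Controller` class, its method names, and the string used for leg state are assumptions made for this example; no real SIP messages are sent.

```python
class Controller:
    """Toy third-party call controller that tracks call-legs in memory."""

    def __init__(self):
        self.legs = {}   # leg id -> current session description (a plain string here)
        self.log = []    # (label, leg id) pairs in the order they were issued

    def create(self, leg_id, description="sendrecv"):
        # Create a new call-leg (INVITE).
        self.legs[leg_id] = description
        self.log.append(("INVITE", leg_id))

    def modify(self, leg_id, description):
        # Modify an existing call-leg (reINVITE).
        if leg_id not in self.legs:
            raise KeyError(f"no such call-leg: {leg_id}")
        self.legs[leg_id] = description
        self.log.append(("re-INVITE", leg_id))

    def destroy(self, leg_id):
        # Destroy a call-leg (BYE).
        del self.legs[leg_id]
        self.log.append(("BYE", leg_id))
```

All feature logic lives in the controller, and the endpoints only need to handle the three basic requests. This is the "one place only" property noted above.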
In addition, a peer-to-peer approach is discussed at length in this In addition, a peer-to-peer approach is discussed at length in this
draft. The primary drawback of the peer-to-peer model is additional draft. The primary drawback of the peer-to-peer model is additional
end system complexity. The benefits of the peer-to-peer model end system complexity. The benefits of the peer-to-peer model
include: include:
- state remains at the edges
- call signaling need only go through participants involved o state remains at the edges
(there are no additional points of failure)
- peers can take advantage of end-to-end message integrity or o call signaling need only go through participants involved (there
are no additional points of failure)
o peers can take advantage of end-to-end message integrity or
encryption encryption
- setup time is shorter (fewer messages and round trips
are required) o setup time is shorter (fewer messages and round trips are
required)
The peer-to-peer approach relies on additional "primitive" The peer-to-peer approach relies on additional "primitive"
operations, some of which are identified here. operations, some of which are identified here.
Replace an existing dialog o Replace an existing dialog
Join a new dialog with an existing dialog [Join]
Fork a new dialog with an existing dialog o Join a new dialog with an existing dialog
Locally do media forking (multi-unicast)
Ask another UA to send a request on your behalf o Support SIP conference policy control
o Locally perform media forking (multi-unicast)
o Ask another UA to send a request on your behalf
Many of the features, primitives, and actions described in this Many of the features, primitives, and actions described in this
document also require some type of media mixing, combining, or document also require some type of media mixing, combining, or
selection as described in the next section. selection as described in the next section.
4.3 Mixing Models 3.4 Mixing Models
SIP permits a variety of mixing models, which are discussed here SIP permits a variety of mixing models, which are discussed here
briefly. This topic is discussed more thoroughly in [conf-models]. briefly. This topic is discussed more thoroughly in the SIP
conferencing framework [15] and cc-conferencing [20]. SIP supports
both tightly-coupled and loosely-coupled conferencing, although more
sophisticated behavior is available in tightly-coupled conferences.
In a tightly-coupled conference, a single SIP user agent (called the
focus) has a direct dialog relationship with each participant (and
may control non-participant user agents as well). In a
loosely-coupled conference there are no coordinated signaling
relationships among the participants.
For brevity, only the two most popular conferencing models are For brevity, only the two most popular conferencing models are
significantly discussed in this document (local and centralized significantly discussed in this document (local and centralized
mixing). Applications of the conversation spaces model to multicast mixing). Applications of the conversation spaces model to
and multi-unicast (full unicast mesh) conferences are left as an loosely-coupled multicast and distributed full unicast mesh
exercise for the reader. Note that a distributed full mesh conferences are left as an exercise for the reader. Note that a
conference can be used for basic conferences, but does not easily distributed full mesh conference can be used for basic conferences,
but does not easily allow for more complex conferencing actions like
splitting, merging, and sidebars.
allow for more complex conferencing actions like splitting, joining,
and forking.
Call control features should be designed to allow a mixer (local or Call control features should be designed to allow a mixer (local or
centralized) to decide when to reduce a conference back to a 2-party centralized) to decide when to reduce a conference back to a 2-party
call, or drop all the participants (for example if only two call, or drop all the participants (for example if only two
automatons are communicating). The actual heuristics used to automatons are communicating). The actual heuristics used to release
release calls are beyond the scope of this document, but may depend calls are beyond the scope of this document, but may depend on
on properties in the conversation space, such as the number of properties in the conversation space, such as the number of active,
active, passive, or hidden participants; and the send-only, receive- passive, or hidden participants; and the send-only, receive-only, or
only, or send-and-receive orientation of various participants. send-and-receive orientation of various participants.
4.3.1 (Single) End System Mixing 3.4.1 Tightly Coupled
3.4.1.1 (Single) End System Mixing
The first model we call "end system mixing". In this model, user A The first model we call "end system mixing". In this model, user A
calls user B, and they have a conversation. At some point later, A calls user B, and they have a conversation. At some point later, A
decides to conference in user C. To do this, A calls C, using a decides to conference in user C. To do this, A calls C, using a
completely separate SIP call. This call uses a different Call-ID, completely separate SIP call. This call uses a different Call-ID,
different tags, etc. There is no call set up directly between B and different tags, etc. There is no call set up directly between B and
C. No SIP extension or external signaling is needed. A merely C. No SIP extension or external signaling is needed. A merely
decides to locally join two call-legs. decides to locally join two call-legs.
B C B C
\ / \ /
\ / \ /
A A
A receives media streams from both B and C, and mixes them. A sends A receives media streams from both B and C, and mixes them. A sends a
a stream containing A's and C's streams to B, and a stream stream containing A's and C's streams to B, and a stream containing
containing A's and B's streams to C. Basically, user A handles both A's and B's streams to C. Basically, user A handles both signaling
signaling and media mixing. and media mixing.
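The mixing rule described here (each party receives the combination of everyone except itself) can be sketched as follows. Streams are modeled as equal-length lists of integer samples and are simply summed; real mixers also handle clipping, timing, and codecs, so this is only a simplified illustration with assumed names.

```python
def mix_for_each(streams):
    """Given party -> list of samples, return party -> mix of all other parties."""
    # Sum every party's samples position by position.
    total = [sum(column) for column in zip(*streams.values())]
    # Each party's mix is the total minus that party's own contribution.
    return {
        party: [t - s for t, s in zip(total, own)]
        for party, own in streams.items()
    }


streams = {"A": [1, 1], "B": [10, 20], "C": [100, 200]}
mixes = mix_for_each(streams)
# mixes["B"] combines A and C; mixes["C"] combines A and B.
```

In the end system mixing case, user A would run this computation locally, send `mixes["B"]` to B and `mixes["C"]` to C, and render `mixes["A"]` to its own user.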
4.3.2 Centralized Mixing 3.4.1.2 Centralized Mixing
In a centralized mixing model, all participants have a pairwise SIP In a centralized mixing model, all participants have a pairwise SIP
and media relationship with the mixer. Three applications of and media relationship with the mixer. Common applications of
centralized mixing are also discussed below. centralized mixing include ad-hoc conferences and scheduled dial-in
or dial-out conferences. [need diagram]
[diagram]
4.3.2.1 Dial-In Conference Servers
Dial-In conference servers closely mirror dial-in conference bridges
in the traditional PSTN. A dial-in conference server acts as a
normal SIP UA. Users call it, and the server maintains point to
point SIP relationships with each user that calls in. The server
takes the media from the users who dial into the same conference,
mixes them, and sends out the appropriate mixed stream to each
participant separately.
As in other applications of centralized mixing, the conference is
identified by the request URI of the calls from each participant.
This provides numerous advantages from a services and routing point
of view. For example, one conference on the server might be known as
sip:conference34@servers.com. All users who call
sip:conference34@servers.com are mixed together. Dial-In conference
servers are usually associated with pre-arranged conferences.
However, the same model applies to ad-hoc conferences. An ad-hoc
conference server creates the conference state when the first user
joins, and destroys it when the last one leaves. The SIP interface
is identical to the pre-arranged case.
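The ad-hoc lifecycle described above (state created when the first user joins, destroyed when the last one leaves, with the Request-URI identifying the conference) might be modeled as in the sketch below. The class and method names are assumptions for illustration only.

```python
class ConferenceServer:
    """Toy dial-in server that keys conference state by Request-URI."""

    def __init__(self):
        self.conferences = {}  # Request-URI -> set of participant identifiers

    def join(self, request_uri, participant):
        # The first caller to a URI implicitly creates the conference state.
        self.conferences.setdefault(request_uri, set()).add(participant)

    def leave(self, request_uri, participant):
        members = self.conferences.get(request_uri)
        if members is None:
            return
        members.discard(participant)
        # The last participant to leave destroys the conference state.
        if not members:
            del self.conferences[request_uri]
```

A pre-arranged conference would differ only in that its entry exists before the first `join`; the calling interface is the same.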
4.3.2.2 Ad-hoc Centralized Conferences
In an ad-hoc centralized conference, two users A and B start with a
normal SIP call. At some point later, they decide to add a third
party. Instead of using end system mixing, they would prefer to use
a central SIP mixer. Initially, A calls B. At some point, B decides
to add user C to the call, and begins the transition to a conference
server. The first step in this process is the discovery of a
conference server that supports ad-hoc conferences. This can be done
through static configuration, or through any of a number of standard
service discovery protocols, such as the Service Location Protocol
[SLP]. Once the server is discovered, a conference ID is chosen. The
first participant to send an INVITE to this URL creates the initial
conference state in the server. SIP dialogs are manipulated (using
any combination of 3pcc or peer-to-peer signaling) so that each
participant is sending media to the conference server. It is also
possible to transition from an end system mixed conference (even one
with a complex connection topology), to a centralized conference
server.
4.3.2.3 Dial-Out Conferences
Dial-out conferences are a simple variation on dial-in conferences. 3.4.1.3 Centralized Signaling, Distributed Media
Instead of the users joining the conference by sending an INVITE to
the server, the server chooses the users who are to be members of
the conference, and then sends them the INVITE. Typically dial out
conferences are pre-arranged, with specific start times and an
initial group membership list. However, there are other means for
the dial-out server to determine the list of participants, including
user presence [13]. Once the users accept or reject the call from
the dial out server, the behavior of this system is identical to the
dial-in server case.
4.3.3 Multicast and Multi-unicast conferences In this conferencing model, there is a centralized controller, as in
the dial-in and dial-out cases. However, the centralized server
handles signaling only. The media is still sent directly between
participants, using either multicast or multi-unicast. Multi-unicast
is when a user sends multiple packets (one for each recipient,
addressed to that recipient). This is referred to as a "Decentralized
Multipoint Conference" in [H.323].
In these models, all endpoints send media to all other endpoints. 3.4.2 Loosely Coupled
Consequently every endpoint mixes their own media from all the other
sources, and sends their own media to every other participant.
[diagrams] In these models, there is no point of central control of SIP
signaling. As in the "Centralized Signaling, Distributed Media" case
above, all endpoints send media to all other endpoints. Consequently
every endpoint mixes their own media from all the other sources, and
sends their own media to every other participant. [add diagrams]
4.3.3.1 Large-Scale Multicast Conferences 3.4.2.1 Large-Scale Multicast Conferences
Large-scale multicast conferences were the original motivation for Large-scale multicast conferences were the original motivation for
both the Session Description Protocol [SDP] and SIP. In a large- both the Session Description Protocol [SDP] and SIP. In a large-
scale multicast conference, one or more multicast addresses are scale multicast conference, one or more multicast addresses are
allocated to the conference. Each participant joins those multicast allocated to the conference. Each participant joins those multicast
groups, and sends their media to those groups. Signaling is not sent groups, and sends their media to those groups. Signaling is not sent
to the multicast groups. The sole purpose of the signaling is to to the multicast groups. The sole purpose of the signaling is to
inform participants of which multicast groups to join. Large-scale inform participants of which multicast groups to join. Large-scale
multicast conferences are usually pre-arranged, with specific start multicast conferences are usually pre-arranged, with specific start
and stop times. However, multicast conferences do not need to be and stop times. However, multicast conferences do not need to be
pre-arranged, so long as a mechanism exists to dynamically obtain a pre-arranged, so long as a mechanism exists to dynamically obtain a
multicast address. multicast address.
4.3.3.2 Centralized Signaling, Distributed Media 3.4.2.2 Full Distributed Unicast Conferencing
In this conferencing model, there is a centralized controller, as in
the dial-in and dial-out cases. However, the centralized server
handles signaling only. The media is still sent directly between
participants, using either multicast or multi-unicast. Multi-unicast
is when a user sends multiple packets (one for each recipient,
addressed to that recipient). This is referred to as a
"Decentralized Multipoint Conference" in [H.323].
4.3.3.3 Full Distributed Unicast Conferencing
In this conferencing model, each participant has both a pairwise In this conferencing model, each participant has both a pairwise
media relationship and a pairwise SIP relationship with every other media relationship and a pairwise SIP relationship with every other
participant (a full mesh). This model requires a mechanism to participant (a full mesh). This model requires a mechanism to
maintain a consistent view of distributed state across the group. maintain a consistent view of distributed state across the group.
This is a classic hard problem in computer science. Also, this This is a classic hard problem in computer science. Also, this model
model does not scale well for large numbers of participants, does not scale well for large numbers of participants, because for
because for <n> participants the number of media and SIP <n> participants the number of media and SIP relationships is
relationships is approximately n-squared. As a result, this model approximately n-squared. As a result, this model is not generally
is not generally available in commercial implementations; to the available in commercial implementations; to the contrary it is
contrary it is primarily the topic of research or experimental primarily the topic of research or experimental implementations.
implementations. Note that this model assumes peer-to-peer
signaling.
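To make the scaling concern concrete: if each pair of participants is counted once, a full mesh of n participants needs n(n-1)/2 pairwise relationships, which grows on the order of n-squared, while a centralized mixer needs only n. The function names below are illustrative.

```python
def full_mesh_relationships(n: int) -> int:
    """Pairwise relationships when every participant connects to every other."""
    return n * (n - 1) // 2


def centralized_relationships(n: int) -> int:
    """Relationships when every participant connects only to a central mixer."""
    return n


for n in (3, 10, 50):
    print(n, full_mesh_relationships(n), centralized_relationships(n))
```

The count applies separately to media relationships and to SIP relationships, so a full mesh carries roughly twice this number of associations in total.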
4.4 Conveying Information and Events Note that this model assumes peer-to-peer signaling.
3.5 Conveying Information and Events
Participants should have access to information about the other Participants should have access to information about the other
participants in a conversation space, so that this information can participants in a conversation space, so that this information can be
be rendered to a human user or processed by an automaton. Although rendered to a human user or processed by an automaton. Although some
some of this information may be available from the Request-URI or of this information may be available from the Request-URI or To,
To, From, Contact, or other SIP headers, another mechanism of From, Contact, or other SIP headers, another mechanism of reporting
reporting this information is necessary. this information is necessary.
Many applications are driven by knowledge about the progress of Many applications are driven by knowledge about the progress of calls
calls and conferences. In general these types of events allow for and conferences. In general these types of events allow for the
the construction of distributed applications, where the application construction of distributed applications, where the application
requires information on dialog and conference state, but is not requires information on session dialog and conference state, but is
necessarily co-resident with an endpoint user agent or conference not necessarily co-resident with an endpoint user agent or conference
server. For example, a mixer involved in a conversation space may server. For example, a focus involved in a conversation space may
wish to provide URLs for conference status, and/or conference/floor wish to provide URLs for conference status, and/or conference/floor
control. control.
The SIP Events [4] architecture defines general mechanisms for
The SIP [Events] architecture defines general mechanisms for
subscription to and notification of events within SIP networks. It subscription to and notification of events within SIP networks. It
introduces the notion of a package which is a specific introduces the notion of a package which is a specific
"instantiation" of the events mechanism for a well-defined set of "instantiation" of the events mechanism for a well-defined set of
events. events.
New event packages should be able to Event packages are needed to provide the status of a user's session
provide the status of a user's call-legs (dialogs), provide the dialogs, provide the status of conferences and their participants,
status of conferences and their participants, provide user presence provide user presence information, provide the status of
information, and provide the status of a user's messages. While this registrations, and provide the status of a user's messages. While this
is not an exhaustive list, these are sufficient to enable the sample is not an exhaustive list, these are sufficient to enable the sample
features described in this document. features described in this document.
A conference event package allows users to subscribe to information The conference event package [12] allows users to subscribe to
about an entire conference or conversation space. This conference information about an entire tightly-coupled SIP conference.
state could be provided by a conference server or mixing component Notifications convey information about the participants such as: the
(described in a later section) if centralized mixing is used, or SIP URL identifying each user, their status in the space (active,
gathered from relevant peers and merged into a cohesive set of declined, departed), URLs to invoke other features (such as sidebar
state. Notifications would convey information about the conversations), links to other relevant information (such as floor
participants such as: the SIP URL identifying each user, their control policies), and if floor control policies are in place, the
status in the space (active, declined, departed), URLs to invoke user's floor control status. For conversation spaces created from
other features (such as sidebar conversations), links to other cascaded conferences, conversation state can be gathered from
relevant information (such as floor control policies), and if floor relevant foci and merged into a cohesive set of state.
control policies are in place, the user's floor control status. A
dialog event package would provide information about all the dialogs
the target user is maintaining, what conversations the user is
participating in, and how these are correlated. Concrete proposals
for conference events and dialog events are described in [conf-pkg]
and [dialog-pkg] respectively.
Note that user presence has a close relationship with these two The session dialog package [11] provides information about all the
proposed event packages. It is fundamental to the presence model dialogs the target user is maintaining, what conversations the user
that the information used to obtain user presence is constructed is participating in, and how these are correlated. Likewise the
from any number of different input sources. Examples of such sources registration package [13] provides notifications when contacts have
include SIP REGISTER requests and uploads of presence documents. changed for a specific address-of-record. The combination of these
These two packages can be considered another mechanism that allows a allows a user agent to learn about all conversations occurring for
presence agent to determine the presence state of the user. the entire registered contact set for an address-of-record.
Specifically, a user presence server can act as a subscriber for the
call-leg and conference packages to obtain additional information
that can be used to construct a presence document.
The multi-party architecture should also provide a mechanism to get Note that user presence in SIP [14] has a close relationship with
information about the status/handling of a dialog (for example, these latter two event packages. It is fundamental to the presence
model that the information used to obtain user presence is
constructed from any number of different input sources. Examples of
other such sources include calendaring information and uploads of
presence documents. These two packages can be considered another
mechanism that allows a presence agent to determine the presence
state of the user. Specifically, a user presence server can act as a
subscriber for the session dialog and registration packages to obtain
additional information that can be used to construct a presence
document.
The multi-party architecture may also need to provide a mechanism to
get information about the status/handling of a dialog (for example,
information about the history of other contacts attempted prior to information about the history of other contacts attempted prior to
the current contact). Finally, the architecture should provide the current contact). Finally, the architecture should provide ample
ample opportunities to present informational URIs which relate to opportunities to present informational URIs which relate to calls,
calls, conversations, or dialogs in some way. For example, consider conversations, or dialogs in some way. For example, consider the SIP
the SIP Call-Info header, or Contact headers returned in a 300-class Call-Info header, or Contact headers returned in a 300-class
response. Frequently additional information about a call or dialog response. Frequently additional information about a call or dialog
can be fetched via non-SIP URIs. For example, consider a web page can be fetched via non-SIP URIs. For example, consider a web page
for package tracking when calling a delivery company, or a web page for package tracking when calling a delivery company, or a web page
with related documentation when joining a dial-in conference. The with related documentation when joining a dial-in conference. The
use of URIs in the multiparty framework is discussed in more detail use of URIs in the multiparty framework is discussed in more detail
in Section 4.6. in Section 3.7.
Finally the interaction of SIP with stimulus-signaling-based
applications, which allow a user agent to interact with an
application without knowledge of the semantics of that application,
is discussed in the SIP application interaction framework [16].
Stimulus signaling can occur to a user interface running locally with
the client, or to a remote user interface, through media streams.
Stimulus signaling encompasses a wide range of mechanisms, ranging
from clicking on hyperlinks, to pressing buttons, to traditional Dual
Tone Multi Frequency (DTMF) input. In all cases, stimulus signaling
is supported through the use of markup languages, which play a key
role in that framework.
4.5 Componentization and Decomposition 3.6 Componentization and Decomposition
This framework proposes a decomposed component architecture with a This framework proposes a decomposed component architecture with a
very loose coupling of services and components. This means that a very loose coupling of services and components. This means that a
service (such as a conferencing server or an auto-attendant) need service (such as a conferencing server or an auto-attendant) need not
not be implemented as an actual server. Rather, these services can be implemented as an actual server. Rather, these services can be
be built by combining a few basic components in straightforward or built by combining a few basic components in straightforward or
arbitrarily complex ways. arbitrarily complex ways.
Since the components are easily deployed on separate boxes, by Since the components are easily deployed on separate boxes, by
separate vendors, or even with separate providers, we achieve a separate vendors, or even with separate providers, we achieve a
separation of function that allows each piece to be developed in separation of function that allows each piece to be developed in
complete isolation. We can also reuse existing components for new complete isolation. We can also reuse existing components for new
applications. This allows rapid service creation, and the ability applications. This allows rapid service creation, and the ability
for services to be distributed across organizational domains for services to be distributed across organizational domains anywhere
anywhere in the Internet. in the Internet.
For many of these components it is also desirable to discover their For many of these components it is also desirable to discover their
capabilities, for example querying the ability of a mixer to host a capabilities, for example querying the ability of a mixer to host a
10 dialog conference, or to reserve resources for a specific time. 10 dialog conference, or to reserve resources for a specific time.
These actions could be provided in the form of URLs, provided there These actions could be provided in the form of URLs, provided there
is an a priori means of understanding their semantics. For example is an a priori means of understanding their semantics. For example
if there is a published dictionary of operations, a way to query the if there is a published dictionary of operations, a way to query the
service for the available operations and the associated URLs, the service for the available operations and the associated URLs, the URL
URL can be the interface for providing these service operations. can be the interface for providing these service operations. This
This concept is described in more detail in the context of dialog concept is described in more detail in the context of dialog
operations in section 4.6 operations in section
4.5.1 Media Intermediaries 3.6.1 Media Intermediaries
Media Intermediaries are not participants in any conversation space, Media Intermediaries are not participants in any conversation space,
although an entity which is also a media translator may also have a although an entity which is also a media translator may also have a
colocated participant component (for example a mixer which also colocated participant component (for example a mixer which also
announces the arrival of a new participant; the announcement portion announces the arrival of a new participant; the announcement portion
is a participant, but the mixer itself is not). Media is a participant, but the mixer itself is not). Media intermediaries
intermediaries should be as transparent as possible to the end should be as transparent as possible to the end users--offering a
users--offering a useful, fundamental service; without getting in useful, fundamental service; without getting in the way of new
the way of new features implemented by participants. Some common features implemented by participants. Some common media
media intermediaries are described below. intermediaries are described below.
4.5.1.1 Mixer 3.6.2 Mixer
A SIP mixer is a component that combines media from all dialogs in A SIP mixer is a component that combines media from all dialogs in
the same conversation in a media specific way. For example, the the same conversation in a media specific way. For example, the
default combining for an audio conference would be an N-1 default combining for an audio conference might be an N-1
configuration, while the same mixer might interleave text messages configuration, while a text mixer might interleave text messages on a
on a per-line basis. per-line basis. More details about the media policy used by mixers
are described in media policy manipulation in the conference policy
Conventions for specifying a mixing or conferencing service in a SIP control protocol [17].
URI are proposed in [ms-uri].
4.5.1.2 Transcoder 3.6.3 Transcoder
A transcoder translates media from one encoding or format to another A transcoder translates media from one encoding or format to another
(for example, GSM voice to G.711, MPEG2 to H.261, or text/html to (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
text/plain). text/plain), or from one media type to another (for example text to
speech). A more thorough discussion of transcoding is described in
SIP transcoding services invocation [18].
4.5.1.3 Media Relay 3.6.4 Media Relay
A media relay terminates media and simply forwards it to a new A media relay terminates media and simply forwards it to a new
destination without changing the content in any way. Sometimes destination without changing the content in any way. Sometimes media
media relays are used to provide source IP address anonymity, to relays are used to provide source IP address anonymity, to facilitate
facilitate middlebox traversal, or to provide a trusted entity where middlebox traversal, or to provide a trusted entity where media can
media can be forcefully disconnected. be forcefully disconnected.
4.5.2 Queue Server 3.6.5 Queue Server
A queue server is a location where calls can be entered into one of A queue server is a location where calls can be entered into one of
several FIFO (first-in, first-out) queues. A queue server would several FIFO (first-in, first-out) queues. A queue server would
subscribe to the presence of groups or individuals who are subscribe to the presence of groups or individuals who are interested
interested in its queues. When detecting that a user is available in its queues. When detecting that a user is available to service a
to service a queue, the server redirects or transfers the last call queue, the server redirects or transfers the last call in the
in the relevant queue to the available user. On a queue-by-queue relevant queue to the available user. On a queue-by-queue basis,
basis, authorized users could also subscribe to the call state authorized users could also subscribe to the call state (dialog
(dialog information) of calls within a queue. Authorized users information) of calls within a queue. Authorized users could use
could use this information to effectively pluck (take) a call out of this information to effectively pluck (take) a call out of the queue
the queue (for example by sending an INVITE with a Replaces header (for example by sending an INVITE with a Replaces header to one of
to one of the user agents in the queue). the user agents in the queue).
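The queue behavior described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class and method names are hypothetical, calls are represented only by an opaque dialog identifier, and we assume that "FIFO" means the call that has waited longest is handed to the next available user. Presence subscription and the actual SIP signaling (redirect, transfer, INVITE with Replaces) are outside the sketch.

```python
from collections import deque
from typing import List, Optional


class CallQueue:
    """One FIFO queue of waiting calls (hypothetical helper, not from the draft)."""

    def __init__(self) -> None:
        self._waiting = deque()  # oldest call sits at the left end

    def enqueue(self, dialog_id: str) -> None:
        """Enter a call into the queue."""
        self._waiting.append(dialog_id)

    def next_for_agent(self) -> Optional[str]:
        """Pick the call to redirect or transfer once a user becomes available."""
        return self._waiting.popleft() if self._waiting else None

    def pluck(self, dialog_id: str) -> bool:
        """Take a specific call out of the queue, as an authorized user
        might do after learning its dialog information."""
        try:
            self._waiting.remove(dialog_id)
        except ValueError:
            return False
        return True

    def snapshot(self) -> List[str]:
        """Call state that an authorized subscriber could be shown."""
        return list(self._waiting)
```

A server hosting several queues would keep one such object per queue and authorize subscriptions on a queue-by-queue basis.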
4.5.3 Parking Place 3.6.6 Parking Place
A parking place is a location where calls can be terminated A parking place is a location where calls can be terminated
temporarily and then retrieved later. While a call is "parked", it temporarily and then retrieved later. While a call is "parked", it
can receive media "on-hold" such as music, announcements, or can receive media "on-hold" such as music, announcements, or
advertisements. Such a service could be further decomposed such advertisements. Such a service could be further decomposed such that
that announcements or music are handled by a separate component. announcements or music are handled by a separate component.
4.5.4 Announcements and Voice Dialogs 3.6.7 Announcements and Voice Dialogs
An announcement server is a server which can play digitized media An announcement server is a server which can play digitized media
(frequently audio), such as music or recorded speech. These servers (frequently audio), such as music or recorded speech. These servers
are typically accessible via SIP, HTTP, or RTSP. An analogous are typically accessible via SIP, HTTP, or RTSP. An analogous
service is a recording service which stores digitized media. A service is a recording service which stores digitized media. A
convention for specifying announcements in SIP URIs is described in convention for specifying announcements in SIP URIs is described in
[ms-uri]. Likewise the same server could easily provide a service [netann]. Likewise the same server could easily provide a service
which records digitized media. which records digitized media.
A "voice dialog" is a model of spoken interactive behavior between a A "voice dialog" is a model of spoken interactive behavior between a
human and an automaton which can include synthesized speech, human and an automaton which can include synthesized speech,
digitized audio, recognition of spoken and DTMF key input, recording digitized audio, recognition of spoken and DTMF key input, recording
of spoken input, and interaction with call control. Dialogs of spoken input, and interaction with call control. Voice dialogs
frequently consist of forms or menus. Forms present information and frequently consist of forms or menus. Forms present information and
gather input; menus offer choices of what to do next. gather input; menus offer choices of what to do next.
Spoken dialogs are a basic building block of applications which use Spoken dialogs are a basic building block of applications which use
voice. Consider for example that a voice mail system, the voice. Consider for example that a voice mail system, the
conference-id and passcode collection system for a conferencing conference-id and passcode collection system for a conferencing
system, and complicated voice portal applications all require a system, and complicated voice portal applications all require a voice
voice dialog component. dialog component.
4.5.4.1. Text-to-Speech and Automatic Speech Recognition 3.6.7.1 Text-to-Speech and Automatic Speech Recognition
Text-to-Speech (TTS) is a service which converts text into digitized Text-to-Speech (TTS) is a service which converts text into digitized
audio. TTS is frequently integrated into other applications, but audio. TTS is frequently integrated into other applications, but
when separated as a component, it provides greater opportunity for when separated as a component, it provides greater opportunity for
broad reuse. Various interfaces to access standalone TTS services broad reuse. Automatic Speech Recognition (ASR) is a service which
via HTTP, [CATS], and SIP ([app-components], and [ms-uri]) have been attempts to decipher digitized speech based on a proposed grammar.
proposed. Like TTS, ASR services can be embedded, or exposed so that many
applications can take advantage of such services. A standardized
Automatic Speech Recognition (ASR) is a service which attempts to (decomposed) interface to access standalone TTS and ASR services is
decipher digitized speech based on a proposed grammar. Like TTS, currently being developed in the SPEECHSC Working Group.
ASR services can be embedded, or exposed so that many applications
can take advantage of such services. Various IP interfaces to ASR,
such as CATS, have been proposed.
4.5.4.2. VoiceXML 3.6.7.2 VoiceXML
[VoiceXML] is a W3C recommendation that was designed to give authors [VoiceXML] is a W3C recommendation that was designed to give authors
control over the spoken dialog between users and applications. The control over the spoken dialog between users and applications. The
application and user take turns speaking: the application prompts application and user take turns speaking: the application prompts the
the user, and the user in turn responds. Its major goal is to bring user, and the user in turn responds. Its major goal is to bring the
the advantages of web-based development and content delivery to advantages of web-based development and content delivery to
interactive voice response applications. We believe that VoiceXML interactive voice response applications. We believe that VoiceXML
represents the ideal partner for SIP in the development of represents the ideal partner for SIP in the development of
distributed IVR servers. VoiceXML is an XML based scripting language distributed IVR servers. VoiceXML is an XML based scripting language
for describing IVR services at an abstract level. VoiceXML supports for describing IVR services at an abstract level. VoiceXML supports
DTMF recognition, speech recognition, text-to-speech, and playing DTMF recognition, speech recognition, text-to-speech, and playing out
out of recorded media files. The results of the data collected from of recorded media files. The results of the data collected from the
the user are passed to a controlling entity through an HTTP POST user are passed to a controlling entity through an HTTP POST
operation. The controller can then return another script, or operation. The controller can then return another script, or
terminate the interaction with the IVR server. terminate the interaction with the IVR server.
A VoiceXML server also need not be implemented as a monolithic A VoiceXML server also need not be implemented as a monolithic
server. Below is a diagram of a VoiceXML browser which is split server. Below is a diagram of a VoiceXML browser which is split into
into media and non-media handling parts. The VoiceXML interpreter media and non-media handling parts. The VoiceXML interpreter handles
handles SIP dialog state and state within a VoiceXML document, and SIP dialog state and state within a VoiceXML document, and sends
sends requests to the media component over another protocol (for requests to the media component over another protocol.
example [RTSP] or CATS).
+-------------+ +-------------+
| | | |
| VoiceXML | | VoiceXML |
| Interpreter | | Interpreter |
| (signaling) | | (signaling) |
+-------------+ +-------------+
^ ^ ^ ^
| | | |
SIP | | RTSP SIP | | RTSP
| | | |
| | | |
v v v v
+-------------+ +-------------+ +-------------+ +-------------+
| | | | | | | |
| SIP UA | RTP | RTSP Server | | SIP UA | RTP | RTSP Server |
| |<------>| (media) | | |<------>| (media) |
| | | | | | | |
+-------------+ +-------------+ +-------------+ +-------------+
Figure : Decomposed VoiceXML Server Figure : Decomposed VoiceXML Server
More details about the integration of SIP with VoiceXML are provided 3.7 Use of URIs
in [sip-vxml]
4.6 Use of URIs
All naming in SIP uses URIs. URIs in SIP are used in a plethora of All naming in SIP uses URIs. URIs in SIP are used in a plethora of
contexts: the Request-URI; Contact, To, From, and *-Info headers; contexts: the Request-URI; Contact, To, From, and *-Info headers;
application/uri bodies; and embedded in email, web pages, instant application/uri bodies; and embedded in email, web pages, instant
messages, and ENUM records. The request-URI identifies the user or messages, and ENUM records. The request-URI identifies the user or
service that the call is destined for. service that the call is destined for.
SIP URIs embedded in informational SIP headers, SIP bodies, and non- SIP URIs embedded in informational SIP headers, SIP bodies, and
SIP content can also specify methods, special parameters, headers, non-SIP content can also specify methods, special parameters,
and even bodies. For example: headers, and even bodies. For example:
sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098 sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098
&To=<sip:bob@biloxi.com>;tag=879738 &To=<sip:bob@biloxi.com>;tag=879738
&From=<sip:alice@atlanta.com>;tag=023214 &From=<sip:alice@atlanta.com>;tag=023214
sip:bob@babylon.biloxi.com;method=REFER? sip:bob@babylon.biloxi.com;method=REFER?
Refer-To=<http://www.atlanta.com/~alice> Refer-To=<http://www.atlanta.com/~alice>
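As a rough sketch of how such a URI might be assembled and read back, the following Python uses only the standard library. The function names are hypothetical. We assume here that characters such as "<", ">", ";", "@" and "=" inside embedded header values are percent-escaped on the wire; the examples above appear to show them unescaped for readability.

```python
from urllib.parse import quote, unquote


def build_sip_uri(user, host, method=None, headers=None):
    """Compose a SIP URI with an optional method parameter and embedded headers."""
    uri = f"sip:{user}@{host}"
    if method:
        uri += f";method={method}"
    if headers:
        # Escape every reserved character in header names and values.
        pairs = [f"{quote(name, safe='')}={quote(value, safe='')}"
                 for name, value in headers.items()]
        uri += "?" + "&".join(pairs)
    return uri


def embedded_headers(uri):
    """Recover the embedded headers from a URI built as above."""
    _, _, query = uri.partition("?")
    result = {}
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        result[unquote(name)] = unquote(value)
    return result


bye_uri = build_sip_uri(
    "bob", "babylon.biloxi.com", method="BYE",
    headers={
        "Call-ID": "13413098",
        "To": "<sip:bob@biloxi.com>;tag=879738",
        "From": "<sip:alice@atlanta.com>;tag=023214",
    },
)
```

Reading the URI back with embedded_headers(bye_uri) yields the original header names and values.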
Throughout this draft we discuss call control primitive operations. Throughout this draft we discuss call control primitive operations.
One of the biggest problems is defining how these operations may be One of the biggest problems is defining how these operations may be
invoked. There are a number of ways to do this. One way is to invoked. There are a number of ways to do this. One way is to
define the primitives in the protocol itself such that SIP methods define the primitives in the protocol itself such that SIP methods
(for example REFER) or SIP headers (for example Replaces) indicate a (for example REFER) or SIP headers (for example Replaces) indicate a
specific call control action. Another way to invoke call control specific call control action. Another way to invoke call control
primitives is to define a specific Request-URI naming convention. primitives is to define a specific Request-URI naming convention.
Either these conventions must be shared between the client (the Either these conventions must be shared between the client (the
invoker) and the server, or published by or on behalf of the server. invoker) and the server, or published by or on behalf of the server.
The former involves defining URL construction techniques (e.g. URL The former involves defining URL construction techniques (e.g. URL
parameters and/or token conventions) as proposed in [ms-uri]. The parameters and/or token conventions) as proposed in [netannc]. The
latter technique usually involves discovering the URI via a SIP latter technique usually involves discovering the URI via a SIP event
event package, a web page, a business card, or an Instant Message. package, a web page, a business card, or an Instant Message. Yet
Yet another means to acquire the URLs is to define a dictionary of another means to acquire the URLs is to define a dictionary of
primitives with well-defined semantics and provide a means to query primitives with well-defined semantics and provide a means to query
the named primitives and corresponding URLs that may be invoked on the named primitives and corresponding URLs that may be invoked on
the service or dialogs. the service or dialogs.
4.6.1 Naming Users in SIP 3.7.1 Naming Users in SIP
An address-of-record, or public SIP address, is a SIP (or SIPS) URI An address-of-record, or public SIP address, is a SIP (or SIPS) URI
that points to a domain with a location server that can map the URI that points to a domain with a location server that can map the URI
to a set of Contact URIs where the user might be available. Typically to a set of Contact URIs where the user might be available. Typically
the Contact URIs are populated via registration. the Contact URIs are populated via registration.
Address of Record Contacts Address of Record Contacts
sip:bob@biloxi.com -> sip:bob@babylon.biloxi.com:5060 sip:bob@biloxi.com -> sip:bob@babylon.biloxi.com:5060
sip:bbrown@mailbox.provider.net sip:bbrown@mailbox.provider.net
sip:+1.408.555.6789@mobile.net sip:+1.408.555.6789@mobile.net
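The mapping above can be pictured as a simple lookup table. The sketch below is illustrative only: LocationService and its methods are hypothetical names, and binding expiration, authentication, and Contact parameters are deliberately left out.

```python
from typing import Dict, List


class LocationService:
    """Maps an address-of-record to the Contact URIs registered for it."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[str]] = {}

    def register(self, aor: str, contact: str) -> None:
        """Add a Contact binding, as a registration would."""
        contacts = self._bindings.setdefault(aor, [])
        if contact not in contacts:  # ignore duplicate registrations
            contacts.append(contact)

    def lookup(self, aor: str) -> List[str]:
        """Return the Contacts where the user might be available."""
        return list(self._bindings.get(aor, []))


location = LocationService()
location.register("sip:bob@biloxi.com", "sip:bob@babylon.biloxi.com:5060")
location.register("sip:bob@biloxi.com", "sip:bbrown@mailbox.provider.net")
location.register("sip:bob@biloxi.com", "sip:+1.408.555.6789@mobile.net")
```

A proxy for biloxi.com would consult lookup("sip:bob@biloxi.com") when deciding where to route a request addressed to Bob.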
[Caller-prefs] defines a set of additional parameters to the Contact Caller Preferences and Callee Capabilities [21] defines a set of
header that define the characteristics of the user agent at the additional parameters to the Contact header that define the
specified URI. For example, there is a mobility parameter which characteristics of the user agent at the specified URI. For example,
indicates whether the UA is fixed or mobile. When a user agent there is a mobility parameter which indicates whether the UA is fixed
registers, it places these parameters in the Contact headers to or mobile. When a user agent registers, it places these parameters
characterize the URIs it is registering. This allows a proxy for in the Contact headers to characterize the URIs it is registering.
that domain to have information about the contact addresses for that This allows a proxy for that domain to have information about the
user. contact addresses for that user.
When a caller sends a request, it can optionally include the Accept- When a caller sends a request, it can optionally include the
Contact and Reject-Contact headers which request certain handling by Accept-Contact and Reject-Contact headers which request certain
the proxy in the target domain. These headers contain preferences handling by the proxy in the target domain. These headers contain
that describe the set of desired URIs to which the caller would like preferences that describe the set of desired URIs to which the caller
their request routed. The proxy in the target domain matches these would like their request routed. The proxy in the target domain
preferences with the Contact characteristics originally registered matches these preferences with the Contact characteristics originally
by the target user. The target user can also choose to run registered by the target user. The target user can also choose to
arbitrarily complex "Find-me" feature logic on a proxy in the target run arbitrarily complex "Find-me" feature logic on a proxy in the
domain. target domain.
There is a strong asymmetry in how preferences for callers and There is a strong asymmetry in how preferences for callers and
callees can be presented to the network. While a caller takes an callees can be presented to the network. While a caller takes an
active role by initiating the request, the callee takes a passive active role by initiating the request, the callee takes a passive
role in waiting for requests. This motivates the use of callee- role in waiting for requests. This motivates the use of
supplied scripts and caller preferences included in the call callee-supplied scripts and caller preferences included in the call
request. This asymmetry is also reflected in the appropriate request. This asymmetry is also reflected in the appropriate
relationship between caller and callee preferences. A server for a relationship between caller and callee preferences. A server for a
callee should respect the wishes of the caller to avoid certain callee should respect the wishes of the caller to avoid certain
locations, while the preference among locations has to be the locations, while the preference among locations has to be the
callee's choice, as it determines where, for example, the phone callee's choice, as it determines where, for example, the phone rings
rings and whether the callee incurs mobile telephone charges for and whether the callee incurs mobile telephone charges for incoming
incoming calls. calls.
SIP User Agent implementations are encouraged to make intelligent SIP User Agent implementations are encouraged to make intelligent
decisions based on the type of participants (active/passive, hidden, decisions based on the type of participants (active/passive, hidden,
human/robot) in a conversation space. This information is conveyed human/robot) in a conversation space. This information is conveyed
in a SIP URI parameter and communicated using an appropriate SIP via the session dialog package or in a SIP header parameter
header or event body. For example, a music on hold service may take communicated using an appropriate SIP header. For example, a music
the sensible approach that if there are two or more unhidden on hold service may take the sensible approach that if there are two
or more unhidden participants, it should not provide hold music; or
that it will not send hold music to robots.
participants, it should not provide hold music; or that it will not
send hold music to robots.
Multiple participants in the same conversation space may represent Multiple participants in the same conversation space may represent
the same human user. For example, the user may use one participant the same human user. For example, the user may use one participant
for video, chat, and whiteboard media on a PC and another for audio for video, chat, and whiteboard media on a PC and another for audio
media on a SIP phone. In this case, the address-of-record is the media on a SIP phone. In this case, the address-of-record is the
same for both user agents, but the Contacts are different. In same for both user agents, but the Contacts are different. In
addition, human users may add robot participants which act on their addition, human users may add robot participants which act on their
behalf (for example a call recording service, or a calendar behalf (for example a call recording service, or a calendar
reminder). Call Control features in SIP should continue to function reminder). Call Control features in SIP should continue to function
as expected in such an environment. as expected in such an environment.
4.6.2 Naming Services with SIP URIs. 3.7.2 Naming Services with SIP URIs
A critical piece of defining a session level service that can be [Editor's Note: this section needs to be pared down considerably, and
the examples replaced with example.{com|org|net} domain names.] A
critical piece of defining a session level service that can be
accessed by SIP is defining the naming of the resources within that accessed by SIP is defining the naming of the resources within that
service. This point cannot be overstated. service. This point cannot be overstated.
In the context of SIP control of application components, we take In the context of SIP control of application components, we take
advantage of the fact that the standard SIP URI has a user part. advantage of the fact that the standard SIP URI has a user part.
Most services may be thought of as user automatons that participate Most services may be thought of as user automatons that participate
in SIP sessions. It naturally follows that the user address, or the in SIP sessions. It naturally follows that the user address, or the
left-hand-side of the URI, should be utilized as a service left-hand-side of the URI, should be utilized as a service indicator.
indicator.
For example, media servers commonly offer multiple services at a For example, media servers commonly offer multiple services at a
single host address. Use of the user part as a service indicator single host address. Use of the user part as a service indicator
enables service consumers to direct their requests without enables service consumers to direct their requests without ambiguity.
ambiguity. It has the added benefit of enabling media services to It has the added benefit of enabling media services to register their
register their availability with SIP Registrars just as any "real" availability with SIP Registrars just as any "real" SIP user would.
SIP user would. This maintains consistency and provides enhanced This maintains consistency and provides enhanced flexibility in the
flexibility in the deployment of media services in the network. deployment of media services in the network.
There has been much discussion about the potential for confusion if There has been much discussion about the potential for confusion if
media services URIs are not readily distinguishable from other types media services URIs are not readily distinguishable from other types
of SIP UAs. The use of a service namespace provides a mechanism to of SIP UAs. The use of a service namespace provides a mechanism to
unambiguously identify standard interfaces while not constraining unambiguously identify standard interfaces while not constraining
the development of private or experimental services. the development of private or experimental services.
In SIP, the request-URI identifies the user or service that the call In SIP, the request-URI identifies the user or service that the call
is destined for. The great advantage of using URIs (specifically, is destined for. The great advantage of using URIs (specifically,
the SIP request URI) as a service identifier comes because of the the SIP request URI) as a service identifier comes because of the
combination of two facts. First, unlike in the PSTN, where the combination of two facts. First, unlike in the PSTN, where the
namespace (dialable telephone numbers) is limited, URIs come from namespace (dialable telephone numbers) is limited, URIs come from an
an infinite space. They are plentiful, and they are free. Secondly, infinite space. They are plentiful, and they are free. Secondly, the
the primary function of SIP is call routing through manipulations of primary function of SIP is call routing through manipulations of the
the request URI. In the traditional SIP application, this URI request URI. In the traditional SIP application, this URI represents
represents people. However, the URI can also represent services, as people. However, the URI can also represent services, as we propose
we propose here. This means we can apply the routing services SIP here. This means we can apply the routing services SIP provides to
provides to routing of calls to services. The result - the problem routing of calls to services. The result - the problem of service
of service invocation and service location becomes a routing invocation and service location becomes a routing problem, for which
problem, for which SIP provides a scalable and flexible solution. SIP provides a scalable and flexible solution. Since there is such a
vast namespace of services, we can explicitly name each service in a
finely granular way. This allows the distribution of services across
the network.
Since there is such a vast namespace of services, we can explicitly
name each service in a finely granular way. This allows the
distribution of services across the network.
Consider a conferencing service. If we have separated the names Consider a conferencing service. If we have separated the names of
of ad-hoc conferences from scheduled conferences, we can program ad-hoc conferences from scheduled conferences, we can program proxies
proxies to route calls for ad-hoc conferences to one set of servers, to route calls for ad-hoc conferences to one set of servers, and
and calls for scheduled ones to another, possibly even in a calls for scheduled ones to another, possibly even in a different
different provider. In fact, since each conference itself is given a provider. In fact, since each conference itself is given a URI, we
URI, we can distribute conferences across servers, and easily can distribute conferences across servers, and easily guarantee that
guarantee that calls for the same conference always get routed to calls for the same conference always get routed to the same server.
the same server. This is in stark contrast to conferences in the This is in stark contrast to conferences in the telephone network,
telephone network, where the equivalent of the URI - the phone where the equivalent of the URI - the phone number - is scarce. An
number - is scarce. An entire conferencing provider generally has entire conferencing provider generally has one or two numbers.
one or two numbers. Conference IDs must be obtained through IVR Conference IDs must be obtained through IVR interactions with the
interactions with the caller, or through a human attendant. This caller, or through a human attendant. This makes it difficult to
makes it difficult to distribute conferences across servers all over distribute conferences across servers all over the network, since the
the network, since the PSTN routing only knows about the dialed PSTN routing only knows about the dialed number.
number.
In the case of a dialog server, the voice dialog itself is the In the case of a dialog server, the voice dialog itself is the target
target for the call. As such, the request URI should contain the for the call. As such, the request URI should contain the identifier
identifier for this spoken dialog. This is consistent with the for this spoken dialog. This is consistent with the Request-URI
Request-URI service invocation model of RFC 3087. This URL can be in service invocation model of RFC 3087. This URL can be in one of two
one of two formats. In the first, the VoiceXML script is identified formats. In the first, the VoiceXML script is identified directly by
directly by an HTTP URL. In the second, the script is not specified. an HTTP URL. In the second, the script is not specified. Rather, the
Rather, the dialog server uses its configuration to map the incoming dialog server uses its configuration to map the incoming request to a
request to a specific script. specific script.
Since the request URI could indicate a request for a variety of Since the request URI could indicate a request for a variety of
different services, of which a dialog server is only one type, this different services, of which a dialog server is only one type, this
example request URI first begins with a service identifier, that example request URI first begins with a service identifier, that
indicates the basic service required. For VoiceXML scripts, this indicates the basic service required. For VoiceXML scripts, this
identification information is a URL-encoded version of the URL which identification information is a URL-encoded version of the URL which
references the script to execute, or if not present, the dialog references the script to execute, or if not present, the dialog
server uses server-specific configuration to determine which script server uses server-specific configuration to determine which script
to execute. to execute.
Examples of URLs that invoke VoiceXML dialogs are: Examples of URLs that invoke VoiceXML dialogs are: (line folding for
(line folding for clarity only) clarity only)
sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml
@vxmlservers.com @vxmlservers.com
sip:dialog.vxml@vxmlservers.com sip:dialog.vxml@vxmlservers.com
The first of these indicates that the dialog server (located at The first of these indicates that the dialog server (located at
vxmlservers.com) should invoke a VoiceXML script fetched from vxmlservers.com) should invoke a VoiceXML script fetched from
http://dialogs.server.com/script32.vxml. Since the user part of the http://dialogs.server.com/script32.vxml. Since the user part of the
SIP URL cannot contain the : character, this must be escaped to %3a. SIP URL cannot contain the : character, this must be escaped to %3a.
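The naming convention in these examples can be sketched in a few lines of Python. The helper names are hypothetical and the "dialog.vxml" service identifier is taken from the examples above. Note that the standard library emits "%3A" in upper case, which is equivalent to the "%3a" shown in the examples, since the hexadecimal digits of an escape are case-insensitive.

```python
from urllib.parse import quote, unquote

SERVICE = "dialog.vxml"  # service identifier used in the examples above


def vxml_dialog_uri(host, script_url=None):
    """Build a SIP URI that invokes a VoiceXML dialog on the given server."""
    if script_url is None:
        # No script named: the server maps the request using its configuration.
        return f"sip:{SERVICE}@{host}"
    # Escape ':' (and anything else reserved) but leave '/' readable.
    return f"sip:{SERVICE}.{quote(script_url, safe='/')}@{host}"


def script_url_from_uri(uri):
    """Return the script URL named in the user part, or None if absent."""
    if not uri.startswith("sip:"):
        raise ValueError("not a SIP URI")
    user, _, _host = uri[len("sip:"):].rpartition("@")
    if user == SERVICE:
        return None
    prefix = SERVICE + "."
    if not user.startswith(prefix):
        raise ValueError("not a VoiceXML dialog URI")
    return unquote(user[len(prefix):])
```

A dialog server receiving such a Request-URI would call script_url_from_uri() and fall back to its configured script when the result is None.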
These types of conventions are not limited to application component These types of conventions are not limited to application component
servers. An ordinary SIP User Agent can have special URIs as servers. An ordinary SIP User Agent can have special URIs as well,
well, for example, one which is automatically answered by a for example, one which is automatically answered by a speakerphone.
Since URIs are so plentiful, using a separate URI for this service
does not exhaust a valuable resource. The requested service is clear
speakerphone. Since URIs are so plentiful, using a separate URI for to the user agent receiving the request. This URI can also be
this service does not exhaust a valuable resource. The requested included as part of another feature (for example, the Intercom
service is clear to the user agent receiving the request. This URI feature described in Section 6.1.6). This feature can be specified
can also be included as part of another feature (for example, the with a SIP user parameter, since are part of the userpart of a SIP
Intercom feature described in Section 6.1.6). This feature can be URI.
specified with a SIP user parameter, since are part of the userpart
of a SIP URI.
Likewise a Request URI can fully describe an announcement service Likewise a Request URI can fully describe an announcement service
through the use of the user part of the address and additional URI through the use of the user part of the address and additional URI
parameters. In our example, the user portion of the address, parameters. In our example, the user portion of the address, "annc",
"annc", specifies the announcement service on the media server. specifies the announcement service on the media server. The two URI
The two URI parameters "play=" and "early=" specify the audio parameters "play=" and "early=" specify the audio resource to play
resource to play and whether early media is desired. and whether early media is desired.
sip:annc@ms2.carrier.net; sip:annc@ms2.carrier.net;
play=http://audio.carrier.net/allcircuitsbusy.au;early=yes play=http://audio.carrier.net/allcircuitsbusy.au;early=yes
sip:annc@ms2.carrier.net; sip:annc@ms2.carrier.net;
play=file://fileserver.carrier.net/geminii/yourHoroscope.wav play=file://fileserver.carrier.net/geminii/yourHoroscope.wav
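To illustrate how a media server might interpret such a Request-URI, here is a small parsing sketch. The function name is hypothetical, and we assume the folded lines above are joined into a single URI before parsing and that no embedded headers follow the parameters.

```python
from urllib.parse import unquote


def parse_service_uri(uri):
    """Split a SIP URI into its user part, host, and URI parameters."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() not in ("sip", "sips"):
        raise ValueError("not a SIP URI")
    address, *raw_params = rest.split(";")
    user, _, host = address.rpartition("@")
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        params[name.lower()] = unquote(value)  # parameter names are case-insensitive
    return user, host, params


user, host, params = parse_service_uri(
    "sip:annc@ms2.carrier.net;"
    "play=http://audio.carrier.net/allcircuitsbusy.au;early=yes"
)
# The user part selects the service; the parameters configure it.
wants_early_media = params.get("early") == "yes"
```

Here the user part "annc" would select the announcement service, "play" would name the audio resource, and "early" would indicate whether early media is desired.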
In practical applications, it is important that an invoker does not In practical applications, it is important that an invoker does not
necessarily apply semantic rules to various URIs it did not create. necessarily apply semantic rules to various URIs it did not create.
Instead, it should allow any arbitrary string to be provisioned, and Instead, it should allow any arbitrary string to be provisioned, and
skipping to change at page 21, line 4 skipping to change at page 22, line 34
standard greeting sip:677283@vm.wcom.com standard greeting sip:677283@vm.wcom.com
sip:rjs@vm.wcom.com;mode=deposit sip:rjs@vm.wcom.com;mode=deposit
Deposit with on sip:sub-rjs-deposit-busy.vm.wcom.com Deposit with on sip:sub-rjs-deposit-busy.vm.wcom.com
phone greeting sip:677372@vm.wcom.com phone greeting sip:677372@vm.wcom.com
sip:rjs@vm.wcom.com;mode=3991243 sip:rjs@vm.wcom.com;mode=3991243
Deposit with sip:sub-rjs-deposit-sg@vm.wcom.com Deposit with sip:sub-rjs-deposit-sg@vm.wcom.com
special greeting sip:677384@vm.wcom.com special greeting sip:677384@vm.wcom.com
sip:rjs@vm.wcom.com;mode=sg sip:rjs@vm.wcom.com;mode=sg
Retrieve - SIP sip:sub-rjs-retrieve@vm.wcom.com Retrieve - SIP sip:sub-rjs-retrieve@vm.wcom.com
authentication sip:677405@vm.wcom.com authentication sip:677405@vm.wcom.com
sip:rjs@vm.wcom.com;mode=retrieve sip:rjs@vm.wcom.com;mode=retrieve
Retrieve - prompt sip:sub-rjs-retrieve-inpin.vm.wcom.com Retrieve - prompt sip:sub-rjs-retrieve-inpin.vm.wcom.com
for PIN in-band sip:677415@vm.wcom.com for PIN in-band sip:677415@vm.wcom.com
sip:rjs@vm.wcom.com;mode=inpin sip:rjs@vm.wcom.com;mode=inpin
As we have shown, SIP URIs represent an ideal, flexible mechanism As we have shown, SIP URIs represent an ideal, flexible mechanism for
for describing and naming service resources, be they queues, describing and naming service resources, be they queues, conferences,
conferences, voice dialogs, announcements, voicemail treatments, or voice dialogs, announcements, voicemail treatments, or phone
phone features. features.
4.7 Invoker Independence 3.8 Invoker Independence
Only the invoker of features in SIP needs to know exactly which With functional signaling, only the invoker of features in SIP needs
feature they are invoking. One of the primary benefits of this to know exactly which feature they are invoking. One of the primary
approach is that combinations of features should work in SIP call benefits of this approach is that combinations of functional features
control. For example, let us examine the combination of a work in SIP call control without requiring complex feature
"transfer" of a call which is "conferenced". interaction matrices. For example, let us examine the combination of
a "transfer" of a call which is "conferenced".
Alice calls Bob. Alice silently "conferences in" her robotic Alice calls Bob. Alice silently "conferences in" her robotic
assistant Albert as a hidden party. Bob transfers Alice to Carol. assistant Albert as a hidden party. Bob transfers Alice to Carol.
If Bob asks Alice to Replace her leg with a new one to Carol then If Bob asks Alice to Replace her leg with a new one to Carol then
both Alice and Albert should be communicating with Carol both Alice and Albert should be communicating with Carol
(transparently). (transparently).
Using the peer-to-peer model, this combination of features works Using the peer-to-peer model, this combination of features works fine
fine if A is doing local mixing (Alice replaces Bob's call-leg with if A is doing local mixing (Alice replaces Bob's call-leg with
Carol's), or if A is using a central mixer (the mixer replaces Bob's Carol's), or if A is using a central mixer (the mixer replaces Bob's
call leg with Carol's). A clever implementation using the 3pcc call leg with Carol's). A clever implementation using the 3pcc model
model can generate similar results. can generate similar results.
New extensions to the SIP Call Control Framework should attempt to New extensions to the SIP Call Control Framework should attempt to
preserve this property. preserve this property.
4.8 Billing issues 3.9 Billing issues
Billing in the PSTN is typically based on who initiated a call. At Billing in the PSTN is typically based on who initiated a call. At
the moment billing in a SIP network is neither consistent with the moment billing in a SIP network is neither consistent with
itself, nor with the PSTN. (A billing model for SIP should allow itself, nor with the PSTN. (A billing model for SIP should allow for
for both PSTN-style billing, and non-PSTN billing.) The example both PSTN-style billing, and non-PSTN billing.) The example below
below demonstrates one such inconsistency. demonstrates one such inconsistency.
Alice places a call to Bob. Alice then blind transfers Bob to Carol Alice places a call to Bob. Alice then blind transfers Bob to Carol
through a PSTN gateway. In current usage of REFER and BYE/Also, Bob through a PSTN gateway. In current usage of REFER, Bob may be billed
may be billed for a call he did not initiate (his UA originated the for a call he did not initiate (his UA originated the outgoing call
outgoing call leg however). This is not necessarily a terrible leg however). This is not necessarily a terrible thing, but it
thing, but it demonstrates a security concern (Bob must have demonstrates a security concern (Bob must have appropriate local
appropriate local policy to prevent fraud). Also, Alice may wish to policy to prevent fraud). Also, Alice may wish to pay for Bob's
pay for Bob's session with Carol. There should be a way to signal session with Carol. There should be a way to signal this in SIP.
this in SIP.
Likewise a Replacement call may maintain the same billing Likewise a Replacement call may maintain the same billing
relationship as a Replaced call, so if Alice first calls Carol, then relationship as a Replaced call, so if Alice first calls Carol, then
asks Bob to Replace this call, Alice may continue to receive a bill. asks Bob to Replace this call, Alice may continue to receive a bill.
Further work in SIP billing should define a way to set or discover Further work in SIP billing should define a way to set or discover
the direction of billing. the direction of billing.
5 Catalog of call control actions and sample features 4. Catalog of call control actions and sample features
Call control actions can be categorized by the dialogs upon which Call control actions can be categorized by the dialogs upon which
they operate. The actions may involve a single or multiple dialogs. they operate. The actions may involve a single or multiple dialogs.
These dialogs can be early or established. Multiple dialogs may be These dialogs can be early or established. Multiple dialogs may be
related in a conversation space to form a conference or other related in a conversation space to form a conference or other
interesting media topologies. interesting media topologies.
It should be noted that it is desirable to provide a means by which It should be noted that it is desirable to provide a means by which a
a party can discover the actions which may be performed on a dialog. party can discover the actions which may be performed on a dialog.
The interested party may be independent or related to the dialogs. The interested party may be independent or related to the dialogs.
One means of accomplishing this is through the ability to define and One means of accomplishing this is through the ability to define and
obtain URLs for these actions as described in section 4.6. obtain URLs for these actions as described in section .
Below are listed several call control "actions" which establish or Below are listed several call control "actions" which establish or
modify dialogs and relate the participants in a conversation space. modify dialogs and relate the participants in a conversation space.
The names of the actions listed are for descriptive purposes only The names of the actions listed are for descriptive purposes only
(they are not normative). This list of actions is not meant to be (they are not normative). This list of actions is not meant to be
exhaustive. exhaustive.
In the examples, all actions are initiated by the user "Alice" In the examples, all actions are initiated by the user "Alice"
represented by UA "A". represented by UA "A".
5.1 Early Dialog Actions 4.1 Early Dialog Actions
The following are a set of actions that may be performed on a single The following are a set of actions that may be performed on a single
early dialog. These actions can be thought of as a set of remote early dialog. These actions can be thought of as a set of remote
control operations. For example an automaton might perform the control operations. For example an automaton might perform the
operation on behalf of a user. Alternatively a user might use the operation on behalf of a user. Alternatively a user might use the
remote control in the form of an application to perform the action remote control in the form of an application to perform the action on
on the early dialog of a UA which may be out of reach. All of these the early dialog of a UA which may be out of reach. All of these
actions correspond to telling the UA how to respond to a request to actions correspond to telling the UA how to respond to a request to
establish an early dialog. These actions provide useful establish an early dialog. These actions provide useful functionality
functionality for PDA, PC and server based applications which desire for PDA, PC and server based applications which desire the ability to
the ability to control a UA. control a UA.
5.1.1 Remote Answer 4.1.1 Remote Answer
A dialog is in some early dialog state such as 180 Ringing. It may A dialog is in some early dialog state such as 180 Ringing. It may
be desirable to tell the UA to answer the dialog. That is, tell it be desirable to tell the UA to answer the dialog. That is, tell it to
to send a 200 OK response to establish the dialog. send a 200 OK response to establish the dialog.
5.1.2 Remote Forward or Put 4.1.2 Remote Forward or Put
It may be desirable to tell the UA to respond with a 3xx class It may be desirable to tell the UA to respond with a 3xx class
response to forward an early dialog to another UA. response to forward an early dialog to another UA.
5.1.3 Remote Busy or Error Out 4.1.3 Remote Busy or Error Out
It may be desirable to instruct the UA to send an error response It may be desirable to instruct the UA to send an error response such
such as 486 Busy Here. as 486 Busy Here.
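The three early dialog actions above each amount to telling the UA which final response to send. A minimal sketch of that mapping follows. The enum, the function name, and the choice of 302 as the representative 3xx response are assumptions made for illustration; this document does not define how the instruction reaches the UA.

```python
from enum import Enum


class EarlyDialogAction(Enum):
    """Hypothetical names for the three remote-control actions."""
    REMOTE_ANSWER = "answer"
    REMOTE_FORWARD = "forward"
    REMOTE_BUSY = "busy"


def response_for(action, forward_to=None):
    """Return (status code, reason phrase, extra headers) for an action."""
    if action is EarlyDialogAction.REMOTE_ANSWER:
        # Establish the dialog.
        return 200, "OK", {}
    if action is EarlyDialogAction.REMOTE_FORWARD:
        if forward_to is None:
            raise ValueError("a forward needs a target URI")
        # Any 3xx class response would do; 302 is one common choice.
        return 302, "Moved Temporarily", {"Contact": f"<{forward_to}>"}
    if action is EarlyDialogAction.REMOTE_BUSY:
        return 486, "Busy Here", {}
    raise ValueError(f"unknown action: {action!r}")
```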
5.2 Single Dialog Actions 4.2 Single Dialog Actions
There is another useful set of actions which operate on a single There is another useful set of actions which operate on a single
established dialog. These operations are useful in building established dialog. These operations are useful in building
productivity applications for aiding users to control their phone. productivity applications for aiding users to control their phone.
For example a CRM application which sets up calls for a user For example a CRM application which sets up calls for a user
eliminating the need for the user to actually enter an address. eliminating the need for the user to actually enter an address.
These operations can also be thought of as remote control actions. These operations can also be thought of as remote control actions.
5.2.1 Remote Dial 4.2.1 Remote Dial
This action instructs the UA to initiate a dialog. This action can This action instructs the UA to initiate a dialog. This action can
be performed using the REFER method. be performed using the REFER method.
5.2.2 Remote On and Off Hold 4.2.2 Remote On and Off Hold
This action instructs the UA to put an established dialog on hold. This action instructs the UA to put an established dialog on hold.
Though this operation can conceptually be performed with the Though this operation can conceptually be performed with the REFER
REFER method, there are no semantics defined as to what the referred method, there are no semantics defined as to what the referred party
party should do with the SDP. There is no way to distinguish between should do with the SDP. There is no way to distinguish between the
the desire to go on or off hold. desire to go on or off hold.
5.2.3 Remote Hangup 4.2.3 Remote Hangup
This action instructs the UA to terminate an early or established This action instructs the UA to terminate an early or established
dialog. A REFER request with the following Refer-To URI performs dialog. A REFER request with the following Refer-To URI performs this
this action. Note: this URL is not properly escaped. action. Note: this URL is not properly escaped.
sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098 sip:bob@babylon.biloxi.example.com;method=BYE?Call-ID=13413098
&To=<sip:bob@biloxi.com>;tag=879738 &To=<sip:bob@biloxi.com>;tag=879738
&From=<sip:alice@atlanta.com>;tag=023214 &From=<sip:alice@atlanta.example.com>;tag=023214
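Since the URL above is shown unescaped for readability, the following sketch shows one way the escaped form could be produced. The helper name is hypothetical. It percent-encodes every reserved character in the header values, which is more conservative than the SIP URI grammar strictly requires but remains valid.

```python
from urllib.parse import quote


def refer_to_bye(target, call_id, to_hdr, from_hdr):
    """Build a Refer-To URI asking the referred UA to send a BYE.

    Header values are percent-encoded so that characters such as
    '<', '>', ';' and '=' cannot be confused with URI delimiters.
    """
    headers = [("Call-ID", call_id), ("To", to_hdr), ("From", from_hdr)]
    query = "&".join(
        f"{name}={quote(value, safe='')}" for name, value in headers
    )
    return f"{target};method=BYE?{query}"


uri = refer_to_bye(
    "sip:bob@babylon.biloxi.example.com",
    "13413098",
    "<sip:bob@biloxi.com>;tag=879738",
    "<sip:alice@atlanta.example.com>;tag=023214",
)
```

With these inputs the To header is carried as To=%3Csip%3Abob%40biloxi.com%3E%3Btag%3D879738.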
5.3 Multi-dialog actions 4.3 Multi-dialog actions
These actions apply to a set of related dialogs. These actions apply to a set of related dialogs.
5.3.1 Transfer 4.3.1 Transfer
The conversation space changes as follows: The conversation space changes as follows:
before after before after
{ A , B } --> { C , B } { A , B } --> { C , B }
A replaces itself with C. A replaces itself with C.
To make this happen using the peer-to-peer approach, "A" would send To make this happen using the peer-to-peer approach, "A" would send
two SIP requests. A shorthand for those requests is shown below: two SIP requests. A shorthand for those requests is shown below:
REFER B Refer-To:C REFER B Refer-To:C
BYE B BYE B
To make this happen instead using the 3pcc approach, the controller To make this happen instead using the 3pcc approach, the controller
sends requests represented by the shorthand below: sends requests represented by the shorthand below:
INVITE C (w/SDP of B) INVITE C (w/SDP of B)
reINVITE B (w/SDP of C) reINVITE B (w/SDP of C)
BYE A BYE A
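The transfer primitive can be modeled as a set operation on the conversation space, with the two signaling approaches producing different request sequences for the same result. The sketch below is an illustration only; the function names are hypothetical and the returned strings simply reproduce the shorthand used above.

```python
def transfer(space, a, c):
    """Return the conversation space after A replaces itself with C."""
    if a not in space:
        raise ValueError(f"{a} is not in the conversation space")
    return (frozenset(space) - {a}) | {c}


def p2p_transfer_requests(b, c):
    """Shorthand for the requests A sends in the peer-to-peer approach."""
    return [f"REFER {b} Refer-To:{c}", f"BYE {b}"]


def tpcc_transfer_requests(a, b, c):
    """Shorthand for the requests the controller sends in the 3pcc approach."""
    return [
        f"INVITE {c} (w/SDP of {b})",
        f"reINVITE {b} (w/SDP of {c})",
        f"BYE {a}",
    ]
```

For example, transfer({"A", "B"}, "A", "C") yields the space { C , B }, whichever request sequence is used to get there.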
Features enabled by this action: Features enabled by this action: - blind transfer - transfer to a
- blind transfer central mixer (some type of conference or forking) - transfer to park
- transfer to a central mixer (some type of conference or forking) server (park) - transfer to music on hold or announcement server -
- transfer to park server (park) transfer to a "queue" - transfer to a service (such as Voice Dialogs
- transfer to music on hold or announcement server service) - transition from local mixer to central mixer
- transfer to a "queue"
- transfer to a service (such as Voice Dialogs service)
- transition from local mixer to central mixer
5.3.2 Take
The conversation space changes as follows:
{ B , C } --> { B , A } This action is frequently referred to as "completing an attended
transfer". It is described in more detail in cc-transfer [19].
A forcibly replaces C with itself. In most uses of this primitive, 4.3.2 Take
A is just "un-replacing" itself.
Using the peer-to-peer approach, "A" sends: The conversation space changes as follows: { B , C } --> { B , A }
INVITE B Replaces: <call leg between B and C> A forcibly replaces C with itself. In most uses of this primitive, A
is just "un-replacing" itself. Using the peer-to-peer approach, "A"
sends: INVITE B Replaces: <call leg between B and C>
Using the 3pcc approach (all requests sent from controller) Using the 3pcc approach (all requests sent from controller) INVITE A
INVITE A (w/SDP of B) (w/SDP of B) reINVITE B (w/SDP of A) BYE C
reINVITE B (w/SDP of A)
BYE C
Features enabled by this action: Features enabled by this action: - transferee completes an attended
- transferee completes an attended transfer transfer - retrieve from central mixer (not recommended) - retrieve
- retrieve from central mixer (not recommended) from music on hold or park - retrieve from queue - call center take -
- retrieve from music on hold or park voice portal resuming ownership of a call it originated -
- retrieve from queue answering-machine style screening (pickup) - pickup of a ringing call
- call center take (i.e. early dialog)
- voice portal resuming ownership of a call it originated
- answering-machine style screening (pickup)
- pickup of a ringing call (i.e. early dialog)
Note: that pick up of a ringing call has perhaps some interesting Note: that pick up of a ringing call has perhaps some interesting
additional requirements. First of all it is an early dialog as additional requirements. First of all it is an early dialog as
opposed to an established dialog. Secondly the party which is to opposed to an established dialog. Secondly the party which is to
pick up the call may wish to do so only while it is an early pick up the call may wish to do so only while it is an early
dialog. That is in the race condition where the ringing UA accepts dialog. That is in the race condition where the ringing UA accepts
just before it receives signaling from the party wishing to take the just before it receives signaling from the party wishing to take the
call, the taking party wishes to yield or cancel the take. The goal call, the taking party wishes to yield or cancel the take. The goal
is to avoid yanking an answered call from the called party. is to avoid yanking an answered call from the called party.
5.3.3 Add This action is described in Replaces [9] and in cc-transfer [19].
The conversation space changes as follows:
{ A , B } --> { A, B, C }
A adds C to the conversation.
Using the peer-to-peer approach, adding a party using local mixing
requires no signaling. To transition from a 2-party call or a
locally mixed conference to centrally mixing A could send the
following requests:
REFER B Refer-To: mixer
INVITE mixer
BYE B
To add a party to a central mixer:
REFER C Refer-To: mixer
or
REFER mixer Refer-To: C
Using the 3pcc approach to transition to centrally mixed, the
controller would send:
INVITE mixer leg 1 (w/SDP of A)
INVITE mixer leg 2 (w/SDP of B)
INVITE C (late SDP)
reINVITE A (w/SDP of mixer leg 1)
reINVITE B (w/SDP of mixer leg 2)
INVITE mixer leg3 (w/SDP of C)
To add a party to a central mixer:
INVITE C (late SDP)
INVITE mixer (w/SDP of C)
Features enabled:
- standard conference feature
- call recording
- answering-machine style screening (screening)
5.3.4 Local Join
The conversation space changes like this:
{ A, B} , {A, C} --> {A, B, C}
or like this 4.3.3 Add
{ A, B} , {C, D} --> {A, B, C, D} Note that the following 4 actions are described in cc-conferencing
[20].
A takes two conversation spaces and joins them together into a This is merely adding a participant to a SIP conference. The
single space. conversation space changes as follows: { A , B } --> { A, B, C } A
adds C to the conversation. Using the peer-to-peer approach, adding a
party using local mixing requires no signaling. To transition from a
2-party call or a locally mixed conference to centrally mixing A
could send the following requests: REFER B Refer-To: conference-URI
INVITE conference-URI BYE B To add a party to a conference: REFER C
Refer-To: conference-URI or REFER conference-URI Refer-To: C Using
the 3pcc approach to transition to centrally mixed, the controller
would send: INVITE mixer leg 1 (w/SDP of A) INVITE mixer leg 2 (w/SDP
of B) INVITE C (late SDP) reINVITE A (w/SDP of mixer leg 1) reINVITE
B (w/SDP of mixer leg 2) INVITE mixer leg3 (w/SDP of C) To add a
party to a SIP conference: INVITE C (late SDP) INVITE conference-URI
(w/SDP of C) Features enabled: - standard conference feature - call
recording - answering-machine style screening (screening)
4.3.4 Local Join
The conversation space changes like this: { A, B} , {A, C} --> {A,
B, C} or like this { A, B} , {C, D} --> {A, B, C, D} A takes two
conversation spaces and joins them together into a single space.
Using the peer-to-peer approach, A can mix locally, or REFER the Using the peer-to-peer approach, A can mix locally, or REFER the
participants of both conversation spaces to the same central mixer participants of both conversation spaces to the same central mixer
(as in 5.3) (as in 5.3) For the 3pcc approach, the call flows for inserting
participants, and joining and splitting conversation spaces are
For the 3pcc approach, the call flows for inserting participants, tedious yet straightforward, so these are left as an exercise for the
and joining and splitting conversation spaces are tedious yet reader. Features enabled: - standard conference feature - leaving a
straightforward, so these are left as an exercise for the reader. sidebar to rejoin a larger conference
Features enabled:
- standard conference feature
- leaving a sidebar to rejoin a larger conference
5.3.5 Insert
The conversation space changes like this:
{ B , C } --> {A, B, C } 4.3.5 Insert
A inserts itself into a conversation space. The conversation space changes like this: { B , C } --> {A, B, C }
A inserts itself into a conversation space. A proposed mechanism for
signaling this using the peer-to-peer approach is to send a new
header in an INVITE with "joining" semantics. For example: INVITE B
Join: <call id of B and C> If B accepted the INVITE, B would accept
responsibility to setup the call legs and mixing necessary (for
example: to mix locally or to transfer the participants to a central
mixer) Features enabled: - barge-in - call center monitoring - call
recording
A proposed mechanism for signaling this using the peer-to-peer 4.3.6 Split
approach is to send a new header in an INVITE with "joining" { A, B, C, D } --> { A, B } , { C, D } If using a central conference
semantics. For example: with peer-to-peer REFER C Refer-To: conference-URI (new URI) REFER D
INVITE B Join: <call id of B and C> Refer-To: conference-URI (new URI) BYE C BYE D Features enabled: -
sidebar conversations during a larger conference
If B accepted the INVITE, B would accept responsibility to setup the 4.3.7 Near-fork
call legs and mixing necessary (for example: to mix locally or to
transfer the participants to a central mixer)
Features enabled: A participates in two conversation spaces simultaneously: { A, B }
- barge-in --> { B , A } & { A , C } A is a participant in two conversation
- call center monitoring spaces such that A sends the same media to both spaces, and renders
- call recording media from both spaces, presumably by mixing or rendering the media
from both. We can define that A is the "anchor" point for both
forks, each of which is a separate conversation space. This action is
purely local implementation (it requires no special signaling).
Local features such as switching calls between the background and
foreground are possible using this media relationship.
5.3.6 Split 4.3.8 Far fork
{ A, B, C, D } --> { A, B } , { C, D }
If using a central mixer with peer-to-peer The conversation space diagram... { A, B } --> { A , B } & { B , C }
REFER C Refer-To: mixer (new URI) A requests B to be the "anchor" of two conversation spaces. This is
REFER D Refer-To: mixer (new URI) easily setup by creating a conference with two subconferences and
BYE C setting the media policy appropriately such that B is a participant in
BYE D both. Media forking can also be setup using 3pcc as described in
Section 5.1 of RFC3264 [3] (an offer/answer model for SDP). The
session descriptions for forking are quite complex. Controllers
should verify that endpoints can handle forked-media, for example
using prior configuration.
Features enabled: Features enabled:
- sidebar conversations during a larger conference
5.3.7 Near-fork
A participates in two conversation spaces simultaneously:
{ A, B } --> { B , A } & { A , C }
A is a participant in two conversation spaces such that A sends the
same media to both spaces, and renders media from both spaces,
presumably by mixing or rendering the media from both. We can
define that A is the "anchor" point for both forks, each of which is
a separate conversation space.
This action is purely local implementation (it requires no special o barge-in
signaling). Local features such as switching calls between the
background and foreground are possible using this media
relationship.
5.3.8 Far fork
The conversation space diagram...
{ A, B } --> { A , B } & { B , C }
A requests B to be the "anchor" of two conversation spaces. o voice portal services
For an example of using 3pcc to setup media forking, see [Media o whisper
forking]. The session descriptions for forking are quite complex.
Controllers should verify that endpoints can handle forked-media, by
using some type of Requires header token.
Two ways to setup this media relationship using peer-to-peer call o hotword detection
control have been proposed:
- the anchor receives a REFER with requires forked-media (implicit)
- the anchor receives an INVITE with an explicit header (explicit)
Features enabled: o sending DTMF somewhere else
- barge-in
- voice portal services
- whisper
- hotword detection
- sending DTMF somewhere else
6 Security Considerations 5. Security Considerations
Call Control primitives provide a powerful set of features that can Call Control primitives provide a powerful set of features that can
be dangerous in the hands of an attacker. To complicate matters, be dangerous in the hands of an attacker. To complicate matters,
call control primitives are likely to be automatically authorized call control primitives are likely to be automatically authorized
without direct human oversight. without direct human oversight.
The class of attacks which are possible using these tools include The class of attacks which are possible using these tools include the
the ability to eavesdrop on calls, disconnect calls, redirect calls, ability to eavesdrop on calls, disconnect calls, redirect calls,
render irritating content (including ringing) at a user agent, cause render irritating content (including ringing) at a user agent, cause
an action that has billing consequences, subvert billing (theft-of- an action that has billing consequences, subvert billing
service), and obtain private information. Call control extensions (theft-of-service), and obtain private information. Call control
must take extra care to describe how these attacks will be extensions must take extra care to describe how these attacks will be
prevented. prevented.
We can also make some general observations about authorization and We can also make some general observations about authorization and
trust with respect to call control. The security model is trust with respect to call control. The security model is
dramatically dependent on the signaling model chosen (see section dramatically dependent on the signaling model chosen (see section
4.2) 3.2)
Let us first examine the security model used in the 3pcc approach. Let us first examine the security model used in the 3pcc approach.
All signaling goes through the controller, which is a trusted All signaling goes through the controller, which is a trusted entity.
entity. Traditional SIP authentication and hop-by-hop encryption Traditional SIP authentication and hop-by-hop encryption and message
and message integrity work fine in this environment, but end-to-end integrity work fine in this environment, but end-to-end encryption
encryption and message integrity may not be possible. and message integrity may not be possible.
When using the peer-to-peer approach, call control actions and When using the peer-to-peer approach, call control actions and
primitives can be legitimately initiated by a) an existing primitives can be legitimately initiated by a) an existing
participant in the conversation space, b) a former participant in participant in the conversation space, b) a former participant in the
the conversation space, or c) an entity trusted by one of the conversation space, or c) an entity trusted by one of the
participants. For example, a participant always initiates a participants. For example, a participant always initiates a
transfer; a retrieve from Park (a take) is initiated on behalf of a transfer; a retrieve from Park (a take) is initiated on behalf of a
former participant; and a barge-in (insert or far-fork) is initiated former participant; and a barge-in (insert or far-fork) is initiated
by a trusted entity (an operator for example). by a trusted entity (an operator for example).
Authenticating requests by an existing participant or a trusted Authenticating requests by an existing participant or a trusted
entity can be done with baseline SIP mechanisms. In the case of entity can be done with baseline SIP mechanisms. In the case of
features initiated by a former participant, these should be features initiated by a former participant, these should be protected
protected against replay attacks by using a unique name or against replay attacks by using a unique name or identifier per
identifier per invocation. The Replaces header exhibits this invocation. The Replaces header exhibits this behavior as a
behavior as a by-product of its operation (once a Replaces operation by-product of its operation (once a Replaces operation is successful,
is successful, the call-leg being Replaced no longer exists). For the call-leg being Replaced no longer exists). For other requests, a
other requests, a "one-time" Request-URI may be provided to the "one-time" Request-URI may be provided to the feature invoker.
feature invoker.
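The "one-time" Request-URI idea can be sketched as an issuer that hands out an unguessable URI and accepts it exactly once. This is a minimal illustration under stated assumptions: the class name, the host name used in the usage note, and the in-memory table are hypothetical, and a real service would also expire unused URIs.

```python
import secrets


class OneTimeUriIssuer:
    """Issue Request-URIs that can each invoke a feature only once."""

    def __init__(self, host):
        self._host = host
        self._pending = {}  # random token -> feature name

    def issue(self, feature):
        """Return a fresh URI to hand to the (former) participant."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = feature
        return f"sip:{token}@{self._host}"

    def redeem(self, request_uri):
        """Return the feature for a valid URI, or None if unknown or reused."""
        user = request_uri.split(":", 1)[-1].split("@", 1)[0]
        # pop() removes the token, so a replayed request finds nothing.
        return self._pending.pop(user, None)
```

For example, an issuer created for the hypothetical host park.example.com returns the feature name on the first redeem() of a URI and None on any replay of it.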
To authorize call control primitives that trigger special behavior To authorize call control primitives that trigger special behavior
(such as an INVITE with Replace, Join, or Fork semantics), the (such as an INVITE with Replaces or Join semantics), the receiving
receiving user agent may have trouble finding appropriate user agent may have trouble finding appropriate credentials with
credentials with which to challenge or authorize the request, as the which to challenge or authorize the request, as the sender may be
sender may be completely unknown to the receiver, except through the completely unknown to the receiver, except through the introduction
introduction of a third party. These credentials need to be passed of a third party. These credentials need to be passed transitively
transitively in some way or fetched in an event body, for example. in some way or fetched in an event body, for example.
7 Appendix A: Example Features 6. Appendix A: Example Features
Primitives are defined in terms of their ability to provide Primitives are defined in terms of their ability to provide features.
features. These example features should require an amply robust set
of services to demonstrate a useful set of primitives. They are These example features should require an amply robust set of services
described here briefly. Note that the descriptions of these features to demonstrate a useful set of primitives. They are described here
are non-normative. Some of these features are used as examples in briefly. Note that the descriptions of these features are
non-normative. Some of these features are used as examples in
section 6 to demonstrate how some features may require certain media section 6 to demonstrate how some features may require certain media
relationships. Note also that this document describes a mixture of relationships. Note also that this document describes a mixture of
both features originating in the world of telephones, and features both features originating in the world of telephones, and features
which are clearly Internet oriented. which are clearly Internet oriented.
7.1 Example Feature Definitions: Example Feature Definitions:
Call Waiting - Alice is in a call, then receives another call. Call Waiting - Alice is in a call, then receives another call. Alice
Alice can place the first call on hold, and talk with the other can place the first call on hold, and talk with the other caller.
caller. She can typically switch back and forth between the She can typically switch back and forth between the callers.
callers.
Blind Transfer - Alice is in a conversation with Bob. Alice asks Blind Transfer - Alice is in a conversation with Bob. Alice asks Bob
Bob to contact Carol, but makes no attempt to contact Carol to contact Carol, but makes no attempt to contact Carol
independently. In many implementations, Alice does not verify Bob's independently. In many implementations, Alice does not verify Bob's
success or failure in contacting Carol. success or failure in contacting Carol.
Attended Transfer - The transferring party establishes a session Attended Transfer - The transferring party establishes a session with
with the transfer target before completing the transfer. the transfer target before completing the transfer.
Consultative transfer - the transferring party establishes a session Consultative transfer - the transferring party establishes a session
with the target and mixes both sessions together so that all three with the target and mixes both sessions together so that all three
parties can participate, then disconnects leaving the transferee and parties can participate, then disconnects leaving the transferee and
transfer target with an active session. transfer target with an active session.
Conference Call - Three or more active, visible participants in the Conference Call - Three or more active, visible participants in the
same conversation space. same conversation space.
Call Park - A call participant parks a call (essentially puts the Call Park - A call participant parks a call (essentially puts the
skipping to change at page 29, line 58 skipping to change at page 31, line 17
Hotline - Alice picks up a phone and is immediately connected to the Hotline - Alice picks up a phone and is immediately connected to the
technical support hotline, for example. technical support hotline, for example.
Autoanswer - Calls to a certain address or location answer Autoanswer - Calls to a certain address or location answer
immediately via a speakerphone. immediately via a speakerphone.
Intercom - Alice typically presses a button on a phone which Intercom - Alice typically presses a button on a phone which
immediately connects to another user or phone and causes that phone immediately connects to another user or phone and causes that phone
to play her voice over its speaker. Some variations immediately to play her voice over its speaker. Some variations immediately
set up two-way communications, other variations require another set up two-way communications, other variations require another button
button to be pressed to enable a two-way conversation. to be pressed to enable a two-way conversation.
Speakerphone paging - Alice calls the paging address and speaks. Speakerphone paging - Alice calls the paging address and speaks. Her
Her voice is played on the speaker of every idle phone in a voice is played on the speaker of every idle phone in a preconfigured
preconfigured group of phones. group of phones.
Speed dial - Alice dials an abbreviated number, or enters an alias, Speed dial - Alice dials an abbreviated number, or enters an alias,
or presses a special speed dial button representing Bob. Her action or presses a special speed dial button representing Bob. Her action
is interpreted as if she specified the full address of Bob. is interpreted as if she specified the full address of Bob.
Call Return - Alice calls Bob. Bob misses the call or is Call Return - Alice calls Bob. Bob misses the call or is
disconnected before he is finished talking to Alice. Bob invokes disconnected before he is finished talking to Alice. Bob invokes
Call return which calls Alice, even if Alice did not provide her Call return which calls Alice, even if Alice did not provide her real
real identity or location to Bob. identity or location to Bob.
Inbound Call Screening - Alice doesn't want to receive calls from Inbound Call Screening - Alice doesn't want to receive calls from
Matt. Inbound Screening prevents Matt from disturbing Alice. In Matt. Inbound Screening prevents Matt from disturbing Alice. In
some variations this works even if Matt hides his identity. some variations this works even if Matt hides his identity.
Outbound Call Screening - Alice is paged and unknowingly calls a Outbound Call Screening - Alice is paged and unknowingly calls a PSTN
PSTN pay-service telephone number in the Caribbean, but local policy pay-service telephone number in the Caribbean, but local policy
blocks her call, and possibly informs her why. blocks her call, and possibly informs her why.
Call Forwarding - Before a call-leg is accepted it is redirected to Call Forwarding - Before a call-leg is accepted it is redirected to
another location, for example, because the originally intended another location, for example, because the originally intended
recipient is busy, does not answer, is disconnected from the recipient is busy, does not answer, is disconnected from the network,
network, or has configured all requests to go somewhere else. or has configured all requests to go somewhere else.
Message Waiting - Bob calls Alice when she steps away from her Message Waiting - Bob calls Alice when she steps away from her phone,
phone, when she returns a visible or audible indicator conveys that when she returns a visible or audible indicator conveys that someone
someone has left her a voicemail message. The message waiting has left her a voicemail message. The message waiting indication may
indication may also convey how many messages are waiting, from whom, also convey how many messages are waiting, from whom, what time, and
what time, and other useful pieces of information. other useful pieces of information.
Do Not Disturb - Alice selects the Do Not Disturb option. Calls to Do Not Disturb - Alice selects the Do Not Disturb option. Calls to
her either ring briefly or not at all and are forwarded elsewhere. her either ring briefly or not at all and are forwarded elsewhere.
Some variations allow specially authorized callers to override this Some variations allow specially authorized callers to override this
feature and ring Alice anyway. feature and ring Alice anyway.
Distinctive ring - Incoming calls have different ring cadences or Distinctive ring - Incoming calls have different ring cadences or
sample sounds depending on the From party, the To party, or other sample sounds depending on the From party, the To party, or other
factors. factors.
Automatic Callback: Alice calls Bob, but Bob is busy. Alice would Automatic Callback: Alice calls Bob, but Bob is busy. Alice would
like Bob to call her automatically when he is available. When Bob like Bob to call her automatically when he is available. When Bob
hangs up, Alice's phone rings. When Alice answers, Bob's phone hangs up, Alice's phone rings. When Alice answers, Bob's phone rings.
rings. Bob answers and they talk. Bob answers and they talk.
Find-Me - Alice sets up complicated rules for how she can be reached Find-Me - Alice sets up complicated rules for how she can be reached
(possibly using [CPL], [presence] or other factors). When Bob calls (possibly using [CPL], [presence] or other factors). When Bob calls
Alice, his call is eventually routed to a temporary Contact where Alice, his call is eventually routed to a temporary Contact where
Alice happens to be available. Alice happens to be available.
Whispered call waiting - Alice is in a conversation with Bob. Carol Whispered call waiting - Alice is in a conversation with Bob. Carol
calls Alice. Either Carol can "whisper" to Alice directly ("Can you calls Alice. Either Carol can "whisper" to Alice directly ("Can you
get lunch in 15 minutes?"), or an automaton whispers to Alice get lunch in 15 minutes?"), or an automaton whispers to Alice
informing her that Carol is trying to reach her. informing her that Carol is trying to reach her.
Voice message screening - Bob calls Alice. Alice is screening her Voice message screening - Bob calls Alice. Alice is screening her
calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob
leave his message. If she decides to talk to Bob, she can take the leave his message. If she decides to talk to Bob, she can take the
call back from the voicemail system, otherwise she can let Bob leave call back from the voicemail system, otherwise she can let Bob leave
a message. This emulates the behavior of a home telephone answering a message. This emulates the behavior of a home telephone answering
machine machine
skipping to change at page 31, line 50 skipping to change at page 33, line 21
call her call is disconnected or redirected to a service where she call her call is disconnected or redirected to a service where she
can purchase more calling value. can purchase more calling value.
Voice Portal - A service that allows users to access a portal site Voice Portal - A service that allows users to access a portal site
using spoken dialog interaction. For example, Alice needs to using spoken dialog interaction. For example, Alice needs to
schedule a working dinner with her co-worker Carol. Alice uses a schedule a working dinner with her co-worker Carol. Alice uses a
voice portal to check Carol's flight schedule, find a restaurant voice portal to check Carol's flight schedule, find a restaurant
near her hotel, make a reservation, get directions there, and page near her hotel, make a reservation, get directions there, and page
Carol with this information. Carol with this information.
7.2 Implementation of these features 6.1 Implementation of these features
Example Features: Example Features:
Call Hold [Offer/Answer] for SIP Call Hold [Offer/Answer] for SIP
Call Waiting Local Implementation Call Waiting Local Implementation
Blind Transfer [cc-transfer] Blind Transfer [cc-transfer]
Attended Transfer [cc-transfer] Attended Transfer [cc-transfer]
Consultative transfer [cc-transfer] Consultative transfer [cc-transfer]
Conference Call [conf-models] Conference Call [conf-models]
Call Park *[examples] Call Park *[examples]
Call Pickup *[examples] Call Pickup *[examples]
Music on Hold *[examples] Music on Hold *[examples]
Call Monitoring *Insert Call Monitoring *Insert
Barge-in *Insert or Far-Fork Barge-in *Insert or Far-Fork
Hotline Local Implementation Hotline Local Implementation
Autoanswer Local URI convention Autoanswer Local URI convention
Speed dial Local Implementation Speed dial Local Implementation
Intercom *Speed dial + autoanswer Intercom *Speed dial + autoanswer
Speakerphone paging *Speed dial + autoanswer Speakerphone paging *Speed dial + autoanswer
skipping to change at page 32, line 34 skipping to change at page 34, line 39
Find-Me Proxy service based on presence Find-Me Proxy service based on presence
Whispered call waiting Local implementation Whispered call waiting Local implementation
Voice message screening * Voice message screening *
Presence-based Conferencing*call when presence = available Presence-based Conferencing*call when presence = available
IM Conference Alerts subscribe to conference status IM Conference Alerts subscribe to conference status
Single Line Extension * Single Line Extension *
Click-to-dial * Click-to-dial *
Pre-paid calling * Pre-paid calling *
Voice Portal * Voice Portal *
7.2.1 Call Park 6.1.1 Call Park
Call park requires the ability to: put a dialog some place, Call park requires the ability to: put a dialog some place, advertise
advertise it to users in a pickup group and to uniquely identify it it to users in a pickup group and to uniquely identify it in a means
in a means that can be communicated (including human voice). The that can be communicated (including human voice). The dialog can be
dialog can be held locally on the UA parking the dialog or held locally on the UA parking the dialog or alternatively
alternatively transferred to the park service for the pickup group. transferred to the park service for the pickup group. The parked
The parked dialog then needs to be labeled (e.g. orbit 12) in a way dialog then needs to be labeled (e.g. orbit 12) in a way that can be
that can be communicated to the party that is to pick up the call. communicated to the party that is to pick up the call. The UAs in
The UAs in the pick up group discover the parked dialog(s) via the pick up group discover the parked dialog(s) via the dialog
[call-leg] from the park service. If the dialog is parked locally package from the park service. If the dialog is parked locally the
the park service merely aggregates the parked call states from the park service merely aggregates the parked call states from the set of
set of UAs in the pickup group. UAs in the pickup group.
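
As a rough illustration of the labeling and aggregation role described above, the following Python sketch models a park service that assigns orbit numbers to parked dialogs and lists them for a pickup group. The names ParkedDialog and ParkService, their fields, and the lowest-free-orbit policy are all hypothetical; none of them is defined by this framework or by the dialog package.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ParkedDialog:
    # Hypothetical record of a parked dialog, as it might be learned
    # from dialog state published by a UA.
    call_id: str
    local_tag: str
    remote_tag: str
    parked_by: str  # URI of the UA that parked the dialog


@dataclass
class ParkService:
    """Hypothetical in-memory park service for a single pickup group."""
    orbits: Dict[int, ParkedDialog] = field(default_factory=dict)

    def park(self, dialog: ParkedDialog) -> int:
        # Assumption: label each dialog with the lowest free orbit
        # number, so the label is short enough to be spoken aloud.
        orbit = 1
        while orbit in self.orbits:
            orbit += 1
        self.orbits[orbit] = dialog
        return orbit

    def list_parked(self) -> List[int]:
        # What a UA in the pickup group would see when it asks
        # which orbits are currently occupied.
        return sorted(self.orbits)

    def pick_up(self, orbit: int) -> Optional[ParkedDialog]:
        # Remove and return the dialog in this orbit, or None if empty.
        return self.orbits.pop(orbit, None)
```

A UA that parks dialogs locally could feed the same structure by reporting its parked dialogs to the service, which then only aggregates them.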
7.2.2 Call Pickup 6.1.2 Call Pickup
There are two different features which are called call pickup. The There are two different features which are called call pickup. The
first is the pickup of a parked dialog. The UA from which the first is the pickup of a parked dialog. The UA from which the dialog
dialog is to be picked up subscribes to the call state [call-leg] of is to be picked up subscribes to the session dialog state of the park
the park service or the UA which has locally parked the dialog. service or the UA which has locally parked the dialog. Dialogs which
Dialogs which are parked should be labeled with an identifier. The are parked should be labeled with an identifier. The labels are used
labels are used by the UA to allow the user to indicate which dialog by the UA to allow the user to indicate which dialog is to be picked
is to be picked up. The UA picking up the call invokes the URL in up. The UA picking up the call invokes the URL in the call state
the call state which is labeled as replace-remote. which is labeled as replace-remote.
The other call pickup feature involves picking up an early dialog The other call pickup feature involves picking up an early dialog
(typically ringing). This feature uses some of the same primitives (typically ringing). This feature uses some of the same primitives
as the pick up of a parked call. The call state of the UA ringing as the pick up of a parked call. The call state of the UA ringing
phone is advertised using [call-leg]. The UA which is to pickup the phone is advertised using the dialog package. The UA which is to
early dialog subscribes either directly to the ringing UA or to a pickup the early dialog subscribes either directly to the ringing UA
service aggregating the states for UAs in the pickup group. The or to a service aggregating the states for UAs in the pickup group.
call state identifies early dialogs. The UA uses the call state(s) The call state identifies early dialogs. The UA uses the call
to help the user choose which early dialog that is to be picked up. state(s) to help the user choose which early dialog that is to be
The UA then invokes the URL in the call state labeled as replace- picked up. The UA then invokes the URL in the call state labeled as
remote. replace-remote.
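
One possible way to realize the "pick an early dialog, then replace it" step is sketched below in Python. It assumes the picking UA has already received a list of dialog states and that the pickup is carried out with an INVITE carrying a Replaces header, as the voice message screening example later in this section suggests. The dictionary keys (state, call_id, to_tag, from_tag) and both function names are hypothetical, and the mapping of tags onto to-tag and from-tag is an assumption that a real implementation would need to check against the Replaces specification.

```python
from typing import Dict, List, Optional


def find_early_dialog(states: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first dialog reported in the 'early' (ringing) state."""
    for dialog in states:
        if dialog.get("state") == "early":
            return dialog
    return None


def replaces_header(dialog: Dict[str, str]) -> str:
    """Format a Replaces header line identifying the dialog to take over."""
    return "Replaces: {};to-tag={};from-tag={}".format(
        dialog["call_id"], dialog["to_tag"], dialog["from_tag"]
    )


# Example dialog states, as an aggregating service might report them.
states = [
    {"state": "confirmed", "call_id": "111@ua1.example.com",
     "to_tag": "aa", "from_tag": "bb"},
    {"state": "early", "call_id": "425928@ua2.example.com",
     "to_tag": "7743", "from_tag": "6472"},
]

ringing = find_early_dialog(states)
header = replaces_header(ringing) if ringing else None
```

When several early dialogs are present, a UA would present them to the user instead of taking the first one automatically.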
7.2.3 Music on Hold 6.1.3 Music on Hold
Music on hold can be implemented a number of ways. One way is to Music on hold can be implemented a number of ways. One way is to
transfer the held call to a holding service. When the UA wishes to transfer the held call to a holding service. When the UA wishes to
take the call off hold it basically performs a take on the call from take the call off hold it basically performs a take on the call from
the holding service. This involves subscribing to call state on the the holding service. This involves subscribing to call state on the
holding service and then invoking the URL in the call state labeled holding service and then invoking the URL in the call state labeled
as replace-remote. as replace-remote.
Alternatively music on hold can be performed as a local mixing Alternatively music on hold can be performed as a local mixing
operation. The UA holding the call can mix in the music from the operation. The UA holding the call can mix in the music from the
music service via RTP (i.e. an additional dialog) or RTSP or other music service via RTP (i.e. an additional dialog) or RTSP or other
streaming media source. This approach is simpler (i.e. the held streaming media source. This approach is simpler (i.e. the held
dialog does not move so there is less chance of losing them) from a dialog does not move so there is less chance of losing them) from a
protocol perspective, however it does use more LAN bandwidth and protocol perspective, however it does use more LAN bandwidth and
resources on the UA. resources on the UA.
7.2.4 Call Monitoring 6.1.4 Call Monitoring
Call monitoring is a [join] operation. The monitoring UA sends a Call monitoring is a Join operation. The monitoring UA sends a Join
Join to the dialog it wants to listen to. It is able to discover to the dialog it wants to listen to. It is able to discover the
the dialog via the call state [call-leg] on the monitored UA. The dialog via the dialog state on the monitored UA. The monitoring UA
monitoring UA sends SDP in the INVITE which indicates receive only sends SDP in the INVITE which indicates receive only media. As the
media [offer/answer]. In addition the monitoring UA should indicate UA is monitoring only it does not matter whether the UA indicates it
that it wants to receive a mix (see Error! Reference source not wishes the send stream be mix or point to point.
found.). As the UA is monitoring only it does not matter whether
the UA indicates it wishes the send stream be mix or point to point.
7.2.5 Barge-in 6.1.5 Barge-in
Barge-in works the same as call monitoring except that it must Barge-in works the same as call monitoring except that it must
indicate that the send media stream to be mixed so that all of the indicate that the send media stream to be mixed so that all of the
other parties can hear the stream from UA barging in. other parties can hear the stream from UA barging in.
7.2.6 Intercom 6.1.6 Intercom
The UA initiates a dialog using INVITE in the ordinary way [bis].
The calling UA then signals the paged UA to answer the call. The
calling UA may discover the URL to answer the call via the call
state [call-leg] of the called UA. The called UA accepts the INVITE
with a 200 Ok and automatically enables the speakerphone.
The UA initiates a dialog using INVITE in the ordinary way. The
calling UA then signals the paged UA to answer the call. The calling
UA may discover the URL to answer the call via the session dialog
package of the called UA. The called UA accepts the INVITE with a 200
Ok and automatically enables the speakerphone.
Alternatively this can be a local decision for the UA to answer Alternatively this can be a local decision for the UA to answer based
based upon called party identification. upon called party identification.
7.2.7 Speakerphone paging 6.1.7 Speakerphone paging
Speakerphone paging can be implemented using either multicast or Speakerphone paging can be implemented using either multicast or
through a simple multipoint mixer. In the multicast solution the through a simple multipoint mixer. In the multicast solution the
paging UA sends a multicast INVITE [bis] with send only media in the paging UA sends a multicast INVITE with send only media in the SDP
[SDP] (see also [offer/answer]). The automatic answer and enabling (see also RFC3264). The automatic answer and enabling of the
of the speakerphone is a locally configured decision on the paged speakerphone is a locally configured decision on the paged UAs. The
UAs. The paging UA sends RTP via the multicast address indicated in paging UA sends RTP via the multicast address indicated in the SDP.
the SDP.
The multipoint solution is accomplished by sending an INVITE to the The multipoint solution is accomplished by sending an INVITE to the
multipoint mixer. The mixer is configured to automatically answer multipoint mixer. The mixer is configured to automatically answer
the dialog. The paging UA then sends [REFER] requests for each of the dialog. The paging UA then sends REFER requests for each of the
the UAs that are to become paging speakers (The UA is likely to send UAs that are to become paging speakers (The UA is likely to send out
out a single REFER which is parallel forked by the proxy server). a single REFER which is parallel forked by the proxy server). The
The UAs performing as paging speakers are configured to UAs performing as paging speakers are configured to automatically
automatically answer based upon caller identification (e.g. To answer based upon caller identification (e.g. To field, URI or
field, URI or Referred-To headers). Referred-To headers).
7.2.8 Distinctive ring Finally as a third option, the user agent can send a mass-invitation
request to a conference server, which would create a conference and
send invitations to the conference to all user agents in the paging
group.
6.1.8 Distinctive ring
The target UA either makes a local decision based on information in The target UA either makes a local decision based on information in
an incoming INVITE (To, From, Contact, Request-URI) or trusts an an incoming INVITE (To, From, Contact, Request-URI) or trusts an
Alert-Info header provided by the caller or inserted by a trusted Alert-Info header provided by the caller or inserted by a trusted
proxy. In the latter case, the UA fetches the content described in proxy. In the latter case, the UA fetches the content described in
the URI (typically via http) and renders it to the user. the URI (typically via http) and renders it to the user.
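
The two decision paths above can be sketched in Python as follows. This is a minimal illustration and not a prescribed algorithm: the function names, the alert_info_trusted flag, the substring-based local_rules table, and the "default-ring" label are all hypothetical. Fetching and rendering the content behind the URI is left out.

```python
from typing import Dict, Optional


def alert_info_uri(value: str) -> Optional[str]:
    """Extract the URI between '<' and '>' in an Alert-Info header value."""
    start = value.find("<")
    end = value.find(">", start + 1)
    if start == -1 or end == -1:
        return None
    return value[start + 1:end]


def choose_ring(headers: Dict[str, str],
                alert_info_trusted: bool,
                local_rules: Dict[str, str],
                default: str = "default-ring") -> str:
    """Pick a ring: a trusted Alert-Info URI first, then local rules."""
    alert = headers.get("Alert-Info")
    if alert and alert_info_trusted:
        uri = alert_info_uri(alert)
        if uri:
            return uri  # the UA would fetch this and render it
    # Local decision: hypothetical rules keyed on a substring of From.
    caller = headers.get("From", "")
    for pattern, ring in local_rules.items():
        if pattern in caller:
            return ring
    return default
```

Whether Alert-Info is trusted is a policy decision for the UA, for example based on whether the request arrived through a known proxy.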
7.2.9 Voice message screening 6.1.9 Voice message screening
At first, this is the same as call monitoring. In this case the At first, this is the same as call monitoring. In this case the
voicemail service is one of the UAs. The UA screening the message voicemail service is one of the UAs. The UA screening the message
monitors the call on the voicemail service, and also subscribes to monitors the call on the voicemail service, and also subscribes to
call-leg information. If the user screening their messages decides call-leg information. If the user screening their messages decides
to answer, they perform a Take from the voicemail system (for to answer, they perform a Take from the voicemail system (for
example, send an INVITE with Replaces to the UA leaving the message). example, send an INVITE with Replaces to the UA leaving the message).
7.2.10 Single Line Extension 6.1.10 Single Line Extension
Incoming calls ring all the extensions through basic parallel Incoming calls ring all the extensions through basic parallel forking
forking [bis]. Each extension subscribes to call-leg events from [bis]. Each extension subscribes to call-leg events from each other
each other extension. While one user has an active call, any other extension. While one user has an active call, any other UA extension
UA extension can insert itself into that conversation (it already can insert itself into that conversation (it already knows the
knows the call-leg information) in the same way as barge-in. call-leg information) in the same way as barge-in.
7.2.11 Click-to-dial 6.1.11 Click-to-dial
The application or server which hosts the click-to-dial application The application or server which hosts the click-to-dial application
captures the URL to be dialed and can setup the call using 3pcc or captures the URL to be dialed and can setup the call using 3pcc or
can send a [REFER] request to the UA which is to dial the address. can send a REFER request to the UA which is to dial the address. As
As users sometimes change their mind or wish to give up listening to a users sometimes change their mind or wish to give up listening to a
ringing or voicemail answered phone, this application illustrates the
need to also have the ability to remotely hangup a call.
ringing or voicemail answered phone, this application illustrates
the need to also have the ability to remotely hangup a call.
7.2.12 Pre-paid calling 6.1.12 Pre-paid calling
For prepaid calling, the user's media always passes through a device For prepaid calling, the user's media always passes through a device
which is trusted by the pre-paid provider. This may be the other which is trusted by the pre-paid provider. This may be the other
endpoint (for example a PSTN gateway). In either case, an endpoint (for example a PSTN gateway). In either case, an
intermediary proxy or B2BUA can periodically verify the amount of intermediary proxy or B2BUA can periodically verify the amount of
time available on the pre-paid account, and use the session-timer time available on the pre-paid account, and use the session-timer
extension to cause the trusted endpoint (gateway) or intermediary extension to cause the trusted endpoint (gateway) or intermediary
(media relay) to send a reINVITE before that time runs out. During (media relay) to send a reINVITE before that time runs out. During
the reINVITE, the SIP intermediary can reverify the account and the reINVITE, the SIP intermediary can reverify the account and
insert another session-timer header. insert another session-timer header.
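
The periodic re-verification described above might look like the following Python sketch, in which an intermediary derives the next session refresh interval from the remaining credit. The function names and the 1800-second cap are assumptions made for illustration only. The sketch also ignores any minimum interval that the session-timer extension may impose, which a real deployment would have to respect.

```python
from typing import Optional


def next_refresh_interval(remaining_credit_s: int,
                          max_interval_s: int = 1800) -> Optional[int]:
    """Seconds until the next reINVITE, or None if no credit remains.

    Assumption: the intermediary asks for a refresh no later than the
    moment the credit runs out, capped at an arbitrary maximum.
    """
    if remaining_credit_s <= 0:
        return None  # caller should end or redirect the call
    return min(remaining_credit_s, max_interval_s)


def session_expires_header(interval_s: int) -> str:
    """Format the header line the intermediary would insert."""
    return "Session-Expires: {}".format(interval_s)
```

Each time the reINVITE arrives, the intermediary would look up the balance again and call next_refresh_interval with the updated value.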
Note that while most pre-paid systems on the PSTN use an IVR to Note that while most pre-paid systems on the PSTN use an IVR to
collect the account number and destination, this isn't strictly collect the account number and destination, this isn't strictly
necessary for a SIP-originated prepaid call. SIP requests and SIP necessary for a SIP-originated prepaid call. SIP requests and SIP
URIs are sufficiently expressive to convey the final destination, URIs are sufficiently expressive to convey the final destination, the
the provider of the prepaid service, the location from which the provider of the prepaid service, the location from which the user is
user is calling, and the prepaid account they want to use. If a calling, and the prepaid account they want to use. If a pre-paid IVR
pre-paid IVR is used, the mechanism described below (Voice Portals) is used, the mechanism described below (Voice Portals) can be
can be combined as well. combined as well.
7.2.13 Voice Portal 6.1.13 Voice Portal
A voice portal is essentially a complex collection of voice dialogs A voice portal is essentially a complex collection of voice dialogs
used to access interesting content. One of the most desirable call used to access interesting content. One of the most desirable call
control features of a Voice Portal is the ability to start a new control features of a Voice Portal is the ability to start a new
outgoing call from within the context of the Portal (to make a outgoing call from within the context of the Portal (to make a
restaurant reservation, or return a voicemail message for example). restaurant reservation, or return a voicemail message for example).
Once the new call is over, the user should be able to return to the Once the new call is over, the user should be able to return to the
Portal by pressing a special key, using some DTMF sequence (ex: a Portal by pressing a special key, using some DTMF sequence (ex: a
very long pound or hash tone), or by speaking a hotword (ex: "Main very long pound or hash tone), or by speaking a hotword (ex: "Main
Menu"). Menu").
skipping to change at page 36, line 5 skipping to change at page 38, line 33
the User to perform a Far-Fork. In other words the Voice Portal the User to perform a Far-Fork. In other words the Voice Portal
wants the following media relationship: wants the following media relationship:
{ Target , User } & { User , Voice Portal } { Target , User } & { User , Voice Portal }
The Voice Portal is now just listening for a hotword or the The Voice Portal is now just listening for a hotword or the
appropriate DTMF. As soon as the user indicates they are done, the appropriate DTMF. As soon as the user indicates they are done, the
Voice Portal Takes the call from the old Target, and we are back to Voice Portal Takes the call from the old Target, and we are back to
the original media relationship. the original media relationship.
This feature can also be used by the account number and phone number This feature can also be used by the account number and phone number
collection menu in a pre-paid calling service. A user can press a collection menu in a pre-paid calling service. A user can press a
DTMF sequence which presents them with the a DTMF sequence which presents them with the appropriate menu again.
8 References
[SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session
Initiation Protocol", RFC2543, Internet Engineering Task Force,
Nov 1998.
[RFC2119] S. Bradner, "Key words for use in RFCs to indicate
requirement levels," Request for Comments (Best Current
Practice) 2119, Internet Engineering Task Force, Mar. 1997.
[REFER] R. Sparks, "The Refer Method", Internet Draft <draft-ietf-
sip-refer-02>, IETF, October 30, 2001, Work in progress.
[3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G. Camarillo,
"Third Party Call Control in SIP", Internet Draft <draft-rosenberg-
sip-3pcc-02.txt>, IETF; March 2001. Work in progress
[transfer] R. Sparks, "SIP Call Control - Transfer", Internet Draft
<draft-ietf-sip-cc-transfer-04.txt>, IETF; Feb. 2001. Work in
progress.
[Replaces] B. Biggs, R. Dean, R. Mahy, "The SIP Replaces Header",
Internet Draft <draft-ietf-sip-replaces-00.txt>, IETF, Nov. 2001.
Work in progress.
[conf-models] J. Rosenberg, H. Schulzrinne, "Models for Multi Party
Conferencing in SIP", Internet Draft <draft-rosenberg-sip-
conferencing-models-00.txt>, IETF; Nov. 2000. Work in progress.
[service examples] A. Johnston, R. Sparks, C. Cunningham, S.
Donovan, K. Summers, "SIP Service Examples" Internet Draft <draft-
ietf-sip-service-examples-03.txt>, IETF, June 2002, Work in
progress.
[Join] R. Mahy, D. Petrie, "The SIP Join and Fork Headers", Internet
Draft <draft-mahy-sipping-join-and-fork-00.txt>, IETF, November
2001, Work in progress.
[RTP] H. Schulzrinne , S. Casner , R. Frederick , V. Jacobson ,
"RTP: A Transport Protocol for Real-Time Applications", Request for
Comments (Standards Track)1889, IETF, January 1996
[SDP] H. Schulzrinne M. Handley, V. Jacobson, "SDP: Session
Description Protocol", Request for Comments (Standards Track) 2327,
Internet Engineering Task Force, April 1998
[events] A. Roach, "SIP-Specific Event Notification",Internet Draft
<draft-ietf-sip-events-03.txt>, IETF, February 2002, Work in
progress.
[offer/answer] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model
with SDP", Internet Draft <draft-ietf-mmusic-sdp-offer-answer-
01.txt>, IETF, February 21, 2002, Work in progress.
[caller prefs] J. Rosenberg, "SIP Caller Preferences and Callee
Capabilities",Internet Draft <draft-ietf-sip-callerprefs-05.txt>,
IETF, November 21, 2001, Work in progress.
[msg waiting] R. Mahy, I. Slain, "Message Waiting in SIP",Internet
Draft <draft-mahy-sip-message-waiting-02.txt>, IETF, July 2001, Work
in progress.
[Presence] Rosenberg et al., "SIP Extensions for Presence", Internet
Draft <draft-ietf-simple-presence-04.txt>, IETF, November 21, 2001,
Work in progress.
[visited] D. Oran, H. Schulzrinne, "The Visited Header",Internet
Draft <>, IETF, date, Work in progress.
[app components] , "",Internet Draft <>, IETF, date, Work in
progress.
[ms-uri] J. Van Dyke, E. Burger, "SIP URI Conventions for Media
Servers",Internet Draft <draft-burger-sipping-msuri-01.txt>, IETF,
November 21, 2001, Work in progress.
[call-pkg] J. Rosenberg, H. Schulzrinne, "SIP Event Packages for
Call Leg and Conference State", Internet Draft <draft-rosenberg-sip-
call-package-00.txt>, IETF, July 13, 2001, Work in progress.
[enum] , "",Internet Draft <>, IETF, date, Work in progress.
[http] R. Fielding et al, "Hypertext Transfer Protocol --
HTTP/1.1", Request for Comments (Standards Track) 2616, Internet
Engineering Task Force, June 1999
[rtsp] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
Protocol (RTSP)", Request for Comments (Standards Track) 2326,
Internet Engineering Task Force, April 1998
[mrcp] S. Shanmugham, P. Monaco, B. Eberman, "MRCP: Media Resource
Control Protocol", Internet Draft <draft-shanmugham-mrcp-01.txt>,
IETF, November 20, 2001, Work in progress.
[VoiceXML] S. McGlashan et al, "Voice Extensible Markup Language
(VoiceXML) Version 2.0", W3C Working Draft, 23 October 2001, Work in
progress.
[H.323]
[tel URL]
[caller-prefs]
[session timer]
[service context]
[avt tones]
[GSM] Normative References
[MPEG2] [1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002.
[G.711] [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[H.261] [3] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
Session Description Protocol (SDP)", RFC 3264, June 2002.
[H.450] [4] Roach, A., "Session Initiation Protocol (SIP)-Specific Event
Notification", RFC 3265, June 2002.
[JTAPI] [5] Handley, M. and V. Jacobson, "SDP: Session Description
Protocol", RFC 2327, April 1998.
[CSTA] [6] Johnston, A. and S. Donovan, "Session Initiation Protocol
Service Examples", draft-ietf-sipping-service-examples-04 (work
in progress), March 2003.
[mrcp-sip] , "",Internet Draft <draft-robinson-mrcp-sip-00.txt>, [7] Rosenberg, J., Schulzrinne, H., Camarillo, G. and J. Peterson,
IETF, date, Work in progress. "Best Current Practices for Third Party Call Control in the
Session Initiation Protocol", draft-ietf-sipping-3pcc-03 (work
in progress), March 2003.
[distributed full mesh conf] [8] Sparks, R., "The SIP Refer Method", draft-ietf-sip-refer-07
(work in progress), December 2002.
[Media forking] M. Shankar, "SIP Forked Media", Internet Draft [9] Dean, R., Biggs, B. and R. Mahy, "The Session Initiation
<draft-shankar-sip-forked-media-00.txt>, IETF, Feb. 2001. Work in Protocol (SIP) 'Replaces' Header", draft-ietf-sip-replaces-03
progress. (work in progress), March 2003.
[PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for [10] Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
Remote Phone Control", Internet Draft <draft-dean-phonectl-03.txt>, 'Join' Header", draft-ietf-sip-join-01 (work in progress),
IETF, Jan. 2001. Work in progress. March 2003.
9 Changes since -00 [11] Rosenberg, J. and H. Schulzrinne, "An INVITE Initiated Dialog
Event Package for the Session Initiation Protocol (SIP)",
draft-ietf-sipping-dialog-package-01 (work in progress), March
2003.
- Removed many media-specific references. [12] Rosenberg, J. and H. Schulzrinne, "A Session Initiation
Protocol (SIP) Event Package for Conference State",
draft-ietf-sipping-conference-package-00 (work in progress),
June 2002.
- Condensed discussion on mixing models, and VoiceXML discussion. [13] Rosenberg, J., "A Session Initiation Protocol (SIP) Event
Package for Registrations", draft-ietf-sipping-reg-event-00
(work in progress), October 2002.
- Moved the sample feature discussion to an Appendix [14] Rosenberg, J., "A Presence Event Package for the Session
Initiation Protocol (SIP)", draft-ietf-simple-presence-10 (work
in progress), January 2003.
10 [15] Rosenberg, J., "A Framework for Conferencing with the Session
To Do Initiation Protocol",
draft-rosenberg-sipping-conferencing-framework-01 (work in
progress), February 2003.
- Add diagrams to section 4.3.2 and 4.3.3 [16] Rosenberg, J., "A Framework and Requirements for Application
Interaction in SIP",
draft-rosenberg-sipping-app-interaction-framework-00 (work in
progress), November 2002.
- Convert to XML [17] Mahy, R. and N. Ismail, "Media Policy Manipulation in the
Conference Policy Control Protocol",
draft-mahy-sipping-media-policy-control-00 (work in progress),
February 2003.
- Fix references [18] Camarillo, G., "Transcoding Services Invocation in the Session
Initiation Protocol", draft-camarillo-sip-deaf-02 (work in
progress), February 2003.
- Propose to move Appendix A (sample features to service flows) [19] Sparks, R. and A. Johnston, "Session Initiation Protocol Call
Control - Transfer", draft-ietf-sipping-cc-transfer-01 (work in
progress), February 2003.
- Align with terminology with conferencing drafts [20] Johnston, A. and O. Levin, "Session Initiation Protocol Call
Control - Conferencing for User Agents",
draft-johnston-sipping-cc-conferencing-01 (work in progress),
February 2003.
- Show roadmap for related drafts [21] Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller
Preferences and Callee Capabilities for the Session Initiation
Protocol (SIP)", draft-ietf-sip-callerprefs-08 (work in
progress), March 2003.
Other frameworks and requirements Informational References
Conferencing framework
Conferencing models
Framework for markup
Extensions Authors' Addresses
REFER
Replaces
Join
Caller prefs
Packages Rohan Mahy
conference-package Cisco Systems
dialog package
Usage Drafts EMail: rohan@cisco.com
3pcc
cc-transfer
Informational Drafts Ben Campbell
Service flows dynamicsoft
- Define some semantics for authorization rules. For example one EMail: bcampbell@dynamicsoft.com
could define a dictionary of primitives and/or perhaps define sets
or classes of these primitives, then configure who is allowed to use
them
11 Robert Sparks
Acknowledgments dynamicsoft
Thanks to all who attended the SIP interim meeting in February 2001 EMail: rsparks@dynamicsoft.com
for their support of the ideas behind this document. Jonathan Rosenberg
dynamicsoft
12 EMail: jdrosen@dynamicsoft.com
Author's Addresses
Rohan Mahy Dan Petrie
Cisco Systems Pingtel
170 West Tasman Dr, MS: SJC-21/3/3
Phone: +1 408 526 8570
Email: rohan@cisco.com
Ben Campbell EMail: dpetrie@pingtel.com
dynamicsoft
5100 Tennyson Parkway
Suite 1200
Plano, Texas 75024
Email: bcampbell@dynamicsoft.com
Alan Johnston Alan Johnston
WorldCom WorldCom
100 S. 4th Street
St. Louis, Missouri 63104
Email: alan.johnston@wcom.com
Daniel G. Petrie EMail: alan.johnston@wcom.com
Pingtel Corp.
400 W. Cummings Park
Suite 2200
Woburn, MA 01801
Phone: +1 781 938 5306
Email: dpetrie@pingtel.com
Jonathan Rosenberg Intellectual Property Statement
dynamicsoft
72 Eagle Rock Avenue
First Floor
East Hanover, NJ 07936
Email: jdrosen@dynamicsoft.com
Robert J. Sparks The IETF takes no position regarding the validity or scope of any
dynamicsoft intellectual property or other rights that might be claimed to
5100 Tennyson Parkway pertain to the implementation or use of the technology described in
Suite 1200 this document or the extent to which any license under such rights
Plano, TX 75024 might or might not be available; neither does it represent that it
Email: rsparks@dynamicsoft.com has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
Full Copyright Statement Full Copyright Statement
"Copyright (C) The Internet Society (date). All Rights Reserved. Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph kind, provided that the above copyright notice and this paragraph are
are included on all such copies and derivative works. However, this included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than followed, or as required to translate it into languages other than
English. English.
The limited permissions granted above are perpetual and will not be The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns. revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.