draft-ietf-sipping-app-interaction-framework-04.txt   draft-ietf-sipping-app-interaction-framework-05.txt 
SIPPING J. Rosenberg
SIP J. Rosenberg
Internet-Draft Cisco Systems Internet-Draft Cisco Systems
Expires: August 17, 2005 February 16, 2005 Expires: January 19, 2006 July 18, 2005
A Framework for Application Interaction in the Session Initiation A Framework for Application Interaction in the Session Initiation
Protocol (SIP) Protocol (SIP)
draft-ietf-sipping-app-interaction-framework-04 draft-ietf-sipping-app-interaction-framework-05
Status of this Memo Status of this Memo
[-04] This document is an Internet-Draft and is subject to all provisions
of section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of
which he or she become aware will be disclosed, in accordance with
RFC 3668.
[-05] By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as other groups may also distribute working documents as
Internet-Drafts. Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 17, 2005. This Internet-Draft will expire on January 19, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
[-04] This document describes a framework for the interaction between users
and Session Initiation Protocol (SIP) based applications, and defines
a new Refer-To header field parameter and option tag in support of
that framework. By interacting with applications, users can guide
the way in which they operate. The focus of this framework is
stimulus signaling, which allows a user agent to interact with an
application without knowledge of the semantics of that application.
Stimulus signaling can occur to a user interface running locally with
the client, or to a remote user interface, through media streams.
Stimulus signaling encompasses a wide range of mechanisms, ranging
from clicking on hyperlinks, to pressing buttons, to traditional Dual
Tone Multi Frequency (DTMF) input. In all cases, stimulus signaling
is supported through the use of markup languages, which play a key
role in this framework.
[-05] This document describes a framework for the interaction between users
and Session Initiation Protocol (SIP) based applications. By
interacting with applications, users can guide the way in which they
operate. The focus of this framework is stimulus signaling, which
allows a user agent to interact with an application without knowledge
of the semantics of that application. Stimulus signaling can occur
to a user interface running locally with the client, or to a remote
user interface, through media streams. Stimulus signaling
encompasses a wide range of mechanisms, ranging from clicking on
hyperlinks, to pressing buttons, to traditional Dual Tone Multi
Frequency (DTMF) input. In all cases, stimulus signaling is
supported through the use of markup languages, which play a key role
in this framework.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. A Model for Application Interaction . . . . . . . . . . . . . 7 3. A Model for Application Interaction . . . . . . . . . . . . . 7
3.1 Functional vs. Stimulus . . . . . . . . . . . . . . . . . 9 3.1 Functional vs. Stimulus . . . . . . . . . . . . . . . . . 9
3.2 Real-Time vs. Non-Real Time . . . . . . . . . . . . . . . 9 3.2 Real-Time vs. Non-Real Time . . . . . . . . . . . . . . . 9
3.3 Client-Local vs. Client-Remote . . . . . . . . . . . . . . 10 3.3 Client-Local vs. Client-Remote . . . . . . . . . . . . . . 10
3.4 Presentation Capable vs. Presentation Free . . . . . . . . 11 3.4 Presentation Capable vs. Presentation Free . . . . . . . . 11
4. Interaction Scenarios on Telephones . . . . . . . . . . . . . 11 4. Interaction Scenarios on Telephones . . . . . . . . . . . . . 11
4.1 Client Remote . . . . . . . . . . . . . . . . . . . . . . 12 4.1 Client Remote . . . . . . . . . . . . . . . . . . . . . . 12
4.2 Client Local . . . . . . . . . . . . . . . . . . . . . . . 12 4.2 Client Local . . . . . . . . . . . . . . . . . . . . . . . 12
4.3 Flip-Flop . . . . . . . . . . . . . . . . . . . . . . . . 12 4.3 Flip-Flop . . . . . . . . . . . . . . . . . . . . . . . . 12
5. Framework Overview . . . . . . . . . . . . . . . . . . . . . . 13 5. Framework Overview . . . . . . . . . . . . . . . . . . . . . . 13
6. Deployment Topologies . . . . . . . . . . . . . . . . . . . . 15 6. Deployment Topologies . . . . . . . . . . . . . . . . . . . . 16
6.1 Third Party Application . . . . . . . . . . . . . . . . . 16 6.1 Third Party Application . . . . . . . . . . . . . . . . . 16
6.2 Co-Resident Application . . . . . . . . . . . . . . . . . 16 6.2 Co-Resident Application . . . . . . . . . . . . . . . . . 17
6.3 Third Party Application and User Device Proxy . . . . . . 17 6.3 Third Party Application and User Device Proxy . . . . . . 18
6.4 Proxy Application . . . . . . . . . . . . . . . . . . . . 18 6.4 Proxy Application . . . . . . . . . . . . . . . . . . . . 19
7. Application Behavior . . . . . . . . . . . . . . . . . . . . . 19 7. Application Behavior . . . . . . . . . . . . . . . . . . . . . 19
7.1 Client Local Interfaces . . . . . . . . . . . . . . . . . 19 7.1 Client Local Interfaces . . . . . . . . . . . . . . . . . 20
7.1.1 Discovering Capabilities . . . . . . . . . . . . . . . 19 7.1.1 Discovering Capabilities . . . . . . . . . . . . . . . 20
7.1.2 Pushing an Initial Interface Component . . . . . . . . 20 7.1.2 Pushing an Initial Interface Component . . . . . . . . 20
7.1.3 Updating an Interface Component . . . . . . . . . . . 22 7.1.3 Updating an Interface Component . . . . . . . . . . . 22
7.1.4 Terminating an Interface Component . . . . . . . . . . 22 7.1.4 Terminating an Interface Component . . . . . . . . . . 22
7.2 Client Remote Interfaces . . . . . . . . . . . . . . . . . 23 7.2 Client Remote Interfaces . . . . . . . . . . . . . . . . . 23
7.2.1 Originating and Terminating Applications . . . . . . . 23 7.2.1 Originating and Terminating Applications . . . . . . . 23
7.2.2 Intermediary Applications . . . . . . . . . . . . . . 24 7.2.2 Intermediary Applications . . . . . . . . . . . . . . 23
8. User Agent Behavior . . . . . . . . . . . . . . . . . . . . . 24 8. User Agent Behavior . . . . . . . . . . . . . . . . . . . . . 24
8.1 Advertising Capabilities . . . . . . . . . . . . . . . . . 24 8.1 Advertising Capabilities . . . . . . . . . . . . . . . . . 24
8.2 Receiving User Interface Components . . . . . . . . . . . 25 8.2 Receiving User Interface Components . . . . . . . . . . . 25
8.3 Mapping User Input to User Interface Components . . . . . 26 8.3 Mapping User Input to User Interface Components . . . . . 26
8.4 Receiving Updates to User Interface Components . . . . . . 27 8.4 Receiving Updates to User Interface Components . . . . . . 27
8.5 Terminating a User Interface Component . . . . . . . . . . 27 8.5 Terminating a User Interface Component . . . . . . . . . . 27
9. Inter-Application Feature Interaction . . . . . . . . . . . . 28 9. Inter-Application Feature Interaction . . . . . . . . . . . . 27
9.1 Client Local UI . . . . . . . . . . . . . . . . . . . . . 28 9.1 Client Local UI . . . . . . . . . . . . . . . . . . . . . 28
9.2 Client-Remote UI . . . . . . . . . . . . . . . . . . . . . 29 9.2 Client-Remote UI . . . . . . . . . . . . . . . . . . . . . 29
10. Intra Application Feature Interaction . . . . . . . . . . . 30 10. Intra Application Feature Interaction . . . . . . . . . . . 29
11. Example Call Flow . . . . . . . . . . . . . . . . . . . . . 30 11. Example Call Flow . . . . . . . . . . . . . . . . . . . . . 30
12. Security Considerations . . . . . . . . . . . . . . . . . . 35 12. Security Considerations . . . . . . . . . . . . . . . . . . 35
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . 36 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . 36
13.1 SIP Option Tag . . . . . . . . . . . . . . . . . . . . . . 36
13.2 Header Field Parameter . . . . . . . . . . . . . . . . . . 36
14. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 36 14. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 36
15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 36 15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 36
16. References . . . . . . . . . . . . . . . . . . . . . . . . . 37 16. References . . . . . . . . . . . . . . . . . . . . . . . . . 36
16.1 Normative References . . . . . . . . . . . . . . . . . . . . 37 16.1 Normative References . . . . . . . . . . . . . . . . . . . 36
16.2 Informative References . . . . . . . . . . . . . . . . . . . 37 16.2 Informative References . . . . . . . . . . . . . . . . . . 37
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 38 Author's Address . . . . . . . . . . . . . . . . . . . . . . . 38
Intellectual Property and Copyright Statements . . . . . . . . 39 Intellectual Property and Copyright Statements . . . . . . . . 39
1. Introduction 1. Introduction
The Session Initiation Protocol (SIP) [1] provides the ability for The Session Initiation Protocol (SIP) [1] provides the ability for
users to initiate, manage, and terminate communications sessions. users to initiate, manage, and terminate communications sessions.
Frequently, these sessions will involve a SIP application. A SIP Frequently, these sessions will involve a SIP application. A SIP
application is defined as a program running on a SIP-based element application is defined as a program running on a SIP-based element
(such as a proxy or user agent) that provides some value-added (such as a proxy or user agent) that provides some value-added
function to a user or system administrator. Examples of SIP function to a user or system administrator. Examples of SIP
applications include pre-paid calling card calls, conferencing, and applications include pre-paid calling card calls, conferencing, and
presence-based [11] call routing. presence-based [12] call routing.
In order for most applications to properly function, they need input In order for most applications to properly function, they need input
from the user to guide their operation. As an example, a pre-paid from the user to guide their operation. As an example, a pre-paid
calling card application requires the user to input their calling calling card application requires the user to input their calling
card number, their PIN code, and the destination number they wish to card number, their PIN code, and the destination number they wish to
reach. The process by which a user provides input to an application reach. The process by which a user provides input to an application
is called "application interaction". is called "application interaction".
Application interaction can be either functional or stimulus. Application interaction can be either functional or stimulus.
Functional interaction requires the user device to understand the Functional interaction requires the user device to understand the
semantics of the application, whereas stimulus interaction does not. semantics of the application, whereas stimulus interaction does not.
Stimulus signaling allows for applications to be built without Stimulus signaling allows for applications to be built without
requiring modifications to the user device. Stimulus interaction is requiring modifications to the user device. Stimulus interaction is
the subject of this framework. The framework provides a model for the subject of this framework. The framework provides a model for
how users interact with applications through user interfaces, and how how users interact with applications through user interfaces, and how
user interfaces and applications can be distributed throughout a user interfaces and applications can be distributed throughout a
network. This model is then used to describe how applications can network. This model is then used to describe how applications can
instantiate and manage user interfaces. instantiate and manage user interfaces.
This document also defines a new SIP Refer-To header field parameter
and a new SIP option tag indicating support for that parameter.
2. Definitions 2. Definitions
SIP Application: A SIP application is defined as a program running on SIP Application: A SIP application is defined as a program running on
a SIP-based element (such as a proxy or user agent) that provides a SIP-based element (such as a proxy or user agent) that provides
some value-added function to a user or system administrator. some value-added function to a user or system administrator.
Examples of SIP applications include pre-paid calling card calls, Examples of SIP applications include pre-paid calling card calls,
conferencing, and presence-based [11] call routing. conferencing, and presence-based [12] call routing.
Application Interaction: The process by which a user provides input Application Interaction: The process by which a user provides input
to an application. to an application.
Real-Time Application Interaction: Application interaction that takes Real-Time Application Interaction: Application interaction that takes
place while an application instance is executing. For example, place while an application instance is executing. For example,
when a user enters their PIN number into a pre-paid calling card when a user enters their PIN number into a pre-paid calling card
application, this is real-time application interaction. application, this is real-time application interaction.
Non-Real Time Application Interaction: Application interaction that Non-Real Time Application Interaction: Application interaction that
skipping to change at page 6, line 27 skipping to change at page 6, line 31
responses to those commands prompting for more information. responses to those commands prompting for more information.
Prompt-and-Collect: The basic primitive of an IVR user interface. Prompt-and-Collect: The basic primitive of an IVR user interface.
The user is presented with a voice option, and the user speaks The user is presented with a voice option, and the user speaks
their choice. their choice.
Barge-In: The act of entering information into an IVR user interface Barge-In: The act of entering information into an IVR user interface
prior to the completion of a prompt requesting that information. prior to the completion of a prompt requesting that information.
[-04] Focus: A user interface component has focus when user input is
provided fed to it, as opposed to any other user interface
components. This is not to be confused with the term focus within
the SIP conferencing framework, which refers to the center user
agent in a conference [13].
[-05] Focus: A user interface component has focus when user input is
provided to it, as opposed to any other user interface components.
This is not to be confused with the term focus within the SIP
conferencing framework, which refers to the center user agent in a
conference [14].
Focus Determination: The process by which the user device determines Focus Determination: The process by which the user device determines
which user interface component will receive the user input. which user interface component will receive the user input.
Focusless Device: A user device which has no ability to perform focus Focusless Device: A user device which has no ability to perform focus
determination. An example of a focusless device is a telephone determination. An example of a focusless device is a telephone
with a keypad. with a keypad.
Presentation Capable UI: A user interface which can prompt the user Presentation Capable UI: A user interface which can prompt the user
with input, collect results, and then prompt the user with new with input, collect results, and then prompt the user with new
skipping to change at page 8, line 15 skipping to change at page 8, line 19
user with context in order to make decisions about what they want. user with context in order to make decisions about what they want.
The user interacts with the device, causing information to be passed The user interacts with the device, causing information to be passed
from the device to the user interface. The user interface interprets from the device to the user interface. The user interface interprets
the information, and passes it as a user interface event to the the information, and passes it as a user interface event to the
application. The application may be able to modify the user application. The application may be able to modify the user
interface based on this event. Whether or not this is possible interface based on this event. Whether or not this is possible
depends on the type of user interface. depends on the type of user interface.
User interfaces are fundamentally about rendering and interpretation. User interfaces are fundamentally about rendering and interpretation.
Rendering refers to the way in which the user is provided context. Rendering refers to the way in which the user is provided context.
This can be through hyperlinks, images, sounds, videos, text, and so This can be through hyperlinks, images, sounds, videos, text, and so
on. Interpretation refers to the way in which the user interface on. Interpretation refers to the way in which the user interface
takes the "raw" data provided by the user, and returns the result to takes the "raw" data provided by the user, and returns the result to
the application as a meaningful event, abstracted from the the application as a meaningful event, abstracted from the
particulars of the user interface. As an example, consider a particulars of the user interface. As an example, consider a
pre-paid calling card application. The user interface worries about pre-paid calling card application. The user interface worries about
details such as what prompt the user is provided, whether the voice details such as what prompt the user is provided, whether the voice
is male or female, and so on. It is concerned with recognizing the is male or female, and so on. It is concerned with recognizing the
speech that the user provides, in order to obtain the desired speech that the user provides, in order to obtain the desired
information. In this case, the desired information is the calling information. In this case, the desired information is the calling
card number, the PIN code, and the destination number. The card number, the PIN code, and the destination number. The
application needs that data, and it doesn't matter to the application application needs that data, and it doesn't matter to the application
whether it was collected using a male prompt or a female one. whether it was collected using a male prompt or a female one.
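The following minimal Python sketch illustrates only the interpretation step. It is not part of either draft. The fixed field lengths, the '#' terminator, and the names `interpret` and `CallingCardEvent` are hypothetical. Whatever prompts or voice were used to collect the digits, the application sees only the resulting event.

```python
from dataclasses import dataclass

# Hypothetical field lengths, chosen only for this illustration.
CARD_DIGITS = 10
PIN_DIGITS = 4


@dataclass(frozen=True)
class CallingCardEvent:
    """What the application receives: data only, no rendering details."""
    card_number: str
    pin: str
    destination: str


def interpret(raw_digits: str) -> CallingCardEvent:
    """Interpret raw keypad input as a calling card event.

    Assumes the digits arrive back to back (card number, PIN,
    destination) and that '#' terminates the destination number.
    """
    if not raw_digits.endswith("#"):
        raise ValueError("destination must be terminated by '#'")
    body = raw_digits[:-1]
    if not body.isdigit() or len(body) <= CARD_DIGITS + PIN_DIGITS:
        raise ValueError("expected card number, PIN and destination digits")
    return CallingCardEvent(
        card_number=body[:CARD_DIGITS],
        pin=body[CARD_DIGITS:CARD_DIGITS + PIN_DIGITS],
        destination=body[CARD_DIGITS + PIN_DIGITS:],
    )
```

For example, interpret("1234567890" + "4321" + "18005551212" + "#") yields an event whose pin is "4321". Nothing in it records how the prompts sounded.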
User interfaces generally have real-time requirements towards the User interfaces generally have real-time requirements towards the
user. That is, when a user interacts with the user interface, the user. That is, when a user interacts with the user interface, the
skipping to change at page 9, line 40 skipping to change at page 9, line 43
information to the user, stimulus user interfaces are usually slower, information to the user, stimulus user interfaces are usually slower,
less user friendly, and less responsive than a functional less user friendly, and less responsive than a functional
counterpart. However, they allow for substantial innovation in counterpart. However, they allow for substantial innovation in
applications, since no standardization activity is needed to build a applications, since no standardization activity is needed to build a
new application, as long as it can interact with the user within the new application, as long as it can interact with the user within the
confines of the user interface mechanism. The web is an example of a confines of the user interface mechanism. The web is an example of a
stimulus user interface to applications. stimulus user interface to applications.
In SIP systems, functional interfaces are provided by extending the In SIP systems, functional interfaces are provided by extending the
SIP protocol to provide the needed functionality. For example, the SIP protocol to provide the needed functionality. For example, the
SIP caller preferences specification [14] provides a functional SIP caller preferences specification [15] provides a functional
interface that allows a user to request applications to route the interface that allows a user to request applications to route the
call to specific types of user agents. Functional interfaces are call to specific types of user agents. Functional interfaces are
important, but are not the subject of this framework. The primary important, but are not the subject of this framework. The primary
goal of this framework is to address the role of stimulus interfaces goal of this framework is to address the role of stimulus interfaces
to SIP applications. to SIP applications.
3.2 Real-Time vs. Non-Real Time 3.2 Real-Time vs. Non-Real Time
Application interaction systems can also be real-time or Application interaction systems can also be real-time or
non-real-time. Non-real-time interaction allows the user to enter non-real-time. Non-real-time interaction allows the user to enter
information about application operation asynchronously with its information about application operation asynchronously with its
invocation. Frequently, this is done through provisioning systems. invocation. Frequently, this is done through provisioning systems.
As an example, a user can set up the forwarding number for a As an example, a user can set up the forwarding number for a
call-forward on no-answer application using a web page. Real-time call-forward on no-answer application using a web page. Real-time
interaction requires the user to interact with the application at the interaction requires the user to interact with the application at the
time of its invocation. time of its invocation.
3.3 Client-Local vs. Client-Remote 3.3 Client-Local vs. Client-Remote
Another axis in the taxonomization is whether the user interface is Another axis in the taxonomization is whether the user interface is
co-resident with the user device (which we refer to as a client-local co-resident with the user device (which we refer to as a client-local
user interface), or the user interface runs in a host separated from user interface), or the user interface runs in a host separated from
the client (which we refer to as a client-remote user interface). In the client (which we refer to as a client-remote user interface). In
a client-remote user interface, there exists some kind of protocol a client-remote user interface, there exists some kind of protocol
between the client device and the UI that allows the client to between the client device and the UI that allows the client to
interact with the user interface over a network. interact with the user interface over a network.
skipping to change at page 11, line 26 skipping to change at page 11, line 28
the next input. As a result, presentation capable user interfaces the next input. As a result, presentation capable user interfaces
require an update to the information provided to the user after each require an update to the information provided to the user after each
input. The web is a classic example of this. After every input input. The web is a classic example of this. After every input
(i.e., a click), the browser provides the input to the application (i.e., a click), the browser provides the input to the application
and fetches the next page to render. In a presentation free user and fetches the next page to render. In a presentation free user
interface, this is not the case. Since the user is not provided with interface, this is not the case. Since the user is not provided with
feedback, these user interfaces tend to merely collect information as feedback, these user interfaces tend to merely collect information as
it is entered, and pass it to the application. it is entered, and pass it to the application.
[-04] Another difference is that a presentation-free user interface cannot
support the concept of a focus. As a result, if multiple
applications wish to gather input from the user, there is no way for
the user to select which application the input is destined for. The
input provided to applications through presentation-free user
interfaces is more of a broadcast or notification operation, as a
result.
[-05] Another difference is that a presentation-free user interface cannot
easily support the concept of a focus. Selection of a focus usually
requires a means for informing the user of the available
applications, allowing the user to choose, and then informing them
about which one they have chosen. Without the first and third steps
(which a presentation-free UI cannot provide), focus selection is
very difficult. Without a selected focus, the input provided to
applications through presentation-free user interfaces is more of a
broadcast or notification operation, as a result.
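The focus problem can be sketched as a small dispatcher. This is a hypothetical model, not defined by either draft. A device that can present choices routes input to one selected component. A presentation-free UI has no focus, so it delivers each input to every registered component. The class and method names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[str], None]


class InputDispatcher:
    """Hypothetical model of how a user device routes input to UI components."""

    def __init__(self, can_select_focus: bool) -> None:
        # True for presentation-capable devices that can list the
        # applications, let the user pick one, and confirm the choice.
        self.can_select_focus = can_select_focus
        self.components: Dict[str, Handler] = {}
        self.focus: Optional[str] = None

    def register(self, name: str, handler: Handler) -> None:
        self.components[name] = handler

    def select_focus(self, name: str) -> None:
        if not self.can_select_focus:
            raise RuntimeError("a presentation-free UI cannot select a focus")
        if name not in self.components:
            raise KeyError(name)
        self.focus = name

    def deliver(self, user_input: str) -> List[str]:
        """Deliver one input; return the names of the components that got it."""
        if self.focus is not None:
            targets = [self.focus]           # focused: one recipient
        else:
            targets = list(self.components)  # no focus: broadcast to all
        for name in targets:
            self.components[name](user_input)
        return targets
```

With two registered components and no focus, a single key press reaches both. This is the broadcast or notification behavior described above.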
4. Interaction Scenarios on Telephones 4. Interaction Scenarios on Telephones
In this section, we apply the model of Section 3 to telephones. In this section, we apply the model of Section 3 to telephones.
In a traditional telephone, the user interface consists of a 12-key In a traditional telephone, the user interface consists of a 12-key
keypad, a speaker, and a microphone. Indeed, from here forward, the keypad, a speaker, and a microphone. Indeed, from here forward, the
term "telephone" is used to represent any device that meets, at a term "telephone" is used to represent any device that meets, at a
minimum, the characteristics described in the previous sentence. minimum, the characteristics described in the previous sentence.
Circuit-switched telephony applications are almost universally Circuit-switched telephony applications are almost universally
skipping to change at page 12, line 15 skipping to change at page 12, line 21
interface on an IP network, input taken from a user on a circuit interface on an IP network, input taken from a user on a circuit
switched telephone. The gateway may be able to run a client-local switched telephone. The gateway may be able to run a client-local
user interface, just as an IP telephone might. user interface, just as an IP telephone might.
4.1 Client Remote 4.1 Client Remote
The most obvious instantiation is the "classic" circuit-switched The most obvious instantiation is the "classic" circuit-switched
telephony model. In that model, the user interface runs remotely telephony model. In that model, the user interface runs remotely
from the client. The interface between the user and the user from the client. The interface between the user and the user
interface is through media, set up by SIP and carried over the Real interface is through media, set up by SIP and carried over the Real
Time Transport Protocol (RTP) [16]. The microphone input can be Time Transport Protocol (RTP) [18]. The microphone input can be
carried using any suitable voice encoding algorithm. The keypad carried using any suitable voice encoding algorithm. The keypad
input can be conveyed in one of two ways. The first is to convert input can be conveyed in one of two ways. The first is to convert
the keypad input to DTMF, and then convey that DTMF using a suitable the keypad input to DTMF, and then convey that DTMF using a suitable
encoding algorithm for it (such as PCMU). An alternative, and encoding algorithm for it (such as PCMU). An alternative, and
generally the preferred approach, is to transmit the keypad input generally the preferred approach, is to transmit the keypad input
using RFC 2833 [17], which provides an encoding mechanism for using RFC 2833 [19], which provides an encoding mechanism for
carrying keypad input within RTP. carrying keypad input within RTP.
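The following is a sketch of the RFC 2833 telephone-event payload that carries one keypad event. The payload consists of:

- an 8-bit event code (0-9 for the digits, 10 for *, 11 for #, 12-15 for A-D)
- an end bit
- a reserved bit
- a 6-bit volume
- a 16-bit duration in RTP timestamp units

Only the 4-byte payload is built here. The RTP header and the dynamically negotiated payload type are omitted. The function names are hypothetical.

```python
import struct
from typing import Tuple

# RFC 2833 DTMF event codes are the index into this string:
# 0-9 for the digits, 10 for '*', 11 for '#', 12-15 for 'A'-'D'.
DTMF_DIGITS = "0123456789*#ABCD"


def encode_event(digit: str, end: bool, volume: int, duration: int) -> bytes:
    """Build the 4-byte telephone-event payload for one RTP packet.

    volume is the tone power level in dBm0 with the sign dropped (0..63);
    duration is measured in RTP timestamp units (0..65535).
    """
    if len(digit) != 1 or digit.upper() not in DTMF_DIGITS:
        raise ValueError(f"not a DTMF key: {digit!r}")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    event = DTMF_DIGITS.index(digit.upper())
    # E (end) bit is the top bit; the reserved R bit stays 0.
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags, duration)


def decode_event(payload: bytes) -> Tuple[str, bool, int, int]:
    """Inverse of encode_event, for DTMF events only."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_DIGITS):
        raise ValueError(f"event {event} is not a DTMF digit")
    return DTMF_DIGITS[event], bool(flags & 0x80), flags & 0x3F, duration
```

For instance, the final packet for a '#' press at volume 10 lasting 800 timestamp units encodes as the bytes 0B 8A 03 20.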
In this classic model, the user interface would run on a server in In this classic model, the user interface would run on a server in
the IP network. It would perform speech recognition and DTMF the IP network. It would perform speech recognition and DTMF
recognition to derive the user intent, feed them through the user recognition to derive the user intent, feed them through the user
interface, and provide the result to an application. interface, and provide the result to an application.
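On the network side, a user interface receiving RFC 2833 events sees several packets per key press. The sender repeats the payload with a growing duration and retransmits the final packet, all with the RTP timestamp of the event's start. A minimal sketch of DTMF recognition under that assumption collapses packets by timestamp. The `Packet` tuple layout is hypothetical. Loss, reordering, and timestamp wraparound are ignored.

```python
from typing import Iterable, List, Tuple

# Hypothetical, already-parsed view of each RTP packet:
# (RTP timestamp, DTMF digit, end bit).
Packet = Tuple[int, str, bool]


def collect_digits(packets: Iterable[Packet]) -> List[str]:
    """Collapse RFC 2833 event packets into one digit per key press.

    All packets for one key press, including retransmitted end packets,
    carry the same RTP timestamp, so a digit is reported the first time
    a new timestamp is seen.
    """
    digits: List[str] = []
    seen = set()
    for timestamp, digit, _end in packets:
        if timestamp not in seen:
            seen.add(timestamp)
            digits.append(digit)
    return digits
```

The resulting digit list is what the server-side user interface would then interpret on behalf of the application.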
4.2 Client Local 4.2 Client Local
An alternative model is for the entire user interface to reside on An alternative model is for the entire user interface to reside on
skipping to change at page 12, line 46 skipping to change at page 12, line 52
telephone had a textual display. telephone had a textual display.
For simpler phones without a display, the user interface can be For simpler phones without a display, the user interface can be
described by a Keypad Markup Language request document [7]. As the described by a Keypad Markup Language request document [7]. As the
user enters digits in the keypad, they are passed to the user user enters digits in the keypad, they are passed to the user
interface, which generates user interface events that can be interface, which generates user interface events that can be
transported to the application. transported to the application.
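The following rough sketch illustrates a client-local, presentation-free component of this kind. It watches key presses and reports a digit string to the application when the most recent keys match a pattern. It uses Python regular expressions, not KPML's own pattern syntax. The class name and the buffer limit are assumptions.

```python
import re
from typing import List, Optional


class LocalDigitCollector:
    """Client-local component that reports key sequences matching a pattern.

    Patterns are Python regular expressions, used here only for
    illustration; KPML defines its own pattern syntax.
    """

    def __init__(self, patterns: List[str], max_len: int = 16) -> None:
        self.patterns = [re.compile(p) for p in patterns]
        self.max_len = max_len  # hypothetical cap on remembered keys
        self.buffer = ""

    def on_digit(self, digit: str) -> Optional[str]:
        """Feed one key press; return the matched keys, or None."""
        self.buffer = (self.buffer + digit)[-self.max_len:]
        # Try the longest recent run of keys first, then shorter ones.
        for start in range(len(self.buffer)):
            candidate = self.buffer[start:]
            if any(p.fullmatch(candidate) for p in self.patterns):
                self.buffer = ""
                return candidate  # would be sent to the application as an event
        return None
```

Feeding the keys 1, 2, *, 9, 5 to a collector built with the pattern r"\*9\d" produces no event until the last key, which reports "*95".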
4.3 Flip-Flop 4.3 Flip-Flop
A middle-ground approach is to flip back and forth between a A middle-ground approach is to flip back and forth between a
client-local and client-remote user interface. Many voice client-local and client-remote user interface. Many voice
applications are of the type which listen to the media stream and applications are of the type which listen to the media stream and
wait for some specific trigger that kicks off a more complex user wait for some specific trigger that kicks off a more complex user
interaction. The long pound in a pre-paid calling card application interaction. The long pound in a pre-paid calling card application
is one example. Another example is a conference recording is one example. Another example is a conference recording
application, where the user can press a key at some point in the call application, where the user can press a key at some point in the call
to begin recording. When the key is pressed, the user hears a to begin recording. When the key is pressed, the user hears a
whisper to inform them that recording has started. whisper to inform them that recording has started.
The ideal way to support such an application is to install a The ideal way to support such an application is to install a client-
client-local user interface component that waits for the trigger to local user interface component that waits for the trigger to kick off
kick off the real interaction. Once the trigger is received, the the real interaction. Once the trigger is received, the application
application connects the user to a client-remote user interface that connects the user to a client-remote user interface that can play
can play announcements, collect more information, and so on. announcements, collect more information, and so on.
The benefit of flip-flopping between a client-local and client-remote The benefit of flip-flopping between a client-local and client-remote
user interface is cost. The client-local user interface will user interface is cost. The client-local user interface will
eliminate the need to send media streams into the network just to eliminate the need to send media streams into the network just to
wait for the user to press the pound key on the keypad. wait for the user to press the pound key on the keypad.
The Keypad Markup Language (KPML) was designed to support exactly The Keypad Markup Language (KPML) was designed to support exactly
this kind of need [7]. It models the keypad on a phone, and allows this kind of need [7]. It models the keypad on a phone, and allows
an application to be informed when any sequence of keys has been an application to be informed when any sequence of keys has been
pressed. However, KPML has no presentation component. Since user pressed. However, KPML has no presentation component. Since user
skipping to change at page 13, line 50 skipping to change at page 14, line 7
In this framework, we use the term "SIP application" to refer to a In this framework, we use the term "SIP application" to refer to a
broad set of functionality. A SIP application is a program running broad set of functionality. A SIP application is a program running
on a SIP-based element (such as a proxy or user agent) that provides on a SIP-based element (such as a proxy or user agent) that provides
some value-added function to a user or system administrator. SIP some value-added function to a user or system administrator. SIP
applications can execute on behalf of a caller, a called party, or a applications can execute on behalf of a caller, a called party, or a
multitude of users at once. multitude of users at once.
Each application has a number of instances that are executing at any Each application has a number of instances that are executing at any
given time. An instance represents a single execution path for an given time. An instance represents a single execution path for an
application. Each instance has a well defined lifecycle. It is application. It is established as a result of some event. That
established as a result of some event. That event can be a SIP event can be a SIP event, such as the reception of a SIP INVITE
event, such as the reception of a SIP INVITE request, or it can be a request, or it can be a non-SIP event, such as a web form post or
non-SIP event, such as a web form post or even a timer. Application even a timer. Application instances also have an end time. Some
instances also have a specific end time. Some instances have a instances have a lifetime that is coupled with a SIP transaction or
lifetime that is coupled with a SIP transaction or dialog. For dialog. For example, a proxy application might begin when an INVITE
example, a proxy application might begin when an INVITE arrives, and arrives, and terminate when the call is answered. Other applications
terminate when the call is answered. Other applications have a have a lifetime that spans multiple dialogs or transactions. For
lifetime that spans multiple dialogs or transactions. For example, a example, a conferencing application instance may exist so long as
conferencing application instance may exist so long as there are any there are any dialogs connected to it. When the last dialog
dialogs connected to it. When the last dialog terminates, the terminates, the application instance terminates. Other applications
application instance terminates. Other applications have a lifetime have a lifetime that is completely decoupled from SIP events.
that is completely decoupled from SIP events.
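As a rough illustration of a lifetime that spans several dialogs, the sketch below models the conferencing example: the instance stays alive while any dialog is connected and terminates when the last one ends. The class and method names are hypothetical and are not defined by this framework.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ConferenceInstance:
    """Hypothetical application instance whose lifetime spans many dialogs."""
    dialogs: Set[str] = field(default_factory=set)
    terminated: bool = False

    def dialog_started(self, dialog_id: str) -> None:
        # A terminated instance cannot accept new dialogs.
        if self.terminated:
            raise RuntimeError("application instance already terminated")
        self.dialogs.add(dialog_id)

    def dialog_ended(self, dialog_id: str) -> None:
        self.dialogs.discard(dialog_id)
        # The instance ends when the last connected dialog ends.
        if not self.dialogs:
            self.terminated = True


conf = ConferenceInstance()
conf.dialog_started("call-1")
conf.dialog_started("call-2")
conf.dialog_ended("call-1")
print(conf.terminated)  # False: call-2 is still connected
conf.dialog_ended("call-2")
print(conf.terminated)  # True
```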
It is fundamental to the framework described here that multiple It is fundamental to the framework described here that multiple
application instances may interact with a user during a single SIP application instances may interact with a user during a single SIP
transaction or dialog. Each instance may be for the same transaction or dialog. Each instance may be for the same
application, or different applications. Each of the applications may application, or different applications. Each of the applications may
be completely independent, in that they may be owned by different be completely independent, in that they may be owned by different
providers, and may not be aware of each other's existence. Similarly, providers, and may not be aware of each other's existence. Similarly,
there may be application instances interacting with the caller, and there may be application instances interacting with the caller, and
instances interacting with the callee, both within the same instances interacting with the callee, both within the same
transaction or dialog. transaction or dialog.
The first step in the interaction with the user is to instantiate one The first step in the interaction with the user is to instantiate one
or more user interface components for the application instance. A or more user interface components for the application instance. A
user interface component is a single piece of the user interface that user interface component is a single piece of the user interface that
is defined by a logical flow that is not synchronously coupled with is defined by a logical flow that is not synchronously coupled with
any other component. In other words, each component runs more or any other component. In other words, each component runs
less independently. independently.
A user interface component can be instantiated in one of the user A user interface component can be instantiated in one of the user
agents in a dialog (for a client-local user interface), or within a agents in a dialog (for a client-local user interface), or within a
network element (for a client-remote user interface). If a network element (for a client-remote user interface). If a client-
client-local user interface is to be used, the application needs to local user interface is to be used, the application needs to
determine whether or not the user agent is capable of supporting a determine whether or not the user agent is capable of supporting a
client-local user interface, and in what format. In this framework, client-local user interface, and in what format. In this framework,
all client-local user interface components are described by a markup all client-local user interface components are described by a markup
language. A markup language describes a logical flow of presentation language. A markup language describes a logical flow of presentation
of information to the user, collection of information from the user, of information to the user, collection of information from the user,
and transmission of that information to an application. Examples of and transmission of that information to an application. Examples of
markup languages include HTML, WML, VoiceXML, and the Keypad Markup markup languages include HTML, WML, VoiceXML, and the Keypad Markup
Language (KPML) [7]. Language (KPML) [7].
Unlike an application instance, which has very flexible lifetimes, a Unlike an application instance, which has very flexible lifetimes, a
skipping to change at page 15, line 12 skipping to change at page 15, line 15
dialog) is created. However, the user interface component terminates dialog) is created. However, the user interface component terminates
when the dialog terminates. The user interface component can be when the dialog terminates. The user interface component can be
terminated earlier by the user agent, and possibly by the terminated earlier by the user agent, and possibly by the
application, but its lifetime never exceeds that of its associated application, but its lifetime never exceeds that of its associated
dialog. dialog.
There are two ways to create a client local interface component. For There are two ways to create a client local interface component. For
interface components that are presentation capable, the application interface components that are presentation capable, the application
sends a REFER [6] request to the user agent. The Refer-To header sends a REFER [6] request to the user agent. The Refer-To header
field contains an HTTP URI that points to the markup for the user field contains an HTTP URI that points to the markup for the user
interface. For interface components that are presentation free (such interface, and the REFER contains a Target-Dialog header field [9]
as those defined by KPML), the application sends a SUBSCRIBE request identifying the dialog associated with the user interface component.
to the user agent. The body of the SUBSCRIBE request contains a For user interface components that are presentation free (such as
filter, which, in this case, is the markup that defines when those defined by KPML), the application sends a SUBSCRIBE request to
information is to be sent to the application in a NOTIFY. the user agent. The body of the SUBSCRIBE request contains a filter,
which, in this case, is the markup that defines when information is
to be sent to the application in a NOTIFY. The SUBSCRIBE does not
contain the Target-Dialog header field, since equivalent information
is conveyed in the Event header field.
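The REFER described above for a presentation capable component can be sketched as follows. This is a minimal illustration, assuming the Target-Dialog syntax of the tdialog extension [9] (the Call-ID followed by local-tag and remote-tag parameters, taken as seen by the UA that receives the REFER). The function name, the example URIs and the app.example.com host are hypothetical; Via and Contact header fields are omitted, and a real SIP stack would supply them along with transaction handling.

```python
import uuid


def build_ui_refer(ua_gruu: str, markup_url: str, call_id: str,
                   local_tag: str, remote_tag: str,
                   app_uri: str, app_name: str) -> str:
    """Sketch: REFER asking a UA to fetch presentation capable markup.

    call_id, local_tag and remote_tag identify the target dialog as seen by
    the UA that receives this REFER (assumed Target-Dialog convention).
    Via and Contact are intentionally omitted from this sketch.
    """
    target_dialog = f"{call_id};local-tag={local_tag};remote-tag={remote_tag}"
    lines = [
        f"REFER {ua_gruu} SIP/2.0",
        "Max-Forwards: 70",
        f"To: <{ua_gruu}>",
        # Display name identifies the application to the user.
        f'From: "{app_name}" <{app_uri}>;tag={uuid.uuid4().hex[:10]}',
        # The REFER creates a separate dialog, so it gets its own Call-ID.
        f"Call-ID: {uuid.uuid4().hex}@app.example.com",
        "CSeq: 1 REFER",
        f"Refer-To: <{markup_url}>",  # HTTP URI of the markup to fetch
        f"Target-Dialog: {target_dialog}",
        "Require: tdialog",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_ui_refer(
    "sip:bob@example.com;gr=urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "http://app.example.com/menu.html",
    "a84b4c76e66710@pc33.example.com", "1928301774", "a73kszlfl",
    "sip:prepaid@example.com", "Prepaid Calling Card"))
```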
If a user interface component is to be instantiated in the network, If a user interface component is to be instantiated in the network,
there is no need to determine the capabilities of the device on which there is no need to determine the capabilities of the device on which
the user interface is instantiated. Presumably, it is on a device on the user interface is instantiated. Presumably, it is on a device on
which the application knows a UI can be created. However, the which the application knows a UI can be created. However, the
application does need to connect the user device to the user application does need to connect the user device to the user
interface. This will require manipulation of media streams in order interface. This will require manipulation of media streams in order
to establish that connection. to establish that connection.
The interface between the user interface component and the The interface between the user interface component and the
application depends on the type of user interface. For presentation application depends on the type of user interface. For presentation
capable user interfaces, such as those described by HTML and capable user interfaces, such as those described by HTML and
VoiceXML, HTTP form POST operations are used. For presentation free VoiceXML, HTTP form POST operations are used. For presentation free
user interfaces, a SIP NOTIFY is used. The differing needs and user interfaces, a SIP NOTIFY is used. The differing needs and
capabilities of these two user interfaces, as described in Section capabilities of these two user interfaces, as described in
3.4, is what drives the different choices for the interactions. Section 3.4, is what drives the different choices for the
Since presentation capable user interfaces require an update to the interactions. Since presentation capable user interfaces require an
presentation every time user data is entered, they are a good match update to the presentation every time user data is entered, they are
for HTTP. Since presentation free user interfaces merely transmit a good match for HTTP. Since presentation free user interfaces
user input to the application, a NOTIFY is more appropriate. merely transmit user input to the application, a NOTIFY is more
appropriate.
Indeed, for presentation free user interfaces, there are two Indeed, for presentation free user interfaces, there are two
different modalities of operation. The first is called "one shot". different modalities of operation. The first is called "one shot".
In the one-shot role, the markup waits for a user to enter some In the one-shot role, the markup waits for a user to enter some
information, and when they do, reports this event to the application. information, and when they do, reports this event to the application.
The application then does something, and the markup is no longer The application then does something, and the markup is no longer
used. In the other modality, called "monitor", the markup stays used. In the other modality, called "monitor", the markup stays
permanently resident, and reports information back to an application permanently resident, and reports information back to an application
until termination of the associated dialog. until termination of the associated dialog.
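To make the two modalities concrete, the following sketch models a presentation free component that watches key presses. Python's re module stands in for KPML's own pattern syntax, and the class is hypothetical: in "one shot" mode it reports the first match and then goes idle, while in "monitor" mode it keeps reporting until the associated dialog terminates.

```python
import re
from typing import Callable


class KeypadFilter:
    """Hypothetical presentation free UI component.

    persistent=False models the "one shot" modality, persistent=True the
    "monitor" modality.
    """

    def __init__(self, pattern: str, persistent: bool,
                 report: Callable[[str], None]) -> None:
        self.regex = re.compile(pattern)
        self.persistent = persistent
        self.report = report      # e.g. would send a NOTIFY to the application
        self.buffer = ""
        self.active = True

    def key_pressed(self, key: str) -> None:
        if not self.active:
            return
        self.buffer += key
        match = self.regex.search(self.buffer)
        if match:
            self.report(match.group(0))
            self.buffer = ""
            if not self.persistent:
                self.active = False  # one shot: the markup is no longer used

    def dialog_terminated(self) -> None:
        # Neither modality outlives the associated dialog.
        self.active = False


one_shot, monitor = [], []
a = KeypadFilter("#", persistent=False, report=one_shot.append)
b = KeypadFilter("#", persistent=True, report=monitor.append)
for key in "12#3#":
    a.key_pressed(key)
    b.key_pressed(key)
print(one_shot)  # ['#']
print(monitor)   # ['#', '#']
```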
skipping to change at page 16, line 28 skipping to change at page 16, line 35
| Device B--------------------Y | | Device B--------------------Y |
+--------+ +-----+ +--------+ +-----+
Figure 2: Third Party Topology Figure 2: Third Party Topology
In this topology, the application that is interested in interacting In this topology, the application that is interested in interacting
with the users exists outside of the SIP dialog between the user with the users exists outside of the SIP dialog between the user
agents. In that case, the application learns about the initiation agents. In that case, the application learns about the initiation
and termination of the dialog, along with the dialog identifiers, and termination of the dialog, along with the dialog identifiers,
through some out of band means. One such possibility is the dialog through some out of band means. One such possibility is the dialog
event package [15]. Dialog information is only revealed to trusted event package [16]. Dialog information is only revealed to trusted
parties, so the application would need to be trusted by one of the parties, so the application would need to be trusted by one of the
users in order to obtain this information. users in order to obtain this information.
At any point during the dialog, the application can instantiate user At any point during the dialog, the application can instantiate user
interface components on the user device of the caller or callee. It interface components on the user device of the caller or callee. It
can do this either using SUBSCRIBE or REFER, depending on the type of can do this either using SUBSCRIBE or REFER, depending on the type of
user interface (presentation capable or presentation free). user interface (presentation capable or presentation free).
6.2 Co-Resident Application 6.2 Co-Resident Application
skipping to change at page 18, line 12 skipping to change at page 18, line 42
transported using RTP (including RFC 2833 for carrying user input), transported using RTP (including RFC 2833 for carrying user input),
is sent from the user device to the client remote user interface on is sent from the user device to the client remote user interface on
the User Device Proxy. As far as the application is concerned, it is the User Device Proxy. As far as the application is concerned, it is
installing what it thinks is a client local user interface on the installing what it thinks is a client local user interface on the
user device, but it happens to be on a user device proxy which looks user device, but it happens to be on a user device proxy which looks
like the user device to the application. like the user device to the application.
The user device proxy will need to terminate and re-originate both The user device proxy will need to terminate and re-originate both
signaling (SIP) and media traffic towards the actual peer in the signaling (SIP) and media traffic towards the actual peer in the
conversation. The User Device Proxy is a media relay in the conversation. The User Device Proxy is a media relay in the
terminology of RFC 3550 [16]. The User Device Proxy will need to terminology of RFC 3550 [18]. The User Device Proxy will need to
monitor the media streams associated with each dialog, in order to monitor the media streams associated with each dialog, in order to
convert user input received in the media stream to events reported to convert user input received in the media stream to events reported to
the user interface. This can pose a challenge in multi-media the user interface. This can pose a challenge in multi-media
systems, where it may be unclear on which media stream the user input systems, where it may be unclear on which media stream the user input
is being sent. As discussed in RFC 3264 [18], if a user agent has a is being sent. As discussed in RFC 3264 [20], if a user agent has a
single media source and is supporting multiple streams, it is single media source and is supporting multiple streams, it is
supposed to send that source to all streams. In cases where there supposed to send that source to all streams. In cases where there
are multiple sources, the mapping is a matter of local policy. In are multiple sources, the mapping is a matter of local policy. In
the absence of a way to explicitly identify or request which sources the absence of a way to explicitly identify or request which sources
map to which streams, the user device proxy will need to do the best map to which streams, the user device proxy will need to do the best
job it can. This specification RECOMMENDS that the User Device Proxy job it can. This specification RECOMMENDS that the User Device Proxy
monitor the first stream (defined in terms of ordering of media monitor the first stream (defined in terms of ordering of media
sessions within a session description). As such, user agents SHOULD sessions within a session description). As such, user agents SHOULD
send their user input on the first stream, absent a policy to direct send their user input on the first stream, absent a policy to direct
it otherwise. it otherwise.
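The first-stream recommendation amounts to taking the first m= line of the session description. The helper below is a minimal sketch of that selection; its name is hypothetical, and it does not handle every SDP corner case (for example, it does not treat a port of 0, which marks a rejected stream, specially).

```python
from typing import Tuple


def first_media_stream(sdp: str) -> Tuple[str, int]:
    """Return (media type, port) of the first m= line in an SDP body."""
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port>[/<number of ports>] <proto> <fmt> ...
            media, port = line[2:].split()[:2]
            return media, int(port.split("/")[0])
    raise ValueError("session description contains no media streams")


offer = ("v=0\r\n"
         "o=- 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
         "s=-\r\n"
         "c=IN IP4 192.0.2.10\r\n"
         "t=0 0\r\n"
         "m=audio 49170 RTP/AVP 0 101\r\n"
         "a=rtpmap:101 telephone-event/8000\r\n"
         "m=video 51372 RTP/AVP 31\r\n")
print(first_media_stream(offer))  # ('audio', 49170)
```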
skipping to change at page 19, line 28 skipping to change at page 19, line 38
+----------+ RTP +----------+ +----------+ RTP +----------+
User Device User Device User Device User Device
Figure 5: Proxy Application Topology Figure 5: Proxy Application Topology
In this topology, the application is co-resident with a transaction In this topology, the application is co-resident with a transaction
stateful, record-routing proxy server on the call path between two stateful, record-routing proxy server on the call path between two
user devices. The application uses SUBSCRIBE or REFER to install user devices. The application uses SUBSCRIBE or REFER to install
user interface components on one or both user devices. user interface components on one or both user devices.
This topology is common in routing applications, such as a This topology is common in routing applications, such as a web-
web-assisted call routing application. assisted call routing application.
7. Application Behavior 7. Application Behavior
The behavior of an application within this framework depends on The behavior of an application within this framework depends on
whether it seeks to use a client-local or client-remote user whether it seeks to use a client-local or client-remote user
interface. interface.
7.1 Client Local Interfaces 7.1 Client Local Interfaces
One key component of this framework is support for client local user One key component of this framework is support for client local user
interfaces. interfaces.
7.1.1 Discovering Capabilities 7.1.1 Discovering Capabilities
A client local user interface can only be instantiated on a user A client local user interface can only be instantiated on a user
agent if the user agent supports that type of user interface agent if the user agent supports that type of user interface
component. Support for client local user interface components is component. Support for client local user interface components is
declared by both the UAC and a UAS in its Accept, Allow, Contact and declared by both the UAC and a UAS in its Allow, Accept, Supported,
Allow-Event header fields of dialog-initiating requests and and Allow-Event header fields of dialog-initiating requests and
responses. If the Allow header field indicates support for the SIP responses. If the Allow header field indicates support for the SIP
SUBSCRIBE method, and the Allow-Event header field indicates support SUBSCRIBE method, and the Allow-Event header field indicates support
for the kpml package [7], and the Supported header field indicates for the kpml package [7], and the Supported header field indicates
that its Contact URI is a GRUU [8], it means that the UA can support for the GRUU [8] specification (which, in turn, means
instantiate presentation free user interface components. In this that the Contact header field contains a GRUU), it means that the UA
case, the application MAY push presentation free user interface can instantiate presentation free user interface components. In this
case, the application can push presentation free user interface
components according to the rules of Section 7.1.2. The specific components according to the rules of Section 7.1.2. The specific
markup languages that can be supported are indicated in the Accept markup languages that can be supported are indicated in the Accept
header field. header field.
If the Allow header field indicates support for the SIP REFER method, If the Allow header field indicates support for the SIP REFER method,
the Supported header field indicates support for the "refer-context" and the Supported header field indicates support for the Target-
extension described below, and the Contact header field contains UA Dialog header field [9], and the Contact header field contains UA
capabilities [5] that indicate support for the HTTP URI scheme, it capabilities [5] that indicate support for the HTTP URI scheme, it
means that the UA supports presentation capable user interface means that the UA supports presentation capable user interface
components. In this case, the application MAY push presentation components. In this case, the application can push presentation
capable user interface components to the client according to the capable user interface components to the client according to the
rules of Section 7.1.2. The specific markups that are supported are rules of Section 7.1.2. The specific markups that are supported are
indicated in the Accept header field. indicated in the Accept header field.
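The discovery rules above can be condensed into a small decision function. This is a sketch under several assumptions: header field values arrive as a dict of already unfolded strings, the option tags are "gruu" and "tdialog", the event header field is spelled Allow-Events as registered by RFC 3265, and HTTP support is advertised through a schemes feature parameter on the Contact in the style of RFC 3840. The function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Union


def _tokens(value: str) -> Set[str]:
    """Split a comma-separated header field value into lower-case tokens."""
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def ui_capabilities(headers: Dict[str, str]) -> Dict[str, Union[bool, List[str]]]:
    h = {name.lower(): value for name, value in headers.items()}
    allow = _tokens(h.get("allow", ""))
    events = _tokens(h.get("allow-events", ""))
    supported = _tokens(h.get("supported", ""))
    # Assumed RFC 3840 style feature parameter, e.g. ;schemes="sip,http"
    m = re.search(r';\s*schemes="([^"]*)"', h.get("contact", ""), re.IGNORECASE)
    schemes = _tokens(m.group(1)) if m else set()
    return {
        "presentation_free": "subscribe" in allow and "kpml" in events
                             and "gruu" in supported,
        "presentation_capable": "refer" in allow and "tdialog" in supported
                                and "http" in schemes,
        # The Accept header field lists the specific markups supported.
        "markups": sorted(_tokens(h.get("accept", ""))),
    }


caps = ui_capabilities({
    "Allow": "INVITE, ACK, BYE, SUBSCRIBE, NOTIFY, REFER",
    "Allow-Events": "kpml",
    "Supported": "gruu, tdialog",
    "Contact": '<sip:bob@example.com;gr=urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6>;schemes="sip,http"',
    "Accept": "application/sdp, application/kpml-request+xml, text/html",
})
print(caps["presentation_free"], caps["presentation_capable"])  # True True
```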
A third party application that is not present on the call path will A third party application that is not present on the call path will
not be privy to these headers in the dialog requests that pass by. not be privy to these header fields in the dialog initiating requests
As such, it will need to obtain this capability information in other that pass by. As such, it will need to obtain this capability
ways. One way is through the registration event package [19], which information in other ways. One way is through the registration event
can contain user agent capability information provided in REGISTER package [21], which can contain user agent capability information
requests [5]. provided in REGISTER requests [5].
7.1.2 Pushing an Initial Interface Component 7.1.2 Pushing an Initial Interface Component
Generally, we anticipate that interface components will need to be Generally, we anticipate that interface components will need to be
created at various different points in a SIP session. Clearly, they created at various different points in a SIP session. Clearly, they
will need to be pushed during session setup, or after the session is will need to be pushed during session setup, or after the session is
established. A user interface component is always associated with a established. A user interface component is always associated with a
specific dialog, however. specific dialog, however.
An application MUST NOT attempt to push a user interface component to An application MUST NOT attempt to push a user interface component to
skipping to change at page 21, line 18 skipping to change at page 21, line 34
the Contact header field of the dialog initiating request or response the Contact header field of the dialog initiating request or response
sent by that UA. Note that this REFER request creates a separate sent by that UA. Note that this REFER request creates a separate
dialog between the application and the UA. The Refer-To header field dialog between the application and the UA. The Refer-To header field
of the REFER request MUST contain an HTTP URI that references the of the REFER request MUST contain an HTTP URI that references the
markup document to be fetched. markup document to be fetched.
Furthermore, it is essential for the REFER request to be correlated Furthermore, it is essential for the REFER request to be correlated
with the dialog to which the user interface component will be with the dialog to which the user interface component will be
associated. This is necessary for authorization and for terminating associated. This is necessary for authorization and for terminating
the user interface components when the dialog terminates. To provide the user interface components when the dialog terminates. To provide
this context, this specification defines the "context" header field this context, the REFER request MUST contain a Target-Dialog header
parameter as an extension to the Refer-To header field. The grammar field identifying the dialog with which the user interface component
for this header field parameter is: is associated. As discussed in [9], this request will also contain a
Require header field with the tdialog option tag.
refer-to-ctxt = "context" EQUAL DQUOTE local-tag "," remote-tag
"," callid DQUOTE ; callid defined in RFC 3261
;; NOTE: any DQUOTEs inside callid MUST be escaped
;; using quoted pair
local-tag = token
remote-tag = token
Refer-To = ("Refer-To" / "r") HCOLON ( name-addr / addr-spec ) *
(SEMI (generic-param / refer-to-ctxt))
The application MUST include the context header field parameter in
the REFER request. The remote-tag MUST be set to the remote tag of
the dialog as seen by the user device. The local-tag MUST be set to
the local tag of the dialog as seen by the user device. The callid
MUST be set to the Call-ID of the dialog as seen by the device.
Since the callid grammar allows it to contain double quotes, any such
double quotes MUST be represented with a quoted pair.
Since the "context" parameter in the Refer-To header field must be
understood by the UA to process the request, this specification
defines a new SIP option tag, "refer-context". A REFER request
generated by an application MUST include a Require header field with
this option tag value. Fortunately, the application will know ahead
of time whether this extension is supported, as discussed in Section
7.1.1.
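For the "context" Refer-To parameter in the -04 text (replaced by the Target-Dialog header field in -05), the grammar reduces to joining the two tags and the Call-ID inside a quoted string. The sketch below also escapes backslashes, which the grammar does not mention but which a quoted-string parser needs in order to recover the original Call-ID; the function name is hypothetical.

```python
def refer_to_context(local_tag: str, remote_tag: str, call_id: str) -> str:
    """Build context="local-tag,remote-tag,callid" for the Refer-To header.

    Tags and Call-ID are those of the dialog as seen by the user device.
    Backslashes and DQUOTEs in the Call-ID become quoted-pairs.
    """
    escaped = call_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'context="{local_tag},{remote_tag},{escaped}"'


param = refer_to_context("1928301774", "a73kszlfl", 'a84b"4c76@pc33.example.com')
print(f"Refer-To: <http://app.example.com/menu.html>;{param}")
# Refer-To: <http://app.example.com/menu.html>;context="1928301774,a73kszlfl,a84b\"4c76@pc33.example.com"
```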
To create a presentation free user interface component, the To create a presentation free user interface component, the
application sends a SUBSCRIBE request to the UA. The SUBSCRIBE MUST application sends a SUBSCRIBE request to the UA. The SUBSCRIBE MUST
be sent to the GRUU advertised by the UA. This SUBSCRIBE request be sent to the GRUU advertised by the UA. This SUBSCRIBE request
creates a separate dialog. The SUBSCRIBE request MUST use the KPML creates a separate dialog. The SUBSCRIBE request MUST use the KPML
[7] event package. The Event header field MUST contain parameters [7] event package. The body of the SUBSCRIBE request contains the
which identify the particular dialog that the interface component is markup document that defines the conditions under which the
being instantiated against. The body of the SUBSCRIBE request application wishes to be notified of user input.
contains the markup document that defines the conditions under which
the application wishes to be notified of user input.
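A SUBSCRIBE carrying a KPML filter could be assembled roughly as follows. This is a sketch only: the XML namespace, the application/kpml-request+xml media type and the pattern/regex elements are taken to follow the KPML specification [7] and should be checked against it. The dialog-identifying Event header field parameters are left to the caller through a hypothetical event_params argument, since their exact names are defined in [7]; Via and Contact are omitted, and the function name and host are hypothetical.

```python
import uuid
from xml.sax.saxutils import escape


def build_kpml_subscribe(ua_gruu: str, app_uri: str, app_name: str,
                         digit_regex: str, event_params: str = "",
                         expires: int = 3600) -> str:
    """Sketch of a SUBSCRIBE installing a presentation free (KPML) component.

    event_params is a hypothetical hook for dialog-identifying Event
    parameters; this sketch does not construct them.
    """
    body = ('<?xml version="1.0" encoding="UTF-8"?>\r\n'
            '<kpml-request xmlns="urn:ietf:params:xml:ns:kpml-request"'
            ' version="1.0">\r\n'
            f"  <pattern><regex>{escape(digit_regex)}</regex></pattern>\r\n"
            "</kpml-request>\r\n")
    lines = [
        f"SUBSCRIBE {ua_gruu} SIP/2.0",  # sent to the GRUU advertised by the UA
        "Max-Forwards: 70",
        f"To: <{ua_gruu}>",
        f'From: "{app_name}" <{app_uri}>;tag={uuid.uuid4().hex[:10]}',
        f"Call-ID: {uuid.uuid4().hex}@app.example.com",  # separate dialog
        "CSeq: 1 SUBSCRIBE",
        f"Event: kpml{event_params}",
        f"Expires: {expires}",
        "Content-Type: application/kpml-request+xml",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + body


print(build_kpml_subscribe(
    "sip:bob@example.com;gr=urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "sip:prepaid@example.com", "Prepaid Calling Card", "#"))
```

Because the filter travels in the SUBSCRIBE body, a later SUBSCRIBE refresh in the same dialog carrying a different body is how the application would replace the filter, as described in Section 7.1.3.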
In both cases, the REFER or SUBSCRIBE request SHOULD include a In both cases, the REFER or SUBSCRIBE request SHOULD include a
display name in the From header field which identifies the name of display name in the From header field which identifies the name of
the application. For example, a prepaid calling card might include a the application. For example, a prepaid calling card might include a
From header field which looks like: From header field which looks like:
From: "Prepaid Calling Card" <sip:prepaid@example.com> From: "Prepaid Calling Card" <sip:prepaid@example.com>
Any of the SIP identity assertion mechanisms that have been defined, Any of the SIP identity assertion mechanisms that have been defined,
such as [10] and [12] are applicable to these requests as well. such as [11] and [13] are applicable to these requests as well.
7.1.3 Updating an Interface Component 7.1.3 Updating an Interface Component
Once a user interface component has been created on a client, it can Once a user interface component has been created on a client, it can
be updated. The means for updating it depends on the type of UI be updated. The means for updating it depends on the type of UI
component. component.
Presentation capable UI components are updated using techniques Presentation capable UI components are updated using techniques
already in place for those markups. In particular, user input will already in place for those markups. In particular, user input will
cause an HTTP POST operation to push the user input to the cause an HTTP POST operation to push the user input to the
application. The result of the POST operation is a new markup that application. The result of the POST operation is a new markup that
the UI is supposed to use. This allows the UI to updated in response the UI is supposed to use. This allows the UI to be updated in
to user action. Some markups, such as HTML, provide the ability to response to user action. Some markups, such as HTML, provide the
force a refresh after a certain period of time, so that the UI can be ability to force a refresh after a certain period of time, so that
updated without user input. Those mechanisms can be used here as the UI can be updated without user input. Those mechanisms can be
well. However, there is no support for an asynchronous push of an used here as well. However, there is no support for an asynchronous
updated UI component from the application to the user agent. A new push of an updated UI component from the application to the user
REFER request to the same GRUU would create a new UI component rather agent. A new REFER request to the same GRUU would create a new UI
than updating any components already in place. component rather than updating any components already in place.
For presentation free UI, the story is different. The application For presentation free UI, the story is different. The application
MAY update the filter at any time by generating a SUBSCRIBE refresh MAY update the filter at any time by generating a SUBSCRIBE refresh
with the new filter. The UA will immediately begin using this new with the new filter. The UA will immediately begin using this new
filter. filter.
7.1.4 Terminating an Interface Component 7.1.4 Terminating an Interface Component
User interface components have a well defined lifetime. They are User interface components have a well defined lifetime. They are
created when the component is first pushed to the client. User created when the component is first pushed to the client. User
skipping to change at page 24, line 12 skipping to change at page 23, line 48
application. It is a terminating application because the user application. It is a terminating application because the user
explicitly calls it; i.e., it is the actual called party. An example explicitly calls it; i.e., it is the actual called party. An example
of an originating application is a wakeup call application, which of an originating application is a wakeup call application, which
calls a user at a specified time in order to wake them up. calls a user at a specified time in order to wake them up.
Because originating and terminating applications are a natural Because originating and terminating applications are a natural
termination point of the dialog, manipulation of the media session by termination point of the dialog, manipulation of the media session by
the application is trivial. Traditional SIP techniques for adding the application is trivial. Traditional SIP techniques for adding
and removing media streams, modifying codecs, and changing the and removing media streams, modifying codecs, and changing the
address of the recipient of the media streams, can be applied. address of the recipient of the media streams, can be applied.
Similarly, the application can directly authenticate itself to the
user through S/MIME, since it is the peer UA in the dialog.
7.2.2 Intermediary Applications 7.2.2 Intermediary Applications
Intermediary applications are, at the same time, more common than Intermediary applications are, at the same time, more common than
originating/terminating applications, and more complex. Intermediary originating/terminating applications, and more complex. Intermediary
applications are applications that are neither the actual caller nor applications are applications that are neither the actual caller nor
called party. Rather, they represent a "third party" that wishes to called party. Rather, they represent a "third party" that wishes to
interact with the user. The classic example is the ubiquitous interact with the user. The classic example is the ubiquitous pre-
pre-paid calling card application. paid calling card application.
In order for the intermediary application to add a client remote user In order for the intermediary application to add a client remote user
interface, it needs to manipulate the media streams of the user agent interface, it needs to manipulate the media streams of the user agent
to terminate on that user interface. This also introduces a to terminate on that user interface. This also introduces a
fundamental feature interaction issue. Since the intermediary fundamental feature interaction issue. Since the intermediary
application is not an actual participant in the call, the user will application is not an actual participant in the call, the user will
need to interact with both the intermediary application and its peer need to interact with both the intermediary application and its peer
in the dialog. Doing both at the same time is complicated, and is in the dialog. Doing both at the same time is complicated, and is
discussed in more detail in Section 9. discussed in more detail in Section 9.
8. User Agent Behavior 8. User Agent Behavior
8.1 Advertising Capabilities 8.1 Advertising Capabilities
In order to participate in applications that make use of stimulus In order to participate in applications that make use of stimulus
interfaces, a user agent needs to advertise its interaction interfaces, a user agent needs to advertise its interaction
capabilities. capabilities.
If a user agent supports presentation capable user interfaces, it If a user agent supports presentation capable user interfaces, it
MUST support the REFER method, along with the "context" extension MUST support the REFER method. It MUST include, in all dialog
defined here. It MUST include, in all dialog initiating requests and initiating requests and responses, an Allow header field that
responses, an Allow header field that includes the REFER method and includes the REFER method. The user agent MUST support the target
the Supported header field that includes the value dialog specification [9], and MUST include the "tdialog" option tag
"refer-context". Furthermore, the UA MUST support the SIP user agent in the Supported header field of dialog forming requests and
responses. Furthermore, the UA MUST support the SIP user agent
capabilities specification [5]. The UA MUST be capable of being capabilities specification [5]. The UA MUST be capable of being
REFER'd to an HTTP URI. It MUST include, in the Contact header field REFER'd to an HTTP URI. It MUST include, in the Contact header field
of its dialog initiating requests and responses, a "schemes" Contact of its dialog initiating requests and responses, a "schemes" Contact
header field parameter include the http URI scheme. The UA MUST header field parameter that includes the http URI scheme. The UA
include, in all dialog initiating requests and responses, an Accept MUST include, in all dialog initiating requests and responses, an
header field listing all of those markups supported by the UA. It is Accept header field listing all of those markups supported by the UA.
RECOMMENDED that all user agents that support presentation capable It is RECOMMENDED that all user agents that support presentation
user interfaces support HTML. capable user interfaces support HTML.
If a user agent supports presentation free user interfaces, it MUST If a user agent supports presentation free user interfaces, it MUST
support the SUBSCRIBE [3] method. It MUST support the KPML [7] event support the SUBSCRIBE [3] method. It MUST support the KPML [7] event
package. It MUST include, in all dialog initiating requests and package. It MUST include, in all dialog initiating requests and
responses, an Allow header field that includes the SUBSCRIBE method. responses, an Allow header field that includes the SUBSCRIBE method.
It MUST include, in all dialog initiating requests and responses, an It MUST include, in all dialog initiating requests and responses, an
Allow-Events header field that lists the KPML event package. The UA Allow-Events header field that lists the KPML event package. The UA
MUST include, in all dialog initiating requests and responses, an MUST include, in all dialog initiating requests and responses, an
Accept header field listing those event filters it supports. At a Accept header field listing those event filters it supports. At a
minimum, a UA MUST support the "application/kpml-request+xml" MIME minimum, a UA MUST support the "application/kpml-request+xml" MIME
type. type.
For either presentation free or presentation capable user interfaces, For either presentation free or presentation capable user interfaces,
the user agent MUST support the GRUU [8] specification. The Contact the user agent MUST support the GRUU [8] specification. The Contact
header field in all dialog initiating requests and responses MUST header field in all dialog initiating requests and responses MUST
contain a GRUU. The UA MUST include a Supported header field which contain a GRUU. The UA MUST include a Supported header field which
contains the "gruu" option tag. contains the "gruu" option tag and the "tdialog" option tag.
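The advertisement rules above can be sketched as follows. This is a minimal, hypothetical Python helper (UACapabilities and capability_headers are illustrative names, not defined by this framework) that follows the draft-05 text: Allow lists REFER and/or SUBSCRIBE, Supported carries "gruu" and "tdialog", Allow-Events lists the KPML package, and Contact carries a GRUU. The "sip,sips" entries in the schemes parameter and the application/sdp entry in Accept are assumptions taken from the example call flow, not from the normative text.

```python
# Hypothetical sketch of the Section 8.1 capability advertisement (draft-05).
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UACapabilities:
    presentation_capable: bool = False  # can be REFER'd to an HTTP URI
    presentation_free: bool = False     # supports KPML over SUBSCRIBE
    markups: List[str] = field(default_factory=list)  # e.g. ["text/html"]


def capability_headers(caps: UACapabilities, gruu: str) -> Dict[str, str]:
    """Header fields for a dialog-initiating request or response."""
    allow = ["INVITE", "OPTIONS", "BYE", "CANCEL", "ACK"]
    accept = ["application/sdp"]  # assumption: taken from the example flow
    headers: Dict[str, str] = {}
    contact_params = ""
    if caps.presentation_capable:
        allow.append("REFER")
        accept.extend(caps.markups)  # every markup the UA can render
        # "schemes" must include http; sip and sips follow the example.
        contact_params = ';schemes="http,sip,sips"'
    if caps.presentation_free:
        allow.append("SUBSCRIBE")
        accept.append("application/kpml-request+xml")  # minimum filter type
        headers["Allow-Events"] = "kpml"
    headers["Allow"] = ", ".join(allow)
    # Either UI style requires the "gruu" and "tdialog" option tags.
    headers["Supported"] = "gruu, tdialog"
    headers["Accept"] = ", ".join(accept)
    headers["Contact"] = f"<{gruu}>{contact_params}"  # Contact must be a GRUU
    return headers


if __name__ == "__main__":
    caps = UACapabilities(presentation_capable=True, markups=["text/html"])
    gruu = ("sips:A@example.com;opaque=urn:uuid:"
            "f81d4fae-7dec-11d0-a765-00a0c91e6bf6;grid=99a")
    for name, value in capability_headers(caps, gruu).items():
        print(f"{name}: {value}")
```

For a presentation capable UA that renders HTML, the Allow and Accept values this produces match those in message 1 of the example call flow in Section 11.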
Because these headers are examined by proxies which may be executing Because these headers are examined by proxies which may be executing
applications, a UA that wishes to support client local user applications, a UA that wishes to support client local user
interfaces should not encrypt them. interfaces should not encrypt them.
8.2 Receiving User Interface Components 8.2 Receiving User Interface Components
Once the UA has created a dialog (in either the early or confirmed Once the UA has created a dialog (in either the early or confirmed
states), it MUST be prepared to receive a SUBSCRIBE or REFER request states), it MUST be prepared to receive a SUBSCRIBE or REFER request
against its GRUU. If the UA receives such a request prior to the against its GRUU. If the UA receives such a request prior to the
establishment of a dialog, the UA MUST reject the request. establishment of a dialog, the UA MUST reject the request.
A user agent SHOULD attempt to authenticate the sender of the A user agent SHOULD attempt to authenticate the sender of the
request. The sender will generally be an application, and therefore request. The sender will generally be an application, and therefore
the user agent is unlikely to ever have a shared secret with it, the user agent is unlikely to ever have a shared secret with it,
making digest authentication useless. However, authenticated making digest authentication useless. However, authenticated
identities can be obtained through other means, such as [10]. identities can be obtained through other means, such as [11].
A user agent MAY have pre-defined authorization policies which permit A user agent MAY have pre-defined authorization policies which permit
applications which have authenticated themselves with a particular applications which have authenticated themselves with a particular
identity, to push user interface components. If such a set of identity, to push user interface components. If such a set of
policies are present, it is checked first. If the application is policies are present, they are checked first. If the application is
authorized, processing proceeds. authorized, processing proceeds.
If the application has authenticated itself, but it is not explicitly If the application has authenticated itself, but it is not explicitly
authorized or blocked, this specification RECOMMENDS that the authorized or blocked, this specification RECOMMENDS that the
application be automatically authorized if it can prove that it was application be automatically authorized if it can prove that it was
either on the call path, or is trusted by one of the elements on the either on the call path, or is trusted by one of the elements on the
call path. An application proves this to the user agent by call path. An application proves this to the user agent by
presenting it with the dialog identifiers in the SUBSCRIBE or REFER demonstrating that it knows the dialog identifiers. That occurs by
request. In the case of SUBSCRIBE, those identifiers are present in including them in a Target-Dialog header field for REFER requests, or
the Event header field [7]. In the case of REFER, those identifiers in the Event header field parameters of the KPML SUBSCRIBE request.
are present in the "context" parameter of the Refer-To header field.
Because the dialog identifiers serve as a tool for authorization, Because the dialog identifiers serve as a tool for authorization,
a user agent compliant to this framework SHOULD use dialog a user agent compliant to this framework SHOULD use dialog
identifiers that are cryptographically random, with at least 128 bits identifiers that are cryptographically random, with at least 128 bits
of randomness. It is recommended that this randomness be split of randomness. It is recommended that this randomness be split
between the Call-ID and From header field tag in the case of a UAC. between the Call-ID and From header field tag in the case of a UAC.
Furthermore, to ensure that only applications resident in or trusted Furthermore, to ensure that only applications resident in or trusted
by on-path elements can instantiate a user interface component, a by on-path elements can instantiate a user interface component, a
user agent compliant to this specification SHOULD use the sips URI user agent compliant to this specification SHOULD use the sips URI
skipping to change at page 26, line 45 skipping to change at page 26, line 31
header field until user authorization was obtained. header field until user authorization was obtained.
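A UAC following the recommendation that its dialog identifiers carry at least 128 bits of cryptographic randomness, split between the Call-ID and the From tag, might generate them as below. The even 64/64 split and the function name new_uac_dialog_ids are assumptions, since the text does not fix a ratio. Python's secrets module draws from the operating system's cryptographically secure generator.

```python
# Hypothetical sketch: random UAC dialog identifiers (Section 8.2 advice).
import secrets
from typing import Tuple


def new_uac_dialog_ids(host: str) -> Tuple[str, str]:
    """Return (call_id, from_tag) with 128 random bits in total.

    Assumption: the bits are split evenly, 64 in each value.
    """
    call_id = f"{secrets.token_hex(8)}@{host}"  # 8 bytes = 64 bits
    from_tag = secrets.token_hex(8)             # 8 more bytes = 64 bits
    return call_id, from_tag


if __name__ == "__main__":
    print(new_uac_dialog_ids("host.example.com"))
```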
If an application does not present a valid dialog identifier in its If an application does not present a valid dialog identifier in its
REFER or SUBSCRIBE request, the user agent MUST reject the request REFER or SUBSCRIBE request, the user agent MUST reject the request
with a 403 response. with a 403 response.
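The dialog-identifier check for a REFER request can be sketched as follows, using the Target-Dialog header field described above. The names Dialog, parse_target_dialog, and refer_is_authorized are hypothetical. Tags are compared as an unordered pair, an assumption that keeps the sketch independent of whose perspective "local-tag" and "remote-tag" are written from. A SUBSCRIBE would be checked the same way against the dialog identifiers carried in the KPML Event header field parameters.

```python
# Hypothetical sketch of the on-path authorization check in Section 8.2.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Dialog:
    call_id: str
    local_tag: str
    remote_tag: str


def parse_target_dialog(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'callid;name=value;...' into the Call-ID and its parameters."""
    call_id, *params = [part.strip() for part in value.split(";")]
    parsed: Dict[str, str] = {}
    for param in params:
        name, _, val = param.partition("=")
        parsed[name.lower()] = val
    return call_id, parsed


def refer_is_authorized(target_dialog: Optional[str],
                        dialogs: List[Dialog]) -> bool:
    """True if the REFER names one of our dialogs; False means send 403."""
    if target_dialog is None:
        return False  # no dialog identifiers presented
    call_id, params = parse_target_dialog(target_dialog)
    # Unordered comparison: does not depend on local/remote orientation.
    tags = {params.get("local-tag"), params.get("remote-tag")}
    return any(d.call_id == call_id and tags == {d.local_tag, d.remote_tag}
               for d in dialogs)
```

With no established dialogs, the list is empty and every request is rejected. This is consistent with the rule that requests arriving before a dialog exists MUST be rejected.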
If a REFER request to an HTTP URI was authorized, the UA executes the If a REFER request to an HTTP URI was authorized, the UA executes the
URI and fetches the content to be rendered to the user. This URI and fetches the content to be rendered to the user. This
instantiates a presentation capable user interface component. If a instantiates a presentation capable user interface component. If a
SUBSCRIBE was authorized, a presentation free user interface SUBSCRIBE was authorized, a presentation free user interface
component was instantiated. component is instantiated.
8.3 Mapping User Input to User Interface Components 8.3 Mapping User Input to User Interface Components
Once the user interface components are instantiated, the user agent Once the user interface components are instantiated, the user agent
must direct user input to the appropriate component. In the case of must direct user input to the appropriate component. In the case of
presentation capable user interfaces, this process is known as focus presentation capable user interfaces, this process is known as focus
selection. It is done by means that are specific to the user selection. It is done by means that are specific to the user
interface on the device. In the case of a PC, for example, the interface on the device. In the case of a PC, for example, the
window manager would allow the user to select the appropriate user window manager would allow the user to select the appropriate user
interface component that their input is directed to. interface component that their input is directed to.
skipping to change at page 27, line 21 skipping to change at page 27, line 7
allows the user to select a "line", and thus the associated dialog. allows the user to select a "line", and thus the associated dialog.
Any user input on the keypad while this line is selected is fed to Any user input on the keypad while this line is selected is fed to
the user interface components associated with that dialog. the user interface components associated with that dialog.
Otherwise, for client local user interfaces, the user input is Otherwise, for client local user interfaces, the user input is
assumed to be associated with all user interface components. For assumed to be associated with all user interface components. For
client remote user interfaces, the user device converts the user client remote user interfaces, the user device converts the user
input to media, typically conveyed using RFC 2833, and sends this to input to media, typically conveyed using RFC 2833, and sends this to
the client remote user interface. This user interface then needs to the client remote user interface. This user interface then needs to
map user input from potentially many media streams into user map user input from potentially many media streams into user
interface events. The process for doing this is described in Section interface events. The process for doing this is described in
6.3. Section 6.3.
8.4 Receiving Updates to User Interface Components 8.4 Receiving Updates to User Interface Components
For presentation capable user interfaces, updates to the user For presentation capable user interfaces, updates to the user
interface occur in ways specific to that user interface component. interface occur in ways specific to that user interface component.
In the case of HTML, for example, the document can tell the client to In the case of HTML, for example, the document can tell the client to
fetch a new document periodically. However, this framework does not fetch a new document periodically. However, this framework does not
provide any additional machinery to asynchronously push a new user provide any additional machinery to asynchronously push a new user
interface component to the client. interface component to the client.
skipping to change at page 28, line 51 skipping to change at page 28, line 37
user interface would appear as a separate window. The user interacts user interface would appear as a separate window. The user interacts
with the call recording application by selecting its window, and with with the call recording application by selecting its window, and with
the pre-paid calling card application by selecting its window. Focus the pre-paid calling card application by selecting its window. Focus
determination is literally provided by the PC window manager. It is determination is literally provided by the PC window manager. It is
clear to which application the user input is targeted. clear to which application the user input is targeted.
As another example, consider the same two applications, but on a As another example, consider the same two applications, but on a
"smart phone" that has a set of buttons, and next to each button, an "smart phone" that has a set of buttons, and next to each button, an
LCD display that can provide the user with an option. This user LCD display that can provide the user with an option. This user
interface can be represented using the Wireless Markup Language interface can be represented using the Wireless Markup Language
(WML). (WML), for example.
The phone would allocate some number of buttons to each application. The phone would allocate some number of buttons to each application.
The prepaid calling card would get one button for its "hangup" The prepaid calling card would get one button for its "hangup"
command, and the recording application would get one for its command, and the recording application would get one for its "start/
"start/stop" command. The user can easily determine which stop" command. The user can easily determine which application to
application to interact with by pressing the appropriate button. interact with by pressing the appropriate button. Pressing a button
Pressing a button determines focus and provides user input, both at determines focus and provides user input, both at the same time.
the same time.
Unfortunately, not all devices will have these advanced displays. A Unfortunately, not all devices will have these advanced displays. A
PSTN gateway, or a basic IP telephone, may only have a 12-key keypad. PSTN gateway, or a basic IP telephone, may only have a 12-key keypad.
The user interfaces for these devices are provided through the Keypad The user interfaces for these devices are provided through the Keypad
Markup Language (KPML). Considering once again the feature Markup Language (KPML). Considering once again the feature
interaction case above, the pre-paid calling card application and the interaction case above, the pre-paid calling card application and the
call recording application would both pass a KPML document to the call recording application would both pass a KPML document to the
device. When the user presses a button on the keypad, to which device. When the user presses a button on the keypad, to which
document does the input apply? The device does not allow the user to document does the input apply? The device does not allow the user to
select. A device where the user cannot provide focus is called a select. A device where the user cannot provide focus is called a
skipping to change at page 29, line 39 skipping to change at page 29, line 23
mechanism for feature interaction resolution than the PSTN on devices mechanism for feature interaction resolution than the PSTN on devices
which have the same user interface as they do on the PSTN. Devices which have the same user interface as they do on the PSTN. Devices
with better displays, such as PCs or screen phones, can benefit from with better displays, such as PCs or screen phones, can benefit from
the capabilities of this framework, allowing the user to determine the capabilities of this framework, allowing the user to determine
which application they are interacting with. which application they are interacting with.
Indeed, when a user provides input on a focusless device, the input Indeed, when a user provides input on a focusless device, the input
must be passed to all client local user interfaces, AND all client must be passed to all client local user interfaces, AND all client
remote user interfaces, unless the markup tells the UI to suppress remote user interfaces, unless the markup tells the UI to suppress
the media. In the case of KPML, key events are passed to remote user the media. In the case of KPML, key events are passed to remote user
interfaces by encoding them in RFC 2833 [17]. Of course, since a interfaces by encoding them in RFC 2833 [19]. Of course, since a
client cannot determine if a media stream terminates in a remote user client cannot determine if a media stream terminates in a remote user
interface or not, these key events are passed in all audio media interface or not, these key events are passed in all audio media
streams unless the KPML request document is used to suppress them. streams unless the KPML request document is used to suppress them.
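The focusless behavior just described can be sketched as a simple dispatch. LocalUI, AudioStream, and dispatch_key are hypothetical names. Each stream's list of digits stands in for the RFC 2833 telephone-event packets that would actually be sent. The single suppress_media flag per component is a simplification, since KPML expresses suppression within its request document rather than as one boolean.

```python
# Hypothetical sketch of focusless key dispatch (Section 9.1).
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocalUI:
    name: str
    # Simplification: one flag per component, not real KPML semantics.
    suppress_media: bool = False
    received: List[str] = field(default_factory=list)


@dataclass
class AudioStream:
    name: str
    rfc2833_events: List[str] = field(default_factory=list)  # stand-in


def dispatch_key(digit: str, local_uis: List[LocalUI],
                 audio_streams: List[AudioStream]) -> None:
    # With no focus, every client local UI sees the input.
    for ui in local_uis:
        ui.received.append(digit)
    # The client cannot tell which streams reach a remote UI, so the event
    # goes into every audio stream unless some markup asked to suppress it.
    if not any(ui.suppress_media for ui in local_uis):
        for stream in audio_streams:
            stream.rfc2833_events.append(digit)
```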
9.2 Client-Remote UI 9.2 Client-Remote UI
When the user interfaces run remotely, the determination of focus can When the user interfaces run remotely, the determination of focus can
be much, much harder. There are many architectures that can be be much, much harder. There are many architectures that can be
deployed to handle the interaction. None are ideal. However, all deployed to handle the interaction. None are ideal. However, all
are beyond the scope of this specification. are beyond the scope of this specification.
skipping to change at page 30, line 41 skipping to change at page 30, line 24
Such interactions are best handled by markups which natively support Such interactions are best handled by markups which natively support
such interactions, such as SALT, and thus require no explicit support such interactions, such as SALT, and thus require no explicit support
from this framework. from this framework.
11. Example Call Flow 11. Example Call Flow
This section shows the operation of a call recording application. This section shows the operation of a call recording application.
This application allows a user to record the media in their call by This application allows a user to record the media in their call by
clicking on a button in a web form. The application uses a clicking on a button in a web form. The application uses a
presentation capable user interface component that is pushed to the presentation capable user interface component that is pushed to the
caller. caller. The conventions of [17] are used to represent long
message lines.
A Recording App B A Recording App B
|(1) INVITE | | |(1) INVITE | |
|----------------------->| | |----------------------->| |
| |(2) INVITE | | |(2) INVITE |
| |----------------------->| | |----------------------->|
| |(3) 200 OK | | |(3) 200 OK |
| |<-----------------------| | |<-----------------------|
|(4) 200 OK | | |(4) 200 OK | |
|<-----------------------| | |<-----------------------| |
skipping to change at page 31, line 39 skipping to change at page 31, line 39
|<-----------------------| | |<-----------------------| |
|(13) NOTIFY | | |(13) NOTIFY | |
|----------------------->| | |----------------------->| |
|(14) 200 OK | | |(14) 200 OK | |
|<-----------------------| | |<-----------------------| |
|(15) HTTP POST | | |(15) HTTP POST | |
|----------------------->| | |----------------------->| |
|(16) 200 OK | | |(16) 200 OK | |
|<-----------------------| | |<-----------------------| |
Figure 8 Figure 7
First, the caller, A, sends an INVITE to set up a call (message 1). First, the caller, A, sends an INVITE to set up a call (message 1).
Since the caller supports the framework, and can handle presentation Since the caller supports the framework, and can handle presentation
capable user interface components, it includes the Supported header capable user interface components, it includes the Supported header
field indicating that the GRUU extension and the REFER context field indicating that the GRUU extension and the Target-Dialog header
extension are understood, Allow indicating that REFER is understood, field are understood, Allow indicating that REFER is understood, and
and a Contact header field that includes the "schemes" header field a Contact header field that includes the "schemes" header field
parameter. parameter.
INVITE sips:B@example.com SIP/2.0 INVITE sips:B@example.com SIP/2.0
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz- From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com> To: Callee <sip:B@example.org>
Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com Call-ID: fa77as7dad8-sd98ajzz@host.example.com
CSeq: 1 INVITE CSeq: 1 INVITE
Max-Forwards: 70 Max-Forwards: 70
Supported: gruu, refer-context Supported: gruu, tdialog
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips" Accept: application/sdp, text/html
<allOneLine>
Contact: <sips:A@example.com;opaque=urn:uuid:f81d4f
ae-7dec-11d0-a765-00a0c91e6bf6;grid=99a>;schemes="http,sip,sips"
</allOneLine>
Content-Length: ... Content-Length: ...
Content-Type: application/sdp Content-Type: application/sdp
--SDP not shown-- --SDP not shown--
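To read the long Contact and Request-URI lines in these messages, a reader can unfold each <allOneLine> block into a single line. The helper below is a hypothetical sketch. It assumes, as the folded GRUU in message 1 suggests, that the enclosed lines are joined with no separator and that any leading whitespace on them is layout only.

```python
# Hypothetical helper for reading the <allOneLine> convention in Section 11.
from typing import List, Optional


def unfold_all_one_line(text: str) -> str:
    out: List[str] = []
    folded: Optional[List[str]] = None  # lines collected inside a block
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "<allOneLine>":
            folded = []
        elif stripped == "</allOneLine>" and folded is not None:
            out.append("".join(folded))  # join with no separator
            folded = None
        elif folded is not None:
            folded.append(stripped)  # assumption: indentation is layout only
        else:
            out.append(line)
    if folded is not None:  # unterminated block: keep what was collected
        out.append("".join(folded))
    return "\n".join(out)
```

Applied to message 1, the Contact header field unfolds to a single line ending in "grid=99a>;schemes="http,sip,sips"".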
The proxy acts as a recording server, and forwards the INVITE to the The proxy acts as a recording server, and forwards the INVITE to the
called party (message 2): called party (message 2). It omits the Record-Route it would
normally insert due to the presence of the GRUU in the INVITE:
INVITE sips:B@pc.example.com SIP/2.0 INVITE sips:B@pc.example.com SIP/2.0
Record-Route: <sips:app.example.com;lr>
Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz- From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com> To: Callee <sip:B@example.org>
Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com Call-ID: fa77as7dad8-sd98ajzz@host.example.com
CSeq: 1 INVITE CSeq: 1 INVITE
Max-Forwards: 69 Max-Forwards: 69
Supported: gruu, refer-context Supported: gruu, tdialog
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips" Accept: application/sdp, text/html
<allOneLine>
Contact: <sips:A@example.com;opaque=urn:uuid:f81d4f
ae-7dec-11d0-a765-00a0c91e6bf6;grid=99a>;schemes="http,sip,sips"
</allOneLine>
Content-Length: ... Content-Length: ...
Content-Type: application/sdp Content-Type: application/sdp
--SDP not shown-- --SDP not shown--
B accepts the call with a 200 OK (message 3). It does not support B accepts the call with a 200 OK (message 3). It does not support
the framework, and so the various header fields are not present. the framework, and so the various header fields are not present.
SIP/2.0 200 OK SIP/2.0 200 OK
Record-Route: <sips:app.example.com;lr>
Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz- From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777 To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com Call-ID: fa77as7dad8-sd98ajzz@host.example.com
CSeq: 1 INVITE CSeq: 1 INVITE
Contact: <sips:B@pc.example.com> Contact: <sips:B@pc.example.com>
Content-Length: ... Content-Length: ...
Content-Type: application/sdp Content-Type: application/sdp
--SDP not shown-- --SDP not shown--
This 200 OK is passed back to the caller (message 4): This 200 OK is passed back to the caller (message 4):
SIP/2.0 200 OK SIP/2.0 200 OK
Record-Route: <sips:app.example.com;lr> Record-Route: <sips:app.example.com;lr>
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz- From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777 To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com Call-ID: fa77as7dad8-sd98ajzz@host.example.com
CSeq: 1 INVITE CSeq: 1 INVITE
Contact: <sips:B@pc.example.com> Contact: <sips:B@pc.example.com>
Content-Length: ... Content-Length: ...
Content-Type: application/sdp Content-Type: application/sdp
--SDP not shown-- --SDP not shown--
The caller generates an ACK (message 5). The caller generates an ACK (message 5).
ACK sips:B@pc.example.com SIP/2.0 ACK sips:B@pc.example.com SIP/2.0
Route: <sips:app.example.com;lr> Route: <sips:app.example.com;lr>
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
From: Caller <sip:A@example.com>;tag=kkaz- From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777 To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com Call-ID: fa77as7dad8-sd98ajzz@host.example.com
CSeq: 1 ACK CSeq: 1 ACK
The ACK is forwarded to the called party (message 6). The ACK is forwarded to the called party (message 6).
ACK sips:B@pc.example.com SIP/2.0 ACK sips:B@pc.example.com SIP/2.0
Via: SIP/2.0/TLS app.example.com;branch=z9hG4bKh7s Via: SIP/2.0/TLS app.example.com;branch=z9hG4bKh7s
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
From: Caller <sip:A@example.com>;tag=kkaz- From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777 To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com Call-ID: fa77as7dad8-sd98ajzz@host.example.com
CSeq: 1 ACK CSeq: 1 ACK
Now, the application decides to push a user interface component to Now, the application decides to push a user interface component to
user A. So, it sends it a REFER request (message 7): user A. So, it sends it a REFER request (message 7):
REFER sips:bad998asd8asd0000a@example.com SIP/2.0 <allOneLine>
REFER sips:A@example.com;opaque=urn:uuid:f81d4f
ae-7dec-11d0-a765-00a0c91e6bf6;grid=99a SIP/2.0
</allOneLine>
Refer-To: https://app.example.com/script.pl Refer-To: https://app.example.com/script.pl
;context="kkaz-,7777,faif9ahhs9dd8==-sd98ajzz@host.example.com" Target-Dialog: fa77as7dad8-sd98ajzz@host.example.com
;remote-tag=7777;local-tag=kkaz-
Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6 Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
Max-Forwards: 70 Max-Forwards: 70
From: Recorder Application <sip:app.example.com>;tag=jhgf From: Recorder Application <sip:app.example.com>;tag=jhgf
To: Caller <sip:A@example.com> <allOneLine>
To: Caller <sips:A@example.com;opaque=urn:uuid:f81d4f
ae-7dec-11d0-a765-00a0c91e6bf6;grid=99a>
</allOneLine>
Require: tdialog
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
Call-ID: 66676776767@app.example.com Call-ID: 66676776767@app.example.com
CSeq: 1 REFER CSeq: 1 REFER
Event: refer Event: refer
Contact: <sips:app.example.com> Contact: <sips:app.example.com>
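The dialog-specific header fields of message 7 can be assembled from the identifiers the application observed while proxying messages 1 through 4. The sketch below is hypothetical: ObservedDialog and ui_refer_headers are illustrative names, and Via, From, To, Call-ID, CSeq, and Contact are omitted for brevity. The tag orientation is copied from the example, where local-tag carries the tag of the UA receiving the REFER.

```python
# Hypothetical sketch: header lines for a UI-pushing REFER (message 7).
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObservedDialog:
    call_id: str
    caller_tag: str  # From tag of the INVITE; the caller is the REFER target
    callee_tag: str  # To tag returned in the 200 OK


def ui_refer_headers(target_gruu: str, ui_url: str,
                     dlg: ObservedDialog) -> List[str]:
    # Orientation copied from the example: local-tag is the recipient's tag.
    target_dialog = (f"{dlg.call_id};remote-tag={dlg.callee_tag}"
                     f";local-tag={dlg.caller_tag}")
    return [
        f"REFER {target_gruu} SIP/2.0",
        f"Refer-To: {ui_url}",
        f"Target-Dialog: {target_dialog}",
        "Require: tdialog",  # a single Require header field is sufficient
        "Event: refer",
    ]
```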
The REFER is answered by a 200 OK (message 8). The REFER request is first handled by the application's own proxy, where the Request URI is resolved
to the registered contact of A, and then sent there. The REFER is
answered by a 200 OK (message 8).
SIP/2.0 200 OK SIP/2.0 200 OK
Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6 Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
From: Recorder Application <sip:app.example.com>;tag=jhgf From: Recorder Application <sip:app.example.com>;tag=jhgf
To: Caller <sip:A@example.com>;tag=pqoew To: Caller <sip:A@example.com>;tag=pqoew
Call-ID: 66676776767@app.example.com Call-ID: 66676776767@app.example.com
Supported: gruu, refer-context Supported: gruu, tdialog
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips" <allOneLine>
Contact: <sips:A@example.com;opaque=urn:uuid:f81d4f
ae-7dec-11d0-a765-00a0c91e6bf6;grid=99a>;schemes="http,sip,sips"
</allOneLine>
CSeq: 1 REFER CSeq: 1 REFER
User A sends a NOTIFY (message 9): User A sends a NOTIFY (message 9):
NOTIFY sips:app.example.com SIP/2.0 NOTIFY sips:app.example.com SIP/2.0
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
To: Recorder Application <sip:app.example.com>;tag=jhgf To: Recorder Application <sip:app.example.com>;tag=jhgf
From: Caller <sip:A@example.com>;tag=pqoew From: Caller <sip:A@example.com>;tag=pqoew
Call-ID: 66676776767@app.example.com Call-ID: 66676776767@app.example.com
CSeq: 1 NOTIFY CSeq: 1 NOTIFY
Max-Forwards: 70 Max-Forwards: 70
<allOneLine>
Contact: <sips:A@example.com;opaque=urn:uuid:f81d4f
ae-7dec-11d0-a765-00a0c91e6bf6;grid=99a>;schemes="http,sip,sips"
</allOneLine>
Event: refer;id=93809824 Event: refer;id=93809824
Subscription-State: active;expires=3600 Subscription-State: active;expires=3600
Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
Content-Type: message/sipfrag;version=2.0 Content-Type: message/sipfrag;version=2.0
Content-Length: 20 Content-Length: 20
SIP/2.0 100 Trying SIP/2.0 100 Trying
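The NOTIFY above declares Content-Length: 20 for a message/sipfrag body that holds only the status line "SIP/2.0 100 Trying". The arithmetic can be checked with the short sketch below. It assumes the fragment's status line is terminated with CRLF, as a start line is in a full SIP message. The function name is illustrative.

```python
def sipfrag_status_body(status_line: str) -> bytes:
    """Encode a message/sipfrag body that carries only a status line.

    Assumption: the fragment ends with CRLF, as a start line does in a full
    SIP message; Content-Length counts bytes, not characters.
    """
    return (status_line + "\r\n").encode("utf-8")


body = sipfrag_status_body("SIP/2.0 100 Trying")
print(f"Content-Length: {len(body)}")  # Content-Length: 20
```

The 18 characters of the status line plus the two-byte CRLF give the 20 bytes declared in the example.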
And the recording server responds with a 200 OK (message 10) And the recording server responds with a 200 OK (message 10)
SIP/2.0 200 OK SIP/2.0 200 OK
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
To: Recorder Application <sip:app.example.com>;tag=jhgf To: Recorder Application <sip:app.example.com>;tag=jhgf
From: Caller <sip:A@example.com>;tag=pqoew From: Caller <sip:A@example.com>;tag=pqoew
Call-ID: 66676776767@app.example.com Call-ID: 66676776767@app.example.com
CSeq: 1 NOTIFY CSeq: 1 NOTIFY
skipping to change at page 35, line 15 skipping to change at page 35, line 32
And the recording server responds with a 200 OK (message 10) And the recording server responds with a 200 OK (message 10)
SIP/2.0 200 OK SIP/2.0 200 OK
Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995 Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
To: Recorder Application <sip:app.example.com>;tag=jhgf To: Recorder Application <sip:app.example.com>;tag=jhgf
From: Caller <sip:A@example.com>;tag=pqoew From: Caller <sip:A@example.com>;tag=pqoew
Call-ID: 66676776767@app.example.com Call-ID: 66676776767@app.example.com
CSeq: 1 NOTIFY CSeq: 1 NOTIFY
The REFER request contained a "context" Refer-To header field The REFER request contained a Target-Dialog header field
parameter with a valid dialog identifier. Furthermore, all of the with a valid dialog identifier. Furthermore, all of the signaling
signaling was over TLS and the dialog identifiers contain sufficient was over TLS and the dialog identifiers contain sufficient
randomness. As such, the caller, A, automatically authorizes the randomness. As such, the caller, A, automatically authorizes the
application. It then acts on the Refer-To URI, fetching the script application. It then acts on the Refer-To URI, fetching the script
from app.example.com (message 11). The response, message 12, from app.example.com (message 11). The response, message 12,
contains a web application that the user can click on to enable contains a web application that the user can click on to enable
recording. Because the client executed the URL in the Refer-To, it recording. Because the client executed the URL in the Refer-To, it
generates another NOTIFY to the application, informing it of the generates another NOTIFY to the application, informing it of the
successful response (message 13). This is answered with a 200 OK successful response (message 13). This is answered with a 200 OK
(message 14). When the user clicks on the link (message 15), the (message 14). When the user clicks on the link (message 15), the
results are posted to the server, and an updated display is provided results are posted to the server, and an updated display is provided
(message 16). (message 16).
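The automatic authorization decision described above can be sketched as a single predicate. This is a hypothetical illustration, not a procedure defined by this specification. Dialog identifiers are modeled as (Call-ID, local tag, remote tag) tuples, assumed to be oriented the same way as the Target-Dialog values. The randomness condition is treated as a property of how the identifiers were generated, so it is not checked at run time.

```python
from typing import FrozenSet, Tuple

# Hypothetical dialog identifier: (Call-ID, local tag, remote tag), assumed to
# be oriented the same way as the values carried in Target-Dialog; the exact
# orientation rules are defined by the Target-Dialog specification [9].
DialogId = Tuple[str, str, str]


def auto_authorize_refer(
    target: DialogId,
    active_dialogs: FrozenSet[DialogId],
    signaling_over_tls: bool,
) -> bool:
    """Return True if a REFER may be acted on without prompting the user.

    Checks two of the conditions named in the text: the Target-Dialog value
    identifies a dialog this user agent is actually part of, and the
    signaling was protected by TLS, so an eavesdropper could not have copied
    the identifiers. The third condition, sufficient randomness in the
    identifiers, is assumed to have been met when they were generated.
    """
    return signaling_over_tls and target in active_dialogs


call = ("fa77as7dad8-sd98ajzz@host.example.com", "kkaz-", "7777")
dialogs = frozenset({call})
print(auto_authorize_refer(call, dialogs, signaling_over_tls=True))   # True
print(auto_authorize_refer(call, dialogs, signaling_over_tls=False))  # False
```

A REFER that fails this check is not necessarily rejected. It simply falls back to asking the user, or to some other form of authorization.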
skipping to change at page 36, line 7 skipping to change at page 36, line 20
requirements. Automatic authorization is granted if the application requirements. Automatic authorization is granted if the application
can prove that it is on the call path, or is trusted by an element on can prove that it is on the call path, or is trusted by an element on
the call path. As documented above, this can be accomplished by the the call path. As documented above, this can be accomplished by the
use of cryptographically random dialog identifiers and the usage of use of cryptographically random dialog identifiers and the usage of
sips for message confidentiality. It is RECOMMENDED that sips be sips for message confidentiality. It is RECOMMENDED that sips be
implemented by user agents compliant to this specification. This implemented by user agents compliant to this specification. This
does not represent a change from the requirements in RFC 3261. does not represent a change from the requirements in RFC 3261.
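The security argument relies on dialog identifiers that an attacker off the call path cannot guess. Below is a minimal sketch of generating tags and Call-IDs from a cryptographically secure source using Python's standard secrets module. The 128-bit size is an illustrative choice, not a value from this specification. RFC 3261 requires at least 32 bits of randomness for tags and recommends cryptographically random Call-IDs. The function names are hypothetical.

```python
import secrets


def new_tag(nbytes: int = 16) -> str:
    """Return a From/To tag drawn from a cryptographically secure source.

    token_urlsafe() emits only [A-Za-z0-9_-], all legal in a SIP token.
    16 bytes (128 bits) is an illustrative choice; RFC 3261 sets the
    floor at 32 bits of randomness for tags.
    """
    return secrets.token_urlsafe(nbytes)


def new_call_id(host: str, nbytes: int = 16) -> str:
    """Return a Call-ID of the form <random hex>@<host>."""
    return f"{secrets.token_hex(nbytes)}@{host}"


print(new_tag())
print(new_call_id("host.example.com"))
```

Using the secrets module rather than a general-purpose pseudo-random generator is the point of the example. Predictable identifiers would let an application that is not on the call path forge a valid Target-Dialog value.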
13. IANA Considerations 13. IANA Considerations
13.1 SIP Option Tag There are no IANA considerations associated with this specification.
This specification registers a new SIP option tag, as per the
guidelines in Section 27.1 of RFC 3261 [1].
Name: refer-context
Description: This option tag is used to identify the REFER extension
that defines the "context" parameter of the Refer-To header field.
13.2 Header Field Parameter
This specification defines a new header field parameter, as per the
registry created by [9]. The required information is as follows:
Header field in which the parameter can appear: Refer-To
Name of the Parameter: context
RFC Reference: RFC XXXX [[NOTE TO IANA: Please replace XXXX with the
RFC number of this specification.]]
14. Contributors 14. Contributors
This document was produced as a result of discussions amongst the This document was produced as a result of discussions amongst the
application interaction design team. All members of this team application interaction design team. All members of this team
contributed significantly to the ideas embodied in this document. contributed significantly to the ideas embodied in this document.
The members of this team were: The members of this team were:
Eric Burger Eric Burger
Cullen Jennings Cullen Jennings
skipping to change at page 37, line 8 skipping to change at page 36, line 44
The authors would like to thank Martin Dolly and Rohan Mahy for their The authors would like to thank Martin Dolly and Rohan Mahy for their
input and comments. Thanks to Allison Mankin for her support of this input and comments. Thanks to Allison Mankin for her support of this
work. work.
16. References 16. References
16.1 Normative References 16.1 Normative References
[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., [1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002. Session Initiation Protocol", RFC 3261, June 2002.
[2] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional [2] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional
Responses in Session Initiation Protocol (SIP)", RFC 3262, June Responses in Session Initiation Protocol (SIP)", RFC 3262,
2002. June 2002.
[3] Roach, A., "Session Initiation Protocol (SIP)-Specific Event [3] Roach, A., "Session Initiation Protocol (SIP)-Specific Event
Notification", RFC 3265, June 2002. Notification", RFC 3265, June 2002.
[4] McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D., [4] McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D.,
Carter, J., Ferrans, J. and A. Hunt, "Voice Extensible Markup Carter, J., Ferrans, J., and A. Hunt, "Voice Extensible Markup
Language (VoiceXML) Version 2.0", W3C CR CR-voicexml20-20030220, Language (VoiceXML) Version 2.0", W3C CR CR-voicexml20-
February 2003. 20030220, February 2003.
[5] Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Indicating User [5] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating
Agent Capabilities in the Session Initiation Protocol (SIP)", User Agent Capabilities in the Session Initiation Protocol
RFC 3840, August 2004. (SIP)", RFC 3840, August 2004.
[6] Sparks, R., "The Session Initiation Protocol (SIP) Refer [6] Sparks, R., "The Session Initiation Protocol (SIP) Refer
Method", RFC 3515, April 2003. Method", RFC 3515, April 2003.
[7] Burger, E., "A Session Initiation Protocol (SIP) Event Package [7] Burger, E., "A Session Initiation Protocol (SIP) Event Package
for Key Press Stimulus (KPML)", draft-ietf-sipping-kpml-07 for Key Press Stimulus (KPML)", draft-ietf-sipping-kpml-07
(work in progress), December 2004. (work in progress), December 2004.
[8] Rosenberg, J., "Obtaining and Using Globally Routable User Agent [8] Rosenberg, J., "Obtaining and Using Globally Routable User
(UA) URIs (GRUU) in the Session Initiation Protocol (SIP)", Agent (UA) URIs (GRUU) in the Session Initiation Protocol
draft-ietf-sip-gruu-02 (work in progress), July 2004. (SIP)", draft-ietf-sip-gruu-03 (work in progress),
February 2005.
[9] Camarillo, G., "The Internet Assigned Number Authority (IANA) [9] Rosenberg, J., "Request Authorization through Dialog
Identification in the Session Initiation Protocol (SIP)",
draft-ietf-sip-target-dialog-00 (work in progress), April 2005.
[10] Camarillo, G., "The Internet Assigned Number Authority (IANA)
Header Field Parameter Registry for the Session Initiation Header Field Parameter Registry for the Session Initiation
Protocol (SIP)", BCP 98, RFC 3968, December 2004. Protocol (SIP)", BCP 98, RFC 3968, December 2004.
16.2 Informative References 16.2 Informative References
[10] Peterson, J., "Enhancements for Authenticated Identity [11] Peterson, J. and C. Jennings, "Enhancements for Authenticated
Management in the Session Initiation Protocol (SIP)", Identity Management in the Session Initiation Protocol (SIP)",
draft-ietf-sip-identity-03 (work in progress), September 2004. draft-ietf-sip-identity-05 (work in progress), May 2005.
[11] Day, M., Rosenberg, J. and H. Sugano, "A Model for Presence and [12] Day, M., Rosenberg, J., and H. Sugano, "A Model for Presence
Instant Messaging", RFC 2778, February 2000. and Instant Messaging", RFC 2778, February 2000.
[12] Jennings, C., Peterson, J. and M. Watson, "Private Extensions [13] Jennings, C., Peterson, J., and M. Watson, "Private Extensions
to the Session Initiation Protocol (SIP) for Asserted Identity to the Session Initiation Protocol (SIP) for Asserted Identity
within Trusted Networks", RFC 3325, November 2002. within Trusted Networks", RFC 3325, November 2002.
[13] Rosenberg, J., "A Framework for Conferencing with the Session [14] Rosenberg, J., "A Framework for Conferencing with the Session
Initiation Protocol", Initiation Protocol",
draft-ietf-sipping-conferencing-framework-03 (work in draft-ietf-sipping-conferencing-framework-05 (work in
progress), October 2004. progress), May 2005.
[14] Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller [15] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
Preferences for the Session Initiation Protocol (SIP)", RFC Preferences for the Session Initiation Protocol (SIP)",
3841, August 2004. RFC 3841, August 2004.
[15] Rosenberg, J., "An INVITE Initiated Dialog Event Package for [16] Rosenberg, J., "An INVITE Initiated Dialog Event Package for
the Session Initiation Protocol (SIP)", the Session Initiation Protocol (SIP)",
draft-ietf-sipping-dialog-package-05 (work in progress), draft-ietf-sipping-dialog-package-06 (work in progress),
November 2004. April 2005.
[16] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, [17] Sparks, R., "Session Initiation Protocol Torture Test
"RTP: A Transport Protocol for Real-Time Applications", RFC Messages", draft-ietf-sipping-torture-tests-07 (work in
3550, July 2003. progress), May 2005.
[17] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits, [18] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications",
RFC 3550, July 2003.
[19] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits,
Telephony Tones and Telephony Signals", RFC 2833, May 2000. Telephony Tones and Telephony Signals", RFC 2833, May 2000.
[18] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with [20] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
Session Description Protocol (SDP)", RFC 3264, June 2002. Session Description Protocol (SDP)", RFC 3264, June 2002.
[19] Rosenberg, J., "A Session Initiation Protocol (SIP) Event [21] Rosenberg, J., "A Session Initiation Protocol (SIP) Event
Package for Registrations", RFC 3680, March 2004. Package for Registrations", RFC 3680, March 2004.
Author's Address Author's Address
Jonathan Rosenberg Jonathan Rosenberg
Cisco Systems Cisco Systems
600 Lanidex Plaza 600 Lanidex Plaza
Parsippany, NJ 07054 Parsippany, NJ 07054
US US
Phone: +1 973 952-5000 Phone: +1 973 952-5000
EMail: jdrosen@cisco.com Email: jdrosen@cisco.com
URI: http://www.jdrosen.net URI: http://www.jdrosen.net
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
 End of changes. 