draft-ietf-sipping-app-interaction-framework-00.txt   draft-ietf-sipping-app-interaction-framework-01.txt 
SIPPING J. Rosenberg SIPPING J. Rosenberg
Internet-Draft dynamicsoft Internet-Draft dynamicsoft
Expires: April 19, 2004 October 20, 2003 Expires: August 16, 2004 February 16, 2004
A Framework for Application Interaction in the Session Initiation A Framework for Application Interaction in the Session Initiation
Protocol (SIP) Protocol (SIP)
draft-ietf-sipping-app-interaction-framework-00 draft-ietf-sipping-app-interaction-framework-01
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 1, line 31 skipping to change at page 1, line 31
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at http:// The list of current Internet-Drafts can be accessed at http://
www.ietf.org/ietf/1id-abstracts.txt. www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 19, 2004. This Internet-Draft will expire on August 16, 2004.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved. Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract Abstract
This document describes a framework and requirements for the This document describes a framework for the interaction between users
interaction between users and Session Initiation Protocol (SIP) based and Session Initiation Protocol (SIP) based applications. By
applications. By interacting with applications, users can guide the interacting with applications, users can guide the way in which they
way in which they operate. The focus of this framework is stimulus operate. The focus of this framework is stimulus signaling, which
signaling, which allows a user agent to interact with an application allows a user agent to interact with an application without knowledge
without knowledge of the semantics of that application. Stimulus of the semantics of that application. Stimulus signaling can occur to
signaling can occur to a user interface running locally with the a user interface running locally with the client, or to a remote user
client, or to a remote user interface, through media streams. interface, through media streams. Stimulus signaling encompasses a
Stimulus signaling encompasses a wide range of mechanisms, ranging wide range of mechanisms, ranging from clicking on hyperlinks, to
from clicking on hyperlinks, to pressing buttons, to traditional Dual pressing buttons, to traditional Dual Tone Multi Frequency (DTMF)
Tone Multi Frequency (DTMF) input. In all cases, stimulus signaling input. In all cases, stimulus signaling is supported through the use
is supported through the use of markup languages, which play a key of markup languages, which play a key role in this framework.
role in this framework.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . 4
3. A Model for Application Interaction . . . . . . . . . . . . 7 3. A Model for Application Interaction . . . . . . . . . . . . 7
3.1 Functional vs. Stimulus . . . . . . . . . . . . . . . . . . 8 3.1 Functional vs. Stimulus . . . . . . . . . . . . . . . . . . 8
3.2 Real-Time vs. Non-Real Time . . . . . . . . . . . . . . . . 9 3.2 Real-Time vs. Non-Real Time . . . . . . . . . . . . . . . . 9
3.3 Client-Local vs. Client-Remote . . . . . . . . . . . . . . . 9 3.3 Client-Local vs. Client-Remote . . . . . . . . . . . . . . . 9
3.4 Presentation Capable vs. Presentation Free . . . . . . . . . 10 3.4 Presentation Capable vs. Presentation Free . . . . . . . . . 10
3.5 Interaction Scenarios on Telephones . . . . . . . . . . . . 11 3.5 Interaction Scenarios on Telephones . . . . . . . . . . . . 11
3.5.1 Client Remote . . . . . . . . . . . . . . . . . . . . . . . 11 3.5.1 Client Remote . . . . . . . . . . . . . . . . . . . . . . . 11
3.5.2 Client Local . . . . . . . . . . . . . . . . . . . . . . . . 11 3.5.2 Client Local . . . . . . . . . . . . . . . . . . . . . . . . 11
3.5.3 Flip-Flop . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.5.3 Flip-Flop . . . . . . . . . . . . . . . . . . . . . . . . . 12
4. Framework Overview . . . . . . . . . . . . . . . . . . . . . 13 4. Framework Overview . . . . . . . . . . . . . . . . . . . . . 13
5. Client Local Interfaces . . . . . . . . . . . . . . . . . . 16 5. Application Behavior . . . . . . . . . . . . . . . . . . . . 16
5.1 Discovering Capabilities . . . . . . . . . . . . . . . . . . 16 5.1 Client Local Interfaces . . . . . . . . . . . . . . . . . . 16
5.2 Pushing an Initial Interface Component . . . . . . . . . . . 16 5.1.1 Discovering Capabilities . . . . . . . . . . . . . . . . . . 16
5.3 Updating an Interface Component . . . . . . . . . . . . . . 18 5.1.2 Pushing an Initial Interface Component . . . . . . . . . . . 16
5.4 Terminating an Interface Component . . . . . . . . . . . . . 18 5.1.3 Updating an Interface Component . . . . . . . . . . . . . . 18
6. Client Remote Interfaces . . . . . . . . . . . . . . . . . . 19 5.1.4 Terminating an Interface Component . . . . . . . . . . . . . 18
6.1 Originating and Terminating Applications . . . . . . . . . . 19 5.2 Client Remote Interfaces . . . . . . . . . . . . . . . . . . 19
6.2 Intermediary Applications . . . . . . . . . . . . . . . . . 19 5.2.1 Originating and Terminating Applications . . . . . . . . . . 19
7. Inter-Application Feature Interaction . . . . . . . . . . . 21 5.2.2 Intermediary Applications . . . . . . . . . . . . . . . . . 19
7.1 Client Local UI . . . . . . . . . . . . . . . . . . . . . . 21 6. User Agent Behavior . . . . . . . . . . . . . . . . . . . . 21
7.2 Client-Remote UI . . . . . . . . . . . . . . . . . . . . . . 22 6.1 Advertising Capabilities . . . . . . . . . . . . . . . . . . 21
8. Intra Application Feature Interaction . . . . . . . . . . . 23 6.2 Receiving User Interface Components . . . . . . . . . . . . 21
9. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 24 6.3 Mapping User Input to User Interface Components . . . . . . 23
10. Security Considerations . . . . . . . . . . . . . . . . . . 25 6.4 Receiving Updates to User Interface Components . . . . . . . 23
11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 6.5 Terminating a User Interface Component . . . . . . . . . . . 23
Informative References . . . . . . . . . . . . . . . . . . . 27 7. Inter-Application Feature Interaction . . . . . . . . . . . 25
Author's Address . . . . . . . . . . . . . . . . . . . . . . 28 7.1 Client Local UI . . . . . . . . . . . . . . . . . . . . . . 25
Intellectual Property and Copyright Statements . . . . . . . 29 7.2 Client-Remote UI . . . . . . . . . . . . . . . . . . . . . . 26
8. Intra Application Feature Interaction . . . . . . . . . . . 27
9. Example Call Flow . . . . . . . . . . . . . . . . . . . . . 28
10. Security Considerations . . . . . . . . . . . . . . . . . . 33
11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 34
Normative References . . . . . . . . . . . . . . . . . . . . 35
Informative References . . . . . . . . . . . . . . . . . . . 36
Author's Address . . . . . . . . . . . . . . . . . . . . . . 36
Intellectual Property and Copyright Statements . . . . . . . 37
1. Introduction 1. Introduction
The Session Initiation Protocol (SIP) [1] provides the ability for The Session Initiation Protocol (SIP) [1] provides the ability for
users to initiate, manage, and terminate communications sessions. users to initiate, manage, and terminate communications sessions.
Frequently, these sessions will involve a SIP application. A SIP Frequently, these sessions will involve a SIP application. A SIP
application is defined as a program running on a SIP-based element application is defined as a program running on a SIP-based element
(such as a proxy or user agent) that provides some value-added (such as a proxy or user agent) that provides some value-added
function to a user or system administrator. Examples of SIP function to a user or system administrator. Examples of SIP
applications include pre-paid calling card calls, conferencing, and applications include pre-paid calling card calls, conferencing, and
presence-based [3] call routing. presence-based [10] call routing.
In order for most applications to properly function, they need input In order for most applications to properly function, they need input
from the user to guide their operation. As an example, a pre-paid from the user to guide their operation. As an example, a pre-paid
calling card application requires the user to input their calling calling card application requires the user to input their calling
card number, their PIN code, and the destination number they wish to card number, their PIN code, and the destination number they wish to
reach. The process by which a user provides input to an application reach. The process by which a user provides input to an application
is called "application interaction". is called "application interaction".
Application interaction can be either functional or stimulus. Application interaction can be either functional or stimulus.
Functional interaction requires the user agent to understand the Functional interaction requires the user agent to understand the
skipping to change at page 4, line 11 skipping to change at page 4, line 11
user interfaces and applications can be distributed throughout a user interfaces and applications can be distributed throughout a
network. This model is then used to describe how applications can network. This model is then used to describe how applications can
instantiate and manage user interfaces. instantiate and manage user interfaces.
2. Definitions 2. Definitions
SIP Application: A SIP application is defined as a program running on SIP Application: A SIP application is defined as a program running on
a SIP-based element (such as a proxy or user agent) that provides a SIP-based element (such as a proxy or user agent) that provides
some value-added function to a user or system administrator. some value-added function to a user or system administrator.
Examples of SIP applications include pre-paid calling card calls, Examples of SIP applications include pre-paid calling card calls,
conferencing, and presence-based [3] call routing. conferencing, and presence-based [10] call routing.
Application Interaction: The process by which a user provides input Application Interaction: The process by which a user provides input
to an application. to an application.
Real-Time Application Interaction: Application interaction that takes Real-Time Application Interaction: Application interaction that takes
place while an application instance is executing. For example, place while an application instance is executing. For example,
when a user enters their PIN number into a pre-paid calling card when a user enters their PIN number into a pre-paid calling card
application, this is real-time application interaction. application, this is real-time application interaction.
Non-Real Time Application Interaction: Application interaction that Non-Real Time Application Interaction: Application interaction that
takes place asynchronously with the execution of the application. takes place asynchronously with the execution of the application.
Generally, non-real time application interaction is accomplished Generally, non-real time application interaction is accomplished
through provisioning. through provisioning.
Functional Application Interaction: Application interaction is Functional Application Interaction: Application interaction is
functional when the user device has an understanding of the functional when the user device has an understanding of the
semantics of the application that the user is interacting with. semantics of the interaction with the application.
Stimulus Application Interaction: Application interaction is Stimulus Application Interaction: Application interaction is
considered to be stimulus when the user device has no considered to be stimulus when the user device has no
understanding of the semantics of the application that the user is understanding of the semantics of the interaction with the
interacting with. application.
User Interface (UI): The user interface provides the user with User Interface (UI): The user interface provides the user with
context in order to make decisions about what they want. The user context in order to make decisions about what they want. The user
enters information into the user interface. The user interface enters information into the user interface. The user interface
interprets the information, and passes it to the application. interprets the information, and passes it to the application.
User Interface Component: A piece of user interface which operates User Interface Component: A piece of user interface which operates
independently of other pieces of the user interface. For example, independently of other pieces of the user interface. For example,
a user might have two separate web interfaces to a pre-paid a user might have two separate web interfaces to a pre-paid
calling card application - one for hanging up and making another calling card application - one for hanging up and making another
skipping to change at page 5, line 14 skipping to change at page 5, line 14
User Input: The "raw" information passed from a user to a user User Input: The "raw" information passed from a user to a user
interface. Examples of user input include a spoken word or a click interface. Examples of user input include a spoken word or a click
on a hyperlink. on a hyperlink.
Client-Local User Interface: A user interface which is co-resident Client-Local User Interface: A user interface which is co-resident
with the user device. with the user device.
Client Remote User Interface: A user interface which executes Client Remote User Interface: A user interface which executes
remotely from the user device. In this case, a standardized remotely from the user device. In this case, a standardized
interface is needed between them. Typically, this is done through interface is needed between the user device and the user
media sessions - audio, video, or application sharing. interface. Typically, this is done through media sessions - audio,
video, or application sharing.
Media Interaction: A means of separating a user and a user interface Media Interaction: A means of separating a user and a user interface
by connecting them with media streams. by connecting them with media streams.
Interactive Voice Response (IVR): An IVR is a type of user interface Interactive Voice Response (IVR): An IVR is a type of user interface
that allows users to speak commands to the application, and hear that allows users to speak commands to the application, and hear
responses to those commands prompting for more information. responses to those commands prompting for more information.
Prompt-and-Collect: The basic primitive of an IVR user interface. The Prompt-and-Collect: The basic primitive of an IVR user interface. The
user is presented with a voice option, and the user speaks their user is presented with a voice option, and the user speaks their
skipping to change at page 5, line 38 skipping to change at page 5, line 39
Barge-In: In an IVR user interface, a user is prompted to enter some Barge-In: In an IVR user interface, a user is prompted to enter some
information. With some prompts, the user may enter the requested information. With some prompts, the user may enter the requested
information before the prompt completes. In that case, the prompt information before the prompt completes. In that case, the prompt
ceases. The act of entering the information before completion of ceases. The act of entering the information before completion of
the prompt is referred to as barge-in. the prompt is referred to as barge-in.
Focus: A user interface component has focus when user input is Focus: A user interface component has focus when user input is
fed to it, as opposed to any other user interface fed to it, as opposed to any other user interface
components. This is not to be confused with the term focus within components. This is not to be confused with the term focus within
the SIP conferencing framework, which refers to the center user the SIP conferencing framework, which refers to the center user
agent in a conference [4]. agent in a conference [12].
Focus Determination: The process by which the user device determines Focus Determination: The process by which the user device determines
which user interface component will receive the user input. which user interface component will receive the user input.
Focusless User Interface: A user interface which has no ability to Focusless User Interface: A user interface which has no ability to
perform focus determination. An example of a focusless user perform focus determination. An example of a focusless user
interface is a keypad on a telephone. interface is a keypad on a telephone.
Presentation Capable UI: A user interface which can prompt the user Presentation Capable UI: A user interface which can prompt the user
with input, collect results, and then prompt the user with new with input, collect results, and then prompt the user with new
skipping to change at page 9, line 7 skipping to change at page 9, line 7
user interface is generic, totally ignorant of the details of the user interface is generic, totally ignorant of the details of the
application. Indeed, the application may pass instructions to the application. Indeed, the application may pass instructions to the
user interface describing how it should operate. The user interface user interface describing how it should operate. The user interface
translates user input into "stimulus" - which are data understood translates user input into "stimulus" - which are data understood
only by the application, and not by the user interface. Because they only by the application, and not by the user interface. Because they
are generic, and because they require communications with the are generic, and because they require communications with the
application in order to change the way in which they render application in order to change the way in which they render
information to the user, stimulus user interfaces are usually slower, information to the user, stimulus user interfaces are usually slower,
less user friendly, and less responsive than a functional less user friendly, and less responsive than a functional
counterpart. However, they allow for substantial innovation in counterpart. However, they allow for substantial innovation in
applications, since no standardization activity is needed to built a applications, since no standardization activity is needed to build a
new application, as long as it can interact with the user within the new application, as long as it can interact with the user within the
confines of the user interface mechanism. The web is an example of a confines of the user interface mechanism. The web is an example of a
stimulus user interface to applications. stimulus user interface to applications.
In SIP systems, functional interfaces are provided by extending the In SIP systems, functional interfaces are provided by extending the
SIP protocol to provide the needed functionality. For example, the SIP protocol to provide the needed functionality. For example, the
SIP caller preferences specification [5] provides a functional SIP caller preferences specification [13] provides a functional
interface that allows a user to request applications to route the interface that allows a user to request applications to route the
call to specific types of user agents. Functional interfaces are call to specific types of user agents. Functional interfaces are
important, but are not the subject of this framework. The primary important, but are not the subject of this framework. The primary
goal of this framework is to address the role of stimulus interfaces goal of this framework is to address the role of stimulus interfaces
to SIP applications. to SIP applications.
3.2 Real-Time vs. Non-Real Time 3.2 Real-Time vs. Non-Real Time
Application interaction systems can also be real-time or Application interaction systems can also be real-time or
non-real-time. Non-real-time interaction allows the user to enter non-real-time. Non-real-time interaction allows the user to enter
skipping to change at page 9, line 46 skipping to change at page 9, line 46
user interface), or the user interface runs in a host separated from user interface), or the user interface runs in a host separated from
the client (which we refer to as a client-remote user interface). In the client (which we refer to as a client-remote user interface). In
a client-remote user interface, there exists some kind of protocol a client-remote user interface, there exists some kind of protocol
between the client device and the UI that allows the client to between the client device and the UI that allows the client to
interact with the user interface over a network. interact with the user interface over a network.
The most important way to separate the UI and the client device is The most important way to separate the UI and the client device is
through media interaction. In media interaction, the interface through media interaction. In media interaction, the interface
between the user and the user interface is through media - audio, between the user and the user interface is through media - audio,
video, messaging, and so on. This is the classic mode of operation video, messaging, and so on. This is the classic mode of operation
for VoiceXML [2], where the user interface (also referred to as the for VoiceXML [3], where the user interface (also referred to as the
voice browser) runs on a platform in the network. Users communicate voice browser) runs on a platform in the network. Users communicate
with the voice browser through the telephone network (or using a SIP with the voice browser through the telephone network (or using a SIP
session). The voice browser interacts with the application using HTTP session). The voice browser interacts with the application using HTTP
to convey the information collected from the user. to convey the information collected from the user.
We refer to the second sub-case as a client-local user interface. In In the case of a client-local user interface, the user interface runs
this case, the user interface runs co-located with the user. The co-located with the user device. The interface between them is
interface between them is through the software that interprets the through the software that interprets the user's input and passes it
user's input and passes it to the user interface. The classic to the user interface. The classic example of this is the web. In the
example of this is the web. In the web, the user interface is a web web, the user interface is a web browser, and the interface is
browser, and the interface is defined by the HTML document that it's defined by the HTML document that it's rendering. The user interacts
rendering. The user interacts directly with the user interface directly with the user interface running in the browser. The results
running in the browser. The results of that user interface are sent of that user interface are sent to the application (running on the
to the application (running on the web server) using HTTP. web server) using HTTP.
It is important to note that whether or not the user interface is It is important to note that whether or not the user interface is
local, or remote (in the case of media interaction), is not a local, or remote (in the case of media interaction), is not a
property of the modality of the interface, but rather a property of property of the modality of the interface, but rather a property of
the system. As an example, it is possible for a web-based user the system. As an example, it is possible for a web-based user
interface to be provided with a client-remote user interface. In such interface to be provided with a client-remote user interface. In such
a scenario, video and application sharing media sessions can be used a scenario, video and application sharing media sessions can be used
between the user and the user interface. The user interface, still between the user and the user interface. The user interface, still
guided by HTML, now runs "in the network", remote from the client. guided by HTML, now runs "in the network", remote from the client.
Similarly, a VoiceXML document can be interpreted locally by a client Similarly, a VoiceXML document can be interpreted locally by a client
skipping to change at page 11, line 17 skipping to change at page 11, line 17
This same model can apply to a telephone. In a traditional telephone, This same model can apply to a telephone. In a traditional telephone,
the user interface consists of a 12-key keypad, a speaker, and a the user interface consists of a 12-key keypad, a speaker, and a
microphone. Indeed, from here forward, the term "telephone" is used microphone. Indeed, from here forward, the term "telephone" is used
to represent any device that meets, at a minimum, the characteristics to represent any device that meets, at a minimum, the characteristics
described in the previous sentence. Circuit-switched telephony described in the previous sentence. Circuit-switched telephony
applications are almost universally client-remote user interfaces. In applications are almost universally client-remote user interfaces. In
the Public Switched Telephone Network (PSTN), there is usually a the Public Switched Telephone Network (PSTN), there is usually a
circuit interface between the user and the user interface. The user circuit interface between the user and the user interface. The user
input from the keypad is conveyed using Dual-Tone Multi-Frequency input from the keypad is conveyed using Dual-Tone Multi-Frequency
(DTMF), and the microphone input as PCM encoded voice. (DTMF), and the microphone input as Pulse Code Modulated (PCM)
encoded voice.
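As an illustration of how keypad input becomes in-band audio, the following minimal sketch maps each key of a 12-key keypad to its standard DTMF row and column frequencies and synthesizes linear PCM samples for it. The function names, the 8000 Hz sample rate, the 100 ms duration, and the amplitude are assumptions chosen for the example; they do not come from this document.

```python
import math

# Standard DTMF row (low group) and column (high group) frequencies in Hz
# for a 12-key keypad.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_frequencies(key):
    """Return the (low, high) tone pair for a single keypad key."""
    if len(key) != 1:
        raise ValueError("expected exactly one key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col >= 0:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError("not a keypad key: %r" % key)


def dtmf_samples(key, duration_ms=100, sample_rate=8000):
    """Synthesize signed 16-bit linear PCM samples for one key press.

    The sample rate, duration, and amplitude are illustrative defaults.
    """
    low, high = dtmf_frequencies(key)
    count = sample_rate * duration_ms // 1000
    samples = []
    for n in range(count):
        t = n / sample_rate
        # Sum the two tones; 0.4 per tone keeps the peak below full scale.
        value = 0.4 * (math.sin(2 * math.pi * low * t)
                       + math.sin(2 * math.pi * high * t))
        samples.append(int(32767 * value))
    return samples
```

A real telephone or gateway would additionally compress these samples (for example with a companded codec) before sending them; that step is omitted here.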
In an IP-based system, there is more variability in how the system In an IP-based system, there is more variability in how the system
can be instantiated. Both client-remote and client-local user can be instantiated. Both client-remote and client-local user
interfaces to a telephone can be provided. interfaces to a telephone can be provided.
In this framework, a PSTN gateway can be considered a "user proxy". In this framework, a PSTN gateway can be considered a "user proxy".
It is a proxy for the user because it can provide, to a user It is a proxy for the user because it can provide, to a user
interface on an IP network, input taken from a user on a circuit interface on an IP network, input taken from a user on a circuit
switched telephone. The gateway may be able to run a client-local switched telephone. The gateway may be able to run a client-local
user interface, just as an IP telephone might. user interface, just as an IP telephone might.
3.5.1 Client Remote 3.5.1 Client Remote
The most obvious instantiation is the "classic" circuit-switched The most obvious instantiation is the "classic" circuit-switched
telephony model. In that model, the user interface runs remotely from telephony model. In that model, the user interface runs remotely from
the client. The interface between the user and the user interface is the client. The interface between the user and the user interface is
through media, set up by SIP and carried over the Real Time Transport through media, set up by SIP and carried over the Real Time Transport
Protocol (RTP) [7]. The microphone input can be carried using any Protocol (RTP) [14]. The microphone input can be carried using any
suitable voice encoding algorithm. The keypad input can be conveyed suitable voice encoding algorithm. The keypad input can be conveyed
in one of two ways. The first is to convert the keypad input to DTMF, in one of two ways. The first is to convert the keypad input to DTMF,
and then convey that DTMF using a suitable encoding algorithm for it and then convey that DTMF using a suitable encoding algorithm for it
(such as PCMU). An alternative, and generally the preferred approach, (such as PCMU). An alternative, and generally the preferred approach,
is to transmit the keypad input using RFC 2833 [8], which provides an is to transmit the keypad input using RFC 2833 [15], which provides
encoding mechanism for carrying keypad input within RTP. an encoding mechanism for carrying keypad input within RTP.
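To make the second option concrete, the sketch below packs and unpacks the four-byte telephone-event payload defined by RFC 2833: an 8-bit event code, an end bit, a reserved bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. Only the payload is shown; the RTP header, payload type negotiation, and retransmission of the final packet are left out. The function names and default values are hypothetical and chosen for illustration.

```python
import struct

# RFC 2833 event codes for DTMF: digits 0-9 map to 0-9, then '*', '#', A-D.
EVENT_CODES = {key: code for code, key in enumerate("0123456789*#ABCD")}
EVENT_KEYS = {code: key for key, code in EVENT_CODES.items()}


def encode_telephone_event(key, end=False, volume=10, duration=160):
    """Pack one telephone-event payload (4 bytes, network byte order).

    volume is the tone power in -dBm0 (0..63); duration is in RTP
    timestamp units. The defaults are illustrative assumptions.
    """
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second byte: E bit (bit 7), reserved bit left as 0, 6-bit volume.
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration)


def decode_telephone_event(payload):
    """Unpack a payload into (key, end, volume, duration)."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return EVENT_KEYS[event], bool(flags & 0x80), flags & 0x3F, duration
```

Because the key is carried as a named event rather than as audio, the receiver does not need to run tone detection on the media stream, which is one reason this approach is generally preferred.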
In this classic model, the user interface would run on a server in In this classic model, the user interface would run on a server in
the IP network. It would perform speech recognition and DTMF the IP network. It would perform speech recognition and DTMF
recognition to derive the user intent, feed them through the user recognition to derive the user intent, feed them through the user
interface, and provide the result to an application. interface, and provide the result to an application.
3.5.2 Client Local 3.5.2 Client Local
An alternative model is for the entire user interface to reside on An alternative model is for the entire user interface to reside on
the telephone. The user interface can be a VoiceXML browser, running the telephone. The user interface can be a VoiceXML browser, running
skipping to change at page 12, line 33 skipping to change at page 12, line 34
kick off the real interaction. Once the trigger is received, the kick off the real interaction. Once the trigger is received, the
application connects the user to a client-remote user interface that application connects the user to a client-remote user interface that
can play announcements, collect more information, and so on. can play announcements, collect more information, and so on.
The benefit of flip-flopping between a client-local and client-remote The benefit of flip-flopping between a client-local and client-remote
user interface is cost. The client-local user interface will user interface is cost. The client-local user interface will
eliminate the need to send media streams into the network just to eliminate the need to send media streams into the network just to
wait for the user to press the pound key on the keypad. wait for the user to press the pound key on the keypad.
The Keypad Markup Language (KPML) was designed to support exactly The Keypad Markup Language (KPML) was designed to support exactly
this kind of need [10]. It models the keypad on a phone, and allows this kind of need [6]. It models the keypad on a phone, and allows an
an application to be informed when any sequence of keys has been application to be informed when any sequence of keys has been
pressed. However, KPML has no presentation component. Since user pressed. However, KPML has no presentation component. Since user
interfaces generally require a response to user input, the interfaces generally require a response to user input, the
presentation will need to be done using a client-remote user presentation will need to be done using a client-remote user
interface that gets instantiated as a result of the trigger. interface that gets instantiated as a result of the trigger.
It is tempting to use a hybrid model, where a prompt-and-collect It is tempting to use a hybrid model, where a prompt-and-collect
application is implemented by using a client-remote user interface application is implemented by using a client-remote user interface
that plays the prompts, and a client-local user interface, described that plays the prompts, and a client-local user interface, described
by KPML, that collects digits. However, this only complicates the by KPML, that collects digits. However, this only complicates the
application. Firstly, the keypad input will be sent to both the media application. Firstly, the keypad input will be sent to both the media
skipping to change at page 14, line 9 skipping to change at page 14, line 9
agents in a dialog (for a client-local user interface), or within a agents in a dialog (for a client-local user interface), or within a
network element (for a client-remote user interface). If a network element (for a client-remote user interface). If a
client-local user interface is to be used, the application needs to client-local user interface is to be used, the application needs to
determine whether or not the user agent is capable of supporting a determine whether or not the user agent is capable of supporting a
client-local user interface, and in what format. In this framework, client-local user interface, and in what format. In this framework,
all client-local user interface components are described by a markup all client-local user interface components are described by a markup
language. A markup language describes a logical flow of presentation language. A markup language describes a logical flow of presentation
of information to the user, collection of information from the user, of information to the user, collection of information from the user,
and transmission of that information to an application. Examples of and transmission of that information to an application. Examples of
markup languages include HTML, WML, VoiceXML, and the Keypad Markup markup languages include HTML, WML, VoiceXML, and the Keypad Markup
Language (KPML) [10]. Language (KPML) [6].
Unlike an application instance, which has very flexible lifetimes, a Unlike an application instance, which has very flexible lifetimes, a
user interface component has a very fixed lifetime. A user interface user interface component has a very fixed lifetime. A user interface
component is always associated with a dialog. The user interface component is always associated with a dialog. The user interface
component can be created at any point after the dialog (or early component can be created at any point after the dialog (or early
dialog) is created. However, the user interface component terminates dialog) is created. However, the user interface component terminates
when the dialog terminates. The user interface component can be when the dialog terminates. The user interface component can be
terminated earlier by the user agent, and possibly by the terminated earlier by the user agent, and possibly by the
application, but its lifetime never exceeds that of its associated application, but its lifetime never exceeds that of its associated
dialog. dialog.
There are two ways to create a client local interface component. For There are two ways to create a client local interface component. For
interface components that are presentation capable, the application interface components that are presentation capable, the application
sends a REFER [9] request to the user agent. The Refer-To header sends a REFER [5] request to the user agent. The Refer-To header
field contains an HTTP URI that points to the markup for the user field contains an HTTP URI that points to the markup for the user
interface. For interface components that are presentation free (such interface. For interface components that are presentation free (such
as those defined by KPML), the application sends a SUBSCRIBE request as those defined by KPML), the application sends a SUBSCRIBE request
to the user agent. The body of the SUBSCRIBE request contains a to the user agent. The body of the SUBSCRIBE request contains a
filter, which, in this case, is the markup that defines when filter, which, in this case, is the markup that defines when
information is to be sent to the application in a NOTIFY. information is to be sent to the application in a NOTIFY.
If a user interface component is to be instantiated in the network, If a user interface component is to be instantiated in the network,
there is no need to determine the capabilities of the device on which there is no need to determine the capabilities of the device on which
the user interface is instantiated. Presumably, it is on a device on the user interface is instantiated. Presumably, it is on a device on
skipping to change at page 16, line 5 skipping to change at page 16, line 5
Indeed, for presentation free user interfaces, there are two Indeed, for presentation free user interfaces, there are two
different modalities of operation. The first is called "one shot". In different modalities of operation. The first is called "one shot". In
the one-shot role, the markup waits for a user to enter some the one-shot role, the markup waits for a user to enter some
information, and when they do, reports this event to the application. information, and when they do, reports this event to the application.
The application then does something, and the markup is no longer The application then does something, and the markup is no longer
used. In the other modality, called "monitor", the markup stays used. In the other modality, called "monitor", the markup stays
permanently resident, and reports information back to an application permanently resident, and reports information back to an application
until termination of the associated dialog. until termination of the associated dialog.
5. Client Local Interfaces 5. Application Behavior
The behavior of an application within this framework depends on
whether it seeks to use a client-local or client-remote user
interface.
5.1 Client Local Interfaces
One key component of this framework is support for client local user One key component of this framework is support for client local user
interfaces. interfaces.
5.1 Discovering Capabilities 5.1.1 Discovering Capabilities
A client local user interface can only be instantiated on a user A client local user interface can only be instantiated on a user
agent if the user agent supports that type of user interface agent if the user agent supports that type of user interface
component. Support for client local user interface components is component. Support for client local user interface components is
declared by both the UAC and a UAS in its Accept, Allow, Contact and declared by both the UAC and a UAS in its Accept, Allow, Contact and
Allow-Events header fields. If the Allow header field indicates Allow-Events header fields of dialog-initiating requests and
support for the SIP SUBSCRIBE method, and the Allow-Events header responses. If the Allow header field indicates support for the SIP
field indicates support for the [TBD] package, it means that the UA SUBSCRIBE method, and the Allow-Events header field indicates support
can instantiate presentation free user interface components. The for the kpml package [6], and the Contact header field indicates that
specific markup languages that can be supported are indicated in the its URI is a GRUU [9], it means that the UA can instantiate
Accept header field. If the Allow header field indicates support for presentation free user interface components. In this case, the
the SIP REFER method, and the Contact header field contains UA application MAY push presentation free user interface components
capabilities [6] that indicate support for the HTTP URI scheme, it according to the rules of Section 5.1.2. The specific markup
means that the UA supports presentation capable user interface languages that can be supported are indicated in the Accept header
components. The specific markups that are supported are indicated in field.
the Allow header field.
The Accept, Allow, Contact and Allow-Events header fields are sent in
dialog initiating requests and responses. As a result, an application
will generally need to wait for a dialog-initiating request or
response to pass by before it can examine the contents of these
headers and determine what kinds of user interface components the UA
supports. Because these headers are examined by intermediaries, a UA
that wishes to support client local user interfaces should not
encrypt them.
5.2 Pushing an Initial Interface Component If the Allow header field indicates support for the SIP REFER method,
and the Contact header field contains UA capabilities [4] that
indicate support for the HTTP URI scheme, it means that the UA
supports presentation capable user interface components. In this
case, the application MAY push presentation capable user interface
components to the client according to the rules of Section 5.1.2. The
specific markups that are supported are indicated in the Accept
header field.
Once the application has determined that the UA is capable of 5.1.2 Pushing an Initial Interface Component
supporting client local user interfaces, the next step is for the
application to push an interface component to the user device.
Generally, we anticipate that interface components will need to be Generally, we anticipate that interface components will need to be
created at various different points in a SIP session. Clearly, they created at various different points in a SIP session. Clearly, they
will need to be pushed during session setup, or after the session is will need to be pushed during session setup, or after the session is
established. A user interface component is always associated with a established. A user interface component is always associated with a
specific dialog, however. specific dialog, however.
To create a presentation capable UI component on the UA, the An application MUST NOT attempt to push a user interface component to
application sends a REFER request to the UA. This REFER is sent to a user agent until it has determined that the user agent has the
the Globally Routable UA URI (GRUU) [12] advertised by that UA in the necessary capabilities and a dialog has been created. In the case of
Contact header field of the dialog initiating request or response a UAC, this means that an application MUST NOT push a user interface
sent by that UA. This means that any UA which wants to support this component for an INVITE initiated dialog until the application has
framework has to support GRUUs. Note that this REFER request creates seen a 200 OK followed by an ACK. For SUBSCRIBE initiated dialogs, it
a separate dialog between the application and the UA. MUST NOT push a user interface component until the application has
seen a 200 OK to the NOTIFY request. For a user interface component
on a UAS, the application MUST NOT push a user interface component
for an INVITE initiated dialog until it has seen a 200 OK from the
UAS. For a SUBSCRIBE initiated dialog, it MUST NOT push a user
interface component until it has seen a NOTIFY request from the
notifier.
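
The timing rules above can be sketched as a small gate that an
application consults before pushing a component. This is only an
illustration: the event labels ("200-invite", "ack", "200-notify",
"notify") and the function name are hypothetical, and the framework
itself defines no such interface.

```python
# Hypothetical labels for the messages the application must observe,
# in order, before it may push a user interface component.
REQUIRED_EVENTS = {
    ("uac", "invite"): ["200-invite", "ack"],   # 200 OK followed by an ACK
    ("uac", "subscribe"): ["200-notify"],       # 200 OK to the NOTIFY
    ("uas", "invite"): ["200-invite"],          # 200 OK from the UAS
    ("uas", "subscribe"): ["notify"],           # NOTIFY from the notifier
}


def may_push(target, dialog_type, capable, observed):
    """Return True if a component may be pushed to the given side.

    target      -- "uac" or "uas": the UA that would host the component
    dialog_type -- "invite" or "subscribe"
    capable     -- True once the UA's capabilities have been determined
    observed    -- messages seen so far on the dialog, in arrival order
    """
    if not capable:
        return False
    remaining = iter(observed)
    # Each required message must appear, in order, among those observed.
    return all(event in remaining for event in REQUIRED_EVENTS[(target, dialog_type)])
```

For an INVITE initiated dialog with the component on the UAC, the
gate stays closed until both the 200 OK and the subsequent ACK have
been seen.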
OPEN ISSUE: This document has evolved into one that really is To create a presentation capable UI component on the UA, the
describing normative behavior. We could split the document in application sends a REFER request to the UA. This REFER MUST be sent
half, one of which is an informational framework, and the other is to the Globally Routable UA URI (GRUU) [9] advertised by that UA in
a standards track mechanism document. Or, we could have a single the Contact header field of the dialog initiating request or response
framework document that just happens to be standards track. sent by that UA. Note that this REFER request creates a separate
dialog between the application and the UA. The Refer-To header field
of the REFER request MUST contain an HTTP URI that references the
markup document to be fetched.
The Refer-To header field of the REFER request contains an HTTP URI OPEN ISSUE: The refer needs to provide a context to the UA, and in
that references the markup document to be fetched. The application particular, identify the specific dialog that this component is
should identify itself in the From header field of the request. Once associated with. There is no obvious candidate for this when REFER
the markup is fetched, the UA renders it and the user can interact is used. The former proposal, of using a grid, cannot work because
with it as needed. of forking.
To create a presentation free user interface component, the To create a presentation free user interface component, the
application sends a SUBSCRIBE request to the UA. The SUBSCRIBE is application sends a SUBSCRIBE request to the UA. The SUBSCRIBE MUST
sent to the GRUU advertised by the UA. Note that this SUBSCRIBE be sent to the GRUU advertised by the UA. This SUBSCRIBE request
request creates a separate dialog. The SUBSCRIBE request is for the creates a separate dialog. The SUBSCRIBE request MUST use the KPML
[TBD] event package. The body of the SUBSCRIBE request contains the [6] event package. The Event header field MUST contain parameters
markup document that defines the conditions under which the which identify the particular dialog that the interface component is
application wishes to be notified of user input. The application being instantiated against. The body of the SUBSCRIBE request
should identify itself in the From header field of the request. contains the markup document that defines the conditions under which
the application wishes to be notified of user input.
Since the UI components are bound to the lifetime of the dialog, the In both cases, the REFER or SUBSCRIBE request SHOULD include a
UA needs to know which dialog each component is associated with. To display name in the From header field which identifies the name of
make this determination, a UA MUST use a unique GRUU in the Contact the application. For example, a prepaid calling card might include a
header field of each dialog. This uniqueness is across dialogs From header field which looks like:
terminating at that UA. This uniqueness can be achieved by using the
grid URI parameter defined in [12].
OPEN ISSUE: This would require a UA to always use a unique GRUU in From: "Prepaid Calling Card" <sip:prepaid@example.com>
each dialog, since it doesn't know whether an application will try
to create a UI component. Is that OK?
To authenticate themselves, it is RECOMMENDED that applications use To authenticate themselves, it is RECOMMENDED that applications use
the SIP identity mechanism [11] in the REFER or SUBSCRIBE requests the SIP identity mechanism [7] in the REFER or SUBSCRIBE requests
they generate. A UA will need to authorize these subscriptions and they generate. This mechanism has the benefit that the signature is
refers. To do this, a UA SHOULD accept any REFER or SUBSCRIBE sent to over an authenticated identity body [8], which includes the From
the GRUU it used for that dialog. This would imply that only elements header field. As such, the client can obtain cryptographic assurances
privy to the INVITE requests and responses could send a REFER or about the service provider (the domain in the From header field)
SUBSCRIBE to the UA. The usage of the sips URI scheme provides along with the name of the application.
cryptographic assurances that only elements on the call setup path
could see such information. Therefore, it is RECOMMENDED that UAs
compliant to this specification use sips whenever possible. A client
SHOULD use grid parameters with sufficient randomness to eliminate
the possibility of an attacker guessing the GRUU.
5.3 Updating an Interface Component 5.1.3 Updating an Interface Component
Once a user interface component has been created on a client, it can Once a user interface component has been created on a client, it can
be updated. The means for updating it depends on the type of UI be updated. The means for updating it depends on the type of UI
component. component.
Presentation capable UI components are updated using techniques Presentation capable UI components are updated using techniques
already in place for those markups. In particular, user input will already in place for those markups. In particular, user input will
cause an HTTP POST operation to push the user input to the cause an HTTP POST operation to push the user input to the
application. The result of the POST operation is a new markup that application. The result of the POST operation is a new markup that
the UI is supposed to use. This allows the UI to be updated in response the UI is supposed to use. This allows the UI to be updated in response
to user action. Some markups, such as HTML, provide the ability to to user action. Some markups, such as HTML, provide the ability to
force a refresh after a certain period of time, so that the UI can be force a refresh after a certain period of time, so that the UI can be
updated without user input. Those mechanisms can be used here as updated without user input. Those mechanisms can be used here as
well. However, there is no support for an asynchronous push of an well. However, there is no support for an asynchronous push of an
updated UI component from the application to the user agent. A new updated UI component from the application to the user agent. A new
REFER request to the same GRUU would create a new UI component rather REFER request to the same GRUU would create a new UI component rather
than updating any components already in place. than updating any components already in place.
For presentation free UI, the story is different. The application can For presentation free UI, the story is different. The application MAY
update the filter at any time by generating a SUBSCRIBE refresh with update the filter at any time by generating a SUBSCRIBE refresh with
the new filter. The UA will immediately begin using this new filter. the new filter. The UA will immediately begin using this new filter.
5.4 Terminating an Interface Component 5.1.4 Terminating an Interface Component
User interface components have a well defined lifetime. They are User interface components have a well defined lifetime. They are
created when the component is first pushed to the client. User created when the component is first pushed to the client. User
interface components are always associated with the SIP dialog on interface components are always associated with the SIP dialog on
which they were pushed. As such, their lifetime is bound by the which they were pushed. As such, their lifetime is bound by the
lifetime of the dialog. When the dialog ends, so does the interface lifetime of the dialog. When the dialog ends, so does the interface
component. component.
However, there are some cases where the application would like to However, there are some cases where the application would like to
terminate the user interface component before its natural termination terminate the user interface component before its natural termination
point. For presentation capable user interfaces, this is not point. For presentation capable user interfaces, this is not
possible. For presentation free user interfaces, the application can possible. For presentation free user interfaces, the application MAY
terminate the component by sending a SUBSCRIBE with Expires equal to terminate the component by sending a SUBSCRIBE with Expires equal to
zero. This terminates the subscription, which removes the UI zero. This terminates the subscription, which removes the UI
component. component.
A client can remove a UI component at any time. For presentation A client can remove a UI component at any time. For presentation
aware UI, this is analogous to the user dismissing the web form aware UI, this is analogous to the user dismissing the web form
window. There is no mechanism provided for reporting this kind of window. There is no mechanism provided for reporting this kind of
event to the application. The application needs to be prepared to time event to the application. The application MUST be prepared to time
out, and never receive input from a user. For presentation free user out, and never receive input from a user. For presentation free user
interfaces, the UA can explicitly terminate the subscription. This interfaces, the UA can explicitly terminate the subscription. This
will result in the generation of a NOTIFY with a Subscription-State will result in the generation of a NOTIFY with a Subscription-State
header field equal to terminated. header field equal to "terminated".
6. Client Remote Interfaces 5.2 Client Remote Interfaces
As an alternative to, or in conjunction with client local user As an alternative to, or in conjunction with client local user
interfaces, an application can make use of client remote user interfaces, an application can make use of client remote user
interfaces. These user interfaces can execute co-resident with the interfaces. These user interfaces can execute co-resident with the
application itself (in which case no standardized interfaces between application itself (in which case no standardized interfaces between
the UI and the application need to be used), or it can run the UI and the application need to be used), or it can run
separately. This framework assumes that the user interface runs on a separately. This framework assumes that the user interface runs on a
host that has a sufficient trust relationship with the application. host that has a sufficient trust relationship with the application.
As such, the means for instantiating the user interface is not As such, the means for instantiating the user interface is not
considered here. considered here.
The primary issue is to connect the user device to the remote user The primary issue is to connect the user device to the remote user
interface. Doing so requires the manipulation of media streams interface. Doing so requires the manipulation of media streams
between the client and the user interface. Such manipulation can only between the client and the user interface. Such manipulation can only
be done by user agents. There are two types of user agent be done by user agents. There are two types of user agent
applications within this framework - originating/terminating applications within this framework - originating/terminating
applications, and intermediary applications. applications, and intermediary applications.
6.1 Originating and Terminating Applications 5.2.1 Originating and Terminating Applications
Originating and terminating applications are applications which are Originating and terminating applications are applications which are
themselves the originator or the final recipient of a SIP invitation. themselves the originator or the final recipient of a SIP invitation.
They are "pure" user agent applications - not back-to-back user They are "pure" user agent applications - not back-to-back user
agents. The classic example of such an application is an interactive agents. The classic example of such an application is an interactive
voice response (IVR) application, which is typically a terminating voice response (IVR) application, which is typically a terminating
application. It's a terminating application because the user application. It's a terminating application because the user
explicitly calls it; i.e., it is the actual called party. An example explicitly calls it; i.e., it is the actual called party. An example
of an originating application is a wakeup call application, which of an originating application is a wakeup call application, which
calls a user at a specified time in order to wake them up. calls a user at a specified time in order to wake them up.
Because originating and terminating applications are a natural Because originating and terminating applications are a natural
termination point of the dialog, manipulation of the media session by termination point of the dialog, manipulation of the media session by
the application is trivial. Traditional SIP techniques for adding and the application is trivial. Traditional SIP techniques for adding and
removing media streams, modifying codecs, and changing the address of removing media streams, modifying codecs, and changing the address of
the recipient of the media streams, can be applied. Similarly, the the recipient of the media streams, can be applied. Similarly, the
application can direclty authenticate itself to the user through S/ application can directly authenticate itself to the user through S/
MIME, since it is the peer UA in the dialog. MIME, since it is the peer UA in the dialog.
6.2 Intermediary Applications 5.2.2 Intermediary Applications
Intermediary application are, at the same time, more common than Intermediary applications are, at the same time, more common than
originating/terminating applications, and more complex. Intermediary originating/terminating applications, and more complex. Intermediary
applications are applications that are neither the actual caller nor applications are applications that are neither the actual caller nor
called party. Rather, they represent a "third party" that wishes to called party. Rather, they represent a "third party" that wishes to
interact with the user. The classic example is the ubiquitous interact with the user. The classic example is the ubiquitous
pre-paid calling card application. pre-paid calling card application.
In order for the intermediary application to add a client remote user In order for the intermediary application to add a client remote user
interface, it needs to manipulate the media streams of the user agent interface, it needs to manipulate the media streams of the user agent
to terminate on that user interface. This also introduces a to terminate on that user interface. This also introduces a
fundamental feature interaction issue. Since the intermediary fundamental feature interaction issue. Since the intermediary
application is not an actual participant in the call, how does the application is not an actual participant in the call, how does the
user interact with the intermediary application, and its actual peer user interact with the intermediary application, and its actual peer
in the dialog, at the same time? This is discussed in more detail in in the dialog, at the same time? This is discussed in more detail in
Section 7. Section 7.
6. User Agent Behavior
6.1 Advertising Capabilities
In order to participate in applications that make use of stimulus
interfaces, a user agent needs to advertise its interaction
capabilities.
If a user agent supports presentation capable user interfaces, it
MUST support the REFER method. It MUST include, in all dialog
initiating requests and responses, an Allow header field that
includes the REFER method. Furthermore, the UA MUST support the SIP
user agent capabilities specification [4]. The UA MUST be capable of
being REFER'd to an HTTP URI. It MUST include, in the Contact header
field of its dialog initiating requests and responses, a "schemes"
Contact header field parameter that includes the http URI scheme. The UA
MUST include, in all dialog initiating requests and responses, an
Accept header field listing all of those markups supported by the UA.
It is RECOMMENDED that all user agents that support presentation
capable user interfaces support HTML.
If a user agent supports presentation free user interfaces, it MUST
support the SUBSCRIBE [2] method. It MUST support the KPML [6] event
package. It MUST include, in all dialog initiating requests and
responses, an Allow header field that includes the SUBSCRIBE method.
It MUST include, in all dialog initiating requests and responses, an
Allow-Events header field that lists the KPML event package. The UA
MUST include, in all dialog initiating requests and responses, an
Accept header field listing those event filters it supports. At a
minimum, a UA MUST support the "application/kpml+xml" MIME type.
For either presentation free or presentation capable user interfaces,
the user agent MUST support the GRUU [9] specification. The Contact
header field in all dialog initiating requests and responses MUST
contain a GRUU. The UA MUST include a Supported header field which
contains the gruu option tag.
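
As a rough illustration of the advertising rules above, the
following sketch assembles the capability-related header fields for
a dialog initiating request or response, together with the
corresponding check an application might apply for presentation free
components. The function names, the baseline method list, the
"application/sdp" entry, the example GRUU, and the exact textual
form of the "schemes" Contact parameter are assumptions made for the
example; the normative syntax is given in [4], [6] and [9].

```python
def advertised_headers(gruu, markups=(), presentation_free=False):
    """Build the capability-related headers a UA would advertise.

    gruu              -- the UA's GRUU, placed in the Contact header field
    markups           -- MIME types of supported presentation capable markups
    presentation_free -- True if the UA supports KPML subscriptions
    """
    allow = ["INVITE", "ACK", "BYE", "CANCEL"]  # assumed baseline methods
    accept = ["application/sdp"]                 # assumed baseline body type
    contact = f"<{gruu}>"
    headers = {}

    if markups:
        allow.append("REFER")
        accept.extend(markups)
        contact += ';schemes="http"'  # parameter form is an assumption

    if presentation_free:
        allow.append("SUBSCRIBE")
        accept.append("application/kpml+xml")
        headers["Allow-Events"] = "kpml"

    if markups or presentation_free:
        headers["Supported"] = "gruu"

    headers["Allow"] = ", ".join(allow)
    headers["Accept"] = ", ".join(accept)
    headers["Contact"] = contact
    return headers


def can_push_presentation_free(headers):
    """Application-side check: SUBSCRIBE allowed and kpml package listed.

    The check that the Contact URI is a GRUU is omitted from this sketch.
    """
    methods = [m.strip() for m in headers.get("Allow", "").split(",")]
    events = [e.strip() for e in headers.get("Allow-Events", "").split(",")]
    return "SUBSCRIBE" in methods and "kpml" in events
```

A UA supporting both HTML and KPML might call
advertised_headers("sip:ua@example.com;gruu", ["text/html"], True),
where the URI is a placeholder.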
Because these headers are examined by proxies which may be executing
applications, a UA that wishes to support client local user
interfaces should not encrypt them.
6.2 Receiving User Interface Components
Once the UA has created a dialog (in either the early or confirmed
states), it MUST be prepared to receive a SUBSCRIBE or REFER request
against its GRUU. If the UA receives such a request prior to the
establishment of a dialog, the UA MUST reject the request.
A user agent SHOULD attempt to authenticate the sender of the
request. The sender will generally be an application, and therefore
the user agent is unlikely to ever have a shared secret with it,
making digest authentication useless. However, the REFER or SUBSCRIBE
request should have a SIP authenticated identity body [8] that
conveys the identity of the application [7]. If such a body is not
present, and no alternative means of identification (such as
P-Asserted-Identity [11]) is present, the user agent MAY reject the request
with a 403 response.
Next, the user agent authorizes the application. An application is
authorized to instantiate a user interface component if the
application was resident within an element on the path of the dialog
initiating request. An application proves to the user agent that it
was on the path by presenting it with the dialog identifiers in the
SUBSCRIBE or REFER request. In the case of SUBSCRIBE, those
identifiers are present in the Event header field [6]. [[EDITORS
NOTE: Fill in here once we know how this is done for REFER.]]
Because the dialog identifiers serve as a tool for authorization,
a user agent compliant to this framework MUST use dialog identifiers
that are cryptographically random, with at least 128 bits of
randomness. It is recommended that this randomness be split between
the Call-ID and From header field tag in the case of a UAC.
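
One way a UAC could meet this requirement is sketched below, with 64
random bits placed in the Call-ID and 64 in the From tag. The
function name and the "localid@host" shape of the Call-ID are
choices made for the example, not something this framework
mandates.

```python
import secrets


def new_dialog_identifiers(host):
    """Return (call_id, from_tag) carrying 128 random bits in total."""
    # secrets draws from the operating system's cryptographic random source.
    call_id = f"{secrets.token_hex(8)}@{host}"  # 8 bytes = 64 random bits
    from_tag = secrets.token_hex(8)             # another 64 random bits
    return call_id, from_tag
```

Each call yields fresh values, so two dialogs created by the same UA
do not share guessable identifiers.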
Furthermore, to ensure that only applications resident in on-path
elements can instantiate a user interface component, a user agent
compliant to this specification SHOULD use the sips URI scheme for
all dialogs it initiates. This will guarantee secure links between
all of the elements on the signaling path.
If an application does not present a valid dialog identifier in its
REFER or SUBSCRIBE request, the user agent MUST reject the request
with a 403 response. A user agent MAY apply any other policies in
addition to (but not instead of) the ones specified here in order to
authorize the creation of the user interface component. One such
mechanism would be to prompt the user, informing them of the identity
of the application. If an authorization policy requires user
interaction, the user agent SHOULD respond to the SUBSCRIBE or REFER
request with a 202. In the case of SUBSCRIBE, if authorization is not
granted, the user agent SHOULD generate a NOTIFY to terminate the
subscription. In the case of REFER, the user agent MUST NOT act upon
the URI in the Refer-To header field until user authorization has been
obtained.
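
The checks described in this section can be read as a short decision
procedure. The sketch below is one possible arrangement, not a
definitive implementation: the DialogId tuple, the outcome strings,
and the require_identity switch (modelling the MAY in the
authentication step) are all hypothetical.

```python
from typing import NamedTuple, Optional, Set


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def decide(presented: Optional[DialogId],
           active_dialogs: Set[DialogId],
           sender_identified: bool,
           needs_user_consent: bool,
           require_identity: bool = True) -> str:
    """Classify an incoming SUBSCRIBE or REFER sent to the UA's GRUU."""
    # No dialog (early or confirmed) exists yet: the request is rejected.
    if not active_dialogs:
        return "reject"
    # Authentication: an unidentified sender MAY be refused with a 403.
    if require_identity and not sender_identified:
        return "reject-403"
    # Authorization: the sender must present a valid dialog identifier.
    if presented is None or presented not in active_dialogs:
        return "reject-403"
    # Local policy that needs the user: answer 202 and decide later.
    if needs_user_consent:
        return "defer-202"
    return "accept"
```

A "defer-202" outcome leaves the final decision to the user; for
REFER, the Refer-To URI is not acted upon in the meantime.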
If a REFER request to an HTTP URI is authorized, the UA dereferences
the URI and fetches the content to be rendered to the user. This
instantiates a presentation capable user interface component. If a
SUBSCRIBE is authorized, a presentation free user interface
component is instantiated.
6.3 Mapping User Input to User Interface Components
Once the user interface components are instantiated, the user agent
must direct user input to the appropriate component. In the case of
presentation capable user interfaces, this process is known as focus
selection. It is done by means that are specific to the user
interface on the device. In the case of a PC, for example, the window
manager would allow the user to select the appropriate user interface
component that their input is directed to.
For presentation free user interfaces, the situation is more
complicated. In some cases, the device may support a mechanism that
allows the user to select a "line", and thus the associated dialog.
Any user input on the keypad while this line is selected is fed to
the user interface components associated with that dialog.
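A minimal sketch of the "line" selection mechanism described above
follows. The class and method names are hypothetical, and a user
interface component is modeled simply as a callable that receives a
key; a real device would tie this to its own dialog state.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class KeypadRouter:
    """Routes keypad input to the components of the selected dialog."""

    def __init__(self) -> None:
        # dialog identifier -> components attached to that dialog
        self.components: Dict[str, List[Callable[[str], None]]] = (
            defaultdict(list))
        self.selected: Optional[str] = None

    def attach(self, dialog_id: str,
               component: Callable[[str], None]) -> None:
        self.components[dialog_id].append(component)

    def select_line(self, dialog_id: str) -> None:
        self.selected = dialog_id

    def key_pressed(self, key: str) -> int:
        """Feed the key to every component of the selected dialog."""
        targets = self.components.get(self.selected, [])
        for component in targets:
            component(key)
        return len(targets)


received: List[str] = []
router = KeypadRouter()
router.attach("dialog-1", received.append)
router.attach("dialog-2", lambda key: None)
router.select_line("dialog-1")
delivered = router.key_pressed("5")
```

Input pressed while no line is selected reaches no component in this
sketch; a focusless device would instead have to deliver it to all of
them.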
TODO: Need to consider the case where the user interface is
co-resident with the UAC, but the user device is separated from
the UAC, and occurs through some other protocol, and the user
interface and application are semi-trusted. Classic case is when
the UAC is a PSTN gateway.
6.4 Receiving Updates to User Interface Components
For presentation capable user interfaces, updates to the user
interface occur in ways specific to that user interface component. In
the case of HTML, for example, the document can tell the client to
fetch a new document periodically. However, this framework does not
provide any additional machinery to asynchronously push a new user
interface component to the client.
For presentation free user interfaces, an application can push an
update to a component by sending a SUBSCRIBE refresh with a new
filter. The user agent will process these according to the rules of
the event package.
6.5 Terminating a User Interface Component
Termination of a presentation capable user interface component is a
trivial procedure. The user agent merely dismisses the window (or
equivalent). The fact that the component is dismissed is not
communicated to the application. As such, it is purely a local
matter.
In the case of a presentation free user interface, if the user wishes
to cease interacting with the application, the user agent SHOULD
generate a NOTIFY request with a Subscription-State equal to
"terminated" and a reason of "rejected". This tells the application
that the component has been removed, and that it should not attempt
to re-subscribe.
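For illustration, the header fields of such a terminating NOTIFY
could be assembled as follows. The helper name is hypothetical and
the "kpml" event package name is an assumption used only as sample
input; the remaining header fields of the NOTIFY are omitted.

```python
from typing import Dict


def terminating_notify_headers(event: str) -> Dict[str, str]:
    """Header fields that end a subscription without inviting a retry."""
    return {
        "Event": event,
        # "terminated" ends the subscription; the "rejected" reason
        # tells the subscriber not to re-subscribe.
        "Subscription-State": "terminated;reason=rejected",
    }


headers = terminating_notify_headers("kpml")
lines = [f"{name}: {value}" for name, value in headers.items()]
```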
7. Inter-Application Feature Interaction
The inter-application feature interaction problem is inherent to
stimulus signaling. Whenever there are multiple applications, there
are multiple user interfaces. When the user provides an input, to
which user interface is the input destined? That question is the
essence of the inter-application feature interaction problem.
Inter-application feature interaction is not an easy problem to
resolve. For now, we consider separately the issues for client-local
skipping to change at page 22, line 35 skipping to change at page 26, line 35
provide a better mechanism for feature interaction resolution than
the PSTN on devices which have the same user interface as they do on
the PSTN. Devices with better displays, such as PCs or screen phones,
can benefit from the capabilities of this framework, allowing the
user to determine which application they are interacting with.
Indeed, when a user provides input on a focusless device, the input
must be passed to all client local user interfaces, AND all client
remote user interfaces, unless the markup tells the UI to suppress
the media. In the case of KPML, key events are passed to remote user
interfaces by encoding them in RFC 2833 [15]. Of course, since a
client cannot determine if a media stream terminates in a remote user
interface or not, these key events are passed in all audio media
streams unless the "Q" digit is used to suppress.
7.2 Client-Remote UI
When the user interfaces run remotely, the determination of focus can
be much, much harder. There are many architectures that can be
deployed to handle the interaction. None are ideal. However, all are
beyond the scope of this specification.
skipping to change at page 24, line 5 skipping to change at page 28, line 5
such as those described by Speech Application Language Tags (SALT).
As an example, consider a user interface where a user can either
press a labeled button to make a selection, or listen to a prompt,
and speak the desired selection. Ideally, when the user presses the
button, the prompt should cease immediately, since both of them were
targeted at collecting the same information in parallel. Such
interactions are best handled by markups which natively support such
interactions, such as SALT, and thus require no explicit support from
this framework.
9. Example Call Flow
This section shows the operation of a call recording application.
This application allows a user to record the media in their call by
clicking on a button in a web form. The application uses a
presentation capable user interface component that is pushed to the
caller.
A Recording App B
|(1) INVITE | |
|----------------------->| |
| |(2) INVITE |
| |----------------------->|
| |(3) 200 OK |
| |<-----------------------|
|(4) 200 OK | |
|<-----------------------| |
|(5) ACK | |
|----------------------->| |
| |(6) ACK |
| |----------------------->|
|(7) REFER | |
|<-----------------------| |
|(8) 200 OK | |
|----------------------->| |
|(9) NOTIFY | |
|----------------------->| |
|(10) 200 OK | |
|<-----------------------| |
|(11) HTTP GET | |
|----------------------->| |
|(12) 200 OK | |
|<-----------------------| |
|(13) HTTP POST | |
|----------------------->| |
|(14) 200 OK | |
|<-----------------------| |
Figure 3
First, the caller, A, sends an INVITE to set up a call (message 1).
Since the caller supports the framework, and can handle presentation
capable user interface components, it includes a Supported header
field indicating that GRUU is understood, an Allow header field
indicating that REFER is understood, and a Contact header field that
includes the "schemes" header field parameter.
INVITE sip:B@example.com SIP/2.0
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>
Call-ID: faif9a@host.example.com
CSeq: 1 INVITE
Supported: gruu
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
Content-Length: ...
Content-Type: application/sdp
--SDP not shown--
The proxy acts as a recording server, and forwards the INVITE to the
called party (message 2):
INVITE sip:B@pc.example.com SIP/2.0
Record-Route: <sip:app.example.com;lr>
Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK97sh
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>
Call-ID: faif9a@host.example.com
CSeq: 1 INVITE
Supported: gruu
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
Content-Length: ...
Content-Type: application/sdp
--SDP not shown--
B accepts the call with a 200 OK (message 3). It does not support the
framework, and so the various header fields are not present.
SIP/2.0 200 OK
Record-Route: <sip:app.example.com;lr>
Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK97sh
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9a@host.example.com
CSeq: 1 INVITE
Contact: <sip:B@pc.example.com>
Content-Length: ...
Content-Type: application/sdp
--SDP not shown--
This 200 OK is passed back to the caller (message 4):
SIP/2.0 200 OK
Record-Route: <sip:app.example.com;lr>
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9a@host.example.com
CSeq: 1 INVITE
Contact: <sip:B@pc.example.com>
Content-Length: ...
Content-Type: application/sdp
--SDP not shown--
The caller generates an ACK (message 5).
ACK sip:B@pc.example.com SIP/2.0
Route: <sip:app.example.com;lr>
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz9
From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9a@host.example.com
CSeq: 1 ACK
The ACK is forwarded to the called party (message 6).
ACK sip:B@pc.example.com SIP/2.0
Via: SIP/2.0/UDP app.example.com;branch=z9hG4bKh7s
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz9
From: Caller <sip:A@example.com>;tag=kkaz-
To: Callee <sip:B@example.com>;tag=7777
Call-ID: faif9a@host.example.com
CSeq: 1 ACK
Now, the application decides to push a user interface component to
user A. So, it sends it a REFER request (message 7):
REFER sip:bad998asd8asd0000a@example.com SIP/2.0
Refer-To: http://app.example.com/script.pl
Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK9zh6
From: Recorder Application <sip:app.example.com>;tag=jhgf
To: Caller <sip:A@example.com>
Call-ID: 66676776767@app.example.com
CSeq: 1 REFER
Event: refer
Contact: <sip:app.example.com>
The REFER is answered by a 200 OK (message 8).
SIP/2.0 200 OK
Refer-To: http://app.example.com/script.pl
Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK9zh6
From: Recorder Application <sip:app.example.com>;tag=jhgf
To: Caller <sip:A@example.com>;tag=pqoew
Call-ID: 66676776767@app.example.com
Supported: gruu
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
CSeq: 1 REFER
User A sends a NOTIFY (message 9):
NOTIFY sip:app.example.com SIP/2.0
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9320394238995
To: Recorder Application <sip:app.example.com>;tag=jhgf
From: Caller <sip:A@example.com>;tag=pqoew
Call-ID: 66676776767@app.example.com
CSeq: 1 NOTIFY
Max-Forwards: 70
Event: refer;id=93809824
Subscription-State: active;expires=3600
Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
Content-Type: message/sipfrag;version=2.0
Content-Length: 20
SIP/2.0 100 Trying
And the recording server responds with a 200 OK (message 10):
SIP/2.0 200 OK
Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9320394238995
To: Recorder Application <sip:app.example.com>;tag=jhgf
From: Caller <sip:A@example.com>;tag=pqoew
Call-ID: 66676776767@app.example.com
CSeq: 1 NOTIFY
The caller, A, authorizes the application. It then acts on the
Refer-To URI, fetching the script from app.example.com (message 11).
The response, message 12, contains a web application that the user
can click on to enable recording. When the user clicks on the link
(message 13), the results are posted to the server, and an updated
display is provided (message 14).
10. Security Considerations
There are many security considerations associated with this
framework. It allows applications in the network to instantiate user
interface components on a client device. Such instantiations need to
be from authenticated applications, and also need to be authorized to
place a UI into the client. Indeed, the stronger requirement is
authorization. It is not so important to know the name of the
provider of the application, but rather, that the provider is
skipping to change at page 27, line 5 skipping to change at page 35, line 5
This document was produced as a result of discussions amongst the
application interaction design team. All members of this team
contributed significantly to the ideas embodied in this document. The
members of this team were:
Eric Burger
Cullen Jennings
Robert Fairlie-Cuninghame
Normative References
[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002.
[2] Roach, A., "Session Initiation Protocol (SIP)-Specific Event
Notification", RFC 3265, June 2002.
[3] McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D.,
Carter, J., Ferrans, J. and A. Hunt, "Voice Extensible Markup
Language (VoiceXML) Version 2.0", W3C CR CR-voicexml20-20030220,
February 2003.
[4] Rosenberg, J., "Indicating User Agent Capabilities in the
Session Initiation Protocol (SIP)",
draft-ietf-sip-callee-caps-03 (work in progress), January 2004.
[5] Sparks, R., "The Session Initiation Protocol (SIP) Refer
Method", RFC 3515, April 2003.
[6] Burger, E., "Keypad Stimulus Protocol (KPML)",
draft-ietf-sipping-kpml-02 (work in progress), February 2004.
[7] Peterson, J., "Enhancements for Authenticated Identity
Management in the Session Initiation Protocol (SIP)",
draft-ietf-sip-identity-01 (work in progress), March 2003.
[8] Peterson, J., "SIP Authenticated Identity Body (AIB) Format",
draft-ietf-sip-authid-body-02 (work in progress), July 2003.
[9] Rosenberg, J., "Obtaining and Using Globally Routable User Agent
(UA) URIs (GRUU) in the Session Initiation Protocol (SIP)",
draft-ietf-sip-gruu-00 (work in progress), January 2004.
Informative References
[10] Day, M., Rosenberg, J. and H. Sugano, "A Model for Presence and
Instant Messaging", RFC 2778, February 2000.
[11] Jennings, C., Peterson, J. and M. Watson, "Private Extensions
to the Session Initiation Protocol (SIP) for Asserted Identity
within Trusted Networks", RFC 3325, November 2002.
[12] Rosenberg, J., "A Framework for Conferencing with the Session
Initiation Protocol",
draft-ietf-sipping-conferencing-framework-01 (work in
progress), October 2003.
[13] Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller
Preferences for the Session Initiation Protocol (SIP)",
draft-ietf-sip-callerprefs-10 (work in progress), October 2003.
[14] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", RFC
3550, July 2003.
[15] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits,
Telephony Tones and Telephony Signals", RFC 2833, May 2000.
Author's Address
Jonathan Rosenberg
dynamicsoft
600 Lanidex Plaza
Parsippany, NJ 07054
US
Phone: +1 973 952-5000
skipping to change at page 29, line 29 skipping to change at page 37, line 29
be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
Full Copyright Statement
Copyright (C) The Internet Society (2004). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
skipping to change at page 30, line 7 skipping to change at page 38, line 7
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/