Internet Engineering Task Force                                   SIP WG
Internet Draft                                     Rosenberg,Mataga,Ladd
draft-rosenberg-sip-vxml-00.txt                              dynamicsoft
July 13, 2001
Expires: February 2001

               A SIP Interface to VoiceXML Dialog Servers


   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at

   To view the list Internet-Draft Shadow Directories, see


   VoiceXML is an XML based scripting language for describing voice
   dialogs. VoiceXML interpreters run within an interpreter context
   that, among other tasks, provides a call control interface for
   accessing the interpreter. It is very natural to provide a VoIP-based
   interpreter context that uses SIP and RTP to communicate with the
   outside world. In this document, we provide detailed specifications
   for a SIP/RTP based interpreter context.

1 Introduction

   VoiceXML [1] is an XML based scripting language for describing voice
   dialogs. It supports user input through speech recognition and DTMF,
   and can communicate with the user through text-to-speech or recorded
   files. VoiceXML scripts are interpreted by a VoiceXML interpreter.

Rosenberg,Mataga,Ladd                                         [Page 1]

Internet Draft                  sip-vxml                   July 13, 2001

   This interpreter, in turn, runs within an interpreter context. The
   interpreter context is the interface between the outside world and
   the interpreter. It typically handles the mechanisms by which the
   script execution begins, and by which it is fed media to drive it. It
   also provides the means for fetching documents from some form of
   document server.

   It is very natural to provide a VoiceXML interpreter context based
   purely on IP; specifically, one based on VoIP using SIP [2] and RTP
   [3], along with HTTP for document access. An incoming VoIP call
   triggers the execution of the script, which is fetched from a server
   using HTTP. The incoming RTP stream for the call is passed to the
   interpreter for processing, and speech generated by the interpreter
   is sent over RTP to the caller. We call a pure IP-based VoiceXML
   system an "IP dialog server", or just "dialog server".

   Dialog servers are a key part of the application story for SIP-based
   networks, as described in the SIP application component architecture
   [4]. That document describes SIP-based dialog servers, and provides a
   high level overview of how the SIP interface works. This document
   provides a stand-alone, self-contained, more thorough description of
   a SIP-based VoIP VoiceXML interpreter context.

2 Script Initiation

   The script execution begins when a session is established using an
   INVITE request.

2.1 Script Naming

   In SIP, the request-URI identifies the user or service that the call
   is destined for. In the case of a dialog server, the dialog itself is
   the target for the call. As such, the request URI should contain the
   identifier for this dialog. This is consistent with the Request-URI
   service invocation model of RFC 3087 [5]. This URL can be in one of
   two formats. In the first, the VoiceXML script is identified directly
   by an HTTP URL. In the second, the script is not specified. Rather,
   the dialog server uses its configuration to map the incoming request
   to a specific script. The format for the Request-URI in either case
   is:

   Request-URI      =  "sip:" service-ID "." dialog-type ["." dialog-specific]
                       "@" hostport url-parameters [headers]
   service-ID       =  "dialog" | extension-token
   dialog-type      =  "vxml" | service-token
   dialog-specific  =  vxml-specific | service-token
   service-token    =  1*(alphanum | "-" | "!" | "%" | "*"
                       | "_" | "+" | "`" | "'" | "~" )
   vxml-specific    =  user-unreserved | unreserved | escaped

   Since the request URI can indicate a request for a variety of
   different services, of which a dialog server is only one type, the
   request URI first begins with a service identifier, that indicates
   the basic service required. This document specifies that dialog
   servers are addressed by having the first part of the username in the
   request-URI contain the service identifier "dialog" to indicate that
   a dialog service is requested. This is followed by a period, and
   after that, an identifier that indicates the means by which the
   dialog is specified. Currently, one mechanism is defined - a VoiceXML
   script. Other tokens can be used to indicate different mechanisms
   (note that service-token is identical to the BNF for token from RFC
   2543, except that the "." character is disallowed). After that comes
   an optional period followed by dialog-mechanism specific
   identification. For VoiceXML scripts, when present, this
   identification information is always a URL-encoded version of the URL
   which references the script to execute. When not present, the dialog
   server uses server-specific configuration to determine which script
   to execute.
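
   As a non-normative illustration, the structure described above can
   be sketched in Python. The function name parse_dialog_uri is
   hypothetical and not part of this specification. The sketch assumes
   that url-parameters and headers can simply be stripped from the
   hostport, and it does not validate the individual tokens.

```python
from urllib.parse import unquote


def parse_dialog_uri(request_uri):
    """Split a dialog Request-URI into
    (service_id, dialog_type, dialog_specific, hostport)."""
    if not request_uri.startswith("sip:"):
        raise ValueError("not a SIP URL")
    user, at, rest = request_uri[4:].partition("@")
    if not at:
        raise ValueError("missing user part")
    # service-ID and dialog-type cannot contain ".", so two splits are
    # enough; anything after the second "." is dialog-specific.
    parts = user.split(".", 2)
    if len(parts) < 2:
        raise ValueError("missing dialog-type")
    # For VoiceXML the dialog-specific part is a URL-encoded script URL.
    specific = unquote(parts[2]) if len(parts) == 3 else None
    # Drop url-parameters (";...") and headers ("?...") from hostport.
    hostport = rest.split(";", 1)[0].split("?", 1)[0]
    return parts[0], parts[1], specific, hostport
```

   When the dialog-specific part is absent, the third element is None
   and the server falls back to its own configuration.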

   Examples of URLs that invoke VoiceXML dialogs are:


   The first of these indicates that the dialog server (located at
   vxmlservers.com) should invoke a VoiceXML script fetched from
   http://dialogs.server.com/script32.vxml. Since the user part of the
   SIP URL cannot contain the : character, this must be escaped to %3a.
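
   A client that needs to construct such a URL could do so along the
   following lines. This is a sketch only: build_dialog_uri is a
   hypothetical helper, and it conservatively escapes every reserved
   character (including "/"), which goes beyond the single ":" escape
   that the text above requires.

```python
from urllib.parse import quote


def build_dialog_uri(host, script_url=None):
    """Return a SIP URL asking `host` to run a VoiceXML dialog."""
    user = "dialog.vxml"
    if script_url is not None:
        # safe="" escapes all reserved characters, ":" and "/" included.
        # quote() emits uppercase hex (%3A); hex digits in escapes are
        # case-insensitive, so this is equivalent to %3a.
        user += "." + quote(script_url, safe="")
    return "sip:" + user + "@" + host
```

   Omitting script_url yields the second URL format, where the dialog
   server chooses the script from its configuration.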

2.2 Responding to the INVITE

   If the server receiving the INVITE doesn't support the specifics of
   the service request (for example, the requested VoiceXML version is
   not supported), the server SHOULD generate a 501 response. It MAY
   include a Warning header providing details on why the request could
   not be serviced.

   The server SHOULD authenticate the caller and verify that they are
   authorized to access the requested service. It is anticipated that
   dialog servers will generally be used in conjunction with an
   application server which makes the actual authorization decision
   about whether the call is to be processed. As a result, the dialog
   server's authorization decision is simple - if it came from an
   authorized upstream server, the request is allowed. It is RECOMMENDED
   that a persistent TLS connection between the application server and
   the dialog server be used to provide the authentication credentials
   for this kind of scenario.

   The server then validates that the SDP in the INVITE, if present, is
   acceptable. It does so based on the procedures of Section 2.3.

   If these checks succeed, the server SHOULD fetch the script
   identified by the request-URI before generating a final response to
   the request. If the script cannot be fetched, or is invalid, the
   server generates a 502 Bad Gateway response, since effectively the
   server is a gateway to HTTP. It MAY include a Warning header
   providing details on the reason for failure.

   Once the script has been fetched, and is valid, and the offered SDP
   is deemed acceptable, the server SHOULD generate a 200 OK response.
   The generation of the response, and ACK processing, are based on
   standard SIP semantics.
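
   The order of checks in this section can be summarized by the
   following Python sketch. The function and parameter names are
   hypothetical. Only the 501, 502 and 200 responses come from the text
   above; the codes used for an unauthorized caller (403) and for
   unacceptable SDP (488) are assumptions, since this section does not
   name them.

```python
def choose_final_response(service_supported, authorized,
                          sdp_acceptable, script_ok):
    """Return the final status code for the initial INVITE."""
    if not service_supported:
        return 501  # service specifics not supported
    if not authorized:
        return 403  # assumption: code not specified in this document
    if not sdp_acceptable:
        return 488  # assumption: code not specified in this document
    if not script_ok:
        return 502  # script could not be fetched or is invalid
    return 200
```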

2.3 SDP Processing

   If the INVITE contains SDP with an offer, the dialog server will
   generate an answer as per SIP-bis [6]. The offer is deemed
   unacceptable if it contains no media lines of type audio, or if the
   dialog server supports none of the codecs listed for the audio
   streams. Otherwise, it is deemed acceptable.

   The answer generated by the dialog server SHOULD refuse all media
   streams excepting the first offered audio stream. Choice of codecs
   used by the dialog server is at the discretion of the implementor.
   However, it is STRONGLY RECOMMENDED that all dialog servers support
   G.711 and RFC 2833. If an offered media stream does not indicate
   support for RFC 2833 tones, the dialog server SHOULD add that codec
   to the answer. As described in RFC 2543-bis, this allows the dialog
   server to inform the caller that it can receive RFC 2833 media, even
   if the caller cannot receive it.

   The server SHOULD allow sendonly, recvonly, and sendrecv media
   streams, as well as streams on hold. The meaning of these for script
   interpretation is discussed in Section 4.
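
   The offer/answer rules of this section can be sketched as follows.
   This is an illustration under several assumptions: media lines are
   modelled as plain dictionaries, codecs are identified by name rather
   than by payload type, the SUPPORTED tuple is an example
   configuration, and "first offered audio stream" is interpreted as
   the first audio stream that shares a codec with the server.

```python
SUPPORTED = ("PCMU", "PCMA", "telephone-event")  # example configuration


def offer_acceptable(offer, supported=SUPPORTED):
    """True if some audio line lists at least one supported codec."""
    return any(
        m["type"] == "audio" and any(c in supported for c in m["codecs"])
        for m in offer
    )


def build_answer(offer, local_port, supported=SUPPORTED):
    """Accept one audio stream, refuse the rest with port 0.

    Returns None when the offer is unacceptable."""
    if not offer_acceptable(offer, supported):
        return None
    answer = []
    accepted = False
    for m in offer:
        common = [c for c in m["codecs"] if c in supported]
        if m["type"] == "audio" and common and not accepted:
            # Advertise RFC 2833 even if the caller did not offer it.
            if "telephone-event" in supported and "telephone-event" not in common:
                common.append("telephone-event")
            answer.append({"type": "audio", "port": local_port, "codecs": common})
            accepted = True
        else:
            # A port of zero refuses the stream.
            answer.append({"type": m["type"], "port": 0, "codecs": list(m["codecs"])})
    return answer
```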

   If the INVITE from the caller did not contain an SDP, the dialog
   server SHOULD generate an offer in the 2xx with a single audio media
   line, listing all codecs supported by the dialog server.

2.4 Script Variables

   In VoiceXML 1.0, the interpreter context provides the script with
   several variables that provide information on the call control
   interfaces. These variables are set in the following fashion:

        session.telephone.ani: This variable is the value of the URL in
             the From field of the INVITE that triggered the script.

        session.telephone.dnis: This variable is the value of the URL in
             the To field of the INVITE that triggered the script.

        session.telephone.iidigits: If the Contact header in the INVITE
             request uses the SIP caller preferences contact parameters
             [7] to provide additional information on the initiating
             device, the interpreter context SHOULD map these parameters
             to the closest II digits if possible.

        session.telephone.uui: This variable is set only if the INVITE
             request contained an embedded ISUP IAM request [8]. In that
             case, the user-to-user information elements from that IAM
             are extracted, and mapped to this variable. Support for
             this is optional, but RECOMMENDED.
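
   As an illustration of the first two mappings, the following Python
   sketch derives the ani and dnis variables from the From and To
   headers. The helper names are hypothetical, the header parsing is
   deliberately simplified (it does not handle a "<" inside a quoted
   display name), and the iidigits and uui mappings are omitted.

```python
def addr_spec(header_value):
    """Extract the URL from a From or To header value."""
    value = header_value.strip()
    if "<" in value:
        # name-addr form: optional display name, URL in angle brackets
        start = value.index("<") + 1
        return value[start:value.index(">", start)]
    # addr-spec form: anything after ";" is a header parameter
    return value.split(";", 1)[0].strip()


def session_variables(headers):
    """Map INVITE headers to VoiceXML session variables."""
    return {
        "session.telephone.ani": addr_spec(headers["From"]),
        "session.telephone.dnis": addr_spec(headers["To"]),
    }
```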

3 Document Acquisition

   The interpreter context fetches the script using normal HTTP GET and
   POST requests [9]. It MUST follow the caching behaviors specified in
   VoiceXML 1.0. It MAY support other document acquisition protocols,
   such as FTP.

4 Audio Input and Output

   Audio input and output are provided through RTP. The implementation
   platform SHOULD provide DTMF recognition on the incoming media
   stream, independent of its codec type. This is greatly facilitated
   through RFC 2833, which pushes the DTMF detection operation to the
   originator. The implementation platform SHOULD provide speech
   recognition on the incoming media stream as well.

   To be very explicit, this means that the dialog server SHOULD support
   recognition of DTMF and speech by processing a single incoming media
   stream. Furthermore, the caller can send this stream using one of at
   least two codecs, G.711 and RFC 2833, and the sender of the media can
   switch codecs on the fly when it detects DTMF. This means that RTP
   packets 1, 2 and 3 might be G.711, followed by RTP
   packet 4 which is RFC 2833. Furthermore, despite the fact that the
   sender can send RFC2833, the dialog server SHOULD still perform DTMF
   detection on the media stream, in case the sender does not support
   RFC 2833, or does support it, but misses a digit.
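
   To illustrate what processing a single mixed stream involves, the
   following Python sketch classifies each incoming RTP packet by its
   payload type. It assumes that the telephone-event payload type was
   negotiated as 101 (the value is dynamic and comes from the SDP), and
   it ignores RTP header extensions and padding. The function name is
   hypothetical. Audio packets would still be passed to in-band DTMF
   detection and speech recognition.

```python
import struct

DIGITS = "0123456789*#ABCD"  # RFC 2833 events 0 through 15


def classify_packet(packet, dtmf_pt=101):
    """Return ("dtmf", digit, end_of_event) or ("audio", pt, length)."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    csrc_count = packet[0] & 0x0F
    payload_type = packet[1] & 0x7F
    payload = packet[12 + 4 * csrc_count:]
    if payload_type == dtmf_pt:
        if len(payload) < 4:
            raise ValueError("truncated telephone-event payload")
        # event (8 bits), E/R/volume (8 bits), duration (16 bits)
        event, flags, _duration = struct.unpack("!BBH", payload[:4])
        digit = DIGITS[event] if event < len(DIGITS) else None
        return ("dtmf", digit, bool(flags & 0x80))
    return ("audio", payload_type, len(payload))
```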

        OPEN ISSUE: This is a strong statement; if the probability
        of missed DTMF is small, the dialog server shouldn't have
        to do detection if it knows the caller has done it.
        Problem, though: since SDP has no way to indicate codec-
        specific directionalities in a sendrecv stream, a UA that
        can only send RFC 2833 doesn't say anything about it in the
        SDP in the INVITE. As a result, there is no way to know for
        sure that the sender can do it until the first RFC 2833
        packet shows up. The SDP FID [10] specification resolves
        this. Should we make support for the FID spec mandatory for
        dialog servers?

   Some implementations we are aware of use a separate stream for the
   DTMF and for the speech. This approach is NOT RECOMMENDED, since it
   makes synchronization of the speech and DTMF difficult.

   SDP allows media streams to be unidirectional. If a stream is one-way
   from the caller to the dialog server, this means that script
   processing SHOULD proceed normally, except that any audio which would
   normally be output by the implementation platform is discarded.
   Furthermore, if a stream is one-way from the dialog server to the
   caller, script processing SHOULD proceed normally, except that the
   implementation platform never delivers characters (i.e., DTMF digits)
   or utterances to the interpreter. In other words, behavior is
   identical to the case where the caller is simply not talking.
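
   The two unidirectional cases can be captured in a small lookup, as
   in the Python sketch below. The function name is hypothetical, and
   the direction is the attribute as it appears in the caller's SDP, so
   "sendonly" means one-way from the caller to the dialog server.

```python
def media_policy(caller_direction):
    """Return (deliver_input, play_output) for the caller's SDP
    direction attribute."""
    policies = {
        "sendrecv": (True, True),
        # Caller only sends: recognize input, discard generated audio.
        "sendonly": (True, False),
        # Caller only receives: play audio, never deliver input.
        "recvonly": (False, True),
    }
    return policies[caller_direction]
```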

   Unidirectional streams are very useful for applications which require
   a "listener" on an existing media stream to look for a particular
   utterance or DTMF digit, and deliver that to an application server
   for event processing. Therefore, it is RECOMMENDED that they be
   supported in dialog servers as described above.

   SIP allows media streams to be placed on hold. This will happen when
   the interpreter context receives a re-INVITE with an SDP with a
   connection line. This is handled identically to the case of a
   media stream which is unidirectional from the dialog server to the
   caller, meaning that it's "just" disconnected, not an interpreter-

   SIP allows media streams to be disabled by setting the port to zero.
   This has very specific meaning in the case of a dialog server. It has
   the effect of requesting a freeze of the interpreter state. When the
   interpreter context returns a 200 OK as a response, it indicates that
   the interpreter has been frozen. The interpreter is truly frozen; the
   behavior should be as if time were literally suspended as far as the
   interpreter is concerned. To unfreeze the interpreter state, a re-
   INVITE is needed to establish a new audio media stream. This will
   cause processing of the script to continue at exactly the same place
   it left off, using the media input and output from the new media
   stream to drive the interpreter. It is critical that, as far as the
   script is concerned, the freeze never even took place.
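
   A minimal Python sketch of the freeze semantics follows. The class
   and method names are hypothetical. The sketch freezes as soon as the
   re-INVITE is processed, which is a simplification, and it models
   "time suspended" by advancing the script clock only while the
   interpreter is running.

```python
class InterpreterContext:
    """Toy model of freezing and unfreezing on re-INVITE."""

    def __init__(self):
        self.frozen = False
        self.script_time = 0.0  # time as seen by the script
        self.audio_port = None

    def tick(self, seconds):
        """Advance script time; a frozen interpreter sees none pass."""
        if not self.frozen:
            self.script_time += seconds

    def on_reinvite(self, audio_port):
        """Port zero freezes; a new audio stream unfreezes.

        Returns the status code of the response."""
        if audio_port == 0:
            self.frozen = True
            self.audio_port = None
        else:
            self.frozen = False
            self.audio_port = audio_port
        return 200
```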

   This capability is essential for supporting feature composition of
   voice-based applications. Consider application A, which allows the
   user to hear an announcement when a friend comes online. If the user
   says yes, a call is placed to that friend. Another application, B,
   allows the user to hear stock quotes. We'd like to compose these so
   that both can happen simultaneously. For that to happen in a
   reasonable fashion, one of these applications has the "focus",
   meaning that it is the one processing the input and output from the
   user. Consider the case where the stock quote application has the
   focus. The stock quote application runs on dialog server X, and
   the presence application on dialog server Y. Application server Z is
   the central point for all system events related to all applications.
   The flow to consider is shown in Figure 1. At the beginning of the
   flow, the caller has a call leg to the AS, and the AS has used third
   party call control [11] to connect the caller to dialog server X.
   This means there is an RTP connection between the caller and this
   dialog server, as shown.

   An external event (such as a friend coming online), will cause an
   application server to decide that the other voice application needs
   to receive the focus. However, we don't want to terminate the stock
   quote application; we merely wish to suspend it so that the user can
   resume it after hearing that the friend came online. So, the
   application server sends a re-INVITE (1) to the dialog server running
   the stock quote application, and requests it to be frozen. When the
   interpreter completes the current prompt block, the context freezes the
   interpreter and returns a 200 OK. The AS then connects the user to
   the dialog server running the presence application (4-9). Dialog
   server Y will fetch the VoiceXML script from the AS (since the AS
   knows the identity of the buddy that came online, it needs to be the
   one that generates the VoiceXML script), but this is not shown. This
   dialog runs, and assuming the user doesn't call the friend, the
   script terminates, causing server Y to send a BYE (10). The AS
   decides to resume the stock quote application. So, using 3pcc, it
   reconnects the caller with server X (12-17). The re-INVITE to server
   X (14) has the effect of unfreezing the context, so processing
   continues where the call left off.

   The result of this is that the user's experience is the following:

   network: Please enter the stock to check.
   user: Lucent
   network: Lucent technologies is at six dollars.
   network: Friend alert: Bob is online. Would you like to call him?
   user: no
   network: Please enter the name of the stock to check.

   Note that the issue of when the interpreter can be suspended is being
   worked on in the W3C.

   The key idea with this mechanism is that in NO CASE should the
   VoiceXML script for the stock quote application need to know that
   this external event (the buddy coming online) has occurred, so that
   it can play the buddy announcement. Doing so is counter to the entire
   concept of feature interaction; it is an intractable problem if every
   application and feature needs to know about each other. In the
   approach proposed here, each voice application remains independent.
   The application server plays the role of composing them by activating
   and deactivating the contexts as needed. This still requires the AS
   to know the set of applications that are running, but in this case,
   it doesn't need to know anything except the relative precedences of
   the various applications and the events which trigger them. Logic for
   that can, in principle, be constructed in a generic way, independent
   of the specific applications.
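
   One way such generic logic might look is sketched below in Python.
   Everything here is hypothetical: the FocusManager class, the integer
   precedence values, and the action log that stands in for the
   re-INVITEs the AS would actually send. The sketch only shows that
   the AS needs nothing beyond precedences and trigger events.

```python
class FocusManager:
    """Tracks which voice application has the focus."""

    def __init__(self, precedence):
        self.precedence = precedence  # application name -> int
        self.stack = []               # top of stack has the focus
        self.log = []                 # actions the AS would perform

    def trigger(self, app):
        """An event asks for `app` to run; True if it gets the focus."""
        current = self.stack[-1] if self.stack else None
        if current is not None:
            if self.precedence[app] <= self.precedence[current]:
                return False
            self.log.append(("freeze", current))
        self.stack.append(app)
        self.log.append(("start", app))
        return True

    def finished(self, app):
        """The focused application ended; resume the previous one."""
        if not self.stack or self.stack[-1] != app:
            raise ValueError("application does not have the focus")
        self.stack.pop()
        if self.stack:
            self.log.append(("unfreeze", self.stack[-1]))
```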

   This approach isn't perfect for all cases, but it's simple enough to
   get things started.

4.1 Processing Further SIP Messages

   The interpreter context processes subsequent SIP messages in the
   following fashion.

4.2 BYE

   If a BYE request is received from the caller, this terminates the
   call. The interpreter context SHOULD throw the telephone.disconnect
   event to the interpreter.

4.3 re-INVITE

   If a re-INVITE is received, it has the effect of changing some aspect
   of the media input and output. Codec changes, port changes, and IP
   address changes are handled normally as per SIP-bis [6]. Specific
   processing is required for changes in stream direction, placing the
   call on hold, disabling a media stream, and adding a new audio stream
   after a previous re-INVITE disabled it. See Section 4.

      Caller              AS (Z)           DS (X)            DS (Y)
         |RTP              |                 |                 |
         |...................................|                 |
         |                 |friend online    |                 |
         |                 |<--------        |                 |
         |                 |(1) INV disable  |                 |
         |                 |---------------->|request freeze   |
         |                 |(2) 200 OK       |                 |
         |                 |<----------------|frozen           |
         |                 |(3) ACK          |                 |
         |                 |---------------->|                 |
         |                 |(4) INV no SDP   |                 |
         |                 |---------------------------------->|
         |                 |(5) 200 SDP 1    |                 |
         |(6) INV SDP 1    |<----------------------------------|
         |<----------------|                 |                 |
         |(7) 200 SDP 2    |                 |                 |
         |---------------->|(8) ACK SDP 2    |                 |
         |(9) ACK          |---------------------------------->|
         |<----------------|                 |                 |
         |                 |                 |                 |
         |   RTP           |                 |                 |
         |                 |                 |                 |
         |                 |(10) BYE         |                 |
         |                 |<----------------------------------|
         |                 |(11) 200 OK      |                 |
         |                 |---------------------------------->|
         |(12) INV no SDP  |                 |                 |
         |<----------------|                 |                 |
         |(13) 200 SDP 3   |                 |                 |
         |---------------->|(14) INV SDP 3   |                 |
         |                 |---------------->|unfreeze         |
         |                 |(15) 200 SDP 4   |                 |
         |(16) ACK SDP 4   |<----------------|                 |
         |<----------------|(17) ACK         |                 |
         |                 |---------------->|                 |
         |RTP              |                 |                 |
         |.................|.................|                 |
         |                 |                 |                 |
         |                 |                 |                 |

   Figure 1: Voice Application Composition


   These messages are ignored by the interpreter context.

5 Tag Processing

   Certain tags within the VoiceXML script have call control
   implications. The following subsections describe how the interpreter
   context handles them.

5.1 Exit

   VoiceXML 1.0 says that the processing of the exit tag is entirely
   context specific.

   For SIP, the interpreter context SHOULD send a BYE to terminate the
   call.

   Ideally, the VoiceXML <exit> element would also post the given
   namelist to a URI specified in the original call setup. For example,
   the URI of an HTTP servlet running directly in the AS or in an
   associated web application server would be an appropriate choice.
   This would allow voice interactions to be completely independent of
   the calling context, and therefore be re-usable across providers and
   applications. The VoiceXML specification is silent on exactly what
   should happen with the <exit> namelist. For this reason, we do not
   specify specific processing at this time.

        OPEN ISSUE: Should we specify something? We could provide
        an additional URL at script initiation which is used to
        post the namelist upon exit.

5.2 Disconnect

   The interpreter context SHOULD send a BYE to terminate the call. As
   per the VoiceXML specification, a telephone.disconnect.hangup event
   is also thrown.

5.3 Transfer

   VoiceXML 1.0 supports two styles of transfer, bridged and blind.

5.3.1 Blind

   When the interpreter context needs to perform a blind transfer, it
   SHOULD generate a REFER [12] request and send it to the caller. The
   request contains a Refer-To header whose value is the URL given in
   the "dest" attribute of the transfer tag. If the transfer tag
   contains a connecttimeout attribute, an Expires header parameter
   carrying the duration from that attribute is appended to the URI in
   the Refer-To header.

   For example, if the following transfer tag was encountered:

   <transfer name="mycall" dest="sip:support@foo.com" bridge="false"
             connecttimeout="10s"/>

   The REFER would look like:

   REFER sip:caller@pc13.company.com SIP/2.0
   Via: SIP/2.0/UDP server3.vxmlservers.com
   From: sip:dialog.vxml20@vxmlservers.com;tag=8aa6s
   CSeq: 3487 REFER
   Call-ID: 9a8s9809s@
   To: sip:caller@company.com;tag=99as7
   Refer-To: sip:support@foo.com?Expires=10
   Referred-By: sip:dialog.vxml20@vxmlservers.com
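
   The construction of the Refer-To value can be sketched in Python as
   follows. The helper name is hypothetical. The sketch assumes that
   connecttimeout is written as a number followed by "s" or "ms", and
   that fractional seconds are rounded up; neither detail is specified
   by this document.

```python
def refer_to_value(dest, connecttimeout=None):
    """Build the Refer-To URL for a blind transfer."""
    if connecttimeout is None:
        return dest
    text = connecttimeout.strip()
    if text.endswith("ms"):
        # Round up to whole seconds so a short timeout never becomes 0.
        seconds = -(-int(text[:-2]) // 1000)
    elif text.endswith("s"):
        seconds = int(text[:-1])
    else:
        raise ValueError("unsupported time format: " + text)
    # Append as the first header parameter, or as a further one.
    separator = "&" if "?" in dest else "?"
    return dest + separator + "Expires=" + str(seconds)
```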

   If the REFER is rejected, the interpreter context outputs a
   network_busy as the outcome of the transfer attempt. Otherwise, the
   interpreter context remains suspended until a NOTIFY is received.

   At some point before the expiration, the interpreter context will
   receive a NOTIFY request containing the final response received for
   the triggered INVITE. If this response is a 2xx, the interpreter
   context throws a telephone.disconnect.transfer, and sends a BYE
   request to terminate the call.

   If the final response was a non-2xx response, the transfer attempt
   failed. If the final response was a 486, the outcome of the transfer
   attempt is set to busy, and form processing continues. If the final
   response was a 408, the outcome of the transfer attempt is set to
   noanswer, and form processing continues. For any other response, the
   outcome of the transfer attempt is set to network_busy, and form
   processing continues.
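
   The mapping from the final response reported in the NOTIFY to the
   result of the blind transfer can be written as a short Python
   function. The function name and the ("event", ...) and
   ("outcome", ...) tuples are illustrative conventions only.

```python
def blind_transfer_result(status):
    """Map the final response carried in the NOTIFY to a result."""
    if 200 <= status < 300:
        # Transfer succeeded: throw the event, then send BYE.
        return ("event", "telephone.disconnect.transfer")
    if status == 486:
        return ("outcome", "busy")
    if status == 408:
        return ("outcome", "noanswer")
    return ("outcome", "network_busy")
```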

5.3.2 Bridged

   In a bridged transfer, the interpreter context resumes after the
   transfer call completes. VoiceXML 1.0 also allows the script to
   specify a grammar within the transfer tag, allowing it to listen in
   for DTMF that meets that grammar. When a match is found, the transfer
   is terminated and control returns to the interpreter.

   This function requires that the dialog server act as a UAC, and make
   the outbound call to the transferred party. The flow is shown in
   Figure 2. The caller connects to the dialog server with messages 1-3.
   RTP flows between the caller and the dialog server. When the transfer
   tag is encountered, the dialog server sends an outbound INVITE (4).
   The outbound INVITE contains the same SDP, SDP 1, offered by the
   caller. If the final response (5) is a 200 OK, this contains SDP3.
   The dialog server continues to receive media from the caller. This is
   passed on to the transfer target, using SDP3. However, media from the
   transfer target to the caller goes direct, bypassing the dialog
   server.

   If the final response to the INVITE was a non-2xx response, the
   transfer attempt failed. If the final response was a 486, the outcome
   of the transfer attempt is set to busy, and form processing
   continues. If the final response was a 408, the outcome of the
   transfer attempt is set to noanswer, and form processing continues.
   For any other response, the outcome of the transfer attempt is set to
   network_busy, and form processing continues.

   The INVITE should not be left pending for more than the amount of
   time in the connecttimeout parameter, if specified. After that amount
   of time has passed, the INVITE request is cancelled, and form
   processing continues. The outcome of the transfer is set to noanswer.

   If the final response to the INVITE was a 2xx response, the transfer
   attempt succeeded. In addition to passing on the media to the
   transfer target, the interpreter passes the media received from the
   caller through the grammar present within the transfer tag, if
   present. If the grammar is matched, the interpreter context sends a
   BYE to the transfer target. Processing continues within the

   If the transfer target sends a BYE, a 200 OK is returned. The outcome
   of the script is set to far_end_disconnect. Form interpretation
   continues. If the caller sends a BYE, a 200 OK is returned. The
   dialog server sends a BYE to the transfer target. A
   telephone.disconnect.hangup event is thrown, and form processing
   continues to allow cleanup.

          |(1) INVITE SDP1      |                    |
          |-------------------->|                    |
          |(2) 200 SDP2         |                    |
          |<--------------------|                    |
          |(3) ACK              |                    |
          |-------------------->|                    |
          |RTP                  |                    |
          |<...................>|                    |
          |                     |(4) INVITE SDP1     |
          |                     |------------------->|
          |                     |(5) 200 SDP3        |
          |                     |<-------------------|
          |                     |(6) ACK             |
          |                     |------------------->|
          | RTP from caller     |                    |
          |....................>| RTP from caller    |
          |                     |...................>|
          |           RTP to caller                  |
          |                     |                    |
          |                     |                    |

        Caller                  DS                 Transfer

   Figure 2: Bridged Transfer flow

        OPEN ISSUE: When would it even be possible for the transfer
        outcome to be near_end_disconnect? Wouldn't this terminate
        the script, so that there is no transfer outcome?

   If the transfer target sends a REFER (i.e., the caller is to be
   transferred elsewhere), the interpreter context responds with a 200
   OK. It creates a new REFER with the same Refer-To header (but its own
   value for Referred-By), and sends it to the caller. Upon receiving a
   200 OK to the REFER, the dialog server sends a NOTIFY to the transfer
   target, informing it of a successful REFER completion to the new
   target. If a BYE is received from the transfer target, the
   interpreter sends a BYE to the caller as well, and throws a
   telephone.disconnect.transfer event.

6 Additional Requirements

   In addition to the above behaviors, we also recommend that several
   optional SIP capabilities be implemented by dialog servers. This is
   to support their intended use cases as components in the application
   server component architecture [4]. The following list of requirements
   includes these recommended features, in addition to summarizing the
   ones scattered above:

        1.   The dialog server SHOULD support SIP over persistent TCP
             and TLS connections, and SHOULD support a configurable
             authorization listing of allowed Distinguished Names which
             can connect. This is useful when authorization decisions
             are outsourced to an application server, as described

        2.   The dialog server SHOULD fully support RFC 1889 and RFC
             1890. Of particular importance is RTCP.

        3.   The dialog server SHOULD support G.711 and RFC 2833.

        4.   The dialog server SHOULD support the UA requirements
             outlined in the third party call control specification
             [11]. This is important for building more complex
             applications, a common usage for dialog servers.

        5.   The dialog server SHOULD support the SDP FID attribute
             [10], and SHOULD use it to allow processing to occur over a
             collection of alternate streams with the same FID group.

        6.   The dialog server SHOULD support the REFER method [12],
             needed for the blind transfer tag. It SHOULD also allow
             itself to be referred as a normal UAS.

        7.   The dialog server SHOULD allow any HTTP URL to be placed in
             the request-URI for specifying the script to execute.
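
   To illustrate requirement 3, the following Python is a rough sketch
   of how a dialog server might check an SDP offer for G.711 and for
   RFC 2833 telephone events. The function name and the sample SDP are
   assumptions made for this example. The sketch relies on the static
   payload types 0 (PCMU) and 8 (PCMA) from RFC 1890 and on the
   "telephone-event" encoding name carried in an rtpmap attribute; it
   looks only at audio media lines and ignores everything else in the
   session description.

```python
from typing import Dict

G711_STATIC_TYPES = {"0", "8"}  # PCMU and PCMA static payload types


def check_audio_offer(sdp: str) -> Dict[str, bool]:
    """Report whether an SDP body offers G.711 and telephone-event."""
    offered = set()   # payload types listed on m=audio lines
    rtpmap = {}       # payload type -> lowercased encoding name

    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio "):
            # m=audio <port> <transport> <fmt list>
            offered.update(line.split()[3:])
        elif line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[pt] = encoding.split("/")[0].lower()

    g711 = bool(offered & G711_STATIC_TYPES) or any(
        rtpmap.get(pt) in ("pcmu", "pcma") for pt in offered)
    dtmf = any(rtpmap.get(pt) == "telephone-event" for pt in offered)
    return {"g711": g711, "rfc2833": dtmf}


# Sample offer (addresses and ports are placeholders).
SAMPLE_OFFER = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.1",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:101 telephone-event/8000",
])
result = check_audio_offer(SAMPLE_OFFER)
```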

7 Authors' Addresses

   Jonathan Rosenberg
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: jdrosen@dynamicsoft.com

   Peter Mataga
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: pmataga@dynamicsoft.com

   David Ladd
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: dladd@dynamicsoft.com

8 Bibliography

   [1] VoiceXML Forum, "Voice extensible markup language (VoiceXML)
   version 1.00," VoiceXML forum specification, VoiceXML Forum, Mar.

   [2] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
   session initiation protocol," Request for Comments 2543, Internet
   Engineering Task Force, Mar. 1999.

   [3] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a
   transport protocol for real-time applications," Request for Comments
   1889, Internet Engineering Task Force, Jan. 1996.

   [4] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application
   server component architecture for SIP," Internet Draft, Internet
   Engineering Task Force, Mar. 2001.  Work in progress.

   [5] B. Campbell and R. Sparks, "Control of service context using SIP
   Request-URI," Request for Comments 3087, Internet Engineering Task
   Force, Apr. 2001.

   [6] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
   Session initiation protocol," Internet Draft, Internet Engineering
   Task Force, Nov. 2000.  Work in progress.

   [7] H. Schulzrinne and J. Rosenberg, "SIP caller preferences and
   callee capabilities," Internet Draft, Internet Engineering Task
   Force, Nov. 2000.  Work in progress.

   [8] E. Zimmerer, J. Peterson, A. Vemuri, L. Ong, F. Audet, M. Watson,
   and M. Zonoun, "MIME media types for ISUP and QSIG objects," Internet
   Draft, Internet Engineering Task Force, Mar. 2001.  Work in progress.

   [9] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
   Leach, and T. Berners-Lee, "Hypertext transfer protocol -- HTTP/1.1,"
   Request for Comments 2616, Internet Engineering Task Force, June

   [10] G. Camarillo, J. Holler, and G. Eriksson, "The SDP fid
   attribute," Internet Draft, Internet Engineering Task Force, Apr.
   2001.  Work in progress.

   [11] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo,
   "Third party call control in SIP," Internet Draft, Internet
   Engineering Task Force, Mar. 2001.  Work in progress.

   [12] R. Sparks, "SIP call control," Internet Draft, Internet
   Engineering Task Force, Feb. 2001.  Work in progress.
