SPEECHSC                                                   S. Shanmugham
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                              D. Burnett
Expires: December 25, 2008                                         Voxeo
                                                           June 23, 2008


           Media Resource Control Protocol Version 2 (MRCPv2)
                     draft-ietf-speechsc-mrcpv2-16

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 25, 2008.

Abstract

   The MRCPv2 protocol allows client hosts to control media service
   resources such as speech synthesizers, recognizers, verifiers and
   identifiers residing in servers on the network.  MRCPv2 is not a
   "stand-alone" protocol; it relies on a session management protocol
   such as the Session Initiation Protocol (SIP) to establish the MRCPv2
   control session between the client and the server, and for rendezvous
   and capability discovery.  It also depends on SIP and SDP to
   establish the media sessions and associated parameters between the
   media source or sink and the media server.  Once this is done, the
   MRCPv2 protocol exchange operates over the control session
   established above, allowing the client to control the media
   processing resources on the speech resource server.



Shanmugham & Burnett    Expires December 25, 2008               [Page 1]

Internet-Draft                   MRCPv2                        June 2008


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   8
   2.  Document Conventions  . . . . . . . . . . . . . . . . . . . .   9
     2.1.   Definitions  . . . . . . . . . . . . . . . . . . . . . .   9
     2.2.   State-Machine Diagrams . . . . . . . . . . . . . . . . .   9
   3.  Architecture  . . . . . . . . . . . . . . . . . . . . . . . .  10
     3.1.   MRCPv2 Media Resource Types  . . . . . . . . . . . . . .  11
     3.2.   Server and Resource Addressing . . . . . . . . . . . . .  12
   4.  MRCPv2 Protocol Basics  . . . . . . . . . . . . . . . . . . .  12
     4.1.   Connecting to the Server . . . . . . . . . . . . . . . .  13
     4.2.   Managing Resource Control Channels . . . . . . . . . . .  13
     4.3.   Media Streams and RTP Ports  . . . . . . . . . . . . . .  20
     4.4.   MRCPv2 Message Transport . . . . . . . . . . . . . . . .  21
   5.  MRCPv2 Specification  . . . . . . . . . . . . . . . . . . . .  22
     5.1.   Common Protocol Elements . . . . . . . . . . . . . . . .  22
     5.2.   Request  . . . . . . . . . . . . . . . . . . . . . . . .  24
     5.3.   Response . . . . . . . . . . . . . . . . . . . . . . . .  25
     5.4.   Status Codes . . . . . . . . . . . . . . . . . . . . . .  26
     5.5.   Events . . . . . . . . . . . . . . . . . . . . . . . . .  27
   6.  MRCPv2 Generic Methods, Headers, and Result Structure . . . .  28
     6.1.   Generic Methods  . . . . . . . . . . . . . . . . . . . .  28
       6.1.1.   SET-PARAMS . . . . . . . . . . . . . . . . . . . . .  28
       6.1.2.   GET-PARAMS . . . . . . . . . . . . . . . . . . . . .  29
     6.2.   Generic Message Headers  . . . . . . . . . . . . . . . .  30
       6.2.1.   Channel-Identifier . . . . . . . . . . . . . . . . .  31
       6.2.2.   Accept . . . . . . . . . . . . . . . . . . . . . . .  32
       6.2.3.   Active-Request-Id-List . . . . . . . . . . . . . . .  32
       6.2.4.   Proxy-Sync-Id  . . . . . . . . . . . . . . . . . . .  32
       6.2.5.   Accept-Charset . . . . . . . . . . . . . . . . . . .  33
       6.2.6.   Content-Type . . . . . . . . . . . . . . . . . . . .  33
       6.2.7.   Content-ID . . . . . . . . . . . . . . . . . . . . .  33
       6.2.8.   Content-Base . . . . . . . . . . . . . . . . . . . .  33
       6.2.9.   Content-Encoding . . . . . . . . . . . . . . . . . .  34
       6.2.10.  Content-Location . . . . . . . . . . . . . . . . . .  34
       6.2.11.  Content-Length . . . . . . . . . . . . . . . . . . .  35
       6.2.12.  Fetch Timeout  . . . . . . . . . . . . . . . . . . .  35
       6.2.13.  Cache-Control  . . . . . . . . . . . . . . . . . . .  35
       6.2.14.  Logging-Tag  . . . . . . . . . . . . . . . . . . . .  37
       6.2.15.  Set-Cookie and Set-Cookie2 . . . . . . . . . . . . .  37
       6.2.16.  Vendor Specific Parameters . . . . . . . . . . . . .  39
     6.3.   Generic Result Structure . . . . . . . . . . . . . . . .  39
       6.3.1.   Natural Language Semantics Markup Language . . . . .  40
   7.  Resource Discovery  . . . . . . . . . . . . . . . . . . . . .  41
   8.  Speech Synthesizer Resource . . . . . . . . . . . . . . . . .  43
     8.1.   Synthesizer State Machine  . . . . . . . . . . . . . . .  43
     8.2.   Synthesizer Methods  . . . . . . . . . . . . . . . . . .  44
     8.3.   Synthesizer Events . . . . . . . . . . . . . . . . . . .  44
     8.4.   Synthesizer Header Fields  . . . . . . . . . . . . . . .  45
       8.4.1.   Jump-Size  . . . . . . . . . . . . . . . . . . . . .  45
       8.4.2.   Kill-On-Barge-In . . . . . . . . . . . . . . . . . .  46
       8.4.3.   Speaker Profile  . . . . . . . . . . . . . . . . . .  46
       8.4.4.   Completion Cause . . . . . . . . . . . . . . . . . .  47
       8.4.5.   Completion Reason  . . . . . . . . . . . . . . . . .  47
       8.4.6.   Voice-Parameter  . . . . . . . . . . . . . . . . . .  48
       8.4.7.   Prosody-Parameters . . . . . . . . . . . . . . . . .  48
       8.4.8.   Speech Marker  . . . . . . . . . . . . . . . . . . .  49
       8.4.9.   Speech Language  . . . . . . . . . . . . . . . . . .  50
       8.4.10.  Fetch Hint . . . . . . . . . . . . . . . . . . . . .  50
       8.4.11.  Audio Fetch Hint . . . . . . . . . . . . . . . . . .  50
       8.4.12.  Failed URI . . . . . . . . . . . . . . . . . . . . .  51
       8.4.13.  Failed URI Cause . . . . . . . . . . . . . . . . . .  51
       8.4.14.  Speak Restart  . . . . . . . . . . . . . . . . . . .  51
       8.4.15.  Speak Length . . . . . . . . . . . . . . . . . . . .  51
       8.4.16.  Load-Lexicon . . . . . . . . . . . . . . . . . . . .  52
       8.4.17.  Lexicon-Search-Order . . . . . . . . . . . . . . . .  52
     8.5.   Synthesizer Message Body . . . . . . . . . . . . . . . .  52
       8.5.1.   Synthesizer Speech Data  . . . . . . . . . . . . . .  52
       8.5.2.   Lexicon Data . . . . . . . . . . . . . . . . . . . .  55
     8.6.   SPEAK Method . . . . . . . . . . . . . . . . . . . . . .  56
     8.7.   STOP . . . . . . . . . . . . . . . . . . . . . . . . . .  58
     8.8.   BARGE-IN-OCCURED . . . . . . . . . . . . . . . . . . . .  59
     8.9.   PAUSE  . . . . . . . . . . . . . . . . . . . . . . . . .  61
     8.10.  RESUME . . . . . . . . . . . . . . . . . . . . . . . . .  62
     8.11.  CONTROL  . . . . . . . . . . . . . . . . . . . . . . . .  64
     8.12.  SPEAK-COMPLETE . . . . . . . . . . . . . . . . . . . . .  66
     8.13.  SPEECH-MARKER  . . . . . . . . . . . . . . . . . . . . .  67
     8.14.  DEFINE-LEXICON . . . . . . . . . . . . . . . . . . . . .  69
   9.  Speech Recognizer Resource  . . . . . . . . . . . . . . . . .  69
     9.1.   Recognizer State Machine . . . . . . . . . . . . . . . .  71
     9.2.   Recognizer Methods . . . . . . . . . . . . . . . . . . .  71
     9.3.   Recognizer Events  . . . . . . . . . . . . . . . . . . .  72
     9.4.   Recognizer Header Fields . . . . . . . . . . . . . . . .  72
       9.4.1.   Confidence Threshold . . . . . . . . . . . . . . . .  74
       9.4.2.   Sensitivity Level  . . . . . . . . . . . . . . . . .  74
       9.4.3.   Speed Vs Accuracy  . . . . . . . . . . . . . . . . .  75
       9.4.4.   N Best List Length . . . . . . . . . . . . . . . . .  75
       9.4.5.   Input Type . . . . . . . . . . . . . . . . . . . . .  75
       9.4.6.   No Input Timeout . . . . . . . . . . . . . . . . . .  75
       9.4.7.   Recognition Timeout  . . . . . . . . . . . . . . . .  76
       9.4.8.   Waveform URI . . . . . . . . . . . . . . . . . . . .  76
       9.4.9.   Media Type . . . . . . . . . . . . . . . . . . . . .  77
       9.4.10.  Input-Waveform-URI . . . . . . . . . . . . . . . . .  77
       9.4.11.  Completion Cause . . . . . . . . . . . . . . . . . .  77
       9.4.12.  Completion Reason  . . . . . . . . . . . . . . . . .  79
       9.4.13.  Recognizer Context Block . . . . . . . . . . . . . .  79
       9.4.14.  Start Input Timers . . . . . . . . . . . . . . . . .  80
       9.4.15.  Speech Complete Timeout  . . . . . . . . . . . . . .  80
       9.4.16.  Speech Incomplete Timeout  . . . . . . . . . . . . .  81
       9.4.17.  DTMF Interdigit Timeout  . . . . . . . . . . . . . .  81
       9.4.18.  DTMF Term Timeout  . . . . . . . . . . . . . . . . .  82
       9.4.19.  DTMF-Term-Char . . . . . . . . . . . . . . . . . . .  82
       9.4.20.  Failed URI . . . . . . . . . . . . . . . . . . . . .  82
       9.4.21.  Failed URI Cause . . . . . . . . . . . . . . . . . .  82
       9.4.22.  Save Waveform  . . . . . . . . . . . . . . . . . . .  83
       9.4.23.  New Audio Channel  . . . . . . . . . . . . . . . . .  83
       9.4.24.  Speech-Language  . . . . . . . . . . . . . . . . . .  83
       9.4.25.  Ver-Buffer-Utterance . . . . . . . . . . . . . . . .  83
       9.4.26.  Recognition-Mode . . . . . . . . . . . . . . . . . .  84
       9.4.27.  Cancel-If-Queue  . . . . . . . . . . . . . . . . . .  84
       9.4.28.  Hotword-Max-Duration . . . . . . . . . . . . . . . .  85
       9.4.29.  Hotword-Min-Duration . . . . . . . . . . . . . . . .  85
       9.4.30.  Interpret-Text . . . . . . . . . . . . . . . . . . .  85
       9.4.31.  DTMF-Buffer-Time . . . . . . . . . . . . . . . . . .  85
       9.4.32.  Clear-DTMF-Buffer  . . . . . . . . . . . . . . . . .  86
       9.4.33.  Early-No-Match . . . . . . . . . . . . . . . . . . .  86
       9.4.34.  Num-Min-Consistent-Pronunciations  . . . . . . . . .  86
       9.4.35.  Consistency-Threshold  . . . . . . . . . . . . . . .  86
       9.4.36.  Clash-Threshold  . . . . . . . . . . . . . . . . . .  87
       9.4.37.  Personal-Grammar-URI . . . . . . . . . . . . . . . .  87
       9.4.38.  Enroll-Utterance . . . . . . . . . . . . . . . . . .  87
       9.4.39.  Phrase-Id  . . . . . . . . . . . . . . . . . . . . .  88
       9.4.40.  Phrase-NL  . . . . . . . . . . . . . . . . . . . . .  88
       9.4.41.  Weight . . . . . . . . . . . . . . . . . . . . . . .  88
       9.4.42.  Save-Best-Waveform . . . . . . . . . . . . . . . . .  88
       9.4.43.  New-Phrase-Id  . . . . . . . . . . . . . . . . . . .  89
       9.4.44.  Confusable-Phrases-URI . . . . . . . . . . . . . . .  89
       9.4.45.  Abort-Phrase-Enrollment  . . . . . . . . . . . . . .  89
     9.5.   Recognizer Message Body  . . . . . . . . . . . . . . . .  89
       9.5.1.   Recognizer Grammar Data  . . . . . . . . . . . . . .  90
       9.5.2.   Recognizer Result Data . . . . . . . . . . . . . . .  93
       9.5.3.   Enrollment Result Data . . . . . . . . . . . . . . .  94
       9.5.4.   Recognizer Context Block . . . . . . . . . . . . . .  94
     9.6.   Recognizer Results . . . . . . . . . . . . . . . . . . .  94
       9.6.1.   Markup Functions . . . . . . . . . . . . . . . . . .  95
       9.6.2.   Overview of Recognizer Result Elements and their
                Relationships  . . . . . . . . . . . . . . . . . . .  96
       9.6.3.   Elements and Attributes  . . . . . . . . . . . . . .  96
     9.7.   Enrollment Results . . . . . . . . . . . . . . . . . . . 101
       9.7.1.   NUM-CLASHES Element  . . . . . . . . . . . . . . . . 101
       9.7.2.   NUM-GOOD-REPETITIONS Element . . . . . . . . . . . . 101
       9.7.3.   NUM-REPETITIONS-STILL-NEEDED Element . . . . . . . . 101
       9.7.4.   CONSISTENCY-STATUS Element . . . . . . . . . . . . . 102
       9.7.5.   CLASH-PHRASE-IDS Element . . . . . . . . . . . . . . 102
       9.7.6.   TRANSCRIPTIONS Element . . . . . . . . . . . . . . . 102
       9.7.7.   CONFUSABLE-PHRASES Element . . . . . . . . . . . . . 102
     9.8.   DEFINE-GRAMMAR . . . . . . . . . . . . . . . . . . . . . 102
     9.9.   RECOGNIZE  . . . . . . . . . . . . . . . . . . . . . . . 106
     9.10.  STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 112
     9.11.  GET-RESULT . . . . . . . . . . . . . . . . . . . . . . . 113
     9.12.  START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 114
     9.13.  START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 115
     9.14.  RECOGNITION-COMPLETE . . . . . . . . . . . . . . . . . . 115
     9.15.  START-PHRASE-ENROLLMENT  . . . . . . . . . . . . . . . . 117
     9.16.  ENROLLMENT-ROLLBACK  . . . . . . . . . . . . . . . . . . 118
     9.17.  END-PHRASE-ENROLLMENT  . . . . . . . . . . . . . . . . . 119
     9.18.  MODIFY-PHRASE  . . . . . . . . . . . . . . . . . . . . . 119
     9.19.  DELETE-PHRASE  . . . . . . . . . . . . . . . . . . . . . 120
     9.20.  INTERPRET  . . . . . . . . . . . . . . . . . . . . . . . 120
     9.21.  INTERPRETATION-COMPLETE  . . . . . . . . . . . . . . . . 121
     9.22.  DTMF Detection . . . . . . . . . . . . . . . . . . . . . 123
   10. Recorder Resource . . . . . . . . . . . . . . . . . . . . . . 123
     10.1.  Recorder State Machine . . . . . . . . . . . . . . . . . 124
     10.2.  Recorder Methods . . . . . . . . . . . . . . . . . . . . 124
     10.3.  Recorder Events  . . . . . . . . . . . . . . . . . . . . 124
     10.4.  Recorder Header Fields . . . . . . . . . . . . . . . . . 124
       10.4.1.  Sensitivity Level  . . . . . . . . . . . . . . . . . 125
       10.4.2.  No Input Timeout . . . . . . . . . . . . . . . . . . 125
       10.4.3.  Completion Cause . . . . . . . . . . . . . . . . . . 125
       10.4.4.  Completion Reason  . . . . . . . . . . . . . . . . . 126
       10.4.5.  Failed URI . . . . . . . . . . . . . . . . . . . . . 126
       10.4.6.  Failed URI Cause . . . . . . . . . . . . . . . . . . 126
       10.4.7.  Record URI . . . . . . . . . . . . . . . . . . . . . 127
       10.4.8.  Media Type . . . . . . . . . . . . . . . . . . . . . 127
       10.4.9.  Max Time . . . . . . . . . . . . . . . . . . . . . . 127
       10.4.10. Trim-Length  . . . . . . . . . . . . . . . . . . . . 128
       10.4.11. Final Silence  . . . . . . . . . . . . . . . . . . . 128
       10.4.12. Capture On Speech  . . . . . . . . . . . . . . . . . 128
       10.4.13. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 128
       10.4.14. Start Input Timers . . . . . . . . . . . . . . . . . 129
       10.4.15. New Audio Channel  . . . . . . . . . . . . . . . . . 129
     10.5.  Recorder Message Body  . . . . . . . . . . . . . . . . . 129
     10.6.  RECORD . . . . . . . . . . . . . . . . . . . . . . . . . 129
     10.7.  STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 130
     10.8.  RECORD-COMPLETE  . . . . . . . . . . . . . . . . . . . . 131
     10.9.  START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 132
     10.10. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 132
   11. Speaker Verification and Identification . . . . . . . . . . . 133
     11.1.  Speaker Verification State Machine . . . . . . . . . . . 134
     11.2.  Speaker Verification Methods . . . . . . . . . . . . . . 136
     11.3.  Verification Events  . . . . . . . . . . . . . . . . . . 137
     11.4.  Verification Header Fields . . . . . . . . . . . . . . . 137
       11.4.1.  Repository-URI . . . . . . . . . . . . . . . . . . . 138
       11.4.2.  Voiceprint-Identifier  . . . . . . . . . . . . . . . 138
       11.4.3.  Verification-Mode  . . . . . . . . . . . . . . . . . 139
       11.4.4.  Adapt-Model  . . . . . . . . . . . . . . . . . . . . 140
       11.4.5.  Abort-Model  . . . . . . . . . . . . . . . . . . . . 140
       11.4.6.  Min-Verification-Score . . . . . . . . . . . . . . . 140
       11.4.7.  Num-Min-Verification-Phrases . . . . . . . . . . . . 140
       11.4.8.  Num-Max-Verification-Phrases . . . . . . . . . . . . 141
       11.4.9.  No-Input-Timeout . . . . . . . . . . . . . . . . . . 141
       11.4.10. Save-Waveform  . . . . . . . . . . . . . . . . . . . 141
       11.4.11. Media Type . . . . . . . . . . . . . . . . . . . . . 142
       11.4.12. Waveform-URI . . . . . . . . . . . . . . . . . . . . 142
       11.4.13. Voiceprint-Exists  . . . . . . . . . . . . . . . . . 142
       11.4.14. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 143
       11.4.15. Input-Waveform-Uri . . . . . . . . . . . . . . . . . 143
       11.4.16. Completion-Cause . . . . . . . . . . . . . . . . . . 143
       11.4.17. Completion Reason  . . . . . . . . . . . . . . . . . 145
       11.4.18. Speech Complete Timeout  . . . . . . . . . . . . . . 145
       11.4.19. New Audio Channel  . . . . . . . . . . . . . . . . . 145
       11.4.20. Abort-Verification . . . . . . . . . . . . . . . . . 145
       11.4.21. Start Input Timers . . . . . . . . . . . . . . . . . 145
     11.5.  Verification Message Body  . . . . . . . . . . . . . . . 146
       11.5.1.  Verification Result Data . . . . . . . . . . . . . . 146
       11.5.2.  Verification Result Elements . . . . . . . . . . . . 146
     11.6.  START-SESSION  . . . . . . . . . . . . . . . . . . . . . 150
     11.7.  END-SESSION  . . . . . . . . . . . . . . . . . . . . . . 151
     11.8.  QUERY-VOICEPRINT . . . . . . . . . . . . . . . . . . . . 152
     11.9.  DELETE-VOICEPRINT  . . . . . . . . . . . . . . . . . . . 153
     11.10. VERIFY . . . . . . . . . . . . . . . . . . . . . . . . . 154
     11.11. VERIFY-FROM-BUFFER . . . . . . . . . . . . . . . . . . . 154
     11.12. VERIFY-ROLLBACK  . . . . . . . . . . . . . . . . . . . . 157
     11.13. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 157
     11.14. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 158
     11.15. VERIFICATION-COMPLETE  . . . . . . . . . . . . . . . . . 159
     11.16. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 159
     11.17. CLEAR-BUFFER . . . . . . . . . . . . . . . . . . . . . . 160
     11.18. GET-INTERMEDIATE-RESULT  . . . . . . . . . . . . . . . . 160
   12. Security Considerations . . . . . . . . . . . . . . . . . . . 161
     12.1.  Rendezvous and Session Establishment . . . . . . . . . . 162
     12.2.  Control channel protection . . . . . . . . . . . . . . . 162
     12.3.  Media session protection . . . . . . . . . . . . . . . . 162
     12.4.  Indirect Content Access  . . . . . . . . . . . . . . . . 162
     12.5.  Protection of stored media . . . . . . . . . . . . . . . 163
   13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 163
     13.1.  New registries . . . . . . . . . . . . . . . . . . . . . 163
       13.1.1.  MRCPv2 resource types  . . . . . . . . . . . . . . . 163
       13.1.2.  MRCPv2 methods and events  . . . . . . . . . . . . . 163
       13.1.3.  MRCPv2 headers . . . . . . . . . . . . . . . . . . . 165
       13.1.4.  MRCPv2 status codes  . . . . . . . . . . . . . . . . 167
       13.1.5.  Grammar Reference List Parameters  . . . . . . . . . 167
       13.1.6.  MRCPv2 vendor-specific parameters  . . . . . . . . . 168
     13.2.  NLSML-related registrations  . . . . . . . . . . . . . . 168
       13.2.1.  application/nlsml+xml Media Type registration  . . . 168
     13.3.  NLSML XML Schema registration  . . . . . . . . . . . . . 169
     13.4.  MRCPv2 XML Namespace registration  . . . . . . . . . . . 169
     13.5.  text Media Type Registrations  . . . . . . . . . . . . . 169
       13.5.1.  text/grammar-ref-list  . . . . . . . . . . . . . . . 170
       13.5.2.  text/uri-list  . . . . . . . . . . . . . . . . . . . 170
     13.6.  session URL scheme registration  . . . . . . . . . . . . 171
     13.7.  SDP parameter registrations  . . . . . . . . . . . . . . 172
       13.7.1.  sub-registry "proto" . . . . . . . . . . . . . . . . 172
       13.7.2.  sub-registry "att-field (session-level)" . . . . . . 173
       13.7.3.  sub-registry "att-field (media-level)" . . . . . . . 173
   14. Examples  . . . . . . . . . . . . . . . . . . . . . . . . . . 174
     14.1.  Message Flow . . . . . . . . . . . . . . . . . . . . . . 174
     14.2.  Recognition Result Examples  . . . . . . . . . . . . . . 183
       14.2.1.  Simple ASR Ambiguity . . . . . . . . . . . . . . . . 183
       14.2.2.  Mixed Initiative . . . . . . . . . . . . . . . . . . 184
       14.2.3.  DTMF Input . . . . . . . . . . . . . . . . . . . . . 185
       14.2.4.  Interpreting Meta-Dialog and Meta-Task Utterances  . 185
       14.2.5.  Anaphora and Deixis  . . . . . . . . . . . . . . . . 186
       14.2.6.  Distinguishing Individual Items from Sets with
                One Member . . . . . . . . . . . . . . . . . . . . . 187
       14.2.7.  Extensibility  . . . . . . . . . . . . . . . . . . . 188
   15. ABNF Normative Definition . . . . . . . . . . . . . . . . . . 188
   16. XML Schemas . . . . . . . . . . . . . . . . . . . . . . . . . 203
     16.1.  NLSML Schema Definition  . . . . . . . . . . . . . . . . 203
     16.2.  Enrollment Results Schema Definition . . . . . . . . . . 204
     16.3.  Verification Results Schema Definition . . . . . . . . . 205
   17. References  . . . . . . . . . . . . . . . . . . . . . . . . . 209
     17.1.  Normative References . . . . . . . . . . . . . . . . . . 209
     17.2.  Informative References . . . . . . . . . . . . . . . . . 211
   Appendix A.  Contributors . . . . . . . . . . . . . . . . . . . . 212
   Appendix B.  Acknowledgements . . . . . . . . . . . . . . . . . . 213
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 213
   Intellectual Property and Copyright Statements  . . . . . . . . . 214


1.  Introduction

   The MRCPv2 protocol is designed to allow a client device to control
   media processing resources on the network.  These media processing
   resources include speech recognition engines, speech synthesis
   engines, and speaker verification and speaker identification
   engines.  MRCPv2 enables the implementation of distributed
   Interactive Voice Response platforms using VoiceXML
   [W3C.REC-voicexml20-20040316] browsers or other client applications
   while maintaining separate back-end speech processing capabilities on
   specialized speech processing servers.  MRCPv2 is based on the
   earlier Media Resource Control Protocol (MRCP) [RFC4463] developed
   jointly by Cisco Systems, Inc., Nuance Communications, and
   Speechworks Inc.

   The protocol requirements of SPEECHSC [RFC4313] dictate that the
   solution be capable of reaching a media processing server and setting
   up communication channels to the media resources, and sending and
   receiving control messages and media streams to/from the server.  The
   Session Initiation Protocol (SIP) [RFC3261] meets these requirements.
   MRCPv2 leverages these capabilities by building upon SIP and the
   Session Description Protocol (SDP) [RFC4566].  MRCPv2 uses SIP to
   set up and tear down media and control sessions with the server.  In
   addition, the client can use a SIP re-INVITE (an INVITE request sent
   within an existing SIP dialog) to change the characteristics of
   these media and control sessions while maintaining the SIP dialog
   between the client and server.  SDP is used to describe the
   parameters of the media sessions associated with that dialog.  It is
   mandatory to support SIP as the session establishment protocol to
   ensure interoperability.  Other protocols can be used for session
   establishment by prior agreement.  This document only describes the
   use of SIP and SDP.

   MRCPv2 uses SIP and SDP to create the client/server dialog and set up
   the media channels to the server.  It also uses SIP and SDP to
   establish MRCPv2 control sessions between the client and the server
   for each media processing resource required for that dialog.  The
   MRCPv2 protocol exchange between the client and the media resource is
   carried on that control session.  MRCPv2 protocol exchanges do not
   change the state of the SIP dialog, the media sessions, or other
   parameters of the dialog initiated via SIP.  Instead, MRCPv2 controls
   and affects the state of the media processing resource associated
   with the MRCPv2 session(s).

   MRCPv2 defines the messages to control the different media processing
   resources and the state machines required to guide their operation.
   It also describes how these messages are carried over a transport
   layer protocol such as TCP or TLS (Note: SCTP is a viable transport
   for MRCPv2 as well, but the mapping onto SCTP is not described in
   this specification).


2.  Document Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [RFC2119].

   Since many of the definitions and syntax are identical to HTTP/1.1
   (RFC2616 [RFC2616]), this specification refers to the sections where
   they are defined rather than copying them.  For brevity, [HX.Y] is to
   be taken to refer to Section X.Y of RFC2616.

   All the mechanisms specified in this document are described in both
   prose and Augmented Backus-Naur Form (ABNF [RFC4234]).

   The complete message format in ABNF is provided in Section 15
   and is the normative format definition.

2.1.  Definitions

   Media Resource
                  An entity on the speech processing server that can be
                  controlled through the MRCPv2 protocol.
   MRCP Server
                  Aggregate of one or more "Media Resource" entities on
                  a Server, exposed through the MRCPv2 protocol
                  ("Server" for short).
   MRCP Client
                  An entity controlling one or more Media Resources
                  through the MRCPv2 protocol ("Client" for short).
   DTMF
                  Dual Tone Multi-Frequency; a method of transmitting
                  key presses in-band, either as actual tones (Q.23
                  [Q.23]) or as named tone events (RFC4733 [RFC4733]).
   Hotword Mode
                  A mode of speech recognition where a stream of
                  utterances is evaluated for a match against a small
                  set of command words.  This is generally employed
                  either to trigger some action or to control the
                  subsequent grammar to be used for further recognition.

2.2.  State-Machine Diagrams

   The state-machine diagrams in this document do not show every
   possible method call.  Rather, they reflect the state of the resource
   based on the methods that have moved to IN-PROGRESS or COMPLETE
   states.  Note that since PENDING requests essentially have not
   affected the resource yet and are in queue to be processed, they are
   not reflected in the state-machine diagrams.
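
   The queueing behavior noted above can be sketched in Python as
   follows.  This is only an illustration and not part of the protocol:
   the Resource class, its "idle" and "busy" state names, and the
   submit() and complete() helpers are hypothetical, and the sketch
   assumes a resource that processes one request at a time.  It shows
   that a request answered with PENDING leaves the resource state
   untouched, while IN-PROGRESS and COMPLETE transitions are the ones a
   state-machine diagram would reflect.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RequestState(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN-PROGRESS"
    COMPLETE = "COMPLETE"


@dataclass
class Resource:
    # "idle" and "busy" are illustrative state names, not taken from the draft.
    state: str = "idle"
    queue: List[int] = field(default_factory=list)  # request ids still PENDING

    def submit(self, request_id: int) -> RequestState:
        if self.state == "busy":
            # Queued request: the resource state is left untouched.
            self.queue.append(request_id)
            return RequestState.PENDING
        self.state = "busy"
        return RequestState.IN_PROGRESS

    def complete(self) -> RequestState:
        # The active request finishes.  The next queued request, if any,
        # becomes active; otherwise the resource goes back to idle.
        if self.queue:
            self.queue.pop(0)
        else:
            self.state = "idle"
        return RequestState.COMPLETE
```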


3.  Architecture

   A system using MRCPv2 consists of a client that requires the
   generation and/or consumption of media streams and a media resource
   server that has the resources or "engines" to process these streams
   as input or generate these streams as output.  The client uses SIP
   and SDP to establish an MRCPv2 control channel with the server to use
   its media processing resources.  MRCPv2 servers are addressed using
   SIP URIs.

   The session management protocol (SIP) uses SDP with the offer/answer
   model described in RFC3264 [RFC3264] to set up the MRCPv2 control
   channels and describe their characteristics.  A separate MRCPv2
   session is needed to control each of the media processing resources
   associated with the SIP dialog between the client and server.  Within
   a SIP dialog, the individual resource control channels for the
   different resources are added or removed through SDP offer/answer
   carried in a SIP re-INVITE transaction.

   The server, through the SDP exchange, provides the client with an
   unambiguous channel identifier and a TCP port number.  The client MAY
   then open a new TCP connection with the server using this port
   number.  Multiple MRCPv2 channels can share a TCP connection between
   the client and the server.  All MRCPv2 messages exchanged between the
   client and the server carry the specified channel identifier that the
   server MUST ensure is unambiguous among all MRCPv2 control channels
   that are active on that server.  The client uses this channel
   identifier to indicate the media processing resource associated with
   that channel.
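
   As a rough sketch of how a client might read the server's SDP
   answer, the Python fragment below collects the channel identifier
   and TCP port of each control channel.  The "TCP/MRCPv2" proto name,
   the "a=channel" attribute, and the sample identifiers are
   assumptions made for illustration and are not defined in this
   section; the control_channels() and connections_needed() helpers are
   hypothetical.

```python
import re
from typing import Dict, NamedTuple, Set


class ControlChannel(NamedTuple):
    channel_id: str  # identifier assigned by the server
    port: int        # TCP port the client may connect to


# Assumed syntax: "m=application <port> TCP/MRCPv2 ..." followed,
# within the same media section, by "a=channel:<identifier>".
M_LINE = re.compile(r"^m=application (\d+) TCP/(?:TLS/)?MRCPv2\b")
CHANNEL = re.compile(r"^a=channel:(\S+)$")


def control_channels(sdp_answer: str) -> Dict[str, ControlChannel]:
    """Map each channel identifier found in the answer to its TCP port."""
    channels: Dict[str, ControlChannel] = {}
    port = None  # port of the current control m-line, if any
    for raw in sdp_answer.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            match = M_LINE.match(line)
            # A non-control m-line (e.g. audio) resets the context.
            port = int(match.group(1)) if match else None
        elif port is not None:
            found = CHANNEL.match(line)
            if found:
                ident = found.group(1)
                channels[ident] = ControlChannel(ident, port)
    return channels


def connections_needed(channels: Dict[str, ControlChannel]) -> Set[int]:
    """Channels that report the same port can share one TCP connection."""
    return {ch.port for ch in channels.values()}


sample_answer = """\
m=application 32416 TCP/MRCPv2 1
a=channel:32AECB234338@speechsynth
m=application 32416 TCP/MRCPv2 1
a=channel:32AECB234338@speechrecog
m=audio 48260 RTP/AVP 0
"""
channels = control_channels(sample_answer)
print(sorted(channels), connections_needed(channels))
```

   In the sample answer both control channels report the same port, so
   a client following this sketch would open a single TCP connection
   and tell the two resources apart by channel identifier.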

   The session management protocol (SIP) also establishes the media
   sessions between the client (or other source/sink of media) and the
   MRCPv2 server using SDP m-lines.  One or more media processing
   resources may share a media session under a SIP session, or each
   media processing resource may have its own media session.

        MRCPv2 client                    MRCPv2 Media Resource Server
   |--------------------|             |-----------------------------|
   ||------------------||             ||---------------------------||
   || Application Layer||             || TTS  | ASR  | SV   | SI   ||
   ||------------------||             ||Engine|Engine|Engine|Engine||
   ||Media Resource API||             ||---------------------------||
   ||------------------||             || Media Resource Management ||
   || SIP  |  MRCPv2   ||             ||---------------------------||
   ||Stack |           ||             ||   SIP  |    MRCPv2        ||
   ||      |           ||             ||  Stack |                  ||
   ||------------------||             ||---------------------------||
   ||   TCP/IP Stack   ||----MRCPv2---||       TCP/IP Stack        ||
   ||                  ||             ||                           ||
   ||------------------||-----SIP-----||---------------------------||
   |--------------------|             |-----------------------------|
            |                             /
           SIP                           /
            |                           /
   |-------------------|              RTP
   |                   |              /
   | Media Source/Sink |-------------/
   |                   |
   |-------------------|


                      Figure 1: Architectural Diagram

3.1.  MRCPv2 Media Resource Types

   An MRCPv2 server may offer one or more of the following media
   processing resources to its clients.
   Basic Synthesizer
                  A speech synthesizer resource with very limited
                  capabilities, that can generate its media stream
                  exclusively from concatenated audio clips.  The speech
                  data is described using a limited subset of SSML
                  [W3C.REC-speech-synthesis-20040907] elements.  A basic
                  synthesizer MUST support the SSML tags <speak>,
                  <audio>, <say-as> and <mark>.
   Speech Synthesizer
                  A full-capability speech synthesis resource capable of
                  rendering speech from text.  Such a synthesizer MUST
                  have full SSML [W3C.REC-speech-synthesis-20040907]
                  support.
   Recorder
                  A resource capable of recording audio and saving it to
                  a URI.  A recorder MUST provide some end-pointing
                  capabilities for suppressing silence at the beginning
                  and end of a recording, and MAY also suppress silence
                  in the middle of a recording.  If such suppression is
                  done, the recorder MUST maintain timing metadata to
                  indicate the actual time stamps of the recorded media.
   DTMF Recognizer
                  A recognition resource capable of extracting and
                  interpreting DTMF digits in a media stream and
                  matching them against a supplied digit grammar.  It
                  could also do a semantic interpretation based on
                  semantic tags in the grammar.
   Speech Recognizer
                  A full speech recognition resource that is capable of
                  receiving a media stream containing audio and
                  interpreting it to recognition results.  It also has a
                  natural language semantic interpreter to post-process
                  the recognized data according to the semantic data in
                  the grammar and provide semantic results along with
                  the recognized input.  The recognizer may also support
                  enrolled grammars, where the client can enroll and
                  create new personal grammars for use in future
                  recognition operations.
   Speaker Verifier
                  A resource capable of verifying the authenticity of a
                  claimed identity by matching a media stream containing
                  spoken input to a pre-existing voiceprint.  This may
                  also involve matching the caller's voice against more
                  than one voiceprint, also called multi-verification or
                  speaker identification.

3.2.  Server and Resource Addressing

   The MRCPv2 server as a whole is a generic SIP server and is addressed
   by a SIP Contact URI registered by the server through SIP (or via
   static configuration of the SIP registrar).

   For example:

        sip:mrcpv2@example.net


4.  MRCPv2 Protocol Basics

   MRCPv2 requires a connection-oriented transport layer protocol such
   as TCP or SCTP to guarantee reliable sequencing and delivery of
   MRCPv2 control messages between the client and the server.  In order
   to meet the requirements for security enumerated in SpeechSC
   Requirements [RFC4313], clients and servers MUST implement TLS as
   well.  One or more connections between the client and the server can
   be shared among different MRCPv2 channels to the server.  The
   individual messages carry the channel identifier to differentiate
   messages on different channels.  MRCPv2 protocol encoding is text
   based with mechanisms to carry embedded binary data.  This allows
   arbitrary data like recognition grammars, recognition results,
   synthesizer speech markup etc. to be carried in MRCPv2 messages.

4.1.  Connecting to the Server

   MRCPv2 employs a session establishment and management protocol such
   as SIP in conjunction with SDP.  The client finds and reaches an
   MRCPv2 server using conventional INVITE and other SIP transactions
   for establishing, maintaining, and terminating SIP dialogs.  The SDP
   offer/answer exchange model over SIP is used to establish a resource
   control channel for each resource.  The SDP offer/answer exchange is
   also used to establish media sessions between the server and the
   source or sink of audio.

4.2.  Managing Resource Control Channels

   The client needs a separate MRCPv2 resource control channel to
   control each media processing resource under the SIP dialog.  A
   unique channel identifier string identifies these resource control
   channels.  The channel identifier is an unambiguous, opaque string
   followed by an "@", then by a string token specifying the type of
   resource.  The server generates the channel identifier and MUST make
   sure it does not clash with the identifier of any other MRCP channel
   currently allocated by that server.  MRCPv2 defines the following
   IANA-registered types of media processing resources.  Additional
   resource types, their associated methods/events and state machines
   may be added by future specifications proposing to extend the
   capabilities of MRCPv2.

          +---------------+----------------------+--------------+
          | Resource Type | Resource Description | Described in |
          +---------------+----------------------+--------------+
          | speechrecog   | Speech Recognizer    | Section 9    |
          | dtmfrecog     | DTMF Recognizer      | Section 9    |
          | speechsynth   | Speech Synthesizer   | Section 8    |
          | basicsynth    | Basic Synthesizer    | Section 8    |
          | speakverify   | Speaker Verification | Section 11   |
          | recorder      | Speech Recorder      | Section 10   |
          +---------------+----------------------+--------------+

                              Resource Types
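
   As a non-normative illustration, the channel identifier format
   described above (an opaque string, an "@", then a resource type
   token) could be split and checked as sketched below.  The function
   name split_channel_id is hypothetical, and the sketch assumes that
   the opaque part never contains an "@" character, which this section
   does not state.

```python
# Resource type tokens copied from the "Resource Types" table.
RESOURCE_TYPES = frozenset([
    "speechrecog", "dtmfrecog", "speechsynth",
    "basicsynth", "speakverify", "recorder",
])


def split_channel_id(channel_id):
    """Split '<opaque>@<resource-type>' into (opaque, resource_type).

    Assumption: the opaque part never contains '@', so exactly one '@'
    is expected.  Raises ValueError for any other shape or for a
    resource type that is not in the table.
    """
    if channel_id.count("@") != 1:
        raise ValueError("expected exactly one '@' in %r" % channel_id)
    opaque, resource_type = channel_id.split("@")
    if not opaque:
        raise ValueError("empty opaque part in %r" % channel_id)
    if resource_type not in RESOURCE_TYPES:
        raise ValueError("unknown resource type %r" % resource_type)
    return opaque, resource_type
```

   A client could apply such a check to the "channel" attribute of an
   SDP answer before using the identifier in MRCPv2 messages.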

   The SIP INVITE or re-INVITE transaction and the SDP offer/answer
   exchange it carries contain m-lines describing the resource control
   channel to be allocated.  There MUST be one SDP m-line for each
   MRCPv2 resource to be used in the session.  This m-line MUST have a
   media type field of "application" and a transport type field of
   either "TCP/MRCPv2" or "TCP/TLS/MRCPv2".  (The usage of SCTP with
   MRCPv2 may be addressed in a future specification.)  The port number
   field of the m-line MUST contain the "discard" port of the transport
   protocol (port 9 for TCP) in the SDP offer from the client and MUST
   contain the TCP listen port on the server in the SDP answer.  The
   client may then either set up a TCP or TLS connection to that server
   port or share an already established connection to that port.  Since
   MRCPv2 allows multiple sessions to share the same TCP connection,
   multiple m-lines in a single SDP document may share the same port
   field value; MRCPv2 servers MUST NOT assume any relationship between
   resources using the same port other than the sharing of the
   communication channel.

   MRCPv2 resources do not use the port or format field of the m-line to
   distinguish themselves from other resources using the same channel.
   The client MUST specify the resource type identifier in the resource
   attribute associated with the control m-line of the SDP offer.  The
   server MUST respond with the full Channel-Identifier (which includes
   the resource type identifier and an unambiguous string) in the
   "channel" attribute associated with the control m-line of the SDP
   answer.  To remain backwards compatible with conventional SDP usage,
   the format field of the m-line MUST have the arbitrarily-selected
   value of "1".
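
   The m-line rules above lend themselves to a mechanical check.  The
   following Python sketch is illustrative only: the function name
   check_control_m_line is hypothetical, and it assumes that the four
   fields of the m-line are separated by single spaces.

```python
MRCP_TRANSPORTS = ("TCP/MRCPv2", "TCP/TLS/MRCPv2")


def check_control_m_line(m_line, is_offer):
    """Return a list of problems found in one control m-line.

    An empty list means that the line satisfies the rules checked here.
    """
    fields = m_line.strip().split(" ")
    if len(fields) != 4 or not fields[0].startswith("m="):
        return ["expected 'm=<media> <port> <proto> <fmt>'"]
    media, port, proto, fmt = fields[0][2:], fields[1], fields[2], fields[3]
    problems = []
    if media != "application":
        problems.append("media type must be 'application'")
    if proto not in MRCP_TRANSPORTS:
        problems.append("transport must be TCP/MRCPv2 or TCP/TLS/MRCPv2")
    if fmt != "1":
        problems.append("format field must be '1'")
    # Port 0 is tolerated in an offer because this section later uses
    # it to de-allocate a channel.
    if is_offer and port not in ("9", "0"):
        problems.append("offer must use the discard port (9)")
    return problems
```

   For instance, the control m-line of the first offer in the examples
   of this section yields an empty list, while an audio m-line is
   reported as malformed because it has a different number of fields.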

   When the client wants to add a media processing resource to the
   session, it issues a SIP re-INVITE transaction.  The SDP offer/answer
   exchange carried by this SIP transaction contains one or more
   additional control m-lines for the new resources to be allocated to
   the session.  The server, on seeing the new m-line, allocates the
   resources (if they are available) and responds with a corresponding
   control m-line in the SDP answer carried in the SIP response.

   The a=setup attribute, as described in RFC4145 [RFC4145], MUST be
   "active" for the offer from the client and MUST be "passive" for the
   answer from the MRCPv2 server.  The a=connection attribute MUST have
   a value of "new" on the very first control m-line offer from the
   client to an MRCPv2 server.  Subsequent control m-line offers from
   the client to the MRCP server MAY contain "new" or "existing",
   depending on whether the client wants to set up a new connection or
   share an existing connection, respectively.  If the client specifies
   a value of "new", the server MUST respond with a value of "new".  If
   the client specifies a value of "existing", the server MAY respond
   with a value of "existing" if it prefers to share an existing
   connection or can answer with a value of "new", in which case the
   client MUST initiate a new transport connection.
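
   The negotiation of the a=connection attribute described above can be
   summarized in a few lines of Python.  This is a sketch, not a
   normative algorithm; both function names and the
   server_prefers_existing flag are hypothetical.

```python
def answer_connection(offered, server_prefers_existing):
    """Choose the a=connection value for the server's SDP answer."""
    if offered == "new":
        # The server MUST answer "new" to an offer of "new".
        return "new"
    if offered == "existing":
        # The server MAY agree to share or MAY ask for a new connection.
        return "existing" if server_prefers_existing else "new"
    raise ValueError("a=connection must be 'new' or 'existing'")


def client_must_connect(answered):
    """True when the client has to initiate a new transport connection."""
    return answered == "new"
```

   With these helpers, an offer of "existing" answered with "new" leads
   the client to open a new connection, matching the last rule above.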

   When the client wants to de-allocate the resource from this session,
   it issues a SIP re-INVITE transaction with the server.  The SDP MUST
   offer the control m-line with port 0.  The server MUST then answer
   the control m-line with a response of port 0.  This de-allocates the
   associated MRCPv2 identifier and resource.  The server MUST NOT close
   the TCP, SCTP or TLS connection if it is currently being shared among
   multiple MRCP channels.  When all MRCP channels that may be sharing
   the connection are released and/or the associated SIP dialog is
   terminated, the client or server terminates the connection.
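
   One way to honor the rule that a shared connection stays open until
   its last channel is released is to track the channels per connection.
   The class below is a minimal Python sketch under that assumption; the
   class and method names are hypothetical and no real socket is
   involved.

```python
class SharedConnection:
    """Tracks the MRCPv2 channels that share one transport connection."""

    def __init__(self):
        self.channels = set()
        self.is_open = True

    def allocate(self, channel_id):
        """Record a channel that uses this connection."""
        self.channels.add(channel_id)

    def deallocate(self, channel_id):
        """Release one channel (port 0 in the offer and the answer).

        The connection is marked closed only when no channel is left.
        Returns True while the connection remains open.
        """
        self.channels.discard(channel_id)
        if not self.channels:
            self.is_open = False
        return self.is_open
```

   In the de-allocation example of this section, releasing the
   recognizer channel would leave the connection open because the
   synthesizer channel still uses it.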

   All servers MUST support TLS.  Servers MAY support TCP without TLS in
   physically secure environments.  It is up to the client, through the
   SDP offer, to choose which transport it wants to use for an MRCPv2
   session.  Aside from the exceptions given above, when using TCP the
   m-lines MUST conform to RFC4145 [RFC4145], which describes the usage
   of SDP for connection-oriented transport.  When using TLS the SDP
   m-line for the control pipe MUST conform to comedia over TLS
   [RFC4572], which specifies the usage of SDP for establishing a secure
   connection-oriented transport over TLS.

   This example exchange adds a resource control channel for a
   synthesizer.  Since a synthesizer also generates an audio stream,
   this interaction also creates an RTP media session, receive-only at
   the client, over which the server sends audio.

   C->S:  INVITE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          Max-Forwards:6
          To:MediaServer <sip:mresources@server.example.com>
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314161 INVITE
          Contact:<sip:sarvi@example.com>
          Content-Type:application/sdp
          Content-Length: 244

          v=0
          o=sarvi 2890844526 2890842808 IN IP4 192.0.2.4
          s=-
          c=IN IP4 192.0.2.12
          m=application 9 TCP/MRCPv2 1
          a=setup:active
          a=connection:new
          a=resource:speechsynth
          a=cmid:1
          m=audio 49170 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=recvonly
          a=mid:1


   S->C:  SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314161 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length: 260

          v=0
          o=- 2890844526 2890842808 IN IP4 192.0.2.4
          s=-
          c=IN IP4 192.0.2.11
          m=application 32416 TCP/MRCPv2 1
          a=setup:passive
          a=connection:new
          a=channel:32AECB234338@speechsynth
          a=cmid:1
          m=audio 48260 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=sendonly
          a=mid:1


   C->S:  ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          Max-Forwards:6
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314161 ACK
          Content-Length:0

                 Example: Add Synthesizer Control Channel

   This example exchange continues from the previous figure and
   allocates an additional resource control channel for a recognizer.
   Since a recognizer would need to receive an audio stream for
   recognition, this interaction also updates the audio stream to
   sendrecv, making it a 2-way RTP media session.

   C->S:  INVITE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          Max-Forwards:6
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314163 INVITE
          Contact:<sip:sarvi@example.com>
          Content-Type:application/sdp
          Content-Length: 397

          v=0
          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
          s=-
          c=IN IP4 192.0.2.12
          m=application 9 TCP/MRCPv2 1
          a=setup:active
          a=connection:existing
          a=resource:speechsynth
          a=cmid:1
          m=audio 49170 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=sendrecv
          a=mid:1
          m=application 9 TCP/MRCPv2 1
          a=setup:active
          a=connection:existing
          a=resource:speechrecog
          a=cmid:1


   S->C:  SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314163 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:431

          v=0
          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
          s=-
          c=IN IP4 192.0.2.11
          m=application 32416 TCP/MRCPv2 1
          a=setup:passive
          a=connection:existing
          a=channel:32AECB234338@speechsynth
          a=cmid:1
          m=audio 48260 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=sendrecv
          a=mid:1
          m=application 32416 TCP/MRCPv2 1
          a=setup:passive
          a=connection:existing
          a=channel:32AECB234338@speechrecog
          a=cmid:1


   C->S:  ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          Max-Forwards:6
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314163 ACK
          Content-Length:0

                          Add Recognizer example

   This example exchange continues from the previous figure and de-
   allocates the recognizer channel.  Since the recognizer no longer
   needs to receive an audio stream, this interaction also updates the
   RTP media session to recvonly.

   C->S:  INVITE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          Max-Forwards:6
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314164 INVITE
          Contact:<sip:sarvi@example.com>
          Content-Type:application/sdp
          Content-Length: 276

          v=0
          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
          s=-
          c=IN IP4 192.0.2.12
          m=application 9 TCP/MRCPv2 1
          a=resource:speechsynth
          a=cmid:1
          m=audio 49170 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=recvonly
          a=mid:1
          m=application 0 TCP/MRCPv2 1
          a=resource:speechrecog
          a=cmid:1


   S->C:  SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314164 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:303

          v=0
          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
          s=-
          c=IN IP4 192.0.2.11
          m=application 32416 TCP/MRCPv2 1
          a=channel:32AECB234338@speechsynth
          a=cmid:1
          m=audio 48260 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=sendonly
          a=mid:1
          m=application 0 TCP/MRCPv2 1
          a=channel:32AECB234338@speechrecog
          a=cmid:1


   C->S:  ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
          branch=z9hG4bK74bf9
          Max-Forwards:6
          To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:314164 ACK
          Content-Length:0

                       Deallocate Recognizer example

4.3.  Media Streams and RTP Ports

   Since MRCPv2 resources either generate or consume media streams, the
   client or the server needs to associate media sessions with their
   corresponding resource or resources.  More than one resource could be
   associated with a single media session or each resource could be
   assigned a separate media session.  Also note that more than one
   media session can be associated with a single resource if need be,
   but this scenario is not useful for the current set of resources.
   For example, a synthesizer and a recognizer could be associated with
   the same media session (m=audio line), if it is opened in "sendrecv"
   mode.  Alternatively, the recognizer could have its own "sendonly"
   audio session and the synthesizer could have its own "recvonly" audio
   session.

   The association between control channels and their corresponding
   media sessions is established using a new "resource channel media
   identifier" media-level attribute ("cmid").  Valid values of this
   attribute are the values of the "mid" attribute defined in RFC3388
   [RFC3388].  If there is more than one audio m-line, then each audio
   m-line MUST have a "mid" attribute.  Each control m-line MAY have one
   or more "cmid" attributes that match the resource control channel to
   the "mid" attributes of the audio m-lines it is associated with.
   Note that if a control m-line does not have a "cmid" attribute it
   will not be associated with any media.  The operations on such a
   resource will hence be limited.  For example, if it were a
   recognizer resource, the RECOGNIZE method requires associated media
   to process while the INTERPRET method does not.  The formatting of
   the "cmid" attribute in SDP (RFC4566 [RFC4566]) is described by the
   following ABNF:

   cmid-attribute = "a=cmid:" identification-tag
   identification-tag = token
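
   As a non-normative sketch, the association between control m-lines
   and media m-lines could be resolved from an SDP body as shown below
   in Python.  The helper names are hypothetical.  The sketch assumes
   one attribute per line and keys each control m-line by its
   "resource" value (offer) or "channel" value (answer).

```python
def media_blocks(sdp_text):
    """Split an SDP body into per-m-line blocks of stripped lines."""
    blocks = []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            blocks.append([line])
        elif blocks and line:
            blocks[-1].append(line)
    return blocks


def attribute_values(block, name):
    """Return the values of every 'a=<name>:<value>' line in a block."""
    prefix = "a=%s:" % name
    return [line[len(prefix):] for line in block if line.startswith(prefix)]


def control_to_media(sdp_text):
    """Map each control m-line to the "mid" values its "cmid"s name."""
    blocks = media_blocks(sdp_text)
    known_mids = set()
    for block in blocks:
        if not block[0].startswith("m=application "):
            known_mids.update(attribute_values(block, "mid"))
    mapping = {}
    for block in blocks:
        if block[0].startswith("m=application "):
            names = (attribute_values(block, "resource")
                     or attribute_values(block, "channel"))
            key = names[0] if names else block[0]
            cmids = attribute_values(block, "cmid")
            # Keep only cmid values that match a mid that is present.
            mapping[key] = [c for c in cmids if c in known_mids]
    return mapping
```

   A control m-line without a "cmid" attribute maps to an empty list,
   which corresponds to a resource that is not associated with any
   media.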

   To allow this flexible mapping of media sessions to MRCPv2 control
   channels, a single audio m-line can be associated with multiple
   resources or each resource can have its own audio m-line.  For
   example, if the client wants to allocate a recognizer and a
   synthesizer and associate them with a single 2-way audio pipe, the
   SDP offer would contain two control m-lines and a single audio m-line
   with an attribute of "sendrecv".  Each of the control m-lines would
   have a "cmid" attribute whose value matches the "mid" of the audio
   m-line.  If, on the other hand, the client wants to allocate a
   recognizer and a synthesizer each with its own separate audio pipe,
   the SDP offer would carry two control m-lines (one for the recognizer
   and another for the synthesizer) and two audio m-lines (one with the
   attribute "sendonly" and another with attribute "recvonly").  The
   "cmid" attribute of the recognizer control m-line would match the
   "mid" value of the "sendonly" audio m-line and the "cmid" attribute
   of the synthesizer control m-line would match the "mid" attribute of
   the "recvonly" m-line.

   When a server receives media (e.g. audio) on a media session that is
   associated with more than one media processing resource, it is the
   responsibility of the server to receive and fork it to the resources
   that need to consume it.  If multiple resources in an MRCPv2 session
   are generating audio (or other media) to be sent on a single
   associated media session, it is the responsibility of the server to
   either multiplex the multiple streams onto the single RTP session or
   contain an embedded RTP mixer (see RFC3550 [RFC3550]) to combine the
   multiple streams into one.  In the former case, the media stream will
   contain RTP packets generated by different sources, and hence the
   packets will have different Synchronization Source identifiers
   (SSRCs).  In the latter case, the RTP packets will contain multiple
   Contributing Source identifiers (CSRCs) corresponding to the original
   streams before being combined by the mixer.  An MRCPv2 implementation
   either MUST correctly process such RTP sessions, or alternatively
   MUST avoid associating multiple resources with a single session.

   If a server does not have the capability to mix, multiplex, or fork
   media in the cases above, then the server MUST disallow the client
   from associating multiple such resources to a single audio pipe by
   rejecting the SDP offer with a SIP 501 "Not Implemented" error.
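
   The rejection rule above could be expressed as in the following
   Python sketch.  It is illustrative only: the function name and the
   can_share_media flag are hypothetical, and the value 200 merely
   stands for "the offer is acceptable as far as this rule goes".

```python
def sip_status_for_offer(mids_by_resource, can_share_media):
    """Return 501 when the offer shares a media session that the server
    cannot mix, multiplex, or fork; otherwise return 200.

    mids_by_resource maps a resource name to the "mid" values named by
    the "cmid" attributes of its control m-line.
    """
    users = {}
    for resource, mids in mids_by_resource.items():
        for mid in set(mids):
            users.setdefault(mid, []).append(resource)
    shared = any(len(resources) > 1 for resources in users.values())
    return 501 if shared and not can_share_media else 200
```

   A server that cannot share media would thus reject an offer in which
   a synthesizer and a recognizer both point at the same audio m-line,
   but would accept one in which each has its own audio m-line.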

4.4.  MRCPv2 Message Transport

   The MRCPv2 messages defined in this document are transported over a
   TCP, TLS or SCTP (in the future) connection between the client and
   the server.  The method for setting up this transport connection and
   the resource control channel is discussed in Section 4.1 and
   Section 4.2.  Multiple resource control channels between a client and
   a server that belong to different SIP dialogs can share one or more
   TLS, TCP or SCTP connections between them; the server and client MUST
   support this mode of operation.  The individual MRCPv2 messages carry
   the MRCPv2 channel identifier in their Channel-Identifier header,
   which MUST be used to differentiate MRCPv2 messages from different
   resource channels (see Section 6.2.1 for details).  All MRCPv2
   servers MUST support TLS.  Servers MAY support TCP without TLS in
   physically secure environments.  It is up to the client to choose
   which mode of transport it wants to use for an MRCPv2 session.
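
   To illustrate how messages on a shared connection can be told apart,
   the Python sketch below routes each message by its Channel-Identifier
   header.  The function names are hypothetical.  The sketch assumes
   that each message has already been framed into one string and that
   header names compare case-insensitively.

```python
def channel_of(message_text):
    """Return the Channel-Identifier header value of one MRCPv2 message."""
    header_part = message_text.split("\r\n\r\n", 1)[0]
    for line in header_part.split("\r\n")[1:]:  # skip the start-line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "channel-identifier":
            return value.strip()
    raise ValueError("message has no Channel-Identifier header")


def dispatch(messages, handlers):
    """Hand each message to the handler registered for its channel."""
    for message in messages:
        handlers[channel_of(message)](message)
```

   Here handlers is a dictionary keyed by the full channel identifier,
   for example "32AECB234338@speechsynth", so that a synthesizer and a
   recognizer sharing one TCP connection each see only their own
   messages.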

   Most examples from here on show only the MRCPv2 messages and do not
   show the SIP messages and headers that may have been used to
   establish the MRCPv2 control channel.


5.  MRCPv2 Specification

   MRCPv2 messages are textual using the ISO 10646 character set in the
   UTF-8 encoding (RFC3629 [RFC3629]) to allow many different languages
   to be represented.  However, to assist in compact representations,
   MRCPv2 also allows other character sets such as ISO 8859-1 to be used
   when desired.  The MRCPv2 protocol headers (the first line of an MRCP
   message) and header names use only the US-ASCII subset of UTF-8.
   Internationalization only applies to certain fields like grammar,
   results, speech markup, etc., and not to MRCPv2 as a whole.

   Lines are terminated by CRLF.  Also, some parameters in the message
   may contain binary data or a record spanning multiple lines.  Such
   fields have a length value associated with the parameter, which
   indicates the number of octets immediately following the parameter.

5.1.  Common Protocol Elements

   The MRCPv2 message set consists of requests from the client to the
   server, responses from the server to the client and asynchronous
   events from the server to the client.  All these messages consist of
   a start-line, one or more headers, an empty line (i.e. a line with
   nothing preceding the CRLF) indicating the end of the header fields,
   and an optional message body.

   generic-message  =    start-line
                         message-header
                         CRLF
                         [ message-body ]

   start-line       =    request-line / response-line / event-line

   message-header   =    1*(generic-header / resource-header)

   resource-header  =    recognizer-header
                    /    synthesizer-header
                    /    recorder-header
                    /    verifier-header


   The message-body contains resource-specific and message-specific
   data.  The actual Media Types used to carry the data are specified
   later in the sections defining the individual messages.

   If a message contains a message body, the message MUST contain
   content-headers indicating the Media Type and encoding of the data in
   the message body.

   Request, response and event messages include the version of MRCP that
   the message conforms to.  Version compatibility rules follow [H3.1]
   regarding version ordering, compliance requirements, and upgrading of
   version numbers.  The version information is indicated by "MRCP" (as
   opposed to "HTTP" in [H3.1]) or "MRCP/2.0" (as opposed to "HTTP/1.1"
   in [H3.1]).  To be compliant with this specification, clients and
   servers sending MRCPv2 messages MUST indicate an mrcp-version of
   "MRCP/2.0".

   mrcp-version   =    "MRCP" "/" 1*2DIGIT "." 1*2DIGIT

   The message-length field specifies the length of the message,
   including the start-line, and MUST be the second token from the
   beginning of the message.  This makes framing and parsing of the
   message simpler.  The length includes any data that may be encoded
   into the body of the message.  Note that this value MAY be printed
   as a fixed-length integer that is zero-padded in front in order to
   eliminate or reduce inefficiency in cases where the message-length
   value would change as a result of the length of the message-length
   token itself.

   message-length =    1*19DIGIT
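
   Because the message-length token counts its own digits, a sender has
   to resolve a small circular dependency.  The Python sketch below
   shows one way to do so without zero-padding, by iterating until the
   value is stable.  The function names are hypothetical, headers are
   written as "name:value", and the length is assumed to be counted in
   octets of the encoded message.

```python
MRCP_VERSION = b"MRCP/2.0"


def frame_request(method, request_id, headers, body=b""):
    """Build a request whose message-length covers the whole message.

    headers is a list of (name, value) string pairs; body is bytes.
    Assumption: the length is counted in octets of the encoded message.
    """
    tail = " %s %d\r\n" % (method, request_id)
    for name, value in headers:
        tail += "%s:%s\r\n" % (name, value)
    tail = tail.encode("utf-8") + b"\r\n" + body
    # Octets that do not depend on the length token itself.
    fixed = len(MRCP_VERSION) + 1 + len(tail)
    # The token counts its own digits, so iterate to a fixed point.
    length = fixed + 1
    while fixed + len(str(length)) != length:
        length = fixed + len(str(length))
    return MRCP_VERSION + b" " + str(length).encode("ascii") + tail


def declared_length(message):
    """Return the message-length token, the second token of the message."""
    return int(message.split(b" ", 2)[1])
```

   For any message built this way, declared_length(message) equals
   len(message).  The zero-padded form permitted above avoids the
   iteration at the cost of a few extra octets per message.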

   All MRCPv2 requests, responses and events MUST carry the Channel-
   Identifier header so the server or client can differentiate messages
   from different control channels that may share the same transport
   connection.

   In the resource-specific header descriptions in sections 8-11, a
   header is disallowed on a method (request, response, or event) for
   that resource unless specifically listed as being allowed.  Also, the
   phrasing "This header MAY occur on method X" indicates that the
   header is allowed on that method but is not required to be used in
   every instance of that method.

5.2.  Request

   An MRCPv2 request consists of a Request line followed by message
   headers and an optional message body containing data specific to the
   request message.

   The Request message from a client to the server includes within the
   first line the method to be applied, a method tag for that request
   and the version of the protocol in use.

   request-line   =    mrcp-version SP message-length SP method-name
                       SP request-id CRLF

   The request-id field is a unique identifier representable as an
   unsigned 32-bit integer created by the client and sent to the server.
   Consecutive requests within an MRCP session MUST utilize
   monotonically increasing request-ids.  The request-id space is
   linear (i.e., not modulo 2^32), so the space does not wrap and
   validity can be checked with a simple unsigned comparison operation.
   The client may choose any initial value for its first request, but a
   small integer is RECOMMENDED to avoid exhausting the space in long
   sessions.  If the server receives duplicate or out-of-order requests,
   the server MUST reject the request with a response code of 410.
   Since request-ids are scoped to the MRCP session, they are unique
   across all TCP connections and all resource channels in the session.

   The server resource MUST use the client-assigned identifier in its
   response to the request.  If the request does not complete
   synchronously, future asynchronous events associated with this
   request MUST carry the client-assigned request-id.
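
   The ordering rule for request-ids can be checked on the server with
   a single comparison per request, as in the Python sketch below.  The
   class name is hypothetical, and returning None for an accepted
   request is a convention of this sketch rather than of the protocol.

```python
MAX_REQUEST_ID = 2**32 - 1  # largest unsigned 32-bit value


class RequestIdChecker:
    """Checks that request-ids only ever increase within one session."""

    def __init__(self):
        self.last_seen = None

    def status_for(self, request_id):
        """Return 410 for a duplicate or out-of-order request-id.

        Returns None when the request-id is acceptable, and remembers
        it as the most recent one seen in the session.
        """
        if not 0 <= request_id <= MAX_REQUEST_ID:
            raise ValueError("request-id must fit in 32 unsigned bits")
        if self.last_seen is not None and request_id <= self.last_seen:
            return 410
        self.last_seen = request_id
        return None
```

   One checker instance would be kept per MRCP session, since the
   request-id is scoped to the session rather than to a connection or a
   channel.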

   The mrcp-version field is the MRCP protocol version that is being
   used by the client.

   The message-length field specifies the length of the message,
   including the start-line.

   request-id     =    1*19DIGIT
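
   Because the message-length field counts the start-line that contains
   it, a sender cannot simply measure the remainder of the message.
   The following non-normative sketch shows one possible way to compute
   the value by iterating until it is stable.  The function name
   build_request is hypothetical; the sketch assumes the "MRCP/2.0"
   version token used in the examples of this document and leaves any
   Content-Length header to the caller.

```python
def build_request(method: str, request_id: int, headers: dict,
                  body: bytes = b"") -> bytes:
    """Serialize a request whose message-length covers the whole message."""
    header_block = "".join(f"{name}:{value}\r\n"
                           for name, value in headers.items()).encode("utf-8")
    rest = header_block + b"\r\n" + body  # blank line ends the header section

    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            # The length written in the start-line matches the real size.
            return start_line + rest
        length = total  # the digit count may have grown; try again
```

   The loop terminates because the candidate length never decreases and
   can grow only when the number of digits in the length grows.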



   The method-name field identifies the specific request that the client
   is making to the server.  Each resource supports a subset of the
   MRCPv2 methods.  The subset for each resource is defined in the
   section of the specification for the corresponding resource.

   method-name    =    generic-method
                  /    synthesizer-method
                  /    recorder-method
                  /    recognizer-method
                  /    verifier-method

5.3.  Response

   After receiving and interpreting the request message for a method,
   the server resource responds with an MRCPv2 response message.  The
   response consists of a response line followed by message headers and
   an optional message body containing data specific to the method.

   response-line  =    mrcp-version SP message-length SP request-id
                                    SP status-code SP request-state CRLF

   The mrcp-version field MUST contain the version of the request if
   supported; otherwise, it MUST contain the highest version of the
   MRCPv2 protocol supported by the server.

   The message-length field specifies the length of the message,
   including the start-line.

   The request-id used in the response MUST match the one sent in the
   corresponding request message.

   The status-code field is a 3-digit code representing the success or
   failure or other status of the request.

   The request-state field indicates if the action initiated by the
   Request is PENDING, IN-PROGRESS or COMPLETE.  The COMPLETE status
   means that the Request was processed to completion and that there
   will be no more events or other messages from that resource to the
   client with that request-id.  The PENDING status means that the
   request has been placed on a queue and will be processed in first-in-
   first-out order.  The IN-PROGRESS status means that the request is
   being processed and is not yet complete.  A PENDING or IN-PROGRESS
   status indicates that further Event messages may be delivered with
   that request-id.

   request-state    =  "COMPLETE"
                    /  "IN-PROGRESS"
                    /  "PENDING"
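
   As a non-normative illustration, a client could parse the response
   line along the following lines.  The names ResponseLine and
   parse_response_line are hypothetical, and the pattern used for the
   mrcp-version token is an assumption, since that rule is defined
   elsewhere in this document.

```python
import re
from typing import NamedTuple

# version SP message-length SP request-id SP status-code SP request-state CRLF
RESPONSE_LINE = re.compile(
    r"(MRCP/[0-9]+\.[0-9]+) ([0-9]+) ([0-9]+) ([0-9]{3}) "
    r"(COMPLETE|IN-PROGRESS|PENDING)\r\n"
)


class ResponseLine(NamedTuple):
    version: str
    message_length: int
    request_id: int
    status_code: int
    request_state: str


def parse_response_line(line: str) -> ResponseLine:
    """Split a response line into its five fields or raise ValueError."""
    match = RESPONSE_LINE.fullmatch(line)
    if match is None:
        raise ValueError("not a valid response-line")
    version, length, request_id, status, state = match.groups()
    return ResponseLine(version, int(length), int(request_id), int(status), state)


def more_events_expected(response: ResponseLine) -> bool:
    """PENDING and IN-PROGRESS mean further events may carry this request-id."""
    return response.request_state != "COMPLETE"
```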



5.4.  Status Codes

   The status codes are classified under the Success (2xx), Client
   Failure (4xx), and Server Failure (5xx) codes.

                               Success Codes

        +------------+--------------------------------------------+
        | Code       | Meaning                                    |
        +------------+--------------------------------------------+
        | 200        | Success                                    |
        | 201        | Success with some optional headers ignored |
        +------------+--------------------------------------------+

                                Success 2xx

                         Client Failure 4xx Codes

   +------------+------------------------------------------------------+
   | Code       | Meaning                                              |
   +------------+------------------------------------------------------+
   | 401        | Method not allowed                                   |
   | 402        | Method not valid in this state                       |
   | 403        | Unsupported Header                                   |
   | 404        | Illegal Value for Header.  This is the error for a   |
   |            | syntax violation.                                    |
   | 405        | Resource not allocated for this session or does not  |
   |            | exist                                                |
   | 406        | Mandatory Header Missing                             |
   | 407        | Method or Operation Failed (e.g., Grammar            |
   |            | compilation failed in the recognizer.  Detailed      |
   |            | cause codes may be available through a resource      |
   |            | specific header.)                                    |
   | 408        | Unrecognized or unsupported message entity           |
   | 409        | Unsupported Header Value.  This is a value that is   |
   |            | syntactically legal but exceeds the implementation's |
   |            | capabilities or expectations.                        |
   | 410        | Non-Monotonic or Out of order sequence number in     |
   |            | request.                                             |
   | 411-420    | Reserved                                             |
   +------------+------------------------------------------------------+

                            Client Failure 4xx

                         Server Failure 5xx Codes

   +------------+------------------------------------------------------+
   | Code       | Meaning                                              |
   +------------+------------------------------------------------------+
   | 501        | Server Internal Error                                |
   | 502        | Protocol Version not supported                       |
   | 503        | Proxy Timeout.  The MRCP Proxy did not receive a     |
   |            | response from the MRCP server.                       |
   | 504        | Message too large                                    |
   +------------+------------------------------------------------------+

                            Server Failure 5xx

5.5.  Events

   The server resource may need to communicate a change in state or the
   occurrence of a certain event to the client.  These messages are used
   when a request does not complete immediately and the response returns
   a status of PENDING or IN-PROGRESS.  The intermediate results and
   events of the request are indicated to the client through the event
   message from the server.  The event message consists of an event
   header line followed by message headers and an optional message body
   containing data specific to the event message.  The header line has
   the request-id of the corresponding request and status value.  The
   status value is COMPLETE if the request is done and this was the last
   event, else it is IN-PROGRESS.

   event-line       =  mrcp-version SP message-length SP event-name
                                    SP request-id SP request-state CRLF

   The mrcp-version used here is identical to the one used in the
   Request/Response Line and indicates the version of the MRCPv2
   protocol running on the server.

   The message-length field specifies the length of the message,
   including the start-line.

   The request-id used in the event MUST match the one sent in the
   request that caused this event.

   The request-state indicates whether the Request/Command causing this
   event is complete or still in progress, and is the same as the one
   mentioned in Section 5.3.  The final event for a request has a
   COMPLETE status indicating the completion of the request.

   The event-name identifies the nature of the event generated by the
   media resource.  The set of valid event names depends on the resource
   generating it.  See the corresponding resource-specific section of
   the document.

   event-name       =  synthesizer-event
                    /  recognizer-event
                    /  recorder-event
                    /  verifier-event
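
   The relationship between responses, events, and the request-state
   can be sketched (non-normatively) as a small client-side tracker.
   The class name RequestTracker and its methods are hypothetical; the
   sketch assumes the event line has already been separated from the
   rest of the message.

```python
class RequestTracker:
    """Remembers which request-ids may still produce events."""

    def __init__(self) -> None:
        self.active = {}  # request-id -> last reported request-state

    def on_response(self, request_id: int, request_state: str) -> None:
        if request_state == "COMPLETE":
            self.active.pop(request_id, None)  # no events will follow
        else:
            self.active[request_id] = request_state  # PENDING or IN-PROGRESS

    def on_event(self, event_line: str) -> str:
        """Update state from an event line and return the event-name."""
        # version SP message-length SP event-name SP request-id SP request-state
        _version, _length, event_name, request_id, state = (
            event_line.rstrip("\r\n").split(" ")
        )
        rid = int(request_id)
        if rid not in self.active:
            raise KeyError(f"event for unknown or completed request {rid}")
        if state == "COMPLETE":
            del self.active[rid]  # this was the final event for the request
        else:
            self.active[rid] = state
        return event_name
```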


6.  MRCPv2 Generic Methods, Headers, and Result Structure

   MRCPv2 supports a set of methods and headers that are common to all
   resources.  These are discussed here; resource-specific methods and
   headers are discussed in the corresponding resource-specific section
   of the document.

6.1.  Generic Methods

   MRCPv2 supports two generic methods for reading and writing the state
   associated with a resource.

   generic-method      =    "SET-PARAMS"
                       /    "GET-PARAMS"

   These are described in the following sub-sections.

6.1.1.  SET-PARAMS

   The "SET-PARAMS" method, from the client to the server, tells the
   MRCPv2 resource to define parameters for the session, such as voice
   characteristics and prosody on synthesizers, recognition timers on
   recognizers, etc.  If the server accepts and sets all parameters it
   MUST return a Response-Status of 200.  If it chooses to ignore some
   optional headers that can be safely ignored without affecting
   operation of the server it MUST return 201.

   If one or more of the headers being sent is incorrect, error 403,
   404, or 409 MUST be returned as follows:
   o  If one or more of the headers being set has an illegal value, the
      server MUST reject the request with a 404 Illegal Value for
      Header.
   o  If one or more of the headers being set is unsupported for the
      resource, the server MUST reject the request with a 403
      Unsupported Header, except as described in the next paragraph.
   o  If one or more of the headers being set has an unsupported value,
      the server MUST reject the request with a 409 Unsupported Header
      Value, except as described in the next paragraph.




   If both error 404 and another error have occurred, only error 404
   MUST be returned.  If both errors 403 and 409 have occurred, but not
   error 404, only error 403 MUST be returned.
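
   The precedence among these status codes can be summarized by the
   following non-normative sketch.  The function name set_params_status
   and its arguments are hypothetical; the caller is assumed to have
   already classified each offending header as 403, 404, or 409.

```python
def set_params_status(errors: set, ignored_optional: bool = False) -> int:
    """Pick the single status code for a SET-PARAMS response.

    errors holds the error codes detected across all headers in the request.
    """
    if 404 in errors:  # illegal value outranks every other error
        return 404
    if 403 in errors:  # unsupported header outranks unsupported value
        return 403
    if 409 in errors:
        return 409
    # No errors: 201 if some optional headers were safely ignored, else 200.
    return 201 if ignored_optional else 200
```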

   If error 403, 404, or 409 is returned, the response MUST include the
   bad or unsupported headers and their values exactly as they were sent
   from the client.  Session parameters modified using "SET-PARAMS" do
   not override parameters explicitly specified on individual requests
   or requests that are IN-PROGRESS.

   C->S:  MRCP/2.0 124 SET-PARAMS 543256
          Channel-Identifier:32AECB23433802@speechsynth
          Voice-gender:female
          Voice-variant:3

   S->C:  MRCP/2.0 47 543256 200 COMPLETE
          Channel-Identifier:32AECB23433802@speechsynth

6.1.2.  GET-PARAMS

   The "GET-PARAMS" method, from the client to the server, asks the
   MRCPv2 resource for its current session parameters, such as voice
   characteristics and prosody on synthesizers, recognition timers on
   recognizers, etc.  For every empty header field the client sends in
   the request, the server MUST include the corresponding headers and
   their values in the response.  If no parameter headers are specified
   by the client, then the server MUST return all the settable
   parameters and their values in the corresponding headers of the
   response, including vendor-specific parameters.  Such wildcard
   parameter requests can be very processing-intensive, since the number
   of settable parameters can be large depending on the implementation.
   Hence, it is RECOMMENDED that the client not use the wildcard
   "GET-PARAMS" operation very often.  Note that "GET-PARAMS" returns
   header values that apply to the whole session and not values that
   have a request-level scope.

   If all of the headers requested are supported, the server MUST return
   a Response-Status of 200.  If some of the headers being retrieved are
   unsupported for the resource, the server MUST reject the request with
   a 403 Unsupported Header.  Such a response MUST include the (empty)
   unsupported headers exactly as they were sent from the client.

   C->S:   MRCP/2.0 136 GET-PARAMS 543256
           Channel-Identifier:32AECB23433802@speechsynth
           Voice-gender:
           Voice-variant:
           Vendor-Specific-Parameters:com.example.param1;
                         com.example.param2

   S->C:   MRCP/2.0 163 543256 200 COMPLETE
           Channel-Identifier:32AECB23433802@speechsynth
           Voice-gender:female
           Voice-variant:3
           Vendor-Specific-Parameters:com.example.param1="Company Name";
                         com.example.param2="124324234@example.com"
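
   A server-side handler for the rules in this section might look like
   the following non-normative sketch.  The function name
   handle_get_params is hypothetical, the session_params mapping is
   assumed to contain every header the resource supports, and the
   special handling of Vendor-Specific-Parameters is omitted.

```python
def handle_get_params(requested: list, session_params: dict):
    """Return (status-code, response headers) for a GET-PARAMS request.

    requested lists the header names that were sent with empty values.
    """
    if not requested:
        # Wildcard request: return every settable parameter.
        return 200, dict(session_params)

    # Header names are case-insensitive, so compare in lower case.
    known = {name.lower(): name for name in session_params}
    unsupported = [name for name in requested if name.lower() not in known]
    if unsupported:
        # Echo the unsupported headers, still empty, as the client sent them.
        return 403, {name: "" for name in unsupported}

    return 200, {known[name.lower()]: session_params[known[name.lower()]]
                 for name in requested}
```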

6.2.  Generic Message Headers

   All MRCPv2 headers, which include both the generic-headers defined in
   the following sub-sections and the resource-specific headers defined
   later, follow the same generic format as that given in Section 3.1 of
   RFC2822 [RFC2822].  Each header consists of a name followed by a
   colon (":") and the value.  Header names are case-insensitive.  The
   value MAY be preceded by any amount of LWS, though a single SP is
   preferred.  Headers may extend over multiple lines by preceding each
   extra line with at least one SP or HT.

   message-header = field-name ":" [ field-value ]
   field-name     = token
   field-value    = *LWS field-content *( CRLF 1*LWS field-content)
   field-content  = <the OCTETs making up the field-value
                       and consisting of either *TEXT or combinations
                       of token, separators, and quoted-string>

   The field-content does not include any leading or trailing LWS (i.e.
   linear white space occurring before the first non-whitespace
   character of the field-value or after the last non-whitespace
   character of the field-value).  Such leading or trailing LWS MAY be
   removed without changing the semantics of the field value.  Any LWS
   that occurs between field-content MAY be replaced with a single SP
   before interpreting the field value or forwarding the message
   downstream.
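
   The unfolding and whitespace rules above can be illustrated with the
   following non-normative sketch.  The function name parse_headers is
   hypothetical, and the input is assumed to be the header section of a
   message with the start-line and body already removed.

```python
import re


def parse_headers(block: str) -> list:
    """Return (lower-cased name, value) pairs in the order received."""
    # A CRLF followed by SP or HT continues the previous header line;
    # that run of LWS may be replaced by a single SP.
    unfolded = re.sub(r"\r\n[ \t]+", " ", block)

    headers = []
    for line in unfolded.split("\r\n"):
        if not line:
            continue  # skip the empty piece after the final CRLF
        name, _, value = line.partition(":")
        # Leading and trailing LWS is not part of the field-content.
        headers.append((name.lower(), value.strip(" \t")))
    return headers
```

   An empty field-value, as used in "GET-PARAMS" requests, is returned
   as an empty string.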

   MRCPv2 servers and clients MUST NOT depend on header order.  It is
   "good practice" to send general-header fields first, followed by
   request-header or response-header fields, and ending with the entity-
   header fields.  However, MRCPv2 servers and clients MUST be prepared
   to process the headers in any order.  The only exception to this rule
   is when there are multiple headers with the same header name in a
   message.



   Multiple headers with the same name MAY be present in a message if
   and only if the entire value for that header is defined as a comma-
   separated list [i.e., #(values)].

   It MUST be possible to combine the multiple headers of the same name
   into one "header:value" pair without changing the semantics of the
   message, by appending each subsequent value to the first, each
   separated by a comma.  The order in which headers with the same name
   are received is therefore significant to the interpretation of the
   combined header value, and thus an intermediary MUST NOT change the
   order of these values when a message is forwarded.
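
   The combining rule can be sketched (non-normatively) as follows.
   The function name combine_headers is hypothetical; its input is
   assumed to be a list of (name, value) pairs in the order received.

```python
def combine_headers(pairs: list) -> dict:
    """Merge repeated headers into one comma-separated value per name."""
    combined = {}
    for name, value in pairs:
        key = name.lower()  # header names are case-insensitive
        if key in combined:
            # Append in arrival order so list semantics are preserved.
            combined[key] = combined[key] + "," + value
        else:
            combined[key] = value
    return combined
```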

   generic-header      =    channel-identifier
                       /    accept
                       /    active-request-id-list
                       /    proxy-sync-id
                       /    accept-charset
                       /    content-type
                       /    content-id
                       /    content-base
                       /    content-encoding
                       /    content-location
                       /    content-length
                       /    fetch-timeout
                       /    cache-control
                       /    logging-tag
                       /    set-cookie
                       /    set-cookie2
                       /    vendor-specific

6.2.1.  Channel-Identifier

   All MRCPv2 requests, responses and events MUST contain the Channel-
   Identifier header.  The value is allocated by the server when a
   control channel is added to the session and communicated to the
   client by the "a=channel" attribute in the SDP answer from the
   server.  The header value consists of two parts separated by the '@'
   symbol.  The first part is an unambiguous string identifying the
   MRCPv2 session.  The second part is a string token which specifies
   one of the media processing resource types listed in Section 3.1.
   The unambiguous string (first part) MUST be unique among the resource
   instances managed by the server and is common to all resource
   channels with that server established through a single SIP dialog.

   channel-identifier  = "Channel-Identifier" ":" channel-id CRLF
   channel-id          = 1*alphanum "@" 1*alphanum
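
   For illustration only, the two parts of the header value could be
   separated as shown below.  The function name parse_channel_id is
   hypothetical, and alphanum is assumed to mean ASCII letters and
   digits.

```python
import re

# channel-id = 1*alphanum "@" 1*alphanum
CHANNEL_ID = re.compile(r"([A-Za-z0-9]+)@([A-Za-z0-9]+)")


def parse_channel_id(value: str):
    """Return (session part, resource type) or raise ValueError."""
    match = CHANNEL_ID.fullmatch(value.strip())
    if match is None:
        raise ValueError("malformed Channel-Identifier value")
    return match.group(1), match.group(2)
```

   Applied to the value used in the examples of this document,
   "32AECB23433802@speechsynth", the sketch yields the session part
   "32AECB23433802" and the resource type "speechsynth".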





6.2.2.  Accept

   The Accept header field follows the syntax defined in [H14.1].  The
   semantics are also identical, with the exception that if no Accept
   header field is present, the server MUST assume a default value that
   is specific to the resource type that is being controlled.  This
   default value can be changed for a resource on a session by sending
   this header in a SET-PARAMS method.  The current default value of
   this header for a resource in a session can be found through a GET-
   PARAMS method.

6.2.3.  Active-Request-Id-List

   In a request, this header indicates the list of request-ids to which
   the request applies.  This is useful when there are multiple requests
   that are PENDING or IN-PROGRESS and the client wants this request to
   apply to one or more of these specifically.

   In a response, this header returns the list of request-ids that the
   method modified or affected.  There could be one or more requests in
   a request-state of PENDING or IN-PROGRESS.  When a method affecting
   one or more PENDING or IN-PROGRESS requests is sent from the client
   to the server, the response MUST contain the list of request-ids that
   were affected or modified by this command in its header.

   The active-request-id-list is only used in requests and responses,
   not in events.

   For example, if a "STOP" request with no active-request-id-list is
   sent to a synthesizer resource which has one or more "SPEAK" requests
   in the PENDING or IN-PROGRESS state, all "SPEAK" requests MUST be
   cancelled, including the one IN-PROGRESS.  The response to the "STOP"
   request contains in the active-request-id-list the request-ids of all
   the "SPEAK" requests that were terminated.  After sending the STOP
   response, the server MUST NOT send any SPEAK-COMPLETE or RECOGNITION-
   COMPLETE events for the terminated requests.

   active-request-id-list  =  "Active-Request-Id-List" ":"
                              request-id *("," request-id) CRLF
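
   The following non-normative sketch shows how the header value might
   be parsed and how a "STOP" request could select the requests it
   affects.  The function names are hypothetical.  Ignoring listed
   request-ids that are not currently PENDING or IN-PROGRESS is an
   assumption of this sketch, not a rule stated above.

```python
from typing import Optional


def parse_active_request_id_list(value: str) -> list:
    """Parse 'request-id *("," request-id)' into a list of integers."""
    return [int(token) for token in value.split(",")]


def stop_targets(active: list, requested: Optional[list] = None) -> list:
    """Return the request-ids that a STOP request terminates.

    active is the list of PENDING or IN-PROGRESS request-ids, oldest first.
    """
    if requested is None:
        # No Active-Request-Id-List: every outstanding request is cancelled.
        return list(active)
    return [rid for rid in active if rid in requested]
```

   The list returned by stop_targets is what the server would place in
   the Active-Request-Id-List header of the "STOP" response.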

6.2.4.  Proxy-Sync-Id

   When any server resource generates a barge-in-able event, it also
   generates a unique tag.  The tag is sent as this header's value in an
   event to the client.  The client then acts as an intermediary among
   the server resources and sends a BARGE-IN-OCCURRED method to the
   synthesizer server resource with the Proxy-Sync-Id it received from
   the server resource.  When the recognizer and synthesizer resources
   are part of the same session, they may choose to work together to
   achieve quicker interaction and response.  Here the proxy-sync-id
   helps the resource receiving the event, intermediated by the client,
   to decide if this event has been processed through a direct
   interaction of the resources.

   proxy-sync-id    =  "Proxy-Sync-Id" ":" 1*VCHAR CRLF

6.2.5.  Accept-Charset

   See [H14.2].  This specifies the acceptable character set for
   entities returned in the response or events associated with this
   request.  This is useful in specifying the character set to use in
   the NLSML results of a "RECOGNITION-COMPLETE" event.

6.2.6.  Content-Type

   See [H14.17].  MRCPv2 supports a restricted set of registered Media
   Types for content, including speech markup, grammar, and recognition
   results.  The content types applicable to each MRCPv2 resource-type
   are specified in the corresponding section of the document.  The
   multipart content type "multipart/mixed" is supported to communicate
   more than one of the above-mentioned contents, in which case the body
   parts MUST NOT contain any MRCPv2-specific headers.

6.2.7.  Content-ID

   This header contains an ID or name for the content by which it can be
   referenced.  This header operates according to the specification in
   RFC2392 [RFC2392] and is required for content disambiguation in
   multipart messages.  In MRCPv2, whenever the associated content is
   stored, by either the client or the server, it MUST be retrievable
   using this ID.  Such content can be referenced later in a session by
   addressing it with the "session:" URI scheme described in
   Section 13.6.

6.2.8.  Content-Base

   The content-base entity-header may be used to specify the base URI
   for resolving relative URIs within the entity.

   content-base      = "Content-Base" ":" absoluteURI CRLF

   Note, however, that the base URI of the contents within the entity-
   body may be redefined within that entity-body.  An example of this
   would be multipart media, which in turn can have multiple entities
   within it.




6.2.9.  Content-Encoding

   The content-encoding entity-header is used as a modifier to the
   media-type.  When present, its value indicates what additional
   content encoding has been applied to the entity-body, and thus what
   decoding mechanisms must be applied in order to obtain the media-type
   referenced by the content-type header.  Content-encoding is primarily
   used to allow a document to be compressed without losing the identity
   of its underlying media type.  Note that the SDP session can be used
   to determine accepted encodings (see Section 7).

   content-encoding  = "Content-Encoding" ":"
                       *WSP content-coding
                       *(*WSP "," *WSP content-coding *WSP )
                       CRLF


   Content-coding is defined in [H3.5].  An example of its use is:

      Content-Encoding:gzip

   If multiple encodings have been applied to an entity, the content
   encodings MUST be listed in the order in which they were applied.
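
   Since the encodings are listed in the order applied, a receiver
   undoes them in reverse order.  The following non-normative sketch
   illustrates this.  The function name decode_body and the set of
   supported codings are assumptions of the sketch; "deflate" is taken
   to mean the zlib format, as in HTTP/1.1.

```python
import gzip
import zlib

# Content-codings this sketch knows how to reverse.
DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
    "identity": lambda data: data,
}


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo every listed content-coding, last applied first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding not in DECODERS:
            raise ValueError(f"unsupported content-coding: {coding}")
        body = DECODERS[coding](body)
    return body
```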

6.2.10.  Content-Location

   The content-location entity-header MAY be used to supply the resource
   location for the entity enclosed in the message when that entity is
   accessible from a location separate from the requested resource's
   URI.  Refer to [H14.14].

   content-location  =  "Content-Location" ":"
                        ( absoluteURI / relativeURI ) CRLF


   The content-location value is a statement of the location of the
   resource corresponding to this particular entity at the time of the
   request.  This header is provided for optimization purposes only.
   The receiver of this header MAY assume that the entity being sent is
   identical to what would have been retrieved or might already have
   been retrieved from the content-location URI.

   For example, if the client provided a grammar markup inline, and it
   had previously retrieved it from a certain URI, that URI can be
   provided as part of the entity, using the content-location header.
   This allows a resource like the recognizer to look into its cache to
   see if this grammar was previously retrieved, compiled and cached.
   In this case, it might optimize by using the previously compiled
   grammar object.



   If the content-location is a relative URI, the relative URI is
   interpreted relative to the content-base URI.
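
   As a non-normative illustration, standard URI reference resolution
   can be used for this step.  The function name
   resolve_content_location and the example URIs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_content_location(content_base: str, content_location: str) -> str:
    """Resolve a Content-Location value against the Content-Base URI.

    An absolute Content-Location is returned unchanged.
    """
    return urljoin(content_base, content_location)


# A relative location is interpreted relative to the base.
resolved = resolve_content_location("http://www.example.com/grammars/",
                                    "menu.grxml")
```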

6.2.11.  Content-Length

   This header contains the length of the content of the message body
   (i.e. after the double CRLF following the last header field).  Unlike
   HTTP, it MUST be included in all messages that carry content beyond
   the header portion of the message.  If it is missing, a default value
   of zero is assumed.  Otherwise, it is interpreted according to
   [H14.13].  When a message having no use for a message body contains
   one, i.e. the Content-Length is non-zero, the receiver MAY ignore the
   content of the message body.

6.2.12.  Fetch-Timeout

   When the recognizer or synthesizer needs to fetch documents or other
   resources, this header controls the corresponding URI access
   properties.  This defines the timeout for content that the server may
   need to fetch over the network.  The value is interpreted to be in
   milliseconds and ranges from 0 to an implementation-specific maximum
   value.  The default value for this header is implementation-specific.
   This header MAY occur in "DEFINE-GRAMMAR", "RECOGNIZE", "SPEAK",
   "SET-PARAMS" or "GET-PARAMS".

   fetch-timeout       =   "Fetch-Timeout" ":" 1*19DIGIT CRLF

6.2.13.  Cache-Control

   If the server implements content caching, it MUST adhere to the cache
   correctness rules of HTTP 1.1 [RFC2616] when accessing and caching
   stored content.  In particular, the "expires" and "cache-control"
   headers of the cached URI or document MUST be honored and take
   precedence over the Cache-Control defaults set by this header.  The
   cache-control directives are used to define the default caching
   algorithms on the server for the session or request.  The scope of
   the directive is based on the method it is sent on.  If the
   directives are sent on a "SET-PARAMS" method, they apply to all
   requests for external documents the server makes during that session,
   unless overridden by a cache-control header on an individual request.
   If the directives are sent on any other request, they apply only to
   external document requests the server makes for that request.  An
   empty cache-control header on the "GET-PARAMS" method is a request
   for the server to return the current cache-control directives setting
   on the server.

   cache-control       = "Cache-Control" ":" cache-directive
                         *("," *LWS cache-directive) CRLF

   cache-directive     = "max-age" "=" delta-seconds
                       / "max-stale" [ "=" delta-seconds ]
                       / "min-fresh" "=" delta-seconds

   delta-seconds       = 1*19DIGIT


   Here delta-seconds is a decimal time value specifying the number of
   seconds since the instant the message response or data was received
   by the server.

   The cache-directives allow the client to ask the server to override
   the default cache expiration mechanisms.
   max-age        Indicates that the client can tolerate the server
                  using content whose age is no greater than the
                  specified time in seconds.  Unless a max-stale
                  directive is also included, the client is not willing
                  to accept a response based on stale data.
   min-fresh      Indicates that the client is willing to accept a
                  server response with cached data whose expiration is
                  no less than its current age plus the specified time
                  in seconds.  If the server's cache time to live
                  exceeds the client-supplied min-fresh value, the
                  server MUST NOT utilize cached content.
   max-stale      Indicates that the client is willing to allow a server
                  to utilize cached data that has exceeded its
                  expiration time.  If max-stale is assigned a value,
                  then the client is willing to allow the server to use
                  cached data that has exceeded its expiration time by
                  no more than the specified number of seconds.  If no
                  value is assigned to max-stale, then the client is
                  willing to allow the server to use stale data of any
                  age.

   The server cache MAY be requested to use stale response/data without
   validation, but only if this does not conflict with any "MUST"-level
   requirements concerning cache validation (e.g., a "must-revalidate"
   cache-control directive in the HTTP 1.1 specification pertaining to
   the corresponding URI).

   If both the MRCPv2 cache-control directive and the cached entry on
   the server include "max-age" directives, then the lesser of the two
   values is used for determining the freshness of the cached entry for
   that request.
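
   The header syntax and the "lesser of the two values" rule can be
   illustrated with the following non-normative sketch.  The function
   names parse_cache_control and effective_max_age are hypothetical;
   a max-stale directive without a value is represented as None.

```python
import re
from typing import Optional

DELTA_SECONDS = re.compile(r"[0-9]{1,19}")  # delta-seconds = 1*19DIGIT
NEEDS_VALUE = ("max-age", "min-fresh")


def parse_cache_control(value: str) -> dict:
    """Parse the Cache-Control value into {directive: seconds or None}."""
    directives = {}
    for item in value.split(","):
        name, sep, arg = item.strip().partition("=")
        name = name.strip().lower()
        arg = arg.strip()
        if name not in NEEDS_VALUE and name != "max-stale":
            raise ValueError(f"unknown cache-directive: {name!r}")
        if sep:
            if DELTA_SECONDS.fullmatch(arg) is None:
                raise ValueError(f"bad delta-seconds in {item.strip()!r}")
            directives[name] = int(arg)
        elif name == "max-stale":
            directives[name] = None  # stale data of any age is acceptable
        else:
            raise ValueError(f"{name} requires a value")
    return directives


def effective_max_age(request_max_age: Optional[int],
                      cached_max_age: Optional[int]) -> Optional[int]:
    """Use the lesser max-age when both sides supply one."""
    present = [v for v in (request_max_age, cached_max_age) if v is not None]
    return min(present) if present else None
```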




6.2.14.  Logging-Tag

   This header MAY be sent as part of a "SET-PARAMS"/"GET-PARAMS" method
   to set or retrieve the logging tag for logs generated by the server.
   Once set, the value persists until a new value is set or the session
   ends.  The MRCPv2 server MAY provide a mechanism to subset its output
   logs so that system administrators can examine or extract only the
   log file portion during which the logging tag was set to a certain
   value.

   It is RECOMMENDED that clients have some identifying information in
   the logging tag, so that one can determine which client request
   generated a given log message at the server.

   logging-tag    = "Logging-Tag" ":" 1*UTFCHAR CRLF

6.2.15.  Set-Cookie and Set-Cookie2

   Since the associated HTTP client on an MRCPv2 server fetches
   documents for processing on behalf of the MRCPv2 client, the cookie
   store in the HTTP client of the MRCPv2 server is treated as an
   extension of the cookie store in the HTTP client of the MRCPv2
   client.  This requires that the MRCPv2 client and server be able to
   synchronize their common cookie store as needed.  To enable the
   MRCPv2 client to push its stored cookies to the MRCPv2 server and get
   new cookies from the MRCPv2 server stored back to the MRCPv2 client,
   the set-cookie and set-cookie2 entity-header fields MAY be included
   in MRCPv2 requests to update the cookie store on a server and be
   returned in final MRCPv2 responses or events to subsequently update
   the client's own cookie store.  The stored cookies on the server
   persist for the duration of the MRCPv2 session and MUST be destroyed
   at the end of the session.  To ensure support for the type of cookie
   header dictated by the HTTP origin server, MRCPv2 clients and servers
   MUST support both the set-cookie and set-cookie2 entity header
   fields.

   set-cookie      =       "Set-Cookie:" cookies CRLF
   cookies         =       cookie *("," *LWS cookie)
   cookie          =       attribute "=" value *(";" cookie-av)
   cookie-av       =       "Comment" "=" value
                   /       "Domain" "=" value
                   /       "Max-Age" "=" value
                   /       "Path" "=" value
                   /       "Secure"
                   /       "Version" "=" 1*19DIGIT
                   /       "Age" "=" delta-seconds

   set-cookie2     =       "Set-Cookie2:" cookies2 CRLF
   cookies2        =       cookie2 *("," *LWS cookie2)
   cookie2         =       attribute "=" value *(";" cookie-av2)
   cookie-av2      =       "Comment" "=" value
                   /       "CommentURL" "=" DQUOTE uri DQUOTE
                   /       "Discard"
                   /       "Domain" "=" value
                   /       "Max-Age" "=" value
                   /       "Path" "=" value
                   /       "Port" [ "=" DQUOTE portlist DQUOTE ]
                   /       "Secure"
                   /       "Version" "=" 1*19DIGIT
                   /       "Age" "=" delta-seconds
   portlist        =       portnum *("," *LWS portnum)
   portnum         =       1*19DIGIT

   The set-cookie and set-cookie2 headers are specified in RFC2109
   [RFC2109] and RFC2965 [RFC2965], respectively.  The "Age" attribute
   is introduced in this specification to indicate the age of the cookie
   and is optional.  An MRCPv2 client or server MUST calculate the age
   of the cookie according to the age calculation rules in the HTTP/1.1
   specification [RFC2616] and append the "Age" attribute accordingly.

   The MRCPv2 client or server MUST supply defaults for the Domain and
   Path attributes if omitted by the HTTP origin server as specified in
   RFC2109 (set-cookie) and RFC2965 (set-cookie2).  Note that there is
   no