SPEECHSC                                                   S. Shanmugham
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                              D. Burnett
Expires: December 25, 2008                                         Voxeo
                                                           June 23, 2008
Media Resource Control Protocol Version 2 (MRCPv2)
draft-ietf-speechsc-mrcpv2-16
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 25, 2008.
Abstract
The MRCPv2 protocol allows client hosts to control media service
resources such as speech synthesizers, recognizers, verifiers and
identifiers residing in servers on the network. MRCPv2 is not a
"stand-alone" protocol - it relies on a session management protocol
such as the Session Initiation Protocol (SIP) to establish the MRCPv2
control session between the client and the server, and for rendezvous
and capability discovery. It also depends on SIP and SDP to
establish the media sessions and associated parameters between the
media source or sink and the media server. Once this is done, the
MRCPv2 protocol exchange operates over the control session
established above, allowing the client to control the media
Shanmugham & Burnett Expires December 25, 2008 [Page 1]
Internet-Draft MRCPv2 June 2008
processing resources on the speech resource server.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 8
2. Document Conventions . . . . . . . . . . . . . . . . . . . . 9
2.1. Definitions . . . . . . . . . . . . . . . . . . . . . . 9
2.2. State-Machine Diagrams . . . . . . . . . . . . . . . . . 9
3. Architecture . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1. MRCPv2 Media Resource Types . . . . . . . . . . . . . . 11
3.2. Server and Resource Addressing . . . . . . . . . . . . . 12
4. MRCPv2 Protocol Basics . . . . . . . . . . . . . . . . . . . 12
4.1. Connecting to the Server . . . . . . . . . . . . . . . . 13
4.2. Managing Resource Control Channels . . . . . . . . . . . 13
4.3. Media Streams and RTP Ports . . . . . . . . . . . . . . 20
4.4. MRCPv2 Message Transport . . . . . . . . . . . . . . . . 21
5. MRCPv2 Specification . . . . . . . . . . . . . . . . . . . . 22
5.1. Common Protocol Elements . . . . . . . . . . . . . . . . 22
5.2. Request . . . . . . . . . . . . . . . . . . . . . . . . 24
5.3. Response . . . . . . . . . . . . . . . . . . . . . . . . 25
5.4. Status Codes . . . . . . . . . . . . . . . . . . . . . . 26
5.5. Events . . . . . . . . . . . . . . . . . . . . . . . . . 27
6. MRCPv2 Generic Methods, Headers, and Result Structure . . . . 28
6.1. Generic Methods . . . . . . . . . . . . . . . . . . . . 28
6.1.1. SET-PARAMS . . . . . . . . . . . . . . . . . . . . . 28
6.1.2. GET-PARAMS . . . . . . . . . . . . . . . . . . . . . 29
6.2. Generic Message Headers . . . . . . . . . . . . . . . . 30
6.2.1. Channel-Identifier . . . . . . . . . . . . . . . . . 31
6.2.2. Accept . . . . . . . . . . . . . . . . . . . . . . . 32
6.2.3. Active-Request-Id-List . . . . . . . . . . . . . . . 32
6.2.4. Proxy-Sync-Id . . . . . . . . . . . . . . . . . . . 32
6.2.5. Accept-Charset . . . . . . . . . . . . . . . . . . . 33
6.2.6. Content-Type . . . . . . . . . . . . . . . . . . . . 33
6.2.7. Content-ID . . . . . . . . . . . . . . . . . . . . . 33
6.2.8. Content-Base . . . . . . . . . . . . . . . . . . . . 33
6.2.9. Content-Encoding . . . . . . . . . . . . . . . . . . 34
6.2.10. Content-Location . . . . . . . . . . . . . . . . . . 34
6.2.11. Content-Length . . . . . . . . . . . . . . . . . . . 35
6.2.12. Fetch Timeout . . . . . . . . . . . . . . . . . . . 35
6.2.13. Cache-Control . . . . . . . . . . . . . . . . . . . 35
6.2.14. Logging-Tag . . . . . . . . . . . . . . . . . . . . 37
6.2.15. Set-Cookie and Set-Cookie2 . . . . . . . . . . . . . 37
6.2.16. Vendor Specific Parameters . . . . . . . . . . . . . 39
6.3. Generic Result Structure . . . . . . . . . . . . . . . . 39
6.3.1. Natural Language Semantics Markup Language . . . . . 40
7. Resource Discovery . . . . . . . . . . . . . . . . . . . . . 41
8. Speech Synthesizer Resource . . . . . . . . . . . . . . . . . 43
8.1. Synthesizer State Machine . . . . . . . . . . . . . . . 43
8.2. Synthesizer Methods . . . . . . . . . . . . . . . . . . 44
8.3. Synthesizer Events . . . . . . . . . . . . . . . . . . . 44
8.4. Synthesizer Header Fields . . . . . . . . . . . . . . . 45
8.4.1. Jump-Size . . . . . . . . . . . . . . . . . . . . . 45
8.4.2. Kill-On-Barge-In . . . . . . . . . . . . . . . . . . 46
8.4.3. Speaker Profile . . . . . . . . . . . . . . . . . . 46
8.4.4. Completion Cause . . . . . . . . . . . . . . . . . . 47
8.4.5. Completion Reason . . . . . . . . . . . . . . . . . 47
8.4.6. Voice-Parameter . . . . . . . . . . . . . . . . . . 48
8.4.7. Prosody-Parameters . . . . . . . . . . . . . . . . . 48
8.4.8. Speech Marker . . . . . . . . . . . . . . . . . . . 49
8.4.9. Speech Language . . . . . . . . . . . . . . . . . . 50
8.4.10. Fetch Hint . . . . . . . . . . . . . . . . . . . . . 50
8.4.11. Audio Fetch Hint . . . . . . . . . . . . . . . . . . 50
8.4.12. Failed URI . . . . . . . . . . . . . . . . . . . . . 51
8.4.13. Failed URI Cause . . . . . . . . . . . . . . . . . . 51
8.4.14. Speak Restart . . . . . . . . . . . . . . . . . . . 51
8.4.15. Speak Length . . . . . . . . . . . . . . . . . . . . 51
8.4.16. Load-Lexicon . . . . . . . . . . . . . . . . . . . . 52
8.4.17. Lexicon-Search-Order . . . . . . . . . . . . . . . . 52
8.5. Synthesizer Message Body . . . . . . . . . . . . . . . . 52
8.5.1. Synthesizer Speech Data . . . . . . . . . . . . . . 52
8.5.2. Lexicon Data . . . . . . . . . . . . . . . . . . . . 55
8.6. SPEAK Method . . . . . . . . . . . . . . . . . . . . . . 56
8.7. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.8. BARGE-IN-OCCURED . . . . . . . . . . . . . . . . . . . . 59
8.9. PAUSE . . . . . . . . . . . . . . . . . . . . . . . . . 61
8.10. RESUME . . . . . . . . . . . . . . . . . . . . . . . . . 62
8.11. CONTROL . . . . . . . . . . . . . . . . . . . . . . . . 64
8.12. SPEAK-COMPLETE . . . . . . . . . . . . . . . . . . . . . 66
8.13. SPEECH-MARKER . . . . . . . . . . . . . . . . . . . . . 67
8.14. DEFINE-LEXICON . . . . . . . . . . . . . . . . . . . . . 69
9. Speech Recognizer Resource . . . . . . . . . . . . . . . . . 69
9.1. Recognizer State Machine . . . . . . . . . . . . . . . . 71
9.2. Recognizer Methods . . . . . . . . . . . . . . . . . . . 71
9.3. Recognizer Events . . . . . . . . . . . . . . . . . . . 72
9.4. Recognizer Header Fields . . . . . . . . . . . . . . . . 72
9.4.1. Confidence Threshold . . . . . . . . . . . . . . . . 74
9.4.2. Sensitivity Level . . . . . . . . . . . . . . . . . 74
9.4.3. Speed Vs Accuracy . . . . . . . . . . . . . . . . . 75
9.4.4. N Best List Length . . . . . . . . . . . . . . . . . 75
9.4.5. Input Type . . . . . . . . . . . . . . . . . . . . . 75
9.4.6. No Input Timeout . . . . . . . . . . . . . . . . . . 75
9.4.7. Recognition Timeout . . . . . . . . . . . . . . . . 76
9.4.8. Waveform URI . . . . . . . . . . . . . . . . . . . . 76
9.4.9. Media Type . . . . . . . . . . . . . . . . . . . . . 77
9.4.10. Input-Waveform-URI . . . . . . . . . . . . . . . . . 77
9.4.11. Completion Cause . . . . . . . . . . . . . . . . . . 77
9.4.12. Completion Reason . . . . . . . . . . . . . . . . . 79
9.4.13. Recognizer Context Block . . . . . . . . . . . . . . 79
9.4.14. Start Input Timers . . . . . . . . . . . . . . . . . 80
9.4.15. Speech Complete Timeout . . . . . . . . . . . . . . 80
9.4.16. Speech Incomplete Timeout . . . . . . . . . . . . . 81
9.4.17. DTMF Interdigit Timeout . . . . . . . . . . . . . . 81
9.4.18. DTMF Term Timeout . . . . . . . . . . . . . . . . . 82
9.4.19. DTMF-Term-Char . . . . . . . . . . . . . . . . . . . 82
9.4.20. Failed URI . . . . . . . . . . . . . . . . . . . . . 82
9.4.21. Failed URI Cause . . . . . . . . . . . . . . . . . . 82
9.4.22. Save Waveform . . . . . . . . . . . . . . . . . . . 83
9.4.23. New Audio Channel . . . . . . . . . . . . . . . . . 83
9.4.24. Speech-Language . . . . . . . . . . . . . . . . . . 83
9.4.25. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 83
9.4.26. Recognition-Mode . . . . . . . . . . . . . . . . . . 84
9.4.27. Cancel-If-Queue . . . . . . . . . . . . . . . . . . 84
9.4.28. Hotword-Max-Duration . . . . . . . . . . . . . . . . 85
9.4.29. Hotword-Min-Duration . . . . . . . . . . . . . . . . 85
9.4.30. Interpret-Text . . . . . . . . . . . . . . . . . . . 85
9.4.31. DTMF-Buffer-Time . . . . . . . . . . . . . . . . . . 85
9.4.32. Clear-DTMF-Buffer . . . . . . . . . . . . . . . . . 86
9.4.33. Early-No-Match . . . . . . . . . . . . . . . . . . . 86
9.4.34. Num-Min-Consistent-Pronunciations . . . . . . . . . 86
9.4.35. Consistency-Threshold . . . . . . . . . . . . . . . 86
9.4.36. Clash-Threshold . . . . . . . . . . . . . . . . . . 87
9.4.37. Personal-Grammar-URI . . . . . . . . . . . . . . . . 87
9.4.38. Enroll-Utterance . . . . . . . . . . . . . . . . . . 87
9.4.39. Phrase-Id . . . . . . . . . . . . . . . . . . . . . 88
9.4.40. Phrase-NL . . . . . . . . . . . . . . . . . . . . . 88
9.4.41. Weight . . . . . . . . . . . . . . . . . . . . . . . 88
9.4.42. Save-Best-Waveform . . . . . . . . . . . . . . . . . 88
9.4.43. New-Phrase-Id . . . . . . . . . . . . . . . . . . . 89
9.4.44. Confusable-Phrases-URI . . . . . . . . . . . . . . . 89
9.4.45. Abort-Phrase-Enrollment . . . . . . . . . . . . . . 89
9.5. Recognizer Message Body . . . . . . . . . . . . . . . . 89
9.5.1. Recognizer Grammar Data . . . . . . . . . . . . . . 90
9.5.2. Recognizer Result Data . . . . . . . . . . . . . . . 93
9.5.3. Enrollment Result Data . . . . . . . . . . . . . . . 94
9.5.4. Recognizer Context Block . . . . . . . . . . . . . . 94
9.6. Recognizer Results . . . . . . . . . . . . . . . . . . . 94
9.6.1. Markup Functions . . . . . . . . . . . . . . . . . . 95
9.6.2. Overview of Recognizer Result Elements and their
Relationships . . . . . . . . . . . . . . . . . . . 96
9.6.3. Elements and Attributes . . . . . . . . . . . . . . 96
9.7. Enrollment Results . . . . . . . . . . . . . . . . . . . 101
9.7.1. NUM-CLASHES Element . . . . . . . . . . . . . . . . 101
9.7.2. NUM-GOOD-REPETITIONS Element . . . . . . . . . . . . 101
9.7.3. NUM-REPETITIONS-STILL-NEEDED Element . . . . . . . . 101
9.7.4. CONSISTENCY-STATUS Element . . . . . . . . . . . . . 102
9.7.5. CLASH-PHRASE-IDS Element . . . . . . . . . . . . . . 102
9.7.6. TRANSCRIPTIONS Element . . . . . . . . . . . . . . . 102
9.7.7. CONFUSABLE-PHRASES Element . . . . . . . . . . . . . 102
9.8. DEFINE-GRAMMAR . . . . . . . . . . . . . . . . . . . . . 102
9.9. RECOGNIZE . . . . . . . . . . . . . . . . . . . . . . . 106
9.10. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.11. GET-RESULT . . . . . . . . . . . . . . . . . . . . . . . 113
9.12. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 114
9.13. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 115
9.14. RECOGNITION-COMPLETE . . . . . . . . . . . . . . . . . . 115
9.15. START-PHRASE-ENROLLMENT . . . . . . . . . . . . . . . . 117
9.16. ENROLLMENT-ROLLBACK . . . . . . . . . . . . . . . . . . 118
9.17. END-PHRASE-ENROLLMENT . . . . . . . . . . . . . . . . . 119
9.18. MODIFY-PHRASE . . . . . . . . . . . . . . . . . . . . . 119
9.19. DELETE-PHRASE . . . . . . . . . . . . . . . . . . . . . 120
9.20. INTERPRET . . . . . . . . . . . . . . . . . . . . . . . 120
9.21. INTERPRETATION-COMPLETE . . . . . . . . . . . . . . . . 121
9.22. DTMF Detection . . . . . . . . . . . . . . . . . . . . . 123
10. Recorder Resource . . . . . . . . . . . . . . . . . . . . . . 123
10.1. Recorder State Machine . . . . . . . . . . . . . . . . . 124
10.2. Recorder Methods . . . . . . . . . . . . . . . . . . . . 124
10.3. Recorder Events . . . . . . . . . . . . . . . . . . . . 124
10.4. Recorder Header Fields . . . . . . . . . . . . . . . . . 124
10.4.1. Sensitivity Level . . . . . . . . . . . . . . . . . 125
10.4.2. No Input Timeout . . . . . . . . . . . . . . . . . . 125
10.4.3. Completion Cause . . . . . . . . . . . . . . . . . . 125
10.4.4. Completion Reason . . . . . . . . . . . . . . . . . 126
10.4.5. Failed URI . . . . . . . . . . . . . . . . . . . . . 126
10.4.6. Failed URI Cause . . . . . . . . . . . . . . . . . . 126
10.4.7. Record URI . . . . . . . . . . . . . . . . . . . . . 127
10.4.8. Media Type . . . . . . . . . . . . . . . . . . . . . 127
10.4.9. Max Time . . . . . . . . . . . . . . . . . . . . . . 127
10.4.10. Trim-Length . . . . . . . . . . . . . . . . . . . . 128
10.4.11. Final Silence . . . . . . . . . . . . . . . . . . . 128
10.4.12. Capture On Speech . . . . . . . . . . . . . . . . . 128
10.4.13. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 128
10.4.14. Start Input Timers . . . . . . . . . . . . . . . . . 129
10.4.15. New Audio Channel . . . . . . . . . . . . . . . . . 129
10.5. Recorder Message Body . . . . . . . . . . . . . . . . . 129
10.6. RECORD . . . . . . . . . . . . . . . . . . . . . . . . . 129
10.7. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 130
10.8. RECORD-COMPLETE . . . . . . . . . . . . . . . . . . . . 131
10.9. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 132
10.10. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 132
11. Speaker Verification and Identification . . . . . . . . . . . 133
11.1. Speaker Verification State Machine . . . . . . . . . . . 134
11.2. Speaker Verification Methods . . . . . . . . . . . . . . 136
11.3. Verification Events . . . . . . . . . . . . . . . . . . 137
11.4. Verification Header Fields . . . . . . . . . . . . . . . 137
11.4.1. Repository-URI . . . . . . . . . . . . . . . . . . . 138
11.4.2. Voiceprint-Identifier . . . . . . . . . . . . . . . 138
11.4.3. Verification-Mode . . . . . . . . . . . . . . . . . 139
11.4.4. Adapt-Model . . . . . . . . . . . . . . . . . . . . 140
11.4.5. Abort-Model . . . . . . . . . . . . . . . . . . . . 140
11.4.6. Min-Verification-Score . . . . . . . . . . . . . . . 140
11.4.7. Num-Min-Verification-Phrases . . . . . . . . . . . . 140
11.4.8. Num-Max-Verification-Phrases . . . . . . . . . . . . 141
11.4.9. No-Input-Timeout . . . . . . . . . . . . . . . . . . 141
11.4.10. Save-Waveform . . . . . . . . . . . . . . . . . . . 141
11.4.11. Media Type . . . . . . . . . . . . . . . . . . . . . 142
11.4.12. Waveform-URI . . . . . . . . . . . . . . . . . . . . 142
11.4.13. Voiceprint-Exists . . . . . . . . . . . . . . . . . 142
11.4.14. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 143
11.4.15. Input-Waveform-Uri . . . . . . . . . . . . . . . . . 143
11.4.16. Completion-Cause . . . . . . . . . . . . . . . . . . 143
11.4.17. Completion Reason . . . . . . . . . . . . . . . . . 145
11.4.18. Speech Complete Timeout . . . . . . . . . . . . . . 145
11.4.19. New Audio Channel . . . . . . . . . . . . . . . . . 145
11.4.20. Abort-Verification . . . . . . . . . . . . . . . . . 145
11.4.21. Start Input Timers . . . . . . . . . . . . . . . . . 145
11.5. Verification Message Body . . . . . . . . . . . . . . . 146
11.5.1. Verification Result Data . . . . . . . . . . . . . . 146
11.5.2. Verification Result Elements . . . . . . . . . . . . 146
11.6. START-SESSION . . . . . . . . . . . . . . . . . . . . . 150
11.7. END-SESSION . . . . . . . . . . . . . . . . . . . . . . 151
11.8. QUERY-VOICEPRINT . . . . . . . . . . . . . . . . . . . . 152
11.9. DELETE-VOICEPRINT . . . . . . . . . . . . . . . . . . . 153
11.10. VERIFY . . . . . . . . . . . . . . . . . . . . . . . . . 154
11.11. VERIFY-FROM-BUFFER . . . . . . . . . . . . . . . . . . . 154
11.12. VERIFY-ROLLBACK . . . . . . . . . . . . . . . . . . . . 157
11.13. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 157
11.14. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 158
11.15. VERIFICATION-COMPLETE . . . . . . . . . . . . . . . . . 159
11.16. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 159
11.17. CLEAR-BUFFER . . . . . . . . . . . . . . . . . . . . . . 160
11.18. GET-INTERMEDIATE-RESULT . . . . . . . . . . . . . . . . 160
12. Security Considerations . . . . . . . . . . . . . . . . . . . 161
12.1. Rendezvous and Session Establishment . . . . . . . . . . 162
12.2. Control channel protection . . . . . . . . . . . . . . . 162
12.3. Media session protection . . . . . . . . . . . . . . . . 162
12.4. Indirect Content Access . . . . . . . . . . . . . . . . 162
12.5. Protection of stored media . . . . . . . . . . . . . . . 163
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 163
13.1. New registries . . . . . . . . . . . . . . . . . . . . . 163
13.1.1. MRCPv2 resource types . . . . . . . . . . . . . . . 163
13.1.2. MRCPv2 methods and events . . . . . . . . . . . . . 163
13.1.3. MRCPv2 headers . . . . . . . . . . . . . . . . . . . 165
13.1.4. MRCPv2 status codes . . . . . . . . . . . . . . . . 167
13.1.5. Grammar Reference List Parameters . . . . . . . . . 167
13.1.6. MRCPv2 vendor-specific parameters . . . . . . . . . 168
13.2. NLSML-related registrations . . . . . . . . . . . . . . 168
13.2.1. application/nlsml+xml Media Type registration . . . 168
13.3. NLSML XML Schema registration . . . . . . . . . . . . . 169
13.4. MRCPv2 XML Namespace registration . . . . . . . . . . . 169
13.5. text Media Type Registrations . . . . . . . . . . . . . 169
13.5.1. text/grammar-ref-list . . . . . . . . . . . . . . . 170
13.5.2. text/uri-list . . . . . . . . . . . . . . . . . . . 170
13.6. session URL scheme registration . . . . . . . . . . . . 171
13.7. SDP parameter registrations . . . . . . . . . . . . . . 172
13.7.1. sub-registry "proto" . . . . . . . . . . . . . . . . 172
13.7.2. sub-registry "att-field (session-level)" . . . . . . 173
13.7.3. sub-registry "att-field (media-level)" . . . . . . . 173
14. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 174
14.1. Message Flow . . . . . . . . . . . . . . . . . . . . . . 174
14.2. Recognition Result Examples . . . . . . . . . . . . . . 183
14.2.1. Simple ASR Ambiguity . . . . . . . . . . . . . . . . 183
14.2.2. Mixed Initiative . . . . . . . . . . . . . . . . . . 184
14.2.3. DTMF Input . . . . . . . . . . . . . . . . . . . . . 185
14.2.4. Interpreting Meta-Dialog and Meta-Task Utterances . 185
14.2.5. Anaphora and Deixis . . . . . . . . . . . . . . . . 186
14.2.6. Distinguishing Individual Items from Sets with
One Member . . . . . . . . . . . . . . . . . . . . . 187
14.2.7. Extensibility . . . . . . . . . . . . . . . . . . . 188
15. ABNF Normative Definition . . . . . . . . . . . . . . . . . . 188
16. XML Schemas . . . . . . . . . . . . . . . . . . . . . . . . . 203
16.1. NLSML Schema Definition . . . . . . . . . . . . . . . . 203
16.2. Enrollment Results Schema Definition . . . . . . . . . . 204
16.3. Verification Results Schema Definition . . . . . . . . . 205
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 209
17.1. Normative References . . . . . . . . . . . . . . . . . . 209
17.2. Informative References . . . . . . . . . . . . . . . . . 211
Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 212
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 213
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 213
Intellectual Property and Copyright Statements . . . . . . . . . 214
1. Introduction
The MRCPv2 protocol is designed to allow a client device to control
media processing resources on the network. These resources include
speech recognition engines, speech synthesis engines, and speaker
verification and speaker identification engines. MRCPv2 enables the
implementation of distributed
Interactive Voice Response platforms using VoiceXML
[W3C.REC-voicexml20-20040316] browsers or other client applications
while maintaining separate back-end speech processing capabilities on
specialized speech processing servers. MRCPv2 is based on the
earlier Media Resource Control Protocol (MRCP) [RFC4463] developed
jointly by Cisco Systems, Inc., Nuance Communications, and
Speechworks Inc.
The protocol requirements of SPEECHSC [RFC4313] dictate that the
solution be capable of reaching a media processing server and setting
up communication channels to the media resources, and sending and
receiving control messages and media streams to/from the server. The
Session Initiation Protocol (SIP) [RFC3261] meets these requirements.
MRCPv2 leverages these capabilities by building upon SIP and the
Session Description Protocol (SDP) [RFC4566]. MRCPv2 uses SIP to
set up and tear down media and control sessions with the server. In
addition, the client can use a SIP re-INVITE (an INVITE request sent
within an existing SIP dialog) to change the characteristics of these
media and control sessions while maintaining the SIP dialog
between the client and server. SDP is used to describe the
parameters of the media sessions associated with that dialog. It is
mandatory to support SIP as the session establishment protocol to
ensure interoperability. Other protocols can be used for session
establishment by prior agreement. This document only describes the
use of SIP and SDP.
MRCPv2 uses SIP and SDP to create the client/server dialog and set up
the media channels to the server. It also uses SIP and SDP to
establish MRCPv2 control sessions between the client and the server
for each media processing resource required for that dialog. The
MRCPv2 protocol exchange between the client and the media resource is
carried on that control session. MRCPv2 protocol exchanges do not
change the state of the SIP dialog, the media sessions, or other
parameters of the dialog initiated via SIP. Instead, they control and
affect the state of the media processing resource associated with the
MRCPv2 session(s).
MRCPv2 defines the messages to control the different media processing
resources and the state machines required to guide their operation.
It also describes how these messages are carried over a transport
layer protocol such as TCP or TLS (Note: SCTP is a viable transport
for MRCPv2 as well, but the mapping onto SCTP is not described in
this specification).
2. Document Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119 [RFC2119].
Since many of the definitions and syntax are identical to HTTP/1.1
(RFC2616 [RFC2616]), this specification refers to the sections where
they are defined rather than copying them. For brevity, [HX.Y] is to
be taken to refer to Section X.Y of RFC2616.
All the mechanisms specified in this document are described in both
prose and an augmented Backus-Naur form (ABNF [RFC4234]).
The complete message format in ABNF is provided in Section 15
and is the normative format definition.
2.1. Definitions
Media Resource
An entity on the speech processing server that can be
controlled through the MRCPv2 protocol.
MRCP Server
Aggregate of one or more "Media Resource" entities on
a Server, exposed through the MRCPv2 protocol
("Server" for short).
MRCP Client
An entity controlling one or more Media Resources
through the MRCPv2 protocol ("Client" for short).
DTMF
Dual Tone Multi-Frequency; a method of transmitting
key presses in-band, either as actual tones (Q.23
[Q.23]) or as named tone events (RFC4733 [RFC4733]).
Hotword Mode
A mode of speech recognition where a stream of
utterances is evaluated for match against a small set
of command words. This is generally employed either to
trigger some action or to control the subsequent grammar
to be used for further recognition.
2.2. State-Machine Diagrams
The state-machine diagrams in this document do not show every
possible method call. Rather, they reflect the state of the resource
based on the methods that have moved to IN-PROGRESS or COMPLETE
states. Note that since PENDING requests essentially have not
affected the resource yet and are in queue to be processed, they are
not reflected in the state-machine diagrams.
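As an illustration of this convention only, the following Python
sketch filters a list of requests down to those that a state-machine
diagram would reflect. The function name and the representation of a
request as a (method, request-state) pair are assumptions made for
this example and are not defined by this specification.

```python
PENDING, IN_PROGRESS, COMPLETE = "PENDING", "IN-PROGRESS", "COMPLETE"


def requests_shown_in_diagrams(requests):
    """Return the methods whose request state is IN-PROGRESS or COMPLETE.

    `requests` is an iterable of (method_name, request_state) pairs
    (a hypothetical representation). PENDING requests are still queued
    and have not yet affected the resource, so they are filtered out.
    """
    return [method for method, state in requests
            if state in (IN_PROGRESS, COMPLETE)]
```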
3. Architecture
A system using MRCPv2 consists of a client that requires the
generation and/or consumption of media streams and a media resource
server that has the resources or "engines" to process these streams
as input or generate these streams as output. The client uses SIP
and SDP to establish an MRCPv2 control channel with the server to use
its media processing resources. MRCPv2 servers are addressed using
SIP URIs.
The session management protocol (SIP) uses SDP with the offer/answer
model described in RFC3264 [RFC3264] to set up the MRCPv2 control
channels and describe their characteristics. A separate MRCPv2
session is needed to control each of the media processing resources
associated with the SIP dialog between the client and server. Within
a SIP dialog, the individual resource control channels for the
different resources are added or removed through SDP offer/answer
carried in a SIP re-INVITE transaction.
The server, through the SDP exchange, provides the client with an
unambiguous channel identifier and a TCP port number. The client MAY
then open a new TCP connection with the server using this port
number. Multiple MRCPv2 channels can share a TCP connection between
the client and the server. All MRCPv2 messages exchanged between the
client and the server carry the specified channel identifier that the
server MUST ensure is unambiguous among all MRCPv2 control channels
that are active on that server. The client uses this channel
identifier to indicate the media processing resource associated with
that channel.
The session management protocol (SIP) also establishes the media
sessions between the client (or other source/sink of media) and the
MRCPv2 server using SDP m-lines. One or more media processing
resources may share a media session under a SIP session, or each
media processing resource may have its own media session.
     MRCPv2 client                  MRCPv2 Media Resource Server
|--------------------|             |-----------------------------|
||------------------||             ||---------------------------||
|| Application Layer||             || TTS  | ASR  | SV   | SI   ||
||------------------||             ||Engine|Engine|Engine|Engine||
||Media Resource API||             ||---------------------------||
||------------------||             || Media Resource Management ||
|| SIP  |  MRCPv2   ||             ||---------------------------||
||Stack |           ||             ||   SIP  |    MRCPv2        ||
||      |           ||             ||  Stack |                  ||
||------------------||             ||---------------------------||
||   TCP/IP Stack   ||----MRCPv2---||       TCP/IP Stack        ||
||                  ||             ||                           ||
||------------------||-----SIP-----||---------------------------||
|--------------------|             |-----------------------------|
         |                             /
        SIP                           /
         |                           /
|-------------------|              RTP
|                   |              /
| Media Source/Sink |-------------/
|                   |
|-------------------|
Figure 1: Architectural Diagram
3.1. MRCPv2 Media Resource Types
An MRCPv2 server may offer one or more of the following media
processing resources to its clients.
Basic Synthesizer
A speech synthesizer resource with very limited
capabilities, that can generate its media stream
exclusively from concatenated audio clips. The speech
data is described using a limited subset of SSML
[W3C.REC-speech-synthesis-20040907] elements. A basic
synthesizer MUST support the SSML tags <speak>,
<audio>, <say-as> and <mark>.
Speech Synthesizer
A full capability speech synthesis resource capable of
rendering speech from text. Such a synthesizer MUST
have full SSML [W3C.REC-speech-synthesis-20040907]
support.
Recorder
A resource capable of recording audio and saving it to
a URI. A recorder MUST provide some end-pointing
capabilities for suppressing silence at the beginning
and end of a recording, and MAY also suppress silence
in the middle of a recording. If such suppression is
done, the recorder MUST maintain timing metadata to
indicate the actual time stamps of the recorded media.
DTMF Recognizer
A recognition resource capable of extracting and
interpreting DTMF digits in a media stream and
matching them against a supplied digit grammar. It
could also perform semantic interpretation based on
semantic tags in the grammar.
Speech Recognizer
A full speech recognition resource that is capable of
receiving a media stream containing audio and
interpreting it into recognition results. It also has a
natural language semantic interpreter to post-process
the recognized data according to the semantic data in
the grammar and provide semantic results along with
the recognized input. The recognizer may also support
enrolled grammars, where the client can enroll and
create new personal grammars for use in future
recognition operations.
Speaker Verifier
A resource capable of verifying the authenticity of a
claimed identity by matching a media stream containing
spoken input to a pre-existing voiceprint. This may
also involve matching the caller's voice against more
than one voiceprint, also called multi-verification or
speaker identification.
3.2. Server and Resource Addressing
The MRCPv2 server as a whole is a generic SIP server and is addressed
by a SIP Contact URI registered by the server through SIP (or via
static configuration of the SIP registrar).
For example:
sip:mrcpv2@example.net
4. MRCPv2 Protocol Basics
MRCPv2 requires a connection-oriented transport layer protocol such
as TCP or SCTP to guarantee reliable sequencing and delivery of
MRCPv2 control messages between the client and the server. In order
to meet the requirements for security enumerated in SpeechSC
Requirements [RFC4313], clients and servers MUST implement TLS as
well. One or more connections between the client and the server can
be shared among different MRCPv2 channels to the server. The
individual messages carry the channel identifier to differentiate
messages on different channels. MRCPv2 protocol encoding is text
based with mechanisms to carry embedded binary data. This allows
arbitrary data like recognition grammars, recognition results,
synthesizer speech markup etc. to be carried in MRCPv2 messages.
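The following Python sketch illustrates, in a non-normative way, how
an endpoint might route messages that arrive on one shared connection
to per-channel handlers using the channel identifier. It assumes that
each message has already been parsed and its channel identifier
extracted; the wire format is defined later in this specification and
is not reproduced here. The class name, method names, and handler
callback are hypothetical.

```python
class ChannelDemultiplexer:
    """Route messages from one shared connection to per-channel handlers."""

    def __init__(self):
        # Maps channel identifier string -> callable taking one message.
        self._handlers = {}

    def register(self, channel_id, handler):
        # Identifiers are expected to be unambiguous, so a duplicate
        # registration is treated as a local bookkeeping error.
        if channel_id in self._handlers:
            raise ValueError("channel already registered: " + channel_id)
        self._handlers[channel_id] = handler

    def unregister(self, channel_id):
        # Removing an unknown channel is a harmless no-op.
        self._handlers.pop(channel_id, None)

    def dispatch(self, channel_id, message):
        """Deliver a message; return False if the channel is unknown."""
        handler = self._handlers.get(channel_id)
        if handler is None:
            return False
        handler(message)
        return True
```

Keeping the routing table separate from the connection object is one
possible design; it lets several connections share the same table
when channels are spread across more than one connection.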
4.1. Connecting to the Server
MRCPv2 employs a session establishment and management protocol such
as SIP in conjunction with SDP. The client finds and reaches an
MRCPv2 server using conventional INVITE and other SIP transactions
for establishing, maintaining, and terminating SIP dialogs. The SDP
offer/answer exchange model over SIP is used to establish a resource
control channel for each resource. The SDP offer/answer exchange is
also used to establish media sessions between the server and the
source or sink of audio.
4.2. Managing Resource Control Channels
The client needs a separate MRCPv2 resource control channel to
control each media processing resource under the SIP dialog. A
unique channel identifier string identifies these resource control
channels. The channel identifier is an unambiguous, opaque string
followed by an "@", then by a string token specifying the type of
resource. The server generates the channel identifier and MUST make
sure it does not clash with the identifier of any other MRCP channel
currently allocated by that server. MRCPv2 defines the following
IANA-registered types of media processing resources. Additional
resource types, their associated methods/events and state machines
may be added by future specification proposing to extend the
capabilities of MRCPv2.
+---------------+----------------------+--------------+
| Resource Type | Resource Description | Described in |
+---------------+----------------------+--------------+
| speechrecog   | Speech Recognizer    | Section 9    |
| dtmfrecog     | DTMF Recognizer      | Section 9    |
| speechsynth   | Speech Synthesizer   | Section 8    |
| basicsynth    | Basic Synthesizer    | Section 8    |
| speakverify   | Speaker Verification | Section 11   |
| recorder      | Speech Recorder      | Section 10   |
+---------------+----------------------+--------------+
Resource Types
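As a non-normative illustration, the channel identifier format
described above (an opaque string, an "@", and a resource type token)
can be split as in the following Python sketch. The names
parse_channel_identifier and is_known_resource are hypothetical, and
the exact-match comparison of the resource type is an assumption of
this sketch rather than a rule stated here.

```python
# Resource types listed in the table above. Future specifications may
# register more, so an unknown type is reported, not rejected outright.
RESOURCE_TYPES = frozenset({
    "speechrecog", "dtmfrecog", "speechsynth",
    "basicsynth", "speakverify", "recorder",
})


def parse_channel_identifier(value):
    """Split '<opaque-string>@<resource-type>' into its two parts."""
    # Split on the last "@" so the resource type is always the final token.
    opaque, sep, resource = value.strip().rpartition("@")
    if not sep or not opaque or not resource:
        raise ValueError("expected <opaque>@<resource-type>: %r" % value)
    return opaque, resource


def is_known_resource(resource):
    """True if the resource type is one of those defined in this document."""
    return resource in RESOURCE_TYPES
```

A client would typically apply this to the value of the "channel"
attribute returned in the SDP answer, for example
"32AECB234338@speechsynth".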
The SIP INVITE or re-INVITE transaction and the SDP offer/answer
exchange it carries contain m-lines describing the resource control
channel to be allocated. There MUST be one SDP m-line for each
MRCPv2 resource to be used in the session. This m-line MUST have a
media type field of "application" and a transport type field of
either "TCP/MRCPv2" or "TCP/TLS/MRCPv2". (The usage of SCTP with
MRCPv2 may be addressed in a future specification). The port number
field of the m-line MUST contain the "discard" port of the transport
protocol (port 9 for TCP) in the SDP offer from the client and MUST
contain the TCP listen port on the server in the SDP answer. The
client may then either set up a TCP or TLS connection to that server
port or share an already established connection to that port. Since
MRCPv2 allows multiple sessions to share the same TCP connection,
multiple m-lines in a single SDP document may share the same port
field value; MRCPv2 servers MUST NOT assume any relationship between
resources using the same port other than the sharing of the
communication channel.
MRCPv2 resources do not use the port or format field of the m-line to
distinguish themselves from other resources using the same channel.
The client MUST specify the resource type identifier in the resource
attribute associated with the control m-line of the SDP offer. The
server MUST respond with the full Channel-Identifier (which includes
the resource type identifier and an unambiguous string) in the
"channel" attribute associated with the control m-line of the SDP
answer. To remain backwards compatible with conventional SDP usage,
the format field of the m-line MUST have the arbitrarily-selected
value of "1".
When the client wants to add a media processing resource to the
session, it issues a SIP re-INVITE transaction. The SDP offer/answer
exchange carried by this SIP transaction contains one or more
additional control m-lines for the new resources to be allocated to
the session. The server, on seeing the new m-line, allocates the
resources (if they are available) and responds with a corresponding
control m-line in the SDP answer carried in the SIP response.
The a=setup attribute, as described in RFC4145 [RFC4145], MUST be
"active" for the offer from the client and MUST be "passive" for the
answer from the MRCPv2 server. The a=connection attribute MUST have
a value of "new" on the very first control m-line offer from the
client to an MRCPv2 server. Subsequent control m-line offers from
the client to the MRCP server MAY contain "new" or "existing",
depending on whether the client wants to set up a new connection or
share an existing connection, respectively. If the client specifies
a value of "new", the server MUST respond with a value of "new". If
the client specifies a value of "existing", the server MAY respond
with a value of "existing" if it prefers to share an existing
connection or can answer with a value of "new", in which case the
client MUST initiate a new transport connection.
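The control m-line rules above can be summarized in a short,
non-normative Python sketch. The function names build_control_offer
and answer_connection, and the server_prefers_sharing flag, are
hypothetical and exist only for this illustration. The optional cmid
argument refers to the attribute described in Section 4.3.

```python
def build_control_offer(resource, cmid=None, tls=False, connection="new"):
    """Return the SDP lines a client offers for one resource control channel."""
    if connection not in ("new", "existing"):
        raise ValueError("a=connection must be 'new' or 'existing'")
    proto = "TCP/TLS/MRCPv2" if tls else "TCP/MRCPv2"
    lines = [
        # Port 9 (discard) in the offer; format field fixed at "1".
        "m=application 9 %s 1" % proto,
        "a=setup:active",                 # the client is always "active"
        "a=connection:%s" % connection,
        "a=resource:%s" % resource,
    ]
    if cmid is not None:
        lines.append("a=cmid:%s" % cmid)
    return lines


def answer_connection(offered, server_prefers_sharing):
    """Pick the a=connection value a server puts in its answer."""
    if offered == "new":
        return "new"                      # "new" MUST be answered with "new"
    # For "existing", the server may share or ask for a new connection.
    return "existing" if server_prefers_sharing else "new"
```

If the answer carries "new", the client opens a new transport
connection to the port given in the answer; otherwise it reuses the
connection it already has to that port.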
When the client wants to de-allocate the resource from this session,
it issues a SIP re-INVITE transaction with the server. The SDP MUST
offer the control m-line with port 0. The server MUST then answer
the control m-line with a response of port 0. This de-allocates the
associated MRCPv2 identifier and resource. The server MUST NOT close
the TCP, SCTP or TLS connection if it is currently being shared among
multiple MRCP channels. When all MRCP channels that may be sharing
the connection are released and/or the associated SIP dialog is
terminated, the client or server terminates the connection.
All servers MUST support TLS. Servers MAY support TCP without TLS in
physically secure environments. It is up to the client, through the
SDP offer, to choose which transport it wants to use for an MRCPv2
session. Aside from the exceptions given above, when using TCP the
m-lines MUST conform to RFC4145 [RFC4145], which describes the usage
of SDP for connection-oriented transport. When using TLS the SDP
m-line for the control pipe MUST conform to comedia over TLS
[RFC4572], which specifies the usage of SDP for establishing a secure
connection-oriented transport over TLS.
This example exchange adds a resource control channel for a
synthesizer. Since a synthesizer also generates an audio stream,
this interaction also creates a receive-only RTP media session for
the server to send audio to.
C->S: INVITE sip:mresources@server.example.com SIP/2.0
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
Max-Forwards:6
To:MediaServer <sip:mresources@server.example.com>
From:sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314161 INVITE
Contact:<sip:sarvi@example.com>
Content-Type:application/sdp
Content-Length: 244
v=0
o=sarvi 2890844526 2890842808 IN IP4 192.0.2.4
s=-
c=IN IP4 192.0.2.12
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=recvonly
a=mid:1
S->C: SIP/2.0 200 OK
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
To:MediaServer <sip:mresources@server.example.com>
From:sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314161 INVITE
Contact:<sip:mresources@server.example.com>
Content-Type:application/sdp
Content-Length: 260
v=0
o=- 2890844526 2890842808 IN IP4 192.0.2.4
s=-
c=IN IP4 192.0.2.11
m=application 32416 TCP/MRCPv2 1
a=setup:passive
a=connection:new
a=channel:32AECB234338@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
C->S: ACK sip:mresources@server.example.com SIP/2.0
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
Max-Forwards:6
To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
From:Sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314162 ACK
Content-Length:0
Example: Add Synthesizer Control Channel
This example exchange continues from the previous figure and
allocates an additional resource control channel for a recognizer.
Since a recognizer would need to receive an audio stream for
recognition, this interaction also updates the audio stream to
sendrecv, making it a 2-way RTP media session.
C->S: INVITE sip:mresources@server.example.com SIP/2.0
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
Max-Forwards:6
To:MediaServer <sip:mresources@server.example.com>
From:sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314163 INVITE
Contact:<sip:sarvi@example.com>
Content-Type:application/sdp
Content-Length: 397
v=0
o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
s=-
c=IN IP4 192.0.2.12
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:existing
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=rtpmap:96 telephone-event/8000
a=fmtp:96 0-15
a=sendrecv
a=mid:1
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:existing
a=resource:speechrecog
a=cmid:1
S->C: SIP/2.0 200 OK
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
To:MediaServer <sip:mresources@server.example.com>
From:sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314163 INVITE
Contact:<sip:sarvi@example.com>
Content-Type:application/sdp
Content-Length:431
v=0
o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
s=-
c=IN IP4 192.0.2.11
m=application 32416 TCP/MRCPv2 1
a=setup:passive
a=connection:existing
a=channel:32AECB234338@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=rtpmap:96 telephone-event/8000
a=fmtp:96 0-15
a=sendrecv
a=mid:1
m=application 32416 TCP/MRCPv2 1
a=setup:passive
a=connection:existing
a=channel:32AECB234338@speechrecog
a=cmid:1
C->S: ACK sip:mresources@server.example.com SIP/2.0
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
Max-Forwards:6
To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
From:Sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314164 ACK
Content-Length:0
Add Recognizer example
This example exchange continues from the previous figure and
de-allocates the recognizer channel. Since the recognizer no longer
needs to receive an audio stream, this interaction also updates the
RTP media session to recvonly.
C->S: INVITE sip:mresources@server.example.com SIP/2.0
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
Max-Forwards:6
To:MediaServer <sip:mresources@server.example.com>
From:sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314163 INVITE
Contact:<sip:sarvi@example.com>
Content-Type:application/sdp
Content-Length: 276
v=0
o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
s=-
c=IN IP4 192.0.2.12
m=application 9 TCP/MRCPv2 1
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=recvonly
a=mid:1
m=application 0 TCP/MRCPv2 1
a=resource:speechrecog
a=cmid:1
S->C: SIP/2.0 200 OK
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
To:MediaServer <sip:mresources@server.example.com>
From:sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314163 INVITE
Contact:<sip:sarvi@example.com>
Content-Type:application/sdp
Content-Length:303
v=0
o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
s=-
c=IN IP4 192.0.2.11
m=application 32416 TCP/MRCPv2 1
a=channel:32AECB234338@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
m=application 0 TCP/MRCPv2 1
a=channel:32AECB234338@speechrecog
a=cmid:1
C->S: ACK sip:mresources@server.example.com SIP/2.0
Via:SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
Max-Forwards:6
To:MediaServer <sip:mresources@server.example.com>;tag=a6c85cf
From:Sarvi <sip:sarvi@example.com>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314164 ACK
Content-Length:0
Deallocate Recognizer example
4.3. Media Streams and RTP Ports
Since MRCPv2 resources either generate or consume media streams, the
client or the server needs to associate media sessions with their
corresponding resource or resources. More than one resource could be
associated with a single media session or each resource could be
assigned a separate media session. Also note that more than one
media session can be associated with a single resource if need be,
but this scenario is not useful for the current set of resources.
For example, a synthesizer and a recognizer could be associated to
the same media session (m=audio line), if it is opened in "sendrecv"
mode. Alternatively, the recognizer could have its own "sendonly"
audio session and the synthesizer could have its own "recvonly" audio
session.
The association between control channels and their corresponding
media sessions is established using a new "resource channel media
identifier" media-level attribute ("cmid"). Valid values of this
attribute are the values of the "mid" attribute defined in RFC3388
[RFC3388]. If there is more than 1 audio m-line, then each audio
m-line MUST have a "mid" attribute. Each control m-line MAY have one
or more "cmid" attributes that match the resource control channel to
the "mid" attributes of the audio m-lines it is associated with.
Note that if a control m-line does not have a "cmid" attribute it
will not be associated with any media. The operations on such a
resource will hence be limited. For example, if it was a recognizer
resource, the RECOGNIZE method requires an associated media to
process while the INTERPRET method does not. The formatting of the
"cmid" attribute in SDP (RFC4566 [RFC4566]) is described by the
following ABNF:
cmid-attribute = "a=cmid:" identification-tag
identification-tag = token
To allow this flexible mapping of media sessions to MRCPv2 control
channels, a single audio m-line can be associated with multiple
resources or each resource can have its own audio m-line. For
example, if the client wants to allocate a recognizer and a
synthesizer and associate them with a single 2-way audio pipe, the
SDP offer would contain two control m-lines and a single audio m-line
with an attribute of "sendrecv". Each of the control m-lines would
have a "cmid" attribute whose value matches the "mid" of the audio
m-line. If, on the other hand, the client wants to allocate a
recognizer and a synthesizer each with its own separate audio pipe,
the SDP offer would carry two control m-lines (one for the recognizer
and another for the synthesizer) and two audio m-lines (one with the
attribute "sendonly" and another with attribute "recvonly"). The
"cmid" attribute of the recognizer control m-line would match the
"mid" value of the "sendonly" audio m-line and the "cmid" attribute
of the synthesizer control m-line would match the "mid" attribute of
the "recvonly" m-line.
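The mapping between control channels and audio m-lines can be
illustrated with a small, non-normative Python sketch. It is not a
full SDP parser: it only groups "a=" lines under their m-line and
matches "cmid" values against "mid" values. The function names are
hypothetical, and keying the result by the "resource" (or "channel")
attribute assumes each control m-line carries one of them.

```python
def media_sections(sdp_text):
    """Group SDP attribute lines under the m-line they follow."""
    sections = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("m="):
            sections.append({"m": line[2:].split(), "attrs": []})
        elif line.startswith("a=") and sections:
            name, _, value = line[2:].partition(":")
            sections[-1]["attrs"].append((name, value))
    return sections


def associate(sdp_text):
    """Map each control m-line to the audio 'mid' values it references."""
    sections = media_sections(sdp_text)
    audio_mids = {
        value
        for section in sections if section["m"][0] == "audio"
        for name, value in section["attrs"] if name == "mid"
    }
    result = {}
    for section in sections:
        if section["m"][0] != "application":
            continue
        attrs = section["attrs"]
        # Offers carry a=resource, answers carry a=channel.
        label = next((v for n, v in attrs if n in ("resource", "channel")), None)
        cmids = [v for n, v in attrs if n == "cmid"]
        # Keep only cmid values that match an audio m-line's mid.
        result[label] = [c for c in cmids if c in audio_mids]
    return result
```

A control m-line without any "cmid" attribute maps to an empty list,
which corresponds to a resource that is not associated with any media.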
When a server receives media (e.g. audio) on a media session that is
associated with more than one media processing resource, it is the
responsibility of the server to receive and fork it to the resources
that need to consume it. If multiple resources in an MRCPv2 session
are generating audio (or other media) to be sent on a single
associated media session, it is the responsibility of the server to
either multiplex the multiple streams onto the single RTP session or
contain an embedded RTP mixer (see RFC3550 [RFC3550]) to combine the
multiple streams into one. In the former case, the media stream will
contain RTP packets generated by different sources, and hence the
packets will have different Synchronization Source identifiers
(SSRCs). In the latter case, the RTP packets will contain multiple
Contributing Source identifiers (CSRCs) corresponding to the original
streams before being combined by the mixer. An MRCPv2 implementation
either MUST correctly process such RTP sessions, or alternatively MUST
avoid associating multiple resources with a single session.
If a server is not capable of mixing, multiplexing, or forking media
in the cases above, it MUST disallow the client from associating
multiple such resources with a single audio pipe by rejecting the SDP
offer with a SIP 501 "Not Implemented" error.
4.4. MRCPv2 Message Transport
The MRCPv2 messages defined in this document are transported over a
TCP, TLS or SCTP (in the future) connection between the client and
the server. The method for setting up this transport connection and
the resource control channel is discussed in Section 4.1 and
Section 4.2. Multiple resource control channels between a client and
a server that belong to different SIP dialogs can share one or more
TLS, TCP or SCTP connections between them; the server and client MUST
support this mode of operation. The individual MRCPv2 messages carry
the MRCPv2 channel identifier in their Channel-Identifier header,
which MUST be used to differentiate MRCPv2 messages from different
resource channels (see Section 6.2.1 for details). All MRCPv2
servers MUST support TLS. Servers MAY support TCP without TLS in
physically secure environments. It is up to the client to choose
which mode of transport it wants to use for an MRCPv2 session.
Most examples from here on show only the MRCPv2 messages and do not
show the SIP messages and headers that may have been used to
establish the MRCPv2 control channel.
5. MRCPv2 Specification
MRCPv2 messages are textual using the ISO 10646 character set in the
UTF-8 encoding (RFC3629 [RFC3629]) to allow many different languages
to be represented. However, to assist in compact representations,
MRCPv2 also allows other character sets such as ISO 8859-1 to be used
when desired. The MRCPv2 protocol headers (the first line of an MRCP
message) and header names use only the US-ASCII subset of UTF-8.
Internationalization only applies to certain fields like grammar,
results, speech markup, etc., and not to MRCPv2 as a whole.
Lines are terminated by CRLF. Also, some parameters in the message
may contain binary data or a record spanning multiple lines. Such
fields have a length value associated with the parameter, which
indicates the number of octets immediately following the parameter.
5.1. Common Protocol Elements
The MRCPv2 message set consists of requests from the client to the
server, responses from the server to the client and asynchronous
events from the server to the client. All these messages consist of
a start-line, one or more headers, an empty line (i.e. a line with
nothing preceding the CRLF) indicating the end of the header fields,
and an optional message body.
generic-message = start-line
message-header
CRLF
[ message-body ]
start-line = request-line / response-line / event-line
message-header = 1*(generic-header / resource-header)
resource-header = recognizer-header
/ synthesizer-header
/ recorder-header
/ verifier-header
The message-body contains resource-specific and message-specific
data. The actual Media Types used to carry the data are specified
later in the sections defining the individual messages.
If a message contains a message body, the message MUST contain
content-headers indicating the Media Type and encoding of the data in
the message body.
Request, response and event messages include the version of MRCP that
the message conforms to. Version compatibility rules follow [H3.1]
regarding version ordering, compliance requirements, and upgrading of
version numbers. The version information is indicated by "MRCP" (as
opposed to "HTTP" in [H3.1]) or "MRCP/2.0" (as opposed to "HTTP/1.1"
in [H3.1]). To be compliant with this specification, clients and
servers sending MRCPv2 messages MUST indicate an mrcp-version of
"MRCP/2.0".
mrcp-version = "MRCP" "/" 1*2DIGIT "." 1*2DIGIT
The message-length field specifies the length of the message,
including the start-line, and MUST be the 2nd token from the
beginning of the message. This is to make the framing and parsing of
the message simpler to do. This field specifies the length of the
message including data that may be encoded into the body of the
message. Note that this value MAY be printed as a fixed-length
integer that is zero-padded in front in order to eliminate or reduce
inefficiency in cases where the message-length value would change as
a result of the length of the message-length token itself.
message-length = 1*19DIGIT
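Because the message-length value counts its own digits, a sender has
to settle on a value that is consistent with the final message size.
The following non-normative Python sketch does this by iterating until
the value stops changing, instead of using the zero-padding approach
mentioned above. The function name frame_message and its parameters
are hypothetical, and the sketch writes headers as "name:value" with
no extra whitespace.

```python
def frame_message(rest_of_start_line, headers, body=b""):
    """Build a message whose message-length token equals its total size.

    rest_of_start_line is everything after the message-length token,
    for example "SET-PARAMS 543256". headers is a list of (name, value).
    """
    header_block = "".join("%s:%s\r\n" % (n, v) for n, v in headers)
    tail = (" " + rest_of_start_line + "\r\n" + header_block + "\r\n")
    tail = tail.encode("utf-8") + body
    prefix = b"MRCP/2.0 "
    fixed = len(prefix) + len(tail)       # every octet except the length digits

    # The total is fixed + number of digits in the total. Start from
    # 'fixed' and grow until the value is self-consistent.
    length = fixed
    while fixed + len(str(length)) != length:
        length = fixed + len(str(length))
    return prefix + str(length).encode("ascii") + tail
```

A receiver can check the result by comparing the second token of the
start-line with the number of octets in the message.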
All MRCPv2 requests, responses and events MUST carry the Channel-
Identifier header so the server or client can differentiate messages
from different control channels that may share the same transport
connection.
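Since several control channels can share one transport connection, a
receiver first has to cut the byte stream into messages and then look
at the Channel-Identifier header of each one. The following Python
sketch is non-normative: split_messages and channel_of are
hypothetical names, and the sketch assumes the message-length token is
a plain decimal number as defined above.

```python
def split_messages(buffer):
    """Return (complete_messages, leftover_bytes) from a receive buffer."""
    messages = []
    while True:
        eol = buffer.find(b"\r\n")
        if eol < 0:
            break                          # start-line not complete yet
        tokens = buffer[:eol].split(b" ")
        # message-length MUST be the 2nd token of the start-line.
        if len(tokens) < 2 or not tokens[1].isdigit():
            raise ValueError("malformed start-line")
        length = int(tokens[1])
        if length < eol + 2:
            raise ValueError("message-length shorter than the start-line")
        if len(buffer) < length:
            break                          # wait for more octets
        messages.append(buffer[:length])
        buffer = buffer[length:]
    return messages, buffer


def channel_of(message):
    """Return the Channel-Identifier value of one message, or None."""
    head = message.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:   # skip the start-line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"channel-identifier":
            return value.strip().decode("ascii")
    return None
```

The leftover bytes are kept and prepended to the next read from the
connection.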
In the resource-specific header descriptions in sections 8-11, a
header is disallowed on a method (request, response, or event) for
that resource unless specifically listed as being allowed. Also, the
phrasing "This header MAY occur on method X" indicates that the
header is allowed on that method but is not required to be used in
every instance of that method.
5.2. Request
An MRCPv2 request consists of a Request line followed by message
headers and an optional message body containing data specific to the
request message.
The Request message from a client to the server includes within the
first line the method to be applied, a method tag for that request
and the version of the protocol in use.
request-line = mrcp-version SP message-length SP method-name
SP request-id CRLF
The request-id field is a unique identifier representable as an
unsigned 32 bit integer created by the client and sent to the server.
Consecutive requests within an MRCP session MUST utilize
monotonically increasing request-id's. The request-id space is
linear (i.e. not mod(32)), so the space does not wrap and validity
can be checked with a simple unsigned comparison operation. The
client may choose any initial value for its first request, but a
small integer is RECOMMENDED to avoid exhausting the space in long
sessions. If the server receives duplicate or out-of-order requests
the server MUST reject the request with a response code of 410.
Since request-id's are scoped to the MRCP session, they are unique
across all TCP connections and all resource channels in the session.
The server resource MUST use the client-assigned identifier in its
response to the request. If the request does not complete
synchronously, future asynchronous events associated with this
request MUST carry the client-assigned request-id.
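A server-side check of the ordering rule above can be as small as the
following non-normative Python sketch. The class name RequestIdChecker
is hypothetical, and keeping one checker per MRCP session is an
assumption that follows from request-ids being scoped to the session.

```python
class RequestIdChecker:
    """Tracks the last request-id seen in one MRCP session."""

    def __init__(self):
        self.last = None                   # no request seen yet

    def check(self, request_id):
        """Return None if the id is acceptable, else status code 410."""
        if self.last is not None and request_id <= self.last:
            return 410                     # duplicate or out-of-order
        self.last = request_id
        return None
```

Because the space does not wrap, a single comparison against the last
accepted value is sufficient.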
The mrcp-version field is the MRCP protocol version that is being
used by the client.
The message-length field specifies the length of the message,
including the start-line.
request-id = 1*19DIGIT
The method-name field identifies the specific request that the client
is making to the server. Each resource supports a subset of the
MRCPv2 methods. The subset for each resource is defined in the
section of the specification for the corresponding resource.
method-name = generic-method
/ synthesizer-method
/ recorder-method
/ recognizer-method
/ verifier-method
5.3. Response
After receiving and interpreting the request message for a method,
the server resource responds with an MRCPv2 response message. The
response consists of a response line followed by message headers and
an optional message body containing data specific to the method.
response-line = mrcp-version SP message-length SP request-id
SP status-code SP request-state CRLF
The mrcp-version field MUST contain the version of the request if
supported; otherwise, it MUST contain the highest version of the
MRCPv2 protocol supported by the server.
The message-length field specifies the length of the message,
including the start-line.
The request-id used in the response MUST match the one sent in the
corresponding request message.
The status-code field is a 3-digit code representing the success or
failure or other status of the request.
The request-state field indicates if the action initiated by the
Request is PENDING, IN-PROGRESS or COMPLETE. The COMPLETE status
means that the Request was processed to completion and that there
will be no more events or other messages from that resource to the
client with that request-id. The PENDING status means that the
request has been placed on a queue and will be processed in first-in-
first-out order. The IN-PROGRESS status means that the request is
being processed and is not yet complete. A PENDING or IN-PROGRESS
status indicates that further Event messages may be delivered with
that request-id.
request-state = "COMPLETE"
/ "IN-PROGRESS"
/ "PENDING"
5.4. Status Codes
The status codes are classified as Success (2XX) codes, Client
Failure (4XX) codes, and Server Failure (5XX) codes.
Success Codes
+------------+--------------------------------------------+
| Code       | Meaning                                    |
+------------+--------------------------------------------+
| 200        | Success                                    |
| 201        | Success with some optional headers ignored |
+------------+--------------------------------------------+
Success 2xx
Client Failure 4xx Codes
+------------+------------------------------------------------------+
| Code       | Meaning                                              |
+------------+------------------------------------------------------+
| 401        | Method not allowed                                   |
| 402        | Method not valid in this state                       |
| 403        | Unsupported Header                                   |
| 404        | Illegal Value for Header. This is the error for a    |
|            | syntax violation.                                    |
| 405        | Resource not allocated for this session or does not  |
|            | exist                                                |
| 406        | Mandatory Header Missing                             |
| 407        | Method or Operation Failed (e.g., Grammar            |
|            | compilation failed in the recognizer. Detailed       |
|            | cause codes MAY be available through a resource      |
|            | specific header.)                                    |
| 408        | Unrecognized or unsupported message entity           |
| 409        | Unsupported Header Value. This is a value that is    |
|            | syntactically legal but exceeds the implementation's |
|            | capabilities or expectations.                        |
| 410        | Non-Monotonic or Out of order sequence number in     |
|            | request.                                             |
| 411-420    | Reserved                                             |
+------------+------------------------------------------------------+
Client Failure 4xx
Server Failure 5xx Codes
+------------+------------------------------------------------------+
| Code       | Meaning                                              |
+------------+------------------------------------------------------+
| 501        | Server Internal Error                                |
| 502        | Protocol Version not supported                       |
| 503        | Proxy Timeout. The MRCP Proxy did not receive a      |
|            | response from the MRCP server.                       |
| 504        | Message too large                                    |
+------------+------------------------------------------------------+
Server Failure 5xx
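For illustration, a client might map a received status-code to one of
the three classes above as in this non-normative Python sketch. The
function name status_class and the returned labels are hypothetical.

```python
def status_class(code):
    """Classify a 3-digit MRCPv2 status code by its leading digit."""
    if 200 <= code <= 299:
        return "success"
    if 400 <= code <= 499:
        return "client-failure"
    if 500 <= code <= 599:
        return "server-failure"
    # No other classes are defined in the tables above.
    raise ValueError("status code outside the defined classes: %r" % code)
```

Classifying by range rather than by an exact list lets a client handle
codes in the reserved 411-420 range without special cases.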
5.5. Events
The server resource may need to communicate a change in state or the
occurrence of a certain event to the client. These messages are used
when a request does not complete immediately and the response returns
a status of PENDING or IN-PROGRESS. The intermediate results and
events of the request are indicated to the client through the event
message from the server. The event message consists of an event
header line followed by message headers and an optional message body
containing data specific to the event message. The header line has
the request-id of the corresponding request and status value. The
status value is COMPLETE if the request is done and this was the last
event, else it is IN-PROGRESS.
event-line = mrcp-version SP message-length SP event-name
SP request-id SP request-state CRLF
The mrcp-version used here is identical to the one used in the
Request/Response Line and indicates the version of the MRCPv2
protocol running on the server.
The message-length field specifies the length of the message,
including the start-line.
The request-id used in the event MUST match the one sent in the
request that caused this event.
The request-state indicates whether the Request/Command causing this
event is complete or still in progress, and is the same as the one
mentioned in Section 5.3. The final event for a request has a
COMPLETE status indicating the completion of the request.
The event-name identifies the nature of the event generated by the
media resource. The set of valid event names depends on the resource
generating it. See the corresponding resource-specific section of
the document.
event-name = synthesizer-event
/ recognizer-event
/ recorder-event
/ verifier-event
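The three start-line forms defined in Sections 5.2, 5.3 and 5.5 can be
told apart by their token count and by whether the third token is
numeric. The following Python sketch is non-normative. The function
name parse_start_line and the dictionary keys are hypothetical, and
the sketch assumes that method and event names are never purely
numeric.

```python
REQUEST_STATES = {"COMPLETE", "IN-PROGRESS", "PENDING"}


def _is_number(token):
    # ASCII digits only, so int() below cannot fail.
    return token.isascii() and token.isdigit()


def parse_start_line(line):
    """Classify a start-line as request, response or event."""
    tokens = line.rstrip("\r\n").split(" ")
    if len(tokens) not in (4, 5) or not tokens[0].startswith("MRCP/"):
        raise ValueError("not an MRCP start-line: %r" % line)
    if not _is_number(tokens[1]):
        raise ValueError("message-length must be the 2nd token")
    parsed = {"version": tokens[0], "length": int(tokens[1])}
    rest = tokens[2:]
    if len(rest) == 2 and _is_number(rest[1]):
        # request-line: method-name SP request-id
        parsed.update(kind="request", method=rest[0], request_id=int(rest[1]))
    elif (len(rest) == 3 and rest[2] in REQUEST_STATES
          and _is_number(rest[0]) and _is_number(rest[1])):
        # response-line: request-id SP status-code SP request-state
        parsed.update(kind="response", request_id=int(rest[0]),
                      status_code=int(rest[1]), request_state=rest[2])
    elif len(rest) == 3 and rest[2] in REQUEST_STATES and _is_number(rest[1]):
        # event-line: event-name SP request-id SP request-state
        parsed.update(kind="event", event=rest[0],
                      request_id=int(rest[1]), request_state=rest[2])
    else:
        raise ValueError("unrecognized start-line: %r" % line)
    return parsed
```

The request-id returned for a response or event is what a client uses
to match the message to the request that caused it.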
6. MRCPv2 Generic Methods, Headers, and Result Structure
MRCPv2 supports a set of methods and headers that are common to all
resources. These are discussed here; resource-specific methods and
headers are discussed in the corresponding resource-specific section
of the document.
6.1. Generic Methods
MRCPv2 supports two generic methods for reading and writing the state
associated with a resource.
generic-method = "SET-PARAMS"
/ "GET-PARAMS"
These are described in the following sub-sections.
6.1.1. SET-PARAMS
The "SET-PARAMS" method, from the client to the server, tells the
MRCPv2 resource to define parameters for the session, such as voice
characteristics and prosody on synthesizers, recognition timers on
recognizers, etc. If the server accepts and sets all parameters it
MUST return a Response-Status of 200. If it chooses to ignore some
optional headers that can be safely ignored without affecting
operation of the server it MUST return 201.
If one or more of the headers being sent is incorrect, error 403,
404, or 409 MUST be returned as follows:
o If one or more of the headers being set has an illegal value, the
server MUST reject the request with a 404 Illegal Value for
Header.
o If one or more of the headers being set is unsupported for the
resource, the server MUST reject the request with a 403
Unsupported Header, except as described in the next paragraph.
o If one or more of the headers being set has an unsupported value,
the server MUST reject the request with a 409 Unsupported Header
Value, except as described in the next paragraph.
If both error 404 and another error have occurred, only error 404
MUST be returned. If both errors 403 and 409 have occurred, but not
error 404, only error 403 MUST be returned.
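The precedence among the SET-PARAMS error codes (404 over 403 over
409) can be expressed in a few lines. This Python sketch is
non-normative: the function name set_params_status is hypothetical,
and it assumes the server has already decided, per header, which error
(if any) applies.

```python
def set_params_status(header_errors, ignored_optional=False):
    """Pick the single status code for a SET-PARAMS response.

    header_errors holds the per-header error codes found (403, 404, 409).
    ignored_optional is True if some optional headers were safely ignored.
    """
    found = set(header_errors)
    # 404 wins over everything; 403 wins over 409.
    for code in (404, 403, 409):
        if code in found:
            return code
    return 201 if ignored_optional else 200
```

For example, a request with one unsupported header and one unsupported
header value yields 403.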
If error 403, 404, or 409 is returned, the response MUST include the
bad or unsupported headers and their values exactly as they were sent
from the client. Session parameters modified using "SET-PARAMS" do
not override parameters explicitly specified on individual requests
or requests that are IN-PROGRESS.
C->S: MRCP/2.0 124 SET-PARAMS 543256
Channel-Identifier:32AECB23433802@speechsynth
Voice-gender:female
Voice-variant:3
S->C: MRCP/2.0 47 543256 200 COMPLETE
Channel-Identifier:32AECB23433802@speechsynth
6.1.2. GET-PARAMS
The "GET-PARAMS" method, from the client to the server, asks the
MRCPv2 resource for its current session parameters, such as voice
characteristics and prosody on synthesizers, recognition-timer on
recognizers, etc. For every empty header field the client sends in
the request, the server MUST include the corresponding headers and
their values in the response. If no parameter headers are specified
by the client then the server MUST return all the settable parameters
and their values in the corresponding headers of the response,
including vendor-specific parameters. Such wild-card parameter
requests can be very processing-intensive, since the number of
settable parameters can be large depending on the implementation.
Hence, it is RECOMMENDED that the client not use the wildcard
"GET-PARAMS" operation very often. Note that "GET-PARAMS" returns
header values that apply to the whole session and not values that
have a request level scope.
If all of the headers requested are supported, the server MUST return
a Response-Status of 200. If some of the headers being retrieved are
unsupported for the resource, the server MUST reject the request with
a 403 Unsupported Header. Such a response MUST include the (empty)
unsupported headers exactly as they were sent from the client.
C->S: MRCP/2.0 136 GET-PARAMS 543256
Channel-Identifier:32AECB23433802@speechsynth
Voice-gender:
Voice-variant:
Vendor-Specific-Parameters:com.example.param1;
com.example.param2
S->C: MRCP/2.0 163 543256 200 COMPLETE
Channel-Identifier:32AECB23433802@speechsynth
Voice-gender:female
Voice-variant:3
Vendor-Specific-Parameters:com.example.param1="Company Name";
com.example.param2="124324234@example.com"
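The GET-PARAMS rules above might be handled on the server side roughly as shown below. This is a sketch under stated assumptions: the function name is hypothetical, session parameters are assumed to be held in a dict keyed by lower-cased header name, and a header is treated as supported exactly when it appears in that dict.

```python
def handle_get_params(requested, session_params):
    """Return (status_code, headers) for a GET-PARAMS request.

    requested      -- header names the client sent with empty values,
                      in the spelling the client used (may be empty)
    session_params -- dict of session-level values keyed by lower-cased
                      header name (assumed storage layout)
    """
    if not requested:
        # Wildcard request: return every settable parameter.
        return 200, list(session_params.items())
    unsupported = [n for n in requested if n.lower() not in session_params]
    if unsupported:
        # 403 Unsupported Header: echo the empty headers exactly as sent.
        return 403, [(name, "") for name in unsupported]
    return 200, [(name, session_params[name.lower()]) for name in requested]
```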
6.2. Generic Message Headers
All MRCPv2 headers, which include both the generic-headers defined in
the following sub-sections and the resource-specific headers defined
later, follow the same generic format as that given in Section 3.1 of
RFC2822 [RFC2822]. Each header consists of a name followed by a
colon (":") and the value. Header names are case-insensitive. The
value MAY be preceded by any amount of LWS, though a single SP is
preferred. Headers may extend over multiple lines by preceding each
extra line with at least one SP or HT.
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *LWS field-content *( CRLF 1*LWS field-content)
field-content = <the OCTETs making up the field-value
and consisting of either *TEXT or combinations
of token, separators, and quoted-string>
The field-content does not include any leading or trailing LWS (i.e.
linear white space occurring before the first non-whitespace
character of the field-value or after the last non-whitespace
character of the field-value). Such leading or trailing LWS MAY be
removed without changing the semantics of the field value. Any LWS
that occurs between field-content MAY be replaced with a single SP
before interpreting the field value or forwarding the message
downstream.
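As an illustration of the header format rules above, the following Python sketch parses a header block, unfolds continuation lines, and trims leading and trailing LWS. The function name is hypothetical, and malformed input (for example, a line with no colon) is not handled.

```python
def parse_headers(block):
    """Parse a CRLF-delimited header block into (name, value) pairs.

    Continuation lines (starting with SP or HT) are unfolded and joined
    to the previous value with a single SP. Names are lower-cased because
    header names are case-insensitive.
    """
    headers = []
    for line in block.split("\r\n"):
        if not line:
            continue
        if line[0] in " \t" and headers:
            # Continuation of the previous header's value.
            name, value = headers[-1]
            joined = (value + " " + line.strip(" \t")).strip()
            headers[-1] = (name, joined)
            continue
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip(" \t")))
    return headers
```

Arrival order is kept in the returned list so that same-named headers can still be combined in order by the caller.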
MRCPv2 servers and clients MUST NOT depend on header order. It is
"good practice" to send general-header fields first, followed by
request-header or response-header fields, and ending with the entity-
header fields. However, MRCPv2 servers and clients MUST be prepared
to process the headers in any order. The only exception to this rule
is when there are multiple headers with the same header name in a
message.
Multiple headers with the same name MAY be present in a message if
and only if the entire value for that header is defined as a comma-
separated list [i.e., #(values)].
It MUST be possible to combine the multiple headers of the same name
into one "header:value" pair without changing the semantics of the
message, by appending each subsequent value to the first, each
separated by a comma. The order in which headers with the same name
are received is therefore significant to the interpretation of the
combined header value, and thus an intermediary MUST NOT change the
order of these values when a message is forwarded.
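The combining rule above can be sketched as follows. The function name is hypothetical, and the sketch assumes the caller applies it only to headers whose value is defined as a comma-separated list.

```python
def combine_repeated(headers):
    """Merge same-named headers into one value, preserving arrival order.

    headers -- iterable of (name, value) pairs in the order received.
    Returns a dict keyed by lower-cased header name.
    """
    combined = {}
    for name, value in headers:
        key = name.lower()
        if key in combined:
            # Append each subsequent value to the first, comma-separated.
            combined[key] = combined[key] + "," + value
        else:
            combined[key] = value
    return combined
```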
generic-header = channel-identifier
/ accept
/ active-request-id-list
/ proxy-sync-id
/ accept-charset
/ content-type
/ content-id
/ content-base
/ content-encoding
/ content-location
/ content-length
/ fetch-timeout
/ cache-control
/ logging-tag
/ set-cookie
/ set-cookie2
/ vendor-specific
6.2.1. Channel-Identifier
All MRCPv2 requests, responses and events MUST contain the Channel-
Identifier header. The value is allocated by the server when a
control channel is added to the session and communicated to the
client by the "a=channel" attribute in the SDP answer from the
server. The header value consists of 2 parts separated by the '@'
symbol. The first part is an unambiguous string identifying the
MRCPv2 session. The second part is a string token which specifies
one of the media processing resource types listed in Section 3.1.
The unambiguous string (first part) MUST be unique among the resource
instances managed by the server and is common to all resource
channels with that server established through a single SIP dialog.
channel-identifier = "Channel-Identifier" ":" channel-id CRLF
channel-id = 1*alphanum "@" 1*alphanum
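A receiver could split the Channel-Identifier value into its two parts along the lines of the following Python sketch. The function name is hypothetical; the pattern mirrors the channel-id ABNF above, and no check is made that the second part names a known resource type.

```python
import re

# channel-id = 1*alphanum "@" 1*alphanum, per the ABNF above.
_CHANNEL_ID = re.compile(r"([A-Za-z0-9]+)@([A-Za-z0-9]+)")

def parse_channel_id(value):
    """Split a Channel-Identifier value into (session_part, resource_type)."""
    match = _CHANNEL_ID.fullmatch(value.strip())
    if match is None:
        raise ValueError("malformed channel-id: %r" % value)
    return match.group(1), match.group(2)
```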
6.2.2. Accept
The Accept header field follows the syntax defined in [H14.1]. The
semantics are also identical, with the exception that if no Accept
header field is present, the server MUST assume a default value that
is specific to the resource type that is being controlled. This
default value can be changed for a resource on a session by sending
this header in a SET-PARAMS method. The current default value of
this header for a resource in a session can be found through a GET-
PARAMS method.
6.2.3. Active-Request-Id-List
In a request, this header indicates the list of request-ids to which
the request applies. This is useful when there are multiple requests
that are PENDING or IN-PROGRESS and the client wants this request to
apply to one or more of these specifically.
In a response, this header returns the list of request-ids that the
method modified or affected. There could be one or more requests in
a request-state of PENDING or IN-PROGRESS. When a method affecting
one or more PENDING or IN-PROGRESS requests is sent from the client
to the server, the response MUST contain the list of request-ids that
were affected or modified by this command in its header.
The active-request-id-list is only used in requests and responses,
not in events.
For example, if a "STOP" request with no active-request-id-list is
sent to a synthesizer resource which has one or more "SPEAK" requests
in the PENDING or IN-PROGRESS state, all "SPEAK" requests MUST be
cancelled, including the one IN-PROGRESS. The response to the "STOP"
request contains in the active-request-id-list the request-ids of all
the "SPEAK" requests that were terminated. After sending the STOP
response, the server MUST NOT send any SPEAK-COMPLETE or RECOGNITION-
COMPLETE events for the terminated requests.
active-request-id-list = "Active-Request-Id-List" ":"
request-id *("," request-id) CRLF
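The "STOP" behavior described above might be modeled as in the following Python sketch. The function name and the dict-based request queue are assumptions made for illustration; a real synthesizer resource would also halt audio and suppress the completion events for the terminated requests.

```python
def apply_stop(active_requests, request_ids=None):
    """Cancel SPEAK requests and build the Active-Request-Id-List value.

    active_requests -- dict mapping request-id (int) to its state,
                       "PENDING" or "IN-PROGRESS"; mutated in place
    request_ids     -- ids named in the STOP request, or None when the
                       STOP carried no Active-Request-Id-List
    """
    if request_ids is None:
        # No list given: every PENDING or IN-PROGRESS request is cancelled.
        targets = list(active_requests)
    else:
        targets = [rid for rid in request_ids if rid in active_requests]
    for rid in targets:
        del active_requests[rid]
    return ",".join(str(rid) for rid in targets)
```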
6.2.4. Proxy-Sync-Id
When any server resource generates a barge-in-able event, it also
generates a unique tag. The tag is sent as this header's value in an
event to the client. The client then acts as an intermediary among
the server resources and sends a BARGE-IN-OCCURRED method to the
synthesizer server resource with the Proxy-Sync-Id it received from
the server resource. When the recognizer and synthesizer resources
are part of the same session, they may choose to work together to
achieve quicker interaction and response. Here the proxy-sync-id
helps the resource receiving the event, intermediated by the client,
to decide if this event has been processed through a direct
interaction of the resources.
proxy-sync-id = "Proxy-Sync-Id" ":" 1*VCHAR CRLF
6.2.5. Accept-Charset
See [H14.2]. This specifies the acceptable character set for
entities returned in the response or events associated with this
request. This is useful in specifying the character set to use in
the NLSML results of a "RECOGNITION-COMPLETE" event.
6.2.6. Content-Type
See [H14.17]. MRCPv2 supports a restricted set of registered Media
Types for content, including speech markup, grammar, and recognition
results. The content types applicable to each MRCPv2 resource-type
are specified in the corresponding section of the document. The
multipart content type "multipart/mixed" is supported to
communicate multiple of the above mentioned contents, in which case
the body parts MUST NOT contain any MRCPv2 specific headers.
6.2.7. Content-ID
This header contains an ID or name for the content by which it can be
referenced. This header operates according to the specification in
RFC2392 [RFC2392] and is required for content disambiguation in
multi-part messages. In MRCPv2 whenever the associated content is
stored, by either the client or the server, it MUST be retrievable
using this ID. Such content can be referenced later in a session by
addressing it with the "session:" URI scheme described in
Section 13.6.
6.2.8. Content-Base
The content-base entity-header may be used to specify the base URI
for resolving relative URLs within the entity.
content-base = "Content-Base" ":" absoluteURI CRLF
Note, however, that the base URI of the contents within the entity-
body may be redefined within that entity-body. An example of this
would be multi-part media, which in turn can have multiple entities
within it.
6.2.9. Content-Encoding
The content-encoding entity-header is used as a modifier to the
media-type. When present, its value indicates what additional
content encoding has been applied to the entity-body, and thus what
decoding mechanisms must be applied in order to obtain the media-type
referenced by the content-type header. Content-encoding is primarily
used to allow a document to be compressed without losing the identity
of its underlying media type. Note that the SDP session can be used
to determine accepted encodings (see Section 7).
content-encoding = "Content-Encoding" ":"
*WSP content-coding
*(*WSP "," *WSP content-coding *WSP )
CRLF
Content-coding is defined in [H3.5]. An example of its use is
Content-Encoding:gzip
If multiple encodings have been applied to an entity, the content
encodings MUST be listed in the order in which they were applied.
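Because codings are listed in the order applied, a receiver removes them in reverse order. The following Python sketch illustrates this; the function name and the set of codings in the table are assumptions, since this section does not state which codings an implementation accepts.

```python
import gzip
import zlib

# Illustrative decoder table; the supported codings are an assumption.
_DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
    "identity": lambda data: data,
}

def decode_body(body, content_encoding):
    """Undo the codings named in a Content-Encoding value.

    Codings are listed in the order they were applied, so they are
    removed last-to-first. An unknown coding raises KeyError.
    """
    codings = [c.strip().lower() for c in content_encoding.split(",")]
    for coding in reversed([c for c in codings if c]):
        body = _DECODERS[coding](body)
    return body
```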
6.2.10. Content-Location
The content-location entity-header MAY be used to supply the resource
location for the entity enclosed in the message when that entity is
accessible from a location separate from the requested resource's
URI. Refer to [H14.14].
content-location = "Content-Location" ":"
( absoluteURI / relativeURI ) CRLF
The content-location value is a statement of the location of the
resource corresponding to this particular entity at the time of the
request. This header is provided for optimization purposes only.
The receiver of this header MAY assume that the entity being sent is
identical to what would have been retrieved or might already have
been retrieved from the content-location URI.
For example, if the client provided a grammar markup inline, and it
had previously retrieved it from a certain URI, that URI can be
provided as part of the entity, using the content-location header.
This allows a resource like the recognizer to look into its cache to
see if this grammar was previously retrieved, compiled and cached.
In this case, it might optimize by using the previously compiled
grammar object.
If the content-location is a relative URI, the relative URI is
interpreted relative to the content-base URI.
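Resolution of a relative content-location against the content-base can be illustrated with Python's standard urljoin function. The wrapper name and the example URIs used with it are hypothetical.

```python
from urllib.parse import urljoin

def resolve_content_location(content_base, content_location):
    """Resolve a Content-Location value against the Content-Base URI.

    An absolute Content-Location is returned unchanged by urljoin.
    """
    return urljoin(content_base, content_location)
```

For example, a base of "http://www.example.com/grammars/" and a location of "digits.grxml" resolve to "http://www.example.com/grammars/digits.grxml".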
6.2.11. Content-Length
This header contains the length of the content of the message body
(i.e. after the double CRLF following the last header field). Unlike
HTTP, it MUST be included in all messages that carry content beyond
the header portion of the message. If it is missing, a default value
of zero is assumed. Otherwise, it is interpreted according to
[H14.13]. When a message having no use for a message body contains
one, i.e. the Content-Length is non-zero, the receiver MAY ignore the
content of the message body.
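The Content-Length rule above, including the default of zero, can be sketched in Python as follows. The function name is hypothetical; the sketch ignores folded header lines and does not validate the start-line or its message-length field.

```python
def split_message(raw):
    """Split a raw MRCPv2 message (bytes) into (head, body).

    The body starts after the double CRLF that ends the headers and is
    Content-Length octets long; a missing Content-Length means zero.
    """
    head, _, rest = raw.partition(b"\r\n\r\n")
    length = 0
    # Skip the start-line, then scan the headers for Content-Length.
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip().decode("ascii"))
    return head, rest[:length]
```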
6.2.12. Fetch-Timeout
When the recognizer or synthesizer needs to fetch documents or other
resources, this header controls the corresponding URI access
properties. This defines the timeout for content that the server may
need to fetch over the network. The value is interpreted to be in
milliseconds and ranges from 0 to an implementation-specific maximum
value. The default value for this header is implementation-specific.
This header MAY occur in "DEFINE-GRAMMAR", "RECOGNIZE", "SPEAK",
"SET-PARAMS" or "GET-PARAMS".
fetch-timeout = "Fetch-Timeout" ":" 1*19DIGIT CRLF
6.2.13. Cache-Control
If the server implements content caching, it MUST adhere to the cache
correctness rules of HTTP 1.1 [RFC2616] when accessing and caching
stored content. In particular, the "expires" and "cache-control"
headers of the cached URI or document MUST be honored and take
precedence over the Cache-Control defaults set by this header. The
cache-control directives are used to define the default caching
algorithms on the server for the session or request. The scope of
the directive is based on the method it is sent on. If the
directives are sent on a "SET-PARAMS" method, they apply to all
requests for external documents the server makes during that session,
unless overridden by a cache-control header on an individual request.
If the directives are sent on any other requests they apply only to
external document requests the server makes for that request. An
empty cache-control header on the "GET-PARAMS" method is a request
for the server to return the current cache-control directives setting
on the server.
cache-control = "Cache-Control" ":" cache-directive
*("," *LWS cache-directive) CRLF
cache-directive = "max-age" "=" delta-seconds
/ "max-stale" [ "=" delta-seconds ]
/ "min-fresh" "=" delta-seconds
delta-seconds = 1*19DIGIT
Here delta-seconds is a decimal time value specifying the number of
seconds since the instant the message response or data was received
by the server.
The cache-directives allow the client to ask the server to override
the default cache expiration mechanisms.
max-age Indicates that the client can tolerate the server
using content whose age is no greater than the
specified time in seconds. Unless a max-stale
directive is also included, the client is not willing
to accept a response based on stale data.
min-fresh Indicates that the client is willing to accept a
server response with cached data whose expiration is
no less than its current age plus the specified time
in seconds. If the server's cache time to live
exceeds the client-supplied min-fresh value, the
server MUST NOT utilize cached content.
max-stale Indicates that the client is willing to allow a server
to utilize cached data that has exceeded its
expiration time. If max-stale is assigned a value,
then the client is willing to allow the server to use
cached data that has exceeded its expiration time by
no more than the specified number of seconds. If no
value is assigned to max-stale, then the client is
willing to allow the server to use stale data of any
age.
The server cache MAY be requested to use stale response/data without
validation, but only if this does not conflict with any "MUST"-level
requirements concerning cache validation (e.g., a "must-revalidate"
cache-control directive in the HTTP 1.1 specification pertaining to
the corresponding URI).
If both the MRCPv2 cache-control directive and the cached entry on
the server include "max-age" directives, then the lesser of the two
values is used for determining the freshness of the cached entry for
that request.
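The cache-directive grammar and the "lesser of the two" max-age rule can be sketched in Python as shown below. Both function names are hypothetical, and the sketch covers only parsing and the max-age comparison, not the full HTTP 1.1 freshness calculation.

```python
def parse_cache_control(value):
    """Parse an MRCPv2 Cache-Control value into a dict.

    max-age and min-fresh map to an int number of seconds. max-stale
    maps to an int, or to None when no value is given (stale data of
    any age is acceptable).
    """
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, seconds = item.partition("=")
        name = name.strip().lower()
        if name not in ("max-age", "max-stale", "min-fresh"):
            raise ValueError("unknown cache-directive: %r" % name)
        if sep:
            directives[name] = int(seconds)
        elif name == "max-stale":
            directives[name] = None
        else:
            raise ValueError("%s requires a value" % name)
    return directives


def effective_max_age(request_max_age, cached_max_age):
    """Return the lesser max-age when both are present, else the one given."""
    present = [v for v in (request_max_age, cached_max_age) if v is not None]
    return min(present) if present else None
```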
6.2.14. Logging-Tag
This header MAY be sent as part of a "SET-PARAMS"/"GET-PARAMS" method
to set or retrieve the logging tag for logs generated by the server.
Once set, the value persists until a new value is set or the session
ends. The MRCPv2 server MAY provide a mechanism to subset its output
logs so that system administrators can examine or extract only the
log file portion during which the logging tag was set to a certain
value.
It is RECOMMENDED that clients have some identifying information in
the logging tag, so that one can determine which client request
generated a given log message at the server.
logging-tag = "Logging-Tag" ":" 1*UTFCHAR CRLF
6.2.15. Set-Cookie and Set-Cookie2
Since the associated HTTP client on an MRCPv2 server fetches
documents for processing on behalf of the MRCPv2 client, the cookie
store in the HTTP client of the MRCPv2 server is treated as an
extension of the cookie store in the HTTP client of the MRCPv2
client. This requires that the MRCPv2 client and server be able to
synchronize their common cookie store as needed. To enable the
MRCPv2 client to push its stored cookies to the MRCPv2 server and get
new cookies from the MRCPv2 server stored back to the MRCPv2 client,
the set-cookie and set-cookie2 entity-header fields MAY be included
in MRCPv2 requests to update the cookie store on a server and be
returned in final MRCPv2 responses or events to subsequently update
the client's own cookie store. The stored cookies on the server
persist for the duration of the MRCPv2 session and MUST be destroyed
at the end of the session. To ensure support for the type of cookie
header dictated by the HTTP origin server, MRCPv2 clients and servers
MUST support both the set-cookie and set-cookie2 entity header
fields.
set-cookie = "Set-Cookie:" cookies CRLF
cookies = cookie *("," *LWS cookie)
cookie = attribute "=" value *(";" cookie-av)
cookie-av = "Comment" "=" value
/ "Domain" "=" value
/ "Max-Age" "=" value
/ "Path" "=" value
/ "Secure"
/ "Version" "=" 1*19DIGIT
/ "Age" "=" delta-seconds
set-cookie2 = "Set-Cookie2:" cookies2 CRLF
cookies2 = cookie2 *("," *LWS cookie2)
cookie2 = attribute "=" value *(";" cookie-av2)
cookie-av2 = "Comment" "=" value
/ "CommentURL" "=" DQUOTE uri DQUOTE
/ "Discard"
/ "Domain" "=" value
/ "Max-Age" "=" value
/ "Path" "=" value
/ "Port" [ "=" DQUOTE portlist DQUOTE ]
/ "Secure"
/ "Version" "=" 1*19DIGIT
/ "Age" "=" delta-seconds
portlist = portnum *("," *LWS portnum)
portnum = 1*19DIGIT
The set-cookie and set-cookie2 headers are specified in RFC2109
[RFC2109] and RFC2965 [RFC2965], respectively. The "Age" attribute
is introduced in this specification to indicate the age of the cookie
and is optional. An MRCPv2 client or server MUST calculate the age
of the cookie according to the age calculation rules in the HTTP/1.1
specification [RFC2616] and append the "Age" attribute accordingly.
The MRCPv2 client or server MUST supply defaults for the Domain and
Path attributes if omitted by the HTTP origin server as specified in
RFC2109 (set-cookie) and RFC2965 (set-cookie2). Note that there is
no