draft-ietf-sipping-early-media-01.txt   draft-ietf-sipping-early-media-02.txt 
Internet Engineering Task Force SIPPING WG Internet Engineering Task Force SIPPING WG
Internet Draft G. Camarillo Internet Draft G. Camarillo
Ericsson Ericsson
H. Schulzrinne H. Schulzrinne
Columbia University Columbia University
draft-ietf-sipping-early-media-01.txt draft-ietf-sipping-early-media-02.txt
November 18, 2003 June 1, 2004
Expires: May, 2004 Expires: December, 2004
Early Media and Ringing Tone Generation Early Media and Ringing Tone Generation
in the Session Initiation Protocol in the Session Initiation Protocol (SIP)
STATUS OF THIS MEMO STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with By submitting this Internet-Draft, I certify that any applicable
all provisions of Section 10 of RFC2026. patent or other IPR claims of which I am aware have been disclosed,
and any of which I become aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that other
other groups may also distribute working documents as Internet- groups may also distribute working documents as Internet-Drafts.
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress". material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt.
To view the list Internet-Draft Shadow Directories, see The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
This document describes how to manage early media in SIP using two This document describes how to manage early media in SIP using two
models: the gateway model and the application server model. It also models: the gateway model and the application server model. It also
describes the inputs one needs to consider to define local policies describes the inputs one needs to consider to define local policies
for ringing tone generation. for ringing tone generation.
Table of Contents Table of Contents
1 Introduction ........................................ 3 1 Introduction ........................................ 3
2 Session Establishment in SIP ........................ 3 2 Session Establishment in SIP ........................ 3
3 The Gateway Model ................................... 4 3 The Gateway Model ................................... 4
3.1 Forking ............................................. 5 3.1 Forking ............................................. 5
3.2 Ringing Tone Generation ............................. 6 3.2 Ringing Tone Generation ............................. 6
3.3 Absence of an Early Media Indicator ................. 7 3.3 Absence of an Early Media Indicator ................. 8
3.4 Applicability of the Gateway Model .................. 8 3.4 Applicability of the Gateway Model .................. 8
4 The Application Server Model ........................ 8 4 The Application Server Model ........................ 9
4.1 In-Band Versus Out-of-Band Session Progress 4.1 In-Band Versus Out-of-Band Session Progress
Information ......................................... 9 Information ......................................... 10
5 Alert-Info Header Field ............................. 9 5 Alert-Info Header Field ............................. 10
6 Acknowledgments ..................................... 10 6 Security Considerations ............................. 10
7 Authors' Addresses .................................. 10 7 Acknowledgments ..................................... 11
8 Bibliography ........................................ 10 8 Authors' Addresses .................................. 11
9 Normative References ................................ 12
10 Informative References .............................. 12
1 Introduction 1 Introduction
Early media refers to media (e.g., audio and video) that is exchanged Early media refers to media (e.g., audio and video) that is exchanged
before a particular session is accepted by the called user. Within a before a particular session is accepted by the called user. Within a
dialog, early media occurs from the moment the initial INVITE is sent dialog, early media occurs from the moment the initial INVITE is sent
until the UAS generates a final response. It may be unidirectional or until the UAS generates a final response. It may be unidirectional or
bi-directional, and can be generated by the caller, the callee, or bidirectional, and can be generated by the caller, the callee, or
both. Typical examples of early media generated by the callee are both. Typical examples of early media generated by the callee are
ringing tone and announcements (e.g., queuing status.) Early media ringing tone and announcements (e.g., queuing status). Early media
generated by the caller typically consists of voice commands or DTMF generated by the caller typically consists of voice commands or DTMF
tones to drive IVRs. tones to drive IVRs.
The basic SIP spec [1] only supports very simple early media. In The basic SIP specification (RFC 3261 [1]) only supports very simple
order to support fully-featured early media, UAs need to implement early media mechanisms. These simple mechanisms have a number of
some extensions in addition to the basic SIP spec. This document problems which relate to forking and security, and do not satisfy the
describes two models to implement early media and the extensions requirements of most applications. This document goes beyond the
needed in each model. mechanisms defined in RFC 3261 [1] and describes two models to
implement early media using SIP: the gateway model and the
application server model.
Section 2 describes the offer/answer model in the absence of early media, Although both early media models described in this document are
and Section 3 introduces the gateway model. In this model, the early superior to the one specified in RFC 3261 [1], the gateway model
still presents a set of issues. In particular, the gateway model does
not work well with forking. Nevertheless, the gateway model is needed
because some SIP entities (in particular, some gateways) cannot
implement the application server model.
The application server model addresses some of the issues present in
the gateway model. This model uses the early-session disposition
type, which is specified in [2].
The remainder of this document is organized as follows. Section 2
describes the offer/answer model in the absence of early media, and
Section 3 introduces the gateway model. In this model, the early
media session is established using the early dialog established by media session is established using the early dialog established by
the original INVITE. Section 3.1, Section 3.2 and Section 3.4 the original INVITE. Section 3.1, Section 3.2 and Section 3.4
describe the limitations of the gateway model and the scenarios where describe the limitations of the gateway model and the scenarios where
it is appropriate to use this model. Section 4 introduces the it is appropriate to use this model. Section 4 introduces the
application server model, which resolves some of the issues present application server model, which, as stated previously, resolves some
in the gateway model. Section 5 discusses the interactions between of the issues present in the gateway model. Section 5 discusses the
the Alert-Info header field in both early media models. interactions between the Alert-Info header field in both early media
models.
2 Session Establishment in SIP 2 Session Establishment in SIP
Before presenting both early media models, we will briefly summarize Before presenting both early media models, we will briefly summarize
how session establishment works in SIP. This will let us keep how session establishment works in SIP. This will let us keep
separate features that are intrinsic to SIP (e.g., media being played separate features that are intrinsic to SIP (e.g., media being played
before the 200 (OK) to avoid media clipping) from early media before the 200 (OK) to avoid media clipping) from early media
operations. operations.
SIP [1] uses the offer/answer model [2] to negotiate session SIP [1] uses the offer/answer model [3] to negotiate session
parameters. One of the user agents - the offerer - prepares a session parameters. One of the user agents - the offerer - prepares a session
description that is called the offer. The other user agent - the description that is called the offer. The other user agent - the
answerer - responds with another session description called the answerer - responds with another session description called the
answer. This two-way handshake allows both user agents to agree upon answer. This two-way handshake allows both user agents to agree upon
the session parameters to be used to exchange media. the session parameters to be used to exchange media.
The idea behind the offer/answer model is to decouple the The idea behind the offer/answer model is to decouple the
offer/answer exchange from the messages used to transport the session offer/answer exchange from the messages used to transport the session
descriptions. For example, the offer can be sent in an INVITE request descriptions. For example, the offer can be sent in an INVITE request
and the answer can arrive in the 200 (OK) response for that INVITE, and the answer can arrive in the 200 (OK) response for that INVITE,
or, alternatively, the offer can be sent in the 200 (OK) for an empty or, alternatively, the offer can be sent in the 200 (OK) for an empty
INVITE and the answer be sent in the ACK. When reliable provisional INVITE and the answer be sent in the ACK. When reliable provisional
responses [3] and UPDATE requests [4] are used, there are many more responses [4] and UPDATE requests [5] are used, there are many more
possible ways to exchange offers and answers. possible ways to exchange offers and answers.
Media clipping occurs when the user (or the machine generating media) Media clipping occurs when the user (or the machine generating media)
believes that the media session is already established but the believes that the media session is already established but the
establishment process has not finished yet. The user starts speaking establishment process has not finished yet. The user starts speaking
(i.e., generating media) and the first few syllables or even the (i.e., generating media) and the first few syllables or even the
first few words are lost. first few words are lost.
When the offer/answer exchange takes place in the 200 (OK) response When the offer/answer exchange takes place in the 200 (OK) response
and in the ACK, media clipping is unavoidable. The called user starts and in the ACK, media clipping is unavoidable. The called user starts
skipping to change at page 8, line 17 skipping to change at page 8, line 35
and the media path. SIP signalling traverses a different path than and the media path. SIP signalling traverses a different path than
the media. The media path is typically optimized to reduce the end- the media. The media path is typically optimized to reduce the end-
to-end delay (e.g., minimum number of intermediaries) while the SIP to-end delay (e.g., minimum number of intermediaries) while the SIP
signalling path typically traverses a number of proxies providing signalling path typically traverses a number of proxies providing
different services for the session. For that reason, it is very different services for the session. For that reason, it is very
likely that the media packets with early media reach the UAC before likely that the media packets with early media reach the UAC before
any SIP message which could contain an early media indicator. any SIP message which could contain an early media indicator.
Nevertheless, sometimes, SIP responses arrive at the UAC before any Nevertheless, sometimes, SIP responses arrive at the UAC before any
media packet. There are situations when the UAS intends to send early media packet. There are situations when the UAS intends to send early
media but cannot do it straight away. For example, UAs using ICE [5] media but cannot do it straight away. For example, UAs using ICE [6]
may need to exchange several STUN messages before being able to may need to exchange several STUN messages before being able to
exchange media. In these situations, an early media indicator would exchange media. In these situations, an early media indicator would
keep the UAC from generating local ringing tone during this time. keep the UAC from generating local ringing tone during this time.
However, while the early media is not arriving at the UAC, the user However, while the early media is not arriving at the UAC, the user
would not be aware of the fact that the remote user is being alerted, would not be aware of the fact that the remote user is being alerted,
even though a 180 (Ringing) had been received. Therefore, a better even though a 180 (Ringing) had been received. Therefore, a better
solution would be to apply local ringing tone until the early media solution would be to apply local ringing tone until the early media
packets could be sent from the UAS to the UAC. This solution does not packets could be sent from the UAS to the UAC. This solution does not
require any early media indicator. require any early media indicator.
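The local ringing policy suggested above can be sketched roughly as follows. This is only an illustration, not part of either draft revision: the CallState class, its field names, and the should_play_local_ringing function are hypothetical, and a real UAC would derive these flags from its SIP and media stacks.

```python
from dataclasses import dataclass


@dataclass
class CallState:
    """Hypothetical per-call state kept by a UAC (names are illustrative)."""
    ringing_received: bool = False      # a 180 (Ringing) has arrived
    early_media_arriving: bool = False  # media packets are currently being received
    final_response: bool = False        # a final response has arrived


def should_play_local_ringing(state: CallState) -> bool:
    """Apply local ringing only while the callee is being alerted
    and no early media packets are arriving from the UAS."""
    if state.final_response:
        # The early media phase is over once a final response arrives.
        return False
    return state.ringing_received and not state.early_media_arriving
```

With this kind of rule, the UAC stops local ringing as soon as early media starts to arrive, without relying on any early media indicator in the signalling.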
skipping to change at page 9, line 4 skipping to change at page 9, line 21
The gateway model is, therefore, acceptable in situations where the The gateway model is, therefore, acceptable in situations where the
UA cannot distinguish between early media and regular media. A PSTN UA cannot distinguish between early media and regular media. A PSTN
gateway is an example of this type of situation. The PSTN gateway gateway is an example of this type of situation. The PSTN gateway
receives media from the PSTN over a circuit, and sends it to the IP receives media from the PSTN over a circuit, and sends it to the IP
network. The gateway is not aware of the contents of the media, and network. The gateway is not aware of the contents of the media, and
it does not exactly know when the transition from early to regular it does not exactly know when the transition from early to regular
media takes place. From the PSTN perspective, the circuit is a media takes place. From the PSTN perspective, the circuit is a
continuous source of media. continuous source of media.
4 The Application Server Model 4 The Application Server Model
The application server model consists of having the UAS behave as an The application server model consists of having the UAS behave as an
application server to establish early media sessions with the UAC. application server to establish early media sessions with the UAC.
The UAC indicates support for the early-session disposition type The UAC indicates support for the early-session disposition type
(defined in [6]) using the early-session option tag. This way, UASs (defined in [2]) using the early-session option tag. This way, UASs
know that they can keep offer/answer exchanges for early media know that they can keep offer/answer exchanges for early media
(early-session disposition type) and for regular media (session (early-session disposition type) and for regular media (session
disposition type) separate. disposition type) separate.
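As a rough illustration of the check a UAS might perform, the sketch below looks for the early-session option tag in the value of a Supported header field. It assumes the tag is carried in the Supported header field of RFC 3261 and that option tags can be compared case-insensitively; the function names are hypothetical and the header value is passed in as a plain string.

```python
def supports_early_session(supported_header: str) -> bool:
    """Return True if the comma-separated option tags include early-session."""
    tags = [tag.strip().lower() for tag in supported_header.split(",")]
    return "early-session" in tags


def early_media_disposition(supported_header: str) -> str:
    """Pick the disposition type a UAS could use for an early media offer.

    Falls back to the regular session disposition type (gateway model
    behaviour) when the UAC did not indicate early-session support.
    """
    if supports_early_session(supported_header):
        return "early-session"
    return "session"
```

A UAS that receives "100rel, early-session" could therefore keep the early media exchange separate, while one that receives only "100rel" would have to fall back to the gateway model.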
Sending early media using a different offer/answer exchange than the Sending early media using a different offer/answer exchange than the
one used for sending regular media helps avoid media clipping in case one used for sending regular media helps avoid media clipping in case
of forking. The UAC can reject or mute new offers for early media of forking. The UAC can reject or mute new offers for early media
without muting the sessions that will carry media when the original without muting the sessions that will carry media when the original
INVITE is accepted. The UAC can give priority to media received over INVITE is accepted. The UAC can give priority to media received over
the latter sessions. This way, the application server model the latter sessions. This way, the application server model
skipping to change at page 10, line 4 skipping to change at page 10, line 21
are rendered to the user. In order to make this choice easier to UAs, are rendered to the user. In order to make this choice easier to UAs,
it is strongly recommended that information that is not essential for it is strongly recommended that information that is not essential for
the session is not transmitted using early media. For instance, UAs the session is not transmitted using early media. For instance, UAs
should not use early media to send special ringing tones. SIP already should not use early media to send special ringing tones. SIP already
provides a means to inform the remote user about session provides a means to inform the remote user about session
establishment progress which does not cause any of the problems establishment progress which does not cause any of the problems
associated with early media: the status code and the reason phrase in associated with early media: the status code and the reason phrase in
provisional responses. provisional responses.
5 Alert-Info Header Field 5 Alert-Info Header Field
The Alert-Info header field allows specifying an alternative ringing The Alert-Info header field allows specifying an alternative ringing
content, such as ringing tone, to the UAC. This header field tells content, such as ringing tone, to the UAC. This header field tells
the UAC which tone should be played in case local ringing is the UAC which tone should be played in case local ringing is
generated, but it does not tell the UAC when to generate local generated, but it does not tell the UAC when to generate local
ringing. A UAC should follow the rules described above for ringing ringing. A UAC should follow the rules described above for ringing
tone generation in both models. If, after following those rules, the tone generation in both models. If, after following those rules, the
UAC decides to play local ringing, it can then use the Alert-Info UAC decides to play local ringing, it can then use the Alert-Info
header field to generate it. header field to generate it.
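The division of labour described above (the ringing rules decide when to play, Alert-Info decides what to play) might look like the following sketch. DEFAULT_TONE and select_ringing_tone are hypothetical names, and the parsing assumes the RFC 3261 form of the header field value, where the URI is enclosed in angle brackets.

```python
from typing import Optional

DEFAULT_TONE = "builtin:ringback"  # hypothetical identifier for the UA's own tone


def select_ringing_tone(play_local_ringing: bool,
                        alert_info: Optional[str]) -> Optional[str]:
    """Return the tone to render, or None when no local ringing is played.

    The decision to ring is made elsewhere; Alert-Info only selects the tone.
    """
    if not play_local_ringing:
        return None
    if alert_info:
        start = alert_info.find("<")
        end = alert_info.find(">", start + 1)
        if start != -1 and end != -1:
            uri = alert_info[start + 1:end].strip()
            if uri:
                return uri
    # No usable Alert-Info value: fall back to the UA's default tone.
    return DEFAULT_TONE
```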
6 Acknowledgments 6 Security Considerations
SIP uses the offer/answer model [3] to establish early sessions in
both the gateway and the application server models. User Agents (UAs)
generate a session description, which contains the transport address
(i.e., IP address plus port) where they want to receive media, and
send it to their peer in a SIP message. When media packets arrive at
this transport address, the UA assumes that they come from the
receiver of the SIP message carrying the session description.
Nevertheless, attackers may attempt to gain access to the contents of
the SIP message and send packets to the transport address contained
in the session description. To prevent this situation, UAs SHOULD
encrypt their session descriptions (e.g., using S/MIME).
Still, even if a UA encrypts its session descriptions, an attacker
may try to guess the transport address used by the UA and send media
packets to that address. Guessing such a transport address is
sometimes easier than it may seem because many UAs always pick the
same initial media port. To prevent this situation, UAs SHOULD use
media-level authentication mechanisms (e.g., SRTP [7]). In addition,
UAs that wish to keep their communications confidential SHOULD use
media-level encryption mechanisms (e.g., SRTP [7]).
Attackers may attempt to make a UA send media to a victim as part of
a DoS attack. This can be done by sending a session description with
the victim's transport address to the UA. To prevent this attack, the
UA SHOULD engage in a handshake with the owner of the transport
address received in a session description (just verifying
willingness to receive media) before sending a large amount of data
to the transport address. This check can be performed by using a
connection oriented transport protocol, by using STUN [8] in an end-
to-end fashion, or by the key exchange in SRTP [7].
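One way to picture the willingness check is a sender that allows only a small number of packets until the peer has been verified. The sketch below is an assumption-laden illustration: the MediaSender class, its unverified_budget value, and the way verification is signalled are all hypothetical, and the actual handshake (connection setup, STUN, or SRTP key exchange) is left out.

```python
from dataclasses import dataclass


@dataclass
class MediaSender:
    """Hypothetical gate that limits media sent to an unverified address."""
    unverified_budget: int = 3  # illustrative cap on packets before verification
    verified: bool = False
    sent_unverified: int = 0

    def mark_verified(self) -> None:
        """Call once the handshake shows the peer is willing to receive media."""
        self.verified = True

    def may_send(self) -> bool:
        """Return True if one more packet may be sent to the transport address."""
        if self.verified:
            return True
        if self.sent_unverified < self.unverified_budget:
            self.sent_unverified += 1
            return True
        return False
```

Capping unverified traffic in this way limits how much data a UA can be tricked into sending to a victim before the check completes.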
In any event, note that the previous security considerations are not
early media specific, but apply to the usage of the offer/answer
model in SIP to establish sessions in general.
Additionally, an early media-specific risk (roughly speaking, an
equivalent to forms of "toll fraud" in the PSTN) attempts to exploit
the different charging policies some operators apply to early and to
regular media. When UAs are allowed to exchange early media for free,
but are required to pay for regular media sessions, rogue UAs may try
to establish a bidirectional early media session and never send a 2xx
response for the INVITE.
On the other hand, some application servers (e.g., Interactive Voice
Response systems) use bidirectional early media to obtain information
from the callers (e.g., the PIN code of a calling card). So, we do
not recommend that operators disallow bidirectional early media.
Instead, operators should consider a remedy of charging early media
exchanges that last too long, or stopping them at the media level
(according to the operator's policy).
7 Acknowledgments
Jon Peterson provided useful ideas on the separation between the Jon Peterson provided useful ideas on the separation between the
gateway model and the application server model. gateway model and the application server model.
Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John
Hearty, Adam Roach, Eric Burger, and Rohan Mahy provided useful Hearty, Adam Roach, Eric Burger, Rohan Mahy, and Allison Mankin
comments and suggestions. provided useful comments and suggestions.
7 Authors' Addresses 8 Authors' Addresses
Gonzalo Camarillo Gonzalo Camarillo
Ericsson Ericsson
Advanced Signalling Research Lab. Advanced Signalling Research Lab.
FIN-02420 Jorvas FIN-02420 Jorvas
Finland Finland
electronic mail: Gonzalo.Camarillo@ericsson.com electronic mail: Gonzalo.Camarillo@ericsson.com
Henning Schulzrinne Henning Schulzrinne
Dept. of Computer Science Dept. of Computer Science
Columbia University 1214 Amsterdam Avenue, MC 0401 Columbia University 1214 Amsterdam Avenue, MC 0401
New York, NY 10027 New York, NY 10027
USA USA
electronic mail: schulzrinne@cs.columbia.edu electronic mail: schulzrinne@cs.columbia.edu
8 Bibliography 9 Normative References
[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J. [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J.
Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
initiation protocol," RFC 3261, Internet Engineering Task Force, June initiation protocol," RFC 3261, Internet Engineering Task Force, June
2002. 2002.
[2] J. Rosenberg and H. Schulzrinne, "An offer/answer model with [2] G. Camarillo, "The early session disposition type for the session
initiation protocol (SIP)," Internet Draft draft-ietf-sipping-early-
disposition-01, Internet Engineering Task Force, Jan. 2004. Work in
progress.
[3] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
session description protocol (SDP)," RFC 3264, Internet Engineering session description protocol (SDP)," RFC 3264, Internet Engineering
Task Force, June 2002. Task Force, June 2002.
[3] J. Rosenberg and H. Schulzrinne, "Reliability of provisional 10 Informative References
[4] J. Rosenberg and H. Schulzrinne, "Reliability of provisional
responses in session initiation protocol (SIP)," RFC 3262, Internet responses in session initiation protocol (SIP)," RFC 3262, Internet
Engineering Task Force, June 2002. Engineering Task Force, June 2002.
[4] J. Rosenberg, "The session initiation protocol (SIP) UPDATE [5] J. Rosenberg, "The session initiation protocol (SIP) UPDATE
method," RFC 3311, Internet Engineering Task Force, Oct. 2002. method," RFC 3311, Internet Engineering Task Force, Oct. 2002.
[5] J. Rosenberg, "Interactive connectivity establishment (ICE): a [6] J. Rosenberg, "Interactive connectivity establishment (ICE): a
methodology for network address translator (NAT) traversal for the methodology for network address translator (NAT) traversal for the
session initiation protocol (SIP)," Internet draft, Internet session initiation protocol (SIP)," Internet draft, Internet
Engineering Task Force, July 2003. Work in progress. Engineering Task Force, July 2003. Work in progress.
[6] G. Camarillo, "The early session disposition type for the session [7] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman,
initiation protocol (SIP)," Internet Draft, Internet Engineering Task "The secure real-time transport protocol (SRTP)," RFC 3711, Internet
Force, Oct. 2003. Work in progress. Engineering Task Force, Mar. 2004.
[8] J. Rosenberg, J. Weinberger, C. Huitema, and R. Mahy, "STUN -
simple traversal of user datagram protocol (UDP) through network
address translators (NATs)," RFC 3489, Internet Engineering Task
Force, Mar. 2003.
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it might or might not be available; nor does it represent that it has
has made any effort to identify any such rights. Information on the made any independent effort to identify any such rights. Information
IETF's procedures with respect to rights in standards-track and on the IETF's procedures with respect to rights in IETF Documents can
standards-related documentation can be found in BCP-11. Copies of be found in BCP 78 and BCP 79.
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to Copies of IPR disclosures made to the IETF Secretariat and any
obtain a general license or permission for the use of such assurances of licenses to be made available, or the result of an
proprietary rights by implementors or users of this specification can attempt made to obtain a general license or permission for the use of
be obtained from the IETF Secretariat. such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF Executive this standard. Please address the information to the IETF at ietf-
Director. ipr@ietf.org.
Full Copyright Statement Disclaimer of Validity
Copyright (c) The Internet Society (2003). All Rights Reserved. This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
This document and translations of it may be copied and furnished to Copyright Statement
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be Copyright (C) The Internet Society (2004). This document is subject
revoked by the Internet Society or its successors or assigns. to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
This document and the information contained herein is provided on an Acknowledgment
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING Funding for the RFC Editor function is currently provided by the
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION Internet Society.
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.