draft-ietf-sipping-early-media-02.txt   rfc3960.txt 
Internet Engineering Task Force SIPPING WG Network Working Group G. Camarillo
Internet Draft G. Camarillo Request for Comments: 3960 Ericsson
Ericsson Category: Informational H. Schulzrinne
H. Schulzrinne
Columbia University Columbia University
draft-ietf-sipping-early-media-02.txt December 2004
June 1, 2004
Expires: December, 2004
Early Media and Ringing Tone Generation Early Media and Ringing Tone Generation
in the Session Initiation Protocol (SIP) in the Session Initiation Protocol (SIP)
STATUS OF THIS MEMO Status of This Memo
By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed,
and any of which I become aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months This memo provides information for the Internet community. It does
and may be updated, replaced, or obsoleted by other documents at any not specify an Internet standard of any kind. Distribution of this
time. It is inappropriate to use Internet-Drafts as reference memo is unlimited.
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at Copyright Notice
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at Copyright (C) The Internet Society (2004).
http://www.ietf.org/shadow.html.
Abstract Abstract
This document describes how to manage early media in SIP using two This document describes how to manage early media in the Session
models; the gateway model and the application server model. It also Initiation Protocol (SIP) using two models: the gateway model and the
describes the inputs one needs to consider to define local policies application server model. It also describes the inputs one needs to
for ringing tone generation. consider in defining local policies for ringing tone generation.
Table of Contents Table of Contents
1 Introduction 1. Introduction
2 Session Establishment in SIP 2. Session Establishment in SIP
3 The Gateway Model 3. The Gateway Model
3.1 Forking 3.1. Forking
3.2 Ringing Tone Generation 3.2. Ringing Tone Generation
3.3 Absence of an Early Media Indicator 3.3. Absence of an Early Media Indicator
3.4 Applicability of the Gateway Model 3.4. Applicability of the Gateway Model
4 The Application Server Model 4. The Application Server Model
4.1 In-Band Versus Out-of-Band Session Progress 4.1. In-Band Versus Out-of-Band Session Progress Information
Information 5. Alert-Info Header Field
5 Alert-Info Header Field 6. Security Considerations
6 Security Considerations 7. Acknowledgments
7 Acknowledgments 8. References
8 Authors' Addresses 8.1. Normative References
9 Normative References 8.2. Informative References
10 Informative References Authors' Addresses
Full Copyright Statement
1 Introduction 1. Introduction
Early media refers to media (e.g., audio and video) that is exchanged Early media refers to media (e.g., audio and video) that is exchanged
before a particular session is accepted by the called user. Within a before a particular session is accepted by the called user. Within a
dialog, early media occurs from the moment the initial INVITE is sent dialog, early media occurs from the moment the initial INVITE is sent
until the UAS generates a final response. It may be unidirectional or until the User Agent Server (UAS) generates a final response. It may
bidirectional, and can be generated by the caller, the callee, or be unidirectional or bidirectional, and can be generated by the
both. Typical examples of early media generated by the callee are caller, the callee, or both. Typical examples of early media
ringing tone and announcements (e.g., queuing status). Early media generated by the callee are ringing tone and announcements (e.g.,
generated by the caller typically consists of voice commands or DTMF queuing status). Early media generated by the caller typically
tones to drive IVRs. consists of voice commands or dual tone multi-frequency (DTMF) tones
to drive interactive voice response (IVR) systems.
The basic SIP specification (RFC 3261 [1]) only supports very simple The basic SIP specification (RFC 3261 [1]) only supports very simple
early media mechanisms. These simple mechanisms have a number of early media mechanisms. These simple mechanisms have a number of
problems which relate to forking and security, and do not satisfy the problems which relate to forking and security, and do not satisfy the
requirements of most applications. This document goes beyond the requirements of most applications. This document goes beyond the
mechanisms defined in RFC 3261 [1] and describes two models to mechanisms defined in RFC 3261 [1] and describes two models of early
implement early media using SIP: the gateway model and the media implementations using SIP: the gateway model and the
application server model. application server model.
Although both early media models described in this document are Although both early media models described in this document are
superior to the one specified in RFC 3261 [1], the gateway model superior to the one specified in RFC 3261 [1], the gateway model
still presents a set of issues. In particular, the gateway model does still presents a set of issues. In particular, the gateway model
not work well with forking. Nevertheless, the gateway model is needed does not work well with forking. Nevertheless, the gateway model is
because some SIP entities (in particular, some gateways) cannot needed because some SIP entities (in particular, some gateways)
implement the application server model. cannot implement the application server model.
The application server model addresses some of the issues present in The application server model addresses some of the issues present in
the gateway model. This model uses the early-session disposition the gateway model. This model uses the early-session disposition
type, which is specified in [2]. type, which is specified in [2].
The remainder of this document is organized as follows. Section 2 The remainder of this document is organized as follows: Section 2
describes the offer/answer model in absence of early media, and describes the offer/answer model in the absence of early media, and
Section 3 introduces the gateway model. In this model, the early Section 3 introduces the gateway model. In this model, the early
media session is established using the early dialog established by media session is established using the early dialog established by
the original INVITE. Section 3.1, Section 3.2 and Section 3.4 the original INVITE. Sections 3.1, 3.2, and 3.4 describe the
describe the limitations of the gateway model and the scenarios where limitations of the gateway model and the scenarios where it is
it is appropriate to use this model. Section 4 introduces the appropriate to use this model. Section 4 introduces the application
application server model, which, as stated previously, resolves some server model, which, as stated previously, resolves some of the
of the issues present in the gateway model. Section 5 discusses the issues present in the gateway model. Section 5 discusses the
interactions between the Alter-Info header field in both early media interactions between the Alert-Info header field in both early media
models. models.
2 Session Establishment in SIP The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [9].
2. Session Establishment in SIP
Before presenting both early media models, we will briefly summarize Before presenting both early media models, we will briefly summarize
how session establishment works in SIP. This will let us keep how session establishment works in SIP. This will let us keep
separate features that are intrinsic to SIP (e.g., media being played separate features that are intrinsic to SIP (e.g., media being played
before the 200 (OK) to avoid media clipping) from early media before the 200 (OK) to avoid media clipping) from early media
operations. operations.
SIP [1] uses the offer/answer model [3] to negotiate session SIP [1] uses the offer/answer model [3] to negotiate session
parameters. One of the user agents - the offerer - prepares a session parameters. One of the user agents - the offerer - prepares a
description that is called the offer. The other user agent - the session description that is called the offer. The other user agent
answerer - responds with another session description called the - the answerer - responds with another session description called the
answer. This two-way handshake allows both user agents to agree upon answer. This two-way handshake allows both user agents to agree upon
the session parameters to be used to exchange media. the session parameters to be used to exchange media.
The idea behind the offer/answer model is to decouple the The offer/answer model decouples the offer/answer exchange from the
offer/answer exchange from the messages used to transport the session messages used to transport the session descriptions. For example,
descriptions. For example, the offer can be sent in an INVITE request the offer can be sent in an INVITE request and the answer can arrive
and the answer can arrive in the 200 (OK) response for that INVITE, in the 200 (OK) response for that INVITE, or, alternatively, the
or, alternatively, the offer can be sent in the 200 (OK) for an empty offer can be sent in the 200 (OK) for an empty INVITE and the answer
INVITE and the answer be sent in the ACK. When reliable provisional can be sent in the ACK. When reliable provisional responses [4] and
responses [4] and UPDATE requests [5] are used, there are many more UPDATE requests [5] are used, there are many more possible ways to
possible ways to exchange offers and answers. exchange offers and answers.
Media clipping occurs when the user (or the machine generating media) Media clipping occurs when the user (or the machine generating media)
believes that the media session is already established but the believes that the media session is already established, but the
establishment process has not finished yet. The user starts speaking establishment process has not finished yet. The user starts speaking
(i.e., generating media) and the first few syllables or even the (i.e., generating media) and the first few syllables or even the
first few words are lost. first few words are lost.
When the offer/answer exchange takes place in the 200 (OK) response When the offer/answer exchange takes place in the 200 (OK) response
and in the ACK, media clipping is unavoidable. The called user starts and in the ACK, media clipping is unavoidable. The called user
speaking at the same time as the 200 (OK) is sent, but the UAS cannot starts speaking at the same time the 200 (OK) is sent, but the UAS
send any media until the answer from the UAC arrives in the ACK. cannot send any media until the answer from the User Agent Client
(UAC) arrives in the ACK.
On the other hand, media clipping does not appear in the most common On the other hand, media clipping does not appear in the most common
offer/answer exchange (an INVITE with an offer and a 200 (OK) with an offer/answer exchange (an INVITE with an offer and a 200 (OK) with an
answer). UACs are ready to play incoming media packets as soon as answer). UACs are ready to play incoming media packets as soon as
they send an offer. They do this because they cannot count on the they send an offer, because they cannot count on the reception of the
reception of the 200 (OK) to start playing out media for the caller; 200 (OK) to start playing out media for the caller; SIP signalling
SIP signalling and media packets typically traverse different paths, and media packets typically traverse different paths, and so, media
and so, media packets may arrive before the 200 (OK) response. packets may arrive before the 200 (OK) response.
Another form of media clipping (not related to early media either) Another form of media clipping (not related to early media either)
occurs in the caller->callee direction. When the callee picks up and occurs in the caller-to-callee direction. When the callee picks up
starts speaking, the UAS sends a 200 (OK) response with an answer and and starts speaking, the UAS sends a 200 (OK) response with an
the first media packets in parallel. If the first media packets answer, in parallel with the first media packets. If the first media
arrive to the UAC before the answer, and the caller starts speaking packets arrive at the UAC before the answer and the caller starts
as well, the UAC cannot send media until the 2xx response from the speaking, the UAC cannot send media until the 200 (OK) response from
UAS arrives. the UAS arrives.
3. The Gateway Model
3 The Gateway Model
SIP uses the offer/answer model to negotiate session parameters (as SIP uses the offer/answer model to negotiate session parameters (as
described in Section 2). An offer/answer exchange that takes place described in Section 2). An offer/answer exchange that takes place
before a final response for the INVITE is sent establishes an "early" before a final response for the INVITE is sent establishes an "early"
media session. Early media sessions terminate when a final response media session. Early media sessions terminate when a final response
for the INVITE is sent. If the final response is a 2xx, the early for the INVITE is sent. If the final response is a 200 (OK), the
media session transitions to a regular media session. If the final early media session transitions to a regular media session. If the
response is a non-2xx final response, the early media session is final response is a non-200 class final response, the early media
simply terminated. session is simply terminated.
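The life cycle of an early media session described above can be sketched as a small state transition. This is an illustrative sketch only: the names SessionState and on_final_response are hypothetical and appear in neither document, and the sketch treats any 2xx-class final response as success, which is the wording of the draft (the RFC text says 200 (OK)).

```python
from enum import Enum


class SessionState(Enum):
    EARLY = "early"            # established before any final response
    REGULAR = "regular"        # after a 2xx final response
    TERMINATED = "terminated"  # after a non-2xx final response


def on_final_response(state: SessionState, status_code: int) -> SessionState:
    """Return the media session state after a final response to the INVITE.

    Hypothetical helper: final responses are assumed to be SIP status
    codes in the range 200-699.
    """
    if not 200 <= status_code <= 699:
        raise ValueError("not a final response: %d" % status_code)
    if state is not SessionState.EARLY:
        # Only early media sessions are affected by this transition.
        return state
    if 200 <= status_code < 300:
        return SessionState.REGULAR
    return SessionState.TERMINATED
```

For example, a 486 (Busy Here) received while the session is in the EARLY state would move it to TERMINATED, while a 200 (OK) would move it to REGULAR.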
Media exchanged within an early media session is, not surprisingly, Not surprisingly, media exchanged within an early media session is
referred to as early media. The gateway model consists of managing referred to as early media. The gateway model consists of managing
early media sessions using offer/answer exchanges in reliable early media sessions using offer/answer exchanges in reliable
provisional responses, PRACKs, and UPDATEs. provisional responses, PRACKs, and UPDATEs.
The gateway model presents serious limitations in presence of The gateway model is seriously limited in the presence of forking, as
forking, as described in Section 3.1. Therefore, its use in only described in Section 3.1. Therefore, its use is only acceptable when
acceptable when the UA cannot distinguish between early and regular the User Agent (UA) cannot distinguish between early and regular
media, as described in Section 3.4. In any other situation (the media, as described in Section 3.4. In any other situation (the
majority of UAs), it is strongly recommended that the application majority of UAs), use of the application server model described in
server model described in Section 4 is used instead. Section 4 is strongly recommended instead.
3.1 Forking 3.1. Forking
In the absence of forking, assuming that the initial INVITE contains In the absence of forking, assuming that the initial INVITE contains
an offer, the gateway model does not introduce media clipping. an offer, the gateway model does not introduce media clipping.
Following normal SIP procedures, the UAC is ready to play any Following normal SIP procedures, the UAC is ready to play any
incoming media as soon as it sends the initial offer in the INVITE. incoming media as soon as it sends the initial offer in the INVITE.
The UAS sends the answer in a reliable provisional response and can The UAS sends the answer in a reliable provisional response and can
send media as soon as there is media to send. Even if the first media send media as soon as there is media to send. Even if the first
packets arrive to the UAC before the 1xx response, the UAC will play media packets arrive at the UAC before the 1xx response, the UAC will
them. play them.
Note that, in some situations, the UAC does need to receive Note that, in some situations, the UAC needs to receive the answer
the answer before being able to play any media. UAs in such before being able to play any media. UAs in such a situation
a situation (e.g., QoS, media authorization or media (e.g., QoS, media authorization, or media encryption is required)
encryption is required) use preconditions to avoid media use preconditions to avoid media clipping.
clipping.
On the other hand, if the INVITE forks, the gateway model may On the other hand, if the INVITE forks, the gateway model may
introduce media clipping. This happens when the UAC receives introduce media clipping. This happens when the UAC receives
different answers to its offer in several provisional responses from different answers to its offer in several provisional responses from
different UASs. The UAC has to deal with bandwidth limitations and different UASs. The UAC has to deal with bandwidth limitations and
early media session selection. early media session selection.
If the UAC receives early media from different UASs, it needs to If the UAC receives early media from different UASs, it needs to
present it to the user. If the early media consists of audio, playing present it to the user. If the early media consists of audio,
several audio streams to the user at the same time may be confusing. playing several audio streams to the user at the same time may be
Other media types (e.g., video), on the other hand, can be presented confusing. On the other hand, other media types (e.g., video) can be
to the user at the same time. The UAC can, for example, build a presented to the user at the same time. For example, the UAC can
mosaic with the different inputs. build a mosaic with the different inputs.
However, even with media types that can be played at the same time to However, even with media types that can be played at the same time to
the user, if the UAC has limited bandwidth, it will not be able to the user, if the UAC has limited bandwidth, it will not be able to
receive early media from all the different UASs at the same time. receive early media from all the different UASs at the same time.
Therefore, many times, the UAC needs to choose a single early media Therefore, many times, the UAC needs to choose a single early media
session and "mute" the rest of them sending UPDATE requests. session and "mute" those sending UPDATE requests.
It is difficult to decide which early media session carry It is difficult to decide which early media sessions carry more
more important information from the caller's perspective. important information from the caller's perspective. In fact, in
In fact, in some scenarios, the UA cannot even correlate some scenarios, the UA cannot even correlate media packets with
media packets with their particular SIP early dialog. their particular SIP early dialog. Therefore, UACs typically pick
Therefore, UACs typically pick up one early dialog randomly one early dialog randomly and mute the rest.
and mute the rest.
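The selection behaviour described in the note above can be sketched as follows. This is a minimal sketch under stated assumptions: select_early_dialog and the dialog identifiers are hypothetical names, and the sketch only decides which early dialogs to mute; actually muting one would mean sending an UPDATE request on that dialog, which is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple


def select_early_dialog(
    dialog_ids: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], List[str]]:
    """Pick one early dialog at random and return (chosen, to_mute).

    Hypothetical helper: dialog_ids is assumed to hold one unique
    identifier per early dialog created by the forked INVITE.
    """
    if not dialog_ids:
        return None, []
    rng = rng or random.Random()
    chosen = rng.choice(dialog_ids)
    # Every other early dialog would be muted with an UPDATE request.
    to_mute = [d for d in dialog_ids if d != chosen]
    return chosen, to_mute
```

A random choice is used here only because the text says UACs typically pick one early dialog randomly; an implementation with better information about the streams could apply a different selection rule.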
If one of the early media sessions that was muted transitions to a If one of the early media sessions that was muted transitions to a
regular media session (i.e., the UAS sends a 2xx response), media regular media session (i.e., the UAS sends a 2xx response), media
clipping is likely to appear. The UAC typically sends an UPDATE with clipping is likely. The UAC typically sends an UPDATE with a new
a new offer (upon reception of the 200 OK for the INVITE) to unmute offer (upon reception of the 200 (OK) for the INVITE) to unmute the
the media session. The UAS cannot send any media until it receives media session. The UAS cannot send any media until it receives the
the offer from the UAC. Therefore, if the caller starts speaking offer from the UAC. Therefore, if the caller starts speaking before
before the offer from the UAC is received, his words will get lost. the offer from the UAC is received, his words will get lost.
Having the UAS send the UPDATE to unmute the media session Having the UAS send the UPDATE to unmute the media session
(instead of the UAC) does not avoid media clipping in the (instead of the UAC) does not avoid media clipping in the backward
backward direction and it causes possible race conditions. direction and it causes possible race conditions.
3.2 Ringing Tone Generation 3.2. Ringing Tone Generation
In the PSTN, telephone switches typically play ringing tones to the In the PSTN, telephone switches typically play ringing tones for the
caller to indicate that the callee is being alerted. When, where and caller, indicating that the callee is being alerted. When, where,
how these ringing tones are generated has been standardized (i.e., and how these ringing tones are generated has been standardized
the local exchange of the callee generates a standardized ringing (i.e., the local exchange of the callee generates a standardized
tone while the callee is being alterted). A standardized approach to ringing tone while the callee is being alerted). It makes sense for
provide this type of feedback for the user makes sense in a a standardized approach to provide this type of feedback for the user
homogeneous environment such as the PSTN, where all the terminals in a homogeneous environment such as the PSTN, where all the
have a similar user interface. terminals have a similar user interface.
This homogeneity is not found among SIP user agents. SIP user agents This homogeneity is not found among SIP user agents. SIP user agents
have different capabilities, different user interfaces and may be have different capabilities, different user interfaces, and may be
used to establish sessions that do not involve audio at all. Because used to establish sessions that do not involve audio at all. Because
of this, the way a SIP UA provides the user with information about of this, the way a SIP UA provides the user with information about
the progress of session establishment is a matter of local policy. the progress of session establishment is a matter of local policy.
For example, a UA with a GUI may choose to display a message on the For example, a UA with a Graphical User Interface (GUI) may choose to
screen when the callee is being alerted while another UA may choose display a message on the screen when the callee is being alerted,
to show a picture of a phone ringing instead. Many SIP UAs choose to while another UA may choose to show a picture of a phone ringing
imitate the user interface of the PSTN phones. They provide a ringing instead. Many SIP UAs choose to imitate the user interface of the
tone to the caller when the callee is being alerted. Such a UAC is PSTN phones. They provide a ringing tone to the caller when the
supposed to generate ringing tones locally for its user as long as no callee is being alerted. Such a UAC is supposed to generate ringing
early media is received from the UAS. If the UAS generates early tones locally for its user as long as no early media is received from
media (e.g., an announcement or a special ringing tone), the UAC is the UAS. If the UAS generates early media (e.g., an announcement or
supposed to play it rather than generating the ringing tone locally. a special ringing tone), the UAC is supposed to play it rather than
generate the ringing tone locally.
The problem is that, sometimes, it is not an easy task for a UAC to The problem is that, sometimes, it is not an easy task for a UAC to
know whether it should generate local ringing or it will be receiving know whether it will be receiving early media or it should generate
early media. A UAS can send early media without using reliable local ringing. A UAS can send early media without using reliable
provisional responses (very simple UASs do that) or it can send an provisional responses (very simple UASs do that) or it can send an
answer in a reliable provisional response without any intention of answer in a reliable provisional response without any intention of
sending early media (this is the case when preconditions are used). sending early media (this is the case when preconditions are used).
Therefore, by only looking at the SIP signalling, a UAC cannot be Therefore, by only looking at the SIP signalling, a UAC cannot be
sure whether or not there will be early media for a particular sure whether or not there will be early media for a particular
session. The UAC needs to check if media packets are arriving at a session. The UAC needs to check if media packets are arriving at a
given moment. given moment.
An implementation could even choose to look at the contents An implementation could even choose to look at the contents of the
of the media packets, since they could carry only silence media packets, since they could carry only silence or comfort
or comfort noise. noise.
With this in mind, a UAC should develop its local policy regarding With this in mind, a UAC should develop its local policy regarding
local ringing generation. For example, a POTS-like SIP UA could local ringing generation. For example, a POTS ("Plain Old Telephone
implement the following local policy: Service")-like SIP User Agent (UA) could implement the following
local policy:
1. Unless a 180 (Ringing) response is received, never generate 1. Unless a 180 (Ringing) response is received, never generate
local ringing. local ringing.
2. If a 180 (Ringing) has been received but there are no 2. If a 180 (Ringing) has been received but there are no incoming
incoming media packets, generate local ringing. media packets, generate local ringing.
3. If a 180 (Ringing) has been received and there are incoming 3. If a 180 (Ringing) has been received and there are incoming
media packets, play them and do not generate local ringing. media packets, play them and do not generate local ringing.
Note that a 180 (Ringing) response means that the callee is Note that a 180 (Ringing) response means that the callee is
being alerted, and a UAS should send such a response if the being alerted, and a UAS should send such a response if the
callee is being alerted, regardless of the status of the callee is being alerted, regardless of the status of the early
early media session. media session.
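The three-rule policy above can be sketched as a single decision function. This is one possible reading, not a normative implementation: the names CallerFeedback and caller_feedback are hypothetical, and the case where media packets arrive without a 180 (Ringing) is not covered by the three rules, so the sketch assumes the general behaviour from Section 2, where a UA plays incoming media packets.

```python
from enum import Enum


class CallerFeedback(Enum):
    SILENCE = "silence"                 # no local ringing, nothing to play
    LOCAL_RINGING = "local_ringing"     # UAC generates the ringing tone
    PLAY_EARLY_MEDIA = "play_media"     # UAC plays incoming media packets


def caller_feedback(received_180: bool, media_arriving: bool) -> CallerFeedback:
    """Decide what a POTS-like UAC presents to the caller."""
    if media_arriving:
        # Rule 3, plus the assumption that media is played even
        # without a 180 (Ringing), as described in Section 2.
        return CallerFeedback.PLAY_EARLY_MEDIA
    if received_180:
        # Rule 2: 180 (Ringing) received, no incoming media packets.
        return CallerFeedback.LOCAL_RINGING
    # Rule 1: without a 180 (Ringing), never generate local ringing.
    return CallerFeedback.SILENCE
```

The function takes only the two inputs the policy names, so it can be re-evaluated whenever a 180 (Ringing) arrives or the media packet flow starts or stops.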
At first sight, such a policy may look difficult to implement in At first sight, such a policy may look difficult to implement in
decomposed UAs (i.e., media gateway controller and media gateway), decomposed UAs (i.e., media gateway controller and media gateway),
but this policy is the same as the one described in Section 2, which but this policy is the same as the one described in Section 2, which
must be implemented by any UA. That is, any UA should play incoming must be implemented by any UA. That is, any UA should play incoming
media packets (and stop local ringing tone generation if it was being media packets (and stop local ringing tone generation if it was being
performed) in order to avoid media clipping, even if the 200 (OK) performed) in order to avoid media clipping, even if the 200 (OK)
response has not arrived. So, the tools to implement this early media response has not arrived. So, the tools to implement this early
policy are available already to any UA that uses SIP. media policy are already available to any UA that uses SIP.
Note that, while it is not desirable to standardize a common local Note that, while it is not desirable to standardize a common local
policy to be followed by every SIP UA, a particular subset of more or policy to be followed by every SIP UA, a particular subset of more or
less homogeneous SIP UAs could use the same local policy by less homogeneous SIP UAs could use the same local policy by
convention. Examples of such subsets of SIP UAs may be "all the convention. Examples of such subsets of SIP UAs may be "all the
PSTN/SIP gateways" or "every 3G IMS terminal". However, defining the PSTN/SIP gateways" or "every 3GPP IMS (Third Generation Partnership
Project Internet Multimedia System) terminal". However, defining the
particular common policy that such groups of SIP devices may use is particular common policy that such groups of SIP devices may use is
outside the scope of this document. outside the scope of this document.
3.3 Absence of an Early Media Indicator 3.3. Absence of an Early Media Indicator
SIP, as opposed to other signalling protocols, does not provide an SIP, as opposed to other signalling protocols, does not provide an
early media indicator. That is, there is no information about the early media indicator. That is, there is no information about the
presence or absence of early media in SIP. Such an indicator could be presence or absence of early media in SIP. Such an indicator could
potentially used to avoid generation of local ringing tone by the UAC be potentially used to avoid the generation of local ringing tone by
when UAS intends to provide in-band ringing tone or some type of the UAC when UAS intends to provide an in-band ringing tone or some
announcement. However, due to the way SIP works, such an indicator type of announcement. However, in the majority of the cases, such an
would, in the majority of the cases, be of little use. indicator would be of little use due to the way SIP works.
One important reason that would limit the benefit of a potential One important reason limiting the benefit of a potential early media
early media indicator is the loose coupling between SIP signalling indicator is the loose coupling between SIP signalling and the media
and the media path. SIP signalling traverses a different path than path. SIP signalling traverses a different path than the media. The
the media. The media path is typically optimized to reduce the end- media path is typically optimized to reduce the end-to-end delay
to-end delay (e.g., minimum number of intermediaries) while the SIP (e.g., minimum number of intermediaries), while the SIP signalling
signalling path typically traverses a number of proxies providing path typically traverses a number of proxies providing different
different services for the session. Due to that reason, it is very services for the session. Hence, it is very likely that the media
likely that the media packets with early media reach the UAC before packets with early media reach the UAC before any SIP message that
any SIP message which could contain an early media indicator. could contain an early media indicator.
Nevertheless, sometimes, SIP responses arrive at the UAC before any Nevertheless, sometimes SIP responses arrive at the UAC before any
media packet. There are situations when the UAS intends to send early media packet. There are situations in which the UAS intends to send
media but cannot do it straight away. For example, UAs using ICE [6] early media but cannot do it straight away. For example, UAs using
may need to exchange several STUN messages before being able to Interactive Connectivity Establishment (ICE) [6] may need to exchange
exchange media. In this situations, an early media indicator would several Simple Traversals of the UDP Protocol through NAT (STUN)
keep the UAC from generating local ringing tone during this time. messages before being able to exchange media. In this situation, an
However, while the early media is not arriving to the UAC, the user early media indicator would keep the UAC from generating a local
would not be aware of the fact that the remote user is being alerted, ringing tone during this time. However, while the early media is not
even though a 180 (Ringing) had been received. Therefore, a better arriving at the UAC, the user would not be aware that the remote user
solution would be to apply local ringing tone until the early media is being alerted, even though a 180 (Ringing) had been received.
packets could be sent from the UAS to the UAC. This solution does not Therefore, a better solution would be to apply a local ringing tone
require any early media indicator. until the early media packets could be sent from the UAS to the UAC.
This solution does not require any early media indicator.
Note that migrations from local ringing tone to early media Note that migrations from local ringing tone to early media at the
at the UAC happen in the presence of forking as well; one UAC happen in the presence of forking as well; one UAS sends a 180
UAS sends a 180 (Ringing) response, and later, another UAS (Ringing) response, and later, another UAS starts sending early
starts sending early media. media.
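
The local ringing behaviour discussed above can be sketched as a small piece of UAC state. This is only an illustration under stated assumptions: the class name, the method names, and the media-timeout hook are hypothetical and do not come from the document. The sketch plays local ringing once a 180 (Ringing) has been received, for as long as no early media is arriving and the call has not been answered.

```python
from dataclasses import dataclass


@dataclass
class RingbackPolicy:
    """Hypothetical UAC-side state for deciding on local ringing."""

    ringing_received: bool = False  # a 180 (Ringing) has arrived
    media_arriving: bool = False    # early media packets are being received
    answered: bool = False          # a 2xx final response has arrived

    def on_180_ringing(self) -> None:
        self.ringing_received = True

    def on_media_packet(self) -> None:
        # Early media has started; local ringing should stop.
        self.media_arriving = True

    def on_media_timeout(self) -> None:
        # Assumed hook: a media detector reports that packets stopped.
        self.media_arriving = False

    def on_2xx(self) -> None:
        self.answered = True

    def play_local_ringing(self) -> bool:
        # Ring locally only after a 180, and only while no media arrives.
        return (
            self.ringing_received
            and not self.media_arriving
            and not self.answered
        )
```

Because the decision is recomputed from state, the same object also covers the forking case in the note above: a 180 from one UAS turns ringing on, and media from another UAS turns it off.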
3.4. Applicability of the Gateway Model
3.4 Applicability of the Gateway Model
Section 3 described some of the limitations of the gateway model. It Section 3 described some of the limitations of the gateway model. It
produces media clipping in forking scenarios and requires media produces media clipping in forking scenarios and requires media
detection to generate local ringing properly. These issues are detection to generate local ringing properly. These issues are
addressed by the application server model, described in Section 4, addressed by the application server model, described in Section 4,
which is the recommended way of generating early media that is not which is the recommended way of generating early media that is not
continuous with the regular media generated during the session. continuous with the regular media generated during the session.
The gateway model is, therefore, acceptable in situations where the The gateway model is, therefore, acceptable in situations where the
UA cannot distinguish between early media and regular media. A PSTN UA cannot distinguish between early media and regular media. A PSTN
gateway is an example of this type of situation. The PSTN gateway gateway is an example of this type of situation. The PSTN gateway
receives media from the PSTN over a circuit, and sends it to the IP receives media from the PSTN over a circuit, and sends it to the IP
network. The gateway is not aware of the contents of the media, and network. The gateway is not aware of the contents of the media, and
it does not exactly know when the transition from early to regular it does not exactly know when the transition from early to regular
media takes place. From the PSTN perspective, the circuit is a media takes place. From the PSTN perspective, the circuit is a
continuous source of media. continuous source of media.
4 The Application Server Model 4. The Application Server Model
The application server model consists of having UAS behave as an The application server model consists of having the UAS behave as an
application server to establish early media sessions with the UAC. application server to establish early media sessions with the UAC.
The UAC indicates support for the early-session disposition type The UAC indicates support for the early-session disposition type
(defined in [2]) using the early-session option tag. This way, UASs (defined in [2]) using the early-session option tag. This way, UASs
know that they can keep offer/answer exchanges for early media know that they can keep offer/answer exchanges for early media
(early-session disposition type) and for regular media (session (early-session disposition type) separate from regular media (session
disposition type) separate. disposition type).
Sending early media using a different offer/answer exchange than the Sending early media using a different offer/answer exchange than the
one used for sending regular media helps avoid media clipping in case one used for sending regular media helps avoid media clipping in
of forking. The UAC can reject or mute new offers for early media cases of forking. The UAC can reject or mute new offers for early
without muting the sessions that will carry media when the original media without muting the sessions that will carry media when the
INVITE is accepted. The UAC can give priority to media received over original INVITE is accepted. The UAC can give priority to media
the latter sessions. This way, the application server model received over the latter sessions. This way, the application server
transitions from early to regular media at the right moment. model transitions from early to regular media at the right moment.
Having a separate offer/answer exchange for early media also helps Having a separate offer/answer exchange for early media also helps
UACs decide whether or not local ringing should be generated. If a UACs decide whether or not local ringing should be generated. If a
new early session is established and that early session contains at new early session is established and that early session contains at
least an audio stream, the UAC can assume that there will be incoming least an audio stream, the UAC can assume that there will be incoming
early media and it can then avoid generating local ringing. early media and it can then avoid generating local ringing.
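
As a rough sketch of the decision just described, a UAC could inspect the session description of an early session and suppress local ringing when it carries an audio stream. The function names are hypothetical. Two points are assumptions that go beyond the text above: the requirement that a 180 (Ringing) has been received, and the treatment of a zero port as a stream that is not in use.

```python
from typing import Optional


def has_audio_stream(sdp: str) -> bool:
    """Return True if the SDP contains an audio m= line with a non-zero port."""
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line.startswith("m="):
            continue
        # m=<media> <port>[/<number of ports>] <proto> <fmt> ...
        fields = line[2:].split()
        if len(fields) >= 2 and fields[0] == "audio":
            port = fields[1].split("/")[0]
            # Assumption: a zero port marks a stream that is not in use.
            if port.isdigit() and int(port) != 0:
                return True
    return False


def should_generate_local_ringing(
    ringing_received: bool, early_session_sdp: Optional[str]
) -> bool:
    """Decide on local ringing given an optional early-session description."""
    if not ringing_received:
        return False  # assumption: no 180 (Ringing), no local ringing
    if early_session_sdp is not None and has_audio_stream(early_session_sdp):
        return False  # incoming early media is expected instead
    return True
```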
An alternative model would consist of adding a new stream An alternative model would include the addition of a new stream,
labeled as "early media" to the original session between with an "early media" label, to the original session between the
the UAC and the UAS using an UPDATE, instead of UAC and the UAS using an UPDATE instead of establishing a new
establishing a new early session. We have chosen to early session. We have chosen to establish a new early session to
establish a new early session to be coherent with the be coherent with the mechanism used by application servers that
mechanism used by application servers that are NOT co- are NOT
located with the UAS. This way, the UAS uses the same co-located with the UAS. This way, the UAS uses the same
mechanism as any application server in the network to mechanism as any application server in the network to interact
interact with the UAC. with the UAC.
4.1 In-Band Versus Out-of-Band Session Progress Information 4.1. In-Band Versus Out-of-Band Session Progress Information
Note that, even when the application server model is used, a UA will Note that, even when the application server model is used, a UA will
have to choose which early media sessions are muted and which ones have to choose which early media sessions are muted and which ones
are rendered to the user. In order to make this choice easier to UAs, are rendered to the user. In order to make this choice easier for
it is strongly recommended that information that is not essential for UAs, it is strongly recommended that information that is not
the session is not transmitted using early media. For instance, UAs essential for the session not be transmitted using early media. For
should not use early media to send special ringing tones. SIP already instance, UAs should not use early media to send special ringing
provides a means to inform the remote user about session tones. The status code and the reason phrase in SIP can already
establishment progress which does not cause any of the problems inform the remote user about the progress of session establishment,
associated with early media; the status code and the reason phrase in without incurring the problems associated with early media.
provisional responses.
5 Alert-Info Header Field 5. Alert-Info Header Field
The Alert-Info header field allows specifying an alternative ringing The Alert-Info header field allows specifying an alternative ringing
content, such as ringing tone, to the UAC. This header field tells content, such as ringing tone, to the UAC. This header field tells
the UAC which tone should be played in case local ringing is the UAC which tone should be played in case local ringing is
generated, but it does not tell the UAC when to generate local generated, but it does not tell the UAC when to generate local
ringing. A UAC should follow the rules described above for ringing ringing. A UAC should follow the rules described above for ringing
tone generation in both models. If, after following those rules, the tone generation in both models. If, after following those rules, the
UAC decides to play local ringing, it can then use the Alert-Info UAC decides to play local ringing, it can then use the Alert-Info
header field to generate it. header field to generate it.
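
The split of responsibilities described above (the ringing rules decide when, Alert-Info decides what) might be sketched as follows. The function names and the default tone identifier are hypothetical, and the parser handles only the simple case of a single URI in angle brackets; it is not a full header parser.

```python
import re
from typing import Optional

# Hypothetical identifier for the tone the UA plays by default.
DEFAULT_TONE = "builtin:ringback"


def alert_info_uri(header_value: str) -> Optional[str]:
    """Return the first URI enclosed in angle brackets, if there is one."""
    match = re.match(r"\s*<([^>]+)>", header_value)
    return match.group(1) if match else None


def ringing_source(
    play_local_ringing: bool, alert_info: Optional[str]
) -> Optional[str]:
    """Pick the tone to render, or None when no local ringing is played."""
    if not play_local_ringing:
        # Alert-Info never triggers ringing by itself.
        return None
    if alert_info:
        uri = alert_info_uri(alert_info)
        if uri:
            return uri
    return DEFAULT_TONE
```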
6 Security Considerations 6. Security Considerations
SIP uses the offer/answer model [3] to establish early sessions in SIP uses the offer/answer model [3] to establish early sessions in
both the gateway and the application server models. User Agents (UAs) both the gateway and the application server models. User Agents
generate a session description, which contains the transport address (UAs) generate a session description, which contains the transport
(i.e., IP address plus port) where they want to receive media, and address (i.e., IP address plus port) where they want to receive
send it to their peer in a SIP message. When media packets arrive at media, and send it to their peer in a SIP message. When media
this transport address, the UA assumes that they come from the packets arrive at this transport address, the UA assumes that they
receiver of the SIP message carrying the session description. come from the receiver of the SIP message carrying the session
Nevertheless, attackers may attempt to gain access to the contents of description. Nevertheless, attackers may attempt to gain access to
the SIP message and send packets to the transport address contained the contents of the SIP message and send packets to the transport
in the session description. To prevent this situation, UAs SHOULD address contained in the session description. To prevent this
encrypt their session descriptions (e.g., using S/MIME). situation, UAs SHOULD encrypt their session descriptions (e.g., using
S/MIME).
Still, even if a UA encrypts its session descriptions, an attacker Still, even if a UA encrypts its session descriptions, an attacker
may try to guess the transport address used by the UA and send media may try to guess the transport address used by the UA and send media
packets to that address. Guessing such a transport address is packets to that address. Guessing such a transport address is
sometimes easier than it may seem because many UAs always pick up the sometimes easier than it may seem because many UAs always pick up the
same initial media port. To prevent this situation, UAs SHOULD use same initial media port. To prevent this situation, UAs SHOULD use
media-level authentication mechanisms (e.g., SRTP [7]). In addition, media-level authentication mechanisms such as the Secure Realtime
UAs that wish to keep their communications confidential SHOULD use Transport Protocol (SRTP)[7]. In addition, UAs that wish to keep
media-level encryption mechanisms (e.g, SRTP [7]). their communications confidential SHOULD use media-level encryption
mechanisms (e.g, SRTP [7]).
Attackers may attempt to make a UA send media to a victim as part of Attackers may attempt to make a UA send media to a victim as part of
a DoS attack. This can be done by sending a session description with a DoS attack. This can be done by sending a session description with
the victim's transport address to the UA. To prevent this attack, the the victim's transport address to the UA. To prevent this attack,
UA SHOULD engage in a handshake with the owner of the transport the UA SHOULD engage in a handshake with the owner of the transport
address received in a session descriptions (just verifying address received in a session description (just verifying willingness
willingness to receive media) before sending a large amount of data to receive media) before sending a large amount of data to the
to the transport address. This check can be performed by using a transport address. This check can be performed by using a connection
connection oriented transport protocol, by using STUN [8] in an end- oriented transport protocol, by using STUN [8] in an end-to-end
to-end fashion, or by the key exchange in SRTP [7]. fashion, or by the key exchange in SRTP [7].
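
One way to picture the check described above is a gate in front of the media sender that allows only a small amount of data until the remote transport address has been verified. This is a minimal sketch: the class name and the byte budget are arbitrary assumptions, and the handshake itself (connection-oriented transport, end-to-end STUN, or the SRTP key exchange) is not implemented here, only its outcome.

```python
class MediaGate:
    """Hypothetical limiter for media sent to an unverified address."""

    def __init__(self, unverified_budget: int = 2000) -> None:
        # Arbitrary illustrative cap, in bytes, before verification.
        self.unverified_budget = unverified_budget
        self.verified = False
        self.sent_unverified = 0

    def mark_verified(self) -> None:
        # Call once the owner of the address has shown willingness
        # to receive media (the handshake is out of scope here).
        self.verified = True

    def may_send(self, size: int) -> bool:
        """Return True if a packet of `size` bytes may be sent now."""
        if self.verified:
            return True
        if self.sent_unverified + size > self.unverified_budget:
            return False
        self.sent_unverified += size
        return True
```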
In any event, note that the previous security considerations are not In any event, note that the previous security considerations are not
early media specific, but apply to the usage of the offer/answer early media specific, but apply to the usage of the offer/answer
model in SIP to establish sessions in general. model in SIP to establish sessions in general.
Additionally, an early media-specific risk (roughly speaking, an Additionally, an early media-specific risk (roughly speaking,
equivalent to forms of "toll fraud" in the PSTN) attempts to exploit equivalent to forms of "toll fraud" in the PSTN) attempts to exploit
the different charging policies some operators apply to early and to the different charging policies some operators apply to early and
regular media. When UAs are allowed to exchange early media for free, regular media. When UAs are allowed to exchange early media for
but are required to pay for regular media sessions, rogue UAs may try free, but are required to pay for regular media sessions, rogue UAs
to establish a bidirectional early media session and never send a 2xx may try to establish a bidirectional early media session and never
response for the INVITE. send a 200 (OK) response for the INVITE.
On the other hand, some application servers (e.g., Interactive Voice On the other hand, some application servers (e.g., Interactive Voice
Response systems) use bidirectional early media to obtain information Response systems) use bidirectional early media to obtain information
from the callers (e.g., the PIN code of a calling card). So, we do from the callers (e.g., the PIN code of a calling card). So, we do
not recommend that operators disallow bidirectional early media. not recommend that operators disallow bidirectional early media.
Instead, operators should consider a remedy of charging early media Instead, operators should consider a remedy of charging early media
exchanges that last too long, or stopping them at the media level exchanges that last too long, or stopping them at the media level
(according to the operator's policy). (according to the operator's policy).
7 Acknowledgments 7. Acknowledgments
Jon Peterson provided useful ideas on the separation between the Jon Peterson provided useful ideas on the separation between the
gateway model and the application server model. gateway model and the application server model.
Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John
Hearty, Adam Roach, Eric Burger, Rohan Mahy, and Allison Mankin Hearty, Adam Roach, Eric Burger, Rohan Mahy, and Allison Mankin
provided useful comments and suggestions. provided useful comments and suggestions.
8 Authors' Addresses 8. References
8.1. Normative References
[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002.
[2] Camarillo, G., "The Early Session Disposition Type for the
Session Initiation Protocol (SIP)", RFC 3959, December 2004.
[3] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
Session Description Protocol (SDP)", RFC 3264, June 2002.
8.2. Informative References
[4] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional
Responses in Session Initiation Protocol (SIP)", RFC 3262, June
2002.
[5] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE
Method", RFC 3311, October 2002.
[6] Rosenberg, J., "Interactive connectivity establishment (ICE): a
methodology for network address translator (NAT) traversal for
the session initiation protocol (SIP)", Work in progress, July
2003.
[7] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
3711, March 2004.
[8] Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy,
"STUN - Simple Traversal of User Datagram Protocol (UDP) Through
Network Address Translators (NATs)", RFC 3489, March 2003.
[9] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
Authors' Addresses
Gonzalo Camarillo Gonzalo Camarillo
Ericsson Ericsson
Advanced Signalling Research Lab. Advanced Signalling Research Lab.
FIN-02420 Jorvas FIN-02420 Jorvas
Finland Finland
electronic mail: Gonzalo.Camarillo@ericsson.com
EMail: Gonzalo.Camarillo@ericsson.com
Henning Schulzrinne Henning Schulzrinne
Dept. of Computer Science Dept. of Computer Science
Columbia University 1214 Amsterdam Avenue, MC 0401 Columbia University 1214 Amsterdam Avenue, MC 0401
New York, NY 10027 New York, NY 10027
USA USA
electronic mail: schulzrinne@cs.columbia.edu
9 Normative References
[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J.
Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
initiation protocol," RFC 3261, Internet Engineering Task Force, June
2002.
[2] G. Camarillo, "The early session disposition type for the session
initiation protocol (SIP)," Internet Draft draft-ietf-sipping-early-
disposition-01, Internet Engineering Task Force, Jan. 2004. Work in
progress.
[3] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
session description protocol (SDP)," RFC 3264, Internet Engineering
Task Force, June 2002.
10 Informative References
[4] J. Rosenberg and H. Schulzrinne, "Reliability of provisional EMail: schulzrinne@cs.columbia.edu
responses in session initiation protocol (SIP)," RFC 3262, Internet
Engineering Task Force, June 2002.
[5] J. Rosenberg, "The session initiation protocol (SIP) UPDATE Full Copyright Statement
method," RFC 3311, Internet Engineering Task Force, Oct. 2002.
[6] J. Rosenberg, "Interactive connectivity establishment (ICE): a Copyright (C) The Internet Society (2004).
methodology for nettwork address translator (NAT) traversal for the
session initiation protocol (SIP)," internet draft, Internet
Engineering Task Force, July 2003. Work in progress.
[7] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman, This document is subject to the rights, licenses and restrictions
"The secure real-time transport protocol (SRTP)," RFC 3711, Internet contained in BCP 78, and except as set forth therein, the authors
Engineering Task Force, Mar 2004. retain all their rights.
[8] J. Rosenberg, J. Weinberger, C. Huitema, and R. Mahy, "STUN - This document and the information contained herein are provided on an
simple traversal of user datagram protocol (UDP) through network "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
address translators (nats)," RFC 3489, Internet Engineering Task OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
Force, Mar. 2003. ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property Statement Intellectual Property
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the IETF's procedures with respect to rights in IETF Documents can on the IETF's procedures with respect to rights in IETF Documents can
be found in BCP 78 and BCP 79. be found in BCP 78 and BCP 79.
skipping to change at page 13, line 29 skipping to change at page 13, line 45
such proprietary rights by implementers or users of this such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr. http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at ietf- this standard. Please address the information to the IETF at ietf-
ipr@ietf.org. ipr@ietf.org.
Disclaimer of Validity Acknowledgement
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is currently provided by the
Internet Society. Internet Society.