Early Media and Ringing Tone Generation in SIP
If the UAC receives early media from different UASs, it needs to
present it to the user. If the early media consists of audio,
playing several audio streams to the user at the same time may be
confusing. On the other hand, other media types (e.g., video) can be
presented to the user at the same time. For example, the UAC can
build a mosaic with the different inputs.
However, even with media types that can be played at the same time to
the user, if the UAC has limited bandwidth, it will not be able to
receive early media from all the different UASs at the same time.
Therefore, many times, the UAC needs to choose a single early media
session and "mute" the rest by sending UPDATE requests.
It is difficult to decide which early media sessions carry more
important information from the caller's perspective. In fact, in
some scenarios, the UA cannot even correlate media packets with
their particular SIP early dialog. Therefore, UACs typically pick
one early dialog randomly and mute the rest.
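As a rough illustration of the selection just described, the following Python sketch picks one early dialog at random and lists the others for muting. The function name choose_early_dialog and the string dialog identifiers are hypothetical. Actually muting a dialog would require sending an UPDATE on it, which is not shown here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def choose_early_dialog(
    dialog_ids: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], List[str]]:
    """Pick one early dialog to render; return (kept, muted).

    Assumes dialog identifiers are unique. The choice is random because
    the UAC usually cannot tell which early media matters most.
    """
    if not dialog_ids:
        return None, []
    rng = rng or random.Random()
    kept = rng.choice(list(dialog_ids))
    # Every other early dialog is a candidate for muting via UPDATE.
    muted = [d for d in dialog_ids if d != kept]
    return kept, muted
```

A caller would render media only for the kept dialog and issue the muting requests for each identifier in the second list.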
If one of the early media sessions that was muted transitions to a
regular media session (i.e., the UAS sends a 2xx response), media
clipping is likely. The UAC typically sends an UPDATE with a new
offer (upon reception of the 200 (OK) for the INVITE) to unmute the
media session. The UAS cannot send any media until it receives the
offer from the UAC. Therefore, if the callee starts speaking before
the offer from the UAC is received, the callee's words will be lost.
Having the UAS send the UPDATE to unmute the media session
(instead of the UAC) does not avoid media clipping in the backward
direction and it causes possible race conditions.
3.2. Ringing Tone Generation
In the PSTN, telephone switches typically play ringing tones for the
caller, indicating that the callee is being alerted. When, where,
and how these ringing tones are generated has been standardized
(i.e., the local exchange of the callee generates a standardized
ringing tone while the callee is being alerted). It makes sense for
a standardized approach to provide this type of feedback for the user
in a homogeneous environment such as the PSTN, where all the
terminals have a similar user interface.
This homogeneity is not found among SIP user agents. SIP user agents
have different capabilities, different user interfaces, and may be
used to establish sessions that do not involve audio at all. Because
of this, the way a SIP UA provides the user with information about
the progress of session establishment is a matter of local policy.
For example, a UA with a Graphical User Interface (GUI) may choose to
Camarillo & Schulzrinne Informational [Page 5]
RFC 3960 Early Media and Ringing Tone Generation December 2004
display a message on the screen when the callee is being alerted,
while another UA may choose to show a picture of a phone ringing
instead. Many SIP UAs choose to imitate the user interface of the
PSTN phones. They provide a ringing tone to the caller when the
callee is being alerted. Such a UAC is supposed to generate ringing
tones locally for its user as long as no early media is received from
the UAS. If the UAS generates early media (e.g., an announcement or
a special ringing tone), the UAC is supposed to play it rather than
generate the ringing tone locally.
The problem is that, sometimes, it is not an easy task for a UAC to
know whether it will be receiving early media or it should generate
local ringing. A UAS can send early media without using reliable
provisional responses (very simple UASs do that) or it can send an
answer in a reliable provisional response without any intention of
sending early media (this is the case when preconditions are used).
Therefore, by only looking at the SIP signalling, a UAC cannot be
sure whether or not there will be early media for a particular
session. The UAC needs to check if media packets are arriving at a
given moment.
An implementation could even choose to look at the contents of the
media packets, since they could carry only silence or comfort
noise.
With this in mind, a UAC should develop its local policy regarding
local ringing generation. For example, a POTS ("Plain Old Telephone
Service")-like SIP User Agent (UA) could implement the following
local policy:
1. Unless a 180 (Ringing) response is received, never generate
local ringing.
2. If a 180 (Ringing) has been received but there are no incoming
media packets, generate local ringing.
3. If a 180 (Ringing) has been received and there are incoming
media packets, play them and do not generate local ringing.
Note that a 180 (Ringing) response means that the callee is
being alerted, and a UAS should send such a response if the
callee is being alerted, regardless of the status of the early
media session.
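The three rules above can be sketched as a small decision function. This is only an illustration in Python; the names Action and ringing_action are hypothetical. The rules do not say what to do when media arrives before any 180 (Ringing); the sketch assumes the media is played, in line with the general guidance that a UA should play incoming media to avoid clipping.

```python
from enum import Enum


class Action(Enum):
    SILENCE = "no local ringing, nothing to play"
    LOCAL_RINGING = "generate local ringing"
    PLAY_MEDIA = "play incoming media, no local ringing"


def ringing_action(got_180: bool, media_arriving: bool) -> Action:
    """Apply the POTS-like local policy to the current call state."""
    if media_arriving:
        # Rule 3, plus the assumption that media without a 180 is
        # still played rather than dropped.
        return Action.PLAY_MEDIA
    if got_180:
        # Rule 2: callee is being alerted and no media is arriving.
        return Action.LOCAL_RINGING
    # Rule 1: without a 180 (Ringing), never generate local ringing.
    return Action.SILENCE
```

The function is stateless on purpose: the UAC would re-evaluate it whenever a response arrives or the media detection result changes.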
At first sight, such a policy may look difficult to implement in
decomposed UAs (i.e., media gateway controller and media gateway),
but this policy is the same as the one described in Section 2, which
must be implemented by any UA. That is, any UA should play incoming
media packets (and stop local ringing tone generation if it was being
performed) in order to avoid media clipping, even if the 200 (OK)
response has not arrived. So, the tools to implement this early
media policy are already available to any UA that uses SIP.
Note that, while it is not desirable to standardize a common local
policy to be followed by every SIP UA, a particular subset of more or
less homogeneous SIP UAs could use the same local policy by
convention. Examples of such subsets of SIP UAs may be "all the
PSTN/SIP gateways" or "every 3GPP IMS (Third Generation Partnership
Project IP Multimedia Subsystem) terminal". However, defining the
particular common policy that such groups of SIP devices may use is
outside the scope of this document.
3.3. Absence of an Early Media Indicator
SIP, as opposed to other signalling protocols, does not provide an
early media indicator. That is, there is no information about the
presence or absence of early media in SIP. Such an indicator could
be potentially used to avoid the generation of local ringing tone by
the UAC when the UAS intends to provide an in-band ringing tone or some
type of announcement. However, in the majority of the cases, such an
indicator would be of little use due to the way SIP works.
One important reason limiting the benefit of a potential early media
indicator is the loose coupling between SIP signalling and the media
path. SIP signalling traverses a different path than the media. The
media path is typically optimized to reduce the end-to-end delay
(e.g., minimum number of intermediaries), while the SIP signalling
path typically traverses a number of proxies providing different
services for the session. Hence, it is very likely that the media
packets with early media reach the UAC before any SIP message that
could contain an early media indicator.
Nevertheless, sometimes SIP responses arrive at the UAC before any
media packet. There are situations in which the UAS intends to send
early media but cannot do it straight away. For example, UAs using
Interactive Connectivity Establishment (ICE) [6] may need to exchange
several Simple Traversal of UDP Through NATs (STUN)
messages before being able to exchange media. In this situation, an
early media indicator would keep the UAC from generating a local
ringing tone during this time. However, while the early media is not
arriving at the UAC, the user would not be aware that the remote user
is being alerted, even though a 180 (Ringing) had been received.
Therefore, a better solution would be to apply a local ringing tone
until the early media packets could be sent from the UAS to the UAC.
This solution does not require any early media indicator.
Note that migrations from local ringing tone to early media at the
UAC happen in the presence of forking as well; one UAS sends a 180
(Ringing) response, and later, another UAS starts sending early
media.
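The migration from local ringing to early media described in this section can be sketched as a small state tracker. This is a hedged Python illustration, not a prescribed design: the class name RingbackState, its methods, and the media_timeout threshold are all assumptions. In particular, treating media as "arriving" only if a packet was seen within the last media_timeout seconds is one possible way to check for media at a given moment.

```python
from typing import Optional


class RingbackState:
    """Tracks whether the UAC should ring locally at a given time."""

    def __init__(self, media_timeout: float = 0.5) -> None:
        # Hypothetical threshold (seconds) after which media is
        # considered to have stopped arriving.
        self.media_timeout = media_timeout
        self.got_180 = False
        self.last_media_at: Optional[float] = None

    def on_180(self) -> None:
        """Record a 180 (Ringing) from any UAS, forked or not."""
        self.got_180 = True

    def on_media_packet(self, now: float) -> None:
        """Record the arrival time of an early media packet."""
        self.last_media_at = now

    def should_ring_locally(self, now: float) -> bool:
        media_flowing = (
            self.last_media_at is not None
            and now - self.last_media_at <= self.media_timeout
        )
        # Ring locally only while alerted and no media is flowing.
        return self.got_180 and not media_flowing
```

With this tracker, a 180 (Ringing) from one UAS starts local ringing, and the first media packet from any UAS stops it, without any early media indicator in the signalling.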
3.4. Applicability of the Gateway Model
Section 3 described some of the limitations of the gateway model. It
produces media clipping in forking scenarios and requires media
detection to generate local ringing properly. These issues are
addressed by the application server model, described in Section 4,
which is the recommended way of generating early media that is not
continuous with the regular media generated during the session.
The gateway model is, therefore, acceptable in situations where the
UA cannot distinguish between early media and regular media. A PSTN
gateway is an example of this type of situation. The PSTN gateway
receives media from the PSTN over a circuit, and sends it to the IP
network. The gateway is not aware of the contents of the media, and
it does not exactly know when the transition from early to regular
media takes place. From the PSTN perspective, the circuit is a
continuous source of media.
4. The Application Server Model
The application server model consists of having the UAS behave as an
application server to establish early media sessions with the UAC.
The UAC indicates support for the early-session disposition type
(defined in [2]) using the early-session option tag. This way, UASs
know that they can keep offer/answer exchanges for early media
(early-session disposition type) separate from regular media (session
disposition type).
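On the UAS side, detecting this support amounts to looking for the option tag in the request. The Python sketch below assumes the value of the Supported header field has already been extracted as a string; the function name supports_early_session is hypothetical.

```python
def supports_early_session(supported_header: str) -> bool:
    """Return True if the Supported value lists the early-session tag.

    Option tags are comma-separated; whitespace and case are
    normalized before comparison.
    """
    tags = [tag.strip().lower() for tag in supported_header.split(",")]
    return "early-session" in tags
```

A UAS that finds the tag can keep its early media offer/answer exchange separate from the one for regular media; otherwise it would have to fall back to the gateway model.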
Sending early media using a different offer/answer exchange than the
one used for sending regular media helps avoid media clipping in
cases of forking. The UAC can reject or mute new offers for early
media without muting the sessions that will carry media when the
original INVITE is accepted. The UAC can give priority to media
received over the latter sessions. This way, the application server
model transitions from early to regular media at the right moment.
Having a separate offer/answer exchange for early media also helps
UACs decide whether or not local ringing should be generated. If a
new early session is established and that early session contains at
least an audio stream, the UAC can assume that there will be incoming
early media and it can then avoid generating local ringing.
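The check described above can be sketched by scanning the early session description for an audio media line. This Python example is a simplified illustration: the function name early_session_has_audio is hypothetical, and the parsing only looks at "m=" lines rather than fully validating the SDP. A stream with port zero is treated as absent, since a zero port marks a rejected or disabled stream in the offer/answer model.

```python
def early_session_has_audio(sdp: str) -> bool:
    """Return True if the SDP carries at least one usable audio stream."""
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if not line.startswith("m="):
            continue
        # Media line layout: m=<media> <port>[/<count>] <proto> <fmt> ...
        fields = line[2:].split()
        if len(fields) >= 2 and fields[0] == "audio":
            port = fields[1].split("/")[0]
            if port.isdigit() and int(port) != 0:
                return True
    return False
```

If the function returns True for a newly established early session, the UAC can skip local ringing; if it returns False, the local ringing policy still applies.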
An alternative model would include the addition of a new stream,
with an "early media" label, to the original session between the
UAC and the UAS using an UPDATE instead of establishing a new
early session. We have chosen to establish a new early session to
be coherent with the mechanism used by application servers that
are NOT co-located with the UAS. This way, the UAS uses the same
mechanism as any application server in the network to interact
with the UAC.
4.1. In-Band Versus Out-of-Band Session Progress Information
Note that, even when the application server model is used, a UA will
have to choose which early media sessions are muted and which ones
are rendered to the user. In order to make this choice easier for
UAs, it is strongly recommended that information that is not
essential for the session not be transmitted using early media. For
instance, UAs should not use early media to send special ringing
tones. The status code and the reason phrase in SIP can already
inform the remote user about the progress of session establishment,
without incurring the problems associated with early media.
5. Alert-Info Header Field
The Alert-Info header field allows specifying an alternative ringing
content, such as ringing tone, to the UAC. This header field tells
the UAC which tone should be played in case local ringing is
generated, but it does not tell the UAC when to generate local
ringing. A UAC should follow the rules described above for ringing
tone generation in both models. If, after following those rules, the
UAC decides to play local ringing, it can then use the Alert-Info
header field to generate it.
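The division of labor between the ringing rules and Alert-Info can be sketched as follows. In this Python illustration, the function name pick_ring_tone and the default_tone placeholder are hypothetical, and only the first URI enclosed in angle brackets is extracted from the header value; any header parameters are ignored.

```python
import re
from typing import Optional

# Matches the first URI enclosed in angle brackets.
_ALERT_URI = re.compile(r"<([^>]+)>")


def pick_ring_tone(
    ring_locally: bool,
    alert_info: Optional[str],
    default_tone: str = "default-ringback",
) -> Optional[str]:
    """Choose what to play, given that the ringing rules already ran.

    Alert-Info only selects WHICH tone to play; it never decides
    WHETHER local ringing is generated.
    """
    if not ring_locally:
        return None
    if alert_info:
        match = _ALERT_URI.search(alert_info)
        if match:
            return match.group(1)
    # No usable Alert-Info: fall back to a locally chosen tone.
    return default_tone
```

The ring_locally argument would come from whatever local policy the UAC applies, so the header field never overrides that decision.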
6. Security Considerations
SIP uses the offer/answer model [3] to establish early sessions in
both the gateway and the application server models. User Agents
(UAs) generate a session description, which contains the transport
address (i.e., IP address plus port) where they want to receive
media, and send it to their peer in a SIP message. When media
packets arrive at this transport address, the UA assumes that they
come from the receiver of the SIP message carrying the session
description. Nevertheless, attackers may attempt to gain access to
the contents of the SIP message and send packets to the transport
address contained in the session description. To prevent this
situation, UAs SHOULD encrypt their session descriptions (e.g., using
S/MIME).
Still, even if a UA encrypts its session descriptions, an attacker
may try to guess the transport address used by the UA and send media
packets to that address. Guessing such a transport address is
sometimes easier than it may seem because many UAs always pick the
same initial media port. To prevent this situation, UAs SHOULD use
media-level authentication mechanisms such as the Secure Real-time
Transport Protocol (SRTP) [7]. In addition, UAs that wish to keep
their communications confidential SHOULD use media-level encryption
mechanisms (e.g., SRTP [7]).
Attackers may attempt to make a UA send media to a victim as part of
a DoS attack. This can be done by sending a session description with
the victim's transport address to the UA. To prevent this attack,
the UA SHOULD engage in a handshake with the owner of the transport
address received in a session description (just verifying willingness
to receive media) before sending a large amount of data to the
transport address. This check can be performed by using a
connection-oriented transport protocol, by using STUN [8] in an end-to-end
fashion, or by the key exchange in SRTP [7].
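One way to picture the handshake requirement is a gate that limits how much media is sent before the peer's willingness has been verified. The Python sketch below is only an illustration under stated assumptions: the class name MediaSendGate, the byte budget, and the mark_verified hook are hypothetical, and the verification itself (connection setup, STUN, or SRTP key exchange) is not implemented.

```python
class MediaSendGate:
    """Caps media sent to an address that has not been verified yet."""

    def __init__(self, unverified_budget: int = 1500) -> None:
        # Hypothetical cap, in bytes, on data sent before verification.
        self.unverified_budget = unverified_budget
        self.verified = False
        self.sent_unverified = 0

    def mark_verified(self) -> None:
        """Call once the handshake with the address owner succeeds."""
        self.verified = True

    def may_send(self, size: int) -> bool:
        """Return True if a packet of `size` bytes may be sent now."""
        if self.verified:
            return True
        if self.sent_unverified + size > self.unverified_budget:
            return False
        self.sent_unverified += size
        return True
```

Until mark_verified is called, a victim whose address was placed in a forged session description receives at most the small budget rather than a full media stream.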
In any event, note that the previous security considerations are not
early media specific, but apply to the usage of the offer/answer
model in SIP to establish sessions in general.
Additionally, an early media-specific risk (roughly speaking,
equivalent to forms of "toll fraud" in the PSTN) attempts to exploit
the different charging policies some operators apply to early and
regular media. When UAs are allowed to exchange early media for
free, but are required to pay for regular media sessions, rogue UAs
may try to establish a bidirectional early media session and never
send a 200 (OK) response for the INVITE.
On the other hand, some application servers (e.g., Interactive Voice
Response systems) use bidirectional early media to obtain information
from the callers (e.g., the PIN code of a calling card). So, we do
not recommend that operators disallow bidirectional early media.
Instead, operators should consider a remedy of charging early media
exchanges that last too long, or stopping them at the media level
(according to the operator's policy).
7. Acknowledgments
Jon Peterson provided useful ideas on the separation between the
gateway model and the application server model.
Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John
Hearty, Adam Roach, Eric Burger, Rohan Mahy, and Allison Mankin
provided useful comments and suggestions.
8. References
8.1. Normative References
[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002.
[2] Camarillo, G., "The Early Session Disposition Type for the
Session Initiation Protocol (SIP)", RFC 3959, December 2004.
[3] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
Session Description Protocol (SDP)", RFC 3264, June 2002.
8.2. Informative References
[4] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional
Responses in Session Initiation Protocol (SIP)", RFC 3262, June
2002.
[5] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE
Method", RFC 3311, October 2002.
[6] Rosenberg, J., "Interactive connectivity establishment (ICE): a
methodology for network address translator (NAT) traversal for
the session initiation protocol (SIP)", Work in progress, July
2003.
[7] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
3711, March 2004.
[8] Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy,
"STUN - Simple Traversal of User Datagram Protocol (UDP) Through
Network Address Translators (NATs)", RFC 3489, March 2003.
[9] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.