draft-ietf-dime-overload-reqs-06.txt   draft-ietf-dime-overload-reqs-07.txt 
Network Working Group E. McMurry Network Working Group E. McMurry
Internet-Draft B. Campbell Internet-Draft B. Campbell
Intended status: Standards Track Tekelec Intended status: Informational Tekelec
Expires: October 19, 2013 April 17, 2013 Expires: December 8, 2013 June 6, 2013
Diameter Overload Control Requirements Diameter Overload Control Requirements
draft-ietf-dime-overload-reqs-06 draft-ietf-dime-overload-reqs-07
Abstract Abstract
When a Diameter server or agent becomes overloaded, it needs to be When a Diameter server or agent becomes overloaded, it needs to be
able to gracefully reduce its load, typically by informing clients to able to gracefully reduce its load, typically by informing clients to
reduce sending traffic for some period of time. Otherwise, it must reduce sending traffic for some period of time. Otherwise, it must
continue to expend resources parsing and responding to Diameter continue to expend resources parsing and responding to Diameter
messages, possibly resulting in congestion collapse. The existing messages, possibly resulting in congestion collapse. The existing
Diameter mechanisms, listed in Section 3 are not sufficient for this Diameter mechanisms, listed in Section 3 are not sufficient for this
purpose. This document describes the limitations of the existing purpose. This document describes the limitations of the existing
skipping to change at page 1, line 38 skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 19, 2013. This Internet-Draft will expire on December 8, 2013.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 29 skipping to change at page 2, line 29
2.2. Agent Scenarios . . . . . . . . . . . . . . . . . . . . . 9 2.2. Agent Scenarios . . . . . . . . . . . . . . . . . . . . . 9
2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12 2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12
3. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 13 3. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 13
4. Issues with the Current Mechanisms . . . . . . . . . . . . . . 14 4. Issues with the Current Mechanisms . . . . . . . . . . . . . . 14
4.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 15 4.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 15
4.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 15 4.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 15
5. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 16 5. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 16
5.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 16 5.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 16
5.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 17 5.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 17
6. Extensibility and Application Independence . . . . . . . . . . 18 6. Extensibility and Application Independence . . . . . . . . . . 18
7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 19 7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 18
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23
9. Security Considerations . . . . . . . . . . . . . . . . . . . 23 9. Security Considerations . . . . . . . . . . . . . . . . . . . 23
9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24 9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24
9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 24 9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 24
9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 24 9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 24
9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25 9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25
9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 25 9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 25
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25
10.1. Normative References . . . . . . . . . . . . . . . . . . . 25 10.1. Normative References . . . . . . . . . . . . . . . . . . . 25
10.2. Informative References . . . . . . . . . . . . . . . . . . 25 10.2. Informative References . . . . . . . . . . . . . . . . . . 26
Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 26 Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 26
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 26 Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 27
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27
1. Introduction 1. Introduction
When a Diameter [RFC6733] server or agent becomes overloaded, it When a Diameter [RFC6733] server or agent becomes overloaded, it
needs to be able to gracefully reduce its load, typically by needs to be able to gracefully reduce its load, typically by
informing clients to reduce sending traffic for some period of time. informing clients to reduce sending traffic for some period of time.
Otherwise, it must continue to expend resources parsing and Otherwise, it must continue to expend resources parsing and
responding to Diameter messages, possibly resulting in congestion responding to Diameter messages, possibly resulting in congestion
collapse. The existing mechanisms provided by Diameter are not collapse. The existing mechanisms provided by Diameter are not
skipping to change at page 16, line 15 skipping to change at page 16, line 15
client should wait before retrying the overloaded destination. If an client should wait before retrying the overloaded destination. If an
agent or server supports multiple realms and/or applications, agent or server supports multiple realms and/or applications,
DIAMETER_TOO_BUSY offers no way to indicate that it is overloaded for DIAMETER_TOO_BUSY offers no way to indicate that it is overloaded for
one application but not another. A DIAMETER_TOO_BUSY error can only one application but not another. A DIAMETER_TOO_BUSY error can only
indicate overload at a "whole server" scope. indicate overload at a "whole server" scope.
Agent processing of a DIAMETER_TOO_BUSY response is also problematic Agent processing of a DIAMETER_TOO_BUSY response is also problematic
as described in the base specification. DIAMETER_TOO_BUSY is defined as described in the base specification. DIAMETER_TOO_BUSY is defined
as a protocol error. If an agent receives a protocol error, it may as a protocol error. If an agent receives a protocol error, it may
either handle it locally or it may forward the response back towards either handle it locally or it may forward the response back towards
the downstream peer. (The Diameter specification is inconsistent the downstream peer. If a downstream peer receives the
about whether a protocol error MAY or SHOULD be handled by an agent,
rather than forwarded downstream.) If a downstream peer receives the
DIAMETER_TOO_BUSY response, it may stop sending all requests to the DIAMETER_TOO_BUSY response, it may stop sending all requests to the
agent for some period of time, even though the agent may still be agent for some period of time, even though the agent may still be
able to deliver requests to other upstream peers. able to deliver requests to other upstream peers.
DIAMETER_UNABLE_TO_DELIVER, or using DPR with cause code BUSY also DIAMETER_UNABLE_TO_DELIVER, or using DPR with cause code BUSY also
have no mechanisms for specifying the scope or cause of the failure, have no mechanisms for specifying the scope or cause of the failure,
or the durational validity. or the durational validity.
The issues with error responses in [RFC6733] extend beyond the The issues with error responses in [RFC6733] extend beyond the
particular issues for overload control and have been addressed in an particular issues for overload control and have been addressed in an
ad hoc fashion by various implementations. Addressing these in a ad hoc fashion by various implementations. Addressing these in a
standard way would be a useful exercise, but it is beyond the scope standard way would be a useful exercise, but it is beyond the scope
of this document. of this document.
5. Diameter Overload Case Studies 5. Diameter Overload Case Studies
5.1. Overload in Mobile Data Networks 5.1. Overload in Mobile Data Networks
As the number of Third Generation (3G) and Long Term Evolution (LTE) As the number of Third Generation (3G) and Long Term Evolution (LTE)
enabled smartphone devices continue to expand in mobility networks, enabled smartphone devices continue to expand in mobile networks,
there have been situations where high signaling traffic load led to there have been situations where high signaling traffic load led to
overload events at the Diameter-based Home Location Registries (HLR) overload events at the Diameter-based Home Location Registries (HLR)
and/or Home Subscriber Servers (HSS) [TR23.843]. The root causes of and/or Home Subscriber Servers (HSS) [TR23.843]. The root causes of
the HLR congestion events were manifold but included hardware failure the HLR congestion events were manifold but included hardware failure
and procedural errors. The result was high signaling traffic load on and procedural errors. The result was high signaling traffic load on
the HLR and HSS. the HLR and HSS.
The 3GPP architecture [TS23.002] makes extensive use of Diameter. It The 3GPP architecture [TS23.002] makes extensive use of Diameter. It
is used for mobility management [TS29.272] (and others), (IP is used for mobility management [TS29.272] (and others), (IP
Multimedia Subsystem) IMS [TS29.228] (and others), policy and Multimedia Subsystem) IMS [TS29.228] (and others), policy and
skipping to change at page 17, line 25 skipping to change at page 17, line 23
Smartphones, an increasingly large percentage of mobile devices, Smartphones, an increasingly large percentage of mobile devices,
contribute much more heavily, relative to non-smartphones, to the contribute much more heavily, relative to non-smartphones, to the
continuation of a registration surge due to their very aggressive continuation of a registration surge due to their very aggressive
registration algorithms. Smartphone behavior contributes to network registration algorithms. Smartphone behavior contributes to network
loading and can contribute to overload conditions. The aggressive loading and can contribute to overload conditions. The aggressive
smartphone logic is designed to: smartphone logic is designed to:
a. always have voice and data registration, and a. always have voice and data registration, and
b. constantly try to be on 3G or LTE data (and thus on 3G voice or b. constantly try to be on 3G or LTE data (and thus on 3G voice or
VoLTE) for their added benefits. VoLTE [IR.92]) for their added benefits.
Non-smartphones typically have logic to wait for a time period after Non-smartphones typically have logic to wait for a time period after
registering successfully on voice and data. registering successfully on voice and data.
The smartphone aggressive registration is problematic in two ways: The smartphone aggressive registration is problematic in two ways:
o first by generating excessive signaling load towards the HLR that o first by generating excessive signaling load towards the HSS that
is ten times that from a non-smartphone, is ten times that from a non-smartphone,
o and second by causing continual registration attempts when a o and second by causing continual registration attempts when a
network failure affects registrations through the 3G data network. network failure affects registrations through the 3G data network.
5.2. 3GPP Study on Core Network Overload 5.2. 3GPP Study on Core Network Overload
A study in 3GPP SA2 on core network overload has produced the A study in 3GPP SA2 on core network overload has produced the
technical report [TR23.843]. This enumerates several causes of technical report [TR23.843]. This enumerates several causes of
overload in mobile core networks including portions that are signaled overload in mobile core networks including portions that are signaled
skipping to change at page 21, line 15 skipping to change at page 21, line 15
REQ 17: In a mixed environment with nodes that support the overload REQ 17: In a mixed environment with nodes that support the overload
control mechanism and that do not, the mechanism MUST result control mechanism and that do not, the mechanism MUST result
in at least as much useful throughput as would have resulted in at least as much useful throughput as would have resulted
if the mechanism were not present. It SHOULD result in less if the mechanism were not present. It SHOULD result in less
severe congestion in this environment. severe congestion in this environment.
REQ 18: In a mixed environment of nodes that support the overload REQ 18: In a mixed environment of nodes that support the overload
control mechanism and that do not, the mechanism MUST NOT control mechanism and that do not, the mechanism MUST NOT
preclude elements that support overload control from preclude elements that support overload control from
treating elements that do not support overload control in an treating elements that do not support overload control in an
equitable fashion relative to those that do. users and equitable fashion relative to those that do. Users and
operators of nodes that do not support the mechanism MUST operators of nodes that do not support the mechanism MUST
NOT unfairly benefit from the mechanism. The mechanism NOT unfairly benefit from the mechanism. The mechanism
specification SHOULD provide guidance to implementors for specification SHOULD provide guidance to implementors for
dealing with elements not supporting overload control. dealing with elements not supporting overload control.
REQ 19: It MUST be possible to use the mechanism between nodes in REQ 19: It MUST be possible to use the mechanism between nodes in
different realms and in different administrative domains. different realms and in different administrative domains.
REQ 20: Any explicit overload indication MUST be clearly REQ 20: Any explicit overload indication MUST be clearly
distinguishable from other errors reported via Diameter. distinguishable from other errors reported via Diameter.
REQ 21: In cases where a network node fails, is so overloaded that REQ 21: In cases where a network node fails, is so overloaded that
it cannot process messages, or cannot communicate due to a it cannot process messages, or cannot communicate due to a
network failure, it may not be able to provide explicit network failure, it may not be able to provide explicit
indications of the nature of the failure or its levels of indications of the nature of the failure or its levels of
congestion. The mechanism MUST result in at least as much congestion. The mechanism MUST result in at least as much
useful throughput as would have resulted if the overload useful throughput as would have resulted if the overload
control mechanism was not in place. control mechanism was not in place.
REQ 22: The mechanism MUST provide a way for an node to throttle the REQ 22: The mechanism MUST provide a way for a node to throttle the
amount of traffic it receives from an peer node. This amount of traffic it receives from a peer node. This
throttling SHOULD be graded so that it can be applied throttling SHOULD be graded so that it can be applied
gradually as offered load increases. Overload is not a gradually as offered load increases. Overload is not a
binary state; there may be degrees of overload. binary state; there may be degrees of overload.
REQ 23: The mechanism MUST provide sufficient information to enable REQ 23: The mechanism MUST provide sufficient information to enable
a load balancing node to divert messages that are rejected a load balancing node to divert messages that are rejected
or otherwise throttled by an overloaded upstream node to or otherwise throttled by an overloaded upstream node to
other upstream nodes that are the most likely to have other upstream nodes that are the most likely to have
sufficient capacity to process them. sufficient capacity to process them.
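As an illustration only, REQ 22 and REQ 23 can be read together as follows: a load balancing node holds a graded overload value for each upstream peer, sheds a matching share of traffic toward that peer, and diverts throttled requests to the peers most likely to have capacity. The sketch below assumes a hypothetical per-peer "reduction" fraction between 0.0 and 1.0. Neither that value nor the names `Upstream`, `admit`, and `pick_upstream` come from the draft, which states requirements and does not define a mechanism.

```python
import random
from dataclasses import dataclass


@dataclass
class Upstream:
    """Hypothetical view of one upstream peer as seen by a load balancing node."""
    name: str
    reduction: float = 0.0  # assumed: share of traffic the peer asked us to shed (0.0 to 1.0)


def admit(peer: Upstream, rng: random.Random) -> bool:
    """Graded throttling (REQ 22): reject a request with probability peer.reduction."""
    return rng.random() >= peer.reduction


def pick_upstream(peers, rng):
    """Diversion (REQ 23): try peers from least to most overloaded.

    Returns the first peer that admits the request, or None if every
    peer throttles it.
    """
    for peer in sorted(peers, key=lambda p: p.reduction):
        if admit(peer, rng):
            return peer
    return None
```

A peer reporting a reduction of 0.0 admits every request and one reporting 1.0 admits none, so overload is treated as a matter of degree rather than a binary state. Probabilistic shedding is only one possible reading of "graded"; a rate limit or window-based scheme would satisfy the same requirement text.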
skipping to change at page 26, line 18 skipping to change at page 26, line 23
[TR23.843] [TR23.843]
3GPP, "Study on Core Network Overload Solutions", 3GPP, "Study on Core Network Overload Solutions",
TR 23.843 0.6.0, October 2012. TR 23.843 0.6.0, October 2012.
[IR.34] GSMA, "Inter-Service Provider IP Backbone Guidelines", [IR.34] GSMA, "Inter-Service Provider IP Backbone Guidelines",
IR 34 7.0, January 2012. IR 34 7.0, January 2012.
[IR.88] GSMA, "LTE Roaming Guidelines", IR 88 7.0, January 2012. [IR.88] GSMA, "LTE Roaming Guidelines", IR 88 7.0, January 2012.
[IR.92] GSMA, "IMS Profile for Voice and SMS", IR 92 7.0,
March 2013.
[TS23.002] [TS23.002]
3GPP, "Network Architecture", TS 23.002 12.0.0, 3GPP, "Network Architecture", TS 23.002 12.0.0,
September 2012. September 2012.
[TS29.272] [TS29.272]
3GPP, "Evolved Packet System (EPS); Mobility Management 3GPP, "Evolved Packet System (EPS); Mobility Management
Entity (MME) and Serving GPRS Support Node (SGSN) related Entity (MME) and Serving GPRS Support Node (SGSN) related
interfaces based on Diameter protocol", TS 29.272 11.4.0, interfaces based on Diameter protocol", TS 29.272 11.4.0,
September 2012. September 2012.
 End of changes. 13 change blocks. 
16 lines changed or deleted 17 lines changed or added
