draft-ietf-dime-overload-reqs-10.txt   draft-ietf-dime-overload-reqs-11.txt 
Network Working Group E. McMurry Network Working Group E. McMurry
Internet-Draft B. Campbell Internet-Draft B. Campbell
Intended status: Informational Tekelec Intended status: Informational Tekelec
Expires: January 30, 2014 July 29, 2013 Expires: February 27, 2014 August 26, 2013
Diameter Overload Control Requirements Diameter Overload Control Requirements
draft-ietf-dime-overload-reqs-10 draft-ietf-dime-overload-reqs-11
Abstract Abstract
When a Diameter server or agent becomes overloaded, it needs to be When a Diameter server or agent becomes overloaded, it needs to be
able to gracefully reduce its load, typically by informing clients to able to gracefully reduce its load, typically by informing clients to
reduce sending traffic for some period of time. Otherwise, it must reduce sending traffic for some period of time. Otherwise, it must
continue to expend resources parsing and responding to Diameter continue to expend resources parsing and responding to Diameter
messages, possibly resulting in congestion collapse. The existing messages, possibly resulting in congestion collapse. The existing
Diameter mechanisms, listed in Section 4 are not sufficient for this Diameter mechanisms are not sufficient for this purpose. This
purpose. This document describes the limitations of the existing document describes the limitations of the existing mechanisms.
mechanisms in Section 5. Requirements for new overload management Requirements for new overload management mechanisms are also
mechanisms are provided in Section 7. provided.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 30, 2014. This Internet-Draft will expire on February 27, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 30 skipping to change at page 2, line 30
2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12 2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12
3. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 13 3. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 13
3.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 13 3.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 13
3.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 15 3.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 15
4. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 15 4. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 15
5. Issues with the Current Mechanisms . . . . . . . . . . . . . . 16 5. Issues with the Current Mechanisms . . . . . . . . . . . . . . 16
5.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 17 5.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 17
5.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 17 5.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 17
6. Extensibility and Application Independence . . . . . . . . . . 18 6. Extensibility and Application Independence . . . . . . . . . . 18
7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 19 7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 19
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 7.1. General . . . . . . . . . . . . . . . . . . . . . . . . . 19
9. Security Considerations . . . . . . . . . . . . . . . . . . . 23 7.2. Performance . . . . . . . . . . . . . . . . . . . . . . . 20
7.3. Heterogeneous Support for Solution . . . . . . . . . . . . 21
7.4. Granular Control . . . . . . . . . . . . . . . . . . . . . 21
7.5. Priority and Policy . . . . . . . . . . . . . . . . . . . 22
7.6. Security . . . . . . . . . . . . . . . . . . . . . . . . . 22
7.7. Flexibility and Extensibility . . . . . . . . . . . . . . 23
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24
9. Security Considerations . . . . . . . . . . . . . . . . . . . 24
9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24 9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24
9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 24 9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 25
9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 24 9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 25
9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25 9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25
9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 25 9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 26
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 26
10.1. Normative References . . . . . . . . . . . . . . . . . . . 25 10.1. Normative References . . . . . . . . . . . . . . . . . . . 26
10.2. Informative References . . . . . . . . . . . . . . . . . . 26 10.2. Informative References . . . . . . . . . . . . . . . . . . 26
Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 26 Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 27
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 27 Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 27
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 28
1. Introduction 1. Introduction
A Diameter [RFC6733] node is said to be overloaded when it has A Diameter [RFC6733] node is said to be overloaded when it has
insufficient resources to successfully process all of the Diameter insufficient resources to successfully process all of the Diameter
requests that it receives. When a node becomes overloaded, it needs requests that it receives. When a node becomes overloaded, it needs
to be able to gracefully reduce its load, typically by informing to be able to gracefully reduce its load, typically by informing
clients to reduce sending traffic for some period of time. clients to reduce sending traffic for some period of time.
Otherwise, it must continue to expend resources parsing and Otherwise, it must continue to expend resources parsing and
responding to Diameter messages, possibly resulting in congestion responding to Diameter messages, possibly resulting in congestion
skipping to change at page 15, line 10 skipping to change at page 15, line 10
is ten times that from a non-smartphone, is ten times that from a non-smartphone,
o and second by causing continual registration attempts when a o and second by causing continual registration attempts when a
network failure affects registrations through the 3G data network. network failure affects registrations through the 3G data network.
3.2. 3GPP Study on Core Network Overload 3.2. 3GPP Study on Core Network Overload
A study in 3GPP SA2 on core network overload has produced the A study in 3GPP SA2 on core network overload has produced the
technical report [TR23.843]. This enumerates several causes of technical report [TR23.843]. This enumerates several causes of
overload in mobile core networks including portions that are signaled overload in mobile core networks including portions that are signaled
using Diameter. This document is a work in progress and is not using Diameter. TR23.843 is a work in progress and is not complete.
complete. However, it is useful for pointing out scenarios and the However, it is useful for pointing out scenarios and the general need
general need for an overload control mechanism for Diameter. for an overload control mechanism for Diameter.
It is common for mobile networks to employ more than one radio It is common for mobile networks to employ more than one radio
technology and to do so in an overlay fashion with multiple technology and to do so in an overlay fashion with multiple
technologies present in the same location (such as 2nd or 3rd technologies present in the same location (such as 2nd or 3rd
generation mobile technologies along with LTE). This presents generation mobile technologies along with LTE). This presents
opportunities for traffic storms when issues occur on one overlay and opportunities for traffic storms when issues occur on one overlay and
not another as all devices that had been on the overlay with issues not another as all devices that had been on the overlay with issues
switch. This causes a large amount of Diameter traffic as locations switch. This causes a large amount of Diameter traffic as locations
and policies are updated. and policies are updated.
skipping to change at page 16, line 43 skipping to change at page 16, line 43
facilities (and are at the wrong level) to handle server overload. facilities (and are at the wrong level) to handle server overload.
Transport level congestion management is also not sufficient to Transport level congestion management is also not sufficient to
address overload in cases of multi-hop and multi-destination address overload in cases of multi-hop and multi-destination
signaling. signaling.
5. Issues with the Current Mechanisms 5. Issues with the Current Mechanisms
The currently available Diameter mechanisms for indicating an The currently available Diameter mechanisms for indicating an
overload condition are not adequate to avoid service outages due to overload condition are not adequate to avoid service outages due to
overload. This inadequacy may, in turn, contribute to broader overload. This inadequacy may, in turn, contribute to broader
congestion collapse due to unresponsive Diameter nodes causing congestion impacts due to unresponsive Diameter nodes causing
application or transport layer retransmissions. In particular, they application or transport layer retransmissions. In particular, they
do not allow a Diameter agent or server to shed load as it approaches do not allow a Diameter agent or server to shed load as it approaches
overload. At best, a node can only indicate that it needs to overload. At best, a node can only indicate that it needs to
entirely stop receiving requests, i.e. that it has effectively entirely stop receiving requests, i.e. that it has effectively
failed. Even that is problematic due to the inability to indicate failed. Even that is problematic due to the inability to indicate
durational validity on the transient errors available in the base durational validity on the transient errors available in the base
Diameter protocol. Diameter offers no mechanism to allow a node to Diameter protocol. Diameter offers no mechanism to allow a node to
indicate different overload states for different categories of indicate different overload states for different categories of
messages, for example, if it is overloaded for one Diameter messages, for example, if it is overloaded for one Diameter
application but not another. application but not another.
skipping to change at page 19, line 10 skipping to change at page 19, line 10
specific behavior over and above the mechanism's defaults. For specific behavior over and above the mechanism's defaults. For
example, an application specification might specify relative example, an application specification might specify relative
priorities of messages or selection of a specific overload control priorities of messages or selection of a specific overload control
algorithm. algorithm.
7. Solution Requirements 7. Solution Requirements
This section proposes requirements for an improved mechanism to This section proposes requirements for an improved mechanism to
control Diameter overload, with the goals of improving the issues control Diameter overload, with the goals of improving the issues
described in Section 5 and supporting the scenarios described in described in Section 5 and supporting the scenarios described in
Section 2 Section 2. These requirements are stated primarily in terms of
individual node behavior to inform the design of the improved
mechanism; solution designers should keep in mind that the overall
goal is improved overall system behavior across all the nodes
involved, not just improved behavior from specific individual nodes.
REQ 1: The solution MUST provide a communication method for 7.1. General
Diameter nodes to exchange load and overload information.
REQ 2: The solution MUST allow Diameter nodes to support overload REQ 1: The solution MUST provide a communication method for Diameter
control regardless of which Diameter applications they nodes to exchange load and overload information.
support. Diameter clients and agents must be able to use
the received load and overload information to support
graceful behavior during an overload condition. Graceful
behavior under overload conditions is best described by REQ
3.
REQ 3: The solution MUST limit the impact of overload on the REQ 2: The solution MUST allow Diameter nodes to support overload
overall useful throughput of a Diameter server, even when control regardless of which Diameter applications they
the incoming load on the network is far in excess of its support. Diameter clients and agents must be able to use the
capacity. The overall useful throughput under load is the received load and overload information to support graceful
ultimate measure of the value of a solution. behavior during an overload condition. Graceful behavior
under overload conditions is best described by REQ 3.
REQ 4: Diameter allows requests to be sent from either side of a REQ 3: The solution MUST limit the impact of overload on the overall
connection and either side of a connection may have need to useful throughput of a Diameter server, even when the
provide its overload status. The solution MUST allow each incoming load on the network is far in excess of its
side of a connection to independently inform the other of capacity. The overall useful throughput under load is the
its overload status. ultimate measure of the value of a solution.
REQ 5: Diameter allows nodes to determine their peers via dynamic REQ 4: Diameter allows requests to be sent from either side of a
discovery or manual configuration. The solution MUST work connection and either side of a connection may have need to
consistently without regard to how peers are determined. provide its overload status. The solution MUST allow each
side of a connection to independently inform the other of its
overload status.
REQ 6: The solution designers SHOULD seek to minimize the amount of REQ 5: Diameter allows nodes to determine their peers via dynamic
new configuration required in order to work. For example, discovery or manual configuration. The solution MUST work
it is better to allow peers to advertise or negotiate consistently without regard to how peers are determined.
support for the solution, rather than to require this
knowledge to be configured at each node. REQ 6: The solution designers SHOULD seek to minimize the amount of
new configuration required in order to work. For example, it
is better to allow peers to advertise or negotiate support
for the solution, rather than to require this knowledge to be
configured at each node.
7.2. Performance
REQ 7: The solution and any associated default algorithm(s) MUST REQ 7: The solution and any associated default algorithm(s) MUST
ensure that the system remains stable. At some point after ensure that the system remains stable. At some point after
an overload condition has ended, the solution MUST enable an overload condition has ended, the solution MUST enable
capacity to stabilize and become equal to what it would be capacity to stabilize and become equal to what it would be
in the absence of an overload condition. Note that this in the absence of an overload condition. Note that this
also requires that the solution MUST allow nodes to shed also requires that the solution MUST allow nodes to shed
load without introducing non-converging oscillations during load without introducing non-converging oscillations during
or after an overload condition. or after an overload condition.
skipping to change at page 20, line 44 skipping to change at page 21, line 11
increase of traffic with little time between normal levels increase of traffic with little time between normal levels
and overload inducing levels. The solution SHOULD provide and overload inducing levels. The solution SHOULD provide
for rapid feedback when traffic levels increase. for rapid feedback when traffic levels increase.
REQ 15: The solution MUST NOT interfere with the congestion control REQ 15: The solution MUST NOT interfere with the congestion control
mechanisms of underlying transport protocols. For example, mechanisms of underlying transport protocols. For example,
a solution that opened additional TCP connections when the a solution that opened additional TCP connections when the
network is congested would reduce the effectiveness of the network is congested would reduce the effectiveness of the
underlying congestion control mechanisms. underlying congestion control mechanisms.
7.3. Heterogeneous Support for Solution
REQ 16: The solution is likely to be deployed incrementally. The REQ 16: The solution is likely to be deployed incrementally. The
solution MUST support a mixed environment where some, but solution MUST support a mixed environment where some, but
not all, nodes implement it. not all, nodes implement it.
REQ 17: In a mixed environment with nodes that support the solution REQ 17: In a mixed environment with nodes that support the solution
and that do not, the solution MUST NOT result in materially and that do not, the solution MUST NOT result in materially
less useful throughput than would have resulted if the less useful throughput than would have resulted if the
solution were not present. It SHOULD result in less severe solution were not present. It SHOULD result in less severe
congestion in this environment. congestion in this environment.
skipping to change at page 21, line 29 skipping to change at page 21, line 47
distinguishable from other errors reported via Diameter. distinguishable from other errors reported via Diameter.
REQ 21: In cases where a network node fails, is so overloaded that REQ 21: In cases where a network node fails, is so overloaded that
it cannot process messages, or cannot communicate due to a it cannot process messages, or cannot communicate due to a
network failure, it may not be able to provide explicit network failure, it may not be able to provide explicit
indications of the nature of the failure or its levels of indications of the nature of the failure or its levels of
congestion. The solution MUST result in at least as much congestion. The solution MUST result in at least as much
useful throughput as would have resulted if the solution was useful throughput as would have resulted if the solution was
not in place. not in place.
7.4. Granular Control
REQ 22: The solution MUST provide a way for a node to throttle the REQ 22: The solution MUST provide a way for a node to throttle the
amount of traffic it receives from a peer node. This amount of traffic it receives from a peer node. This
throttling SHOULD be graded so that it can be applied throttling SHOULD be graded so that it can be applied
gradually as offered load increases. Overload is not a gradually as offered load increases. Overload is not a
binary state; there may be degrees of overload. binary state; there may be degrees of overload.
REQ 23: The solution MUST provide sufficient information to enable a REQ 23: The solution MUST provide sufficient information to enable a
load balancing node to divert messages that are rejected or load balancing node to divert messages that are rejected or
otherwise throttled by an overloaded upstream node to other otherwise throttled by an overloaded upstream node to other
upstream nodes that are the most likely to have sufficient upstream nodes that are the most likely to have sufficient
capacity to process them. capacity to process them.
REQ 24: The solution MUST provide a mechanism for indicating load REQ 24: The solution MUST provide a mechanism for indicating load
levels even when not in an overloaded condition, to assist levels even when not in an overloaded condition, to assist
nodes making decisions to prevent overload conditions from nodes making decisions to prevent overload conditions from
occurring. occurring.
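The granular-control requirements above (REQ 22 through REQ 24) can be illustrated with a minimal sketch. It is not part of either draft revision and does not describe any specified mechanism; the function names, the load scale (1.0 meaning nominal capacity), and the onset and limit thresholds are all assumptions chosen for illustration. The sketch shows a graded, non-binary throttle and a load balancer choosing the upstream node that reports the most headroom.

```python
from typing import Mapping


def shed_fraction(load: float, onset: float = 0.8, limit: float = 1.2) -> float:
    """Fraction of offered requests a peer is asked to hold back.

    Hypothetical policy: nothing is shed at or below `onset`, everything
    is shed at or above `limit`, and the fraction ramps linearly in
    between, so throttling is graded rather than binary (REQ 22).
    """
    if load <= onset:
        return 0.0
    if load >= limit:
        return 1.0
    return (load - onset) / (limit - onset)


def pick_upstream(reported_load: Mapping[str, float]) -> str:
    """Return the upstream node with the lowest reported load.

    Assumes nodes report load even when not overloaded (REQ 24), which
    lets a load balancer divert throttled traffic sensibly (REQ 23).
    """
    return min(reported_load, key=lambda node: reported_load[node])
```

A real solution would also need to bound how quickly the shed fraction changes, since REQ 7 requires that load shedding not introduce oscillations.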
7.5. Priority and Policy
REQ 25: The base specification for the solution SHOULD offer general REQ 25: The base specification for the solution SHOULD offer general
guidance on which message types might be desirable to send guidance on which message types might be desirable to send
or process over others during times of overload, based on or process over others during times of overload, based on
application-specific considerations. For example, it may be application-specific considerations. For example, it may be
more beneficial to process messages for existing sessions more beneficial to process messages for existing sessions
ahead of new sessions. Some networks may have a requirement ahead of new sessions. Some networks may have a requirement
to give priority to requests associated with emergency to give priority to requests associated with emergency
sessions. Any normative or otherwise detailed definition of sessions. Any normative or otherwise detailed definition of
the relative priorities of message types during an overload the relative priorities of message types during an overload
condition will be the responsibility of the application condition will be the responsibility of the application
specification. specification.
REQ 26: The solution MUST NOT prevent a node from prioritizing REQ 26: The solution MUST NOT prevent a node from prioritizing
requests based on any local policy, so that certain requests requests based on any local policy, so that certain requests
are given preferential treatment, given additional are given preferential treatment, given additional
retransmission, not throttled, or processed ahead of others. retransmission, not throttled, or processed ahead of others.
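As a rough illustration of REQ 25 and REQ 26, the following sketch shows one way a node might apply a local priority policy when it can only process part of its pending work. The priority classes, their ordering, and the function name are hypothetical; the draft leaves any normative definition of relative priorities to application specifications.

```python
from typing import List, Tuple

# Hypothetical priority classes; a lower number is more important.
EMERGENCY, EXISTING_SESSION, NEW_SESSION = 0, 1, 2

Request = Tuple[int, str]  # (priority class, request identifier)


def select_requests(pending: List[Request],
                    capacity: int) -> Tuple[List[Request], List[Request]]:
    """Split pending requests into (accepted, throttled).

    Requests are ordered by priority class; sorted() is stable, so
    requests within the same class keep their arrival order. The first
    `capacity` requests are accepted and the rest are throttled.
    """
    ordered = sorted(pending, key=lambda request: request[0])
    return ordered[:capacity], ordered[capacity:]
```

With room for two requests, an emergency request and a request on an existing session would be accepted ahead of two new-session requests that arrived earlier.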
7.6. Security
REQ 27: The solution MUST NOT provide new vulnerabilities to REQ 27: The solution MUST NOT provide new vulnerabilities to
malicious attack, or increase the severity of any existing malicious attack, or increase the severity of any existing
vulnerabilities. This includes vulnerabilities to DoS and vulnerabilities. This includes vulnerabilities to DoS and
DDoS attacks as well as replay and man-in-the-middle DDoS attacks as well as replay and man-in-the-middle
attacks. Note that the Diameter base specification attacks. Note that the Diameter base specification
[RFC6733] lacks end to end security and this must be [RFC6733] lacks end to end security and this must be
considered. considered. Note that this requirement was expressed at a
high level so as to not preclude any particular solution.
It is expected that the solution will address this in more
detail.
REQ 28: The solution MUST NOT depend on being deployed in REQ 28: The solution MUST NOT depend on being deployed in
environments where all Diameter nodes are completely environments where all Diameter nodes are completely
trusted. It SHOULD operate as effectively as possible in trusted. It SHOULD operate as effectively as possible in
environments where other nodes are malicious; this includes environments where other nodes are malicious; this includes
preventing malicious nodes from obtaining more than a fair preventing malicious nodes from obtaining more than a fair
share of service. Note that this does not imply any share of service. Note that this does not imply any
responsibility on the solution to detect, or take responsibility on the solution to detect, or take
countermeasures against, malicious nodes. countermeasures against, malicious nodes.
skipping to change at page 22, line 45 skipping to change at page 23, line 31
their nodes to be sensitive information to restrict access their nodes to be sensitive information to restrict access
to that information. Of course, in such cases, there is no to that information. Of course, in such cases, there is no
expectation that the solution itself will help prevent expectation that the solution itself will help prevent
overload from that peer node. overload from that peer node.
REQ 30: The solution MUST NOT interfere with any Diameter compliant REQ 30: The solution MUST NOT interfere with any Diameter compliant
method that a node may use to protect itself from overload method that a node may use to protect itself from overload
from non-supporting nodes, or from denial of service from non-supporting nodes, or from denial of service
attacks. attacks.
7.7. Flexibility and Extensibility
REQ 31: There are multiple situations where a Diameter node may be REQ 31: There are multiple situations where a Diameter node may be
overloaded for some purposes but not others. For example, overloaded for some purposes but not others. For example,
this can happen to an agent or server that supports multiple this can happen to an agent or server that supports multiple
applications, or when a server depends on multiple external applications, or when a server depends on multiple external
resources, some of which may become overloaded while others resources, some of which may become overloaded while others
are fully available. The solution MUST allow Diameter nodes are fully available. The solution MUST allow Diameter nodes
to indicate overload with sufficient granularity to allow to indicate overload with sufficient granularity to allow
clients to take action based on the overloaded resources clients to take action based on the overloaded resources
without unreasonably forcing available capacity to go without unreasonably forcing available capacity to go
unused. The solution MUST support specification of overload unused. The solution MUST support specification of overload
skipping to change at page 26, line 15 skipping to change at page 27, line 6
10.2. Informative References 10.2. Informative References
[RFC5390] Rosenberg, J., "Requirements for Management of Overload in [RFC5390] Rosenberg, J., "Requirements for Management of Overload in
the Session Initiation Protocol", RFC 5390, December 2008. the Session Initiation Protocol", RFC 5390, December 2008.
[RFC6357] Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design [RFC6357] Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design
Considerations for Session Initiation Protocol (SIP) Considerations for Session Initiation Protocol (SIP)
Overload Control", RFC 6357, August 2011. Overload Control", RFC 6357, August 2011.
[TR23.843] [TR23.843]
3GPP, "Study on Core Network Overload Solutions", 3GPP, "Study on Core Network Overload Solutions (Work in
TR 23.843 0.6.0, October 2012. Progress)", TR 23.843 0.6.0, October 2012.
[IR.34] GSMA, "Inter-Service Provider IP Backbone Guidelines", [IR.34] GSMA, "Inter-Service Provider IP Backbone Guidelines",
IR 34 7.0, January 2012. IR 34 7.0, January 2012.
[IR.88] GSMA, "LTE Roaming Guidelines", IR 88 7.0, January 2012. [IR.88] GSMA, "LTE Roaming Guidelines", IR 88 7.0, January 2012.
[IR.92] GSMA, "IMS Profile for Voice and SMS", IR 92 7.0, [IR.92] GSMA, "IMS Profile for Voice and SMS", IR 92 7.0,
March 2013. March 2013.
[TS23.002] [TS23.002]
 End of changes. 25 change blocks. 
51 lines changed or deleted 78 lines changed or added
