draft-ietf-dime-overload-reqs-07.txt   draft-ietf-dime-overload-reqs-08.txt 
Network Working Group E. McMurry Network Working Group E. McMurry
Internet-Draft B. Campbell Internet-Draft B. Campbell
Intended status: Informational Tekelec Intended status: Informational Tekelec
Expires: December 8, 2013 June 6, 2013 Expires: January 16, 2014 July 15, 2013
Diameter Overload Control Requirements Diameter Overload Control Requirements
draft-ietf-dime-overload-reqs-07 draft-ietf-dime-overload-reqs-08
Abstract Abstract
When a Diameter server or agent becomes overloaded, it needs to be When a Diameter server or agent becomes overloaded, it needs to be
able to gracefully reduce its load, typically by informing clients to able to gracefully reduce its load, typically by informing clients to
reduce sending traffic for some period of time. Otherwise, it must reduce sending traffic for some period of time. Otherwise, it must
continue to expend resources parsing and responding to Diameter continue to expend resources parsing and responding to Diameter
messages, possibly resulting in congestion collapse. The existing messages, possibly resulting in congestion collapse. The existing
Diameter mechanisms, listed in Section 3 are not sufficient for this Diameter mechanisms, listed in Section 4 are not sufficient for this
purpose. This document describes the limitations of the existing purpose. This document describes the limitations of the existing
mechanisms in Section 4. Requirements for new overload management mechanisms in Section 5. Requirements for new overload management
mechanisms are provided in Section 7. mechanisms are provided in Section 7.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 8, 2013. This Internet-Draft will expire on January 16, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Causes of Overload . . . . . . . . . . . . . . . . . . . . 3 1.1. Documentation Conventions . . . . . . . . . . . . . . . . 3
1.2. Effects of Overload . . . . . . . . . . . . . . . . . . . 5 1.2. Causes of Overload . . . . . . . . . . . . . . . . . . . . 4
1.3. Overload vs. Network Congestion . . . . . . . . . . . . . 5 1.3. Effects of Overload . . . . . . . . . . . . . . . . . . . 5
1.4. Diameter Applications in a Broader Network . . . . . . . . 5 1.4. Overload vs. Network Congestion . . . . . . . . . . . . . 6
1.5. Documentation Conventions . . . . . . . . . . . . . . . . 6 1.5. Diameter Applications in a Broader Network . . . . . . . . 6
2. Overload Scenarios . . . . . . . . . . . . . . . . . . . . . . 6 2. Overload Control Scenarios . . . . . . . . . . . . . . . . . . 6
2.1. Peer to Peer Scenarios . . . . . . . . . . . . . . . . . . 7 2.1. Peer to Peer Scenarios . . . . . . . . . . . . . . . . . . 7
2.2. Agent Scenarios . . . . . . . . . . . . . . . . . . . . . 9 2.2. Agent Scenarios . . . . . . . . . . . . . . . . . . . . . 9
2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12 2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12
3. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 13 3. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 13
4. Issues with the Current Mechanisms . . . . . . . . . . . . . . 14 3.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 13
4.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 15 3.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 15
4.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 15 4. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 15
5. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 16 5. Issues with the Current Mechanisms . . . . . . . . . . . . . . 16
5.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 16 5.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 17
5.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 17 5.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 17
6. Extensibility and Application Independence . . . . . . . . . . 18 6. Extensibility and Application Independence . . . . . . . . . . 18
7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 18 7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 19
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23
9. Security Considerations . . . . . . . . . . . . . . . . . . . 23 9. Security Considerations . . . . . . . . . . . . . . . . . . . 23
9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24 9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24
9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 24 9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 24
9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 24 9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 24
9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25 9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25
9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 25 9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 25
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25
10.1. Normative References . . . . . . . . . . . . . . . . . . . 25 10.1. Normative References . . . . . . . . . . . . . . . . . . . 25
10.2. Informative References . . . . . . . . . . . . . . . . . . 26 10.2. Informative References . . . . . . . . . . . . . . . . . . 26
Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 26 Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 26
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 27 Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 27
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27
1. Introduction 1. Introduction
When a Diameter [RFC6733] server or agent becomes overloaded, it A Diameter [RFC6733] node is said to be overloaded when it has
needs to be able to gracefully reduce its load, typically by insufficient resources to successfully process all of the Diameter
informing clients to reduce sending traffic for some period of time. requests that it receives. When a node becomes overloaded, it needs
to be able to gracefully reduce its load, typically by informing
clients to reduce sending traffic for some period of time.
Otherwise, it must continue to expend resources parsing and Otherwise, it must continue to expend resources parsing and
responding to Diameter messages, possibly resulting in congestion responding to Diameter messages, possibly resulting in congestion
collapse. The existing mechanisms provided by Diameter are not collapse. The existing mechanisms provided by Diameter are not
sufficient for this purpose. This document describes the limitations sufficient for this purpose. This document describes the limitations
of the existing mechanisms, and provides requirements for new of the existing mechanisms, and provides requirements for new
overload management mechanisms. overload management mechanisms.
This document draws on the work done on SIP overload control This document draws on the work done on SIP overload control
([RFC5390], [RFC6357]) as well as on experience gained via overload ([RFC5390], [RFC6357]) as well as on experience gained via overload
handling in Signaling System No. 7 (SS7) networks and studies done by handling in Signaling System No. 7 (SS7) networks and studies done by
the Third Generation Partnership Project (3GPP) (Section 5). the Third Generation Partnership Project (3GPP) (Section 3).
Diameter is not typically an end-user protocol; rather it is Diameter is not typically an end-user protocol; rather it is
generally used as one component in support of some end-user activity. generally used as one component in support of some end-user activity.
For example, a SIP server might use Diameter to authenticate and For example, a SIP server might use Diameter to authenticate and
authorize user access. Overload in the Diameter backend authorize user access. Overload in the Diameter backend
infrastructure will likely impact the experience observed by the end infrastructure will likely impact the experience observed by the end
user in the SIP application. user in the SIP application.
The impact of Diameter overload on the client application (a client The impact of Diameter overload on the client application (a client
application may use the Diameter protocol and other protocols to do application may use the Diameter protocol and other protocols to do
its job) is beyond the scope of this document. its job) is beyond the scope of this document.
This document presents non-normative descriptions of causes of This document presents non-normative descriptions of causes of
overload along with related scenarios and studies. Finally, it overload along with related scenarios and studies. Finally, it
offers a set of normative requirements for an improved overload offers a set of normative requirements for an improved overload
indication mechanism. indication mechanism.
1.1. Causes of Overload 1.1. Documentation Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as defined in [RFC2119], with the
exception that they are not intended for interoperability of
implementations.
These requirements concern the standards specification process and
not the implementation of specified standards. All requirements in
this document must be reflected by standards specifications to be
developed. However, which of the features specified by these
standards will be mandatory, recommended, or optional for compliant
implementations is to be defined by standards track document(s) and
not in this document.
The terms "client", "server", "agent", "node", "peer", "upstream",
and "downstream" are used as defined in [RFC6733].
1.2. Causes of Overload
Overload occurs when an element, such as a Diameter server or agent, Overload occurs when an element, such as a Diameter server or agent,
has insufficient resources to successfully process all of the traffic has insufficient resources to successfully process all of the traffic
it is receiving. Resources include all of the capabilities of the it is receiving. Resources include all of the capabilities of the
element used to process a request, including CPU processing, memory, element used to process a request, including CPU processing, memory,
I/O, and disk resources. It can also include external resources such I/O, and disk resources. It can also include external resources such
as a database or DNS server, in which case the CPU, processing, as a database or DNS server, in which case the CPU, processing,
memory, I/O, and disk resources of those elements are effectively memory, I/O, and disk resources of those elements are effectively
part of the logical element processing the request. part of the logical element processing the request.
External resources can include upstream Diameter nodes; for example,
a Diameter agent can become effectively overloaded if one or more
upstream nodes are overloaded. While overload is not the same thing
as network congestion, network congestion can reduce a Diameter node's
ability to process and respond to requests, thus contributing to
overload.
A Diameter node can become overloaded due to request levels that
exceed its capacity, a reduction of available resources (for
example, a local or upstream hardware failure), or a combination of
the two.
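As a minimal sketch of the two contributing factors just described,
the check below treats a node as overloaded when the offered request
rate exceeds the capacity that is currently usable. The function name
and the parameters offered_rate, nominal_capacity, and
available_fraction are illustrative assumptions and are not defined by
Diameter or by this document.

```python
def is_overloaded(offered_rate, nominal_capacity, available_fraction=1.0):
    """Return True when offered load exceeds currently usable capacity.

    offered_rate:       requests per second arriving at the node (assumed unit)
    nominal_capacity:   requests per second the node handles when healthy
    available_fraction: share of resources still usable, e.g. 0.5 after a
                        local or upstream hardware failure (hypothetical)
    """
    effective_capacity = nominal_capacity * available_fraction
    return offered_rate > effective_capacity
```

With these assumed numbers, a node rated for 1000 requests per second
is overloaded at 1200 requests per second, and also at 800 requests
per second once half of its resources are lost.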
Overload can occur for many reasons, including: Overload can occur for many reasons, including:
Inadequate capacity: When designing Diameter networks, that is, Inadequate capacity: When designing Diameter networks, that is,
application layer multi-node Diameter deployments, it can be very application layer multi-node Diameter deployments, it can be very
difficult to predict all scenarios that may cause elevated difficult to predict all scenarios that may cause elevated
traffic. It may also be more costly to implement support for some traffic. It may also be more costly to implement support for some
scenarios than a network operator may deem worthwhile. This scenarios than a network operator may deem worthwhile. This
results in the likelihood that a Diameter network will not have results in the likelihood that a Diameter network will not have
adequate capacity to handle all situations. adequate capacity to handle all situations.
skipping to change at page 5, line 12 skipping to change at page 5, line 41
aggressive registration strategies that generate unusually high aggressive registration strategies that generate unusually high
Diameter traffic loads. Diameter traffic loads.
DoS attacks: An attacker, wishing to disrupt service in the network, DoS attacks: An attacker, wishing to disrupt service in the network,
can cause a large amount of traffic to be launched at a target can cause a large amount of traffic to be launched at a target
element. This can be done from a central source of traffic or element. This can be done from a central source of traffic or
through a distributed DoS attack. In all cases, the volume of through a distributed DoS attack. In all cases, the volume of
traffic well exceeds the capacity of the element, sending the traffic well exceeds the capacity of the element, sending the
system into overload. system into overload.
1.2. Effects of Overload 1.3. Effects of Overload
Modern Diameter networks, comprised of application layer multi-node Modern Diameter networks, comprised of application layer multi-node
deployments of Diameter elements, may operate at very large deployments of Diameter elements, may operate at very large
transaction volumes. If a Diameter node becomes overloaded, or even transaction volumes. If a Diameter node becomes overloaded, or even
worse, fails completely, a large number of messages may be lost very worse, fails completely, a large number of messages may be lost very
quickly. Even with redundant servers, many messages can be lost in quickly. Even with redundant servers, many messages can be lost in
the time it takes for failover to complete. While a Diameter client the time it takes for failover to complete. While a Diameter client
or agent should be able to retry such requests, an overloaded peer or agent should be able to retry such requests, an overloaded peer
may cause a sudden large increase in the number of may cause a sudden large increase in the number of
transactions needing to be retried, rapidly filling local queues or transactions needing to be retried, rapidly filling local queues or
otherwise contributing to local overload. Therefore Diameter devices otherwise contributing to local overload. Therefore Diameter devices
need to be able to shed load before critical failures can occur. need to be able to shed load before critical failures can occur.
1.3. Overload vs. Network Congestion 1.4. Overload vs. Network Congestion
This document uses the term "overload" to refer to application-layer This document uses the term "overload" to refer to application-layer
overload at Diameter nodes. This is distinct from "network overload at Diameter nodes. This is distinct from "network
congestion", that is, congestion that occurs at the lower networking congestion", that is, congestion that occurs at the lower networking
layers that may impact the delivery of Diameter messages between layers that may impact the delivery of Diameter messages between
nodes. The authors recognize that element overload and network nodes. The authors recognize that element overload and network
congestion are interrelated, and that overload can contribute to congestion are interrelated, and that overload can contribute to
network congestion and vice versa. network congestion and vice versa.
Network congestion issues are better handled by the transport Network congestion issues are better handled by the transport
protocols. Diameter uses TCP and SCTP, both of which include protocols. Diameter uses TCP and SCTP, both of which include
congestion management features. Analysis of whether those features congestion management features. Analysis of whether those features
are sufficient for transport level congestion between Diameter nodes, are sufficient for transport level congestion between Diameter nodes,
and any work to further mitigate network congestion is out of scope and any work to further mitigate network congestion is out of scope
both for this document, and for the work proposed by this document. both for this document, and for the work proposed by this document.
1.4. Diameter Applications in a Broader Network 1.5. Diameter Applications in a Broader Network
Most elements using Diameter applications do not use Diameter Most elements using Diameter applications do not use Diameter
exclusively. It is important to realize that overload of an element exclusively. It is important to realize that overload of an element
can be caused by a number of factors that may be unrelated to the can be caused by a number of factors that may be unrelated to the
processing of Diameter or Diameter applications. processing of Diameter or Diameter applications.
An element communicating via protocols other than Diameter that is An element communicating via protocols other than Diameter that is
also using a Diameter application needs to be able to signal to also using a Diameter application needs to be able to signal to
Diameter peers that it is experiencing overload regardless of the Diameter peers that it is experiencing overload regardless of the
cause of the overload, since the overload will affect that element's cause of the overload, since the overload will affect that element's
ability to process Diameter transactions. The element may also need ability to process Diameter transactions. The element may also need
to signal this on other protocols depending on its function and the to signal this on other protocols depending on its function and the
architecture of the network and application it is providing services architecture of the network and application it is providing services
for. Whether that is necessary can only be decided within the for. Whether that is necessary can only be decided within the
context of that architecture and application. A mechanism for context of that architecture and application. A mechanism for
signaling overload with Diameter, which this specification details signaling overload with Diameter, which this specification details
the requirements for, provides applications the ability to signal the requirements for, provides Diameter nodes the ability to signal
their Diameter peers of overload, mitigating that part of the issue. their Diameter peers of overload, mitigating that part of the issue.
Applications may need to use this, as well as other mechanisms, to Diameter nodes may need to use this, as well as other mechanisms, to
solve their broader overload issues. Indicating overload on solve their broader overload issues. Indicating overload on
protocols other than Diameter is out of scope for this document, and protocols other than Diameter is out of scope for this document, and
for the work proposed by this document. for the work proposed by this document.
1.5. Documentation Conventions 2. Overload Control Scenarios
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
The terms "client", "server", "agent", "node", "peer", "upstream",
and "downstream" are used as defined in [RFC6733].
2. Overload Scenarios
Several Diameter deployment scenarios exist that may impact overload Several Diameter deployment scenarios exist that may impact overload
management. The following scenarios help motivate the requirements management. The following scenarios help motivate the requirements
for an overload management mechanism. for an overload management mechanism.
These scenarios are by no means exhaustive, and are in general These scenarios are by no means exhaustive, and are in general
simplified for the sake of clarity. In particular, the authors simplified for the sake of clarity. In particular, the authors
assume for the sake of clarity that the client sends Diameter assume for the sake of clarity that the client sends Diameter
requests to the server, and the server sends responses to the client, requests to the server, and the server sends responses to the client,
even though Diameter supports bidirectional applications. Each even though Diameter supports bidirectional applications. Each
skipping to change at page 11, line 40 skipping to change at page 11, line 49
| | | |
| | | |
| Client | | Client |
| | | |
+------------------+ +------------------+
Figure 5: Multiple Server Agent Scenario Figure 5: Multiple Server Agent Scenario
Figure 6 shows a scenario where an agent routes requests to a set of Figure 6 shows a scenario where an agent routes requests to a set of
servers for more than one Diameter realm and application. In this servers for more than one Diameter realm and application. In this
scenario, if server 1 becomes overloaded or unavailable, the agent scenario, if server 1 becomes overloaded or unavailable while server
may effectively operate at reduced capacity for application A, but at 2 still has available capacity, the agent may effectively operate at
full capacity for application B. Therefore, the agent needs to be reduced capacity for application A, but at full capacity for
able to report that it is overloaded for one application, but not for application B. Therefore, the agent needs to be able to report that
another. it is overloaded for one application, but not for another.
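The per-application reporting need described above can be sketched as
follows. This is only an illustration under stated assumptions: the
class AgentOverloadState, its methods, and the idea of expressing
capacity as a fraction of healthy servers are hypothetical, and all
servers are assumed to have equal capacity. The routing table mirrors
the scenario in the text, with server 1 and server 2 serving
application A, and server 2 and server 3 serving application B.

```python
class AgentOverloadState:
    """Tracks, per application, how much upstream capacity remains."""

    def __init__(self, routes):
        # routes maps an application name to the servers that handle it.
        self.routes = {app: list(servers) for app, servers in routes.items()}
        self.overloaded_servers = set()

    def mark_overloaded(self, server):
        self.overloaded_servers.add(server)

    def mark_recovered(self, server):
        self.overloaded_servers.discard(server)

    def capacity_fraction(self, app):
        # Assumes every server contributes the same share of capacity.
        servers = self.routes[app]
        healthy = [s for s in servers if s not in self.overloaded_servers]
        return len(healthy) / len(servers)

    def report(self):
        # One entry per application, so overload for one application
        # does not imply overload for another.
        return {app: self.capacity_fraction(app) for app in self.routes}


agent = AgentOverloadState({
    "A": ["server1", "server2"],
    "B": ["server2", "server3"],
})
agent.mark_overloaded("server1")
print(agent.report())  # {'A': 0.5, 'B': 1.0}
```

Keeping the state keyed by application rather than by agent is the
design point: a single agent-wide overload flag could not express
"reduced for A, full for B".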
+--------------------------------------------+ +--------------------------------------------+
| Application A +----------------------+----------------------+ | Application A +----------------------+----------------------+
|+------------------+ | +----------------+ | +------------------+| |+------------------+ | +----------------+ | +------------------+|
|| | | | | | | || || | | | | | | ||
|| | | | | | | || || | | | | | | ||
|| Server 1 | | | Server 2 | | | Server 3 || || Server 1 | | | Server 2 | | | Server 3 ||
|| | | | | | | || || | | | | | | ||
|+---------+--------+ | +-------+--------+ | +--+---------------+| |+---------+--------+ | +-------+--------+ | +--+---------------+|
| | | | | | | | | | | | | |
skipping to change at page 13, line 44 skipping to change at page 13, line 44
This case is distinct from those internal to a network operator's This case is distinct from those internal to a network operator's
network, where there may be many more elements in a more complicated network, where there may be many more elements in a more complicated
topology. Also, the elements in the interconnect network may not topology. Also, the elements in the interconnect network may not
support Diameter overload control, and the network operators may not support Diameter overload control, and the network operators may not
want the interconnect network to use overload or loading information. want the interconnect network to use overload or loading information.
They may only want the information to pass through the interconnect They may only want the information to pass through the interconnect
network without further processing or action by the interconnect network without further processing or action by the interconnect
network even if the elements in the interconnect network do support network even if the elements in the interconnect network do support
Diameter overload control. Diameter overload control.
3. Existing Mechanisms 3. Diameter Overload Case Studies
3.1. Overload in Mobile Data Networks
As the number of Third Generation (3G) and Long Term Evolution (LTE)
enabled smartphone devices continues to expand in mobile networks,
there have been situations where high signaling traffic load led to
overload events at the Diameter-based Home Location Registers (HLR)
and/or Home Subscriber Servers (HSS) [TR23.843]. The root causes of
the HLR congestion events were manifold but included hardware failure
and procedural errors. The result was high signaling traffic load on
the HLR and HSS.
The 3GPP architecture [TS23.002] makes extensive use of Diameter. It
is used for mobility management [TS29.272] (and others), IP
Multimedia Subsystem (IMS) [TS29.228] (and others), policy and
charging control [TS29.212] (and others) as well as other functions.
The details of the architecture are out of scope for this document,
but it is worth noting that there are quite a few Diameter
applications, some with quite large amounts of Diameter signaling in
deployed networks.
The 3GPP specifications do not currently address overload for
Diameter applications or provide a load control mechanism equivalent
to those provided in the more traditional SS7 elements in Global
System for Mobile Communications (GSM) [TS29.002]. The capabilities
specified in the 3GPP standards do not adequately address the
abnormal condition where excessively high signaling traffic load
situations are experienced.
Smartphones, an increasingly large percentage of mobile devices,
contribute much more heavily, relative to non-smartphones, to the
continuation of a registration surge due to their very aggressive
registration algorithms. Smartphone behavior contributes to network
loading and can contribute to overload conditions. The aggressive
smartphone logic is designed to:
a. always have voice and data registration, and
b. constantly try to be on 3G or LTE data (and thus on 3G voice or
VoLTE [IR.92]) for their added benefits.
Non-smartphones typically have logic to wait for a time period after
registering successfully on voice and data.
The smartphone aggressive registration is problematic in two ways:
o first by generating excessive signaling load towards the HSS that
is ten times that from a non-smartphone,
o and second by causing continual registration attempts when a
network failure affects registrations through the 3G data network.
3.2. 3GPP Study on Core Network Overload
A study in 3GPP SA2 on core network overload has produced the
technical report [TR23.843]. The report enumerates several causes of
overload in mobile core networks including portions that are signaled
using Diameter. The report is a work in progress and is not
complete. However, it is useful for pointing out scenarios and the
general need for an overload control mechanism for Diameter.
It is common for mobile networks to employ more than one radio
technology and to do so in an overlay fashion with multiple
technologies present in the same location (such as 2nd or 3rd
generation mobile technologies along with LTE). This presents
opportunities for traffic storms when issues occur on one overlay and
not another as all devices that had been on the overlay with issues
switch. This causes a large amount of Diameter traffic as locations
and policies are updated.
Another scenario called out by this study is a flood of registration
and mobility management events caused by some element in the core
network failing. This flood of traffic from end nodes falls under
the network initiated traffic flood category. There is likely to
also be traffic resulting directly from the component failure in this
case. A similar flood can occur when elements or components recover
as well.
Subscriber initiated traffic floods are also indicated in this study
as an overload mechanism where a large number of mobile devices
attempt to access services at the same time, such as in response
to an entertainment event or a catastrophic event.
While this 3GPP study is concerned with the broader effects of these
scenarios on wireless networks and their elements, they have
implications specifically for Diameter signaling. One of the goals
of this document is to provide guidance for a core mechanism that can
be used to mitigate the scenarios called out by this study.
4. Existing Mechanisms
Diameter offers both implicit and explicit mechanisms for a Diameter Diameter offers both implicit and explicit mechanisms for a Diameter
node to learn that a peer is overloaded or unreachable. The implicit node to learn that a peer is overloaded or unreachable. The implicit
mechanism is simply the lack of responses to requests. If a client mechanism is simply the lack of responses to requests. If a client
fails to receive a response in a certain time period, it assumes the fails to receive a response in a certain time period, it assumes the
upstream peer is unavailable, or overloaded to the point of effective upstream peer is unavailable, or overloaded to the point of effective
unavailability. The watchdog mechanism [RFC3539] ensures that a unavailability. The watchdog mechanism [RFC3539] ensures that a
certain rate of transaction responses occur even when there is certain rate of transaction responses occur even when there is
otherwise little or no other Diameter traffic. otherwise little or no other Diameter traffic.
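The implicit mechanism described above can be sketched roughly as a
table of outstanding requests with a response timeout. This is a
simplified illustration and not the watchdog algorithm of [RFC3539];
the class PendingRequests, its method names, and the timeout_s
parameter are hypothetical. The clock is injectable so that the
behavior can be exercised without waiting.

```python
import time


class PendingRequests:
    """Flags a peer as suspect when a request goes unanswered too long."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._sent_at = {}  # request id -> time the request was sent

    def sent(self, request_id):
        self._sent_at[request_id] = self.clock()

    def answered(self, request_id):
        # A response arrived, so the request is no longer outstanding.
        self._sent_at.pop(request_id, None)

    def peer_suspect(self):
        # True if any outstanding request has exceeded the timeout. The
        # sender cannot tell from this alone whether the peer is
        # unavailable or overloaded.
        now = self.clock()
        return any(now - t > self.timeout_s for t in self._sent_at.values())
```

Note that the sender learns of the problem only after the timeout has
elapsed, and only by inference from missing responses.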
skipping to change at page 14, line 34 skipping to change at page 16, line 32
originates (in the case of a client) or inform the client to reduce originates (in the case of a client) or inform the client to reduce
traffic (in the case of an agent.) traffic (in the case of an agent.)
Diameter requires the use of a congestion-managed transport layer, Diameter requires the use of a congestion-managed transport layer,
currently TCP or SCTP, to mitigate network congestion. It is currently TCP or SCTP, to mitigate network congestion. It is
expected that these transports manage network congestion and that expected that these transports manage network congestion and that
issues with transport (e.g. congestion propagation and window issues with transport (e.g. congestion propagation and window
management) are managed at that level. But even with a congestion- management) are managed at that level. But even with a congestion-
managed transport, a Diameter node can become overloaded at the managed transport, a Diameter node can become overloaded at the
Diameter protocol or application layers due to the causes described Diameter protocol or application layers due to the causes described
in Section 1.1 and congestion managed transports do not provide in Section 1.2 and congestion managed transports do not provide
facilities (and are at the wrong level) to handle server overload. facilities (and are at the wrong level) to handle server overload.
Transport level congestion management is also not sufficient to Transport level congestion management is also not sufficient to
address overload in cases of multi-hop and multi-destination address overload in cases of multi-hop and multi-destination
signaling. signaling.
4. Issues with the Current Mechanisms 5. Issues with the Current Mechanisms
The currently available Diameter mechanisms for indicating an The currently available Diameter mechanisms for indicating an
overload condition are not adequate to avoid service outages due to overload condition are not adequate to avoid service outages due to
overload. This inadequacy may, in turn, contribute to broader overload. This inadequacy may, in turn, contribute to broader
congestion collapse due to unresponsive Diameter nodes causing congestion collapse due to unresponsive Diameter nodes causing
application or transport layer retransmissions. In particular, they application or transport layer retransmissions. In particular, they
do not allow a Diameter agent or server to shed load as it approaches do not allow a Diameter agent or server to shed load as it approaches
overload. At best, a node can only indicate that it needs to overload. At best, a node can only indicate that it needs to
entirely stop receiving requests, i.e. that it has effectively entirely stop receiving requests, i.e. that it has effectively
failed. Even that is problematic due to the inability to indicate failed. Even that is problematic due to the inability to indicate
durational validity on the transient errors available in the base durational validity on the transient errors available in the base
Diameter protocol. Diameter offers no mechanism to allow a node to Diameter protocol. Diameter offers no mechanism to allow a node to
indicate different overload states for different categories of indicate different overload states for different categories of
messages, for example, if it is overloaded for one Diameter messages, for example, if it is overloaded for one Diameter
application but not another. application but not another.
4.1. Problems with Implicit Mechanism 5.1. Problems with Implicit Mechanism
The implicit mechanism doesn't allow an agent or server to inform the The implicit mechanism doesn't allow an agent or server to inform the
client of a problem until it is effectively too late to do anything client of a problem until it is effectively too late to do anything
about it. The client does not know to take action until the upstream about it. The client does not know to take action until the upstream
node has effectively failed. A Diameter node has no opportunity to node has effectively failed. A Diameter node has no opportunity to
shed load early to avoid collapse in the first place. shed load early to avoid collapse in the first place.
Additionally, the implicit mechanism cannot distinguish between Additionally, the implicit mechanism cannot distinguish between
overload of a Diameter node and network congestion. Diameter treats overload of a Diameter node and network congestion. Diameter treats
the failure to receive an answer as a transport failure. the failure to receive an answer as a transport failure.
4.2. Problems with Explicit Mechanisms 5.2. Problems with Explicit Mechanisms
The Diameter specification is ambiguous on how a client should handle The Diameter specification is ambiguous on how a client should handle
receipt of a DIAMETER_TOO_BUSY response. The base specification receipt of a DIAMETER_TOO_BUSY response. The base specification
[RFC6733] indicates that the sending client should attempt to send [RFC6733] indicates that the sending client should attempt to send
the request to a different peer. It makes no suggestion that the the request to a different peer. It makes no suggestion that the
receipt of a DIAMETER_TOO_BUSY response should affect future Diameter receipt of a DIAMETER_TOO_BUSY response should affect future Diameter
messages in any way. messages in any way.
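
As a minimal sketch of the per-request behavior described above (not
text from either specification), the following Python fragment retries
a request at the next peer when it receives DIAMETER_TOO_BUSY
(Result-Code 3004 in [RFC6733]). The function name, the send callback,
and the peer names are hypothetical. Nothing is remembered between
requests, so the next request is again offered first to the busy peer,
which is the gap this section describes.

```python
DIAMETER_SUCCESS = 2001
DIAMETER_TOO_BUSY = 3004


def send_with_failover(request, peers, send):
    """Offer the request to each peer in order (hypothetical helper).

    `send(peer, request)` is an assumed callback returning a Result-Code.
    A DIAMETER_TOO_BUSY answer only moves this one request to the next
    peer; no state about the busy peer is kept for later requests.
    """
    for peer in peers:
        result_code = send(peer, request)
        if result_code != DIAMETER_TOO_BUSY:
            return peer, result_code
    return None, DIAMETER_TOO_BUSY


# Illustration: "peer-a" is always busy, "peer-b" always succeeds.
attempts = []


def fake_send(peer, request):
    attempts.append(peer)
    return DIAMETER_TOO_BUSY if peer == "peer-a" else DIAMETER_SUCCESS


first = send_with_failover("req-1", ["peer-a", "peer-b"], fake_send)
second = send_with_failover("req-2", ["peer-a", "peer-b"], fake_send)
```

In this illustration both requests are first sent to the busy peer,
so the overloaded node keeps receiving and rejecting traffic.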
The Authentication, Authorization, and Accounting (AAA) Transport The Authentication, Authorization, and Accounting (AAA) Transport
Profile [RFC3539] recommends that a AAA node that receives a "Busy" Profile [RFC3539] recommends that a AAA node that receives a "Busy"
skipping to change at page 16, line 30 skipping to change at page 18, line 27
DIAMETER_UNABLE_TO_DELIVER, or using DPR with cause code BUSY also DIAMETER_UNABLE_TO_DELIVER, or using DPR with cause code BUSY also
have no mechanisms for specifying the scope or cause of the failure, have no mechanisms for specifying the scope or cause of the failure,
or the durational validity. or the durational validity.
The issues with error responses in [RFC6733] extend beyond the The issues with error responses in [RFC6733] extend beyond the
particular issues for overload control and have been addressed in an particular issues for overload control and have been addressed in an
ad hoc fashion by various implementations. Addressing these in a ad hoc fashion by various implementations. Addressing these in a
standard way would be a useful exercise, but it is beyond the scope standard way would be a useful exercise, but it is beyond the scope
of this document. of this document.
5. Diameter Overload Case Studies
5.1. Overload in Mobile Data Networks
As the number of Third Generation (3G) and Long Term Evolution (LTE)
enabled smartphone devices continues to expand in mobile networks,
there have been situations where high signaling traffic load led to
overload events at the Diameter-based Home Location Registers (HLR)
and/or Home Subscriber Servers (HSS) [TR23.843]. The root causes of
the HLR congestion events were manifold but included hardware failure
and procedural errors. The result was high signaling traffic load on
the HLR and HSS.
The 3GPP architecture [TS23.002] makes extensive use of Diameter. It
is used for mobility management [TS29.272] (and others), IP
Multimedia Subsystem (IMS) [TS29.228] (and others), policy and
charging control [TS29.212] (and others) as well as other functions.
The details of the architecture are out of scope for this document,
but it is worth noting that there are quite a few Diameter
applications, some with quite large amounts of Diameter signaling in
deployed networks.
The 3GPP specifications do not currently address overload for
Diameter applications or provide an equivalent load control mechanism
to those provided in the more traditional SS7 elements in Global
System for Mobile Communications (GSM) [TS29.002]. The capabilities
specified in the 3GPP standards do not adequately address the
abnormal condition where excessively high signaling traffic load
situations are experienced.
Smartphones, an increasingly large percentage of mobile devices,
contribute much more heavily, relative to non-smartphones, to the
continuation of a registration surge due to their very aggressive
registration algorithms. Smartphone behavior contributes to network
loading and can contribute to overload conditions. The aggressive
smartphone logic is designed to:
a. always have voice and data registration, and
b. constantly try to be on 3G or LTE data (and thus on 3G voice or
VoLTE [IR.92]) for their added benefits.
Non-smartphones typically have logic to wait for a time period after
registering successfully on voice and data.
The smartphone aggressive registration is problematic in two ways:
o first by generating excessive signaling load towards the HSS that
is ten times that from a non-smartphone,
o and second by causing continual registration attempts when a
network failure affects registrations through the 3G data network.
5.2. 3GPP Study on Core Network Overload
A study in 3GPP SA2 on core network overload has produced the
technical report [TR23.843]. This enumerates several causes of
overload in mobile core networks including portions that are signaled
using Diameter. This document is a work in progress and is not
complete. However, it is useful for pointing out scenarios and the
general need for an overload control mechanism for Diameter.
It is common for mobile networks to employ more than one radio
technology and to do so in an overlay fashion with multiple
technologies present in the same location (such as 2nd or 3rd
generation mobile technologies along with LTE). This presents
opportunities for traffic storms when issues occur on one overlay and
not another as all devices that had been on the overlay with issues
switch. This causes a large amount of Diameter traffic as locations
and policies are updated.
Another scenario called out by this study is a flood of registration
and mobility management events caused by some element in the core
network failing. This flood of traffic from end nodes falls under
the network initiated traffic flood category. There is likely to
also be traffic resulting directly from the component failure in this
case. A similar flood can occur when elements or components recover
as well.
Subscriber initiated traffic floods are also indicated in this study
as an overload mechanism where a large number of mobile devices
attempt to access services at the same time, such as in response
to an entertainment event or a catastrophic event.
While this 3GPP study is concerned with the broader effects of these
scenarios on wireless networks and their elements, they have
implications specifically for Diameter signaling. One of the goals
of this document is to provide guidance for a core mechanism that can
be used to mitigate the scenarios called out by this study.
6. Extensibility and Application Independence 6. Extensibility and Application Independence
Given the variety of scenarios Diameter elements can be deployed in, Given the variety of scenarios Diameter elements can be deployed in,
and the variety of roles they can fulfill with Diameter and other and the variety of roles they can fulfill with Diameter and other
technologies, a single algorithm for handling overload may not be technologies, a single algorithm for handling overload may not be
sufficient. This effort cannot anticipate all possible future sufficient. This effort cannot anticipate all possible future
scenarios and roles. Extensibility, particularly of algorithms used scenarios and roles. Extensibility, particularly of algorithms used
to deal with overload, will be important to cover these cases. to deal with overload, will be important to cover these cases.
Similarly, the scopes that overload information may apply to may Similarly, the scopes that overload information may apply to may
skipping to change at page 18, line 50 skipping to change at page 19, line 9
Diameter applications might, however, benefit from application- Diameter applications might, however, benefit from application-
specific behavior over and above the mechanism's defaults. For specific behavior over and above the mechanism's defaults. For
example, an application specification might specify relative example, an application specification might specify relative
priorities of messages or selection of a specific overload control priorities of messages or selection of a specific overload control
algorithm. algorithm.
7. Solution Requirements 7. Solution Requirements
This section proposes requirements for an improved mechanism to This section proposes requirements for an improved mechanism to
control Diameter overload, with the goals of improving the issues control Diameter overload, with the goals of improving the issues
described in Section 4 and supporting the scenarios described in described in Section 5 and supporting the scenarios described in
Section 2 Section 2
REQ 1: The overload control mechanism MUST provide a communication REQ 1: The solution MUST provide a communication method for
method for Diameter nodes to exchange load and overload Diameter nodes to exchange load and overload information.
information.
REQ 2: The mechanism MUST allow Diameter nodes to support overload REQ 2: The solution MUST allow Diameter nodes to support overload
control regardless of which Diameter applications they control regardless of which Diameter applications they
support. Diameter clients must be able to use the received support. Diameter clients and agents must be able to use
load and overload information to support graceful behavior the received load and overload information to support
during an overload condition. Graceful behavior under graceful behavior during an overload condition. Graceful
overload conditions is best described by REQ 3. behavior under overload conditions is best described by REQ
3.
REQ 3: The overload control mechanism MUST limit the impact of REQ 3: The solution MUST limit the impact of overload on the
overload on the overall useful throughput of a Diameter overall useful throughput of a Diameter server, even when
server, even when the incoming load on the network is far in the incoming load on the network is far in excess of its
excess of its capacity. The overall useful throughput under capacity. The overall useful throughput under load is the
load is the ultimate measure of the value of an overload ultimate measure of the value of a solution.
control mechanism.
REQ 4: Diameter allows requests to be sent from either side of a REQ 4: Diameter allows requests to be sent from either side of a
connection and either side of a connection may have need to connection and either side of a connection may have need to
provide its overload status. The mechanism MUST allow each provide its overload status. The solution MUST allow each
side of a connection to independently inform the other of side of a connection to independently inform the other of
its overload status. its overload status.
REQ 5: Diameter allows nodes to determine their peers via dynamic REQ 5: Diameter allows nodes to determine their peers via dynamic
discovery or manual configuration. The mechanism MUST work discovery or manual configuration. The solution MUST work
consistently without regard to how peers are determined. consistently without regard to how peers are determined.
REQ 6: The mechanism designers SHOULD seek to minimize the amount REQ 6: The solution designers SHOULD seek to minimize the amount of
of new configuration required in order to work. For new configuration required in order to work. For example,
example, it is better to allow peers to advertise or it is better to allow peers to advertise or negotiate
negotiate support for the mechanism, rather than to require support for the solution, rather than to require this
this knowledge to be configured at each node. knowledge to be configured at each node.
REQ 7: The overload control mechanism and any associated default REQ 7: The solution and any associated default algorithm(s) MUST
algorithm(s) MUST ensure that the system remains stable. At ensure that the system remains stable. At some point after
some point after an overload condition has ended, the an overload condition has ended, the solution MUST enable
mechanism MUST enable capacity to stabilize and become equal capacity to stabilize and become equal to what it would be
to what it would be in the absence of an overload condition. in the absence of an overload condition. Note that this
Note that this also requires that the mechanism MUST allow also requires that the solution MUST allow nodes to shed
nodes to shed load without introducing non converging load without introducing non converging oscillations during
oscillations during or after an overload condition. or after an overload condition.
REQ 8: Supporting nodes MUST be able to distinguish current REQ 8: Supporting nodes MUST be able to distinguish current
overload information from stale information, and SHOULD make overload information from stale information.
decisions using the most currently available information.
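
One possible way to meet REQ 8 can be sketched as follows. The sketch
assumes that each overload report carries a sequence number that
increases with every new report and a validity period; both fields,
and all names below, are hypothetical and are not defined by this
document. A receiver ignores reports that are older than the one it
already holds and stops applying a report once it has expired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OverloadReport:
    sequence: int           # assumed: increases with each new report
    reduction_percent: int  # assumed: requested traffic reduction
    received_at: float      # local receive time, in seconds
    validity: float         # assumed: seconds the report stays valid


class ReportTracker:
    """Keeps only the most recent report from one reporting node."""

    def __init__(self) -> None:
        self._current: Optional[OverloadReport] = None

    def offer(self, report: OverloadReport) -> bool:
        """Store the report unless it is stale; return True if stored."""
        held = self._current
        if held is not None and report.sequence <= held.sequence:
            return False  # same or older sequence number: stale
        self._current = report
        return True

    def active(self, now: float) -> Optional[OverloadReport]:
        """Return the held report, or None if none is held or it expired."""
        held = self._current
        if held is None or now >= held.received_at + held.validity:
            return None
        return held
```

A report that arrives out of order is rejected by offer(), and a
report that is never refreshed ceases to apply when its validity
period ends, so a consumer does not act on outdated information.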
REQ 9: The mechanism MUST function across fully loaded as well as REQ 9: The solution MUST function across fully loaded as well as
quiescent transport connections. This is partially derived quiescent transport connections. This is partially derived
from the requirement for stability in REQ 7. from the requirement for stability in REQ 7.
REQ 10: Consumers of overload information MUST be able to determine REQ 10: Consumers of overload information MUST be able to determine
when the overload condition improves or ends. when the overload condition improves or ends.
REQ 11: The overload control mechanism MUST be able to operate in REQ 11: The solution MUST be able to operate in networks of
networks of different sizes. different sizes.
REQ 12: When a single network node fails, goes into overload, or REQ 12: When a single network node fails, goes into overload, or
suffers from reduced processing capacity, the mechanism MUST suffers from reduced processing capacity, the solution MUST
make it possible to limit the impact of this on other nodes make it possible to limit the impact of this on other nodes
in the network. This helps to prevent a small-scale failure in the network. This helps to prevent a small-scale failure
from becoming a widespread outage. from becoming a widespread outage.
REQ 13: The mechanism MUST NOT introduce substantial additional work REQ 13: The solution MUST NOT introduce substantial additional work
for a node in an overloaded state. For example, a requirement for a node in an overloaded state. For example, a requirement
for an overloaded node to send overload information every for an overloaded node to send overload information every
time it received a new request would introduce substantial time it received a new request would introduce substantial
work. Existing messaging is likely to have the work. Existing messaging is likely to have the
characteristic of increasing as an overload condition characteristic of increasing as an overload condition
approaches, allowing for the possibility of increased approaches, allowing for the possibility of increased
feedback for information piggybacked on it. feedback for information piggybacked on it.
REQ 14: Some scenarios that result in overload involve a rapid REQ 14: Some scenarios that result in overload involve a rapid
increase of traffic with little time between normal levels increase of traffic with little time between normal levels
and overload inducing levels. The mechanism SHOULD provide and overload inducing levels. The solution SHOULD provide
for rapid feedback when traffic levels increase. for rapid feedback when traffic levels increase.
REQ 15: The mechanism MUST NOT interfere with the congestion control REQ 15: The solution MUST NOT interfere with the congestion control
mechanisms of underlying transport protocols. For example, mechanisms of underlying transport protocols. For example,
a mechanism that opened additional TCP connections when the a solution that opened additional TCP connections when the
network is congested would reduce the effectiveness of the network is congested would reduce the effectiveness of the
underlying congestion control mechanisms. underlying congestion control mechanisms.
REQ 16: The overload control mechanism is likely to be deployed REQ 16: The solution is likely to be deployed incrementally. The
incrementally. The mechanism MUST support a mixed solution MUST support a mixed environment where some, but
environment where some, but not all, nodes implement it. not all, nodes implement it.
REQ 17: In a mixed environment with nodes that support the overload REQ 17: In a mixed environment with nodes that support the solution
control mechanism and that do not, the mechanism MUST result and that do not, the solution MUST NOT result in materially
in at least as much useful throughput as would have resulted less useful throughput than would have resulted if the
if the mechanism were not present. It SHOULD result in less solution were not present. It SHOULD result in less severe
severe congestion in this environment. congestion in this environment.
REQ 18: In a mixed environment of nodes that support the overload REQ 18: In a mixed environment of nodes that support the solution
control mechanism and that do not, the mechanism MUST NOT and that do not, the solution MUST NOT preclude elements
preclude elements that support overload control from that support overload control from treating elements that do
treating elements that do not support overload control in an not support overload control in an equitable fashion relative
equitable fashion relative to those that do. Users and to those that do. Users and operators of nodes that do not
operators of nodes that do not support the mechanism MUST support the solution MUST NOT unfairly benefit from the
NOT unfairly benefit from the mechanism. The mechanism solution. The solution specification SHOULD provide
specification SHOULD provide guidance to implementors for guidance to implementors for dealing with elements not
dealing with elements not supporting overload control. supporting overload control.
REQ 19: It MUST be possible to use the mechanism between nodes in REQ 19: It MUST be possible to use the solution between nodes in
different realms and in different administrative domains. different realms and in different administrative domains.
REQ 20: Any explicit overload indication MUST be clearly REQ 20: Any explicit overload indication MUST be clearly
distinguishable from other errors reported via Diameter. distinguishable from other errors reported via Diameter.
REQ 21: In cases where a network node fails, is so overloaded that REQ 21: In cases where a network node fails, is so overloaded that
it cannot process messages, or cannot communicate due to a it cannot process messages, or cannot communicate due to a
network failure, it may not be able to provide explicit network failure, it may not be able to provide explicit
indications of the nature of the failure or its levels of indications of the nature of the failure or its levels of
congestion. The mechanism MUST result in at least as much congestion. The solution MUST result in at least as much
useful throughput as would have resulted if the overload useful throughput as would have resulted if the solution was
control mechanism was not in place. not in place.
REQ 22: The mechanism MUST provide a way for a node to throttle the REQ 22: The solution MUST provide a way for a node to throttle the
amount of traffic it receives from a peer node. This amount of traffic it receives from a peer node. This
throttling SHOULD be graded so that it can be applied throttling SHOULD be graded so that it can be applied
gradually as offered load increases. Overload is not a gradually as offered load increases. Overload is not a
binary state; there may be degrees of overload. binary state; there may be degrees of overload.
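
The graded throttling in REQ 22 can be illustrated with a small
sketch. It assumes that load is expressed as a fraction of capacity
and that the requested reduction grows linearly between an onset
threshold and a limit; the thresholds, the linear mapping, and all
names are assumptions made for illustration, not part of any
specified solution.

```python
def reduction_for(load: float, onset: float = 0.5, limit: float = 1.0) -> float:
    """Fraction of traffic to shed for a given load (assumed mapping).

    Returns 0.0 at or below `onset`, 1.0 at or above `limit`, and a
    linearly increasing value in between, so overload is not treated
    as a binary state.
    """
    if load <= onset:
        return 0.0
    if load >= limit:
        return 1.0
    return (load - onset) / (limit - onset)


class Throttle:
    """Sheds a configurable fraction of requests, spread evenly."""

    def __init__(self) -> None:
        self.reduction = 0.0  # fraction of requests to reject
        self._credit = 0.0    # accumulated "rejection credit"

    def admit(self) -> bool:
        """Return True if the next request may be sent."""
        self._credit += self.reduction
        if self._credit >= 1.0:
            self._credit -= 1.0
            return False
        return True
```

With reduction set to 0.25, every fourth request is rejected; raising
the value as offered load increases applies the throttling gradually
rather than all at once.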
REQ 23: The mechanism MUST provide sufficient information to enable REQ 23: The solution MUST provide sufficient information to enable a
a load balancing node to divert messages that are rejected load balancing node to divert messages that are rejected or
or otherwise throttled by an overloaded upstream node to otherwise throttled by an overloaded upstream node to other
other upstream nodes that are the most likely to have upstream nodes that are the most likely to have sufficient
sufficient capacity to process them. capacity to process them.
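
As a rough sketch of how a load balancing node might use such
information, the fragment below picks the upstream node with the
lowest reported load, skipping nodes that have already rejected the
message or that report a load at or above capacity. The load scale
(1.0 meaning full capacity), the selection rule, and the names are
assumptions for illustration only.

```python
from typing import Dict, Iterable, Optional


def pick_upstream(reported_load: Dict[str, float],
                  exclude: Iterable[str] = ()) -> Optional[str]:
    """Choose the upstream node most likely to have spare capacity.

    `reported_load` maps a node name to its last reported load as a
    fraction of capacity (assumed scale). Nodes in `exclude`, for
    example the node that just throttled the message, are skipped.
    Returns None when no candidate has spare capacity.
    """
    excluded = set(exclude)
    candidates = {
        node: load
        for node, load in reported_load.items()
        if node not in excluded and load < 1.0
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

Returning None rather than an overloaded node leaves the decision to
reject or queue the message with the caller.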
REQ 24: The mechanism MUST provide a mechanism for indicating load REQ 24: The solution MUST provide a mechanism for indicating load
levels even when not in an overloaded condition, to assist levels even when not in an overloaded condition, to assist
nodes making decisions to prevent overload conditions from nodes making decisions to prevent overload conditions from
occurring. occurring.
REQ 25: The base specification for the overload control mechanism REQ 25: The base specification for the solution SHOULD offer general
SHOULD offer general guidance on which message types might guidance on which message types might be desirable to send
be desirable to send or process over others during times of or process over others during times of overload, based on
overload, based on application-specific considerations. For application-specific considerations. For example, it may be
example, it may be more beneficial to process messages for more beneficial to process messages for existing sessions
existing sessions ahead of new sessions. Some networks may ahead of new sessions. Some networks may have a requirement
have a requirement to give priority to requests associated to give priority to requests associated with emergency
with emergency sessions. Any normative or otherwise sessions. Any normative or otherwise detailed definition of
detailed definition of the relative priorities of message the relative priorities of message types during an overload
types during an overload condition will be the condition will be the responsibility of the application
responsibility of the application specification. specification.
REQ 26: The mechanism MUST NOT prevent a node from prioritizing REQ 26: The solution MUST NOT prevent a node from prioritizing
requests based on any local policy, so that certain requests requests based on any local policy, so that certain requests
are given preferential treatment, given additional are given preferential treatment, given additional
retransmission, not throttled, or processed ahead of others. retransmission, not throttled, or processed ahead of others.
REQ 27: The overload control mechanism MUST NOT provide new REQ 27: The solution MUST NOT provide new vulnerabilities to
vulnerabilities to malicious attack, or increase the malicious attack, or increase the severity of any existing
severity of any existing vulnerabilities. This includes vulnerabilities. This includes vulnerabilities to DoS and
vulnerabilities to DoS and DDoS attacks as well as replay DDoS attacks as well as replay and man-in-the-middle
and man-in-the-middle attacks. Note that the Diameter base attacks. Note that the Diameter base specification
specification [RFC6733] lacks end to end security and this [RFC6733] lacks end to end security and this must be
must be considered. considered.
REQ 28: The mechanism MUST NOT depend on being deployed in REQ 28: The solution MUST NOT depend on being deployed in
environments where all Diameter nodes are completely environments where all Diameter nodes are completely
trusted. It SHOULD operate as effectively as possible in trusted. It SHOULD operate as effectively as possible in
environments where other nodes are malicious; this includes environments where other nodes are malicious; this includes
preventing malicious nodes from obtaining more than a fair preventing malicious nodes from obtaining more than a fair
share of service. Note that this does not imply any share of service. Note that this does not imply any
responsibility on the mechanism to detect, or take responsibility on the solution to detect, or take
countermeasures against, malicious nodes. countermeasures against, malicious nodes.
REQ 29: It MUST be possible for a supporting node to make REQ 29: It MUST be possible for a supporting node to make
authorization decisions about what information will be sent authorization decisions about what information will be sent
to peer nodes based on the identity of those nodes. This to peer nodes based on the identity of those nodes. This
allows a domain administrator who considers the load of allows a domain administrator who considers the load of
their nodes to be sensitive information to restrict access their nodes to be sensitive information to restrict access
to that information. Of course, in such cases, there is no to that information. Of course, in such cases, there is no
expectation that the overload control mechanism itself will expectation that the solution itself will help prevent
help prevent overload from that peer node. overload from that peer node.
REQ 30: The mechanism MUST NOT interfere with any Diameter compliant REQ 30: The solution MUST NOT interfere with any Diameter compliant
method that a node may use to protect itself from overload method that a node may use to protect itself from overload
from non-supporting nodes, or from denial of service from non-supporting nodes, or from denial of service
attacks. attacks.
REQ 31: There are multiple situations where a Diameter node may be REQ 31: There are multiple situations where a Diameter node may be
overloaded for some purposes but not others. For example, overloaded for some purposes but not others. For example,
this can happen to an agent or server that supports multiple this can happen to an agent or server that supports multiple
applications, or when a server depends on multiple external applications, or when a server depends on multiple external
resources, some of which may become overloaded while others resources, some of which may become overloaded while others
are fully available. The mechanism MUST allow Diameter are fully available. The solution MUST allow Diameter nodes
nodes to indicate overload with sufficient granularity to to indicate overload with sufficient granularity to allow
allow clients to take action based on the overloaded clients to take action based on the overloaded resources
resources without unreasonably forcing available capacity to without unreasonably forcing available capacity to go
go unused. The mechanism MUST support specification of unused. The solution MUST support specification of overload
overload information with granularities of at least information with granularities of at least "Diameter node",
"Diameter node", "realm", and "Diameter application", and "realm", and "Diameter application", and MUST allow
MUST allow extensibility for others to be added in the extensibility for others to be added in the future.
future.
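
The granularities named in REQ 31 can be pictured as scopes attached
to overload information. In the sketch below, a scope is a
(kind, value) pair, and a request is affected when any of its own
scopes is reported as overloaded. The representation, the field
names, and the example values are hypothetical; new scope kinds could
be added without changing the matching logic, which is one way to
read the extensibility requirement.

```python
from dataclasses import dataclass
from typing import Hashable, Set, Tuple

Scope = Tuple[str, Hashable]  # (kind, value), e.g. ("realm", "example.net")


@dataclass(frozen=True)
class Request:
    destination_host: str
    destination_realm: str
    application_id: int


def scopes_of(request: Request) -> Set[Scope]:
    """Scopes a request falls under: node, realm, and application."""
    return {
        ("node", request.destination_host),
        ("realm", request.destination_realm),
        ("application", request.application_id),
    }


def is_affected(request: Request, overloaded: Set[Scope]) -> bool:
    """True if any scope of the request is reported as overloaded."""
    return not scopes_of(request).isdisjoint(overloaded)


# Illustrative values only: one application on the server is overloaded.
overloaded = {("application", 4)}
busy = Request("hss1.example.net", "example.net", 4)
idle = Request("hss1.example.net", "example.net", 16777251)
```

Here only requests for the overloaded application are affected, so
capacity for the other application on the same node is not forced to
go unused.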
REQ 32: The mechanism MUST provide a method for extending the REQ 32: The solution MUST provide a method for extending the
information communicated and the algorithms used for information communicated and the algorithms used for
overload control. overload control.
REQ 33: The mechanism MUST provide a default algorithm that is REQ 33: The solution MUST provide a default algorithm that is
mandatory to implement. mandatory to implement.
REQ 34: The mechanism SHOULD provide a method for exchanging REQ 34: The solution SHOULD provide a method for exchanging overload
overload and load information between elements that are and load information between elements that are connected by
connected by intermediaries that do not support the intermediaries that do not support the solution.
mechanism.
8. IANA Considerations 8. IANA Considerations
This document makes no requests of IANA. This document makes no requests of IANA.
9. Security Considerations 9. Security Considerations
A Diameter overload control mechanism is primarily concerned with the A Diameter overload control mechanism is primarily concerned with the
load and overload related behavior of nodes in a Diameter network, load and overload related behavior of nodes in a Diameter network,
and the information used to affect that behavior. Load and overload and the information used to affect that behavior. Load and overload
skipping to change at page 24, line 14 skipping to change at page 24, line 5
information carried by Diameter is sent inappropriately. information carried by Diameter is sent inappropriately.
Note that the Diameter base specification [RFC6733] lacks end to end Note that the Diameter base specification [RFC6733] lacks end to end
security, making verifying the authenticity and ownership of load and security, making verifying the authenticity and ownership of load and
overload information difficult for non-adjacent nodes. overload information difficult for non-adjacent nodes.
Authentication of load and overload information helps to alleviate Authentication of load and overload information helps to alleviate
several of the security issues listed in this section. several of the security issues listed in this section.
This document includes requirements intended to mitigate the effects This document includes requirements intended to mitigate the effects
of attacks and to protect the information used by the mechanism. of attacks and to protect the information used by the mechanism.
This section discusses potential security considerations for overload
control solutions. This discussion provides the motivation for
several normative requirements described in Section 7. The
discussion includes specific references to the normative requirements
that apply for each issue.
9.1. Access Control 9.1. Access Control
To control the visibility of load and overload information, sending To control the visibility of load and overload information, sending
should be subject to some form of authentication and authorization of should be subject to some form of authentication and authorization of
the receiver. It is also important to the receivers that they are the receiver. It is also important to the receivers that they are
confident the load and overload information they receive is from a confident the load and overload information they receive is from a
legitimate source. Note that this implies a certain amount of legitimate source. REQ 28 requires the solution to work without
configurability on the nodes supporting the Diameter overload control assuming that all Diameter nodes in a network are trusted for the
mechanism. purposes of exchanging overload and load information. REQ 29
requires the solution to let nodes restrict unauthorized parties from
seeing overload information. Note that this implies a certain amount
of configurability on the nodes supporting the Diameter overload
control mechanism.
9.2. Denial-of-Service Attacks 9.2. Denial-of-Service Attacks
An overload control mechanism provides a very attractive target for An overload control mechanism provides a very attractive target for
denial-of-service attacks. A small number of messages may cause a denial-of-service attacks. A small number of messages may cause a
large service disruption by falsely reporting overload conditions. large service disruption by falsely reporting overload conditions.
Alternatively, attacking servers nearing, or in, overload may also be Alternatively, attacking servers nearing, or in, overload may also be
facilitated by disrupting their overload indications, potentially facilitated by disrupting their overload indications, potentially
preventing them from mitigating their overload condition. preventing them from mitigating their overload condition.
A design goal for the Diameter overload control mechanism is to A design goal for the Diameter overload control mechanism is to
minimize or eliminate the possibility of using the mechanism for this minimize or eliminate the possibility of using the mechanism for this
type of attack. type of attack. More strongly, REQ 27 forbids the solution from
introducing new vulnerabilities to malicious attack. Additionally,
REQ 30 stipulates that the solution not interfere with other
mechanisms used for protection against denial of service attacks.
As the intent of some denial-of-service attacks is to induce overload As the intent of some denial-of-service attacks is to induce overload
conditions, an effective overload control mechanism should help to conditions, an effective overload control mechanism should help to
mitigate the effects of such an attack. mitigate the effects of such an attack.
9.3. Replay Attacks 9.3. Replay Attacks
An attacker that has managed to obtain some messages from the An attacker that has managed to obtain some messages from the
overload control mechanism may attempt to affect the behavior of overload control mechanism may attempt to affect the behavior of
nodes supporting the mechanism by sending those messages at nodes supporting the mechanism by sending those messages at
potentially inopportune times. In addition to time shifting, replay potentially inopportune times. In addition to time shifting, replay
attacks may send messages to other nodes as well (target shifting). attacks may send messages to other nodes as well (target shifting).
A design goal for the Diameter overload control mechanism is to A design goal for the Diameter overload control solution is to
minimize or eliminate the possibility of causing disruption by using minimize or eliminate the possibility of causing disruption by using
a replay attack on the Diameter overload control mechanism. a replay attack on the Diameter overload control mechanism.
(Allowing a replay attack using the overload control solution would
violate REQ 27.)
9.4. Man-in-the-Middle Attacks 9.4. Man-in-the-Middle Attacks
By inserting themselves in between two nodes supporting the Diameter By inserting themselves in between two nodes supporting the Diameter
overload control mechanism, an attacker may potentially both access overload control mechanism, an attacker may potentially both access
and alter the information sent between those nodes. This can be used and alter the information sent between those nodes. This can be used
for information gathering for business intelligence and attack for information gathering for business intelligence and attack
targeting, as well as direct attacks. targeting, as well as direct attacks.
A design goal for the Diameter overload control mechanism is to REQs 27, 28, and 29 imply a need to prevent man-in-the-middle attacks
minimize or eliminate the possibility of causing disruption man-in- on the overload control solution. A transport using TLS and/or IPsec
the-middle attacks on the Diameter overload control mechanism. A may be desirable for this purpose.
transport using TLS and/or IPsec may be desirable for this.
9.5. Compromised Hosts 9.5. Compromised Hosts
A compromised host that supports the Diameter overload control A compromised host that supports the Diameter overload control
mechanism could be used for information gathering as well as for mechanism could be used for information gathering as well as for
sending malicious information to any Diameter node that would sending malicious information to any Diameter node that would
normally accept information from it. While it is beyond the scope of normally accept information from it. While it is beyond the scope of
the Diameter overload control mechanism to mitigate any operational the Diameter overload control mechanism to mitigate any operational
interruption to the compromised host, a reasonable design goal is to interruption to the compromised host, REQs 28 and 29 imply a need to
minimize the impact that a compromised host can have on other nodes minimize the impact that a compromised host can have on other nodes
through the use of the Diameter overload control mechanism. Of through the use of the Diameter overload control mechanism. Of
course, a compromised host could be used to cause damage in a number course, a compromised host could be used to cause damage in a number
of other ways. This is out of scope for a Diameter overload control of other ways. This is out of scope for a Diameter overload control
mechanism. mechanism.
10. References 10. References
10.1. Normative References 10.1. Normative References
 End of changes. 68 change blocks. 
256 lines changed or deleted 289 lines changed or added
