Internet Draft David Allan, Editor
Document: draft-ietf-mpls-oam-frmwk-05.txt Nortel Networks
Thomas D. Nadeau, Editor
Cisco Systems, Inc.
Category: Informational
Expires: May 2006 November 2005
A Framework for MPLS Operations
and Management (OAM)
Status of this Memo
By submitting this Internet-Draft, each author represents that
any applicable patent or other IPR claims of which he or she is
aware have been or will be disclosed, and any of which he or she
becomes aware will be disclosed, in accordance with Section 6 of
BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document is a framework for how data plane protocols can
be applied to operations and maintenance procedures for
Multi-Protocol Label Switching (MPLS). The document is structured to
outline how Operations and Management (OAM) functionality can be used
to assist in fault management, configuration, accounting,
performance management, and security, commonly known by the acronym
FCAPS.
Table of Contents
1. Introduction
2. Terminology
3. Fault Management
3.1 Fault detection
3.1.1 Enumeration and detection of types of data plane faults
MPLS Working Group Expires May 2006 [Page 1]
draft-ietf-mpls-oam-frmwk-05 December 6, 2005
3.1.2 Timeliness
3.2 Diagnosis
3.2.1 Characterization
3.2.2 Isolation
3.3 Availability
4. Configuration Management
5. Accounting Management
6. Performance Management
7. Security Management
8. IANA Considerations
9. Security Considerations
10. Intellectual Property Statement
11. Copyright Statement
12. Acknowledgments
13. References
13.1 Normative References
13.2 Informative References
14. Authors' Addresses
1. Introduction
This memo outlines in broader terms how data plane protocols
can assist in meeting the operations and management (OAM)
requirements outlined in [MPLSREQS] and [Y1710], and how they apply
to the management functions of fault, configuration, accounting,
performance, and security (commonly known as FCAPS) for MPLS networks
as defined in [RFC3031]. The approach of the document is to outline
the functionality, the potential mechanisms to provide that
functionality, and the required applicability of data plane OAM
functions. Included in the discussion are security issues specific
to the use of tools within a provider domain and to their use for
inter-provider LSPs.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
OAM Operations and Management
FCAPS Fault management, Configuration management,
Accounting management, Performance
management, and Security management
FEC Forwarding Equivalence Class
ILM Incoming Label Map
NHLFE Next Hop Label Forwarding Entry
MIB Management Information Base
LSR Label Switching Router
RTT Round Trip Time
3. Fault Management
3.1 Fault detection
Fault detection encompasses the identification of all data
plane failures between the ingress and egress of an LSP.
This section will enumerate common failure scenarios and
explain how one might (or might not) detect the situation.
3.1.1 Enumeration and detection of types of data plane faults
Lower layer faults:
Lower layer faults are those in the physical or virtual link
that impact the transport of MPLS labeled packets between
adjacent LSRs at the specific level of interest. Some physical
links (such as SONET/SDH) may have link layer OAM functionality
and detect and notify the LSR of link layer faults directly.
Some physical links (such as Ethernet) may not have this
capability and require MPLS or IP layer heartbeats to detect
failures. However, once a fault is detected, the reaction to it
is often the same as in the first case.
Node failures:
Node failures are those that impact the forwarding capability
of a node component, including its entire set of links. Causes
include component failure, power outage, and reset of the
control processor in an LSR employing a distributed
architecture.
MPLS LSP mis-forwarding:
Mis-forwarding occurs when there is a loss of synchronization
between the data and the control planes in one or more nodes.
This can occur due to hardware failure, software failure or
configuration problems.
It will manifest itself in one of two forms:
- packets belonging to a particular LSP are cross-connected
into an NHLFE for which there is no corresponding ILM at
the next downstream LSR. This can occur in cases where the
NHLFE entry is corrupted. Therefore the packet arrives at
the next LSR with a top label value for which the LSR has no
corresponding forwarding information, and is typically
dropped. This is a No Incoming Label Map (No ILM) condition
and can be detected directly by the downstream LSR which
receives the incorrectly labeled packet.
- packets belonging to a particular LSP are cross-connected
into an incorrect NHLFE entry for which there is a
corresponding ILM at the next downstream LSR, but which is
associated with a different LSP. This may be detected by
a number of means:
o some or all of the misdirected traffic is not routable
at the egress node, or
o OAM probing is able to detect the fault by detecting
the inconsistency between the data path and the control
plane state.
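The two forms of mis-forwarding can be illustrated with a minimal
sketch of what a downstream LSR could conclude on packet arrival.
This is not a specified mechanism: the IlmEntry type, the
classify_arrival function, and the idea that an OAM probe carries
the identity of the LSP it was sent on are assumptions made for
illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IlmEntry:
    """Hypothetical ILM entry: the LSP the control plane bound to a label."""
    lsp_id: str


def classify_arrival(ilm: Dict[int, IlmEntry],
                     top_label: int,
                     probe_lsp_id: Optional[str] = None) -> str:
    """Classify a packet arriving at the downstream LSR.

    probe_lsp_id is only set for OAM probes, which are assumed (for this
    sketch) to carry the identity of the LSP they were injected into.
    """
    entry = ilm.get(top_label)
    if entry is None:
        # First form: no forwarding information for the top label.
        return "no-ilm"
    if probe_lsp_id is not None and entry.lsp_id != probe_lsp_id:
        # Second form: the label is valid but belongs to a different LSP.
        return "mis-forwarded"
    # Ordinary traffic with a valid label cannot reveal the second form.
    return "ok"
```

Note that ordinary user traffic with a valid label is reported as
"ok" even when it has been mis-forwarded, which is why the second
form needs probing or an unroutable payload at the egress to be
noticed.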
Discontinuities in the MPLS Encapsulation
The forwarding path of the FEC carried by an LSP may transit
nodes or links for which MPLS is not configured. This may
result in a number of behaviors that are undesirable and not
easily detected:
- the exposed payload is not routable at the LSR, resulting
in silent discard, or
- the exposed MPLS label was not offered by the LSR, which may
result in either silent discard or mis-forwarding.
Alternatively, the payload may be routable and the packets
successfully delivered, but they bypass the associated MPLS
instrumentation and tools.
MTU problems
MTU problems occur when client traffic cannot be fragmented by
intermediate LSRs, and is dropped somewhere along the path of
the LSP. MTU problems should appear as a discrepancy in the
traffic count between the set of ingress LSRs and the egress
LSRs for a FEC and will appear in the corresponding MPLS MIB
performance tables in the transit LSRs as discarded packets.
TTL Mishandling
The implementation of TTL handling is inconsistent at
penultimate hop LSRs. Tools that rely on consistent TTL
processing may produce inconsistent results in any given
network.
Congestion
Congestion occurs when the offered load on any interface
exceeds the link capacity for sufficient time that the
interface buffering is exhausted. Congestion problems will
appear as a discrepancy in the traffic count between the set of
ingress LSRs and the egress LSRs for a FEC and will appear in
the MPLS MIB performance tables in the transit LSRs as
discarded packets.
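The counter discrepancy described above for MTU problems and
congestion can be sketched as a simple comparison. The function
names and the dictionary layout (LSR name mapped to a packet count
for one FEC over one sampling interval) are hypothetical; a real
system would read the counts from the MPLS MIB tables and would
have to allow for counters being sampled at slightly different
times.

```python
from typing import Dict, List


def fec_packet_discrepancy(ingress_counts: Dict[str, int],
                           egress_counts: Dict[str, int]) -> int:
    """Packets counted at the ingress LSRs of a FEC but not at its egress LSRs."""
    return sum(ingress_counts.values()) - sum(egress_counts.values())


def suspect_transit_lsrs(discard_counts: Dict[str, int]) -> List[str]:
    """Transit LSRs whose discard counters increased during the interval."""
    return sorted(lsr for lsr, n in discard_counts.items() if n > 0)


# Example: 1500 packets entered, 1450 left, and one transit LSR discarded 50.
lost = fec_packet_discrepancy({"pe1": 1000, "pe2": 500}, {"pe3": 1450})
suspects = suspect_transit_lsrs({"p1": 0, "p2": 50})
```

A non-zero discrepancy alone does not distinguish an MTU problem
from congestion; the discard counters only narrow down where the
loss occurred.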
Mis-ordering
Mis-ordering of LSP traffic occurs when incorrect or
inappropriate load sharing is implemented within an MPLS
network. Load sharing typically takes place when equal cost
paths exist between the ingress and egress of an LSP. In these
cases, traffic is split among these equal cost paths using a
variety of algorithms. One such algorithm relies on splitting
traffic between each path on a per-packet basis. When this is
done, it is possible for some packets along the path to be
delayed due to congestion or slower links, which may result in
packets being received out of order at the egress. Detection
and remedy of this situation may be left up to client
applications that use the LSPs. For instance, TCP is capable of
re-ordering packets belonging to a specific flow (although this
may result in re-transmission of some of the mis-ordered
packets).
Mis-ordering can also be detected by sending
probe traffic along the path and verifying that all probe
traffic is indeed received in the order it was transmitted.
This will only detect truly pathological problems, as
mis-ordering is typically not sufficiently predictable or
repeatable to be caught reliably by probes.
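A minimal sketch of the probe-based check, assuming each probe
carries a monotonically increasing sequence number assigned at the
ingress (the sequence number and the function name are assumptions
for illustration, not part of any specified probe format):

```python
from typing import Iterable, Optional


def count_reordered(received_seq: Iterable[int]) -> int:
    """Count probes that arrived after a probe with a higher sequence number.

    Lost probes leave gaps but are not counted as mis-ordered.
    """
    highest: Optional[int] = None
    reordered = 0
    for seq in received_seq:
        if highest is not None and seq < highest:
            reordered += 1
        else:
            highest = seq
    return reordered
```

For example, count_reordered([1, 2, 4, 3, 5]) reports one
mis-ordered probe, while a sequence with a lost probe such as
[1, 3, 4] reports none.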
LSRs do not normally implement mechanisms to detect
mis-ordering of flows.
Payload Corruption
Payload corruption may occur and be undetectable by LSRs. Such
errors are typically detected by client payload integrity
mechanisms.
3.1.2 Timeliness
The design of SLAs and management support systems requires that
ample headroom be allotted in terms of their processing capabilities
in order to process and handle all necessary fault conditions
within the bounds stipulated in the SLA. This includes planning for
event handling using a time budget that takes into account the
overall SLA and the time to address any defects that arise. However,
it is possible that some fault conditions may surpass this budget
due to their catastrophic nature (e.g., a fibre cut) or due to
incorrect planning of the time processing budget.
         ^    --------------
         |   |            ^
         |   |            |---- Time to notify NOC + process/correct
    SLA  |   |            v     defect
    Max  -   | -------------
    Time |   |            ^
         |   |            |----- Time to diagnose/isolate/correct
         |   |            v
         v    -------------
Figure 1: Fault Correction Budget
In Figure 1, we represent the overall fault correction time budget
by the maximum time as specified in an SLA for the service in
question. This time is then divided into two sections, the first
encompassing the total time required to detect a fault and notify an
operator (or optionally automatically correct the defect). This
section may have an explicit maximum time to detect defects arising
from either the application or a need to do alarm management (i.e.,
suppression), and this will be reflected in the frequency of OAM
execution. The second section indicates the time required to notify
the operational systems used to diagnose, isolate, and correct the
defect (if it cannot be corrected automatically).
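The relationship between the detection budget and the frequency of
OAM execution can be made concrete with a small worked example.
The sketch assumes, purely for illustration, that a defect is
declared after a fixed number of consecutive missed probes; that
rule, the function names, and the numbers are not taken from this
document.

```python
def max_probe_interval(detect_budget_s: float, misses_to_declare: int) -> float:
    """Largest probe interval that still fits the detection budget.

    Assumes a defect is declared after `misses_to_declare` consecutive
    missed probes, so worst-case detection takes roughly that many intervals.
    """
    if detect_budget_s <= 0 or misses_to_declare < 1:
        raise ValueError("budget must be positive and misses_to_declare >= 1")
    return detect_budget_s / misses_to_declare


def remaining_budget(sla_max_s: float, detect_notify_s: float) -> float:
    """Time left for diagnosis, isolation, and correction within the SLA."""
    remaining = sla_max_s - detect_notify_s
    if remaining < 0:
        raise ValueError("detection and notification exceed the SLA maximum")
    return remaining


# Example: a 3 s detection budget with 3 missed probes allows a 1 s interval;
# a 240 s SLA with 60 s spent on detection leaves 180 s for correction.
interval = max_probe_interval(3.0, 3)
left = remaining_budget(240.0, 60.0)
```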
3.2 Diagnosis
3.2.1 Characterization
Characterization is defined as determining the forwarding path of a
packet (which may not necessarily be known). Characterization may be
performed on a working path through the network. This is done, for
example, to determine ECMP paths, the MTU of a path, or simply to
know the path occupied by a specific FEC. Characterization will be
able to leverage mechanisms used for isolation.
3.2.2 Isolation
Isolation of a fault can occur in two forms. In the first case, the
failure is detected locally, and the node where the failure occurred
is capable of issuing an alarm for such an event. The node should
attempt to withdraw the defective resources and/or rectify the
situation prior to raising an alarm. Active data plane OAM
mechanisms may also detect the failure conditions remotely and issue
their own alarms if the situation is not rectified quickly enough.
In the second case, the fault has not been detected locally. In this
case, the local node cannot raise an alarm, nor can it be expected
to rectify the situation. Instead, the failure may be detected
remotely via data plane OAM. This mechanism should also be able to
determine the location of the fault, perhaps on the basis of limited
information such as a customer complaint. This mechanism may also be
able to automatically remove the defective resources from the
network and restore service, but should at least provide a network
operator with enough information to perform this
operation. Given that faults should be detected as
quickly as possible, tools that possess the ability to incrementally
test LSP health should be used to uncover faults.
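One reading of incremental testing is to probe the LSP one hop
further at a time until a probe fails, which localizes the fault to
a single hop. The sketch below takes that reading as an assumption;
probe_hop is a hypothetical callback standing in for whatever data
plane OAM probe is actually available, and is not a real API.

```python
from typing import Callable, Optional


def first_failing_hop(probe_hop: Callable[[int], bool],
                      max_hops: int) -> Optional[int]:
    """Return the first hop (1-based) whose probe fails, or None if all succeed.

    probe_hop(n) is assumed to return True when a probe that terminates
    at the n-th LSR along the LSP is answered correctly.
    """
    for hop in range(1, max_hops + 1):
        if not probe_hop(hop):
            return hop
    return None


# Example with a simulated path that is healthy up to and including hop 3.
fault_at = first_failing_hop(lambda hop: hop <= 3, max_hops=6)
```

A linear scan keeps the sketch simple; a binary search over the hop
count would need fewer probes on long paths, at the cost of assuming
that every hop beyond the fault also fails.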
3.3 Availability
Availability is the measure of the percentage of time that a service
is operating within specification, often specified by an SLA.
MPLS has several forwarding modes (depending on the control plane
used). As such, more than one availability model may be defined,
and more than one measurement technique may be required.
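For a single service and a single measurement technique, the
definition above reduces to a ratio. The following sketch shows the
arithmetic only; how the in-specification time is measured is
exactly what varies by forwarding mode and is not addressed here.
The function name and the example figures are illustrative.

```python
def availability_percent(in_spec_seconds: float, total_seconds: float) -> float:
    """Percentage of the measurement period spent within specification."""
    if total_seconds <= 0:
        raise ValueError("measurement period must be positive")
    if not 0 <= in_spec_seconds <= total_seconds:
        raise ValueError("in-spec time must lie within the measurement period")
    return 100.0 * in_spec_seconds / total_seconds


# Example: 864 s out of specification in a 30-day month.
month_s = 30 * 24 * 3600
monthly = availability_percent(month_s - 864, month_s)
```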
4. Configuration Management
Data plane OAM can assist in configuration management by providing
the ability to verify the configuration of an LSP or of applications
utilizing that LSP. This would be an ad hoc data plane probe
that should verify both path integrity (a complete path exists) and
that the path function is synchronized with the
control plane. The probe would carry, as part of its payload, relevant
control plane information that the receiver would be able to compare
with its local control plane configuration.
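The comparison performed by the receiver can be sketched as a
field-by-field check. The field names ("fec", "lsp_id",
"egress_id") and the flat dictionary representation are
hypothetical and chosen only to make the example self-contained;
they do not describe any defined probe format.

```python
from typing import Dict, List


def config_mismatches(probe_claims: Dict[str, str],
                      local_config: Dict[str, str]) -> List[str]:
    """Names of fields carried in the probe that disagree with local state.

    A field that the probe carries but the receiver does not know about
    is also reported as a mismatch.
    """
    return sorted(name for name, value in probe_claims.items()
                  if local_config.get(name) != value)


# Example: the probe and the receiver disagree on which LSP this is.
diff = config_mismatches(
    {"fec": "10.0.0.0/24", "lsp_id": "lsp-a"},
    {"fec": "10.0.0.0/24", "lsp_id": "lsp-b", "egress_id": "pe3"},
)
```

An empty result indicates that the data path delivered the probe to
a receiver whose control plane agrees with what the sender believed;
receipt of the probe at all is what demonstrates path integrity.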
5. Accounting Management
The requirements for accounting in MPLS networks as specified in
[MPLSREQS] do not place any requirements on data plane OAM.
6. Performance Management
Performance management permits the information transfer
characteristics of LSPs to be measured, perhaps in order to
compare against an SLA. These characteristics fall into two
categories: latency (where jitter is considered a variation in
latency) and information loss.
Latency can be measured in two ways: one is to have precisely
synchronized clocks at the ingress and egress such that time-stamps
in PDUs flowing from the ingress to the egress can be compared. The
other is to use an exchange of PING type PDUs that gives a round
trip time (RTT) measurement, from which an estimate of the one way
latency can be inferred with some loss of precision. Use of load
spreading techniques such as ECMP means that any individual RTT
measurement is only representative of the typical RTT for a FEC.
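As a rough illustration of the second method, the sketch below
halves the median of several RTT samples. Halving assumes that the
forward and return paths have similar delay, which is an assumption
of this example and is one source of the loss of precision noted
above; the median is used so that a sample that happened to take an
unusual ECMP path does not dominate the estimate.

```python
from statistics import median
from typing import Sequence


def estimate_one_way_latency(rtt_samples_ms: Sequence[float]) -> float:
    """Estimate one-way latency as half of the median round trip time."""
    if not rtt_samples_ms:
        raise ValueError("at least one RTT sample is required")
    return median(rtt_samples_ms) / 2.0


# Example: one outlier sample does not shift the estimate.
one_way_ms = estimate_one_way_latency([10.0, 12.0, 50.0])
```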
To measure information loss, a common practice is to periodically
read ingress and egress counters (i.e., MIB module counters). This
information may also be used for offline correlation. Another common
practice is to send explicit probe traffic that traverses the data
plane path in question. This probe traffic can also be used to
measure jitter and delay.
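A minimal sketch of deriving loss and jitter from probe traffic
follows. Jitter is computed here as the mean absolute difference
between the delays of consecutive probes, which is one common
convention and an assumption of this example rather than a
definition given in this document.

```python
from typing import Sequence


def probe_loss_ratio(sent: int, received: int) -> float:
    """Fraction of probes that were sent but never received."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return (sent - received) / sent


def mean_jitter_ms(delays_ms: Sequence[float]) -> float:
    """Mean absolute delay difference between consecutive probes."""
    if len(delays_ms) < 2:
        raise ValueError("at least two delay samples are required")
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)


# Example: 3 of 100 probes lost; delays of 10, 12, and 11 ms.
loss = probe_loss_ratio(100, 97)
jitter = mean_jitter_ms([10.0, 12.0, 11.0])
```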
7. Security Management
Providing a secure OAM environment is required if MPLS-specific
network mechanisms are to be used successfully. To this end,
operators have a number of options when deploying network mechanisms,
including simply filtering OAM messages at the edge of the MPLS
network. Malicious users should not be able to use non-MPLS
interfaces to insert MPLS-specific OAM transactions. It should be
possible to block provider-initiated OAM transactions from leaking
outside the MPLS cloud.
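The two filtering goals above can be expressed as a small policy
check at an edge LSR. This is a sketch under simplifying
assumptions: the EdgePacket type is hypothetical, and the is_oam
flag is assumed to come from some classifier that is outside the
scope of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgePacket:
    """Hypothetical view of a packet as seen by an edge filter."""
    is_oam: bool               # result of an assumed OAM classifier
    from_mpls_interface: bool  # received on an MPLS-enabled interface
    to_mpls_interface: bool    # would be sent on an MPLS-enabled interface


def edge_permits(pkt: EdgePacket) -> bool:
    """Apply the two edge-filtering rules to OAM traffic only."""
    if not pkt.is_oam:
        return True   # this sketch does not police non-OAM traffic
    if not pkt.from_mpls_interface:
        return False  # OAM inserted via a non-MPLS interface
    if not pkt.to_mpls_interface:
        return False  # provider OAM about to leak outside the MPLS cloud
    return True
```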
Finally, if a provider does wish to allow OAM messages to flow into
(or through) its network, for example, in a multi-provider
deployment, authentication and authorization are required to prevent
malicious and/or unauthorized access. Also, given that MPLS networks
often run IP simultaneously, similar requirements apply to any
native IP OAM network mechanisms in use. Therefore, authentication
and authorization for OAM technologies MUST be
considered when designing network mechanisms that satisfy the
framework presented in this document.
OAM messaging can address some existing security concerns with the
MPLS architecture. That is, through rigorous defect handling,
operators can offer their customers a greater degree of integrity
protection against incorrect delivery of their traffic (for example,
by being able to detect LSP traffic leaking from a VPN).
Support for inter-provider data plane OAM messaging introduces a
number of security concerns. By definition, portions of such LSPs
will not be within a single provider's network, so the provider has
no control over who may inject traffic into the LSP, which can be
exploited for denial of service attacks. OAM PDUs are not
explicitly identified in the MPLS header and therefore are not
typically inspected by transit LSRs. This creates an opportunity for
malicious or poorly behaved users to disrupt network operations.
Attempts to introduce filtering on target LSP OAM flows may be
problematic if flows are not visible to intermediate LSRs. However,
it may be possible to interdict flows on the return path between
providers (as faithfulness to the forwarding path is not a return
path requirement) to mitigate aspects of this vulnerability.
OAM tools may permit unauthorized or malicious users to extract
significant amounts of information about network configuration. This
would be especially true of IP-based tools, as in many network
configurations MPLS does not typically extend to untrusted hosts,
but IP does. For example, TTL hiding at ingress and egress LSRs will
prevent external users from using TTL-based mechanisms to probe an
operator's network. This suggests that tools used for problem
diagnosis, or which by design are capable of extracting significant
amounts of information, will require authentication and authorization
of the originator. This may impact the scalability of such tools
when employed for monitoring instead of diagnosis.
8. IANA Considerations
This document does not contain any IANA considerations.
9. Security Considerations
This document describes a framework for MPLS Operations and
Management. Although this document discusses and addresses some
security concerns in section 7 above, it does not introduce any
new security concerns.
10. Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
11. Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
12. Acknowledgments
The editors would like to thank Monique Morrow from Cisco Systems,
and Harmen van Der Linde from AT&T for their valuable review comments
on this document.
13. References
13.1 Normative References
[RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3031] Rosen, E., Viswanathan, A., and R. Callon,
"Multiprotocol Label Switching Architecture", RFC
3031, January 2001.
[MPLSREQS] Nadeau, et al., "OAM Requirements for MPLS Networks",
draft-ietf-mpls-oam-requirements-05.txt, November 2004.
[Y1710] ITU-T Recommendation Y.1710 (2002), "Requirements for OAM
Functionality for MPLS Networks".
13.2 Informative References
14. Authors' Addresses
David Allan
Nortel Networks Phone: +1-613-763-6362
3500 Carling Ave. Email: dallan@nortelnetworks.com
Ottawa, Ontario, CANADA
Thomas D. Nadeau
Cisco Systems Phone: +1-978-936-1470
300 Beaver Brook Drive Email: tnadeau@cisco.com
Boxborough, MA 01824