draft-ietf-mpls-oam-frmwk-00.txt   draft-ietf-mpls-oam-frmwk-01.txt 
Internet Draft David Allan, Editor Internet Draft David Allan, Editor
Document: draft-ietf-mpls-oam-frmwk-00.txt Nortel Networks Document: draft-ietf-mpls-oam-frmwk-01.txt Nortel Networks
Thomas D. Nadeau, Editor Thomas D. Nadeau, Editor
Cisco Systems, Inc. Cisco Systems, Inc.
Category: Informational Category: Informational
Expires: May 2005 November 2004 Expires: May 2005 November 2004
A Framework for MPLS Operations A Framework for MPLS Operations
and Management (OAM) and Management (OAM)
Status of this Memo Status of this Memo
skipping to change at page 2, line 27 skipping to change at page 2, line 27
7. Security.......................................................6 7. Security.......................................................6
8. Full Copyright Statement.......................................7 8. Full Copyright Statement.......................................7
9. Intellectual Property Rights Notices...........................7 9. Intellectual Property Rights Notices...........................7
10. References.....................................................7 10. References.....................................................7
11. Editors Address................................................8 11. Editors Address................................................8
1. Introduction and Scope 1. Introduction and Scope
This memo outlines in broader terms how data plane OAM functionality This memo outlines in broader terms how data plane OAM functionality
can assist in meeting the operations and management (OAM) can assist in meeting the operations and management (OAM)
requirements outlined in [REQ] and can apply to the operational requirements outlined in [MPLSREQS] and can apply to the operational
functions of fault, configuration, accounting, performance and functions of fault, configuration, accounting, performance and
security (commonly known as FCAPS). The approach of the document is security (commonly known as FCAPS) for MPLS networks as defined in
to outline the requisite functionality, the potential mechanisms to [RFC3031]. The approach of the document is
to outline the required functionality, the potential mechanisms to
provide the function and the applicability of data plane OAM provide the function and the applicability of data plane OAM
functions. functions.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119. document are to be interpreted as described in RFC 2119.
OAM Operations and Management OAM Operations and Management
FCAPS Fault, Administration, Configuration, FCAPS Fault, Configuration, Administration,
Provisioning, and Security Provisioning, and Security
ILM Incoming Label Map ILM Incoming Label Map
NHLFE Next Hop Label Forwarding Entry NHLFE Next Hop Label Forwarding Entry
MIB Management Information Base MIB Management Information Base
LSR Label Switching Router LSR Label Switching Router
RTT Round Trip Time RTT Round Trip Time
3. Fault Management 3. Fault Management
3.1 Fault detection 3.1 Fault detection
Fault detection encompasses identifying all causes of failure to Fault detection encompasses identifying all causes of failure to
transfer information between the ingress and egress of an LSP transfer information between the ingress and egress of an LSP.
ingress. This section will enumerate common failure scenarios and This section will enumerate common failure scenarios and
explain how one might (or might not) detect the situation. explain how one might (or might not) detect the situation.
3.1.1 Enumeration and detection of types of data plane faults 3.1.1 Enumeration and detection of types of data plane faults
Physical layer faults: Physical layer faults:
Lower layer faults are those that impact the physical layer or Lower layer faults are those that impact the physical layer or
link layer that transports MPLS between adjacent LSRs. Some link layer that transports MPLS labeled packets between
physical links (such as SONET/SDH) may have link layer OAM adjacent LSRs. Some physical links (such as SONET/SDH) may
functionality and detect and notify the LSR of link layer have link layer OAM functionality and detect and notify the
faults directly. Some physical links (such as Ethernet) may not LSR of link layer faults directly. Some physical links (such
have this capability and require MPLS or IP layer heartbeats to as Ethernet) may not have this capability and require MPLS or
detect failures. However, once detected, reaction to these IP layer heartbeats to detect failures. However, once detected,
fault notifications is often the same as those described in the reaction to these fault notifications is often the same as
first case. those described in the first case.
Node failures: Node failures:
Node failures are those that impact the forwarding capability Node failures are those that impact the forwarding capability
of an entire node, including its entire set of links. This can of a node component, including its entire set of links. This
be due to component failure, power outage, or reset of control can be due to component failure, power outage, or reset of
processor in an LSR employing a distributed architecture, etc. control processor in an LSR employing a distributed
architecture, etc.
MPLS LSP misbranching: MPLS LSP misbranching:
Misbranching occurs when there is a loss of synchronization Misbranching occurs when there is a loss of synchronization
between the data and the control planes. This can occur due to between the data and the control planes in one or more nodes.
hardware failure, software failure or configuration problems. This can occur due to hardware failure, software failure or
configuration problems.
It will manifest itself in one of two forms: It will manifest itself in one of two forms:
- packets belonging to a particular LSP are cross connected - packets belonging to a particular LSP are cross connected
into an NHLFE for which there is no corresponding ILM at into an NHLFE for which there is no corresponding ILM at
the next downstream LSR. This can occur in cases where the the next downstream LSR. This can occur in cases where the
NHLFE entry is corrupted. Therefore the packet arrives at NHLFE entry is corrupted. Therefore the packet arrives at
the next LSR with a top label value for which the LSR has no the next LSR with a top label value for which the LSR has no
corresponding forwarding information, and is typically corresponding forwarding information, and is typically
dropped. This is a No Incoming Label Map (ILM) condition and dropped. This is a No Incoming Label Map (ILM) condition and
can be detected directly by the downstream LSR which can be detected directly by the downstream LSR which
receives the incorrectly labeled packet. receives the incorrectly labeled packet.
- packets belonging to a particular LSP are cross connected - packets belonging to a particular LSP are cross connected
into an incorrect NHLFE entry for which there is a into an incorrect NHLFE entry for which there is a
corresponding ILM at the next downstream LSR, but which is corresponding ILM at the next downstream LSR, but which is
associated with a different LSP. This may be detected by associated with a different LSP. This may be detected by
a number of means: a number of means:
o some or all of the misdirected traffic is not routable o some or all of the misdirected traffic is not routable
at the egress node. at the egress node.
o Or OAM probing is able to detect the fault by detecting o Or OAM probing is able to detect the fault by detecting
the inconsistency between the path and the control the inconsistency between the path and the control
plane. plane.
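As an illustration only, the two misbranching forms above can be told apart at the downstream LSR roughly as follows. This is a minimal sketch, not part of the draft: the `IlmEntry` type, the `classify_arrival` function and the idea that an OAM probe carries the identity of the LSP it was sent on are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IlmEntry:
    """Hypothetical ILM entry: the LSP the control plane bound to a label."""
    lsp_id: str
    out_label: int


def classify_arrival(ilm: Dict[int, IlmEntry], top_label: int,
                     expected_lsp: Optional[str] = None) -> str:
    """Classify a packet arriving with `top_label` at a downstream LSR.

    `expected_lsp` is only known when the packet is an OAM probe that
    carries the identity of the LSP it was sent on (an assumption here).
    """
    entry = ilm.get(top_label)
    if entry is None:
        # First form: no forwarding information for this label (No ILM).
        return "no-ilm"
    if expected_lsp is not None and entry.lsp_id != expected_lsp:
        # Second form: the label is valid but belongs to a different LSP.
        return "misbranched"
    return "ok"
```

Ordinary data packets carry no LSP identity, so in this sketch only the No ILM form is visible for them. The second form needs a probe, or shows up indirectly as unroutable traffic at the egress.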
Discontinuities in the MPLS Encapsulation Discontinuities in the MPLS Encapsulation
The forwarding path of the FEC carried by an LSP may transit The forwarding path of the FEC carried by an LSP may transit
nodes for which MPLS is not configured. This may result in a nodes for which MPLS is not configured. This may result in a
number of behaviors (most undesirable). When there was only one number of behaviors which are undesirable and not easily
label in the stack and the payload was IP, IP forwarding will detected. For example, if there is only one label in the stack
direct the packet to the correct interface. This would be the of a packet's MPLS encapsulation, and the payload is IP, the
same if PHP is employed. Packets with a label stack will be MPLS header may be removed prematurely at a node not
discarded (Tom: can you confirm this for your end). configured for MPLS forwarding on an outgoing interface. In
this case, the MPLS header would be popped (instead of
swapped) because there would be no outgoing label mapping due
to the outgoing line card not having MPLS enabled. At this
point, if the egress interface is configured for IP forwarding
and has a routing entry that matches the packet's destination,
the packet may still be successfully delivered
to the correct destination router. This scenario is not easily
detectable by the ends of the LSP since traffic is indeed
delivered.
MTU problems MTU problems
MTU problems occur when client traffic cannot be fragmented by MTU problems occur when client traffic cannot be fragmented by
intermediate LSRs, and is dropped somewhere along the path of intermediate LSRs, and is dropped somewhere along the path of
the LSP. MTU problems should appear as a discrepancy in the the LSP. MTU problems should appear as a discrepancy in the
traffic count between the set of ingresses and the egresses for traffic count between the set of ingresses and the egresses for
a FEC and will appear in the corresponding MIB performance a FEC and will appear in the corresponding MPLS MIB performance
tables in the transit LSRs as discarded packets. tables in the transit LSRs as discarded packets.
TTL Mishandling TTL Mishandling
Some penultimate hop LSRs may inconsistently process TTL expiry Some penultimate hop LSRs may inconsistently process TTL expiry
and propagation. In these cases, it is and propagation. In these cases, it is
possible for tools that rely on consistent processing to fail. possible for tools that rely on consistent processing to fail.
Congestion Congestion
Congestion occurs when the offered load on any interface Congestion occurs when the offered load on any interface
exceeds the link capacity for sufficient time that the exceeds the link capacity for sufficient time that the
interface buffering is exhausted. Congestion problems will interface buffering is exhausted. Congestion problems will
appear as a discrepancy in the traffic count between the set of appear as a discrepancy in the traffic count between the set of
ingresses and the egresses for a FEC and will appear in the MIB ingresses and the egresses for a FEC and will appear in the
performance tables in the transit LSRs as discarded packets. MPLS MIB performance tables in the transit LSRs as discarded
packets.
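Both MTU problems and congestion are described above as a discrepancy between ingress and egress traffic counts for a FEC, matched by discards recorded in transit LSRs. A rough sketch of that comparison follows. The function names and the use of plain lists of packet counts are assumptions for illustration; how the counters are read from MIB tables is not shown.

```python
from typing import Sequence


def fec_packet_discrepancy(ingress_counts: Sequence[int],
                           egress_counts: Sequence[int]) -> int:
    """Packets that entered the FEC at any ingress but left at no egress."""
    return sum(ingress_counts) - sum(egress_counts)


def matches_transit_discards(ingress_counts: Sequence[int],
                             egress_counts: Sequence[int],
                             transit_discards: Sequence[int],
                             tolerance: int = 0) -> bool:
    """True if the ingress/egress discrepancy is accounted for by the
    discards recorded at transit LSRs, within `tolerance` packets."""
    missing = fec_packet_discrepancy(ingress_counts, egress_counts)
    return abs(missing - sum(transit_discards)) <= tolerance
```

Counters read at different instants will never agree exactly, which is why the sketch takes a tolerance. A discrepancy that transit discards do not explain points to some other fault class.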
Misordering Misordering
Misordering of LSP traffic occurs when incorrect or Misordering of LSP traffic occurs when incorrect or
inappropriate load sharing is implemented within an MPLS inappropriate load sharing is implemented within an MPLS
network. Load sharing typically takes place when equal cost network. Load sharing typically takes place when equal cost
paths exist between the ingress and egress of an LSP. In these paths exist between the ingress and egress of an LSP. In these
cases, traffic is split among these equal cost paths using a cases, traffic is split among these equal cost paths using a
variety of algorithms. One such algorithm relies on splitting variety of algorithms. One such algorithm relies on splitting
traffic between each path on a per-packet basis. When this is traffic between each path on a per-packet basis. When this is
done, it is possible for some packets along the path to be done, it is possible for some packets along the path to be
skipping to change at page 5, line 18 skipping to change at page 5, line 33
LSRs do not normally implement mechanisms to detect misordering LSRs do not normally implement mechanisms to detect misordering
of flows. of flows.
Payload Corruption Payload Corruption
Payload corruption may occur and be undetectable by LSRs. Such Payload corruption may occur and be undetectable by LSRs. Such
errors are typically detected by client payload integrity errors are typically detected by client payload integrity
mechanisms. mechanisms.
3.1.2 Timeliness 3.1.2 Timeliness
The design of SLAs and systems requires that ample headroom be The design of SLAs and management support systems requires that
allotted in terms of their processing capabilities in order to ample headroom be allotted in terms of their processing capabilities
process and handle all necessary fault conditions within the in order to process and handle all necessary fault conditions
bounds stipulated in the SLA. This includes planning for event within the bounds stipulated in the SLA. This includes planning for
handling using a time budget which takes into account the overall SLA event handling using a time budget which takes into account the
and time to address any defects which arise. However, it is overall SLA and time to address any defects which arise. However,
possible that some fault conditions may surpass this budget due to it is possible that some fault conditions may surpass this budget
their catastrophic nature (e.g., fibre cut) or due to misplanning due to their catastrophic nature (e.g., fibre cut) or due to
of the time processing budget. misplanning of the time processing budget.
^ -------------- ^ --------------
| | ^ | | ^
| | |---- Time to notify NOC + process/correct | | |---- Time to notify NOC + process/correct
SLA | | v defect SLA | | v defect
Max - | ------------- Max - | -------------
Time | | ^ Time | | ^
| | |----- Time to detect/diagnose fault | | |----- Time to diagnose/isolate/correct
| | v | | v
v ------------- v -------------
Figure 1: Fault Correction Budget Figure 1: Fault Correction Budget
In figure 1, we represent the overall fault correction time budget In figure 1, we represent the overall fault correction time budget
by the maximum time as specified in an SLA for the service in by the maximum time as specified in an SLA for the service in
question. This time is then divided into two subsections, the first question. This time is then divided into two subsections, the first
encompassing the total time required to detect a fault and notify an encompassing the total time required to detect a fault and notify an
operator (or optionally automatically correct the defect). This operator (or optionally automatically correct the defect). This
section may have an explicit maximum time to detect defects arising section may have an explicit maximum time to detect defects arising
from either the application or a need to do alarm management (i.e.: from either the application or a need to do alarm management (i.e.:
suppression) and this will be reflected in the frequency of OAM suppression) and this will be reflected in the frequency of OAM
execution. The second section indicates the time required to notify execution. The second section indicates the time required to notify
the operational systems used to diagnose and correct the defect the operational systems used to diagnose, isolate and correct the
(if they cannot be corrected automatically). defect (if they cannot be corrected automatically).
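The budget split in figure 1 can be sketched as simple arithmetic. The example below is illustrative only. It assumes that a fault is declared after a fixed number of consecutive missed OAM probes; that rule, and all function and parameter names, are assumptions and are not taken from this memo.

```python
def detect_and_notify_budget(sla_max_s: float,
                             diagnose_and_correct_s: float) -> float:
    """Time left for detection and notification once the time needed to
    diagnose and correct the defect is subtracted from the SLA maximum."""
    budget = sla_max_s - diagnose_and_correct_s
    if budget <= 0:
        raise ValueError("SLA leaves no time to detect the fault")
    return budget


def max_probe_interval(budget_s: float, notify_s: float,
                       missed_probes_to_declare: int) -> float:
    """Longest OAM probe interval that still fits the detection budget,
    assuming a fault is declared after N consecutive missed probes."""
    if missed_probes_to_declare < 1:
        raise ValueError("at least one missed probe is required")
    detect_s = budget_s - notify_s
    if detect_s <= 0:
        raise ValueError("notification alone exhausts the budget")
    return detect_s / missed_probes_to_declare
```

For example, a 60 second SLA with 45 seconds reserved for diagnosis and correction leaves 15 seconds. With 3 seconds for notification and three missed probes to declare a fault, probes need to run at least every 4 seconds.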
3.2 Diagnosis 3.2 Diagnosis
3.2.1 Characterization 3.2.1 Characterization
Characterization is defined as determining the forwarding path of a Characterization is defined as determining the forwarding path of a
packet (which may not be necessarily known). Characterization may be packet (which may not be necessarily known). Characterization may be
performed on a working path through the network. This is done for performed on a working path through the network. This is done for
example, to determine ECMP paths, the MTU of a path, or simply to example, to determine ECMP paths, the MTU of a path, or simply to
know the path occupied by a specific FEC. Characterization will be know the path occupied by a specific FEC. Characterization will be
skipping to change at page 6, line 45 skipping to change at page 7, line 11
operation. Given that detection of faults is desired to happen as operation. Given that detection of faults is desired to happen as
quickly as possible, tools which possess the ability to incrementally quickly as possible, tools which possess the ability to incrementally
test LSP health should be used to uncover faults. test LSP health should be used to uncover faults.
3.3 Availability 3.3 Availability
Availability is the measure of the percentage of time that a service Availability is the measure of the percentage of time that a service
is operating within specification, often specified by an SLA. is operating within specification, often specified by an SLA.
MPLS has several forwarding modes (depending on the control plane MPLS has several forwarding modes (depending on the control plane
used). As such more than one availability models may be defined. used). As such more than one model may be defined.
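As one possible reading of the definition above, availability over a measurement period can be computed as shown below. This is a sketch under the assumption that out-of-specification time is already known for the period. The memo notes that more than one model may be defined, and this is only the simplest one.

```python
def availability_percent(period_s: float, out_of_spec_s: float) -> float:
    """Percentage of the measurement period during which the service
    operated within specification."""
    if period_s <= 0:
        raise ValueError("measurement period must be positive")
    if not 0 <= out_of_spec_s <= period_s:
        raise ValueError("out-of-spec time must lie within the period")
    return 100.0 * (period_s - out_of_spec_s) / period_s
```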
4. Configuration Management 4. Configuration Management
Data plane OAM can assist in configuration management by providing Data plane OAM can assist in configuration management by providing
the ability to verify configuration of an LSP or of applications the ability to verify the configuration of an LSP or of applications
that may utilize that LSP. This would be an ad-hoc data plane probe utilizing that LSP. This would be an ad-hoc data plane probe
that should both verify path integrity (a complete path exists) as that should both verify path integrity (a complete path exists) as
well as verifying that the path function is synchronized with the well as verifying that the path function is synchronized with the
control plane. The probe would carry as part of the payload relevant control plane. The probe would carry as part of the payload relevant
control plane information that the receiver would be able to compare control plane information that the receiver would be able to compare
with the local control plane configuration. with the local control plane configuration.
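The comparison performed by the receiver of such a probe might look like the following sketch. The field names (`fec`, `lsp_id`) and the dictionary representation of control plane state are hypothetical. The memo does not define a probe format.

```python
from typing import Dict, List


def verify_probe(probe_info: Dict[str, str],
                 local_config: Dict[str, str]) -> List[str]:
    """Return the names of fields carried in the probe that are missing
    from, or differ from, the receiver's local control plane state.
    An empty list means the data path and control plane agree."""
    return sorted(
        name for name, value in probe_info.items()
        if name not in local_config or local_config[name] != value
    )
```

Arrival of the probe at all confirms path integrity. An empty result additionally indicates that the path is synchronized with the control plane for the fields checked.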
5. Accounting 5. Accounting
The requirements for accounting as specified in [MPLSREQS] do not The requirements for accounting in MPLS networks as specified in
place any requirements on data plane OAM. [MPLSREQS] do not place any requirements on data plane OAM.
6. Performance measurement 6. Performance measurement
Performance measurement permits the information transfer Performance measurement permits the information transfer
characteristics of LSPs to be measured, perhaps in order to characteristics of LSPs to be measured, perhaps in order to
compare against an SLA. This falls into two categories, latency compare against an SLA. This falls into two categories, latency
(where jitter is considered a variation in latency) and information (where jitter is considered a variation in latency) and information
loss. loss.
Latency can be measured in two ways: one is to have precisely Latency can be measured in two ways: one is to have precisely
skipping to change at page 9, line 26 skipping to change at page 9, line 39
Copyright (C) The Internet Society (2004). This document is subject Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights. except as set forth therein, the authors retain all their rights.
12. Acknowledgment 12. Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is currently provided by the
Internet Society. Internet Society.
The editors would like to thank Monique Morrow from Cisco Systems,
and Harmen van Der Linde from AT&T for their valuable review comments
on this document.
13. References 13. References
13.1 Normative References 13.1 Normative References
13.2 Informative References 13.2 Informative References
[RFC3031] Rosen, E., Viswanathan, A., and R. Callon, [RFC3031] Rosen, E., Viswanathan, A., and R. Callon,
"Multiprotocol Label Switching Architecture", RFC "Multiprotocol Label Switching Architecture", RFC
3031, January 2001. 3031, January 2001.
[ALLAN] Allan, D., "Guidelines for MPLS Load Balancing", [ALLAN] Allan, D., "Guidelines for MPLS Load Balancing",
draft-allan-mpls-loadbal-05.txt, IETF work in progress, draft-allan-mpls-loadbal-05.txt, IETF work in progress,
October 2003 October 2003
[MPLSREQS] Nadeau et.al., "OAM Requirements for MPLS Networks", [MPLSREQS] Nadeau et.al., "OAM Requirements for MPLS Networks",
draft-ietf-mpls-oam-requirements-01.txt, June 2003 draft-ietf-mpls-oam-requirements-05.txt, November 2004
[Y1710] ITU-T Recommendation Y.1710(2002), "Requirements for OAM [Y1710] ITU-T Recommendation Y.1710(2002), "Requirements for OAM
Functionality for MPLS Networks" Functionality for MPLS Networks"
14. Editors' Address 14. Editors' Address
David Allan David Allan
Nortel Networks Phone: +1-613-763-6362 Nortel Networks Phone: +1-613-763-6362
3500 Carling Ave. Email: dallan@nortelnetworks.com 3500 Carling Ave. Email: dallan@nortelnetworks.com
Ottawa, Ontario, CANADA Ottawa, Ontario, CANADA
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/