--- 1/draft-ietf-mpls-oam-frmwk-01.txt 2006-02-05 00:42:04.000000000 +0100
+++ 2/draft-ietf-mpls-oam-frmwk-02.txt 2006-02-05 00:42:04.000000000 +0100
@@ -1,16 +1,17 @@
+
 Internet Draft                                   David Allan, Editor
-Document: draft-ietf-mpls-oam-frmwk-01.txt           Nortel Networks
+Document: draft-ietf-mpls-oam-frmwk-02.txt           Nortel Networks
                                             Thomas D. Nadeau, Editor
                                                  Cisco Systems, Inc.
 Category: Informational
-Expires: May 2005                                      November 2004
+Expires: July 2005                                      January 2005

        A Framework for MPLS Operations and Management (OAM)

 Status of this Memo

    By submitting this Internet-Draft, we certify that any applicable
    patent or other IPR claims of which we are aware have been
    disclosed, or will be disclosed, and any of which we become aware
    will be disclosed, in accordance with RFC 3668.

@@ -33,57 +34,61 @@
    Internet-Drafts as reference material or to cite them other than as
    "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

 Abstract

-   This document is a framework for how data plane OAM functions can be
-   applied to operations and maintenance procedures. The document is
-   structured to outline how OAM functionality can be used to assist in
-   fault management, configuration, accounting, performance management
-   and security, commonly known by the acronym FCAPS.
+   This document is a framework for how data plane protocols can
+   be applied to operations and maintenance procedures for MPLS. The
+   document is structured to outline how OAM functionality can be used
+   to assist in fault management, configuration, accounting,
+   performance management and security, commonly known by the acronym
+   FCAPS.

 Table of Contents

    1. Introduction and Scope ........................................2
    2. Terminology....................................................2
-   3.
Fault Management...............................................2
-   3.1 Fault detection...............................................2
+   3. Fault Management...............................................3
+   3.1 Fault detection...............................................3
    3.1.1 Enumeration and detection of types of data plane faults.....3
    3.1.2 Timeliness..................................................5
-   3.2 Diagnosis.....................................................5
-   3.2.1 Characterization............................................5
-   3.2.2 Isolation...................................................5
-   3.3 Availability..................................................5
-   4. Configuration Management.......................................5
-   5. Accounting.....................................................6
-   6. Performance measurement........................................6
-   7. Security.......................................................6
-   8. Full Copyright Statement.......................................7
-   9. Intellectual Property Rights Notices...........................7
-   10. References.....................................................7
-   11. Editors Address................................................8
+   3.2 Diagnosis.....................................................6
+   3.2.1 Characterization............................................6
+   3.2.2 Isolation...................................................6
+   3.3 Availability..................................................7
+   4. Configuration Management.......................................7
+   5. Accounting.....................................................7
+   6. Performance measurement........................................7
+   7. Security.......................................................8
+   8. Intellectual Property Statement................................8
+   9. Disclaimer of Validity.........................................9
+   10.
Copyright statement............................................9
+   11. Acknowledgements...............................................9
+   12. References.....................................................9
+   13. Editors Address...............................................10

 1. Introduction and Scope

-   This memo outlines in broader terms how data plane OAM functionality
+   This memo outlines in broader terms how data plane protocols
    can assist in meeting the operations and management (OAM)
    requirements outlined in [MPLSREQS] and can apply to the
    operational functions of fault, configuration, accounting,
    performance and security (commonly known as FCAPS) for MPLS
    networks as defined in
-   [RFC3031]. The approach of the document is
-   to outline the required functionality, the potential mechanisms to
-   provide the function and the applicability of data plane OAM
-   functions.
+   [RFC3031]. The approach of the document is to outline the required
+   functionality, the potential mechanisms to provide the function and
+   the applicability of data plane OAM functions. Included in the
+   discussion are security issues specific to use of tools within a
+   provider domain and use for inter-provider LSPs.

 2. Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in RFC 2119.

    OAM     Operations and Management
    FCAPS   Fault, Configuration, Accounting, Performance,
            and Security
@@ -85,111 +90,109 @@
    document are to be interpreted as described in RFC 2119.

    OAM     Operations and Management
    FCAPS   Fault, Configuration, Accounting, Performance,
            and Security
    ILM     Incoming Label Map
    NHLFE   Next Hop Label Forwarding Entry
    MIB     Management Information Base
    LSR     Label Switching Router
    RTT     Round Trip Time

-3.
Fault Management
+
 3.1 Fault detection

    Fault detection encompasses identifying all causes of failure to
    transfer information between the ingress and egress of an LSP.
    This section will enumerate common failure scenarios and explain
    how one might (or might not) detect the situation.

 3.1.1 Enumeration and detection of types of data plane faults

-   Physical layer faults:
+   Lower layer faults:

-      Lower layer faults are those that impact the physical layer or
-      link layer that transports MPLS labeled packets between
-      adjacent LSRs. Some physical links (such as SONET/SDH) may
-      have link layer OAM functionality and detect and notify the
-      LSR of link layer faults directly. Some physical links (such
-      as Ethernet) may not have this capability and require MPLS or
-      IP layer heartbeats to detect failures. However, once detected,
-      reaction to these fault notifications is often the same as
-      those described in the first case.
+      Lower layer faults are those in the physical or virtual link
+      that impact the transport of MPLS labeled packets between
+      adjacent LSRs at the specific level of interest. Some physical
+      links (such as SONET/SDH) may have link layer OAM functionality
+      and detect and notify the LSR of link layer faults directly.
+      Some physical links (such as Ethernet) may not have this
+      capability and require MPLS or IP layer heartbeats to detect
+      failures. However, once detected, reaction to these fault
+      notifications is often the same as those described in the first
+      case.

    Node failures:

       Node failures are those that impact the forwarding capability
       of a node component, including its entire set of links. This
       can be due to component failure, power outage, or reset of
       control processor in an LSR employing a distributed
       architecture, etc.

-   MPLS LSP misbranching:
+   MPLS LSP misforwarding:

-      Misbranching occurs when there is a loss of synchronization
+      Misforwarding occurs when there is a loss of synchronization
       between the data and the control planes in one or more nodes.
       This can occur due to hardware failure, software failure or
       configuration problems.

       It will manifest itself in one of two forms:

-      - packets belonging to a particular LSP are cross connected
-        into a an NHLFE for which there is no corresponding ILM at
+      - packets belonging to a particular LSP are cross-connected
+        into an NHLFE for which there is no corresponding ILM at
         the next downstream LSR. This can occur in cases where the
         NHLFE entry is corrupted. Therefore the packet arrives at
         the next LSR with a top label value for which the LSR has
         no corresponding forwarding information, and is typically
         dropped. This is a No Incoming Label Map (ILM) condition and
         can be detected directly by the downstream LSR which
         receives the incorrectly labeled packet.

-      - packets belonging to a particular LSP are cross connected
+      - packets belonging to a particular LSP are cross-connected
         into an incorrect NHLFE entry for which there is a
-        corresponding ILM at the next downstream LSR, but which was
-        is associated with a different LSP. This may be detected by
+        corresponding ILM at the next downstream LSR, but is
+        associated with a different LSP. This may be detected by
         a number of means:
         o some or all of the misdirected traffic is not routable
           at the egress node.
         o Or OAM probing is able to detect the fault by detecting
-          the inconsistency between the path and the control
-          plane.
+          the inconsistency between the data path and the control
+          plane state.

    Discontinuities in the MPLS Encapsulation

       The forwarding path of the FEC carried by an LSP may transit
-      nodes for which MPLS is not configured. This may result in a
-      number of behaviors which are undesirable and not easily
-      detected.
For example, if there is only one label in the stack
-      of a packet's MPLS encapsulation, and the payload is IP, the
-      MPLS header may be removed prematurely at a node not
-      configured for MPLS forwarding on an outgoing interface. In
-      this case, the MPLS header would be popped (instead of
-      swapped) because there would be no outgoing label mapping due
-      to the outgoing line card not having MPLS enabled. At this
-      point, if the egress interface is configured for IP forwarding
-      and has a routing entry that matches the packet's destination,
-      the packet may still be able be successfully delivered
-      to the correct destination router. This scenario is not easily
-      detectable by the ends of the LSP since traffic is indeed
-      delivered.
+      nodes or links for which MPLS is not configured. This may
+      result in a number of behaviors which are undesirable and not
+      easily detected:
+      - the exposed payload is not routable at the LSR, resulting in
+        silent discard, or
+      - the exposed MPLS label was not offered by the LSR, which may
+        result in either silent discard or misforwarding.
+
+      Alternatively, the payload may be routable and packets
+      successfully delivered, but they bypass the associated MPLS
+      instrumentation and tools.

    MTU problems

       MTU problems occur when client traffic cannot be fragmented by
       intermediate LSRs, and is dropped somewhere along the path of
       the LSP. MTU problems should appear as a discrepancy in the
       traffic count between the set of ingresses and the egresses
       for a FEC and will appear in the corresponding MPLS MIB
       performance tables in the transit LSRs as discarded packets.

    TTL Mishandling

-      Some Penultimate hop LSRs may consistently process TTL expiry
-      and propagation at penultimate hop LSRs. In these cases, it is
-      possible for tools that rely on consistent processing to fail.
+      The implementation of TTL handling is inconsistent at
+      penultimate hop LSRs. Tools that rely on consistent TTL
+      processing may produce inconsistent results in any given
+      network.

    Congestion

       Congestion occurs when the offered load on any interface
       exceeds the link capacity for sufficient time that the
       interface buffering is exhausted. Congestion problems will
       appear as a discrepancy in the traffic count between the set
       of ingresses and the egresses for a FEC and will appear in
       the MPLS MIB performance tables in the transit LSRs as
       discarded packets.

@@ -199,43 +202,47 @@
       network. Load sharing typically takes place when equal cost
       paths exist between the ingress and egress of an LSP. In these
       cases, traffic is split among these equal cost paths using a
       variety of algorithms. One such algorithm relies on splitting
       traffic between each path on a per-packet basis. When this is
       done, it is possible for some packets along the path to be
       delayed due to congestion or slower links, which may result in
       packets being received out of order at the egress. Detection
       and remedy of this situation may be left up to client
       applications that use the LSPs. For instance, TCP is capable of
-      re-ordering packets belonging to a specific flow. Detection of
-      mis-ordering can also be determined by sending probe traffic
-      along the path and verifying that all probe traffic is indeed
-      received in the order it was transmitted.
+      re-ordering packets belonging to a specific flow.
+
+      Misordering can also be detected by sending probe traffic
+      along the path and verifying that all probe traffic is indeed
+      received in the order it was transmitted. This will only
+      detect truly pathological problems, as misordering typically
+      is an insufficiently predictable and repeatable problem.

       LSRs do not normally implement mechanisms to detect
       misordering of flows.

    Payload Corruption

       Payload corruption may occur and be undetectable by LSRs. Such
       errors are typically detected by client payload integrity
       mechanisms.

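The probe-based misordering check described above can be illustrated
with a small sketch. It assumes, hypothetically, that each probe
carries a sequence number assigned in increasing order at the ingress
and that the egress records the sequence numbers in arrival order.
The names find_misordered and find_missing are illustrative only and
do not come from any MPLS OAM tool or specification.

```python
from typing import Iterable, List, Tuple


def find_misordered(received_seq: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (arrival_position, sequence_number) for each probe that
    arrived after a probe carrying a higher sequence number."""
    misordered = []
    highest_seen = -1  # assumes sequence numbers start at 0
    for position, seq in enumerate(received_seq):
        if seq < highest_seen:
            # A later-sent probe already arrived, so this one is late.
            misordered.append((position, seq))
        else:
            highest_seen = seq
    return misordered


def find_missing(received_seq: Iterable[int], sent_count: int) -> List[int]:
    """Return the sequence numbers in 0..sent_count-1 that never arrived."""
    seen = set(received_seq)
    return [seq for seq in range(sent_count) if seq not in seen]
```

As the text notes, a clean result from such a check says little:
misordering is rarely repeatable, so a probe run can only confirm the
problem when it happens to occur during the run.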
 3.1.2 Timeliness

    The design of SLAs and management support systems requires that
    ample headroom be allotted in terms of their processing
    capabilities in order to process and handle all necessary fault
    conditions within the bounds stipulated in the SLA. This includes
    planning for event handling using a time budget which takes into
    account the overall SLA and time to address any defects which
    arise. However, it is possible that some fault conditions may
    surpass this budget
-   due their catastrophic nature (i.e.: fibre cut) or due to
+   due to their catastrophic nature (e.g., fibre cut) or due to
    misplanning of the time processing budget.

             ^    --------------
             |    |            ^
             |    |            |---- Time to notify NOC + process/correct
     SLA     |    |            v     defect
     Max -   |    -------------
     Time    |    |            ^
             |    |            |----- Time to diagnose/isolate/correct
             |    |            v
@@ -287,36 +294,37 @@
    operation. Given that detection of faults is desired to happen as
    quickly as possible, tools which possess the ability to
    incrementally test LSP health should be used to uncover faults.

 3.3 Availability

    Availability is the measure of the percentage of time that a
    service is operating within specification, often specified by an
    SLA.

    MPLS has several forwarding modes (depending on the control plane
-   used). As such more than one model may be defined.
+   used). As such more than one model may be defined and require more
+   than one measurement technique.

 4. Configuration Management

    Data plane OAM can assist in configuration management by providing
    the ability to verify the configuration of an LSP or of
    applications utilizing that LSP. This would be an ad-hoc data plane
    probe that should both verify path integrity (a complete path
    exists) and verify that the path function is synchronized with the
    control plane. The probe would carry as part of the payload
    relevant control plane information that the receiver would be able
    to compare with the local control plane configuration.

 5.
Accounting

-   The requirements for accounting in MPLS network as specified in
+   The requirements for accounting in MPLS networks as specified in
    [MPLSREQS] do not place any requirements on data plane OAM.

 6. Performance measurement

    Performance measurement permits the information transfer
    characteristics of LSPs to be measured, perhaps in order to compare
    against an SLA. This falls into two categories, latency (where
    jitter is considered a variation in latency) and information loss.

@@ -331,57 +339,59 @@
    To measure information loss, a common practice is to periodically
    read ingress and egress counters (i.e., MIB module counters). This
    information may also be used for offline correlation. Another
    common practice is to send explicit probe traffic which traverses
    the data plane path in question. This probe traffic can also be
    used to measure jitter and delay.

 7. Security

-   Support for intra-provider data plane OAM messaging does not
-   introduce any new security concerns to the MPLS architecture.
-   Though it does actually address some that already exist, i.e.
-   through rigorous defect handling operator's can offer their
-   customers a greater degree of integrity protection that their
-   traffic will not be misdelivered (for example by being able to
-   detect leaking LSP traffic from a VPN).
+   Providing a secure OAM environment requires that, if MPLS-specific
+   IP-based tools are used, they can be filtered at the edge of the
+   MPLS network, so that malicious users cannot use non-MPLS
+   interfaces to insert MPLS-specific OAM transactions and
+   provider-initiated OAM transactions do not leak outside the MPLS
+   cloud.
+
+   OAM messaging does address existing security concerns with the
+   MPLS architecture: through rigorous defect handling, operators
+   can offer their customers a greater degree of integrity protection
+   that their traffic will not be misdelivered (for example, by being
+   able to detect leaking LSP traffic from a VPN).

    Support for inter-provider data plane OAM messaging introduces a
    number of security concerns because, by definition, portions of
    LSPs will not be in trusted space and the provider has no control
    over who may inject traffic into the LSP, which can be exploited
    for denial of
-   service attacks. This creates opportunity for malicious
-   or poorly behaved users to disrupt network operations. Attempts to
-   introduce filtering on target LSP OAM flows may be problematic if
-   flows are not visible to intermediate LSRs. However it may be
-   possible to interdict flows on the return path between providers (as
-   faithfulness to the forwarding path is not a return path
-   requirement) to mitigate aspects of this vulnerability.
+   service attacks. OAM PDUs are not explicitly identified in the MPLS
+   header and therefore are not typically inspected by transit LSRs.
+   This creates opportunity for malicious or poorly behaved users to
+   disrupt network operations. Attempts to introduce filtering on
+   target LSP OAM flows may be problematic if flows are not visible
+   to intermediate LSRs. However it may be possible to interdict flows
+   on the return path between providers (as faithfulness to the
+   forwarding path is not a return path requirement) to mitigate
+   aspects of this vulnerability.

    OAM tools may permit unauthorized or malicious users to extract
    significant amounts of information about network configuration.
    This would be especially true of IP based tools as in many network
    configurations, MPLS does not typically extend to untrusted hosts,
    but IP does. For example, TTL hiding at ingress and egress LSRs
    will prevent external users from using TTL-based mechanisms to
    probe an operator's network. This suggests that tools used for
    problem diagnosis or which by design are capable of extracting
    significant amounts of information will require authentication and
    authorization of the originator. This may impact the scalability of
    such tools when employed for monitoring instead of diagnosis.

-8.
Copyright Notice
-
-   Copyright (C) The Internet Society (2004). All Rights Reserved.
-
-9. Intellectual Property Statement
+8. Intellectual Property Statement

    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.

@@ -391,66 +401,62 @@
    such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.

    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard. Please address the information to the IETF at
    ietf-ipr@ietf.org.

-10. Disclaimer of Validity
+9. Disclaimer of Validity

    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
    INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

-11. Copyright Statement
+10. Copyright Statement

    Copyright (C) The Internet Society (2004). This document is subject
    to the rights, licenses and restrictions contained in BCP 78, and
    except as set forth therein, the authors retain all their rights.

-12. Acknowledgment
+11.
Acknowledgment

    Funding for the RFC Editor function is currently provided by the
    Internet Society.

    The editors would like to thank Monique Morrow from Cisco Systems,
    and Harmen van Der Linde from AT&T for their valuable review
    comments on this document.

-13. References
-
-13.1 Normative References
-
-13.2 Informative References
+12. References

    [RFC3031]   Rosen, E., Viswanathan, A., and R. Callon,
                "Multiprotocol Label Switching Architecture", RFC
                3031, January 2001.

    [ALLAN]     Allan, D., "Guidelines for MPLS Load Balancing",
                draft-allan-mpls-loadbal-05.txt, IETF work in progress,
                October 2003.

    [MPLSREQS]  Nadeau et al., "OAM Requirements for MPLS Networks",
                draft-ietf-mpls-oam-requirements-05.txt, November 2004.

    [Y1710]     ITU-T Recommendation Y.1710(2002), "Requirements for
                OAM Functionality for MPLS Networks".

-14. Editors' Address
+13. Editors' Address

    David Allan
    Nortel Networks              Phone: +1-613-763-6362
    3500 Carling Ave.            Email: dallan@nortelnetworks.com
    Ottawa, Ontario, CANADA

    Thomas D. Nadeau
    Cisco Systems                Phone: +1-978-936-1470
    300 Beaver Brook Drive       Email: tnadeau@cisco.com
    Boxborough, MA 01824