--- 1/draft-ietf-mpls-oam-requirements-00.txt 2006-02-05 00:42:06.000000000 +0100 +++ 2/draft-ietf-mpls-oam-requirements-01.txt 2006-02-05 00:42:06.000000000 +0100 @@ -1,23 +1,23 @@ Network Working Group Thomas D. Nadeau Internet Draft Monique Morrow -Expires: August 2003 George Swallow +Expires: November 2003 George Swallow Cisco Systems, Inc. David Allan Nortel Networks - February 2003 + June 2003 OAM Requirements for MPLS Networks - draft-ietf-mpls-oam-requirements-00.txt + draft-ietf-mpls-oam-requirements-01.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026 [RFC2026]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working @@ -32,49 +32,51 @@ The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract As transport of diverse traffic types such as voice, frame relay, and ATM over MPLS become more common, the ability to detect, - handle and diagnose control and data plane defects becomes critical. + handle and diagnose control and data plane defects becomes + critical. + Detection and specification of how to handle those defects is not only important because such defects may not only affect the fundamental operation of an MPLS network, but also because they may impact SLA commitments for customers of that network. This Internet draft describes requirements for user and data plane operations and management (OAM) for Multi-Protocol Label Switching (MPLS). These requirements have been gathered from network operators who have extensive experience deploying MPLS networks, similarly some of these requirements have appeared in other documents [Y1710]. 
This draft specifies OAM requirements for MPLS, as well as for applications of MPLS such as pseudowire voice and VPN services. Those interested in specific issues relating to instrumenting MPLS for OAM purposes are directed to [FRAMEWORK] Table of Contents - Introduction 2 - Terminology 2 - Motivations 3 - Requirements 4 - Security Considerations 8 - Acknowledgments 9 - References 9 - Authors' Addresses 10 - Intellectual Property Rights Notices 11 - Full Copyright Statement 11 + Introduction.....................................................2 + Terminology......................................................2 + Motivations......................................................3 + Requirements.....................................................4 + Security Considerations..........................................8 + Acknowledgments..................................................9 + References.......................................................9 + Authors' Addresses..............................................10 + Intellectual Property Rights Notices............................11 + Full Copyright Statement........................................11 1. Introduction This Internet draft describes requirements for user and data plane operations and management (OAM) for Multi-Protocol Label Switching (MPLS). These requirements have been gathered from network operators who have extensive experience deploying MPLS networks. This draft specifies OAM requirements for MPLS, as well as for applications of MPLS such as pseudowire [PWE3FRAME] voice, and VPN services. @@ -120,167 +123,167 @@ LSR: Label Switch Router OAM: Operations and Management PE: Provider Edge PW: Pseudowire SLA: Service Level Agreement - VCC: Virtual Circuit Channel + VCC: Virtual Channel Connection VPC: Virtual Path Connection 3 Motivations MPLS OAM has been tackled in numerous Internet drafts. 
However all existing drafts focus on single provider solutions or focus on a single aspect of the MPLS architecture or application of MPLS. For example, the use of RSVP or LDP signaling and defects may be covered in some deployments, and a corresponding SNMP MIB module exists to manage this application; however, the handling of defects and specification of which types of defects are interesting to operational networks may not have been created in concert with those for other applications of MPLS such as L3 VPN. This leads to inconsistent and inefficient applicability across the MPLS architecture, and/or requires significant modifications to operational procedure and systems in order to provide consistent and useful OAM functionality. As MPLS matures relationships between providers has become more complex. Furthermore, the - deployment of multiple concurrent applications - of MPLS is commonplace. This has led to a need to consider - deployments that span arbitrary networking arrangements and - boundaries so that broader and more uniform applicability - to the MPLS architecture for OAM is possible. + deployment of multiple concurrent applications of MPLS is common + place. This has led to a need to consider deployments that span + arbitrary networking arrangements and boundaries; + so that broader and more uniform applicability to the MPLS + architecture for OAM is possible. 3. Requirements The following sections enumerate the OAM requirements gathered from service providers. Each requirement is further specified in detail to further clarify its applicability. 3.1 Detection of Broken Label Switch Paths The ability to detect a broken Label Switch Path (LSP) should not require manual hop-by-hop troubleshooting of each LSR used to switch traffic for that LSP. For example, - it is not desirable to manually visit each LSR - along the data plane path used to transport an LSP; instead, - this function should be automated and performed from the - origination of that LSP. 
Furthermore, the automation of - path liveliness is desired in cases where large amounts of - LSPs might be tested. For example, automated PE-to-PE - LSP testing functionality is desired. The goal is to detect LSP - problems before customers do, and this requires detection of - problems in a "reasonable" amount of time. One useful definition - of reasonable is both predictable and consistent. If the time to - detect defects is specified and tools designed accordingly then - a harmonized operational framework can be established both - within MPLS levels, and with MPLS applications. If the time to - detect is known, then automated responses can be + it is not desirable to manually visit each LSR along the data + plane path used to transport an LSP; instead,this function + should be automated and performed from the origination of that LSP. + Furthermore, the automation of path liveliness is desired in + cases where large amounts of LSPs might be tested. For example, + automated PE-to-PE LSP testing functionality is desired. + The goal is to detect LSP problems before customers do, and + this requires detection of problems in a "reasonable" amount of + time. + + One useful definition of reasonable is both predictable and + consistent. + + If the time to detect defects is specified and tools designed + accordingly then a harmonized operational framework can be + established both within MPLS levels, and with MPLS applications. + If the time to detect is known, then automated responses can be specified both with regard to resiliency and SLA reporting. One consequence is that ambiguity in maintenance procedures MUST be minimized as ambiguity in test results impacts detection time. - Although ICMP-based ping can be sent through an LSP, the use of - this tool to verify the LSP path liveliness has the potential - for returning erroneous results (both positive and negative) - given the nature of MPLS LSPs. 
For example, failures can be - may occur where inconsistencies exist between the IP and MPLS + Although ICMP-based ping can be sent through an LSP, + the use of this tool to verify the LSP path liveliness has the + potential for returning erroneous results (both positive and + negative) given the nature of MPLS LSPs. For example, failures can + be may occur where inconsistencies exist between the IP and MPLS forwarding tables, inconsistencies in the MPLS control and data plane or problems with the reply path (i.e.: a reverse MPLS path does not exist). Detection tools should have minimal dependencies on network components that do not implement the LSP. - Furthermore, the path liveliness function - MUST have the ability to support equal cost multipath - (ECMP) scenarios within the operator's network. Specifically, - the ability to detect failures on any parallel (i.e.: equal - IGP cost) paths used to load share traffic in order to more - efficiently use the network. It is common to base the algorithm - of how to load share traffic by examining certain fields within - the packet header. Unfortunately, there is no standard for this - algorithm, but it is important that any function be capable - of detecting failures on all operational paths as failure of - any branch may lead to loss of traffic, regardless of load sharing - algorithm. This introduces complexity into ensuring that ECMP - connectivity permutations are exercised, and that defect - detection occurs in a reasonable amount of time. [GUIDELINES] - discusses some of the issues and offers suggestions for ensuring - mutual compatibility of ECMP and maintenance functions (both - detection and diagnostic). + The OAM packet MUST follow exactly the customer data path in order + to reflect path liveliness used by customer data. 
Particular + cases of interest are forwarding mechanisms such as equal cost + multipath (ECMP) scenarios within the operator's network whereby + flows are load-shared across parallel (i.e.: equal IGP cost) paths. + Where the customer traffic may be spread over multiple paths, it + is required to be able to detect failures on any of the path + permutations. Where the spreading mechanism is payload specific, + payloads need to have forwarding that is common with the traffic + under test. Satisfying these requirements introduces complexity + into ensuring that ECMP connectivity permutations are exercised, + and that defect detection occurs in a reasonable amount of time. + [GUIDELINES] discusses some of the issues and offers suggestions + for ensuring mutual compatibility of ECMP and maintenance + functions (both detection and diagnostic). 3.2 Diagnosis of a Broken Label Switch Path The ability to diagnose a broken LSP and to isolate the failed resource in the path is required. This is particularly true for misbranching defects which are particularly difficult to specify recovery actions in an LDP network. Experience suggests that this is best accomplished via a path trace function that can return the entire list of LSRs and links used by a certain LSP (or at least the set of LSRs/links up to the location of the defect) is required. The tracing capability should include the ability to trace recursive paths, such as when nested LSPs are used, or when LSPs enter and exit traffic-engineered tunnels [TUNTRACE]. This path trace function must also be capable of diagnosing LSP mis-merging by permitting comparison of expected vs. actual forwarding behavior at any LSR in the path. The path trace capability should be capable of being executed from both the head end Label Switch Router (LSR) and any - mid-point LSR. Additionally, the path trace function MUST have - the ability to support equal cost multipath scenarios as described - above in section 3.1. + mid-point LSR. 
+ Additionally, the path trace function MUST have the ability to + support equal cost multipath scenarios as described above in + section 3.1. 3.3 Path characterization The ability of a path trace function to reveal details of LSR forwarding operations relevant to OAM functionality. This would include but not be limited to: - use of pipe or uniform TTL models by an LSR - externally visible aspects of load spreading (such as - ECMP), including - type of algorithm used + ECMP), including type of algorithm used examples of how algorithm will spread traffic - data/control plane OAM capabilities of the LSR - stack operations performed by the LSR (pushes and pops) 3.4 Service Level Agreement Measurement Mechanisms are required to measure diverse aspects of Service Level Agreements: - availability - in which the service is considered to be available and the other aspects of performance measurement listed below have meaning, or unavailable and other aspects of performance measurement do not. - latency - amount of time required for traffic to transit the network - packet loss - jitter - measurement of latency variation Such measurements can be made independently of the user traffic or via a hybrid of user traffic measurement and OAM probing. - At least one mechanism - is required to measure the quantity + At least one mechanism is required to measure the quantity (i.e.: number of packets) of OAM packets. In addition, the ability to measure the qualitative aspects of OAM probing must be available to specifically compute the latency of OAM packets generated and received at each end of a tested LSP. Latency is - considered in this context as a measurable parameter for SLA reporting. - There is no assumption that bursts of OAM packets are required to - characterize the performance of an LSP, but it is suggested that any - method considered be capable of measuring the latency of an LSP with - minimal impact on network resources. 
+ considered in this context as a measurable parameter for SLA + reporting. There is no assumption that bursts of OAM packets are + required to characterize the performance of an LSP, but it is + suggested that any method considered be capable of measuring the + latency of an LSP with minimal impact on network resources. 3.5 Frequency of OAM Execution The operator MUST have the flexibility to configure OAM parameters and the frequency of the execution of any OAM functions provided that there is some synchronization possible of tool usage for availability metrics. The motivation for this is to permit the network to function as a system of harmonious OAM functions consistent across the entire network. @@ -303,65 +306,67 @@ Devices must provide alarm suppression functionality that prevents the generation of superfluous alarms. When viewed in conjunction with requirement 3.6 below, this typically requires fault notification to the LSP egress, that may have specific time constraints if the client PW independently implements path continuity testing (for example ATM I.610 Continuity check (CC)[I610]). This would also be true for LSPs that have client LSPs that are - monitored. MPLS arbitrary hierarchy introduces the opportunity to have - multiple MPLS levels attempt to respond to defects simultaneously. - Mechanisms are required to coordinate network response to defects. + monitored. MPLS arbitrary hierarchy introduces the opportunity to + have multiple MPLS levels attempt to respond to defects + simultaneously. Mechanisms are required to coordinate network + response to defects. 3.6 Support for OAM Interworking for Fault Notification An LSR supporting OAM functions for pseudo-wire functions that join one or more networking technologies over MPLS must be able to translate an MPLS defect into the native technology's error condition. 
For example, errors occurring over the MPLS transport LSP that supports an emulated ATM VC must translate errors into native ATM OAM AIS cells at the edges of the pseudo-wire. The mechanism SHOULD consider possible bounded detection time parameters, e.g., a "hold off" function before reacting as - to harmonize with the client OAM. One goal would be alarm suppression - in the psuedo-wire's client layer. As observed in 3.5, this requires - that the MPLS layer perform detection in a bounded timeframe in - order to initiate alarm suppression prior to the psuedo-wire - client layer independently detecting the defect. + to harmonize with the client OAM. One goal would be alarm + suppression in the psuedo-wire's client layer. As observed in + section 3.5, this requires that the MPLS layer perform detection + in a bounded timeframe in order to initiate alarm suppression + prior to the psuedo-wire client layer independently detecting the + defect. 3.7 Error Detection and Recovery. Mechanisms are needed to detect an error, react to it (ideally in some form of automated response by the network), recover from it and alert the network operator prior to the customer informing the network operator of the error condition. The ideal situation would be where the network is resilient and can restore service prior to any significant impact on the customer perception of the service. There are also defects that by virtue of available network resources or topology that cannot be recovered automatically. It is however, sometimes a requirement that the customer be notified of the defect condition at the same time that the network operator is made aware of the defect (as in the example of alarm suppression for PW clients discussed above). In these situations, - the customer network may be capable of processing automated responses - based on notification of a defect condition. 
It is preferred - that the format of these notifications be made consistent (i.e.: - standardized) as to increase the applicability of such messages. - Depending on the device's capabilities, the device may be programmed - to take automatic corrective actions as a result of detection of - defect conditions. These actions may be user or operator-specified, - or may simply be inherent to the underlying transport technology - (i.e.: MPLS Fast-Reroute, graceful restart or high-availability - functionality). + the customer network may be capable of processing automated + responses based on notification of a defect condition. It is + preferred that the format of these notifications be made + consistent (i.e.: standardized) as to increase the applicability + of such messages. Depending on the device's capabilities, the + device may be programmed to take automatic corrective actions as + a result of detection of defect conditions. These actions may be + user or operator-specified, or may simply be inherent to the + underlying transport technology (i.e.: MPLS Fast-Reroute, + graceful restart or high-availability functionality). 3.8 The commoditization of MPLS will require common information modeling of management and control of OAM functionality. This will be reflected in the integration of standard MPLS-related MIBs (e.g. [LSRMIB][TEMIB][LBMIB][FTNMIB]) for fault, statistics and configuration management. These standard interfaces provide operators with common programmatic interface access to operations and management functions and their status. 3.9 Detection of Denial of Service attacks as part of security @@ -369,31 +374,34 @@ 4. Security Considerations LSP mis-merging has security implications beyond that of simply being a network defect. LSP mis-merging can happen due to a number of potential sources of failure, some of which (due to MPLS label stacking) are new to MPLS. 
The performance of diagnostic functions and path characterization involve extracting a significant amount of information about - network construction which the network operator may consider private. + network construction which the network operator may consider + private. Mechanisms are required to prevent unauthorized use of either those tools or protocol features. 5. Acknowledgments The authors wish to acknowledge and thank the following individuals for their valuable comments to this document: Adrian Smith, British Telecom; Chou Lan Pok, SBC; Mr. Ikejiri, NTT Communications and Mr.Kumaki of KDDI. - Hari Rakotoranto, Cisco Systems; Danny McPherson from TCB. + + Hari Rakotoranto, Cisco Systems; Luyuan Fang, AT&T; + Danny McPherson, TCB. 6. References [TUNTRACE] Bonica, R., Kompella, K., Meyer, D., "Tracing Requirements for Generic Tunnels", Internet Draft , November 2001. [LSRMIB] Srinivasan, C., Viswanathan, A. and T. Nadeau, "MPLS Label Switch Router Management @@ -438,45 +447,45 @@ [I610] ITU-T Recommendation I.610, "B-ISDN operations and maintenance principles and functions", February 1999 [FRAMEWORK] Allan et.al. "A Framework for MPLS OAM", Internet draft , February 2003 7. Authors' Addresses Thomas D. Nadeau Cisco Systems, Inc. - 300 Apollo Drive - Chelmsford, MA 01824 - Phone: 978-244-3051 + 300 Beaver Brook Road + Boxboro, MA 01719 + Phone: +1-978-936-1470 Email: tnadeau@cisco.com Monique Jeanne Morrow Cisco Systems, Inc. Glatt-Com, 2nd Floor CH-8301 Switzerland Voice: (0)1 878-9412 - EMail: mmorrow@cisco.com + Email: mmorrow@cisco.com George Swallow Cisco Systems, Inc. - 250 Apollo Drive - Chelmsford, MA 01824 - Voice: 978 244 8143 + 300 Beaver Brook Road + Boxboro, MA 01719 + Voice: +1-978-936-1398 Email: swallow@cisco.com David Allan Nortel Networks 3500 Carling Ave. - Voice: 1-613-763-6362 Ottawa, Ontario, CANADA + Voice: 1-613-763-6362 Email: dallan@nortelnetworks.com 8. Full Copyright Statement Copyright (C) The Internet Society (2001). 
All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may @@ -506,21 +515,21 @@ 9. Intellectual Property Rights Notices The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of - claims of rights made available for publication and any assurances of - licenses to be made available, or the result of an attempt made to - obtain a general license or permission for the use of such - proprietary rights by implementers or users of this specification can - be obtained from the IETF Secretariat. + claims of rights made available for publication and any assurances + of licenses to be made available, or the result of an attempt made + to obtain a general license or permission for the use of such + proprietary rights by implementers or users of this specification + can be obtained from the IETF Secretariat. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.