--- 1/draft-ietf-rtgwg-ipfrr-framework-03.txt 2006-02-04 17:26:45.000000000 +0100
+++ 2/draft-ietf-rtgwg-ipfrr-framework-04.txt 2006-02-04 17:26:46.000000000 +0100
@@ -1,20 +1,20 @@
 Network Working Group                                           M. Shand
 Internet Draft                                                 S. Bryant
-Expiration Date: December 2005                             Cisco Systems
+Expiration Date: April 2006                                Cisco Systems

-                                                               June 2005
+                                                            October 2005

                         IP Fast Reroute Framework

-                 draft-ietf-rtgwg-ipfrr-framework-03.txt
+                 draft-ietf-rtgwg-ipfrr-framework-04.txt

 Status of this Memo

    By submitting this Internet-Draft, each author represents that any
    applicable patent or other IPR claims of which he or she is aware
    have been or will be disclosed, and any of which he or she becomes
    aware will be disclosed, in accordance with Section 6 of BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
    other
@@ -248,21 +248,24 @@
    Performing the second without the first will result in traffic being
    discarded by the router(s) adjacent to the failure.

    Both tasks are necessary for an effective solution to the problem.
    However, repair paths can be used in isolation where the failure is
    short-lived. The repair paths can be kept in place until the failure
    is repaired and there is no need to advertise the failure to other
    routers.

    Similarly, micro-loop avoidance can be used in isolation to prevent
-   loops arising from pre-planned management action.
+   loops arising from pre-planned management action, because the link or
+   node being shut down can remain in service for a short time after its
+   removal has been announced into the network, and hence it can
+   function as its own "repair path".

    Note that micro-loops can also occur when a link or node is restored
    to service and thus a micro-loop avoidance mechanism is required for
    both link up and link down cases.

 3. Mechanisms for IP Fast-reroute

    The set of mechanisms required for an effective solution to the
    problem can be broken down into the following sub-problems.
@@ -288,44 +291,45 @@
    are three basic categories of repair paths:

    1. Equal cost multi-paths (ECMP). Where such paths exist, and one
       or more of the alternate paths do not traverse the failure, they
       may trivially be used as repair paths.

    2. Loop free alternate paths. Such a path exists when a direct
       neighbor of the router adjacent to the failure has a path to the
       destination which can be guaranteed not to traverse the failure.

-   3. Multi-hop repair paths. When there is no feasible downstream
-      path it may still be possible to locate a router, which is more
-      than one hop away from the router adjacent to the failure, from
-      which traffic will be forwarded to the destination without
-      traversing the failure.
+   3. Multi-hop repair paths. When there is no feasible loop free
+      alternate path it may still be possible to locate a router,
+      which is more than one hop away from the router adjacent to the
+      failure, from which traffic will be forwarded to the destination
+      without traversing the failure.

    ECMP and loop free alternate paths (as described in [BASE]) offer the
    simplest repair paths and would normally be used when they are
    available. It is anticipated that around 80% of failures (see section
-   3.2.2) can be repaired using these alone.
+   3.2.2) can be repaired using these basic methods alone.

-   Multi-hop repair paths are considerably more complex, both in the
-   computations required to determine their existence, and in the
-   mechanisms required to invoke them. They can be further classified
-   as:
+   Multi-hop repair paths are more complex, both in the computations
+   required to determine their existence, and in the mechanisms required
+   to invoke them. They can be further classified as:

    1. Mechanisms where one or more alternate FIBs are pre-computed in
       all routers and the repaired packet is instructed to be
-      forwarded using a "repair FIB" by some method of signaling such
-      as detecting a "U-turn" [U-TURNS] or marking the packet.
+      forwarded using a "repair FIB" by some method of per packet
+      signaling such as detecting a "U-turn" [U-TURNS, FIFR] or by
+      marking the packet.

    2. Mechanisms functionally equivalent to a loose source route which
-      is invoked using the normal FIB. These include tunnels [TUNNELS]
-      and label based mechanisms.
+      is invoked using the normal FIB. These include tunnels
+      [TUNNELS], alternative shortest paths [ALT-SP] and label based
+      mechanisms.

    3. Mechanisms employing special addresses or labels which are
       installed in the FIBs of all routers with routes pre-computed to
       avoid certain components of the network. For example [NOT-VIA].

    In many cases a repair path which reaches two hops away from the
    router detecting the failure will suffice, and it is anticipated that
    around 98% of failures (see section 3.2.2) can be repaired by this
    method. However, to provide complete repair coverage some use of
    longer multi-hop repair paths is generally necessary.
@@ -345,56 +349,60 @@
    the repair coverage can be determined and reported via network
    management.

    There is a tradeoff to be achieved between minimizing the number of
    repair paths to be computed, and minimizing the overheads incurred in
    using higher order multi-hop repair paths for destinations for which
    they are not strictly necessary. However, the computational cost of
    determining repair paths on an individual destination basis can be
    very high.

-   It will frequently be the case that the majority of destinations can
+   It will frequently be the case that the majority of destinations may
    be repaired using only the "basic" repair mechanism, leaving a
    smaller subset of the destinations to be repaired using one of the
    more complex multi-hop methods. Such a hybrid approach may go some
    way to resolving the conflict between completeness and complexity.

    The use of repair paths may result in excessive traffic passing over
    a link, resulting in congestion discard. This reduces the
    effectiveness of IPFRR. Mechanisms to influence the distribution of
    repaired traffic to minimize this effect are therefore desirable.

 3.2.2. Analysis of repair coverage

    In some cases the repair strategy will permit the repair of all
    single link or node failures in the network for all possible
    destinations. This can be defined as 100% coverage. However, where
    the coverage is less than 100% it is important for the purposes of
    comparisons between different proposed repair strategies to define
-   what is meant by such a percentage. There are three possibilities:
+   what is meant by such a percentage. There are four possibilities:

    1. The percentage of links (or nodes) which can be fully protected
       for all destinations. This is appropriate where the requirement
       is to protect all traffic, but some percentage of the possible
       failures may be identified as being un-protectable.

    2. The percentage of destinations which can be fully protected for
       all link (or node) failures. This is appropriate where the
       requirement is to protect against all possible failures, but some
       percentage of destinations may be identified as being
       un-protectable.

    3. For all destinations (d) and for all failures (f), the percentage
       of the total potential failure cases (d*f) which are protected.
       This is appropriate where the requirement is an overall "best
       effort" protection.

+   4. The percentage of packets normally passing though the network
+      that will continue to reach their destination. This requires a
+      traffic matrix for the network as part of the analysis.
+
    The coverage obtained is dependent on the repair strategy and highly
    dependent on the detailed topology and metrics. Any figures quoted in
    this document are for illustrative purposes only.

 3.2.3. Link or node repair

    A repair path may be computed to protect against failure of an
    adjacent link, or failure of an adjacent node. In general, link
    protection is simpler to achieve. A repair which protects against
    node failure will also protect against link failure for all
@@ -432,37 +440,45 @@
    Once the routing protocol has re-converged it is necessary for all
    repair paths to take account of the new topology. Various
    optimizations may permit the efficient identification of repair paths
    which are unaffected by the change, and hence do not require full
    re-computation. Since the new repair paths will not be required until
    the next failure occurs, the re-computation may be performed as a
    background task and be subject to a hold-down, but excessive delay in
    completing this operation will increase the risk of a new failure
    occurring before the repair paths are in place.

-3.2.5. Multiple failures and Shared Risk Groups
+3.2.5. Multiple failures and Shared Risk Link Groups

    Complete protection against multiple unrelated failures is out of
    scope of this work. However, it is important that the occurrence of a
    second failure while one failure is undergoing repair should not
    result in a level of service which is significantly worse than that
    which would have been achieved in the absence of any repair strategy.

-   Shared Risk Groups are an example of multiple related failures, and
-   their protection is a matter for further study.
+   Shared Risk Link Groups are an example of multiple related failures,
+   and the more complex aspects of their protection is a matter for
+   further study.

    One specific example of an SRLG which is clearly within the scope of
    this work is a node failure. This causes the simultaneous failure of
    multiple links, but their closely defined topological relationship
    makes the problem more tractable.

-3.3. Mechanisms for micro-loop prevention
+3.3. Local Area Networks
+
+   Protection against partial or complete failure of LANs is more
+   complex than the point to point case. In general there is a tradeoff
+   between the simplicity of the repair and the ability to provide
+   complete and optimal repair coverage.
+
+3.4. Mechanisms for micro-loop prevention

    Control of micro-loops is important not only because they can cause
    packet loss in traffic which is affected by the failure, but because
    by saturating a link with looping packets they can also cause
    congestion loss of traffic flowing over that link which would
    otherwise be unaffected by the failure.

    A number of solutions to the problem of micro-loop formation have
    been proposed and are summarized in [MICROLOOP]. The following
    factors are significant in their classification:
@@ -505,23 +521,23 @@
       protected.

    b. Notification of pre-computed repair paths, and anticipated
       traffic patterns.

    c. Counts of failure detections, protection invocations and
       packets forwarded over repair paths.

 5. Scope and applicability

+   The initial scope of this work is in the context of link state IGPs.
    Link state protocols provide ubiquitous topology information, which
-   facilitates the computation of repairs paths. Therefore the initial
-   scope of this work is in the context of link state IGPs.
+   facilitates the computation of repairs paths.

    Provision of similar facilities in non-link state IGPs and BGP is a
    matter for further study, but the correct operation of the repair
    mechanisms for traffic with a destination outside the IGP domain is
    an important consideration for solutions based on this framework

 6. IANA considerations

    There are no IANA considerations that arise from this framework
    document.
@@ -568,49 +584,59 @@
 10. Normative References

    Internet-drafts are works in progress available from
    http://www.ietf.org/internet-drafts/

 11. Informative References

    Internet-drafts are works in progress available from
    http://www.ietf.org/internet-drafts/

-   BASE         Atlas, A., "Basic Specification for IP
-                Fast-Reroute: Loop-free Alternates",
-                draft-ietf-rtgwg-ipfrr-spec-base-03.txt,
+   ALT-SP       Tian, A., Chen, N., "Fast Reroute using
+                Alternative Shortest Paths", draft-tian-frr-
+                alt-shortest-path-01.txt, (work in progress)
+
+   BASE         Atlas, A., Zinin, A., "Basic Specification
+                for IP Fast-Reroute: Loop-free Alternates",
+                draft-ietf-rtgwg-ipfrr-spec-base-04.txt,
                 (work in progress)

    BFD          Katz, D. and Ward, D., "Bidirectional
                 Forwarding Detection",
-                draft-ietf-bfd-base-02.txt, (work in
+                draft-ietf-bfd-base-03.txt, (work in
                 progress).

+   FIFR         S. Nelakuditi, S. Lee, Y. Yu, Z.-L. Zhang,
+                and C.-N. Chuah, "Fast local rerouting for
+                handling transient link failures.," Tech.
+                Rep. TR-2004-004, University of South
+                Carolina, 2004.
+
    MPLSFRR      Pan, P. et al, "Fast Reroute Extensions to
                 RSVP-TE for LSP Tunnels", RFC 4090.

    MICROLOOP    Bryant, S. and Shand, M., "A Framework for
                 Loop-free Convergence",
                 draft-bryant-shand-lf-conv-frmwk-01.txt,
                 (work in progress).

-   NOT-VIA      Bryant, S. and Shand, M., "IP Fast Reroute
-                Using Notvia Addresses",
+   NOT-VIA      Bryant, S., Previdi, S., Shand, M., "IP Fast
+                Reroute Using Notvia Addresses",
                 draft-bryant-shand-ipfrr-notvia-addresses-
-                00.txt, (work in progress).
+                01.txt, (work in progress).

    TUNNELS      Bryant, S. et al, "IP Fast Reroute using
-                tunnels", draft-bryant-ipfrr-tunnels-01.txt,
+                tunnels", draft-bryant-ipfrr-tunnels-02.txt,
                 (work in progress).

    U-TURNS      Atlas, A. et al, "IP/LDP Local Protection",
-                draft-atlas-ip-local-protect-01.txt, (work in
+                draft-atlas-ip-local-protect-02.txt, (work in
                 progress).

 12. Authors' Addresses

    Stewart Bryant
    Cisco Systems,
    250, Longwater,
    Green Park,
    Reading, RG2 6GB,
    United Kingdom.             Email: stbryant@cisco.com