draft-ietf-rtgwg-ipfrr-framework-13.txt   rfc5714.txt 
Network Working Group M. Shand Internet Engineering Task Force (IETF) M. Shand
Internet-Draft S. Bryant Request for Comments: 5714 S. Bryant
Intended status: Informational Cisco Systems Category: Informational Cisco Systems
Expires: April 26, 2010 October 23, 2009 ISSN: 2070-1721 January 2010
IP Fast Reroute Framework IP Fast Reroute Framework
draft-ietf-rtgwg-ipfrr-framework-13
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the Abstract
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering This document provides a framework for the development of IP fast-
Task Force (IETF), its areas, and its working groups. Note that reroute mechanisms that provide protection against link or router
other groups may also distribute working documents as Internet- failure by invoking locally determined repair paths. Unlike MPLS
Drafts. fast-reroute, the mechanisms are applicable to a network employing
conventional IP routing and forwarding.
Internet-Drafts are draft documents valid for a maximum of six months Status of This Memo
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at This document is not an Internet Standards Track specification; it is
http://www.ietf.org/ietf/1id-abstracts.txt. published for informational purposes.
The list of Internet-Draft Shadow Directories can be accessed at This document is a product of the Internet Engineering Task Force
http://www.ietf.org/shadow.html. (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet
Standard; see Section 2 of RFC 5741.
This Internet-Draft will expire on April 26, 2010. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc5714.
Copyright Notice Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of Provisions Relating to IETF Documents
publication of this document (http://trustee.ietf.org/license-info). (http://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Abstract Table of Contents
This document provides a framework for the development of IP fast- 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
reroute mechanisms which provide protection against link or router 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
failure by invoking locally determined repair paths. Unlike MPLS 3. Scope and Applicability . . . . . . . . . . . . . . . . . . . 5
fast-reroute, the mechanisms are applicable to a network employing 4. Problem Analysis . . . . . . . . . . . . . . . . . . . . . . . 5
conventional IP routing and forwarding. 5. Mechanisms for IP Fast-Reroute . . . . . . . . . . . . . . . . 7
5.1. Mechanisms for Fast Failure Detection . . . . . . . . . . 7
5.2. Mechanisms for Repair Paths . . . . . . . . . . . . . . . 8
5.2.1. Scope of Repair Paths . . . . . . . . . . . . . . . . 9
5.2.2. Analysis of Repair Coverage . . . . . . . . . . . . . 9
5.2.3. Link or Node Repair . . . . . . . . . . . . . . . . . 10
5.2.4. Maintenance of Repair Paths . . . . . . . . . . . . . 10
5.2.5. Local Area Networks . . . . . . . . . . . . . . . . . 11
5.2.6. Multiple Failures and Shared Risk Link Groups . . . . 11
5.3. Mechanisms for Micro-Loop Prevention . . . . . . . . . . . 12
6. Management Considerations . . . . . . . . . . . . . . . . . . 12
7. Security Considerations . . . . . . . . . . . . . . . . . . . 13
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13
9. Informative References . . . . . . . . . . . . . . . . . . . . 14
Table of Contents 1. Introduction
1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 When a link or node failure occurs in a routed network, there is
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 inevitably a period of disruption to the delivery of traffic until
3. Scope and applicability . . . . . . . . . . . . . . . . . . . 6 the network re-converges on the new topology. Packets for
4. Problem Analysis . . . . . . . . . . . . . . . . . . . . . . . 6 destinations that were previously reached by traversing the failed
5. Mechanisms for IP Fast-reroute . . . . . . . . . . . . . . . . 8 component may be dropped or may suffer looping. Traditionally, such
5.1. Mechanisms for fast failure detection . . . . . . . . . . 8 disruptions have lasted for periods of at least several seconds, and
5.2. Mechanisms for repair paths . . . . . . . . . . . . . . . 8 most applications have been constructed to tolerate such a quality of
5.2.1. Scope of repair paths . . . . . . . . . . . . . . . . 9 service.
5.2.2. Analysis of repair coverage . . . . . . . . . . . . . 10
5.2.3. Link or node repair . . . . . . . . . . . . . . . . . 11
5.2.4. Maintenance of Repair paths . . . . . . . . . . . . . 11
5.2.5. Local Area Networks . . . . . . . . . . . . . . . . . 12
5.2.6. Multiple failures and Shared Risk Link Groups . . . . 12
5.3. Mechanisms for micro-loop prevention . . . . . . . . . . . 12
6. Management Considerations . . . . . . . . . . . . . . . . . . 13
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
8. Security Considerations . . . . . . . . . . . . . . . . . . . 14
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14
10. Informative References . . . . . . . . . . . . . . . . . . . . 14
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15
1. Terminology Recent advances in routers have reduced this interval to under a
second for carefully configured networks using link state IGPs.
However, new Internet services are emerging that may be sensitive to
periods of traffic loss that are orders of magnitude shorter than
this.
This section defines words and acronyms used in this draft and other Addressing these issues is difficult because the distributed nature
drafts discussing IP fast-reroute. of the network imposes an intrinsic limit on the minimum convergence
time that can be achieved.
However, there is an alternative approach, which is to compute backup
routes that allow the failure to be repaired locally by the router(s)
detecting the failure without the immediate need to inform other
routers of the failure. In this case, the disruption time can be
limited to the small time taken to detect the adjacent failure and
invoke the backup routes. This is analogous to the technique
employed by MPLS fast-reroute [RFC4090], but the mechanisms employed
for the backup routes in pure IP networks are necessarily very
different.
This document provides a framework for the development of this
approach.
Note that in order to further minimize the impact on user
applications, it may be necessary to design the network such that
backup paths with suitable characteristics (for example, capacity
and/or delay) are available for the algorithms to select. Such
considerations are outside the scope of this document.
2. Terminology
This section defines words and acronyms used in this document and
other documents discussing IP fast-reroute.
D Used to denote the destination router under D Used to denote the destination router under
discussion. discussion.
Distance_opt(A,B) The metric sum of the shortest path from A to B. Distance_opt(A,B) The metric sum of the shortest path from A to B.
Downstream Path This is a subset of the loop-free alternates Downstream Path This is a subset of the loop-free alternates
where the neighbor N meets the following where the neighbor N meets the following
condition:- condition:
Distance_opt(N, D) < Distance_opt(S,D) Distance_opt(N, D) < Distance_opt(S,D)
E Used to denote the router which is the primary E Used to denote the router that is the primary
neighbor to get from S to the destination D. neighbor to get from S to the destination D.
Where there is an ECMP set for the shortest path Where there is an ECMP set for the shortest path
from S to D, these are referred to as E_1, E_2, from S to D, these are referred to as E_1, E_2,
etc. etc.
ECMP Equal cost multi-path: Where, for a particular ECMP Equal cost multi-path: Where, for a particular
destination D, multiple primary next-hops are destination D, multiple primary next-hops are
used to forward traffic because there exist used to forward traffic because there exist
multiple shortest paths from S via different multiple shortest paths from S via different
output layer-3 interfaces. output layer-3 interfaces.
FIB Forwarding Information Base. The database used FIB Forwarding Information Base. The database used
by the packet forwarder to determine what actions by the packet forwarder to determine what actions
to perform on a packet. to perform on a packet.
IPFRR IP fast-reroute. IPFRR IP fast-reroute.
Link(A->B) A link connecting router A to router B. Link(A->B) A link connecting router A to router B.
LFA Loop Free Alternate. A neighbor N, that is not a LFA Loop-Free Alternate. A neighbor N, that is not a
primary neighbor E, whose shortest path to the primary neighbor E, whose shortest path to the
destination D does not go back through the router destination D does not go back through the router
S. The neighbor N must meet the following S. The neighbor N must meet the following
condition:- condition:
Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D) Distance_opt(S, D)
Loop Free Neighbor A neighbor N_i, which is not the particular Loop-Free Neighbor A neighbor N_i, which is not the particular
primary neighbor E_k under discussion, and whose primary neighbor E_k under discussion, and whose
shortest path to D does not traverse S. For shortest path to D does not traverse S. For
example, if there are two primary neighbors E_1 example, if there are two primary neighbors E_1
and E_2, E_1 is a loop-free neighbor with regard and E_2, E_1 is a loop-free neighbor with regard
to E_2 and vice versa. to E_2, and vice versa.
Loop Free Link Protecting Alternate Loop-Free Link-Protecting Alternate
A path via a Loop-Free Neighbor N_i that reaches A path via a Loop-Free Neighbor N_i that reaches
destination D without going through the destination D without going through the
particular link of S that is being protected. In particular link of S that is being protected. In
some cases the path to D may go through the some cases, the path to D may go through the
primary neighbor E. primary neighbor E.
Loop Free Node-protecting Alternate Loop-Free Node-Protecting Alternate
A path via a Loop-Free Neighbor N_i that reaches A path via a Loop-Free Neighbor N_i that reaches
destination D without going through the destination D without going through the
particular primary neighbor (E) of S which is particular primary neighbor (E) of S that is
being protected. being protected.
N_i The ith neighbor of S. N_i The ith neighbor of S.
Primary Neighbor A neighbor N_i of S which is one of the next hops Primary Neighbor A neighbor N_i of S which is one of the next hops
for destination D in S's FIB prior to any for destination D in S's FIB prior to any
failure. failure.
R_i_j The jth neighbor of N_i. R_i_j The jth neighbor of N_i.
Repair Path The path used by a repairing node to send traffic Repair Path The path used by a repairing node to send traffic
that it is unable to send via the normal path that it is unable to send via the normal path
owing to a failure. owing to a failure.
Routing Transition The process whereby routers converge on a new Routing Transition The process whereby routers converge on a new
topology. In conventional networks this process topology. In conventional networks, this process
frequently causes some disruption to packet frequently causes some disruption to packet
delivery. delivery.
RPF Reverse Path Forwarding. I.e. checking that a RPF Reverse Path Forwarding, i.e., checking that a
packet is received over the interface which would packet is received over the interface that would
be used to send packets addressed to the source be used to send packets addressed to the source
address of the packet. address of the packet.
S Used to denote a router that is the source of a S Used to denote a router that is the source of a
repair that is computed in anticipation of the repair that is computed in anticipation of the
failure of a neighboring router denoted as E, or failure of a neighboring router denoted as E, or
of the link between S and E. It is the viewpoint of the link between S and E. It is the viewpoint
from which IP fast-reroute is described. from which IP fast-reroute is described.
SPF Shortest Path First, e.g. Dijkstra's algorithm. SPF Shortest Path First, e.g., Dijkstra's algorithm.
SPT Shortest path tree SPT Shortest path tree
Upstream Forwarding Loop Upstream Forwarding Loop
A forwarding loop that involves a set of routers, A forwarding loop that involves a set of routers,
none of which is directly connected to the link none of which is directly connected to the link
that has caused the topology change that that has caused the topology change that
triggered a new SPF in any of the routers. triggered a new SPF in any of the routers.
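The LFA and Downstream Path conditions defined above can be checked mechanically. The following is a minimal sketch, not part of either document version: it assumes symmetric link metrics, and the topology (routers S, E, N, M, P, D), the dictionary layout, and all function names are hypothetical and chosen only for illustration.

```python
import heapq

INF = float("inf")

# Hypothetical topology: graph[a][b] is the metric of Link(a->b).
graph = {
    "S": {"E": 1, "N": 2, "M": 1, "P": 3},
    "E": {"S": 1, "D": 1},
    "N": {"S": 2, "D": 2},
    "M": {"S": 1, "D": 5},
    "P": {"S": 3, "D": 1},
    "D": {"E": 1, "N": 2, "M": 5, "P": 1},
}

def shortest_distances(graph, source):
    """SPF (Dijkstra): metric sum of the shortest path from source to each node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, INF):
            continue  # stale heap entry
        for neighbor, metric in graph[node].items():
            candidate = d + metric
            if candidate < dist.get(neighbor, INF):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def distance_opt(graph, a, b):
    """Distance_opt(A,B) as defined in the terminology."""
    return shortest_distances(graph, a).get(b, INF)

def primary_neighbors(graph, s, d):
    """Neighbors of S that lie on a shortest path from S to D (E, or E_1, E_2, ...)."""
    best = distance_opt(graph, s, d)
    return {n for n, m in graph[s].items() if m + distance_opt(graph, n, d) == best}

def is_lfa(graph, s, n, d):
    """Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D).

    Only the inequality is tested; the caller excludes primary neighbors.
    """
    return distance_opt(graph, n, d) < distance_opt(graph, n, s) + distance_opt(graph, s, d)

def is_downstream(graph, s, n, d):
    """Distance_opt(N, D) < Distance_opt(S, D)."""
    return distance_opt(graph, n, d) < distance_opt(graph, s, d)
```

In this assumed topology, E is the only primary neighbor of S for destination D. N satisfies the LFA condition but not the Downstream Path condition, M satisfies neither because its shortest path to D goes back through S, and P satisfies both.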
2. Introduction 3. Scope and Applicability
When a link or node failure occurs in a routed network, there is
inevitably a period of disruption to the delivery of traffic until
the network re-converges on the new topology. Packets for
destinations which were previously reached by traversing the failed
component may be dropped or may suffer looping. Traditionally such
disruptions have lasted for periods of at least several seconds, and
most applications have been constructed to tolerate such a quality of
service.
Recent advances in routers have reduced this interval to under a
second for carefully configured networks using link state IGPs.
However, new Internet services are emerging which may be sensitive to
periods of traffic loss which are orders of magnitude shorter than
this.
Addressing these issues is difficult because the distributed nature
of the network imposes an intrinsic limit on the minimum convergence
time which can be achieved.
However, there is an alternative approach, which is to compute backup
routes that allow the failure to be repaired locally by the router(s)
detecting the failure without the immediate need to inform other
routers of the failure. In this case, the disruption time can be
limited to the small time taken to detect the adjacent failure and
invoke the backup routes. This is analogous to the technique
employed by MPLS fast-reroute [RFC4090], but the mechanisms employed
for the backup routes in pure IP networks are necessarily very
different.
This document provides a framework for the development of this
approach.
Note that in order to further minimise the impact on user
applications, it may be necessary to design the network such that
backup paths with suitable characteristics, for example capacity
and/or delay, are available for the algorithms to select. Such
considerations are outside the scope of this document.
3. Scope and applicability
The initial scope of this work is in the context of link state IGPs. The initial scope of this work is in the context of link state IGPs.
Link state protocols provide ubiquitous topology information, which Link state protocols provide ubiquitous topology information, which
facilitates the computation of repair paths. facilitates the computation of repair paths.
Provision of similar facilities in non-link state IGPs and BGP is a Provision of similar facilities in non-link state IGPs and BGP is a
matter for further study, but the correct operation of the repair matter for further study, but the correct operation of the repair
mechanisms for traffic with a destination outside the IGP domain is mechanisms for traffic with a destination outside the IGP domain is
an important consideration for solutions based on this framework. an important consideration for solutions based on this framework.
skipping to change at page 6, line 29 skipping to change at page 5, line 48
scope of this work. scope of this work.
4. Problem Analysis 4. Problem Analysis
The duration of the packet delivery disruption caused by a The duration of the packet delivery disruption caused by a
conventional routing transition is determined by a number of factors: conventional routing transition is determined by a number of factors:
1. The time taken to detect the failure. This may be of the order 1. The time taken to detect the failure. This may be of the order
of a few milliseconds when it can be detected at the physical of a few milliseconds when it can be detected at the physical
layer, up to several tens of seconds when a routing protocol layer, up to several tens of seconds when a routing protocol
Hello is employed. During this period packets will be Hello is employed. During this period, packets will be
unavoidably lost. unavoidably lost.
2. The time taken for the local router to react to the failure. 2. The time taken for the local router to react to the failure.
This will typically involve generating and flooding new routing This will typically involve generating and flooding new routing
updates, perhaps after some hold-down delay, and re-computing the updates, perhaps after some hold-down delay, and re-computing the
router's FIB. router's FIB.
3. The time taken to pass the information about the failure to other 3. The time taken to pass the information about the failure to other
routers in the network. In the absence of routing protocol routers in the network. In the absence of routing protocol
packet loss, this is typically between 10 milliseconds and 100 packet loss, this is typically between 10 milliseconds and 100
milliseconds per hop. milliseconds per hop.
4. The time taken to re-compute the forwarding tables. This is 4. The time taken to re-compute the forwarding tables. This is
typically a few milliseconds for a link state protocol using typically a few milliseconds for a link state protocol using
Dijkstra's algorithm. Dijkstra's algorithm.
5. The time taken to load the revised forwarding tables into the 5. The time taken to load the revised forwarding tables into the
forwarding hardware. This time is very implementation dependant forwarding hardware. This time is very implementation dependent
and also depends on the number of prefixes affected by the and also depends on the number of prefixes affected by the
failure, but may be several hundred milliseconds. failure, but may be several hundred milliseconds.
The disruption will last until the routers adjacent to the failure The disruption will last until the routers adjacent to the failure
have completed steps 1 and 2, and then all the routers in the network have completed steps 1 and 2, and until all the routers in the
whose paths are affected by the failure have completed the remaining network whose paths are affected by the failure have completed the
steps. remaining steps.
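As a rough illustration of how the five factors combine, the sketch below adds them up for the most distant affected router and compares the result with a locally repaired failure. It is a simplification under stated assumptions: the additive model, the function and parameter names, and the sample values are hypothetical (the values are merely picked from within the ranges quoted above, and the 50 ms local reaction time and 5 ms repair invocation time are pure assumptions, since the text gives no figures for them).

```python
def convergence_disruption_ms(detect_ms, react_ms, hops, flood_per_hop_ms,
                              spf_ms, fib_load_ms):
    """Estimate the disruption of a conventional routing transition.

    Steps 1 and 2 run on the router adjacent to the failure; steps 3 to 5
    are taken for a router `hops` hops away from it. Treating the steps as
    strictly sequential is an assumption of this sketch.
    """
    return detect_ms + react_ms + hops * flood_per_hop_ms + spf_ms + fib_load_ms

def local_repair_disruption_ms(detect_ms, invoke_ms):
    """Estimate the disruption when a pre-computed repair path is invoked locally."""
    return detect_ms + invoke_ms

# Assumed sample values, in milliseconds.
conventional = convergence_disruption_ms(
    detect_ms=20,         # fast failure detection
    react_ms=50,          # assumed local reaction time, including hold-down
    hops=5,               # assumed distance of the farthest affected router
    flood_per_hop_ms=50,  # within the quoted 10 to 100 ms per hop
    spf_ms=5,             # "a few milliseconds"
    fib_load_ms=300,      # "several hundred milliseconds"
)
repaired = local_repair_disruption_ms(detect_ms=20, invoke_ms=5)
```

With these assumed inputs the conventional transition is dominated by flooding and FIB loading, whereas the locally repaired case is bounded by the detection time plus the time to invoke the repair.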
The initial packet loss is caused by the router(s) adjacent to the The initial packet loss is caused by the router(s) adjacent to the
failure continuing to attempt to transmit packets across the failure failure continuing to attempt to transmit packets across the failure
until it is detected. This loss is unavoidable, but the detection until it is detected. This loss is unavoidable, but the detection
time can be reduced to a few tens of milliseconds as described in time can be reduced to a few tens of milliseconds as described in
Section 5.1. Section 5.1.
In some topologies subsequent packet loss may be caused by the In some topologies, subsequent packet loss may be caused by the
"micro-loops" which may form as a result of temporary inconsistencies "micro-loops" which may form as a result of temporary inconsistencies
between routers' forwarding tables[I-D.ietf-rtgwg-lf-conv-frmwk]. between routers' forwarding tables [RFC5715]. These inconsistencies
These inconsistencies are caused by steps 3, 4 and 5 above and in are caused by steps 3, 4, and 5 above, and in many routers it is step
many routers it is step 5 which is both the largest factor and which 5 that is both the largest factor and that has the greatest variance
has the greatest variance between routers. The large variance arises between routers. The large variance arises from implementation
from implementation differences and from the differing impact that a differences and from the differing impact that a failure has on each
failure has on each individual router. For example, the number of individual router. For example, the number of prefixes affected by
prefixes affected by the failure may vary dramatically from one the failure may vary dramatically from one router to another.
router to another.
In order to reduce packet disruption times to a duration commensurate In order to reduce packet disruption times to a duration commensurate
with the failure detection times, two mechanisms may be required:- with the failure detection times, two mechanisms may be required:
a. A mechanism for the router(s) adjacent to the failure to rapidly a. A mechanism for the router(s) adjacent to the failure to rapidly
invoke a repair path, which is unaffected by any subsequent re- invoke a repair path, which is unaffected by any subsequent re-
convergence. convergence.
b. In topologies that are susceptible to micro-loops, a micro-loop b. In topologies that are susceptible to micro-loops, a micro-loop
control mechanism may be required[I-D.ietf-rtgwg-lf-conv-frmwk]. control mechanism may be required [RFC5715].
Performing the first task without the second may result in the repair Performing the first task without the second may result in the repair
path being starved of traffic and hence being redundant. Performing path being starved of traffic and hence being redundant. Performing
the second without the first will result in traffic being discarded the second without the first will result in traffic being discarded
by the router(s) adjacent to the failure. by the router(s) adjacent to the failure.
Repair paths may always be used in isolation where the failure is Repair paths may always be used in isolation where the failure is
short-lived. In this case, the repair paths can be kept in place short-lived. In this case, the repair paths can be kept in place
until the failure is repaired in which case there is no need to until the failure is repaired, therefore there is no need to
advertise the failure to other routers. advertise the failure to other routers.
Similarly, micro-loop avoidance may be used in isolation to prevent Similarly, micro-loop avoidance may be used in isolation to prevent
loops arising from pre-planned management action. In that case, the loops arising from pre-planned management action. In that case, the
link or node being shut down can remain in service for a short time link or node being shut down can remain in service for a short time
after its removal has been announced into the network, and hence it after its removal has been announced into the network, and hence it
can function as its own "repair path". can function as its own "repair path".
Note that micro-loops may also occur when a link or node is restored Note that micro-loops may also occur when a link or node is restored
to service and thus a micro-loop avoidance mechanism may be required to service, and thus a micro-loop avoidance mechanism may be required
for both link up and link down cases. for both link up and link down cases.
5. Mechanisms for IP Fast-reroute 5. Mechanisms for IP Fast-Reroute
The set of mechanisms required for an effective solution to the The set of mechanisms required for an effective solution to the
problem can be broken down into the sub-problems described in this problem can be broken down into the sub-problems described in this
section. section.
5.1. Mechanisms for fast failure detection 5.1. Mechanisms for Fast Failure Detection
It is critical that the failure detection time is minimized. A It is critical that the failure detection time is minimized. A
number of well documented approaches are possible, such as: number of well-documented approaches are possible, such as:
1. Physical detection; for example, loss of light. 1. Physical detection; for example, loss of light.
2. Routing protocol independent protocol detection; for example, The 2. Protocol detection that is routing protocol independent; for
Bidirectional Forwarding Detection protocol [I-D.ietf-bfd-base]. example, the Bidirectional Forwarding Detection protocol [BFD].
3. Routing protocol detection; for example, use of "fast Hellos". 3. Routing protocol detection; for example, use of "fast Hellos".
When configuring packet based failure detection mechanisms it is When configuring packet-based failure detection mechanisms it is
important that consideration be given to the likelihood and important that consideration be given to the likelihood and
consequences of false indications of failure. The incidence of false consequences of false indications of failure. The incidence of false
indication of failure may be minimised by appropriately prioritizing indication of failure may be minimized by appropriately prioritizing
of the transmission, reception and processing of the packets used to the transmission, reception, and processing of the packets used to
detect link or node failure. Note that this is not an issue that is detect link or node failure. Note that this is not an issue that is
specific to IPFRR. specific to IPFRR.
5.2. Mechanisms for repair paths 5.2. Mechanisms for Repair Paths
Once a failure has been detected by one of the above mechanisms, Once a failure has been detected by one of the above mechanisms,
traffic which previously traversed the failure is transmitted over traffic that previously traversed the failure is transmitted over one
one or more repair paths. The design of the repair paths should be or more repair paths. The design of the repair paths should be such
such that they can be pre-calculated in anticipation of each local that they can be pre-calculated in anticipation of each local failure
failure and made available for invocation with minimal delay. There and made available for invocation with minimal delay. There are
are three basic categories of repair paths: three basic categories of repair paths:
1. Equal cost multi-paths (ECMP). Where such paths exist, and one 1. Equal cost multi-paths (ECMP). Where such paths exist, and one
or more of the alternate paths do not traverse the failure, they or more of the alternate paths do not traverse the failure, they
may trivially be used as repair paths. may trivially be used as repair paths.
2. Loop free alternate paths. Such a path exists when a direct 2. Loop-free alternate paths. Such a path exists when a direct
neighbor of the router adjacent to the failure has a path to the neighbor of the router adjacent to the failure has a path to the
destination which can be guaranteed not to traverse the failure. destination that can be guaranteed not to traverse the failure.
3. Multi-hop repair paths. When there is no feasible loop free 3. Multi-hop repair paths. When there is no feasible loop-free
alternate path it may still be possible to locate a router, which alternate path it may still be possible to locate a router, which
is more than one hop away from the router adjacent to the is more than one hop away from the router adjacent to the
failure, from which traffic will be forwarded to the destination failure, from which traffic will be forwarded to the destination
without traversing the failure. without traversing the failure.
ECMP and loop free alternate paths (as described in [RFC5286]) offer ECMP and loop-free alternate paths (as described in [RFC5286]) offer
the simplest repair paths and would normally be used when they are the simplest repair paths and would normally be used when they are
available. It is anticipated that around 80% of failures (see available. It is anticipated that around 80% of failures (see
Section 5.2.2) can be repaired using these basic methods alone. Section 5.2.2) can be repaired using these basic methods alone.
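The preference order among the three categories can be sketched as a simple selection procedure for protecting the link between S and a primary neighbor. This is an illustrative sketch only, not a procedure specified by either document version: it covers link protection but not node protection, assumes the Distance_opt values have already been computed by SPF, and uses hypothetical names (classify_repair, links, dist) and result labels.

```python
def classify_repair(s, d, failed, links, dist):
    """Pick the simplest repair category for the failure of Link(s->failed).

    links: {neighbor: metric} for the links of router s.
    dist:  dist[a][b] is Distance_opt(a, b), pre-computed by SPF.
    Returns a (category, neighbor) tuple; neighbor is None when no
    single-hop repair exists or no repair is needed.
    """
    # Primary neighbors are the next hops on a shortest path from s to d.
    primaries = {n for n, m in links.items() if m + dist[n][d] == dist[s][d]}
    if failed not in primaries:
        return ("unaffected", None)  # traffic to d does not use the failed link

    # 1. ECMP: another primary next hop that avoids the failed link.
    others = sorted(primaries - {failed})
    if others:
        return ("ecmp", others[0])

    # 2. Loop-free alternate: a non-primary neighbor whose shortest path
    #    to d does not go back through s.
    for n in sorted(links):
        if n not in primaries and dist[n][d] < dist[n][s] + dist[s][d]:
            return ("lfa", n)

    # 3. Otherwise a multi-hop repair path has to be sought.
    return ("multi-hop", None)

# Assumed distance table for a small hypothetical topology.
dist = {
    "S": {"D": 2},
    "E": {"D": 1, "S": 1},
    "F": {"D": 1, "S": 1},
    "N": {"D": 2, "S": 2},
    "M": {"D": 3, "S": 1},
}
```

For example, with neighbors E, N, and M the failure of the link to E is repaired through the loop-free alternate N; with only E and M no single-hop repair exists; and with E and F as equal cost next hops the remaining ECMP member F is used.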
Multi-hop repair paths are more complex, both in the computations Multi-hop repair paths are more complex, both in the computations
required to determine their existence, and in the mechanisms required required to determine their existence, and in the mechanisms required
to invoke them. They can be further classified as: to invoke them. They can be further classified as:
a. Mechanisms where one or more alternate FIBs are pre-computed in a. Mechanisms where one or more alternate FIBs are pre-computed in
all routers and the repaired packet is instructed to be forwarded all routers, and the repaired packet is instructed to be
using a "repair FIB" by some method of per packet signaling such forwarded using a "repair FIB" by some method of per-packet
as detecting a "U-turn" [I-D.atlas-ip-local-protect-uturn] , signaling such as detecting a "U-turn" [UTURN], [FIFR] or by
[FIFR] or by marking the packet [SIMULA]. marking the packet [SIMULA].
b. Mechanisms functionally equivalent to a loose source route which b. Mechanisms functionally equivalent to a loose source route that
is invoked using the normal FIB. These include tunnels is invoked using the normal FIB. These include tunnels
[I-D.bryant-ipfrr-tunnels], alternative shortest paths [TUNNELS], alternative shortest paths [ALT-SP], and label-based
[I-D.tian-frr-alt-shortest-path] and label based mechanisms. mechanisms.
c. Mechanisms employing special addresses or labels which are c. Mechanisms employing special addresses or labels that are
installed in the FIBs of all routers with routes pre-computed to installed in the FIBs of all routers with routes pre-computed to
avoid certain components of the network. For example avoid certain components of the network. For example, see
[I-D.ietf-rtgwg-ipfrr-notvia-addresses]. [NOTVIA].
In many cases a repair path which reaches two hops away from the In many cases, a repair path that reaches two hops away from the
router detecting the failure will suffice, and it is anticipated that router detecting the failure will suffice, and it is anticipated that
around 98% of failures (see Section 5.2.2) can be repaired by this around 98% of failures (see Section 5.2.2) can be repaired by this
method. However, to provide complete repair coverage some use of method. However, to provide complete repair coverage, some use of
longer multi-hop repair paths is generally necessary. longer multi-hop repair paths is generally necessary.
5.2.1. Scope of repair paths 5.2.1. Scope of Repair Paths
A particular repair path may be valid for all destinations which A particular repair path may be valid for all destinations which
require repair or may only be valid for a subset of destinations. If require repair or may only be valid for a subset of destinations. If
a repair path is valid for a node immediately downstream of the a repair path is valid for a node immediately downstream of the
failure, then it will be valid for all destinations previously failure, then it will be valid for all destinations previously
reachable by traversing the failure. However, in cases where such a reachable by traversing the failure. However, in cases where such a
repair path is difficult to achieve because it requires a high order repair path is difficult to achieve because it requires a high order
multi-hop repair path, it may still be possible to identify lower multi-hop repair path, it may still be possible to identify lower-
order repair paths (possibly even loop free alternate paths) which order repair paths (possibly even loop-free alternate paths) that
allow the majority of destinations to be repaired. When IPFRR is allow the majority of destinations to be repaired. When IPFRR is
unable to provide complete repair, it is desirable that the extent of unable to provide complete repair, it is desirable that the extent of
the repair coverage can be determined and reported via network the repair coverage can be determined and reported via network
management. management.
There is a trade-off to be achieved between minimizing the number of There is a trade-off between minimizing the number of repair paths to
repair paths to be computed, and minimizing the overheads incurred in be computed, and minimizing the overheads incurred in using higher-
using higher order multi-hop repair paths for destinations for which order multi-hop repair paths for destinations for which they are not
they are not strictly necessary. However, the computational cost of strictly necessary. However, the computational cost of determining
determining repair paths on an individual destination basis can be repair paths on an individual destination basis can be very high.
very high.
It will frequently be the case that the majority of destinations may It will frequently be the case that the majority of destinations may
be repaired using only the "basic" repair mechanism, leaving a be repaired using only the "basic" repair mechanism, leaving a
smaller subset of the destinations to be repaired using one of the smaller subset of the destinations to be repaired using one of the
more complex multi-hop methods. Such a hybrid approach may go some more complex multi-hop methods. Such a hybrid approach may go some
way to resolving the conflict between completeness and complexity. way to resolving the conflict between completeness and complexity.
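The hybrid approach can be illustrated with a minimal sketch. It assumes link protection only, a connected topology, and the basic loop-free condition of [RFC5286] (a neighbor N of router S is a loop-free alternate for destination D when Distance(N, D) < Distance(N, S) + Distance(S, D)). The function names and the four-node ring are hypothetical and are not taken from this framework.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra: cost of the shortest path from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def classify_destinations(graph, s, e):
    """Split the destinations that s reaches over link s-e into those that
    have a loop-free alternate neighbor and those that need a more complex
    (multi-hop) repair. Assumes a connected graph."""
    dist = {node: shortest_distances(graph, node) for node in graph}
    basic, multi_hop = [], []
    for d in sorted(graph):
        if d == s or dist[s][d] != graph[s][e] + dist[e][d]:
            continue  # no shortest path from s to d uses the s-e link
        has_lfa = any(
            dist[n][d] < dist[n][s] + dist[s][d]
            for n in graph[s]
            if n != e
        )
        (basic if has_lfa else multi_hop).append(d)
    return basic, multi_hop


# Hypothetical four-node ring with unit link costs.
ring = {
    "S": {"E": 1, "N": 1},
    "E": {"S": 1, "D": 1},
    "D": {"E": 1, "N": 1},
    "N": {"D": 1, "S": 1},
}
basic, multi_hop = classify_destinations(ring, "S", "E")
```

In this ring, destination D can be repaired through neighbor N using only the basic mechanism, whereas destination E has no loop-free alternate and would need one of the multi-hop methods.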
The use of repair paths may result in excessive traffic passing over The use of repair paths may result in excessive traffic passing over
a link, resulting in congestion discard. This reduces the a link, resulting in congestion discard. This reduces the
effectiveness of IPFRR. Mechanisms to influence the distribution of effectiveness of IPFRR. Mechanisms to influence the distribution of
repaired traffic to minimize this effect are therefore desirable. repaired traffic to minimize this effect are therefore desirable.
5.2.2. Analysis of repair coverage 5.2.2. Analysis of Repair Coverage
The repair coverage obtained is dependent on the repair strategy and The repair coverage obtained is dependent on the repair strategy and
highly dependent on the detailed topology and metrics. Estimates of highly dependent on the detailed topology and metrics. Estimates of
the repair coverage quoted in this document are for illustrative the repair coverage quoted in this document are for illustrative
purposes only and may not always be achievable. purposes only and may not always be achievable.
In some cases the repair strategy will permit the repair of all In some cases the repair strategy will permit the repair of all
single link or node failures in the network for all possible single link or node failures in the network for all possible
destinations. This can be defined as 100% coverage. However, where destinations. This can be defined as 100% coverage. However, where
the coverage is less than 100% it is important for the purposes of the coverage is less than 100%, it is important for the purposes of
comparisons between different proposed repair strategies to define comparisons between different proposed repair strategies to define
what is meant by such a percentage. There are four possibilities: what is meant by such a percentage. There are four possibilities:
1. The percentage of links (or nodes) which can be fully protected 1. The percentage of links (or nodes) that can be fully protected
(i.e. for all destinations). This is appropriate where the (i.e., for all destinations). This is appropriate where the
requirement is to protect all traffic, but some percentage of the requirement is to protect all traffic, but some percentage of the
possible failures may be identified as being un-protectable. possible failures may be identified as being un-protectable.
2. The percentage of destinations which can be protected for all 2. The percentage of destinations that can be protected for all link
link (or node) failures. This is appropriate where the (or node) failures. This is appropriate where the requirement is
requirement is to protect against all possible failures, but some to protect against all possible failures, but some percentage of
percentage of destinations may be identified as being un- destinations may be identified as being un-protectable.
protectable.
3. For all destinations (d) and for all failures (f), the percentage 3. For all destinations (d) and for all failures (f), the percentage
of the total potential failure cases (d*f) which are protected. of the total potential failure cases (d*f) that are protected.
This is appropriate where the requirement is an overall "best This is appropriate where the requirement is an overall "best-
effort" protection. effort" protection.
4. The percentage of packets normally passing through the network 4. The percentage of packets normally passing through the network
that will continue to reach their destination. This requires a that will continue to reach their destination. This requires a
traffic matrix for the network as part of the analysis. traffic matrix for the network as part of the analysis.
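The first three definitions can be compared with a short sketch. It assumes the analysis has already produced the set of (failure, destination) pairs that a given strategy protects, and that pairs in which the destination is unaffected by the failure are counted as protected. The function name, the dictionary keys, and the sample data are hypothetical. The fourth definition is omitted because it requires a traffic matrix.

```python
def coverage_metrics(failures, destinations, protected):
    """Return the first three coverage percentages for one repair strategy.

    protected is a set of (failure, destination) pairs that are repaired.
    """
    # 1. Failures that are protected for every destination.
    full_failures = sum(
        all((f, d) in protected for d in destinations) for f in failures
    )
    # 2. Destinations that are protected against every failure.
    full_destinations = sum(
        all((f, d) in protected for f in failures) for d in destinations
    )
    # 3. Protected cases out of all d*f failure cases.
    cases = sum((f, d) in protected for f in failures for d in destinations)
    return {
        "failures_fully_protected": 100.0 * full_failures / len(failures),
        "destinations_fully_protected": 100.0 * full_destinations / len(destinations),
        "failure_cases_protected": 100.0 * cases / (len(failures) * len(destinations)),
    }


# Hypothetical input: two link failures, two destinations, and one
# unprotected case (destination B when link L2 fails).
metrics = coverage_metrics(
    failures=["L1", "L2"],
    destinations=["A", "B"],
    protected={("L1", "A"), ("L1", "B"), ("L2", "A")},
)
```

With this input, the same strategy scores 50% under the first two definitions and 75% under the third, which shows why a quoted coverage figure needs to state the definition in use.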
5.2.3. Link or node repair 5.2.3. Link or Node Repair
A repair path may be computed to protect against failure of an A repair path may be computed to protect against failure of an
adjacent link, or failure of an adjacent node. In general, link adjacent link, or failure of an adjacent node. In general, link
protection is simpler to achieve. A repair which protects against protection is simpler to achieve. A repair which protects against
node failure will also protect against link failure for all node failure will also protect against link failure for all
destinations except those for which the adjacent node is a single destinations except those for which the adjacent node is a single
point of failure. point of failure.
In some cases it may be necessary to distinguish between a link or In some cases, it may be necessary to distinguish between a link or
node failure in order that the optimal repair strategy is invoked. node failure in order that the optimal repair strategy is invoked.
Methods for link/node failure determination may be based on Methods for link/node failure determination may be based on
techniques such as BFD[I-D.ietf-bfd-base]. This determination may be techniques such as BFD [BFD]. This determination may be made prior
made prior to invoking any repairs, but this will increase the period to invoking any repairs, but this will increase the period of packet
of packet loss following a failure unless the determination can be loss following a failure unless the determination can be performed as
performed as part of the failure detection mechanism itself. part of the failure detection mechanism itself. Alternatively, a
Alternatively, a subsequent determination can be used to optimise an subsequent determination can be used to optimize an already invoked
already invoked default strategy. default strategy.
5.2.4. Maintenance of Repair paths 5.2.4. Maintenance of Repair Paths
In order to meet the response time goals, it is expected (though not In order to meet the response-time goals, it is expected (though not
required) that repair paths, and their associated FIB entries, will required) that repair paths, and their associated FIB entries, will
be pre-computed and installed ready for invocation when a failure is be pre-computed and installed ready for invocation when a failure is
detected. Following invocation the repair paths remain in effect detected. Following invocation, the repair paths remain in effect
until they are no longer required. This will normally be when the until they are no longer required. This will normally be when the
routing protocol has re-converged on the new topology taking into routing protocol has re-converged on the new topology taking into
account the failure, and traffic will no longer be using the repair account the failure, and traffic will no longer be using the repair
paths. paths.
The repair paths have the property that they are unaffected by any The repair paths have the property that they are unaffected by any
topology changes resulting from the failure which caused their topology changes resulting from the failure that caused their
instantiation. Therefore there is no need to re-compute them during instantiation. Therefore, there is no need to re-compute them during
the convergence period. They may be affected by an unrelated the convergence period. They may be affected by an unrelated
simultaneous topology change, but such events are out of scope of simultaneous topology change, but such events are out of scope of
this work (see Section 5.2.6). this work (see Section 5.2.6).
Once the routing protocol has re-converged it is necessary for all Once the routing protocol has re-converged, it is necessary for all
repair paths to take account of the new topology. Various repair paths to take account of the new topology. Various
optimizations may permit the efficient identification of repair paths optimizations may permit the efficient identification of repair paths
which are unaffected by the change, and hence do not require full re- that are unaffected by the change, and hence do not require full re-
computation. Since the new repair paths will not be required until computation. Since the new repair paths will not be required until
the next failure occurs, the re-computation may be performed as a the next failure occurs, the re-computation may be performed as a
background task and be subject to a hold-down, but excessive delay in background task and be subject to a hold-down, but excessive delay in
completing this operation will increase the risk of a new failure completing this operation will increase the risk of a new failure
occurring before the repair paths are in place. occurring before the repair paths are in place.
5.2.5. Local Area Networks 5.2.5. Local Area Networks
Protection against partial or complete failure of LANs is more Protection against partial or complete failure of LANs is more
complex than the point to point case. In general there is a trade- complex than the point-to-point case. In general, there is a trade-
off between the simplicity of the repair and the ability to provide off between the simplicity of the repair and the ability to provide
complete and optimal repair coverage. complete and optimal repair coverage.
5.2.6. Multiple failures and Shared Risk Link Groups 5.2.6. Multiple Failures and Shared Risk Link Groups
Complete protection against multiple unrelated failures is out of Complete protection against multiple unrelated failures is out of
scope of this work. However, it is important that the occurrence of scope of this work. However, it is important that the occurrence of
a second failure while one failure is undergoing repair should not a second failure while one failure is undergoing repair should not
result in a level of service which is significantly worse than that result in a level of service which is significantly worse than that
which would have been achieved in the absence of any repair strategy. which would have been achieved in the absence of any repair strategy.
Shared Risk Link Groups (SRLGs) are an example of multiple related Shared Risk Link Groups (SRLGs) are an example of multiple related
failures, and the more complex aspects of their protection is a failures, and the more complex aspects of their protection are a
matter for further study. matter for further study.
One specific example of an SRLG which is clearly within the scope of One specific example of an SRLG that is clearly within the scope of
this work is a node failure. This causes the simultaneous failure of this work is a node failure. This causes the simultaneous failure of
multiple links, but their closely defined topological relationship multiple links, but their closely defined topological relationship
makes the problem more tractable. makes the problem more tractable.
5.3. Mechanisms for micro-loop prevention 5.3. Mechanisms for Micro-Loop Prevention
Ensuring the absence of micro-loops is important not only because Ensuring the absence of micro-loops is important not only because
they can cause packet loss in traffic which is affected by the they can cause packet loss in traffic that is affected by the
failure, but because by saturating a link with looping packets they failure, but because by saturating a link with looping packets micro-
can also cause congestion loss of traffic flowing over that link loops can cause congestion. This congestion can then lead to routers
which would otherwise be unaffected by the failure. discarding traffic that would otherwise be unaffected by the failure.
A number of solutions to the problem of micro-loop formation have A number of solutions to the problem of micro-loop formation have
been proposed and are summarized in [I-D.ietf-rtgwg-lf-conv-frmwk]. been proposed and are summarized in [RFC5715]. The following factors
The following factors are significant in their classification: are significant in their classification:
1. Partial or complete protection against micro-loops. 1. Partial or complete protection against micro-loops.
2. Delay imposed upon convergence. 2. Convergence delay.
3. Tolerance of multiple failures (from node failures, and in 3. Tolerance of multiple failures (from node failures, and in
general). general).
4. Computational complexity (pre-computed or real time). 4. Computational complexity (pre-computed or real time).
5. Applicability to scheduled events. 5. Applicability to scheduled events.
6. Applicability to link/node reinstatement. 6. Applicability to link/node reinstatement.
skipping to change at page 13, line 26 skipping to change at page 12, line 42
6. Management Considerations 6. Management Considerations
While many of the management requirements will be specific to While many of the management requirements will be specific to
particular IPFRR solutions, the following general aspects need to be particular IPFRR solutions, the following general aspects need to be
addressed: addressed:
1. Configuration 1. Configuration
A. Enabling/disabling IPFRR support. A. Enabling/disabling IPFRR support.
B. Enabling/disabling protection on a per link/node basis. B. Enabling/disabling protection on a per-link or per-node
basis.
C. Expressing preferences regarding the links/nodes used for C. Expressing preferences regarding the links/nodes used for
repair paths. repair paths.
D. Configuration of failure detection mechanisms. D. Configuration of failure detection mechanisms.
E. Configuration of loop avoidance strategies E. Configuration of loop-avoidance strategies
2. Monitoring and operational support 2. Monitoring and operational support
A. Notification of links/nodes/destinations which cannot be A. Notification of links/nodes/destinations that cannot be
protected. protected.
B. Notification of pre-computed repair paths, and anticipated B. Notification of pre-computed repair paths, and anticipated
traffic patterns. traffic patterns.
C. Counts of failure detections, protection invocations and C. Counts of failure detections, protection invocations, and
packets forwarded over repair paths. packets forwarded over repair paths.
D. Testing repairs. D. Testing repairs.
7. IANA Considerations 7. Security Considerations
There are no IANA considerations that arise from this framework
document.
8. Security Considerations
This framework document does not itself introduce any security This framework document does not itself introduce any security
issues, but attention must be paid to the security implications of issues, but attention must be paid to the security implications of
any proposed solutions to the problem. any proposed solutions to the problem.
Where the chosen solution uses tunnels it is necessary to ensure that Where the chosen solution uses tunnels it is necessary to ensure that
the tunnel is not used as an attack vector. One method of addressing the tunnel is not used as an attack vector. One method of addressing
this is to use a set of tunnel endpoint addresses that are excluded this is to use a set of tunnel endpoint addresses that are excluded
from use by user traffic. from use by user traffic.
There is a compatibility issue between IPFRR and reverse path There is a compatibility issue between IPFRR and reverse path
forwarding (RPF) checking. Many of the solutions described in this forwarding (RPF) checking. Many of the solutions described in this
document result in traffic arriving from a direction inconsistent document result in traffic arriving from a direction inconsistent
with a standard RPF check. When a network relies on RPF checking for with a standard RPF check. When a network relies on RPF checking for
security purposes, an alternative security mechanism will need to be security purposes, an alternative security mechanism will need to be
deployed in order to permit IPFRR to be used. deployed in order to permit IPFRR to be used.
Because the repair path will often be of a different length to the Because the repair path will often be of a different length than the
pre-failure path, security mechanisms which rely on specific TTL pre-failure path, security mechanisms that rely on specific Time to
values will be adversely affected. Live (TTL) values will be adversely affected.
9. Acknowledgements 8. Acknowledgements
The authors would like to acknowledge contributions made by Alia The authors would like to acknowledge contributions made by Alia
Atlas, Clarence Filsfils, Pierre Francois, Joel Halpern, Stefano Atlas, Clarence Filsfils, Pierre Francois, Joel Halpern, Stefano
Previdi and Alex Zinin. Previdi, and Alex Zinin.
10. Informative References
[FIFR] Nelakuditi, S., Lee, S., Lu, Y., Zhang, Z., and C. Chuah,
"Fast local rerouting for handling transient link
failures."", Tech. Rep. TR-2004-004, 2004.
[I-D.atlas-ip-local-protect-uturn]
Atlas, A., "U-turn Alternates for IP/LDP Fast-Reroute",
draft-atlas-ip-local-protect-uturn-03 (work in progress),
March 2006.
[I-D.bryant-ipfrr-tunnels] 9. Informative References
Bryant, S., Filsfils, C., Previdi, S., and M. Shand, "IP
Fast Reroute using tunnels", draft-bryant-ipfrr-tunnels-03
(work in progress), November 2007.
[I-D.ietf-bfd-base] [ALT-SP] Tian, A., "Fast Reroute using Alternative Shortest Paths",
Katz, D. and D. Ward, "Bidirectional Forwarding Work in Progress, July 2004.
Detection", draft-ietf-bfd-base-09 (work in progress),
February 2009.
[I-D.ietf-rtgwg-ipfrr-notvia-addresses] [BFD] Katz, D. and D. Ward, "Bidirectional Forwarding
Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute Detection", Work in Progress, January 2010.
Using Not-via Addresses",
draft-ietf-rtgwg-ipfrr-notvia-addresses-04 (work in
progress), July 2009.
[I-D.ietf-rtgwg-lf-conv-frmwk] [FIFR] Nelakuditi, S., Lee, S., Lu, Y., Zhang, Z., and C. Chuah,
Shand, M. and S. Bryant, "A Framework for Loop-free "Fast Local Rerouting for Handling Transient Link
Convergence", draft-ietf-rtgwg-lf-conv-frmwk-07 (work in Failures", IEEE/ACM Transactions on Networking, Vol. 15,
progress), October 2009. No. 2, DOI 10.1109/TNET.2007.892851, available
from http://www.ieeexplore.ieee.org, April 2007.
[I-D.tian-frr-alt-shortest-path] [NOTVIA] Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute
Tian, A., "Fast Reroute using Alternative Shortest Paths", Using Not-via Addresses", Work in Progress, July 2009.
draft-tian-frr-alt-shortest-path-01 (work in progress),
July 2004.
[RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute [RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
Extensions to RSVP-TE for LSP Tunnels", RFC 4090, Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
May 2005. May 2005.
[RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast [RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast
Reroute: Loop-Free Alternates", RFC 5286, September 2008. Reroute: Loop-Free Alternates", RFC 5286, September 2008.
[SIMULA] Lysne, O., Kvalbein, A., Cicic, T., Gjessing, S., and A. [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free
Hansen, "Fast IP Network Recovery using Multiple Routing Convergence", RFC 5715, January 2010.
Configurations."", Infocom 10.1109/INFOCOM.2006.227, 2006,
<http://folk.uio.no/amundk/infocom06.pdf>. [SIMULA] Kvalbein, A., Hansen, A., Cicic, T., Gjessing, S., and O.
Lysne, "Fast IP Network Recovery using Multiple Routing
Configurations", Infocom 10.1109/INFOCOM.2006.227,
available from http://www.ieeexplore.ieee.org, April 2006.
[TUNNELS] Bryant, S., Filsfils, C., Previdi, S., and M. Shand, "IP
Fast Reroute using tunnels", Work in Progress,
November 2007.
[UTURN] Atlas, A., "U-turn Alternates for IP/LDP Fast-Reroute",
Work in Progress, February 2006.
Authors' Addresses Authors' Addresses
Mike Shand Mike Shand
Cisco Systems Cisco Systems
250, Longwater Avenue. 250, Longwater Avenue.
Reading, Berks RG2 6GB Reading, Berks RG2 6GB
UK UK
Email: mshand@cisco.com EMail: mshand@cisco.com
Stewart Bryant Stewart Bryant
Cisco Systems Cisco Systems
250, Longwater Avenue. 250, Longwater Avenue.
Reading, Berks RG2 6GB Reading, Berks RG2 6GB
UK UK
Email: stbryant@cisco.com EMail: stbryant@cisco.com
 End of changes. 100 change blocks. 
262 lines changed or deleted 238 lines changed or added
