draft-ietf-rtgwg-ipfrr-framework-00.txt   draft-ietf-rtgwg-ipfrr-framework-01.txt 
Network Working Group                                           M. Shand
Internet Draft
Expiration Date: Dec 2004                                  Cisco Systems
June 2004
IP Fast Reroute Framework
draft-ietf-rtgwg-ipfrr-framework-01.txt
Status of this Memo
By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed,
or will be disclosed, and any of which I become aware will be
disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
[...]
This document provides a framework for the development of IP fast
re-route mechanisms which provide protection against link or router
failure by invoking locally determined repair paths. Unlike MPLS
Fast-reroute, the mechanisms are applicable to a network employing
conventional IP routing and forwarding. An essential part of such
mechanisms is the prevention of packet loss caused by the loops which
normally occur during the re-convergence of the network following a
failure.
Terminology
This section defines words, acronyms, and actions used in this draft.
1. Introduction
When a link or node failure occurs in a routed network, there is
inevitably a period of disruption to the delivery of traffic until
the network re-converges on the new topology. Packets for
destinations which were previously reached by traversing the failed
component may be dropped or may suffer looping. Traditionally such
disruptions have lasted for periods of at least several seconds, and
most applications have been constructed to tolerate such a quality of
service.
[...]
The duration of the packet delivery disruption caused by a
conventional routing transition is determined by a number of factors:
1. The time taken to detect the failure. This may be of the order
of a few ms when it can be detected at the physical layer, up to
several tens of seconds when a routing protocol hello is
employed. During this period packets will be unavoidably lost.
2. The time taken for the local router to react to the failure.
This will typically involve generating and flooding new routing
updates, perhaps after some hold-down delay, and re-computing
the router's FIB.
3. The time taken to pass the information about the failure to
other routers in the network. In the absence of routing protocol
packet loss, this is typically between 10 ms and 100 ms per hop.
4. The time taken to re-compute the forwarding tables. This is
typically a few ms for a link state protocol using Dijkstra's
algorithm.
5. The time taken to load the revised forwarding tables into the
forwarding hardware. This time is very implementation dependent
and also depends on the number of prefixes affected by the
failure, but may be several hundred ms.
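The following sketch shows roughly how the five components above add up for one failure. It assumes a simple additive model in which the components happen strictly one after another and the per-hop flooding delay is multiplied by the hop count. The class `ConvergenceTimes`, the function `disruption_ms`, and all numeric values are hypothetical. Most values are chosen from the ranges quoted above. The local reaction time is not quantified in the text, so its value is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceTimes:
    """Hypothetical per-component delays for one failure, in milliseconds."""
    detection: float       # 1. detecting the failure
    local_reaction: float  # 2. local reaction, including any hold-down
    flood_per_hop: float   # 3. passing the update on, per hop
    spf: float             # 4. re-computing the forwarding tables
    fib_load: float        # 5. loading the revised FIB into hardware


def disruption_ms(t: ConvergenceTimes, hops: int) -> float:
    """Rough disruption seen by a router `hops` hops from the failure.

    Assumes the five components happen strictly one after another,
    which is a simplification; real routers may overlap some of them.
    """
    return (t.detection + t.local_reaction + t.flood_per_hop * hops
            + t.spf + t.fib_load)


if __name__ == "__main__":
    # Physical-layer detection versus a slow routing protocol hello.
    fast = ConvergenceTimes(detection=5, local_reaction=50,
                            flood_per_hop=10, spf=5, fib_load=300)
    slow = ConvergenceTimes(detection=30_000, local_reaction=50,
                            flood_per_hop=100, spf=5, fib_load=300)
    print(disruption_ms(fast, hops=4))  # 400
    print(disruption_ms(slow, hops=4))  # 30755
```

With fast detection, the hardware FIB load term dominates this example. With hello-based detection, the detection time dominates everything else.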
[...]
3. Mechanisms for IP Fast-reroute
The set of mechanisms required for an effective solution to the
problem can be broken down into the following sub-problems.
3.1. Mechanisms for fast failure detection
It is critical that the failure detection time is minimized. A number
of approaches are possible, such as:
1. Physical detection; for example, loss of light.
2. Routing protocol independent protocol detection; for example,
the Bidirectional Forwarding Detection protocol [BFD].
3. Routing protocol detection; for example, use of "fast hellos".
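Both routing protocol "fast hellos" and protocols such as BFD detect a failure in the same basic way. A neighbor is declared down once no hello has been received for a hold time. This sketch assumes the hold time is a multiple of the hello interval. The class and parameter names are hypothetical, and the sketch does not reproduce the actual state machine or timer negotiation of BFD or of any IGP.

```python
from typing import Optional


class HelloDetector:
    """Hello-based neighbor failure detection (illustrative only).

    The neighbor is declared down when no hello has arrived within a
    hold time of `interval_ms * multiplier`.
    """

    def __init__(self, interval_ms: float, multiplier: int) -> None:
        self.hold_ms = interval_ms * multiplier
        self.last_hello_ms: Optional[float] = None

    def hello_received(self, now_ms: float) -> None:
        self.last_hello_ms = now_ms

    def is_down(self, now_ms: float) -> bool:
        if self.last_hello_ms is None:
            return True  # never heard from the neighbor
        return now_ms - self.last_hello_ms > self.hold_ms


if __name__ == "__main__":
    slow = HelloDetector(interval_ms=10_000, multiplier=3)  # 30 s hold time
    fast = HelloDetector(interval_ms=50, multiplier=3)      # 150 ms hold time
    for detector in (slow, fast):
        detector.hello_received(now_ms=0)
    # One second after the last hello, only the fast detector has noticed.
    print(slow.is_down(now_ms=1_000), fast.is_down(now_ms=1_000))  # False True
```

In this model the worst-case detection time is the hold time. Shortening the hello interval therefore reduces detection time proportionally, at the cost of more hello traffic.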
3.2. Mechanisms for repair paths
Once a failure has been detected by one of the above mechanisms,
traffic which previously traversed the failure is transmitted over
one or more repair paths. The design of the repair paths should be
such that they can be pre-calculated in anticipation of each local
failure and made available for invocation with minimal delay. There
are three basic categories of repair paths:
1. Equal cost multiple paths (ECMP). Where such paths exist, and
one or more of the alternate paths do not traverse the failure,
they may trivially be used as repair paths.
2. Downstream paths. (Also known as "loop free feasible
alternates".) Such a path exists when a direct neighbor of the
router adjacent to the failure has a path to the destination
which can be guaranteed not to traverse the failure.
3. Multihop repair paths. When there is no feasible downstream path
it may still be possible to locate a router, which is more than
one hop away from the router adjacent to the failure, from which
traffic will be forwarded to the destination without traversing
the failure.
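The first two categories can be tested from link state distances alone. For a downstream path, this sketch uses the loop-free inequality later standardized for loop-free alternates in RFC 5286: a neighbor N of router S is loop-free for destination D if dist(N, D) < dist(N, S) + dist(S, D). This means N's own shortest path to D cannot lead back through S. An ECMP neighbor is one whose link cost plus its distance to D equals S's own distance to D. The topology, metrics, and function names are hypothetical, and the sketch is not a complete alternate-selection algorithm.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, int]]  # router -> {neighbor: metric}; symmetric


def dijkstra(graph: Graph, src: str) -> Dict[str, float]:
    """Shortest-path distance from src to every router."""
    dist: Dict[str, float] = {n: float("inf") for n in graph}
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def repair_options(graph: Graph, s: str, d: str) -> Dict[str, List[str]]:
    """Split S's neighbors into equal-cost next hops and loop-free alternates for D."""
    dist = {n: dijkstra(graph, n) for n in graph}  # all pairs; fine for a sketch
    ecmp, lfa = [], []
    for n, cost in graph[s].items():
        if cost + dist[n][d] == dist[s][d]:
            ecmp.append(n)  # N lies on one of S's shortest paths to D
        elif dist[n][d] < dist[n][s] + dist[s][d]:
            lfa.append(n)   # N's shortest path to D cannot lead back through S
    return {"ecmp": sorted(ecmp), "lfa": sorted(lfa)}


if __name__ == "__main__":
    graph = {
        "S": {"E": 1, "F": 1, "N": 1, "M": 1},
        "E": {"S": 1, "D": 1, "M": 1},
        "F": {"S": 1, "D": 1},
        "N": {"S": 1, "D": 2},
        "M": {"S": 1, "E": 1},
        "D": {"E": 1, "F": 1, "N": 2},
    }
    print(repair_options(graph, "S", "D"))
    # {'ecmp': ['E', 'F'], 'lfa': ['M', 'N']}
```

Suppose link S-E fails. F then remains as an equal-cost repair, and M and N are downstream repairs. The inequality only guarantees that the failed link is avoided, not the failed node. M's own shortest path to D still runs through E.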
ECMP and downstream paths offer the simplest repair paths and would
normally be used when they are available. It is anticipated that
around 80% of failures (see section 3.2.2) can be repaired using
these alone.
Multi-hop repair paths are considerably more complex, both in the
computations required to determine their existence, and in the
mechanisms required to invoke them. They can be further classified
as:
1. Mechanisms where one or more alternate FIBs are pre-computed in
all routers and the repaired packet is forwarded using a "repair
FIB", selected by some method of signaling such as detecting a
"U-turn" [U-TURNS] or marking the packet.
2. Mechanisms functionally equivalent to a loose source route which
is invoked using the normal FIB. These include tunnels [TUNNELS]
and label based mechanisms.
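For the loose-source-route class, the repairing router needs a tunnel endpoint from which normal forwarding will deliver the packet without using the failure. This sketch assumes shortest-path forwarding over symmetric metrics and considers link protection only. Router R qualifies as an endpoint for a failure of link S-E if no shortest path from S to R, and no shortest path from R to D, uses that link. Equal-cost ties through the link are treated as unsafe. This selection rule and all names are assumptions of the sketch, not the method defined in [TUNNELS].

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, int]]  # router -> {neighbor: metric}; symmetric


def dijkstra(graph: Graph, src: str) -> Dict[str, float]:
    """Shortest-path distance from src to every router."""
    dist: Dict[str, float] = {n: float("inf") for n in graph}
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def uses_link(dist: Dict[str, Dict[str, float]], x: str, y: str,
              a: str, b: str, cost: int) -> bool:
    """True if link a-b, in either direction, is on some shortest path x -> y."""
    return (dist[x][a] + cost + dist[b][y] == dist[x][y]
            or dist[x][b] + cost + dist[a][y] == dist[x][y])


def tunnel_endpoints(graph: Graph, s: str, e: str, d: str) -> List[str]:
    """Routers that S could tunnel to, to repair failure of link S-E for D."""
    dist = {n: dijkstra(graph, n) for n in graph}
    cost = graph[s][e]
    return sorted(
        r for r in graph
        if r != s
        and not uses_link(dist, s, r, s, e, cost)   # tunnel S -> R avoids S-E
        and not uses_link(dist, r, d, s, e, cost)   # R -> D avoids S-E
    )


if __name__ == "__main__":
    # Six-router ring with unit metrics: S-E-D-C-B-A-S. S has no
    # loop-free neighbor for D once S-E fails, because A has an
    # equal-cost path back through S.
    ring = ["S", "E", "D", "C", "B", "A"]
    graph: Graph = {n: {} for n in ring}
    for i, n in enumerate(ring):
        m = ring[(i + 1) % len(ring)]
        graph[n][m] = graph[m][n] = 1
    print(tunnel_endpoints(graph, "S", "E", "D"))  # ['B']
```

In this ring the only usable endpoint is B, two hops from S. This matches the observation that a repair reaching two hops away often suffices when no downstream path exists.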
In many cases a repair path which reaches two hops away from the
router detecting the failure will suffice, and it is anticipated that
around 98% of failures (see section 3.2.2) can be repaired by this
method. However, to provide complete repair coverage some use of
longer multi-hop repair paths is generally necessary.
3.2.1. Scope of repair paths
A particular repair path may be valid for all destinations which
require repair or may only be valid for a subset of destinations. If
a repair path is valid for a node immediately downstream of the
failure, then it will be valid for all destinations previously
reachable by traversing the failure. However, in cases where such a
repair path is difficult to achieve because it requires a high order
multi-hop repair path, it may still be possible to identify lower
[...]
using higher order multi-hop repair paths for destinations for which
they are not strictly necessary. However, the computational cost of
determining repair paths on an individual destination basis can be
very high.
The use of repair paths may result in excessive traffic passing over
a link, resulting in congestion discard. This reduces the
effectiveness of IPFRR. Mechanisms to influence the distribution of
repaired traffic to minimize this effect are therefore desirable.
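A planning tool could flag the congestion risk described above with simple arithmetic. For each link on a repair path, add the traffic that the repair would divert to the link's existing load and compare the total with the link's capacity. This sketch does only that check. The link names, loads, capacities, units, and the strict "exceeds capacity" threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # directed link (from, to)


def overloaded_links(load: Dict[Link, float], capacity: Dict[Link, float],
                     repair_path: List[str], diverted: float) -> List[Link]:
    """Links on repair_path whose load would exceed capacity after repair.

    load, capacity and diverted share one unit, e.g. Mbit/s.
    """
    result = []
    for hop in zip(repair_path, repair_path[1:]):
        if load.get(hop, 0.0) + diverted > capacity[hop]:
            result.append(hop)
    return result


if __name__ == "__main__":
    capacity = {("S", "N"): 1000.0, ("N", "D"): 1000.0}
    load = {("S", "N"): 200.0, ("N", "D"): 700.0}
    # 400 Mbit/s previously carried over the failed link S-E.
    print(overloaded_links(load, capacity, ["S", "N", "D"], 400.0))
    # [('N', 'D')]
```

A mechanism that influences the distribution of repaired traffic might use a check like this to spread diverted traffic over several repair paths.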
3.2.2. Analysis of repair coverage
In some cases the repair strategy will permit the repair of all
single link or node failures in the network for all possible
destinations. This can be defined as 100% coverage. However, where
the coverage is less than 100% it is important for the purposes of
comparisons between different proposed repair strategies to define
what is meant by such a percentage. There are three possibilities:
1. The percentage of links (or nodes) which can be fully protected
for all destinations. This is appropriate where the requirement
is to protect all traffic, but some percentage of the possible
failures may be identified as being un-protectable.
2. The percentage of destinations which can be fully protected for
all link (or node) failures. This is appropriate where the
requirement is to protect against all possible failures, but
some percentage of destinations may be identified as being un-
protectable.
3. For all destinations (d) and for all failures (f), the
percentage of the total potential failure cases (d*f) which are
protected. This is appropriate where the requirement is an
overall "best effort" protection.
The coverage obtained is dependent on the repair strategy and highly
dependent on the detailed topology and metrics. Any figures quoted in
this document are for illustrative purposes only.
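The three definitions above can give very different figures for the same network. This sketch computes all three from a matrix in which protected[f][d] records whether failure f is repaired for destination d. It assumes that (f, d) pairs not affected by f are recorded as protected. The function name and the example matrix are hypothetical.

```python
from typing import Dict, List


def coverage(protected: List[List[bool]]) -> Dict[str, float]:
    """The three coverage measures above, as fractions between 0 and 1.

    protected[f][d] is True if failure f is repaired for destination d.
    Assumption: (f, d) pairs not affected by f are recorded as True.
    """
    n_f, n_d = len(protected), len(protected[0])
    return {
        # 1. failures protected for every destination
        "failures": sum(all(row) for row in protected) / n_f,
        # 2. destinations protected against every failure
        "destinations": sum(all(row[d] for row in protected)
                            for d in range(n_d)) / n_d,
        # 3. protected (f, d) cases out of all f * d cases
        "cases": sum(sum(row) for row in protected) / (n_f * n_d),
    }


if __name__ == "__main__":
    protected = [
        [True, True, True],   # failure 0: every destination repaired
        [True, False, True],  # failure 1: destination 1 not repaired
    ]
    print({k: f"{v:.0%}" for k, v in coverage(protected).items()})
    # {'failures': '50%', 'destinations': '67%', 'cases': '83%'}
```

One unprotected case yields 50%, 67%, or 83% depending on the definition used. For this reason, any quoted coverage percentage should state which measure it refers to.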
3.2.3. Link or node repair
A repair path may be computed to protect against failure of an
adjacent link, or failure of an adjacent node. In general, link
protection is simpler to achieve. A repair which protects against
node failure will also protect against link failure for all
destinations except those for which the adjacent node is a single
point of failure.
In some cases it may be necessary to distinguish between a link or
node failure in order that the optimal repair strategy is invoked.
Methods for link/node failure determination may be based on
techniques such as BFD. This determination may be made prior to
invoking any repairs, but this will increase the period of packet
loss following a failure unless the determination can be performed as
part of the failure detection mechanism itself. Alternatively, a
subsequent determination can be used to optimise an already invoked
default strategy.
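One way to tell whether a neighbor's repair also survives failure of the adjacent node, not just the link, is the node-protecting inequality later given for loop-free alternates in RFC 5286. Neighbor N protects S's traffic for destination D against failure of primary next hop E if dist(N, D) < dist(N, E) + dist(E, D), combined with the basic loop-free test. This sketch applies both tests to a hypothetical topology. The names and metrics are illustrative.

```python
import heapq
from typing import Dict

Graph = Dict[str, Dict[str, int]]  # router -> {neighbor: metric}; symmetric


def dijkstra(graph: Graph, src: str) -> Dict[str, float]:
    """Shortest-path distance from src to every router."""
    dist: Dict[str, float] = {n: float("inf") for n in graph}
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def classify_alternates(graph: Graph, s: str, e: str, d: str) -> Dict[str, str]:
    """Classify S's other neighbors as repairs for D when next hop E fails."""
    dist = {n: dijkstra(graph, n) for n in graph}
    result: Dict[str, str] = {}
    for n in sorted(graph[s]):
        if n == e:
            continue  # the protected next hop itself
        loop_free = dist[n][d] < dist[n][s] + dist[s][d]  # avoids S (and link S-E)
        avoids_e = dist[n][d] < dist[n][e] + dist[e][d]   # avoids node E
        if loop_free and avoids_e:
            result[n] = "node-protecting"
        elif loop_free:
            result[n] = "link-protecting only"
        else:
            result[n] = "not loop-free"
    return result


if __name__ == "__main__":
    graph = {
        "S": {"E": 1, "N": 1, "M": 1},
        "E": {"S": 1, "D": 1, "M": 1},
        "N": {"S": 1, "D": 2},
        "M": {"S": 1, "E": 1},
        "D": {"E": 1, "N": 2},
    }
    print(classify_alternates(graph, "S", "E", "D"))
    # {'M': 'link-protecting only', 'N': 'node-protecting'}
```

If S cannot tell whether the link or node E has failed, it could start with N, which is safe in both cases. A later determination that only the link failed would also permit M.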
3.2.4. Maintenance of Repair paths
In order to meet the response time goals, it is expected (though not
required) that repair paths, and their associated FIB entries, will
be pre-computed and installed ready for invocation when a failure is
detected. Following invocation the repair paths remain in effect
until they are no longer required. This will normally be when the
routing protocol has re-converged on the new topology taking into
account the failure, and traffic will no longer be using the repair
paths.
The repair paths have the property that they are unaffected by any
topology changes resulting from the failure which caused their
instantiation. Therefore there is no need to re-compute them during
the convergence period. They may be affected by an unrelated
simultaneous topology change, but such events are out of scope of
this work (see section 3.2.5).
Once the routing protocol has re-converged it is necessary for all
repair paths to take account of the new topology. Various
optimizations may permit the efficient identification of repair paths
which are unaffected by the change, and hence do not require full re-
computation. Since the new repair paths will not be required until
the next failure occurs, the re-computation may be performed as a
background task and be subject to a hold-down, but excessive delay in
completing this operation will increase the risk of a new failure
occurring before the repair paths are in place.
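The repair path life cycle described above has four steps: pre-compute, invoke on detection, withdraw after re-convergence, then recompute in the background after a hold-down. This sketch models that life cycle as a small state holder. The class, its methods, and the hold-down handling are hypothetical. The repair computation itself is a caller-supplied function that returns a table mapping each protected link to its repair.

```python
from typing import Callable, Dict, Optional, Set

RepairTable = Dict[str, str]  # protected link -> repair next hop or endpoint


class RepairManager:
    def __init__(self, compute: Callable[[], RepairTable],
                 hold_down_s: float) -> None:
        self.compute = compute
        self.hold_down_s = hold_down_s
        self.table: RepairTable = compute()  # pre-computed before any failure
        self.active: Set[str] = set()        # failed links currently repaired
        self.recompute_at: Optional[float] = None

    def on_failure(self, link: str) -> Optional[str]:
        """Invoke the pre-computed repair; nothing is computed on this path."""
        repair = self.table.get(link)
        if repair is not None:
            self.active.add(link)
        return repair

    def on_converged(self, now_s: float) -> None:
        """Routing has re-converged: withdraw repairs, schedule a recompute."""
        self.active.clear()
        self.recompute_at = now_s + self.hold_down_s

    def tick(self, now_s: float) -> None:
        """Background task: recompute repair paths once the hold-down expires."""
        if self.recompute_at is not None and now_s >= self.recompute_at:
            self.table = self.compute()
            self.recompute_at = None


if __name__ == "__main__":
    # Hypothetical tables before and after the topology change.
    versions = iter([{"S-E": "N"}, {"S-N": "M"}])
    mgr = RepairManager(lambda: next(versions), hold_down_s=2.0)
    print(mgr.on_failure("S-E"))  # N
    mgr.on_converged(now_s=10.0)
    mgr.tick(now_s=11.0)          # still within the hold-down
    print(mgr.table)              # {'S-E': 'N'}
    mgr.tick(now_s=12.0)
    print(mgr.table)              # {'S-N': 'M'}
```

A longer hold-down reduces the recomputation load. It also widens the window in which a new failure would find no repair paths in place, which is the trade-off noted above.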
3.2.5. Multiple failures and Shared Risk Groups
Complete protection against multiple unrelated failures is out of
scope of this work. However, it is important that the occurrence of a
second failure while one failure is undergoing repair should not
result in a level of service which is significantly worse than that
which would have been achieved in the absence of any repair strategy.
Shared Risk Groups are an example of multiple related failures, and
their protection is a matter for further study.
One specific example of a Shared Risk Link Group (SRLG) which is
clearly within the scope of this work is a node failure. This causes
the simultaneous failure of multiple links, but their closely defined
topological relationship makes the problem more tractable.
3.3. Mechanisms for micro-loop prevention
Control of micro-loops is important not only because they can cause
packet loss in traffic which is affected by the failure, but because
by saturating a link with looping packets they can also cause
congestion loss of traffic flowing over that link which would
otherwise be unaffected by the failure.
A number of solutions to the problem of micro-loop formation have
been proposed. The following factors are significant in their
classification:
1. Partial or complete protection against micro-loops.
2. Delay imposed upon convergence.
3. Tolerance of multiple failures (from node failures, and in
general).
4. Computational complexity (pre-computed or real time).
5. Applicability to scheduled events.
6. Applicability to link/node reinstatement.
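A micro-loop arises during re-convergence when some routers still forward using their old FIB and others already use their new one. Following the next hops for one destination through such a mixture can revisit a router. This sketch walks a hypothetical per-router next-hop map and returns the looping routers if a loop exists. It illustrates the problem only and is not one of the proposed prevention mechanisms.

```python
from typing import Dict, List, Optional


def find_loop(next_hop: Dict[str, Optional[str]],
              start: str) -> Optional[List[str]]:
    """Follow next hops from start; return the routers in a loop, or None.

    next_hop maps each router to its next hop for one destination; the
    destination itself (or a router with no route) maps to None.
    """
    seen: Dict[str, int] = {}
    path: List[str] = []
    node: Optional[str] = start
    while node is not None:
        if node in seen:
            return path[seen[node]:]  # routers forming the loop
        seen[node] = len(path)
        path.append(node)
        node = next_hop.get(node)
    return None


if __name__ == "__main__":
    # Destination D. Before link S-E fails, A forwards via S and S via E.
    old = {"A": "S", "S": "E", "E": "D", "D": None}
    # After the failure, S forwards via A and A goes direct to D.
    new = {"A": "D", "S": "A", "E": "D", "D": None}
    mixed = dict(old, S=new["S"])  # S has updated its FIB, A has not
    print(find_loop(mixed, "S"))   # ['S', 'A']
    print(find_loop(new, "S"))     # None
```

The loop lasts only until A installs its new FIB. Until then, traffic between S and A circulates on that link and can congest it, as described above.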
4. Management Considerations
While many of the management requirements will be specific to
particular IPFRR solutions, the following general aspects need to be
addressed:
1. Configuration
a. Enabling/disabling IPFRR support.
b. Enabling/disabling protection on a per link/node basis.
c. Expressing preferences regarding the links/nodes used for
repair paths.
d. Configuration of failure detection mechanisms.
e. Configuration of loop avoidance strategies.
2. Monitoring
a. Notification of links/nodes/destinations which cannot be
protected.
b. Notification of pre-computed repair paths, and anticipated
traffic patterns.
c. Counts of failure detections, protection invocations and
packets forwarded over repair paths.
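As an illustration only, some of the configuration and monitoring items above might map onto data structures like the following. The comments give the list item each field corresponds to. Every class, field, and function name is hypothetical. Real implementations would expose these through their own management interfaces, which this framework does not define.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IpfrrConfig:
    enabled: bool = False                                    # 1a
    protected_links: Set[str] = field(default_factory=set)  # 1b
    avoid_for_repair: Set[str] = field(default_factory=set)  # 1c
    detection: str = "physical"                              # 1d
    loop_avoidance: str = "none"                             # 1e


@dataclass
class IpfrrCounters:
    failures_detected: int = 0                               # 2c
    repairs_invoked: int = 0                                 # 2c
    packets_repaired: int = 0  # 2c, updated by the forwarding path (not shown)
    unprotectable: List[str] = field(default_factory=list)   # 2a


def record_failure(cfg: IpfrrConfig, stats: IpfrrCounters, link: str,
                   repair_available: bool) -> bool:
    """Count a detected failure; return True if a repair is invoked."""
    stats.failures_detected += 1
    if not (cfg.enabled and link in cfg.protected_links):
        return False  # protection not configured for this link
    if not repair_available:
        stats.unprotectable.append(link)  # for operator notification
        return False
    stats.repairs_invoked += 1
    return True


if __name__ == "__main__":
    cfg = IpfrrConfig(enabled=True, protected_links={"S-E", "S-N"})
    stats = IpfrrCounters()
    record_failure(cfg, stats, "S-E", repair_available=True)
    record_failure(cfg, stats, "S-N", repair_available=False)
    print(stats.failures_detected, stats.repairs_invoked, stats.unprotectable)
    # 2 1 ['S-N']
```

For brevity, unprotectable links are recorded here at failure time. An implementation would more likely report them when repair paths are pre-computed, before any failure occurs.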
5. Scope and applicability
Link state protocols provide ubiquitous topology information, which
facilitates the computation of repair paths. Therefore the initial
scope of this work is in the context of link state IGPs.
Provision of similar facilities in non-link state IGPs and BGP is a
matter for further study, but the correct operation of the repair
mechanisms for traffic with a destination outside the IGP domain is
an important consideration for solutions based on this framework.
6. IANA considerations
There are no IANA considerations that arise from this framework
document.
7. Security Considerations
This framework document does not itself introduce any security
issues, but attention must be paid to the security implications of
any proposed solutions to the problem.
8. IPR Disclosure Acknowledgement
Certain IPR may be applicable to the mechanisms outlined in this
document. Please check the detailed specifications for possible IPR
notices.
9. Normative References
Internet-drafts are works in progress available from
http://www.ietf.org/internet-drafts/
10. Informative References
Internet-drafts are works in progress available from
http://www.ietf.org/internet-drafts/
BFD      Katz, D., and Ward, D., "Bidirectional Forwarding
         Detection", draft-katz-ward-bfd-02.txt, (work in
         progress).
MPLSFRR  Pan, P. et al, "Fast Reroute Extensions to RSVP-TE for
         LSP Tunnels", draft-ietf-mpls-rsvp-lsp-fastreroute-05.txt,
         (work in progress).
TUNNELS  Bryant, S. et al, "IP Fast Reroute using
         tunnels", draft-bryant-ipfrr-tunnels-00.txt,
         (work in progress).
U-TURNS  Atlas, A. et al, "IP/LDP Local Protection",
         draft-atlas-ip-local-protect-00.txt, (work in
         progress).
11. Author's Address
Mike Shand
Cisco Systems,
250, Longwater Avenue,
Green Park,
Reading, RG2 6GB,
United Kingdom.
Email: mshand@cisco.com
Full copyright statement
Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.