draft-ietf-rtgwg-ipfrr-framework-01.txt   draft-ietf-rtgwg-ipfrr-framework-02.txt 
Network Working Group M. Shand Network Working Group M. Shand
Internet Draft Internet Draft
Expiration Date: Dec 2004 Cisco Systems Expiration Date: April 2005 Cisco Systems
June 2004 October 2004
IP Fast Reroute Framework IP Fast Reroute Framework
draft-ietf-rtgwg-ipfrr-framework-01.txt draft-ietf-rtgwg-ipfrr-framework-02.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, I certify that any applicable By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed, patent or other IPR claims of which I am aware have been disclosed,
or will be disclosed, and any of which I become aware will be or will be disclosed, and any of which I become aware will be
disclosed, in accordance with RFC 3668. disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
skipping to change at page 1, line 36 skipping to change at page 1, line 35
material or to cite them other than as "work in progress". material or to cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
This document provides a framework for the development of IP fast re- This document provides a framework for the development of IP
route mechanisms which provide protection against link or router fast-reroute mechanisms which provide protection against link or
failure by invoking locally determined repair paths. Unlike MPLS router failure by invoking locally determined repair paths. Unlike
Fast-reroute, the mechanisms are applicable to a network employing MPLS Fast-reroute, the mechanisms are applicable to a network
conventional IP routing and forwarding. An essential part of such employing conventional IP routing and forwarding. An essential part
mechanisms is the prevention of packet loss caused by the loops which of such mechanisms is the prevention of packet loss caused by the
normally occur during the re-convergence of the network following a loops which normally occur during the re-convergence of the network
failure. following a failure.
Terminology
This section defines words and acronyms used in this draft and other
drafts discussing IP Fast-reroute.
D Used to denote the destination router under
discussion.
Distance_opt(A,B) The distance of the shortest path from A
to B.
Downstream Path This is a subset of the loop-free alternates
where the neighbor N meet the following
condition:-
Distance_opt(N, D) < Distance_opt(S,D)
E Used to denote the router which is the
primary next-hop neighbor to get from S to
the destination D. Where there is an ECMP set
for the shortest path from S to D, these are
referred to as E_1, E_2, etc.
ECMP Equal cost multi-path: Where, for a
particular destination D, multiple primary
next-hops are used to forward traffic because
there exist multiple shortest paths from S
via different output layer-3 interfaces.
FIB Forwarding Information Base. The database
used by the packet forwarder to determine
what actions to perform on a packet.
IPFRR IP fast-reroute
Link(A->B) A link connecting router A to router B.
Loop-Free This is a neighbor N, that is not a primary
Alternate next-hop neighbor E, whose shortest path to
the destination D does not go back through
the router S. The neighbor N must meet the
following condition:-
Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D)
Loop-Free A neighbor N_i, which is not the particular
Neighbor primary neighbor E_k under discussion, and
whose shortest path to D does not traverse S.
For example, if there are two primary
neighbors E_1 and E_2, E_1 is a loop-free
neighbor with regard to E_2 and vice versa.
Loop-Free This is a path via a Loop-Free Neighbor N_i
Link-Protecting which does not go through the particular link
Alternate of S which is being protected to reach the
destination D.
Loop-Free This is a path via a Loop-Free Neighbor N_i
Node-Protecting which does not go through the particular
Alternate primary neighbor of S which is being
protected to reach the destination D.
Micro-loop A temporary forwarding loop which exists
during a routing transition as a result of
temporary inconsistencies between FIBs.
N_i The ith neighbor of S.
Primary Neighbor A neighbor N_i of S which is one of the next
hops for destination D in S's FIB prior to
any failure.
R_i_j The jth neighbor of N_i.
Routing The process whereby routers converge on a new
transition topology. In conventional networks this
process frequently causes some disruption to
packet delivery.
RPF Reverse Path Forwarding. I.e. checking that a
packet is received over the interface which
would be used to send packets addressed to
the source address of the packet.
S Used to denote a router that is the source of
a repair that is computed in anticipation of
the failure of a neighboring router denoted
as E, or of the link between S and E. It is
the viewpoint from which IP Fast-Reroute is
described.
S_i The set of neighbors of E, in addition to S,
which will independently take the role of S
for the traffic they carry.
SPF Shortest Path First, e.g. Dijkstra's
algorithm.
SPT Shortest path tree
Upstream This is a forwarding loop which involves a
Forwarding Loop set of routers, none of which are directly
connected to the link which has caused the
topology change that triggered a new SPF in
any of the routers.
1. Introduction 1. Introduction
When a link or node failure occurs in a routed network, there is When a link or node failure occurs in a routed network, there is
inevitably a period of disruption to the delivery of traffic until inevitably a period of disruption to the delivery of traffic until
the network re-converges on the new topology. Packets for the network re-converges on the new topology. Packets for
destinations which were previously reached by traversing the failed destinations which were previously reached by traversing the failed
component may be dropped or may suffer looping. Traditionally such component may be dropped or may suffer looping. Traditionally such
disruptions have lasted for periods of at least several seconds, and disruptions have lasted for periods of at least several seconds, and
most applications have been constructed to tolerate such a quality of most applications have been constructed to tolerate such a quality of
skipping to change at page 2, line 32 skipping to change at page 4, line 32
Addressing these issues is difficult because the distributed nature Addressing these issues is difficult because the distributed nature
of the network imposes an intrinsic limit on the minimum convergence of the network imposes an intrinsic limit on the minimum convergence
time which can be achieved. time which can be achieved.
However, there is an alternative approach, which is to compute backup However, there is an alternative approach, which is to compute backup
routes that allow the failure to be repaired locally by the router(s) routes that allow the failure to be repaired locally by the router(s)
detecting the failure without the immediate need to inform other detecting the failure without the immediate need to inform other
routers of the failure. In this case, the disruption time can be routers of the failure. In this case, the disruption time can be
limited to the small time taken to detect the adjacent failure and limited to the small time taken to detect the adjacent failure and
invoke the backup routes. This is analogous to the technique employed invoke the backup routes. This is analogous to the technique employed
by MPLS Fast Reroute [MPLSFRR], but the mechanisms employed for the by MPLS Fast-Reroute [MPLSFRR], but the mechanisms employed for the
backup routes in pure IP networks are necessarily very different. backup routes in pure IP networks are necessarily very different.
This document provides a framework for the development of this This document provides a framework for the development of this
approach. approach.
2. Problem Analysis 2. Problem Analysis
The duration of the packet delivery disruption caused by a The duration of the packet delivery disruption caused by a
conventional routing transition is determined by a number of factors: conventional routing transition is determined by a number of factors:
skipping to change at page 3, line 44 skipping to change at page 5, line 44
router to another. router to another.
In order to achieve packet disruption times which are commensurate In order to achieve packet disruption times which are commensurate
with the failure detection times it is necessary to perform two with the failure detection times it is necessary to perform two
distinct tasks: distinct tasks:
1. Provide a mechanism for the router(s) adjacent to the failure to 1. Provide a mechanism for the router(s) adjacent to the failure to
rapidly invoke a repair path, which is unaffected by any rapidly invoke a repair path, which is unaffected by any
subsequent re-convergence. subsequent re-convergence.
2. Provide a mechanism to prevent the effects of micro loops during 2. Provide a mechanism to prevent the effects of micro-loops during
subsequent re-convergence. subsequent re-convergence.
Performing the first task without the second will result in the Performing the first task without the second will result in the
repair path being starved of traffic and hence being redundant. repair path being starved of traffic and hence being redundant.
Performing the second without the first will result in traffic being Performing the second without the first will result in traffic being
discarded by the router(s) adjacent to the failure. Both tasks are discarded by the router(s) adjacent to the failure. Both tasks are
necessary for an effective solution to the problem. necessary for an effective solution to the problem.
However, repair paths can be used in isolation where the failure is However, repair paths can be used in isolation where the failure is
short-lived. The repair paths can be kept in place until the failure short-lived. The repair paths can be kept in place until the failure
is repaired and there is no need to advertise the failure to other is repaired and there is no need to advertise the failure to other
routers. routers.
Similarly, micro loop avoidance can be used in isolation to prevent Similarly, micro-loop avoidance can be used in isolation to prevent
loops arising from pre-planned management action. loops arising from pre-planned management action.
Note that micro-loops can also occur when a link or node is restored Note that micro-loops can also occur when a link or node is restored
to service and thus a micro-loop avoidance mechanism is required for to service and thus a micro-loop avoidance mechanism is required for
both link up and link down cases. both link up and link down cases.
3. Mechanisms for IP Fast-route 3. Mechanisms for IP Fast-reroute
The set of mechanisms required for an effective solution to the The set of mechanisms required for an effective solution to the
problem can be broken down into the following sub-problems. problem can be broken down into the following sub-problems.
3.1. Mechanisms for fast failure detection 3.1. Mechanisms for fast failure detection
It is critical that the failure detection time is minimized. A number It is critical that the failure detection time is minimized. A number
of approaches are possible, such as: of approaches are possible, such as:
1. Physical detection; for example, loss of light. 1. Physical detection; for example, loss of light.
skipping to change at page 4, line 38 skipping to change at page 6, line 38
3.2. Mechanisms for repair paths 3.2. Mechanisms for repair paths
Once a failure has been detected by one of the above mechanisms, Once a failure has been detected by one of the above mechanisms,
traffic which previously traversed the failure is transmitted over traffic which previously traversed the failure is transmitted over
one or more repair paths. The design of the repair paths should be one or more repair paths. The design of the repair paths should be
such that they can be pre-calculated in anticipation of each local such that they can be pre-calculated in anticipation of each local
failure and made available for invocation with minimal delay. There failure and made available for invocation with minimal delay. There
are three basic categories of repair paths: are three basic categories of repair paths:
1. Equal cost multiple paths (ECMP). Where such paths exist, and 1. Equal cost multi-paths (ECMP). Where such paths exist, and one
one or more of the alternate paths do not traverse the failure, or more of the alternate paths do not traverse the failure, they
they may trivially be used as repair paths. may trivially be used as repair paths.
2. Downstream paths. (Also known as "loop free feasible 2. Loop free alternate paths. Such a path exists when a direct
alternates".) Such a path exists when a direct neighbor of the neighbor of the router adjacent to the failure has a path to the
router adjacent to the failure has a path to the destination destination which can be guaranteed not to traverse the failure.
which can be guaranteed not to traverse the failure.
3. Multihop repair paths. When there is no feasible downstream path 3. Multi-hop repair paths. When there is no feasible downstream
it may still be possible to locate a router, which is more than path it may still be possible to locate a router, which is more
one hop away from the router adjacent to the failure, from which than one hop away from the router adjacent to the failure, from
traffic will be forwarded to the destination without traversing which traffic will be forwarded to the destination without
the failure. traversing the failure.
ECMP and downstream paths offer the simplest repair paths and would ECMP and loop free alternate paths (as described in [BASE]) offer the
normally be used when they are available. It is anticipated that simplest repair paths and would normally be used when they are
around 80% of failures (see section 3.2.2) can be repaired using available. It is anticipated that around 80% of failures (see section
these alone. 3.2.2) can be repaired using these alone.
Multi-hop repair paths are considerably more complex, both in the Multi-hop repair paths are considerably more complex, both in the
computations required to determine their existence, and in the computations required to determine their existence, and in the
mechanisms required to invoke them. They can be further classified mechanisms required to invoke them. They can be further classified
as: as:
1. Mechanisms where one or more alternate FIBs are pre-computed in 1. Mechanisms where one or more alternate FIBs are pre-computed in
all routers and the repaired packet is instructed to be all routers and the repaired packet is instructed to be
forwarded using a "repair FIB" by some method of signaling such forwarded using a "repair FIB" by some method of signaling such
as detecting a "U-turn" [U-TURNS] or marking the packet. as detecting a "U-turn" [U-TURNS] or marking the packet.
2. Mechanisms functionally equivalent to a loose source route which 2. Mechanisms functionally equivalent to a loose source route which
is invoked using the normal FIB. These include tunnels [TUNNELS] is invoked using the normal FIB. These include tunnels [TUNNELS]
and label based mechanisms. and label based mechanisms.
In many cases a repair path which reaches two-hops away from the In many cases a repair path which reaches two hops away from the
router detecting the failure will suffice, and it is anticipated that router detecting the failure will suffice, and it is anticipated that
around 98% of failures (see section 3.2.2) can be repaired by this around 98% of failures (see section 3.2.2) can be repaired by this
method. However, to provide complete repair coverage some use of method. However, to provide complete repair coverage some use of
longer multi-hop repair paths is generally necessary. longer multi-hop repair paths is generally necessary.
3.2.1. Scope of repair paths 3.2.1. Scope of repair paths
A particular repair path may be valid for all destinations which A particular repair path may be valid for all destinations which
require repair or may only be valid for a subset of destinations. If require repair or may only be valid for a subset of destinations. If
a repair path is valid for a node immediately downstream of the a repair path is valid for a node immediately downstream of the
failure, then it will be valid for all destinations previously failure, then it will be valid for all destinations previously
reachable by traversing the failure. However, in cases where such a reachable by traversing the failure. However, in cases where such a
repair path is difficult to achieve because it requires a high order repair path is difficult to achieve because it requires a high order
multi-hop repair path, it may still be possible to identify lower multi-hop repair path, it may still be possible to identify lower
order repair paths (possibly even downstream paths) which allow the order repair paths (possibly even loop free alternate paths) which
majority of destinations to be repaired. When IPFRR is unable to allow the majority of destinations to be repaired. When IPFRR is
provide complete repair, it is desirable that the extent of the unable to provide complete repair, it is desirable that the extent of
repair coverage can be determined and reported via network the repair coverage can be determined and reported via network
management. management.
There is a tradeoff to be achieved between minimizing the number of There is a tradeoff to be achieved between minimizing the number of
repair paths to be computed, and minimizing the overheads incurred in repair paths to be computed, and minimizing the overheads incurred in
using higher order multi-hop repair paths for destinations for which using higher order multi-hop repair paths for destinations for which
they are not strictly necessary. However, the computational cost of they are not strictly necessary. However, the computational cost of
determining repair paths on an individual destination basis can be determining repair paths on an individual destination basis can be
very high. very high.
The use of repair paths may result in excessive traffic passing over The use of repair paths may result in excessive traffic passing over
skipping to change at page 6, line 22 skipping to change at page 8, line 16
what is meant by such a percentage. There are three possibilities: what is meant by such a percentage. There are three possibilities:
1. The percentage of links (or nodes) which can be fully protected 1. The percentage of links (or nodes) which can be fully protected
for all destinations. This is appropriate where the requirement for all destinations. This is appropriate where the requirement
is to protect all traffic, but some percentage of the possible is to protect all traffic, but some percentage of the possible
failures may be identified as being un-protectable. failures may be identified as being un-protectable.
2. The percentage of destinations which can be fully protected for 2. The percentage of destinations which can be fully protected for
all link (or node) failures. This is appropriate where the all link (or node) failures. This is appropriate where the
requirement is to protect against all possible failures, but requirement is to protect against all possible failures, but
some percentage of destinations may be identified as being un- some percentage of destinations may be identified as being
protectable. un-protectable.
3. For all destinations (d) and for all failures (f), the 3. For all destinations (d) and for all failures (f), the
percentage of the total potential failure cases (d*f) which are percentage of the total potential failure cases (d*f) which are
protected. This is appropriate where the requirement is an protected. This is appropriate where the requirement is an
overall "best effort" protection. overall "best effort" protection.
The coverage obtained is dependent on the repair strategy and highly The coverage obtained is dependent on the repair strategy and highly
dependent on the detailed topology and metrics. Any figures quoted in dependent on the detailed topology and metrics. Any figures quoted in
this document are for illustrative purposes only. this document are for illustrative purposes only.
skipping to change at page 7, line 21 skipping to change at page 9, line 17
The repair paths have the property that they are unaffected by any The repair paths have the property that they are unaffected by any
topology changes resulting from the failure which caused their topology changes resulting from the failure which caused their
instantiation. Therefore there is no need to re-compute them during instantiation. Therefore there is no need to re-compute them during
the convergence period. They may be affected by an unrelated the convergence period. They may be affected by an unrelated
simultaneous topology change, but such events are out of scope of simultaneous topology change, but such events are out of scope of
this work (see section 3.2.5). this work (see section 3.2.5).
Once the routing protocol has re-converged it is necessary for all Once the routing protocol has re-converged it is necessary for all
repair paths to take account of the new topology. Various repair paths to take account of the new topology. Various
optimizations may permit the efficient identification of repair paths optimizations may permit the efficient identification of repair paths
which are unaffected by the change, and hence do not require full re- which are unaffected by the change, and hence do not require full
computation. Since the new repair paths will not be required until re-computation. Since the new repair paths will not be required until
the next failure occurs, the re-computation may be performed as a the next failure occurs, the re-computation may be performed as a
background task and be subject to a hold-down, but excessive delay in background task and be subject to a hold-down, but excessive delay in
completing this operation will increase the risk of a new failure completing this operation will increase the risk of a new failure
occurring before the repair paths are in place. occurring before the repair paths are in place.
3.2.5. Multiple failures and Shared Risk Groups 3.2.5. Multiple failures and Shared Risk Groups
Complete protection against multiple unrelated failures is out of Complete protection against multiple unrelated failures is out of
scope of this work. However, it is important that the occurrence of a scope of this work. However, it is important that the occurrence of a
second failure while one failure is undergoing repair should not second failure while one failure is undergoing repair should not
skipping to change at page 7, line 53 skipping to change at page 9, line 49
3.3. Mechanisms for micro-loop prevention 3.3. Mechanisms for micro-loop prevention
Control of micro-loops is important not only because they can cause Control of micro-loops is important not only because they can cause
packet loss in traffic which is affected by the failure, but because packet loss in traffic which is affected by the failure, but because
by saturating a link with looping packets they can also cause by saturating a link with looping packets they can also cause
congestion loss of traffic flowing over that link which would congestion loss of traffic flowing over that link which would
otherwise be unaffected by the failure. otherwise be unaffected by the failure.
A number of solutions to the problem of micro-loop formation have A number of solutions to the problem of micro-loop formation have
been proposed. The following factors are significant in their been proposed [MICROLOOP, ZININ]. The following factors are
classification: significant in their classification:
1. Partial or complete protection against micro-loops. 1. Partial or complete protection against micro-loops.
2. Delay imposed upon convergence. 2. Delay imposed upon convergence.
3. Tolerance of multiple failures (from node failures, and in 3. Tolerance of multiple failures (from node failures, and in
general). general).
4. Computational complexity (pre-computed or real time). 4. Computational complexity (pre-computed or real time).
skipping to change at page 9, line 24 skipping to change at page 11, line 22
This framework document does not itself introduce any security This framework document does not itself introduce any security
issues, but attention must be paid to the security implications of issues, but attention must be paid to the security implications of
any proposed solutions to the problem. any proposed solutions to the problem.
8. IPR Disclosure Acknowledgement 8. IPR Disclosure Acknowledgement
Certain IPR may be applicable to the mechanisms outlined in this Certain IPR may be applicable to the mechanisms outlined in this
document. Please check the detailed specifications for possible IPR document. Please check the detailed specifications for possible IPR
notices. notices.
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
9. Normative References 9. Normative References
Internet-drafts are works in progress available from Internet-drafts are works in progress available from
http://www.ietf.org/internet-drafts/ http://www.ietf.org/internet-drafts/
10. Informative References 10. Informative References
Internet-drafts are works in progress available from Internet-drafts are works in progress available from
http://www.ietf.org/internet-drafts/ http://www.ietf.org/internet-drafts/
BFD Katz, D., and Ward, D., "Bidirectional Forwarding BASE Atlas, A., "Basic Specification for IP
Detection", draft-katz-ward-bfd-02.txt, (work in Fast-Reroute: Loop-free Alternates",
draft-ietf-rtgwg-ipfrr-spec-base-01.txt,
(work in progress)
BFD Katz, D. and Ward, D., "Bidirectional
Forwarding Detection",
draft-ietf-bfd-base-00.txt, (work in
progress). progress).
MPLSFRR Pan, P. et al, "Fast Reroute Extensions to RSVP- MPLSFRR Pan, P. et al, "Fast Reroute Extensions to
TE for LSP Tunnels", RSVP-TE for LSP Tunnels",
draft-ietf-mpls-rsvp-lsp-fastreroute-05.txt draft-ietf-mpls-rsvp-lsp-fastreroute-07.txt
MICROLOOP Bryant, S. and Shand, M., "A Framework for
Loop-free Convergence",
draft-bryant-shand-lf-conv-frmwk-00.txt,
(work in progress).
TUNNELS Bryant, S. et al, "IP Fast Reroute using TUNNELS Bryant, S. et al, "IP Fast Reroute using
tunnels", draft-bryant-ipfrr-tunnels-00.txt, tunnels", draft-bryant-ipfrr-tunnels-01.txt,
(work in progress). (work in progress).
U-TURNS Atlas, A. et al, "IP/LDP Local Protection", U-TURNS Atlas, A. et al, "IP/LDP Local Protection",
draft-atlas-ip-local-protect-00.txt, (work in draft-atlas-ip-local-protect-01.txt, (work in
progress). progress).
ZININ Zinin, A., "Analysis and Minimization of
Microloops in Link-state Routing Protocols",
draft-zinin-microloop-analysis-00.txt, (work
in progress).
11. Author's Address 11. Author's Address
Mike Shand Mike Shand
Cisco Systems, Cisco Systems,
250, Longwater Avenue, 250, Longwater Avenue,
Green Park, Green Park,
Reading, RG2 6GB, Reading, RG2 6GB,
United Kingdom. Email: mshand@cisco.com United Kingdom. Email: mshand@cisco.com
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/