draft-ietf-rtgwg-ipfrr-framework-03.txt   draft-ietf-rtgwg-ipfrr-framework-04.txt 
Network Working Group M. Shand Network Working Group M. Shand
Internet Draft S. Bryant Internet Draft S. Bryant
Expiration Date: December 2005 Cisco Systems Expiration Date: April 2006 Cisco Systems
June 2005 October 2005
IP Fast Reroute Framework IP Fast Reroute Framework
draft-ietf-rtgwg-ipfrr-framework-03.txt draft-ietf-rtgwg-ipfrr-framework-04.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
skipping to change at page 6, line 6 skipping to change at page 6, line 6
Performing the second without the first will result in traffic being Performing the second without the first will result in traffic being
discarded by the router(s) adjacent to the failure. Both tasks are discarded by the router(s) adjacent to the failure. Both tasks are
necessary for an effective solution to the problem. necessary for an effective solution to the problem.
However, repair paths can be used in isolation where the failure is However, repair paths can be used in isolation where the failure is
short-lived. The repair paths can be kept in place until the failure short-lived. The repair paths can be kept in place until the failure
is repaired and there is no need to advertise the failure to other is repaired and there is no need to advertise the failure to other
routers. routers.
Similarly, micro-loop avoidance can be used in isolation to prevent Similarly, micro-loop avoidance can be used in isolation to prevent
loops arising from pre-planned management action. loops arising from pre-planned management action, because the link or
node being shut down can remain in service for a short time after its
removal has been announced into the network, and hence it can
function as its own "repair path".
Note that micro-loops can also occur when a link or node is restored Note that micro-loops can also occur when a link or node is restored
to service and thus a micro-loop avoidance mechanism is required for to service and thus a micro-loop avoidance mechanism is required for
both link up and link down cases. both link up and link down cases.
3. Mechanisms for IP Fast-reroute 3. Mechanisms for IP Fast-reroute
The set of mechanisms required for an effective solution to the The set of mechanisms required for an effective solution to the
problem can be broken down into the following sub-problems. problem can be broken down into the following sub-problems.
skipping to change at page 6, line 46 skipping to change at page 6, line 49
are three basic categories of repair paths: are three basic categories of repair paths:
1. Equal cost multi-paths (ECMP). Where such paths exist, and one 1. Equal cost multi-paths (ECMP). Where such paths exist, and one
or more of the alternate paths do not traverse the failure, they or more of the alternate paths do not traverse the failure, they
may trivially be used as repair paths. may trivially be used as repair paths.
2. Loop free alternate paths. Such a path exists when a direct 2. Loop free alternate paths. Such a path exists when a direct
neighbor of the router adjacent to the failure has a path to the neighbor of the router adjacent to the failure has a path to the
destination which can be guaranteed not to traverse the failure. destination which can be guaranteed not to traverse the failure.
3. Multi-hop repair paths. When there is no feasible downstream 3. Multi-hop repair paths. When there is no feasible loop free
path it may still be possible to locate a router, which is more alternate path it may still be possible to locate a router,
than one hop away from the router adjacent to the failure, from which is more than one hop away from the router adjacent to the
which traffic will be forwarded to the destination without failure, from which traffic will be forwarded to the destination
traversing the failure. without traversing the failure.
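The first two categories can be illustrated with a short Python sketch. It classifies one neighbor N of a router S as an ECMP or a loop-free alternate repair toward destination D. The loop-free test is the inequality dist(N,D) < dist(N,S) + dist(S,D), which is the basic loop-free condition in [BASE]. The sketch assumes link protection only, symmetric metrics, and distances computed on the pre-failure topology. The graph format and the helper names (build_graph, spf, repair_category) are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, int]]  # router -> {neighbor: link metric}


def build_graph(edges: List[Tuple[str, str, int]]) -> Graph:
    """Build a symmetric (undirected) graph from (a, b, metric) tuples."""
    graph: Graph = {}
    for a, b, metric in edges:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def spf(graph: Graph, root: str) -> Dict[str, int]:
    """Plain Dijkstra: shortest-path distance from root to every reachable router."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, metric in graph[u].items():
            if d + metric < dist.get(v, float("inf")):
                dist[v] = d + metric
                heapq.heappush(heap, (d + metric, v))
    return dist


def repair_category(graph: Graph, s: str, n: str, d: str) -> Optional[str]:
    """Classify neighbor n of s as a link-protecting repair toward d.

    n must not be the neighbor across the failed link. Returns "ecmp",
    "lfa", or None when only a multi-hop repair path could help.
    """
    dist_s, dist_n = spf(graph, s), spf(graph, n)
    if graph[s][n] + dist_n[d] == dist_s[d]:
        return "ecmp"  # n is already an equal-cost next hop of s toward d
    if dist_n[d] < dist_n[s] + dist_s[d]:
        return "lfa"  # n's shortest path to d does not lead back through s
    return None
```

A neighbor for which repair_category returns None falls into the third category: any repair through it would have to be a multi-hop repair path.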
ECMP and loop free alternate paths (as described in [BASE]) offer the ECMP and loop free alternate paths (as described in [BASE]) offer the
simplest repair paths and would normally be used when they are simplest repair paths and would normally be used when they are
available. It is anticipated that around 80% of failures (see section available. It is anticipated that around 80% of failures (see section
3.2.2) can be repaired using these alone. 3.2.2) can be repaired using these basic methods alone.
Multi-hop repair paths are considerably more complex, both in the Multi-hop repair paths are more complex, both in the computations
computations required to determine their existence, and in the required to determine their existence, and in the mechanisms required
mechanisms required to invoke them. They can be further classified to invoke them. They can be further classified as:
as:
1. Mechanisms where one or more alternate FIBs are pre-computed in 1. Mechanisms where one or more alternate FIBs are pre-computed in
all routers and the repaired packet is instructed to be all routers and the repaired packet is instructed to be
forwarded using a "repair FIB" by some method of signaling such forwarded using a "repair FIB" by some method of per packet
as detecting a "U-turn" [U-TURNS] or marking the packet. signaling such as detecting a "U-turn" [U-TURNS, FIFR] or by
marking the packet.
2. Mechanisms functionally equivalent to a loose source route which 2. Mechanisms functionally equivalent to a loose source route which
is invoked using the normal FIB. These include tunnels [TUNNELS] is invoked using the normal FIB. These include tunnels
and label based mechanisms. [TUNNELS], alternative shortest paths [ALT-SP] and label based
mechanisms.
3. Mechanisms employing special addresses or labels which are 3. Mechanisms employing special addresses or labels which are
installed in the FIBs of all routers with routes pre-computed to installed in the FIBs of all routers with routes pre-computed to
avoid certain components of the network. For example [NOT-VIA]. avoid certain components of the network. For example [NOT-VIA].
In many cases a repair path which reaches two hops away from the In many cases a repair path which reaches two hops away from the
router detecting the failure will suffice, and it is anticipated that router detecting the failure will suffice, and it is anticipated that
around 98% of failures (see section 3.2.2) can be repaired by this around 98% of failures (see section 3.2.2) can be repaired by this
method. However, to provide complete repair coverage some use of method. However, to provide complete repair coverage some use of
longer multi-hop repair paths is generally necessary. longer multi-hop repair paths is generally necessary.
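The two-hop case can be illustrated with a hypothetical check, loosely in the spirit of tunnel-based repair [TUNNELS]. It looks for routers within two hops of S that meet two conditions: S can reach them without using the failed link S-E, and their shortest path to D does not return through S. This is a sketch under simplifying assumptions (symmetric metrics, link protection only, all distances taken from the pre-failure topology). The criterion and the function names are illustrative and are not taken from any of the referenced drafts.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]  # router -> {neighbor: link metric}


def build_graph(edges: List[Tuple[str, str, int]]) -> Graph:
    """Build a symmetric (undirected) graph from (a, b, metric) tuples."""
    graph: Graph = {}
    for a, b, metric in edges:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def spf(graph: Graph, root: str) -> Dict[str, int]:
    """Plain Dijkstra: shortest-path distance from root to every reachable router."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, metric in graph[u].items():
            if d + metric < dist.get(v, float("inf")):
                dist[v] = d + metric
                heapq.heappush(heap, (d + metric, v))
    return dist


def two_hop_release_points(graph: Graph, s: str, e: str, d: str) -> List[str]:
    """Routers within two hops of s that could terminate a tunnelled repair
    of link s-e for destination d (hypothetical criterion)."""
    dist = {x: spf(graph, x) for x in graph}
    # Candidates: neighbors of s plus neighbors of those neighbors, excluding s.
    candidates = set(graph[s]) | {m for n in graph[s] for m in graph[n]}
    candidates.discard(s)
    points = []
    for r in sorted(candidates):
        # s's own shortest path(s) to r must not start over the failed link.
        reachable = dist[s][r] < graph[s][e] + dist[e][r]
        # r's shortest path(s) to d must not lead back through s (and so over s-e).
        exits_cleanly = dist[r][d] < dist[r][s] + dist[s][d]
        if reachable and exits_cleanly:
            points.append(r)
    return points
```

As an example, take a five-router ring S-E-X-Y-N-S with unit metrics. S has no loop-free alternate toward E, because its only other neighbor, N, routes to E back through S. The router Y, two hops away on the other side of the ring, passes both checks.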
skipping to change at page 7, line 51 skipping to change at page 8, line 5
the repair coverage can be determined and reported via network the repair coverage can be determined and reported via network
management. management.
There is a tradeoff to be achieved between minimizing the number of There is a tradeoff to be achieved between minimizing the number of
repair paths to be computed, and minimizing the overheads incurred in repair paths to be computed, and minimizing the overheads incurred in
using higher order multi-hop repair paths for destinations for which using higher order multi-hop repair paths for destinations for which
they are not strictly necessary. However, the computational cost of they are not strictly necessary. However, the computational cost of
determining repair paths on an individual destination basis can be determining repair paths on an individual destination basis can be
very high. very high.
It will frequently be the case that the majority of destinations can It will frequently be the case that the majority of destinations may
be repaired using only the "basic" repair mechanism, leaving a be repaired using only the "basic" repair mechanism, leaving a
smaller subset of the destinations to be repaired using one of the smaller subset of the destinations to be repaired using one of the
more complex multi-hop methods. Such a hybrid approach may go some more complex multi-hop methods. Such a hybrid approach may go some
way to resolving the conflict between completeness and complexity. way to resolving the conflict between completeness and complexity.
The use of repair paths may result in excessive traffic passing over The use of repair paths may result in excessive traffic passing over
a link, resulting in congestion discard. This reduces the a link, resulting in congestion discard. This reduces the
effectiveness of IPFRR. Mechanisms to influence the distribution of effectiveness of IPFRR. Mechanisms to influence the distribution of
repaired traffic to minimize this effect are therefore desirable. repaired traffic to minimize this effect are therefore desirable.
3.2.2. Analysis of repair coverage 3.2.2. Analysis of repair coverage
In some cases the repair strategy will permit the repair of all In some cases the repair strategy will permit the repair of all
single link or node failures in the network for all possible single link or node failures in the network for all possible
destinations. This can be defined as 100% coverage. However, where destinations. This can be defined as 100% coverage. However, where
the coverage is less than 100% it is important for the purposes of the coverage is less than 100% it is important for the purposes of
comparisons between different proposed repair strategies to define comparisons between different proposed repair strategies to define
what is meant by such a percentage. There are three possibilities: what is meant by such a percentage. There are four possibilities:
1. The percentage of links (or nodes) which can be fully protected 1. The percentage of links (or nodes) which can be fully protected
for all destinations. This is appropriate where the requirement for all destinations. This is appropriate where the requirement
is to protect all traffic, but some percentage of the possible is to protect all traffic, but some percentage of the possible
failures may be identified as being un-protectable. failures may be identified as being un-protectable.
2. The percentage of destinations which can be fully protected for 2. The percentage of destinations which can be fully protected for
all link (or node) failures. This is appropriate where the all link (or node) failures. This is appropriate where the
requirement is to protect against all possible failures, but requirement is to protect against all possible failures, but
some percentage of destinations may be identified as being some percentage of destinations may be identified as being
un-protectable. un-protectable.
3. For all destinations (d) and for all failures (f), the 3. For all destinations (d) and for all failures (f), the
percentage of the total potential failure cases (d*f) which are percentage of the total potential failure cases (d*f) which are
protected. This is appropriate where the requirement is an protected. This is appropriate where the requirement is an
overall "best effort" protection. overall "best effort" protection.
4. The percentage of packets normally passing through the network
that will continue to reach their destination. This requires a
traffic matrix for the network as part of the analysis.
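A small sketch shows how far the four definitions can diverge when applied to the same results. It assumes a hypothetical input: a complete table protected[(f, d)] that is True when traffic to d survives the single failure f, either because it is repaired or because it never crossed f. For the fourth definition only, it also assumes that the traffic matrix has been collapsed to a per-destination volume and that every failure is equally likely. These simplifications are illustrative assumptions.

```python
from typing import Dict, Tuple


def coverage(protected: Dict[Tuple[str, str], bool],
             volume: Dict[str, float]) -> Dict[str, float]:
    """Return coverage (%) under the four definitions.

    protected[(f, d)] must be present for every failure f and destination d.
    volume[d] is a hypothetical per-destination traffic volume.
    """
    failures = {f for f, _ in protected}
    dests = {d for _, d in protected}
    # 1: failures for which every destination is protected.
    full_failures = sum(all(protected[(f, d)] for d in dests) for f in failures)
    # 2: destinations protected against every failure.
    full_dests = sum(all(protected[(f, d)] for f in failures) for d in dests)
    # 3: individual (failure, destination) cases that are protected.
    cases = sum(protected.values())
    # 4: traffic-weighted share that still reaches its destination.
    total = sum(volume[d] for _, d in protected)
    delivered = sum(volume[d] for (_, d), ok in protected.items() if ok)
    return {
        "failures_fully_protected": 100.0 * full_failures / len(failures),
        "destinations_fully_protected": 100.0 * full_dests / len(dests),
        "failure_cases_protected": 100.0 * cases / len(protected),
        "traffic_delivered": 100.0 * delivered / total,
    }


if __name__ == "__main__":
    example = {("L1", "A"): True, ("L1", "B"): True, ("L1", "C"): True,
               ("L2", "A"): True, ("L2", "B"): False, ("L2", "C"): True}
    print(coverage(example, {"A": 10, "B": 30, "C": 60}))
```

For the example table the four definitions give 50%, about 67%, about 83% and 85% respectively. All four figures come from the same set of protection results.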
The coverage obtained is dependent on the repair strategy and highly The coverage obtained is dependent on the repair strategy and highly
dependent on the detailed topology and metrics. Any figures quoted in dependent on the detailed topology and metrics. Any figures quoted in
this document are for illustrative purposes only. this document are for illustrative purposes only.
3.2.3. Link or node repair 3.2.3. Link or node repair
A repair path may be computed to protect against failure of an A repair path may be computed to protect against failure of an
adjacent link, or failure of an adjacent node. In general, link adjacent link, or failure of an adjacent node. In general, link
protection is simpler to achieve. A repair which protects against protection is simpler to achieve. A repair which protects against
node failure will also protect against link failure for all node failure will also protect against link failure for all
skipping to change at page 9, line 33 skipping to change at page 9, line 43
Once the routing protocol has re-converged it is necessary for all Once the routing protocol has re-converged it is necessary for all
repair paths to take account of the new topology. Various repair paths to take account of the new topology. Various
optimizations may permit the efficient identification of repair paths optimizations may permit the efficient identification of repair paths
which are unaffected by the change, and hence do not require full which are unaffected by the change, and hence do not require full
re-computation. Since the new repair paths will not be required until re-computation. Since the new repair paths will not be required until
the next failure occurs, the re-computation may be performed as a the next failure occurs, the re-computation may be performed as a
background task and be subject to a hold-down, but excessive delay in background task and be subject to a hold-down, but excessive delay in
completing this operation will increase the risk of a new failure completing this operation will increase the risk of a new failure
occurring before the repair paths are in place. occurring before the repair paths are in place.
3.2.5. Multiple failures and Shared Risk Groups 3.2.5. Multiple failures and Shared Risk Link Groups
Complete protection against multiple unrelated failures is out of Complete protection against multiple unrelated failures is out of
scope of this work. However, it is important that the occurrence of a scope of this work. However, it is important that the occurrence of a
second failure while one failure is undergoing repair should not second failure while one failure is undergoing repair should not
result in a level of service which is significantly worse than that result in a level of service which is significantly worse than that
which would have been achieved in the absence of any repair strategy. which would have been achieved in the absence of any repair strategy.
Shared Risk Groups are an example of multiple related failures, and Shared Risk Link Groups are an example of multiple related failures,
their protection is a matter for further study. and the more complex aspects of their protection are a matter for
further study.
One specific example of an SRLG which is clearly within the scope of One specific example of an SRLG which is clearly within the scope of
this work is a node failure. This causes the simultaneous failure of this work is a node failure. This causes the simultaneous failure of
multiple links, but their closely defined topological relationship multiple links, but their closely defined topological relationship
makes the problem more tractable. makes the problem more tractable.
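A node failure can be treated as the simultaneous failure of all of that node's links, as in the sketch below. The helper names and the frozenset representation of undirected links are hypothetical. The example also shows why a node-protecting repair implies link protection but not the reverse: a path that avoids only the link S-E may still pass through E.

```python
from typing import Dict, FrozenSet, List, Set

Link = FrozenSet[str]  # an undirected link, e.g. frozenset({"S", "E"})


def node_failure_links(graph: Dict[str, Dict[str, int]], node: str) -> Set[Link]:
    """All links that fail together when `node` fails (its implicit SRLG)."""
    return {frozenset((node, nbr)) for nbr in graph[node]}


def path_avoids(path: List[str], failed: Set[Link]) -> bool:
    """True if no hop of `path` crosses a link in `failed`."""
    return all(frozenset(hop) not in failed for hop in zip(path, path[1:]))


# Hypothetical topology: router -> {neighbor: metric}.
GRAPH = {
    "S": {"E": 1, "N": 1},
    "E": {"S": 1, "D": 1, "N": 1},
    "N": {"S": 1, "D": 3, "E": 1},
    "D": {"E": 1, "N": 3},
}
```

For example, the path S-N-E-D avoids the failed link S-E, but it is not a valid repair if node E itself fails. The path S-N-D survives both failures.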
3.3. Mechanisms for micro-loop prevention 3.3. Local Area Networks
Protection against partial or complete failure of LANs is more
complex than the point to point case. In general there is a tradeoff
between the simplicity of the repair and the ability to provide
complete and optimal repair coverage.
3.4. Mechanisms for micro-loop prevention
Control of micro-loops is important not only because they can cause Control of micro-loops is important not only because they can cause
packet loss in traffic which is affected by the failure, but because packet loss in traffic which is affected by the failure, but because
by saturating a link with looping packets they can also cause by saturating a link with looping packets they can also cause
congestion loss of traffic flowing over that link which would congestion loss of traffic flowing over that link which would
otherwise be unaffected by the failure. otherwise be unaffected by the failure.
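A small forwarding walk shows how a micro-loop forms during convergence. The hypothetical topology has four routers with links A-B 1, B-D 1, A-C 1 and C-D 2, and link B-D has failed. Routers listed in `updated` forward using their new next hop toward D, while the others still use their old one. If B converges before A, a packet bounces between A and B until A also converges. The topology and next-hop tables are assumptions chosen for illustration, and the sketch does not model any of the prevention mechanisms summarized in [MICROLOOP].

```python
from typing import Dict, List, Set, Tuple

NextHops = Dict[str, str]  # router -> next hop toward the single destination D

# Before the failure of link B-D, and after full re-convergence.
OLD_NEXT_HOP: NextHops = {"A": "B", "B": "D", "C": "D"}
NEW_NEXT_HOP: NextHops = {"A": "C", "B": "A", "C": "D"}


def forward(old: NextHops, new: NextHops, updated: Set[str],
            src: str, dest: str) -> Tuple[List[str], bool]:
    """Follow one packet from src toward dest.

    Routers in `updated` use their post-failure next hop; all others still
    use their pre-failure one. Returns the routers visited and whether the
    packet entered a loop.
    """
    path = [src]
    seen = {src}
    node = src
    while node != dest:
        node = (new if node in updated else old)[node]
        path.append(node)
        if node in seen:
            return path, True  # revisited a router: the packet is in a micro-loop
        seen.add(node)
    return path, False
```

With only B updated, a packet from B travels B, A, B and loops. Once A has also updated, it follows B, A, C, D.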
A number of solutions to the problem of micro-loop formation have A number of solutions to the problem of micro-loop formation have
been proposed and are summarized in [MICROLOOP]. The following been proposed and are summarized in [MICROLOOP]. The following
factors are significant in their classification: factors are significant in their classification:
skipping to change at page 10, line 54 skipping to change at page 11, line 20
protected. protected.
b. Notification of pre-computed repair paths, and anticipated b. Notification of pre-computed repair paths, and anticipated
traffic patterns. traffic patterns.
c. Counts of failure detections, protection invocations and c. Counts of failure detections, protection invocations and
packets forwarded over repair paths. packets forwarded over repair paths.
5. Scope and applicability 5. Scope and applicability
The initial scope of this work is in the context of link state IGPs.
Link state protocols provide ubiquitous topology information, which Link state protocols provide ubiquitous topology information, which
facilitates the computation of repair paths. Therefore the initial facilitates the computation of repair paths.
scope of this work is in the context of link state IGPs.
Provision of similar facilities in non-link state IGPs and BGP is a Provision of similar facilities in non-link state IGPs and BGP is a
matter for further study, but the correct operation of the repair matter for further study, but the correct operation of the repair
mechanisms for traffic with a destination outside the IGP domain is mechanisms for traffic with a destination outside the IGP domain is
an important consideration for solutions based on this framework. an important consideration for solutions based on this framework.
6. IANA considerations 6. IANA considerations
There are no IANA considerations that arise from this framework There are no IANA considerations that arise from this framework
document. document.
skipping to change at page 12, line 15 skipping to change at page 12, line 33
10. Normative References 10. Normative References
Internet-drafts are works in progress available from Internet-drafts are works in progress available from
http://www.ietf.org/internet-drafts/ http://www.ietf.org/internet-drafts/
11. Informative References 11. Informative References
Internet-drafts are works in progress available from Internet-drafts are works in progress available from
http://www.ietf.org/internet-drafts/ http://www.ietf.org/internet-drafts/
BASE Atlas, A., "Basic Specification for IP ALT-SP Tian, A., Chen, N., "Fast Reroute using
Fast-Reroute: Loop-free Alternates", Alternative Shortest Paths", draft-tian-frr-
draft-ietf-rtgwg-ipfrr-spec-base-03.txt, alt-shortest-path-01.txt, (work in progress)
BASE Atlas, A., Zinin, A., "Basic Specification
for IP Fast-Reroute: Loop-free Alternates",
draft-ietf-rtgwg-ipfrr-spec-base-04.txt,
(work in progress) (work in progress)
BFD Katz, D. and Ward, D., "Bidirectional BFD Katz, D. and Ward, D., "Bidirectional
Forwarding Detection", Forwarding Detection",
draft-ietf-bfd-base-02.txt, (work in draft-ietf-bfd-base-03.txt, (work in
progress). progress).
FIFR Nelakuditi, S., Lee, S., Yu, Y., Zhang, Z.-L.,
and Chuah, C.-N., "Fast local rerouting for
handling transient link failures", Tech.
Rep. TR-2004-004, University of South
Carolina, 2004.
MPLSFRR Pan, P. et al, "Fast Reroute Extensions to MPLSFRR Pan, P. et al, "Fast Reroute Extensions to
RSVP-TE for LSP Tunnels", RFC 4090. RSVP-TE for LSP Tunnels", RFC 4090.
MICROLOOP Bryant, S. and Shand, M., "A Framework for MICROLOOP Bryant, S. and Shand, M., "A Framework for
Loop-free Convergence", Loop-free Convergence",
draft-bryant-shand-lf-conv-frmwk-01.txt, draft-bryant-shand-lf-conv-frmwk-01.txt,
(work in progress). (work in progress).
NOT-VIA Bryant, S. and Shand, M., "IP Fast Reroute NOT-VIA Bryant, S., Previdi, S., Shand, M., "IP Fast
Using Notvia Addresses", Reroute Using Notvia Addresses",
draft-bryant-shand-ipfrr-notvia-addresses- draft-bryant-shand-ipfrr-notvia-addresses-
00.txt, (work in progress). 01.txt, (work in progress).
TUNNELS Bryant, S. et al, "IP Fast Reroute using TUNNELS Bryant, S. et al, "IP Fast Reroute using
tunnels", draft-bryant-ipfrr-tunnels-01.txt, tunnels", draft-bryant-ipfrr-tunnels-02.txt,
(work in progress). (work in progress).
U-TURNS Atlas, A. et al, "IP/LDP Local Protection", U-TURNS Atlas, A. et al, "IP/LDP Local Protection",
draft-atlas-ip-local-protect-01.txt, (work in draft-atlas-ip-local-protect-02.txt, (work in
progress). progress).
12. Authors' Addresses 12. Authors' Addresses
Stewart Bryant Stewart Bryant
Cisco Systems, Cisco Systems,
250, Longwater, 250, Longwater,
Green Park, Green Park,
Reading, RG2 6GB, Reading, RG2 6GB,
United Kingdom. Email: stbryant@cisco.com United Kingdom. Email: stbryant@cisco.com
 End of changes. 24 change blocks. 
35 lines changed or deleted 61 lines changed or added