draft-ietf-rtgwg-ipfrr-framework-11.txt   draft-ietf-rtgwg-ipfrr-framework-12.txt 
Network Working Group M. Shand Network Working Group M. Shand
Internet-Draft S. Bryant Internet-Draft S. Bryant
Intended status: Informational Cisco Systems Intended status: Informational Cisco Systems
Expires: December 31, 2009 June 29, 2009 Expires: March 22, 2010 September 18, 2009
IP Fast Reroute Framework IP Fast Reroute Framework
draft-ietf-rtgwg-ipfrr-framework-11 draft-ietf-rtgwg-ipfrr-framework-12
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 32 skipping to change at page 1, line 32
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 31, 2009. This Internet-Draft will expire on March 22, 2010.
Copyright Notice Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info). publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 3, line 41 skipping to change at page 3, line 41
output layer-3 interfaces. output layer-3 interfaces.
FIB Forwarding Information Base. The database used FIB Forwarding Information Base. The database used
by the packet forwarder to determine what actions by the packet forwarder to determine what actions
to perform on a packet. to perform on a packet.
IPFRR IP fast-reroute. IPFRR IP fast-reroute.
Link(A->B) A link connecting router A to router B. Link(A->B) A link connecting router A to router B.
LFA Loop Free Alternate. This is a neighbor N, that LFA Loop Free Alternate. A neighbor N, that is not a
is not a primary next-hop neighbor E, whose primary next-hop neighbor E, whose shortest path
shortest path to the destination D does not go to the destination D does not go back through the
back through the router S. The neighbor N must router S. The neighbor N must meet the following
meet the following condition:- condition:-
Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D) Distance_opt(S, D)
Loop Free Neighbor A neighbor N_i, which is not the particular Loop Free Neighbor A neighbor N_i, which is not the particular
primary neighbor E_k under discussion, and whose primary neighbor E_k under discussion, and whose
shortest path to D does not traverse S. For shortest path to D does not traverse S. For
example, if there are two primary neighbors E_1 example, if there are two primary neighbors E_1
and E_2, E_1 is a loop-free neighbor with regard and E_2, E_1 is a loop-free neighbor with regard
to E_2 and vice versa. to E_2 and vice versa.
Loop Free Link Protecting Alternate Loop Free Link Protecting Alternate
This is a path via a Loop-Free Neighbor N_i which A path via a Loop-Free Neighbor N_i that reaches
does not go through the particular link of S destination D without going through the
which is being protected to reach the destination particular link of S that is being protected. In
D. some cases the path to D may go through the
primary neighbor E.
Loop Free Node-protecting Alternate Loop Free Node-protecting Alternate
This is a path via a Loop-Free Neighbor N_i which A path via a Loop-Free Neighbor N_i that reaches
does not go through the particular primary destination D without going through the
neighbor of S which is being protected to reach particular primary neighbor (E) of S which is
the destination D. being protected.
N_i The ith neighbor of S. N_i The ith neighbor of S.
Primary Neighbor A neighbor N_i of S which is one of the next hops Primary Neighbor A neighbor N_i of S which is one of the next hops
for destination D in S's FIB prior to any for destination D in S's FIB prior to any
failure. failure.
R_i_j The jth neighbor of N_i. R_i_j The jth neighbor of N_i.
Routing Transition The process whereby routers converge on a new Routing Transition The process whereby routers converge on a new
skipping to change at page 5, line 8 skipping to change at page 5, line 8
repair that is computed in anticipation of the repair that is computed in anticipation of the
failure of a neighboring router denoted as E, or failure of a neighboring router denoted as E, or
of the link between S and E. It is the viewpoint of the link between S and E. It is the viewpoint
from which IP fast-reroute is described. from which IP fast-reroute is described.
SPF Shortest Path First, e.g. Dijkstra's algorithm. SPF Shortest Path First, e.g. Dijkstra's algorithm.
SPT Shortest path tree SPT Shortest path tree
Upstream Forwarding Loop Upstream Forwarding Loop
This is a forwarding loop which involves a set of A forwarding loop that involves a set of routers,
routers, none of which are directly connected to none of which are directly connected to the link
the link which has caused the topology change that has caused the topology change that
that triggered a new SPF in any of the routers. triggered a new SPF in any of the routers.
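The LFA condition given in the terminology above can be checked
mechanically once shortest-path distances are known. The following is
a minimal sketch and not part of the draft: the graph layout, the
function names and the example topology are assumptions made for
illustration, Distance_opt is taken to be the Dijkstra shortest-path
cost, and the topology is assumed to be connected.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra: return {node: cost of the shortest path from source}."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist


def is_loop_free_alternate(graph, s, n, d):
    """Check Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)."""
    dist_n = shortest_distances(graph, n)
    dist_s = shortest_distances(graph, s)
    return dist_n[d] < dist_n[s] + dist_s[d]


# Hypothetical topology with symmetric link costs. S reaches D via its
# primary neighbor E; N1 has its own path to D, N2 is a stub behind S.
example = {
    "S": {"E": 1, "N1": 1, "N2": 1},
    "E": {"S": 1, "D": 1},
    "N1": {"S": 1, "D": 2},
    "N2": {"S": 1},
    "D": {"E": 1, "N1": 2},
}

n1_is_lfa = is_loop_free_alternate(example, "S", "N1", "D")
n2_is_lfa = is_loop_free_alternate(example, "S", "N2", "D")
```

In this example N1 satisfies the inequality (2 < 1 + 2), while N2 does
not (3 < 1 + 2 is false) because its only path to D goes back through
S.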
2. Introduction 2. Introduction
When a link or node failure occurs in a routed network, there is When a link or node failure occurs in a routed network, there is
inevitably a period of disruption to the delivery of traffic until inevitably a period of disruption to the delivery of traffic until
the network re-converges on the new topology. Packets for the network re-converges on the new topology. Packets for
destinations which were previously reached by traversing the failed destinations which were previously reached by traversing the failed
component may be dropped or may suffer looping. Traditionally such component may be dropped or may suffer looping. Traditionally such
disruptions have lasted for periods of at least several seconds, and disruptions have lasted for periods of at least several seconds, and
most applications have been constructed to tolerate such a quality of most applications have been constructed to tolerate such a quality of
skipping to change at page 6, line 8 skipping to change at page 6, line 8
approach. approach.
3. Problem Analysis 3. Problem Analysis
The duration of the packet delivery disruption caused by a The duration of the packet delivery disruption caused by a
conventional routing transition is determined by a number of factors: conventional routing transition is determined by a number of factors:
1. The time taken to detect the failure. This may be of the order 1. The time taken to detect the failure. This may be of the order
of a few milliseconds when it can be detected at the physical of a few milliseconds when it can be detected at the physical
layer, up to several tens of seconds when a routing protocol layer, up to several tens of seconds when a routing protocol
hello is employed. During this period packets will be Hello is employed. During this period packets will be
unavoidably lost. unavoidably lost.
2. The time taken for the local router to react to the failure. 2. The time taken for the local router to react to the failure.
This will typically involve generating and flooding new routing This will typically involve generating and flooding new routing
updates, perhaps after some hold-down delay, and re-computing the updates, perhaps after some hold-down delay, and re-computing the
router's FIB. router's FIB.
3. The time taken to pass the information about the failure to other 3. The time taken to pass the information about the failure to other
routers in the network. In the absence of routing protocol routers in the network. In the absence of routing protocol
packet loss, this is typically between 10 milliseconds and 100 packet loss, this is typically between 10 milliseconds and 100
skipping to change at page 6, line 44 skipping to change at page 6, line 44
The initial packet loss is caused by the router(s) adjacent to the The initial packet loss is caused by the router(s) adjacent to the
failure continuing to attempt to transmit packets across the failure failure continuing to attempt to transmit packets across the failure
until it is detected. This loss is unavoidable, but the detection until it is detected. This loss is unavoidable, but the detection
time can be reduced to a few tens of milliseconds as described in time can be reduced to a few tens of milliseconds as described in
Section 4.1. Section 4.1.
In some topologies subsequent packet loss may be caused by the In some topologies subsequent packet loss may be caused by the
"micro-loops" which may form as a result of temporary inconsistencies "micro-loops" which may form as a result of temporary inconsistencies
between routers' forwarding tables [I-D.ietf-rtgwg-lf-conv-frmwk]. between routers' forwarding tables [I-D.ietf-rtgwg-lf-conv-frmwk].
When micro-loops occur, this is as a result of the different times at These inconsistencies are caused by steps 3, 4 and 5 above and in
which routers update their forwarding tables to reflect the failure.
These variable delays are caused by steps 3, 4 and 5 above and in
many routers it is step 5 which is both the largest factor and which many routers it is step 5 which is both the largest factor and which
has the greatest variance between routers. The large variance arises has the greatest variance between routers. The large variance arises
from implementation differences and from the differing impact that a from implementation differences and from the differing impact that a
failure has on each individual router. For example, the number of failure has on each individual router. For example, the number of
prefixes affected by the failure may vary dramatically from one prefixes affected by the failure may vary dramatically from one
router to another. router to another.
In order to achieve packet disruption times which are commensurate In order to achieve packet disruption times which are commensurate
with the failure detection times two factors must be considered:- with the failure detection times two mechanisms may be required:-
1. The provision of a mechanism for the router(s) adjacent to the 1. A mechanism for the router(s) adjacent to the failure to rapidly
failure to rapidly invoke a repair path, which is unaffected by invoke a repair path, which is unaffected by any subsequent re-
any subsequent re-convergence. convergence.
2. In topologies that are susceptible to micro-loops, the provision 2. In topologies that are susceptible to micro-loops, a mechanism to
of a mechanism to prevent the effects of any micro-loops during prevent the effects of any micro-loops during subsequent re-
subsequent re-convergence. convergence.
Performing the first task without the second may result in the repair Performing the first task without the second may result in the repair
path being starved of traffic and hence being redundant. Performing path being starved of traffic and hence being redundant. Performing
the second without the first will result in traffic being discarded the second without the first will result in traffic being discarded
by the router(s) adjacent to the failure. by the router(s) adjacent to the failure.
Repair paths may always be used in isolation where the failure is Repair paths may always be used in isolation where the failure is
short-lived. In this case, the repair paths can be kept in place short-lived. In this case, the repair paths can be kept in place
until the failure is repaired in which case there is no need to until the failure is repaired in which case there is no need to
advertise the failure to other routers. advertise the failure to other routers.
skipping to change at page 8, line 5 skipping to change at page 8, line 5
4.1. Mechanisms for fast failure detection 4.1. Mechanisms for fast failure detection
It is critical that the failure detection time is minimized. A It is critical that the failure detection time is minimized. A
number of well documented approaches are possible, such as: number of well documented approaches are possible, such as:
1. Physical detection; for example, loss of light. 1. Physical detection; for example, loss of light.
2. Routing protocol independent protocol detection; for example, the 2. Routing protocol independent protocol detection; for example, the
Bidirectional Forwarding Detection protocol [I-D.ietf-bfd-base]. Bidirectional Forwarding Detection protocol [I-D.ietf-bfd-base].
3. Routing protocol detection; for example, use of "fast hellos". 3. Routing protocol detection; for example, use of "fast Hellos".
4.2. Mechanisms for repair paths 4.2. Mechanisms for repair paths
Once a failure has been detected by one of the above mechanisms, Once a failure has been detected by one of the above mechanisms,
traffic which previously traversed the failure is transmitted over traffic which previously traversed the failure is transmitted over
one or more repair paths. The design of the repair paths should be one or more repair paths. The design of the repair paths should be
such that they can be pre-calculated in anticipation of each local such that they can be pre-calculated in anticipation of each local
failure and made available for invocation with minimal delay. There failure and made available for invocation with minimal delay. There
are three basic categories of repair paths: are three basic categories of repair paths:
skipping to change at page 9, line 28 skipping to change at page 9, line 28
failure, then it will be valid for all destinations previously failure, then it will be valid for all destinations previously
reachable by traversing the failure. However, in cases where such a reachable by traversing the failure. However, in cases where such a
repair path is difficult to achieve because it requires a high order repair path is difficult to achieve because it requires a high order
multi-hop repair path, it may still be possible to identify lower multi-hop repair path, it may still be possible to identify lower
order repair paths (possibly even loop free alternate paths) which order repair paths (possibly even loop free alternate paths) which
allow the majority of destinations to be repaired. When IPFRR is allow the majority of destinations to be repaired. When IPFRR is
unable to provide complete repair, it is desirable that the extent of unable to provide complete repair, it is desirable that the extent of
the repair coverage can be determined and reported via network the repair coverage can be determined and reported via network
management. management.
There is a tradeoff to be achieved between minimizing the number of There is a trade-off to be achieved between minimizing the number of
repair paths to be computed, and minimizing the overheads incurred in repair paths to be computed, and minimizing the overheads incurred in
using higher order multi-hop repair paths for destinations for which using higher order multi-hop repair paths for destinations for which
they are not strictly necessary. However, the computational cost of they are not strictly necessary. However, the computational cost of
determining repair paths on an individual destination basis can be determining repair paths on an individual destination basis can be
very high. very high.
It will frequently be the case that the majority of destinations may It will frequently be the case that the majority of destinations may
be repaired using only the "basic" repair mechanism, leaving a be repaired using only the "basic" repair mechanism, leaving a
smaller subset of the destinations to be repaired using one of the smaller subset of the destinations to be repaired using one of the
more complex multi-hop methods. Such a hybrid approach may go some more complex multi-hop methods. Such a hybrid approach may go some
way to resolving the conflict between completeness and complexity. way to resolving the conflict between completeness and complexity.
The use of repair paths may result in excessive traffic passing over The use of repair paths may result in excessive traffic passing over
a link, resulting in congestion discard. This reduces the a link, resulting in congestion discard. This reduces the
effectiveness of IPFRR. Mechanisms to influence the distribution of effectiveness of IPFRR. Mechanisms to influence the distribution of
repaired traffic to minimize this effect are therefore desirable. repaired traffic to minimize this effect are therefore desirable.
4.2.2. Analysis of repair coverage 4.2.2. Analysis of repair coverage
The repair coverage obtained is dependent on the repair strategy and
highly dependent on the detailed topology and metrics. Estimates of
the repair coverage quoted in this document are for illustrative
purposes only and may not always be achievable.
In some cases the repair strategy will permit the repair of all In some cases the repair strategy will permit the repair of all
single link or node failures in the network for all possible single link or node failures in the network for all possible
destinations. This can be defined as 100% coverage. However, where destinations. This can be defined as 100% coverage. However, where
the coverage is less than 100% it is important for the purposes of the coverage is less than 100% it is important for the purposes of
comparisons between different proposed repair strategies to define comparisons between different proposed repair strategies to define
what is meant by such a percentage. There are four possibilities: what is meant by such a percentage. There are four possibilities:
1. The percentage of links (or nodes) which can be fully protected 1. The percentage of links (or nodes) which can be fully protected
for all destinations. This is appropriate where the requirement for all destinations. This is appropriate where the requirement
is to protect all traffic, but some percentage of the possible is to protect all traffic, but some percentage of the possible
skipping to change at page 10, line 26 skipping to change at page 10, line 32
3. For all destinations (d) and for all failures (f), the percentage 3. For all destinations (d) and for all failures (f), the percentage
of the total potential failure cases (d*f) which are protected. of the total potential failure cases (d*f) which are protected.
This is appropriate where the requirement is an overall "best This is appropriate where the requirement is an overall "best
effort" protection. effort" protection.
4. The percentage of packets normally passing through the network 4. The percentage of packets normally passing through the network
that will continue to reach their destination. This requires a that will continue to reach their destination. This requires a
traffic matrix for the network as part of the analysis. traffic matrix for the network as part of the analysis.
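Two of the coverage definitions listed above (the percentage of
failures that are fully protected, and the percentage of protected
destination/failure cases) can be compared on the same data. The
following is a minimal sketch and not part of the draft: the input
layout, the function names and the sample values are assumptions made
for illustration, and every failure is assumed to list the same set
of destinations.

```python
def coverage_by_failure(protected):
    """Percentage of failures that are protected for all destinations.

    protected maps failure -> {destination: True if repairable}.
    """
    full = sum(1 for dests in protected.values() if all(dests.values()))
    return 100.0 * full / len(protected)


def coverage_by_case(protected):
    """Percentage of (destination, failure) cases that are protected."""
    cases = [ok for dests in protected.values() for ok in dests.values()]
    return 100.0 * sum(cases) / len(cases)


# Hypothetical results for two link failures and two destinations.
sample = {
    "Link(S->E)": {"D1": True, "D2": True},
    "Link(S->F)": {"D1": True, "D2": False},
}

per_failure = coverage_by_failure(sample)
per_case = coverage_by_case(sample)
```

For this sample the two definitions give different figures for the
same repair results (one of two failures fully protected, three of
four cases protected), which is why the definition in use needs to be
stated when strategies are compared.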
The coverage obtained is dependent on the repair strategy and highly
dependent on the detailed topology and metrics. Any figures quoted
in this document are for illustrative purposes only.
4.2.3. Link or node repair 4.2.3. Link or node repair
A repair path may be computed to protect against failure of an A repair path may be computed to protect against failure of an
adjacent link, or failure of an adjacent node. In general, link adjacent link, or failure of an adjacent node. In general, link
protection is simpler to achieve. A repair which protects against protection is simpler to achieve. A repair which protects against
node failure will also protect against link failure for all node failure will also protect against link failure for all
destinations except those for which the adjacent node is a single destinations except those for which the adjacent node is a single
point of failure. point of failure.
In some cases it may be necessary to distinguish between a link or In some cases it may be necessary to distinguish between a link or
skipping to change at page 11, line 41 skipping to change at page 11, line 41
occurring before the repair paths are in place. occurring before the repair paths are in place.
4.2.5. Multiple failures and Shared Risk Link Groups 4.2.5. Multiple failures and Shared Risk Link Groups
Complete protection against multiple unrelated failures is out of Complete protection against multiple unrelated failures is out of
scope of this work. However, it is important that the occurrence of scope of this work. However, it is important that the occurrence of
a second failure while one failure is undergoing repair should not a second failure while one failure is undergoing repair should not
result in a level of service which is significantly worse than that result in a level of service which is significantly worse than that
which would have been achieved in the absence of any repair strategy. which would have been achieved in the absence of any repair strategy.
Shared Risk Link Groups are an example of multiple related failures, Shared Risk Link Groups (SRLGs) are an example of multiple related
and the more complex aspects of their protection are a matter for failures, and the more complex aspects of their protection are a
further study. matter for further study.
One specific example of an SRLG which is clearly within the scope of One specific example of an SRLG which is clearly within the scope of
this work is a node failure. This causes the simultaneous failure of this work is a node failure. This causes the simultaneous failure of
multiple links, but their closely defined topological relationship multiple links, but their closely defined topological relationship
makes the problem more tractable. makes the problem more tractable.
4.3. Local Area Networks 4.3. Local Area Networks
Protection against partial or complete failure of LANs is more Protection against partial or complete failure of LANs is more
complex than the point to point case. In general there is a tradeoff complex than the point to point case. In general there is a trade-
between the simplicity of the repair and the ability to provide off between the simplicity of the repair and the ability to provide
complete and optimal repair coverage. complete and optimal repair coverage.
4.4. Mechanisms for micro-loop prevention 4.4. Mechanisms for micro-loop prevention
Ensuring the absence of micro-loops is important not only because Ensuring the absence of micro-loops is important not only because
they can cause packet loss in traffic which is affected by the they can cause packet loss in traffic which is affected by the
failure, but because by saturating a link with looping packets they failure, but because by saturating a link with looping packets they
can also cause congestion loss of traffic flowing over that link can also cause congestion loss of traffic flowing over that link
which would otherwise be unaffected by the failure. which would otherwise be unaffected by the failure.
skipping to change at page 14, line 5 skipping to change at page 13, line 47
There are no IANA considerations that arise from this framework There are no IANA considerations that arise from this framework
document. document.
8. Security Considerations 8. Security Considerations
This framework document does not itself introduce any security This framework document does not itself introduce any security
issues, but attention must be paid to the security implications of issues, but attention must be paid to the security implications of
any proposed solutions to the problem. any proposed solutions to the problem.
Where the chosen solution uses tunnels it is necessary to ensure that
the tunnel is not used as an attack vector. One method of addressing
this is to use a set of tunnel endpoint addresses that are excluded
from use by user traffic.
There is a compatibility issue between IPFRR and reverse path
forwarding (RPF) checking. Many of the solutions described in this
document result in traffic arriving from a direction inconsistent
with a standard RPF check. When a network relies on RPF checking for
security purposes, an alternative security mechanism will need to be
deployed in order to permit IPFRR to be used.
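The RPF interaction described above can be illustrated with a strict
RPF check. The following is a minimal sketch and not part of the
draft: the route table, the interface names and the addresses are
hypothetical, and a strict check is assumed to accept a packet only
when it arrives on the interface the router would itself use to reach
the packet's source.

```python
import ipaddress


def rpf_interface(routes, source):
    """Longest-prefix match: interface used to reach the source address."""
    addr = ipaddress.ip_address(source)
    matches = [(net, ifname) for net, ifname in routes.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def strict_rpf_accepts(routes, source, arrival_interface):
    """Accept only if the packet arrived on the expected reverse interface."""
    expected = rpf_interface(routes, source)
    return expected is not None and expected == arrival_interface


# Hypothetical route table of a router downstream of the repair.
routes = {
    ipaddress.ip_network("192.0.2.0/24"): "to_S",
    ipaddress.ip_network("0.0.0.0/0"): "to_X",
}

# Before the failure the packet arrives from the expected direction.
normal_ok = strict_rpf_accepts(routes, "192.0.2.10", "to_S")
# On a repair path the same packet arrives on another interface.
repaired_ok = strict_rpf_accepts(routes, "192.0.2.10", "to_N")
```

In this sketch the repaired packet is rejected even though it is
legitimate, because the check reflects the pre-failure topology.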
Because the repair path will often be of a different length to the
pre-failure path, security mechanisms which rely on specific TTL
values will be adversely affected.
9. Acknowledgements 9. Acknowledgements
The authors would like to acknowledge contributions made by Alia The authors would like to acknowledge contributions made by Alia
Atlas, Clarence Filsfils, Pierre Francois, Joel Halpern, Stefano Atlas, Clarence Filsfils, Pierre Francois, Joel Halpern, Stefano
Previdi and Alex Zinin. Previdi and Alex Zinin.
10. Informative References 10. Informative References
[FIFR] Nelakuditi, S., Lee, S., Lu, Y., Zhang, Z., and C. Chuah, [FIFR] Nelakuditi, S., Lee, S., Lu, Y., Zhang, Z., and C. Chuah,
"Fast local rerouting for handling transient link "Fast local rerouting for handling transient link
skipping to change at page 14, line 35 skipping to change at page 14, line 47
(work in progress), November 2007. (work in progress), November 2007.
[I-D.ietf-bfd-base] [I-D.ietf-bfd-base]
Katz, D. and D. Ward, "Bidirectional Forwarding Katz, D. and D. Ward, "Bidirectional Forwarding
Detection", draft-ietf-bfd-base-09 (work in progress), Detection", draft-ietf-bfd-base-09 (work in progress),
February 2009. February 2009.
[I-D.ietf-rtgwg-ipfrr-notvia-addresses] [I-D.ietf-rtgwg-ipfrr-notvia-addresses]
Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute
Using Not-via Addresses", Using Not-via Addresses",
draft-ietf-rtgwg-ipfrr-notvia-addresses-03 (work in draft-ietf-rtgwg-ipfrr-notvia-addresses-04 (work in
progress), October 2008. progress), July 2009.
[I-D.ietf-rtgwg-lf-conv-frmwk] [I-D.ietf-rtgwg-lf-conv-frmwk]
Shand, M. and S. Bryant, "A Framework for Loop-free Shand, M. and S. Bryant, "A Framework for Loop-free
Convergence", draft-ietf-rtgwg-lf-conv-frmwk-05 (work in Convergence", draft-ietf-rtgwg-lf-conv-frmwk-05 (work in
progress), June 2009. progress), June 2009.
[I-D.tian-frr-alt-shortest-path] [I-D.tian-frr-alt-shortest-path]
Tian, A., "Fast Reroute using Alternative Shortest Paths", Tian, A., "Fast Reroute using Alternative Shortest Paths",
draft-tian-frr-alt-shortest-path-01 (work in progress), draft-tian-frr-alt-shortest-path-01 (work in progress),
July 2004. July 2004.
 End of changes. 20 change blocks. 
44 lines changed or deleted 60 lines changed or added
