Routing Area Working Group                                   S. Litkowski
Internet-Draft                                                B. Decraene
Intended status: Standards Track                                   Orange
Expires: October 7, 2016                                       C. Filsfils
                                                               P. Francois
                                                             Cisco Systems
                                                             April 5, 2016

    Microloop prevention by introducing a local convergence delay
                   draft-ietf-rtgwg-uloop-delay-01
Abstract

This document describes a mechanism for link-state routing protocols
to prevent local transient forwarding loops in case of link failure.
This mechanism proposes a two-step convergence by introducing a delay
between the convergence of the node adjacent to the topology change
and the network-wide convergence.

As this mechanism delays the IGP convergence, it may only be used for
planned maintenance or when fast reroute protects the traffic between
the link failure and the IGP convergence.

The proposed mechanism is limited to link down events in order to
keep the solution simple.

Simulations using real network topologies have been performed and
show that local loops are a significant portion (>50%) of the total
forwarding loops.
Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
skipping to change at page 2, line 9
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 7, 2016.
Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Transient forwarding loops side effects . . . . . . . . . . . 3
2.1. Fast reroute inefficiency . . . . . . . . . . . . . . . . 4
2.2. Network congestion . . . . . . . . . . . . . . . . . . . 6
3. Overview of the solution . . . . . . . . . . . . . . . . . . 7
4. Specification . . . . . . . . . . . . . . . . . . . . . . . . 7
4.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 7
4.2. Current IGP reactions . . . . . . . . . . . . . . . . . . 7
4.3. Local events . . . . . . . . . . . . . . . . . . . . . . 8
4.4. Local delay for link down . . . . . . . . . . . . . . . . 8
5. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 9
5.1. Applicable case: local loops . . . . . . . . . . . . . . 9
5.2. Non-applicable case: remote loops . . . . . . . . . . . . 9
6. Simulations . . . . . . . . . . . . . . . . . . . . . . . . . 10
7. Deployment considerations . . . . . . . . . . . . . . . . . . 11
8. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 11
8.1. Local link down . . . . . . . . . . . . . . . . . . . . . 12
8.2. Local and remote event . . . . . . . . . . . . . . . . . 15
8.3. Aborting local delay . . . . . . . . . . . . . . . . . . 17
9. Comparison with other solutions . . . . . . . . . . . . . . . 19
9.1. PLSN . . . . . . . . . . . . . . . . . . . . . . . . . . 19
9.2. OFIB . . . . . . . . . . . . . . . . . . . . . . . . . . 20
10. Existing implementations . . . . . . . . . . . . . . . . . . 20
11. Security Considerations . . . . . . . . . . . . . . . . . . . 20
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 21
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
14. References . . . . . . . . . . . . . . . . . . . . . . . . . 21
14.1. Normative References . . . . . . . . . . . . . . . . . . 21
14.2. Informative References . . . . . . . . . . . . . . . . . 21
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22
1. Introduction

Micro-forwarding loops and some potential solutions are well
described in [RFC5715].  This document describes a simple targeted
mechanism that solves micro-loops local to the failure; based on
network analysis, these are a significant portion of the micro-
forwarding loops.  A simple and easily deployable solution to these
local micro-loops is critical because these local loops cause traffic
loss after an advanced fast-reroute alternate has been used (see
skipping to change at page 4, line 46
* IGP path to D is recomputed: C->E->A->D

3. On C, the tail-end of the TE tunnel (router B) is no longer on the
SPT to D, so C stops encapsulating the traffic to D into tunnel T and
updates its forwarding entry to D to use nexthop E.

If C updates its forwarding entry to D before router E, there would
be a transient forwarding loop between C and E until E has converged.
+-----------+------------+------------------+-----------------------+
| Network | Time | Router C events | Router E events |
| condition | | | |
+-----------+------------+------------------+-----------------------+
| S->D | | | |
| Traffic | | | |
| OK | | | |
| | | | |
| S->D | t0 | Link B-C fails | Link B-C fails |
| Traffic | | | |
| lost | | | |
| | | | |
| | t0+20msec | C detects the | |
| | | failure | |
| | | | |
| S->D | t0+40msec | C activates FRR | |
| Traffic | | | |
| OK | | | |
| | | | |
| | t0+50msec | C updates its | |
| | | local LSP/LSA | |
| | | | |
| | t0+60msec | C schedules SPF | |
| | | (100ms) | |
| | | | |
| | t0+70msec | C floods its | |
| | | local updated | |
| | | LSP/LSA | |
| | | | |
| | t0+87msec | | E receives LSP/LSA |
| | | | from C and schedules |
| | | | SPF (100ms) |
| | | | |
| | t0+117msec | | E floods LSP/LSA from |
| | | | C |
| | | | |
| | t0+160msec | C computes SPF | |
| | | | |
| | t0+165msec | C starts | |
| | | updating its | |
| | | RIB/FIB | |
| | | | |
| | t0+193msec | | E computes SPF |
| | | | |
| | t0+199msec | | E starts updating its |
| | | | RIB/FIB |
| | | | |
| S->D | t0+255msec | C updates its | |
| Traffic | | RIB/FIB for D | |
| lost | | | |
| | | | |
| | t0+340msec | C convergence | |
| | | ends | |
| | | | |
| S->D | t0+443msec | | E updates its RIB/FIB |
| Traffic | | | for D |
| OK | | | |
| | | | |
| | t0+470msec | | E convergence ends |
+-----------+------------+------------------+-----------------------+
Route computation event time scale
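As a rough illustration of the loop, the sketch below walks a packet
destined to D hop by hop under two FIB states taken from the example
above: one where C has already switched to its post-failure nexthop
(E) while E still uses its pre-failure nexthop (C), and one where both
routers have converged.  This is only an illustration of the timeline
above, not part of the mechanism; node names and nexthops are the
ones used in this example.

```python
# Illustrative sketch: forwarding a packet to D under mixed FIB states.
# Nexthops are the ones from the example above (E's pre-failure path to D
# is via C, C's post-failure path to D is via E then A).

def walk(fibs, src, dst, max_hops=8):
    """Follow nexthops from src to dst; report a loop if a node repeats."""
    path, node = [src], src
    while node != dst and len(path) <= max_hops:
        node = fibs[node][dst]
        if node in path:
            return path + [node], True          # transient forwarding loop
        path.append(node)
    return path, False

# C has converged (nexthop E), E has not yet converged (nexthop C).
transient = {"C": {"D": "E"}, "E": {"D": "C"}, "A": {"D": "D"}}
print(walk(transient, "C", "D"))    # (['C', 'E', 'C'], True) -> micro-loop

# Both have converged: traffic follows C -> E -> A -> D.
converged = {"C": {"D": "E"}, "E": {"D": "A"}, "A": {"D": "D"}}
print(walk(converged, "C", "D"))    # (['C', 'E', 'A', 'D'], False)
```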
The issue described here is completely independent of the fast-
reroute mechanism involved (TE FRR, LFA/rLFA, MRT ...).  Fast-reroute
works perfectly but ensures protection, by definition, only until the
PLR has converged.  When implementing FRR, a service provider wants
to guarantee a very limited loss of connectivity time.  The previous
example shows that the benefit of FRR may be completely lost due to a
transient forwarding loop appearing when the PLR has converged.
Delaying FIB updates after the IGP convergence may make it possible
to keep the fast-reroute path until the neighbor has converged and
preserve
skipping to change at page 7, line 10
the failure: e.g. A to B, F to B, E to B.  Classes of service may be
implemented to mitigate the congestion, but some traffic not directly
concerned by the failure would still be dropped, as a router is not
able to distinguish looped traffic from normal traffic.
3. Overview of the solution

This document defines a two-step convergence initiated by the router
detecting the failure and advertising the topological changes in the
IGP.  This introduces a delay between the convergence of the local
router and the network-wide convergence.

The proposed solution is limited to local link down events.

This ordered convergence is similar to the ordered FIB defined in
[RFC6976], but limited to a one-hop distance.  As a consequence, it
is simpler and becomes a local-only feature not requiring
interoperability, at the cost of only covering the transient
forwarding loops involving this local router.  The proposed mechanism
also reuses some concepts described in
[I-D.ietf-rtgwg-microloop-analysis], with some limitations.

4. Specification
skipping to change at page 7, line 37
o LSP_GEN_TIMER: used to batch multiple local events into one single
local LSP update.  It is often associated with a damping mechanism
that slows down reactions by incrementing the timer when multiple
consecutive events are detected.

o SPF_TIMER: used to batch multiple events into one single
computation.  It is often associated with a damping mechanism that
slows down reactions by incrementing the timer when the IGP is
unstable.

This document introduces the following new timer:

o ULOOP_DELAY_DOWN_TIMER: slows down the local node convergence in
case of link down events.
4.2. Current IGP reactions

Upon a change of status on an adjacency/link, the existing behavior
of the router advertising the event is the following:

1. UP/Down event is notified to IGP.

2. IGP processes the notification and postpones the reaction in
LSP_GEN_TIMER msec.

skipping to change at page 8, line 16

it.

4. SPF is scheduled in SPF_TIMER msec.

5. Upon SPF_TIMER expiration, SPF is computed and RIB/FIB are
updated.
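The baseline behavior above can be modeled as a small event pipeline.
The sketch below is only an illustration of these five steps (the
timer values and function names are invented for the example and do
not come from any implementation): a local link event is batched by
LSP_GEN_TIMER, the updated LSP/LSA is flooded, an SPF run is scheduled
by SPF_TIMER, and the RIB/FIB is updated as soon as the SPF completes.

```python
# Simplified, illustrative model of the current IGP reaction (steps 1-5).
# Timer values are arbitrary examples; real routers use configured values.

LSP_GEN_TIMER = 0.05   # seconds, batches local events into one LSP/LSA update
SPF_TIMER = 0.10       # seconds, batches LSDB changes into one SPF run

def on_link_event(now, event):
    """Return the (time, action) schedule triggered by a local link event."""
    schedule = [(now, f"IGP notified of {event}")]
    t_lsp = now + LSP_GEN_TIMER
    schedule.append((t_lsp, "local LSP/LSA updated and flooded"))
    t_spf = t_lsp + SPF_TIMER
    schedule.append((t_spf, "SPF computed"))
    schedule.append((t_spf, "RIB/FIB updated immediately after the SPF"))
    return schedule

for when, action in on_link_event(0.0, "link B-C down"):
    print(f"t0+{int(when * 1000)}msec: {action}")
```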
4.3. Local events

The mechanisms described in this document assume that there has been
a single link failure as seen by the IGP area/level.  If this
assumption is violated (e.g. multiple links or nodes failed), then
standard IP convergence MUST be applied (as described in
Section 4.2).  There are three types of single failures: local link,
local node, and remote failure.

Example:

       +--- E ----+--------+
       |          |        |
A ---- B -------- C ------ D

Let B be the computing router when the link B-C fails.  B updates its
local LSP/LSA describing the link B->C as down, C does the same, and
both start flooding their updated LSP/LSAs.  During the SPF_TIMER
period, B and C learn all the LSPs/LSAs to consider.  B sees that C
is flooding as down a link where B is the other end and that B and C
are describing the same single event.  Since B receives no other
changes, B can determine that this is a local link failure.

An implementation SHOULD implement logic to correlate the protocol
messages (LSP/LSA) received during SPF scheduling with the topology
changes, as multiple protocol messages may describe the same topology
change.  As a consequence, determining a particular topology change
MUST be independent of the order of reception of those protocol
messages.  How this logic works is left to implementation details.

Distinguishing a local node failure from a remote or multiple link
failure requires additional logic, which is future work to fully
describe.  To give a sense of the work necessary, if node C is
failing, routers B, E and D are updating and flooding updated LSPs/
LSAs.  B would need to determine the changes in the LSPs/LSAs from E
and D and see that they all relate to node C, which is also the far-
end of the locally failed link.  Once this detection is accurately
done, the same mechanism of delaying local convergence can be
applied.

Using this logic, if an implementation determines that the associated
event is a single local link failure, then the router MAY use the
mechanism described in this document, otherwise standard IP
convergence MUST be used.
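A minimal sketch of such correlation logic is shown below, under the
assumption that each pending LSP/LSA change can be reduced to the
link it reports as down (identified by its two endpoints).  The event
then qualifies for the local delay only when every pending report
describes the same single link and the computing router is one of its
endpoints.  The data structures and the function name are
illustrative; how the correlation is actually performed is left to
the implementation, as stated above.

```python
# Illustrative classification of the pending topology changes.
# Each pending change is reduced to the link reported as down,
# identified by the (unordered) pair of its endpoints.

def is_single_local_link_down(pending_links, local_node):
    """True when all pending changes report one link down attached to us.

    'pending_links' is a set of frozensets, each holding the two endpoints
    of a link reported as down.  Using a set makes the result independent
    of the order in which the LSPs/LSAs were received, as required above.
    """
    if len(pending_links) != 1:
        return False             # several links changed: standard convergence
    (link,) = pending_links
    return local_node in link    # the only failed link is attached to us

# B and C both report the same B-C failure: local link down seen by C.
pending = {frozenset({"B", "C"})}
print(is_single_local_link_down(pending, "C"))   # True  -> delay MAY be used

# A remote F-X failure is also pending: fall back to standard convergence.
pending.add(frozenset({"F", "X"}))
print(is_single_local_link_down(pending, "C"))   # False -> standard convergence
```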
4.4. Local delay for link down
Upon an adjacency/link down event, this document introduces a change
in step 5 in order to delay the local convergence compared to the
network-wide convergence: the node SHOULD delay the forwarding entry
updates by ULOOP_DELAY_DOWN_TIMER.  Such a delay SHOULD only be
introduced if all the LSDB modifications processed are only reporting
local down events.  Note that determining that all topological
changes are only local down events requires analyzing all modified
LSPs/LSAs, as a local link or node failure will typically be notified
by multiple nodes.  If a subsequent LSP/LSA is received/updated and a
new SPF computation is triggered before the expiration of
ULOOP_DELAY_DOWN_TIMER, then the same evaluation SHOULD be performed.

As a result of this addition, routers local to the failure will
converge more slowly than remote routers.  Hence it SHOULD only be
done for non-urgent convergence, such as administrative de-activation
(maintenance) or when the traffic is Fast ReRouted.
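The change to step 5 can be sketched as follows: at the end of each
SPF run the RIB/FIB update is either performed immediately (standard
convergence) or armed behind ULOOP_DELAY_DOWN_TIMER, and an already
armed delayed update is re-evaluated when a later SPF run is no
longer caused only by the local down event.  The class, timer object
and function names below are illustrative only and not an
implementation requirement.

```python
# Sketch of the modified step 5: delay the RIB/FIB update for a single
# local link down event, re-evaluate if another SPF run happens first.
import threading

ULOOP_DELAY_DOWN_TIMER = 1.0   # seconds, example value


class ConvergenceState:
    def __init__(self):
        self.pending = None    # armed timer for a delayed RIB/FIB update

    def after_spf(self, only_local_link_down, update_fib):
        """Called at the end of each SPF computation (step 5)."""
        if self.pending is not None:           # a delayed update is armed
            self.pending.cancel()              # re-evaluate with the new SPF
            self.pending = None
        if only_local_link_down:
            # Local delay: FRR keeps protecting the traffic meanwhile.
            self.pending = threading.Timer(ULOOP_DELAY_DOWN_TIMER, update_fib)
            self.pending.start()
        else:
            update_fib()                       # standard convergence, no delay


state = ConvergenceState()
state.after_spf(True, lambda: print("RIB/FIB updated after the local delay"))
# A later SPF triggered by a remote F-X failure aborts the delay:
state.after_spf(False, lambda: print("RIB/FIB updated immediately"))
```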
5. Applicability

As previously stated, the mechanism only avoids the forwarding loops
on the links between the node local to the failure and its neighbor.
Forwarding loops may still occur on other links.

5.1. Applicable case: local loops

A ------ B ----- E
|       /        |
skipping to change at page 11, line 46
This local delay proposal is a transient forwarding loop avoidance
mechanism (like OFIB).  Even if it only addresses local transient
loops, the efficiency versus complexity comparison of the mechanism
makes it a good solution.  It is also incrementally deployable with
incremental benefits, which makes it an attractive option both for
vendors to implement and for Service Providers to deploy.  Delaying
convergence time is not an issue if we consider that the traffic is
protected during the convergence.
8. Examples

We will consider the following figure for the associated examples:
D
1 | F----X
| 1 |
A ------ B
| | ^
10 | | 5 | T
| | |
E--------C
| 1
1 |
S
The network above is considered to have a convergence time of about 1
second, so ULOOP_DELAY_DOWN_TIMER will be adjusted to this value.  We
also consider FRR running on each node.
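To make the example concrete, the sketch below encodes the topology
above as a weighted graph and computes C's shortest path to D before
and after the B-C failure.  The metrics are read from the figure as
far as it is legible; the A-B metric (and the metrics of the links
that do not influence the result) are assumptions made only for this
illustration, and S is omitted because its attachment point does not
change the computed paths.  The point is simply that C's path to D
flips from going via B to going via E, which is the situation where
the local delay matters.

```python
# Example topology as an undirected weighted graph.  The A-B metric (set
# to 1) and the B-F/F-X metrics are assumptions made for this sketch only.
import heapq

LINKS = {("A", "D"): 1, ("A", "B"): 1, ("A", "E"): 10, ("B", "C"): 5,
         ("B", "F"): 1, ("F", "X"): 1, ("C", "E"): 1}

def spf(links, src):
    """Plain Dijkstra returning (cost, path) from src to every node."""
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    best, queue = {}, [(0, src, [src])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node in best:
            continue
        best[node] = (dist, path)
        for nxt, cost in adj[node]:
            if nxt not in best:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return best

print(spf(LINKS, "C")["D"])     # (7, ['C', 'B', 'A', 'D'])  before the failure
no_bc = {l: c for l, c in LINKS.items() if l != ("B", "C")}
print(spf(no_bc, "C")["D"])     # (12, ['C', 'E', 'A', 'D']) after the failure
```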
8.1. Local link down
The table below describes the events and associated timing happening
on routers C and E when link B-C goes down.  As C detects a single
local event corresponding to a link down (its own LSP plus the LSP
received from B), it decides to apply the local delay down behavior
and no microloop is formed.
+-----------+-------------+------------------+----------------------+
| Network | Time | Router C events | Router E events |
| condition | | | |
+-----------+-------------+------------------+----------------------+
| S->D | | | |
| Traffic | | | |
| OK | | | |
| | | | |
| S->D | t0 | Link B-C fails | Link B-C fails |
| Traffic | | | |
| lost | | | |
| | | | |
| | t0+20msec | C detects the | |
| | | failure | |
| | | | |
| S->D | t0+40msec | C activates FRR | |
| Traffic | | | |
| OK | | | |
| | | | |
| | t0+50msec | C updates its | |
| | | local LSP/LSA | |
| | | | |
| | t0+60msec | C schedules SPF | |
| | | (100ms) | |
| | | | |
| | t0+67msec | C receives | |
| | | LSP/LSA from B | |
| | | | |
| | t0+70msec | C floods its | |
| | | local updated | |
| | | LSP/LSA | |
| | | | |
| | t0+87msec | | E receives LSP/LSA |
| | | | from C and schedules |
| | | | SPF (100ms) |
| | | | |
| | t0+117msec | | E floods LSP/LSA |
| | | | from C |
| | | | |
| | t0+160msec | C computes SPF | |
| | | | |
| | t0+165msec | C delays its | |
| | | RIB/FIB update | |
| | | (1 sec) | |
| | | | |
| | t0+193msec | | E computes SPF |
| | | | |
| | t0+199msec | | E starts updating |
| | | | its RIB/FIB |
| | | | |
| | t0+443msec | | E updates its |
| | | | RIB/FIB for D |
| | | | |
| | t0+470msec | | E convergence ends |
| | | | |
| | t0+1165msec | C starts | |
| | | updating its | |
| | | RIB/FIB | |
| | | | |
| | t0+1255msec | C updates its | |
| | | RIB/FIB for D | |
| | | | |
| | t0+1340msec | C convergence | |
| | | ends | |
+-----------+-------------+------------------+----------------------+
Route computation event time scale
Similarly, upon the B-C link down event, if the LSP/LSA from B is
received before C detects the link failure, C will apply the route
update delay if the local detection is part of the same SPF run.
+-----------+-------------+------------------+----------------------+
| Network | Time | Router C events | Router E events |
| condition | | | |
+-----------+-------------+------------------+----------------------+
| S->D | | | |
| Traffic | | | |
| OK | | | |
| | | | |
| S->D | t0 | Link B-C fails | Link B-C fails |
| Traffic | | | |
| lost | | | |
| | | | |
| | t0+32msec | C receives | |
| | | LSP/LSA from B | |
| | | | |
| | t0+33msec | C schedules SPF | |
| | | (100ms) | |
| | | | |
| | t0+50msec | C detects the | |
| | | failure | |
| | | | |
| S->D | t0+55msec | C activates FRR | |
| Traffic | | | |
| OK | | | |
| | | | |
| | t0+55msec | C updates its | |
| | | local LSP/LSA | |
| | | | |
| | t0+70msec | C floods its | |
| | | local updated | |
| | | LSP/LSA | |
| | | | |
| | t0+87msec | | E receives LSP/LSA |
| | | | from C and schedules |
| | | | SPF (100ms) |
| | | | |
| | t0+117msec | | E floods LSP/LSA |
| | | | from C |
| | | | |
| | t0+160msec | C computes SPF | |
| | | | |
| | t0+165msec | C delays its | |
| | | RIB/FIB update | |
| | | (1 sec) | |
| | | | |
| | t0+193msec | | E computes SPF |
| | | | |
| | t0+199msec | | E starts updating |
| | | | its RIB/FIB |
| | | | |
| | t0+443msec | | E updates its |
| | | | RIB/FIB for D |
| | | | |
| | t0+470msec | | E convergence ends |
| | | | |
| | t0+1165msec | C starts | |
| | | updating its | |
| | | RIB/FIB | |
| | | | |
| | t0+1255msec | C updates its | |
| | | RIB/FIB for D | |
| | | | |
| | t0+1340msec | C convergence | |
| | | ends | |
+-----------+-------------+------------------+----------------------+
Route computation event time scale
8.2. Local and remote event
The table below describes the events and associated timing happening
on routers C and E when link B-C goes down; in addition, link F-X
fails in the same time window.  C will not apply the local delay
because a non-local topology change is also received.
+-----------+------------+-----------------+------------------------+
| Network | Time | Router C events | Router E events |
| condition | | | |
+-----------+------------+-----------------+------------------------+
| S->D | | | |
| Traffic | | | |
| OK | | | |
| | | | |
| S->D | t0 | Link B-C fails | Link B-C fails |
| Traffic | | | |
| lost | | | |
| | | | |
| | t0+20msec | C detects the | |
| | | failure | |
| | | | |
| | t0+36msec | Link F-X fails | Link F-X fails |
| | | | |
| S->D | t0+40msec | C activates FRR | |
| Traffic | | | |
| OK | | | |
| | | | |
| | t0+50msec | C updates its | |
| | | local LSP/LSA | |
| | | | |
| | t0+54msec | C receives | |
| | | LSP/LSA from F | |
| | | and floods it | |
| | | | |
| | t0+60msec | C schedules SPF | |
| | | (100ms) | |
| | | | |
| | t0+67msec | C receives | |
| | | LSP/LSA from B | |
| | | | |
| | t0+69msec | | E receives LSP/LSA |
| | | | from F, floods it and |
| | | | schedules SPF (100ms) |
| | | | |
| | t0+70msec | C floods its | |
| | | local updated | |
| | | LSP/LSA | |
| | | | |
| | t0+87msec | | E receives LSP/LSA |
| | | | from C |
| | | | |
| | t0+117msec | | E floods LSP/LSA from |
| | | | C |
| | | | |
| | t0+160msec | C computes SPF | |
| | | | |
| | t0+165msec | C starts | |
| | | updating its | |
| | | RIB/FIB (NO | |
| | | DELAY) | |
| | | | |
| | t0+170msec | | E computes SPF |
| | | | |
| | t0+173msec | | E starts updating its |
| | | | RIB/FIB |
| | | | |
| S->D | t0+365msec | C updates its | |
| Traffic | | RIB/FIB for D | |
| lost | | | |
| | | | |
| S->D | t0+443msec | | E updates its RIB/FIB |
| Traffic | | | for D |
| OK | | | |
| | | | |
| | t0+450msec | C convergence | |
| | | ends | |
| | | | |
| | t0+470msec | | E convergence ends |
| | | | |
+-----------+------------+-----------------+------------------------+
Route computation event time scale
8.3. Aborting local delay
The table below describes the events and associated timing happening
on routers C and E when link B-C goes down; in addition, link F-X
fails while the local delay is running.  C will first apply the local
delay but, when the new event happens, it will fall back to the
standard convergence mechanism without delaying route insertion any
further.  In this example, we consider a ULOOP_DELAY_DOWN_TIMER
configured to 2 seconds.
+-----------+------------+-------------------+----------------------+
| Network | Time | Router C events | Router E events |
| condition | | | |
+-----------+------------+-------------------+----------------------+
| S->D | | | |
| Traffic | | | |
| OK | | | |
| | | | |
| S->D | t0 | Link B-C fails | Link B-C fails |
| Traffic | | | |
| lost | | | |
| | | | |
| | t0+20msec | C detects the | |
| | | failure | |
| | | | |
| S->D | t0+40msec | C activates FRR | |
| Traffic | | | |
| OK | | | |
| | | | |
| | t0+50msec | C updates its | |
| | | local LSP/LSA | |
| | | | |
| | t0+60msec | C schedules SPF | |
| | | (100ms) | |
| | | | |
| | t0+67msec | C receives | |
| | | LSP/LSA from B | |
| | | | |
| | t0+70msec | C floods its | |
| | | local updated | |
| | | LSP/LSA | |
| | | | |
| | t0+87msec | | E receives LSP/LSA |
| | | | from C and schedules |
| | | | SPF (100ms) |
| | | | |
| | t0+117msec | | E floods LSP/LSA |
| | | | from C |
| | | | |
| | t0+160msec | C computes SPF | |
| | | | |
| | t0+165msec | C delays its | |
| | | RIB/FIB update (2 | |
| | | sec) | |
| | | | |
| | t0+193msec | | E computes SPF |
| | | | |
| | t0+199msec | | E starts updating |
| | | | its RIB/FIB |
| | | | |
| | t0+254msec | Link F-X fails | Link F-X fails |
| | | | |
| | t0+300msec | C receives | |
| | | LSP/LSA from F | |
| | | and floods it | |
| | | | |
| | t0+303msec | C schedules SPF | |
| | | (200ms) | |
| | | | |
| | t0+312msec | | E receives LSP/LSA |
| | | | from F and floods it |
| | | | |
| | t0+313msec | | E schedules SPF |
| | | | (200ms) |
| | | | |
| | t0+502msec | C computes SPF | |
| | | | |
| | t0+505msec | C starts updating | |
| | | its RIB/FIB (NO | |
| | | DELAY) | |
| | | | |
| | t0+514msec | | E computes SPF |
| | | | |
| | t0+519msec | | E starts updating |
| | | | its RIB/FIB |
| | | | |
| S->D | t0+659msec | C updates its | |
| Traffic | | RIB/FIB for D | |
| lost | | | |
| | | | |
| S->D | t0+778msec | | E updates its |
| Traffic | | | RIB/FIB for D |
| OK | | | |
| | | | |
| | t0+781msec | C convergence | |
| | | ends | |
| | | | |
| | t0+810msec | | E convergence ends |
+-----------+------------+-------------------+----------------------+
Route computation event time scale
9. Comparison with other solutions
As stated in Section 3, our solution reuses some concepts already
introduced by other IETF proposals but tries to find a tradeoff
between efficiency and simplicity.  This section tries to compare
behaviors of the solutions.
9.1. PLSN
PLSN ([I-D.ietf-rtgwg-microloop-analysis]) describes a mechanism
where each node in the network tries to avoid transient forwarding
loops upon a topology change by always keeping traffic on a loop-free
path for a defined duration (locked path to a safe neighbor).  The
locked path may be the new primary nexthop, another neighbor, or the
old primary nexthop depending on how the safety condition is
satisfied.  PLSN does not solve all transient forwarding loops (see
[I-D.ietf-rtgwg-microloop-analysis] Section 4 for more details).
skipping to change at page 20, line 9
the new primary nexthop in case the new safe nexthop does not provide
enough bandwidth (see [I-D.ietf-rtgwg-lfa-manageability]).  Our
solution may not experience this issue, as the service provider may
have control of the FRR path being used, preventing network
congestion.

o PLSN applies to all nodes in a network (remote or local changes),
while our mechanism applies only to the nodes connected to the
topology change.
9.2. OFIB
OFIB ([RFC6976]) describes a mechanism where the convergence of the
network upon a topology change is ordered to prevent transient
forwarding loops.  Each router in the network must deduce the failure
type from the LSA/LSP received and compute/apply a specific FIB
update timer based on the failure type and its rank in the network,
considering the failure point as root.

This mechanism solves all the transient forwarding loops in a
network, at the price of introducing complexity in the convergence
process that may require strong monitoring by the service provider.

Our solution reuses the OFIB concept but limits it to the first hop
that experiences the topology change.  As demonstrated, our proposal
solves all the local transient forwarding loops, which represent a
high percentage of all the loops.  Moreover, limiting the mechanism
to one hop preserves the network-wide convergence behavior.
10. Existing implementations
At this time, there are three different implementations of this
mechanism: Cisco IOS-XR, Cisco IOS-XE, and Juniper JUNOS.  The three
implementations have been tested in labs and demonstrated good
behavior in terms of local micro-loop avoidance.  No side effects
have been found.
11. Security Considerations
This document does not introduce any change in terms of IGP security.
The operation is internal to the router.  The local delay does not
increase the attack vector, as an attacker could only trigger this
mechanism if it already had the ability to disable or enable an IGP
link.  The local delay does not increase the negative consequences
either: if an attacker has the ability to disable or enable an IGP
link, it can already harm the network by creating instability and
harm the traffic by creating packet loss and forwarding loops for the
traffic crossing that link.
12. Acknowledgements

We wish to thank the authors of [RFC6976] for introducing the concept
of ordered convergence: Mike Shand, Stewart Bryant, Stefano Previdi,
and Olivier Bonaventure.
13. IANA Considerations

This document has no actions for IANA.
14. References

14.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
March 1997, <http://www.rfc-editor.org/info/rfc2119>.
[RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free
Convergence", RFC 5715, DOI 10.17487/RFC5715, January 2010,
<http://www.rfc-editor.org/info/rfc5715>.

14.2. Informative References
[I-D.ietf-rtgwg-lfa-manageability]
Litkowski, S., Decraene, B., Filsfils, C., Raza, K., Horneffer, M.,
and P. Sarkar, "Operational management of Loop Free Alternates",
draft-ietf-rtgwg-lfa-manageability-11 (work in progress), June 2015.
[I-D.ietf-rtgwg-microloop-analysis]
Zinin, A., "Analysis and Minimization of Microloops in Link-state
Routing Protocols", draft-ietf-rtgwg-microloop-