draft-ietf-rtgwg-uloop-delay-07.txt   draft-ietf-rtgwg-uloop-delay-08.txt 
Routing Area Working Group S. Litkowski Routing Area Working Group S. Litkowski
Internet-Draft B. Decraene Internet-Draft B. Decraene
Intended status: Standards Track Orange Intended status: Standards Track Orange
Expires: April 13, 2018 C. Filsfils Expires: April 15, 2018 C. Filsfils
Cisco Systems Cisco Systems
P. Francois P. Francois
Individual Individual
October 10, 2017 October 12, 2017
Micro-loop prevention by introducing a local convergence delay Micro-loop prevention by introducing a local convergence delay
draft-ietf-rtgwg-uloop-delay-07 draft-ietf-rtgwg-uloop-delay-08
Abstract Abstract
This document describes a mechanism for link-state routing protocols This document describes a mechanism for link-state routing protocols
to prevent local transient forwarding loops in case of link failure. to prevent local transient forwarding loops in case of link failure.
This mechanism proposes a two-step convergence by introducing a delay This mechanism proposes a two-step convergence by introducing a delay
between the convergence of the node adjacent to the topology change between the convergence of the node adjacent to the topology change
and the network wide convergence. and the network wide convergence.
As this mechanism delays the IGP convergence it may only be used for As this mechanism delays the IGP convergence it may only be used for
skipping to change at page 2, line 10 skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 13, 2018. This Internet-Draft will expire on April 15, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 37 skipping to change at page 2, line 37
Table of Contents Table of Contents
1. Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Transient forwarding loops side effects . . . . . . . . . . . 4 3. Transient forwarding loops side effects . . . . . . . . . . . 4
3.1. Fast reroute inefficiency . . . . . . . . . . . . . . . . 4 3.1. Fast reroute inefficiency . . . . . . . . . . . . . . . . 4
3.2. Network congestion . . . . . . . . . . . . . . . . . . . 7 3.2. Network congestion . . . . . . . . . . . . . . . . . . . 7
4. Overview of the solution . . . . . . . . . . . . . . . . . . 7 4. Overview of the solution . . . . . . . . . . . . . . . . . . 7
5. Specification . . . . . . . . . . . . . . . . . . . . . . . . 8 5. Specification . . . . . . . . . . . . . . . . . . . . . . . . 8
5.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 8 5.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 8
5.2. Current IGP reactions . . . . . . . . . . . . . . . . . . 8 5.2. Regular IGP reaction . . . . . . . . . . . . . . . . . . 8
5.3. Local events . . . . . . . . . . . . . . . . . . . . . . 9 5.3. Local events . . . . . . . . . . . . . . . . . . . . . . 9
5.4. Local delay for link down . . . . . . . . . . . . . . . . 9 5.4. Local delay for link down . . . . . . . . . . . . . . . . 10
6. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 10 6. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 10
6.1. Applicable case: local loops . . . . . . . . . . . . . . 10 6.1. Applicable case: local loops . . . . . . . . . . . . . . 10
6.2. Non applicable case: remote loops . . . . . . . . . . . . 11 6.2. Non applicable case: remote loops . . . . . . . . . . . . 11
7. Simulations . . . . . . . . . . . . . . . . . . . . . . . . . 11 7. Simulations . . . . . . . . . . . . . . . . . . . . . . . . . 11
8. Deployment considerations . . . . . . . . . . . . . . . . . . 12 8. Deployment considerations . . . . . . . . . . . . . . . . . . 12
9. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 13 9. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 13
9.1. Local link down . . . . . . . . . . . . . . . . . . . . . 13 9.1. Local link down . . . . . . . . . . . . . . . . . . . . . 14
9.2. Local and remote event . . . . . . . . . . . . . . . . . 17 9.2. Local and remote event . . . . . . . . . . . . . . . . . 18
9.3. Aborting local delay . . . . . . . . . . . . . . . . . . 18 9.3. Aborting local delay . . . . . . . . . . . . . . . . . . 19
10. Comparison with other solutions . . . . . . . . . . . . . . . 21 10. Comparison with other solutions . . . . . . . . . . . . . . . 23
10.1. PLSN . . . . . . . . . . . . . . . . . . . . . . . . . . 21 10.1. PLSN . . . . . . . . . . . . . . . . . . . . . . . . . . 23
10.2. OFIB . . . . . . . . . . . . . . . . . . . . . . . . . . 21 10.2. OFIB . . . . . . . . . . . . . . . . . . . . . . . . . . 23
11. Existing implementations . . . . . . . . . . . . . . . . . . 22 11. Implementation Status . . . . . . . . . . . . . . . . . . . . 24
12. Security Considerations . . . . . . . . . . . . . . . . . . . 22 12. Security Considerations . . . . . . . . . . . . . . . . . . . 25
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 22 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 25
14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26
15. References . . . . . . . . . . . . . . . . . . . . . . . . . 22 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 26
15.1. Normative References . . . . . . . . . . . . . . . . . . 23 15.1. Normative References . . . . . . . . . . . . . . . . . . 26
15.2. Informative References . . . . . . . . . . . . . . . . . 23 15.2. Informative References . . . . . . . . . . . . . . . . . 26
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27
1. Acronyms 1. Acronyms
FIB: Forwarding Information Base FIB: Forwarding Information Base
FRR: Fast ReRoute FRR: Fast ReRoute
IGP: Interior Gateway Protocol IGP: Interior Gateway Protocol
LFA: Loop Free Alternate LFA: Loop Free Alternate
skipping to change at page 4, line 6 skipping to change at page 4, line 6
Micro-forwarding loops and some potential solutions are well Micro-forwarding loops and some potential solutions are well
described in [RFC5715]. This document describes a simple targeted described in [RFC5715]. This document describes a simple targeted
mechanism that prevents micro-loops that are local to the failure. mechanism that prevents micro-loops that are local to the failure.
Based on network analysis, local failures make up a significant Based on network analysis, local failures make up a significant
portion of the micro-forwarding loops. A simple and easily portion of the micro-forwarding loops. A simple and easily
deployable solution for these local micro-loops is critical because deployable solution for these local micro-loops is critical because
these local loops cause some traffic loss after a fast-reroute these local loops cause some traffic loss after a fast-reroute
alternate has been used (see Section 3.1). alternate has been used (see Section 3.1).
Consider the case in Figure 1 where S does not have an LFA (Loop Free Consider the case in Figure 1 where S does not have an LFA (Loop Free
Alternate) to protect its traffic to D. That means that all non-D Alternate) to protect its traffic to D when the S-D link fails. That
neighbors of S on the topology will send to S any traffic destined to means that all non-D neighbors of S on the topology will send to S
D if a neighbor did not, then that neighbor would be loop-free. any traffic destined to D; if a neighbor did not, then that neighbor
Regardless of the advanced fast-reroute (FRR) technique used, when S would be loop-free. Regardless of the advanced fast-reroute (FRR)
converges to the new topology, it will send its traffic to a neighbor technique used, when S converges to the new topology, it will send
that was not loop-free and thus cause a local micro-loop. The its traffic to a neighbor that was not loop-free and thus cause a
deployment of advanced fast-reroute techniques motivates this simple local micro-loop. The deployment of advanced fast-reroute techniques
router-local mechanism to solve this targeted problem. This solution motivates this simple router-local mechanism to solve this targeted
can be work with the various techniques described in [RFC5715]. problem. This solution can work with the various techniques
described in [RFC5715].
1
D ------ C D ------ C
| | | |
1 | | 5 | | 5
| | | |
S ------ B S ------ B
1
Figure 1 Figure 1
When S-D fails, a transient forwarding loop may appear between S and In Figure 1, all links have a metric of 1 except B-C, which has a
B if S updates its forwarding entry to D before B. metric of 5. When S-D fails, a transient forwarding loop may appear
between S and B if S updates its forwarding entry to D before B does.
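The loop condition described for Figure 1 can be checked with a small shortest-path computation. The following is a minimal sketch, not part of the draft: the helper next_hops and the variable names are hypothetical, and the metrics are those stated for Figure 1 (all links 1, B-C 5).

```python
import heapq

def next_hops(links, dest):
    """Map each router to its next hop toward dest (Dijkstra rooted at dest)."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    dist, hop, heap = {dest: 0}, {}, [(0, dest)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in graph.get(node, {}).items():
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                hop[nbr] = node  # with symmetric metrics, nbr forwards to node
                heapq.heappush(heap, (d + metric, nbr))
    return hop

# Figure 1 metrics: every link is 1 except B-C, which is 5.
before = {("S", "D"): 1, ("D", "C"): 1, ("C", "B"): 5, ("S", "B"): 1}
after = {link: m for link, m in before.items() if link != ("S", "D")}

old_fib = next_hops(before, "D")   # forwarding state before the S-D failure
new_fib = next_hops(after, "D")    # forwarding state after full convergence

# S has converged but B has not: each points at the other for destination D.
micro_loop = new_fib["S"] == "B" and old_fib["B"] == "S"
```

With these metrics, B reaches D through S before the failure, and S reaches D through B afterwards, so traffic to D loops between S and B for as long as S uses its new entry while B still uses its old one.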
3. Transient forwarding loops side effects 3. Transient forwarding loops side effects
Even if they are very limited in duration, transient forwarding loops Even if they are very limited in duration, transient forwarding loops
may cause high damages for a network. may cause significant network damage.
3.1. Fast reroute inefficiency 3.1. Fast reroute inefficiency
D D
1 | 1 |
| 1 | 1
A ------ B A ------ B
| | ^ | | ^
10 | | 5 | T 10 | | 5 | T
| | | | | |
E--------C E--------C
| 1 | 1
1 | 1 |
S S
Figure 2 - RSVP-TE FRR case Figure 2 - RSVP-TE FRR case
In Figure 2, we consider an IP/LDP routed network. An RSVP-TE In Figure 2, we consider an IP/LDP routed network. An RSVP-TE
tunnel T, provisioned on C and terminating on B, is used to protect tunnel T, provisioned on C and terminating on B, is used to protect
the traffic against C-B link failure (the IGP shortcut feature is the traffic against C-B link failure (the IGP shortcut feature,
activated on C). The primary path of T is C->B and FRR is activated defined in [RFC3906], is activated on C). The primary path of T is
on T providing an FRR bypass or detour using path C->E->A->B. On C->B and FRR is activated on T providing an FRR bypass or detour
router C, the next hop to D is the tunnel T thanks to the IGP using path C->E->A->B. On router C, the next hop to D is the tunnel
shortcut. When C-B link fails: T thanks to the IGP shortcut. When C-B link fails:
1. C detects the failure, and updates the tunnel path using a 1. C detects the failure, and updates the tunnel path using a
preprogrammed FRR path. The traffic path from S to D becomes: preprogrammed FRR path. The traffic path from S to D becomes:
S->E->C->E->A->B->A->D. S->E->C->E->A->B->A->D.
2. In parallel, on router C, both the IGP convergence and the TE 2. In parallel, on router C, both the IGP convergence and the TE
tunnel convergence (tunnel path recomputation) are occurring: tunnel convergence (tunnel path recomputation) are occurring:
* The Tunnel T path is recomputed and now uses C->E->A->B. * The Tunnel T path is recomputed and now uses C->E->A->B.
* The IGP path to D is recomputed and now uses C->E->A->D. * The IGP path to D is recomputed and now uses C->E->A->D.
3. On C, the tail-end of the TE tunnel (router B) is no longer on 3. On C, the tail-end of the TE tunnel (router B) is no longer on
the shortest-path tree (SPT) to D, so C does not continue to the shortest-path tree (SPT) to D, so C does not continue to
encapsulate the traffic to D using the tunnel T and updates its encapsulate the traffic to D using the tunnel T and updates its
forwarding entry to D using the nexthop E. forwarding entry to D using the nexthop E.
If C updates its forwarding entry to D before router E, there would If C updates its forwarding entry to D before router E, there would
be a transient forwarding loop between C and E until E has converged. be a transient forwarding loop between C and E until E has converged.
Table 1 below describes a theoretical sequence of events
happening when the B-C link fails. This theoretical sequence of
events should only be read as an example.
+-----------+------------+------------------+-----------------------+ +-----------+------------+------------------+-----------------------+
| Network | Time | Router C events | Router E events | | Network | Time | Router C events | Router E events |
| condition | | | | | condition | | | |
+-----------+------------+------------------+-----------------------+ +-----------+------------+------------------+-----------------------+
| S->D | | | | | S->D | | | |
| Traffic | | | | | Traffic | | | |
| OK | | | | | OK | | | |
| | | | | | | | | |
| S->D | t0 | Link B-C fails | Link B-C fails | | S->D | t0 | Link B-C fails | Link B-C fails |
| Traffic | | | | | Traffic | | | |
skipping to change at page 6, line 40 skipping to change at page 6, line 46
| | t0+340msec | C convergence | | | | t0+340msec | C convergence | |
| | | ends | | | | | ends | |
| | | | | | | | | |
| S->D | t0+443msec | | E updates its RIB/FIB | | S->D | t0+443msec | | E updates its RIB/FIB |
| Traffic | | | for D | | Traffic | | | for D |
| OK | | | | | OK | | | |
| | | | | | | | | |
| | t0+470msec | | E convergence ends | | | t0+470msec | | E convergence ends |
+-----------+------------+------------------+-----------------------+ +-----------+------------+------------------+-----------------------+
Route computation event time scale Table 1 - Route computation event time scale
The issue described here is completely independent of the fast- The issue described here is completely independent of the fast-
reroute mechanism involved (TE FRR, LFA/rLFA, MRT ...) when the reroute mechanism involved (TE FRR, LFA/rLFA, MRT ...) when the
primary path uses hop-by-hop routing. The protection enabled by primary path uses hop-by-hop routing. The protection enabled by
fast-reroute is working perfectly, but ensures a protection, by fast-reroute is working perfectly, but ensures a protection, by
definition, only until the PLR has converged (as soon as the PLR has definition, only until the PLR has converged (as soon as the PLR has
converged, it replaces its FRR path by a new primary path). When converged, it replaces its FRR path by a new primary path). When
implementing FRR, a service provider wants to guarantee a very implementing FRR, a service provider wants to guarantee a very
limited loss of connectivity time. The previous example shows that limited loss of connectivity time. The previous example shows that
the benefit of FRR may be completely lost due to a transient the benefit of FRR may be completely lost due to a transient
skipping to change at page 7, line 43 skipping to change at page 7, line 49
affected by the failure: e.g. A to B, F to B, E to B. Class of affected by the failure: e.g. A to B, F to B, E to B. Class of
service may mitigate the congestion for some traffic. However, some service may mitigate the congestion for some traffic. However, some
traffic not directly affected by the failure will still be dropped as traffic not directly affected by the failure will still be dropped as
a router is not able to distinguish the looping traffic from the a router is not able to distinguish the looping traffic from the
normally forwarded traffic. normally forwarded traffic.
4. Overview of the solution 4. Overview of the solution
This document defines a two-step convergence initiated by the router This document defines a two-step convergence initiated by the router
detecting a failure and advertising the topological changes in the detecting a failure and advertising the topological changes in the
IGP. This introduces a delay between the convergence of the local IGP. This introduces a delay between network-wide convergence and
router and the network wide convergence. the convergence of the local router.
The proposed solution is limited to local link down events in order The proposed solution is limited to local link down events in order
to keep the solution simple. to keep the solution simple.
This ordered convergence is similar to the ordered FIB This ordered convergence is similar to the ordered FIB
defined in [RFC6976], but it is limited to only a "one hop" distance. defined in [RFC6976], but it is limited to only a "one hop" distance.
As a consequence, it is simpler and becomes a local-only feature As a consequence, it is simpler and becomes a local-only feature
that does not require interoperability. This benefit comes at the that does not require interoperability. This benefit comes with the
expense of eliminating transient forwarding loops involving the local limitation of eliminating transient forwarding loops involving the
router. The proposed mechanism also reuses some concepts described local router only. The proposed mechanism also reuses some concepts
in [I-D.ietf-rtgwg-microloop-analysis]. described in [I-D.ietf-rtgwg-microloop-analysis].
5. Specification 5. Specification
5.1. Definitions 5.1. Definitions
This document will refer to the following existing IGP timers: This document will refer to the following existing IGP timers. These
timers may be standardized or implemented as a vendor specific local
feature.
o LSP_GEN_TIMER: The delay used to batch multiple local events in o LSP_GEN_TIMER: The delay used to batch multiple local events in
one single local LSP/LSA update. It is often associated with a one single local LSP/LSA update. In IS-IS, this timer is defined
damping mechanism to slow down reactions by incrementing the timer as minimumLSPGenerationInterval in [ISO10589]. In OSPF version 2,
when multiple consecutive events are detected. this timer is defined as MinLSInterval in [RFC2328]. It is often
associated with a vendor specific damping mechanism to slow down
reactions by incrementing the timer when multiple consecutive
events are detected.
o SPF_DELAY: The delay between the first IGP event triggering a new o SPF_DELAY: The delay between the first IGP event triggering a new
routing table computation and the start of that routing table routing table computation and the start of that routing table
computation. It is often associated with a damping mechanism to computation. It is often associated with a damping mechanism to
slow down reactions by incrementing the timer when the IGP becomes slow down reactions by incrementing the timer when the IGP becomes
unstable. As an example, [I-D.ietf-rtgwg-backoff-algo] defines a unstable. As an example, [I-D.ietf-rtgwg-backoff-algo] defines a
standard SPF delay algorithm. standard SPF (Shortest Path First) delay algorithm.
This document introduces the following new timer: This document introduces the following new timer:
o ULOOP_DELAY_DOWN_TIMER: used to slow down the local node o ULOOP_DELAY_DOWN_TIMER: used to slow down the local node
convergence in case of link down events. convergence in case of link down events.
5.2. Current IGP reactions 5.2. Regular IGP reaction
Upon a change of the status of an adjacency/link, the existing Upon a change of the status of an adjacency/link, the regular IGP
behavior of the router advertising the event is the following: convergence behavior of the router advertising the event involves the
following main steps:
1. The Up/Down event is notified to the IGP. 1. IGP is notified of the Up/Down event.
2. The IGP processes the notification and postpones the reaction for 2. The IGP processes the notification and postpones the reaction for
LSP_GEN_TIMER msec. LSP_GEN_TIMER msec.
3. Upon LSP_GEN_TIMER expiration, the IGP updates its LSP/LSA and 3. Upon LSP_GEN_TIMER expiration, the IGP updates its LSP/LSA and
floods it. floods it.
4. The SPF computation is scheduled in SPF_DELAY msec. 4. The SPF computation is scheduled in SPF_DELAY msec.
5. Upon SPF_DELAY timer expiration, the SPF is computed, then the 5. Upon SPF_DELAY timer expiration, the SPF is computed, then the
RIB and FIB are updated. RIB and FIB are updated.
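The five steps above can be laid out on a simple time line. This is an illustrative sketch only: the function name and the timer values are hypothetical, SPF computation and RIB/FIB programming time are not modeled, and it assumes that SPF_DELAY starts running once the LSP/LSA has been flooded.

```python
def regular_igp_timeline(lsp_gen_timer_ms, spf_delay_ms):
    """Return (offset_ms, step) pairs relative to the IGP notification (step 1)."""
    flood_at = lsp_gen_timer_ms        # steps 2-3: batching delay, then flooding
    spf_at = flood_at + spf_delay_ms   # steps 4-5: assumed to start after flooding
    return [
        (0, "IGP notified of the Up/Down event"),
        (flood_at, "LSP/LSA updated and flooded"),
        (spf_at, "SPF computed, RIB and FIB updated"),
    ]

# Hypothetical timer values, in msec.
timeline = regular_igp_timeline(lsp_gen_timer_ms=50, spf_delay_ms=100)
```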
5.3. Local events 5.3. Local events
The mechanism described in this document assumes that there has been The mechanism described in this document assumes that there has been
a single link failure as seen by the IGP area/level. If this a single link failure as seen by the IGP area/level. If this
assumption is violated (e.g. multiple links or nodes failed), then assumption is violated (e.g. multiple links or nodes failed), then
standard IP convergence MUST be applied (as described in regular IP convergence must be applied (as described in Section 5.2).
Section 5.2).
To determine whether the mechanism is applicable, an To determine whether the mechanism is applicable, an
implementation SHOULD implement logic to correlate the protocol implementation SHOULD implement logic to correlate the protocol
messages (LSP/LSA) received during the SPF scheduling period in order messages (LSP/LSA) received during the SPF scheduling period in order
to determine the topology changes that occurred. This is necessary as to determine the topology changes that occurred. This is necessary as
multiple protocol messages may describe the same topology change and multiple protocol messages may describe the same topology change and
a single protocol message may describe multiple topology changes. As a single protocol message may describe multiple topology changes. As
a consequence, determining a particular topology change MUST be a consequence, determining a particular topology change MUST be
independent of the order of reception of those protocol messages. independent of the order of reception of those protocol messages.
How the logic works is left to the implementation. How the logic works is left to the implementation.
Using this logic, if an implementation determines that the associated Using this logic, if an implementation determines that the associated
topology change is a single local link failure, then the router MAY topology change is a single local link failure, then the router MAY
use the mechanism described in this document, otherwise the standard use the mechanism described in this document, otherwise the regular
IP convergence MUST be used. IP convergence MUST be used.
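Since the correlation logic is left to the implementation, the following is only one possible sketch. It assumes the link-state database can be summarized as a map from each router to the set of neighbors it advertises, and it compares the snapshot taken before the SPF scheduling period with the one taken at its end, which makes the result independent of the order in which messages were received. All names are hypothetical.

```python
def up_links(lsdb):
    """Bidirectional links in an LSDB given as {router: set of advertised neighbors}."""
    return {frozenset((a, b)) for a, nbrs in lsdb.items() for b in nbrs
            if a != b and a in lsdb.get(b, set())}

def is_single_local_link_failure(local_id, old_lsdb, new_lsdb):
    """True only if exactly one link changed, it went down, and it is local."""
    old, new = up_links(old_lsdb), up_links(new_lsdb)
    changed = old ^ new
    if len(changed) != 1 or not changed <= old:
        return False
    (link,) = changed
    return local_id in link

# Hypothetical four-router square; S is the local router.
base = {"S": {"D", "B"}, "D": {"S", "C"}, "C": {"D", "B"}, "B": {"S", "C"}}
own_lsp_only = dict(base, S={"B"})            # only S's own LSP reflects S-D down
both_lsps = dict(base, S={"B"}, D={"C"})      # D's LSP describes the same change
two_failures = dict(base, D=set())            # one LSP, two topology changes
remote_only = dict(base, C={"D"})             # single failure, but not local to S
```

The two-way check in up_links is what lets two messages describing the same failure count as a single topology change, while a single message that withdraws two adjacencies counts as two.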
Example: Example:
+--- E ----+--------+ +--- E ----+--------+
| | | | | |
A ---- B -------- C ------ D A ---- B -------- C ------ D
Figure 4 Figure 4
skipping to change at page 10, line 7 skipping to change at page 10, line 16
5.4. Local delay for link down 5.4. Local delay for link down
Upon an adjacency/link down event, this document introduces a change Upon an adjacency/link down event, this document introduces a change
in step 5 (Section 5.2) in order to delay the local convergence in step 5 (Section 5.2) in order to delay the local convergence
compared to the network wide convergence. The new step 5 is compared to the network wide convergence. The new step 5 is
described below: described below:
5. Upon SPF_DELAY timer expiration, the SPF is computed. If the 5. Upon SPF_DELAY timer expiration, the SPF is computed. If the
condition of a single local link-down event has been met, then an condition of a single local link-down event has been met, then an
update of the RIB and the FIB SHOULD be delayed for update of the RIB and the FIB MUST be delayed for
ULOOP_DELAY_DOWN_TIMER msecs. Otherwise, the RIB and FIB SHOULD ULOOP_DELAY_DOWN_TIMER msecs. Otherwise, the RIB and FIB SHOULD
be updated immediately. be updated immediately.
If a new convergence occurs while ULOOP_DELAY_DOWN_TIMER is running, If a new convergence occurs while ULOOP_DELAY_DOWN_TIMER is running,
ULOOP_DELAY_DOWN_TIMER is stopped and the RIB/FIB SHOULD be updated ULOOP_DELAY_DOWN_TIMER is stopped and the RIB/FIB SHOULD be updated
as part of the new convergence event. as part of the new convergence event.
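The modified step 5 and the abort rule can be sketched as two small functions. This is an illustration under stated assumptions, not a normative description: the function names and the time values are hypothetical, and the second function assumes that the new convergence does not itself qualify for the local delay.

```python
def rib_fib_update_time(spf_done_ms, single_local_link_down, uloop_delay_down_timer_ms):
    """Time at which the RIB/FIB update starts, following the modified step 5."""
    if single_local_link_down:
        return spf_done_ms + uloop_delay_down_timer_ms
    return spf_done_ms

def after_new_convergence(pending_update_ms, new_spf_done_ms):
    """Effect of a new SPF run on a pending, delayed RIB/FIB update."""
    if new_spf_done_ms < pending_update_ms:
        return new_spf_done_ms   # timer stopped; update as part of the new convergence
    return pending_update_ms     # timer had already expired; nothing to abort

# Hypothetical values, in msec relative to the failure.
delayed = rib_fib_update_time(160, True, 1000)
immediate = rib_fib_update_time(160, False, 1000)
aborted = after_new_convergence(delayed, 600)
```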
As a result of this addition, routers local to the failure will As a result of this addition, routers local to the failure will
converge slower than remote routers. Hence it SHOULD only be done converge slower than remote routers. Hence it SHOULD only be done
for a non-urgent convergence, such as for administrative de- for a non-urgent convergence, such as for administrative de-
skipping to change at page 11, line 51 skipping to change at page 12, line 18
| T1 | 71% | | T1 | 71% |
| T2 | 81% | | T2 | 81% |
| T3 | 62% | | T3 | 62% |
| T4 | 50% | | T4 | 50% |
| T5 | 70% | | T5 | 70% |
| T6 | 70% | | T6 | 70% |
| T7 | 59% | | T7 | 59% |
| T8 | 77% | | T8 | 77% |
+----------+------+ +----------+------+
Table 1: Number of Repair/Dst that may loop Table 2 - Number of Repair/Dst that may loop
We evaluated the efficiency of the mechanism on eight different We evaluated the efficiency of the mechanism on eight different
service provider topologies (different network size, design). The service provider topologies (different network size, design). The
benefit is displayed in the table above. The benefit is evaluated as benefit is displayed in the table above. The benefit is evaluated as
follows: follows:
o We consider a tuple (link A-B, destination D, PLR S, backup o We consider a tuple (link A-B, destination D, PLR S, backup
nexthop N) as a loop if upon link A-B failure, the flow from a nexthop N) as a loop if upon link A-B failure, the flow from a
router S upstream from A (A could be considered as PLR also) to D router S upstream from A (A could be considered as PLR also) to D
may loop due to convergence time difference between S and one of may loop due to convergence time difference between S and one of
skipping to change at page 12, line 51 skipping to change at page 13, line 17
This local delay proposal is a transient forwarding loop avoidance This local delay proposal is a transient forwarding loop avoidance
mechanism (like OFIB). Even if it only addresses local transient mechanism (like OFIB). Even if it only addresses local transient
loops, the efficiency versus complexity comparison of the mechanism loops, the efficiency versus complexity comparison of the mechanism
makes it a good solution. It is also incrementally deployable with makes it a good solution. It is also incrementally deployable with
incremental benefits, which makes it an attractive option both for incremental benefits, which makes it an attractive option both for
vendors to implement and service providers to deploy. Delaying the vendors to implement and service providers to deploy. Delaying the
convergence time is not an issue if we consider that the traffic is convergence time is not an issue if we consider that the traffic is
protected during the convergence. protected during the convergence.
The ULOOP_DELAY_DOWN_TIMER value should be set according to the
maximum IGP convergence time observed in the network (usually
observed in the slowest node).
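As a minimal sketch of this guidance, the timer can be derived from per-node convergence measurements. The function name and the measurement values below are hypothetical; how the convergence times are collected is outside the scope of this sketch.

```python
def uloop_delay_down_timer_ms(observed_convergence_ms):
    """Pick the timer from the slowest observed IGP convergence, in msec."""
    return max(observed_convergence_ms.values())

# Hypothetical per-node convergence times, in msec.
measurements = {"C": 340, "E": 470, "A": 910}
timer_ms = uloop_delay_down_timer_ms(measurements)
```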
The proposed mechanism is limited to link down events. When a link The proposed mechanism is limited to link down events. When a link
goes down, it eventually goes back up. As a consequence, with the goes down, it eventually goes back up. As a consequence, with the
proposed mechanism deployed, only the link down event will be proposed mechanism deployed, only the link down event will be
protected against transient forwarding loops while the link up event protected against transient forwarding loops while the link up event
will not. If the operator wants to limit the impact of the transient will not. If the operator wants to limit the impact of the transient
forwarding loops during the link up event, it should take care of forwarding loops during the link up event, it should take care of
using specific procedures to bring the link back online. As using specific procedures to bring the link back online. As
examples, the operator can decide to bring the link back online outside examples, the operator can decide to bring the link back online outside
business hours or it can use some incremental metric changes to business hours or it can use some incremental metric changes to
prevent loops (as proposed in [RFC5715]). prevent loops (as proposed in [RFC5715]).
9. Examples 9. Examples
We will consider the following figure for the associated examples: We will consider the following figure for the associated examples:
D D
1 | F----X 1 | F----X
| 1 | | 1 |
A ------ B A ------ B
| | ^ | |
10 | | 5 | T 10 | | 5
| | | | |
E--------C E--------C
| 1 | 1
1 | 1 |
S S
Figure 7 Figure 7
The network above is considered to have a convergence time of about 1 The network above is considered to have a convergence time of about 1
second, so ULOOP_DELAY_DOWN_TIMER will be adjusted to this value. We second, so ULOOP_DELAY_DOWN_TIMER will be adjusted to this value. We
also consider that FRR is running on each node. also consider that FRR is running on each node.
9.1. Local link down 9.1. Local link down
The table below describes the events and associated timing that Table 3 describes the events and associated timing that happen on
happens on router C and E when link B-C goes down. As C detects a routers C and E when link B-C goes down. It is based on a theoretical
single local event corresponding to a link down (its LSP + LSP from B sequence of events that should only be read as an example. As C
received), it decides to apply the local delay down behavior and no detects a single local event corresponding to a link down (its LSP +
LSP from B received), it applies the local delay down behavior and no
microloop is formed. microloop is formed.
+-----------+-------------+------------------+----------------------+ +-----------+-------------+------------------+----------------------+
| Network | Time | Router C events | Router E events | | Network | Time | Router C events | Router E events |
| condition | | | | | condition | | | |
+-----------+-------------+------------------+----------------------+ +-----------+-------------+------------------+----------------------+
| S->D | | | | | S->D | | | |
| Traffic | | | | | Traffic | | | |
| OK | | | | | OK | | | |
| | | | | | | | | |
skipping to change at page 15, line 22 skipping to change at page 16, line 22
| | | updating its | | | | | updating its | |
| | | RIB/FIB | | | | | RIB/FIB | |
| | | | | | | | | |
| | t0+1255msec | C updates its | | | | t0+1255msec | C updates its | |
| | | RIB/FIB for D | | | | | RIB/FIB for D | |
| | | | | | | | | |
| | t0+1340msec | C convergence | | | | t0+1340msec | C convergence | |
| | | ends | | | | | ends | |
+-----------+-------------+------------------+----------------------+ +-----------+-------------+------------------+----------------------+
Route computation event time scale Table 3 - Route computation event time scale
Similarly, upon B-C link down event, if LSP/LSA from B is received Similarly, upon B-C link down event, if LSP/LSA from B is received
before C detects the link failure, C will apply the route update before C detects the link failure, C will apply the route update
delay if the local detection is part of the same SPF run. delay if the local detection is part of the same SPF run. The table
4 describes the associated theoretical sequence of events. It should
only be read as an example.
+-----------+-------------+------------------+----------------------+ +-----------+-------------+------------------+----------------------+
| Network | Time | Router C events | Router E events | | Network | Time | Router C events | Router E events |
| condition | | | | | condition | | | |
+-----------+-------------+------------------+----------------------+ +-----------+-------------+------------------+----------------------+
| S->D | | | | | S->D | | | |
| Traffic | | | | | Traffic | | | |
| OK | | | | | OK | | | |
| | | | | | | | | |
| S->D | t0 | Link B-C fails | Link B-C fails | | S->D | t0 | Link B-C fails | Link B-C fails |
skipping to change at page 16, line 46 skipping to change at page 17, line 48
| | | updating its | | | | | updating its | |
| | | RIB/FIB | | | | | RIB/FIB | |
| | | | | | | | | |
| | t0+1255msec | C updates its | | | | t0+1255msec | C updates its | |
| | | RIB/FIB for D | | | | | RIB/FIB for D | |
| | | | | | | | | |
| | t0+1340msec | C convergence | | | | t0+1340msec | C convergence | |
| | | ends | | | | | ends | |
+-----------+-------------+------------------+----------------------+ +-----------+-------------+------------------+----------------------+
Route computation event time scale Table 4 - Route computation event time scale
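The decision illustrated by the two sequences above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the normative procedure of the draft: the `TopologyChange` record, its fields and the function name `should_apply_local_delay` are hypothetical names introduced here. The sketch assumes that every change learned before a given SPF run (the router's own LSP/LSA and those received from neighbors) has already been reduced to "which link, up or down".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyChange:
    """One change seen before an SPF run (hypothetical representation)."""
    link: frozenset  # the two endpoints of the affected link, e.g. frozenset({"B", "C"})
    kind: str        # "down" or "up"


def should_apply_local_delay(local_router, changes):
    """Return True only for a single local link-down event.

    All changes in the SPF run must describe the same link, that link
    must be attached to local_router, and every change must be a "down".
    """
    if not changes:
        return False
    links = {change.link for change in changes}
    if len(links) != 1:
        # More than one link is affected: not a single event.
        return False
    (link,) = links
    return local_router in link and all(c.kind == "down" for c in changes)


bc = frozenset({"B", "C"})
# C's own LSP and the LSP received from B both report link B-C down.
print(should_apply_local_delay("C", [TopologyChange(bc, "down"),
                                     TopologyChange(bc, "down")]))
```

With an additional change for another link (for instance F-X) in the same run, the function returns False and the router would converge without the local delay.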
9.2. Local and remote event 9.2. Local and remote event
The table below describes the events and associating timing that The table 5 describes the events and associated timing that happen on
happens on router C and E when link B-C goes down, in addition F-X router C and E when link B-C goes down, in addition F-X link will
link will fail in the same time window. C will not apply the local fail in the same time window. C will not apply the local delay
delay because a non local topology change is also received. because a non local topology change is also received. The table 5 is
based on a theoretical sequence of events that should only be read
as an example.
+-----------+------------+-----------------+------------------------+ +-----------+------------+-----------------+------------------------+
| Network | Time | Router C events | Router E events | | Network | Time | Router C events | Router E events |
| condition | | | | | condition | | | |
+-----------+------------+-----------------+------------------------+ +-----------+------------+-----------------+------------------------+
| S->D | | | | | S->D | | | |
| Traffic | | | | | Traffic | | | |
| OK | | | | | OK | | | |
| | | | | | | | | |
| S->D | t0 | Link B-C fails | Link B-C fails | | S->D | t0 | Link B-C fails | Link B-C fails |
skipping to change at page 18, line 38 skipping to change at page 19, line 40
| Traffic | | | for D | | Traffic | | | for D |
| OK | | | | | OK | | | |
| | | | | | | | | |
| | t0+450msec | C convergence | | | | t0+450msec | C convergence | |
| | | ends | | | | | ends | |
| | | | | | | | | |
| | t0+470msec | | E convergence ends | | | t0+470msec | | E convergence ends |
| | | | | | | | | |
+-----------+------------+-----------------+------------------------+ +-----------+------------+-----------------+------------------------+
Route computation event time scale Table 5 - Route computation event time scale
9.3. Aborting local delay 9.3. Aborting local delay
The table below describes the events and associated timing that The table 6 describes the events and associated timing that happen on
happen on router C and E when link B-C goes down. In addition, we router C and E when link B-C goes down. In addition, we consider
consider what happens when F-X link fails during local delay of the what happens when F-X link fails during local delay of the FIB
FIB update. C will first apply the local delay, but when the new update. C will first apply the local delay, but when the new event
event happens, it will fall back to the standard convergence happens, it will fall back to the standard convergence mechanism
mechanism without further delaying route insertion. In this example, without further delaying route insertion. In this example, we
we consider a ULOOP_DELAY_DOWN_TIMER configured to 2 seconds. consider a ULOOP_DELAY_DOWN_TIMER configured to 2 seconds. The table
6 is based on a theoretical sequence of events that should only be
read as an example.
+-----------+------------+-------------------+----------------------+ +-----------+------------+-------------------+----------------------+
| Network | Time | Router C events | Router E events | | Network | Time | Router C events | Router E events |
| condition | | | | | condition | | | |
+-----------+------------+-------------------+----------------------+ +-----------+------------+-------------------+----------------------+
| S->D | | | | | S->D | | | |
| Traffic | | | | | Traffic | | | |
| OK | | | | | OK | | | |
| | | | | | | | | |
| S->D | t0 | Link B-C fails | Link B-C fails | | S->D | t0 | Link B-C fails | Link B-C fails |
skipping to change at page 20, line 47 skipping to change at page 22, line 47
| S->D | t0+778msec | | E updates its | | S->D | t0+778msec | | E updates its |
| Traffic | | | RIB/FIB for D | | Traffic | | | RIB/FIB for D |
| OK | | | | | OK | | | |
| | | | | | | | | |
| | t0+781msec | C convergence | | | | t0+781msec | C convergence | |
| | | ends | | | | | ends | |
| | | | | | | | | |
| | t0+810msec | | E convergence ends | | | t0+810msec | | E convergence ends |
+-----------+------------+-------------------+----------------------+ +-----------+------------+-------------------+----------------------+
Route computation event time scale Table 6 - Route computation event time scale
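The abort behavior described in this example can be sketched as a small state holder driven by an explicit clock. This is a minimal sketch under stated assumptions, not an implementation from the draft: the class `DelayedFibUpdate` and its method names are hypothetical, time is passed in as plain seconds instead of using real timers, and only the 2-second ULOOP_DELAY_DOWN_TIMER value comes from the text.

```python
ULOOP_DELAY_DOWN_TIMER = 2.0  # seconds, the value configured in this example


class DelayedFibUpdate:
    """Tracks one delayed FIB update (hypothetical helper)."""

    def __init__(self, delay=ULOOP_DELAY_DOWN_TIMER):
        self.delay = delay
        self.deadline = None   # time at which the held update becomes due
        self.aborted = False   # set when a new event cancels the delay

    def start(self, now):
        """Begin holding the FIB update after a single local link-down."""
        self.deadline = now + self.delay
        self.aborted = False

    def on_new_topology_event(self, now):
        """A new event while the timer runs aborts the local delay."""
        if self.deadline is not None and now < self.deadline:
            self.aborted = True
            self.deadline = None

    def update_released(self, now):
        """True once the held update may be installed.

        Returns False when no delay was ever started.
        """
        if self.aborted:
            return True
        return self.deadline is not None and now >= self.deadline


delay = DelayedFibUpdate()
delay.start(now=0.0)
print(delay.update_released(now=1.0))   # timer still running
delay.on_new_topology_event(now=1.0)    # second failure arrives
print(delay.update_released(now=1.0))   # delay aborted, update no longer held
```

Passing the clock explicitly keeps the sketch deterministic; a real router would tie the same transitions to its own timer and SPF scheduling.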
10. Comparison with other solutions 10. Comparison with other solutions
As stated in Section 4, the proposed solution reuses some concepts As stated in Section 4, the proposed solution reuses some concepts
already introduced by other IETF proposals but tries to find a already introduced by other IETF proposals but tries to find a
tradeoff between efficiency and simplicity. This section tries to tradeoff between efficiency and simplicity. This section tries to
compare behaviors of the solutions. compare behaviors of the solutions.
10.1. PLSN 10.1. PLSN
skipping to change at page 22, line 16 skipping to change at page 24, line 16
network at the price of introducing complexity in the convergence network at the price of introducing complexity in the convergence
process that may require a strong monitoring by the service provider. process that may require a strong monitoring by the service provider.
Our solution reuses the OFIB concept but limits it to the first hop Our solution reuses the OFIB concept but limits it to the first hop
that experiences the topology change. As demonstrated, the mechanism that experiences the topology change. As demonstrated, the mechanism
proposed in this document solves all the local transient proposed in this document solves all the local transient
forwarding loops, which represent a high percentage of all the loops. forwarding loops, which represent a high percentage of all the loops.
Moreover, limiting the mechanism to one hop keeps the Moreover, limiting the mechanism to one hop keeps the
network-wide convergence behavior. network-wide convergence behavior.
11. Existing implementations 11. Implementation Status
At this time, there are three different implementations of this At this time, there are three different implementations of this
mechanism: CISCO IOS-XR, CISCO IOS-XE and Juniper JUNOS. The three mechanism.
implementations have been tested in labs and demonstrated good
behavior in term of local micro-loop avoidance. The feature has also o Implementation 1:
been deployed in some live networks. No side effects have been
found. * Organization: Cisco
* Implementation name: Local Microloop Protection
* Operating system: IOS-XE
* Level of maturity: production release
* Coverage: all the specification is implemented
* Protocols supported: ISIS and OSPF
* Implementation experience: tested in lab and works as expected
* Comment: the feature gives the ability to choose to apply the
delay to FRR protected entry only
* Report last update: 10-11-2017
o Implementation 2:
* Organization: Cisco
* Implementation name: Local Microloop Protection
* Operating system: IOS-XR
* Level of maturity: deployed
* Coverage: all the specification is implemented
* Protocols supported: ISIS and OSPF
* Implementation experience: deployed and works as expected
* Comment: the feature gives the ability to choose to apply the
delay to FRR protected entry only
* Report last update: 10-11-2017
o Implementation 3:
* Organization: Juniper Networks
* Implementation name: Microloop avoidance when IS-IS link fails
* Operating system: JUNOS
* Level of maturity: deployed (hidden command)
* Coverage: all the specification is implemented
* Protocols supported: ISIS only
* Implementation experience: deployed and works as expected
* Comment: the feature applies to all the ISIS routes
* Report last update: 10-11-2017
12. Security Considerations 12. Security Considerations
This document does not introduce any change in terms of IGP security. This document does not introduce any change in terms of IGP security.
The operation is internal to the router. The local delay does not The operation is internal to the router. The local delay does not
increase the number of attack vectors as an attacker could only increase the number of attack vectors as an attacker could only
trigger this mechanism if he already has the ability to disable or trigger this mechanism if he already has the ability to disable or
enable an IGP link. The local delay does not increase the negative enable an IGP link. The local delay does not increase the negative
consequences. If an attacker has the ability to disable or enable an consequences. If an attacker has the ability to disable or enable an
IGP link, it can already harm the network by creating instability and IGP link, it can already harm the network by creating instability and
skipping to change at page 23, line 4 skipping to change at page 26, line 10
We would like to thank the authors of [RFC6976] for introducing the We would like to thank the authors of [RFC6976] for introducing the
concept of ordered convergence: Mike Shand, Stewart Bryant, Stefano concept of ordered convergence: Mike Shand, Stewart Bryant, Stefano
Previdi, and Olivier Bonaventure. Previdi, and Olivier Bonaventure.
14. IANA Considerations 14. IANA Considerations
This document has no actions for IANA. This document has no actions for IANA.
15. References 15. References
15.1. Normative References 15.1. Normative References
[ISO10589]
"Intermediate System to Intermediate System intra-domain
routeing information exchange protocol for use in
conjunction with the protocol for providing the
connectionless-mode network service (ISO 8473)",
ISO 10589, 2002.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998,
<https://www.rfc-editor.org/info/rfc2328>.
15.2. Informative References 15.2. Informative References
[I-D.ietf-rtgwg-backoff-algo] [I-D.ietf-rtgwg-backoff-algo]
Decraene, B., Litkowski, S., Gredler, H., Lindem, A., Decraene, B., Litkowski, S., Gredler, H., Lindem, A.,
Francois, P., and C. Bowers, "SPF Back-off algorithm for Francois, P., and C. Bowers, "SPF Back-off algorithm for
link state IGPs", draft-ietf-rtgwg-backoff-algo-05 (work link state IGPs", draft-ietf-rtgwg-backoff-algo-05 (work
in progress), May 2017. in progress), May 2017.
[I-D.ietf-rtgwg-microloop-analysis] [I-D.ietf-rtgwg-microloop-analysis]
Zinin, A., "Analysis and Minimization of Microloops in Zinin, A., "Analysis and Minimization of Microloops in
Link-state Routing Protocols", draft-ietf-rtgwg-microloop- Link-state Routing Protocols", draft-ietf-rtgwg-microloop-
analysis-01 (work in progress), October 2005. analysis-01 (work in progress), October 2005.
[RFC3906] Shen, N. and H. Smit, "Calculating Interior Gateway
Protocol (IGP) Routes Over Traffic Engineering Tunnels",
RFC 3906, DOI 10.17487/RFC3906, October 2004,
<https://www.rfc-editor.org/info/rfc3906>.
[RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free
Convergence", RFC 5715, DOI 10.17487/RFC5715, January Convergence", RFC 5715, DOI 10.17487/RFC5715, January
2010, <https://www.rfc-editor.org/info/rfc5715>. 2010, <https://www.rfc-editor.org/info/rfc5715>.
[RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
Francois, P., and O. Bonaventure, "Framework for Loop-Free Francois, P., and O. Bonaventure, "Framework for Loop-Free
Convergence Using the Ordered Forwarding Information Base Convergence Using the Ordered Forwarding Information Base
(oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July
2013, <https://www.rfc-editor.org/info/rfc6976>. 2013, <https://www.rfc-editor.org/info/rfc6976>.
 End of changes. 44 change blocks. 
90 lines changed or deleted 185 lines changed or added
