draft-ietf-rtgwg-backoff-algo-08.txt   draft-ietf-rtgwg-backoff-algo-09.txt 
Network Working Group B. Decraene Network Working Group B. Decraene
Internet-Draft Orange Internet-Draft Orange
Intended status: Standards Track S. Litkowski Intended status: Standards Track S. Litkowski
Expires: August 30, 2018 Orange Business Service Expires: August 31, 2018 Orange Business Service
H. Gredler H. Gredler
RtBrick Inc RtBrick Inc
A. Lindem A. Lindem
Cisco Systems Cisco Systems
P. Francois P. Francois
C. Bowers C. Bowers
Juniper Networks, Inc. Juniper Networks, Inc.
February 26, 2018 February 27, 2018
SPF Back-off algorithm for link state IGPs SPF Back-off Delay algorithm for link state IGPs
draft-ietf-rtgwg-backoff-algo-08 draft-ietf-rtgwg-backoff-algo-09
Abstract Abstract
This document defines a standard algorithm to temporarily postpone or This document defines a standard algorithm to temporarily postpone or
'back-off' link-state IGP Shortest Path First (SPF) computations. 'back-off' link-state IGP Shortest Path First (SPF) computations.
This reduces the computational load and churn on IGP nodes when This reduces the computational load and churn on IGP nodes when
multiple temporally close network events trigger multiple SPF multiple temporally close network events trigger multiple SPF
computations. computations.
Having one standard algorithm improves interoperability by reducing Having one standard algorithm improves interoperability by reducing
skipping to change at page 2, line 10 skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 30, 2018. This Internet-Draft will expire on August 31, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 28 skipping to change at page 3, line 28
the network is experiencing multiple failures over a short period of the network is experiencing multiple failures over a short period of
time, there is a conflicting desire to limit the frequency of SPF time, there is a conflicting desire to limit the frequency of SPF
computations, which would allow a reduction in control plane computations, which would allow a reduction in control plane
resources used by IGPs and all protocols/subsystems reacting on the resources used by IGPs and all protocols/subsystems reacting on the
attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], BGP attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], BGP
[RFC4271], Fast ReRoute computations (e.g., Loop Free Alternates [RFC4271], Fast ReRoute computations (e.g., Loop Free Alternates
(LFA) [RFC5286]), FIB updates, etc. This also reduces network churn (LFA) [RFC5286]), FIB updates, etc. This also reduces network churn
and, in particular, reduces the side effects such as micro-loops and, in particular, reduces the side effects such as micro-loops
[RFC5715] that ensue during IGP convergence. [RFC5715] that ensue during IGP convergence.
To allow for this, IGPs usually implement an SPF back-off algorithm To allow for this, IGPs usually implement an SPF Back-off Delay
that postpones or backs-off the SPF computation. However, different algorithm that postpones or backs-off the SPF computation. However,
implementations have chosen different algorithms. Hence, in a multi- different implementations have chosen different algorithms. Hence,
vendor network, it's not possible to ensure that all routers trigger in a multi-vendor network, it's not possible to ensure that all
their SPF computation after the same delay. This situation increases routers trigger their SPF computation after the same delay. This
the average and maximum differential delay between routers completing situation increases the average and maximum differential delay
their SPF computation. It also increases the probability that between routers completing their SPF computation. It also increases
different routers compute their FIBs based on different LSDB the probability that different routers compute their FIBs based on
versions. Both factors increase the probability and/or duration of different LSDB versions. Both factors increase the probability and/
micro-loops as discussed in Section 8. or duration of micro-loops as discussed in Section 8.
To allow multi-vendor networks to have all routers delay their SPF To allow multi-vendor networks to have all routers delay their SPF
computations for the same duration, this document specifies a computations for the same duration, this document specifies a
standard algorithm. Optionally, implementations may also offer standard algorithm.
alternative algorithms.
2. High level goals 2. High level goals
The high level goals of this algorithm are the following: The high level goals of this algorithm are the following:
o Very fast convergence for a single event (e.g., link failure). o Very fast convergence for a single event (e.g., link failure).
o Paced fast convergence for multiple temporally close IGP events o Paced fast convergence for multiple temporally close IGP events
while IGP stability is considered acceptable. while IGP stability is considered acceptable.
skipping to change at page 5, line 37 skipping to change at page 5, line 37
(TIME_TO_LEARN_INTERVAL), we then assume that a single component (TIME_TO_LEARN_INTERVAL), we then assume that a single component
failed, but that this failure requires the knowledge of multiple IGP failed, but that this failure requires the knowledge of multiple IGP
events in order for IGP routing to converge. Under this assumption, events in order for IGP routing to converge. Under this assumption,
we want fast convergence since this is a normal network situation. we want fast convergence since this is a normal network situation.
However, there is a benefit in waiting for all IGP events related to However, there is a benefit in waiting for all IGP events related to
this single component failure so that the IGP can compute the post- this single component failure so that the IGP can compute the post-
failure routing table in a single additional route computation. In failure routing table in a single additional route computation. In
this situation, we delay the routing computation by SHORT_SPF_DELAY. this situation, we delay the routing computation by SHORT_SPF_DELAY.
If IGP events are still received after TIME_TO_LEARN_INTERVAL from If IGP events are still received after TIME_TO_LEARN_INTERVAL from
the initial IGP event received in QUIET state Figure 1, then the the initial IGP event received in QUIET state Section 5.1, then the
network is presumably experiencing multiple independent failures. In network is presumably experiencing multiple independent failures. In
this case, while waiting for network stability, the computations are this case, while waiting for network stability, the computations are
delayed for a longer time represented by LONG_SPF_DELAY. This SPF delayed for a longer time represented by LONG_SPF_DELAY. This SPF
delay is kept until no IGP events are received for HOLDDOWN_INTERVAL. delay is kept until no IGP events are received for HOLDDOWN_INTERVAL.
Note that in order to increase the consistency network wide, the Note that in order to increase the consistency network wide, the
algorithm uses a delay (TIME_TO_LEARN_INTERVAL) from the initial IGP algorithm uses a delay (TIME_TO_LEARN_INTERVAL) from the initial IGP
event, rather than the number of SPF computation performed. Indeed, event, rather than the number of SPF computation performed. Indeed,
as all routers may receive the IGP events at different times, we as all routers may receive the IGP events at different times, we
cannot assume that all routers will perform the same number of SPF cannot assume that all routers will perform the same number of SPF
computations. For example, assuming that the SPF delay is 50 ms, computations. For example, assuming that the SPF delay is 50 ms,
router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and
hence will perform a single routing computation. While another hence will perform a single routing computation. While another
router R2 may only receive 2 events (E1, E2) in those 50 ms and hence router R2 may only receive 2 events (E1, E2) in those 50 ms and hence
will schedule another routing computation when receiving E3. will schedule another routing computation when receiving E3.
5. Specification of the SPF delay state machine 5. Specification of the SPF delay state machine
This section describes the abstract finite state machine (FSM) This section specifies the finite state machine (FSM) intended to
intended to control the timing of the execution of SPF calculations control the timing of the execution of SPF calculations in response
in response to IGP events. to IGP events.
5.1. State Machine 5.1. State Machine
The FSM is initialized to the QUIET state with all three timers The FSM is initialized to the QUIET state with all three timers
timers (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER) deactivated. timers (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER) deactivated.
The events which may change the FSM states are an IGP event or the The events which may change the FSM states are an IGP event or the
expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER). expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER).
The following diagram briefly describes the state transitions. The following diagram briefly describes the state transitions.
skipping to change at page 8, line 12 skipping to change at page 8, line 12
QUIET state. This state is meant to handle single component failure QUIET state. This state is meant to handle single component failure
requiring multiple IGP events (e.g., node, SRLG). requiring multiple IGP events (e.g., node, SRLG).
LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other
words, state reached after TIME_TO_LEARN_INTERVAL in state words, state reached after TIME_TO_LEARN_INTERVAL in state
SHORT_WAIT. This state is meant to handle multiple independent SHORT_WAIT. This state is meant to handle multiple independent
component failures during periods of IGP instability. component failures during periods of IGP instability.
5.3. Timers 5.3. Timers
SPF_TIMER: The FSM abstract timer that uses the computed SPF delay. SPF_TIMER: The FSM timer that uses the computed SPF delay. Upon
Upon expiration, the Route Table Computation (as defined in expiration, the Route Table Computation (as defined in Section 3) is
Section 3) is performed. performed.
HOLDDOWN_TIMER: The FSM abstract timer that is (re)started whan an HOLDDOWN_TIMER: The FSM timer that is (re)started whan an IGP event
IGP event is received and set to HOLDDOWN_INTERVAL. Upon expiration, is received and set to HOLDDOWN_INTERVAL. Upon expiration, the FSM
the FSM is moved to the QUIET state. is moved to the QUIET state.
LEARN_TIMER: The FSM abstract timer that is started when an IGP event LEARN_TIMER: The FSM timer that is started when an IGP event is
is recevied while the FSM is in the QUIET state. Upon expiration, recevied while the FSM is in the QUIET state. Upon expiration, the
the FSM is moved to the LONG_WAIT state. FSM is moved to the LONG_WAIT state.
5.4. FSM Events 5.4. FSM Events
This section describes the events and the actions performed in This section describes the events and the actions performed in
response. response.
Transition 1: IGP event, while in QUIET state. Transition 1: IGP event, while in QUIET state.
Actions on event 1: Actions on event 1:
skipping to change at page 11, line 9 skipping to change at page 11, line 9
the network. In case of doubt, it's RECOMMENDED that timer intervals the network. In case of doubt, it's RECOMMENDED that timer intervals
should be chosen conservatively (i.e., longer timer values). should be chosen conservatively (i.e., longer timer values).
For the standard algorithm to be effective in mitigating micro-loops, For the standard algorithm to be effective in mitigating micro-loops,
it is RECOMMENDED that all routers in the IGP domain, or at least all it is RECOMMENDED that all routers in the IGP domain, or at least all
the routers in the same area/level, have exactly the same configured the routers in the same area/level, have exactly the same configured
values. values.
7. Partial Deployment 7. Partial Deployment
In general, the SPF delay algorithm is only effective in mitigating In general, the SPF Back-off Delay algorithm is only effective in
micro-loops if it is deployed, with the same parameters, on all mitigating micro-loops if it is deployed, with the same parameters,
routers in the IGP domain or, at least, all routers in an IGP area/ on all routers in the IGP domain or, at least, all routers in an IGP
level. The impact of partial deployment is dependent on the area/level. The impact of partial deployment is dependent on the
particular event, topology, and the SPF algorithm(s) used on other particular event, topology, and the algorithm(s) used on other
routers in the IGP area/level. In cases where the previous SPF routers in the IGP area/level. In cases where the previous SPF Back-
algorithm was implemented uniformly, partial deployment will increase off Delay algorithm was implemented uniformly, partial deployment
the frequency and duration of micro-loops. Hence, it is RECOMMENDED will increase the frequency and duration of micro-loops. Hence, it
that all routers in the IGP domain or at least within the same area/ is RECOMMENDED that all routers in the IGP domain or at least within
level be migrated to the SPF algorithm described herein at roughly the same area/level be migrated to the SPF algorithm described herein
the same time. at roughly the same time.
Note that this is not a new consideration as over times, network Note that this is not a new consideration as over times, network
operators have changed SPF delay parameters in order to accommodate operators have changed SPF delay parameters in order to accommodate
new customer requirements for fast convergence, as permitted by new new customer requirements for fast convergence, as permitted by new
software and hardware. They may also have progressively replaced an software and hardware. They may also have progressively replaced an
implementation with a given SPF delay algorithm by another implementation with a given SPF Back-off Delay algorithm by another
implementation with a different one. implementation with a different one.
8. Impact on micro-loops 8. Impact on micro-loops
Micro-loops during IGP convergence are due to a non-synchronized or Micro-loops during IGP convergence are due to a non-synchronized or
non-ordered update of the forwarding information tables (FIB) non-ordered update of the forwarding information tables (FIB)
[RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs [RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs
are installed after multiple steps such as flooding of the IGP event are installed after multiple steps such as flooding of the IGP event
across the network, SPF wait time, SPF computation, FIB distribution across the network, SPF wait time, SPF computation, FIB distribution
across line cards, and FIB update. This document only addresses the across line cards, and FIB update. This document only addresses the
 End of changes. 13 change blocks. 
42 lines changed or deleted 41 lines changed or added

This html diff was produced by rfcdiff 1.46. The latest version is available from http://tools.ietf.org/tools/rfcdiff/