[Docs] [txt|pdf] [Tracker] [WG] [Email] [Nits]
Versions: 00 01 02 03 04 05 06 07 08 RFC 3623
Network Working Group J. Moy
Internet Draft Sycamore Networks, Inc.
Expiration Date: July 2001 February 2001
File name: draft-ietf-ospf-hitless-restart-00.txt
Hitless OSPF Restart
draft-ietf-ospf-hitless-restart-00.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet- Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This memo documents an enhancement to the OSPF routing protocol,
whereby an OSPF router can stay on the forwarding path even as its
OSPF software is restarted. This is called "hitless restart" or
"non-stop forwarding". A restarting router may not be capable of
adjusting its forwarding in a timely manner when the network
topology changes. In order to avoid the possible resulting routing
loops the procedure in this memo automatically terminates when such
a topology change is detected. The restart procedure is also
backward-compatible, reverting to standard OSPF processing when one
or more of the restarting router's neighbors do not support the
enhancements in this memo. Proper network operation during a hitless
restart makes assumptions upon the operating environment of the
restarting router; these assumptions are also documented.
Moy [Page 1]
Internet Draft Hitless OSPF Restart February 2001
Table of Contents
1 Overview ............................................... 2
2 Operation of restarting router ......................... 3
2.1 Entering hitless restart ............................... 3
2.2 Exiting hitless restart ................................ 5
3 Operation of helper neighbor ........................... 6
3.1 Entering helper mode ................................... 6
3.2 Exiting helper mode .................................... 7
4 Backward compatibility ................................. 7
5 Notes .................................................. 7
References ............................................. 8
A Grace-LSA format ....................................... 9
Security Considerations ............................... 10
Authors' Addresses .................................... 10
1. Overview
Today many Internet routers implement a separation of control and
forwarding functions. Certain processors are dedicated to control
and management tasks such as OSPF routing, while other processors
perform the data forwarding tasks. This separation creates the
possibility of maintaining a router's data forwarding capability
while the router's control software is restarted/reloaded. We call
such a possibility "hitless restart" or "non-stop forwarding".
The problem that the OSPF protocol presents to hitless restart is
that, under normal operation, OSPF intentionally routes around a
restarting router while it rebuilds its link-state database. OSPF
avoids the restarting router to minimize the possibility of routing
loops and/or black holes caused by lack of database synchronization.
Avoidance is accomplished by have the router's neighbors reissue
their LSAs, omitting links to the restarting router.
However, if (a) the network topology remains stable and (b) the
restarting router is able to keep its forwarding table(s) across the
restart, it would be safe to keep the restarting router on the
forwarding path. This memo documents an enhancement to OSPF that
makes such hitless restart possible, and one that automatically
reverts back to standard OSPF for safety when network topology
changes are detected.
In a nutshell, the OSPF enhancements for hitless restart are as
follows. The router attempting a hitless restart originates link-
local Opaque-LSAs, herein called Grace-LSAs, announcing the
intention to perform a hitless restart, and asking for a "grace
period". During the grace period its neighbors continue to announce
the restarting router in their LSAs as if it were fully adjacent
Moy [Page 2]
Internet Draft Hitless OSPF Restart February 2001
(i.e., OSPF neighbor state Full), but only if the network topology
remains static (i.e, the contents of the LSAs in the link-state
database having LS types 1-5,7 remain unchanged; simple refreshes
are allowed).
There are two roles being played by OSPF routers during hitless
restart. First there is the router that is being restarted. The
operation of this router during hitless restart, including how the
router enters and leaves hitless restart, is the subject of Section
2. Then there are the router's neighbors, which must cooperate in
order for the restart to be hitless. During hitless restart we say
that the neighbors are executing in "helper mode". Section 3 covers
the responsibilities of a router executing in helper mode, including
entering and leaving helper mode.
2. Operation of restarting router
After the router restarts/reloads, it must change its OSPF
processing somewhat until it re-establishes full adjacencies with
all its previously fully-adjacent neighbors. This time period,
between the restart/reload and the reestablishment of adjacencies,
is called "hitless restart". During hitless restart:
(1) The restarting router does not originate LSAs with LS types
1-5,7. Instead, the restarting router wants the other routers
in the OSPF domain to calculate routes using the LSAs that it
had originated prior to its restart, in order to maintain
forwarding through the restart.
(2) The restarting router doesn't run its OSPF routing
calculations, instead using the forwarding table(s) that it
had built prior to the restart.
Otherwise, the restarting router operates the same as any other OSPF
router. It discovers neighbors using OSPF's Hello protocol, elects
Designated and Backup Designated Routers, performs the Database
Exchange procedure to initially synchronize link-state databases
with its neighbors, and maintains this synchronization through
flooding.
The processes of entering hitless restart, and of exiting hitless
restart (either successfully or not) are covered in the following
sections.
2.1. Entering hitless restart
The router (call it Router X) is informed of the desire for its
hitless restart when an appropriate command is issued by the
Moy [Page 3]
Internet Draft Hitless OSPF Restart February 2001
network operator. The network operator may also specify the
length of the grace period, or the necessary grace period may be
calculated by the router's OSPF software.
In preparation for the hitless restart, Router X must perform
the following actions before its software is restarted/reloaded.
Note that common OSPF shutdown procedures are *not* performed,
since we want the other OSPF routers to act as if Router X
remains in continuous service. For example, Router X does not
flush its locally originated LSAs, since we want them to remain
in other routers' link-state databases throughout the restart
period.
(1) Router X must ensure that its forwarding table(s) is/are
up-to-date and will remain in place across the restart.
(2) Router X must resign any Designated Router (DR) or Backup
Designated Router duties that it currently has. It does
this by sending Hellos with Designated Router Priority
set to 0. Resigning DR duties ensures that flooding works
unimpeded across restarts, and that the DR/Backup will
not change *after* the Grace-LSA is generated, which
would be interpreted as a topology change and would
terminate the hitless restart procedure prematurely.
(3) The router must note in non-volatile storage the
cryptographic sequence numbers being used for each
interface. Otherwise it will take up to
RouterDeadInterval seconds after the restart before it
can start to reestablish its adjacencies, which would
force the grace period to be lengthened severely.
Router X then originates the grace-LSAs. These are link-local
Opaque-LSAs (see Appendix A). Their LS Age field is set to 0,
and the requested grace period (in seconds) is inserted into the
body of the grace-LSA. A grace-LSA is originated for each of the
router's OSPF interfaces. However, a grace-LSA need not be
originated for an interface if either a) the interface has no
fully adjacent neighbors or b) the interface is of type point-
to-point and a grace-LSA has already been sent to the attached
neighbor on another interface. If Router X wants to ensure that
its neighbors receive the grace-LSAs, it should retransmit the
grace-LSAs until they are acknowledged (i.e, perform standard
OSPF reliable flooding of the grace-LSAs). If one or more fully
adjacent neighbors do not receive grace-LSAs, they will more
than likely cause premature termination of the hitless restart
procedure (see Section 4).
Moy [Page 4]
Internet Draft Hitless OSPF Restart February 2001
After the grace-LSAs have been sent, the router should store the
fact that it is performing hitless restart along with the length
of the requested grace period in non-volatile storage. The OSPF
software should then be restarted/reloaded, and when the
reloaded software starts executing the hitless restart
modifications in Section 2 above are followed.
2.2. Exiting hitless restart
On exiting "hitless restart", the reloaded router reverts back
to completely normal OSPF operation, reoriginating LSAs based on
the router's current state and recalculating its forwarding
table(s) based on the current contents of the link-state
database. The router exits hitless restart when any of the
following occurs:
(1) Router X has reestablished all its adjacencies. Router X
can determine this by building (but not installing or
flooding) its router-LSA, based on the current router
state, and comparing it to the router-LSA that it had
last originated before the restart (called the "pre-
restart router-LSA"). If the contents are the same, all
adjacencies have been reestablished.
(2) Router X receives an LSA that is inconsistent with its
pre-restart router-LSA. For example, X receives a router-
LSA originated by router Y that does not contain a link
to X, even though X's pre-start router-LSA did contain a
link to Y. This indicates that either a) Y does not
support hitless restart, b) Y never received the grace-
LSA or c) Y has terminated its helper mode for some
reason (Section 3.2).
(3) The grace period expires.
(4) Router X gets a valid hitless restart request (grace-LSA)
from another router. A router cannot both simultaneously
attempt hitless restart and help a neighboring router
undergo hitless restart, because the neighboring router
must be monitoring the network state for changes
throughout the entire restart period.
When it exits hitless restart, the reloaded router should flush
any grace-LSAs that it had originated.
Moy [Page 5]
Internet Draft Hitless OSPF Restart February 2001
3. Operation of helper neighbor
As a "helper neighbor" for a router X undergoing hitless restart,
router Y has two duties. It monitors the network for topology
changes, and as long as there are none, continues to its advertise
its LSAs as if X had remained in continuous OSPF operation. This
means that Y's LSAs continue to list all adjacencies to X that were
full (OSPF neighbor state Full) when the grace-LSA was first
received, regardless of their current sycnchronization state. This
logic affects the contents of both router-LSAs and network-LSAs, and
also depends on the type of interface associated with the (possibly
former) adjacency (see Sections 12.4.1.1 through 12.4.1.5 and
Section 12.4.2 of [Ref1]).
3.1. Entering helper mode
When a router Y receives a grace-LSA from router X, it enters
helper mode for X as long as all the following checks pass:
(1) There have been no changes in content to the link-state
database (LS types 1-5,7) since the beginning of the
grace period specified by the grace-LSA. The grace period
began N seconds ago, where N is the current LS age of the
grace-LSA.
(2) The grace period has not yet expired. This means that the
LS age of the grace-LSA is less than the grace period
specified in the body of the grace-LSA (Appendix A).
(3) Local policy allows Y to act as the helper for X.
Examples of configured policies might be a) never act as
helper, b) never allow the grace period to exceed a Time
T, or c) never act as a helper for certain specific
routers (specified by OSPF Router ID).
Note that Router Y only needs to receive a single grace-LSA from
X, even if X and Y attach to multiple common segments. The data
in the first valid grace-LSA received is used to indicate the
beginning and the end of the grace period -- all subsequent
grace-LSAs received from X are ignored. This first grace-LSA is
referred to below as simply "the grace-LSA from X".
A single router is allowed to simultaneously serve as a helper
for multiple restarting neighbors.
Moy [Page 6]
Internet Draft Hitless OSPF Restart February 2001
3.2. Exiting helper mode
Router Y ceases to perform the helper function for its neighbor
Router X when one of the following events occurs.
(1) The grace-LSA originated by X is flushed. This is the
successful termination of hitless restart.
(2) The grace period expires.
(3) Router Y receives an LSA with LS types 1-5,7 and whose
contents have changed. This includes LSAs with no
previous link-state database instance and the flushing of
LSAs from the database, but excludes simple LSA
refreshes. A change in LSA contents indicates a network
topology change, which forces termination of a hitless
restart.
When router Y exits helper mode for X, Y reoriginates its LSAs
based on the current state of its Router X adjacencies.
4. Backward compatibility
Backward-compatibility with unmodified OSPF routers is an automatic
consequence of the functionality documented above. If one or more
neighbors of a router requesting hitless restart are unmodified, or
if they do not received the grace-LSA, the hitless restart is
prematurely aborted.
The unmodified routers will start routing around the restarted
router X as it performs initial database synchronization, by
reissuing their LSAs with links to X omitted. These LSAs will be
interpreted by helper neighbors as a topology change, and by X as an
LSA inconsistency, in either case aborting hitless restart and
resuming normal OSPF operation.
5. Notes
Note the following details concerning the hitless OSPF restart
mechanism described in this memo.
o DoNotAge is never set in a grace-LSA, even if the grace-LSA is
flooded over a demand circuit. This is because the grace-LSA's
LS age field is used to calculate the extent of the grace period
(see Appendix A).
o Grace-LSAs have link-local scope because a) they only need to be
seen by the router's direct neighbors and b) restricting them to
Moy [Page 7]
Internet Draft Hitless OSPF Restart February 2001
link-local scope makes it easy to detect the illegal
configuration of two restarting routers being asked to help each
other (Section 2.2).
o It may be noted that the hitless restart mechanisms in this memo
can also be used for unplanned outages. For example, after a
crash of its control software, the router may come up and send
grace-LSAs in an attempt to remain on the forwarding path while
it regains its control state. This may not be a good idea, as it
seems unlikely that such a router could guarantee the sanity of
its forwarding table(s). However, if the router does attempt a
hitless restart from an unplanned outage, it should at the least
(a) allow the network operator to turn this feature off and (b)
attempt to determine when its forwarding tables were last
updated, setting the beginning of the grace period accordingly
(this means originating the grace-LSA with LS age equal to the
time that the forwarding tables were last updated).
References
[Ref1] Moy, J., "OSPF Version 2", RFC 2328, April 1998.
[Ref2] Coltun, R., "The OSPF Opaque LSA Option", RFC 2370, July
1998.
[Ref3] Murphy, S., M. Badger and B. Wellington, "OSPF with Digital
Signatures", RFC 2154, June 1997.
Moy [Page 8]
Internet Draft Hitless OSPF Restart February 2001
A. Grace-LSA format
The grace-LSA is a link-local scoped Opaque-LSA [Ref2] having Opaque
Type of TBD1 and Opaque ID equal to TBD2. The grace-LSA is
originated by a router that wishes to execute a hitless restart of
its OSPF software. The grace-LSA requests that the router's
neighbors aid it in its hitless restart by continuing to advertise
the router as fully adjacent during a specified grace period.
It is assumed that the grace-LSA has LS age field set to 0 when the
LSA is first originated; the current value of LS age then indicates
how long ago the restarting router made its request. The body of the
LSA contains the length of the grace period in seconds.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| LS age | Options | 9 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opaque Type | Opaque ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Advertising Router |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| LS sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| LS checksum | length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Grace Period |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Grace Period
The number of seconds that the router's neighbors should
continue to advertise the router as fully adjacent, regardless
of the the state of database synchronization between the router
and its neighbors. Since this time period began when grace-LSA's
LS age was equal to 0, the grace period terminates when either
a) the LS age of the grace-LSA exceeds the value of Grace Period
or b) the grace-LSA is flushed. See Section 3.2 for other
conditions which terminate the grace period.
Moy [Page 9]
Internet Draft Hitless OSPF Restart February 2001
Security Considerations
One of the ways to attack a link-state protocol such as OSPF is to
inject false LSAs into, or corrupt existing LSAs in, the link-state
database. Injecting a false grace-LSA would allow an attacker to
spoof a router that, in reality, has been withdrawn from service.
The standard way to prevent such corruption of the link-state
database is to secure OSPF protocol exchanges using the Crytographic
authentication specified in [Ref1]. An even stronger way of securing
link-state database contents has been proposed in [Ref3].
Authors' Addresses
J. Moy
Sycamore Networks, Inc.
150 Apollo Drive
Chelmsford, MA 01824
Phone: (978) 367-2505
Fax: (978) 256-4203
email: jmoy@sycamorenet.com
Moy [Page 10]
Html markup produced by rfcmarkup 1.129d, available from
https://tools.ietf.org/tools/rfcmarkup/