[Docs] [txt|pdf] [Tracker] [Email] [Nits]

Versions: 00 01 02 03 RFC 2791

Internet Engineering Task Force                 Jieyun (Jessica) Yu
INTERNET DRAFT                                                UUNET
Expiration Date: October, 1999                          April, 1999
draft-yu-routing-scaling-00.txt

                          Scalable Routing Design Principles

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet- Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

   Copyright Notice

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

Abstract

   Routing is essential to a network. Routing scalability is essential
   to a large network.  When routing does not scale, there is a direct
   impact on the stability and performance of a network.  Thus, routing
   scalability is an important issue, especially for a large network.
   This document identifies the major factors affecting routing
   scalability as well as basic principles of designing scalable routing
   for large networks.

Contents

   1         Introduction  ....................................    2
   2         Common Routing Design Goals  .....................    2
   3         Characteristics of Today's Large Networks  .......    3

Yu                 Scalable Routing Design Principles           [Page 1]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   4         Routing Scaling Issues  ..........................    3
   4.1       Router Resource Consumption  .....................    3
   4.2       Routing Complexity  ..............................    5
   5         Routing Protocol Scalability .....................    6
   5.1       IS-IS and OSPF  ..................................    6
   5.2       BGP  .............................................    8
   6         Scalable Routing Design Principles  ..............    9
   6.1       Building Hierarchy  ..............................   10
   6.2       Compartmentalization .............................   13
   6.3       Making Proper Trade-offs .........................   13
   6.4       Reduce Burdens of Routing Information Process  ...   14
   6.4.1     Routing Intelligence Placement  ..................   14
   6.4.2     Reduce Routes and Routing Information  ...........   14
   6.4.2.1   CIDR and Route Aggregation  ......................   14
   6.4.2.2   Utilize Default Routing where it's Possible  .....   15
   6.4.2.3   Reduce Alternative Paths  ........................   15
   6.4.2.4   Use New Technologies  ............................   15
   6.4.3     Use Static Route at Edge  .........................  16
   6.4.4     Minimize the Impact of Route Flapping  ............  16
   6.5       Scalable Routing Policy and Scalable Implementation  17
   6.6       Out-of-band Process  ..............................  18
   7         Conclusion and Discussion  ........................  19
   8         Security Considerations  ..........................  20
   9         Acknowledgement  ..................................  20
   10        References  .......................................  20
   11        Appendix - Out-of-Band Routing Processes  .........  22

1. Introduction

   Routing is essential to a network. Without routing, packets cannot be
   delivered to the desired destinations and the network would not
   function.  The challenge of designing routing for a large network,
   such as a large ISP backbone network, is not only to make it work but
   also to make it scale.  Without a scalable routing system, a  network
   may suffer from severe performance penalty as unfortunately proven by
   disastrous events in large networks. This document attempts to
   analyze routing scalability issues and define a set of principles for
   designing scalable routing system for large networks.

   The organization of this document is as follows: Section 2 describes
   routing functions and design goals. Sections 3 and 4 discuss the
   characteristics of today's large networks and the associated routing
   scaling issues. Section 5 explores routing protocol scalability, and
   Section 6 presents scalable routing design principles. Section 7
   provides a conclusion to the document.

2. Common Routing Design Goals



Yu                 Scalable Routing Design Principles           [Page 2]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   The basic goals a routing system should achieve are as follows:

      o stability
      o redundancy and robustness
      o reasonable convergency time
      o routing information integrity
      o sensible and manageable routing policy

   The challenge of designing routing in a large network is not only to
   achieve these basic goals but also to make the routing system scale.

3. Characteristics of Today's Large Networks

   Today's large networks typically possess the following features:

      o They are composed of a large number of nodes (routers and/or
      switches), typically in the hundreds.  Some provider networks
      include customer CPE routers within their administrative domain,
      which increases the number of nodes to thousands.

      o They have rich connectivity to meet redundancy and robustness
      requirements, and they consequently have complex topologies.

      o They are default-free; that is, they carry all the routes known
      to the entire Internet. Currently, the total number is about
      57,000.

      o The aggregation routers for customers sometimes connect hundreds
      of customer routers.

   These characteristics impose direct challenge to the routing
   scalability of the network.

4. Routing Scaling Issues

   Today, the main issues surrounding routing scaling are excessive
   router resource consumption destabilizing the network; and routing
   complexity resulting poor management of network thus low service
   quality.

4.1. Router Resource Consumption

   The routing process puts bursty loads on routers, especially under
   unstable network  conditions. In the extreme, the routing process
   takes all available resources from the routers, which results in slow
   routing convergence or no convergence. A network is paralyzed when it
   cannot converge internal routing information.




Yu                 Scalable Routing Design Principles           [Page 3]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   It's worthy noting that routers with architecture that tightly couple
   the forwarding and routing process tends to handle the excessive
   routing load poorly.  New generation of routers that are emerging
   separate resource used for forwarding and routing are in the hope of
   scaling better.

   Today, a large network typically employ IS-IS[1,2] or OSPF[3] as its
   Interior Routing Protocol (IGP) and BGP[4] as their Exterior Routing
   Protocol (EGP). IGP is responsible for routing information with
   regarding to nodes within the network; BGP processes and propagates
   external routing information within the network. Router resources are
   consumed when there are many nodes and many adjacencies in an
   internal routing system; or, in external routing, when routing
   updates (route flapping) are frequent and contain large amounts of
   routing information to process.

   IS-IS and OSPF are Link-State routing protocols. The main CPU
   consumption associated with link-state protocols is the routing area
   wide flooding triggered by network topology change. In a network with
   N nodes and full mesh topology (i.e. each node has N-1 adjacencies),
   any node would have to process O(N) Link State Advertisements (LSA)
   when the state of a single link changes and to process in the order
   of O(N^2) updates due to a single node failure.  Therefore, the more
   nodes in a Link State IGP routing area and adjacency, the more CPU
   burden each router needs to bear.  When a network is unstable, the
   load is amplified.  Section 5.1. discusses IGP scalability in detail.

   BGP facilitates routing exchange between routing domains, or
   Autonomous Systems (AS), and thus the name inter-domain routing.
   When used for carrying external routes within a routing domain, BGP
   requires that border routers establish a full BGP peering mesh among
   them, which scales to O(N^2).  Thus, the larger the number of the
   nodes in a routing domain, the more CPU resources BGP will claim.

   In addition, many routes in a routing system combined with multiple
   paths associated with these routes impose the following scaling
   issues on BGP:

      o Many routes combined with multiple paths for each increase the
      cost of routing processing, especially in BGP (for route
      selection, routing policy application and filtering).

      o Too many routes combined with multiple paths require large
      amounts of memory on routers for storage. The demand is even
      higher at InterExchange Points such as NAPs.

      o The higher the number of routes, the greater the chance route
      flapping will occur and the more BGP routing updates will happen



Yu                 Scalable Routing Design Principles           [Page 4]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


      as a result. Based on statistics collected by [5], thousands of
      BGP updates in a measured 15 minute interval can occur on a
      typical default-free router at a NAP.

      Route flapping refers to highly frequent routing updates that
      occur due to network instability, for example when the state of a
      physical link in the network is fluctuating, or when a BGP session
      is torn down and re-established numerous time within a short
      period.

      To favor fast convergence, topology change information must be
      propagated in a timely fashion.  When a route becomes unavailable
      and is withdrawn, the information is typically sent immediately.
      If the affected routes have been announced to the global Internet,
      the update information is likely to be propagated to the entire
      Internet.

      Route flapping has profound impact on routers running BGP.  The
      routers have to processing routing information frequently and this
      consumes tremendous resources.  When a local route or link is
      oscillating, interior routing is affected as well by excessive
      topology information flooding and shortest path calculations.
      However, OSPF (or IS-IS) imposes rate limits on such activity to
      reduce the burden on the routers.  For example, OSPF specifies
      that an individual SLA can be updated at most once every 5
      seconds.  This essentially dampens the flapping.

   Large numbers of E-BGP sessions on a single router create another
   potential scaling issue. Large networks usually have huge customer
   subscriptions and connections. To scale the hardware and the number
   of nodes in the network, providers tend to dedicate a group of
   customer concentrator routers, each connecting as many customer CPE
   routers as possible. This means the concentrator routers have to
   handle hundreds of E-BGP sessions, which imposes potential problems
   such as BGP session processing and maintenance, route processing,
   filtering and route storage.

4.2. Routing Complexity

   Routing complexity results in management difficulties that have
   impact on troubleshooting and can result in low service quality
   across the network. Complicated routing policies and routing designs
   that create special cases or exceptions to the main routing scheme
   can contribute to routing complexity in a large system.

   Routing Policy refers to the administrative criteria for handling
   routing information, typically in the form of routing path selection
   and route filtering.  The way routing information is handled has



Yu                 Scalable Routing Design Principles           [Page 5]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   direct impact on traffic flow within a network and across domains. As
   a result, it affects business agreements among different networks.
   Therefore, the determination of routing policy is largely dominated
   by non-technical concerns such as business considerations. Routing
   policy can be very complex, which would make management and
   configuration an unscalable task.

   The keys to reducing routing complexity are systematic as well as
   consistent routing scheme and routing policy that is simple but meets
   the requirement of administrative polices.

   Another factor contributing to the complexity of routing management
   is prefix-based route filtering for the purpose of preserving the
   integrity of the routing information which, as is well known, is
   necessary in order to protect the integrity of the routing system.
   This becomes a challenge when the number of routes known to the
   Internet is as large as about 57,000 today.

5. Routing Protocol Scalability

   Today's commonly deployed routing protocols are IS-IS or OSPF for
   Interior routing (also known as IGP) and BGP for exterior routing
   (also known as EGP). In terms of scaling and other factors, these
   protocols are already an improvement over previous generation
   protocols, such as RIP and EGP. However, scalability is still a major
   issue when a network is large, the routing design is insensitive to
   scaling issues, and the protocol implementation is inefficient. It is
   worth noting that routing protocol scaling is influenced by protocol
   design decisions and protocol implementation decisions.

5.1. IS-IS and OSPF

   As described earlier in the document, IS-IS and OSPF are Link State
   routing protocols.  The basic components of a link State routing
   protocol are generation and maintenance of a link-state-database
   (LSD) that describes routing topology of a given routing area and
   route calculation based on the topology information in the database.
   Each node in a routing area is responsible for describing its local
   routing topology in Link State Advertisement or LSA (LSP in the case
   of IS-IS.) Each individually generated LSA will be distributed or
   flooded to all the nodes in the area.  Each node will receive and
   maintenance the LSAs from all the other nodes forming a link-state-
   database that reflects the routing topology of the entire area.

   The generals costs associated with link state IGP protocols are
   consumption of router CPU resources, link bandwidth and router
   memory.  The cost of router CPU resource is a major issue with
   scaling problem and thus is concentrated in here.



Yu                 Scalable Routing Design Principles           [Page 6]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   Flooding and route calculation are largely responsible for CPU
   consumption. Flooding is a process for a node to distribute its
   self-originated LSA to the rest of the nodes in the area in case of
   link state change.  A node will send the LSA to via all its
   interfaces.  When receiving an LSA update, a node validates the
   information and then installs it and updates its LSD before sending
   it out via all its own except the one from which it received the
   original LSA update. Given the nature of flooding, a node in a
   network with N nodes and full mesh topology would have to process
   O(N) LSA updates when the state of a single link changes and to
   process in the order of O(N^2) updates due to a single node failure.
   This is because each node in a full mesh network will have N-1
   adjacencies.  When a link state changes, a router will advertise the
   change to other N-1 routers.  The N-1 will in turn advertise the
   change to other N-1 routers thus each node will get N-1 updates.  In
   the case of node failure, it's adjacent nodes (N-1 of them) will
   advertise the change to N-2 nodes. The recipient routers will in turn
   advertise to its N-1 adjacent routers creating O(N^2) situation.

   In the case of OSPF, the protocol will refresh or flood every 30
   minutes even under stable conditions, it could magnify an already
   highly loaded router.

   From the above discussion, one can easily observe that the higher the
   number of nodes and adjacencies, the higher the cost to CPU
   consumption on the router.  The less stable the network; that is, the
   more LSAs that it generates and distributes, the higher the cost that
   is imposed on routers in the network.

   A link-state protocol typically uses Dijkstra's Shortest Path First
   (SPF) algorithm for route calculation.  The Dijkstra algorithm scales
   to the order of O(N^2) where N is the number of the nodes.  The
   algorithm could be improved to the order of O(l*logN) where l is the
   number of links in the network and N is the number of destinations or
   routers [6].

   Consequently, link state protocols do not scale to a network topology
   with many nodes and excessive adjacencies in an area. When the
   network topology is unstable, the computation, processing and
   bandwidth cost are magnified, which causes excessive consumption of
   route resources.  When the instability prevents IS-IS or OSPF from
   maintaining adjacencies, network routing meltdown occurs.

   Node adjacencies are discovered and maintained through the exchange
   of Hello messages sent periodically from each node.  When a node
   fails to receive Hello messages from its neighbor within a certain
   period (40 seconds for OSPF and less for IS-IS), it considers the
   neighbor down.  When heavy flooding, re-calculation and other



Yu                 Scalable Routing Design Principles           [Page 7]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   activities happen that make router CPU a scarce resource, a router
   may not be able to allocate CPU time to send or process Hello
   packets. Routers in the network then lose adjacency, which magnifies
   the instability. As a result, an isolated instability can escalate to
   routing failure across the entire network.

   Link-state IGP also does not scale well to carry large number of
   routes such as the 57,000 routes known to the Internet today.  Since
   external routes are included in link-state-database and in LSA (LSP
   for IS-IS) updates, the link bandwidth and router memory consumption
   will be tremendous.  Moreover, due to the large size of LSA updates,
   it'd aggravate router resource consumption in the process of LSA
   flooding especially under unstable network condition.

   To summarize, a scalable design should avoid inclusion of too many
   nodes in a routing area, too many external routes carried by IGP and,
   more important, excessive adjacency in the area.

5.2. BGP

   BGP is an inter-domain routing protocol allowing the exchange of
   routing or accessibility information between different autonomous-
   system network. Functionally, BGP is composed of External BGP (E-BGP)
   and Internal BGP (I-BGP).  E-BGP is used to exchange external routes
   while I-BGP is typically used for distributing externally learned
   routes within an AS.

   The general costs of BGP are as follows:

     o CPU consumption in BGP session establishment, route selection,
       routing information processing, and handling of routing updates
     o Router memory to install routes and multiple paths associated
       with the routes.

   Since it does not scale for IGP to carry externally learned prefixes
   as mentioned in the previous section, I-BGP typically assumes this
   duty. In order to prevent routing loops, prefixes learned via I-BGP
   are prohibited from being advertised to another I-BGP speaker. As a
   result, full mesh of I-BGP sessions among the routers within an AS is
   required. In an AS with N routers, each router will have to establish
   I-BGP sessions with N-1 nodes, and the system complexity is in the
   order of O(N^2).  Therefore, BGP scales poorly when the number of
   nodes involved in I-BGP mesh is large.  Note that it is possible to
   optimize an AS; for example, a non-border router only needs to
   establish I-BGP sessions with all the border routers, not with other
   non-border routers.

   A large network normally learns all the routes known to the Internet,



Yu                 Scalable Routing Design Principles           [Page 8]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   which is about 50-60,000. I-BGP needs to carry all these routes.

   The large number of I-BGP sessions and routes consumes tremendous
   resources from each router, especially during BGP session
   establishment and during periods of heavy route flapping.

   Frequent routing update are another potential scaling problem in
   large networks. BGP uses incremental updates and sends out routing
   information about unreachable routes quickly to reduce convergence
   time. This is a great improvement from EGP, in which the whole
   routing table is updated periodically.  However, when a network is
   not stable, a route flaps, or a BGP session oscillates, the updates,
   especially those containing route withdrawals, are sent out
   immediately, causing global BGP updates. As a result, instability
   initiated anywhere in a network triggers updates all over the
   Internet.  This effect is magnified when so many routes are visible
   to the Internet, putting a heavy load on routers that participate in
   BGP.

   The introduction of a routing hierarchy in BGP, through I-BGP route
   reflector [7] and BGP confederations [8], for example, will help
   alleviate the scaling problem caused by the requirement of full mesh
   I-BGP establishment.

   Another potential solution is to avoid the requirement of full mesh
   pairwise I-BGP connections.  This will change the way that BGP
   distributes routing information among the internal BGP peers.
   Mechanisms worth considering are using multicast to distribute
   information or adopting flooding mechanisms similar to those used in
   IS-IS or OSPF.

   Route dampening [9] is one way to reduce the excessive updates
   triggered by route flapping. The Trade-off between fast convergence
   and stability of the network should be considered. See Section 6.3.

6. Scalable Routing Design Principles

   Routing architecture for a large-scale network should achieve the
   basic goals of accuracy, stability, redundancy and convergence as
   described in Section 2 and achieve it in a scalable fashion.

   How routing scales is influenced by protocol design decisions,
   protocol implementation decisions, and network design decisions. A
   network engineer has direct control over network design decisions and
   can have substantial influence over protocol design and
   implementation. The focus of this document is network design
   decisions.




Yu                 Scalable Routing Design Principles           [Page 9]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   Following is a set of design principles for making a large network
   routing system more scalable:

      o Building Hierarchy
      o Compartmentalization
      o Making proper trade-offs
      o Reducing routing process burdens
      o Defining scalable routing policies and implementation
      o Using out-of-band routing assistance

6.1. Building Hierarchy

   As discussed in Section 5.1, OSPF and IS-IS scales poorly when a
   network has large number of nodes (or routers) and especially large
   quantity of adjacencies. This has unfortunately been proven by
   networks that deploy IP over ATM with full mesh adjacencies among the
   routers.  The full mesh overlay design combined with the inefficient
   protocol implementation has led to disastrous network outages. A
   lesson learned from this is to avoid full mesh overlay topology in a
   large network with a large, flat network routing structure.

   Building hierarchical routing structures in the network is the key to
   achieving routing scalability in a large network. As discussed
   earlier in this document, large networks usually have many routers
   and a redundant topology, which result in large amount of
   adjacencies. As also discussed earlier, the currently available
   routing protocols scale poorly for handling a large number of nodes
   in a routing domain or many adjacencies among the routers. Therefore,
   it is sensible to build a routing hierarchy to reduce the number of
   nodes in a domain as well as the number of adjacencies.

   The current common practice is to build a two-tier hierarchy in a
   network with a center component (or transit core network) attached by
   a number of outlying components (or access network). The transit core
   network covers the entire geographical area network serves; each
   access network (also known as a regional network) covers one region.
   There are usually no direct link connections among the regional
   components. Traffic from one regional network to another traverses
   the transit core. Customer networks connect only to access or
   regional networks.

   There are a number of ways to build a routing hierarchy on top of the
   above described hierarchical network topology.

      1. Completely Separate Routing Domains

      This design treats the transit core network and each regional
      network as a completely independent AS with respect to routing,
      and each AS runs an independent IGP. Each regional network E-BGP
      with the transit core for exchanging routing knowledge.  Full I-



Yu                 Scalable Routing Design Principles          [Page 10]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


      BGP connection needs to be established only within each component
      network. With this design, the maximum number of nodes in an IGP
      domain is the total number of routers in each component. As a
      result, the IGP processing load is reduced, and the number of
      nodes in an I-BGP mesh within a routing domain is decreased
      dramatically.

      Another advantage of this design is that it compartmentalizes the
      routing system so that instability in one of such component has
      less impact on the entire system. See the discussion in the
      following section.

      The main disadvantage of this scheme is that it inserts one more
      AS in the routing path when routes are advertised to the Internet
      via BGP. The extra AS in the path may cause route selection
      difficulties for other providers.

      2. One Domain with IGP and BGP Hierarchy

      This method includes the transit core and each regional network
      into one AS domain. The routing hierarchy is realized using
      multi-area IS-IS or OSPF and either BGP Confederation or I-BGP
      Reflector or a combination of the two.

      This mechanism avoids the introduction of an extra AS in the
      routing path, which is an advantage over the method described in
      Point 1.  However, multi-area hierarchical IGP is rarely used
      now-a-days in large networks since most of them are using IS-IS
      for internal routing, which does not have sufficient multi-level
      support. Although IS-IS supports multi-area routing, it imposes a
      strict hierarchy between backbone and sub-areas and allows only
      the advertisement of a default route from the backbone area to the
      sub-areas instead of specific prefixes. This restriction may be
      suitable for a network with simple sub-area topology. A sub-area
      for a large network, typically a regional or access network,
      itself has complicated topology. Receiving highly abstract routing
      information, such as a default route, would affect the sub-area's
      ability to make route selections required for traffic engineering.
      It would also limit the information passed to external ASs, for
      example, IGP-derived BGP Multi-Exit-Discriminator (MED)
      information.

      3. One IGP Area with BGP Hierarchy

      In lieu of multi-area IS-IS, the routing hierarchy could be
      achieved by defining one IGP domain for the entire network while
      employing BGP hierarchy. Fortunately, the hierarchical topology of
      the network in this case helps reduce adjacencies in the routing



Yu                 Scalable Routing Design Principles          [Page 11]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


      domain (recall there are no connections among the second-level
      network components). However, this is less than ideal since the
      number of nodes remains unchanged when there is only one IGP
      domain, which increases the load on SPF calculation. Moreover,
      instability within any regional network would still affect the
      entire network (that is, there would be no fault isolation). Some
      improvements that could be made to further reduce the burden on
      IGP would include carefully arrangement of the adjacencies to keep
      them at minimum but still achieve good redundancy.

      Even with one IGP domain, it is possible to build BGP hierarchy to
      make I-BGP more scalable in the network. BGP Reflector and BGP
      Confederation are existing mechanisms to address the scaling
      problem of full-mesh I-BGP.

      Further, BGP reflector provides the ability to build more than two
      levels of hierarchy, as long as the interaction among the
      different levels of the hierarchy are carefully arranged to avoid
      the possibility of creating routing loops.

   One question worth asking is: "Are two levels of routing hierarchy
   enough for addressing scaling issues?" "Is there really a need for
   more than two levels of hierarchy?"

   When a Level 2 sub-domain, such as a regional network, grows too big
   for routing protocols to handle, either another layer of hierarchy
   needs to be introduced or the sub-domain needs to be split into
   multiple Level 2 sub-domains.

   Keeping two levels of hierarchy and adding more sub-domains appears
   to be more manageable than adding another layer to the hierarchy.
   However, one precaution is to avoid adding more nodes to Level 1 or
   to the transit core network to make it less scalable. Connecting the
   split sub-areas to the same core router would eliminate the need to
   add more nodes in the core area than is recommended.

   Having more than two level of hierarchy would exceed the capability
   of IGPs as they are defined today. In OSPF, for example, all the
   areas must be connected via the backbone area, which eliminates the
   possibility of having more than two levels of hierarchy. IS-IS has
   the same limitation. Therefore, the specific area of the protocols
   needs to be redefined should more than two hierarchical layers in IGP
   be desirable. There is work in progress to build multi-layer IS-IS
   for more than two levels of hierarchy.

   The complexity of protocols and management will increase with the
   number of levels added to the hierarchy. According to [6], most of
   the OSPF protocol bugs found over the years are related to routing



Yu                 Scalable Routing Design Principles          [Page 12]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   area support.  Because the interaction among the multiple levels
   increases management and debugging complexity, it is desirable to
   keep the levels within a hierarchy to the minimum.

6.2. Compartmentalization

   A scalable routing design of a large network should be able to
   localize problems or failures thus preventing them from spreading to
   the entire network consuming resource of network routers. This is
   compartmentalization.

   To achieve compartmentalization in routing design for a large
   network, one needs to avoid the design of involving the whole large
   network as one routing system or routing domain. This is the reason
   for the architecture of dividing Interior and exterior routing in the
   global routing system. Within a network, it is best to divide the
   network into multiple routing domains or multiple routing areas. For
   example, in OSPF, only summary route SLAs, rather than individual
   area routes, are flooded beyond the area. When an area border routers
   aggregates the routes in its sub-area, instability of any route
   included in the summary route would not cause flooding of SLAs to
   other areas. As a result, router resources in other areas would not
   be consumed for handling flooding and recalculation.  In other words,
   instability within an area would be prevented from spreading to the
   entire routing domain.

   Since building routing hierarchy essentially divides a big routing
   area into small areas or domains, it helps achieving the goal of
   compartmentalization.

6.3. Making Proper Trade-offs

   When designing routing for a large network, the overall goal should
   be set with considerations of routing scalability and stability. The
   trade-offs between conflicting goals should be taken into account.
   Examples of such trade-offs are redundancy vs. scalability and
   convergence vs. stability.

   Redundancy introduces complexity and increased adjacencies to the
   network topology. Redundancy also imposes the need for as many
   alternative paths as possible for each route, which increases route
   processing and storage burdens.  Because of these problems, it may be
   necessary to sacrifice absolute redundancy in favor of a reasonable
   level that scales better to the routing system.

   Fast convergence requires that changes in topology be propagated to
   the network as quickly as possible. This increases routing updates
   and, consequently, the route processing burden. The burden is



Yu                 Scalable Routing Design Principles          [Page 13]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   aggravated when a network carries full Internet routing information
   as large networks typically do. Route dampening may be necessary to
   achieve stability at the expense of absolute fast convergence.

6.4. Reduce Burdens of Routing Information Process

   The tasks of reducing routing process burdens includes strategically
   place the routing intelligence within the network, avoid carrying
   unnecessary routing information and reduce the impact of route
   flapping.

6.4.1. Routing Intelligence Placement

   A router that executes routing policies, does route filtering and
   dampening posses routing intelligence.  Routing intelligence is
   needed for a network 1) to enforce the business agreement between
   network entities in the form of routing policies; 2) to protect
   integrity of the routing information within the network and sometimes
   3) to shield a network from instability happening elsewhere in the
   Internet.

   The more routing intelligence a router has, the more resources from
   the router are needed for related tasks. It is logical, then, to
   place as little routing intelligence as possible on routers that
   already are heavily burdened with forwarding and switching tasks.

   Usually, traffic is heavily concentrated in the core of the network.
   Because traffic aggregates from the edge of the network toward the
   core, traffic is less concentrated near the edge of the network.
   Consequently, to build a scalable routing system, it is wise to place
   routing intelligence at the edge of the network, especially when
   routers that are not sufficiently decouple forwarding and routing are
   deployed in the network. In addition, pushing routing intelligency as
   close to the edge of the network as possible also serve the purpose
   of distributing computational and configuration burdens on all
   routers. Further, it is also desirable to move the heavy burden of
   processing routes to out-of-band processors, freeing more resources
   in network routers for packet forwarding and handling.

6.4.2. Reduce Routes and Routing Information


   As discussed in Section 4.1, a large number of routes in the system
   is one of the major culprits in route scaling problems. It is best to
   reduce the routes in the system without losing necessary routing
   information.

6.4.2.1. CIDR and Route Aggregation



Yu                 Scalable Routing Design Principles          [Page 14]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   CIDR as specified in [10] is a mechanism to aggregate routes to
   efficiently utilize IP address space as well as reduce the number of
   routes in the global routing table. CIDR offers a way to summarize
   routing information, which is one of the keys for routing scalability
   in today's Internet.

   Aggregating routes would not only help global Internet scalability
   but would also contribute to scalability in local networks. The
   overall goal is to keep the routes in the backbone to a minimum.

   To achieve better aggregation within the network; that is, to reduce
   routes in the network, a block of consecutive IP addresses should be
   allocated to each access network so that when the access network
   announces its routes to the core network, they can be aggregated.
   This way, the core and other access networks would not need to know
   the specific prefixes of any particular access network. Although
   assignment of customer addresses from a provider block would have to
   be planned to support aggregation, the effort would be worthwhile.

6.4.2.2. Utilize Default Routing where it's Possible

   When the access network (tier two in the hierarchy) is a stub and
   there is no customer connection behind it requesting full Internet
   routing information, the access network can simply point default to
   the core network via its connection to the core. However, when a stub
   access network, for the sake of redundancy, connects to multiple core
   routers at the same location where large networks usually connect,
   the default cannot be used because the traffic should be engineered
   intelligently to distribute the load among the multiple connections.
   Some additional features should be added to default routes to make
   load sharing possible.

6.4.2.3. Reduce Alternative Paths

   Due to the requirement of reliability, the connectivity in the
   Internet is rich result in many paths toward a particular
   destination. In another word, there are many alternative paths in the
   BGP routing table towards the same destination, consuming router
   memory and routing processing time.

   To make routing scale, it may be desirable to reduce alternate paths
   while preserving reasonable redundancy. For example, on a given
   border router (such as a NAP router), one primary path plus an
   alternate path should provide reasonable redundancy. In this case, a
   third or even a fourth alternate route could be discarded for the
   sake of scaling.  This is a trade-off decision every network
   administrator needs to make based on the particular needs of his
   network.



Yu                 Scalable Routing Design Principles          [Page 15]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


6.4.2.4. Use New Technologies

   New technologies such as MPLS [11,12] would free routers in the core
   network from carrying full external routes, which would significantly
   reduce the total amount of routes carried and processed by these
   routers.

   This reduction is possible because, in an MPLS domain, Label Switched
   Path (LSP) can be established within the core network.  An ingress
   edge router that functions as a Label Switch Router (LSR) could
   assign a packet with a label. Each LSR in the core along the LSP
   would forward the packet based on the label, eliminating the need for
   full routing knowledge in the core.

6.4.3. Use Static Route at Edge

   As mentioned earlier, one of the scaling issues in large networks is
   that a single router may fan out to hundreds of customer routers. As
   a result, resource consumption will be very intensive if all the
   customer routers BGP with the router. At first glance, it seems
   reasonable for a customer network in a different administrative
   system (that is, in another AS) to exchange routing information with
   the large network and an attached customer network via BGP. However,
   this is not necessary the case. If a customer network is single-
   homed; that is, if the sole network connection is via the large
   network and the customer network does not announce a large number of
   routes, static routing can work in this situation. Since the customer
   network is single-homed, dynamic routing is not necessary: Static
   routing will not have any negative impact on services.  The
   advantages are that the customer concentrator will have fewer E-BGP
   sessions to handle, and no route flapping can result from the
   statically configured customer routes.

   Configuration of the customer's static routes on the provider's
   concentrator router may add management overhead, especially if a
   customer advertises a large number of routes; however, the trade-off
   is worthwhile.

6.4.4. Minimize the Impact of Route Flapping

   As discussed earlier, route flapping is typically caused by link
   instability that results in excessive routing updates across the
   Internet. Route flapping can originate anywhere in the global
   Internet and affect every network in the Internet routing mesh (BGP
   mesh). Because route flapping consumes resources, it needs to be
   dealt with.

   One way to reduce the effect of route flapping is to turn on route



Yu                 Scalable Routing Design Principles          [Page 16]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   dampening as specified in [10]. Essentially, dampening suppresses an
   unstable route until it becomes stable. The current practice is for
   each ISP to enable route dampening on its border routers. This way,
   the excessive routing updates can be stopped at the border.

   An ideal model is to suppress the announcement of a flapping route
   right at the source. One way to implement this is to have a router
   recognize instability on its directly connected links and suppress
   the announcement of the route. So far, there is no such
   implementation; however, the approach should be explored.

   Route aggregation often masks route flapping since components of an
   aggregated route (more specific routes) would not cause the
   aggregated route to flap. Therefore using CIDR would also help to
   alleviate route flapping.

6.5. Scalable Routing Policy and Scalable Implementation

   Routing policy involves routing decisions about acceptance and
   advertisement of certain routes to or from other networks and about
   routing preference when more than one route becomes available.
   Routing policy enforces business agreements between network entities
   and is largely governed by non-technical criteria. In essence,
   routing policy involves defining criteria for route filtering and
   route selection.

   One aspect of route filtering has to do with traffic control between
   routing domains or between different provider networks. Making policy
   based on individual prefixes should be avoided in this case because,
   with 57,000 prefixes in the Internet, it does not scale.  Making
   policy based on ASs that administratively represent a set of prefixes
   scales better.

   Another purpose of route filtering is to protect the integrity of
   routing information by preventing accepting falsely advertised
   routing information that would lead traffic to black holes. In this
   case, only prefix-based filtering will sufficiently achieve the goal.
   Prefix based filtering needs to occur at the borders of a network of
   direct customers and peer networks. The filtering is harder to manage
   at the boundary of the peer networks since a peer network usually
   advertises a large amount of prefixes. As mentioned earlier, there
   are about 57,000 routes known to the Internet. This means a large
   default-free network would need to filter in the order of hundred
   thousands prefixes or even more since a route could be advertised by
   more than one sources. The sheer amount of the prefixes to be
   filtered imposes challenges for router configuration memory and
   configuration management. To make it scale, one would need to rely on
   the help from out-of-band process to sort out which prefix should be



Yu                 Scalable Routing Design Principles          [Page 17]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   accepted or denied from which source. IRR [13] and DNS [14] are among
   the current proposed mechanisms for implementing prefix-based
   filtering.

   Route selection policy determines which path is used to send traffic
   toward a certain destination. This is important, for example, when a
   network has two connections to another network and learns routes from
   both connections.  The decision would be involving which path to use
   to send traffic to the customers behind the other network. The
   choices are typically:

   o Take the closest exit
   o Take the best exit
   o Always prefers customers direct route
   o Let other AS or customer to determine

   When a policy is defined, its implications for scaling and
   implementing the policy in a scalable fashion need to be considered.
   For example, if the customer determines which paths traffic follows,
   customers, not the provider, should be required to set routing
   parameters. Customers can use the BGP community or mechanisms such as
   MED to set routing preferences in a much more scalable way.

   Again, it does not scale to make such policy decisions based on
   individual prefixes. An AS should be the logical unit for applying
   the policy.  Further, the policy should be universal and should avoid
   exceptions.

   Another scaling measure is to avoid making complex policy. When
   routing policy is complex, the management such as configuration of
   the router and debugging would be a problem. The ultimate goal is to
   make the network manageable.

   The following basic principles would help scaling the routing policy
   management.

   o Make policies as simple as possible but meet the requirements
   o Automate as much as possible to avoid error-prone manual work
   o Avoid prefix based policy as much as possible with the exception of
     prefix-based route filtering for routing integrity.
   o Avoid making exceptions
   o Use out-of-band routing policy process where possible

6.6. Out-of-band Process

   A typical router assumes both routing and forwarding functions.
   However, conceptually, routing and forwarding are two separate
   processes. A router's ultimate task is to forward packets based on
   its forwarding table, which is derived from routing information. One
   cause of route scaling problems is that routers run out of steam
   because routing requires too much processing.  While a router has to



Yu                 Scalable Routing Design Principles          [Page 18]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   forward packets, it does not necessarily have to exchange and process
   routing information or execute routing policy; these tasks can be
   performed elsewhere. Thus the question should be: Would it be
   possible to remove the routing process from a router to reduce its
   burden ?  Moving the routing process from the routers to other
   systems is referred to as out-of-band route processing.

   Out-of-band route processes would, in short, perform the heavy-weight
   routing tasks. They would build a forwarding table for the router,
   select routes based on pre-defined policy, filter routes, and shield
   the router from route flapping attacks.

   The shortcomings of out-of-band route processing are the possible
   introduction of delays in routing changes; the de-coupling of routing
   and forwarding paths, which could introduce inaccurate routing
   information; and the cost of extra equipment.

   Appendix presents a current example of out-of-band route processing.
   It also suggests other possible solutions.

7. Conclusion and Discussion

   How routing scales has direct impact on network stability and
   performance.  Route scaling is an important issue, especially in
   large networks. This document identifies the major factors that
   affect route scalability and establishes basic principles for
   designing scalable routing in large networks.

   The major routing scaling issues we are facing today are excessive
   router resource consumption resulting from routing processing burdens
   causing network instability and routing complexity resulting in
   difficulties of manage and trouble shoot causing degradation of
   service.

   The outlined principles for designing a scalable routing system are
   building routing hierarchy; reducing routing processing burden where
   possible; defining manageable routing policies and using assistant of
   out-of-band routing process.

   Emerging technologies may alleviate route scaling issues, or they may
   exacerbate the problem. For example, MPLS would reduce the need for
   core routers to handle external routes and would free much-needed
   resources for packet forwarding and switching. However, the
   deployment of Dense Wavelength Division Multiplexing (DWDM) may put
   more burdens on routing, since it would introduce many more node
   adjacencies in a network.

   QoS and differentiated service provision [15,16] would put more
   requirements on policy and configuration. Constraint-based routing



Yu                 Scalable Routing Design Principles          [Page 19]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   [17] may impose more challenges for routing management.

   Using out-of-band resource to assist routing processing is a concept
   only been used in the IXP, there are a lot of potential to use it
   within a network to help addressing routing scaling issues. This is a
   topic worthy further exploring.

   Routing protocols and/or its implementations can still be improved or
   enhanced for better handle the scaling issues. For example, the IS-IS
   multiple area mechanism is needed in order to scale the IGP in large
   network.  Also, using multicast or reliable flooding mechanism for
   I-BGP updates instead of pairwise full mesh peering is something
   worth investigating.

   It is our believe that even with the deployment of new technologies
   such as DWDM, MPLS and others in the future, the fundermental routing
   scheme will remain the current IGP/BGP paradigm.  Therefore, the
   scalable routing design principles outlined in this document should
   still apply with the deployment of new technologies.

8. Security Considerations

   Security considerations are out of scope of this document.

9. Acknowledgement

   Special thanks to Curtis Villamizar for the extensive review of the
   document and many helpful comments. Many thanks to Dave Katz and
   Yakov Rekhter for their insightful comments.

10. References

   [1] "Intermediate System to Intermediate System Intra-Domain Routeing
   Exchange Protocol for use in Conjunction with the Protocol for
   Providing the Connectionless-mode Network Service (ISO 8473)", ISO DP
   10589, February 1990.

   [2] R. Callon. "Use of OSI IS-IS for Routing in TCP/IP and Dual
   Environments.", RFC 1195, December 1990.

   [3] J. Moy. "OSPF Version 2." RFC 2328, April 1998

   [4] Y. Rekhter, T. Li. "A Border Gateway Protocol 4 (BGP-4)." RFC
   1771, March 1995

   [5] C. Labovitz, R. Malan, F. Jahanian, ``Origins of Internet Routing
   Instability," in the Proceedings of INFOCOM99, New York, NY, June,
   1999.



Yu                 Scalable Routing Design Principles          [Page 20]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   [6] J. Moy, "OSPF- Anatomy of an Internet Routing Protocol." Addison-
   Wesley, January 1998.

   [7] T. Bates, R. Chandra, E. Chen, "An alternative to full mesh IBGP"
   Working in Progress

   [8] P. Traina, "Autonomous System Confederation Approach to Solving
   the I-BGP Scaling Problem." RFC1965, June 1996

   [9] V. Curtis, R. Chandra, R. Govindan, "BGP Route Flap Damping." RFC
   2439, November 1998.

   [10] V. Fuller, T. Li, J. Yu, K. Varadhan "Classless Inter-Domain
   Routing (CIDR): an Address Assignment and Aggregation Strategy." RFC
   1519, September 1993.

   [11] E. Rosen, A. Viswanathan, R. Callon, "A Proposed Architecture
   for MPLS", Work in Progress.

   [12] R. Callon, P. Doolan, N. Feldman, A. Fredette, G. Swallow, A.
   Viswanathan, "A Framework for Multiprotocol Label Switching." Work in
   Progress.

   [13] V. Curtis, C. Alaettinoglu, R. Govindan, D. Meyer, "Distributed
   Routing Policy System."  Internet Draft, February, 1999.

   [14] T. Bates, R. Bush, T. Li, Y. Rekhter, "DNS-based NLRI origin AS
   verification in BGP." Working in Progress.

   [15] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss,
   "An Architecture for Differentiated Services." RFC 2475, December
   1998.

   [16] Y.Bernet, et al "A Framework for Differentiated Services."
   Internet Draft, February 1999.

   [17] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus,
   "Requirements for Traffic Engineering Over MPLS. " Internet-Draft,
   October 1998.

Author's Address

   Jieyun (Jessica) Yu
   UUNET Technology
   880 Technology Dr.
   Ann Arbor, MI 48108

   Phone: (734) 214-7314



Yu                 Scalable Routing Design Principles          [Page 21]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   EMail: jyy@uu.net


11. Appendix. Out-of-Band Routing Processes

   The use of a Route Server (RS) at NAPs is an example of achieving
   routing scalability through an out-of-band routing process. A NAP is
   a public inter-connection point where ISP networks exchange traffic.
   ISP routers at the NAP establish BGP peer sessions with each other.
   The result is full mesh E-BGP peering with a complexity of O(N^2)
   system wide. When the RS is in place, each router peers only with the
   RS (and its backup) to obtain necessary routing information (or more
   precisely, the necessary forwarding information). In addition, the RS
   also filters routes and execute policy for each provider's router,
   which further reduces the burden on all routers involved.
   The concept of the Route Server can also be used to help address
   routing scalability in a large network.

   1) RS Assisted Peering between Customer Concentrator Router and
   Customer Routers

   Currently, in a typical large provider network, a customer
   concentrator router would connect up to hundreds of customer routers.
   That means the router has to handle hundreds of E-BGP sessions,
   numerous number of prefix filtering. These tasks would impose a heavy
   burden on the concentrator router. Reducing the number of customer
   router per concentrator router is not an option since this would
   introduce more routers in the routing system of the whole network,
   which is neither scalable for backbone routing, nor cost efficient.
   Using RS between customers and the providers' customer concentrator
   router become an attractive option to reduce the burden on the
   router.

   Figure 1 shows one way to incorporate an RS router ?? between a
   provider's customer concentrator router and customer routers.

        ---------------------------  LAN Media in a POP
                |            |
              -----        -----
              |CR |        |RS |
              -----        -----
              / | \
             /  |  \
            C1  C2..Cn

     Figure 1: RS serving customer concentrator and customer routers


   In a scenario without an RS, the concentrator router (CR) has to peer
   with customer routers C1, C2 ... Cn (where n could be in the
   hundreds). When an RS router is introduced, CR, C1, C2 ... Cn each



Yu                 Scalable Routing Design Principles          [Page 22]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   peer with the RS router instead, and the RS passes the processed
   routing information (or forwarding information) to all of them
   according to policy and filters.

   The advantages are obvious:

   o The concentrator router peers only with the RS router instead of
   with hundreds of customer routers.

   o The concentrator router does not need to filter prefixes or process
   routing policies, which frees resources for packet forwarding and
   handling.

   One general concern with the use of an RS router is the possibility
   of a mismatch of routing connectivity mismatch the physical
   connectivity. For example, if the link between the CR and C1 is down
   and if the RS router is not aware of the outage, it will continue to
   pass routes from C1 to the concentrator router, and the traffic
   following these routes will be black holed. However, this is not a
   problem in the case described here. This is because the RS router has
   to go through the concentrator router to peer with C1, C2 ... Cn,
   when the link is down, C1 is inaccessible from the RS router, and no
   routing information can be exchanged between the two. Consequently,
   the RS will announce no routes related to C1.

   Another concern is the single point of failure. If the RS router is
   down, no routing information can be exchanged between the
   concentrator router and C1, C2 ... Cn, and no traffic will flow
   between them. This problem could be addressed by adding a second RS
   router as a backup.

   In this scenario, when the RS router passes routing information to
   C1...Cn, it needs to designate the IP address of the concentrator
   router as the next hop. Likewise, when the RS router passes routes
   from each customer router to the concentrator router, it needs to
   place the correct next hop on the route. Modifications should be made
   to the RS code to include this function.

   2) Private RS Router at InterExchange Point

   A large provider network often has many BGP peers at the
   Interexchange point, NAP or private interconnection. This means the
   border router has to handle many E-BGP sessions. Since an
   Interconnect points is usually the administrative boundaries between
   ISPs, policy and route filtering are very demanding. This imposes a
   scaling problem on the border router.

   Deploying many routers to distribute the load among them is an



Yu                 Scalable Routing Design Principles          [Page 23]


Internet Draft      draft-yu-routing-scaling-00.txt           April 1999


   expensive solution: Extra hardware and extra ports cost money.
   Shifting the routing burden to an RS router is a promising solution.
   In the case of using RS for multiple peers at a private interexchange
   point, the scenario is similar to RS used between customer
   concentrator and customer routers as described in 1) above. In the
   case of such peering at a NAP, the private RS could be placed either
   on the same NAP media or a private media between the ISP's NAP router
   and the RS.

   3) RS Routers at Each POP in a Large Network

   Even in a network with a hierarchical routing structure, a sub-area
   may become too large, and I-BGP full meshing may impose a scaling
   problem. One way to address this would be to split the sub-area or
   add yet another tier of I-BGP reflector structure. Another possible
   solution would be to use an RS router as an I-BGP Server. Depending
   on the topology of a POP, this solution may or may not be suitable.
   The use of RS routers at network POPs should be investigated further.

































Yu                 Scalable Routing Design Principles          [Page 24]


Html markup produced by rfcmarkup 1.129b, available from https://tools.ietf.org/tools/rfcmarkup/