Network Working Group                                  A. Bashandy, Ed.
Internet Draft                                              C. Filsfils
Intended status: Informational                            Cisco Systems
Expires: February May 2017                                          P. Mohapatra
                                                       Sproute Networks
                                                         August 1,
                                                      November 22, 2016
                   BGP Prefix Independent Convergence
                     draft-ietf-rtgwg-bgp-pic-02.txt
                     draft-ietf-rtgwg-bgp-pic-03.txt

Abstract

In the network comprising thousands of iBGP peers exchanging millions
of routes, many routes are reachable via more than one next-hop.
Given the large scaling targets, it is desirable to restore traffic
after failure in a time period that does not depend on the number of
BGP prefixes. In this document we proposed an architecture by which
traffic can be re-routed to ECMP or pre-calculated backup paths in a
timeframe that does not depend on the number of BGP prefixes. The
objective is achieved through organizing the forwarding data
structures in a hierarchical manner and sharing forwarding elements
among the maximum possible number of routes. The proposed technique
achieves prefix independent convergence while ensuring incremental
deployment, complete automation, and zero management and provisioning
effort. It is noteworthy to mention that the benefits of BGP-PIC are
hinged on the existence of more than one path whether as ECMP or
primary-backup.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s)
   controlling the copyright in such materials, this document may not
   be modified outside the IETF Standards Process, and derivative
   works of it may not be created outside the IETF Standards Process,
   except to format it for publication as an RFC or to translate it
   into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 1, May 22, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction...................................................3
      1.1. Conventions used in this document.........................4
      1.2. Terminology...............................................4
   2. Overview.......................................................5 Overview.......................................................6
      2.1. Dependency................................................6
         2.1.1. Hierarchical Hardware FIB............................6
         2.1.2. Availability of more than one primary or secondary BGP
         next-hops...................................................7
         2.1.3. Pre-Computation of a secondary BGP next-hop.....Error!
         Bookmark not defined.
      2.2. BGP-PIC Illustration......................................7
   3. Constructing the Shared Hierarchical Forwarding Chain..........7 Chain..........9
      3.1. Example 1: Constructing the BGP-PIC forwarding Chain.................9
         3.1.1. Example: Primary-Backup Path Scenario...................8
      3.2. Example 2: Scenario...............10
   4. Forwarding Behavior...........................................11
   5. Handling Platforms with Limited Levels of Hierarchy.....9
   4. Hierarchy...........12
      5.1. Flattening the Forwarding Behavior...........................................13
   5. Chain..........................12
      5.2. Example: Flattening a forwarding chain...................14
   6. Forwarding Chain Adjustment at a Failure......................15
      5.1. Failure......................21
      6.1. BGP-PIC core.............................................16
      5.2. core.............................................22
      6.2. BGP-PIC edge.............................................17
         5.2.1. edge.............................................23
         6.2.1. Adjusting forwarding Chain in egress node failure...17
         5.2.2. failure...23
         6.2.2. Adjusting Forwarding Chain on PE-CE link Failure....17
      5.3. Failure....23
      6.3. Handling Failures for Flattended Flattened Forwarding Chains.......18
   6. Properties....................................................19
      6.1. Coverage.................................................19
         6.1.1. Chains........24
   7. Properties....................................................25
      7.1. Coverage.................................................25
         7.1.1. A remote failure on the path to a BGP next-hop......19
         6.1.2. next-hop......25
         7.1.2. A local failure on the path to a BGP next-hop.......19
         6.1.3. next-hop.......25
         7.1.3. A remote iBGP next-hop fails........................20
         6.1.4. fails........................26
         7.1.4. A local eBGP next-hop fails.........................20
      6.2. Performance..............................................20
         6.2.1. Perspective.........................................20
      6.3. Automated................................................21
      6.4. Incremental Deployment...................................22
   7. Dependency....................................................22
      7.1. Hierarchical Hardware FIB................................22 fails.........................26
      7.2. Availability of more than one primary or secondary BGP next-
      hops..........................................................22 Performance..............................................26
      7.3. Pre-Computation of a secondary BGP next-hop..............23 Automated................................................27
      7.4. Incremental Deployment...................................27
   8. Security Considerations.......................................23 Considerations.......................................27
   9. IANA Considerations...........................................23 Considerations...........................................27
   10. Conclusions..................................................23 Conclusions..................................................27
   11. Acknowledgments..............................................25
   12. References...................................................23
      12.1. References...................................................28
      11.1. Normative References....................................23
      12.2. References....................................28
      11.2. Informative References..................................24 References..................................28
   12. Acknowledgments..............................................29
   Appendix A. Perspective..........................................30

1. Introduction

   As a path vector protocol, BGP propagates reachability serially.
   Hence BGP convergence speed is inherently slow due limited by the time taken to
   serially propagate reachability information from the
   serial nature point of reachability propagation.
   failure to the device that must re-converge. BGP speakers exchange
   reachability information about prefixes[2][3] and, for labeled
   address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an
   edge router assigns local labels to prefixes and associates the
   local label with each advertised prefix such as L3VPN [8], 6PE
   [9], and Softwire [7] using BGP label unicast technique[4]. A BGP
   speaker then applies the path selection steps to choose the best
   path. In modern networks, it is not uncommon to have a prefix
   reachable via multiple edge routers. In addition to proprietary
   techniques, multiple techniques have been proposed to allow for
   BGP to advertise more than one path for a given prefix
   [6][11][12], whether in the form of equal cost multipath or
   primary-backup. Another more common and widely deployed scenario is
   L3VPN with multi-homed VPN sites with unique Route Distinguisher.
   It is advantageous to utilize the commonality among paths used by
   NLRIs to significantly improve convergence in case of topology
   modifications.

   This document proposes a hierarchical and shared forwarding chain
   organization that allows traffic to be restored to pre-calculated
   alternative equal cost primary path or backup path in a time
   period that does not depend on the number of BGP prefixes. The
   technique relies on internal router behavior that is completely
   transparent to the operator and can be incrementally deployed and
   enabled with zero operator intervention.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in RFC-2119
   [1].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to
   be interpreted as carrying RFC-2119 significance.

1.2. Terminology

   This section defines the terms used in this document. For ease of
   use, we will use terms similar to those used by L3VPN [8]

   o  BGP prefix: It is a A prefix P/m (of any AFI/SAFI) that a BGP speaker
      has a path for.

   o  IGP prefix: It is a A prefix P/m (of any AFI/SAFI) that is learnt via
      an Interior Gateway Protocol, such as OSPF and ISIS, has a path
      for. The prefix may be learnt directly through the IGP or
      redistributed from other protocol(s)

   o  CE: It is an An external router through which an egress PE can reach a
      prefix P/m.

   o  Ingress PE, "iPE": It is a A BGP speaker that learns about a prefix
      through a IBGP peer and chooses an egress PE as the next-hop for
      the prefix.. prefix.

   o  Path: It is the The next-hop in a sequence of unique connected nodes starting from the
      current node and ending with the destination node or network
      identified by the prefix. The nodes may not be directly
      connected.

   o  Recursive path: It is a A path consisting only of the IP address of the
      next-hop without the outgoing interface. Subsequent lookups are needed
      necessary to determine the outgoing interface. interface and a directly
      connected next-hop

   o  Non-recursive path: It is a A path consisting of the IP address of the a
      directly connected next-hop and one outgoing interface

   o  Primary path: It is a A recursive or non-recursive path that can be
      used all the time. time as long as a walk starting from this path can
      end to an adjacency. A prefix can have more than one primary
      path

   o  Backup path: It is a A recursive or non-recursive path that can be used
      only after some or all primary paths become unreachable

   o  Leaf: A leaf is container data structure for a prefix or local label.
      Alternatively, it is the data structure that contains prefix
      specific information.

   o  IP leaf: Is the The leaf corresponding to an IPv4 or IPv6 prefix

   o  Label leaf. It is the The leaf corresponding to a locally allocated label
      such as the VPN label on an egress PE [8].

   o  Pathlist: It is an An array of paths used by one or more prefix to forward
      traffic to destination(s) covered by a IP prefix. Each path in
      the pathlist carries its "path-index" that identifies its
      position in the array of paths. "). In general, the value of the
      "path-index" stored in path may not necessarily has the same
      value of the location of the path in the pathlist. For example
      the 3rd path may carry path-index value of 1

   o  A pathlist may contain a mix of primary and backup paths

   o  OutLabel-List: Each labeled prefix is associated with an
      OutLabel-List. The OutLabel-List is an array of one or more
      outgoing labels and/or label actions where each label or label
      action has 1-to-1 correspondence to a path in the pathlist.
      Label actions are: push the label, pop the label, or swap the
      incoming label with the label in the Outlabel-Array entry. entry, or
      don't push anything at all in case of "unlabeled". The prefix
      may be an IGP or BGP prefix

   o  Adjacency: It is the The layer 2 encapsulation leading to the layer 3
      directly connected next-hop

   o  Dependency: An object X is said to be a dependent or Child child of
      object Y if Object Y cannot be deleted unless object there is at least one forwarding chain where the
      forwarding engine must visits the object X before visiting the
      object Y in order to forward a packet. Note that if object X is
      a child of object Y, then Y cannot be deleted unless object X
      is no longer a dependent/child of object Y

   o  Route: It is a A prefix with one or more paths associated with it.
      Hence the minimum set of objects needed to construct a route is
      a leaf and a pathlist.

2. Overview

   The idea of BGP-PIC is based on two pillars

   o  A shared hierarchical Forwarding Chain forwarding Chain: It is not uncommon to see
      multiple destinations are reachable via the same list of next-
      hops. Instead of having a separate list of next-hops for each
      destination, all destinations sharing the same list of next-hops
      can point to a single copy of this list thereby allowing fast
      convergence by making changes to a single shared list of next-
      hops rather than possibly a large number of destinations. Because
      paths in a pathlist may be recursive, a hierarchy is formed
      between pathlist and the resolving prefix whereby the pathlist
      depends on the resolving prefix.

   o  A forwarding plane that supports multiple levels of indirection

   To illustrate the two pillars above, we will use indirection:
      A forwarding that starts with a destination and ends with an example of
      outgoing interface is not a simple multihomed L3VPN [8] prefix in flat structure. Instead a BGP-free core running LDP
   [5] or segment routing over MPLS
      forwarding plane [14].

           +--------------------------------+
           |                                |
           |                               ePE2
           |                                |  \
           |                                |   \
           |                                |    \
          iPE                               |    CE.......VRF "Blue"
           |                                |    /       (VPN-IP1)
           |                                |   /        (VPN-IP2)
           |   LDP/Segment-Routing Core     |  /
           |                               ePE1
           |                                |
           +--------------------------------+
             Figure 1 VPN prefix reachable entry is constructed via multiple PEs

   Referring to Figure 1, suppose the iPE (the ingress PE) receives
   NLRIs for the VPN prefixes VPN-IP1 levels of
      dependency. A BGP NLRI uses a recursive next-hop, which in turn
      resolves via an IGP next-hop, which in turn resolves via an
      adjacency consisting of one or more outgoing interface(s) and VPN-IP2 from two egress PEs,
   ePE1 and ePE2
      next-hop(s).

   Designing a forwarding plane that constructs multi-level forwarding
   chains with next-hop BGP-NH1 and BGP-NH2, respectively.
   Assume maximal sharing of forwarding objects allows rerouting a
   large number of destinations by modifying a small number of objects
   thereby achieving convergence in a time frame that ePE1 advertise the VPN labels VPN-L11 and VPN-L12 while
   ePE2 advertise does not depend
   on the VPN labels VPN-L21 and VPN-L22 for VPN-IP1 and
   VPN-IP2, respectively. Suppose that BGP-NH1 and BGP-NH2 are resolved
   via number of destinations. For example, if the IGP prefixes IGP-IP1 and IGP-P2, where each happen prefix that
   resolves a recursive next-hop is updated there is no need to have 2
   ECMP paths with IGP-NH1 and IGP-NH2 reachable via update
   the interfaces I1
   and I2, respectively. Suppose possibly large number of BGP NLRIs that local labels (whether LDP[5] or
   segment routing [14]) on use this recursive next-
   hop.

2.1. Dependency

   This section describes the downstream LSRs for IGP-IP1 are IGP-L11 required functionality in the forwarding
   and IGP-L12 while control planes to support BGP-PIC described in this document

2.1.1. Hierarchical Hardware FIB

   BGP PIC requires a hierarchical hardware FIB support: for IGP-P2 are IGP-L21 and IGP-L22.

   Based on each BGP
   forwarded packet, a BGP leaf is looked up, then a BGP Pathlist is
   consulted, then an IGP Pathlist, then an Adjacency.

   An alternative method consists in "flattening" the information about NLRIs and dependencies when
   programming the resolving BGP destinations into HW FIB resulting in
   potentially eliminating both the BGP Path-List and IGP prefixes,
   a hierarchical Path-List
   consultation. Such an approach decreases the number of memory
   lookup's per forwarding chain can be constructed as shown in
   Figure 2.

   IP Leaf:    Pathlist:    IP Leaf:           Pathlist:
   --------  +-------+     --------          +----------+
   VPN-IP1-->|BGP-NH1|-->IGP-IP1(BGP NH1)--->|IGP NH1,I1|--->Adjacency1
     |       |BGP-NH2|-->....     |          |IGP NH2,I2|--->Adjacency2
     |       +-------+            |          +----------+
     |                            |
     |                            |
     v                            v
   OutLabel-List:               OutLabel-List:
   +----------------------+      +----------------------+
   |VPN-L11 (VPN-IP1, NH1)|      |IGP-L11 (IGP-IP1, NH1)|
   |VPN-L12 (VPN-IP1, NH2)|      |IGP-L12 (IGP-IP1, NH2)|
   +----------------------+      +----------------------+

           Figure 2 Shared Hierarchical Forwarding Chain operation at iPE

   The forwarding chain depicted in Figure 2 illustrates the first
   pillar, which is expense of HW FIB memory
   increase (flattening means less sharing hence duplication), loss of
   ECMP properties (flattening means less pathlist entropy) and hierarchy. We can see that the loss of
   BGP
   pathlist consisting PIC properties.

2.1.2. Availability of BGP-NH1 and BGP-NH2 is shared by all NLRIs
   reachable via ePE1 and ePE2. As such, it is possible to make changes
   to more than one primary or secondary BGP next-hops

   When the pathlist without having to make changes to primary BGP next-hop fails, BGP PIC depends on the NLRIs. For
   example, if BGP-NH2 becomes unreachable, there is no need to modify
   any
   availability of a pre-computed and pre-installed secondary BGP next-
   hop in the possibly large number BGP Pathlist.

   The existence of NLRIs. Instead only the shared
   pathlist needs to be modified. Likewise, due to a secondary next-hop is clear for the hierarchical
   structure following
   reason: a service caring for network availability will require two
   disjoint network connections hence two BGP next-hops.

   The BGP distribution of the forwarding chain, it secondary next-hop is possible to make
   modifications to the IGP routes without having to make any changes available thanks
   to the following BGP NLRIs. For example, if mechanisms: Add-Path [11], BGP Best-External
   [6], diverse path [12], and the interface "I2" goes down, only
   the shared IGP pathlist needs to be updated, but none frequent use in VPN deployments of
   different VPN RD's per PE. It is noteworthy to mention that the IGP
   prefixes sharing the IGP pathlist nor the
   availability of another BGP NLRIs using the IGP
   prefixes for resolution need to be modified.

   Figure 2 path does not mean that all failure
   scenarios can also be used covered by simply forwarding traffic to illustrate the second
   available secondary path. The discussion of how to cover various
   failure scenarios is beyond the scope of this document

2.2. BGP-PIC pillar.
   Having a deep forwarding chain such Illustration

   To illustrate the two pillars above as well as the one illustrated platform
   dependency, we will use an example of a simple multihomed L3VPN [8]
   prefix in Figure
   2 requires a BGP-free core running LDP [5] or segment routing over
   MPLS forwarding plane that is capable of accessing [14].

    +--------------------------------+
    |                                |
    |                               ePE2 (IGP-IP1 192.0.2.1, Loopback)
    |                                |  \
    |                                |   \
    |                                |    \
   iPE                               |    CE....VRF "Blue", ASnum 65000
    |                                |    /    (VPN-IP1 11.1.1.0/24)
    |                                |   /     (VPN-IP2 11.1.2.0/24)
    |   LDP/Segment-Routing Core     |  /
    |                               ePE1 (IGP-IP2 192.0.2.2, Loopback)
    |                                |
    +--------------------------------+
             Figure 1 VPN prefix reachable via multiple
   levels of indirection in order PEs

   Referring to calculate Figure 1, suppose the outgoing
   interface(s) and next-hops(s). While a deeper forwarding chain
   minimizes iPE (the ingress PE) receives
   NLRIs for the re-convergence time on topology change, there will
   always exist platforms VPN prefixes VPN-IP1 and VPN-IP2 from two egress PEs,
   ePE1 and ePE2 with limited capabilities next-hop BGP-NH1 and hence imposing
   a limit on BGP-NH2, respectively.
   Assume that ePE1 advertise the depth of VPN labels VPN-L11 and VPN-L12 while
   ePE2 advertise the forwarding chain. The example in Section
   3.2 illustrates how to gracefully trade off convergence speed with VPN labels VPN-L21 and VPN-L22 for VPN-IP1 and
   VPN-IP2, respectively. Suppose that BGP-NH1 and BGP-NH2 are resolved
   via the number of hierarchical levels IGP prefixes IGP-IP1 and IGP-P2, where each happen to support platforms have 2
   ECMP paths with
   different capabilities.

3. Constructing IGP-NH1 and IGP-NH2 reachable via the interfaces I1
   and I2, respectively. Suppose that local labels (whether LDP[5] or
   segment routing [14]) on the downstream LSRs for IGP-IP1 are IGP-L11
   and IGP-L12 while for IGP-P2 are IGP-L21 and IGP-L22. As such, the
   routing table at iPE is as follows:

         65000:11.1.1.0/24
            via ePE1 (192.0.2.1), VPN Label: VPN-L11
            via ePE2 (192.0.2.2), VPN Label: VPN-L21

         65000:11.1.2.0/24
            via ePE1 (192.0.2.1), VPN Label: VPN-L12
            via ePE2 (192.0.2.2), VPN Label: VPN-L22

         192.0.2.1/32
            via Core, Label:  IGP-L11
            via Core, Label:  IGP-L12

         192.0.2.2/32
            via Core, Label:  IGP-L21
            via Core, Label:  IGP-L22

   Based on the above routing table, a hierarchical forwarding chain
   can be constructed as shown in Figure 2.

   IP Leaf:    Pathlist:    IP Leaf:           Pathlist:
   --------  +-------+     --------          +----------+
   VPN-IP1-->|BGP-NH1|-->IGP-IP1(BGP NH1)--->|IGP NH1,I1|--->Adjacency1
     |       |BGP-NH2|-->....      |         |IGP NH2,I2|--->Adjacency2
     |       +-------+             |         +----------+
     |                             |
     |                             |
     v                             v
   OutLabel-List:                OutLabel-List:
   +----------------------+      +----------------------+
   |VPN-L11 (VPN-IP1, NH1)|      |IGP-L11 (IGP-IP1, NH1)|
   |VPN-L12 (VPN-IP1, NH2)|      |IGP-L12 (IGP-IP1, NH2)|
   +----------------------+      +----------------------+

           Figure 2 Shared Hierarchical Forwarding Chain

   Constructing the at iPE

   The forwarding chain depicted in Figure 2 illustrates the first
   pillar, which is an application of sharing and hierarchy. We can see that the two
   pillars described in Section 2.

   The whole process starts when BGP downloads a prefix to FIB. The
   prefix contains one or more outgoing paths. For certain labeled
   prefixes, such as VPN [8] prefixes, each path may be associated with
   an outgoing label
   pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs
   reachable via ePE1 and ePE2. As such, it is possible to make changes
   to the prefix itself may be assigned a local
   label. The list of outgoing paths defines a pathlist. If such
   pathlist does not already exist, then FIB creates a new pathlist,
   otherwise the existing pathlist without having to make changes to the NLRIs. For
   example, if BGP-NH2 becomes unreachable, there is used. The BGP prefix is added as
   a dependent no need to modify
   any of the pathlist.

   The previous step constructs the upper part possibly large number of NLRIs. Instead only the shared
   pathlist needs to be modified. Likewise, due to the hierarchical
   structure of the forwarding chain. The forwarding chain chain, it is completed by resolving possible to make
   modifications to the
   paths of IGP routes without having to make any changes
   to the pathlist. A BGP path usually consists NLRIs. For example, if the interface "I2" goes down, only
   the shared IGP pathlist needs to be updated, but none of a next-hop.
   The next-hop is resolved by finding a matching the IGP prefix.

   The end result is a hierarchical shared forwarding chain where
   prefixes sharing the
   BGP IGP pathlist is shared by all BGP prefixes that use nor the same list of
   paths and BGP NLRIs using the IGP prefix is shared by all pathlists that have a path
   resolving via that IGP prefix. It is noteworthy
   prefixes for resolution need to mention that be modified.

   Figure 2 can also be used to illustrate the second BGP-PIC pillar.
   Having a deep forwarding chain such as the one illustrated in Figure
   2 requires a forwarding plane that is constructed without any operator intervention at
   all.

   The remainder capable of this section illustrates two examples. The first
   example illustrates the applicability accessing multiple
   levels of BGP-PIC in a primary-backup
   path deployment. The second example illustrates how BGP-PIC can be
   applied indirection in cases where order to calculate the outgoing
   interface(s) and next-hops(s). While a deeper forwarding plane supports limited number
   of indirections.

   3.1. Example 1: Primary-Backup Path Scenario

   Consider chain
   minimizes the egress PE ePE1 in re-convergence time on topology change, there will
   always exist platforms with limited capabilities and hence imposing
   a limit on the case depth of the multi-homed VPN
   prefixes in the BGP-free core depicted in Figure 1. Suppose ePE1
   determines that forwarding chain. Section 5 describes
   how to gracefully trade off convergence speed with the primary path is number of
   hierarchical levels to support platforms with different
   capabilities.

3. Constructing the external path but Shared Hierarchical Forwarding Chain

   Constructing the backup
   path forwarding chain is an application of the iBGP path two
   pillars described in Section 2. This section describes how to the other PE ePE2 with next-hop BGP-NH2.
   ePE2 constructs
   construct the forwarding chain depicted in Figure 3. We are
   only showing hierarchical shared manner

3.1. Constructing the BGP-PIC forwarding Chain

   The whole process starts when BGP downloads a single VPN prefix for simplicity. But all prefixes
   that are multihomed to ePE1 FIB. The
   prefix contains one or more outgoing paths. For certain labeled
   prefixes, such as VPN [8] prefixes, each path may be associated with
   an outgoing label and ePE2 share the BGP prefix itself may be assigned a local
   label. The list of outgoing paths defines a pathlist.

                    BGP OutLabel Array
     VPN-L11            +---------+
   (Label-leaf)---+---->|Unlabeled|
                  |     +---------+
                  |     | VPN-L21 |
                  | If such
   pathlist does not already exist, then FIB creates a new pathlist,
   otherwise the existing pathlist is used. The BGP prefix is added as
   a dependent of the pathlist.

   The previous step constructs the upper part of the hierarchical
   forwarding chain. The forwarding chain is completed by resolving the
   paths of the pathlist. A BGP path usually consists of a next-hop.
   The next-hop is resolved by finding a matching IGP prefix.

   The end result is a hierarchical shared forwarding chain where the
   BGP pathlist is shared by all BGP prefixes that use the same list of
   paths and the IGP prefix is shared by all pathlists that have a path
   resolving via that IGP prefix. It is noteworthy to mention that the
   forwarding chain is constructed without any operator intervention at
   all.

   The remainder of this section goes over an example to illustrate the
   applicability of BGP-PIC in a primary-backup path scenario.

3.2. Example: Primary-Backup Path Scenario

   Consider the egress PE ePE1 in the case of the multi-homed VPN
   prefixes in the BGP-free core depicted in Figure 1. Suppose ePE1
   determines that the primary path is the external path but the backup
   path is the iBGP path to the other PE ePE2 with next-hop BGP-NH2.
   ePE2 constructs the forwarding chain depicted in Figure 3. We are
   only showing a single VPN prefix for simplicity. But all prefixes
   that are multihomed to ePE1 and ePE2 share the BGP pathlist.

                    BGP OutLabel Array
     VPN-L11            +---------+
   (Label-leaf)---+---->|Unlabeled|
                  |     +---------+
                  |     | VPN-L21 |
                  |     | (swap)  |
                  |     +---------+
                  |           ^
                  |
                  |                    BGP Pathlist
                  |           |                   +------------+    Connected route
                  |                   |       |   CE-NH    |------>(to the CE)
                  |           |                   |path-index=0|
                  |           |                   +------------+
                  V
                  |                   |  VPN-NH2   |
     VPN-IP1 -----------------+------>| -----+------------------>|  (backup)  |------>IGP Leaf
   (IP prefix leaf)                   |path-index=1|    (Towards ePE2)
        |                             +------------+
        |
        |           BGP OutLabel Array
        |              +---------+
        +------------->|Unlabeled|
                       +---------+
                       | VPN-L21 |
                       | (push)  |
                       +---------+

   Figure 3 : VPN Prefix Forwarding Chain with eiBGP paths on egress PE

   The example depicted in Figure 3 differs from the example in Figure
   2 in two main aspects. First, as long as the primary path towards
   the CE (external path) is useable, it will be the only path used for
   forwarding while the OutLabel-List contains both the unlabeled label
   (primary path) and the VPN label (backup path) advertised by the
   backup path ePE2. The second aspect is presence of the label leaf
   corresponding to the VPN prefix. This label leaf is used to match
   VPN traffic arriving from the core. Note that the label leaf shares
   the OutLabel-List and the pathlist with the IP prefix.

   3.2. Example 2: Platforms with Limited Levels of Hierarchy

4. Forwarding Behavior

   This example section explains how the forwarding plane uses the hierarchical
   shared forwarding chain to forward a case of inter-AS option C [8] where there are 3
   levels of hierarchy. Figure 4 illustrates packet.

   When a packet arrives at a router, it matches a leaf. A labeled
   packet matches a label leaf while an IP packet matches an IP prefix
   leaf. The forwarding engines walks the sample topology. To
   force 3 levels of hierarchy, forwarding chain starting
   from the ASBRs leaf until the walk terminates on an adjacency. Thus when a
   packet arrives, the ingress domain (domain
   1) advertise chain is walked as follows:

   1. Lookup the core routers leaf based on the destination address or the label at
      the top of the egress domain (domain 2) to packet

   2. Retrieve the
   ingress PE (iPE) via BGP-LU [4] instead parent pathlist of redistributing them into the IGP leaf

   3. Pick the outgoing path "Pi" from the list of domain 1. resolved paths in
      the pathlist. The end result method by which the outgoing path is that picked is
      beyond the ingress PE (iPE) has
   2 levels scope of recursion for this document (e.g. flow-preserving hash
      exploiting entropy within the MPLS stack and IP header). Let the
      "path-index" of the outgoing path "Pi" be "j".

   4. If the VPN prefix VPN-IP1 is labeled, use the "path-index" "j" to retrieve
      the jth label "Lj" stored the jth entry in the OutLabel-List and VPN2-P2.

                 Domain 1                  Domain 2
           +----------------+           +-------------+
           |                |           |             |
           |   LDP/SR Core  |           | LDP/SR core |
           |                |           |             |
           |              ASBR11------ASBR21.......ePE1\
           |                |   \    /  |    .    .   | \
           |                |    \  /   |     .  .    |  \
           |                |     \/    |      ..     |   \VPN-IP1
           |                |     /\    |      . .    |   /
           |                |    /  \   |     .   .   |  /
           |                |   /    \  |    .     .  | /
          iPE             ASBR12------ASBR22.......ePE2
           |                |           |             | \
           |                |           |             |  \
           |                |           |             |   \
           |                |           |             |   /VPN-IP2
           |                |           |             |  /
           |                |           |             | /
           |              ASBR13------ASBR23.......ePE3/
           |                |           |             |
           |                |           |             |
           +----------------+           +-------------+
            <==============  <=========  <============
            Advertise PE2x    Advertise   Redistribute
             Using iBGP-LU    PE2x Using    IGP into
                               eBGP-LU        BGP

                Figure 4 Sample 3-level hierarchy topology

   We will make
      apply the following assumptions about connectivity

   o  In "domain 2", both ASBR21 and ASBR22 can reach both ePE1 and
      ePE2 using label action of the label on the packet (e.g. for VPN
      label on the same distance

   o  In "domain 2", only ASBR23 can reach ePE3
   o  In "domain 1", iPE (the ingress PE) can reach ASBR11, ASBR12, and
      ASBR13 via IGP using PE, the label action is "push"). As
      mentioned in Section 1.2, the value of the "path-index" stored
      in path may not necessarily be the same distance.

   We will make value of the following assumptions about location of
      the labels

   o  The VPN labels advertised by ePE1 and ePE2 for path in the pathlist.

   5. Move to the parent of the chosen path "Pi"

   6. If the chosen path "Pi" is recursive, move to its parent prefix VPN-IP1 are
      VPN-L11 and VPN-L21, respectively

   o  The VPN labels advertised by ePE2
      and ePE3 for prefix VPN-IP2 are
      VPN-L22 and VPN-L32, respectively

   o  The labels advertised by ASBR11 go to iPE using BGP-LU [4] for step 2

   7. If the
      egress PEs ePE1 and ePE2 are LASBR11(ePE1) and LASBR11(ePE2),
      respectively.

   o  The labels advertised by ASBR12 chosen path is non-recursive move to iPE using BGP-LU [4] for the
      egress PEs ePE1 and ePE2 are LASBR12(ePE1) and LASBR12(ePE2),
      respectively

   o  The label advertised by ASBR11 its parent adjacency.
      Otherwise go to iPE using BGP-LU [4] for the
      egress PE ePE3 is LASBR13(ePE3)

   o  The IGP labels advertised by the next hops directly connected to
      iPE towards ASBR11, ASBR12, and ASBR13 step.

   8. Encapsulate the packet in the core of domain 1
      are IGP-L11, IGP-L12, layer string specified by the
      adjacency and IGP-L13, respectively.

   The diagram in Figure 5 illustrates send the packet out.

   Let's apply the above forwarding steps to the forwarding chain
   depicted in Figure 2 in Section 2. Suppose a packet arrives at
   ingress PE iPE
   assuming that from an external neighbor. Assume the packet matches
   the VPN prefix VPN-IP1. While walking the forwarding hardware in iPE supports 3 levels of
   hierarchy. The leaves corresponding chain, the
   forwarding engine applies a hashing algorithm to choose the ABSRs on domain 1
   (ASBR11, ASBR12, path and ASBR13) are
   the hashing at the bottom of BGP level yields path 0 while the hierarchy.
   There are few important points:

   o  Because hashing at the hardware supports
   IGP level yields path 1. In that case, the required depth packet will be sent out
   of hierarchy, interface I2 with the sizes label stack "IGP-L12,VPN-L11".

5. Handling Platforms with Limited Levels of a pathlist equal Hierarchy

   This section describes the size construction of the label list
      associated with the leaves using this pathlist forwarding chain if a
   platform does not support the number of recursion levels required to
   resolve the NLRIs. There are two main design objectives

   o  The index inside  Being able to reduce the number of hierarchical levels from any
      arbitrary value to a smaller arbitrary value that can be
      supported by the forwarding engine

   o  Minimal modifications to the forwarding algorithm due to such
      reduction.

5.1. Flattening the Forwarding Chain

   Let's consider a pathlist entry indicates associated with the label leaf "R1" consisting
   of the list of paths <P1, P2,..., Pn>. Assume that the leaf "R1" has
   an Outlabel-list <L1, L2,..., Ln>. Suppose the path Pi is a
   recursive path that resolves via a prefix represented by the leaf
   "R2". The leaf "R2" itself is pointing to a pathlist consisting of
   the paths <Q1, Q2,..., Qm>

   If the platform supports the number of hierarchy levels of the
   forwarding chain, then a packet that uses the path "Pi" will be picked
   forwarded as follows:

   1. The forwarding engine is now at leaf "R1"

   2. So it moves to its parent pathlist, which contains the list <P1,
      P2,..., Pn>.

   3. The forwarding engine applies a hashing algorithm and picks the
      path "Pi". So now the forwarding engine is at the path "Pi"

   4. The forwarding engine retrieves the label "Li" from the Outlabel-List if that outlabel-
      list attached to the leaf "R1" and applies the label action

   5. The path "Pi" uses the leaf "R2"

   6. The forwarding engine walks forward to the leaf "R2" for
      resolution

   7. The forwarding plane performs a hash to pick a path among the
      pathlist of the leaf "R2", which is chosen by <Q1, Q2,..., Qm>

   8. Suppose the forwarding engine hashing function.

   Outlabel-List                                      Outlabel-List
     For VPN-IP1                                         For VPN-IP2
   +------------+    +--------+           +-------+   +------------+
   |  VPN-L11   |<---| VPN-IP1|           |VPN-IP2|-->|  VPN-L22   |
   +------------+    +---+----+           +---+---+   +------------+
   |  VPN-L21   |        |                    |       |  VPN-L32   |
   +------------+        |                    |       +------------+
                         |                    |
                         V                    V
                    +---+---+            +---+---+
                    | 0 | 1 |            | 0 | 1 |
                    +-|-+-\-+            +-/-+-\-+
                      |    \              /     \
                      |     \            /       \
                      |      \          /         \
                      |       \        /           \
                      v        \      /             \
                 +-----+       +-----+             +-----+
            +----+ ePE1|       |ePE2 +-----+       | ePE3+-----+
            |    +--+--+       +-----+     |       +--+--+     |
            v       |            /         v          |        v
   +-------------+  |           /   +-------------+   | +-------------+
   |LASBR11(ePE1)|  |          /    |LASBR11(ePE2)|   | |LASBR13(ePE3)|
   +-------------+  |         /     +-------------+   | +-------------+
   |LASBR12(ePE1)|  |        /      |LASBR12(ePE2)|   | Outlabel-List
   +-------------+  |       /       +-------------+   |    For ePE3
   Outlabel-List    |      /        Outlabel-List     |
       For ePE1     |     /           For ePE2        |
                    |    /                            |
                    |   /                             |
                    |  /                              |
                    v /                               v
                +---+---+  Shared Pathlist          +---+  Pathlist
                | 0 | 1 | For ePE1 and ePE2         | 0 |  For ePE3
                +-|-+-\-+                           +-|-+
                  |    \                              |
                  |     \                             |
                  |      \                            |
                  |       \                           |
                  v        \                          v
               +------+    +------+               +------+
           +---+ASBR11|    |ASBR12+--+            |ASBR13+---+
           |   +------+    +------+  |            +------+   |
           v                         v                       v
      +-------+                  +-------+              +-------+
      |IGP-L11|                  |IGP-L12|              |IGP-L13|
      +-------+                  +-------+              +-------+

       Figure 5 : Forwarding Chain for hardware supporting 3 Levels picks the path "Qj"

   9. Now suppose the hardware on iPE (the ingress PE) supports 2 levels forwarding engine continues the walk to the parent of
      "Qj"

   Suppose the platform cannot support the number of hierarchy only. In that case, levels
   in the 3-levels forwarding chain in
   Figure 5 chain. FIB needs to be "flattended" into 2 reduce the number of hierarchy
   levels. The idea of reducing the number of hierarchy levels only.

   Outlabel-List                                  Outlabel-List
     For VPN-IP1                                    For VPN-IP2
   +------------+    +-------+      +-------+     +------------+
   |  VPN-L11   |<---|VPN-IP1|      | VPN-IP2|--->|  VPN-L22   |
   +------------+    +---+---+      +---+---+     +------------+
   |  VPN-L21   |        |              |         |  VPN-L32   |
   +------------+        |              |         +------------+
                         |              |
                         |              |
                         |              |
          Flattened      |              |  Flattened
          pathlist       V              V   pathlist
                    +===+===+        +===+===+===+     +=============+
           +--------+ 0 | 1 |        | 0 | 0 | 1 +---->|LASBR11(ePE2)|
           |        +=|=+=\=+        +=/=+=/=+=\=+     +=============+
           v          |    \          /   /     \      |LASBR12(ePE2)|
    +=============+   |     \  +-----+   /       \     +=============+
    |LASBR11(ePE1)|   |      \/         /         \    |LASBR13(ePE3)|
    +=============+   |      /\        /           \   +=============+
    |LASBR12(ePE1)|   |     /  \      /             \
    +=============+   |    /    \    /               \
                      |   /      \  /                 \
                      |  /       +  +                  \
                      |  +       |  |                   \
                      |  |       |  |                    \
                      v  v       v  v                     \
                    +------+    +------+              +------+
               +----|ASBR11|    |ASBR12+---+          |ASBR13+---+
               |    +------+    +------+   |          +------+   |
               v                           v                     v
           +-------+                  +-------+              +-------+
           |IGP-L11|                  |IGP-L12|              |IGP-L13|
           +-------+                  +-------+              +-------+

     Figure 6 : Flattening 3 levels to 2 levels of Hierarchy on iPE

   Figure 6 represents one way is to
   "flatten" a 3 two chain levels hierarchy into
   two levels. There are few important points:

   o  The flattened pathlists have label lists associated with them. a single level. The size "flattening"
   steps are as follows

   1. FIB wants to reduce the number of levels used by "Pi" by 1

   2. FIB walks to the label list associated with parent of "Pi", which is the flattened pathlist
      equals leaf "R2"

   3. FIB extracts the size parent pathlist of the pathlist. Hence it leaf "R2", which is possible <Q1,
      Q2,..., Qm>

   4. FIB also extracts the OutLabel-list(R2) associated with the leaf
      "R2". Remember that an
      implementation includes these label lists in OutLabel-list(R2) = <L1, L2,..., Lm>

   5. FIB replaces the flattened
      pathlist itself

   o  Because of "flattening", path "Pi", with the size of a flattened pathlist may not
      be equal to the size of the label lists list of leaves using paths <Q1, Q2,...,
      Qm>

   6. Hence the
      flattened pathlist.

   o path list <P1, P2,..., Pn> now becomes "<P1, P2,...,Pi-
      1, Q1, Q2,..., Qm, Pi+1, Pn>

   7. The indices path index stored inside a flattened pathlist still indicate the label
     index in locations "Q1", "Q2", ..., "Qm"
      must all be "i" because the Outlabel-Lists of index "i" refers to the leaves using that pathlist.
     Because label "Li"
      associated with leaf "R1"

   8. FIB attaches an OutLabel-list with the new pathlist as follows:
      <Unlabeled,..., Unlabeled, L1, L2,..., Lm, Unlabeled, ...,
      Unlabeled>. The size of the label list associated with the
      flattened pathlist may be different from equals the size of the label lists of the leaves, the indices may be
     repeated

   o  Let's take pathlist. Hence there
      is a look at 1-1 mapping between every path in the flattened "flattened" pathlist used by
      and the prefix
      "VPN-IP2", The pathlist OutLabel-list associated with the prefix "VPN-IP2" has
      three entries.

       o The first and second entry have index "0". This it.

   It is because
         both entries correspond noteworthy to ePE2. Hence when hashing performed
         by mention that the forwarding engine results labels in using first or the second
         entry in outlabel-list
   associated with the pathlist, "flattened" pathlist may be stored in the forwarding engine will pick same
   memory location as the
         correct VPN label "VPN-L22", which path itself to avoid additional memory
   access. But that is the label advertised by
         ePE2 for the prefix "VPN-IP2"

       o The third entry has the index "1". This an implementation detail that is because beyond the third
         entry corresponds
   scope of this document.

   The same steps can be applied to ePE3. Hence when the hashing is performed
         by the forwarding engine results all paths in using the third entry pathlist <P1,
   P2,..., Pn> so that all paths are "flattened" thereby reducing the
   number of hierarchical levels by one. Note that that "flattening" a
   pathlist pulls in all paths of the flattened pathlist, the forwarding engine will pick the
         correct VPN label "VPN-L32", which is the label advertised by
         "ePE3" for the prefix "VPN-IP2"

4. Forwarding Behavior

   This section explains how the forwarding plane uses the hierarchical
   shared forwarding chain to forward a packet.

   When parent paths, a packet arrives desired feature
   to utilize all ECMP/UCMP paths at a router, it matches a leaf. all levels. A labeled
   packet matches a label leaf while an IP packet matches an IP prefix
   leaf. The forwarding engines walks the forwarding chain starting
   from the leaf until the walk terminates on an adjacency. Thus when platform that has a
   packet arrives, the chain is walked as follows:

   1. Lookup the leaf based
   limit on the destination address or the label at
      the top number of the packet

   2. Retrieve the parent paths in a pathlist of the for any given leaf
   3. Pick the outgoing path from may
   choose to reduce the list of resolved number paths in the
      pathlist. The method by which the outgoing path is picked is using methods that are beyond the
   scope of this document (i.e. flow-preserving hash
      exploiting entropy within document.

   The steps can be recursively applied to other paths at the MPLS stack and IP header). Let same
   levels or other levels to recursively reduce the
      "path-index" number of
   hierarchical levels to an arbitrary value so as to accommodate the
   capability of the outgoing path forwarding engine.

   Because a flattened pathlist may have an associated OutLabel-list
   the forwarding behavior has to be "i". slightly modified. The
   modification is done by adding the following step right after step 4
   in Section 4.

   5. If there is an OutLabel-list associated with the prefix pathlist, then
      if the path "Pi" is labeled, use chosen by the "path-index" "i" to hashing algorithm, retrieve the ith
      label "Li" stored the ith entry in the OutLabel-List at location "i" in that OutLabel-list and apply the label
      action of the label on the packet (e.g. for VPN that label on the ingress PE, the label action is "push").

   5. Move to the parent of the chosen path "i"

   6. If the chosen path "i" is recursive, move to its parent prefix
      and go to step 2

   7. If the chosen path "i" is non-recursive move to its parent
      adjacency

   8. Encapsulate the packet in the L2 string specified by the
      adjacency and send

   In the packet out.

   Let's next subsection, we apply the above forwarding steps in this subsection to the a
   sample scenario.

5.2. Example: Flattening a forwarding chain
   depicted in Figure 2 in Section 2. Suppose

   This example uses a packet arrives at
   ingress PE iPE from an external neighbor. Assume the packet matches
   the VPN prefix VPN-IP1. While walking case of inter-AS option C [8] where there are 3
   levels of hierarchy. Figure 4 illustrates the forwarding chain, sample topology. To
   force 3 levels of hierarchy, the
   forwarding engine applies a hashing algorithm to choose ASBRs on the path and ingress domain (domain
   1) advertise the hashing at core routers of the BGP level yields path 0 while egress domain (domain 2) to the hashing at
   ingress PE (iPE) via BGP-LU [4] instead of redistributing them into
   the IGP level yields path of domain 1. In The end result is that case, the packet will be sent out ingress PE (iPE) has
   2 levels of interface I2 with the label stack "IGP-L12,VPN-L11".

   Now let's try and apply the above steps to the flattened forwarding
   chain illustrated in Figure 6.

   o  Suppose a packet arrives at "iPE" and matches recursion for the VPN prefix
      "VPN-IP2"

   o  The forwarding engine walks to the parent of the "VPN_P2", which
      is VPN-IP1 and VPN2-IP2.

        Domain 1                  Domain 2
    +-------------+            +-------------+
    |             |            |             |
    | LDP/SR Core |            | LDP/SR core |
    |             |            |             |
    |     (192.0.1.1)          |             |
    |         ASBR11---------ASBR21........ePE1(192.0.2.1)
    |             |  \      /  |   .      .  |\
    |             |   \    /   |    .    .   | \
    |             |    \  /    |     .  .    |  \
    |             |     \/     |      ..     |   \VPN-IP1 (11.1.1.0/24)
    |             |     /\     |      . .    |   /VRF "Blue" ASn: 65000
    |             |    /  \    |     .   .   |  /
    |             |   /    \   |    .     .  | /
    |             |  /      \  |   .       . |/
   iPE        ASBR12---------ASBR22........ePE2 (192.0.2.2)
    |     (192.0.1.2)          |             |\
    |             |            |             | \
    |             |            |             |  \
    |             |            |             |   \VRF "Blue" ASn: 65000
    |             |            |             |   /VPN-IP2 (11.1.2.0/24)
    |             |            |             |  /
    |             |            |             | /
    |             |            |             |/
    |         ASBR13---------ASBR23........ePE3(192.0.2.3)
    |     (192.0.1.3)          |             |
    |             |            |             |
    |             |            |             |
    +-------------+            +-------------+
     <============  <=========  <============
    Advertise ePEx   Advertise   Redistribute
    Using iBGP-LU    ePEx Using    IGP into
                       eBGP-LU        BGP

               Figure 4 : Sample 3-level hierarchy topology

   We will make the flattened pathlist following assumptions about connectivity

   o  In "domain 2", both ASBR21 and applies a hashing algorithm to pick
      a path ASBR22 can reach both ePE1 and
      ePE2 using the same distance

   o  Suppose  In "domain 2", only ASBR23 can reach ePE3

   o  In "domain 1", iPE (the ingress PE) can reach ASBR11, ASBR12, and
      ASBR13 via IGP using the hashing by same distance.

   We will make the forwarding engine picks following assumptions about the second
      entry in labels
   o  The VPN labels advertised by ePE1 and ePE2 for prefix VPN-IP1 are
      VPN-L11 and VPN-L21, respectively

   o  The VPN labels advertised by ePE2 and ePE3 for prefix VPN-IP2 are
      VPN-L22 and VPN-L32, respectively

   o  The labels advertised by ASBR11 to iPE using BGP-LU [4] for the flattened pathlist associated with
      egress PEs ePE1 and ePE2 are LASBR11(ePE1) and LASBR11(ePE2),
      respectively.

   o  The labels advertised by ASBR12 to iPE using BGP-LU [4] for the leaf "VPN-
      IP2".
      egress PEs ePE1 and ePE2 are LASBR12(ePE1) and LASBR12(ePE2),
      respectively

   o  Because  The label advertised by ASBR11 to iPE using BGP-LU [4] for the second entry has
      egress PE ePE3 is LASBR13(ePE3)

   o  The IGP labels advertised by the index "0", next hops directly connected to
      iPE towards ASBR11, ASBR12, and ASBR13 in the label "VPN-L22"
      is pushed core of domain 1
      are IGP-L11, IGP-L12, and IGP-L13, respectively.

   Based on these connectivity assumptions and the packet

   o  At topology in Figure
   4, the same time, routing table on iPE is

         65000:11.1.1.0/24
            via ePE1 (192.0.2.1), VPN Label: VPN-L11
            via ePE2 (192.0.2.2), VPN Label: VPN-L21
         65000:11.1.2.0/24
            via ePE1 (192.0.2.2), VPN Label: VPN-L22
            via ePE2 (192.0.2.3), VPN Label: VPN-L23

         192.0.2.1/32 (ePE1)
            Via ASBR11, BGP-LU Label: LASBR11(ePE1)
            Via ASBR12, BGP-LU Label: LASBR12(ePE1)
         192.0.2.2/32 (ePE2)
            Via ASBR11, BGP-LU Label: LASBR11(ePE2)
            Via ASBR12, BGP-LU Label: LASBR12(ePE2)
         192.0.2.3/32 (ePE3)
            Via ASBR13, BGP-LU Label: LASBR13(ePE3)

         192.0.1.1/32 (ASBR11)
            via Core, Label:  IGP-L11
         192.0.1.2/32 (ASBR12)
            via Core, Label:  IGP-L12
         192.0.1.3/32 (ASBR13)
            via Core, Label:  IGP-L13

   The diagram in Figure 5 illustrates the forwarding engine picks the second label
      from the Outlabel-Array associated with the flattened pathlist.
      Hence the next label chain in iPE
   assuming that is pushed is "LASBR12(ePE2)"

   o  The forwarding engine now moves to the parent forwarding hardware in iPE supports 3 levels of the flattened
      pathlist corresponding to the second entry.
   hierarchy. The parent is the IGP
      label leaf leaves corresponding to "ASBR12"

   o  So the packet is forwarded towards the ASBR "ASBR12" ABSRs on domain 1
   (ASBR11, ASBR12, and the IGP
      label ASBR13) are at the top will be "L12"

   Based on the above steps, a packet arriving at iPE and destined to bottom of the prefix VPN-L22 reaches its destination as follows hierarchy.
   There are few important points:

   o  iPE sends the packet along the shortest path towards  ASBR12 with  Because the following label stack starting from hardware supports the top: {L12,
      LASBR12(ePE2), VPN-L22}.

   o  The penultimate hop required depth of ASBR12 pops hierarchy,
      the top label "L12". Hence sizes of a pathlist equal the
      packet arrives at ASBR12 with size of the label stack {LASBR12(ePE2),
      VPN-L22} where "LASBR12(ePE2)" is the top label.

   o  ASBR12 swaps "LASBR12(ePE2)" list
      associated with the label "LASBR22(ePE2)",
      which is the label advertised by ASBR22 for the ePE2 (the egress
      PE). leaves using this pathlist

   o  ASBR22 receives the packet with "LASBR22(ePE2)" at  The index inside the top.

   o  Hence ASBR22 swaps "LASBR22(ePE2)" with pathlist entry indicates the IGP label for ePE2
      advertised by the next-hop towards ePE2 in domain 2, and sends
      the packet along the shortest path towards ePE2.

   o  The penultimate hop of ePE2 pops the top label. Hence ePE2
      receives that will
      be picked from the packet Outlabel-List associated with the top label VPN-L22 at child leaf
      if that path is chosen by the top

   o forwarding engine hashing function.

   Outlabel-List                                      Outlabel-List
     For VPN-IP1                                         For VPN-IP2
   +------------+    +--------+           +-------+   +------------+
   |  VPN-L11   |<---| VPN-IP1|           |VPN-IP2|-->|  VPN-L22   |
   +------------+    +---+----+           +---+---+   +------------+
   |  VPN-L21   |        |                    |       |  VPN-L32   |
   +------------+        |                    |       +------------+
                         |                    |
                         V                    V
                    +---+---+            +---+---+
                    | 0 | 1 |            | 0 | 1 |
                    +-|-+-\-+            +-/-+-\-+
                      |    \              /     \
                      |     \            /       \
                      |      \          /         \
                      |       \        /           \
                      v        \      /             \
                 +-----+       +-----+             +-----+
            +----+ ePE1|       |ePE2 +-----+       | ePE3+-----+
            |    +--+--+       +-----+     |       +--+--+     |
            v       |            /         v          |        v
   +-------------+  |           /   +-------------+   | +-------------+
   |LASBR11(ePE1)|  |          /    |LASBR11(ePE2)|   | |LASBR13(ePE3)|
   +-------------+  |         /     +-------------+   | +-------------+
   |LASBR12(ePE1)|  |        /      |LASBR12(ePE2)|   | Outlabel-List
   +-------------+  |       /       +-------------+   |    For ePE3
   Outlabel-List    |      /        Outlabel-List     |
       For ePE1     |     /           For ePE2 pops "VPN-L22"        |
                    |    /                            |
                    |   /                             |
                    |  /                              |
                    v /                               v
                +---+---+  Shared Pathlist          +---+  Pathlist
                | 0 | 1 | For ePE1 and sends the packet as a pure IP packet
      towards the destination VPN-IP2.

5. ePE2         | 0 |  For ePE3
                +-|-+-\-+                           +-|-+
                  |    \                              |
                  |     \                             |
                  |      \                            |
                  |       \                           |
                  v        \                          v
               +------+    +------+               +------+
           +---+ASBR11|    |ASBR12+--+            |ASBR13+---+
           |   +------+    +------+  |            +------+   |
           v                         v                       v
      +-------+                  +-------+              +-------+
      |IGP-L11|                  |IGP-L12|              |IGP-L13|
      +-------+                  +-------+              +-------+

       Figure 5 : Forwarding Chain Adjustment at a Failure

   The hierarchical and shared structure of for hardware supporting 3 Levels

   Now suppose the forwarding chain
   explained in Section hardware on iPE (the ingress PE) supports 2 allows modifying a small number of
   forwarding chain objects to re-route traffic to a pre-calculated
   equal-cost or backup path without the need to modify the possibly
   very large number levels
   of BGP prefixes. hierarchy only. In this section, we go over
   various core and edge failure scenarios to illustrate how FIB
   manager can utilize that case, the 3-levels forwarding chain structure in
   Figure 5 needs to achieve BGP
   prefix independent convergence.

   5.1. BGP-PIC core

   This section describes the adjustments be "flattened" into 2 levels only.

   Outlabel-List                                  Outlabel-List
     For VPN-IP1                                    For VPN-IP2
   +------------+    +-------+      +-------+     +------------+
   |  VPN-L11   |<---|VPN-IP1|      | VPN-IP2|--->|  VPN-L22   |
   +------------+    +---+---+      +---+---+     +------------+
   |  VPN-L21   |        |              |         |  VPN-L32   |
   +------------+        |              |         +------------+
                         |              |
                         |              |
                         |              |
          Flattened      |              |  Flattened
          pathlist       V              V   pathlist
                    +===+===+        +===+===+===+     +=============+
           +--------+ 0 | 1 |        | 0 | 0 | 1 +---->|LASBR11(ePE2)|
           |        +=|=+=\=+        +=/=+=/=+=\=+     +=============+
           v          |    \          /   /     \      |LASBR12(ePE2)|
    +=============+   |     \  +-----+   /       \     +=============+
    |LASBR11(ePE1)|   |      \/         /         \    |LASBR13(ePE3)|
    +=============+   |      /\        /           \   +=============+
    |LASBR12(ePE1)|   |     /  \      /             \
    +=============+   |    /    \    /               \
                      |   /      \  /                 \
                      |  /       +  +                  \
                      |  +       |  |                   \
                      |  |       |  |                    \
                      v  v       v  v                     \
                    +------+    +------+              +------+
               +----|ASBR11|    |ASBR12+---+          |ASBR13+---+
               |    +------+    +------+   |          +------+   |
               v                           v                     v
           +-------+                  +-------+              +-------+
           |IGP-L11|                  |IGP-L12|              |IGP-L13|
           +-------+                  +-------+              +-------+

     Figure 6 : Flattening 3 levels to 2 levels of Hierarchy on iPE

   Figure 6 represents one way to the forwarding chain when "flatten" a core link or node fails but the BGP next-hop remains reachable.

   There are 3 levels hierarchy into
   two case: remote link failure and attached link failure.
   Node failures levels. There are treated as link failures.

   When a remote link or node fails, IGP on the ingress PE receives
   advertisement indicating a topology change so IGP re-converges to
   either find few important points:

   o  As mentioned in Section 5.1 a new next-hop and/or outgoing interface or remove the
   path completely from the IGP prefix used to resolve BGP next-hops.
   IGP and/or LDP download the modified IGP leaves flattened pathlist may have label
      lists associated with modified
   outgoing labels for labeled core.

   When a local link fails, FIB manager detects the failure almost
   immediately. them. The FIB manager marks size of the impacted path(s) as unusable
   so that only useable paths are used to forward packets. Hence only
   IGP pathlists label list associated
      with paths using the failed local link need to be
   modified. All other pathlists are not impacted. Note that in this
   particular case there is actually no need even to backwalk to IGP
   leaves to adjust the OutLabel-Lists because FIB can rely on the
   path-index stored in the useable paths in the a flattened pathlist to pick the
   right label.

   It is noteworthy to mention that because FIB manager modifies equals the
   forwarding chain starting from size of the IGP leaves only, BGP pathlists
   and leaves are not modified. pathlist. Hence traffic restoration occurs within
      it is possible that an implementation includes these label lists
      in the time frame flattened pathlist itself

   o  Again as mentioned in Section 5.1, the size of IGP convergence, and, for local link failure,
   assuming a backup path has been precomputed, within flattened
      pathlist may not be equal to the timeframe size of
   local detection (e.g. 50ms). Examples the OutLabel-lists of solutions that pre-
   computing backup paths are IP FRR [16] remote LFA [17], Ti-LFA [15]
   and MRT [18] or eBGP path having a backup path [10].

   Let's apply
      leaves using the procedure to flattened pathlist. So the forwarding chain depicted in Figure
   2. Suppose indices inside a remote link failure occurs and impacts the first ECMP
   IGP path to
      flattened pathlist still indicate the remote BGP next-hop. Upon IGP convergence, label index in the IGP
   pathlist used by
      Outlabel-Lists of the BGP next-hop is updated to reflect leaves using that pathlist. Because the new
   topology (one path instead
      size of two). As soon as the IGP convergence
   is effective for flattened pathlist may be different from the BGP next-hop entry, size of
      the new forwarding state is
   immediately available to all dependent BGP prefixes. The same
   behavior would occur if OutLabel-lists of the failure was local such as an interface
   going down. As soon as leaves, the IGP convergence is complete for indices may be repeated.

   o  Let's take a look at the BGP
   next-hop IGP route, all its BGP depending routes benefit from flattened pathlist used by the
   new path. In fact, upon local failure, if LFA protection is enabled
   for prefix
      "VPN-IP2", The pathlist associated with the IGP route prefix "VPN-IP2" has
      three entries.

       o The first and second entry have index "0". This is because
         both entries correspond to ePE2. Hence when hashing performed
         by the BGP next-hop and a backup path was pre-
   computed and installed forwarding engine results in using first or the second
         entry in the pathlist, upon the local interface
   failure, forwarding engine will pick the LFA backup path
         correct VPN label "VPN-L22", which is immediately activated (sub-50msec)
   and thus protection benefits all the depending BGP traffic through label advertised by
         ePE2 for the hierarchical forwarding dependency between prefix "VPN-IP2"

       o The third entry has the routes.

   5.2. BGP-PIC edge index "1". This section describes is because the adjustments third
         entry corresponds to ePE3. Hence when the hashing is performed
         by the forwarding chains as a
   result of edge node or edge link failure.

      5.2.1. Adjusting forwarding Chain engine results in egress node failure

   When an edge node fails, IGP on neighboring core nodes send route
   updates indicating that the edge node is no longer reachable. IGP
   running on using the iBGP peers instructs FIB to remove third entry in
         the IP and label
   leaves corresponding to flattened pathlist, the failed edge node from FIB. So FIB
   manager performs forwarding engine will pick the following steps:

   o  FIB manager deletes
         correct VPN label "VPN-L32", which is the IGP leaf corresponding to label advertised by
         "ePE3" for the failed edge
      node

   o  FIB manager backwalks to all dependent BGP pathlists prefix "VPN-IP2"

   Now let's try and marks
      that path using apply the deleted IGP leaf as unresolved

   o  Note that there is no need to modify BGP leaves because each path forwarding steps in Section 4. together
   with the pathlist carries its path index and hence the correct
      outgoing label will be picked. Consider for example additional step in Section 5.1 to the flattened forwarding
   chain depicted illustrated in Figure 2. If 6.

   o  Suppose a packet arrives at "iPE" and matches the 1st BGP VPN prefix
      "VPN-IP2"

   o  The forwarding engine walks to the parent of the "VPN-IP2", which
      is the flattened pathlist and applies a hashing algorithm to pick
      a path
      becomes unresolved, then

   o  Suppose the hashing by the forwarding engine will only use picks the second
      path for forwarding. Yet in the pathindex of that single
      resolved flattened pathlist associated with the leaf "VPN-
      IP2".

   o  Because the second path will still be 1 and hence has the index "0", the label VPN-L12 will be
      pushed.

      5.2.2. Adjusting Forwarding Chain "VPN-L22" is
      pushed on PE-CE link Failure

   Suppose the link between an edge router and its external peer fails.
   There are two scenarios (1) the edge node attached to the failed
   link performs next-hop self and (2) packet

   o  Next the edge node attached to forwarding engine picks the
   failure advertises second label from the IP address of
      Outlabel-Array associated with the failed link as flattened pathlist. Hence the next-hop
   attribute
      next label that is pushed is "LASBR12(ePE2)"

   o  The forwarding engine now moves to its iBGP peers.

   In the first case, the rest of iBGP peers will remain unaware parent of the
   link failure and will continue to forward traffic to the edge node
   until the edge node attached flattened
      pathlist corresponding to the failed link withdraws the BGP
   prefixes. If second path. The parent is the destination prefixes are multi-homed IGP
      label leaf corresponding to another
   iBGP peer, say ePE2, then FIB manager on the edge router detecting "ASBR12"

   o  So the link failure applies packet is forwarded towards the following steps:

   o  FIB manager backwalks to ASBR "ASBR12" and the BGP pathlists marks IGP
      label at the path through top will be "L12"

   Based on the failed link above steps, a packet arriving at iPE and destined to
   the external peer prefix VPN-L22 reaches its destination as unresolved follows

   o  Hence traffic will be forwarded used  iPE sends the backup packet along the shortest path towards ePE2
   o  For labeled traffic

       o The Outlabel-List attached to  ASBR12 with
      the BGP leaf already contains
          an entry corresponding to following label stack starting from the backup path. top: {L12,
      LASBR12(ePE2), VPN-L22}.

   o  The penultimate hop of ASBR12 pops the top label entry in OutLabel-List corresponding to "L12". Hence the
          internal path to backup egress PE has swap action to
      packet arrives at ASBR12 with the label stack {LASBR12(ePE2),
      VPN-L22} where "LASBR12(ePE2)" is the top label.

   o  ASBR12 swaps "LASBR12(ePE2)" with the label "LASBR22(ePE2)",
      which is the label advertised by backup ASBR22 for the ePE2 (the egress PE
      PE).

   o For an arriving label  ASBR22 receives the packet (e.g. VPN), with "LASBR22(ePE2)" at the top label is
          swapped top.

   o  Hence ASBR22 swaps "LASBR22(ePE2)" with the IGP label for ePE2
      advertised by backup egress PE the next-hop towards ePE2 in domain 2, and sends
      the packet is sent along the shortest path towards that backup egress PE ePE2.

   o  For unlabeled traffic, packets are simply redirected  The penultimate hop of ePE2 pops the top label. Hence ePE2
      receives the packet with the top label VPN-L22 at the top

   o  ePE2 pops "VPN-L22" and sends the packet as a pure IP packet
      towards the destination VPN-IP2.

6. Forwarding Chain Adjustment at a Failure

   The hierarchical and shared structure of the forwarding chain
   explained in the previous section allows modifying a small number of
   forwarding chain objects to re-route traffic to a pre-calculated
   equal-cost or backup egress PE.

   In path without the second case where need to modify the possibly
   very large number of BGP prefixes. In this section, we go over
   various core and edge router uses failure scenarios to illustrate how FIB
   manager can utilize the IP address of forwarding chain structure to achieve BGP
   prefix independent convergence.

6.1. BGP-PIC core

   This section describes the
   failed adjustments to the forwarding chain when
   a core link as or node fails but the BGP next-hop, next-hop remains reachable.

   There are two case: remote link failure and attached link failure.
   Node failures are treated as link failures.

   When a remote link or node fails, IGP on the edge router will still perform ingress PE receives
   advertisement indicating a topology change so IGP re-converges to
   either find a new next-hop and/or outgoing interface or remove the previous steps. But, unlike
   path completely from the case of next-hop self, IGP on
   failed edge node informs prefix used to resolve BGP next-hops.
   IGP and/or LDP download the rest of modified IGP leaves with modified
   outgoing labels for labeled core.

   When a local link fails, FIB manager detects the iBGP peers failure almost
   immediately. The FIB manager marks the impacted path(s) as unusable
   so that IP address
   of only useable paths are used to forward packets. Hence only
   IGP pathlists with paths using the failed local link need to be
   modified. All other pathlists are not impacted. Note that in this
   particular case there is actually no longer reachable. Hence the FIB manager on
   iBGP peers will delete the need even to backwalk to IGP leaf corresponding
   leaves to adjust the IP prefix
   of OutLabel-Lists because FIB can rely on the failed link. The behavior of
   path-index stored in the iBGP peers will be identical useable paths in the pathlist to pick the case of edge node failure outlined in Section 5.2.1.
   right label.

   It is noteworthy to mention that because FIB manager modifies the edge link failure is
   forwarding chain starting from the IGP leaves only. BGP pathlists
   and leaves are not modified. Hence traffic restoration occurs within
   the time frame of IGP convergence, and, for local to link failure,
   assuming a backup path has been precomputed, within the edge router, sub-50 msec convergence can be achieved as
   described in timeframe of
   local detection (e.g. 50ms). Examples of solutions that pre-
   computing backup paths are IP FRR [16] remote LFA [17], Ti-LFA [15]
   and MRT [18] or eBGP path having a backup path [10].

   Let's try to apply the case of next-hop self procedure mentioned in this subsection to the
   forwarding chain depicted in Figure 3. After 2. Suppose a remote link failure
   occurs and impacts the first ECMP IGP path to the remote BGP next-
   hop. Upon IGP convergence, the IGP pathlist used by the BGP next-hop
   is updated to reflect the new topology (one path instead of two). As
   soon as the link between ePE1 and CE, IGP convergence is effective for the BGP next-hop entry,
   the new forwarding state is immediately available to all dependent
   BGP prefixes. The same behavior would occur if the failure was local
   such as an interface going down. As soon as the IGP convergence is
   complete for the BGP next-hop IGP route, all its BGP depending
   routes benefit from the new path. In fact, upon local failure, if
   LFA protection is enabled for the forwarding engine will IGP route traffic arriving from to the core
   towards VPN-NH2 with path-index=1. A packet arriving from BGP next-hop and
   a backup path was pre-computed and installed in the core
   will contain pathlist, upon
   the label VPN-L11 at top. The label VPN-L11 is swapped
   with local interface failure, the label VPN-L21 LFA backup path is immediately
   activated (e.g. sub-50msec) and thus protection benefits all the packet is forwarded towards ePE2.

   5.3. Handling Failures for Flattended Forwarding Chains

   As explained in
   depending BGP traffic through the Example in Section 3.2 if hierarchical forwarding dependency
   between the number of
   hierarchy levels of a platform cannot support routes.

6.2. BGP-PIC edge

   This section describes the native number of
   hierarchy levels of a recursive forwarding chain, adjustments to the instantiated forwarding chain is constructed by flattening two or more levels.
   Hence chains as a 3 levels chain
   result of edge node or edge link failure.

6.2.1. Adjusting forwarding Chain in Figure 5 egress node failure

   When an edge node fails, IGP on neighboring core nodes send route
   updates indicating that the edge node is flattened into no longer reachable. IGP
   running on the 2 levels
   chain in Figure 6.

   While reducing iBGP peers instructs FIB to remove the benefits of BGP-PIC, flattening one hierarchy
   into a shallower hierarchy does not always result in a complete loss
   of IP and label
   leaves corresponding to the benefits of failed edge node from FIB. So FIB
   manager performs the BGP-PIC. To illustrate this fact suppose
   ASBR12 following steps:

   o  FIB manager deletes the IGP leaf corresponding to the failed edge
      node

   o  FIB manager backwalks to all dependent BGP pathlists and marks
      that path using the deleted IGP leaf as unresolved

   o  Note that there is no longer reachable need to modify the possibly large number of
      BGP leaves because each path in domain 1. If the platform supports pathlist carries its path
      index and hence the full hierarchy depth, correct outgoing label will be picked.
      Consider for example the forwarding chain is the one depicted in Figure 5 and hence the FIB manager needs to backwalk one level to
   the pathlist shared by "ePE1" and "ePE2" and adjust it. 2.
      If the
   platform supports 2 levels of hierarchy, 1st BGP path becomes unresolved, then a useable the forwarding
   chain is
      engine will only use the one depicted in Figure 6. In second path for forwarding. Yet the path
      index of that case, if ASBR12 is no
   longer reachable, single resolved path will still be 1 and hence the FIB manager has to backwalk to
      label VPN-L12 will be pushed.

6.2.2. Adjusting Forwarding Chain on PE-CE link Failure

   Suppose the two
   flattened pathlists link between an edge router and update both of them.

   The main observation is that its external peer fails.
   There are two scenarios (1) the loss of convergence speed due edge node attached to the loss of hierarchy depth depends on the structure of failed
   link performs next-hop self and (2) the
   forwarding chain itself. To illustrate this fact, let's take two
   extremes. Suppose edge node attached to the forwarding objects in level i+1 depend on
   failure advertises the
   forwarding objects in level i. If every object on level i+1 depends
   on a separate object in level i, then flattening level i into level
   i+1 will not result in loss IP address of convergence speed. Now let's take the
   other extreme. Suppose "n" objects in level i+1 depend on 1 object
   in level i. Now suppose FIB flattens level i into level i+1. If a
   topology change results in modifying failed link as the single object in level i,
   then FIB has next-hop
   attribute to backwalk and modify "n" objects in its iBGP peers.

   In the flattened
   level, thereby losing all first case, the benefit rest of BGP-PIC. Experience shows
   that flattening forwarding chains usually results in moderate loss iBGP peers will remain unaware of BGP-PIC benefits. Further analysis is needed to corroborate the
   link failure and
   quantify this statement.

6. Properties

   6.1. Coverage

   All will continue to forward traffic to the possible failures, except CE edge node failure, are covered,
   whether they impact a local or remote IGP path or a local or remote
   until the edge node attached to the failed link withdraws the BGP next-hop as described in Section 5.  This section provides
   details for each failure and now
   prefixes. If the hierarchical and shared destination prefixes are multi-homed to another
   iBGP peer, say ePE2, then FIB
   structure proposed in this document allows recovery that does not
   depend manager on number of BGP prefixes.

      6.1.1. A remote the edge router detecting
   the link failure on applies the following steps:

   o  FIB manager backwalks to the BGP pathlists marks the path through
      the failed link to the external peer as unresolved

   o  Hence traffic will be forwarded used the backup path towards ePE2

   o  For labeled traffic

       o The Outlabel-List attached to the BGP leaf already contains
          an entry corresponding to the backup path.

       o The label entry in OutLabel-List corresponding to the
          internal path to a BGP next-hop

   Upon IGP convergence, backup egress PE has swap action to the IGP leaf for
          label advertised by backup egress PE

       o For an arriving label packet (e.g. VPN), the BGP next-hop top label is updated
   upon IGP convergence
          swapped with the label advertised by backup egress PE and all the
          packet is sent towards that backup egress PE

   o  For unlabeled traffic, packets are simply redirected towards
      backup egress PE.

   In the second case where the edge router uses the IP address of the
   failed link as the BGP depending routes leverage next-hop, the
   new edge router will still perform
   the previous steps. But, unlike the case of next-hop self, IGP forwarding state immediately.

   This BGP resiliency property only depends on IGP convergence and is
   independent
   failed edge node informs the rest of the number iBGP peers that IP address
   of BGP prefixes impacted.

      6.1.2. A local failure on the path to a BGP next-hop

   Upon LFA protection, failed link is no longer reachable. Hence the FIB manager on
   iBGP peers will delete the IGP leaf for the BGP next-hop is updated corresponding to
   use the precomputed LFA backup path and all the BGP depending routes
   leverage this LFA protection.

   This BGP resiliency property only depends on LFA protection and is
   independent IP prefix
   of the number failed link. The behavior of BGP prefixes impacted.

      6.1.3. A remote iBGP next-hop fails

   Upon IGP convergence, the IGP leaf for iBGP peers will be identical
   to the BGP next-hop case of edge node failure outlined in Section 6.2.1.

   It is deleted
   and all the depending BGP Path-Lists are updated noteworthy to either use mention that because the
   remaining ECMP BGP best-paths or if none remains available edge link failure is
   local to
   activate precomputed backups.

   This BGP resiliency property only depends on IGP the edge router, sub-50 msec convergence and is
   independent of can be achieved as
   described in [10].

   Let's try to apply the number case of BGP prefixes impacted.

      6.1.4. A local eBGP next-hop fails

   Upon local link self to the forwarding chain
   depicted in Figure 3. After failure detection, of the adjacency to link between ePE1 and CE,
   the BGP next-hop forwarding engine will route traffic arriving from the core
   towards VPN-NH2 with path-index=1. A packet arriving from the core
   will contain the label VPN-L11 at top. The label VPN-L11 is deleted swapped
   with the label VPN-L21 and all the depending BGP pathlists are updated to either
   use packet is forwarded towards ePE2.

6.3. Handling Failures for Flattened Forwarding Chains

   As explained in the remaining ECMP BGP best-paths or in Section 5 if none remains available
   to activate precomputed backups.

   This BGP resiliency property only depends on local link failure
   detection and is independent the number of hierarchy levels
   of a platform cannot support the native number of BGP prefixes impacted.

   6.2. Performance

   When hierarchy levels
   of a recursive forwarding chain, the failure instantiated forwarding chain
   is local (a local IGP next-hop failure constructed by flattening two or more levels. Hence a local
   eBGP next-hop failure), a pre-computed and pre-installed backup 3 levels
   chain in Figure 5 is
   activated by flattened into the 2 levels chain in Figure 6.

   While reducing the benefits of BGP-PIC, flattening one hierarchy
   into a local-protection mechanism that shallower hierarchy does not depend on
   the number always result in a complete loss
   of BGP destinations impacted by the failure. Sub-50msec
   is thus possible even if millions benefits of BGP routes are impacted.

   When the failure BGP-PIC. To illustrate this fact suppose
   ASBR12 is remote (a remote IGP failure not impacting no longer reachable in domain 1. If the
   BGP next-hop or a remote BGP next-hop failure), an alternate path is
   activated upon IGP convergence. All platform supports
   the impacted BGP destinations
   benefit from a working alternate path as soon as full hierarchy depth, the IGP convergence
   occurs for their impacted BGP next-hop even if millions of BGP
   routes are impacted.

      6.2.1. Perspective

   The following table puts forwarding chain is the BGP PIC benefits one depicted
   in perspective
   assuming

   o  1M impacted BGP prefixes

   o  IGP convergence ~ 500 msec

   o  local protection ~ 50msec

   o Figure 5 and hence the FIB Update per BGP destination ~ 100usec conservative,
                                     ~ 10usec optimistic

   o  BGP Convergence per BGP destination ~ 200usec conservative,

                                          ~ 100usec optimistic

                                 Without PIC                With PIC

   Local IGP Failure             10 manager needs to 100sec                50msec

   Local BGP Failure            100 backwalk one level to 200sec                50msec

   Remote IGP Failure            10
   the pathlist shared by "ePE1" and "ePE2" and adjust it. If the
   platform supports 2 levels of hierarchy, then a useable forwarding
   chain is the one depicted in Figure 6. In that case, if ASBR12 is no
   longer reachable, the FIB manager has to 100sec               500msec

   Local BGP Failure            100 backwalk to 200sec               500msec

   Upon local IGP next-hop failure or remote IGP next-hop failure, the
   existing primary BGP next-hop two
   flattened pathlists and updates both of them.

   The main observation is intact and usable hence that the
   resiliency only loss of convergence speed due to
   the loss of hierarchy depth depends on the ability structure of the
   forwarding chain itself. To illustrate this fact, let's take two
   extremes. Suppose the forwarding objects in level i+1 depend on the
   forwarding objects in level i. If every object on level i+1 depends
   on a separate object in level i, then flattening level i into level
   i+1 will not result in loss of convergence speed. Now let's take the
   other extreme. Suppose "n" objects in level i+1 depend on 1 object
   in level i. Now suppose FIB mechanism to
   reflect flattens level i into level i+1. If a
   topology change results in modifying the new path single object in level i,
   then FIB has to backwalk and modify "n" objects in the BGP next-hop flattened
   level, thereby losing all the benefit of BGP-PIC. Experience shows
   that flattening forwarding chains usually results in moderate loss
   of BGP-PIC benefits. Further analysis is needed to corroborate and
   quantify this statement.

7. Properties

7.1. Coverage

   All the depending BGP
   destinations. Without BGP PIC, possible failures, except CE node failure, are covered,
   whether they impact a local or remote IGP path or a conservative back-of-the-envelope
   estimation for this FIB update is 100usec per BGP destination. An
   optimistic estimation is 10usec per entry.

   Upon local or remote
   BGP next-hop as described in Section 6.  This section provides
   details for each failure or and now the hierarchical and shared FIB
   structure proposed in this document allows recovery that does not
   depend on number of BGP prefixes.

7.1.1. A remote failure on the path to a BGP next-hop failure,
   without

   Upon IGP convergence, the IGP leaf for the BGP PIC mechanism, a new BGP Best-Path needs to be
   recomputed next-hop is updated
   upon IGP convergence and all the BGP depending routes leverage the
   new updates need to IGP forwarding state immediately. Details of this behavior can
   be sent to peers. found in Section 6.1.

   This BGP resiliency property only depends on
   BGP processing time that will be shared between best-path
   computation, RIB update IGP convergence and peer update. A conservative back-of-the-
   envelope estimation for this is 200usec per BGP destination. An
   optimistic estimation is 100usec per entry.

   6.3. Automated

   The BGP PIC solution does not require any operator involvement. The
   process is entirely automated as part
   independent of the FIB implementation.

   The salient points enabling this automation are:

   o  Extension number of the BGP Best Path prefixes impacted.

7.1.2. A local failure on the path to compute more than one primary
      ([11]and [12]) or backup BGP next-hop ([6] and [13]).

   o  Sharing of BGP Path-list across BGP destinations with same
      primary and backup a BGP next-hop

   o  Hierarchical indirection and dependency between BGP pathlist and

   Upon LFA protection, the IGP pathlist

   6.4. Incremental Deployment

   As soon as one router supports BGP PIC solution, it benefits from
   all its benefits without any requirement leaf for other routers to
   support BGP PIC.

7. Dependency

   This section describes the required functionality in BGP next-hop is updated to
   use the forwarding precomputed LFA backup path and control planes to support BGP-PIC described in this document

   7.1. Hierarchical Hardware FIB all the BGP PIC requires a hierarchical hardware FIB support: for each depending routes
   leverage this LFA protection. Details of this behavior can be found
   in Section 6.1.

   This BGP
   forwarded packet, a resiliency property only depends on LFA protection and is
   independent of the number of BGP prefixes impacted.

7.1.3. A remote iBGP next-hop fails

   Upon IGP convergence, the IGP leaf is looked up, then a for the BGP Pathlist next-hop is
   consulted, then an IGP Pathlist, then an Adjacency.

   An alternative method consists in "flattening" deleted
   and all the dependencies when
   programming depending BGP Path-Lists are updated to either use the
   remaining ECMP BGP destinations into HW FIB resulting best-paths or if none remains available to
   activate precomputed backups. Details about this behavior can be
   found in
   potentially eliminating both the Section 6.2.1.

   This BGP Path-List and resiliency property only depends on IGP Path-List
   consultation. Such an approach decreases convergence and is
   independent of the number of memory
   lookup's per forwarding operation at BGP prefixes impacted.

7.1.4. A local eBGP next-hop fails

   Upon local link failure detection, the expense of HW FIB memory
   increase (flattening means less sharing hence duplication), loss of adjacency to the BGP next-hop
   is deleted and all the depending BGP pathlists are updated to either
   use the remaining ECMP properties (flattening means less pathlist entropy) BGP best-paths or if none remains available
   to activate precomputed backups. Details about this behavior can be
   found in Section 6.2.2.

   This BGP resiliency property only depends on local link failure
   detection and loss is independent of
   BGP PIC properties.

   7.2. Availability the number of more than one primary or secondary BGP next-
      hops prefixes impacted.

7.2. Performance

   When the primary BGP failure is local (a local IGP next-hop fails, BGP PIC depends failure or a local
   eBGP next-hop failure), a pre-computed and pre-installed backup is
   activated by a local-protection mechanism that does not depend on
   the
   availability number of a pre-computed and pre-installed secondary BGP next-
   hop in destinations impacted by the BGP Pathlist.

   The existence failure. Sub-50msec
   is thus possible even if millions of a secondary next-hop BGP routes are impacted.

   When the failure is clear for remote (a remote IGP failure not impacting the following
   reason: a service caring for network availability will require two
   disjoint network connections hence two
   BGP next-hops.

   The next-hop or a remote BGP distribution of the secondary next-hop failure), an alternate path is available thanks
   to
   activated upon IGP convergence. All the following BGP mechanisms: Add-Path [11], impacted BGP Best-External
   [6], diverse destinations
   benefit from a working alternate path [12], and as soon as the frequent use in VPN deployments IGP convergence
   occurs for their impacted BGP next-hop even if millions of
   different VPN RD's per PE. It is noteworthy to mention that BGP
   routes are impacted.

   Appendix A puts the
   availability of another BGP path PIC benefits in perspective by providing
   some results using actual numbers.

7.3. Automated

   The BGP PIC solution does not mean that all failure
   scenarios can be covered by simply forwarding traffic to the
   available secondary path. require any operator involvement. The discussion of how to cover various
   failure scenarios
   process is beyond the scope entirely automated as part of the FIB implementation.

   The salient points enabling this document
   7.3. Pre-Computation automation are:

   o  Extension of a secondary the BGP Best Path to compute more than one primary
      ([11]and [12]) or backup BGP next-hop

   [13] describes how a secondary ([6] and [13]).

   o  Sharing of BGP Path-list across BGP destinations with same
      primary and backup BGP next-hop

   o  Hierarchical indirection and dependency between BGP pathlist and
      IGP pathlist

7.4. Incremental Deployment

   As soon as one router supports BGP next-hop can be precomputed on a
   per PIC solution, it benefits from
   all its benefits without any requirement for other routers to
   support BGP destination basis. PIC.

8. Security Considerations

   The behavior described in this document is internal functionality
   to a router that result in significant improvement to convergence
   time as well as reduction in CPU and memory used by FIB while not
   showing change in basic routing and forwarding functionality. As
   such no additional security risk is introduced by using the
   mechanisms proposed in this document.

9. IANA Considerations

   No requirements for IANA

10. Conclusions

   This document proposes a hierarchical and shared forwarding chain
   structure that allows achieving BGP prefix independent
   convergence, and in the case of locally detected failures, sub-50
   msec convergence. A router can construct the forwarding chains in
   a completely transparent manner with zero operator intervention
   thereby supporting smooth and incremental deployment.

11. References

11.1. Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.

   [2]   Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol
         4 (BGP-4), RFC 4271, January 2006

   [3]   Bates, T., Chandra, R., Katz, D., and Rekhter Y.,
         "Multiprotocol Extensions for BGP", RFC 4760, January 2007

   [4]   Y. Rekhter and E. Rosen, " Carrying Label Information in BGP-
         4", RFC 3107, May 2001

   [5]   Andersson, L., Minei, I., and B. Thomas, "LDP Specification",
         RFC 5036, October 2007

11.2. Informative References

   [6]   Marques,P., Fernando, R., Chen, E, Mohapatra, P., Gredler, H.,
         "Advertisement of the best external route in BGP", draft-ietf-
         idr-best-external-05.txt, January 2012.

   [7]   Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
         Framework", RFC 5565, June 2009.

   [8]   Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
         Networks (VPNs)", RFC 4364, February 2006.

   [9]   De Clercq, J. , Ooms, D., Prevost, S., Le Faucheur, F.,
         "Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
         Edge Routers (6PE)", RFC 4798, February 2007

   [10]  O. Bonaventure, C. Filsfils, and P. Francois. "Achieving sub-
         50 milliseconds recovery upon bgp peering link failures, "
         IEEE/ACM Transactions on Networking, 15(5):1123-1135, 2007

   [11]  D. Walton, A. Retana, E. Chen, J. Scudder, "Advertisement of
         Multiple Paths in BGP", draft-ietf-idr-add-paths-12.txt,
         November 2015

   [12]  R. Raszuk, R. Fernando, K. Patel, D. McPherson, K. Kumaki,
         "Distribution of diverse BGP paths", RFC 6774, November 2012

   [13]  P. Mohapatra, R. Fernando, C. Filsfils, and R. Raszuk, "Fast
         Connectivity Restoration Using BGP Add-path", draft-pmohapat-
         idr-fast-conn-restore-03, Jan 2013

   [14]  C. Filsfils, S. Previdi, A. Bashandy, B. Decraene, S.
         Litkowski, M. Horneffer, R. Shakir, J. Tansura, E. Crabbe
         "Segment Routing with MPLS data plane", draft-ietf-spring-
         segment-routing-mpls-02 (work in progress), October 2015

   [15]  C. Filsfils, S. Previdi, A. Bashandy, B. Decraene, " Topology
         Independent Fast Reroute using Segment Routing", draft-
         francois-spring-segment-routing-ti-lfa-02 (work in progress),
         August 2015

   [16]  M. Shand and S. Bryant, "IP Fast Reroute Framework", RFC 5714,
         January 2010

   [17]  S. Bryant, C. Filsfils, S. Previdi, M. Shand, N So, " Remote
         Loop-Free Alternate (LFA) Fast Reroute (FRR)", RFC 7490 April
         2015

   [18]  A. Atlas, C. Bowers, G. Enyedi, " An Architecture for IP/LDP
         Fast-Reroute Using Maximally Redundant Trees", draft-ietf-
         rtgwg-mrt-frr-architecture-10 (work in progress), February
         2016

12. Acknowledgments

   Special thanks to Neeraj Malhotra, Yuri Tsier for the valuable
   help

   Special thanks to Bruno Decraene for the valuable comments

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Ahmed Bashandy
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134, USA
   Email: bashandy@cisco.com

   Clarence Filsfils
   Cisco Systems
   Brussels, Belgium
   Email: cfilsfil@cisco.com

   Prodosh Mohapatra
   Sproute Networks
   Email: mpradosh@yahoo.com

Appendix A.                 Perspective

   The following table puts the BGP PIC benefits in perspective
   assuming

   o  1M impacted BGP prefixes

   o  IGP convergence ~ 500 msec

   o  local protection ~ 50msec

   o  FIB Update per BGP destination ~ 100usec conservative,

                                     ~ 10usec optimistic

   o  BGP Convergence per BGP destination ~ 200usec conservative,

                                          ~ 100usec optimistic

                                 Without PIC                With PIC

   Local IGP Failure             10 to 100sec                50msec

   Local BGP Failure            100 to 200sec                50msec

   Remote IGP Failure            10 to 100sec               500msec

   Local BGP Failure            100 to 200sec               500msec

   Upon local IGP next-hop failure or remote IGP next-hop failure, the
   existing primary BGP next-hop is intact and usable hence the
   resiliency only depends on the ability of the FIB mechanism to
   reflect the new path to the BGP next-hop to the depending BGP
   destinations. Without BGP PIC, a conservative back-of-the-envelope
   estimation for this FIB update is 100usec per BGP destination. An
   optimistic estimation is 10usec per entry.

   Upon local BGP next-hop failure or remote BGP next-hop failure,
   without the BGP PIC mechanism, a new BGP Best-Path needs to be
   recomputed and new updates need to be sent to peers. This depends on
   BGP processing time that will be shared between best-path
   computation, RIB update and peer update. A conservative back-of-the-
   envelope estimation for this is 200usec per BGP destination. An
   optimistic estimation is 100usec per entry.