Network Working Group                                   A. Bashandy, Ed.
Internet Draft                                                C. Filsfils
Intended status: Informational                              Cisco Systems
Expires: December 2016                                       P. Mohapatra
                                                          Sproute Networks
                                                             June 20, 2016

                    BGP Prefix Independent Convergence
                      draft-ietf-rtgwg-bgp-pic-01.txt
Abstract

   In a network comprising thousands of iBGP peers exchanging millions
   of routes, many routes are reachable via more than one next-hop.
   Given the large scaling targets, it is desirable to restore traffic
   after failure in a time period that does not depend on the number of
   BGP prefixes. In this document we propose an architecture by which
   traffic can be re-routed to ECMP or pre-calculated backup paths in a
   timeframe that does not depend on the number of BGP prefixes. The
   objective is achieved through organizing the forwarding data
   structures in a hierarchical manner and sharing forwarding elements
   among the maximum possible number of routes. The proposed technique
   achieves prefix independent convergence while ensuring incremental
   deployment, complete automation, and zero management and
   provisioning effort. It is noteworthy to mention that the benefits
   of BGP-PIC are hinged on the existence of more than one path,
   whether as ECMP or primary-backup.
Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 20, 2016.
Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1. Introduction...................................................3
      1.1. Conventions used in this document.........................4
      1.2. Terminology...............................................4
   2. Overview.......................................................5
   3. Constructing the Shared Hierarchical Forwarding Chain..........7
      3.1. Example 1: Primary-Backup Path Scenario...................8
      3.2. Example 2: Platforms with Limited Levels of Hierarchy.....9
   4. Forwarding Behavior...........................................13
   5. Forwarding Chain Adjustment at a Failure......................15
      5.1. BGP-PIC core.............................................16
      5.2. BGP-PIC edge.............................................17
         5.2.1. Adjusting forwarding Chain in egress node failure...17
         5.2.2. Adjusting Forwarding Chain on PE-CE link Failure....17
      5.3. Handling Failures for Flattened Forwarding Chains........18
   6. Properties....................................................19
      6.1. Coverage.................................................19
         6.1.1. A remote failure on the path to a BGP next-hop......19
         6.1.2. A local failure on the path to a BGP next-hop.......19
         6.1.3. A remote iBGP next-hop fails........................20
         6.1.4. A local eBGP next-hop fails.........................20
      6.2. Performance..............................................20
         6.2.1. Perspective.........................................20
      6.3. Automated................................................21
      6.4. Incremental Deployment...................................22
   7. Dependency....................................................22
      7.1. Hierarchical Hardware FIB................................22
      7.2. Availability of more than one primary or secondary BGP
           next-hops................................................22
      7.3. Pre-Computation of a secondary BGP next-hop..............23
   8. Security Considerations.......................................23
   9. IANA Considerations...........................................23
   10. Conclusions..................................................23
   11. Acknowledgments..............................................25
   12. References...................................................23
      12.1. Normative References....................................23
      12.2. Informative References..................................24
1. Introduction

   As a path vector protocol, BGP is inherently slow due to the
   serial nature of reachability propagation. BGP speakers exchange
   reachability information about prefixes [2][3] and, for labeled
   address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an
   edge router assigns local labels to prefixes and associates the
   local label with each advertised prefix such as L3VPN [8], 6PE
   [9], and Softwire [7] using the BGP labeled unicast technique [4].
   A BGP speaker then applies the path selection steps to choose the
   best path. In modern networks, it is not uncommon to have a prefix
   reachable via multiple edge routers. In addition to proprietary
   techniques, multiple techniques have been proposed to allow BGP to
   advertise more than one path for a given prefix [6][11][12],
   whether in the form of equal cost multipath or primary-backup.
   Another more common and widely deployed scenario is L3VPN with
   multi-homed VPN sites with unique Route Distinguishers.

   This document proposes a hierarchical and shared forwarding chain
   organization that allows traffic to be restored to a pre-calculated
   alternative equal-cost primary path or backup path in a time
   period that does not depend on the number of BGP prefixes. The
   technique relies on internal router behavior that is completely
   transparent to the operator and can be incrementally deployed and
   enabled with zero operator intervention.
1.1. Conventions used in this document

1.2. Terminology

   o BGP prefix: It is a prefix P/m (of any AFI/SAFI) that a BGP
     speaker has a path for.
   o IGP prefix: It is a prefix P/m (of any AFI/SAFI) that is learnt
     via an Interior Gateway Protocol, such as OSPF or ISIS. The
     prefix may be learnt directly through the IGP or redistributed
     from other protocol(s).

   o CE: It is an external router through which an egress PE can
     reach a prefix P/m.

   o Ingress PE, "iPE": It is a BGP speaker that learns about a prefix
     through an iBGP peer and chooses an egress PE as the next-hop for
     the prefix.

   o Path: It is the next-hop in a sequence of unique connected
     nodes starting from the current node and ending with the
     destination node or network identified by the prefix.

   o Recursive path: It is a path consisting only of the IP address
     of the next-hop without the outgoing interface. Subsequent
     lookups are needed to determine the outgoing interface.

   o Non-recursive path: It is a path consisting of the IP address
     of the directly connected next-hop and the outgoing interface.

   o Label leaf: It is the leaf corresponding to a locally allocated
     label such as the VPN label on an egress PE [8].

   o Pathlist: It is an array of paths used by one or more prefixes to
     forward traffic to destination(s) covered by an IP prefix. Each
     path in the pathlist carries its "path-index" that identifies
     its position in the array of paths. A pathlist may contain a
     mix of primary and backup paths.

   o OutLabel-List: Each labeled prefix is associated with an
     OutLabel-List. The OutLabel-List is an array of one or more
     outgoing labels and/or label actions where each label or label
     action has a 1-to-1 correspondence to a path in the pathlist.
     Label actions are: push the label, pop the label, or swap the
     incoming label with the label in the OutLabel-List entry. The
     prefix may be an IGP or BGP prefix.

   o Adjacency: It is the layer 2 encapsulation leading to the layer
     3 directly connected next-hop.

   o Dependency: An object X is said to be a dependent or child of
     object Y if object Y cannot be deleted unless object X is no
     longer a dependent/child of object Y.

   o Route: It is a prefix with one or more paths associated with
     it. Hence the minimum set of objects needed to construct a
     route is a leaf and a pathlist.
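
   For illustration only, the relationships among these objects can be
   sketched as simple data structures. The following Python fragment is
   a non-normative sketch; all class and field names are local to this
   sketch and do not refer to any particular implementation.

      # Non-normative sketch of the forwarding objects defined above.
      from dataclasses import dataclass, field
      from typing import List, Optional

      @dataclass
      class Adjacency:        # layer 2 encapsulation to a connected next-hop
          interface: str
          l2_header: bytes = b""

      @dataclass
      class Path:             # one entry in a pathlist
          next_hop: str
          path_index: int     # position of this path in its pathlist
          interface: Optional[str] = None  # only for non-recursive paths
          parent: object = None  # resolving Leaf (recursive) or Adjacency
          usable: bool = True

      @dataclass
      class Pathlist:
          paths: List[Path]
          dependents: list = field(default_factory=list)  # sharing leaves

      @dataclass
      class Leaf:             # IP prefix leaf or label leaf
          key: str            # prefix "P/m" or locally allocated label
          pathlist: Pathlist
          out_labels: List[str] = field(default_factory=list)  # OutLabel-List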
2. Overview

   The idea of BGP-PIC is based on two pillars:

   o A shared hierarchical Forwarding Chain

   o A forwarding plane that supports multiple levels of indirection

   To illustrate the two pillars above, we will use an example of a
   simple multihomed L3VPN [8] prefix in a BGP-free core running LDP
   [5] or segment routing over MPLS forwarding plane [14].

      +--------------------------------+
      |                                |
      |                              ePE2
      |                                |  \
      |                                |   \
      |                                |    \
     iPE                               |    CE.......VRF "Blue"
      |                                |    /         (VPN-IP1)
      |                                |   /          (VPN-IP2)
      |  LDP/Segment-Routing Core      |  /
      |                              ePE1
      |                                |
      +--------------------------------+

           Figure 1 VPN prefix reachable via multiple PEs

   Referring to Figure 1, suppose the iPE (the ingress PE) receives
   NLRIs for the VPN prefixes VPN-IP1 and VPN-IP2 from two egress PEs,
   ePE1 and ePE2, with next-hops BGP-NH1 and BGP-NH2, respectively.
   Assume that ePE1 advertises the VPN labels VPN-L11 and VPN-L12
   while ePE2 advertises the VPN labels VPN-L21 and VPN-L22 for
   VPN-IP1 and VPN-IP2, respectively. Suppose that BGP-NH1 and BGP-NH2
   are resolved via the IGP prefixes IGP-IP1 and IGP-IP2, where each
   happens to have 2 ECMP paths with IGP-NH1 and IGP-NH2 reachable via
   the interfaces I1 and I2, respectively. Suppose that the local
   labels (whether LDP [5] or segment routing [14]) on the downstream
   LSRs for IGP-IP1 are IGP-L11 and IGP-L12 while those for IGP-IP2
   are IGP-L21 and IGP-L22.

   Based on the information about NLRIs and the resolving IGP prefixes,
   a hierarchical forwarding chain can be constructed as shown in
   Figure 2.
    IP Leaf:    Pathlist:                 IP Leaf:       Pathlist:
    --------    +-------+                 --------       +----------+
    VPN-IP1---->|BGP-NH1|-->IGP-IP1(BGP-NH1)------------>|IGP NH1,I1|--->Adjacency1
       |        |BGP-NH2|-->....              |          |IGP NH2,I2|--->Adjacency2
       |        +-------+                     |          +----------+
       |                                      |
       |                                      |
       v                                      v
    OutLabel-List:                        OutLabel-List:
    +----------------------+              +----------------------+
    |VPN-L11 (VPN-IP1, NH1)|              |IGP-L11 (IGP-IP1, NH1)|
    |VPN-L21 (VPN-IP1, NH2)|              |IGP-L12 (IGP-IP1, NH2)|
    +----------------------+              +----------------------+

          Figure 2 Shared Hierarchical Forwarding Chain at iPE

   The forwarding chain depicted in Figure 2 illustrates the first
   pillar, which is sharing and hierarchy. We can see that the BGP
   pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs
   reachable via ePE1 and ePE2. As such, it is possible to make changes
   to the pathlist without having to make changes to the NLRIs. For
   example, if BGP-NH2 becomes unreachable, there is no need to
   modify any of the possibly large number of NLRIs. Instead only the
   shared pathlist needs to be modified. Likewise, due to the
   hierarchical structure of the forwarding chain, it is possible to
   make modifications to the IGP routes without having to make any
   changes to the BGP NLRIs. For example, if the interface "I2" goes
   down, only the shared IGP pathlist needs to be updated, but none of
   the IGP prefixes sharing the IGP pathlist nor the BGP NLRIs using
   the IGP prefixes for resolution need to be modified.
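
   For illustration only, the effect of sharing can be sketched as
   follows, reusing the non-normative object sketch from Section 1.2
   (the prefix and label names follow the example above).

      # Non-normative sketch: because the NLRIs point at one shared
      # pathlist, removing a failed BGP next-hop touches one object
      # rather than every prefix.
      pl = Pathlist(paths=[Path("BGP-NH1", 0), Path("BGP-NH2", 1)])
      nlris = [Leaf("VPN-IP%d" % i, pl, ["VPN-L1%d" % i, "VPN-L2%d" % i])
               for i in (1, 2)]              # in practice, many thousands

      # BGP-NH2 becomes unreachable: one update to the shared pathlist ...
      pl.paths = [p for p in pl.paths if p.next_hop != "BGP-NH2"]

      # ... and every dependent NLRI now forwards over BGP-NH1 only.
      assert all(len(leaf.pathlist.paths) == 1 for leaf in nlris)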
   Figure 2 can also be used to illustrate the second BGP-PIC pillar.
   Having a deep forwarding chain such as the one illustrated in Figure
   2 requires a forwarding plane that is capable of accessing multiple
   levels of indirection in order to calculate the outgoing
   interface(s) and next-hop(s). While a deeper forwarding chain
   minimizes the re-convergence time on topology change, there will
   always exist platforms with limited capabilities, imposing a limit
   on the depth of the forwarding chain. The example in Section 3.2
   illustrates how to gracefully trade off convergence speed with the
   number of hierarchical levels to support platforms with different
   capabilities.

3. Constructing the Shared Hierarchical Forwarding Chain

   Constructing the forwarding chain is an application of the two
   pillars described in Section 2.
   The whole process starts when BGP downloads a prefix to FIB. The
   prefix contains one or more outgoing paths. For certain labeled
   prefixes, such as VPN [8] prefixes, each path may be associated with
   an outgoing label and the prefix itself may be assigned a local
   label. The list of outgoing paths defines a pathlist. If such a
   pathlist does not already exist, then FIB creates a new pathlist,
   otherwise the existing pathlist is used. The BGP prefix is added as
   a dependent of the pathlist.

   The previous step constructs the upper part of the hierarchical
   forwarding chain. The forwarding chain is completed by resolving the
   paths of the pathlist. A BGP path usually consists of a next-hop.
   The next-hop is resolved by finding a matching IGP prefix.

   The end result is a hierarchical shared forwarding chain where the
   BGP pathlist is shared by all BGP prefixes that use the same list of
   paths and the IGP prefix is shared by all pathlists that have a path
   resolving via that IGP prefix. It is noteworthy to mention that the
   forwarding chain is constructed without any operator intervention at
   all.
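
   The download and sharing logic described above can be sketched as
   follows. This is a non-normative Python illustration reusing the
   object sketch from Section 1.2; the pathlist database keyed by the
   list of next-hops, and the direct keying of IGP leaves by next-hop
   address (longest-prefix match elided), are assumptions of the
   sketch rather than a mandated implementation.

      pathlist_db = {}    # tuple of next-hops -> shared Pathlist
      igp_leaves = {}     # BGP next-hop address -> resolving IGP Leaf

      def download_bgp_route(prefix, next_hops, out_labels):
          key = tuple(next_hops)
          pl = pathlist_db.get(key)
          if pl is None:                       # first user of this pathlist
              pl = Pathlist([Path(nh, i) for i, nh in enumerate(next_hops)])
              for path in pl.paths:            # recursive resolution
                  path.parent = igp_leaves.get(path.next_hop)
              pathlist_db[key] = pl
          leaf = Leaf(prefix, pl, list(out_labels))
          pl.dependents.append(leaf)           # prefix depends on pathlist
          return leaf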
   The remainder of this section illustrates two examples. The first
   example illustrates the applicability of BGP-PIC in a primary-backup
   path deployment. The second example illustrates how BGP-PIC can be
   applied in cases where the forwarding plane supports a limited
   number of indirections.

3.1. Example 1: Primary-Backup Path Scenario

   Consider the egress PE ePE1 in the case of the multi-homed VPN
   prefixes in the BGP-free core depicted in Figure 1. Suppose ePE1
   determines that the primary path is the external path but the backup
   path is the iBGP path to the other PE ePE2 with next-hop BGP-NH2.
   ePE1 constructs the forwarding chain depicted in Figure 3. We are
   only showing a single VPN prefix for simplicity. But all prefixes
   that are multihomed to ePE1 and ePE2 share the BGP pathlist.
                       BGP OutLabel Array
      VPN-L11           +---------+
    (Label-leaf)---+--->|Unlabeled|
                   |    +---------+
                   |    | VPN-L21 |
                   |    | (swap)  |
                   |    +---------+
                   |         ^
                   |         |     BGP Pathlist
                   |         |    +------------+       Connected route
                   |         |    |   CE-NH    |------>(to the CE)
                   |         |    |path-index=0|
                   |         |    +------------+
                   V         |    |  VPN-NH2   |
      VPN-IP1 ---------------+--->|  (backup)  |------>IGP Leaf
    (IP prefix leaf)              |path-index=1|       (Towards ePE2)
                                  +------------+

   Figure 3 : VPN Prefix Forwarding Chain with eiBGP paths on egress PE

   The example depicted in Figure 3 differs from the example in Figure
   2 in two main aspects. First, as long as the primary path towards
   the CE (external path) is usable, it will be the only path used for
   forwarding, while the OutLabel-List contains both the unlabeled
   entry (primary path) and the VPN label (backup path) advertised by
   the backup PE ePE2. The second aspect is the presence of the label
   leaf corresponding to the VPN prefix. This label leaf is used to
   match VPN traffic arriving from the core. Note that the label leaf
   shares the OutLabel-List and the pathlist with the IP prefix.
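
   A non-normative sketch of the primary-backup selection on ePE1
   follows: the first usable path wins, and its path-index selects the
   matching entry of the shared OutLabel-List. Names follow the earlier
   sketches; is_up() stands in for whatever liveness check an
   implementation uses.

      backup_pl = Pathlist([Path("CE-NH", 0, interface="to-CE"),  # primary
                            Path("BGP-NH2", 1)])                  # backup
      vpn_leaf = Leaf("VPN-IP1", backup_pl, ["Unlabeled", "VPN-L21"])

      def pick_primary_backup(leaf, is_up):
          path = next(p for p in leaf.pathlist.paths if is_up(p))
          return path.next_hop, leaf.out_labels[path.path_index]

      # PE-CE link down: traffic moves to the backup path, and the VPN
      # label advertised by ePE2 is used instead of going out unlabeled.
      assert pick_primary_backup(vpn_leaf,
                                 is_up=lambda p: p.next_hop != "CE-NH") \
             == ("BGP-NH2", "VPN-L21")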
3.2. Example 2: Platforms with Limited Levels of Hierarchy

   This example uses a case of inter-AS option C [8] where there are 3
   levels of hierarchy. Figure 4 illustrates the sample topology. To
   force 3 levels of hierarchy, the ASBRs on the ingress domain (domain
   1) advertise the core routers of the egress domain (domain 2) to the
   ingress PE (iPE) via BGP-LU [4] instead of redistributing them into
   the IGP of domain 1. The end result is that the ingress PE (iPE) has
   2 levels of recursion for the VPN prefixes VPN-IP1 and VPN-IP2.
             Domain 1                       Domain 2
      +----------------+               +-------------+
      |                |               |             |
      |  LDP/SR Core   |               | LDP/SR core |
      |                |               |             |
      |      ASBR11------ASBR21.......ePE1\
      |      |    \    /     | .     . |   \
      |      |     \  /      |  .   .  |    \
      |      |      \/       |   . .   |     \VPN-IP1
      |      |      /\       |   . .   |     /
      |      |     /  \      |  .   .  |    /
      |      |    /    \     | .     . |   /
     iPE     ASBR12------ASBR22.......ePE2
      |      |               |         |   \
      |      |               |         |    \
      |      |               |         |     \
      |      |               |         |     /VPN-IP2
      |      |               |         |    /
      |      |               |         |   /
      |      ASBR13------ASBR23.......ePE3/
      |                |               |             |
      |                |               |             |
      +----------------+               +-------------+
        <==============     <=========     <============
        Advertise ePEx      Advertise      Redistribute
        Using iBGP-LU       ePEx Using     IGP into
                            eBGP-LU        BGP

             Figure 4 Sample 3-level hierarchy topology
   We will make the following assumptions about connectivity:

   o In "domain 2", both ASBR21 and ASBR22 can reach both ePE1 and
     ePE2 using the same distance

   o In "domain 2", only ASBR23 can reach ePE3

   o In "domain 1", iPE (the ingress PE) can reach ASBR11, ASBR12, and
     ASBR13 via IGP using the same distance.

   We will make the following assumptions about the labels:

   o The VPN labels advertised by ePE1 and ePE2 for prefix VPN-IP1 are
     VPN-L11 and VPN-L21, respectively

   o The VPN labels advertised by ePE2 and ePE3 for prefix VPN-IP2 are
     VPN-L22 and VPN-L32, respectively

   o The labels advertised by ASBR11 to iPE using BGP-LU [4] for the
     egress PEs ePE1 and ePE2 are LASBR11(ePE1) and LASBR11(ePE2),
     respectively.

   o The labels advertised by ASBR12 to iPE using BGP-LU [4] for the
     egress PEs ePE1 and ePE2 are LASBR12(ePE1) and LASBR12(ePE2),
     respectively

   o The label advertised by ASBR13 to iPE using BGP-LU [4] for the
     egress PE ePE3 is LASBR13(ePE3)

   o The IGP labels advertised by the next hops directly connected to
     iPE towards ASBR11, ASBR12, and ASBR13 in the core of domain 1
     are IGP-L11, IGP-L12, and IGP-L13, respectively.
   The diagram in Figure 5 illustrates the forwarding chain in iPE
   assuming that the forwarding hardware in iPE supports 3 levels of
   hierarchy. The leaves corresponding to the ASBRs in domain 1
   (ASBR11, ASBR12, and ASBR13) are at the bottom of the hierarchy.
   There are a few important points:

   o Because the hardware supports the required depth of hierarchy,
     the size of a pathlist equals the size of the label list
     associated with the leaves using this pathlist

   o The index inside the pathlist entry indicates the label that will
     be picked from the Outlabel-List if that path is chosen by the
     forwarding engine hashing function.
         Outlabel-List                           Outlabel-List
         For VPN-IP1                             For VPN-IP2
       +------------+    +--------+   +-------+    +------------+
       |  VPN-L11   |<---| VPN-IP1|   |VPN-IP2|--->|  VPN-L22   |
       +------------+    +---+----+   +---+---+    +------------+
       |  VPN-L21   |        |            |        |  VPN-L32   |
       +------------+        |            |        +------------+
                             |            |
                             V            V
                         +---+---+    +---+---+
                         | 0 | 1 |    | 0 | 1 |
                         +-|-+-\-+    +-/-+-\-+
                           |    \      /     \
                           |     \    /       \
                           |      \  /         \
                           v       \/           v
                       +-----+     /\         +-----+
                +------+ ePE1|    v  v  +-----+ ePE3+------+
                |      +--+--+  +-----+ |     +--+--+      |
                v         |     |ePE2 +-+        |         v
    +-------------+       |     +--+--+          |  +-------------+
    |LASBR11(ePE1)|       |        |             |  |LASBR13(ePE3)|
    +-------------+       |        v             |  +-------------+
    |LASBR12(ePE1)|       |  +-------------+     |   Outlabel-List
    +-------------+       |  |LASBR11(ePE2)|     |   For ePE3
     Outlabel-List        |  +-------------+     |
     For ePE1             |  |LASBR12(ePE2)|     |
                          |  +-------------+     |
                          |   Outlabel-List      |
                          |   For ePE2           |
                          |      |               |
                          v      v               v
                      +---+---+    Shared      +---+   Pathlist
                      | 0 | 1 |    Pathlist    | 0 |   For ePE3
                      +-|-+-\-+    For ePE1    +-|-+
                        |    \     and ePE2      |
                        |     \                  |
                        v      v                 v
                    +------+  +------+        +------+
                    |ASBR11|  |ASBR12|        |ASBR13|
                    +--+---+  +--+---+        +--+---+
                       |         |               |
                       v         v               v
                   +-------+  +-------+       +-------+
                   |IGP-L11|  |IGP-L12|       |IGP-L13|
                   +-------+  +-------+       +-------+

      Figure 5 : Forwarding Chain for hardware supporting 3 Levels
   Now suppose the hardware on iPE (the ingress PE) supports 2 levels
   of hierarchy only. In that case, the 3-level forwarding chain in
   Figure 5 needs to be "flattened" into 2 levels only.
         Outlabel-List                          Outlabel-List
         For VPN-IP1                            For VPN-IP2
       +------------+    +-------+    +--------+    +------------+
       |  VPN-L11   |<---|VPN-IP1|    | VPN-IP2|--->|  VPN-L22   |
       +------------+    +---+---+    +---+----+    +------------+
       |  VPN-L21   |        |            |         |  VPN-L32   |
       +------------+        |            |         +------------+
                             |            |
        Flattened            |            |            Flattened
        pathlist             V            V            pathlist
                     +===+===+    +===+===+===+     +=============+
            +--------+ 0 | 1 |    | 0 | 0 | 1 +---->|LASBR11(ePE2)|
            |        +=|=+=\=+    +=/=+=/=+=\=+     +=============+
            v          |    \      /   /     \      |LASBR12(ePE2)|
    +=============+    |     \    /   /       \     +=============+
    |LASBR11(ePE1)|    |      \  /   /         \    |LASBR13(ePE3)|
    +=============+    |       \/   /           \   +=============+
    |LASBR12(ePE1)|    |       /\  /              \
    +=============+    |      /  \/                \
                       |     /   /\                 \
                       v    v   v  v                 v
                    +------+    +------+          +------+
               +----+ASBR11|    |ASBR12+---+      |ASBR13+---+
               |    +------+    +------+   |      +------+   |
               v                           v                 v
           +-------+                   +-------+         +-------+
           |IGP-L11|                   |IGP-L12|         |IGP-L13|
           +-------+                   +-------+         +-------+

      Figure 6 : Flattening 3 levels to 2 levels of Hierarchy on iPE
   Figure 6 represents one way to "flatten" a 3-level hierarchy into
   two levels. There are a few important points:

   o The flattened pathlists have label lists associated with them.
     The size of the label list associated with the flattened pathlist
     equals the size of the pathlist. Hence it is possible that an
     implementation includes these label lists in the flattened
     pathlist itself

   o Because of "flattening", the size of a flattened pathlist may not
     be equal to the size of the label lists of leaves using the
     flattened pathlist.

   o The indices inside a flattened pathlist still indicate the label
     index in the Outlabel-Lists of the leaves using that pathlist.
     Because the size of the flattened pathlist may be different from
     the size of the label lists of the leaves, the indices may be
     repeated
   o Let's take a look at the flattened pathlist used by the prefix
     "VPN-IP2". The pathlist associated with the prefix "VPN-IP2" has
     three entries.

     o The first and second entries have index "0". This is because
       both entries correspond to ePE2. Hence when the hashing
       performed by the forwarding engine results in using the first
       or the second entry in the pathlist, the forwarding engine will
       pick the correct VPN label "VPN-L22", which is the label
       advertised by ePE2 for the prefix "VPN-IP2"

     o The third entry has the index "1". This is because the third
       entry corresponds to ePE3. Hence when the hashing performed by
       the forwarding engine results in using the third entry in the
       flattened pathlist, the forwarding engine will pick the correct
       VPN label "VPN-L32", which is the label advertised by "ePE3"
       for the prefix "VPN-IP2"
4. Forwarding Behavior

   This section explains how the forwarding plane uses the hierarchical
   shared forwarding chain to forward a packet.

   When a packet arrives at a router, it matches a leaf. A labeled
   packet matches a label leaf while an IP packet matches an IP prefix
   leaf. The forwarding engine walks the forwarding chain starting
   from the leaf until the walk terminates on an adjacency. Thus when a
   packet arrives, the chain is walked as follows:

   1. Lookup the leaf based on the destination address or the label at
      the top of the packet
   2. Retrieve the parent pathlist of the leaf

   3. Pick the outgoing path from the list of resolved paths in the
      pathlist. The method by which the outgoing path is picked is
      beyond the scope of this document (e.g. a flow-preserving hash
      exploiting entropy within the MPLS stack and IP header). Let the
      "path-index" of the outgoing path be "i".

   4. If the prefix is labeled, use the "path-index" "i" to retrieve
      the ith label "Li" stored in the ith entry of the OutLabel-List
      and apply the label action of the label on the packet (e.g. for
      a VPN label on the ingress PE, the label action is "push").

   5. Move to the parent of the chosen path "i"

   6. If the chosen path "i" is recursive, move to its parent prefix
      and go to step 2

   7. If the chosen path "i" is non-recursive move to its parent
      adjacency

   8. Encapsulate the packet in the L2 string specified by the
      adjacency and send the packet out.
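
   A non-normative sketch of the walk above, using the object sketch
   from Section 1.2, follows. Only the "push" label action is modeled
   and pick() stands in for the out-of-scope hashing decision; the
   returned list reads from the top of the resulting label stack down.

      def forward(leaf, pick):
          label_stack = []                           # top of stack first
          while True:
              path = pick(leaf.pathlist.paths)       # steps 2-3
              if leaf.out_labels:                    # step 4 (push only)
                  label_stack.insert(0, leaf.out_labels[path.path_index])
              parent = path.parent                   # step 5
              if isinstance(parent, Leaf):           # step 6: recursive path
                  leaf = parent
                  continue
              return parent, label_stack             # steps 7-8: adjacency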
   Let's apply the above forwarding steps to the forwarding chain
   depicted in Figure 2 in Section 2. Suppose a packet arrives at
   ingress PE iPE from an external neighbor. Assume the packet matches
   the VPN prefix VPN-IP1. While walking the forwarding chain, the
   forwarding engine applies a hashing algorithm to choose the path;
   the hashing at the BGP level yields path 0 while the hashing at the
   IGP level yields path 1. In that case, the packet will be sent out
   of interface I2 with the label stack "IGP-L12,VPN-L11".
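
   For illustration, the same resolution can be replayed with the
   non-normative forward() sketch above by wiring up the Figure 2
   objects and hard-wiring the two hash decisions.

      adj1, adj2 = Adjacency("I1"), Adjacency("I2")
      igp_pl = Pathlist([Path("IGP-NH1", 0, "I1", adj1),
                         Path("IGP-NH2", 1, "I2", adj2)])
      igp_ip1 = Leaf("IGP-IP1", igp_pl, ["IGP-L11", "IGP-L12"])
      bgp_pl = Pathlist([Path("BGP-NH1", 0, parent=igp_ip1),
                         Path("BGP-NH2", 1)])
      vpn_ip1 = Leaf("VPN-IP1", bgp_pl, ["VPN-L11", "VPN-L21"])

      choices = iter([0, 1])           # BGP level picks 0, IGP level picks 1
      adj, stack = forward(vpn_ip1, lambda paths: paths[next(choices)])
      assert (adj.interface, stack) == ("I2", ["IGP-L12", "VPN-L11"])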
   Now let's apply the above steps to the flattened forwarding chain
   illustrated in Figure 6.
   o Suppose a packet arrives at "iPE" and matches the VPN prefix
     "VPN-IP2"

   o The forwarding engine walks to the parent of "VPN-IP2", which is
     the flattened pathlist, and applies a hashing algorithm to pick a
     path

   o Suppose the hashing by the forwarding engine picks the second
     entry in the flattened pathlist associated with the leaf
     "VPN-IP2".

   o Because the second entry has the index "0", the label "VPN-L22"
     is pushed on the packet

   o At the same time, the forwarding engine picks the second label
     from the Outlabel-List associated with the flattened pathlist.
     Hence the next label that is pushed is "LASBR12(ePE2)"

   o The forwarding engine now moves to the parent of the flattened
     pathlist corresponding to the second entry. The parent is the IGP
     label leaf corresponding to "ASBR12"

   o So the packet is forwarded towards the ASBR "ASBR12" and the IGP
     label at the top will be "IGP-L12"
   Based on the above steps, a packet arriving at iPE and destined to
   the prefix VPN-IP2 reaches its destination as follows:

   o iPE sends the packet along the shortest path towards ASBR12 with
     the following label stack starting from the top: {IGP-L12,
     LASBR12(ePE2), VPN-L22}.

   o The penultimate hop of ASBR12 pops the top label "IGP-L12". Hence
     the packet arrives at ASBR12 with the label stack {LASBR12(ePE2),
     VPN-L22} where "LASBR12(ePE2)" is the top label.

   o ASBR12 swaps "LASBR12(ePE2)" with the label "LASBR22(ePE2)",
     which is the label advertised by ASBR22 for ePE2 (the egress PE).

   o ASBR22 receives the packet with "LASBR22(ePE2)" at the top.

   o Hence ASBR22 swaps "LASBR22(ePE2)" with the IGP label for ePE2
     advertised by the next-hop towards ePE2 in domain 2, and sends
     the packet along the shortest path towards ePE2.

   o The penultimate hop of ePE2 pops the top label. Hence ePE2
     receives the packet with the label VPN-L22 at the top

   o ePE2 pops "VPN-L22" and sends the packet as a pure IP packet
     towards the destination VPN-IP2.
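
   For convenience, the label operations above can be summarized, again
   non-normatively, as the stack seen at each step (top label first;
   the domain-2 IGP label value is not named in this document and is
   shown as a placeholder).

      hops = [
          ("iPE -> ASBR12",    ["IGP-L12", "LASBR12(ePE2)", "VPN-L22"]),
          ("at ASBR12",        ["LASBR12(ePE2)", "VPN-L22"]),  # PHP
          ("ASBR12 -> ASBR22", ["LASBR22(ePE2)", "VPN-L22"]),  # BGP-LU swap
          ("ASBR22 -> ePE2",   ["<domain-2 IGP label>", "VPN-L22"]),
          ("at ePE2",          ["VPN-L22"]),                   # PHP
          ("ePE2 -> VPN site", []),                            # VPN label pop
      ]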
5. Forwarding Chain Adjustment at a Failure

   The hierarchical and shared structure of the forwarding chain
   explained in Section 2 allows modifying a small number of
   forwarding chain objects to re-route traffic to a pre-calculated
   equal-cost or backup path without the need to modify the possibly
   very large number of BGP prefixes. In this section, we go over
   various core and edge failure scenarios to illustrate how the FIB
   manager can utilize the forwarding chain structure to achieve BGP
   prefix independent convergence.
5.1. BGP-PIC core

   This section describes the adjustments to the forwarding chain when
   a core link or node fails but the BGP next-hop remains reachable.
   There are two cases: remote link failure and attached link failure.
   Node failures are treated as link failures.

   When a remote link or node fails, IGP on the ingress PE receives an
   advertisement indicating a topology change so IGP re-converges to
   either find a new next-hop and/or outgoing interface or remove the
   path completely from the IGP prefix used to resolve BGP next-hops.
   IGP and/or LDP download the modified IGP leaves with modified
   outgoing labels for a labeled core.
   When a local link fails, the FIB manager detects the failure almost
   immediately. The FIB manager marks the impacted path(s) as unusable
   so that only usable paths are used to forward packets. Hence only
   IGP pathlists with paths using the failed local link need to be
   modified. All other pathlists are not impacted. Note that in this
   particular case there is actually no need even to backwalk to IGP
   leaves to adjust the OutLabel-Lists because FIB can rely on the
   path-index stored in the usable paths in the pathlist to pick the
   right label.
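
   A non-normative sketch of this local repair, reusing the earlier
   object sketch: only the IGP pathlists that contain a path over the
   failed interface are touched, and their dependent leaves are left
   untouched.

      def handle_local_link_failure(igp_pathlists, failed_interface):
          for pl in igp_pathlists:
              for path in pl.paths:
                  if path.interface == failed_interface:
                      # Remaining paths keep their path-index, so the
                      # right outgoing label is still chosen.
                      path.usable = False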
   It is noteworthy to mention that because the FIB manager modifies
   the forwarding chain starting from the IGP leaves only, BGP
   pathlists and leaves are not modified. Hence traffic restoration
   occurs within the time frame of IGP convergence, and, for local
   link failure, assuming a backup path has been precomputed, within
   the timeframe of local detection (e.g. 50ms). Examples of solutions
   that pre-compute backup paths are IP FRR [16], remote LFA [17],
   Ti-LFA [15], MRT [18], and an eBGP path having a backup path [10].

   Let's apply the procedure to the forwarding chain depicted in Figure
   2. Suppose a remote link failure occurs and impacts the first ECMP
   IGP path to the remote BGP next-hop. Upon IGP convergence, the IGP
   pathlist used by the BGP next-hop is updated to reflect the new
   topology (one path instead of two). As soon as the IGP convergence
   is effective for the BGP next-hop entry, the new forwarding state is
   immediately available to all dependent BGP prefixes. The same
   behavior would occur if the failure was local such as an interface
   going down. As soon as the IGP convergence is complete for the BGP
   next-hop IGP route, all its depending BGP routes benefit from the
   new path. In fact, if LFA protection is enabled for the IGP route to
   the BGP next-hop and a backup path was pre-computed and installed in
   the pathlist, then upon the local interface failure the LFA backup
   path is immediately activated (sub-50msec) and thus protection
   benefits all the depending BGP traffic through the hierarchical
   forwarding dependency between the routes.
5.2. BGP-PIC edge

This section describes the adjustments to the forwarding chains as a
result of edge node or edge link failure.
5.2.1. Adjusting Forwarding Chain on Egress Node Failure
When an edge node fails, IGP on the neighboring core nodes sends
route updates indicating that the edge node is no longer reachable.
IGP running on the iBGP peers instructs FIB to remove the IP and
label leaves corresponding to the failed edge node. Hence the FIB
manager performs the following steps:
   o FIB manager deletes the IGP leaf corresponding to the failed
     edge node

   o FIB manager backwalks to all dependent BGP pathlists and marks
     the path using the deleted IGP leaf as unresolved

   o Note that there is no need to modify BGP leaves because each
     path in the pathlist carries its path-index and hence the
     correct outgoing label will be picked. Consider for example
     the forwarding chain depicted in Figure 2. If the first BGP
     path becomes unresolved, then the forwarding engine will only
     use the second path for forwarding. Yet the path-index of that
     single resolved path will still be 1 (it is not renumbered) and
     hence the label VPN-L12 will be pushed, as illustrated by the
     sketch below.
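The following sketch illustrates the backwalk referenced in the list
above. The dependency bookkeeping shown (an IGP leaf recording the
BGP pathlists that resolve via it) is one possible scheme assumed
purely for illustration, not a mandated design; ePE1 and ePE2 play
the role of the primary and backup egress PEs of the figures:

      # Hypothetical bookkeeping: each IGP leaf records the BGP
      # pathlists that resolve via it.
      bgp_pathlist = {"paths": [
          {"igp_leaf": "ePE1", "path_index": 0, "resolved": True},
          {"igp_leaf": "ePE2", "path_index": 1, "resolved": True},
      ]}
      igp_leaf_dependents = {"ePE1": [bgp_pathlist],
                             "ePE2": [bgp_pathlist]}

      def egress_pe_down(leaf_name):
          # Delete the IGP leaf, then backwalk to its dependent BGP
          # pathlists and mark the path via that leaf as unresolved.
          # BGP leaves and their OutLabel-Lists are never touched.
          for pl in igp_leaf_dependents.pop(leaf_name, []):
              for path in pl["paths"]:
                  if path["igp_leaf"] == leaf_name:
                      path["resolved"] = False

      egress_pe_down("ePE1")
      resolved = [p for p in bgp_pathlist["paths"] if p["resolved"]]
      # Forwarding now uses the path via ePE2; its path-index still
      # selects the correct entry in each BGP leaf's OutLabel-List.
      assert resolved[0]["path_index"] == 1

Because the pathlist is shared, the cost of the backwalk is
proportional to the number of pathlists using the deleted leaf, not
to the number of BGP prefixes behind the failed PE.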
5.2.2. Adjusting Forwarding Chain on PE-CE link Failure
Suppose the link between an edge router and its external peer fails.
There are two scenarios: (1) the edge node attached to the failed
link performs next-hop self, and (2) the edge node attached to the
failure advertises the IP address of the failed link as the next-hop
attribute to its iBGP peers.
In the first case, the rest of the iBGP peers will remain unaware of
the link failure and will continue to forward traffic to the edge
node until the edge node attached to the failed link withdraws the
BGP prefixes. If the destination prefixes are multi-homed to
another iBGP peer, say ePE2, then the FIB manager on the edge router
detecting the link failure applies the following steps:
   o FIB manager backwalks to the BGP pathlists and marks the path
     through the failed link to the external peer as unresolved

   o Hence traffic will be forwarded using the backup path towards
     ePE2

   o For labeled traffic:

        o The OutLabel-List attached to the BGP leaf already
          contains an entry corresponding to the backup path.

        o The label entry in the OutLabel-List corresponding to the
          internal path to the backup egress PE has a swap action to
          the label advertised by the backup egress PE.

        o For an arriving labeled packet (e.g. VPN), the top label
          is swapped with the label advertised by the backup egress
          PE and the packet is sent towards that backup egress PE
          (see the sketch after this list).

   o For unlabeled traffic, packets are simply redirected towards
     the backup egress PE.
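A sketch of the labeled-traffic behavior on the protecting edge
router is shown below. The OutLabel-List layout and the action
names are assumptions made for illustration; the label values
VPN-L11 and VPN-L21 and the node name ePE2 are taken from the
Figure 3 example discussed later in this section:

      # On the edge router attached to the failed PE-CE link, the
      # VPN leaf's OutLabel-List holds one entry per path; the entry
      # for the internal backup path carries a swap action.
      vpn_leaf_outlabels = [
          {"action": "to-CE"},                      # primary, now dead
          {"action": "swap", "label": "VPN-L21"},   # backup via ePE2
      ]

      def forward_from_core(top_label, backup_path_index):
          entry = vpn_leaf_outlabels[backup_path_index]
          if entry["action"] == "swap":
              # e.g. VPN-L11 arriving from the core is swapped with
              # the label advertised by the backup egress PE.
              return {"top_label": entry["label"], "next_hop": "ePE2"}

      print(forward_from_core("VPN-L11", 1))

Since the swap entry was installed when the backup path was
computed, no per-prefix work is needed at failure time; only the
path through the failed link is marked unresolved.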
In the second case, where the edge router uses the IP address of the
failed link as the BGP next-hop, the edge router will still perform
the previous steps. But, unlike the case of next-hop self, IGP on
the edge node attached to the failed link informs the rest of the
iBGP peers that the IP address of the failed link is no longer
reachable. Hence the FIB manager on the iBGP peers will delete the
IGP leaf corresponding to the IP prefix of the failed link. The
behavior of the iBGP peers will be identical to the case of edge
node failure outlined in Section 5.2.1.

It is noteworthy to mention that because the edge link failure is
local to the edge router, sub-50 msec convergence can be achieved as
described in [10].
Let's apply the case of next-hop self to the forwarding chain
depicted in Figure 3. After failure of the link between ePE1 and
CE, the forwarding engine will route traffic arriving from the core
towards VPN-NH2 with path-index=1. A packet arriving from the core
will contain the label VPN-L11 at the top. The label VPN-L11 is
swapped with the label VPN-L21 and the packet is forwarded towards
ePE2.
5.3. Handling Failures for Flattened Forwarding Chains

As explained in the example in Section 3.2, if a platform cannot
support the native number of hierarchy levels of a recursive
forwarding chain, the instantiated forwarding chain is constructed
by flattening two or more levels. Hence the 3-level chain in Figure
5 is flattened into the 2-level chain in Figure 6.
While reducing the benefits of BGP-PIC, flattening a hierarchy into
a shallower one does not always result in a complete loss of the
benefits of BGP-PIC. To illustrate this fact, suppose ASBR12 is no
longer reachable in domain 1. If the platform supports the full
hierarchy depth, the forwarding chain is the one depicted in Figure
5 and hence the FIB manager needs to backwalk one level to the
pathlist shared by "ePE1" and "ePE2" and adjust it. If the platform
supports only 2 levels of hierarchy, then a usable forwarding chain
is the one depicted in Figure 6. In that case, if ASBR12 is no
longer reachable, the FIB manager has to backwalk to the two
flattened pathlists and update both of them.
The main observation is that the loss of convergence speed due to
the loss of hierarchy depth depends on the structure of the
forwarding chain itself. To illustrate this fact, let's take two
extremes. Suppose the forwarding objects in level i+1 depend on the
forwarding objects in level i. If every object in level i+1 depends
on a separate object in level i, then flattening level i into level
i+1 will not result in any loss of convergence speed. Now let's
take the other extreme. Suppose "n" objects in level i+1 depend on
a single object in level i, and suppose FIB flattens level i into
level i+1. If a topology change results in modifying that single
object in level i, then FIB has to backwalk and modify "n" objects
in the flattened level, thereby losing all the benefit of BGP-PIC.
Experience shows that flattening forwarding chains usually results
in a moderate loss of BGP-PIC benefits. Further analysis is needed
to corroborate and quantify this statement.
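The two extremes translate into a simple count of the FIB objects
that must be rewritten after a single topology change. The sketch
below is a back-of-the-envelope illustration under that assumption,
not a model of any particular implementation:

      def objects_to_update(fanout_n, flattened):
          # Hierarchical chain: one shared level-i object is
          # rewritten.  Flattened chain: the level-i state was copied
          # into each of the n dependent level-(i+1) objects, so all
          # n of them must be rewritten.
          return fanout_n if flattened else 1

      for n in (1, 10, 100000):
          print(n, objects_to_update(n, flattened=False),
                objects_to_update(n, flattened=True))

With a fan-out of 1 the flattening costs nothing; as the fan-out
grows, the backwalk in the flattened chain grows with it, which is
exactly the per-prefix behavior that BGP-PIC is meant to avoid.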
6. Properties

6.1. Coverage

All the possible failures, except CE node failure, are covered,
whether they impact a local or remote IGP path or a local or remote
BGP next-hop, as described in Section 5. This section provides
details for each failure and how the hierarchical and shared FIB
structure proposed in this document allows recovery that does not
depend on the number of BGP prefixes.
6.1.1. A remote failure on the path to a BGP next-hop

Upon IGP convergence, the IGP leaf for the BGP next-hop is updated
and all the depending BGP routes leverage the new IGP forwarding
state immediately.

This BGP resiliency property only depends on IGP convergence and is
independent of the number of BGP prefixes impacted.
6.1.2. A local failure on the path to a BGP next-hop

Upon LFA protection, the IGP leaf for the BGP next-hop is updated to
use the precomputed LFA backup path and all the depending BGP routes
leverage this LFA protection.

This BGP resiliency property only depends on LFA protection and is
independent of the number of BGP prefixes impacted.
6.1.3. A remote iBGP next-hop fails

Upon IGP convergence, the IGP leaf for the BGP next-hop is deleted
and all the depending BGP pathlists are updated to either use the
remaining ECMP BGP best-paths or, if none remains available, to
activate precomputed backups.

This BGP resiliency property only depends on IGP convergence and is
independent of the number of BGP prefixes impacted.
6.1.4. A local eBGP next-hop fails

Upon local link failure detection, the adjacency to the BGP next-hop
is deleted and all the depending BGP pathlists are updated to either
use the remaining ECMP BGP best-paths or, if none remains available,
to activate precomputed backups.

This BGP resiliency property only depends on local link failure
detection and is independent of the number of BGP prefixes impacted.
6.2. Performance

When the failure is local (a local IGP next-hop failure or a local
eBGP next-hop failure), a pre-computed and pre-installed backup is
activated by a local-protection mechanism that does not depend on
the number of BGP destinations impacted by the failure. Sub-50 msec
convergence is thus possible even if millions of BGP routes are
impacted.

When the failure is remote (a remote IGP failure not impacting the
BGP next-hop or a remote BGP next-hop failure), an alternate path is
activated upon IGP convergence. All the impacted BGP destinations
benefit from a working alternate path as soon as the IGP convergence
occurs for their impacted BGP next-hop, even if millions of BGP
routes are impacted.
6.2.1. Perspective

The following table puts the BGP PIC benefits in perspective
assuming:

   o 1M impacted BGP prefixes

   o IGP convergence ~ 500 msec

   o local protection ~ 50 msec
   o BGP convergence per BGP destination ~ 200 usec conservative,
     ~ 100 usec optimistic

                          Without PIC       With PIC
   Local IGP Failure      10 to 100 sec     50 msec
   Local BGP Failure      100 to 200 sec    50 msec
   Remote IGP Failure     10 to 100 sec     500 msec
   Remote BGP Failure     100 to 200 sec    500 msec
Upon a local IGP next-hop failure or a remote IGP next-hop failure,
the existing primary BGP next-hop is intact and usable; hence the
resiliency only depends on the ability of the FIB mechanism to
reflect the new path to the BGP next-hop to the depending BGP
destinations. Without BGP PIC, a conservative back-of-the-envelope
estimation for this FIB update is 100 usec per BGP destination. An
optimistic estimation is 10 usec per entry.
Upon a local BGP next-hop failure or a remote BGP next-hop failure,
without the BGP PIC mechanism, a new BGP best-path needs to be
recomputed and new updates need to be sent to peers. This depends
on BGP processing time that will be shared between best-path
computation, RIB update and peer update. A conservative back-of-
the-envelope estimation for this is 200 usec per BGP destination.
An optimistic estimation is 100 usec per entry.
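The entries in the table follow directly from these per-entry
estimates; the short computation below simply reproduces the
arithmetic for the assumed 1M impacted prefixes:

      PREFIXES = 1_000_000

      # Without PIC, restoration time grows with the prefix count.
      igp_failure = (PREFIXES * 10e-6, PREFIXES * 100e-6)   # 10-100 s
      bgp_failure = (PREFIXES * 100e-6, PREFIXES * 200e-6)  # 100-200 s

      # With PIC it does not: local protection or IGP convergence.
      local_protection = 50e-3   # ~50 msec
      igp_convergence = 500e-3   # ~500 msec

      print(igp_failure, bgp_failure, local_protection,
            igp_convergence)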
6.3. Automated

The BGP PIC solution does not require any operator involvement. The
process is entirely automated as part of the FIB implementation.
The salient points enabling this automation are:

   o Extension of the BGP best-path computation to compute more than
     one primary ([11] and [12]) or backup BGP next-hop ([6] and
     [13])

   o Sharing of a BGP pathlist across BGP destinations with the same
     primary and backup BGP next-hops

   o Hierarchical indirection and dependency between BGP pathlists
     and IGP pathlists
6.4. Incremental Deployment

As soon as one router supports the BGP PIC solution, it gains all of
its benefits without any requirement for other routers to support
BGP PIC.
7. Dependency

This section describes the required functionality in the forwarding
and control planes to support the BGP-PIC solution described in this
document.

7.1. Hierarchical Hardware FIB
BGP PIC requires hierarchical hardware FIB support: for each BGP
forwarded packet, a BGP leaf is looked up, then a BGP pathlist is
consulted, then an IGP pathlist, then an adjacency.
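The four-level walk can be sketched as follows. The dictionary
layout, the flow-hash-based ECMP selection and the label stacking
are simplifying assumptions rather than a description of any
specific hardware:

      def lookup(bgp_leaf, flow_hash):
          # Level 1: BGP leaf -> BGP pathlist; push the BGP/VPN label
          # selected by the chosen path's path-index.
          bpaths = [p for p in bgp_leaf["pathlist"]["paths"]
                    if p["resolved"]]
          bpath = bpaths[flow_hash % len(bpaths)]
          labels = [bgp_leaf["outlabels"][bpath["path_index"]]]

          # Level 2: BGP path -> IGP leaf of the BGP next-hop.
          igp_leaf = bpath["igp_leaf"]

          # Level 3: IGP leaf -> IGP pathlist; push the IGP/LDP label.
          ipaths = [p for p in igp_leaf["pathlist"]["paths"]
                    if p["usable"]]
          ipath = ipaths[flow_hash % len(ipaths)]
          labels.append(igp_leaf["outlabels"][ipath["path_index"]])

          # Level 4: IGP path -> adjacency (outgoing interface, MAC).
          return labels, ipath["adjacency"]

Such a lookup would run once per packet with the destination's BGP
leaf and a hash of the packet's flow fields. Failures only flip the
resolved/usable flags or rewrite a shared pathlist, so the per-
packet walk itself never changes shape.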
An alternative method consists of "flattening" the dependencies when
programming the BGP destinations into the HW FIB, potentially
eliminating both the BGP pathlist and IGP pathlist consultations.
Such an approach decreases the number of memory lookups per
forwarding operation at the expense of increased HW FIB memory
(flattening means less sharing, hence duplication), loss of ECMP
properties (flattening means less pathlist entropy) and loss of BGP
PIC properties.
7.2. Availability of more than one primary or secondary BGP next-hop

When the primary BGP next-hop fails, BGP PIC depends on the
availability of a pre-computed and pre-installed secondary BGP
next-hop in the BGP pathlist.

The existence of a secondary next-hop is clear for the following
reason: a service that cares about network availability will require
two disjoint network connections, hence two BGP next-hops.

The BGP distribution of the secondary next-hop is available thanks
to the following BGP mechanisms: Add-Path [11], BGP Best-External
[6], diverse path [12], and the frequent use in VPN deployments of
different VPN RDs per PE. It is noteworthy to mention that the
availability of another BGP path does not mean that all failure
scenarios can be covered by simply forwarding traffic to the
available secondary path. The discussion of how to cover various
failure scenarios is beyond the scope of this document.
7.3. Pre-Computation of a secondary BGP next-hop

[13] describes how a secondary BGP next-hop can be precomputed on a
per BGP destination basis.
8. Security Considerations

The behavior described in this document is functionality internal to
a router. It results in a significant improvement in convergence
time as well as a reduction in the CPU and memory used by FIB, while
not changing basic routing and forwarding functionality. As such,
no additional security risk is introduced by using the mechanisms
proposed in this document.
9. IANA Considerations

No requirements for IANA.
10. Conclusions

This document proposes a hierarchical and shared forwarding chain
structure that allows achieving BGP prefix independent convergence
and, in the case of locally detected failures, sub-50 msec
convergence. A router can construct the forwarding chains in a
completely transparent manner with zero operator intervention,
thereby supporting smooth and incremental deployment.
11. References

11.1. Normative References

   [1] Bradner, S., "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, March 1997.

   [2] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol
       4 (BGP-4)", RFC 4271, January 2006.

   [3] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
       "Multiprotocol Extensions for BGP", RFC 4760, January 2007.

   [4] Rekhter, Y. and E. Rosen, "Carrying Label Information in
       BGP-4", RFC 3107, May 2001.

   [5] Andersson, L., Minei, I., and B. Thomas, "LDP Specification",
       RFC 5036, October 2007.
11.2. Informative References

   [6] Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H.
       Gredler, "Advertisement of the best external route in BGP",
       draft-ietf-idr-best-external-05 (work in progress), January
       2012.

   [7] Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
       Framework", RFC 5565, June 2009.

   [8] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
       Networks (VPNs)", RFC 4364, February 2006.
   [10] Bonaventure, O., Filsfils, C., and P. Francois, "Achieving
        sub-50 milliseconds recovery upon BGP peering link
        failures", IEEE/ACM Transactions on Networking,
        15(5):1123-1135, 2007.

   [11] Walton, D., Retana, A., Chen, E., and J. Scudder,
        "Advertisement of Multiple Paths in BGP", draft-ietf-idr-
        add-paths-12 (work in progress), November 2015.

   [12] Raszuk, R., Fernando, R., Patel, K., McPherson, D., and K.
        Kumaki, "Distribution of Diverse BGP Paths", RFC 6774,
        November 2012.

   [13] Mohapatra, P., Fernando, R., Filsfils, C., and R. Raszuk,
        "Fast Connectivity Restoration Using BGP Add-path", draft-
        pmohapat-idr-fast-conn-restore-03 (work in progress),
        January 2013.

   [14] Filsfils, C., Previdi, S., Bashandy, A., Decraene, B.,
        Litkowski, S., Horneffer, M., Shakir, R., Tantsura, J., and
        E. Crabbe, "Segment Routing with MPLS data plane", draft-
        ietf-spring-segment-routing-mpls-02 (work in progress),
        October 2015.
   [15] Filsfils, C., Previdi, S., Bashandy, A., and B. Decraene,
        "Topology Independent Fast Reroute using Segment Routing",
        draft-francois-spring-segment-routing-ti-lfa-02 (work in
        progress), August 2015.

   [16] Shand, M. and S. Bryant, "IP Fast Reroute Framework",
        RFC 5714, January 2010.

   [17] Bryant, S., Filsfils, C., Previdi, S., Shand, M., and N. So,
        "Remote Loop-Free Alternate (LFA) Fast Reroute (FRR)",
        RFC 7490, April 2015.

   [18] Atlas, A., Bowers, C., and G. Enyedi, "An Architecture for
        IP/LDP Fast-Reroute Using Maximally Redundant Trees", draft-
        ietf-rtgwg-mrt-frr-architecture-10 (work in progress),
        February 2016.
12. Acknowledgments

Special thanks to Neeraj Malhotra and Yuri Tsier for the valuable
help.

Special thanks to Bruno Decraene for the valuable comments.

This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses