draft-ietf-rtgwg-bgp-pic-06.txt   draft-ietf-rtgwg-bgp-pic-07.txt 
Network Working Group A. Bashandy, Ed. Network Working Group A. Bashandy, Ed.
Internet Draft C. Filsfils Internet Draft C. Filsfils
Intended status: Informational Cisco Systems Intended status: Informational Cisco Systems
Expires: May 2018 P. Mohapatra Expires: September 2018 P. Mohapatra
Sproute Networks Sproute Networks
November 20, 2017 March 30, 2018
BGP Prefix Independent Convergence BGP Prefix Independent Convergence
draft-ietf-rtgwg-bgp-pic-06.txt draft-ietf-rtgwg-bgp-pic-07.txt
Abstract Abstract
In the network comprising thousands of iBGP peers exchanging millions In the network comprising thousands of iBGP peers exchanging millions
of routes, many routes are reachable via more than one next-hop. of routes, many routes are reachable via more than one next-hop.
Given the large scaling targets, it is desirable to restore traffic Given the large scaling targets, it is desirable to restore traffic
after failure in a time period that does not depend on the number of after failure in a time period that does not depend on the number of
BGP prefixes. In this document we propose an architecture by which BGP prefixes. In this document we propose an architecture by which
traffic can be re-routed to ECMP or pre-calculated backup paths in a traffic can be re-routed to ECMP or pre-calculated backup paths in a
timeframe that does not depend on the number of BGP prefixes. The timeframe that does not depend on the number of BGP prefixes. The
skipping to change at page 2, line 19 skipping to change at page 2, line 19
documents at any time. It is inappropriate to use Internet-Drafts documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in as reference material or to cite them other than as "work in
progress." progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on May 20, 2018. This Internet-Draft will expire on September 30, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License. warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction...................................................3 1. Introduction...................................................3
1.1. Conventions used in this document.........................4 1.1. Terminology...............................................4
1.2. Terminology...............................................4 2. Overview.......................................................5
2. Overview.......................................................6
2.1. Dependency................................................6 2.1. Dependency................................................6
2.1.1. Hierarchical Hardware FIB............................6 2.1.1. Hierarchical Hardware FIB............................6
2.1.2. Availability of more than one primary or secondary BGP 2.1.2. Availability of more than one primary or secondary BGP
next-hops...................................................7 next-hops...................................................7
2.2. BGP-PIC Illustration......................................7 2.2. BGP-PIC Illustration......................................7
3. Constructing the Shared Hierarchical Forwarding Chain..........9 3. Constructing the Shared Hierarchical Forwarding Chain..........9
3.1. Constructing the BGP-PIC forwarding Chain.................9 3.1. Constructing the BGP-PIC forwarding Chain.................9
3.2. Example: Primary-Backup Path Scenario....................10 3.2. Example: Primary-Backup Path Scenario....................10
4. Forwarding Behavior...........................................11 4. Forwarding Behavior...........................................11
5. Handling Platforms with Limited Levels of Hierarchy...........12 5. Handling Platforms with Limited Levels of Hierarchy...........12
skipping to change at page 3, line 36 skipping to change at page 3, line 35
11.2. Informative References..................................28 11.2. Informative References..................................28
12. Acknowledgments..............................................29 12. Acknowledgments..............................................29
Appendix A. Perspective..........................................30 Appendix A. Perspective..........................................30
1. Introduction 1. Introduction
As a path vector protocol, BGP propagates reachability serially. As a path vector protocol, BGP propagates reachability serially.
Hence BGP convergence speed is limited by the time taken to Hence BGP convergence speed is limited by the time taken to
serially propagate reachability information from the point of serially propagate reachability information from the point of
failure to the device that must re-converge. BGP speakers exchange failure to the device that must re-converge. BGP speakers exchange
reachability information about prefixes[2][3] and, for labeled reachability information about prefixes[1][2] and, for labeled
address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an
edge router assigns local labels to prefixes and associates the edge router assigns local labels to prefixes and associates the
local label with each advertised prefix such as L3VPN [8], 6PE local label with each advertised prefix such as L3VPN [7], 6PE
[9], and Softwire [7] using BGP label unicast technique[4]. A BGP [8], and Softwire [6] using BGP label unicast technique[3]. A BGP
speaker then applies the path selection steps to choose the best speaker then applies the path selection steps to choose the best
path. In modern networks, it is not uncommon to have a prefix path. In modern networks, it is not uncommon to have a prefix
reachable via multiple edge routers. In addition to proprietary reachable via multiple edge routers. In addition to proprietary
techniques, multiple techniques have been proposed to allow for techniques, multiple techniques have been proposed to allow for
BGP to advertise more than one path for a given prefix BGP to advertise more than one path for a given prefix
[6][11][12], whether in the form of equal cost multipath or [5][10][11], whether in the form of equal cost multipath or
primary-backup. Another common and widely deployed scenario is primary-backup. Another common and widely deployed scenario is
L3VPN with multi-homed VPN sites with unique Route Distinguisher. L3VPN with multi-homed VPN sites with unique Route Distinguisher.
It is advantageous to utilize the commonality among paths used by It is advantageous to utilize the commonality among paths used by
NLRIs to significantly improve convergence in case of topology NLRIs to significantly improve convergence in case of topology
modifications. modifications.
This document proposes a hierarchical and shared forwarding chain This document proposes a hierarchical and shared forwarding chain
organization that allows traffic to be restored to pre-calculated organization that allows traffic to be restored to pre-calculated
alternative equal cost primary path or backup path in a time alternative equal cost primary path or backup path in a time
period that does not depend on the number of BGP prefixes. The period that does not depend on the number of BGP prefixes. The
technique relies on internal router behavior that is completely technique relies on internal router behavior that is completely
transparent to the operator and can be incrementally deployed and transparent to the operator and can be incrementally deployed and
enabled with zero operator intervention. enabled with zero operator intervention.
1.1. Conventions used in this document 1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC-2119
[1].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to
be interpreted as carrying RFC-2119 significance.
1.2. Terminology
This section defines the terms used in this document. For ease of This section defines the terms used in this document. For ease of
use, we will use terms similar to those used by L3VPN [8] use, we will use terms similar to those used by L3VPN [7]
o BGP prefix: A prefix P/m (of any AFI/SAFI) that a BGP speaker o BGP prefix: A prefix P/m (of any AFI/SAFI) that a BGP speaker
has a path for. has a path for.
o IGP prefix: A prefix P/m (of any AFI/SAFI) that is learnt via o IGP prefix: A prefix P/m (of any AFI/SAFI) that is learnt via
an Interior Gateway Protocol, such as OSPF or IS-IS. The prefix an Interior Gateway Protocol, such as OSPF or IS-IS. The prefix
may be learnt directly through the IGP or redistributed from may be learnt directly through the IGP or redistributed from
other protocol(s). other protocol(s).
o CE: An external router through which an egress PE can reach a o CE: An external router through which an egress PE can reach a
skipping to change at page 5, line 23 skipping to change at page 5, line 8
o Backup path: A recursive or non-recursive path that can be used o Backup path: A recursive or non-recursive path that can be used
only after some or all primary paths become unreachable only after some or all primary paths become unreachable
o Leaf: A container data structure for a prefix or local label. o Leaf: A container data structure for a prefix or local label.
Alternatively, it is the data structure that contains prefix Alternatively, it is the data structure that contains prefix
specific information. specific information.
o IP leaf: The leaf corresponding to an IPv4 or IPv6 prefix o IP leaf: The leaf corresponding to an IPv4 or IPv6 prefix
o Label leaf: The leaf corresponding to a locally allocated label o Label leaf: The leaf corresponding to a locally allocated label
such as the VPN label on an egress PE [8]. such as the VPN label on an egress PE [7].
o Pathlist: An array of paths used by one or more prefixes to forward o Pathlist: An array of paths used by one or more prefixes to forward
traffic to destination(s) covered by an IP prefix. Each path in traffic to destination(s) covered by an IP prefix. Each path in
the pathlist carries its "path-index" that identifies its the pathlist carries its "path-index" that identifies its
position in the array of paths. In general, the value of the position in the array of paths. In general, the value of the
"path-index" stored in a path does not necessarily equal the "path-index" stored in a path does not necessarily equal the
location of the path in the pathlist. For example, location of the path in the pathlist. For example,
the 3rd path may carry a path-index value of 1. the 3rd path may carry a path-index value of 1.
o A pathlist may contain a mix of primary and backup paths o A pathlist may contain a mix of primary and backup paths
skipping to change at page 7, line 25 skipping to change at page 7, line 16
When the primary BGP next-hop fails, BGP PIC depends on the When the primary BGP next-hop fails, BGP PIC depends on the
availability of a pre-computed and pre-installed secondary BGP next- availability of a pre-computed and pre-installed secondary BGP next-
hop in the BGP Pathlist. hop in the BGP Pathlist.
The existence of a secondary next-hop is clear for the following The existence of a secondary next-hop is clear for the following
reason: a service caring for network availability will require two reason: a service caring for network availability will require two
disjoint network connections hence two BGP next-hops. disjoint network connections hence two BGP next-hops.
The BGP distribution of the secondary next-hop is available thanks The BGP distribution of the secondary next-hop is available thanks
to the following BGP mechanisms: Add-Path [11], BGP Best-External to the following BGP mechanisms: Add-Path [10], BGP Best-External
[6], diverse path [12], and the frequent use in VPN deployments of [5], diverse path [11], and the frequent use in VPN deployments of
different VPN RD's per PE. It is noteworthy to mention that the different VPN RD's per PE. It is noteworthy to mention that the
availability of another BGP path does not mean that all failure availability of another BGP path does not mean that all failure
scenarios can be covered by simply forwarding traffic to the scenarios can be covered by simply forwarding traffic to the
available secondary path. The discussion of how to cover various available secondary path. The discussion of how to cover various
failure scenarios is beyond the scope of this document failure scenarios is beyond the scope of this document
2.2. BGP-PIC Illustration 2.2. BGP-PIC Illustration
To illustrate the two pillars above as well as the platform To illustrate the two pillars above as well as the platform
dependency, we will use an example of a simple multihomed L3VPN [8] dependency, we will use an example of a simple multihomed L3VPN [7]
prefix in a BGP-free core running LDP [5] or segment routing over prefix in a BGP-free core running LDP [4] or segment routing over
MPLS forwarding plane [14]. MPLS forwarding plane [13].
+--------------------------------+ +--------------------------------+
| | | |
| ePE2 (IGP-IP1 192.0.2.1, Loopback) | ePE2 (IGP-IP1 192.0.2.1, Loopback)
| | \ | | \
| | \ | | \
| | \ | | \
iPE | CE....VRF "Blue", ASnum 65000 iPE | CE....VRF "Blue", ASnum 65000
| | / (VPN-IP1 11.1.1.0/24) | | / (VPN-IP1 198.51.100.0/24)
| | / (VPN-IP2 11.1.2.0/24) | | / (VPN-IP2 203.0.113.0/24)
| LDP/Segment-Routing Core | / | LDP/Segment-Routing Core | /
| ePE1 (IGP-IP2 192.0.2.2, Loopback) | ePE1 (IGP-IP2 192.0.2.2, Loopback)
| | | |
+--------------------------------+ +--------------------------------+
Figure 1 VPN prefix reachable via multiple PEs Figure 1 VPN prefix reachable via multiple PEs
Referring to Figure 1, suppose the iPE (the ingress PE) receives Referring to Figure 1, suppose the iPE (the ingress PE) receives
NLRIs for the VPN prefixes VPN-IP1 and VPN-IP2 from two egress PEs, NLRIs for the VPN prefixes VPN-IP1 and VPN-IP2 from two egress PEs,
ePE1 and ePE2 with next-hop BGP-NH1 and BGP-NH2, respectively. ePE1 and ePE2 with next-hop BGP-NH1 and BGP-NH2, respectively.
Assume that ePE1 advertises the VPN labels VPN-L11 and VPN-L12 while Assume that ePE1 advertises the VPN labels VPN-L11 and VPN-L12 while
ePE2 advertises the VPN labels VPN-L21 and VPN-L22 for VPN-IP1 and ePE2 advertises the VPN labels VPN-L21 and VPN-L22 for VPN-IP1 and
VPN-IP2, respectively. Suppose that BGP-NH1 and BGP-NH2 are resolved VPN-IP2, respectively. Suppose that BGP-NH1 and BGP-NH2 are resolved
via the IGP prefixes IGP-IP1 and IGP-IP2, where each happens to have 2 via the IGP prefixes IGP-IP1 and IGP-IP2, where each happens to have 2
ECMP paths with IGP-NH1 and IGP-NH2 reachable via the interfaces I1 ECMP paths with IGP-NH1 and IGP-NH2 reachable via the interfaces I1
and I2, respectively. Suppose that local labels (whether LDP [5] or and I2, respectively. Suppose that local labels (whether LDP [4] or
segment routing [14]) on the downstream LSRs for IGP-IP1 are IGP-L11 segment routing [13]) on the downstream LSRs for IGP-IP1 are IGP-L11
and IGP-L12 while for IGP-IP2 are IGP-L21 and IGP-L22. As such, the and IGP-L12 while for IGP-IP2 are IGP-L21 and IGP-L22. As such, the
routing table at iPE is as follows: routing table at iPE is as follows:
65000:11.1.1.0/24 65000: 198.51.100.0/24
via ePE1 (192.0.2.1), VPN Label: VPN-L11 via ePE1 (192.0.2.1), VPN Label: VPN-L11
via ePE2 (192.0.2.2), VPN Label: VPN-L21 via ePE2 (192.0.2.2), VPN Label: VPN-L21
65000:11.1.2.0/24 65000: 203.0.113.0/24
via ePE1 (192.0.2.1), VPN Label: VPN-L12 via ePE1 (192.0.2.1), VPN Label: VPN-L12
via ePE2 (192.0.2.2), VPN Label: VPN-L22 via ePE2 (192.0.2.2), VPN Label: VPN-L22
192.0.2.1/32 192.0.2.1/32
via Core, Label: IGP-L11 via Core, Label: IGP-L11
via Core, Label: IGP-L12 via Core, Label: IGP-L12
192.0.2.2/32 192.0.2.2/32
via Core, Label: IGP-L21 via Core, Label: IGP-L21
via Core, Label: IGP-L22 via Core, Label: IGP-L22
skipping to change at page 9, line 42 skipping to change at page 9, line 32
3. Constructing the Shared Hierarchical Forwarding Chain 3. Constructing the Shared Hierarchical Forwarding Chain
Constructing the forwarding chain is an application of the two Constructing the forwarding chain is an application of the two
pillars described in Section 2. This section describes how to pillars described in Section 2. This section describes how to
construct the forwarding chain in hierarchical shared manner construct the forwarding chain in hierarchical shared manner
3.1. Constructing the BGP-PIC forwarding Chain 3.1. Constructing the BGP-PIC forwarding Chain
The whole process starts when BGP downloads a prefix to FIB. The The whole process starts when BGP downloads a prefix to FIB. The
prefix contains one or more outgoing paths. For certain labeled prefix contains one or more outgoing paths. For certain labeled
prefixes, such as VPN [8] prefixes, each path may be associated with prefixes, such as VPN [7] prefixes, each path may be associated with
an outgoing label and the prefix itself may be assigned a local an outgoing label and the prefix itself may be assigned a local
label. The list of outgoing paths defines a pathlist. If such label. The list of outgoing paths defines a pathlist. If such
pathlist does not already exist, then FIB creates a new pathlist, pathlist does not already exist, then FIB creates a new pathlist,
otherwise the existing pathlist is used. The BGP prefix is added as otherwise the existing pathlist is used. The BGP prefix is added as
a dependent of the pathlist. a dependent of the pathlist.
The previous step constructs the upper part of the hierarchical The previous step constructs the upper part of the hierarchical
forwarding chain. The forwarding chain is completed by resolving the forwarding chain. The forwarding chain is completed by resolving the
paths of the pathlist. A BGP path usually consists of a next-hop. paths of the pathlist. A BGP path usually consists of a next-hop.
The next-hop is resolved by finding a matching IGP prefix. The next-hop is resolved by finding a matching IGP prefix.
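The pathlist sharing described above can be sketched as follows. This
is a minimal illustration under stated assumptions, not the behavior of
any particular FIB implementation: the names Fib, Pathlist, and
download_prefix are hypothetical, a pathlist is keyed simply by its
ordered tuple of BGP next-hops, and the per-prefix labels are kept in
an OutLabel-List on the leaf rather than in the shared pathlist.

```python
from dataclasses import dataclass, field


@dataclass
class Pathlist:
    """A shared array of paths (here: BGP next-hop addresses)."""
    paths: tuple
    dependents: set = field(default_factory=set)


class Fib:
    """Hypothetical FIB that shares one pathlist among prefixes
    whose outgoing paths are identical."""

    def __init__(self):
        self.pathlists = {}   # tuple of next-hops -> Pathlist
        self.leaves = {}      # prefix -> (Pathlist, OutLabel-List)

    def download_prefix(self, prefix, paths):
        """paths: list of (next_hop, out_label) as downloaded by BGP."""
        key = tuple(next_hop for next_hop, _ in paths)
        pathlist = self.pathlists.get(key)
        if pathlist is None:
            # No such pathlist yet: create it.
            pathlist = Pathlist(paths=key)
            self.pathlists[key] = pathlist
        # The BGP prefix becomes a dependent of the (possibly shared) pathlist.
        pathlist.dependents.add(prefix)
        self.leaves[prefix] = (pathlist, [label for _, label in paths])
        return pathlist


fib = Fib()
pl1 = fib.download_prefix(
    "198.51.100.0/24", [("192.0.2.1", "VPN-L11"), ("192.0.2.2", "VPN-L21")])
pl2 = fib.download_prefix(
    "203.0.113.0/24", [("192.0.2.1", "VPN-L12"), ("192.0.2.2", "VPN-L22")])
```

Both prefixes end up as dependents of the same Pathlist object, while
each leaf keeps its own list of VPN labels.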
skipping to change at page 11, line 40 skipping to change at page 11, line 31
3. Pick the outgoing path "Pi" from the list of resolved paths in 3. Pick the outgoing path "Pi" from the list of resolved paths in
the pathlist. The method by which the outgoing path is picked is the pathlist. The method by which the outgoing path is picked is
beyond the scope of this document (e.g. flow-preserving hash beyond the scope of this document (e.g. flow-preserving hash
exploiting entropy within the MPLS stack and IP header). Let the exploiting entropy within the MPLS stack and IP header). Let the
"path-index" of the outgoing path "Pi" be "j". "path-index" of the outgoing path "Pi" be "j".
4. If the prefix is labeled, use the "path-index" "j" to retrieve 4. If the prefix is labeled, use the "path-index" "j" to retrieve
the jth label "Lj" stored in the jth entry of the OutLabel-List and the jth label "Lj" stored in the jth entry of the OutLabel-List and
apply the label action of the label on the packet (e.g. for VPN apply the label action of the label on the packet (e.g. for VPN
label on the ingress PE, the label action is "push"). As label on the ingress PE, the label action is "push"). As
mentioned in Section 1.2, the value of the "path-index" stored mentioned in Section 1.1 the value of the "path-index" stored
in path may not necessarily be the same value of the location of in path may not necessarily be the same value of the location of
the path in the pathlist. the path in the pathlist.
5. Move to the parent of the chosen path "Pi" 5. Move to the parent of the chosen path "Pi"
6. If the chosen path "Pi" is recursive, move to its parent prefix 6. If the chosen path "Pi" is recursive, move to its parent prefix
and go to step 2 and go to step 2
7. If the chosen path is non-recursive move to its parent adjacency. 7. If the chosen path is non-recursive move to its parent adjacency.
Otherwise go to the next step. Otherwise go to the next step.
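Steps 3 through 7 above can be sketched as a loop over the forwarding
chain. This is an illustrative sketch only: the Path and Leaf classes,
the forward function, and the pick callback (standing in for the
path-selection method that is out of scope here) are assumptions, and
the steps that fall outside this excerpt are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    path_index: int                        # index into the leaf's OutLabel-List
    parent_leaf: Optional["Leaf"] = None   # set when the path is recursive
    adjacency: Optional[str] = None        # set when the path is non-recursive


@dataclass
class Leaf:
    name: str
    pathlist: List[Path]
    out_labels: Optional[List[str]] = None  # None when the prefix is unlabeled


def forward(leaf, pick):
    """Walk the chain from `leaf`; return (labels pushed, adjacency)."""
    pushed = []
    while True:
        path = pick(leaf.pathlist)                 # step 3: choose Pi
        if leaf.out_labels is not None:            # step 4: label Lj, j = path-index
            pushed.append(leaf.out_labels[path.path_index])
        if path.parent_leaf is not None:           # steps 5-6: recursive path
            leaf = path.parent_leaf
            continue
        return pushed, path.adjacency              # step 7: non-recursive path


igp1 = Leaf("192.0.2.1/32",
            [Path(0, adjacency="I1"), Path(1, adjacency="I2")],
            ["IGP-L11", "IGP-L12"])
igp2 = Leaf("192.0.2.2/32",
            [Path(0, adjacency="I1"), Path(1, adjacency="I2")],
            ["IGP-L21", "IGP-L22"])
vpn = Leaf("VPN-IP1",
           [Path(0, parent_leaf=igp1), Path(1, parent_leaf=igp2)],
           ["VPN-L11", "VPN-L21"])

# Always pick the second path at every level, purely for illustration.
labels, out_if = forward(vpn, lambda pathlist: pathlist[1])
```

In this run the VPN label is pushed first and the IGP label last, so
the IGP label ends up on top of the stack.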
skipping to change at page 14, line 34 skipping to change at page 14, line 22
5. If there is an OutLabel-list associated with the pathlist, then 5. If there is an OutLabel-list associated with the pathlist, then
if the path "Pi" is chosen by the hashing algorithm, retrieve the if the path "Pi" is chosen by the hashing algorithm, retrieve the
label at location "i" in that OutLabel-list and apply the label label at location "i" in that OutLabel-list and apply the label
action of that label on the packet action of that label on the packet
In the next subsection, we apply the steps in this subsection to a In the next subsection, we apply the steps in this subsection to a
sample scenario. sample scenario.
5.2. Example: Flattening a forwarding chain 5.2. Example: Flattening a forwarding chain
This example uses a case of inter-AS option C [8] where there are 3 This example uses a case of inter-AS option C [7] where there are 3
levels of hierarchy. Figure 4 illustrates the sample topology. To levels of hierarchy. Figure 4 illustrates the sample topology. To
force 3 levels of hierarchy, the ASBRs on the ingress domain (domain force 3 levels of hierarchy, the ASBRs on the ingress domain (domain
1) advertise the core routers of the egress domain (domain 2) to the 1) advertise the core routers of the egress domain (domain 2) to the
ingress PE (iPE) via BGP-LU [4] instead of redistributing them into ingress PE (iPE) via BGP-LU [3] instead of redistributing them into
the IGP of domain 1. The end result is that the ingress PE (iPE) has the IGP of domain 1. The end result is that the ingress PE (iPE) has
2 levels of recursion for the VPN prefixes VPN-IP1 and VPN-IP2. 2 levels of recursion for the VPN prefixes VPN-IP1 and VPN-IP2.
Domain 1 Domain 2 Domain 1 Domain 2
+-------------+ +-------------+ +-------------+ +-------------+
| | | | | | | |
| LDP/SR Core | | LDP/SR core | | LDP/SR Core | | LDP/SR core |
| | | | | | | |
| (192.0.1.1) | | | (192.0.2.4) | |
| ASBR11---------ASBR21........ePE1(192.0.2.1) | ASBR11-------ASBR21........ePE1(192.0.2.1)
| | \ / | . . |\ | | \ / | . . |\
| | \ / | . . | \ | | \ / | . . | \
| | \ / | . . | \ | | \ / | . . | \
| | \/ | .. | \VPN-IP1 (11.1.1.0/24) | | \/ | .. | \VPN-IP1(198.51.100.0/24)
| | /\ | . . | /VRF "Blue" ASn: 65000 | | /\ | . . | /VRF "Blue" ASn: 65000
| | / \ | . . | / | | / \ | . . | /
| | / \ | . . | / | | / \ | . . | /
| | / \ | . . |/ | | / \ | . . |/
iPE ASBR12---------ASBR22........ePE2 (192.0.2.2) iPE ASBR12-------ASBR22........ePE2 (192.0.2.2)
| (192.0.1.2) | |\ | (192.0.2.5) | |\
| | | | \ | | | | \
| | | | \ | | | | \
| | | | \VRF "Blue" ASn: 65000 | | | | \VRF "Blue" ASn: 65000
| | | | /VPN-IP2 (11.1.2.0/24) | | | | /VPN-IP2(203.0.113.0/24)
| | | | / | | | | /
| | | | / | | | | /
| | | |/ | | | |/
| ASBR13---------ASBR23........ePE3(192.0.2.3) | ASBR13-------ASBR23........ePE3(192.0.2.3)
| (192.0.1.3) | | | (192.0.2.6) | |
| | | | | | | |
| | | | | | | |
+-------------+ +-------------+ +-------------+ +-------------+
<============ <========= <============ <=========== <========= <============
Advertise ePEx Advertise Redistribute Advertise ePEx Advertise Redistribute
Using iBGP-LU ePEx Using IGP into Using iBGP-LU ePEx Using IGP into
eBGP-LU BGP eBGP-LU BGP
Figure 4 : Sample 3-level hierarchy topology Figure 4 : Sample 3-level hierarchy topology
We will make the following assumptions about connectivity We will make the following assumptions about connectivity
o In "domain 2", both ASBR21 and ASBR22 can reach both ePE1 and o In "domain 2", both ASBR21 and ASBR22 can reach both ePE1 and
ePE2 using the same distance ePE2 using the same distance
o In "domain 2", only ASBR23 can reach ePE3 o In "domain 2", only ASBR23 can reach ePE3
o In "domain 1", iPE (the ingress PE) can reach ASBR11, ASBR12, and o In "domain 1", iPE (the ingress PE) can reach ASBR11, ASBR12, and
ASBR13 via IGP using the same distance. ASBR13 via IGP using the same distance.
We will make the following assumptions about the labels We will make the following assumptions about the labels
o The VPN labels advertised by ePE1 and ePE2 for prefix VPN-IP1 are o The VPN labels advertised by ePE1 and ePE2 for prefix VPN-IP1 are
VPN-L11 and VPN-L21, respectively VPN-L11 and VPN-L21, respectively
o The VPN labels advertised by ePE2 and ePE3 for prefix VPN-IP2 are o The VPN labels advertised by ePE2 and ePE3 for prefix VPN-IP2 are
VPN-L22 and VPN-L32, respectively VPN-L22 and VPN-L32, respectively
o The labels advertised by ASBR11 to iPE using BGP-LU [4] for the o The labels advertised by ASBR11 to iPE using BGP-LU [3] for the
egress PEs ePE1 and ePE2 are LASBR11(ePE1) and LASBR11(ePE2), egress PEs ePE1 and ePE2 are LASBR11(ePE1) and LASBR11(ePE2),
respectively. respectively.
o The labels advertised by ASBR12 to iPE using BGP-LU [4] for the o The labels advertised by ASBR12 to iPE using BGP-LU [3] for the
egress PEs ePE1 and ePE2 are LASBR12(ePE1) and LASBR12(ePE2), egress PEs ePE1 and ePE2 are LASBR12(ePE1) and LASBR12(ePE2),
respectively respectively
o The label advertised by ASBR11 to iPE using BGP-LU [4] for the o The label advertised by ASBR13 to iPE using BGP-LU [3] for the
egress PE ePE3 is LASBR13(ePE3) egress PE ePE3 is LASBR13(ePE3)
o The IGP labels advertised by the next hops directly connected to o The IGP labels advertised by the next hops directly connected to
iPE towards ASBR11, ASBR12, and ASBR13 in the core of domain 1 iPE towards ASBR11, ASBR12, and ASBR13 in the core of domain 1
are IGP-L11, IGP-L12, and IGP-L13, respectively. are IGP-L11, IGP-L12, and IGP-L13, respectively.
Based on these connectivity assumptions and the topology in Figure Based on these connectivity assumptions and the topology in Figure
4, the routing table on iPE is 4, the routing table on iPE is
65000:11.1.1.0/24 65000: 198.51.100.0/24
via ePE1 (192.0.2.1), VPN Label: VPN-L11 via ePE1 (192.0.2.1), VPN Label: VPN-L11
via ePE2 (192.0.2.2), VPN Label: VPN-L21 via ePE2 (192.0.2.2), VPN Label: VPN-L21
65000:11.1.2.0/24 65000: 203.0.113.0/24
via ePE2 (192.0.2.2), VPN Label: VPN-L22 via ePE2 (192.0.2.2), VPN Label: VPN-L22
via ePE3 (192.0.2.3), VPN Label: VPN-L23 via ePE3 (192.0.2.3), VPN Label: VPN-L32
192.0.2.1/32 (ePE1) 192.0.2.1/32 (ePE1)
Via ASBR11, BGP-LU Label: LASBR11(ePE1) Via ASBR11, BGP-LU Label: LASBR11(ePE1)
Via ASBR12, BGP-LU Label: LASBR12(ePE1) Via ASBR12, BGP-LU Label: LASBR12(ePE1)
192.0.2.2/32 (ePE2) 192.0.2.2/32 (ePE2)
Via ASBR11, BGP-LU Label: LASBR11(ePE2) Via ASBR11, BGP-LU Label: LASBR11(ePE2)
Via ASBR12, BGP-LU Label: LASBR12(ePE2) Via ASBR12, BGP-LU Label: LASBR12(ePE2)
192.0.2.3/32 (ePE3) 192.0.2.3/32 (ePE3)
Via ASBR13, BGP-LU Label: LASBR13(ePE3) Via ASBR13, BGP-LU Label: LASBR13(ePE3)
192.0.1.1/32 (ASBR11) 192.0.2.4/32 (ASBR11)
via Core, Label: IGP-L11 via Core, Label: IGP-L11
192.0.1.2/32 (ASBR12) 192.0.2.5/32 (ASBR12)
via Core, Label: IGP-L12 via Core, Label: IGP-L12
192.0.1.3/32 (ASBR13) 192.0.2.6/32 (ASBR13)
via Core, Label: IGP-L13 via Core, Label: IGP-L13
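To make the two levels of recursion concrete, the sketch below
transcribes the entries for VPN-IP1 from the routing table above (using
the addresses in the -07 column) and enumerates the label stacks that
iPE could impose. The dictionary layout and the label_stacks function
are assumptions made for illustration; stacks are listed top label
first.

```python
# VPN prefix -> list of (BGP next-hop, VPN label)
VPN_ROUTES = {
    "198.51.100.0/24": [("192.0.2.1", "VPN-L11"), ("192.0.2.2", "VPN-L21")],
}

# Egress PE loopback -> list of (ASBR next-hop, BGP-LU label)
BGP_LU_ROUTES = {
    "192.0.2.1": [("192.0.2.4", "LASBR11(ePE1)"), ("192.0.2.5", "LASBR12(ePE1)")],
    "192.0.2.2": [("192.0.2.4", "LASBR11(ePE2)"), ("192.0.2.5", "LASBR12(ePE2)")],
    "192.0.2.3": [("192.0.2.6", "LASBR13(ePE3)")],
}

# ASBR loopback -> IGP label towards it in the core of domain 1
IGP_ROUTES = {
    "192.0.2.4": "IGP-L11",
    "192.0.2.5": "IGP-L12",
    "192.0.2.6": "IGP-L13",
}


def label_stacks(vpn_prefix):
    """Return every [IGP, BGP-LU, VPN] label stack for a VPN prefix."""
    stacks = []
    for epe, vpn_label in VPN_ROUTES[vpn_prefix]:        # first recursion
        for asbr, lu_label in BGP_LU_ROUTES[epe]:        # second recursion
            stacks.append([IGP_ROUTES[asbr], lu_label, vpn_label])
    return stacks


stacks = label_stacks("198.51.100.0/24")
```

VPN-IP1 has two egress PEs, each reachable via two ASBRs, so four
three-label stacks result.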
The diagram in Figure 5 illustrates the forwarding chain in iPE The diagram in Figure 5 illustrates the forwarding chain in iPE
assuming that the forwarding hardware in iPE supports 3 levels of assuming that the forwarding hardware in iPE supports 3 levels of
hierarchy. The leaves corresponding to the ASBRs on domain 1 hierarchy. The leaves corresponding to the ASBRs on domain 1
(ASBR11, ASBR12, and ASBR13) are at the bottom of the hierarchy. (ASBR11, ASBR12, and ASBR13) are at the bottom of the hierarchy.
There are a few important points: There are a few important points:
o Because the hardware supports the required depth of hierarchy, o Because the hardware supports the required depth of hierarchy,
the size of a pathlist equals the size of the label list the size of a pathlist equals the size of the label list
skipping to change at page 22, line 36 skipping to change at page 22, line 36
leaves to adjust the OutLabel-Lists because FIB can rely on the leaves to adjust the OutLabel-Lists because FIB can rely on the
path-index stored in the useable paths in the pathlist to pick the path-index stored in the useable paths in the pathlist to pick the
right label. right label.
It is worth noting that the FIB manager modifies the forwarding It is worth noting that the FIB manager modifies the forwarding
chain starting from the IGP leaves only; BGP pathlists and leaves chain starting from the IGP leaves only; BGP pathlists and leaves
are not modified. Hence traffic restoration occurs within are not modified. Hence traffic restoration occurs within
the time frame of IGP convergence, and, for local link failure, the time frame of IGP convergence, and, for local link failure,
assuming a backup path has been precomputed, within the timeframe of assuming a backup path has been precomputed, within the timeframe of
local detection (e.g. 50ms). Examples of solutions that local detection (e.g. 50ms). Examples of solutions that
pre-compute backup paths are IP FRR [16], remote LFA [17], Ti-LFA [15], pre-compute backup paths are IP FRR [15], remote LFA [16], Ti-LFA [14],
MRT [18], and an eBGP path having a backup path [10]. MRT [17], and an eBGP path having a backup path [9].
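The prefix-independent property described above can be illustrated
with a small sketch. It assumes a simplified chain in which every BGP
leaf reaches a single shared IGP leaf; the class names, the
remove_igp_path function, and the "prefix-N" names are hypothetical.

```python
class IgpLeaf:
    """IGP leaf for a BGP next-hop, with its usable IGP paths."""

    def __init__(self, prefix, paths):
        self.prefix = prefix
        self.paths = list(paths)


class BgpLeaf:
    """BGP leaf that resolves through a shared IGP leaf."""

    def __init__(self, prefix, next_hop_leaf):
        self.prefix = prefix
        self.next_hop_leaf = next_hop_leaf

    def usable_paths(self):
        # Forwarding state is read through the shared IGP leaf.
        return self.next_hop_leaf.paths


def remove_igp_path(igp_leaf, failed_path):
    """Drop a failed IGP path; return the number of objects rewritten."""
    igp_leaf.paths = [p for p in igp_leaf.paths if p != failed_path]
    return 1  # only the IGP leaf is touched, never the BGP leaves


next_hop = IgpLeaf("192.0.2.1/32", ["IGP-NH1", "IGP-NH2"])
bgp_leaves = [BgpLeaf(f"prefix-{i}", next_hop) for i in range(100_000)]

touched = remove_igp_path(next_hop, "IGP-NH1")
```

After the single update, every one of the dependent BGP leaves sees the
remaining IGP path, and the amount of work does not grow with the
number of BGP prefixes.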
Let's apply the procedure mentioned in this subsection to the Let's apply the procedure mentioned in this subsection to the
forwarding chain depicted in Figure 2. Suppose a remote link failure forwarding chain depicted in Figure 2. Suppose a remote link failure
occurs and impacts the first ECMP IGP path to the remote BGP next- occurs and impacts the first ECMP IGP path to the remote BGP next-
hop. Upon IGP convergence, the IGP pathlist used by the BGP next-hop hop. Upon IGP convergence, the IGP pathlist used by the BGP next-hop
is updated to reflect the new topology (one path instead of two). As is updated to reflect the new topology (one path instead of two). As
soon as the IGP convergence is effective for the BGP next-hop entry, soon as the IGP convergence is effective for the BGP next-hop entry,
the new forwarding state is immediately available to all dependent the new forwarding state is immediately available to all dependent
BGP prefixes. The same behavior would occur if the failure was local BGP prefixes. The same behavior would occur if the failure was local
such as an interface going down. As soon as the IGP convergence is such as an interface going down. As soon as the IGP convergence is
skipping to change at page 24, line 34 skipping to change at page 24, line 34
failed link as the BGP next-hop, the edge router will still perform failed link as the BGP next-hop, the edge router will still perform
the previous steps. But, unlike the case of next-hop self, the IGP on the previous steps. But, unlike the case of next-hop self, the IGP on
the failed edge node informs the rest of the iBGP peers that the IP address the failed edge node informs the rest of the iBGP peers that the IP address
of the failed link is no longer reachable. Hence the FIB manager on of the failed link is no longer reachable. Hence the FIB manager on
iBGP peers will delete the IGP leaf corresponding to the IP prefix iBGP peers will delete the IGP leaf corresponding to the IP prefix
of the failed link. The behavior of the iBGP peers will be identical of the failed link. The behavior of the iBGP peers will be identical
to the case of edge node failure outlined in Section 6.2.1. to the case of edge node failure outlined in Section 6.2.1.
It is noteworthy that because the edge link failure is It is noteworthy that because the edge link failure is
local to the edge router, sub-50 msec convergence can be achieved as local to the edge router, sub-50 msec convergence can be achieved as
described in [10]. described in [9].
Let's try to apply the case of next-hop self to the forwarding chain Let's try to apply the case of next-hop self to the forwarding chain
depicted in Figure 3. After failure of the link between ePE1 and CE, depicted in Figure 3. After failure of the link between ePE1 and CE,
the forwarding engine will route traffic arriving from the core the forwarding engine will route traffic arriving from the core
towards VPN-NH2 with path-index=1. A packet arriving from the core towards VPN-NH2 with path-index=1. A packet arriving from the core
will contain the label VPN-L11 at top. The label VPN-L11 is swapped will contain the label VPN-L11 at top. The label VPN-L11 is swapped
with the label VPN-L21 and the packet is forwarded towards ePE2. with the label VPN-L21 and the packet is forwarded towards ePE2.
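The next-hop-self walk-through above can be sketched in a few lines of Python. This is only an illustrative model, not the draft's data structures: the class names `Path` and `LabelLeaf`, the `forward` method, and the use of `None` to mean "pop the label and deliver to the CE" on the primary path are all assumptions made for the example. Only the labels VPN-L11 and VPN-L21, the node names, and path-index=1 for the backup come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Path:
    """One entry of the pathlist on ePE1 (hypothetical model)."""
    next_hop: str
    path_index: int
    up: bool = True


@dataclass
class LabelLeaf:
    """Label leaf for incoming label in_label, with one outgoing label per path-index."""
    in_label: str
    pathlist: List[Path]
    # path_index -> outgoing label; None is an assumed stand-in for "pop and deliver"
    out_labels: Dict[int, Optional[str]]

    def forward(self) -> Optional[Tuple[Optional[str], str]]:
        """Return (outgoing label, next hop) for the first usable path."""
        for path in self.pathlist:
            if path.up:
                return self.out_labels[path.path_index], path.next_hop
        return None


primary = Path(next_hop="CE", path_index=0)
backup = Path(next_hop="ePE2", path_index=1)
leaf = LabelLeaf("VPN-L11", [primary, backup], {0: None, 1: "VPN-L21"})

before = leaf.forward()   # primary path towards the CE is in use
primary.up = False        # the ePE1-CE link fails
after = leaf.forward()    # label is swapped and traffic is sent towards ePE2
```

In this sketch the failure is handled by flipping one flag on the primary path; the leaf itself is never rewritten, which mirrors the point that the repair cost does not depend on how many prefixes use the pathlist.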
6.3. Handling Failures for Flattened Forwarding Chains 6.3. Handling Failures for Flattened Forwarding Chains
skipping to change at page 27, line 13 skipping to change at page 27, line 13
some results using actual numbers. some results using actual numbers.
7.3. Automated 7.3. Automated
The BGP PIC solution does not require any operator involvement. The The BGP PIC solution does not require any operator involvement. The
process is entirely automated as part of the FIB implementation. process is entirely automated as part of the FIB implementation.
The salient points enabling this automation are: The salient points enabling this automation are:
o Extension of the BGP Best Path to compute more than one primary o Extension of the BGP Best Path to compute more than one primary
([11] and [12]) or backup BGP next-hop ([6] and [13]). ([10] and [11]) or backup BGP next-hop ([5] and [12]).
o Sharing of BGP Path-list across BGP destinations with same o Sharing of BGP Path-list across BGP destinations with same
primary and backup BGP next-hop primary and backup BGP next-hop
o Hierarchical indirection and dependency between BGP pathlist and o Hierarchical indirection and dependency between BGP pathlist and
IGP pathlist IGP pathlist
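The last two points, path-list sharing and hierarchical indirection, can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions: the classes `IgpPathList`, `BgpPathList`, and `Fib`, the `install` and `resolve` methods, and the next-hop and interface names are all hypothetical and do not describe any particular FIB implementation.

```python
class IgpPathList:
    """IGP-level paths (e.g. outgoing interfaces) towards one BGP next-hop."""

    def __init__(self, paths):
        self.paths = list(paths)


class BgpPathList:
    """Ordered BGP next-hops; each entry points at that next-hop's IGP pathlist."""

    def __init__(self, next_hops, igp_table):
        self.entries = [(nh, igp_table[nh]) for nh in next_hops]


class Fib:
    def __init__(self, igp_table):
        self.igp_table = igp_table
        self.bgp_pathlists = {}   # tuple of next-hops -> shared BgpPathList
        self.leaves = {}          # BGP prefix -> shared BgpPathList

    def install(self, prefix, next_hops):
        """Attach the prefix to the pathlist shared by all prefixes with the same next-hops."""
        key = tuple(next_hops)
        if key not in self.bgp_pathlists:
            self.bgp_pathlists[key] = BgpPathList(key, self.igp_table)
        self.leaves[prefix] = self.bgp_pathlists[key]

    def resolve(self, prefix):
        """Return (BGP next-hop, IGP path) via the first next-hop that is still reachable."""
        for next_hop, igp in self.leaves[prefix].entries:
            if igp.paths:
                return next_hop, igp.paths[0]
        return None


igp_table = {
    "BGP-NH1": IgpPathList(["if-1", "if-2"]),
    "BGP-NH2": IgpPathList(["if-3"]),
}
fib = Fib(igp_table)
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]
for prefix in prefixes:
    fib.install(prefix, ["BGP-NH1", "BGP-NH2"])

# One IGP-level update; no BGP leaf or BGP pathlist is touched.
igp_table["BGP-NH1"].paths.remove("if-1")
```

Because every prefix holds a reference to the same pathlist objects, the single update to the IGP pathlist is visible to all 1000 prefixes at once, which is the prefix-independent behavior the bullets describe.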
7.4. Incremental Deployment 7.4. Incremental Deployment
As soon as one router supports BGP PIC solution, it benefits from As soon as one router supports BGP PIC solution, it benefits from
skipping to change at page 28, line 9 skipping to change at page 28, line 9
structure that allows achieving BGP prefix independent structure that allows achieving BGP prefix independent
convergence, and in the case of locally detected failures, sub-50 convergence, and in the case of locally detected failures, sub-50
msec convergence. A router can construct the forwarding chains in msec convergence. A router can construct the forwarding chains in
a completely transparent manner with zero operator intervention a completely transparent manner with zero operator intervention
thereby supporting smooth and incremental deployment. thereby supporting smooth and incremental deployment.
11. References 11. References
11.1. Normative References 11.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate [1] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol
Requirement Levels", BCP 14, RFC 2119, March 1997.
[2] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol
4 (BGP-4), RFC 4271, January 2006 4 (BGP-4), RFC 4271, January 2006
[3] Bates, T., Chandra, R., Katz, D., and Rekhter Y., [2] Bates, T., Chandra, R., Katz, D., and Rekhter Y.,
"Multiprotocol Extensions for BGP", RFC 4760, January 2007 "Multiprotocol Extensions for BGP", RFC 4760, January 2007
[4] Y. Rekhter and E. Rosen, " Carrying Label Information in BGP- [3] Y. Rekhter and E. Rosen, " Carrying Label Information in BGP-
4", RFC 3107, May 2001 4", RFC 8277, October 2017
[5] Andersson, L., Minei, I., and B. Thomas, "LDP Specification", [4] Andersson, L., Minei, I., and B. Thomas, "LDP Specification",
RFC 5036, October 2007 RFC 5036, October 2007
11.2. Informative References 11.2. Informative References
[6] Marques,P., Fernando, R., Chen, E, Mohapatra, P., Gredler, H., [5] Marques,P., Fernando, R., Chen, E, Mohapatra, P., Gredler, H.,
"Advertisement of the best external route in BGP", draft-ietf- "Advertisement of the best external route in BGP", draft-ietf-
idr-best-external-05.txt, January 2012. idr-best-external-05.txt, January 2012.
[7] Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh [6] Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
Framework", RFC 5565, June 2009. Framework", RFC 5565, June 2009.
[8] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private [7] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006. Networks (VPNs)", RFC 4364, February 2006.
[9] De Clercq, J. , Ooms, D., Prevost, S., Le Faucheur, F., [8] De Clercq, J. , Ooms, D., Prevost, S., Le Faucheur, F.,
"Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider "Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
Edge Routers (6PE)", RFC 4798, February 2007 Edge Routers (6PE)", RFC 4798, February 2007
[10] O. Bonaventure, C. Filsfils, and P. Francois. "Achieving sub- [9] O. Bonaventure, C. Filsfils, and P. Francois. "Achieving sub-
50 milliseconds recovery upon bgp peering link failures, " 50 milliseconds recovery upon bgp peering link failures, "
IEEE/ACM Transactions on Networking, 15(5):1123-1135, 2007 IEEE/ACM Transactions on Networking, 15(5):1123-1135, 2007
[11] D. Walton, A. Retana, E. Chen, J. Scudder, "Advertisement of [10] D. Walton, A. Retana, E. Chen, J. Scudder, "Advertisement of
Multiple Paths in BGP", draft-ietf-idr-add-paths-12.txt, Multiple Paths in BGP", RFC 7911, July 2016
November 2015
[12] R. Raszuk, R. Fernando, K. Patel, D. McPherson, K. Kumaki, [11] R. Raszuk, R. Fernando, K. Patel, D. McPherson, K. Kumaki,
"Distribution of diverse BGP paths", RFC 6774, November 2012 "Distribution of diverse BGP paths", RFC 6774, November 2012
[13] P. Mohapatra, R. Fernando, C. Filsfils, and R. Raszuk, "Fast [12] P. Mohapatra, R. Fernando, C. Filsfils, and R. Raszuk, "Fast
Connectivity Restoration Using BGP Add-path", draft-pmohapat- Connectivity Restoration Using BGP Add-path", draft-pmohapat-
idr-fast-conn-restore-03, Jan 2013 idr-fast-conn-restore-03, Jan 2013
[14] C. Filsfils, S. Previdi, A. Bashandy, B. Decraene, S. [13] A. Bashandy, C. Filsfils, S. Previdi, B. Decraene, S.
Litkowski, M. Horneffer, R. Shakir, J. Tansura, E. Crabbe Litkowski, M. Horneffer, R. Shakir, "Segment Routing with MPLS
"Segment Routing with MPLS data plane", draft-ietf-spring- data plane", draft-ietf-spring-segment-routing-mpls-12 (work
segment-routing-mpls-02 (work in progress), October 2015 in progress), February 2018
[15] C. Filsfils, S. Previdi, A. Bashandy, B. Decraene, " Topology [14] A. Bashandy, C. Filsfils, B. Decraene, P. Francois, " Topology
Independent Fast Reroute using Segment Routing", draft- Independent Fast Reroute using Segment Routing", draft-
francois-spring-segment-routing-ti-lfa-02 (work in progress), bashandy-rtgwg-segment-routing-ti-lfa-02 (work in progress),
August 2015 August 2018
[16] M. Shand and S. Bryant, "IP Fast Reroute Framework", RFC 5714, [15] M. Shand and S. Bryant, "IP Fast Reroute Framework", RFC 5714,
January 2010 January 2010
[17] S. Bryant, C. Filsfils, S. Previdi, M. Shand, N So, " Remote [16] S. Bryant, C. Filsfils, S. Previdi, M. Shand, N So, " Remote
Loop-Free Alternate (LFA) Fast Reroute (FRR)", RFC 7490 April Loop-Free Alternate (LFA) Fast Reroute (FRR)", RFC 7490 April
2015 2015
[18] A. Atlas, C. Bowers, G. Enyedi, " An Architecture for IP/LDP [17] A. Atlas, C. Bowers, G. Enyedi, " An Architecture for IP/LDP
Fast-Reroute Using Maximally Redundant Trees", draft-ietf- Fast-Reroute Using Maximally Redundant Trees", RFC 7812, June
rtgwg-mrt-frr-architecture-10 (work in progress), February
2016 2016
12. Acknowledgments 12. Acknowledgments
Special thanks to Neeraj Malhotra, Yuri Tsier for the valuable Special thanks to Neeraj Malhotra, Yuri Tsier for the valuable
help help
Special thanks to Bruno Decraene for the valuable comments Special thanks to Bruno Decraene for the valuable comments
This document was prepared using 2-Word-v2.0.template.dot. This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses Authors' Addresses
Ahmed Bashandy Ahmed Bashandy
Cisco Systems Cisco Systems
170 West Tasman Dr, San Jose, CA 95134, USA 170 West Tasman Dr, San Jose, CA 95134, USA
Email: bashandy@cisco.com Email: abashandy.ietf@gmail.com
Clarence Filsfils Clarence Filsfils
Cisco Systems Cisco Systems
Brussels, Belgium Brussels, Belgium
Email: cfilsfil@cisco.com Email: cfilsfil@cisco.com
Prodosh Mohapatra Prodosh Mohapatra
Sproute Networks Sproute Networks
Email: mpradosh@yahoo.com Email: mpradosh@yahoo.com
 End of changes. 54 change blocks. 
118 lines changed or deleted 101 lines changed or added
