Network Working Group                                         P. Marques
Internet-Draft
Intended status: Standards Track                             R. Fernando
Expires: October 22, 2011                                        E. Chen
                                                            P. Mohapatra
                                                           Cisco Systems
                                                              H. Gredler
                                                        Juniper Networks
                                                          April 20, 2011

            Advertisement of the best external route in BGP
                    draft-ietf-idr-best-external-04

Abstract

   The current BGP-4 protocol specification [RFC4271] states that the
   selection process chooses the best path for a given route, which is
   added to the Loc-Rib and advertised to all peers.

   Previous versions [RFC1771] of the specification defined a different
   rule for Internal BGP Updates.  Given that Internal paths are not
   readvertised to BGP Internal peers, it was specified that the best of
   the external paths, as determined by the path selection tie breaking
   algorithm, would be advertised to Internal peers.

   This document extends that procedure to operate in environments where
   Route Reflection [RFC4456] or Confederations [RFC5065] are used and
   explains why advertising the additional routing information can
   improve convergence time without causing routing loops.

   Additional benefits include reduction of inter-domain churn and
   avoidance of permanent route oscillation.  The document also generalizes the
   notions of "internal" and "external" so that they can be applied to
   Route Reflector Clusters and Autonomous System Confederations.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 22, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Generalization
   4.  Algorithm for selection of the Adj-RIB-OUT path
   5.  Advertisement Rules
   6.  Consistency between routing and forwarding
   7.  Applications
   8.  Fast Connectivity Restoration
   9.  Inter-Domain Churn Reduction
   10. Reducing Persistent IBGP oscillation
   11. Deployment Considerations
   12. Acknowledgments
   13. IANA Considerations
   14. Security Considerations
   15. References
   Authors' Addresses

1.  Introduction

   Earlier versions of the BGP-4 protocol specification [RFC1771]
   prescribed different route advertisement rules for Internal and
   External peers.  While the overall best path would be advertised to
   External peers, Internal peers would be advertised the best of the
   externally received paths.

   This Internal advertisement rule was never implemented as specified
   and was later dropped from the protocol.  There is a trade-off
   between advertising the "best-external" route and the behavior that
   became common practice, namely not advertising any route when the
   selected best path is received from an Internal peer.  By not
   advertising information in this case it is possible to reduce state
   both in the local BGP speaker and in the network overall.  Early BGP
   implementations were very concerned with reducing state, as they
   were limited to relatively low memory footprints (e.g. 16 MB).
   There is also a possible concern about advertising a path different
   from the path that has been selected for forwarding.

   However, advertising the best external route, when different from
   the best route, presents additional information into an IBGP mesh
   which may be of value for several purposes including:

   o  Faster restoration of connectivity, by providing additional paths
      that may be used to fail over in case the primary path becomes
      invalid or is withdrawn.

   o  Reducing inter-domain churn and traffic black-holing due to the
      readily available alternate path.

   o  Reducing the potential for situations of permanent IBGP route
      oscillation [RFC3345].

   o  Improving selection of lower MED routes from the same neighboring
      AS.

   The document also generalizes the notions of "internal" and
   "external" so that they can be applied to Route Reflector Clusters
   [RFC4456] and Autonomous System Confederations [RFC5065].  More
   specifically, two routers in the same route reflector cluster having
   an IBGP session between them are defined to be "internal" peers,
   whereas two routers in different clusters having an IBGP session are
   defined to be "external" peers.  Similarly, two routers in the same
   member AS of a confederation having an IBGP session between them are
   "internal" peers, whereas two routers in different member ASs of a
   confederation having a confed EBGP session between them are defined
   to be "external" peers.  The definition of "best external route"
   ensues from this definition in that it is the most preferred route
   among those received from the "external" neighbors.

   This document defines procedures to select the best external route
   for each peer.  It also describes how the above benefits are realized
   with best external route announcement, with the help of certain
   scenarios.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Generalization

   The BGP-4 protocol [RFC1771] has been extended with two alternative
   mechanisms that provide ways to reduce the operational complexity of
   route distribution within an AS: Route Reflection [RFC4456] and
   Confederations [RFC5065].  It is important to be able to express
   route advertisement rules in the context of both of these mechanisms.

   When Route Reflection is used, Internal peers are further classified
   depending on the reflection cluster they belong to.  Non-client
   internal peers form one BGP peering mesh.  Each set of RR clients
   with the same "cluster-id" configuration forms a separate mesh.

   When selecting the path to add to the Adj-RIB-OUT, this document
   specifies that the paths that originate from the same mesh MAY be
   excluded from consideration.  This results in an Adj-RIB-OUT
   selection process per mesh (the set of non-client peers or a specific
   cluster).

   Similarly, when BGP Confederations are used, each confederated AS is
   a BGP mesh.  As with the Route Reflection scenario, when selecting
   the path to add to the Adj-RIB-OUT, routes from the same mesh MAY be
   excluded.
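
   The mesh classification described in this section can be sketched as
   follows.  This is a minimal, non-normative Python illustration.  The
   names Peer, mesh_of and eligible_paths, the "kind" labels, and the
   tuple used as a mesh identifier are assumptions made for the example
   and do not come from any BGP implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peer:
    name: str
    kind: str                          # "ebgp", "ibgp" or "confed-ebgp" (assumed labels)
    cluster_id: Optional[str] = None   # set only for route reflector clients
    member_as: Optional[int] = None    # set only for confederation EBGP peers


def mesh_of(peer):
    """Return a hashable mesh identifier, or None for a plain external peer."""
    if peer.kind == "ibgp":
        if peer.cluster_id is not None:
            return ("cluster", peer.cluster_id)   # one mesh per cluster-id
        return ("non-client",)                    # all non-client internal peers
    if peer.kind == "confed-ebgp":
        return ("member-as", peer.member_as)      # one mesh per confederated AS
    return None


def eligible_paths(paths, target):
    """Drop the paths learned from the same mesh as the target peer.

    `paths` is a list of (label, source_peer) tuples.
    """
    target_mesh = mesh_of(target)
    if target_mesh is None:
        return list(paths)
    return [p for p in paths if mesh_of(p[1]) != target_mesh]
```

   With one non-client peer, two clients sharing a cluster-id and one
   external peer, a path learned from the first client is excluded when
   building the Adj-RIB-OUT for the second client, while the paths
   learned from the non-client peer and the external peer remain
   eligible.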

4.  Algorithm for selection of the Adj-RIB-OUT path

   The objective of this protocol extension is to improve the quality of
   the routing information known to a particular BGP mesh with minimum
   additional cost in terms of processing and state.

   Towards that goal, it is useful to define a total order between the
   Adj-RIB-In routes which provides both the same best path as the
   algorithm defined in the current BGP-4 specification [RFC4271] and
   an ordering of alternate routes.  Using this total order it is then
   computationally efficient to select the path for a specific
   Adj-RIB-OUT by excluding the routes that have been received from the
   BGP mesh corresponding to the peer (or set of peers).

   In order to achieve this, it is helpful to introduce the concept of
   path group.  A group is the set of paths that compare as equal
   through all the steps prior to the MED comparison step (as defined in
   section 9.1.2.2 of RFC 4271 [RFC4271]) and have been received from
   the same neighbor AS.

   Paths are ordered within a group via MED or subsequent route
   selection rules.  The order of all paths for the same destination is
   then determined by the order of the best path in each group.

   In pseudo-code:

      function compare(path_1, path_2) {
          cmp_result cmp = selection_steps_before_med(path_1, path_2);
          if (cmp != cmp_result.equal) {
              return cmp;
          }
          if (neighbor_as(path_1) == neighbor_as(path_2)) {
              return selection_steps_after_med(path_1, path_2);
          }

          if (is_group_best(path_1)) {
              if (!is_group_best(path_2)) {
                  return cmp_result.greater_than;
              }
              return selection_steps_after_med(path_1, path_2);
          } else {
              if (is_group_best(path_2)) {
                  return cmp_result.less_than;
              }
              /* Compare the best paths of the respective groups. */
              return compare(group_best(path_1), group_best(path_2));
          }
      }

   As an example, the following set of received routes:

                       +------+----+-----+--------+
                       | Path | AS | MED | rtr_id |
                       +------+----+-----+--------+
                       | a    | 1  | 10  | 10     |
                       |      |    |     |        |
                       | b    | 2  | 5   | 1      |
                       |      |    |     |        |
                       | c    | 1  | 5   | 5      |
                       |      |    |     |        |
                       | d    | 2  | 20  | 20     |
                       |      |    |     |        |
                       | e    | 2  | 30  | 30     |
                       |      |    |     |        |
                       | f    | 3  | 10  | 20     |
                       +------+----+-----+--------+

                      Figure 1: Path Attribute Table

   Would yield the following order (from the most to the least
   preferred):

      b < d < e < c < a < f

   In this example, comparison of the best path within each group
   provides the sequence (b < c < f).  The remaining paths are ordered
   in relation to their respective group best.

   The first path in the above ordering is indeed the best overall path
   for a given NLRI.  When selecting a path for a particular Adj-RIB-Out
   (or set of RIB-Outs) an implementation MAY choose to select the first
   path in the global order which was not received from the same BGP
   mesh (as defined above) as the target peer (or peers).
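
   The ordering of the worked example can be reproduced with the short
   Python sketch below.  It is illustrative only and rests on several
   assumptions: all six paths tie on every step before the MED
   comparison, the steps after MED are reduced to "lowest rtr_id wins",
   and the function names total_order and select_for_mesh are
   hypothetical.  The sketch groups and sorts the paths rather than
   transcribing the compare() pseudo-code line by line.

```python
from collections import defaultdict

# (label, neighbor_as, med, rtr_id), copied from the path attribute table.
PATHS = [
    ("a", 1, 10, 10),
    ("b", 2, 5, 1),
    ("c", 1, 5, 5),
    ("d", 2, 20, 20),
    ("e", 2, 30, 30),
    ("f", 3, 10, 20),
]


def total_order(paths):
    """Order paths from most to least preferred, keeping groups together."""
    groups = defaultdict(list)
    for path in paths:
        groups[path[1]].append(path)          # one group per neighbor AS
    for members in groups.values():
        # Within a group: lowest MED first, then lowest rtr_id.
        members.sort(key=lambda p: (p[2], p[3]))
    # Groups are ranked by their best path.  MED is not compared across
    # neighbor ASes, so only rtr_id is used here (an assumption).
    ranked = sorted(groups.values(), key=lambda members: members[0][3])
    return [p for members in ranked for p in members]


def select_for_mesh(paths, excluded_labels):
    """First path in the total order not learned from the target mesh."""
    for path in total_order(paths):
        if path[0] not in excluded_labels:
            return path
    return None


print(" ".join(p[0] for p in total_order(PATHS)))   # b d e c a f
```

   Here excluded_labels stands in for "the paths received from the same
   BGP mesh as the target peer".  If every path from neighbor AS 2 was
   learned from that mesh, select_for_mesh(PATHS, {"b", "d", "e"})
   returns path c.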

5.  Advertisement Rules

   1.  When advertising a route to a non-client Internal peer, a BGP
       speaker MAY choose to select the first path in order that did not
       originate from the same BGP mesh (i.e. the set of non-client
       Internal peers) whenever the best overall path has been received
       from this mesh and would be suppressed by the Internal BGP
       non-readvertisement rule.

   2.  When advertising a route to a Route Reflection client peer, in
       case the overall best path has been received from the same
       cluster, a BGP speaker MUST be able to advertise the best overall
       path to all the members of the cluster other than the originator,
       unless "client-to-client" reflection is disabled.  The
       implementation MAY choose to advertise an alternate path to the
       specific peer that originates the best overall path by excluding
       from consideration all paths with the same originator-id.

   3.  When "client-to-client" reflection is disabled and the cluster is
       operating as a mesh, a Route Reflector MAY opt to advertise to
       the cluster the preferred path from the set of paths not received
       from the cluster.  While this deployment mode is currently
       uncommon, it can be a practical way to guarantee path diversity
       inside the cluster.

   4.  A confederation border router MAY choose to advertise an
       alternate path towards its Internal BGP mesh or towards a
       confederation member AS following the same procedure as defined
       above.
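
   The choice a Route Reflector makes for one client peer can be
   sketched as below.  This is a hedged Python illustration, not a
   normative procedure: the Path record, the function path_for_client
   and its parameters are hypothetical, and the input list is assumed
   to be already sorted from most to least preferred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    label: str
    mesh: Optional[str]                  # mesh the path was learned from, None if external
    originator_id: Optional[str] = None  # originator-id carried by reflected paths


def path_for_client(ordered, cluster, client_id, client_to_client=True):
    """Pick the path to advertise to one client of `cluster`."""
    if not ordered:
        return None
    if client_to_client:
        best = ordered[0]
        if best.originator_id != client_id:
            # Every client other than the originator gets the best overall path.
            return best
        # Optional behavior: the originator gets an alternate path, chosen
        # by skipping all paths that carry its own originator-id.
        return next((p for p in ordered if p.originator_id != client_id), None)
    # Client-to-client reflection disabled: the cluster operates as a mesh,
    # so only paths not received from the cluster are considered.
    return next((p for p in ordered if p.mesh != cluster), None)
```

   With reflection enabled, a client that did not originate the best
   path receives it unchanged, while the originating client receives
   the next path in order that it did not originate itself.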

6.  Consistency between routing and forwarding

   The internal update advertisement rules contained in the original
   BGP-4 specification [RFC1771] can lead to situations where traffic is
   forwarded through a route other than the route advertised by BGP.

   Inconsistencies between forwarding and routing are highly
   undesirable.  Service providers use BGP with the dual objective of
   learning reachability information and expressing policy over network
   resources.  The latter assumes that forwarding follows routing
   information.

   Consider the Autonomous System presented in Figure 2, where r1 ... r4
   are members of a single IBGP mesh and routes a, b, and c are received
   from external peers.

                    AS 1 (c)
                      |
                    +----+           +----+
                    | r1 |...........| r2 |
                    +----+           +----+
                      .
                      .
                      .
                      .
                      .
                      .
                    +----+           +----+
                    | r3 |...........| r4 | --- ebgp --- AS X
                    +----+           +----+
                   /      \
                  /        \
               AS 1 (a)  AS 2 (b)

                    Figure 2: Inconsistency in Routing

                       +------+----+-----+--------+
                       | Path | AS | MED | rtr_id |
                       +------+----+-----+--------+
                       | a    | 1  | 10  | 1      |
                       |      |    |     |        |
                       | b    | 2  | 5   | 10     |
                       |      |    |     |        |
                       | c    | 1  | 5   | 5      |
                       +------+----+-----+--------+

                    Figure 3: Path Attribute Table - 2

   Following the rules as specified in RFC 1771 [RFC1771], router r3
   will select path (b) received from AS 2 as its overall best to
   install in the Loc-Rib, since path (b) is preferable to path (c), the
   lowest MED route from AS 1.  However for the purposes of Internal
   Update route selection, it will ignore the presence of path (c), and
   elect (a) as its advertisement, via the router-id tie-breaking rule.

   In this scenario, router r4 will receive (c) from r1 and (a) from r3.
   It will pick the lowest MED route (c) and advertise it out via EBGP
   to AS X.  However, at this point routing is inconsistent with
   forwarding, as traffic received from AS X will be forwarded towards
   AS 2, while the EBGP advertisement is being made for an AS 1 path.
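
   The selections made by r3 and r4 in this scenario can be checked
   with a small Python sketch.  The decision process is deliberately
   simplified (lowest MED per neighbor AS, then EBGP over IBGP, then
   lowest rtr_id); the function name best and the tuple layout are
   assumptions made for the illustration.

```python
# Paths as seen by r3: (label, neighbor_as, med, rtr_id, learned_via).
R3_PATHS = [
    ("a", 1, 10, 1, "ebgp"),
    ("b", 2, 5, 10, "ebgp"),
    ("c", 1, 5, 5, "ibgp"),    # learned from r1 over the IBGP mesh
]


def best(paths):
    """Simplified decision process used only for this illustration."""
    per_as = {}
    for path in paths:
        current = per_as.get(path[1])
        if current is None or path[2] < current[2]:
            per_as[path[1]] = path             # lowest MED per neighbor AS
    # Prefer EBGP-learned paths, then the lowest rtr_id.
    return min(per_as.values(), key=lambda p: (p[4] != "ebgp", p[3]))


# r3 installs the overall best path in its Loc-Rib ...
loc_rib_best = best(R3_PATHS)
# ... but the RFC 1771 internal update only considers external paths.
internal_update = best([p for p in R3_PATHS if p[4] == "ebgp"])

# r4 learns (c) from r1 and (a) from r3, both over IBGP.
R4_PATHS = [("c", 1, 5, 5, "ibgp"), ("a", 1, 10, 1, "ibgp")]
r4_best = best(R4_PATHS)

print(loc_rib_best[0], internal_update[0], r4_best[0])   # b a c
```

   r3 forwards along (b) yet advertises (a) internally, and r4 ends up
   announcing the AS 1 path (c), which is the mismatch described above.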

   Routing policies are typically specified in terms of neighboring
   ASes.  In the situation above, assuming that AS 1 is a network for
   which this AS provides transit services while AS 2 and AS X are peer
   networks, one can easily see how the inconsistency between routing
   and forwarding would lead to transit being inadvertently provided
   between AS X and AS 2.  This could lead to persistent forwarding
   loops.

   Inconsistency between routing and forwarding may happen whenever a
   BGP speaker chooses to advertise an external route into IBGP that is
   different from the overall best route and its overall best is
   external.

7.  Applications

8.  Fast Connectivity Restoration

   When two exits are available to reach a particular destination and
   one is preferred over the other, the availability of an alternate
   path provides fast connectivity restoration when the primary path
   fails.

   Restoration can be quick since the alternate path is already at hand.
   The border router could precompute the backup route and preinstall it
   in the FIB, ready to be switched in when the primary goes away.  Note
   that this requires the border router that is the backup to also
   preinstall the secondary path and switch to it on failure.
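
   A precomputed backup can be pictured as a forwarding entry that holds
   two next hops.  The Python sketch below is purely illustrative: the
   FibEntry class, its fields and the addresses used with it are
   hypothetical and do not describe any particular router's FIB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    primary: Optional[str]          # next hop of the best path
    backup: Optional[str] = None    # preinstalled next hop of the alternate path

    def active(self):
        """Next hop currently used for forwarding."""
        return self.primary if self.primary is not None else self.backup

    def primary_failed(self):
        """Promote the preinstalled backup without waiting for BGP to reconverge."""
        self.primary, self.backup = self.backup, None


entry = FibEntry(primary="192.0.2.1", backup="192.0.2.2")
entry.primary_failed()
print(entry.active())   # 192.0.2.2
```

   The point of the design is that the switch is a local operation on
   state that already exists, so it does not depend on receiving new
   BGP updates.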

9.  Inter-Domain Churn Reduction

   Within an AS, the unavailability of a backup path leads to a border
   router sending a withdraw upstream when the primary fails.  This
   leads to inter-domain churn and packet loss for the time the network
   takes to converge to the alternate path.  Having the alternate path
   available reduces the churn and eliminates packet loss.

10.  Reducing Persistent IBGP oscillation

   Advertising the best external route according to the algorithm
   described in this document will reduce the possibility of route
   oscillation by introducing additional information into the IBGP
   system.

   For a permanent oscillation condition to occur, it is necessary that
   a circular dependency between paths occurs such that the selection of
   a new best path by a router, in response to a received IBGP
   advertisement, causes the withdrawal of information that another
   router depends on in order to generate the original event.

   In vanilla BGP, when only the best overall route is advertised, as in
   most implementations, oscillation can occur whenever there are 2 or
   more clusters/sub-ASes such that at least one cluster has more than
   one path that can potentially contribute to the dependency.

11.  Deployment Considerations

   The mechanism specified in the draft allows a BGP speaker to
   advertise a route that is not the best route used for forwarding.
   This is a departure from the current behavior.  However, consistency
   in the path selection process across the AS is still guaranteed since
   the ingress routers will not choose the best-external route as the
   best route for a destination in steady state (for the same reason
   that the BGP speaker announcing the best-external route chose an IBGP
   route as best instead of the externally learnt route).  Though it is
   possible to alter this assurance by defining route policies on IBGP
   sessions, use of such policies in IBGP is not recommended, especially
   with best-external announcement turned on in the network.  It is also
   worth noting that such inconsistency in routing and forwarding is
   mitigated in a tunneled network.

12.  Acknowledgments

   This document greatly benefits from the comments of Yakov Rekhter,
   John Scudder, Eric Rosen, Jenny Yuan, Jay Borkenhagen, Salkat Ray and
   Jakob Heitz.

13.  IANA Considerations

   This document has no actions for IANA.

14.  Security Considerations

   There are no additional security risks introduced by this design.

15.  References

   [RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
              "Border Gateway Protocol (BGP) Persistent Route
              Oscillation Condition", RFC 3345, August 2002.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

Authors' Addresses

   Pedro Marques

   Email: pedro.r.marques@gmail.com

   Rex Fernando
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: rex@cisco.com

   Enke Chen
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: enkechen@cisco.com

   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: pmohapat@cisco.com

   Hannes Gredler
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: hannes@juniper.net