Network Working Group                                            R. Bush
Internet-Draft                                 Internet Initiative Japan
Intended status: Standards Track                                 J. Haas
Expires: January 4, 2018                                      J. Scudder
                                                  Juniper Networks, Inc.
                                                               A. Nipper
                                                                 T. King
                                                  DE-CIX Management GmbH
                                                            July 3, 2017

        Making Route Servers Aware of Data Link Failures at IXPs


   When BGP route servers are used, the data plane is not congruent with
   the control plane.  Therefore, peers at an Internet exchange can lose
   data connectivity without the control plane being aware of it, and
   packets are lost.  This document proposes the use of a newly defined
   BGP Subsequent Address Family Identifier (SAFI) both to allow the
   route server to request its clients use BFD to track data plane
   connectivity to their peers' addresses, and for the clients to signal
   that connectivity state back to the route server.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
   be interpreted as described in [RFC2119] only when they appear in all
   upper case.  They may also appear in lower or mixed case as English
   words, without normative meaning.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 4, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
   3.  Overview
   4.  Next Hop Validation
     4.1.  ReachAsk
     4.2.  LocReach
     4.3.  ReachTell
     4.4.  NHIB
   5.  Advertising NH-Reach state in BGP
   6.  Client Procedures for NH-Reach Changes
   7.  Recommendations for Using BFD
   8.  Other Considerations
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Summary of Document Changes
   Authors' Addresses

1.  Introduction

   In configurations (typically Internet Exchange Points (IXPs)) where
   EBGP routing information is exchanged between client routers through
   the agency of a route server (RS) [RFC7947], but traffic is exchanged
   directly, operational issues can arise when partial data plane
   connectivity exists among the route server client routers.  Since the
   data plane is not congruent with the control plane, the client
   routers on the IXP can lose data connectivity without the control
   plane - the route server - being aware of it, resulting in
   significant data loss.

   To remedy this, two basic problems need to be solved:

   1.  Client routers must have a means of verifying connectivity
       amongst themselves, and
   2.  Client routers must have a means of communicating the knowledge
       of the failure (and restoration) back to the route server.

   The first can be solved by application of Bidirectional Forwarding
   Detection [RFC5880].  The second can be solved by exchanging BGP
   routes which use the NH-Reach Subsequent Address Family Identifier
   (SAFI) defined in this document.

   Throughout this document, we generally assume that the route server
   being discussed is able to represent different RIBs towards different
   clients, as discussed in [RFC7947].  If this is not the case, the
   procedures described here to allow BFD to be automatically
   provisioned between clients still have value; however, the procedures
   for signaling reachability back to the route server may not.

   Throughout this document, we refer to the "route server", "RS" or
   just "server" and the "client" to describe the two BGP routers
   engaging in the exchange of information.  We observe that there could
   be other applications for this extension.  Our use of terminology is
   intended for clarity of description, and not to limit the future
   applicability of the proposal.

2.  Definitions

   o  Indirect peer: If a route server is configured such that routes
      from a given client might be sent to some other client, or vice
      versa, those two clients are considered to be indirect peers.
   o  RS: Route Server.  See [RFC7947].

3.  Overview

   As with the base BGP protocol, we model the function of this
   extension as the interaction between a conceptual set of databases:

   o  ReachAsk: The reachability request database.  A database of
      nexthops (host addresses) for which data plane reachability is
      being queried.
   o  ReachAsk-Out: A set of queries sent to the client.
   o  ReachAsk-In: A set of queries received from the route server.
   o  ReachTell: The reachability response database.  A database of
      responses to ReachAsk queries, indicating what is known about data
      plane reachability.
   o  ReachTell-Out: The responses being sent to the route server.
   o  ReachTell-In: The responses received from the client.
   o  LocReach: The local reachability database.
   o  NHIB: Next Hop Information Base.  Stores what is known about the
      client's reachability to its next hops.

   +--------------------------------------------------------+
   |   +------------+    +------------+    +------------+   |
   |   |    Per-    |    | Configured |    |    Per-    |   |
   |   |   Client   |    |  indirect  |    |   Client   |   |
   |   |    NHIB    |    |   peers    |    |    RIB     |   |
   |   +-----^------+    +------------+    +-----+------+   |
   |         |                         \         |          |
   |   +------+-----+                   `-->-----v------+   |
   |   |ReachTell-In|                      |ReachAsk-Out|   |
   |   +------^-----+     Route Server     +-----+------+   |
   +----------|----------------------------------|----------+
              |                                  |
              |                                  |
   +----------|----------------------------------|----------+
   |   +------+------+      RS Client      +-----v-----+    |
   |   |ReachTell-Out|                     |ReachAsk-In|    |
   |   +------^------+                     +-----+-----+    |
   |          |          +------------+          |          |
   |          |          |            |          |          |
   |          `----------+  LocReach  <----------'          |
   |                     |            |                     |
   |                     +------------+                     |
   +--------------------------------------------------------+

     Figure 1: Route Server, RS Client, and Reachability Ask and Tell
                       databases with In/Out Queues

   In outline, the route server requests that its client track
   connectivity for all the potential next hops the RS might send to the
   client, by sending these next hops as ReachAsk "routes".  The client
   tracks connectivity using BFD and reports its connectivity status to
   the RS using ReachTell "routes".  Connectivity status may be that the
   next hop is reachable, unreachable, or unknown.  Once the RS has been
   informed by the client of its connectivity, it uses this information
   to influence the route selection the RS performs on behalf of the
   client.  Details are elaborated in the following sections.

4.  Next Hop Validation

   Below, we detail procedures where a route server tells its client
   router about other client nexthops by sending it ReachAsk routes and
   the client router verifies connectivity to those other client routers
   and communicates its findings back to the RS using ReachTell routes.
   The RS uses the received ReachTell routes as input to the NHIB and
   hence the route selection process it performs on behalf of the
   client.

4.1.  ReachAsk

   The route server maintains a ReachAsk database for each client that
   supports this proposal, that is, for each client that has advertised
   support (Section 5) for the NH-Reach SAFI.  This database is the
   union of:

   o  The set of next hops found in the associated per-client Loc-RIB
      (see [RFC7947]).
   o  The set of addresses of this client's indirect peers (Section 2).
   o  The RS MAY also add other entries, for example under
      configuration.

   We note that under most circumstances, the first (Loc-RIB next hops)
   set will be a subset of the second (indirect peers) set.  For this
   not to be the case, a client would have to have sent a "third party"
   next hop [RFC4271] to the server.  To cover such a case, an
   implementation MAY note any such next hops, and include them in its
   list of indirect peers.  (This implies that if a third party next hop
   for client C is conveyed to client A, not only will C be placed in
   A's ReachAsk database, but A will be placed in C's ReachAsk
   database.)

   The contents of the ReachAsk database are communicated to the client
   using the NLRI format and procedures described in Section 5.
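
   As an illustration only, the union described above might be computed
   as in the following Python sketch.  The function names, the use of
   plain address strings, and the dictionary of indirect peers are
   assumptions made for this example and are not defined by this
   document.

```python
from collections import defaultdict


def note_third_party(indirect_peers, client, next_hop):
    """Record a third-party next hop symmetrically (hypothetical helper)."""
    indirect_peers[client].add(next_hop)
    indirect_peers[next_hop].add(client)


def build_reach_ask(loc_rib_next_hops, indirect_peer_addrs, extra=()):
    """Return one client's ReachAsk set as the union of the three sources."""
    return set(loc_rib_next_hops) | set(indirect_peer_addrs) | set(extra)


# Illustrative addresses: A = 192.0.2.1, B = 192.0.2.2, C = 192.0.2.3.
indirect = defaultdict(set)
indirect["192.0.2.1"].add("192.0.2.2")   # A and B are configured
indirect["192.0.2.2"].add("192.0.2.1")   # as indirect peers

# C appears as a third-party next hop in a route conveyed to A.
note_third_party(indirect, "192.0.2.1", "192.0.2.3")

reach_ask_a = build_reach_ask({"192.0.2.3"}, indirect["192.0.2.1"])
reach_ask_c = build_reach_ask(set(), indirect["192.0.2.3"])
```

   In this sketch, recording the third-party next hop in both
   directions is what causes A to appear in C's ReachAsk database as
   well as C in A's.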

4.2.  LocReach

   The client MUST attempt to track data plane connectivity to each host
   address depicted in the ReachAsk database.  It MAY also track
   connectivity to other addresses.  The use of BFD for this purpose is
   detailed in Section 6.

   For each address being tracked, its state is maintained by the client
   in a LocReach entry.  The state can be:

   o  Unknown.  Connectivity status is unknown.  This may be due to a
      temporary or permanent lack of a feasible OAM mechanism to
      determine the next hop status.
   o  Up.  The address has been determined to be reachable.
   o  Down.  The address has been determined to be unreachable.

   The LocReach database is used as input for the ReachTell database; it
   MAY also be used as input to the client's route resolvability
   condition ([RFC4271]).

4.3.  ReachTell

   The ReachTell database contains an entry for every entry in the
   LocReach database.

   The contents of the ReachTell database are communicated to the server
   using the NLRI format and procedures described in Section 5.

4.4.  NHIB

   The route server maintains a per-client Next Hop Information Base, or
   NHIB.  This contains the information about next hop status received
   from ReachTell.

   In computing its per-client Loc-RIB, the RS uses the content of the
   related per-client NHIB as input to the route resolvability condition
   ([RFC4271]).  The next hop being resolved is looked up in the NHIB
   and its state determined:

   o  Up next hops are considered resolvable.
   o  Unknown next hops MAY be considered resolvable.  They MAY be less
      preferred for selection.
   o  Down next hops MUST NOT be considered resolvable.
   o  If a given next hop is not present in the NHIB, but is present in
      ReachAsk-Out, either the client has not responded yet (a transient
      condition) or an error exists.  Similar to Unknown next hops, such
      routes MAY be considered resolvable; they MAY be less preferred.
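
   The lookup rules above could be expressed roughly as follows.  This
   is a minimal sketch, not a definitive implementation: the names
   Reach and resolvable, the accept_unknown policy knob, and the
   treatment of a next hop found in neither the NHIB nor ReachAsk-Out
   (left resolvable here) are all assumptions made for illustration.

```python
from enum import Enum


class Reach(Enum):
    UNKNOWN = 0
    UP = 1
    DOWN = 2


def resolvable(next_hop, nhib, reach_ask_out, accept_unknown=True):
    """Apply the per-client NHIB to the route resolvability check."""
    state = nhib.get(next_hop)
    if state is Reach.UP:
        return True
    if state is Reach.DOWN:
        return False                  # MUST NOT be considered resolvable
    if state is Reach.UNKNOWN:
        return accept_unknown         # MAY be considered resolvable
    if next_hop in reach_ask_out:
        return accept_unknown         # asked, but no answer received yet
    # Assumption: a next hop this extension never asked about is left to
    # the ordinary resolvability rules, modelled here as resolvable.
    return True


nhib = {"192.0.2.2": Reach.UP, "192.0.2.3": Reach.DOWN}
asked = {"192.0.2.2", "192.0.2.3", "192.0.2.4"}
usable = [nh for nh in sorted(asked) if resolvable(nh, nhib, asked)]
```

   A deployment that prefers to be conservative could pass
   accept_unknown=False, which excludes both Unknown and not yet
   answered next hops; both choices are permitted by the MAY clauses
   above.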

5.  Advertising NH-Reach state in BGP

   A new BGP SAFI, the NH-Reach SAFI, is defined in this document.  It
   has been assigned the value TBD.  A route server or a route server
   client using the procedures in this document MUST advertise support
   for this SAFI, for the IPv4 and/or IPv6 Address Family Identifier
   (AFI).  The use of this SAFI with any other AFI is not defined by
   this document.

   NH-Reach NLRI "routes" have a Length of Next Hop Network Address
   value of 0; therefore they have an empty Network Address of Next Hop
   field (section 3 of [RFC4760]).

   Since, as specified here, ReachTell "routes" from different clients
   populate distinct databases on the RS, there will generally be only a
   single path per "route"; this implies that route selection need not
   be performed (or equivalently, that it is trivial to perform).

   In the other direction, a client might peer with multiple route
   servers and receive differing sets of ReachAsk routes from them.  An
   implementation MAY handle this situation by implementing a distinct
   ReachAsk and ReachTell per server, but it MAY also handle it by
   placing all servers' ReachAsk "routes" into a single ReachAsk, and
   sending the results to all servers from a single ReachTell.  This
   would imply some route server(s) might get ReachTell results they had
   not asked for, but this is permissible in any case.  Again, since the
   contents of the ReachAsk are simply a set of host routes to be
   tested, route selection over a combined ReachAsk MAY be omitted.

   ReachAsk and ReachTell entries are exchanged using the NH-Reach NLRI:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |T|Reserved |Sta|          next hop (4 or 16 octets)            |
   .             ...  next hop (4 or 16 octets) ...                .
   .                                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           NH-Reach NLRI Format

   o  T: Type is a one-bit field that can take the value 0, meaning the
      NLRI is a ReachAsk entry, or 1, meaning it is a ReachTell entry.
   o  Reserved: These five bits are reserved.  They MUST be sent as zero
      and MUST be disregarded on receipt.
   o  Sta: State is a two-bit field used to signal the LocReach
      (Section 4.2) state:

      *  0 or 3: Unknown.
      *  1: Up.
      *  2: Down.

      Although either 0 or 3 is to be interpreted as "Unknown", the
      value 0 MUST be used on transmission.  The value 3 MUST be
      accepted as an alias for 0 on receipt.

   o  The next hop field is an IPv4 or IPv6 host route, depending on
      whether the AFI is IPv4 or IPv6.

   ReachAsk and ReachTell entries MUST NOT be propagated from one BGP
   peering session to another; the routes are not transitive.

   The next hop field is the key for the NH-Reach NLRI type; the
   information encoded in the top octet is non-key information.  It is
   possible in principle (although unlikely) for two NLRI to be validly
   present in an UPDATE message with identical next hop fields but
   different types.  However, two NLRI with the same next hop field and
   different State fields MUST NOT be encoded in the same UPDATE
   message.  If such is encountered, the receiver MUST behave as though
   the state "Unknown" was received for the next hop in question.
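
   To make the encoding concrete, the following Python sketch packs and
   unpacks a single NH-Reach NLRI and applies the conflicting-State
   rule.  It assumes the NLRI is exactly the one octet of T, Reserved
   and Sta bits followed by the next hop, as drawn in the figure; the
   function and constant names are hypothetical, and the surrounding
   MP_REACH_NLRI attribute of [RFC4760] is not modelled.

```python
import ipaddress

REACH_ASK, REACH_TELL = 0, 1      # T bit values
UNKNOWN, UP, DOWN = 0, 1, 2       # Sta values (3 is a receive-only alias)


def encode_nlri(nlri_type, state, next_hop):
    """Build one NLRI: a T/Reserved/Sta octet followed by the next hop."""
    if nlri_type not in (REACH_ASK, REACH_TELL):
        raise ValueError("T must be 0 (ReachAsk) or 1 (ReachTell)")
    if state not in (UNKNOWN, UP, DOWN):
        raise ValueError("only states 0, 1 and 2 may be transmitted")
    addr = ipaddress.ip_address(next_hop)
    # T is the most significant bit, Sta the two least significant bits;
    # the five Reserved bits between them are sent as zero.
    return bytes([(nlri_type << 7) | state]) + addr.packed


def decode_nlri(data):
    """Parse one NLRI into (type, state, next_hop); Reserved is ignored."""
    if len(data) not in (5, 17):
        raise ValueError("expected 1 octet plus a 4- or 16-octet next hop")
    nlri_type = data[0] >> 7
    state = data[0] & 0x03
    if state == 3:                # accepted as an alias for Unknown
        state = UNKNOWN
    return nlri_type, state, ipaddress.ip_address(data[1:])


def states_from_update(nlris):
    """Map next hop to state for one UPDATE; conflicts become Unknown."""
    states = {}
    for _type, state, next_hop in map(decode_nlri, nlris):
        if next_hop in states and states[next_hop] != state:
            states[next_hop] = UNKNOWN
        else:
            states[next_hop] = state
    return states


wire = encode_nlri(REACH_TELL, UP, "192.0.2.7")
conflict = states_from_update([wire, encode_nlri(REACH_TELL, DOWN, "192.0.2.7")])
```

   The sketch keys the conflict check on the next hop alone, following
   the wording of the paragraph above; an implementation might
   reasonably key on the type as well.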

6.  Client Procedures for NH-Reach Changes

   When an entry is added to a route server client's ReachAsk-In for a
   route server peering session, the client will then attempt to verify
   connectivity to the host depicted by that entry.  The procedure
   described in this specification utilizes BFD.

   If no existing BFD session exists to this nexthop, a BFD session is
   provisioned to that IP address and the LocReach reachability state
   (Section 4.2) is set to Unknown.

   If the client cannot establish a BFD session with an entry in its
   ReachAsk-In, the nexthop remains in LocReach with its state set to
   Unknown.

   Once the BFD session moves to the Up state, the LocReach reachability
   state is set to Up.

   When the BFD session transitions out of the Up state to the Down
   state, the LocReach reachability state is set to Down.

   If the BFD session transitions out of the Up state to the AdminDown
   state, the LocReach reachability state is set to Unknown.

   When entries are removed from the route server client's ReachAsk-In
   for a route server peering session, the client MAY delay de-
   provisioning the BFD peering session.  If the client delays de-
   provisioning the session, it should remove it if the BFD session
   transitions to the Down or AdminDown states.
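
   The mapping from BFD session state to LocReach state can be pictured
   with the small tracker below.  It is a sketch under stated
   assumptions: the class name and the string state labels are invented
   for this example, and transitions that the text does not mention
   (for instance Down to AdminDown) are assumed to leave the LocReach
   state unchanged.

```python
class LocReachEntry:
    """Track one next hop's LocReach state from BFD session events."""

    def __init__(self):
        self.reach = "Unknown"    # set when the BFD session is provisioned
        self._bfd = "Down"        # RFC 5880 sessions start in Down

    def on_bfd_state(self, new_state):
        """Feed a BFD state (AdminDown, Down, Init or Up); return LocReach."""
        old_state, self._bfd = self._bfd, new_state
        if new_state == "Up":
            self.reach = "Up"
        elif old_state == "Up" and new_state == "Down":
            self.reach = "Down"
        elif old_state == "Up" and new_state == "AdminDown":
            self.reach = "Unknown"
        # Any other transition leaves the LocReach state as it was.
        return self.reach


entry = LocReachEntry()
history = [entry.on_bfd_state(s)
           for s in ("Init", "Up", "Down", "Init", "Up", "AdminDown")]
```

   Note that a freshly provisioned session stays Unknown rather than
   Down: only a transition out of Up to Down marks the next hop as
   unreachable.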

7.  Recommendations for Using BFD

   The RECOMMENDED way for a client router to confirm data plane
   connectivity to its next hops is the use of BFD in asynchronous
   mode.  Echo mode MAY be used if both client routers running a BFD
   session support it.  The use of authentication in BFD is OPTIONAL,
   as there is a certain level of trust between the operators of the
   client routers at a particular IXP.  If trust cannot be assumed, it
   is recommended to use pair-wise keys (how this can be achieved is
   outside the scope of this document).  The TTL/hop limit values
   described in section 5 of [RFC5881] MUST be obeyed in order to
   shield BFD sessions against packets coming from outside the IXP.

   There is interdependence between the functions described in this
   document and BFD from an administrative point of view.  To streamline
   behaviour of different implementations the following are RECOMMENDED:

   o  If BFD is administratively shut down by the administrator of a
      client router then the functions described in this document MUST
      also be administratively shut down.
   o  If the administrator enables the functions described in this
      document on a client router then BFD MUST be automatically
      enabled.

   The following values of the BFD configuration of client routers (see
   Section 6.8.1 of [RFC5880]) are RECOMMENDED in order to allow fast
   detection of lost data plane connectivity:

   o  DesiredMinTxInterval: 1,000,000 (microseconds)
   o  RequiredMinRxInterval: 1,000,000 (microseconds)
   o  DetectMult: 3

   The configuration values above are a trade-off between fast detection
   of lost data plane connectivity and the load client routers must
   handle to keep up the BFD communication.  Selecting smaller
   DesiredMinTxInterval and RequiredMinRxInterval values would generate
   excessive BFD packets, especially at larger IXPs with many hundreds
   of client routers.
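
   The trade-off can be made concrete with a small calculation.  In
   asynchronous mode, [RFC5880] derives the local detection time from
   the remote DetectMult multiplied by the greater of the local
   RequiredMinRxInterval and the remote DesiredMinTxInterval.  The
   sketch below applies that rule and estimates the transmit load on
   one client.  The function names and the figure of 800 peers are
   illustrative assumptions, and transmit jitter is ignored.

```python
def detection_time_us(remote_detect_mult: int,
                      local_required_min_rx_us: int,
                      remote_desired_min_tx_us: int) -> int:
    """Detection time in microseconds for BFD asynchronous mode."""
    agreed_interval = max(local_required_min_rx_us,
                          remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval


def tx_packets_per_second(peers: int, tx_interval_us: int) -> float:
    """Approximate BFD Control packets sent per second by one client."""
    return peers * 1_000_000 / tx_interval_us


# Recommended values: 1,000,000 us intervals and DetectMult 3.
recommended = detection_time_us(3, 1_000_000, 1_000_000)  # 3,000,000 us

# Hypothetical IXP with 800 peers: 1 s interval versus 50 ms interval.
load_default = tx_packets_per_second(800, 1_000_000)   # 800.0
load_aggressive = tx_packets_per_second(800, 50_000)   # 16000.0
```

   With the recommended values a failure is detected after roughly
   three seconds, while a twentyfold shorter interval multiplies the
   packet rate by the same factor in each direction.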

   The configuration values above were also chosen to tolerate brief
   interruptions in the data plane.  With more aggressive values, a BFD
   session that detects a brief data plane interruption to a particular
   client router would signal the route server to remove the routes
   from that client router and, shortly thereafter, to add them again.
   This is disruptive and computationally expensive on the route server.

   The configuration values above are also partially impacted by BGP
   advertisement time in reaction to events from BFD.  If the
   configuration values are selected so that BFD detects data plane
   interruptions faster than the BGP advertisement time, a data plane
   connectivity flap could be detected by BFD but the route server is
   not informed about it because BGP is not able to transport this
   information quickly enough.

   As discussed, finding good configuration values is hard, so a client
   router administrator MAY select more appropriate values to meet the
   special needs of a particular deployment.

8.  Bootstrapping

   At start-up, the route server knows nothing about the connectivity
   states between client routers.  The route server therefore
   optimistically assumes that all client routers are able to reach each
   other unless told otherwise.
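
   The optimistic default can be sketched as a lookup that treats a
   missing report as reachable.  This is an illustration only.  The
   function name and the dictionary layout are hypothetical, and the
   sketch assumes that only an explicit Down report counts as
   unreachable; handling of AdminDown is left out.

```python
from typing import Dict, Tuple


def may_advertise(reports: Dict[Tuple[str, str], str],
                  client: str, next_hop: str) -> bool:
    """Return True unless `client` has reported `next_hop` as Down.

    `reports` maps (client, next hop) to the last reported state.  It is
    empty right after route server start-up, so every pair is assumed
    reachable until a client says otherwise.
    """
    state = reports.get((client, next_hop), "Unknown")
    # Assumption of this sketch: only "Down" is treated as unreachable.
    return state != "Down"
```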

9.  Other Considerations

   For purposes of routing stability, implementations may wish to apply
   hysteresis ("holddown") to next hops that have transitioned from
   reachable to unreachable and back.
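
   One possible shape for such a holddown is sketched below.  It is not
   a prescribed algorithm.  The class name, the method names and the
   30-second value used in the usage note are assumptions, and time is
   passed in explicitly so that the behaviour is easy to follow.

```python
from typing import Dict


class HoldDown:
    """Keep a next hop unusable for a fixed period after it went down."""

    def __init__(self, holddown_s: float) -> None:
        self.holddown_s = holddown_s
        self._last_down: Dict[str, float] = {}

    def report_down(self, next_hop: str, now: float) -> None:
        # Remember when the next hop was last seen unreachable.
        self._last_down[next_hop] = now

    def usable(self, next_hop: str, bfd_up: bool, now: float) -> bool:
        if not bfd_up:
            return False
        last = self._last_down.get(next_hop)
        # Usable if it never went down or the holddown has expired.
        return last is None or now - last >= self.holddown_s
```

   With HoldDown(30.0), a next hop that went down at t=10 and came back
   immediately is treated as unusable at t=20 and as usable again from
   t=40 onwards.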

   Implementations MAY restrict the range of addresses with which they
   will attempt to form BFD relationships.  For example, an
   implementation might by default only allow BFD relationships with
   peers that share a subnetwork with the route server.  An
   implementation MAY apply such restrictions by default.
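
   A restriction of this kind could be checked as sketched below, using
   Python's standard ipaddress module.  The function name and the idea
   of passing the shared subnetwork as a prefix string are assumptions
   made for illustration.

```python
import ipaddress


def bfd_target_allowed(target: str, peering_lan: str) -> bool:
    """Allow a BFD session only towards addresses on the shared subnet.

    `peering_lan` is the prefix of the subnetwork shared with the route
    server, for example "192.0.2.0/24" (a documentation prefix).
    """
    addr = ipaddress.ip_address(target)
    net = ipaddress.ip_network(peering_lan)
    # An IPv4 address never matches an IPv6 prefix, and vice versa.
    return addr.version == net.version and addr in net
```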

10.  IANA Considerations

   IANA is requested to allocate a value from the Subsequent Address
   Family Identifiers (SAFI) Parameters registry for this proposal.  Its
   Description in that registry shall be NH-Reach with a Reference of
   this RFC.

   IANA is requested to allocate a value from the Non-Transitive Opaque
   Extended Community Sub-Types registry.  Its Name will be
   "RS-Reachable Control Extended Community" with a Reference of this
   RFC.


11.  Security Considerations

   The mechanism in this document permits a route server client to
   influence the contents of the route server's Adj-Ribs-Out through its
   reports of next hop reachability state using the NH-Reach SAFI.
   Since this state is per-client, if a route server client is able to
   inject NH-Reach routes into the route server's BGP session with
   another client, it can cause the route server to select different
   forwarding than otherwise expected.  This issue may be mitigated
   using transport security on the BGP sessions between the route
   server and its clients.  See [RFC4272].

   The NH-Reach SAFI enables the server to trigger creation of a BFD
   session on its client.  A malicious or misbehaving server could
   trigger an unreasonable number of sessions, a potential resource
   exhaustion attack.  The sedate default timers proposed in Section 7
   mitigate this; they also mitigate concerns about use of the client as
   a source of packets in a flooding attack.  An implementation MAY also
   impose limits on the number of sessions it will create at the request
   of the server.

   The reachability tests between route server clients themselves may be
   a target for attack.  Such attacks may include forcing a BFD session
   Down through injecting false BFD state.  A less likely attack
   includes forcing a BFD session to stay Up when its real state is
   Down.  These attacks may be mitigated using the BFD security
   mechanisms defined in [RFC5880].


12.  References


12.1.  Normative References

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities
              Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
              DOI 10.17487/RFC5881, June 2010.

   [RFC7947]  Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker,
              "Internet Exchange BGP Route Server", RFC 7947,
              DOI 10.17487/RFC7947, September 2016.


12.2.  Informative References

   [RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
              RFC 4272, DOI 10.17487/RFC4272, January 2006.

Appendix A.  Summary of Adj-NHIB-In state

   The Adj-NHIB-In state is maintained per BGP peering session.  It
   consists of per-peer state and per-peer, per-nexthop state.

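    +-----------------------------------+----------------------------+
    | Client Role                       | (Route-Server |            |
    |                                   |  Route-Server-Client)      |
    +-----------------------------------+----------------------------+
                Fig. 1  Per-peer Adj-NHIB-In Table State

   +---------------------------+--------------------------------------+
   | NextHop                   | <IPv4 Address | IPv6 Address>        |
   +---------------------------+--------------------------------------+
   | Reachable                 | (Unknown | Up | Down | AdminDown)    |
   +---------------------------+--------------------------------------+
                 Fig. 2  Per-peer, per-nexthop Adj-NHIB-In State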

Appendix B.  Summary of Document Changes

   idr-02 to idr-03:  Substantial rewrite.  Introduce NLRI format that
     embeds state.
   idr-01 to idr-02:  Move from BGP-LS to NH-Reach SAFI.  Lots of
     editorial changes.
   idr-00 to idr-01:  Add BGP Capability.  Move from NH-Cost to BGP-LS.
   ymbk-01 to idr-00:  No technical changes; adopted by IDR.
   ymbk-00 to ymbk-01:  Clarifications to BFD procedures.  Use BFD state
     as an input to BGP route selection.

Authors' Addresses

   Randy Bush
   Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, Washington  98110


   Jeffrey Haas
   Juniper Networks, Inc.
   1133 Innovation Way
   Sunnyvale, CA  94089

   John G. Scudder
   Juniper Networks, Inc.
   1133 Innovation Way
   Sunnyvale, CA  94089


   Arnold Nipper
   DE-CIX Management GmbH
   Lichtstrasse 43i
   Cologne  50825


   Thomas King (editor)
   DE-CIX Management GmbH
   Lichtstrasse 43i
   Cologne  50825