Versions: (draft-rp-trill-parent-selection) 00 01 02 03

TRILL Working Group                                      R. Parameswaran
INTERNET-DRAFT                                    Individual Contributor
Intended status: Proposed Standard
Expires: August 19, 2018                               February 15, 2018

         TRILL (Transparent Interconnection of Lots of Links):
         Mitigation of Parent Node Shifts in Tree Construction


   This document describes a known problem in the TRILL tree
   construction mechanism and offers an approach requiring no change to
   the TRILL protocol that solves the problem.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors or the TRILL working group mailing list:

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at

R. Parameswaran                                                 [Page 1]

INTERNET-DRAFT                                    TRILL Parent Selection

Table of Contents

      1. Introduction
      1.1 Terminology and Acronyms
      2. Tree construction in TRILL
      3. Issues with the TRILL tree construction algorithm
      4. Solution using the Affinity sub-TLV
      5. Network wide selection of computation algorithm
      6. Security Considerations
      7. IANA Considerations

      8. Normative References
      9. Informative References
     10. Acknowledgements

      Author's Address

1. Introduction

   TRILL is a data center technology that uses link-state routing
   mechanisms in a layer 2 setting, and serves as a replacement for the
   spanning-tree protocol.  TRILL uses multi-destination trees rooted at
   predetermined nodes as a way to distribute multi-destination traffic.

   Multi-destination traffic includes layer 2 broadcast frames, unknown
   unicast flooded frames, and layer 2 traffic with multicast MAC
   addresses (collectively referred to as BUM traffic).
   Multi-destination traffic is typically hashed onto one of the
   available trees and sent over the tree, potentially reaching all
   nodes in the network (hosts behind which may own/need the packet in

1.1 Terminology and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Tree construction in TRILL

   Tree construction in TRILL is defined by [RFC6325], with corrections
   defined in [RFC7780].

   The tree construction mechanism used in TRILL codifies certain tree
   construction steps which make the resultant trees brittle, as
   explained below. TRILL uses the following rule: when constructing an
   SPF tree, if there are multiple possible parents for a given node
   (i.e. if multiple upstream nodes can potentially pull in a given node
   during SPF, all at the same cumulative cost), then the parent
   selection is imposed in the following manner:

   "When building the tree number j, remember all possible equal cost
   parents for node N.  After calculating the entire 'tree' (actually,
   directed graph), for each node N, if N has 'p' parents, then order
   the parents in ascending order according to the 7-octet IS-IS ID
   considered as an unsigned integer, and number them starting at zero.
   For tree j, choose N's parent as choice j mod p."

   There is an additional correction posted to this in [RFC7780]:

   [RFC7780], Section 3.4:

      "Section 4.5.1 of [RFC6325] specifies that, when building
      distribution tree number j, node (RBridge) N that has multiple
      possible parents in the tree is attached to possible parent number
      j mod p.  Trees are numbered starting with 1, but possible parents
      are numbered starting with 0.  As a result, if there are two trees
      and two possible parents, then in tree 1 parent 1 will be
      selected, and in tree 2 parent 0 will be selected.

      This is changed so that the selected parent MUST be (j-1) mod p.
      As a result, in the case above, tree 1 will select parent 0, and
      tree 2 will select parent 1.  This change is not backward
      compatible with [RFC6325].  If all RBridges in a campus do not
      determine distribution trees in the same way, then for most
      topologies, the RPFC will drop many multi-destination packets
      before they have been properly delivered."
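
   The difference between the two quoted rules can be illustrated with
   a minimal Python sketch. The function names and the use of plain
   integers as IS-IS IDs are assumptions made for illustration only;
   they do not appear in [RFC6325] or [RFC7780].

```python
def order_by_isis_id(parent_ids):
    """Order candidate parents ascending by IS-IS ID (unsigned integers)."""
    return sorted(parent_ids)


def parent_rfc6325(tree_number, parent_ids):
    """Original rule quoted from [RFC6325]: choice j mod p."""
    ordered = order_by_isis_id(parent_ids)
    return ordered[tree_number % len(ordered)]


def parent_rfc7780(tree_number, parent_ids):
    """Corrected rule quoted from [RFC7780]: choice (j-1) mod p."""
    ordered = order_by_isis_id(parent_ids)
    return ordered[(tree_number - 1) % len(ordered)]


# Two equal-cost parents with hypothetical IS-IS IDs; 0x10 is parent 0.
candidates = [0x20, 0x10]
for tree in (1, 2):
    print(tree, hex(parent_rfc6325(tree, candidates)),
          hex(parent_rfc7780(tree, candidates)))
```

   With two candidates, the [RFC6325] rule selects parent 1 for tree 1
   and parent 0 for tree 2, while the [RFC7780] rule selects parent 0
   for tree 1 and parent 1 for tree 2, matching the quoted example.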

3. Issues with the TRILL tree construction algorithm

   With the tree construction mechanism in Section 2 in mind, let's look
   at the Spine-Leaf topology presented below and consider the
   calculation of Tree number 2 in TRILL.  Assume all the links in the
   topology have the same cost.

          A--   --B
         / \ \/   /\
        /   \/\ _/_ \
       /__ _/\  /   \\
      //      \/     \\
      1        2       3
       \       |      /
        \      |     /
         \     |    /
          \    |   /
           \   |  /
            \  | /
             \ |/
               C

   Assume that in the above topology, when ordered by 7-octet IS-IS ID,
   1 < 2 < 3 holds and that the root for Tree number 2 is A. Given the
   ordered set {1, 2, 3}, these nodes have the following indices (with
   a starting index of 0):

      Node    Index
       1       0
       2       1
       3       2

   Given the SPF (Shortest Path First) constraint and that the tree root
   is A, the parent for nodes 1, 2, and 3 will be A. However, when the
   SPF algorithm tries to pull B or C into the tree, there is a choice
   of parents, namely 1, 2, or 3.

   Given that this is tree 2, the parent will be the one with index
   (2-1) mod 3 (which is equal to 1). Hence the parent for node B (and
   likewise for node C) will be the node with an index value of 1, which
   is node 2.

                   A
                 / | \
                /  |  \
               1   2   3
                  /  \
                 B    C

   However, due to TRILL's parent selection algorithm, the sub-tree
   rooted at Node 2 will be impacted even if Node 1 or Node 3 goes down.

   Take the case where Node 1 goes down. Tree 2 must now be re-computed
   (this is normal). When the SPF process tries to pull in B, the list
   of potential parents for B is now {2, 3}. After ordering these by
   IS-IS ID as {2, 3} (where 2 is at index 0 and 3 is at index 1), we
   apply TRILL's formula for tree 2:

   Parent's index = (TreeNumber-1) mod Number_of_parents.
      = (2-1) mod 2
      = 1 mod 2
      = 1 (which is the index of  Node 3)

   The re-calculated tree now looks as shown below. The shift in parent
   nodes (for B) may cause disruption to live traffic in the network,
   and is unnecessary in absolute terms because the existing parent for
   node B, node 2, was not perturbed in any way.

                   A
                  / \
                 /   \
                /     \
               2       3
                      /  \
                     B    C
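
   The parent shift described in this section can be reproduced with a
   short Python sketch. It assumes, based on the figure and text above,
   that A, B, and C are each linked to nodes 1, 2, and 3 at equal cost.
   The IS-IS ID values assigned to A, B, and C, and all function names,
   are hypothetical; only the ordering 1 < 2 < 3 comes from the text.

```python
import heapq

# Hypothetical IS-IS IDs; only the ordering 1 < 2 < 3 comes from the text.
SYSTEM_ID = {"1": 1, "2": 2, "3": 3, "A": 10, "B": 11, "C": 12}


def full_mesh(upper, middle, cost=1):
    """Link every node in `upper` to every node in `middle`."""
    links = {n: {} for n in upper + middle}
    for u in upper:
        for m in middle:
            links[u][m] = cost
            links[m][u] = cost
    return links


def equal_cost_parents(links, root):
    """Dijkstra SPF that remembers all equal-cost parents of each node."""
    dist, parents, done = {root: 0}, {root: set()}, set()
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, cost in links[u].items():
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v], parents[v] = nd, {u}
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                parents[v].add(u)
    return parents


def build_tree(links, root, tree_number):
    """Pick one parent per node using the (j-1) mod p rule."""
    chosen = {}
    for node, candidates in equal_cost_parents(links, root).items():
        if candidates:
            ordered = sorted(candidates, key=SYSTEM_ID.__getitem__)
            chosen[node] = ordered[(tree_number - 1) % len(ordered)]
    return chosen


def without_node(links, dead):
    """Copy of the topology with one node and its links removed."""
    return {n: {m: c for m, c in nbrs.items() if m != dead}
            for n, nbrs in links.items() if n != dead}


topology = full_mesh(["A", "B", "C"], ["1", "2", "3"])
before = build_tree(topology, "A", tree_number=2)
after = build_tree(without_node(topology, "1"), "A", tree_number=2)
print("parent of B before:", before["B"], "after:", after["B"])
```

   Under these assumptions the parent of B moves from node 2 to node 3
   when node 1 is removed, even though node 2 and its links are
   untouched.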

   Aside from the disruption posed by the change in the tree links,
   depending upon how the concerned RBridges distribute VLANs/FGLs
   across trees and how they may prune these, additional disruption is
   possible if the forwarding state on the new parent RBridge is not
   primed to match the new tree structure. This churn can be avoided
   with a better approach.

   The parent shift issue noted above can be solved by using the
   Affinity sub-TLV which is specified in [RFC7176].

   While the technique identified in this draft has an immediate benefit
   when applied to spine/leaf networks popular in data-center designs,
   nothing in the approach outlined below assumes a spine-leaf network.
   The technique presented below will work on any connected graph.
   Furthermore, no directional symmetry in link-cost is assumed.

4. Solution using the Affinity sub-TLV

   At a high level, this problem can be solved by having the affected
   parent send out an Affinity sub-TLV identifying the children for
   which it wants to preserve the parent-child relationship, despite
   network events which may change the structure of the tree. The
   concerned parent node would send out an Affinity sub-TLV with
   multiple Affinity records, one per child node, listing the affected
   tree number.
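
   As an illustration of the information such an advertisement would
   carry, the following Python sketch models one Affinity record per
   child for a single tree number. It is an in-memory model only: the
   class and field names are hypothetical, children are assumed to be
   identified by nickname, and the wire encoding defined in [RFC7176]
   is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AffinityRecord:
    """One record: a child nickname and the trees the affinity covers."""
    child_nickname: int
    tree_numbers: tuple


@dataclass
class AffinitySubTlv:
    """In-memory view of what a designated parent would advertise."""
    originator_nickname: int
    records: list = field(default_factory=list)


def build_affinity(originator_nickname, tree_number, child_nicknames):
    """Create one Affinity record per child for a single tree number."""
    records = [AffinityRecord(child, (tree_number,))
               for child in sorted(child_nicknames)]
    return AffinitySubTlv(originator_nickname, records)


# Hypothetical nicknames: node 2 claims children B and C on tree 2.
tlv = build_affinity(0x0002, 2, {0x00B0, 0x00C0})
for record in tlv.records:
    print(hex(record.child_nickname), record.tree_numbers)
```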

   It would be sufficient to have a local RBridge configuration option
   at one of the nodes that is the parent chosen (referred to as
   designated parent below). The following steps provide a way to
   implement this proposal:

      a. The operator locally configures the designated parent to
         indicate its stickiness in tree construction for a specific
         tree number and tree root via the Affinity sub-TLV. This can be
         done before tree construction if the operator consults the 7
         octet ISIS-ID relative ordering of the concerned nodes and
         decides up-front which of the potential parent nodes should
         become the parent node for a given set of children on that tree
         number under the TRILL tree construction mechanism. The
         operator MUST configure the designated parent stickiness on
         only one node amongst a set of sibling (potential parent) nodes
         relative to the tree root for that tree number.

         It is suggested that the parent stickiness be configured on the
         node that would have been selected as the parent under default
         TRILL parent selection rules. Parent stickiness MUST NOT be
         configured on the root of the tree, or if configured previously
         on a non-root node with the root for that tree shifting to that
         node subsequently, such configuration MUST be ignored on the
         root node.

      b. On any subsequent SPF calculation after the operator configures
         the designated parent as indicated above, when the designated
         parent node finds that it could be a potential parent for one
         or more child nodes during tree construction, it declares
         itself to be the parent for the concerned child nodes,
         overriding the default TRILL parent selection rules. The
         configured node advertises its parent preference via the
         Affinity sub-TLV when it completes a tree calculation, and
         finds itself the parent of one or more child nodes per the SPF
         tree calculation. The Affinity sub-TLV MUST reflect the
         appropriate tree number and the child nodes for which the
         concerned node is a parent node. The Affinity sub-TLV SHOULD be
         published when the tree computation is deemed to have converged
         (more on this under d below).

      c. Likewise, when any change event happens in the network, one
         which forces a tree re-calculation for the concerned tree, the
         designated parent node MUST run through the normal TRILL tree
         calculation agnostic to the fact that it has published an
         Affinity sub-TLV and agnostic to the default TRILL tree
         selection rules, i.e. the node asserts its right to be a parent
         (based on its configuration as a designated parent) without
         directly referencing the default TRILL parent selection rules
         or its own published Affinity sub-TLV in establishing parent

      d. During the SPF tree calculation, the designated parent node
         should react in the following manner:

            i. If the node is a potential parent for some of the
               children identified in an existing Affinity sub-TLV, if
               any, after convergence of the tree computation, the node
               MUST send out an (updated) Affinity sub-TLV identifying
               the correct sub-set of children for which the node
               aspires to establish/continue the parent relationship.
               This case would also apply if there are new child nodes
               for which the node is now a parent (however, see the
               conflicted Affinity sub-TLV rules in vii and i below).

               For its own tree computation, the designated parent node
               MUST use itself as parent in order to pull the set of
               children identified during the SPF run into the tree,
               barring a conflicting affinity sub-TLV seen from another
               node (see vii. below for handling this case).

           ii. If the tree structure later changes such that the
               designated node is no longer a potential parent for any
               of the child nodes in the advertised Affinity sub-TLV,
               then it SHOULD retract the Affinity sub-TLV, upon
               convergence of the tree computation.  In this case, the
               default TRILL tie-breaking rule would need to be used
               during SPF construction for the nodes that were children
               of this designated node previously. One specific case may
               be worth highlighting - if a parent-child relationship
               inverts i.e. if the designated parent becomes a child of
               its former child node due to a change in the tree
               structure, it MUST exclude that child from its Affinity
               sub-TLV. In such case, if the designated parent node
               cannot maintain a parent relationship with any of its
               prior child nodes, then it MUST retract any previously
               published affinity sub-TLV.

          iii. Nodes SHOULD use a convergence timer to track completion
               of the tree computation. If there are any additional tree
               computations while the convergence timer is running, the
               timer SHOULD be re-started/extended in order to absorb
               the interim network events. It is possible that the
               intended action at the expiration of the timer may change
               meanwhile. The timer needs to be large enough to absorb
               multiple network events that may happen due to a change
               in the physical state of the network, and yet short
               enough to avoid delaying the update of the Affinity sub-TLV.

           iv. At the expiration of the convergence timer, the existing
               state of the tree MUST be compared with the existing
               Affinity sub-TLV and the intended change in the status of
               the Affinity sub-TLV is carried out e.g. a fresh
               publication, or an update to the list of children, or a
               retraction.

            v. Alternately, the above steps (re-examination of the
               Affinity sub-TLV and update) MAY be tied to/triggered
               from the download of the tree routes to the L2 RIB, since
               that typically happens upon a successful computation of
               the complete tree. An additional stabilization timer
               could be used to counteract back-to-back L2 RIB downloads
               due to repeated computations of the tree due to a burst
               of network events.

           vi. Note that this approach may cause an additional tree
               computation at remote nodes once the updated Affinity
               sub-TLV (or lack of it) is received/perceived, beyond the
               network events which led up to the change in the tree. In
               the case where an operator introduced a designated parent
               configuration on an existing tree, then remote nodes
               would need to receive the Affinity sub-TLV indicating the
               designated parent's Affinity for its children before the
               remote nodes shift away from the default TRILL parent
               selection rules. However, in most cases, in steady state,
               this mechanism should cause very little tree churn unless
               a designated parent configuration was introduced,
               removed, or a link between the designated parent and its
               children changed state. In cases where the network change
               event originated on the designated parent node, it may be
               possible to optimize on the churn by packing both the
               data bearing the network change event and the Affinity
               sub-TLV into the same link-state update packet.

          vii. In situations where the designated parent node would
               normally originate an affinity sub-TLV to indicate
               affinity to a specific set of child nodes, it MUST NOT
               originate an Affinity sub-TLV if it sees an Affinity sub-
               TLV from some other node for the same tree number and for
               all of the same child-nodes, such that the other node's
               Affinity sub-TLV would win using the conflict tie-break
               rules in section 5.3 of [RFC7783]. Any existing Affinity
               sub-TLV already published by this node in such a
               situation MUST be retracted. If only some of the child
               nodes overlap between the two conflicting Affinity sub-
               TLVs, then this designated parent node MAY continue to
               publish its affinity sub-TLV listing its child nodes that
               are not in conflict with the other Affinity sub-TLV.
               Other guidelines listed in [RFC7783] MUST be adhered to
               as well - the originator of the Affinity sub-TLV must
               name only directly adjacent nodes as children, and must
               not name the tree root as a child.

      e. Situations where the node advertising the Affinity sub-TLV dies
         or restarts SHOULD be handled using the normal handling for
         such scenarios relating to the parent Router Capability TLV,
         and as specified in [RFC7981].

      f. Situations where a parent-child link directly connected to the
         designated parent node constantly flaps, MUST be handled by
         having the designated parent node retract the Affinity sub-TLV,
         if it affects the parent-child relationships in consideration.
         The long-term state of the Affinity sub-TLV can be monitored by
         the designated parent node to see if it is being published and
         retracted repeatedly in multiple iterations or if a specific
         set of children are being constantly added and removed. The
         designated parent may resume publication of the Affinity sub-
         TLV once it perceives the network to be stable again in the

      g. If the designated parent node is forced to retract its Affinity
         sub-TLV due to a change in the tree structure, it can then
         repeat these steps in a subsequent tree construction, if the
         same node becomes a parent again, so long as it perceives its
         parent-child links to be stable (free of link/node flaps).

      h. Remote nodes MUST default to the TRILL parent selection rules
         if they do not see an Affinity sub-TLV sent by any node in the

      i. At remote nodes, conflicting Affinity sub-TLVs from different
         originators for the same tree number and child node MUST be
         handled as specified in section 5.3 of [RFC7783], namely by
         selecting the Affinity sub-TLV originated by the node with the
         highest priority to be a tree root, with System-ID as tie-breaker.
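
   The decision a designated parent makes under steps d.i, d.ii, d.iv,
   and d.vii above can be sketched in Python as follows. This is a
   simplified illustration, not a definitive implementation: the
   function names, the string action labels, and the way conflicts are
   passed in as a precomputed set are all assumptions of the sketch.

```python
def claimable_children(potential, lost_conflicts, tree_root, own_parent):
    """Children the designated parent may still name in its sub-TLV.

    potential      -- adjacent nodes it could parent in the latest SPF run
    lost_conflicts -- children covered by another node's winning
                      Affinity sub-TLV (step d.vii)
    tree_root      -- the tree root is never named as a child
    own_parent     -- excluded so an inverted relationship is dropped
                      (step d.ii)
    """
    return set(potential) - set(lost_conflicts) - {tree_root, own_parent}


def affinity_action(advertised, desired):
    """Action to carry out when the convergence timer expires (d.iv)."""
    advertised, desired = set(advertised), set(desired)
    if not desired:
        return "retract" if advertised else "none"
    if not advertised:
        return "publish"
    return "none" if advertised == desired else "update"


# Node 2 on tree 2 rooted at A: first publication for children B and C.
first = claimable_children({"B", "C"}, set(), "A", "A")
print(affinity_action(set(), first))

# Another node's Affinity sub-TLV wins for C: node 2 keeps only B.
second = claimable_children({"B", "C"}, {"C"}, "A", "A")
print(affinity_action(first, second))

# The relationship inverts and B becomes node 2's parent: nothing left.
third = claimable_children({"B"}, set(), "A", "B")
print(affinity_action(second, third))
```

   The three calls print "publish", "update", and "retract" in turn.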

5. Network wide selection of computation algorithm

   The proposed solution above does not need any operational change to
   the TRILL protocol, beyond the usage of the Affinity sub-TLV (which
   is already in the proposed standard) for the use case identified in
   this draft.

   Nodes that do not support this draft are expected to interoperate
   seamlessly with nodes that do, so long as they understand and honor
   the Affinity sub-TLV. The draft assumes that
   most TRILL implementations now support the Affinity sub-TLV.  In any
   case, the guidelines specified in section 4.1 of [RFC7783] MUST be
   used i.e.  if all nodes in the network do not announce support of the
   Affinity sub-TLV then the network MUST default to the TRILL parent
   selection rules.
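
   The behavior expected of a remote node (steps h and i of Section 4,
   together with the fallback described in this section) can be
   sketched in Python as below. All names are hypothetical. Two points
   are assumptions of the sketch rather than statements from the
   referenced RFCs: a higher System-ID is taken to win a priority tie,
   and claims from nodes that are not equal-cost candidate parents are
   simply ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffinityClaim:
    originator: str      # node that advertised the Affinity sub-TLV
    root_priority: int   # originator's priority to be a tree root
    system_id: int       # originator's IS-IS System-ID
    tree_number: int
    child: str


def default_parent(tree_number, candidates, system_ids):
    """Default TRILL rule: order by IS-IS ID, pick (j-1) mod p."""
    ordered = sorted(candidates, key=system_ids.__getitem__)
    return ordered[(tree_number - 1) % len(ordered)]


def select_parent(child, tree_number, candidates, system_ids,
                  claims, all_nodes_support_affinity):
    """Honor a winning Affinity claim, else fall back to the default."""
    if all_nodes_support_affinity:
        matching = [c for c in claims
                    if c.tree_number == tree_number and c.child == child
                    and c.originator in candidates]
        if matching:
            # Assumption: highest priority wins, higher System-ID on a tie.
            winner = max(matching,
                         key=lambda c: (c.root_priority, c.system_id))
            return winner.originator
    return default_parent(tree_number, candidates, system_ids)


# After node 1 fails, B's equal-cost candidate parents are 2 and 3.
system_ids = {"2": 2, "3": 3}
claims = [AffinityClaim("2", 100, 2, 2, "B")]
sticky = select_parent("B", 2, {"2", "3"}, system_ids, claims, True)
plain = select_parent("B", 2, {"2", "3"}, system_ids, [], True)
mixed = select_parent("B", 2, {"2", "3"}, system_ids, claims, False)
print(sticky, plain, mixed)
```

   In this example node 2 remains the parent of B while its claim is
   honored, and the default rule selects node 3 when no claim is seen
   or when not all nodes announce support for the Affinity sub-TLV.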

6. Security Considerations

   The proposal primarily influences tree construction and tries to
   preserve parent-child relationships in the tree from prior
   computations of the same tree, without changing any operational
   aspects of the protocol (this proposal does not introduce any new
   TLV/sub-TLV).  Hence, no new security considerations for TRILL are
   raised by this proposal.

7. IANA Considerations

   This document requires no actions by IANA. The Affinity Sub-TLV has
   been defined in [RFC7176], and this proposal requires use of this
   Sub-TLV but does not change its semantics in any way.

8. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI
             10.17487/RFC2119, March 1997, <http://www.rfc-

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, DOI 10.17487/RFC6325, July 2011,

   [RFC7780] Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
             Ghanwani, A., and S. Gupta, "Transparent Interconnection of
             Lots of Links (TRILL): Clarifications, Corrections, and
             Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016,

   [RFC7783] Senevirathne, T., Pathangi, J., Hudson, J., "Coordinated
             Multicast Trees (CMT) for Transparent Interconnection of
             Lots of Links (TRILL)", RFC 7783, February 2016,

   [RFC7981] Ginsberg, L., Previdi, S., Chen, M., "IS-IS Extensions
             for Advertising Router Information", RFC 7981, October
             2016, <http://datatracker.ietf.org/doc/rfc7981>

   [RFC7176] Eastlake 3rd, D., et al., "Transparent Interconnection of
             Lots of Links (TRILL) Use of IS-IS", RFC 7176, May 2014,

9. Informative References


10. Acknowledgements

   I would like to thank Donald Eastlake for his help in preparing the
   current iteration of the draft, and for reviewing prior iterations.

Author's Address:

      Ramkumar Parameswaran,
      Individual contributor,
      PO Box 2788
      Cupertino, CA 95015.

      Email: parameswaran.r7@gmail.com

Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   The definitive version of an IETF Document is that published by, or
   under the auspices of, the IETF. Versions of IETF Documents that are
   published by third parties, including those that are translated into
   other languages, should not be considered to be definitive versions
   of IETF Documents. The definitive version of these Legal Provisions
   is that published by, or under the auspices of, the IETF. Versions of
   these Legal Provisions that are published by third parties, including
   those that are translated into other languages, should not be
   considered to be definitive versions of these Legal Provisions.  For
   the avoidance of doubt, each Contributor to the IETF Standards
   Process licenses each Contribution that he or she makes as part of
   the IETF Standards Process to the IETF Trust pursuant to the
   provisions of RFC 5378. No language to the contrary, or terms,
   conditions or rights that differ from or are inconsistent with the
   rights and licenses granted under RFC 5378, shall have any effect and
   shall be null and void, whether published or posted by such
   Contributor, or included with or in such Contribution.
