< draft-ietf-bier-entropy-staged-dc-clos-00.txt   draft-ietf-bier-entropy-staged-dc-clos-01.txt >
Network Working Group J. Xie Network Working Group J. Xie
Internet-Draft Huawei Technologies Internet-Draft Huawei Technologies
Intended status: Informational X. Xu Intended status: Informational X. Xu
Expires: April 25, 2019 Alibaba Inc. Expires: November 9, 2019 Alibaba Inc.
G. Yan G. Yan
M. McBride M. McBride
Huawei Technologies Huawei Technologies
October 22, 2018 May 8, 2019
Use of BIER Entropy for Data Center CLOS Networks Use of BIER Entropy for Data Center Clos Networks
draft-ietf-bier-entropy-staged-dc-clos-00 draft-ietf-bier-entropy-staged-dc-clos-01
Abstract Abstract
Bit Index Explicit Replication (BIER) introduces a new multicast- Bit Index Explicit Replication (BIER) introduces a new multicast-
specific BIER Header. BIER can be applied to the Multi Protocol specific BIER Header. BIER can be applied to the Multi Protocol
Label Switching (MPLS) data plane or Non-MPLS data plane. Entropy is Label Switching (MPLS) data plane or Non-MPLS data plane. Entropy is
a technique used in BIER to support load-balancing. This document a technique used in BIER to support load-balancing. This document
examines and describes how BIER Entropy is to be applied to Data examines and describes how BIER Entropy is to be applied to Data
Center CLOS networks for path selection. Center Clos networks for path selection.
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
skipping to change at page 1, line 45 skipping to change at page 1, line 45
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 25, 2019. This Internet-Draft will expire on November 9, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Problem Statement and Considerations . . . . . . . . . . . . 3 3. Problem Statement and Considerations . . . . . . . . . . . . 3
3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3 3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3
3.2. Considerations . . . . . . . . . . . . . . . . . . . . . 4 3.2. Considerations . . . . . . . . . . . . . . . . . . . . . 4
4. Use of BIER Entropy for DC CLOS Network . . . . . . . . . . . 5 4. Use of BIER Entropy for DC Clos Network . . . . . . . . . . . 5
4.1. Use of BIER Entropy for DC CLOS Network . . . . . . . . . 5 4.1. Use of BIER Entropy for DC Clos Network . . . . . . . . . 5
4.2. Steering for elephant flows . . . . . . . . . . . . . . . 6 4.2. Steering for elephant flows . . . . . . . . . . . . . . . 6
4.3. Path Division for Tenant flows to different SIs . . . . . 6 4.3. Path Division for Tenant flows to different SIs . . . . . 6
4.4. Link Failure and Convergence . . . . . . . . . . . . . . 6 4.4. Link Failure and Convergence . . . . . . . . . . . . . . 6
5. Data-Plane Processing . . . . . . . . . . . . . . . . . . . . 7 5. Data-Plane Processing . . . . . . . . . . . . . . . . . . . . 7
6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 7
9.1. Normative References . . . . . . . . . . . . . . . . . . 7 9.1. Normative References . . . . . . . . . . . . . . . . . . 7
9.2. Informative References . . . . . . . . . . . . . . . . . 8 9.2. Informative References . . . . . . . . . . . . . . . . . 8
skipping to change at page 2, line 50 skipping to change at page 2, line 50
1. Introduction 1. Introduction
Bit Index Explicit Replication (BIER) [RFC8279] is an architecture Bit Index Explicit Replication (BIER) [RFC8279] is an architecture
that provides optimal multicast forwarding without requiring that provides optimal multicast forwarding without requiring
intermediate routers to maintain any per-flow state by using a intermediate routers to maintain any per-flow state by using a
multicast-specific BIER header. [RFC8296] defines two types of BIER multicast-specific BIER header. [RFC8296] defines two types of BIER
encapsulation formats: one is MPLS encapsulation, the other is non- encapsulation formats: one is MPLS encapsulation, the other is non-
MPLS encapsulation. Entropy is a technique used in BIER to support MPLS encapsulation. Entropy is a technique used in BIER to support
load-balancing. This document examines and describes how BIER load-balancing. This document examines and describes how BIER
Entropy is to be applied to Data Center CLOS networks for path Entropy is to be applied to Data Center Clos networks for path
selection. selection.
2. Terminology 2. Terminology
Readers of this document are assumed to be familiar with the Readers of this document are assumed to be familiar with the
terminology and concepts of the documents listed as Normative terminology and concepts of the documents listed as Normative
References. References.
3. Problem Statement and Considerations 3. Problem Statement and Considerations
3.1. Problem Statement 3.1. Problem Statement
A common choice for a horizontally scalable topology used in Data A common choice for a horizontally scalable topology used in Data
Center is a CLOS topology. This topology features an odd number of Center is a Clos topology. This topology features an odd number of
stages, such as the 5-Stage CLOS Topology described in stages, such as the 5-Stage Clos Topology described in
[RFC7938]. [RFC7938].
ECMP is the fundamental load-sharing mechanism used by a CLOS ECMP is the fundamental load-sharing mechanism used by a Clos
topology. Effectively, every lower-tier device will use all of its topology. Effectively, every lower-tier device will use all of its
directly attached upper-tier devices to load-share traffic destined directly attached upper-tier devices to load-share traffic destined
to the same IP prefix. The number of ECMP paths between any two Tier to the same IP prefix. The number of ECMP paths between any two Tier
3 devices in CLOS topology is equal to the number of the devices in 3 devices in Clos topology is equal to the number of the devices in
the middle stage (Tier 1). For example, Figure 1 illustrates a the middle stage (Tier 1). For example, Figure 1 illustrates a
topology where Tier 3 device L1 has four paths to reach servers X and topology where Tier 3 device L1 has four paths to reach servers X and
Y, via Tier 2 devices S1 and S2 and then Tier 1 devices S11, S12, S21 Y, via Tier 2 devices S1 and S2 and then Tier 1 devices S11, S12, S21
and S22 respectively. and S22 respectively.
Tier 1 Tier 1
+-----+ +-----+
Cluster |SUPER| Cluster |SUPER|
+----------------------------+ +--| S11 |--+ +----------------------------+ +--| S11 |--+
| | | +-----+ | | | | +-----+ |
skipping to change at page 4, line 30 skipping to change at page 4, line 30
| | | | | | | | | | | | | | | | | | | | | | | |
| +-----+ +-----+ | | +-----+ | +-----+ +-----+ | +-----+ +-----+ | | +-----+ | +-----+ +-----+
| | LEAF| | LEAF| | +--|SUPER|--+ | LEAF| | LEAF| | | LEAF| | LEAF| | +--|SUPER|--+ | LEAF| | LEAF|
| | L1 | | L2 | Tier 3 | | S22 | Tier 3 | L3 | | L4 | | | L1 | | L2 | Tier 3 | | S22 | Tier 3 | L3 | | L4 |
| +-----+ +-----+ | +-----+ +-----+ +-----+ | +-----+ +-----+ | +-----+ +-----+ +-----+
| | | | | | | | | | | | | | | | | | | |
| O O O O | X Y O O | O O O O | X Y O O
| Servers | Servers | Servers | Servers
+----------------------------+ +----------------------------+
Figure 1: 5-Stage CLOS Topology Figure 1: 5-Stage Clos Topology
When BIER is deployed in a multi-tenant data center network When BIER is deployed in a multi-tenant data center network
environment for efficient delivery of Broadcast, Unknown-unicast and environment for efficient delivery of Broadcast, Unknown-unicast and
Multicast (BUM) traffic, a network operator may want a deterministic Multicast (BUM) traffic, a network operator may want a deterministic
path for every packet. For example, when L1 needs to send a BUM path for every packet. For example, when L1 needs to send a BUM
packet to L3 and L4, which are in different SIs, L1 has to send the packet to L3 and L4, which are in different SIs, L1 has to send the
packet twice, and expects the two copies to follow the deterministic paths packet twice, and expects the two copies to follow the deterministic paths
L1->S1->S11-->L3 and L1->S2->S21-->L4 separately. Another example of L1->S1->S11-->L3 and L1->S2->S21-->L4 separately. Another example of
using a deterministic path in a DC is for per-flow steering of using a deterministic path in a DC is for per-flow steering of
"elephant" flows defined in [I-D.ietf-spring-segment-routing-msdc]. "elephant" flows defined in [I-D.ietf-spring-segment-routing-msdc].
skipping to change at page 5, line 19 skipping to change at page 5, line 19
If one wants, however, to get a deterministic path from the equal If one wants, however, to get a deterministic path from the equal
cost paths, one can use part of the 20-bit entropy field. For cost paths, one can use part of the 20-bit entropy field. For
example, bit 0 to bit 2 of the entropy label can represent a value of 0 example, bit 0 to bit 2 of the entropy label can represent a value of 0
to 7, and thus can be used to select a deterministic path from 8 to 7, and thus can be used to select a deterministic path from 8
equal cost paths. And thus, a 20-bit entropy label can be used by equal cost paths. And thus, a 20-bit entropy label can be used by
routers in different tiers to select a deterministic path routers in different tiers to select a deterministic path
independently by using different parts of the 20-bit entropy label, independently by using different parts of the 20-bit entropy label,
and form an end-to-end deterministic path. and form an end-to-end deterministic path.
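The bit-slice selection described above can be sketched as follows. This is a minimal illustration, not the draft's normative procedure: the function name select_path and its parameters are hypothetical, and it assumes that "bit 0" denotes the least-significant bit of the 20-bit entropy field, which the text above does not state.

```python
ENTROPY_BITS = 20  # width of the BIER entropy field


def select_path(entropy: int, first_bit: int, width: int, num_paths: int) -> int:
    """Return the index of the equal-cost path chosen by a slice of the entropy.

    Assumption: bit 0 is the least-significant bit of the entropy value.
    """
    if not 0 <= entropy < (1 << ENTROPY_BITS):
        raise ValueError("entropy must fit in 20 bits")
    if first_bit < 0 or width <= 0 or first_bit + width > ENTROPY_BITS:
        raise ValueError("bit slice must lie inside the 20-bit field")
    # Shift the slice down to bit 0, then mask off everything above it.
    index = (entropy >> first_bit) & ((1 << width) - 1)
    if index >= num_paths:
        raise ValueError("slice value does not name an existing path")
    return index


# Bits 0..2 hold a value of 0..7 and pick one of 8 equal-cost paths.
first_stage = select_path(0b110101, first_bit=0, width=3, num_paths=8)
# A router in another tier reads a different slice of the same entropy.
second_stage = select_path(0b110101, first_bit=3, width=3, num_paths=8)
```

Because each tier reads its own slice, the sender fixes the whole path by choosing a single entropy value.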
This is simple and applicable especially for DC CLOS networks, This is simple and applicable especially for DC Clos networks,
because data delivery in DC CLOS networks for tenants is always because data delivery in DC Clos networks for tenants is always
multi-staged, with the upstream direction stages having equal cost multi-staged, with the upstream direction stages having equal cost
paths. paths.
4. Use of BIER Entropy for DC CLOS Network 4. Use of BIER Entropy for DC Clos Network
4.1. Use of BIER Entropy for DC CLOS Network 4.1. Use of BIER Entropy for DC Clos Network
Take the 5-stage CLOS network in Figure 1 as an example. Take the 5-stage Clos network in Figure 1 as an example.
Tier 2 in every cluster has N nodes, and the Tier 1 has M nodes. M Tier 2 in every cluster has N nodes, and the Tier 1 has M nodes. M
is equal to N multiplied by P. is equal to N multiplied by P.
Tier 3 switches, in upstream direction, act as stage 1 of data Tier 3 switches, in upstream direction, act as stage 1 of data
delivery and have N equal cost paths to every BFER in other delivery and have N equal cost paths to every BFER in other
clusters. Tier 2 switches, in upstream direction, act as stage 2 of clusters. Tier 2 switches, in upstream direction, act as stage 2 of
data delivery and have P equal cost paths to every BFER in other data delivery and have P equal cost paths to every BFER in other
clusters. clusters.
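As an illustration of the two upstream stages, the sketch below packs a stage 1 choice (one of N) and a stage 2 choice (one of P) into a single entropy value. The bit layout is an assumption made only for this example (stage 1 in the low-order bits, stage 2 in the bits directly above), and the helper names are hypothetical. With N = 2 and P = 2, as in Figure 1, there are N * P = 4 combinations, matching the M = 4 Tier 1 devices.

```python
ENTROPY_BITS = 20  # width of the BIER entropy field


def bits_needed(num_paths: int) -> int:
    """Smallest number of bits able to index num_paths choices (at least 1)."""
    return max(1, (num_paths - 1).bit_length())


def encode_entropy(stage1: int, stage2: int, n: int, p: int) -> int:
    """Pack the stage 1 and stage 2 path indexes into one entropy value.

    Assumed layout: stage 1 index in the low bits, stage 2 index above it.
    """
    w1, w2 = bits_needed(n), bits_needed(p)
    if w1 + w2 > ENTROPY_BITS:
        raise ValueError("path indexes do not fit in the 20-bit entropy field")
    if not (0 <= stage1 < n and 0 <= stage2 < p):
        raise ValueError("path index out of range")
    return stage1 | (stage2 << w1)


def decode_entropy(entropy: int, n: int, p: int) -> tuple[int, int]:
    """Recover the (stage 1, stage 2) indexes each tier would read."""
    w1, w2 = bits_needed(n), bits_needed(p)
    return entropy & ((1 << w1) - 1), (entropy >> w1) & ((1 << w2) - 1)


# Figure 1: N = 2 Tier 2 nodes per cluster, P = 2, so M = N * P = 4.
N, P = 2, 2
all_paths = [encode_entropy(i, j, N, P) for i in range(N) for j in range(P)]
```

Each distinct entropy value in all_paths corresponds to one deterministic choice of upstream next hops under the assumed layout.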
skipping to change at page 7, line 50 skipping to change at page 7, line 50
[I-D.ietf-mpls-spring-entropy-label] [I-D.ietf-mpls-spring-entropy-label]
Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., Kini, S., Kompella, K., Sivabalan, S., Litkowski, S.,
Shakir, R., and J. Tantsura, "Entropy label for SPRING Shakir, R., and J. Tantsura, "Entropy label for SPRING
tunnels", draft-ietf-mpls-spring-entropy-label-12 (work in tunnels", draft-ietf-mpls-spring-entropy-label-12 (work in
progress), July 2018. progress), July 2018.
[I-D.ietf-spring-segment-routing-msdc] [I-D.ietf-spring-segment-routing-msdc]
Filsfils, C., Previdi, S., Dawra, G., Aries, E., and P. Filsfils, C., Previdi, S., Dawra, G., Aries, E., and P.
Lapukhov, "BGP-Prefix Segment in large-scale data Lapukhov, "BGP-Prefix Segment in large-scale data
centers", draft-ietf-spring-segment-routing-msdc-10 (work centers", draft-ietf-spring-segment-routing-msdc-11 (work
in progress), October 2018. in progress), November 2018.
[RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
BGP for Routing in Large-Scale Data Centers", RFC 7938, BGP for Routing in Large-Scale Data Centers", RFC 7938,
DOI 10.17487/RFC7938, August 2016, DOI 10.17487/RFC7938, August 2016,
<https://www.rfc-editor.org/info/rfc7938>. <https://www.rfc-editor.org/info/rfc7938>.
[RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., [RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
Przygienda, T., and S. Aldrin, "Multicast Using Bit Index Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
Explicit Replication (BIER)", RFC 8279, Explicit Replication (BIER)", RFC 8279,
DOI 10.17487/RFC8279, November 2017, DOI 10.17487/RFC8279, November 2017,
 End of changes. 17 change blocks. 
22 lines changed or deleted 22 lines changed or added
