< draft-ietf-bier-entropy-staged-dc-clos-00.txt | draft-ietf-bier-entropy-staged-dc-clos-01.txt > | |||
---|---|---|---|---|
Network Working Group J. Xie | Network Working Group J. Xie | |||
Internet-Draft Huawei Technologies | Internet-Draft Huawei Technologies | |||
Intended status: Informational X. Xu | Intended status: Informational X. Xu | |||
Expires: April 25, 2019 Alibaba Inc. | Expires: November 9, 2019 Alibaba Inc. | |||
G. Yan | G. Yan | |||
M. McBride | M. McBride | |||
Huawei Technologies | Huawei Technologies | |||
October 22, 2018 | May 8, 2019 | |||
Use of BIER Entropy for Data Center CLOS Networks | Use of BIER Entropy for Data Center Clos Networks | |||
draft-ietf-bier-entropy-staged-dc-clos-00 | draft-ietf-bier-entropy-staged-dc-clos-01 | |||
Abstract | Abstract | |||
Bit Index Explicit Replication (BIER) introduces a new multicast- | Bit Index Explicit Replication (BIER) introduces a new multicast- | |||
specific BIER header. BIER can be applied to the Multiprotocol | specific BIER header. BIER can be applied to the Multiprotocol | |||
Label Switching (MPLS) data plane or to a non-MPLS data plane. Entropy is | Label Switching (MPLS) data plane or to a non-MPLS data plane. Entropy is | |||
a technique used in BIER to support load-balancing. This document | a technique used in BIER to support load-balancing. This document | |||
describes how BIER Entropy can be applied to Data | describes how BIER Entropy can be applied to Data | |||
Center CLOS networks for path selection. | Center Clos networks for path selection. | |||
Requirements Language | Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
skipping to change at page 1, line 45 | skipping to change at page 1, line 45 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 25, 2019. | This Internet-Draft will expire on November 9, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Problem Statement and Considerations . . . . . . . . . . . . 3 | 3. Problem Statement and Considerations . . . . . . . . . . . . 3 | |||
3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3 | 3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3 | |||
3.2. Considerations . . . . . . . . . . . . . . . . . . . . . 4 | 3.2. Considerations . . . . . . . . . . . . . . . . . . . . . 4 | |||
4. Use of BIER Entropy for DC CLOS Network . . . . . . . . . . . 5 | 4. Use of BIER Entropy for DC Clos Network . . . . . . . . . . . 5 | |||
4.1. Use of BIER Entropy for DC CLOS Network . . . . . . . . . 5 | 4.1. Use of BIER Entropy for DC Clos Network . . . . . . . . . 5 | |||
4.2. Steering for elephant flows . . . . . . . . . . . . . . . 6 | 4.2. Steering for elephant flows . . . . . . . . . . . . . . . 6 | |||
4.3. Path Division for Tenant flows to different SIs . . . . . 6 | 4.3. Path Division for Tenant flows to different SIs . . . . . 6 | |||
4.4. Link Failure and Convergence . . . . . . . . . . . . . . 6 | 4.4. Link Failure and Convergence . . . . . . . . . . . . . . 6 | |||
5. Data-Plane Processing . . . . . . . . . . . . . . . . . . . . 7 | 5. Data-Plane Processing . . . . . . . . . . . . . . . . . . . . 7 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 | |||
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 | |||
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
9.1. Normative References . . . . . . . . . . . . . . . . . . 7 | 9.1. Normative References . . . . . . . . . . . . . . . . . . 7 | |||
9.2. Informative References . . . . . . . . . . . . . . . . . 8 | 9.2. Informative References . . . . . . . . . . . . . . . . . 8 | |||
skipping to change at page 2, line 50 | skipping to change at page 2, line 50 | |||
1. Introduction | 1. Introduction | |||
Bit Index Explicit Replication (BIER) [RFC8279] is an architecture | Bit Index Explicit Replication (BIER) [RFC8279] is an architecture | |||
that provides optimal multicast forwarding without requiring | that provides optimal multicast forwarding without requiring | |||
intermediate routers to maintain any per-flow state by using a | intermediate routers to maintain any per-flow state by using a | |||
multicast-specific BIER header. [RFC8296] defines two types of BIER | multicast-specific BIER header. [RFC8296] defines two types of BIER | |||
encapsulation formats: one is MPLS encapsulation, the other is non- | encapsulation formats: one is MPLS encapsulation, the other is non- | |||
MPLS encapsulation. Entropy is a technique used in BIER to support | MPLS encapsulation. Entropy is a technique used in BIER to support | |||
load-balancing. This document describes how BIER | load-balancing. This document describes how BIER | |||
Entropy can be applied to Data Center CLOS networks for path | Entropy can be applied to Data Center Clos networks for path | |||
selection. | selection. | |||
2. Terminology | 2. Terminology | |||
Readers of this document are assumed to be familiar with the | Readers of this document are assumed to be familiar with the | |||
terminology and concepts of the documents listed as Normative | terminology and concepts of the documents listed as Normative | |||
References. | References. | |||
3. Problem Statement and Considerations | 3. Problem Statement and Considerations | |||
3.1. Problem Statement | 3.1. Problem Statement | |||
A common choice for a horizontally scalable topology used in data | A common choice for a horizontally scalable topology used in data | |||
centers is a CLOS topology. This topology features an odd number of | centers is a Clos topology. This topology features an odd number of | |||
stages; for example, the 5-stage CLOS topology described in | stages; for example, the 5-stage Clos topology described in | |||
[RFC7938]. | [RFC7938]. | |||
ECMP is the fundamental load-sharing mechanism used by a CLOS | ECMP is the fundamental load-sharing mechanism used by a Clos | |||
topology. Effectively, every lower-tier device will use all of its | topology. Effectively, every lower-tier device will use all of its | |||
directly attached upper-tier devices to load-share traffic destined | directly attached upper-tier devices to load-share traffic destined | |||
to the same IP prefix. The number of ECMP paths between any two Tier | to the same IP prefix. The number of ECMP paths between any two Tier | |||
3 devices in different clusters of a CLOS topology is equal to the number of | 3 devices in different clusters of a Clos topology is equal to the number of | |||
devices in the middle stage (Tier 1). For example, Figure 1 illustrates a | devices in the middle stage (Tier 1). For example, Figure 1 illustrates a | |||
topology where Tier 3 device L1 has four paths to reach servers X and | topology where Tier 3 device L1 has four paths to reach servers X and | |||
Y: via Tier 2 devices S1 and S2, and then via Tier 1 devices S11, S12, | Y: via Tier 2 devices S1 and S2, and then via Tier 1 devices S11, S12, | |||
S21, and S22. | S21, and S22. | |||
Tier 1 | Tier 1 | |||
+-----+ | +-----+ | |||
Cluster |SUPER| | Cluster |SUPER| | |||
+----------------------------+ +--| S11 |--+ | +----------------------------+ +--| S11 |--+ | |||
| | | +-----+ | | | | | +-----+ | | |||
skipping to change at page 4, line 30 | skipping to change at page 4, line 30 | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | |||
| +-----+ +-----+ | | +-----+ | +-----+ +-----+ | | +-----+ +-----+ | | +-----+ | +-----+ +-----+ | |||
| | LEAF| | LEAF| | +--|SUPER|--+ | LEAF| | LEAF| | | | LEAF| | LEAF| | +--|SUPER|--+ | LEAF| | LEAF| | |||
| | L1 | | L2 | Tier 3 | | S22 | Tier 3 | L3 | | L4 | | | | L1 | | L2 | Tier 3 | | S22 | Tier 3 | L3 | | L4 | | |||
| +-----+ +-----+ | +-----+ +-----+ +-----+ | | +-----+ +-----+ | +-----+ +-----+ +-----+ | |||
| | | | | | | | | | | | | | | | | | | | | | |||
| O O O O | X Y O O | | O O O O | X Y O O | |||
| Servers | Servers | | Servers | Servers | |||
+----------------------------+ | +----------------------------+ | |||
Figure 1: 5-Stage CLOS Topology | Figure 1: 5-Stage Clos Topology | |||
When BIER is deployed in a multi-tenant data center network | When BIER is deployed in a multi-tenant data center network | |||
environment for efficient delivery of Broadcast, Unknown unicast, and | environment for efficient delivery of Broadcast, Unknown unicast, and | |||
Multicast (BUM) traffic, a network operator may want a deterministic | Multicast (BUM) traffic, a network operator may want a deterministic | |||
path for every packet. For example, when L1 needs to send a BUM | path for every packet. For example, when L1 needs to send a BUM | |||
packet to L3 and L4, which are in different Set Identifiers (SIs), L1 has to send | packet to L3 and L4, which are in different Set Identifiers (SIs), L1 has to send | |||
two copies of the packet and expects them to follow two deterministic paths, | two copies of the packet and expects them to follow two deterministic paths, | |||
L1->S1->S11-->L3 and L1->S2->S21-->L4, respectively. Another use of | L1->S1->S11-->L3 and L1->S2->S21-->L4, respectively. Another use of | |||
deterministic paths in a DC is per-flow steering of the | deterministic paths in a DC is per-flow steering of the | |||
"elephant" flows described in [I-D.ietf-spring-segment-routing-msdc]. | "elephant" flows described in [I-D.ietf-spring-segment-routing-msdc]. | |||
skipping to change at page 5, line 19 | skipping to change at page 5, line 19 | |||
If, however, one wants a deterministic path among the equal-cost | If, however, one wants a deterministic path among the equal-cost | |||
paths, one can use part of the 20-bit entropy field. For | paths, one can use part of the 20-bit entropy field. For | |||
example, bits 0 to 2 of the entropy field can represent a value from 0 | example, bits 0 to 2 of the entropy field can represent a value from 0 | |||
to 7, and can thus select a deterministic path out of 8 | to 7, and can thus select a deterministic path out of 8 | |||
equal-cost paths. Routers in different tiers can therefore each | equal-cost paths. Routers in different tiers can therefore each | |||
select a deterministic path independently, using different, | select a deterministic path independently, using different, | |||
non-overlapping parts of the 20-bit entropy field, and together | non-overlapping parts of the 20-bit entropy field, and together | |||
form an end-to-end deterministic path. | form an end-to-end deterministic path. | |||
This approach is simple and especially applicable to DC CLOS networks, | This approach is simple and especially applicable to DC Clos networks, | |||
because tenant data delivery in DC CLOS networks is always | because tenant data delivery in DC Clos networks is always | |||
multi-staged, and each stage in the upstream direction has equal-cost | multi-staged, and each stage in the upstream direction has equal-cost | |||
paths. | paths. | |||
4. Use of BIER Entropy for DC CLOS Network | 4. Use of BIER Entropy for DC Clos Network | |||
4.1. Use of BIER Entropy for DC CLOS Network | 4.1. Use of BIER Entropy for DC Clos Network | |||
Take the 5-stage CLOS network in Figure 1 as an example. | Take the 5-stage Clos network in Figure 1 as an example. | |||
Tier 2 in every cluster has N nodes, and Tier 1 has M nodes. M | Tier 2 in every cluster has N nodes, and Tier 1 has M nodes. M | |||
is equal to N multiplied by P, where P is the number of Tier 1 nodes that each Tier 2 node connects to. | is equal to N multiplied by P, where P is the number of Tier 1 nodes that each Tier 2 node connects to. | |||
In the upstream direction, Tier 3 switches act as stage 1 of data | In the upstream direction, Tier 3 switches act as stage 1 of data | |||
delivery and have N equal-cost paths to every BFER in other | delivery and have N equal-cost paths to every BFER in other | |||
clusters. Tier 2 switches, in the upstream direction, act as stage 2 of | clusters. Tier 2 switches, in the upstream direction, act as stage 2 of | |||
data delivery and have P equal-cost paths to every BFER in other | data delivery and have P equal-cost paths to every BFER in other | |||
clusters. | clusters. | |||
skipping to change at page 7, line 50 | skipping to change at page 7, line 50 | |||
[I-D.ietf-mpls-spring-entropy-label] | [I-D.ietf-mpls-spring-entropy-label] | |||
Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., | Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., | |||
Shakir, R., and J. Tantsura, "Entropy label for SPRING | Shakir, R., and J. Tantsura, "Entropy label for SPRING | |||
tunnels", draft-ietf-mpls-spring-entropy-label-12 (work in | tunnels", draft-ietf-mpls-spring-entropy-label-12 (work in | |||
progress), July 2018. | progress), July 2018. | |||
[I-D.ietf-spring-segment-routing-msdc] | [I-D.ietf-spring-segment-routing-msdc] | |||
Filsfils, C., Previdi, S., Dawra, G., Aries, E., and P. | Filsfils, C., Previdi, S., Dawra, G., Aries, E., and P. | |||
Lapukhov, "BGP-Prefix Segment in large-scale data | Lapukhov, "BGP-Prefix Segment in large-scale data | |||
centers", draft-ietf-spring-segment-routing-msdc-10 (work | centers", draft-ietf-spring-segment-routing-msdc-11 (work | |||
in progress), October 2018. | in progress), November 2018. | |||
[RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of | [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of | |||
BGP for Routing in Large-Scale Data Centers", RFC 7938, | BGP for Routing in Large-Scale Data Centers", RFC 7938, | |||
DOI 10.17487/RFC7938, August 2016, | DOI 10.17487/RFC7938, August 2016, | |||
<https://www.rfc-editor.org/info/rfc7938>. | <https://www.rfc-editor.org/info/rfc7938>. | |||
[RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., | [RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., | |||
Przygienda, T., and S. Aldrin, "Multicast Using Bit Index | Przygienda, T., and S. Aldrin, "Multicast Using Bit Index | |||
Explicit Replication (BIER)", RFC 8279, | Explicit Replication (BIER)", RFC 8279, | |||
DOI 10.17487/RFC8279, November 2017, | DOI 10.17487/RFC8279, November 2017, | |||
End of changes. 17 change blocks. | ||||
22 lines changed or deleted | 22 lines changed or added | |||
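The Considerations text (Section 3.2) selects among equal-cost paths by reading a slice of the 20-bit entropy field. Routers in different tiers read different slices. The Python sketch below illustrates that idea under stated assumptions. Bit 0 is taken to be the least significant bit, because the draft does not say which end bit 0 is. Each tier is configured with a hypothetical (offset, width) pair. The function names `entropy_field` and `select_next_hop` are illustrative and do not come from any BIER implementation.

```python
"""Per-tier path selection from slices of a 20-bit BIER entropy value.

Illustrative sketch only. Assumptions: bit 0 is the least significant bit,
and each tier is configured with its own (offset, width) slice. Function
names are hypothetical.
"""

ENTROPY_BITS = 20


def entropy_field(entropy: int, offset: int, width: int) -> int:
    """Return `width` bits of `entropy`, starting at bit `offset`."""
    if not 0 <= entropy < (1 << ENTROPY_BITS):
        raise ValueError("entropy must fit in 20 bits")
    if offset < 0 or width < 1 or offset + width > ENTROPY_BITS:
        raise ValueError("field must lie inside the 20-bit entropy value")
    return (entropy >> offset) & ((1 << width) - 1)


def select_next_hop(entropy: int, offset: int, width: int,
                    next_hops: list[str]) -> str:
    """Deterministically pick one equal-cost next hop using one slice."""
    index = entropy_field(entropy, offset, width)
    if index >= len(next_hops):
        # With 3 bits but fewer than 8 paths, some values map to no path.
        raise ValueError(f"field value {index} has no next hop")
    return next_hops[index]


if __name__ == "__main__":
    # Bits 0..2 give a value 0..7, enough to choose among 8 equal-cost paths.
    eight_paths = [f"P{i}" for i in range(8)]
    entropy = 0b101_110  # bits 0..2 hold 0b110 (6); bits 3..5 hold 0b101 (5)
    print(select_next_hop(entropy, 0, 3, eight_paths))  # tier reading bits 0..2
    print(select_next_hop(entropy, 3, 3, eight_paths))  # tier reading bits 3..5
```

Because each tier reads only its own slice, one entropy value chosen by the sender fixes the hop taken in every tier. The sketch raises an error for slice values that have no corresponding path. That is a design choice of this sketch, and a deployment might handle unused values differently.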
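Section 4.1 divides the upstream path choice into two stages. Stage 1 is Tier 3, with N equal-cost paths. Stage 2 is Tier 2, with P equal-cost paths. There are M = N x P Tier 1 nodes. The Python sketch below applies this to Figure 1 under several assumptions that the draft does not state:

- N = 2 (S1, S2) and P = 2.
- S1 uplinks to S11 and S12, and S2 uplinks to S21 and S22.
- Stage 1 reads the low-order bits of the entropy value, and stage 2 reads the bits directly above them. This layout is hypothetical.

The sketch checks that the M combinations reach M distinct Tier 1 nodes. It also shows one possible entropy value for each of the two BUM paths in Section 3.1.

```python
"""Stage-1/stage-2 entropy encoding for the 5-stage Clos of Figure 1.

Illustrative sketch. Assumptions (not specified by the draft): stage 1
(Tier 3) reads the low-order bits of the entropy value, stage 2 (Tier 2)
reads the bits directly above them, bit 0 is the least significant bit,
and S1 uplinks to S11/S12 while S2 uplinks to S21/S22.
"""

ENTROPY_BITS = 20

# Figure 1 topology as seen from leaf L1 (uplink sets are an assumption).
TIER2 = ["S1", "S2"]
TIER1 = {"S1": ["S11", "S12"], "S2": ["S21", "S22"]}
N = len(TIER2)        # equal-cost paths at stage 1 (Tier 3 -> Tier 2)
P = len(TIER1["S1"])  # equal-cost paths at stage 2 (Tier 2 -> Tier 1)
M = N * P             # Tier 1 nodes reachable upstream


def width_for(n_paths: int) -> int:
    """Bits needed to index n_paths equal-cost paths (0 if only one)."""
    if n_paths < 1:
        raise ValueError("need at least one path")
    return (n_paths - 1).bit_length()


W1, W2 = width_for(N), width_for(P)
if W1 + W2 > ENTROPY_BITS:
    raise ValueError("stage fields must fit in the 20-bit entropy value")


def encode(stage1: int, stage2: int) -> int:
    """Pack a stage-1 index (0..N-1) and a stage-2 index (0..P-1)."""
    if not (0 <= stage1 < N and 0 <= stage2 < P):
        raise ValueError("path index out of range")
    return stage1 | (stage2 << W1)


def decode(entropy: int) -> tuple[int, int]:
    """Return the (stage-1, stage-2) indices that each tier reads."""
    return entropy & ((1 << W1) - 1), (entropy >> W1) & ((1 << W2) - 1)


def upstream_path(entropy: int) -> list[str]:
    """Upstream hops from leaf L1 selected by the entropy value."""
    s1, s2 = decode(entropy)
    tier2 = TIER2[s1]
    return ["L1", tier2, TIER1[tier2][s2]]


if __name__ == "__main__":
    # Every one of the M = N * P combinations reaches a distinct Tier 1 node.
    reached = {upstream_path(encode(a, b))[-1] for a in range(N) for b in range(P)}
    print(len(reached), sorted(reached))
    # One possible encoding of the two BUM paths from Section 3.1.
    print(encode(0, 0), upstream_path(encode(0, 0)))
    print(encode(1, 0), upstream_path(encode(1, 0)))
```

Tier 3 nodes need only the stage-1 field, and Tier 2 nodes need only the stage-2 field. This matches the draft's point that each tier can select its hop independently while the combination still forms an end-to-end deterministic path.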