draft-ietf-bess-evpn-fast-df-recovery-00.txt   draft-ietf-bess-evpn-fast-df-recovery-01.txt 
BESS Working Group A. Sajassi BESS Working Group A. Sajassi, Ed.
Internet-Draft G. Badoni Internet-Draft G. Badoni
Intended Status: Standards Track D. Rao Intended status: Standards Track D. Rao
P. Brissette Expires: September 10, 2020 P. Brissette
Cisco Cisco
J. Drake J. Drake
Juniper Juniper
J. Rabadan J. Rabadan
Nokia Nokia
March 9, 2020
Expires: December 12, 2018 June 12, 2018
Fast Recovery for EVPN DF Election Fast Recovery for EVPN DF Election
draft-ietf-bess-evpn-fast-df-recovery-00 draft-ietf-bess-evpn-fast-df-recovery-01
Abstract Abstract
Ethernet Virtual Private Network (EVPN) solution [RFC 7432] describes Ethernet Virtual Private Network (EVPN) solution [RFC7432] describes
DF election procedures for multi-homing Ethernet Segments. These DF election procedures for multi-homing Ethernet Segments. These
procedures are enhanced further in [DF-FRAMEWORK] by applying Highest procedures are enhanced further in [RFC8584] by applying Highest
Random Weight Algorithm for DF election in order to avoid DF status Random Weight Algorithm for DF election in order to avoid DF status
change unnecessarily upon a failure. This draft makes further improvement to change unnecessarily upon a failure. This draft makes further improvement
DF election procedures in [DF-FRAMEWORK] by providing two options for to DF election procedures in [RFC8584] by providing two options for
fast DF election upon recovery of the failed link or node associated fast DF election upon recovery of the failed link or node associated
with the multi-homing Ethernet Segment. This fast DF election is with the multi-homing Ethernet Segment. This fast DF election is
achieved independent of number of EVIs associated with that Ethernet achieved independent of number of EVIs associated with that Ethernet
Segment and it is performed via a simple signaling between the Segment and it is performed via a simple signaling between the
recovered PE and each PE in the multi-homing group. recovered PE and each PE in the multi-homing group.
Status of this Memo Status of This Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF). Note that other groups may also distribute
other groups may also distribute working documents as working documents as Internet-Drafts. The list of current Internet-
Internet-Drafts. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at This Internet-Draft will expire on September 10, 2020.
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
2 Challenges with Existing Solution . . . . . . . . . . . . . . . 4 2. Challenges with Existing Solution . . . . . . . . . . . . . . 3
3 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1. Overview of Proposed Solutions . . . . . . . . . . . . . 5
3.1 DF Election Handshake Solution . . . . . . . . . . . . . . . 6 3. DF Election Handshake Solution . . . . . . . . . . . . . . . 5
3.1.1 Discovery . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Discovery . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.2 DF candidates Determination . . . . . . . . . . . . . . 6 3.2. DF Candidates Determination . . . . . . . . . . . . . . . 6
3.1.3 DF Election Handshake . . . . . . . . . . . . . . . . . 7 3.3. DF Election Handshake . . . . . . . . . . . . . . . . . . 6
3.1.4 Node Insertion . . . . . . . . . . . . . . . . . . . . . 8 3.4. Node Insertion . . . . . . . . . . . . . . . . . . . . . 7
3.1.5 BGP Encoding . . . . . . . . . . . . . . . . . . . . . . 8 3.5. BGP Encoding . . . . . . . . . . . . . . . . . . . . . . 8
3.1.5.1 DF Election Handshake Request Route . . . . . . . . 9 3.5.1. DF Election Handshake Request Route . . . . . . . . . 8
3.5.1.2 DF Election Handshake Response Route . . . . . . . . 9 3.5.2. DF Election Handshake Response Route . . . . . . . . 8
3.1.6 DF Handshake Scenarios . . . . . . . . . . . . . . . . . 11 3.6. DF Handshake Scenarios . . . . . . . . . . . . . . . . . 10
3.1.7 Interoperability . . . . . . . . . . . . . . . . . . . 13 3.7. Backwards Compatibility . . . . . . . . . . . . . . . . . 12
3.2 DF Election Synchronization Solution . . . . . . . . . . . . 14 4. DF Election Synchronization Solution . . . . . . . . . . . . 13
3.2.3 Advantages . . . . . . . . . . . . . . . . . . . . . . . 15 4.1. Advantages . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.4 Interoperability . . . . . . . . . . . . . . . . . . . . 16 4.2. BGP Encoding . . . . . . . . . . . . . . . . . . . . . . 14
3.2.5 BGP Encoding . . . . . . . . . . . . . . . . . . . . . . 16 4.3. Note on NTP-based synchronization . . . . . . . . . . . . 15
3.2.6 Note on NTP-based synchronization . . . . . . . . . . . 17 4.4. Synchronization Scenarios . . . . . . . . . . . . . . . . 15
3.2.7 An example . . . . . . . . . . . . . . . . . . . . . . . 17 4.5. Backwards Compatibility . . . . . . . . . . . . . . . . . 16
4 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . 18 5. Interoperability . . . . . . . . . . . . . . . . . . . . . . 17
5 Security Considerations . . . . . . . . . . . . . . . . . . . . 18 6. Security Considerations . . . . . . . . . . . . . . . . . . . 17
6 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 18 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17
7 References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
7.1 Normative References . . . . . . . . . . . . . . . . . . . 18 8.1. Normative References . . . . . . . . . . . . . . . . . . 18
7.2 Informative References . . . . . . . . . . . . . . . . . . 18 8.2. Informative References . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 18
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 19
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19
1 Introduction 1. Introduction
Ethernet Virtual Private Network (EVPN) solution [RFC 7432] is Ethernet Virtual Private Network (EVPN) solution [RFC7432] is
becoming pervasive in data center (DC) applications for Network becoming pervasive in data center (DC) applications for Network
Virtualization Overlay (NVO) and DC interconnect (DCI) services, and Virtualization Overlay (NVO) and DC interconnect (DCI) services, and
in service provider (SP) applications for next generation virtual in service provider (SP) applications for next generation virtual
private LAN services. private LAN services.
EVPN solution [RFC 7432] describes DF election procedures for multi- EVPN solution [RFC7432] describes DF election procedures for multi-
homing Ethernet Segments. These procedures are enhanced further in homing Ethernet Segments. These procedures are enhanced further in
[DF-FRAMEWORK] by applying Highest Random Weight Algorithm for DF [RFC8584] by applying Highest Random Weight Algorithm for DF election
election in order to avoid DF status change unnecessarily upon a link in order to avoid DF status change unnecessarily upon a link or node
or node failure associated with the multi-homing Ethernet Segment. failure associated with the multi-homing Ethernet Segment. This
This draft makes further improvement to DF election procedures in draft makes further improvement to DF election procedures in
[DF-FRAMEWORK] by providing two options for a fast DF election upon [RFC8584] by providing two options for a fast DF election upon
recovery of the failed link or node associated with the multi-homing recovery of the failed link or node associated with the multi-homing
Ethernet Segment. This DF election is achieved independent of number Ethernet Segment. This DF election is achieved independent of number
of EVIs associated with that Ethernet Segment and it is performed via of EVIs associated with that Ethernet Segment and it is performed via
a simple signaling between the recovered PE and each PE in the multi- a simple signaling between the recovered PE and each PE in the multi-
homing group. The draft presents two signaling options. The first homing group. The draft presents two signaling options. The first
option is based on a bidirectional handshake procedure whereas the option is based on a bidirectional handshake procedure whereas the
second option is based on simple one-way signaling mechanism. second option is based on simple one-way signaling mechanism.
1.1 Terminology 1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [KEYWORDS]. document are to be interpreted as described in [RFC2119].
Provider Edge (PE) : A device that sits in the boundary of Provider Provider Edge (PE): A device that sits in the boundary of Provider
and Customer networks and performs encap/decap of data from L2 to L3 and Customer networks and performs encap/decap of data from L2 to
and vice-versa. L3 and vice-versa.
Designated Forwarder (DF): An PE that is currently forwarding Designated Forwarder (DF): A PE that is currently forwarding
(encapsulating/decapsulating) traffic for a given VLAN in and out of (encapsulating/decapsulating) traffic for a given VLAN in and out
a site. of a site.
2 Challenges with Existing Solution 2. Challenges with Existing Solution
In EVPN technology, multiple PE devices have the ability to encap and In EVPN technology, multiple PE devices have the ability to encap and
decap data belonging to the same VLAN. In certain situations, this decap data belonging to the same VLAN. In certain situations, this
may cause L2 duplicates and even loops if there is a momentary may cause L2 duplicates and even loops if there is a momentary
overlap of forwarding roles between two or more PE devices, leading overlap of forwarding roles between two or more PE devices, leading
to broadcast storms. to broadcast storms.
EVPN [RFC 7432] currently uses timer based synchronization among PE EVPN [RFC7432] currently uses timer based synchronization among PE
devices in redundancy group that can result in duplications (and even devices in redundancy group that can result in duplications (and even
loops) because of multiple DFs if the timer is too short or loops) because of multiple DFs if the timer is too short or
blackholing if the timer is too long. blackholing if the timer is too long.
Using site-of-origin Split Horizon filtering can prevent loops (but Using site-of-origin Split Horizon filtering can prevent loops (but
not duplicates), however if there are overlapping DFs in two not duplicates), however if there are overlapping DFs in two
different sites at the same time for the same VLAN, the site different sites at the same time for the same VLAN, the site
identifier will be different upon re-entry of the packet and hence identifier will be different upon re-entry of the packet and hence
the split horizon check will fail, leading to L2 loops. the split horizon check will fail, leading to L2 loops.
The current state of the art [DF-FRAMEWORK] uses the well-known HRW The current state of the art [RFC8584] uses the well-known HRW
(Highest Random Weight) algorithm to avoid reshuffling of VLANs among (Highest Random Weight) algorithm to avoid reshuffling of VLANs among
PE devices in the redundancy group upon failure/recovery and thus PE devices in the redundancy group upon failure/recovery and thus
reducing the impact of failure/recovery to VLANs not on the reducing the impact of failure/recovery to VLANs not on the
failed/recovered ports. This eliminates loops/duplicates in failure failed/recovered ports. This eliminates loops/duplicates in failure
scenarios. scenarios.
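The stability property described above can be illustrated with a minimal sketch. It assumes SHA-256 as the weight function purely for illustration (RFC 8584 specifies its own hash); the function names and the addresses are hypothetical.

```python
import hashlib

def weight(vlan: int, pe_ip: str) -> int:
    # Assumption: SHA-256 stands in for the weight hash; RFC 8584 defines its own.
    digest = hashlib.sha256(f"{vlan}|{pe_ip}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def hrw_df(vlan: int, candidates: list) -> str:
    # The candidate with the highest weight for this VLAN is the DF.
    # The address is used as a tie-breaker so the ordering is total.
    return max(candidates, key=lambda pe: (weight(vlan, pe), pe))

before = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
after = ["192.0.2.1", "192.0.2.3"]  # 192.0.2.2 has failed

# VLANs whose DF differs between the two candidate lists.
moved = [v for v in range(1, 201) if hrw_df(v, before) != hrw_df(v, after)]

# Only VLANs that were served by the failed PE change hands.
assert all(hrw_df(v, before) == "192.0.2.2" for v in moved)
```

VLANs whose DF was PE1 or PE3 keep the same DF after the failure, which is the reshuffling-avoidance behaviour the text refers to.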
However, upon PE insertion or port bring-up, HRW cannot help as a However, upon PE insertion or port bring-up, HRW cannot help as a
transfer of DF role needs to happen to the newly inserted device/port transfer of DF role needs to happen to the newly inserted device/port
while the old DF is still active. while the old DF is still active.
+---------+ +---------+
+-------------+ | | +-------------+ | |
| | | | | | | |
/ | PE1 |----| | +-------------+ / | PE1 |----| | +-------------+
/ | | | MPLS/ | | |---H3 / | | | MPLS/ | | |---H3
/ +-------------+ | VxLAN/ | | PE10 | / +-------------+ | VxLAN/ | | PE10 |
CE1 - | Cloud | | | CE1 - | Cloud | | |
\ +-------------+ | |---| | \ +-------------+ | |---| |
\ | | | | +-------------+ \ | | | | +-------------+
\ | PE2 |----| | \ | PE2 |----| |
| | | | | | | |
+-------------+ | | +-------------+ | |
+---------+ +---------+
Figure 1: CE1 multi-homed to PE1 and PE2. Potential for duplicate DF. Figure 1: CE1 multi-homed to PE1 and PE2. Potential for duplicate
DF.
In Figure 1, when PE2 is inserted or booted up, PE1 will transfer In Figure 1, when PE2 is inserted or booted up, PE1 will transfer
DF role of some VLANs to PE2 to achieve load balancing. However, DF role of some VLANs to PE2 to achieve load balancing. However,
because there is no handshake mechanism between PE1 and PE2, because there is no handshake mechanism between PE1 and PE2,
duplication of DF roles for a given VLAN is possible. Duplication of duplication of DF roles for a given VLAN is possible. Duplication of
DF roles may eventually lead to L2 loops as well as duplication of DF roles may eventually lead to L2 loops as well as duplication of
traffic. traffic.
Current state of EVPN art relies on a blackholing timer for Current state of EVPN art relies on a blackholing timer for
transferring the DF role to the newly inserted device. This can cause transferring the DF role to the newly inserted device. This can
the following issues: cause the following issues:
* Loops/Duplicates if the timer value is too short * Loops/Duplicates if the timer value is too short
* Prolonged Traffic Blackholing if the timer value is too long * Prolonged Traffic Blackholing if the timer value is too long
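The two failure modes above can be expressed as simple arithmetic on the handover instants. The following is a simplified model, not part of the specification: it assumes one VLAN, a shared reference clock, and hypothetical timer values.

```python
def handover_outcome(old_df_release_s: float, new_df_takeover_s: float) -> tuple:
    """Classify a timer-driven DF handover for one VLAN.

    Both arguments are instants in seconds on a shared reference clock
    (hypothetical values chosen for illustration).
    """
    delta = old_df_release_s - new_df_takeover_s
    if delta > 0:
        return ("duplicate-DF", delta)   # both PEs forward for `delta` seconds
    if delta < 0:
        return ("blackhole", -delta)     # neither PE forwards for `-delta` seconds
    return ("clean", 0.0)

# Timer too short: the new PE takes over before the old DF has stepped down.
short_timer = handover_outcome(old_df_release_s=3.5, new_df_takeover_s=3.0)

# Timer too long: the old DF has stepped down well before the new PE takes over.
long_timer = handover_outcome(old_df_release_s=3.5, new_df_takeover_s=10.0)
```

A clean handover requires the two instants to coincide, which a locally configured timer cannot guarantee.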
This draft is proposing solutions that deterministically eliminates 2.1. Overview of Proposed Solutions
loops/duplicates and at the same time provides fast convergence upon
PE/port insertion.
3 Operation The first solution proposed deterministically eliminates loops/
duplicates with a state machine approach. The second proposal helps
narrow the DF Election window defined in [RFC7432], intended to
eliminate loops, based on common clock alignment. Both proposals
provide fast convergence upon PE/port insertion.
Here we describe two signaling mechanisms between the newly inserted Two signaling mechanisms between the newly inserted PE and remaining
PE and remaining PEs. The signaling is only possible once the newly PEs are described. The signaling is only possible once the newly
inserted PE has reliably discovered the other PEs and vice versa. The inserted PE has reliably discovered the other PEs and vice versa.
first option is referred to as DF Election Handshake solution and is The first option is referred to as DF Election Handshake solution and
described in section 3.1. The second option is referred to as DF is described in Section 3. The second option is referred to as DF
Election Synchronization Solution and is described in section 3.2. Election Synchronization Solution and is described in Section 4.
3.1 DF Election Handshake Solution 3. DF Election Handshake Solution
Due to HRW, the handshake will only be one per PE device and Due to HRW, the handshake will only be one per PE device and
independent of EVI/VNI scale. Therefore, this solution is divided independent of EVI/VNI scale. Therefore, this solution is divided
into three steps: into three steps:
Phase 1: Discovery Phase 1: Discovery
Phase 2: DF Candidate Determination; HRW or Preference-based Phase 2: DF Candidates Determination
Phase 3: Handshake Phase 3: DF Election Handshake
Following is the description of each step in detail. Following is the description of each step in detail.
3.1.1 Discovery 3.1. Discovery
Each PE needs to have a consistent view of the network including the Each PE needs to have a consistent view of the network including the
newly inserted PE. newly inserted PE.
Newly inserted device PE will advertise its Ethernet Segment route Newly inserted device PE will advertise its Ethernet Segment route
and start a flood/wait timer. This timer should be large enough to and start a flood/wait timer. This timer should be large enough to
guarantee the dissemination and receipt of this advertisement by guarantee the dissemination and receipt of this advertisement by
previously inserted PEs. previously inserted PEs.
As the old DF is continuously forwarding traffic while the new PE is As the old DF is continuously forwarding traffic while the new PE is
running this timer, this timer can be made as long as required running this timer, this timer can be made as long as required
without impacting traffic convergence. The timer value can be the BGP without impacting traffic convergence. The timer value can be the
session hold time in the worst case to ensure proper discovery. BGP session hold time in the worst case to ensure proper discovery
but in most cases will be equivalent to [RFC7432]'s PEERING timer.
3.1.2 DF candidates Determination 3.2. DF Candidates Determination
After the discovery timer has elapsed, each PE would have an imported After the discovery timer has elapsed, each PE would have an imported
list of the Ethernet Segment Routes from other PEs. The resultant list of the Ethernet Segment Routes from other PEs. The resultant
database will comprise all the DF candidates on a per ES basis and database will comprise all the DF candidates on a per ES basis and
will be used for DF election. Each PE will independently run the will be used for DF election. Each PE will independently run the
selected DF algorithm - i.e., HRW algorithm (or Preference-based) for selected DF algorithm - i.e., HRW algorithm (or Preference-based) for
all VLANs in a given Ethernet Segment. Since the discovery phase all VLANs in a given Ethernet Segment. Since the discovery phase
guarantees uniform network view between the participating devices, guarantees uniform network view between the participating devices,
the VLAN distribution results based on HRW (or Preference-based) will the VLAN distribution results based on HRW (or Preference-based) will
be consistent. be consistent.
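The consistency argument above can be sketched as follows. Each PE runs the same election over the same candidate set, so each derives the same list of VLANs to hand to the newly inserted PE. The weight function (SHA-256) and all names are assumptions made for illustration.

```python
import hashlib

def hrw_df(vlan, candidates):
    # Assumption: SHA-256 as the weight function, for illustration only.
    def w(pe):
        return hashlib.sha256(f"{vlan}|{pe}".encode()).digest()
    return max(candidates, key=w)

def vlans_to_transfer(vlans, old_candidates, new_pe):
    """Map each previously inserted PE to the VLANs it hands to new_pe."""
    new_candidates = list(old_candidates) + [new_pe]
    transfer = {pe: [] for pe in old_candidates}
    for vlan in vlans:
        if hrw_df(vlan, new_candidates) == new_pe:
            # The VLAN moves away from whichever PE was its DF before.
            transfer[hrw_df(vlan, old_candidates)].append(vlan)
    return transfer

vlans = range(1, 101)
# PE1 and PE2 hold the same candidate set, learned in a different order.
view_pe1 = vlans_to_transfer(vlans, ["192.0.2.1", "192.0.2.2"], "192.0.2.3")
view_pe2 = vlans_to_transfer(vlans, ["192.0.2.2", "192.0.2.1"], "192.0.2.3")
assert view_pe1 == view_pe2
```

Because the result depends only on the set of candidates and not on the order in which routes were learned, no per-VLAN negotiation is needed afterwards.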
3.1.3 DF Election Handshake 3.3. DF Election Handshake
The DF Election handshake will be accomplished in the following The DF Election handshake will be accomplished in the following
steps: steps:
- The newly inserted PE will send the DF Request to previously - The newly inserted PE will send the DF-Request to previously
inserted PEs with a new sequence number. inserted PEs with a new sequence number.
- The previously inserted PE(s) will receive the DF Request, will - The previously inserted PE(s) will receive the DF-Request, will
validate this request as per own discovery state and HRW (or validate this request as per own discovery state and local DF
Preference-based) results. Candidates results (e.g. Modulo, HRW or Preference-based).
- The previously inserted PE(s) will program hardware to block the - The previously inserted PE(s) will program its hardware to block
VLANs that must be transferred to the newly inserted PE. the VLANs that must be transferred to the newly inserted PE.
- The previously inserted PE(s) will send DF Response (W/ ACK OR - The previously inserted PE(s) will send DF-Response (with DF-ACK or
NACK) to the newly inserted PE with the same sequence number that was DF-NACK flag) to the newly inserted PE with the same sequence
contained in the DF Request. number that was contained in the DF-Request.
- Newly inserted PE will receive DF Response and validate it using - Newly inserted PE will receive DF-Response and validate it using
the sequence number. It will take action per received DF Response the sequence number. It will take action per received DF-Response
message and will not wait for all previously inserted devices for message, and for faster convergence, does not wait for all
faster convergence. The received DF Response is interpreted as an previously inserted devices. The Handshake transactions are on a
indication from the previously inserted PE to give up the DF role on per-pair of peering PEs.
those VLANs for which the newly inserted PE should be DF. In other
words, the newly inserted PE will only take over as DF for a given
VLAN/ISID if (a) it is the DF Election winner AND (b) it gets the ACK
from the previous DF.
- In case of Preference-based DF Election, the above procedure should - The DF-Response received at newly inserted PE is interpreted as an
only be followed if there is at least one previously inserted PE that indication from the previously inserted PE that it has relinquished
signals DP=0 in its ES route (there is no need for handshake in case the DF role on those VLANs for which the newly inserted PE should
of non-revertive mode). be DF. In other words, the newly inserted PE will only take over
as DF for a given VLAN/ISID if
- In case of a DF Response ACK, newly inserted PE will program its A. it is the DF Candidates election winner, AND
hardware to assume the DF responsibility.
B. it gets the DF-ACK from the previous DF.
- Upon receiving DF-Response with DF-ACK, newly inserted PE assumes
the DF responsibility and will program its hardware to unblock the
VLANs it is assuming.
- In case of Preference-based DF Election, the above procedure should
only be followed if there is at least one previously inserted PE
that signals DP=0 in its ES route (there is no need for handshake
in case of non-revertive mode).
We don't need to have a handshake on a per VLAN/EVI basis but rather We don't need to have a handshake on a per VLAN/EVI basis but rather
per pair of PEs in the redundancy group - i.e., if a new PE is added per pair of PEs in the redundancy group - i.e., if a new PE is added
to an existing redundancy group of 3 PE devices, then we need only to to an existing redundancy group of 3 PE devices, then we need only to
have 3 handshakes. This is because the devices already are in sync have 3 handshakes. This is because the devices already are in sync
about which VLANs to give-up/takeover (HRW). about which VLANs to give-up/takeover.
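The handshake steps above can be sketched as a pair of small state holders, one handshake per pair of PEs. This is a minimal sketch under stated assumptions: the class names, the dictionary message format, and the validation rule in `on_df_request` are hypothetical placeholders, and hardware programming is reduced to updating a set of forwarded VLANs.

```python
from dataclasses import dataclass, field

@dataclass
class OldPE:
    ip: str
    forwarding: set   # VLANs this PE currently forwards as DF
    to_give_up: set   # VLANs the election assigns to the new PE

    def on_df_request(self, requester_ip: str, seq: int) -> dict:
        resp = {"seq": seq, "dest": requester_ip, "orig": self.ip}
        # Placeholder validation (assumption): local results must be coherent.
        if not self.to_give_up <= self.forwarding:
            return {**resp, "flag": "DF-NACK"}
        # Block first, then acknowledge, so two DFs never overlap.
        self.forwarding -= self.to_give_up
        return {**resp, "flag": "DF-ACK"}

@dataclass
class NewPE:
    ip: str
    seq: int = 0
    forwarding: set = field(default_factory=set)

    def send_df_request(self) -> int:
        self.seq = (self.seq + 1) % 256   # Sequence Number is one octet
        return self.seq

    def on_df_response(self, resp: dict, vlans_from_peer: set) -> bool:
        if resp["dest"] != self.ip or resp["seq"] != self.seq:
            return False                  # not for this PE, or stale
        if resp["flag"] != "DF-ACK":
            return False                  # DF-NACK: caller starts over
        self.forwarding |= vlans_from_peer  # unblock only the acknowledged VLANs
        return True

pe1 = OldPE("192.0.2.1", forwarding={10, 20, 30}, to_give_up={20})
pe3 = NewPE("192.0.2.3")
resp = pe1.on_df_request(pe3.ip, pe3.send_df_request())
took_over = pe3.on_df_response(resp, {20})
```

The new PE acts on each response as it arrives and does not wait for the other peers, which matches the per-pair nature of the transaction.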
At the end of these three phases, the VLAN DF role transfer would At the end of these three phases, the VLAN DF role transfer would
have happened in a deterministic way while ensuring minimum traffic have happened in a deterministic way while ensuring minimum traffic
loss. Device recovery and device insertion scenarios are identical in loss. Device recovery and device insertion scenarios are identical
terms of the handshaking procedure. In the next section, we describe the in terms of the handshaking procedure. In the next section, we describe
procedure details for device insertion. the procedure details for device insertion.
3.1.4 Node Insertion 3.4. Node Insertion
Consider the scenario where PE3 is inserted in the network, while PE1 Consider the scenario where PE3 is inserted in the network, while PE1
and PE2 are already in stable state. PE3 will send/receive the and PE2 are already in stable state. PE3 will send/receive the
following flags along with the EVPN Type 4 route: following flags along with the EVPN Type 4 route:
- DF Request: Upon completing the DF Election, PE3 will send DF - DF-Request: Upon completing the DF Election, PE3 will send DF
Request with a new sequence number. PE1 and PE2 will receive this Request with a new sequence number. PE1 and PE2 will receive this
message and respond with DF Response ACK or NACK with the same message and respond with DF-Response DF-ACK or DF-NACK with the same
sequence number that was generated by PE3. sequence number that was generated by PE3.
- DF Response ACK: When PE3 receives DF Response ACK from PE1 with - DF-Response DF-ACK: When PE3 receives DF-Response DF-ACK from PE1
the same sequence number as DF Request, it will take over the DF role with the same sequence number as DF-Request, it will take over the
for the appropriate VLANS that are being transferred from PE1. When DF role for the appropriate VLANS that are being transferred from
DF Response ACK from PE2 arrives, the rest of the VLANS to be PE1. When DF-Response DF-ACK from PE2 arrives, the rest of the
transferred from PE2 to PE3 are then taken over by PE3. VLANS to be transferred from PE2 to PE3 are then taken over by PE3.
- DF Response NACK: If PE3 receives DF Response NACK from at least - DF-Response DF-NACK: If PE3 receives DF-Response DF-NACK from at
one of PE1 or PE2, it will not take over DF role and will start least one of PE1 or PE2, it will not take over DF role and will
over. start over (new sequence number).
Consider the scenario where two nodes PE3 and PE4 are being inserted Consider the scenario where two nodes PE3 and PE4 are being inserted
at the same time. Both of them will send a DF Request to PE1 and PE2 at the same time. Both of them will send a DF-Request to PE1 and PE2
at around the same time with possibly the same sequence number. When at around the same time with possibly the same sequence number. When
PE1 and PE2 respond with DF Response ACK, it is important to signify PE1 and PE2 respond with DF-Response DF-ACK, it is important to
exactly whom the response is meant for as it could be for either signify exactly whom the response is meant for as it could be for
requester (PE3 or PE4). To remove any ambiguity and false positives, either requester (PE3 or PE4). To remove any ambiguity and false
the IP address of the requester MUST be included in the response positives, the IP address of the requester MUST be included in the
message to specify who the response is meant for. response message to specify who the response is meant for.
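The ambiguity described above can be shown with a short sketch. The dictionary message layout and the function name are hypothetical; the point is only that the requester address, not the sequence number alone, identifies the intended recipient.

```python
def response_is_for(resp: dict, my_ip: str, my_seq: int) -> bool:
    # Match on the requester address first; the sequence number alone is ambiguous.
    return resp["dest"] == my_ip and resp["seq"] == my_seq

# PE3 and PE4 both happened to pick sequence number 1.
ack_for_pe4 = {"flag": "DF-ACK", "seq": 1, "dest": "192.0.2.4", "orig": "192.0.2.1"}

pe3_accepts = response_is_for(ack_for_pe4, "192.0.2.3", 1)  # False
pe4_accepts = response_is_for(ack_for_pe4, "192.0.2.4", 1)  # True
```

Without the address check, PE3 would treat the acknowledgement meant for PE4 as its own and could take over VLANs that PE1 has not yet released to it.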
3.1.5 BGP Encoding 3.5. BGP Encoding
The EVPN NLRI comprises Route Type (1B), Length (1B) and Route The EVPN NLRI comprises Route Type (1B), Length (1B) and Route
Type specific variable encoding. Here we propose the creation of two Type specific variable encoding. Here we propose the creation of two
new EVPN route types: new EVPN route types:
+ 0x0C - DF Election Handshake Request Route + TBD1 - DF Election Handshake Request Route
+ 0x0D - DF Election Handshake Response Route
3.1.5.1 DF Election Handshake Request Route + TBD2 - DF Election Handshake Response Route
3.5.1. DF Election Handshake Request Route
A DF Election Handshake Request Type NLRI consists of the following: A DF Election Handshake Request Type NLRI consists of the following:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-++ +-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-++
| RD (8 octets) | | RD (8 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet Segment Identifier (10 octets) | | Ethernet Segment Identifier (10 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DF-Flags (1 octet) | | DF-Flags (1 octet) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number (1 octet) | | Sequence Number (1 octet) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Originating Router's IP Address | | Originating Router's IP Address |
| (4 or 16 octets) | | (4 or 16 octets) |
+-----------------------------------------+ +-----------------------------------------+
The DF-Flags can have the following values: The DF Flags can have the following values:
DF-INIT : Sent initially upon boot-up; bootstraps the network DF-INIT : Sent initially upon boot-up; bootstraps the network
DF-REQUEST : Sent to request DF takeover DF-REQUEST : Sent to request DF takeover
For the purpose of BGP route key processing, the Ethernet Segment For the purpose of BGP route key processing, the Ethernet Segment
Identifier and Originating Router's IP address fields are considered Identifier and Originating Router's IP address fields are considered
to be part of the prefix in the NLRI. The DF-Flag and Sequence number to be part of the prefix in the NLRI. The DF-Flag and Sequence
are to be treated as route attributes as opposed to being part of the number are to be treated as route attributes as opposed to being part
route. This route is sent along with ESI-Import route target. of the route. This route is sent along with ES-Import route target.
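The Request route layout above can be packed as in the following sketch. It covers only the route-type-specific fields, since the route type code point is not yet assigned. The numeric flag values are hypothetical (the text names the flags but gives no values), and the function names are illustrative.

```python
import ipaddress
import struct

# Assumption: numeric flag values are placeholders; none are given in the text.
DF_INIT = 0x01      # hypothetical value
DF_REQUEST = 0x02   # hypothetical value

def encode_df_request(rd: bytes, esi: bytes, flags: int, seq: int, originator: str) -> bytes:
    """Pack RD, ESI, DF-Flags, Sequence Number and Originating Router's IP."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    ip = ipaddress.ip_address(originator).packed   # 4 octets (IPv4) or 16 (IPv6)
    return struct.pack("!8s10sBB", rd, esi, flags, seq & 0xFF) + ip

def route_key(nlri: bytes) -> tuple:
    # ESI and Originating Router's IP identify the route; flags and sequence do not.
    return (nlri[8:18], nlri[20:])

rd = bytes(8)
esi = bytes.fromhex("00112233445566778899")
first = encode_df_request(rd, esi, DF_REQUEST, 1, "192.0.2.3")
retry = encode_df_request(rd, esi, DF_REQUEST, 2, "192.0.2.3")
```

Since `first` and `retry` share a route key, a retry with a new sequence number updates the existing route rather than creating a second one.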
3.5.1.2 DF Election Handshake Response Route 3.5.2. DF Election Handshake Response Route
A DF Election Handshake Response Type NLRI consists of the following: A DF Election Handshake Response Type NLRI consists of the following:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-++ +-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-++
| RD (8 octets) | | RD (8 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet Segment Identifier (10 octets) | | Ethernet Segment Identifier (10 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IP-Address Length (1 octet) | | IP-Address Length (1 octet) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 10, line 25 skipping to change at page 9, line 25
| DF-Flags (1 octet) | | DF-Flags (1 octet) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number (1 octet) | | Sequence Number (1 octet) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Originating Router's IP Address | | Originating Router's IP Address |
| (4 or 16 octets) | | (4 or 16 octets) |
+-----------------------------------------+ +-----------------------------------------+
The DF-Flags can have the following values: The DF-Flags can have the following values:
DF-ACK : Sent to Acknowledge DF-REQUEST DF-ACK : Sent to Acknowledge DF-REQUEST
DF-NACK : Sent to Reject DF-Request DF-NACK : Sent to Reject DF-REQUEST
For the purpose of BGP route key processing, the Ethernet Segment For the purpose of BGP route key processing, the Ethernet Segment
Identifier, IP Address Length and Destination Router's IP Address Identifier, IP Address Length and Destination Router's IP Address
fields, and Originating Router's IP address fields are considered to fields, and Originating Router's IP address fields are considered to
be part of the prefix in the NLRI. The DF-Flag and Sequence number is be part of the prefix in the NLRI. The DF-Flag and Sequence number
to be treated as a route attribute as opposed to being part of the is to be treated as a route attribute as opposed to being part of the
route. This route is sent along with ESI-Import route target. route. This route is sent along with ESI-Import route target.
This document introduces a new flag called "H" (for Handshake) to the This document introduces a new flag called "H" (for Handshake) to the
bitmap field of the DF Election Extended Community defined in bitmap field of the DF Election Extended Community defined in
[DF-FRAMEWORK]. [DF-FRAMEWORK].
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06 | Sub-Type(0x06)| DF Type |P|A|H|T| Bitmap| | Type=0x06 | Sub-Type(0x06)| DF Type |D|A|H|T| |P| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved = 0 | | Reserved = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
H: This flag is located in bit position 26 as shown above. When set H: This flag is located in bit position 26 as shown above. When set
to 1, it indicates the desire to use Handshaking capability with the to 1, it indicates the desire to use Handshaking capability with the
rest of the PEs in the ES. This capability can only be used with a rest of the PEs in the ES. This capability can only be used with a
selected number of DF election algorithms such as HRW and Preference- selected number of DF election algorithms such as HRW and Preference-
based. based.
3.1.6 DF Handshake Scenarios 3.6. DF Handshake Scenarios
Consider the scenario where PE3 is freshly inserted into the network Consider the scenario where PE3 is freshly inserted into the network
with PE1 and PE2 in steady state (as shown below). As shown in the with PE1 and PE2 in steady state (as shown below). As shown in the
sequence diagram below, at time = T0, PE3 will send Type 4 ES route sequence diagram below, at time = T0, PE3 will send Type 4 ES route
and that will cause PE1 and PE2 to discover PE3. and that will cause PE1 and PE2 to discover PE3.
Post the discovery timer, at time = T1, PE3 will send DF Request Post the discovery timer, at time = T1, PE3 will send DF-Request
containing [ESI, DF-REQ, SEQ1]. containing [ESI, DF-REQ, SEQ1].
PE2 responds via DF Response ACK at time = T2, with the same sequence PE2 responds via DF-Response ACK at time = T2, with the same sequence
number SEQ1. [ESI, DF-ACK, PE3, SEQ1]. Note that the sequence number number SEQ1. [ESI, DF-ACK, PE3, SEQ1]. Note that the sequence
is the same as is contained in the DF Request from PE3. PE3 will number is the same as is contained in the DF-Request from PE3. PE3
receive the DF Response ACK and take over the appropriate VLANs based will receive the DF-Response ACK and take over the appropriate VLANs
on HRW only if the sequence number matches. based on HRW only if the sequence number matches.
PE1 responds via DF Response ACK at time = T3, with the same sequence PE1 responds via DF-Response DF-ACK at time = T3, with the same
number SEQ1; [ESI, DF-ACK, PE3, SEQ1]. PE3 will receive the DF sequence number SEQ1; [ESI, DF-ACK, PE3, SEQ1]. PE3 will receive the
Response ACK and take over the appropriate VLANs based on HRW only if DF Response ACK and take over the appropriate VLANs based on HRW only
the sequence number matches. if the sequence number matches.
By the end of the handshake, all appropriate VLANs for the ES are By the end of the handshake, all appropriate VLANs for the ES are
transferred from PE1 and PE2 to PE3 with a single per-ES handshake. transferred from PE1 and PE2 to PE3 with a single per-ES handshake.
PE1 PE2 PE3 PE1 PE2 PE3
| | | | | |
| | Type 4 (Discovery) | | | Type-4 (Discovery) |
| |<<-------------------------| T0 | |<<-------------------------| T0
|<<---------------------------------------------| |<<------------------------------------------------|
| | | | | |
| | | | | |
| | Type C (DF Request) | | | Type-TBD1 (DF-Request) |
|<<-----------------|<<-------------------------| T1 |<<--------------------|<<-------------------------| T1
| | | | | |
| | Type D (DF Response) | | | Type-TBD2 (DF-Response)|
| |------------------------->>| T2 | |------------------------->>| T2
| Type D(DF Resp) | | | Type-TBD2 (DF-Response)| |
|--------------------------------------------->>| T3 |------------------------------------------------>>| T3
| | | | | |
|<<###########################################>>| |<<##############################################>>|
| PE3 freshly inserted | | PE3 freshly inserted |
|<<###########################################>>| |<<##############################################>>|
. . . . . .
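The acceptance check that the recovering PE applies to each DF Response in the scenario above can be sketched as follows. This is an illustrative sketch under assumptions: the DfResponse type and the ack_allows_takeover function are hypothetical names, and the fields mirror the [ESI, DF-ACK, PE3, SEQ1] tuples used in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DfResponse:
    """Hypothetical in-memory view of a DF Response route."""
    esi: str
    flag: str         # "DF-ACK" or "DF-NACK"
    destination: str  # PE the response is meant for
    seq: int          # sequence number echoed from the DF Request


def ack_allows_takeover(resp: DfResponse, local_pe: str,
                        esi: str, pending_seq: int) -> bool:
    """Return True only for a DF-ACK addressed to this PE, for this ES,
    whose sequence number matches the outstanding DF Request."""
    return (resp.esi == esi
            and resp.flag == "DF-ACK"
            and resp.destination == local_pe
            and resp.seq == pending_seq)
```

A response with a stale sequence number, a DF-NACK, or a different destination PE is ignored for takeover purposes in this sketch.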
Consider the scenario where PE2 and PE3 are inserted simultaneously Consider the scenario where PE2 and PE3 are inserted simultaneously
in the network where PE1 is in steady state (as shown below). PE2 and in the network where PE1 is in steady state (as shown below). PE2
PE3 will send the Type 4 ES routes and start the discovery timer. and PE3 will send the Type 4 ES routes and start the discovery timer.
This will cause PE1, PE2 and PE3 to discover each other. This will cause PE1, PE2 and PE3 to discover each other.
PE2 and PE3 will then simultaneously and separately send DF Request. PE2 and PE3 will then simultaneously and separately send DF Request.
PE1 will receive these requests and respond to them. PE1 will receive these requests and respond to them.
To avoid any ambiguity, PE1 will explicitly specify in the DF Response To avoid any ambiguity, PE1 will explicitly specify in the DF Response
route the destination for which the DF-ACK is meant. That is why route the destination for which the DF-ACK is meant. That is why
the responses from PE1 will contain [ESI, DF-ACK, PE2, SEQ] and [ESI, the responses from PE1 will contain [ESI, DF-ACK, PE2, SEQ] and [ESI,
DF-ACK, PE3, SEQ] to specify that the response is meant for PE2 and DF-ACK, PE3, SEQ] to specify that the response is meant for PE2 and
PE3 respectively. PE3 respectively.
Upon receiving the Type-D response message, PE2 and PE3 will take Upon receiving the Type-TBD2 response message, PE2 and PE3 will take
over the respective VLANs. over the respective VLANs.
PE1 PE2 PE3 PE1 PE2 PE3
| | | | | |
| | Type 4 (Discovery) | | | Type 4 (Discovery) |
|<<-----------------| | T0 |<<-------------------| | T0
|<<-----------------|<<-------------------------| |<<-------------------|<<-------------------------|
| | | | | |
| | | | | |
| | Type C (DF Request) | | | Type-TBD1 (DF-Request) |
| |<<-------------------------| T1 | |<<-------------------------| T1
| | | | | |
| Type C(DF Request)| | | Type-TBD1(DF-Request)| |
|<<-----------------| | T2 |<<-------------------| | T2
| | | | | |
| | Type D (DF Response) | | | Type-TBD2 (DF-Response)|
|--------------------------------------------->>| T3 |----------------------------------------------->>| T3
| | | | | |
|Type D(DF Response)| | |Type-TBD2(DF-Response)| |
|----------------->>| | T4 |------------------->>| | T4
| | | | | |
|<<###########################################>>| |<<#############################################>>|
| PE2 and PE3 inserted simultaneously | | PE2 and PE3 inserted simultaneously |
|<<###########################################>>| |<<#############################################>>|
. . . . . .
When PE3 is booted down or removed from the network, the routes When PE3 is booted down or removed from the network, the routes
formerly advertised by PE3 will be withdrawn, including the Type 4 formerly advertised by PE3 will be withdrawn, including the Type-4
route (as shown below). When PE1 and PE2 process the deletion of route (as shown below). When PE1 and PE2 process the deletion of
PE3's Type 4 route, they will clean up any DF handshake state PE3's Type-4 route, they will clean up any DF handshake state
pertaining to PE3. This means that PE1 and PE2 will withdraw the DF pertaining to PE3. This means that PE1 and PE2 will withdraw the DF
Response routes that they had earlier sent with PE3 as the Response routes that they had earlier sent with PE3 as the
destination. destination.
PE1 PE2 PE3 PE1 PE2 PE3
| | | | | |
| | Type 4 Route Withdrawal | | | Type-4 Route Withdrawal |
| |<<-------------------------| T0 | |<<-------------------------| T0
|<<---------------------------------------------| |<<-----------------------------------------------|
| | | | | |
| PE2 purges Type D(DF Resp) sent to PE3 | T2 | PE2 purges Type-TBD2 (DF-Response) sent to PE3| T2
| | | | | |
| PE1 purges Type D(DF Resp) sent to PE3 | T3 | PE1 purges Type-TBD2 (DF-Response) sent to PE3| T3
| | | | | |
|<<###########################################>>| |<<#############################################>>|
| PE3 booted down/removed from the network | | PE3 booted down/removed from the network |
|<<###########################################>>| |<<#############################################>>|
. . . . . .
3.1.7 Interoperability 3.7. Backwards Compatibility
Per redundancy group (per ES), for the DF election procedures to be Per redundancy group (per ES), for the DF election procedures to be
globally convergent and unanimous, it is necessary that all the globally convergent and unanimous, it is necessary that all the
participating PEs agree on the DF Election algorithm to be used. It participating PEs agree on the DF Election algorithm to be used. It
is, however, possible that some PEs continue to use the existing is, however, possible that some PEs continue to use the existing
modulus based DF election and do not rely on the new handshake/sync modulus based DF election and do not rely on the new handshake/sync
procedures. PEs running an old version of draft/RFC shall simply procedures. PEs running an old version of draft/RFC shall simply
discard unrecognized new BGP extended communities. discard unrecognized new BGP extended communities.
A PE can indicate its willingness to support new Handshake and/or A PE can indicate its willingness to support new Handshake and/or
Time Synchronization capabilities by signaling them in the DF Time Synchronization capabilities by signaling them in the DF
Election Extended Community defined in [DF-FRAMEWORK] sent along with Election Extended Community defined in [RFC8584] sent along with the
the Ethernet-Segment Route (Type-4). Ethernet-Segment Route (Type-4).
Considering that all the PE devices support the HRW election Considering that all the PE devices support the HRW election
algorithm, but only a subset of them may have the capability of algorithm, but only a subset of them may have the capability of
performing the handshake or synchronization mechanism. In such a performing the handshake or synchronization mechanism. In such a
situation, the following procedures are exercised. situation, the following procedures are exercised.
If some PEs in the redundancy group signal both Handshake and Time
Synchronization capabilities (both H & T set to 1), then Time
Synchronization capability SHALL be chosen over Handshake capability
with the HRW (or Preference-based) DF election algorithm.
If some PEs in the redundancy group signal Time Synchronization
(T=1) but not Handshaking (H=0); whereas, some other PEs in the same
redundancy group signal Handshaking (H=1) but not Time
Synchronization (T=0), then the PEs that have handshaking ability,
SHALL perform HRW with handshaking among themselves and the PEs that
Time Synchronization capability SHALL perform HRW (or Preference-
based) with time synchronization among themselves.
If some PEs in the redundancy group don't signal either Time
Synchronization or Handshaking capabilities, then these PEs SHALL
perform HRW (or Preference-based) with default timer based mechanism
defined in [RFC 7432].
In the illustration below, PE1, PE2 and PE3 send their respective In the illustration below, PE1, PE2 and PE3 send their respective
Type 4 routes indicating their DF capabilities at time T1, T2 and T3 Type-4 routes indicating their DF capabilities at time T1, T2 and T3
respectively. Only PE2 and PE3 are Handshake capable, hence only PE2 respectively. Only PE2 and PE3 are Handshake capable, hence only PE2
and PE3 partake in DF Handshaking procedure described here at time T4 and PE3 partake in DF Handshaking procedure described here at time T4
and T5. PE1 on the other hand, runs the DF election timer and takes and T5. PE1 on the other hand, runs the DF election timer and takes
over the DF role upon timer expiry at time T6. over the DF role upon timer expiry at time T6.
PE1 PE2 PE3 PE1 PE2 PE3
| | | | | |
| | | | | |
| Type 4 (0x0 Default Capability) | | Type-4 (0x0 Default Capability) |
|----------------->>|------------------------->>| T1 |------------------->>|------------------------->>| T1
| | | | | |
| Type 4 (H=1 Handshake Capable) | | Type-4 (H=1 Handshake Capable) |
|<<-----------------|------------------------->>| T2 |<<-------------------|------------------------->>| T2
| | | | | |
| Type 4 (H=1 Handshake Capable) | | Type-4 (H=1 Handshake Capable) |
|<<-----------------|<<-------------------------| T3 |<<-------------------|<<-------------------------| T3
| | | | | |
| | | | | |
| | Type C (DF Request) | | | Type-TBD1 (DF-Request) |
| |<<-------------------------| T4 | |<<-------------------------| T4
| | | | | |
| | Type D (DF Response) | | | Type-TBD2 (DF-Response)|
| |------------------------->>| T5 | |------------------------->>| T5
| PE1 Timer Expiry (DF Takeover) | T6 | PE1 Timer Expiry (DF Takeover) | T6
|<<###########################################>>| |<<#############################################>>|
| Only PE2 and PE3 Handshake Capable | | Only PE2 and PE3 Handshake Capable |
|<<###########################################>>| |<<#############################################>>|
. . . . . .
4. DF Election Synchronization Solution
3.2 DF Election Synchronization Solution
If all PE devices attached to a given Ethernet Segment are clock- If all PE devices attached to a given Ethernet Segment are clock-
synchronized with each other, then the above handshaking procedures synchronized with each other, then the above handshaking procedures
can be simplified and packet loss can be reduced from BGP-propagation can be simplified and packet loss can be reduced from BGP-propagation
time (between recovered PE and the DF PE) to very small time (e.g., time (between recovered PE and the DF PE) to very small time (e.g.,
milliseconds or less). milliseconds or less).
The simplified procedure is as follows: The simplified procedure is as follows:
First, the DF election procedure, described in RFC7432, is applied as The DF Election procedure, as described in [RFC7432] and as
before. optionally signalled in [RFC8584], is applied.
All PEs attached to a given Ethernet-Segment are clock-synchronized; All PEs attached to a given Ethernet-Segment are clock-synchronized;
using a networking protocol for clock synchronization (e.g. NTP, PTP, using a networking protocol for clock synchronization (e.g. NTP,
etc). PTP, etc.).
Newly inserted device PE or during failure recovery of a PE, that PE Newly inserted device PE or during failure recovery of a PE, that PE
communicates the current time to peering partners plus the remaining communicates the current time to peering partners plus the remaining
peering timer time left. This constitutes an "endtime" as seen from peering timer time left. This constitutes an "end" or "absolute" time
local PE. That "endtime" is called "Service Carving Time" (SCT). as seen from local PE. That absolute time is called "Service Carving
Time" (SCT).
A new BGP Extended Community is advertised along with RT-4 to A new BGP Extended Community is advertised along with RT-4 to
communicate to other partners the Service Carving Time. communicate to other partners the Service Carving Time.
Upon reception of that new BGP Extended Community, partner PEs know Upon reception of that new BGP Extended Community, partner PEs know
exactly its carving time. The notion of skew is introduced to exactly its carving time. The notion of skew is introduced to
eliminate any potential duplicate traffic or loops. They add a skew eliminate any potential duplicate traffic or loops. They add a skew
(default = -10ms) to the Service Carving Time to enforce this; (default = -10ms) to the Service Carving Time to enforce this. The
basically partner PEs must carve first. previously inserted PE(s) must carve first, followed shortly(skew) by
the newly inserted PE.
To summarize, all peering PEs carve almost simultaneously at the time To summarize, all peering PEs carve almost simultaneously at the time
announced by newly added / recovered PE. The newly added/recovered PE announced by newly added/recovered PE. The newly inserted PE
initiates the SCT, carves immediately on peering timer expiry. Other initiates the SCT, and carves immediately on peering timer expiry.
PE receiving RT-4 with a SCT BGP ExtComm, carve shortly before "SCT The previously inserted PE(s) receiving RT-4 with a SCT BGP extended
time". community, carve shortly before Service Carving Time.
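The timing arithmetic summarized above can be sketched as follows. This is a minimal sketch under assumptions: times are plain seconds on a shared synchronized clock, the function names are hypothetical, and the skew uses the default of -10 ms quoted in the text.

```python
DEFAULT_SKEW_SECONDS = -0.010  # default skew of -10 ms from the text


def service_carving_time(now: float, remaining_peering_timer: float) -> float:
    """SCT as announced by the newly inserted or recovered PE:
    current time plus the peering timer time that is left."""
    return now + remaining_peering_timer


def partner_carve_time(sct: float, skew: float = DEFAULT_SKEW_SECONDS) -> float:
    """Time at which a previously inserted PE carves: the received SCT
    plus the (negative) skew, so that it carves shortly before the
    newly inserted PE does."""
    return sct + skew
```

With the peering timer of 3 seconds and an advertisement sent at t=100, the recovered PE carves at t=103 and its partners carve about 10 ms earlier.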
3.2.3 Advantages 4.1. Advantages
There are multiple advantages of using the approach. Here is a There are multiple advantages of using the approach. Here is a
non-exhaustive list: non-exhaustive list:
- A simple uni-directional signaling is all needed - A simple uni-directional signaling is all needed
- Backwards-compatible: old versions of draft/RFC shall simply - Backwards-compatible: PEs supporting only older [RFC7432] shall
discard unrecognized new SCT BGP ExtComm simply discard unrecognized new "Service Carving Timestamp" BGP
Extended Community
- Multiple DF Election algorithms can be supported: - Multiple DF Election algorithms can be supported:
* RFC7432's default ordered list ordinal algorithm (modulo)
* HRW in [DF-FRAMEWORK], etc
- Independent of BGP transmission delay for RT-4
- Solutions is agnostic of the time synchronization mechanisms (e.g.
NTP, PTP, ...)
3.2.4 Interoperability * [RFC7432] default ordered list ordinal algorithm (Modulo),
Per redundancy group, for the DF election procedures to be globally * [RFC8584] highest-random weight, etc.
convergent and unanimous, it is necessary that all the participating
PEs agree on the DF Election algorithm to be used. It is, however,
possible that some PEs continue to use the existing modulus based DF
election and do not rely on the new SCT BGP extended community. PEs
running an baseline DF election mechanism shall simply discard
unrecognized new SCT BGP extended community.
A PE can indicate its willingness to support clock-synched carving by - Independent of BGP transmission delay for RT-4
signaling the new SCT BGP extended community along with the Ethernet-
Segment Route (Type-4).
3.2.5 BGP Encoding - Agnostic of the time synchronization mechanism used (e.g., NTP, PTP,
etc.)
4.2. BGP Encoding
A new BGP extended community needs to be defined to communicate the A new BGP extended community needs to be defined to communicate the
Service Carving Expected Timestamp for each Ethernet Segment. Service Carving Timestamp for each Ethernet Segment.
A new transitive extended community where the Type field is 0x06, and A new transitive extended community where the Type field is 0x06, and
the Sub-Type is <to be defined> is advertised along with Ethernet the Sub-Type is [TBD3] is advertised along with Ethernet Segment
Segment route. Timestamp for expected Service carving is encoded as a route. Timestamp for expected Service carving is encoded as a
8-octet value as follows: 8-octet value as follows:
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06 | Sub-Type(TBD) | Timestamp(upper 16)| | Type=0x06 | Sub-Type(TBD3)| Timestamp(upper 16)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp (lower 32) | | Timestamp (lower 32) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This document introduces a new flag called "T" (for Time This document introduces a new flag called "T" (for Time
Synchronization) to the bitmap field of the DF Election Extended Synchronization) to the bitmap field of the DF Election Extended
Community defined in [DF-FRAMWORK]. Community defined in [RFC8584].
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06 | Sub-Type(0x06)| DF Type |P|A|H|T| Bitmap| | Type=0x06 | Sub-Type(0x06)| DF Type |P|A|H|T| Bitmap|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved = 0 | | Reserved = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
T: This flag is located in bit position 27 as shown above. When set T: This flag is located in bit position 27 as shown above. When set
to 1, it indicates the desire to use Time Synchronization capability to 1, it indicates the desire to use Time Synchronization capability
with the rest of the PEs in the ES. This capability is used in with the rest of the PEs in the ES. This capability is used in
conjunction with the agreed upon DF Type (DF Election Type). For conjunction with the agreed upon DF Type (DF Election Type). For
example if all the PEs in the ES indicated that they have Time example if all the PEs in the ES indicated that they have Time
Synchronization capability and they want the DF type be of HRW, then Synchronization capability and they want the DF type be of HRW, then
HRW algorithm is used in conjunction with this capability. HRW algorithm is used in conjunction with this capability.
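Testing the "H" and "T" flags at the bit positions stated in this document (26 and 27) can be sketched as follows. The sketch assumes the numbering convention of the diagrams, where bit 0 is the most significant bit of the first 32-bit word of the extended community; the helper names are hypothetical.

```python
H_BIT = 26  # Handshake flag position, as stated in the text
T_BIT = 27  # Time Synchronization flag position, as stated in the text


def flag_mask(bit_position: int, width: int = 32) -> int:
    """Mask for a flag numbered MSB-first (bit 0 is the leftmost bit)."""
    return 1 << (width - 1 - bit_position)


def has_flag(first_word: int, bit_position: int) -> bool:
    """Check one flag in the first 32-bit word of the extended community."""
    return bool(first_word & flag_mask(bit_position))


def set_flag(first_word: int, bit_position: int) -> int:
    """Return the word with the given flag set to 1."""
    return first_word | flag_mask(bit_position)
```

For example, starting from a word holding only Type=0x06 and Sub-Type=0x06, set_flag(word, H_BIT) sets the Handshake flag and leaves the Time Synchronization flag clear.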
3.2.6 Note on NTP-based synchronization 4.3. Note on NTP-based synchronization
The 64-bit timestamp used by NTP protocol consists of a 32-bit part The 64-bit timestamp used by NTP protocol consists of a 32-bit part
for seconds and a 32-bit part for fractional second. Giving a time for seconds and a 32-bit part for fractional second. Giving a time
scale that rolls over every 2^32 seconds (136 years) and a scale that rolls over every 2^32 seconds (136 years) and a
theoretical resolution of 2^32 seconds (233 picoseconds). The theoretical resolution of 2^-32 seconds (233 picoseconds). The
recommendation is to keep the top 32 bits and carry lower MSB 16 bits recommendation is to keep the top 32 bits and carry lower MSB 16 bits
of fractional second. of fractional second.
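The truncation recommended above, combined with the 8-octet layout of the Service Carving Time extended community, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and because the Sub-Type is not yet assigned it is passed in by the caller rather than fixed.

```python
import struct


def ntp64_to_sct48(ntp64: int) -> int:
    """Keep the 32-bit seconds part and the 16 most significant bits
    of the 32-bit fractional part of a 64-bit NTP timestamp."""
    return (ntp64 >> 16) & 0xFFFFFFFFFFFF


def encode_sct_extcomm(sct48: int, sub_type: int) -> bytes:
    """Pack Type=0x06, the caller-supplied Sub-Type, the upper 16 bits
    and the lower 32 bits of the timestamp into 8 octets."""
    upper16 = (sct48 >> 32) & 0xFFFF
    lower32 = sct48 & 0xFFFFFFFF
    return struct.pack("!BBHI", 0x06, sub_type, upper16, lower32)
```

Dropping the low 16 fractional bits leaves a resolution of 2^-16 seconds (about 15 microseconds), which is far finer than the 10 ms default skew.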
3.2.7 An example 4.4. Synchronization Scenarios
Let's take figure 1 as an example where initially PE2 had failed and Let's take Figure 1 as an example where initially PE2 had failed and
PE1 had taken over. PE1 had taken over.
Based on RFC-7432: Based on [RFC7432]:
- Initial state: PE1 is in steady-state, PE2 is recovering - Initial state: PE1 is in steady-state, PE2 is recovering
- PE2 recovers at (absolute) time t=99 - PE2 recovers at (absolute) time t=99
- PE2 advertises RT-4 (sent at t=100) to partner PE1.
- PE2 advertises RT-4 (sent at t=100) to partner PE1
- PE2, it starts its 3sec peering timer as per RFC7432 - PE2, it starts its 3sec peering timer as per RFC7432
- PE1 carves immediately on RT-4 reception. PE2 carves at time t=103.
With following procedure, there is a high chance to generate a - PE1 carves immediately on RT-4 reception, i.e. t=100 + minimal BGP
traffic black hole or traffic loop. The peering timer value has a propagation delay
direct effect of this behavior. A short peering timer may generate
loop whereas a long peering timer provide a prolong blackout.
Based on the SCT approach: - PE2 carves at time t=103
With above procedure, and based on the [RFC7432] aim of favouring
traffic black hole over duplicate traffic, traffic black hole will
occur as part of each PE recovery sequence. The peering timer value
has a direct effect on the duration of the prolonged blackholing. A
short (esp. zero) peering timer may, however, result in duplicate
traffic or traffic loops.
Based on the Service Carving Time (SCT) approach:
- Initial state: PE1 is in steady-state, PE2 is recovering - Initial state: PE1 is in steady-state, PE2 is recovering
- PE2 recovers at (absolute) time t=99 - PE2 recovers at (absolute) time t=99
- PE2 advertises RT-4 (sent at t=100) with target SCT value t=103 to - PE2 advertises RT-4 (sent at t=100) with target SCT value t=103 to
partner PE1 partner PE1
- PE2 starts its 3sec peering timer as per RFC7432
- PE2 starts its 3 second peering timer as per [RFC7432]
- Both PE1 and PE2 carve at (absolute) time t=103; in fact, PE1 - Both PE1 and PE2 carve at (absolute) time t=103; in fact, PE1
should carve slightly before PE2 (skew). should carve slightly before PE2 (skew).
Using SCT approach, the effect of the peering timer is gone. Also, Using SCT approach, the negative effect of the peering timer is
the BGP RT-4 transmission delay (from PE2 to PE1) becomes a no-op. mitigated. Also, the BGP RT-4 transmission delay (from PE2 to PE1)
becomes a no-op.
4 Acknowledgement Authors would like to acknowledge helpful comments 4.5. Backwards Compatibility
and contributions of Satya Mohanty and Luc Andre Burdet.
5 Security Considerations Per redundancy group, for the DF election procedures to be globally
convergent and unanimous, it is necessary that all the participating
PEs agree on the DF Election algorithm to be used. It is, however,
possible that some PEs continue to use the existing modulus based DF
election and do not rely on the new SCT BGP extended community. PEs
running a baseline DF election mechanism shall simply discard
unrecognized new SCT BGP extended community.
A PE can indicate its willingness to support clock-synched carving by
signaling the new 'T' DF Election Capability as well as including the
new Service Carving Time BGP extended community along with the
Ethernet-Segment Route (Type-4).
5. Interoperability
If some PEs in the redundancy group signal both Handshake and Time
Synchronization capabilities (both H & T set to 1), then Time
Synchronization capability SHALL be chosen over Handshake capability
with the HRW (or Preference-based) DF election algorithm.
If some PEs in the redundancy group signal Time Synchronization (T=1)
but not Handshaking (H=0); whereas, some other PEs in the same
redundancy group signal Handshaking (H=1) but not Time
Synchronization (T=0), then the PEs that have handshaking ability,
SHALL perform DF Election using signaled or default DF-Type with
handshaking among themselves and the PEs that Time Synchronization
capability SHALL perform DF Election using signaled or default DF-
Type with time synchronization among themselves.
If some PEs in the redundancy group don't signal either Time
Synchronization or Handshaking capabilities, then these PEs SHALL
perform DF Election (Modulo, HRW or Preference-based) with default
Peering timer based mechanism defined in [RFC7432].
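The three interoperability rules above can be sketched as a per-PE selection followed by a grouping step. This is one possible reading, not a normative procedure: the mechanism labels and function names are hypothetical, and the sketch assumes each PE is classified only by the H and T flags it signals.

```python
from typing import Dict, List, Tuple


def recovery_mechanism(h: bool, t: bool) -> str:
    """Pick the recovery mechanism for one PE from its signaled flags.
    Time Synchronization is preferred when both H and T are set."""
    if t:
        return "time-sync"
    if h:
        return "handshake"
    return "peering-timer"  # default timer-based mechanism


def group_by_mechanism(pes: Dict[str, Tuple[bool, bool]]) -> Dict[str, List[str]]:
    """Group PEs of one redundancy group by the mechanism they would run
    among themselves; `pes` maps a PE name to its (H, T) flags."""
    groups: Dict[str, List[str]] = {}
    for name in sorted(pes):
        h, t = pes[name]
        groups.setdefault(recovery_mechanism(h, t), []).append(name)
    return groups
```

For a group where PE1 signals neither flag and PE2 and PE3 signal only H, this yields PE1 on the peering timer and PE2 and PE3 handshaking with each other.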
6. Security Considerations
The mechanisms in this document use EVPN control plane as defined in The mechanisms in this document use EVPN control plane as defined in
[RFC7432]. Security considerations described in [RFC7432] are equally [RFC7432]. Security considerations described in [RFC7432] are
applicable. This document uses MPLS and IP-based tunnel technologies equally applicable. This document uses MPLS and IP-based tunnel
to support data plane transport. Security considerations described in technologies to support data plane transport. Security
[R7432] and in [ietf-evpn-overlay] are equally applicable. considerations described in [RFC7432] and in [RFC8365] are equally
applicable.
6 IANA Considerations 7. IANA Considerations
Allocation of Extended Community Type and Sub-Type for EVPN. This document solicits the allocation of the following sub-type in
the "EVPN Route Types" registry setup by [RFC7432]:
7 References TBD1 DF Election Handshake Request This document
TBD2 DF Election Handshake Response This document
7.1 Normative References This document solicits the allocation of the following sub-type in
the "EVPN Extended Community Sub-Types" registry setup by [RFC7153]:
[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate TBD3 Service Carving Timestamp This document
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC7432] Sajassi et al., "BGP MPLS Based Ethernet VPN", February, This document solicits the allocation of the following values in the
2015. "DF Election Capabilities" registry setup by [RFC8584]:
[DF-FRAMEWORK] Rabadan, Mohanty et al., "Framework for EVPN Bit Name Reference
Designated Forwarder Election Extensibility", draft-ietf- ---- ---------------- -------------
bess-evpn-df-election-framework-00, work in progress, 2 Handshake This document
March 5, 2018. 3 Time Synchronization This document
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP
Extended Communities", RFC 7153, DOI 10.17487/RFC7153,
March 2014, <https://www.rfc-editor.org/info/rfc7153>.
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
2015, <https://www.rfc-editor.org/info/rfc7432>.
[RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
Uttaro, J., and W. Henderickx, "A Network Virtualization
Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
DOI 10.17487/RFC8365, March 2018,
<https://www.rfc-editor.org/info/rfc8365>.
[RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
VPN Designated Forwarder Election Extensibility",
RFC 8584, DOI 10.17487/RFC8584, April 2019,
<https://www.rfc-editor.org/info/rfc8584>.
8.2. Informative References
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26,
RFC 8126, DOI 10.17487/RFC8126, June 2017,
<https://www.rfc-editor.org/info/rfc8126>.
Appendix A. Contributors
In addition to the authors listed on the front page, the following
co-authors have also contributed substantially to this document:
Luc Andre Burdet
Cisco
Email: lburdet@cisco.com
Appendix B. Acknowledgements
Authors would like to acknowledge helpful comments and contributions
of Satya Mohanty and Bharath Vasudevan.
7.2 Informative References
Authors' Addresses Authors' Addresses
Ali Sajassi Ali Sajassi (editor)
Cisco Cisco
Email: sajassi@cisco.com Email: sajassi@cisco.com
Gaurav Badoni Gaurav Badoni
Cisco Cisco
Email: gbadoni@cisco.com
Patrice Brissette Email: gbadoni@cisco.com
Cisco
Email: pbrisset@cisco.com
Dhananjaya Rao Dhananjaya Rao
Cisco Cisco
Email: dhrao@cisco.com Email: dhrao@cisco.com
Patrice Brissette
Cisco
Email: pbrisset@cisco.com
John Drake John Drake
Juniper Juniper
Email: jdrake@juniper.net Email: jdrake@juniper.net
Jorge Rabadan Jorge Rabadan
Juniper Nokia
Email: jorge.rabadan@nokia.com Email: jorge.rabadan@nokia.com