Internet Engineering Task Force                             Nabil Bitar
Internet Draft                                                  Verizon
Intended status: Informational
Expires: Oct 2014                                         Marc Lasserre
                                                           Florin Balus
                                                         Alcatel-Lucent

                                                           Thomas Morin
                                                  France Telecom Orange

                                                            Lizhong Jin
                                                      Bhumip Khasnabish
                                                                    ZTE

                                                         April 15, 2014

                      NVO3 Data Plane Requirements
             draft-ietf-nvo3-dataplane-requirements-03.txt
Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on Oct 15, 2014.
Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Abstract

Several IETF drafts relate to the use of overlay networks to support large-scale virtual data centers. This draft provides a list of data plane requirements for Network Virtualization over L3 (NVO3) that have to be addressed in solutions documents.
Table of Contents

1. Introduction
   1.1. Conventions used in this document
   1.2. General terminology
2. Data Path Overview
3. Data Plane Requirements
   3.1. Virtual Access Points (VAPs)
   3.2. Virtual Network Instance (VNI)
      3.2.1. L2 VNI
      3.2.2. L3 VNI
   3.3. Overlay Module
      3.3.1. NVO3 overlay header
         3.3.1.1. Virtual Network Context Identification
         3.3.1.2. Quality of Service (QoS) identifier
      3.3.2. Tunneling function
         3.3.2.1. LAG and ECMP
         3.3.2.2. DiffServ and ECN marking
         3.3.2.3. Handling of BUM traffic
   3.4. External NVO3 connectivity
      3.4.1. Gateway (GW) Types
         3.4.1.1. VPN and Internet GWs
         3.4.1.2. Inter-DC GW
         3.4.1.3. Intra-DC gateways
      3.4.2. Path optimality between NVEs and Gateways
         3.4.2.1. Load-balancing
         3.4.2.2. Triangular Routing Issues
   3.5. Path MTU
   3.6. Hierarchical NVE dataplane requirements
   3.7. Other considerations
      3.7.1. Data Plane Optimizations
      3.7.2. NVE location trade-offs
4. Security Considerations
5. IANA Considerations
6. References
   6.1. Normative References
   6.2. Informative References
7. Acknowledgments
1. Introduction

1.1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].

In this document, these words will appear with that interpretation only when in ALL CAPS. Lower case uses of these words are not to be interpreted as carrying RFC-2119 significance.
2. Data Path Overview
   +------------------------+         +------------------------+
   |  +----------+-------+  |         |  +---------+--------+  |
   |  |  Overlay Module  |  |         |  |  Overlay Module  |  |
   |  +---------+--------+  |         |  +---------+--------+  |
   |            |VN context |         | VN context|            |
   |            |           |         |           |            |
   |   +--------+--------+  |         |  +--------+--------+   |
   |   | |VNI| ...  |VNI| | |         |  | |VNI| ...  |VNI| |  |
   NVE1|  +-+----------+-+  |         |  +-+----------+--+  |NVE2
   |      |          |      |         |    |          |        |
   |      |   VAPs   |      |         |    |   VAPs   |        |
   +------+----------+------+         +----+----------+--------+
          |          |                     |          |
   -------+----------+---------------------+----------+--------
          |          |        Tenant       |          |
          |          |      Service IF     |          |
         Tenant Systems                   Tenant Systems

           Figure 1 : Generic reference model for NV Edge
When a frame is received by an ingress NVE from a Tenant System over a local VAP, it needs to be parsed in order to identify which virtual network instance it belongs to. The parsing function can examine various fields in the data frame (e.g., VLAN ID) and/or the associated interface/port the frame came from.

Once the corresponding VNI is identified, a lookup is performed to determine where the frame needs to be sent. This lookup can be based on any combination of various fields in the data frame (e.g., destination MAC address and/or destination IP address). Note that additional criteria such as Ethernet 802.1p priorities and/or DSCP markings might be used to select an appropriate tunnel or local VAP destination.
Lookup tables can be populated using different techniques: data plane learning, management plane configuration, or a distributed control plane. Management and control planes are not in the scope of this document. The data plane based solution is described in this document as it has implications on the data plane processing function.
The result of this lookup yields the corresponding information needed to build the overlay header, as described in section 3.3.
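As an informative illustration only, the following Python sketch walks through this ingress path with hypothetical table layouts; none of the names, ports, or addresses below are defined by NVO3:

   # Illustrative ingress-NVE lookup path: VAP -> VNI -> forwarding entry.
   # All table layouts, names and addresses are hypothetical.

   vap_table = {("port1", 100): "vap-a",   # (local port, VLAN ID) -> VAP
                ("port1", 200): "vap-b"}

   vap_to_vni = {"vap-a": 7001, "vap-b": 7002}   # bound at instantiation

   fib = {7001: {"00:aa:bb:cc:dd:01": ("local", "vap-a"),
                 "00:aa:bb:cc:dd:02": ("tunnel", "192.0.2.10")}}

   def ingress_lookup(port, vlan, dst_mac):
       vap = vap_table[(port, vlan)]       # which VAP did the frame arrive on?
       vni = vap_to_vni[vap]               # which virtual network instance?
       kind, nexthop = fib[vni][dst_mac]   # local VAP or remote NVE tunnel
       return vni, kind, nexthop           # inputs for the overlay header

A real NVE would additionally fall back to flooding on a lookup miss, as described in section 3.2.1.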
At the egress NVE, the NVO3 header is stripped, the VN context is used to identify the VNI, and a lookup in that VNI's forwarding table determines the appropriate recipient, usually a local VAP.
3. Data Plane Requirements

3.1. Virtual Access Points (VAPs)

The NVE forwarding plane MUST support VAP identification through the following mechanisms:
- Using the local interface on which the frames are received, where the local interface may be an internal, virtual port in a virtual switch or a physical port on a ToR switch

- Using the local interface and some fields in the frame header, e.g. one or multiple VLANs or the source MAC
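A minimal informative sketch of these two identification modes, with purely illustrative names:

   # Hypothetical VAP classification covering both required mechanisms.
   def classify_vap(port, vlan=None):
       # Mode 1: the local interface alone identifies the VAP
       # (e.g. an internal virtual port in a virtual switch).
       port_based = {"vport3": "vap-x"}
       if port in port_based:
           return port_based[port]
       # Mode 2: the interface plus frame-header fields (here a VLAN tag).
       port_vlan_based = {("port1", 100): "vap-a"}
       return port_vlan_based.get((port, vlan))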
3.2. Virtual Network Instance (VNI)

VAPs are associated with a specific VNI at service instantiation time.
A VNI identifies a per-tenant private context, i.e. per-tenant policies and a FIB table to allow overlapping address space between tenants.

3.2.1. L2 VNI

An L2 VNI emulates an Ethernet bridge per tenant (i.e. an E-LAN service delivered over a set of NVO3 tunnels). The emulated bridge could be 802.1Q enabled (allowing use of VLAN tags as a VAP). An L2 VNI provides a per-tenant virtual switching instance with MAC addressing isolation and L3 tunneling. Loop avoidance capability MUST be provided.
Forwarding table entries provide mapping information between tenant system MAC addresses and VAPs on directly connected VNIs and L3 tunnel destination addresses over the overlay. Such entries could be populated by a control or management plane, or via the data plane.

Unless a control plane is used to disseminate address mappings, data plane learning MUST be used to populate forwarding tables. As frames arrive from VAPs or from overlay tunnels, standard MAC learning procedures are used: the tenant system source MAC address is learned against the VAP or the NVO3 tunneling encapsulation source address on which the frame arrived. Data plane learning implies that unknown unicast traffic will be flooded (i.e. broadcast).
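An informative sketch of this learning behavior (all structures hypothetical):

   # Data plane MAC learning at an NVE: the source MAC is learned against
   # whatever the frame arrived on -- a local VAP, or the source address
   # of the NVO3 tunnel encapsulation.
   def learn(fib_vni, src_mac, arrival):
       fib_vni[src_mac] = arrival    # e.g. ("local", "vap-a") or
                                     #      ("tunnel", "192.0.2.10")

   def lookup(fib_vni, dst_mac, flood_list):
       entry = fib_vni.get(dst_mac)
       if entry is None:
           return flood_list         # unknown unicast: flood the VNI
       return [entry]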
When flooding is required to deliver unknown unicast, broadcast or multicast traffic, the NVE MUST support either ingress replication or multicast.
When using underlay multicast, the NVE MUST have one or more underlay multicast trees that can be used by local VNIs for flooding to NVEs belonging to the same VN. For each VNI, there is at least one underlay flooding tree used for Broadcast, Unknown Unicast and Multicast forwarding. This tree MAY be shared across VNIs. The flooding tree is equivalent to a multicast (*,G) construct where all the NVEs for which the corresponding VNI is instantiated are members.
When tenant multicast is supported, it SHOULD also be possible to select whether the NVE provides optimized underlay multicast trees inside the VNI for individual tenant multicast groups or whether the default VNI flooding tree is used. If the former option is selected, the VNI SHOULD be able to snoop IGMP/MLD messages in order to efficiently join/prune Tenant Systems from multicast trees.
3.2.2. L3 VNI

L3 VNIs MUST provide virtualized IP routing and forwarding. L3 VNIs MUST support a per-tenant forwarding instance with IP addressing isolation and L3 tunneling for interconnecting instances of the same VNI on NVEs.
In the case of an L3 VNI, the inner TTL field MUST be decremented by (at least) 1 as if the NVO3 egress NVE was one (or more) hop(s) away. The TTL field in the outer IP header MUST be set to a value appropriate for delivery of the encapsulated frame to the tunnel exit point. Thus, the default behavior MUST be the TTL pipe model, where the overlay network looks like one hop to the sending NVE. Configuration of a "uniform" TTL model, where the outer tunnel TTL is set equal to the inner TTL on the ingress NVE and the inner TTL is set to the outer TTL value on egress, MAY be supported. [RFC2983] provides additional details on the uniform and pipe models.
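The two models can be summarized with the following informative sketch (the underlay TTL default of 64, and the exact point at which the uniform model copies the TTL, are assumptions rather than requirements):

   UNDERLAY_TTL = 64                      # assumed operator default

   def ingress_ttl(inner_ttl, mode="pipe"):
       inner_ttl -= 1                     # MUST: at least one hop consumed
       if mode == "pipe":                 # default: overlay looks like 1 hop
           return inner_ttl, UNDERLAY_TTL # (inner, outer)
       return inner_ttl, inner_ttl        # "uniform": outer copies inner

   def egress_ttl(inner_ttl, outer_ttl, mode="pipe"):
       if mode == "uniform":
           return outer_ttl               # inner inherits underlay decrements
       return inner_ttl                   # pipe: inner unchanged at egress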
L2 and L3 VNIs can be deployed in isolation or in combination to optimize traffic flows per tenant across the overlay network. For example, an L2 VNI may be configured across a number of NVEs to offer L2 multi-point service connectivity while an L3 VNI can be co-located to offer local routing capabilities and gateway functionality. In addition, integrated routing and bridging per tenant MAY be supported on an NVE. An instantiation of such a service may be realized by interconnecting an L2 VNI as access to an L3 VNI on the NVE.
When underlay multicast is supported, it MAY be possible to select whether the NVE provides optimized underlay multicast trees inside the VNI for individual tenant multicast groups or whether a default underlay VNI multicasting tree, where all the NVEs of the corresponding VNI are members, is used.
3.3. Overlay Module

The overlay module performs a number of functions related to NVO3 header and tunnel processing.

The following figure shows a generic NVO3 encapsulated frame:

   +--------------------------+
   |       Tenant Frame       |
   +--------------------------+
   |    NVO3 Overlay Header   |
   +--------------------------+
   |   Outer Underlay header  |
   +--------------------------+
   |  Outer Link layer header |
   +--------------------------+

   Figure 2 : NVO3 encapsulated frame
where

. Tenant frame: Ethernet or IP based upon the VNI type

. NVO3 overlay header: Header containing VNI context information and other optional fields that can be used for processing this packet.

. Outer underlay header: Can be either IP or MPLS

. Outer link layer header: Header specific to the physical transmission link used
3.3.1. NVO3 overlay header

3.3.1.1. Virtual Network Context Identification

The NVO3 overlay header MUST carry a field identifying the virtual network context.
The egress NVE uses this field to determine the appropriate virtual network context in which to process the packet. This field MAY be an explicit, unique (to the administrative domain) virtual network identifier (VNID) or MAY express the necessary context information in other ways (e.g. a locally significant identifier).

In the case of a global identifier, this field MUST be large enough to scale to hundreds of thousands of virtual networks. Note that there is typically no such constraint when using a local identifier.
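For intuition on sizing: a 24-bit identifier, the width used by several proposed NVO3 encapsulations, provides 2^24 (about 16.7 million) distinct virtual networks, comfortably above the hundreds-of-thousands target. The following informative sketch packs such a field into a purely hypothetical header layout:

   import struct

   VNID_BITS = 24                        # assumed width; 2**24 = 16,777,216

   def pack_vn_context(vnid):
       assert 0 <= vnid < (1 << VNID_BITS)
       # Hypothetical layout: 24-bit VNID followed by 8 reserved bits.
       return struct.pack("!I", vnid << 8)

   def unpack_vn_context(header):
       (word,) = struct.unpack("!I", header[:4])
       return word >> 8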
3.3.1.2. Quality of Service (QoS) identifier

Traffic flows originating from different applications could rely on differentiated forwarding treatment to meet end-to-end availability and performance objectives. Such applications may span across one or more overlay networks. To enable such treatment, support for multiple Classes of Service (CoS) across or between overlay networks MAY be required.
To effectively enforce CoS across or between overlay networks without repeated Deep Packet Inspection (DPI), NVEs MAY be able to map CoS markings between networking layers, e.g., Tenant Systems, Overlays, and/or Underlay, enabling each networking layer to independently enforce its own CoS policies. For example:
- TS (e.g. VM) CoS

  o Tenant CoS policies MAY be defined by Tenant administrators

  o QoS fields (e.g. IP DSCP and/or Ethernet 802.1p) in the tenant frame are used to indicate application level CoS requirements
- NVE CoS: Support for NVE Service CoS MAY be provided through a QoS field inside the NVO3 overlay header

  o The NVE MAY classify packets based on Tenant CoS markings or other mechanisms (e.g. DPI) to identify the proper service CoS to be applied across the overlay network

  o NVE service CoS levels are normalized to a common set (for example, 8 levels) across multiple tenants; the NVE uses per-tenant policies to map Tenant CoS to the normalized service CoS fields in the NVO3 header
- Underlay CoS

  o The underlay/core network MAY use a different CoS set (for example, 4 levels) than the NVE CoS, as the core devices MAY have different QoS capabilities compared with NVEs.

  o The Underlay CoS MAY also change as the NVO3 tunnels pass between different domains.
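An informative sketch of such per-layer mapping; every mapping value below is hypothetical policy, not a recommendation:

   # Tenant 802.1p priority -> normalized service CoS (8 levels), per tenant.
   tenant_policy = {"tenant-a": {0: 0, 1: 0, 5: 4, 7: 6}}

   # Normalized service CoS -> underlay DSCP, collapsed to 4 treatments
   # (0=BE, 10=AF11, 18=AF21, 46=EF are standard DSCP codepoints).
   service_to_dscp = {0: 0, 1: 0, 2: 10, 3: 10, 4: 18, 5: 18, 6: 46, 7: 46}

   def mark(tenant, pcp):
       service_cos = tenant_policy[tenant].get(pcp, 0)  # per-tenant normalize
       return service_cos, service_to_dscp[service_cos] # NVO3 hdr, outer DSCP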
3.3.2. Tunneling function

This section describes the underlay tunneling requirements. From an encapsulation perspective, IPv4 or IPv6 MUST be supported, both IPv4 and IPv6 SHOULD be supported, and MPLS MAY be supported.
3.3.2.1. LAG and ECMP

For performance reasons, multipath over LAG and ECMP paths MAY be supported.
LAG (Link Aggregation Group) [IEEE 802.1AX-2008] and ECMP (Equal Cost Multi Path) are commonly used techniques to perform load-balancing of microflows over a set of parallel links, either at Layer-2 (LAG) or Layer-3 (ECMP). Existing deployed hardware implementations of LAG and ECMP use a hash of various fields in the encapsulation (outermost) header(s) (e.g. source and destination MAC addresses for non-IP traffic; source and destination IP addresses, L4 protocol, and L4 source and destination port numbers for IP traffic). Furthermore, hardware deployed for the underlay network(s) will most often be unaware of the carried, innermost L2 frames or L3 packets transmitted by the TS.
Thus, in order to perform fine-grained load-balancing over LAG and ECMP paths in the underlying network, the encapsulation needs to present sufficient entropy to exercise all paths through several LAG/ECMP hops.
The entropy information can be inferred from the NVO3 overlay header or the underlay header. If the overlay protocol does not support the necessary entropy information, or the switches/routers in the underlay do not support parsing of the additional entropy information in the overlay header, underlay switches and routers should be programmable, i.e. able to select the appropriate fields in the underlay header for hash calculation based on the type of overlay header.
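A common trick, sketched below for illustration, is to fold a hash of the inner flow into an outer header field that deployed LAG/ECMP hashes already inspect, such as a UDP source port; whether a given NVO3 encapsulation actually carries a UDP header is encapsulation-specific:

   import zlib

   def entropy_source_port(src_ip, dst_ip, proto, sport, dport):
       # Any stable per-flow hash works; CRC32 keeps this sketch
       # dependency-free.
       flow = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
       return 49152 + (zlib.crc32(flow) % 16384)   # ephemeral port range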
3.3.2.3. Handling of BUM traffic

NVO3 data plane support for either ingress replication or point-to-multipoint tunnels is required to send traffic destined to multiple locations on a per-VNI basis (e.g. L2/L3 multicast traffic, L2 broadcast and unknown unicast traffic). It is possible for both methods to be used simultaneously.
There is a bandwidth vs. state trade-off between the two approaches. User-configurable settings MUST be provided to select which method(s) get used based upon the amount of replication required (i.e. the number of hosts per group), the amount of multicast state to maintain, the duration of multicast flows and the scalability of multicast protocols.
When ingress replication is used, NVEs MUST maintain, for each VNI, the related tunnel endpoints to which they need to replicate the frame.
For point-to-multipoint tunnels, the bandwidth efficiency is increased at the cost of more state in the Core nodes. The ability to auto-discover or pre-provision the mapping between VNI multicast trees and related tunnel endpoints at the NVE and/or throughout the core SHOULD be supported.
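The per-VNI choice between the two methods can be pictured with the following informative sketch (all structures, addresses and groups hypothetical):

   bum_mode    = {7001: "ingress-replication", 7002: "p2mp"}
   remote_nves = {7001: ["192.0.2.10", "192.0.2.11", "192.0.2.12"]}
   flood_tree  = {7002: "239.1.1.2"}   # (*,G) group shared by VNI members

   def bum_destinations(vni):
       if bum_mode[vni] == "ingress-replication":
           # N unicast copies: more ingress bandwidth, no core multicast
           # state.
           return [("unicast", nve) for nve in remote_nves[vni]]
       # One copy into the tree: the core holds multicast state instead.
       return [("multicast", flood_tree[vni])]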
3.4. External NVO3 connectivity

It is important that NVO3 services interoperate with current VPN and Internet services. This may happen inside one DC during a migration phase or as NVO3 services are delivered to the outside world via Internet or VPN gateways (GWs).
Moreover, the compute and storage services delivered by an NVO3 domain may span multiple DCs, requiring Inter-DC connectivity. From a DC perspective, a set of GW devices are required in all of these cases, albeit with different functionalities influenced by the overlay type across the WAN, the service type, and the DC network technologies used at each DC site.
A GW handling the connectivity between NVO3 and external domains represents a single point of failure that may affect multiple tenant services. Redundancy between NVO3 and external domains MUST be supported.
3.4.1. Gateway (GW) Types

3.4.1.1. VPN and Internet GWs
Tenant sites may already be interconnected using one of the existing VPN services and technologies (VPLS or IP VPN). If a new NVO3 encapsulation is used, a VPN GW is required to forward traffic between the NVO3 and VPN domains. Internet-connected Tenants require translation from the NVO3 encapsulation to IP in the NVO3 gateway. The translation function SHOULD minimize provisioning touches.
3.4.1.2. Inter-DC GW

Inter-DC connectivity MAY be required to provide support for features like disaster prevention or compute load re-distribution. This MAY be provided via a set of gateways interconnected through a WAN. This type of connectivity MAY be provided either through extension of the NVO3 tunneling domain or via VPN GWs.
3.4.1.3. Intra-DC gateways

Even within one DC there may be End Devices that do not support NVO3 encapsulation, for example bare metal servers, hardware appliances and storage. A gateway device, e.g. a ToR switch, is required to translate the NVO3 encapsulation to Ethernet VLAN encapsulation.
3.4.2. Path optimality between NVEs and Gateways

Within an NVO3 overlay, a default assumption is that NVO3 traffic will be equally load-balanced across the underlying network consisting of LAG and/or ECMP paths. This assumption is valid only as long as: a) all traffic is load-balanced equally among each of the component-links and paths; and, b) each of the component-links/paths is of identical capacity. During the course of normal operation of the underlying network, it is possible that one, or more, of the component-links/paths of a LAG may be taken out of service in order to be repaired, e.g. due to hardware failure of cabling, optics, etc. In such cases, the administrator may configure the underlying network such that an entire LAG bundle in the underlying network will be reported as operationally down if there is a failure of any single component-link member of the LAG bundle (e.g. an N = M configuration of the LAG bundle); thus, they know that traffic will be carried sufficiently by alternate, available (potentially ECMP) paths in the underlying network. This is likely an adequate assumption for Intra-DC traffic, where presumably the cost of additional protection capacity along alternate paths is not prohibitive. In this case, there are no additional requirements on NVO3 solutions to accommodate this type of underlying network configuration and administration.
There is a similar case with ECMP, used Intra-DC, where failure of a single component-path of an ECMP group would result in traffic shifting onto the surviving members of the ECMP group. Unfortunately, there are no automatic recovery methods in IP routing protocols to detect a simultaneous failure of more than one component-path in an ECMP group, operationally disable the entire ECMP group and allow traffic to shift onto alternative paths. This problem is attributable to the underlying network and, thus, out of scope of any NVO3 solutions.
3.4.2.1. Load-balancing

When using active-active load-balancing across physically separate NVE GWs (e.g. two separate chassis), an NVO3 solution SHOULD support forwarding tables that can simultaneously map a single egress NVE to more than one NVO3 tunnel. The granularity of such mappings, in both active-backup and active-active, MUST be specific to each tenant.
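An informative sketch of such a per-tenant forwarding table; the tenant names and addresses are illustrative:

   # (tenant, egress NVE) -> active tunnel list; one tenant may run the
   # same GW pair active-active while another runs it active-backup.
   egress_tunnels = {("tenant-a", "gw-pair-1"): ["198.51.100.1",
                                                 "198.51.100.2"],
                     ("tenant-b", "gw-pair-1"): ["198.51.100.1"]}

   def pick_tunnel(tenant, egress, flow_hash):
       tunnels = egress_tunnels[(tenant, egress)]
       return tunnels[flow_hash % len(tunnels)]  # spread flows over members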
3.4.2.2. Triangular Routing Issues
L2/ELAN over NVO3 service may span multiple racks distributed across different DC regions. Multiple ELANs belonging to one tenant may be interconnected or connected to the outside world through multiple Router/VRF gateways distributed throughout the DC regions. In this scenario, without aid from an NVO3 or other type of solution, traffic from an ingress NVE destined to external gateways will take a non-optimal path that will result in higher latency and costs (since it is using more expensive resources of a WAN). In the case of traffic from an IP/MPLS network destined toward the entrance to an NVO3 overlay, well-known IP routing techniques MAY be used to optimize traffic into the NVO3 overlay (at the expense of additional routes in the IP/MPLS network). In summary, these issues are well known as triangular routing (a.k.a. traffic tromboning).
Procedures for gateway selection to avoid triangular routing issues SHOULD be provided.

The details of such procedures are, most likely, part of the NVO3 Management and/or Control Plane requirements and, thus, out of scope of this document. However, a key requirement on the dataplane of any NVO3 solution to avoid triangular routing is stated above, in Section 3.4.2, with respect to active-active load-balancing. More specifically, an NVO3 solution SHOULD support forwarding tables that can simultaneously map a single egress NVE to more than one NVO3 tunnel.

3.5. Path MTU

The tunnel overlay header increases the size of frames sent over the underlay and can cause the path MTU to the egress tunnel endpoint to be exceeded. To address this, one of the following options MUST be supported:

o Classical ICMP-based MTU Path Discovery or Extended MTU Path Discovery techniques such as defined in [RFC4821]
o Segmentation and reassembly support from the overlay layer operations without relying on the Tenant Systems to know about the end-to-end MTU

o The underlay network MAY be designed in such a way that the MTU can accommodate the extra tunnel overhead.
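As an informative illustration of the trade-off behind these options, the check an ingress NVE effectively performs is sketched below; the 50-byte overhead is an assumption, as the real value is encapsulation-specific:

   OVERLAY_OVERHEAD = 50    # outer L2 + outer IP/UDP + NVO3 header (assumed)
   UNDERLAY_MTU     = 1500  # provisioned or discovered underlay path MTU

   def fits(inner_frame_len):
       # On failure: probe per [RFC4821], segment/reassemble at the overlay
       # layer, or provision a larger underlay MTU (e.g. jumbo frames).
       return inner_frame_len + OVERLAY_OVERHEAD <= UNDERLAY_MTU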
3.6. Hierarchical NVE dataplane requirements
It might be desirable to support the concept of hierarchical NVEs, such as spoke NVEs and hub NVEs, in order to address possible NVE performance limitations and service connectivity optimizations. For instance, spoke NVE functionality may be used when processing capabilities are limited. In this case, a hub NVE MUST provide additional data processing capabilities such as packet replication.
3.7. Other considerations

3.7.1. Data Plane Optimizations
Data plane forwarding and encapsulation choices SHOULD consider the limitations of possible NVE implementations, specifically software-based implementations (e.g. servers running virtual switches).
NVEs SHOULD provide efficient processing of traffic. For instance, packet alignment, the use of offsets to minimize header parsing, and padding techniques SHOULD be considered when designing NVO3 encapsulation types.
The NVO3 encapsulation/decapsulation processing in software-based NVEs SHOULD make use of hardware assists provided by NICs in order to speed up packet processing.
3.7.2. NVE location trade-offs
In the case of DC traffic, traffic originated from a VM is native Ethernet traffic. This traffic can be switched by a local VM switch or ToR switch and then by a DC gateway. The NVE function can be embedded within any of these elements.

The NVE function can be supported in various DC network elements such as a VM, VM switch, ToR switch or DC GW.
The following criteria SHOULD be considered when deciding where the NVE function is located.

6. References

6.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997

6.2. Informative References

[IEEE 802.1AX-2008] IEEE, "IEEE Standard for Local and Metropolitan Area Networks - Link Aggregation", IEEE Std 802.1AX-2008, November 2008

[RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 2983, October 2000

[RFC4821] Mathis, M. and Heffner, J., "Packetization Layer Path MTU Discovery", RFC 4821, March 2007
[RFC6391] Bryant, S. et al., "Flow-Aware Transport of Pseudowires over an MPLS Packet Switched Network", RFC 6391, November 2011
7. Acknowledgments

In addition to the authors, the following people have contributed to this document:

Shane Amante, David Black, Dimitrios Stiliadis, Rotem Salomonovitch, Larry Kreeger, Eric Gray and Erik Nordmark.
This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses

Nabil Bitar
Verizon
40 Sylvan Road
Waltham, MA 02145
Email: nabil.bitar@verizon.com