Internet Engineering Task Force                              Nabil Bitar
Internet Draft                                                   Verizon
Intended status: Informational
Expires: May 2014                                          Marc Lasserre
                                                            Florin Balus
                                                          Alcatel-Lucent

                                                            Thomas Morin
                                                   France Telecom Orange

                                                             Lizhong Jin
                                                       Bhumip Khasnabish
                                                                     ZTE

                                                       November 12, 2013

                      NVO3 Data Plane Requirements
             draft-ietf-nvo3-dataplane-requirements-02.txt
Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 12, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
skipping to change at page 2, line 24
Abstract

Several IETF drafts relate to the use of overlay networks to support
large scale virtual data centers. This draft provides a list of data
plane requirements for Network Virtualization over L3 (NVO3) that
have to be addressed in solutions documents.

Table of Contents
1. Introduction..................................................3
1.1. Conventions used in this document........................3
1.2. General terminology......................................3
2. Data Path Overview............................................4
3. Data Plane Requirements.......................................5
3.1. Virtual Access Points (VAPs).............................5
3.2. Virtual Network Instance (VNI)...........................5
3.2.1. L2 VNI.................................................5
3.2.2. L3 VNI.................................................6
3.3. Overlay Module...........................................7
3.3.1. NVO3 overlay header....................................8
3.3.1.1. Virtual Network Context Identification...............8
3.3.1.2. Service QoS identifier...............................8
3.3.2. Tunneling function.....................................9
3.3.2.1. LAG and ECMP........................................10
3.3.2.2. DiffServ and ECN marking............................10
3.3.2.3. Handling of BUM traffic.............................11
3.4. External NVO3 connectivity..............................11
3.4.1. GW Types..............................................12
3.4.1.1. VPN and Internet GWs................................12
3.4.1.2. Inter-DC GW.........................................12
3.4.1.3. Intra-DC gateways...................................12
3.4.2. Path optimality between NVEs and Gateways.............12
3.4.2.1. Load-balancing......................................14
3.4.2.2. Triangular Routing Issues (a.k.a. Traffic Tromboning)14
3.5. Path MTU................................................14
3.6. Hierarchical NVE........................................15
3.7. NVE Multi-Homing Requirements...........................15
3.8. Other considerations....................................16
3.8.1. Data Plane Optimizations..............................16
3.8.2. NVE location trade-offs...............................16
4. Security Considerations......................................17
5. IANA Considerations..........................................17
6. References...................................................17
6.1. Normative References....................................17
6.2. Informative References..................................17
7. Acknowledgments..............................................18
1. Introduction

1.1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
In this document, these words will appear with that interpretation
skipping to change at page 6, line 4
tenants.

There are different VNI types differentiated by the virtual network
service they provide to Tenant Systems. Network virtualization can
be provided by L2 and/or L3 VNIs.
3.2.1. L2 VNI

An L2 VNI MUST provide an emulated Ethernet multipoint service as if
Tenant Systems are interconnected by a bridge (but instead by using
a set of NVO3 tunnels). The emulated bridge could be 802.1Q enabled
(allowing use of VLAN tags as a VAP). An L2 VNI provides a per-tenant
virtual switching instance with MAC addressing isolation and L3
tunneling. Loop avoidance capability MUST be provided.
Forwarding table entries provide mapping information between tenant
system MAC addresses and VAPs on directly connected VNIs and L3
tunnel destination addresses over the overlay. Such entries could be
populated by a control or management plane, or via data plane
learning.
By default, data plane learning MUST be used to populate forwarding
tables. As frames arrive from VAPs or from overlay tunnels, standard
MAC learning procedures are used: The tenant system source MAC
address is learned against the VAP or the NVO3 tunneling
encapsulation source address on which the frame arrived. This
implies that unknown unicast traffic will be flooded (i.e.
broadcast).
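The learning behavior above can be sketched as a small forwarding
routine. This is an illustrative model only, not part of any NVO3
specification; all class, port and address names are hypothetical.

```python
# Sketch of standard MAC learning at an NVE, per L2 VNI: a source MAC
# is learned against whichever "port" the frame arrived on -- either a
# local VAP or the source address of the NVO3 tunnel encapsulation.
class L2VniForwarder:
    def __init__(self, ports):
        self.fib = {}        # tenant MAC -> VAP name or remote NVE IP
        self.ports = ports   # all VAPs and tunnel endpoints of this VNI

    def receive(self, src_mac, dst_mac, in_port):
        self.fib[src_mac] = in_port          # data plane learning
        out = self.fib.get(dst_mac)
        if out is None:                      # unknown unicast: flood
            return [p for p in self.ports if p != in_port]
        return [out]

vni = L2VniForwarder(ports=["vap1", "vap2", "10.0.0.2"])
vni.receive("mac-A", "mac-B", "vap1")        # mac-B unknown: flooded
assert vni.receive("mac-B", "mac-A", "10.0.0.2") == ["vap1"]
```

Note that the second frame is forwarded (not flooded) because mac-A
was already learned against vap1 from the first frame.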
When flooding is required, either to deliver unknown unicast, or
broadcast or multicast traffic, the NVE MUST either support ingress
replication or multicast.

When using multicast, the NVE MUST have one or more multicast trees
that can be used by local VNIs for flooding to NVEs belonging to the
same VN. For each VNI, there is at least one flooding tree used for
Broadcast, Unknown Unicast and Multicast forwarding. This tree MAY
be shared across VNIs. The flooding tree is equivalent to a
multicast (*,G) construct where all the NVEs for which the
corresponding VNI is instantiated are members.
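The (*,G) model described above can be illustrated with a minimal
data structure. The group addresses, VNI names and NVE names below
are purely hypothetical, and the shared-versus-dedicated split is one
possible deployment choice, not a mandate of this document.

```python
# Illustrative (*,G) flooding-tree state: each VNI maps to a group
# whose members are all NVEs where that VNI is instantiated. A single
# group (hence a single tree) MAY be shared by several VNIs.
vni_to_group = {"vni-10": "239.1.1.1",   # vni-10 and vni-20 share
                "vni-20": "239.1.1.1",   # one default flooding tree
                "vni-30": "239.1.1.2"}   # vni-30 has a dedicated tree

group_members = {"239.1.1.1": {"nve1", "nve2", "nve3"},
                 "239.1.1.2": {"nve1", "nve4"}}

def flood_targets(vni, local_nve):
    # BUM traffic goes to every member NVE except the sender itself
    group = vni_to_group[vni]
    return sorted(group_members[group] - {local_nve})

assert flood_targets("vni-10", "nve1") == ["nve2", "nve3"]
```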
When tenant multicast is supported, it SHOULD also be possible to
select whether the NVE provides optimized multicast trees inside the
VNI for individual tenant multicast groups or whether the default
VNI flooding tree is used. If the former option is selected, the VNI
SHOULD be able to snoop IGMP/MLD messages in order to efficiently
join/prune Tenant Systems from multicast trees.
3.2.2. L3 VNI
skipping to change at page 7, line 21
L2 and L3 VNIs can be deployed in isolation or in combination to
optimize traffic flows per tenant across the overlay network. For
example, an L2 VNI may be configured across a number of NVEs to
offer L2 multi-point service connectivity while an L3 VNI can be co-
located to offer local routing capabilities and gateway
functionality. In addition, integrated routing and bridging per
tenant MAY be supported on an NVE. An instantiation of such a
service may be realized by interconnecting an L2 VNI as access to an
L3 VNI on the NVE.
When multicast is supported, it MAY be possible to select whether
the NVE provides optimized multicast trees inside the VNI for
individual tenant multicast groups or whether a default VNI
multicasting tree, where all the NVEs of the corresponding VNI are
members, is used.
3.3. Overlay Module

The overlay module performs a number of functions related to NVO3
header and tunnel processing.

The following figure shows a generic NVO3 encapsulated frame:

+--------------------------+
|       Tenant Frame       |
skipping to change at page 8, line 18
this packet.

. Outer underlay header: Can be either IP or MPLS

. Outer link layer header: Header specific to the physical
transmission link used

3.3.1. NVO3 overlay header
An NVO3 overlay header MUST be included after the underlay tunnel
header when forwarding tenant traffic.

Note that this information can be carried within existing protocol
headers (when overloading of specific fields is possible) or within
a separate header.
3.3.1.1. Virtual Network Context Identification
The overlay encapsulation header MUST contain a field which allows
the encapsulated frame to be delivered to the appropriate virtual
network endpoint by the egress NVE.

The egress NVE uses this field to determine the appropriate virtual
network context in which to process the packet. This field MAY be an
explicit, unique (to the administrative domain) virtual network
identifier (VNID) or MAY express the necessary context information
in other ways (e.g. a locally significant identifier).
In the case of a global identifier, this field MUST be large enough
to scale to 100's of thousands of virtual networks. Note that there
is typically no such constraint when using a local identifier.
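As a quick sanity check on that sizing requirement: several hundred
thousand virtual networks need roughly 19 bits of identifier space.
The 24-bit field width used below is an assumption borrowed from
common overlay encapsulations, not something this document mandates.

```python
# Back-of-the-envelope sizing of a global virtual network identifier.
import math

required_vns = 500_000                # "100's of thousands" of VNs
min_bits = math.ceil(math.log2(required_vns))
assert min_bits == 19                 # ~19 bits are needed

# A hypothetical 24-bit VNID field comfortably exceeds the target:
assert 2 ** 24 == 16_777_216
```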
3.3.1.2. Service QoS identifier

Traffic flows originating from different applications could rely on
differentiated forwarding treatment to meet end-to-end availability
and performance objectives. Such applications may span across one or
more overlay networks. To enable such treatment, support for
multiple Classes of Service across or between overlay networks MAY
be required.
skipping to change at page 10, line 7
ISID tags and MPLS TC bits in the VPN labels.

3.3.2. Tunneling function

This section describes the underlay tunneling requirements. From an
encapsulation perspective, IPv4 or IPv6 MUST be supported, both IPv4
and IPv6 SHOULD be supported, and MPLS tunneling MAY be supported.
3.3.2.1. LAG and ECMP

For performance reasons, multipath over LAG and ECMP paths MAY be
supported.
LAG (Link Aggregation Group) [IEEE 802.1AX-2008] and ECMP (Equal
Cost Multi Path) are commonly used techniques to perform load-
balancing of microflows over a set of parallel links, either at
Layer-2 (LAG) or Layer-3 (ECMP). Existing deployed hardware
implementations of LAG and ECMP use a hash of various fields in the
encapsulation (outermost) header(s) (e.g. source and destination MAC
addresses for non-IP traffic, source and destination IP addresses,
L4 protocol, L4 source and destination port numbers, etc.).
Furthermore, hardware deployed for the underlay network(s) will
most often be unaware of the carried, innermost L2 frames or L3
packets transmitted by the TS.

Thus, in order to perform fine-grained load-balancing over LAG and
ECMP paths in the underlying network, the encapsulation MUST result
in sufficient entropy to exercise all paths through several LAG/ECMP
hops.

The entropy information can be inferred from the NVO3 overlay header
or underlay header. If the overlay protocol does not support the
necessary entropy information, or the switches/routers in the
underlay do not support parsing of the additional entropy
information in the overlay header, underlay switches and routers
should be programmable, i.e. able to select the appropriate fields
in the underlay header for hash calculation based on the type of
overlay header.
All packets that belong to a specific flow MUST follow the same path
in order to prevent packet re-ordering. This is typically achieved
by ensuring that the fields used for hashing are identical for a
given flow.
The goal is for all paths available to the overlay network to be
used efficiently. Different flows should be distributed as evenly as
possible across multiple underlay network paths. For instance, this
can be achieved by ensuring that some fields used for hashing are
randomly generated.
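One common way an encapsulation exposes entropy to an unaware
underlay is sketched below: the ingress NVE hashes the inner flow and
carries the result in an outer header field that underlay devices
already hash on (for UDP-based overlays, typically the outer UDP
source port). The hash function, port range and addresses here are
illustrative assumptions, not requirements of this document.

```python
# Sketch: derive an outer UDP source port from the inner 5-tuple so
# that underlay LAG/ECMP hashing of outer headers spreads flows while
# keeping every packet of one flow on the same path.
import zlib

def outer_udp_src_port(inner_five_tuple):
    key = "|".join(map(str, inner_five_tuple)).encode()
    # confine the result to the ephemeral range 49152..65535
    return 49152 + (zlib.crc32(key) % 16384)

flow = ("10.1.1.1", "10.2.2.2", 6, 12345, 80)
# Deterministic per flow: identical packets hash identically, so the
# underlay keeps the flow on a single path (no re-ordering).
assert outer_udp_src_port(flow) == outer_udp_src_port(flow)
assert 49152 <= outer_udp_src_port(flow) <= 65535
```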
3.3.2.2. DiffServ and ECN marking

When traffic is encapsulated in a tunnel header, there are numerous
options as to how the Diffserv Code-Point (DSCP) and Explicit
Congestion Notification (ECN) markings are set in the outer header
and propagated to the inner header on decapsulation.

[RFC2983] defines two modes for mapping the DSCP markings from inner
to outer headers and vice versa. The Uniform model copies the inner
DSCP marking to the outer header on tunnel ingress, and copies that
outer header value back to the inner header at tunnel egress. The
Pipe model sets the DSCP value to some value based on local policy
at ingress and does not modify the inner header on egress. Both
models SHOULD be supported.
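The two [RFC2983] modes can be captured in a few lines. The specific
DSCP values and the policy constant are illustrative assumptions.

```python
# Sketch of Uniform vs Pipe DSCP handling at tunnel ingress/egress.
PIPE_DSCP = 0x2E  # hypothetical local-policy value (EF) for Pipe mode

def ingress_outer_dscp(inner_dscp, mode):
    # choose the outer-header DSCP when encapsulating
    return inner_dscp if mode == "uniform" else PIPE_DSCP

def egress_inner_dscp(inner_dscp, outer_dscp, mode):
    # choose the inner-header DSCP exposed after decapsulation
    return outer_dscp if mode == "uniform" else inner_dscp

assert ingress_outer_dscp(0x0A, "uniform") == 0x0A       # inner copied out
assert egress_inner_dscp(0x0A, 0x00, "uniform") == 0x00  # outer copied back
assert ingress_outer_dscp(0x0A, "pipe") == PIPE_DSCP     # local policy
assert egress_inner_dscp(0x0A, 0x00, "pipe") == 0x0A     # inner untouched
```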
[RFC6040] defines ECN marking and processing for IP tunnels.
3.3.2.3. Handling of BUM traffic

NVO3 data plane support for either ingress replication or point-to-
multipoint tunnels is required to send traffic destined to multiple
locations on a per-VNI basis (e.g. L2/L3 multicast traffic, L2
broadcast and unknown unicast traffic). It is possible for both
methods to be used simultaneously.
There is a bandwidth vs state trade-off between the two approaches.
User-configurable knobs MUST be provided to select which method(s)
gets used based upon the amount of replication required (i.e. the
number of hosts per group), the amount of multicast state to
maintain, the duration of multicast flows and the scalability of
multicast protocols.
When ingress replication is used, NVEs MUST maintain, for each VNI,
the related tunnel endpoints to which they need to replicate the
frame.
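The per-VNI state that ingress replication requires is minimal and
can be sketched as follows; the VNI name and endpoint addresses are
hypothetical, and the send callback stands in for the real
encapsulate-and-transmit path.

```python
# Minimal sketch of ingress replication: one unicast copy of a BUM
# frame is sent to every remote tunnel endpoint known for the VNI.
replication_list = {"vni-10": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]}

def ingress_replicate(vni, frame, send):
    # the NVE iterates the VNI's endpoint list, replicating the frame
    for endpoint in replication_list[vni]:
        send(endpoint, frame)

sent = []
ingress_replicate("vni-10", b"bum-frame", lambda ep, f: sent.append(ep))
assert sent == ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
```

This makes the bandwidth vs state trade-off above concrete: the NVE
sends N copies per frame but the core keeps no multicast state.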
For point-to-multipoint tunnels, the bandwidth efficiency is
increased at the cost of more state in the Core nodes. The ability
to auto-discover or pre-provision the mapping of VNI multicast
trees to related tunnel endpoints at the NVE and/or throughout the
core SHOULD be supported.
3.4. External NVO3 connectivity

NVO3 services MUST interoperate with current VPN and Internet
skipping to change at page 12, line 43
3.4.1.3. Intra-DC gateways

Even within one DC there may be End Devices that do not support NVO3
encapsulation, for example bare metal servers, hardware appliances
and storage. A gateway device, e.g. a ToR, is required to translate
the NVO3 to Ethernet VLAN encapsulation.
3.4.2. Path optimality between NVEs and Gateways

Within an NVO3 overlay, a default assumption is that NVO3 traffic
will be equally load-balanced across the underlying network
consisting of LAG and/or ECMP paths. This assumption is valid only
as long as: a) all traffic is load-balanced equally among each of
the component-links and paths; and, b) each of the component-
links/paths is of identical capacity. During the course of normal
operation of the underlying network, it is possible that one, or
more, of the component-links/paths of a LAG may be taken out-of-
service in order to be repaired, e.g. due to hardware failure of
cabling, optics, etc. In such cases, the administrator should
configure the underlying network such that an entire LAG bundle in
skipping to change at page 13, line 37
On the other hand, for Inter-DC and DC to External Network cases On the other hand, for Inter-DC and DC to External Network cases
that use a WAN, the costs of the underlying network and/or service
(e.g., an IPVPN service) are higher; therefore, administrators are
required to both: a) ensure high availability (active-backup
failover or active-active load-balancing); and b) maintain
substantial utilization of the WAN transport capacity at nearly all
times, particularly in the case of active-active load-balancing.
With respect to the dataplane requirements of NVO3 solutions, in the
case of active-backup fail-over, all of the ingress NVEs need to
dynamically adapt to the failure of an active NVE GW: when the
backup NVE GW announces itself into the NVO3 overlay immediately
following a failure of the previously active NVE GW, the ingress
NVEs update their forwarding tables accordingly (e.g., perhaps
through dataplane learning and/or translation of a gratuitous ARP or
IPv6 Router Advertisement). Note that active-backup fail-over could
be used to accomplish a crude form of load-balancing by, for
example, manually configuring each tenant to use a different NVE GW,
in a round-robin fashion.
3.4.2.1. Load-balancing

When using active-active load-balancing across physically separate
NVE GWs (e.g., two separate chassis), an NVO3 solution SHOULD
support forwarding tables that can simultaneously map a single
egress NVE to more than one NVO3 tunnel. The granularity of such
mappings, in both active-backup and active-active, MUST be specific
to each tenant.
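One way such a forwarding table could be structured is sketched below. All names are illustrative assumptions, not normative NVO3 constructs; the per-flow hash simply shows how packets of one flow can stay on one tunnel while different flows spread across the active-active set.

```python
# Illustrative sketch: a per-tenant forwarding table in which a single
# egress NVE maps to more than one NVO3 tunnel, with a stable per-flow
# hash choosing among them (active-active load-balancing).

import zlib

class TenantForwardingTable:
    def __init__(self):
        # (tenant_id, egress_nve) -> list of NVO3 tunnel identifiers
        self.tunnels = {}

    def map_tunnels(self, tenant_id, egress_nve, tunnel_list):
        self.tunnels[(tenant_id, egress_nve)] = list(tunnel_list)

    def select_tunnel(self, tenant_id, egress_nve, flow_key):
        # A deterministic hash keeps all packets of one flow on one
        # tunnel (avoiding reordering) while spreading distinct flows.
        candidates = self.tunnels[(tenant_id, egress_nve)]
        return candidates[zlib.crc32(flow_key.encode()) % len(candidates)]

fib = TenantForwardingTable()
fib.map_tunnels(7, "nve-east", ["tunnel-gw1", "tunnel-gw2"])  # active-active pair
t1 = fib.select_tunnel(7, "nve-east", "10.0.0.1->10.0.1.1:tcp:80")
t2 = fib.select_tunnel(7, "nve-east", "10.0.0.1->10.0.1.1:tcp:80")
```

Because the table is keyed on the tenant, each tenant can be given its own mapping granularity, as the requirement above demands.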
3.4.2.2. Triangular Routing Issues (a.k.a. Traffic Tromboning)
L2/ELAN over NVO3 service may span multiple racks distributed across
different DC regions. Multiple ELANs belonging to one tenant may be
interconnected or connected to the outside world through multiple
Router/VRF gateways distributed throughout the DC regions. In this
scenario, without aid from an NVO3 or other type of solution,
traffic from an ingress NVE destined to external gateways will take
a non-optimal path, resulting in higher latency and cost (since it
uses more expensive WAN resources). In the case of traffic from an
IP/MPLS network destined toward the entrance to an NVO3 overlay,
well-known IP routing techniques MAY be used to optimize traffic
into the NVO3 overlay (at the expense of additional routes in the
IP/MPLS network). In summary, these issues are well known as
triangular routing.
Procedures for gateway selection to avoid triangular routing issues
SHOULD be provided.

The details of such procedures are, most likely, part of the NVO3
Management and/or Control Plane requirements and, thus, out of scope
of this document. However, a key requirement on the dataplane of any
NVO3 solution to avoid triangular routing is stated above, in
Section 3.4.2, with respect to active-active load-balancing. More
specifically, an NVO3 solution SHOULD support forwarding tables that
can simultaneously map a single egress NVE to more than one NVO3
tunnel.

The expectation is that, through the Control and/or Management
Planes, this mapping information may be dynamically manipulated to,
for example, provide the closest geographic and/or topological exit
point (egress NVE) for each ingress NVE.
3.5. Path MTU

The tunnel overlay header can cause the MTU of the path to the
egress tunnel endpoint to be exceeded.

IP fragmentation SHOULD be avoided for performance reasons.

The interface MTU as seen by a Tenant System SHOULD be adjusted such
that no fragmentation is needed. This can be achieved by
o The underlay network MAY be designed in such a way that the MTU
  can accommodate the extra tunnel overhead.
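As a back-of-the-envelope illustration of the MTU adjustment, the sketch below uses the commonly cited VXLAN-over-IPv4 figure of 50 bytes of overhead against the underlay IP MTU (inner Ethernet 14 + outer IPv4 20 + UDP 8 + VXLAN 8); other NVO3 encapsulations have different overheads, so the numbers are an assumption, not a requirement of this document.

```python
# Tenant-facing IP MTU = underlay path MTU minus the encapsulation
# overhead, so that encapsulated frames never require IP fragmentation.

UNDERLAY_IP_MTU = 1500          # classical Ethernet underlay

INNER_ETH  = 14                 # inner Ethernet header carried as payload
OUTER_IPV4 = 20                 # outer IPv4 header
OUTER_UDP  = 8                  # outer UDP header
VXLAN_HDR  = 8                  # VXLAN-style overlay header

overhead = INNER_ETH + OUTER_IPV4 + OUTER_UDP + VXLAN_HDR   # 50 bytes

tenant_ip_mtu = UNDERLAY_IP_MTU - overhead                  # 1450 bytes
```

Alternatively, per the bullet above, the underlay MTU can be raised (e.g., with jumbo frames) so the tenant keeps a full 1500-byte MTU.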
3.6. Hierarchical NVE

It might be desirable to support the concept of hierarchical NVEs,
such as spoke NVEs and hub NVEs, in order to address possible NVE
performance limitations and service connectivity optimizations.

For instance, spoke NVE functionality may be used when processing
capabilities are limited. A hub NVE would provide additional data
processing capabilities such as packet replication.

NVEs can be connected in either an any-to-any or hub-and-spoke
topology on a per-VNI basis.
3.7. NVE Multi-Homing Requirements

Multi-homing techniques SHOULD be used to increase the reliability
of an NVO3 network. It is also important to ensure that physical
system is co-located with an NVE, IP routing can be relied upon to
handle routing over diverse links to TORs.

External connectivity MAY be handled by two or more NVO3 gateways.
Each gateway is connected to a different domain (e.g., an ISP) and
runs BGP multi-homing. They serve as an access point to external
networks such as VPNs or the Internet. When a connection to an
upstream router is lost, the alternative connection is used and the
failed route is withdrawn.
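The announce/withdraw behavior described above can be sketched as follows. This is a simplified, hypothetical model of gateway selection (the class and preference values are illustrative), not an implementation of BGP itself.

```python
# Simplified sketch: two NVO3 gateways provide external connectivity.
# When the preferred gateway's upstream connection is lost, its route
# is withdrawn and traffic shifts to the alternative gateway.

class GatewaySelector:
    def __init__(self):
        self.routes = {}   # prefix -> list of (preference, gateway)

    def announce(self, prefix, gateway, preference):
        self.routes.setdefault(prefix, []).append((preference, gateway))

    def withdraw(self, prefix, gateway):
        # Remove all routes to the prefix via the failed gateway.
        self.routes[prefix] = [(p, g) for p, g in self.routes[prefix]
                               if g != gateway]

    def best(self, prefix):
        # Highest preference wins, loosely analogous to BGP local-pref.
        return max(self.routes[prefix])[1]

sel = GatewaySelector()
sel.announce("0.0.0.0/0", "gw-1", preference=200)   # primary gateway
sel.announce("0.0.0.0/0", "gw-2", preference=100)   # backup gateway
primary = sel.best("0.0.0.0/0")
sel.withdraw("0.0.0.0/0", "gw-1")                   # upstream link to gw-1 lost
backup = sel.best("0.0.0.0/0")
```

In a real deployment the withdrawal is triggered by the routing protocol session or link-failure detection, not by explicit calls.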
3.8. Other considerations

3.8.1. Data Plane Optimizations
Data plane forwarding and encapsulation choices SHOULD consider the
limitations of possible NVE implementations, specifically software-based
implementations (e.g., servers running vSwitches).

NVEs SHOULD provide efficient processing of traffic. For instance,
packet alignment, the use of offsets to minimize header parsing, and
padding techniques SHOULD be considered when designing NVO3
encapsulation types.

The NVO3 encapsulation/decapsulation processing in software-based
NVEs SHOULD make use of hardware assists provided by NICs in order
to speed up packet processing.
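To illustrate why fixed offsets ease header parsing in software NVEs, the sketch below builds and parses a VXLAN-style 8-byte header (flags byte plus a 24-bit VNI, following the RFC 7348 layout). It is an illustrative example of fixed-offset parsing, not a normative NVO3 encapsulation.

```python
# Fixed-offset parsing: because every field of the 8-byte header sits
# at a known offset, extracting the VNI is a single masked load rather
# than a byte-by-byte scan.

import struct

def build_vxlan_header(vni):
    # Flags byte 0x08 sets the I bit (valid VNI); reserved fields zero.
    # Layout: flags(8) | reserved(24) | VNI(24) | reserved(8)
    return struct.pack("!BBHI", 0x08, 0, 0, vni << 8)

def parse_vni(header):
    # The VNI occupies bytes 4-6 at a fixed offset; read the final
    # 32-bit word and shift off the trailing reserved byte.
    return struct.unpack("!I", header[4:8])[0] >> 8

hdr = build_vxlan_header(5000)
vni = parse_vni(hdr)
```

An 8-byte, word-aligned header like this also keeps the encapsulated payload aligned, which is the kind of property the paragraph above asks encapsulation designers to consider.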
3.8.2. NVE location trade-offs
In the case of DC traffic, traffic originated from a VM is native
Ethernet traffic. This traffic can be switched by a local VM switch
or ToR switch and then by a DC gateway. The NVE function can be
embedded within any of these elements.

The NVE function can be supported in various DC network elements
such as a VM, VM switch, ToR switch or DC GW.

The following criteria SHOULD be considered when deciding where the
[RFC6391] Bryant, S., et al., "Flow-Aware Transport of Pseudowires
          over an MPLS Packet Switched Network", RFC 6391, November
          2011.
7. Acknowledgments

In addition to the authors, the following people have contributed to
this document:

Shane Amante, Dimitrios Stiliadis, Rotem Salomonovitch, Larry
Kreeger, and Eric Gray.

This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses

Nabil Bitar
Verizon
40 Sylvan Road
Waltham, MA 02145

Email: nabil.bitar@verizon.com