Internet Engineering Task Force                          Marc Lasserre
Internet Draft                                             Florin Balus
Intended status: Informational                           Alcatel-Lucent
Expires: Dec 2014
                                                           Thomas Morin
                                                   France Telecom Orange

                                                            Nabil Bitar
                                                                Verizon

                                                          Yakov Rekhter
                                                                Juniper

                                                           June 5, 2014

               Framework for DC Network Virtualization
                  draft-ietf-nvo3-framework-07.txt
Abstract

   This document provides a framework for Data Center (DC) Network
   Virtualization Overlays (NVO3) and it defines a reference model
   along with logical components required to design a solution.
Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."
   This Internet-Draft will expire on Dec 5, 2014.
Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
skipping to change at page 3, line 5
         3.1.5.3. Address advertisement and tunnel mapping........15
         3.1.5.4. Overlay Tunneling...............................15
      3.2. Multi-homing...........................................16
      3.3. VM Mobility............................................17
   4. Key aspects of overlay networks.............................17
      4.1. Pros & Cons............................................17
      4.2. Overlay issues to consider.............................19
         4.2.1. Data plane vs Control plane driven................19
         4.2.2. Coordination between data plane and control plane.19
         4.2.3. Handling Broadcast, Unknown Unicast and Multicast
                (BUM) traffic.....................................19
         4.2.4. Path MTU..........................................20
         4.2.5. NVE location trade-offs...........................21
         4.2.6. Interaction between network overlays and underlays.22
   5. Security Considerations.....................................22
   6. IANA Considerations.........................................23
   7. References..................................................23
      7.1. Informative References.................................23
   8. Acknowledgments.............................................24
1. Introduction

   This document provides a framework for Data Center (DC) Network
   Virtualization over Layer3 (L3) tunnels. This framework is intended
   to aid in standardizing protocols and mechanisms to support large-
skipping to change at page 16, line 34
   on MPLS re-routing capabilities.
   When a Tenant System is co-located with the NVE, the Tenant System
   is effectively single homed to the NVE via a virtual port. When the
   Tenant System and the NVE are separated, the Tenant System is
   connected to the NVE via a logical Layer2 (L2) construct such as a
   VLAN and it can be multi-homed to various NVEs. An NVE may provide
   an L2 service to the end system or an L3 service. An NVE may be
   multi-homed to a next layer in the DC at Layer2 (L2) or Layer3
   (L3). When an NVE provides an L2 service and is not co-located with
   the end system, loop avoidance techniques must be used. Similarly,
   when the NVE provides L3 service, dual-homing techniques can be
   used. When the NVE provides an L3 service to the end system, it is
   possible that no dynamic routing protocol is enabled between the
   end system and the NVE. The end system can be multi-homed to
   multiple physically-separated L3 NVEs over multiple interfaces.
   When one of the links connected to an NVE fails, the other
   interfaces can be used to reach the end system.
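   As a minimal, non-normative sketch of this failover behavior, the
   Python fragment below models an end system multi-homed to two L3
   NVEs over separate interfaces; the interface and NVE names are
   hypothetical and no specific routing protocol is assumed.

      from dataclasses import dataclass

      @dataclass
      class Uplink:
          interface: str   # local interface toward one L3 NVE
          nve: str         # identifier of the NVE reached over this link
          up: bool         # current link state

      def select_active_uplink(uplinks):
          """Return the first uplink whose link is up, None if all failed."""
          for uplink in uplinks:
              if uplink.up:
                  return uplink
          return None

      uplinks = [Uplink("eth0", "nve-1", up=True),
                 Uplink("eth1", "nve-2", up=True)]

      uplinks[0].up = False            # link toward the primary NVE fails
      active = select_active_uplink(uplinks)
      print("forwarding via", active.interface, "toward", active.nve)

   The same selection could equally be driven by static routes with
   different preferences; the point is only that a surviving interface
   keeps the end system reachable.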
   External connectivity from a DC can be handled by two or more DC
   gateways. Each gateway provides access to external networks such as
   VPNs or the Internet. A gateway may be connected to two or more edge
   nodes in the external network for redundancy. When a connection to
   an upstream node is lost, the alternative connection is used and the
   failed route withdrawn.
3.3. VM Mobility
skipping to change at page 17, line 27
   Solutions to maintain connectivity while a VM is moved are necessary
   in the case of "hot" mobility. This implies that connectivity among
   VMs is preserved. For instance, for L2 VNs, ARP caches are updated
   accordingly.
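   As an illustration of the state update involved, the sketch below is
   a hedged example: the mapping table, names, and addresses are
   assumptions, not a mechanism defined by this framework. It shows an
   NVA-style inner-to-outer mapping being refreshed when a VM moves, so
   that peer NVEs resolve the VM to its new location and stale L2 VN
   ARP or forwarding entries can be corrected.

      # Mapping: (VN identifier, VM MAC address) -> underlay IP address
      # of the NVE currently hosting that VM.
      nva_mappings = {
          ("vn-10", "00:11:22:33:44:55"): "192.0.2.1",
      }

      def vm_moved(vn, vm_mac, new_nve_ip):
          """Record the VM's new NVE so other NVEs can refresh their
          caches; a real system would also notify interested NVEs here."""
          nva_mappings[(vn, vm_mac)] = new_nve_ip

      vm_moved("vn-10", "00:11:22:33:44:55", "192.0.2.2")
      print(nva_mappings[("vn-10", "00:11:22:33:44:55")])   # 192.0.2.2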
   Upon VM mobility, NVE policies that define connectivity among VMs
   must be maintained.
   During VM mobility, it is expected that the path to the VM's default
   gateway assures adequate QoS to VM applications, i.e. QoS that
   matches the expected service level agreement for these applications.
4. Key aspects of overlay networks

   The intent of this section is to highlight specific issues that
   proposed overlay solutions need to address.
4.1. Pros & Cons

   An overlay network is a layer of virtual network topology on top of
   the physical network.
skipping to change at page 18, line 32
     It is difficult to accurately evaluate network properties. It
     might be preferable for the underlay network to expose usage
     and performance information.

   - Miscommunication or lack of coordination between overlay and
     underlay networks can lead to an inefficient usage of network
     resources.

   - When multiple overlays co-exist on top of a common underlay
     network, the lack of coordination between overlays can lead
     to performance issues and/or resource usage inefficiencies.

   - Traffic carried over an overlay might fail to traverse
     firewalls and NAT devices.

   - Multicast service scalability: Multicast support may be
     required in the underlay network to address tenant flood
     containment or efficient multicast handling. The underlay may
     also be required to maintain multicast state on a per-tenant
     basis, or even on a per-individual multicast flow of a given
     tenant. Ingress replication at the NVE eliminates that
     additional multicast state in the underlay core, but depending
     on the multicast traffic volume, it may cause inefficient use
     of bandwidth.
   - Hash-based load balancing may not be optimal, as the hash
     algorithm may not work well due to the limited number of
     combinations of tunnel source and destination addresses. Other
     NVO3 mechanisms may use additional entropy information beyond
     the tunnel source and destination addresses (a sketch of both
     ingress replication and such added entropy follows this list).
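   The Python sketch below illustrates the last two items under assumed
   header fields and an assumed ephemeral port range; it is not a
   specific NVO3 encapsulation. It shows an NVE performing ingress
   replication of a BUM frame toward every remote NVE in a VN, and
   deriving extra entropy from the inner flow to vary the outer source
   port so that underlay hash-based load balancing has more to work
   with than the tunnel source and destination addresses alone.

      import zlib

      def flow_entropy(inner_src_mac, inner_dst_mac,
                       low=49152, high=65535):
          """Derive an outer source port from the inner flow identifiers."""
          digest = zlib.crc32(f"{inner_src_mac}-{inner_dst_mac}".encode())
          return low + digest % (high - low + 1)

      def ingress_replicate(frame, vni, remote_nves, local_nve_ip):
          """Build one encapsulated copy of a BUM frame per remote NVE
          in the VN, instead of relying on underlay multicast."""
          copies = []
          for nve_ip in remote_nves:
              outer = {
                  "src_ip": local_nve_ip,
                  "dst_ip": nve_ip,
                  "src_port": flow_entropy(frame["src_mac"],
                                           frame["dst_mac"]),
                  "vni": vni,
              }
              copies.append((outer, frame))
          return copies

      frame = {"src_mac": "00:aa:bb:cc:dd:01",
               "dst_mac": "ff:ff:ff:ff:ff:ff"}
      for outer, _ in ingress_replicate(frame, vni=10,
                                        remote_nves=["198.51.100.2",
                                                     "198.51.100.3"],
                                        local_nve_ip="198.51.100.1"):
          print(outer)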
4.2. Overlay issues to consider

4.2.1. Data plane vs Control plane driven
   In the case of an L2 NVE, it is possible to dynamically learn MAC
   addresses against VAPs. It is also possible that such addresses be
   known and controlled via management or a control protocol for both
   L2 NVEs and L3 NVEs. Dynamic data plane learning implies that
   flooding of unknown destinations be supported and hence implies that
   broadcast and/or multicast be supported or that ingress replication
skipping to change at page 20, line 36
   trees as opposed to dedicated multicast trees.
4.2.4. Path MTU

   When using overlay tunneling, an outer header is added to the
   original frame. This can cause the MTU of the path to the egress
   tunnel endpoint to be exceeded.
   It is usually not desirable to rely on IP fragmentation for
   performance reasons. Ideally, the interface MTU as seen by a Tenant
   System is adjusted such that no fragmentation is needed.
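   For example, under an assumed encapsulation consisting of an outer
   IPv4 header, an outer UDP header, and an 8-octet NVO3-style header
   carrying the VNI (the sizes are illustrative, not those of any
   specific NVO3 encapsulation), the MTU exposed to a Tenant System
   can be derived from the underlay IP MTU as in this sketch:

      OUTER_IPV4     = 20   # outer IPv4 header, no options
      OUTER_UDP      = 8    # outer UDP header
      NVO3_HEADER    = 8    # assumed overlay header carrying the VNI
      INNER_ETHERNET = 14   # inner Ethernet header of the tenant frame

      def tenant_mtu(underlay_ip_mtu):
          """IP MTU to expose to a Tenant System so the encapsulated
          packet fits in the underlay IP MTU without fragmentation."""
          return (underlay_ip_mtu - OUTER_IPV4 - OUTER_UDP
                  - NVO3_HEADER - INNER_ETHERNET)

      print(tenant_mtu(1500))   # 1450 with these assumed header sizes
      print(tenant_mtu(9000))   # a jumbo-frame underlay leaves ample room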
   It is possible for the MTU to be configured manually or to be
   discovered dynamically. Various Path MTU discovery techniques exist
   in order to determine the proper MTU size to use:
   - Classical ICMP-based MTU Path Discovery [RFC1191] [RFC1981]

     - Tenant Systems rely on ICMP messages to discover the MTU of the
       end-to-end path to their destination. This method is not always
       possible, such as when traversing middle boxes (e.g. firewalls)
       which disable ICMP for security reasons.
   - Extended MTU Path Discovery techniques such as defined in
     [RFC4821]

     - Tenant Systems send probe packets of different sizes, and rely
       on confirmation of receipt or lack thereof from receivers to
       allow a sender to discover the MTU of the end-to-end paths (a
       minimal sketch of this probing approach follows this list).
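   The fragment below is a rough, non-normative illustration of that
   probing approach in the spirit of [RFC4821]; the probe() function is
   a stand-in for whatever send-and-confirm mechanism the packetization
   layer actually provides.

      def probe(size, true_path_mtu=1450):
          """Stand-in: report whether a probe of 'size' octets was
          confirmed received by the peer."""
          return size <= true_path_mtu

      def discover_mtu(low=512, high=9000):
          """Binary search for the largest probe size confirmed received."""
          best = low
          while low <= high:
              size = (low + high) // 2
              if probe(size):
                  best, low = size, size + 1   # confirmed; try larger
              else:
                  high = size - 1              # lost; try smaller
          return best

      print(discover_mtu())   # -> 1450 with the stand-in probe() above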
   While it could also be possible to rely on the NVE to perform
   segmentation and reassembly operations without relying on the Tenant
   Systems to know about the end-to-end MTU, this would lead to
   undesired performance and congestion issues as well as significantly
   increase the complexity of hardware NVEs required for buffering and
   reassembly logic.
   Preferably, the underlay network should be designed in such a way
   that the MTU can accommodate the extra tunneling and possibly
   additional NVO3 header encapsulation overhead.
4.2.5. NVE location trade-offs

   In the case of DC traffic, traffic originated from a VM is native
   Ethernet traffic. This traffic can be switched by a local virtual
   switch or ToR switch and then by a DC gateway. The NVE function can
   be embedded within any of these elements.
   There are several criteria to consider when deciding where the NVE
   function should reside:
skipping to change at page 22, line 32
   coordination in placing overlay demand on an underlay network, may
   be achieved by providing mechanisms to exchange performance and
   liveliness information between the underlay and overlay(s) or the
   use of such information by a coordination system. Such information
   may include:

   - Performance metrics (throughput, delay, loss, jitter), such as
     those defined in [RFC2330]

   - Cost metrics
5. Security Considerations

   Since NVEs and NVAs play a central role in NVO3, it is critical that
   secure access to NVEs and NVAs be ensured such that no unauthorized
   access is possible.
   As discussed in Section 3.1.5.2, Tenant System identification is
   based upon state that is often provided by management systems (e.g.
   a VM orchestration system in a virtualized environment). Secure
   access to such management systems must also be ensured.
   When an NVE receives data from a Tenant System, the tenant identity
   needs to be verified in order to guarantee that it is authorized to
   access the corresponding VN. This can be achieved by identifying
   incoming packets against specific VAPs in some cases. In other
   circumstances, authentication may be necessary.
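   A minimal sketch of the VAP-based check described above, assuming a
   provisioned table that binds each VAP to the single VN it is
   authorized for (all identifiers are illustrative):

      # Provisioned bindings: VAP -> VN it is authorized to access.
      vap_to_vn = {
          "vap-1": "vn-10",
          "vap-2": "vn-20",
      }

      def admit(ingress_vap, requested_vn):
          """Accept tenant traffic only when the ingress VAP is bound
          to the VN it is trying to reach."""
          return vap_to_vn.get(ingress_vap) == requested_vn

      print(admit("vap-1", "vn-10"))   # True: authorized
      print(admit("vap-1", "vn-20"))   # False: reject, or fall back to
                                       # stronger tenant authentication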
   Data integrity can be assured if authorized access to NVEs, NVAs,
   and intermediate underlay nodes is ensured. Otherwise, encryption
   must be used.
   NVO3 provides data confidentiality through data separation. The use
   of both VNIs and tunneling of tenant traffic by NVEs ensures that
   NVO3 data is kept in a separate context and thus separated from
   other tenant traffic. When NVO3 data traverses untrusted networks,
   data encryption may be needed.
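   A toy illustration of this separation, with assumed identifiers:
   each VNI gets its own forwarding context, so overlapping tenant
   addresses never collide and traffic is only ever looked up in the
   context named by the VNI carried in the NVO3 encapsulation.

      # Per-VNI forwarding contexts; identifiers are hypothetical.
      contexts = {
          10: {"00:aa:00:00:00:01": "198.51.100.2"},   # tenant A, VNI 10
          20: {"00:aa:00:00:00:01": "198.51.100.3"},   # tenant B, same MAC
      }

      def lookup(vni, inner_dst_mac):
          """Resolve the egress NVE only within the context of the VNI."""
          return contexts[vni].get(inner_dst_mac)

      print(lookup(10, "00:aa:00:00:00:01"))   # 198.51.100.2 (tenant A)
      print(lookup(20, "00:aa:00:00:00:01"))   # 198.51.100.3 (tenant B)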
skipping to change at page 23, line 31
   information).
6. IANA Considerations

   IANA does not need to take any action for this draft.

7. References

7.1. Informative References
   [NVOPS]   Narten, T. et al, "Problem Statement: Overlays for
             Network Virtualization", draft-ietf-nvo3-overlay-problem-
             statement (work in progress)
   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [RFC4761] Kompella, K. et al, "Virtual Private LAN Service (VPLS)
             Using BGP for auto-discovery and Signaling", RFC4761,
             January 2007

   [RFC4762] Lasserre, M. et al, "Virtual Private LAN Service (VPLS)
             Using Label Distribution Protocol (LDP) Signaling",
             RFC4762, January 2007
skipping to change at page 24, line 14
   [RFC1981] McCann, J. et al, "Path MTU Discovery for IPv6", RFC1981,
             August 1996

   [RFC4821] Mathis, M. et al, "Packetization Layer Path MTU
             Discovery", RFC4821, March 2007

   [RFC6820] Narten, T. et al, "Address Resolution Problems in Large
             Data Center Networks", RFC6820, January 2013

   [RFC2330] Paxson, V. et al, "Framework for IP Performance Metrics",
             RFC2330, May 1998
8. Acknowledgments

   In addition to the authors, the following people have contributed
   to this document:

   Dimitrios Stiliadis, Rotem Salomonovitch, Lucy Yong, Thomas Narten,
   Larry Kreeger, David Black.
   This document was prepared using 2-Word-v2.0.template.dot.