draft-ietf-nvo3-geneve-01.txt   draft-ietf-nvo3-geneve-02.txt 
Network Working Group J. Gross, Ed. Network Working Group J. Gross, Ed.
Internet-Draft VMware Internet-Draft VMware
Intended status: Standards Track I. Ganga, Ed. Intended status: Standards Track I. Ganga, Ed.
Expires: July 17, 2016 Intel Expires: January 9, 2017 Intel
January 14, 2016 July 8, 2016
Geneve: Generic Network Virtualization Encapsulation Geneve: Generic Network Virtualization Encapsulation
draft-ietf-nvo3-geneve-01 draft-ietf-nvo3-geneve-02
Abstract Abstract
Network virtualization involves the cooperation of devices with a Network virtualization involves the cooperation of devices with a
wide variety of capabilities such as software and hardware tunnel wide variety of capabilities such as software and hardware tunnel
endpoints, transit fabrics, and centralized control clusters. As a endpoints, transit fabrics, and centralized control clusters. As a
result of their role in tying together different elements in the result of their role in tying together different elements in the
system, the requirements on tunnels are influenced by all of these system, the requirements on tunnels are influenced by all of these
components. Flexibility is therefore the most important aspect of a components. Flexibility is therefore the most important aspect of a
tunnel protocol if it is to keep pace with the evolution of the tunnel protocol if it is to keep pace with the evolution of the
skipping to change at page 1, line 39 skipping to change at page 1, line 39
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 17, 2016. This Internet-Draft will expire on January 9, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 21 skipping to change at page 2, line 21
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
2. Design Requirements . . . . . . . . . . . . . . . . . . . . . 5 2. Design Requirements . . . . . . . . . . . . . . . . . . . . . 5
2.1. Control Plane Independence . . . . . . . . . . . . . . . 6 2.1. Control Plane Independence . . . . . . . . . . . . . . . 6
2.2. Data Plane Extensibility . . . . . . . . . . . . . . . . 7 2.2. Data Plane Extensibility . . . . . . . . . . . . . . . . 7
2.2.1. Efficient Implementation . . . . . . . . . . . . . . 7 2.2.1. Efficient Implementation . . . . . . . . . . . . . . 7
2.3. Use of Standard IP Fabrics . . . . . . . . . . . . . . . 8 2.3. Use of Standard IP Fabrics . . . . . . . . . . . . . . . 8
3. Geneve Encapsulation Details . . . . . . . . . . . . . . . . 9 3. Geneve Encapsulation Details . . . . . . . . . . . . . . . . 9
3.1. Geneve Frame Format Over IPv4 . . . . . . . . . . . . . . 9 3.1. Geneve Packet Format Over IPv4 . . . . . . . . . . . . . 9
3.2. Geneve Frame Format Over IPv6 . . . . . . . . . . . . . . 10 3.2. Geneve Packet Format Over IPv6 . . . . . . . . . . . . . 10
3.3. UDP Header . . . . . . . . . . . . . . . . . . . . . . . 12 3.3. UDP Header . . . . . . . . . . . . . . . . . . . . . . . 12
3.4. Tunnel Header Fields . . . . . . . . . . . . . . . . . . 13 3.4. Tunnel Header Fields . . . . . . . . . . . . . . . . . . 13
3.5. Tunnel Options . . . . . . . . . . . . . . . . . . . . . 14 3.5. Tunnel Options . . . . . . . . . . . . . . . . . . . . . 14
3.5.1. Options Processing . . . . . . . . . . . . . . . . . 16 3.5.1. Options Processing . . . . . . . . . . . . . . . . . 16
4. Implementation and Deployment Considerations . . . . . . . . 17 4. Implementation and Deployment Considerations . . . . . . . . 17
4.1. Encapsulation of Geneve in IP . . . . . . . . . . . . . . 17 4.1. Encapsulation of Geneve in IP . . . . . . . . . . . . . . 17
4.1.1. IP Fragmentation . . . . . . . . . . . . . . . . . . 17 4.1.1. IP Fragmentation . . . . . . . . . . . . . . . . . . 17
4.1.2. DSCP and ECN . . . . . . . . . . . . . . . . . . . . 17 4.1.2. DSCP and ECN . . . . . . . . . . . . . . . . . . . . 17
4.1.3. Broadcast and Multicast . . . . . . . . . . . . . . . 18 4.1.3. Broadcast and Multicast . . . . . . . . . . . . . . . 18
4.1.4. Unidirectional Tunnels . . . . . . . . . . . . . . . 18 4.1.4. Unidirectional Tunnels . . . . . . . . . . . . . . . 18
4.2. Constraints on Protocol Features . . . . . . . . . . . . 19 4.2. Constraints on Protocol Features . . . . . . . . . . . . 19
4.3. NIC Offloads . . . . . . . . . . . . . . . . . . . . . . 19 4.3. NIC Offloads . . . . . . . . . . . . . . . . . . . . . . 19
4.4. Inner VLAN Handling . . . . . . . . . . . . . . . . . . . 20 4.4. Inner VLAN Handling . . . . . . . . . . . . . . . . . . . 20
5. Interoperability Issues . . . . . . . . . . . . . . . . . . . 20 5. Interoperability Issues . . . . . . . . . . . . . . . . . . . 20
6. Security Considerations . . . . . . . . . . . . . . . . . . . 21 6. Security Considerations . . . . . . . . . . . . . . . . . . . 21
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 22 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 22
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 23 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 24
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 24 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 24
10.1. Normative References . . . . . . . . . . . . . . . . . . 24 10.1. Normative References . . . . . . . . . . . . . . . . . . 24
10.2. Informative References . . . . . . . . . . . . . . . . . 24 10.2. Informative References . . . . . . . . . . . . . . . . . 24
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26
1. Introduction 1. Introduction
Networking has long featured a variety of tunneling, tagging, and Networking has long featured a variety of tunneling, tagging, and
other encapsulation mechanisms. However, the advent of network other encapsulation mechanisms. However, the advent of network
virtualization has caused a surge of renewed interest and a virtualization has caused a surge of renewed interest and a
skipping to change at page 5, line 7 skipping to change at page 5, line 7
NIC. Network Interface Card. A NIC could be part of a tunnel NIC. Network Interface Card. A NIC could be part of a tunnel
endpoint or transit device and can either process Geneve packets or endpoint or transit device and can either process Geneve packets or
aid in the processing of Geneve packets. aid in the processing of Geneve packets.
OAM. Operations, Administration, and Management. A suite of tools OAM. Operations, Administration, and Management. A suite of tools
used to monitor and troubleshoot network problems. used to monitor and troubleshoot network problems.
Transit device. A forwarding element along the path of the tunnel Transit device. A forwarding element along the path of the tunnel
making up part of the Underlay Network. A transit device MAY be making up part of the Underlay Network. A transit device MAY be
capable of understanding the Geneve frame format but does not capable of understanding the Geneve packet format but does not
originate or terminate Geneve packets. originate or terminate Geneve packets.
LSO. Large Segmentation Offload. A function provided by many LSO. Large Segmentation Offload. A function provided by many
commercial NICs that allows data units larger than the MTU to be commercial NICs that allows data units larger than the MTU to be
passed to the NIC to improve performance, the NIC being responsible passed to the NIC to improve performance, the NIC being responsible
for creating smaller segments of size less than or equal to the MTU for creating smaller segments of size less than or equal to the MTU
with correct protocol headers. When referring specifically to TCP/ with correct protocol headers. When referring specifically to TCP/
IP, this feature is often known as TSO (TCP Segmentation Offload). IP, this feature is often known as TSO (TCP Segmentation Offload).
Tunnel endpoint. A component performing encapsulation and Tunnel endpoint. A component performing encapsulation and
skipping to change at page 6, line 44 skipping to change at page 6, line 44
These requirements are described further in the following These requirements are described further in the following
subsections. subsections.
2.1. Control Plane Independence 2.1. Control Plane Independence
Although some protocols for network virtualization have included a Although some protocols for network virtualization have included a
control plane as part of the tunnel format specification (most control plane as part of the tunnel format specification (most
notably, the original VXLAN spec prescribed a multicast learning- notably, the original VXLAN spec prescribed a multicast learning-
based control plane), these specifications have largely been treated based control plane), these specifications have largely been treated
as describing only the data format. The VXLAN frame format has as describing only the data format. The VXLAN packet format has
actually seen a wide variety of control planes built on top of it. actually seen a wide variety of control planes built on top of it.
There is a clear advantage in settling on a data format: most of the There is a clear advantage in settling on a data format: most of the
protocols are only superficially different and there is little protocols are only superficially different and there is little
advantage in duplicating effort. However, the same cannot be said of advantage in duplicating effort. However, the same cannot be said of
control planes, which are diverse in very fundamental ways. The case control planes, which are diverse in very fundamental ways. The case
for standardization is also less clear given the wide variety in for standardization is also less clear given the wide variety in
requirements, goals, and deployment scenarios. requirements, goals, and deployment scenarios.
As a result of this reality, Geneve aims to be a pure tunnel format As a result of this reality, Geneve aims to be a pure tunnel format
skipping to change at page 7, line 24 skipping to change at page 7, line 24
Achieving the level of flexibility needed to support current and Achieving the level of flexibility needed to support current and
future control planes effectively requires an options infrastructure future control planes effectively requires an options infrastructure
to allow new metadata types to be defined, deployed, and either to allow new metadata types to be defined, deployed, and either
finalized or retired. Options also allow for differentiation of finalized or retired. Options also allow for differentiation of
products by encouraging independent development in each vendor's core products by encouraging independent development in each vendor's core
specialty, leading to an overall faster pace of advancement. By far specialty, leading to an overall faster pace of advancement. By far
the most common mechanism for implementing options is Type-Length- the most common mechanism for implementing options is Type-Length-
Value (TLV) format. Value (TLV) format.
It should be noted that while options can be used to support non- It should be noted that while options can be used to support non-
wirespeed control frames, they are equally important on data frames wirespeed control packets, they are equally important on data packets
as well to segregate and direct forwarding (for instance, the as well to segregate and direct forwarding (for instance, the
examples given before of input port based security policies and examples given before of input port based security policies and
service interposition both require tags to be placed on data service interposition both require tags to be placed on data
packets). Therefore, while it would be desirable to limit the packets). Therefore, while it would be desirable to limit the
extensibility to only control frames for the purposes of simplifying extensibility to only control packets for the purposes of simplifying
the datapath, that would not satisfy the design requirements. the datapath, that would not satisfy the design requirements.
2.2.1. Efficient Implementation 2.2.1. Efficient Implementation
There is often a conflict between software flexibility and hardware There is often a conflict between software flexibility and hardware
performance that is difficult to resolve. For a given set of performance that is difficult to resolve. For a given set of
functionality, it is obviously desirable to maximize performance. functionality, it is obviously desirable to maximize performance.
However, that does not mean new features that cannot be run at that However, that does not mean new features that cannot be run at that
speed today should be disallowed. Therefore, for a protocol to be speed today should be disallowed. Therefore, for a protocol to be
efficiently implementable means that a set of common capabilities can efficiently implementable means that a set of common capabilities can
skipping to change at page 9, line 7 skipping to change at page 9, line 7
endpoint addresses are available for hashing. endpoint addresses are available for hashing.
Since it is desirable for Geneve to perform well on these existing Since it is desirable for Geneve to perform well on these existing
fabrics, it is necessary for entropy from encapsulated packets to be fabrics, it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header. The most common technique for this is exposed in the tunnel header. The most common technique for this is
to use the UDP source port, which is discussed further in to use the UDP source port, which is discussed further in
Section 3.3. Section 3.3.
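As an illustration only, the following sketch shows one way an encapsulating endpoint might derive a UDP source port from an encapsulated flow. The helper name geneve_source_port, the choice of a 5-tuple as the hash input, the use of CRC32, and the use of the full 16-bit port range are all assumptions made for this example; Section 3.3 remains the authority on source port selection.

```python
import zlib


def geneve_source_port(src_ip: str, dst_ip: str, proto: int,
                       src_port: int, dst_port: int) -> int:
    """Map an inner 5-tuple to a 16-bit UDP source port (hypothetical helper)."""
    # Serialize the inner flow identifiers into a stable byte string.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # CRC32 is an arbitrary stand-in for whatever hash an implementation uses.
    # Masking to 16 bits keeps the result a valid UDP port value.
    return zlib.crc32(key) & 0xFFFF
```

Because the result depends only on the inner flow, every packet of a given flow carries the same outer source port, so a fabric that hashes on the outer headers keeps the flow on one path.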
3. Geneve Encapsulation Details 3. Geneve Encapsulation Details
The Geneve frame format consists of a compact tunnel header The Geneve packet format consists of a compact tunnel header
encapsulated in UDP over either IPv4 or IPv6. A small fixed tunnel encapsulated in UDP over either IPv4 or IPv6. A small fixed tunnel
header provides control information plus a base level of header provides control information plus a base level of
functionality and interoperability with a focus on simplicity. This functionality and interoperability with a focus on simplicity. This
header is then followed by a set of variable options to allow for header is then followed by a set of variable options to allow for
future innovation. Finally, the payload consists of a protocol data future innovation. Finally, the payload consists of a protocol data
unit of the indicated type, such as an Ethernet frame. Section 3.1 unit of the indicated type, such as an Ethernet frame. Section 3.1
and Section 3.2 illustrate the Geneve frame format transported (for and Section 3.2 illustrate the Geneve packet format transported (for
example) over Ethernet along with an Ethernet payload. example) over Ethernet along with an Ethernet payload.
3.1. Geneve Frame Format Over IPv4 3.1. Geneve Packet Format Over IPv4
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Outer Ethernet Header: Outer Ethernet Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address | | Outer Destination MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address | Outer Source MAC Address | | Outer Destination MAC Address | Outer Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 10, line 46 skipping to change at page 10, line 46
| Original Ethernet Payload | | Original Ethernet Payload |
| | | |
| (Note that the original Ethernet Frame's FCS is not included) | | (Note that the original Ethernet Frame's FCS is not included) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Frame Check Sequence: Frame Check Sequence:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| New FCS (Frame Check Sequence) for Outer Ethernet Frame | | New FCS (Frame Check Sequence) for Outer Ethernet Frame |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3.2. Geneve Frame Format Over IPv6 3.2. Geneve Packet Format Over IPv6
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Outer Ethernet Header: Outer Ethernet Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address | | Outer Destination MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address | Outer Source MAC Address | | Outer Destination MAC Address | Outer Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Source MAC Address | | Outer Source MAC Address |
skipping to change at page 13, line 44 skipping to change at page 13, line 44
version number MUST treat them as UDP packets with an unknown version number MUST treat them as UDP packets with an unknown
payload. payload.
Opt Len (6 bits): The length of the options fields, expressed in Opt Len (6 bits): The length of the options fields, expressed in
four byte multiples, not including the eight byte fixed tunnel four byte multiples, not including the eight byte fixed tunnel
header. This results in a minimum total Geneve header size of 8 header. This results in a minimum total Geneve header size of 8
bytes and a maximum of 260 bytes. The start of the payload bytes and a maximum of 260 bytes. The start of the payload
headers can be found using this offset from the end of the base headers can be found using this offset from the end of the base
Geneve header. Geneve header.
O (1 bit): OAM frame. This packet contains a control message O (1 bit): OAM packet. This packet contains a control message
instead of a data payload. Endpoints MUST NOT forward the payload instead of a data payload. Endpoints MUST NOT forward the payload
and transit devices MUST NOT attempt to interpret or process it. and transit devices MUST NOT attempt to interpret or process it.
Since these are infrequent control messages, it is RECOMMENDED Since these are infrequent control messages, it is RECOMMENDED
that endpoints direct these packets to a high priority control that endpoints direct these packets to a high priority control
queue (for example, to direct the packet to a general purpose CPU queue (for example, to direct the packet to a general purpose CPU
from a forwarding ASIC or to separate out control traffic on a from a forwarding ASIC or to separate out control traffic on a
NIC). Transit devices MUST NOT alter forwarding behavior on the NIC). Transit devices MUST NOT alter forwarding behavior on the
basis of this bit, such as ECMP link selection. basis of this bit, such as ECMP link selection.
C (1 bit): Critical options present. One or more options has the C (1 bit): Critical options present. One or more options has the
critical bit set (see Section 3.5). If this bit is set then critical bit set (see Section 3.5). If this bit is set then
tunnel endpoints MUST parse the options list to interpret any tunnel endpoints MUST parse the options list to interpret any
critical options. On endpoints where option parsing is not critical options. On endpoints where option parsing is not
supported the frame MUST be dropped on the basis of the 'C' bit in supported the packet MUST be dropped on the basis of the 'C' bit
the base header. If the bit is not set tunnel endpoints MAY strip in the base header. If the bit is not set tunnel endpoints MAY
all options using 'Opt Len' and forward the decapsulated frame. strip all options using 'Opt Len' and forward the decapsulated
Transit devices MUST NOT drop or modify packets on the basis of packet. Transit devices MUST NOT drop or modify packets on the
this bit. basis of this bit.
Rsvd. (6 bits): Reserved field which MUST be zero on transmission Rsvd. (6 bits): Reserved field which MUST be zero on transmission
and ignored on receipt. and ignored on receipt.
Protocol Type (16 bits): The type of the protocol data unit Protocol Type (16 bits): The type of the protocol data unit
appearing after the Geneve header. This follows the EtherType appearing after the Geneve header. This follows the EtherType
[ETYPES] convention with Ethernet itself being represented by the [ETYPES] convention with Ethernet itself being represented by the
value 0x6558. value 0x6558.
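The field descriptions above can be sketched as a small parser for the first four bytes of the Geneve header. This is a minimal sketch, not a reference implementation: the function name and the returned dictionary keys are hypothetical, and the bit layout assumes a 2-bit version field ahead of 'Opt Len' with the fields packed in the order listed, most significant bit first.

```python
import struct

GENEVE_FIXED_HDR_LEN = 8          # bytes, the fixed tunnel header
PROTOCOL_TYPE_ETHERNET = 0x6558   # Ethernet payload, per Protocol Type


def parse_geneve_first_word(data: bytes) -> dict:
    """Decode Ver, Opt Len, O, C and Protocol Type (hypothetical helper)."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes of Geneve header")
    byte0, byte1, protocol_type = struct.unpack("!BBH", data[:4])
    # Opt Len counts four byte multiples and excludes the fixed header.
    options_len = (byte0 & 0x3F) * 4
    return {
        "version": byte0 >> 6,               # assumed 2-bit version field
        "options_len": options_len,          # bytes of options, 0..252
        "header_len": GENEVE_FIXED_HDR_LEN + options_len,  # 8..260 bytes
        "oam": bool(byte1 & 0x80),           # O bit
        "critical": bool(byte1 & 0x40),      # C bit
        "protocol_type": protocol_type,
    }
```

With 'Opt Len' at its maximum of 63, the options occupy 252 bytes and the total header length is 260 bytes, matching the maximum stated above. The payload starts at offset header_len from the beginning of the Geneve header.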
Virtual Network Identifier (VNI) (24 bits): An identifier for a Virtual Network Identifier (VNI) (24 bits): An identifier for a
skipping to change at page 15, line 28 skipping to change at page 15, line 28
addition, IANA will be requested to reserve specific ranges for addition, IANA will be requested to reserve specific ranges for
standardized and experimental options. standardized and experimental options.
Type (8 bits): Type indicating the format of the data contained in Type (8 bits): Type indicating the format of the data contained in
this option. Options are primarily designed to encourage future this option. Options are primarily designed to encourage future
extensibility and innovation and so standardized forms of these extensibility and innovation and so standardized forms of these
options will be defined in a separate document. options will be defined in a separate document.
The high order bit of the option type indicates that this is a The high order bit of the option type indicates that this is a
critical option. If the receiving endpoint does not recognize critical option. If the receiving endpoint does not recognize
this option and this bit is set then the frame MUST be dropped. this option and this bit is set then the packet MUST be dropped.
If the critical bit is set in any option then the 'C' bit in the If the critical bit is set in any option then the 'C' bit in the
Geneve base header MUST also be set. Transit devices MUST NOT Geneve base header MUST also be set. Transit devices MUST NOT
drop packets on the basis of this bit. The following figure shows drop packets on the basis of this bit. The following figure shows
the location of the 'C' bit in the 'Type' field: the location of the 'C' bit in the 'Type' field:
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|C| Type | |C| Type |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
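As a brief illustration of the figure above, the critical bit can be tested with a mask on the high order bit of the 8-bit 'Type' field. The names CRITICAL_BIT, is_critical, and c_flag_required are hypothetical and exist only for this sketch.

```python
CRITICAL_BIT = 0x80  # high order bit of the 8-bit option 'Type' field


def is_critical(option_type: int) -> bool:
    """Return True if the option type has its critical bit set."""
    return bool(option_type & CRITICAL_BIT)


def c_flag_required(option_types) -> bool:
    """Return True if the base header 'C' bit must be set.

    The 'C' bit in the base header is required whenever any option in
    the packet carries the critical bit.
    """
    return any(is_critical(t) for t in option_types)
```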
skipping to change at page 16, line 15 skipping to change at page 16, line 15
header are invalid and MUST be silently dropped if received by an header are invalid and MUST be silently dropped if received by an
endpoint. endpoint.
Variable Option Data: Option data interpreted according to 'Type'. Variable Option Data: Option data interpreted according to 'Type'.
3.5.1. Options Processing 3.5.1. Options Processing
Geneve options are primarily intended to be originated and processed Geneve options are primarily intended to be originated and processed
by tunnel endpoints. However, options MAY be processed by transit by tunnel endpoints. However, options MAY be processed by transit
devices along the tunnel path as well. Transit devices not devices along the tunnel path as well. Transit devices not
processing Geneve options SHOULD process Geneve frame as any other processing Geneve headers SHOULD process Geneve packets as any other
UDP frame and maintain consistent forwarding behavior. UDP packet and maintain consistent forwarding behavior.
In tunnel endpoints, the generation and interpretation of options is In tunnel endpoints, the generation and interpretation of options is
determined by the control plane, which is out of the scope of this determined by the control plane, which is out of the scope of this
document. However, to ensure interoperability between heterogeneous document. However, to ensure interoperability between heterogeneous
devices two requirements are imposed on endpoint devices: devices some requirements are imposed on options and the devices that
process them:
o Receiving endpoints MUST drop packets containing unknown options o Receiving endpoints MUST drop packets containing unknown options
with the 'C' bit set in the option type. with the 'C' bit set in the option type. Conversely, transit
devices MUST NOT drop packets as a result of encountering unknown
options, including those with the 'C' bit set.
o Sending endpoints MUST NOT assume that options will be processed o Some options may be defined in such a way that the position in the
sequentially by the receiver in the order they were transmitted. option list is significant. Therefore, options MUST NOT be
reordered by transit devices.
o An option MUST NOT affect the parsing or interpretation of any
other option.
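The drop rules for unknown critical options might be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the role strings, and the idea of identifying an option by its 'Type' value alone are hypothetical simplifications for this example.

```python
CRITICAL_BIT = 0x80  # high order bit of the option 'Type' field


def should_drop(role: str, option_types, known_types) -> bool:
    """Decide whether a device drops a packet because of its options.

    role is "endpoint" or "transit" (hypothetical labels).
    option_types is the sequence of 'Type' values found in the packet.
    known_types is the set of 'Type' values the device understands.
    """
    if role == "transit":
        # Transit devices never drop on the basis of unknown options,
        # including those with the critical bit set.
        return False
    # A receiving endpoint drops on any unknown option marked critical.
    return any((t & CRITICAL_BIT) and t not in known_types
               for t in option_types)
```

Unknown options without the critical bit do not cause a drop at either kind of device in this sketch.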
When designing a Geneve option, it is important to consider how the When designing a Geneve option, it is important to consider how the
option will evolve in the future. Once an option is defined it is option will evolve in the future. Once an option is defined it is
reasonable to expect that implementations may come to depend on a reasonable to expect that implementations may come to depend on a
specific behavior. As a result, the scope of any future changes must specific behavior. As a result, the scope of any future changes must
be carefully described upfront. be carefully described upfront.
Unexpectedly significant interoperability issues may result from Unexpectedly significant interoperability issues may result from
changing the length of an option that was defined to be a certain changing the length of an option that was defined to be a certain
size. A particular option is specified to have either a fixed size. A particular option is specified to have either a fixed
skipping to change at page 17, line 38 skipping to change at page 17, line 41
encapsulating endpoint with soft-state about the link that it may use encapsulating endpoint with soft-state about the link that it may use
to prevent or minimize fragmentation depending on its role in the to prevent or minimize fragmentation depending on its role in the
virtualized network. virtualized network.
Note that some implementations may not be capable of supporting Note that some implementations may not be capable of supporting
fragmentation or other less common features of the IP header, such as fragmentation or other less common features of the IP header, such as
options and extension headers. options and extension headers.
4.1.2. DSCP and ECN 4.1.2. DSCP and ECN
When encapsulating IP (including over Ethernet) frames in Geneve, When encapsulating IP (including over Ethernet) packets in Geneve,
there are several options for propagating DSCP and ECN bits from the there are several considerations for propagating DSCP and ECN bits
inner header to the tunnel on transmission and the reverse on from the inner header to the tunnel on transmission and the reverse
reception. on reception.
[RFC2983] lists considerations for mapping DSCP between inner and [RFC2983] provides guidance for mapping DSCP between inner and outer
outer IP headers. Network virtualization is typically more closely IP headers. Network virtualization is typically more closely aligned
aligned with the Pipe model described, where the DSCP value on the with the Pipe model described, where the DSCP value on the tunnel
tunnel header is set based on a policy (which may be a fixed value, header is set based on a policy (which may be a fixed value, one
one based on the inner traffic class, or some other mechanism for based on the inner traffic class, or some other mechanism for
grouping traffic). Aspects of the Uniform model (which treats the grouping traffic). Aspects of the Uniform model (which treats the
inner and outer DSCP value as a single field by copying on ingress inner and outer DSCP value as a single field by copying on ingress
and egress) may also apply, such as the ability to remark the inner and egress) may also apply, such as the ability to remark the inner
header on tunnel egress based on transit marking. However, the header on tunnel egress based on transit marking. However, the
Uniform model is not conceptually consistent with network Uniform model is not conceptually consistent with network
virtualization, which seeks to provide strong isolation between virtualization, which seeks to provide strong isolation between
encapsulated traffic and the physical network. encapsulated traffic and the physical network.
[RFC6040] describes the mechanism for exposing ECN capabilities on IP [RFC6040] describes the mechanism for exposing ECN capabilities on IP
tunnels and propagating congestion markers to the inner packets. tunnels and propagating congestion markers to the inner packets.
This behavior SHOULD be followed for IP packets encapsulated in This behavior MUST be followed for IP packets encapsulated in Geneve.
Geneve.
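For illustration, the ECN handling on decapsulation might look like the sketch below. It reflects one reading of the decapsulation rules in [RFC6040] and is not a substitute for that document; the function name and the use of None to signal a drop are assumptions of this example.

```python
# Two-bit ECN codepoints as carried in the IP header.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def decap_ecn(inner: int, outer: int):
    """Return the ECN value to forward in the inner header, or None to drop.

    inner and outer are the ECN codepoints of the arriving inner and
    outer IP headers (hypothetical helper).
    """
    if inner == NOT_ECT:
        # A congestion mark cannot be delivered to a transport that is
        # not ECN-capable, so the packet is dropped in that case.
        return None if outer == CE else NOT_ECT
    if outer == CE or inner == CE:
        # Congestion experienced on either header is propagated.
        return CE
    if outer == ECT1:
        return ECT1
    # Outer is Not-ECT or ECT(0): keep the inner marking.
    return inner
```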
4.1.3. Broadcast and Multicast 4.1.3. Broadcast and Multicast
Geneve tunnels may either be point-to-point unicast between two Geneve tunnels may either be point-to-point unicast between two
endpoints or may utilize broadcast or multicast addressing. It is endpoints or may utilize broadcast or multicast addressing. It is
not required that inner and outer addressing match in this respect. not required that inner and outer addressing match in this respect.
For example, in physical networks that do not support multicast, For example, in physical networks that do not support multicast,
encapsulated multicast traffic may be replicated into multiple encapsulated multicast traffic may be replicated into multiple
unicast tunnels or forwarded by policy to a unicast location unicast tunnels or forwarded by policy to a unicast location
(possibly to be replicated there). (possibly to be replicated there).
skipping to change at page 19, line 45 skipping to change at page 19, line 45
o When performing LSO, a NIC MUST replicate the entire Geneve header o When performing LSO, a NIC MUST replicate the entire Geneve header
and all options, including those unknown to the device, onto each and all options, including those unknown to the device, onto each
resulting segment. However, a given option definition may resulting segment. However, a given option definition may
override this rule and specify different behavior in supporting override this rule and specify different behavior in supporting
devices. Conversely, when performing LRO, a NIC MAY assume that a devices. Conversely, when performing LRO, a NIC MAY assume that a
binary comparison of the options (including unknown options) is binary comparison of the options (including unknown options) is
sufficient to ensure equality and MAY merge packets with equal sufficient to ensure equality and MAY merge packets with equal
Geneve headers. Geneve headers.
o Option ordering is not significant and packets with the same o Options MUST NOT be reordered during the course of offload
options in a different order MAY be processed alike. processing, including when merging packets for the purpose of LRO.
o NICs performing offloads MUST NOT drop packets with unknown o NICs performing offloads MUST NOT drop packets with unknown
options, including those marked as critical. options, including those marked as critical.
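The LSO and LRO rules above can be sketched as follows. This is a simplified illustration, not a NIC implementation: the function names are hypothetical, the payload is treated as opaque bytes, and the inner and outer header fix-ups that real segmentation performs (lengths, checksums, TCP sequence numbers) are left out. It assumes the Geneve base header layout, in which the low six bits of the first byte give the options length in 4-byte units after an 8-byte fixed header.

```python
from typing import List

def geneve_header_len(packet: bytes) -> int:
    """Length of the 8-byte fixed header plus all options."""
    return 8 + (packet[0] & 0x3F) * 4

def lso_segment(packet: bytes, mss: int) -> List[bytes]:
    """Split the payload and copy the whole Geneve header onto each segment.

    Options are copied as raw bytes, so unknown options are replicated
    unchanged and in their original order.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    hlen = geneve_header_len(packet)
    header, payload = packet[:hlen], packet[hlen:]
    return [header + payload[i:i + mss] for i in range(0, len(payload), mss)]

def lro_can_merge(a: bytes, b: bytes) -> bool:
    """Binary comparison of the Geneve headers, including unknown options."""
    return a[:geneve_header_len(a)] == b[:geneve_header_len(b)]
```

Because the comparison is byte-for-byte, two packets carrying the same options in a different order are not merged in this sketch. That is a conservative choice that is compatible with both columns of the diff.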
There is no requirement that a given implementation of Geneve employ There is no requirement that a given implementation of Geneve employ
the offloads listed as examples above. However, as these offloads the offloads listed as examples above. However, as these offloads
are currently widely deployed in commercially available NICs, the are currently widely deployed in commercially available NICs, the
rules described here are intended to enable efficient handling of rules described here are intended to enable efficient handling of
current and future options across a variety of devices. current and future options across a variety of devices.
skipping to change at page 20, line 28 skipping to change at page 20, line 28
security or implementation considerations. However, in other cases security or implementation considerations. However, in other cases
trunking of VLAN frames across a Geneve tunnel can prove useful. As trunking of VLAN frames across a Geneve tunnel can prove useful. As
a result, the processing of inner VLAN tags upon ingress or egress a result, the processing of inner VLAN tags upon ingress or egress
from a tunnel endpoint is based upon the configuration of the from a tunnel endpoint is based upon the configuration of the
endpoint and/or control plane and not explicitly defined as part of endpoint and/or control plane and not explicitly defined as part of
the data format. the data format.
5. Interoperability Issues 5. Interoperability Issues
Viewed exclusively from the data plane, Geneve does not introduce any Viewed exclusively from the data plane, Geneve does not introduce any
interoperability issues as it appears to most devices as UDP frames. interoperability issues as it appears to most devices as UDP packets.
However, as there are already a number of tunnel protocols deployed However, as there are already a number of tunnel protocols deployed
in network virtualization environments, there is a practical question in network virtualization environments, there is a practical question
of transition and coexistence. of transition and coexistence.
Since Geneve is a superset of the functionality of the three most Since Geneve is a superset of the functionality of the three most
common protocols used for network virtualization (VXLAN, NVGRE, and common protocols used for network virtualization (VXLAN, NVGRE, and
STT) it should be straightforward to port an existing control plane STT) it should be straightforward to port an existing control plane
to run on top of it with minimal effort. With both the old and new to run on top of it with minimal effort. With both the old and new
frame formats supporting the same set of capabilities, there is no packet formats supporting the same set of capabilities, there is no
need for a hard transition - endpoints directly communicating with need for a hard transition - endpoints directly communicating with
each other use any common protocol, which may be different even each other use any common protocol, which may be different even
within a single overall system. As transit devices are primarily within a single overall system. As transit devices are primarily
forwarding frames on the basis of the IP header, all protocols appear forwarding packets on the basis of the IP header, all protocols
similar and these devices do not introduce additional appear similar and these devices do not introduce additional
interoperability concerns. interoperability concerns.
To assist with this transition, it is strongly suggested that To assist with this transition, it is strongly suggested that
implementations support simultaneous operation of both Geneve and implementations support simultaneous operation of both Geneve and
existing tunnel protocols as it is expected to be common for a single existing tunnel protocols as it is expected to be common for a single
node to communicate with a mixture of other nodes. Eventually, older node to communicate with a mixture of other nodes. Eventually, older
protocols may be phased out as they are no longer in use. protocols may be phased out as they are no longer in use.
6. Security Considerations 6. Security Considerations
As UDP/IP packets, Geneve does not have any inherent security As UDP/IP packets, Geneve does not have any inherent security
mechanisms. As a result, an attacker with access to the underlay mechanisms. As a result, an attacker with access to the underlay
network transporting the IP frames has the ability to snoop or inject network transporting the IP packets has the ability to snoop or
packets. Legitimate but malicious tunnel endpoints may also spoof inject packets. Legitimate but malicious tunnel endpoints may also
identifiers in the tunnel header to gain access to networks owned by spoof identifiers in the tunnel header to gain access to networks
other tenants. owned by other tenants.
Within a particular security domain, such as a data center operated Within a particular security domain, such as a data center operated
by a single provider, the most common and highest performing security by a single provider, the most common and highest performing security
mechanism is isolation of trusted components. Tunnel traffic can be mechanism is isolation of trusted components. Tunnel traffic can be
carried over a separate VLAN and filtered at any untrusted carried over a separate VLAN and filtered at any untrusted
boundaries. In addition, tunnel endpoints should only be operated in boundaries. In addition, tunnel endpoints should only be operated in
environments controlled by the service provider, such as the environments controlled by the service provider, such as the
hypervisor itself rather than within a customer VM. hypervisor itself rather than within a customer VM.
When crossing an untrusted link, such as the public Internet, IPsec When crossing an untrusted link, such as the public Internet, IPsec
skipping to change at page 21, line 50 skipping to change at page 21, line 50
Assignee: Jesse Gross <jgross@vmware.com> Assignee: Jesse Gross <jgross@vmware.com>
Contact: Jesse Gross <jgross@vmware.com> Contact: Jesse Gross <jgross@vmware.com>
Description: Generic Network Virtualization Encapsulation (Geneve) Description: Generic Network Virtualization Encapsulation (Geneve)
Reference: This document Reference: This document
Port Number: 6081 Port Number: 6081
In addition, IANA is requested to create a "Geneve Option Class" In addition, IANA is requested to create a "Geneve Option Class"
registry to allocate Option Classes. This shall be a registry of registry to allocate Option Classes. This shall be a registry of
16-bit hexadecimal values along with descriptive strings. The 16-bit hexadecimal values along with descriptive strings. The
identifiers 0x0-0xFF are to be reserved for standardized options for identifiers 0x0-0xFF are to be reserved for standardized options for
allocation by IETF Review [RFC5226] and 0xFFFF for Experimental Use. allocation by IETF Review [RFC5226] and 0xFFF0-0xFFFF for
Otherwise, identifiers are to be assigned to any organization with an Experimental Use. Otherwise, identifiers are to be assigned to any
interest in creating Geneve options on a First Come First Served organization with an interest in creating Geneve options on a First
basis. The registry is to be populated with the following initial Come First Served basis. The registry is to be populated with the
values: following initial values:
+----------------+--------------------------------------+ +----------------+--------------------------------------+
| Option Class | Description | | Option Class | Description |
+----------------+--------------------------------------+ +----------------+--------------------------------------+
| 0x0000..0x00FF | Unassigned - IETF Review | | 0x0000..0x00FF | Unassigned - IETF Review |
| 0x0100 | Linux | | 0x0100 | Linux |
| 0x0101 | Open vSwitch | | 0x0101 | Open vSwitch |
| 0x0102 | Open Virtual Networking (OVN) | | 0x0102 | Open Virtual Networking (OVN) |
| 0x0103..0xFFFE | Unassigned - First Come First Served | | 0x0103 | In-band Network Telemetry (INT) |
| 0xFFFF | Experimental | | 0x0104 | VMware |
| 0x0105..0xFFEF | Unassigned - First Come First Served |
| 0xFFF0..0xFFFF | Experimental |
+----------------+--------------------------------------+ +----------------+--------------------------------------+
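The allocation ranges requested above can be expressed as a small lookup. This sketch follows the right-hand (-02) column of the diff, where 0xFFF0-0xFFFF is Experimental Use. The function and dictionary names are hypothetical and exist only for illustration.

```python
# Initial assignments as listed in the -02 column of the table.
INITIAL_OPTION_CLASSES = {
    0x0100: "Linux",
    0x0101: "Open vSwitch",
    0x0102: "Open Virtual Networking (OVN)",
    0x0103: "In-band Network Telemetry (INT)",
    0x0104: "VMware",
}

def option_class_policy(option_class: int) -> str:
    """Return the allocation policy covering a 16-bit Option Class."""
    if not 0 <= option_class <= 0xFFFF:
        raise ValueError("Option Class must fit in 16 bits")
    if option_class <= 0x00FF:
        return "IETF Review"
    if option_class >= 0xFFF0:
        return "Experimental Use"
    return "First Come First Served"
```

Under the -01 column only 0xFFFF was Experimental, so a value such as 0xFFF0 would have fallen in the First Come First Served range there.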
8. Contributors 8. Contributors
The following individuals were authors of an earlier version of this The following individuals were authors of an earlier version of this
document and made significant contributions: document and made significant contributions:
T. Sridhar T. Sridhar
VMware, Inc. VMware, Inc.
3401 Hillview Ave. 3401 Hillview Ave.
skipping to change at page 23, line 36 skipping to change at page 23, line 37
Jon Hudson Jon Hudson
Brocade Communications Systems, Inc. Brocade Communications Systems, Inc.
130 Holger Way 130 Holger Way
San Jose, CA 95134 San Jose, CA 95134
USA USA
Email: jon.hudson@gmail.com Email: jon.hudson@gmail.com
Ariel Hendel Ariel Hendel
Broadcom Corporation Broadcom Limited
3151 Zanker Road 3151 Zanker Road
San Jose, CA 95134 San Jose, CA 95134
USA USA
Email: ahendel@broadcom.com Email: ariel.hendel@broadcom.com
9. Acknowledgements 9. Acknowledgements
The authors wish to thank Martin Casado, Bruce Davie and Dave Thaler The authors wish to thank Martin Casado, Bruce Davie and Dave Thaler
for their input, feedback, and helpful suggestions. for their input, feedback, and helpful suggestions.
10. References 10. References
10.1. Normative References 10.1. Normative References
skipping to change at page 24, line 32 skipping to change at page 24, line 37
10.2. Informative References 10.2. Informative References
[ETYPES] The IEEE Registration Authority, "IEEE 802 Numbers", 2013, [ETYPES] The IEEE Registration Authority, "IEEE 802 Numbers", 2013,
<http://www.iana.org/assignments/ieee-802-numbers/ <http://www.iana.org/assignments/ieee-802-numbers/
ieee-802-numbers.xml>. ieee-802-numbers.xml>.
[I-D.davie-stt] [I-D.davie-stt]
Davie, B. and J. Gross, "A Stateless Transport Tunneling Davie, B. and J. Gross, "A Stateless Transport Tunneling
Protocol for Network Virtualization (STT)", draft-davie- Protocol for Network Virtualization (STT)", draft-davie-
stt-06 (work in progress), April 2014. stt-08 (work in progress), April 2016.
[I-D.ietf-nvo3-dataplane-requirements] [I-D.ietf-nvo3-dataplane-requirements]
Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L., Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L.,
and B. Khasnabish, "NVO3 Data Plane Requirements", draft- and B. Khasnabish, "NVO3 Data Plane Requirements", draft-
ietf-nvo3-dataplane-requirements-03 (work in progress), ietf-nvo3-dataplane-requirements-03 (work in progress),
April 2014. April 2014.
[IEEE.802.1Q-2014] [IEEE.802.1Q-2014]
IEEE, "IEEE Standard for Local and metropolitan area IEEE, "IEEE Standard for Local and metropolitan area
networks -- Bridges and Bridged Networks", IEEE networks -- Bridges and Bridged Networks", IEEE
 End of changes. 33 change blocks. 
59 lines changed or deleted 67 lines changed or added
