--- 1/draft-ietf-intarea-gue-07.txt 2019-10-04 16:13:13.823094901 -0700 +++ 2/draft-ietf-intarea-gue-08.txt 2019-10-04 16:13:13.903096916 -0700 @@ -1,21 +1,21 @@ Internet Area WG T. Herbert Internet-Draft Quantonium Intended status: Standard track L. Yong -Expires September 8, 2019 Independent +Expires April 6, 2020 Independent O. Zia Microsoft - March 7, 2019 + October 4, 2019 Generic UDP Encapsulation - draft-ietf-intarea-gue-07 + draft-ietf-intarea-gue-08 Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. @@ -24,21 +24,21 @@ and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html - This Internet-Draft will expire on September 8, 2019. + This Internet-Draft will expire on April 6, 2020. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -69,140 +69,142 @@ packets of various IP protocols. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1. Applicability . . . . . . . . . . . . . . . . . . . . . . . 5 1.2. Terminology and acronyms . . . . . . . . . . . . . . . . . 6 1.3. Requirements Language . . . . . . . . . . . . . . . . . . . 7 2. 
Base packet format . . . . . . . . . . . . . . . . . . . . . . 8 2.1. GUE variant . . . . . . . . . . . . . . . . . . . . . . . . 8 - 3. Variant 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 + 3. Variant 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1. Header format . . . . . . . . . . . . . . . . . . . . . . . 9 3.2. Proto/ctype field . . . . . . . . . . . . . . . . . . . . . 10 3.2.1. Proto field . . . . . . . . . . . . . . . . . . . . . . 10 - 3.2.2. Ctype field . . . . . . . . . . . . . . . . . . . . . . 11 + 3.2.2. Ctype field . . . . . . . . . . . . . . . . . . . . . . 10 3.3. Flags and extension fields . . . . . . . . . . . . . . . . 11 3.3.1. Requirements . . . . . . . . . . . . . . . . . . . . . 11 3.3.2. Example GUE header with extension fields . . . . . . . 12 - 3.4. Private data . . . . . . . . . . . . . . . . . . . . . . . 13 + 3.4. Surplus space . . . . . . . . . . . . . . . . . . . . . . . 12 3.5. Message types . . . . . . . . . . . . . . . . . . . . . . . 13 3.5.1. Control messages . . . . . . . . . . . . . . . . . . . 13 - 3.5.2. Data messages . . . . . . . . . . . . . . . . . . . . . 14 - 4. Variant 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 - 4.1. Direct encapsulation of IPv4 . . . . . . . . . . . . . . . 15 - 4.2. Direct encapsulation of IPv6 . . . . . . . . . . . . . . . 16 - 5. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 - 5.1. Network tunnel encapsulation . . . . . . . . . . . . . . . 17 - 5.2. Transport layer encapsulation . . . . . . . . . . . . . . . 17 - 5.3. Encapsulator operation . . . . . . . . . . . . . . . . . . 18 - 5.4. Decapsulator operation . . . . . . . . . . . . . . . . . . 18 - 5.4.1. Processing a received data message . . . . . . . . . . 18 - 5.4.2. Processing a received control message . . . . . . . . . 19 - 5.5. Middlebox inspection . . . . . . . . . . . . . . . . . . . 19 - 5.6. Router and switch operation . . . . . . . . . . . . . . . . 20 - 5.6.1. 
Connection semantics . . . . . . . . . . . . . . . . . 20 - 5.6.2. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . 21 - 5.7. Checksum Handling . . . . . . . . . . . . . . . . . . . . . 21 - 5.7.1. Requirements . . . . . . . . . . . . . . . . . . . . . 21 - 5.7.2. UDP Checksum with IPv4 . . . . . . . . . . . . . . . . 21 - 5.7.3. UDP Checksum with IPv6 . . . . . . . . . . . . . . . . 22 - 5.8. MTU and fragmentation . . . . . . . . . . . . . . . . . . . 22 - 5.9. Congestion control . . . . . . . . . . . . . . . . . . . . 23 - 5.10. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 23 - 5.11. Flow entropy for ECMP . . . . . . . . . . . . . . . . . . 23 - 5.11.1. Flow classification . . . . . . . . . . . . . . . . . 24 - 5.11.2. Flow entropy properties . . . . . . . . . . . . . . . 24 - 5.12. Negotiation of acceptable flags and extension fields . . . 25 - 6. Motivation for GUE . . . . . . . . . . . . . . . . . . . . . . 26 - 6.1. Benefits of GUE . . . . . . . . . . . . . . . . . . . . . . 26 - 6.2. Comparison of GUE to other encapsulations . . . . . . . . . 26 - 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 28 - 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 28 - 8.1. UDP source port . . . . . . . . . . . . . . . . . . . . . . 28 - 8.2. GUE variant number . . . . . . . . . . . . . . . . . . . . 29 - 8.3. Control types . . . . . . . . . . . . . . . . . . . . . . . 29 - 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 29 - 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30 - 10.1. Normative References . . . . . . . . . . . . . . . . . . . 30 - 10.2. Informative References . . . . . . . . . . . . . . . . . . 31 - Appendix A: NIC processing for GUE . . . . . . . . . . . . . . . . 34 - A.1. Receive multi-queue . . . . . . . . . . . . . . . . . . . . 34 - A.2. Checksum offload . . . . . . . . . . . . . . . . . . . . . 34 - A.2.1. Transmit checksum offload . . . . . . . . . . . . . . 
. 35 - A.2.2. Receive checksum offload . . . . . . . . . . . . . . . 35 - A.3. Transmit Segmentation Offload . . . . . . . . . . . . . . . 36 - A.4. Large Receive Offload . . . . . . . . . . . . . . . . . . . 37 - Appendix B: Implementation considerations . . . . . . . . . . . . 37 - B.1. Priveleged ports . . . . . . . . . . . . . . . . . . . . . 37 - B.2. Setting flow entropy as a route selector . . . . . . . . . 38 - B.3. Hardware protocol implementation considerations . . . . . . 38 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 39 + 3.5.2. Data messages . . . . . . . . . . . . . . . . . . . . . 13 + 4. Variant 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 + 4.1. Direct encapsulation of IPv4 . . . . . . . . . . . . . . . 14 + 4.2. Direct encapsulation of IPv6 . . . . . . . . . . . . . . . 15 + 5. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 + 5.1. Network tunnel encapsulation . . . . . . . . . . . . . . . 16 + 5.2. Transport layer encapsulation . . . . . . . . . . . . . . . 16 + 5.3. Encapsulator operation . . . . . . . . . . . . . . . . . . 17 + 5.4. Decapsulator operation . . . . . . . . . . . . . . . . . . 17 + 5.4.1. Processing a received data message . . . . . . . . . . 17 + 5.4.2. Processing a received control message . . . . . . . . . 18 + 5.5. Middlebox inspection . . . . . . . . . . . . . . . . . . . 18 + 5.6. Router and switch operation . . . . . . . . . . . . . . . . 19 + 5.6.1. Connection semantics . . . . . . . . . . . . . . . . . 19 + 5.6.2. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . 20 + 5.7. MTU and fragmentation . . . . . . . . . . . . . . . . . . . 20 + 5.8. UDP Checksum Handling . . . . . . . . . . . . . . . . . . . 20 + 5.8.1. UDP Checksum with IPv4 . . . . . . . . . . . . . . . . 20 + 5.8.2. UDP Checksum with IPv6 . . . . . . . . . . . . . . . . 21 + 5.9. Congestion Considerations . . . . . . . . . . . . . . . . . 24 + 5.9.1. GUE tunnels . . . . . . . . . . . . . . . . . . 
. . . . 24 + 5.9.2 Transport layer encapsulation . . . . . . . . . . . . . 25 + 5.10. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 25 + 5.11. Flow entropy for ECMP . . . . . . . . . . . . . . . . . . 25 + 5.11.1. Flow classification . . . . . . . . . . . . . . . . . 25 + 5.11.2. Flow entropy properties . . . . . . . . . . . . . . . 26 + 5.12. Negotiation of acceptable flags and extension fields . . . 27 + 6. Motivation for GUE . . . . . . . . . . . . . . . . . . . . . . 27 + 6.1. Benefits of GUE . . . . . . . . . . . . . . . . . . . . . . 27 + 6.2. Comparison of GUE to other encapsulations . . . . . . . . . 28 + 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 30 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 30 + 8.1. UDP source port . . . . . . . . . . . . . . . . . . . . . . 30 + 8.2. GUE variant number . . . . . . . . . . . . . . . . . . . . 31 + 8.3. Control types . . . . . . . . . . . . . . . . . . . . . . . 31 + 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 31 + 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32 + 10.1. Normative References . . . . . . . . . . . . . . . . . . . 32 + 10.2. Informative References . . . . . . . . . . . . . . . . . . 33 + Appendix A: NIC processing for GUE . . . . . . . . . . . . . . . . 36 + A.1. Receive multi-queue . . . . . . . . . . . . . . . . . . . . 36 + A.2. Checksum offload . . . . . . . . . . . . . . . . . . . . . 36 + A.2.1. Transmit checksum offload . . . . . . . . . . . . . . . 37 + A.2.2. Receive checksum offload . . . . . . . . . . . . . . . 37 + A.3. Transmit Segmentation Offload . . . . . . . . . . . . . . . 38 + A.4. Large Receive Offload . . . . . . . . . . . . . . . . . . . 39 + Appendix B: Implementation considerations . . . . . . . . . . . . 39 + B.1. Priveleged ports . . . . . . . . . . . . . . . . . . . . . 39 + B.2. Setting flow entropy as a route selector . . . . . . . . . 40 + B.3. 
Hardware protocol implementation considerations . . . . . . 40 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 41 1. Introduction This specification describes Generic UDP Encapsulation (GUE) which is a general method for encapsulating packets of arbitrary IP protocols within User Datagram Protocol (UDP) [RFC0768] packets. Encapsulating packets in UDP facilitates efficient transport across networks. Networking devices widely provide protocol specific processing and optimizations for UDP (as well as TCP) packets. Packets for atypical IP protocols (those not usually parsed by networking hardware) can be encapsulated in UDP packets to maximize deliverability and to leverage flow specific mechanisms for routing and packet steering. GUE provides an extensible header format for including optional data in the encapsulation header. This data potentially covers items such as a virtual networking identifier, security data for validating or - authenticating the GUE header, congestion control data, etc. GUE also - allows private optional data in the encapsulation header. This - feature can be used by a site or implementation to define local - custom optional data, and allows experimentation of options that may - eventually become standard. + authenticating the GUE header, congestion control data, etc. This document does not define any specific GUE extensions. [GUEEXTEN] specifies a set of initial extensions. 1.1. Applicability GUE is a network encapsulation protocol that encapsulates packets for various IP protocols. Potential use cases include network tunneling, multi-tenant network virtualization, tunneling for mobility, and transport layer encapsulation. GUE is intended for deploying overlay networks in public or private data center environments, as well as providing a general tunneling mechanism usable in the Internet. GUE is a UDP based encapsulation protocol transported over existing IPv4 and IPv6 networks. 
Hence, as a UDP based protocol, GUE adheres to the UDP usage guidelines as specified in [RFC8085]. Applicability of these guidelines is dependent on the underlay IP network and the nature of the GUE payload protocol (for example TCP/IP or IP/Ethernet). + GUE may also be used to create IP tunnels, hence the guidelines in + [IPTUN] are applicable. - [RFC8085] outlines two applicability scenarios for UDP applications, - 1) general Internet and 2) controlled environment. GUE is intended to - allow deployment in both controlled environments and in the - uncontrolled Internet. The requirements of [RFC8085] pertaining to - deployment of a UDP encapsulation protocol in these environments are - applicable. Section 5 provides the specifics for satisfying - requirements of [RFC8085]. It is the responsibility of the operator - deploying GUE to ensure that the necessary operational requirements - are met for the environment in which GUE is being deployed. + [RFC8085] outlines two applicability scenarios for UDP applications: + (1) general Internet and (2) a traffic-managed controlled environment + (TMCE). The requirements of [RFC8085] pertaining to deployment of a + UDP encapsulation protocol in these environments are applicable. + Section 5 provides the specifics for satisfying requirements of + [RFC8085]. It is the responsibility of the operator deploying GUE to + ensure that the necessary operational requirements are met for the + environment in which GUE is being deployed. GUE has much of the same applicability and benefits as GRE-in-UDP [RFC8086] that are afforded by UDP encapsulation protocols. GUE offers the possibility of good performance for load-balancing encapsulated IP traffic in transit networks using existing Equal-Cost Multipath (ECMP) mechanisms that use a hash of the five-tuple of source IP address, destination IP address, UDP/TCP source port, UDP/TCP destination port, and protocol number. 
Encapsulating packets in UDP enables use of the UDP source port to provide entropy to ECMP - hashing. + hashing. A material difference between GUE and GRE-in-UDP is that the + payload of GUE is always an IP protocol whereas the payload in GRE- + in-UDP may be a non-IP protocol; this distinction is pertinent in the + discussion of congestion considerations (section 5.9) since IP + protocols are generally assumed to be congestion controlled. In addition, GUE enables extending the use of atypical IP protocols (those other than TCP and UDP) across networks that might otherwise filter packets carrying those protocols. GUE may also be used with connection oriented UDP semantics in order to facilitate traversal through stateful firewalls and stateful NAT. Additional motivation for the GUE protocol is provided in section 6. 1.2. Terminology and acronyms @@ -226,48 +228,48 @@ Data message An encapsulated packet in a GUE payload that is addressed to the protocol stack for an associated protocol Control message A formatted message in the GUE payload that is implicitly addressed to the decapsulator to monitor or control the state or behavior of a tunnel Flags A set of bit flags in the primary GUE header - Extension field - An optional field in a GUE header whose presence is + Extension field An optional field in a GUE header whose presence is indicated by corresponding flag(s) C-bit A single bit flag in the primary GUE header that indicates whether the GUE packet contains a control message or data message Hlen A field in the primary GUE header that gives the length of the GUE header Proto/ctype A field in the GUE header that holds either the IP protocol number for a data message or a type for a control message - Private data Optional data in the GUE header that can be used for - private purposes - Outer IP header Refers to the outer most IP header or packet when encapsulating a packet over IP Inner IP header Refers to an encapsulated IP header when an IP packet is encapsulated 
Outer packet Refers to an encapsulating packet Inner packet Refers to a packet that is encapsulated + TMCE A traffic-managed controlled environment, i.e., an + IP network that is traffic-engineered and/or + otherwise managed + 1.3. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 2. Base packet format A GUE packet is comprised of a UDP packet whose payload is a GUE header followed by a payload which is either an encapsulated packet @@ -284,21 +286,21 @@ | GUE Header | | | |-------------------------------| | | | Encapsulated packet | | or control message | | | +-------------------------------+ The GUE header is variable length as determined by the presence of - optional extension fields and private data. + optional extension fields. 2.1. GUE variant The first two bits of the GUE header contain the GUE protocol variant number. The variant number can indicate the version of the GUE protocol as well as alternate forms of a version. Variants 0 and 1 are described in this specification; variants 2 and 3 are reserved. 
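The variant field described in section 2.1 can be illustrated with a short sketch. This is a minimal example, assuming the caller already holds the UDP payload as bytes; the function name gue_variant is hypothetical and does not appear in the draft.

```python
def gue_variant(udp_payload: bytes) -> int:
    """Return the GUE variant number (0-3) carried in the first two bits."""
    if len(udp_payload) < 1:
        raise ValueError("empty UDP payload: no GUE variant field")
    # The variant occupies the two most significant bits of the first octet.
    return udp_payload[0] >> 6


# A first octet of 0x45 (a typical IPv4 header start) has top bits 01,
# so it reads as variant 1 (direct IP encapsulation).
assert gue_variant(bytes([0x45])) == 1
# A first octet of 0x01 has top bits 00, so it reads as variant 0.
assert gue_variant(bytes([0x01])) == 0
```

A decapsulator would typically branch on this value before attempting to parse either a variant 0 GUE header or a directly encapsulated IP packet.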
@@ -313,26 +315,22 @@ 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | Source port | Destination port | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP | Length | Checksum | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ | 0 |C| Hlen | Proto/ctype | Flags |\ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | - | | | - ~ Extensions Fields (optional) ~ | | | GUE - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | - | | | - ~ Private data (optional) ~ | + ~ Extensions Fields (optional) ~ | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ The contents of the UDP header are: o Source port: If connection semantics (section 5.6.1) are applied to an encapsulation, this is set to the local source port for the connection. When connection semantics are not applied, the source port is either set to a flow entropy value, as described in section 5.11, or is set to the GUE assigned port number, @@ -340,21 +338,21 @@ o Destination port: If connection semantics (section 5.6.1) are applied to an encapsulation, this is set to the destination port for the tuple. If connection semantics are not applied then the destination port is set to the GUE assigned port number, 6080. o Length: Canonical length of the UDP packet (length of UDP header and payload). o Checksum: Standard UDP checksum (handling is described in - section 5.7). + section 5.8). The GUE header consists of: o Variant: 0 indicates GUE protocol version 0 with a header. o C: C-bit: When set indicates a control message. When not set indicates a data message. o Hlen: Length in 32-bit words of the GUE header, including optional extension fields but not the first four bytes of the @@ -369,27 +367,20 @@ control message or encapsulated packet begins at the offset provided by Hlen. 
o Flags: Header flags that may be allocated for various purposes and may indicate the presence of extension fields. Undefined header flag bits MUST be set to zero on transmission. o Extension Fields: Optional fields whose presence is indicated by corresponding flags. - o Private data: Optional private data block (see section 3.4). If - the private block is present, it immediately follows that last - extension field present in the header. The private block is - considered to be part of the GUE header. The length of this data - is determined by subtracting the starting offset of the private - data from the header length. - 3.2. Proto/ctype field The proto/ctype field contains either an Internet protocol number (when the C-bit is not set) or a GUE control message type (when the C-bit is set). 3.2.1. Proto field When the C-bit is not set, the proto/ctype field MUST contain an IANA Internet Protocol Number [IANA-PN]. The protocol number is @@ -410,62 +401,60 @@ When the C-bit is set, the proto/ctype field MUST be set to a valid control message type. A value of zero indicates that the GUE payload requires further interpretation to deduce the control type. This might be the case when the payload is a fragment of a control message, where only the reassembled packet can be interpreted as a control message. Control messages will be defined in an IANA registry. Control message types 1 through 127 may be defined in standards. Types 128 through - 255 are reserved to be user defined for experimentation or private - control messages. + 255 are reserved to be user defined for experimentation. This document does not specify any standard control message types - other than type 0. 
Type 0 indicates that the GUE payload is a control message, or part of a control message (as might be the case in GUE - fragmentation), that cannot be correctly parsed or interpreted - without additional context. + fragmentation) that cannot be correctly parsed or interpreted without + additional context. 3.3. Flags and extension fields Flags and associated extension fields are the primary mechanism of extensibility in GUE. As mentioned in section 3.1, GUE header flags indicate the presence of optional extension fields in the GUE header. [GUEEXTEN] defines an initial set of GUE extensions. 3.3.1. Requirements There are sixteen flag bits in the GUE header. Flags may indicate presence of extension fields. The size of an extension field indicated by a flag MUST be fixed in the specification of the flag. - Flags can be paired together to allow different lengths for an - extension field. For example, if two flag bits are paired, a field + Flags can be grouped together to allow different lengths for an + extension field. For example, if two flag bits are grouped, a field can possibly be three different lengths-- that is bit value of 00 indicates no field present; 01, 10, and 11 indicate three possible - lengths for the field. Regardless of how flag bits are paired, the + lengths for the field. Regardless of how flag bits are grouped, the lengths and offsets of extension fields corresponding to a set of flags MUST be well defined and deterministic. Extension fields are placed in order of the flags. New flags are to be allocated from high to low order bit contiguously without holes. Flags allow random access, for instance to inspect the field corresponding to the Nth flag bit, an implementation only considers the previous N-1 flags to determine the offset. Flags after the Nth flag are not pertinent in calculating the offset of the field for the Nth flag. 
Random access of flags and fields permits processing of optional extensions in an order that is independent of their position in the packet. - Flags (or paired flags) are idempotent such that new flags MUST NOT + Flags (or grouped flags) are idempotent such that new flags MUST NOT cause reinterpretation of old flags. Also, new flags MUST NOT alter interpretation of other elements in the GUE header nor how the message is parsed (for instance, in a data message the proto/ctype field always holds an IP protocol number as an invariant). The set of available flags can be extended in the future by defining a "flag extensions bit" that refers to a field containing a new set of flags. 3.3.2. Example GUE header with extension fields @@ -481,57 +470,38 @@ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Group Identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + Security + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ In the above example, the first flag bit is set which indicates that the Group Identifier extension is present which is a 32 bit field. - The second through fourth bits of the flags are paired flags that + The second through fourth bits of the flags are grouped flags that indicate the presence of a Security field with seven possible sizes. In this example 001 indicates a sixty-four bit security field. -3.4. Private data - - An implementation MAY use private data for its own use. The private - data immediately follows the last extension field in the GUE header - and is not a fixed length. This data is considered part of the GUE - header and MUST be accounted for in header length (Hlen). The length - of the private data MUST be a multiple of four bytes and is - determined by subtracting the offset of private data in the GUE - header from the header length. 
Specifically: - - Private_length = (Hlen * 4) - Length(flags) - - where "Length(flags)" returns the sum of lengths of all the extension - fields present in the GUE header. When there is no private data - present, the length of the private data is zero. - - The semantics and interpretation of private data are implementation - specific. The private data may be structured as necessary, for - instance it might contain its own set of flags and extension fields. +3.4. Surplus space - An encapsulator and decapsulator MUST agree on the meaning of private - data before using it. The mechanism to achieve this agreement is - outside the scope of this document but could include implementation- - defined behavior, coordinated configuration, in-band communication - using GUE control messages, or out-of-band messages. + The length of a GUE header, as indicated in the GUE Hlen field, may + exceed the space consumed by optional extensions in a packet. The + space between the end of the last optional field and the end of the + header is termed the "surplus space". - If a decapsulator receives a GUE packet with private data, it MUST - validate the private data appropriately. If a decapsulator does not - expect private data from an encapsulator, the packet MUST be dropped. - If a decapsulator cannot validate the contents of private data per - the provided semantics, the packet MUST also be dropped. An - implementation MAY place security data in GUE private data which if - present MUST be verified for packet acceptance. + Surplus space is reserved per this specification and uses may be + defined in future specifications. If a node receives a GUE packet + with non-zero length of surplus space then it MUST NOT attempt to + interpret the data in the surplus space. For purposes of transforms + across the header, such as optional integrity check over the header, + the surplus space is considered to be part of the GUE header and + would be included in computation. 3.5. 
Message types There are two message types in GUE variant 0: control messages and data messages. 3.5.1. Control messages Control messages carry formatted data that are implicitly addressed to the decapsulator to monitor or control the state or behavior of a @@ -551,43 +521,42 @@ control messages can be created that follow the same path through the network as data messages. 3.5.2. Data messages Data messages carry encapsulated packets that are addressed to the protocol stack for the associated protocol. Data messages are a primary means of encapsulation and can be used to create tunnels for overlay networks. - Data messages are indicated in GUE header when the C-bit is not set. - The payload of a data message is interpreted as an encapsulated + Data messages are indicated in the GUE header when the C-bit is not + set. The payload of a data message is interpreted as an encapsulated packet of an Internet protocol indicated in the proto/ctype field. The encapsulated packet immediately follows the GUE header. 4. Variant 1 Variant 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP. In this variant there is no GUE header, a UDP packet carries an IP packet. The first two bits of the UDP payload are the GUE variant field and coincide with the first two bits of the version number in the IP header. The first two version bits of IPv4 and IPv6 are 01, so we use GUE variant 1 for direct IP encapsulation which makes the two bits of GUE variant to also be 01. This technique is effectively a means to compress out the GUE version 0 header when encapsulating IPv4 or IPv6 packets and there are no - flags, extension fields, or private data present. This method is - compatible to use on the same port number as packets with the GUE - header (GUE variant 0 packets). This technique saves encapsulation - overhead on costly links for the common use of IP encapsulation, and - also obviates the need to allocate a separate UDP port number for IP- - over-UDP encapsulation. 
+ flags or extension fields. This method is compatible to use on the + same port number as packets with the GUE header (GUE variant 0 + packets). This technique saves encapsulation overhead on costly links + for the common use of IP encapsulation, and also obviates the need to + allocate a separate UDP port number for IP-over-UDP encapsulation. 4.1. Direct encapsulation of IPv4 The format for encapsulating IPv4 directly in UDP is: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | Source port | Destination port | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP @@ -601,22 +570,22 @@ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source IPv4 Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Destination IPv4 Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The UDP fields are set in a similar manner as described in section 3.1. Note that the 0100 value in the first four bits of the UDP payload - expresses the GUE variant as 1 (bits 01) and IP version as 4 (bits - 0100). + expresses both the GUE variant as 1 (bits 01) and IP version as 4 + (bits 0100). 4.2. Direct encapsulation of IPv6 The format for encapsulating IPv6 directly in UDP is demonstrated below: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | Source port | Destination port | | @@ -641,22 +610,22 @@ + Destination IPv6 Address + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The UDP fields are set in a similar manner as described in section 3.1. Note that the 0110 value in the first four bits of the UDP - payload expresses the GUE variant as 1 (bits 01) and IP version as 6 - (bits 0110). + payload expresses both the GUE variant as 1 (bits 01) and IP version + as 6 (bits 0110). 5. 
Operation The figure below illustrates the use of GUE encapsulation between two hosts. Host 1 is sending packets to Host 2. An encapsulator performs encapsulation of packets from Host 1. These encapsulated packets traverse the network as UDP packets. At the decapsulator, packets are decapsulated and sent on to Host 2. Packet flow in the reverse direction need not be symmetric; for example, the reverse path might not use GUE or any other form of encapsulation. @@ -700,24 +669,24 @@ Encapsulators create GUE data messages, set the fields of the UDP header, set flags and optional extension fields in the GUE header, and forward packets to a decapsulator. An encapsulator can be an end host originating the packets of a flow, or can be a network device performing encapsulation on behalf of hosts (routers implementing tunnels for instance). In either case, the intended target (decapsulator) is indicated by the outer destination IP address and destination port in the UDP header. - If an encapsulator is tunneling packets -- that is encapsulating + If an encapsulator is tunneling packets, that is encapsulating packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP - tunnel mode) -- it SHOULD follow standard conventions for tunneling - one protocol over another. For instance, if an IP packet is being + tunnel mode), it SHOULD follow standard conventions for tunneling one + protocol over another. For instance, if an IP packet is being encapsulated in GUE then diffserv interaction [RFC2983] and ECN propagation for tunnels [RFC6040] SHOULD be followed. 5.4. Decapsulator operation A decapsulator performs decapsulation of GUE packets. A decapsulator is addressed by the outer destination IP address and UDP destination port of a GUE packet. The decapsulator validates packets, including fields of the GUE header. @@ -826,152 +795,273 @@ A middlebox might infer bidirectional connection semantics for a UDP flow. 
For instance, a stateful firewall might create a five-tuple rule to match flows on egress, and a corresponding five-tuple rule for matching ingress packets where the roles of source and destination are reversed for the IP addresses and UDP port numbers. To operate in this environment, a GUE tunnel should be configured to assume connected semantics defined by the UDP five tuple and the use of GUE encapsulation needs to be symmetric between both endpoints. The source port set in the UDP header MUST be the destination port the peer would set for replies. In this case, the UDP source port for - a tunnel would be a fixed value and not set to be flow entropy as - described in section 5.11. + a tunnel would be a fixed value and not set to be flow entropy. The selection of whether to make the UDP source port fixed or set to a flow entropy value for each packet sent SHOULD be configurable for a tunnel. The default MUST be to set the flow entropy value in the UDP source port. 5.6.2. NAT IP address and port translation can be performed on the UDP/IP headers adhering to the requirements for NAT (Network Address Translation) with UDP [RFC4787]. In the case of stateful NAT, connection semantics MUST be applied to a GUE tunnel as described in section 5.6.1. GUE endpoints MAY also invoke STUN [RFC5389] or ICE [RFC5245] to manage NAT port mappings for encapsulations. -5.7. Checksum Handling - - The potential for mis-delivery of packets due to corruption of IP, - UDP, or GUE headers needs to be considered. Historically, the UDP - checksum would be considered sufficient as a check against corruption - of either the UDP header and payload or the IP addresses. - Encapsulation protocols, such as GUE, can be originated or terminated - on devices incapable of computing the UDP checksum for packet. This - section discusses the requirements around checksum and alternatives - that might be used when an endpoint does not support UDP checksum. - -5.7.1. Requirements +5.7. 
MTU and fragmentation - One of the following requirements MUST be met: + Standard conventions for handling of MTU (Maximum Transmission Unit) + and fragmentation in conjunction with networking tunnels + (encapsulation of layer 2 or layer 3 packets) SHOULD be followed. + Details are described in MTU and Fragmentation Issues with In-the- + Network Tunneling [RFC4459]. - o UDP checksums are enabled (for IPv4 or IPv6). + If a packet is fragmented before encapsulation in GUE, all the + related fragments MUST be encapsulated using the same UDP source + port. An operator SHOULD set MTU to account for encapsulation + overhead and reduce the likelihood of fragmentation. - o The GUE header checksum is used (defined in [GUEEXTEN]). + Alternative to IP fragmentation, the GUE fragmentation extension can + be used. GUE fragmentation is described in [GUEEXTEN]. - o Use zero UDP checksums. This is always permissible with IPv4; in - IPv6, they can only be used in accordance with applicable - requirements in [RFC8086], [RFC6935], and [RFC6936]. +5.8. UDP Checksum Handling -5.7.2. UDP Checksum with IPv4 +5.8.1. UDP Checksum with IPv4 - For UDP in IPv4, the UDP checksum MUST be processed as specified in - [RFC0768] and [RFC1122] for both transmit and receive. An - encapsulator MAY set the UDP checksum to zero for performance or - implementation considerations. The IPv4 header includes a checksum - that protects against mis-delivery of the packet due to corruption of + For UDP in IPv4, when a non-zero UDP checksum is used, the UDP + checksum MUST be processed as specified in [RFC0768] and [RFC1122] + for both transmit and receive. The IPv4 header includes a checksum + that protects against misdelivery of the packet due to corruption of IP addresses. The UDP checksum potentially provides protection against corruption of the UDP header, GUE header, and GUE payload. 
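The UDP checksum discussed above is the 16-bit ones' complement checksum of [RFC0768], computed over an IPv4 pseudo-header, the UDP header, and the UDP payload (which, for GUE, contains the GUE header and GUE payload). The following is a minimal Python sketch of that computation and its verification. The function names, addresses, and port values are illustrative assumptions and are not defined by this draft.

```python
import socket
import struct

UDP_PROTO = 17  # IP protocol number for UDP


def fold_sum(data: bytes) -> int:
    """Return the 16-bit ones' complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol, UDP length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, UDP_PROTO, udp_len))


def build_udp(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    """Build a UDP header plus payload with a non-zero checksum (hypothetical helper)."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    csum = ~fold_sum(pseudo_header(src, dst, length) + header + payload) & 0xFFFF
    csum = csum or 0xFFFF  # a computed zero is transmitted as all ones
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def verify_udp(src: str, dst: str, segment: bytes) -> bool:
    """Verify a non-zero UDP checksum; a zero checksum field must be handled by the caller."""
    return fold_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF
```

Because the pseudo-header is part of the sum, a segment delivered with a corrupted IP address fails verification, which is the misdelivery protection the text refers to.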
- Enabling or disabling the use of checksums is a deployment - consideration that should take into account the risk and effects of - packet corruption, and whether the packets in the network are already - adequately protected by other, possibly stronger mechanisms, such as - the Ethernet CRC. If an encapsulator sets a zero UDP checksum for - IPv4, it SHOULD use the GUE header checksum as described in - [GUEEXTEN] if there are no other mechanisms used that would detect - corruption of GUE packets. + Disabling the use of checksums is a deployment consideration that + should take into account the risk and effects of packet corruption. When a decapsulator receives a packet, the UDP checksum field MUST be processed. If the UDP checksum is non-zero, the decapsulator MUST verify the checksum before accepting the packet. By default, a decapsulator SHOULD accept UDP packets with a zero checksum. A node - MAY be configured to disallow zero checksums per [RFC1122]. - Configuration of zero checksums can be selective. For instance, zero - checksums might be disallowed from certain hosts that are known to be - traversing paths subject to packet corruption. If verification of a - non-zero checksum fails, a decapsulator lacks the capability to - verify a non-zero checksum, or a packet with a zero-checksum was - received and the decapsulator is configured to disallow that, then - the packet MUST be dropped. + MAY be configured to disallow zero checksums per [RFC1122]; this may + be done selectively, for instance by disallowing zero checksums from + certain hosts that are known to be sending over paths subject to + packet corruption. If verification of a non-zero checksum fails, a + decapsulator lacks the capability to verify a non-zero checksum, or a + packet with a zero checksum was received and the decapsulator is + configured to disallow that, the packet MUST be dropped and an event + MAY be logged. -5.7.3. UDP Checksum with IPv6 +5.8.2.
UDP Checksum with IPv6 - In IPv6, there is no checksum in the IPv6 header that protects - against mis-delivery due to address corruption. Therefore, when GUE - is used over IPv6, either the UDP checksum or the GUE header checksum - SHOULD be used unless there are alternative mechanisms in use that - protect against misdelivery. The UDP checksum and GUE header checksum - SHOULD NOT be used at the same time since that would be mostly - redundant. + For UDP in IPv6, the UDP checksum MUST be processed as specified in + [RFC0768] and [RFC2460] for both transmit and receive. - If neither the UDP checksum nor the GUE header checksum is used, then - the requirements for using zero IPv6 UDP checksums in [RFC6935] and - [RFC6936] MUST be met. + When UDP is used over IPv6, the UDP checksum is relied upon to + protect both the IPv6 and UDP headers from corruption. As such, by + default a GUE encapsulator MUST use UDP checksums. - When a decapsulator receives a packet, the UDP checksum field MUST be - processed. If the UDP checksum is non-zero, the decapsulator MUST - verify the checksum before accepting the packet. By default a - decapsulator MUST only accept UDP packets with a zero checksum if the - GUE header checksum is used and is verified. If verification of a - non-zero checksum fails or a decapsulator lacks the capability to - verify a non-zero checksum then the packet MUST be dropped. If a - packet is received with a zero UDP checksum, no GUE header checksum, - and zero UDP checksums are disallowed then the packet MUST be + [GUEEXTEN] specifies a GUE checksum option that includes a pseudo + header containing the IP addresses. An encapsulator MAY use zero-UDP + checksums if it uses the GUE checksum. A non-zero UDP checksum and + the GUE checksum SHOULD NOT be used simultaneously in a packet since + that would be redundant. 
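The checksum rules in sections 5.8.1 and 5.8.2 can be read as a small decision procedure. The Python sketch below shows one possible reading, under stated assumptions: the names (`DecapConfig`, `ipv4_checksum_verdict`, `ipv6_encap_checksums`) are hypothetical, the IPv4 function covers only the decapsulator rules, and the IPv6 function covers only the default case and the GUE checksum alternative, not any further exceptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Verdict(Enum):
    ACCEPT = "accept"
    DROP = "drop"


@dataclass
class DecapConfig:
    # Assumed knobs: by default a decapsulator accepts zero checksums (IPv4)
    allow_zero_checksum: bool = True
    # Whether this node is able to verify a non-zero UDP checksum
    can_verify: bool = True


def ipv4_checksum_verdict(checksum_field: int, checksum_valid: bool,
                          cfg: DecapConfig) -> Verdict:
    """Decapsulator handling of the UDP checksum field for GUE over IPv4."""
    if checksum_field == 0:
        # Zero checksum: accepted unless configuration disallows it
        return Verdict.ACCEPT if cfg.allow_zero_checksum else Verdict.DROP
    if not cfg.can_verify or not checksum_valid:
        # Non-zero checksum that cannot be verified or fails verification
        return Verdict.DROP
    return Verdict.ACCEPT


def ipv6_encap_checksums(use_gue_checksum: bool) -> Tuple[bool, bool]:
    """Return (udp_checksum_on, gue_checksum_on) for an IPv6 encapsulator."""
    if use_gue_checksum:
        # With the GUE checksum in use, also sending a non-zero UDP
        # checksum would be redundant, so the UDP checksum is zero
        return (False, True)
    # Default: the UDP checksum is used
    return (True, False)
```

For example, `ipv4_checksum_verdict(0, False, DecapConfig(allow_zero_checksum=False))` yields `Verdict.DROP`, matching the case where zero checksums have been disallowed by configuration.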
+ + When deployed in a TMCE, a GUE encapsulator MAY be configured to use + UDP zero-checksum mode and no GUE checksum if the traffic-managed + controlled environment or a set of closely cooperating traffic- + managed controlled environments (such as by network operators who + have agreed to work together in order to jointly provide specific + services) meet at least one of the following conditions: + + a. It is known (perhaps through knowledge of equipment types and + lower-layer checks) that packet corruption is exceptionally + unlikely and where the operator is willing to take the risk of + undetected packet corruption. + + b. It is judged through observational measurements (perhaps of + historic or current traffic flows that use a non-zero checksum) + that the level of packet corruption is tolerably low and where + the operator is willing to take the risk of undetected packet + corruption. + + c. Carrying applications that are tolerant of misdelivered or + corrupted packets (perhaps through higher-layer checksum, + validation, and retransmission or transmission redundancy) + where the operator is willing to rely on the applications using + GUE to survive any corrupt packets. + + The following requirements apply to encapsulators deployed in a TMCE + environment that use UDP zero-checksum mode: + + a. Use of the UDP checksum with IPv6 MUST be the default + configuration for all communications. + + b. The GUE implementation MUST comply with all requirements + specified in Section 4 of [RFC6936] and with requirement 1 + specified in Section 5 of [RFC6936]. + + c. A decapsulator SHOULD only allow the use of UDP zero-checksum + mode for IPv6 on a single received UDP Destination Port, + regardless of the encapsulator. The motivation for this + requirement is possible corruption of the UDP Destination Port, + which may cause packet delivery to the wrong UDP port. If that + other UDP port requires the UDP checksum, the misdelivered + packet will be discarded. + + d. 
It is RECOMMENDED that the UDP zero-checksum mode for IPv6 is + only enabled for certain selected source addresses. The + decapsulator MUST check that the source and destination IPv6 + addresses in a received packet are permitted by configuration + to use UDP zero-checksum mode and discard any packet for which + this check fails. + + e. The tunnel encapsulator SHOULD use different IPv6 addresses for + each GUE communication (tunnel or transport flow) that uses UDP + zero-checksum mode, regardless of the decapsulator, in order to + strengthen the decapsulator's check of the IPv6 source address + (i.e., the same IPv6 source address SHOULD NOT be used with + more than one IPv6 destination address, independent of whether + that destination address is a unicast or multicast address). + When this is not possible, it is RECOMMENDED to use each source + IPv6 address for as few GUE communications that use UDP zero- + checksum mode as is feasible. + + f. When any middlebox exists on the path of GUE communication, it + is RECOMMENDED to use the default mode, i.e., use UDP checksum, + to reduce the chance that the encapsulated packets will be dropped. -5.8. MTU and fragmentation + g. Any middlebox that allows the UDP zero-checksum mode for IPv6 + MUST comply with requirements 1 and 8-10 in Section 5 of + [RFC6936]. - Standard conventions for handling of MTU (Maximum Transmission Unit) - and fragmentation in conjunction with networking tunnels - (encapsulation of layer 2 or layer 3 packets) SHOULD be followed. - Details are described in MTU and Fragmentation Issues with In-the- - Network Tunneling [RFC4459]. + h. Measures SHOULD be taken to prevent IPv6 traffic with zero UDP + checksums from "escaping" to the general Internet; see Section + 5.9 for examples of such measures. - If a packet is fragmented before encapsulation in GUE, all the - related fragments MUST be encapsulated using the same UDP source - port.
An operator SHOULD set MTU to account for encapsulation - overhead and reduce the likelihood of fragmentation. + i. IPv6 traffic with zero UDP checksums MUST be actively monitored + for errors by the network operator. For example, the operator + may monitor Ethernet-layer packet error rates. - Alternative to IP fragmentation, the GUE fragmentation extension can - be used. GUE fragmentation is described in [GUEEXTEN]. + j. If a packet with a non-zero checksum is received, the checksum + MUST be verified before accepting the packet. This is + regardless of whether the tunnel encapsulator and decapsulator + have been configured with UDP zero-checksum mode. -5.9. Congestion control + The above requirements do not change either the requirements + specified in [RFC8200] as modified by [RFC6935] or the requirements + specified in [RFC6936]. - Per requirements of [RFC8085], if the IP traffic encapsulated with - GUE implements proper congestion control then no additional - mechanisms should be required. + The requirement to check the source IPv6 address in addition to the + destination IPv6 address and the strong recommendation against reuse + of source IPv6 addresses among GUE communications collectively + provide some mitigation for the absence of UDP checksum coverage of + the IPv6 header. A traffic-managed controlled environment that + satisfies at least one of three conditions listed at the beginning of + this section provides additional assurance. - In the case that the encapsulated traffic does not implement any or - sufficient control, or it is not known whether a transmitter will - consistently implement proper congestion control, then congestion - control at the encapsulation layer MUST be provided per [RFC8085]. - Note that this case applies to a significant use case in network - virtualization in which guests run third party networking stacks that - cannot be implicitly trusted to implement conformant congestion - control. 
+ GUE packets are suitable for transmission over lower layers in the + traffic-managed controlled environments that are allowed by the + exceptions stated above, and the rate of corruption of the inner IP + packet on such networks is not expected to increase by comparison to + traffic that is not encapsulated in UDP. For these reasons, GUE does + not provide an additional integrity check, except when the GUE checksum + [GUEEXTEN] is used alongside UDP zero-checksum mode with IPv6, and + this design is in accordance with requirements 2, 3, and 5 specified + in Section 5 of [RFC6936]. - Out of band mechanisms such as rate limiting, Managed Circuit Breaker - [RFC8084], or traffic isolation MAY be used to provide rudimentary - congestion control. For finer-grained congestion control that allows - alternate congestion control algorithms, reaction time within an RTT, - and interaction with ECN, in-band mechanisms might be warranted. + Generic UDP Encapsulation does not accumulate incorrect transport- + layer state as a consequence of GUE header corruption. A corrupt GUE + packet may result in either packet discard or packet forwarding + without accumulation of GUE state. Active monitoring of GUE traffic + for errors is REQUIRED, as the occurrence of errors will result in + some accumulation of error information outside the protocol for + operational and management purposes. This design is in accordance + with requirement 4 specified in Section 5 of [RFC6936]. + + The remaining requirements specified in Section 5 of [RFC6936] are + not applicable to GUE. Requirements 6 and 7 do not apply because GUE + does not include a control feedback mechanism. Requirements 8-10 are + middlebox requirements that do not apply to GUE tunnel endpoints. + (See Section 5.5 for further middlebox discussion.) + + In summary, a TMCE GUE tunnel is allowed to use UDP zero-checksum + mode for IPv6 when the conditions and requirements stated above are + met.
Otherwise, the UDP checksum needs to be used for IPv6 as + specified in [RFC0768] and [RFC8200]. Use of the GUE checksum is + RECOMMENDED when the UDP checksum is not used. + +5.9. Congestion Considerations + + This section describes congestion considerations for GUE tunnels + (Layer 2 and Layer 3 encapsulation) and transport layer encapsulation + (Layer 4 protocol over GUE). + +5.9.1. GUE tunnels + + Section 3.1.9 of [RFC8085] discusses the congestion considerations + for design and use of UDP tunnels; this is important because other + flows could share the path with one or more UDP tunnels, + necessitating congestion control [RFC2914] to avoid destructive + interference. + + Congestion has potential impacts both on the rest of the network + containing a UDP tunnel and on the traffic flows using the UDP + tunnels. These impacts depend upon what sort of traffic is carried + over the tunnel, as well as the path of the tunnel. The GUE protocol + does not provide any congestion control and GUE UDP packets are + regular UDP packets. Therefore, a GUE tunnel MUST NOT be deployed to + carry non-congestion-controlled traffic over the Internet [RFC8085]. + + Within a TMCE network, GUE tunnels are appropriate for carrying + traffic that is not known to be congestion controlled. For example, a + GUE tunnel may be used to carry Multiprotocol Label Switching (MPLS) + traffic such as pseudowires or VPNs where specific bandwidth + guarantees are provided to each pseudowire or VPN. In such cases, + operators of TMCE networks avoid congestion by careful provisioning + of their networks, rate-limiting of user data traffic, and traffic + engineering according to path capacity. + + When a GUE tunnel carries traffic that is not known to be congestion + controlled in a TMCE network, the tunnel MUST be deployed entirely + within that network, and measures SHOULD be taken to prevent the GUE + traffic from "escaping" the network to the general Internet.
Examples + of such measures are: + + o physical or logical isolation of the links carrying GUE from the + general Internet, + + o deployment of packet filters that block the UDP ports assigned + for GUE, and + + o imposition of restrictions on GUE traffic by software tools used + to set up GUE tunnels between specific end systems (as might be + used within a single data center) or by tunnel ingress nodes for + tunnels that don't terminate at end systems. + +5.9.2 Transport layer encapsulation + + If GUE encapsulates a transport layer protocol, such as TCP, it is + expected that the transport layer or application layer properly + implements congestion control or avoidance. In the case that UDP is + encapsulated, the application is expected to provide congestion + control as specified in [RFC8085]. 5.10. Multicast GUE packets can be multicast to decapsulators using a multicast destination address in the outer IP header. Each receiving host will decapsulate the packet independently following normal decapsulator operations. The receiving decapsulators need to agree on the same set of GUE parameters and properties; how such an agreement is reached is outside the scope of this document. @@ -1093,31 +1184,28 @@ * GUE is an extensible encapsulation protocol. Standardized optional data such as security, virtual networking identifiers, fragmentation are defined. * For extensibility, GUE uses flag fields as opposed to TLVs as some other encapsulation protocols do. Flag fields are strictly ordered, allow random access, and are efficient in use of header space. - * GUE allows private data to be sent as part of the encapsulation. - This permits experimentation or customization in deployment. - * GUE allows sending of control messages such as OAM using the same GUE header format (for routing purposes) as normal data messages. * GUE maximizes deliverability of non-UDP and non-TCP protocols. 
* GUE provides a means for exposing per flow entropy for ECMP for - atypical protocols such as SCTP, DCCP, ESP, etc. + IP atypical protocols such as SCTP, DCCP, ESP, etc. 6.2. Comparison of GUE to other encapsulations A number of different encapsulation techniques have been proposed for the encapsulation of one protocol over another. EtherIP [RFC3378] provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784], MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN [RFC7348] are proposals for encapsulation of layer 2 packets for network virtualization. IPIP [RFC2003] and Generic packet tunneling @@ -1146,24 +1234,20 @@ extensible mechanism for encapsulating all IP protocols in UDP with minimal overhead (four bytes of additional header). o GUE is extensible. New flags and extension fields can be defined. o The GUE header includes a header length field. This allows a network node to inspect an encapsulated packet without needing to parse the full encapsulation header. - o Private data in the encapsulation header allows local - customization and experimentation while being compatible with - processing in network nodes (routers and middleboxes). - o GUE includes both data messages (encapsulation of packets) and control messages (such as OAM). o The flags-field model facilitates efficient implementation of extensibility in hardware. For instance, a TCAM can be used to parse a known set of N flags where the number of entries in the TCAM is 2^N. By comparison, the number of TCAM entries needed to parse a set of N arbitrarily ordered TLVs is approximately e*N!. o GUE includes a variant that encapsulates IPv4 and IPv6 packets @@ -1178,23 +1262,23 @@ o Authentication, integrity, and confidentiality of the GUE payload. GUE security is provided by extensions for security defined in [GUEEXTEN]. These extensions include methods to authenticate the GUE header and encrypt the GUE payload. 
The GUE header can be authenticated using a security extension for an HMAC (Hashed Message Authentication Code). Securing the GUE payload - can be accomplished use of the GUE Payload Transform extension. This - extension allows the use of DTLS (Datagram Transport Layer Security) - to encrypt and authenticate the GUE payload. + can be accomplished by use of the GUE Payload Transform extension. + This extension allows the use of DTLS (Datagram Transport Layer + Security) to encrypt and authenticate the GUE payload. A hash function for computing flow entropy (section 5.11) SHOULD be randomly seeded to mitigate some possible denial service attacks. 8. IANA Considerations 8.1. UDP source port A user UDP port number assignment for GUE has been assigned: @@ -1238,29 +1322,29 @@ +----------------+------------------+---------------+ | Control type | Description | Reference | +----------------+------------------+---------------+ | 0 | Control payload | This document | | | needs more | | | | context for | | | | interpretation | | | | | | | 1..127 | Unassigned | | | | | | - | 128..255 | User defined | This document | + | 128..255 | Experimental | This document | +----------------+------------------+---------------+ 9. Acknowledgements The authors would like to thank David Liu, Erik Nordmark, Fred - Templin, Adrian Farrel, Bob Briscoe, and Murray Kucherawy for - valuable input on this draft. Special thanks to Fred Templin who is - serving as document shepherd. + Templin, Adrian Farrel, Bob Briscoe, Murray Kucherawy, Mirja + Kuhlewind, and David Black for valuable input on this draft. Special + thanks to Fred Templin who is serving as document shepherd. 10. References 10.1. Normative References [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980, . [RFC8085] Eggert, L., Fairhurst, G., and G. 
Shepherd, "UDP Usage @@ -1304,24 +1388,20 @@ Procedures for the Management of the Service Name and Transport Protocol Port Number Registry", BCP 165, RFC 6335, DOI 10.17487/RFC6335, August 2011, . [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", RFC 5226, DOI 10.17487/RFC5226, May 2008, . - [GUEEXTEN] Herbert, T., Yong, L., and Templin, F., "Extensions for - Generic UDP Encapsulation", draft-herbert-gue-extensions- - 06 - 10.2. Informative References [RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE- in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086, March 2017, . [RFC7605] Touch, J., "Recommendations on Using Assigned Transport Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605, August 2015, . @@ -1398,20 +1478,27 @@ [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The Locator/ID Separation Protocol (LISP)", RFC 6830, DOI 10.17487/RFC6830, January 2013, . [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, "Encapsulating MPLS in UDP", RFC 7510, DOI 10.17487/RFC7510, April 2015, . + [GUEEXTEN] Herbert, T., Yong, L., and Templin, F., "Extensions for + Generic UDP Encapsulation", draft-ietf-intarea-gue- + extensions-06 + + [IPTUN] Touch, J. and Townsley, M., "IP Tunnels in the Internet + Architecture", draft-ietf-intarea-tunnels-10 + [IANA-PN] IANA, "Protocol Numbers", . [TCPUDP] Chesire, S., Graessley, J., and McGuire, R., "Encapsulation of TCP and other Transport Protocols over UDP", draft-cheshire-tcp-over-udp-00 [GENEVE] Gross, J., Ed., Ganga, I. Ed., and Sridhar, T., "Geneve: Generic Network Virtualization Encapsulation", draft-ietf- nvo3-geneve-10 @@ -1457,21 +1544,21 @@ GUE encapsulation is compatible with multi-queue NICs that support five-tuple hash calculation for UDP/IP packets as input to RSS. 
The flow entropy in the UDP source port ensures classification of the encapsulated flow even in the case that the outer source and destination addresses are the same for all flows (e.g. all flows are going over a single tunnel). By default, UDP RSS support is often disabled in NICs to avoid out-of-order reception that can occur when UDP packets are fragmented. As - discussed in section 5.8, fragmentation of GUE packets is mostly + discussed in section 5.7, fragmentation of GUE packets is mostly avoided by fragmenting packets before entering a tunnel, GUE fragmentation, path MTU discovery in higher layer protocols, or operators adjusting MTUs. Other UDP traffic might not implement such procedures to avoid fragmentation, so enabling UDP RSS support in the NIC might be a considered tradeoff during configuration. A.2. Checksum offload Many NICs provide capabilities to calculate the standard ones complement checksum for packets in transmit or receive [CSUMOFF]. @@ -1580,24 +1667,24 @@ To facilitate TSO with GUE, it is recommended that extension fields do not contain values that need to be updated on a per segment basis. For example, extension fields should not include checksums, lengths, or sequence numbers that refer to the payload. If the GUE header does not contain such fields then the TSO engine only needs to copy the bits in the GUE header when creating each segment and does not need to parse the GUE header. A.4. Large Receive Offload - Large Receive Offload (LRO) [SEGOFF] is a NIC feature where packets - of a TCP connection are reassembled, or coalesced, in the NIC and - delivered to the host as one large packet. This feature can reduce - CPU utilization in the host. + Large Receive Offload (LRO) [SEGOFF] is a NIC feature where received + packets of a TCP connection are reassembled, or coalesced, in the NIC + and delivered to the host as one large packet. This feature can + reduce CPU utilization in the host.
LRO requires significant protocol awareness to be implemented correctly and is difficult to generalize. Packets in the same flow need to be unambiguously identified. In the presence of tunnels or network virtualization, this may require more than a five-tuple match (for instance packets for flows in two different virtual networks may have identical five-tuples). Additionally, a NIC needs to perform validation over packets that are being coalesced, and needs to fabricate a single meaningful header from all the coalesced packets. @@ -1669,17 +1756,19 @@ Santa Clara, CA 95054 US Email: tom@herbertland.com Lucy Yong Independent Austin, TX US + Email: lucy_yong@yahoo.com + Osama Zia Microsoft 1 Microsoft Way Redmond, WA 98029 US Email: osamaz@microsoft.com