draft-ietf-tsvwg-tcp-ulp-frame-00.txt   draft-ietf-tsvwg-tcp-ulp-frame-01.txt 
Transport Area Working Group S. Bailey Transport Area Working Group S. Bailey (Sandburst)
Internet-draft Sandburst Internet-draft J. Chase (Duke)
Expires: January 2001 J. Pinkerton Expires: May 2002 J. Pinkerton (Microsoft)
Microsoft A. Romanow (Cisco)
C. Sapuntzakis C. Sapuntzakis (Cisco)
Cisco J. Wendt (HP)
M. Wakeley J. Williams (Emulex)
Agilent
J. Wendt
HP
J. Williams
Emulex
ULP Framing for TCP TCP ULP Framing Protocol (TUF)
draft-ietf-tsvwg-tcp-ulp-frame-00 draft-ietf-tsvwg-tcp-ulp-frame-01
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 47 skipping to change at page 1, line 42
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved. Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract Abstract
The framing protocol accepts PDUs from a ULP (upper level protocol) The TCP ULP Framing (TUF) protocol defines a shim layer protocol
and transports them over a TCP connection. This is done in such a between an Upper Layer Protocol (ULP) and TCP. TUF also depends on
way that the PDUs can be recovered at the receiver even if a specified TCP segmentation convention between TUF endpoints.
preceding TCP segments have not yet been received. This is useful Together, the shim and segmentation conventions enable a TUF/TCP
when the PDUs are self describing within the context of a protocol receiver to recognize ULP data units within a TCP segment
TCP connection. In this case, the framing protocol allows incoming independently of other TCP segments. This capability simplifies
packets to be parsed (but not processed) in the order received and the design of enhanced network interfaces implementing direct data
their data to be placed directly in the ultimate destination memory placement for ULPs using TCP. Direct data placement is a key step
instead of TCP reassembly buffers. to making IP networking competitive with high-end interconnect
solutions in data centers and other high-performance application
domains.
Table Of Contents Table Of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Definitions . . . . . . . . . . . . . . . . . . . . . . 3
2. Theory Of Operation . . . . . . . . . . . . . . . . . . . . 3 2. Overview . . . . . . . . . . . . . . . . . . . . . . . . 4
3. ULP Support For Framing . . . . . . . . . . . . . . . . . . 5 2.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 4
4. Negotiating Use Of The Framing Protocol . . . . . . . . . . 6 2.2. Approach . . . . . . . . . . . . . . . . . . . . . . . . 5
5. PDU Alignment Mode . . . . . . . . . . . . . . . . . . . . . 6 3. Rationale For TUF . . . . . . . . . . . . . . . . . . . 6
5.1. Framing-aware TCP . . . . . . . . . . . . . . . . . . . . 8 3.1. Direct Data Placement . . . . . . . . . . . . . . . . . 7
5.2. PDU Alignment Mode Exception Cases . . . . . . . . . . . . 9 3.2. Direct Data Placement with TCP . . . . . . . . . . . . . 8
5.3. Validity Of Framing-aware TCP Segmentation . . . . . . . . 10 3.2.1. The Simple Case: ULP-unaware Placement . . . . . . . . . 9
5.4. Receiving In PDU Alignment Mode . . . . . . . . . . . . . 11 3.2.2. The Complex Case: ULP-aware Placement . . . . . . . . . 9
6. Marker Mode . . . . . . . . . . . . . . . . . . . . . . . . 12 3.2.3. The Problem of ULP-aware Placement with TCP . . . . . . 10
7. Security Considerations . . . . . . . . . . . . . . . . . . 12 3.2.4. Finding ULPDUs In Out-of-order Segments . . . . . . . . 11
7.1. Security Protocol Interactions . . . . . . . . . . . . . . 13 3.2.5. The TUF Solution . . . . . . . . . . . . . . . . . . . . 12
7.2. Using IPSec With The Framing Protocol . . . . . . . . . . 13 3.2.6. TUF's ULP Assumptions . . . . . . . . . . . . . . . . . 12
7.3. Using TLS With The Framing Protocol . . . . . . . . . . . 13 4. The Protocol . . . . . . . . . . . . . . . . . . . . . . 13
7.3.1. Using TLS In PDU Alignment Mode . . . . . . . . . . . . 15 4.1. The Framing Protocol Data Unit (FPDU) . . . . . . . . . 13
7.3.2. Using TLS In Marker Mode . . . . . . . . . . . . . . . . 15 4.1.1. FPDU Format . . . . . . . . . . . . . . . . . . . . . . 13
7.4. Other Security Considerations . . . . . . . . . . . . . . 16 4.1.2. FPDU Size Selection . . . . . . . . . . . . . . . . . . 14
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . 16 4.2. TUF-conforming TCP Sender Segmentation . . . . . . . . . 15
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 4.3. Negotiating TUF . . . . . . . . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 17 4.4. TUF Receiver ULPDU Containment Property Testing . . . . 16
A. Sockets Support For The Framing Protocol . . . . . . . . . . 19 5. Protocol Characteristics . . . . . . . . . . . . . . . . 17
A.1 Enabling The Framing Protocol . . . . . . . . . . . . . . . 20 5.1. Properties Of TUF-conforming TCP Senders . . . . . . . . 17
A.2 Sending Data Atomically . . . . . . . . . . . . . . . . . . 20 5.2. Exception Cases . . . . . . . . . . . . . . . . . . . . 18
A.3 Retrieving The Current EMSS . . . . . . . . . . . . . . . . 21 5.2.1. Resegmenting Intermediaries . . . . . . . . . . . . . . 18
A.4 Disabling ULP PDU Packing . . . . . . . . . . . . . . . . . 21 5.2.2. PMTU Reduction . . . . . . . . . . . . . . . . . . . . . 19
A.5 Enabling Emergency Mode . . . . . . . . . . . . . . . . . . 21 5.2.3. PMTU Increase . . . . . . . . . . . . . . . . . . . . . 20
A.6 Setting The Sending Marker Interval . . . . . . . . . . . . 22 5.2.4. Receive Window < EMSS . . . . . . . . . . . . . . . . . 21
A.7 Setting The Receiving Marker Interval . . . . . . . . . . . 22 5.2.5. Size of ULPDU + 8 > EMSS . . . . . . . . . . . . . . . . 21
Full Copyright Statement . . . . . . . . . . . . . . . . . . . 22 6. Security Considerations . . . . . . . . . . . . . . . . 22
6.1. Protocol-specific Security Considerations . . . . . . . 22
6.2. Using IPSec With TUF . . . . . . . . . . . . . . . . . . 22
6.3. Using TLS With TUF . . . . . . . . . . . . . . . . . . . 22
7. IANA Considerations . . . . . . . . . . . . . . . . . . 25
References . . . . . . . . . . . . . . . . . . . . . . . 25
Authors' Addresses . . . . . . . . . . . . . . . . . . . 26
A. Sample Sockets Support For TUF . . . . . . . . . . . . . 27
A.1 Basic Principles . . . . . . . . . . . . . . . . . . . . 28
A.2 Enabling TUF . . . . . . . . . . . . . . . . . . . . . . 28
A.3 Sending Data . . . . . . . . . . . . . . . . . . . . . . 29
A.4 Retrieving The Current EMSS or MULPDU . . . . . . . . . 29
A.5 Disabling ULPDU Packing . . . . . . . . . . . . . . . . 29
A.6 Disabling The Report of Oversized ULPDUs . . . . . . . . 30
Full Copyright Statement . . . . . . . . . . . . . . . . 30
1. Introduction 1. Definitions
Many upper layer protocols (ULP)s, particularly those which perform The following terms and abbreviations are used in this document.
bulk data transfer, permit the final location of transferred data
(e.g. a ULP client buffer) to be known when the data is received.
The information required to compute the final location of such data
is contained in local protocol state and ULP protocol data unit
(PDU) headers. In this case, ULP data can be placed directly at
its final destination by a network interface with knowledge of the
ULP. A direct placement network interface can offer extremely high
performance since the host CPU does not copy the data at all, and
the data only crosses system buses once.
Both specific application ULPs, such as iSCSI, and generic hardware data delivery - the delivery of received ULP payloads to the
acceleration ULPs, such as an RDMA protocol, offer the potential ULP application, i.e., notifying the application of data
for direct data placement. The advantage of using a generic arrival by completing a receive operation or generating an
acceleration ULP for direct data placement is that the same direct event.
placement network interface can be used to accelerate many
different application protocols (e.g. iSCSI on RDMA).
PDU shall mean ULP PDU for the remainder of the document unless data placement - the storage of received ULP payloads to host
otherwise indicated. memory, pending delivery to the ULP application.
TCP specifies that the ULP is notified of the delivery of octets in direct data placement - the storage of received ULP payloads
the order in which they are presented to the sender. Many ULPs directly to application-specified buffers without intermediate
rely on this sequencing guarantee. While notification from TCP is buffering or copying.
required to be in-order, this does not prohibit arbitrary placement
of TCP data received in any order. Even if data for a ULP is
placed out-of-order, the ULP may still only be notified of such
data in-order, in accordance with TCP semantics. In other words,
direct data placement based upon ULP information is not at odds
with TCP's stream-orientation, but rather is a natural application
of TCP's philosophy that ULP PDU framing be performed at the layer
above TCP. RFC 879 also points out in its discussion of layering
and modularity that this type of behavior is completely in harmony
with layered protocol design [RFC0879].
Packet delay, loss and reordering are expected, common occurrences EMSS - the effective maximum segment size. EMSS is the TCP
in IP networks. Traditionally, data in TCP segments is placed in maximum segment size (MSS) defined in RFC 793 [TCP] and
an intermediate reassembly buffer to restore the sending order exchanged during TCP connection establishment, adjusted by the
which may have been lost as a result of segment delay, loss or current path maximum transfer unit (MTU) [PathMTU].
reordering. While it is possible for a direct placement network
interface to implement a complete reassembly buffer, the cost of
doing so is prohibitive. Such a reassembly buffer would need to
have a size equal to the sum of the maximum window sizes of all
active connections. On a fast network link (e.g. > 1 Gb/s), the
window size for each connection can be very large, which would
require a huge, very high speed reassembly buffer on the network
interface.
A way to find PDUs when previous PDU headers are in delayed, lost FPDU - framing protocol data unit. The protocol data unit
or reordered segments will permit data in these subsequent PDUs to defined by TUF.
be placed immediately by a direct placement network interface.
This will reduce the buffer requirements for a direct placement
network interface. Without such a mechanism, the data from
subsequent PDUs must all be buffered in the adapter until all
previous TCP segments are received. Initial discussion of this
issue, and how it relates specifically to iSCSI can be found in an
early iSCSI design team memo [Satran].
This document specifies a protocol with two modes for efficiently MULPDU - maximum upper layer protocol data unit size. The
finding PDUs in the presence of lost, delayed or reordered TCP size of the largest ULPDU that fits in an EMSS-sized FPDU.
segments.
2. Theory Of Operation NIC - network interface controller. The device that provides
a host's access to a physical network link.
One very efficient way to guarantee that subsequent PDUs can always PDU - protocol data unit. A self-contained block of control
be found when a previous PDU header has been lost is to ensure each and data defined by a particular protocol.
TCP segment begins with a PDU and contains an integral number of
PDUs. In this case, the data in each TCP segment may be placed
independently of all other segments. No reassembly buffer is
required at all. Guaranteeing a TCP segment begins with a PDU
requires a modification to TCP's sending behavior. This document
defines the behavior of a TCP with a modified sender behavior,
called a `framing-aware TCP'. A framing-aware TCP allows a ULP
implementation to ensure that each TCP segment begins with a PDU.
A framing-aware TCP is fully compliant with all RFCs governing TCP
and fully interoperable with existing, compliant, non-framing-aware
TCP implementations. When the framing protocol can use a framing-
aware TCP, it operates in `PDU alignment mode'. The framing
protocol in PDU alignment mode uses a combination of a framing-
aware TCP and an encapsulation of PDUs to permit error free PDU
location when TCP segments are lost.
Another way to locate PDUs in the presence of lost TCP segments is RDMA - Remote Direct Memory Access protocol. A data transfer
to insert markers at a known period in the TCP octet stream. Each protocol which uses memory access-style transfer mode(s) to
marker points to the beginning of the next PDU. If the marker provide generic direct data placement capabilities for
frequency is high relative to packet loss rate (e.g. once per TCP arbitrary ULPs.
segment), the receiver can, with very high likelihood, learn the
location of the next PDU from a marker even when a previous PDU
header has been lost. The receiver must still buffer the octets
between the lost TCP segment and the subsequent PDU, but this is
likely to be a much smaller buffer than the maximum TCP window
size. By limiting the maximum PDU size, the receiver buffering can
be reasonably bounded. This document defines a periodic marker
mechanism which can be used to bound receiver reassembly buffers.
Two framing protocol modes are defined because of the substantial TUF - TCP ULP Framing protocol. The protocol defined in this
tradeoff between the modes. Both modes can bound reassembly buffer document.
on a direct placement network interface, but the modes apply in
disjoint circumstances.
Marker mode has the following advantage: ULP - upper layer protocol. The client protocol using the
services of the transport layer, or TUF.
1. Implementable without TCP sender modification ULPDU - upper layer protocol data unit.
The PDU alignment mode has the following advantages: ULPDU containment property - the property that a TCP segment
contains exactly an integral number of ULPDUs.
1. No reassembly buffering required at all 2. Overview
2. Placement information is always at the start of a TCP segment, This section summarizes the motivation for the TCP ULP Framing
substantially simplifying hardware processing (TUF) protocol and explains its operation in brief. Section 3
(`Rationale for TUF') develops the rationale for TUF in detail.
Section 4 (`The Protocol') defines the protocol itself. Section 5
(`Protocol Characteristics') examines various properties of the
protocol's operation. Implementors may wish to refer directly to
sections 4 and 5.
PDU alignment mode is more powerful, and is preferable when 2.1. Motivation
available. Marker mode still requires some high-speed reassembly
memory, whose size is a linear function of the number of active TCP
connections. Furthermore, marker mode only offers a probabilistic
bound on the reassembly buffer size per active TCP connection. In
cases where many TCP segments with PDU headers are lost, the buffer
size required for direct placement could approach that of a
complete reassembly buffer.
It is expected that ultimately PDU alignment mode will dominate The IP protocols are not usually used for high-performance high
because of compelling cost and performance scalability advantages. speed data transfers due to overhead in TCP processing. Instead, a
However, until framing-aware TCPs are ubiquitous, marker mode number of special purpose protocols have been used. The domain of
offers an alternative for use with an unmodified TCP application for such high speed buffer transfer includes storage,
implementation. To make transition from marker mode to PDU video delivery and processing, and various applications of cluster
alignment mode easy, the sockets API extension defined in Appendix computing, such as scalable database or application service. For
A supports both modes relatively transparently. A ULP which reasons discussed below, today, there is great industry interest in
implements the behavior required for PDU alignment mode can use developing an IP standard for low overhead high bandwidth data
marker mode without modification. transfer, which would decrease the costs of high speed
interconnects and supplant special purpose protocols.
Framing protocol receivers MAY implement either PDU alignment mode, The approach typically used for low overhead transfers is called
or marker mode, or both. Framing protocol senders, MUST implement direct data placement, in which the network interface places data
marker mode, and MUST implement PDU alignment mode if the directly in application buffers, avoiding the latency and memory
underlying TCP is framing-aware. bandwidth costs associated with copying. Direct data placement can
in principle be done with either of IP's reliable transports--SCTP
or TCP. This document considers what is needed to do direct data
placement with TCP.
3. ULP Support For Framing In order to place data directly in application buffers, the network
interface needs to use information in the Upper Layer Protocol Data
Units (ULPDUs) contained in the TCP stream. This can be
accomplished routinely except when TCP segments arrive out of
order. If TCP segments arrive out of order, the location of the
ULPDUs in the TCP segment cannot be found. The TUF protocol
addresses this problem of finding ULPDU headers in the TCP stream,
even when TCP segments arrive out of order.
A ULP using the framing protocol will submit each complete PDU to 2.2. Approach
the framing module in a single sending operation. This behavior is
already common practice for most ULP implementations.
When the framing protocol is in PDU alignment mode, each PDU TUF is implemented as a shim layer between a ULP and TCP. The
submitted is limited to the smaller of 2^16-8 (65528) and the size end-to-end data flow is:
that will fit entirely within a TCP segment. The framing protocol
in PDU alignment mode MUST fail any attempt to submit a PDU that is 0. Use of TUF is negotiated end-to-end by the ULP.
larger than will fit with an 8-byte framing header in a TCP
1. The ULP delivers to TUF a data stream in which ULPDUs are delimited.
2. TUF inserts a header and delivers the shimmed ULPDUs to TCP.
3. The TUF-aware TCP sender preserves boundaries of shimmed
ULPDUs (TUF FPDUs) as much as possible when delivering
segments to the IP layer.
4. The receiving TCP delivers shimmed ULPDUs to the receiving TUF
layer.
5. TUF removes the shim and delivers the ULPDUs to the ULP.
In other words, the layering of TUF is:
ULP client
^
|
| ULPDUs (in octet stream)
|
v
TUF
^
|
| FPDUs (containing ULPDUs)
|
v
TUF-conforming TCP
^
|
| TCP Segments (each containing an FPDU)
|
v
. . .
Note that while the semantics of this protocol layering must be
maintained, the receiving network interface may use the information
in the framed ULPDUs to place the data in memory on the host.
Whatever the case, the data is only delivered to the ULP when all
preceding TCP data has arrived.
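Steps 2 and 5 of the data flow above can be sketched as follows.
This is a minimal illustration only. The 8-octet header layout used
here (a 2-octet length followed by 6 reserved octets) is an
assumption made for the example and is not the FPDU format defined
in Section 4.1.1. The names frame_ulpdu and unframe_fpdu are
hypothetical.

```python
import struct

HEADER_LEN = 8  # assumed shim overhead in octets (illustrative only)


def frame_ulpdu(ulpdu: bytes) -> bytes:
    """Step 2: prepend a shim header to a ULPDU, yielding an FPDU."""
    if len(ulpdu) > 0xFFFF:
        raise ValueError("ULPDU too large for a 16-bit length field")
    # Hypothetical layout: 2-octet big-endian length + 6 zero octets.
    header = struct.pack("!H6x", len(ulpdu))
    return header + ulpdu


def unframe_fpdu(fpdu: bytes) -> bytes:
    """Step 5: strip the shim header and return the enclosed ULPDU."""
    if len(fpdu) < HEADER_LEN:
        raise ValueError("FPDU shorter than its header")
    (length,) = struct.unpack_from("!H", fpdu, 0)
    ulpdu = fpdu[HEADER_LEN:HEADER_LEN + length]
    if len(ulpdu) != length:
        raise ValueError("truncated FPDU")
    return ulpdu
```

Because each FPDU carries its own length, a receiver that sees an
FPDU at the start of a TCP segment can recover the ULPDU without
consulting any earlier segment.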
3. Rationale For TUF
This document defines the TUF protocol as a shim layer between an
Upper Layer Protocol (ULP) and TCP. TUF also depends on a TCP
segmentation convention between TUF/TCP endpoints specified in this
document. Taken together they provide the capability for a TUF/TCP
receiver to recognize ULPDUs by processing each TCP segment
independently, without requiring state from previous segments.
The purpose of TUF is to enable practical designs for enhanced
network interfaces (NICs) implementing direct data placement for
TCP-based ULPs. The purpose of direct data placement is to
eliminate the need for a host to copy received data after it
arrives in host memory. This copying incurs CPU, memory and bus
costs that are substantial and are not masked by advancing hardware
technology.
A general and practical solution to the receive copy problem has
eluded the IP networking community for almost two decades. There
is a long history of research and experimental schemes to reduce or
eliminate receiver copying overhead for IP networking in general,
and for TCP/IP communication in particular. While these systems
have convincingly demonstrated the potential performance benefits
of reducing copy costs, all such schemes suffer from one or more of
the following limitations: they require a significant restructuring
of operating system buffering and/or APIs; they are limited to
specific modes of communication (e.g., bulk data transfer) or
specific application ULPs; they do not scale on multiprocessor
hosts; their benefits depend on specific properties of the network
(e.g., large MTUs) or host buffer size and alignment. Moreover,
all such schemes require some degree of support from NICs to
separate payloads from headers and/or ensure that their placement
in host memory meets specific requirements (e.g., for page
placement and alignment).
Inherent copying costs for IP communication are one motivation to
use alternative non-IP technologies for high-speed networking. A
number of specialized technologies have been developed for high
speed data transfers in which network interfaces transfer data from
application buffer to application buffer without software touching
the data. Some examples include the VAXCluster Interconnect in
1983, Fibre Channel (FC) in 1994, and today InfiniBand (IB) and
Virtual Interface Architecture (VIA). These alternatives have
eroded the popularity of IP technologies in application domains
including network storage, video processing and delivery, and
cluster computing for scientific applications and scalable
database-related services.
Until recently, several factors have limited interest in promoting
IP networking as a solution in these application domains. First,
the competing network technologies offered significantly higher
link speeds than the network hardware available for use with IP.
Second, these application domains were a relatively small segment
of the network market. Recently, however, Ethernet networks have
closed the bandwidth gap and even exceeded the bandwidth of
alternatives such as Fibre Channel, at much lower cost. At the same
time, an increasing number of applications are server-hosted in
data centers to enable sharing and access from a growing number of
IP-connected client devices and locations. With the growth in
importance and number of data centers, high-speed interconnection
within the data center is now central to the everyday operation of
Internet services.
Thus, technology changes have created an opportunity and demand to
extend the benefits of IP technologies to high-performance
application domains, while simultaneously increasing the importance
of those domains. The ubiquity of IP offers economies of scale
heavily favoring IP in these domains. For example, reliance on
specialized non-IP technologies for high-performance domains
creates a need to support multiple protocols and redundant network
infrastructure in data centers, and it compromises portability and
interoperability of data center solutions. Moreover, comprehensive
support for network management and security is developing rapidly
in the IP space. Use of IP technologies would allow data centers
to benefit from these enhancements.
3.1. Direct Data Placement
Direct data placement is a key step toward making IP networking
competitive in data centers and other high-performance domains.
Direct data placement refers to the ability of a NIC to place data
directly from the network into designated application buffers,
without intermediate copying. Direct data placement is attractive
relative to other solutions to the receive copy problem. It is the
only solution that can be implemented in a way that is compatible
with existing operating systems, since the receiving NIC takes over
most of the responsibility to avoid receive copying. Also, direct
data placement generalizes easily to a range of ULPs. In
particular, it permits the establishment of an IETF standard for an
IP transport-based direct data placement protocol, which would
allow NICs to directly place data independent of the application
ULP using it.
The TUF protocol is necessary to permit easily deployable enhanced
NICs supporting direct data placement. Such NICs already exist and
their usage is growing rapidly, but their development is impeded by
the lack of standards. Direct data placement is unnecessarily
difficult and expensive to design and implement for existing TCP-
based ULPs; the key objective of TUF is to define transport
conventions to simplify the design of these NICs. A related
impediment is that in the absence of a general direct data
placement protocol these products are limited to specific ULPs such
as iSCSI. TUF, and possibly additional, higher layer protocol
definitions outside the scope of this document, would encourage the
market by ensuring interoperability of product offerings from
different vendors.
This document defines a framing protocol (TUF) and TCP segmentation
conventions that enable simple support of direct data placement for
a class of TCP-based ULPs. It does not propose a generic direct
placement ULP, such as an RDMA protocol, or any facility for direct
data placement, but only the foundations for building such a
facility on TCP. A key objective of TUF is to do this in a way
that is compatible with existing standards and with the spirit of
TCP's stream communication model. TUF can simplify support for
direct data placement for ULPs such as iSCSI, and it can serve as a
basis for a future RDMA proposal.
The key limitation of TUF as a solution to the receive copy problem
is that it works only if the ULP standard and the sending and
receiving implementations all support it. Impact on the sender and
ULPs is minimal, but ULPs must be adapted to allow use of TUF at
the ULP/transport boundary. The necessary modifications may be
quite small. Use of TUF is a negotiated option between the sender
and receiver for each ULP session, preserving interoperability
among senders and receivers that do not support TUF.
3.2. Direct Data Placement with TCP
Direct data placement is widely used to accomplish high-performance
data transfer in non-IP technologies such as block storage channels
(SCSI, Fibre Channel, etc.), and other specialized high performance
networks like InfiniBand. This section considers how direct
placement can be done with TCP.
The Internet Protocol suite provides two transports that are prime
candidates for use with direct data placement -- SCTP and TCP. The
framing features of the Stream Control Transmission Protocol
[SCTP] make it more directly adaptable for direct data placement
for future ULPs using SCTP. However, the maturity and ubiquity of
TCP make it desirable to define a flexible method for direct data
placement for TCP-based ULPs as well.
There has been a great deal of `moral confusion' concerning the
interaction of direct data placement with TCP's ordering
guarantees. These ordering guarantees do not prohibit direct data
placement, even if data is placed as it arrives out of order.
TCP guarantees data delivery to the application ULP as an ordered,
sequential stream [RFC793]. Data is delivered only when TCP has
notified the application of its arrival and transferred ownership
of the receive data buffer. TCP does not specify how received data
is stored prior to its delivery, and it does not preclude placement
of data in application buffers out of order, as long as no data is
delivered until all preceding data has also been delivered. Out-
of-order placement greatly simplifies direct data placement NICs
because it streamlines data paths and eliminates the need for a TCP
reassembly buffer on the NIC.
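The distinction between placement and delivery can be sketched as
follows. This is a simplified illustration, not a description of
any NIC design: the class name PlacingReceiver is hypothetical,
offsets are counted from the start of the stream, and sequence
number wraparound, overlapping segments and retransmissions are
ignored.

```python
class PlacingReceiver:
    """Places payloads as they arrive; delivers only in sequence order."""

    def __init__(self, buffer_size: int):
        self.buffer = bytearray(buffer_size)  # application-supplied buffer
        self.arrived = {}    # start offset -> end offset of placed data
        self.delivered = 0   # octets [0, delivered) have been delivered

    def on_segment(self, offset: int, payload: bytes) -> int:
        """Place payload at its offset; return octets newly delivered."""
        end = offset + len(payload)
        if end > len(self.buffer):
            raise ValueError("payload does not fit in the buffer")
        self.buffer[offset:end] = payload  # placement, in any order
        self.arrived[offset] = end
        before = self.delivered
        # Delivery advances only across a contiguous run of placed data.
        while self.delivered in self.arrived:
            self.delivered = self.arrived.pop(self.delivered)
        return self.delivered - before
```

A segment that arrives early is written to its final location at
once, but on_segment reports zero delivered octets until every
preceding octet has also been placed.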
An implementation performing direct data placement must still
respect all TCP delivery semantics. For example, if a checksum
integrity check fails, the data must not be placed in ULP-supplied
buffers, because fields such as the TCP ports and the TCP sequence
number are then not trustworthy.
3.2.1. The Simple Case: ULP-unaware Placement
Direct data placement into a ULP client-supplied buffer designated
to hold the next data delivered to the ULP, regardless of the
contents of the received data, is one of the simplest possible
forms of direct data placement. This form of direct data placement
is already fully supported by existing TCP mechanisms. NIC
products that are available now or will be soon, and which claim to
offer `full zero copy operation', typically provide only this
ULP-unaware form of direct data placement.
While ULP-unaware direct data placement works well for ULPs like
FTP where the entire contents of a TCP connection are known to be
nothing but a single stream of bulk client data, most widely used
ULPs, e.g. HTTP [HTTP], BEEP [BEEP] and storage protocols,
multiplex control and data, and possibly even interleave data from
different requests on the same TCP connection. The simple ULP-
unaware direct data placement is inadequate to avoid data copies
for these ULPs.
3.2.2. The Complex Case: ULP-aware Placement
An explicit goal of this proposal is to support out-of-order direct
data placement for ULPs that provide additional transport-like
features such as control and data multiplexing, layered above TCP
(e.g., iSCSI or a generic direct data placement protocol such as
RDMA). In many ULPs, such as storage protocols, control
information contained in the ULP uniquely identifies the
destination application buffer of each particular piece of data.
For example, suppose a client requests a read operation using a
network storage ULP, specifying the destination buffer for the
requested data. The requesting ULP includes control information in
the request (e.g., in the ULPDU header) uniquely identifying that
buffer, and the responder includes that information in the read
response. For some protocols, the identifier is a unique request
ID, allowing the client ULP to identify the buffer indirectly
through a table of pending requests. If the storage protocol uses
RDMA, the response may specify the buffer directly by means of a
region identifier.
A network interface that understands the relevant ULP control
information can use it to place the incoming data (e.g., read
response payload) directly in the correct buffer. In this case,
data placement is guided by ULPDU headers embedded in the TCP data
stream. The NIC accesses these headers as hints for placement of
the ULP payloads--a form of integrated layer processing for each
TCP segment as it arrives. This is compatible with TCP's ordering
properties if completion of ULP header processing and delivery of
the payload data to the application are strictly in order.
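The request-ID form of ULP-aware placement described above can be
sketched as follows. The table name pending and the functions
issue_read and place_response are hypothetical and stand in for
whatever a particular ULP and NIC would define. The sketch assumes
that each response carries the request ID and a byte offset within
the requested data.

```python
pending = {}  # request ID -> destination buffer (hypothetical table)


def issue_read(request_id: int, length: int) -> bytearray:
    """Record the buffer that will receive the response payload."""
    buf = bytearray(length)
    pending[request_id] = buf
    return buf


def place_response(request_id: int, offset: int, payload: bytes) -> None:
    """Place response payload directly into the buffer for request_id."""
    buf = pending[request_id]  # lookup guided by ULP control information
    end = offset + len(payload)
    if end > len(buf):
        raise ValueError("payload exceeds the requested length")
    buf[offset:end] = payload
```

Placement here depends only on the control information carried with
the payload, so responses may be placed in any order. Completion of
the request and delivery to the application must still follow TCP
ordering.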
3.2.3. The Problem of ULP-aware Placement with TCP
The problem with performing direct data placement as a function of
ULP control information in TCP is that it may be difficult to
locate the ULP control information (ULPDU headers) within a TCP
segment. segment.
The TCP maximum segment size (MSS) is defined in RFC 793 [TCP] as If all TCP segments are received in sequence order, ULP control
the segment size exchanged on TCP connection establishment. In information can be unambiguously located by the rules that permit
addition, there is the segment size presently used by TCP which is any ULP implementation to do so. For example, each ULPDU may
less than or equal to the exchanged MSS, adjusted by the current contain a length field that implicitly specifies the location of
path MTU [PathMTU]. This document calls the MSS presently in use the beginning of the subsequent ULPDU.
the `effective maximum segment size' (EMSS). The EMSS is of
primary concern to the framing protocol in PDU alignment mode.
The TCP EMSS can shrink to 8 octets [PathMTU] which leaves no room If TCP segments are not received in sequence order, without taking
for a PDU in PDU alignment mode. If the EMSS goes below 512 octets, additional measures, it may not be possible to unambiguously locate
the ULP MAY instruct the framing protocol to enter an "emergency ULP control information needed for direct data placement. For
mode." In this mode, the framing module MUST accept PDUs up to 512 example, if ULPDU length information is in a TCP segment that is
octets and MAY fragment a PDU across TCP segments. delayed or lost in transmission, assuming the ULPDU length is the
only means of locating the beginning of the subsequent ULPDU, it is
impossible to locate ULP control information for ULPDUs in
subsequent TCP segments until the lost or delayed TCP segment is
received. ULP control information, and the data whose placement
depends on it may even be in different TCP segments. If the ULP
control information is in a TCP segment that is delayed or lost, it
is impossible to directly place the data until the ULP control
information is received.
The EMSS may change during the course of the connection. The 3.2.4. Finding ULPDUs In Out-of-order Segments
framing module in PDU alignment mode MUST notify the ULP sender of
changes in the EMSS. The framing module in PDU alignment mode MUST
provide the current value of the path EMSS to the ULP on request.
When the framing protocol is in marker mode, each PDU submitted is Early attempts at ULP-aware direct data placement in TCP took the
limited to 2^16-8 minus the size of all interspersed markers. The approach of only directly placing data for TCP segments received
framing protocol in marker mode MUST fail any attempt to submit a in-order. Otherwise, data was copied through a reassembly buffer
PDU larger than this limit. The framing module MAY impose a as in a traditional implementation. Unfortunately, packet loss, with its
smaller, implementation specific size limit on PDUs. In order to attendant out-of-order reception, is a frequent, continuous
effectively bound the receiver's reassembly buffer size, the ULP characteristic of both wide-area and switched local area networks
SHOULD submit PDUs limited in size by some appropriate function of of almost any size, as TCP adjusts to varying congestion
the receiver's reassembly buffer resources, but no specific limit conditions. Under these conditions, a large portion of the data
is imposed by the framing protocol. transferred ends up being copied, rather than being directly
placed.
4. Negotiating Use Of The Framing Protocol Another solution to this problem is to build a reassembly buffer
into the network interface. Data received out-of-order can be held
in the network interface reassembly buffer until all preceding data
is received, and then direct placement can be performed on the
reassembled data. Within certain implementation assumptions, this
is a reasonable approach, but unfortunately there are a number of
issues including very large memory requirements, limited
scalability, and increased latency, that make the reassembly
approach undesirable.
Negotiating use of the framing protocol is the responsibility of The size of reassembly buffer needed in the network interface is a
the ULP. The use of the framing protocol MAY be negotiated direct function of the bandwidth * delay product of all active TCP
separately for each direction on a particular connection. The connections. Reasonable assumptions on the active bandwidth *
negotiation procedure MUST ensure that when receive framing is delay product can imply a large amount of reassembly memory.
enabled, the remote peer will not transmit the first TCP segment Furthermore, this large reassembly memory must run at high
with framed data until it is certain that the local peer has speed, more than two times the link speed, to maintain full link
actually enabled receive framing. bandwidth.
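As a rough illustration of the bandwidth * delay argument, the
following sketch computes how much memory such a reassembly buffer
would need. The link speed, the round-trip time, and the function
name are assumptions chosen for illustration; none of them come from
this document.

```python
def reassembly_buffer_octets(link_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth * delay product of the active traffic, in octets."""
    # bits/s * ms = millibits; divide by 1000 for bits, by 8 for octets.
    return link_bits_per_s * rtt_ms // 8000


# Hypothetical figures: a 10 Gbit/s link and a 100 ms round-trip time.
size = reassembly_buffer_octets(10_000_000_000, 100)
print(f"{size // 1_000_000} MB")  # prints "125 MB"
```

Under these assumed figures, a single fully used link already calls
for on the order of a hundred megabytes of fast memory in the
network interface.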
If a receiver requests PDU alignment mode, and the sender supports Finally, performing reassembly in the network interface requires
PDU alignment mode, then the sender MUST enable PDU alignment mode. that the bandwidth from the network interface to host memory be not
This ensures that PDU alignment mode, with its favorable hardware just equal, but substantially greater than the maximum bandwidth of
characteristics, is used when possible. the network link, to ensure that the reassembly buffer is drained
when reassembly is complete. System bus and interconnect bandwidth
are particularly scarce and expensive resources in most systems.
The specific negotiation mechanism for enabling the framing What is needed to permit ULP-aware direct data placement without
protocol and choosing the framing mode is outside the scope of this reassembly buffering is a way to ensure that the ULP control
document. However, note that framing protocol behavior is information and the data associated with it are highly likely to be
requested by the receiver and offered by the sender. Negotiation contained completely within a single TCP segment, and a way for a
will probably include exchange of: receiver to validate this containment property on TCP segments it
receives. If the receiver can determine that a ULPDU starts at the
beginning of a TCP segment, the receiver can perform ULP-aware
direct placement for that ULPDU, and subsequent ULPDUs contained in
that TCP segment. The property that a ULPDU is completely
contained within a TCP segment is called the `ULPDU containment
property'.
1. the receiver's desired mode(s) 3.2.5. The TUF Solution
2. the sender's framing key if PDU alignment mode is selected The TUF protocol defines a shim layer above TCP and below the ULP
that allows the receiver to validate the ULPDU containment property
for each TCP segment received, independently of any other TCP
segment. The TUF protocol also defines a segmentation behavior for
the TCP sender that ensures the ULPDU containment property holds as
often as possible while still respecting the protocol requirements
for TCP senders.
2. ULP packing behavior if PDU alignment mode is selected The TUF-specified TCP segmentation behavior ensures that the ULPDU
containment property is maintained as long as the receiver window
size is at least equal to the effective MSS (EMSS), the path MTU
(PMTU) does not change, and the TCP stream is not resegmented by an
intermediary. In conditions where the TCP receiver window size is
smaller than EMSS, or the PMTU changes, the segmentation behavior
further ensures that once the relevant condition is restored, the
ULPDU containment property will be satisfied again.
3. the receiver's desired marker period if marker mode is For the high-performance applications that this protocol targets,
selected small receiver window sizes and PMTU changes are rare transients.
Thus, the specified protocol ensures that ULP control information
and its associated data are virtually always together in a single
TCP segment.
4. the receiver's desired maximum PDU size if marker mode is 3.2.6. TUF's ULP Assumptions
selected
5. PDU Alignment Mode A key assumption of TUF is that ULPs running on TUF can adjust
ULPDU sizes to fit completely within an EMSS-sized TCP segment.
Clearly, if a ULPDU does not fit within an EMSS-sized TCP segment,
the ULPDU containment property cannot be satisfied. Most storage
protocols (e.g., iSCSI) and other performance-targeted protocols
(e.g., RDMA protocols) support this capability. ULPs that cannot
adjust ULPDU sizes to fit within an EMSS-sized TCP segment, but
still want the performance advantages of direct data placement, can
be mapped on top of an intermediate protocol (e.g. an RDMA
protocol) that does support this data `chunking'.
The framing protocol in PDU alignment mode sends one or more TUF does not change the stream delivery semantics of TCP to the
complete ULP PDUs preceded by a framing header. This framing ULP, through the TUF implementation. It merely inserts a shim
header and set of ULP PDUs is called a `framing PDU'. The framing header that can be used by direct placement network interfaces to
protocol in PDU alignment mode is supported by a framing-aware TCP verify the ULPDU containment property. The shim header is inserted
whose behavior is described in `Framing-Aware TCP', below. by the sending TUF implementation and removed by the receiving TUF
implementation, leaving a stream to be delivered to the ULP.
The format of a framing PDU is as follows: 4. The Protocol
This section defines the TUF protocol itself. The first two
sections are the core of the protocol defining:
o the shim layer PDUs, called FPDUs,
o a TCP-conforming segmentation behavior which ensures the ULPDU
containment property holds under most conditions.
The remaining sections cover other aspects of the protocol which
are primarily implications of the core protocol:
o what ULP-specified negotiations to enable TUF must accomplish,
o how receivers can process received TCP segments to establish
whether the ULPDU containment property holds.
4.1. The Framing Protocol Data Unit (FPDU)
TUF sends groups of one or more complete ULPDUs in a framing
protocol data unit (FPDU).
4.1.1. FPDU Format
The format of an FPDU is:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Length | Key | | Length | Key |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Key | | Key |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | |
| | | |
~ ~ ~ ~
~ ULP PDUs ~ ~ ULPDUs ~
| | | |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ULP PDUs | | ULPDUs |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The "Length" field is 16 bits and contains the length in octets of Length: 16 bits (unsigned integer)
the set of framed ULP PDUs, excluding the framing header. This is the length in octets of the set of framed ULPDUs. It
does not include the length of the FPDU header itself.
The "Key" field is 48 bits and is selected at random by the sender, Key: 48 bits (unsigned integer)
and signalled to the receiver in a ULP-specified way. All framing
PDUs sent on the same connection in the same direction must use the
same key value. A good quality random number generator MUST be
used to generate the initial key. RFC 1750 discusses relevant
characteristics and provides references for good quality random
number generation [RFC1750].
The length of the framing PDU in octets will be 8 + L, where L is This is used by the receiver to validate the ULPDU containment
the length of the set of framed ULP PDUs. property. It is selected at random by the sender, and
initially signaled to the receiver in a ULP-specified way,
before the receiver attempts to test the ULPDU containment
property. All FPDUs sent on the same connection in the same
direction must use the same key value. A good quality random
number generator MUST be used to generate the initial key.
RFC 1750 discusses relevant characteristics and provides
references for good quality random number generation
[RFC1750].
Whether more than one ULP PDU may be packed into a single framing The length of an FPDU is 8 + L octets, where L is the length of the
PDU is a controllable option of the framing module in PDU alignment set of framed ULPDUs. The 16-bit length field is sufficient to
mode. Some receivers may choose to expect exactly one ULP PDU per permit a TCP segment with an FPDU to completely fill a maximum-size
TCP segment when framing is behaving nominally. The sender MUST IPv4 or IPv6 datagram.
NOT pack more than one ULP PDU into a framing PDU if this behavior
is desired by the receiver. ULP packing behavior may be negotiated
or specified a priori by the ULP.
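The FPDU layout above (16-bit Length, 48-bit Key, then the framed
ULPDUs) can be sketched as follows. This is a minimal illustration,
not a reference encoder: the function name is hypothetical, and
network (big-endian) byte order for both header fields is an
assumption, since the text above does not state the byte order.

```python
FPDU_HEADER_LEN = 8  # 2 octets of Length + 6 octets of Key


def build_fpdu(key, ulpdus):
    """Frame a list of complete ULPDUs (bytes objects) as one FPDU."""
    payload = b"".join(ulpdus)
    if not 0 <= key < (1 << 48):
        raise ValueError("key must fit in 48 bits")
    if len(payload) > 0xFFFF:
        raise ValueError("framed ULPDUs exceed the 16-bit Length field")
    # Length counts only the framed ULPDUs, not the 8-octet header.
    # Big-endian encoding is assumed for illustration.
    header = len(payload).to_bytes(2, "big") + key.to_bytes(6, "big")
    return header + payload


fpdu = build_fpdu(0x0123456789AB, [b"abc", b"de"])
print(len(fpdu))  # 8 + L, with L = 5 here
```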
5.1. Framing-aware TCP 4.1.2. FPDU Size Selection
A framing-aware TCP SHALL send one complete framing PDU per TCP Each FPDU SHOULD contain as many contiguous, complete ULPDUs as
segment whenever possible. Cases when it may not be possible to will fit within the current EMSS, unless ULPDU packing is disabled.
send a complete framing PDU in each TCP segment are described in If ULPDU packing is disabled each FPDU SHALL contain a single
`PDU Alignment Mode Exception Cases', below. ULPDU. ULPDU packing mode may be negotiated, or specified a priori
by a ULP. Disabling ULPDU packing is analogous to disabling the
Nagle algorithm in TCP.
A framing-aware TCP MUST NOT send any TCP segment containing octets TUF SHALL present the size of the largest ULPDU fitting in an
from more than one sending operation. In other words, the boundary EMSS-sized FPDU (MULPDU) to the ULP. MULPDU is EMSS - the FPDU
between data of consecutive sending operations MUST occur between header size (8 octets). ULPs SHOULD submit as large ULPDUs as
TCP segments. By following this rule, the sender guarantees that possible to TUF, up to MULPDU, subject to limits imposed by
in the event an exception causes PDU alignment to be lost specific ULP properties. The ULP MAY also choose to pack several
temporarily, it will be regained as soon as possible. ULPDUs into an EMSS-sized unit before submitting them as one ULPDU
to TUF. Depending upon the ULP, ULP packing may improve data
transfer efficiency, and is unlikely to have any detrimental
effect.
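The MULPDU computation and the packing rule can be sketched as
follows. The function names and the greedy grouping strategy are
assumptions made for illustration; the text only requires that an
FPDU carry as many contiguous, complete ULPDUs as fit within the
current EMSS, or exactly one ULPDU when packing is disabled.

```python
FPDU_HEADER_LEN = 8


def mulpdu(emss):
    """Largest ULPDU that fits in an EMSS-sized FPDU."""
    return emss - FPDU_HEADER_LEN


def group_ulpdus(ulpdus, emss, packing=True):
    """Split an ordered list of ULPDUs into per-FPDU groups."""
    limit = mulpdu(emss)
    groups, current, size = [], [], 0
    for ulpdu in ulpdus:
        # Close the current group if packing is off or the next
        # ULPDU would push the FPDU past EMSS.
        if current and (not packing or size + len(ulpdu) > limit):
            groups.append(current)
            current, size = [], 0
        current.append(ulpdu)
        size += len(ulpdu)
    if current:
        groups.append(current)
    return groups


# Hypothetical EMSS of 1460 octets gives a MULPDU of 1452 octets.
sizes = [1000, 400, 100, 1452]
groups = group_ulpdus([bytes(n) for n in sizes], emss=1460)
print([[len(u) for u in g] for g in groups])
```

Note that in this sketch a ULPDU larger than MULPDU simply ends up
alone in its own group.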
The use of oversize TCP segments sent by means of IP fragmentation A TUF implementation probing for PMTU increase SHOULD present an
is discouraged due to the limited size of the IP header increased MULPDU value to the ULP until a large enough FPDU to
Identification field and the potential for undetected errors due to perform the probe results.
wrapping of the Identification value. Framing-aware TCP
implementations SHOULD resegment at the TCP layer according to the
rule given in the previous paragraph when necessary to meet
requirements of the current maximum segment size for a path. In
this document, EMSS means the current TCP maximum segment size used
for sending segments on a connection, which is initially negotiated
during the connection handshake, and subsequently adjusted by path
maximum transfer unit (PMTU) discovery behavior [PathMTU].
A framing-aware TCP must notify the framing module of changes in Under exceptional circumstances, the EMSS can become too small to
the EMSS. The framing module must be able to retrieve the EMSS accommodate even a single ULPDU. For example, a ULP may define
from the framing-aware TCP. fixed-sized PDUs that are incompressible, or variable size PDUs
with some absolute minimum size, such as the size of a data PDU
containing a minimum amount of data. It is possible for the EMSS
to shrink to as small as 8 octets [PathMTU]. If the EMSS is too
small to accommodate an incompressible ULPDU, the FPDU MUST contain
only that ULPDU. ULPs using TUF SHOULD NOT define ULPDUs with a
minimum size greater than 128 octets.
If the framing-aware TCP chooses to probe for path MTU increase 4.2. TUF-conforming TCP Sender Segmentation
using a TCP segment larger than the path MTU, the framing-aware TCP
MUST report an appropriate EMSS increase. The candidate path MTU
will only be probed when the framing protocol submits a framing PDU
larger than the current EMSS. Immediately following the probing
segment, the framing-aware TCP MUST reduce EMSS to its previous
value until the candidate path MTU is confirmed.
Probing for path MTU increase is optional [PathMTU], and a framing- TCP senders are allowed substantial freedom in the choice of how to
aware TCP might elect not to do so unless the EMSS becomes segment an outgoing TCP stream. Within the confines of the
`inconveniently' small. By not probing for path MTU increase when receiver-advertised receive window, and the sender computed
the current EMSS provides adequate performance, the framing congestion window, any segmentation is permitted. Virtually all
protocol will not send the potentially unaligned PDUs that would be TCP implementations do attempt to segment outgoing TCP streams into
used to probe path MTU. EMSS-sized segments where possible because it improves performance.
Although framing-aware TCP is defined specifically to support the TUF-conforming TCP sender behavior ensures that the ULPDU
framing protocol in ULP alignment mode, it may be used by other containment property holds most of the time. To do this, a TUF-
clients, assuming framing validation is provided by some means. conforming TCP sender MUST respect a single additional rule in
For example, as discussed below in `Security Considerations', a performing segmentation:
framing-aware TLS could use a framing-aware TCP directly without
adding framing PDU headers, because TLS validation can serve the
same purpose, and actually provides stronger framing validation
guarantees than a framing PDU header.
5.2. PDU Alignment Mode Exception Cases A TUF-conforming TCP sender MUST segment the outgoing TCP
stream such that the first octet of every FPDU is sent at the
beginning of a TCP segment.
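The segmentation rule can be illustrated with a deliberately
simplified sketch. It is an assumption-laden model, not a TCP
implementation: it ignores the receive and congestion windows,
retransmission, and PMTU changes, and the function name is
hypothetical. It shows only that every FPDU starts a new segment,
and that an FPDU larger than EMSS spills into following segments.

```python
def segment_fpdus(fpdus, emss):
    """Return TCP segment payloads for an ordered list of FPDUs."""
    segments = []
    for fpdu in fpdus:
        # Never append to the previous segment: the first octet of
        # each FPDU must be the first octet of a segment.
        for offset in range(0, len(fpdu), emss):
            segments.append(fpdu[offset:offset + emss])
    return segments


segments = segment_fpdus([b"a" * 10, b"b" * 25], emss=20)
print([len(s) for s in segments])  # prints [10, 20, 5]
```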
Although the framing-aware TCP sender should place exactly one 4.3. Negotiating TUF
framing PDU in each TCP segment there are exceptions when this is
not possible. These exceptions include the following.
1. The connection is in emergency mode and EMSS is less than 512 Negotiating the use of TUF is the responsibility of the ULP. The
octets. use of TUF MAY be negotiated separately for each direction on a
connection. The negotiation procedure MUST ensure that when TUF is
enabled or disabled, the remote peer will not transmit its first
TCP segment in the new mode until it is certain that the local peer
has actually enabled or disabled TUF.
2. The EMSS has been reduced. This will result in a window TUF operation is characteristically requested by the receiver and
during which the ULP is not yet aware of the reduced EMSS. offered by the sender. Before enabling TUF, the relevant
Since some framing PDUs may already have been sent and parameters:
possibly lost prior to being received, the same framing PDUs
must be resent, if necessary, but in smaller TCP segments
which conform to the new EMSS.
3. The remote end is advertising a window smaller than the EMSS. 1. the sender's 48-bit key
If both ends manage their window as required in RFC-1122
[RFC1122], and a reasonable amount of receive buffering is
available, this case should not occur, but the sender, for
robustness, must tolerate this.
4. The sender is probing an advertised window of zero. 2. ULPDU packing mode
5. The sender is probing to determine if the path MTU can be MUST be established at each peer.
increased.
In addition, there is another case in which the receiver will A natural way to enable the use of TUF is a ULP-defined negotiation
receive framing PDUs which are not aligned with TCP segments. exchange of the TUF parameters culminating in enabling TUF, if
requested, for each transfer direction. A three-way handshake
protocol can be used to ensure that the point at which TUF is
enabled is unambiguous and each end has time to perform local state
changes. A connection on which TUF is enabled is likely to be the
same connection on which the negotiation occurs, but this is not
required. A new connection could also use TUF from its initial
establishment, if the TUF parameters and modes are known through
some out-of-band mechanism.
6. There is a middle-box in the connection which is resegmenting Use of TUF could be disabled during a connection using a similar
the TCP data stream. ULP-defined three-way handshake.
If the framing protocol in PDU alignment mode must send an Other alternatives to parameter exchange include stipulating some
unaligned framing PDU, it SHALL take one of the following actions. parameters a priori. For example, a ULP could specify that TUF
with ULPDU packing enabled is always used in both directions. In
this case, only the 48-bit keys need to be exchanged before TUF is
enabled. Or, a ULP could determine TUF characteristics on the
basis of the TCP port number.
1. Send the framing PDU as a single TCP segment using IP 4.4. TUF Receiver ULPDU Containment Property Testing
fragmentation. While this behavior is discouraged, it is not
prohibited by the framing protocol, or any other applicable
RFCs.
2. Send the framing PDU as several TCP segments, with each A TUF receiver that wishes to use ULP control information to
segment guaranteed not to appear as a well-formed, complete perform direct data placement must first verify the ULPDU
framing PDU on its own, at the time the segment is sent. That containment property. To do this, the receiver MUST establish that
is, the sender SHALL ensure that one of the following is true the TCP segment contains exactly one FPDU. Abstractly, this can be
for every segment with a partial framing PDU: done by assuming the TCP segment payload begins with an FPDU, and
verifying the following properties of that putative FPDU:
A. octets 0-1 do not equal the segment length minus 8 o The received TCP segment payload length equals the FPDU length
plus the length of the FPDU header (8 octets).
B. octets 2-8 do not match the framing key value o The 48-bit key equals the value signaled to the receiver when
TUF was enabled for the connection.
C. the total segment length is less than the framing PDU If these conditions are true, the TUF receiver MAY assume that the
header of 8 octets ULPDU containment property holds, and use ULP control information
to directly place data in the contained ULPDUs.
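The two conditions above can be sketched as a single check on a
received TCP segment payload. The function name is hypothetical, and
big-endian encoding of the Length and Key fields is an assumption
made for illustration.

```python
FPDU_HEADER_LEN = 8  # 16-bit Length + 48-bit Key


def containment_holds(segment_payload, expected_key):
    """Test whether a segment payload is exactly one putative FPDU."""
    if len(segment_payload) < FPDU_HEADER_LEN:
        return False  # too short to hold an FPDU header
    length = int.from_bytes(segment_payload[0:2], "big")
    key = int.from_bytes(segment_payload[2:8], "big")
    # Payload length must equal FPDU length plus the header, and the
    # key must match the value signaled when TUF was enabled.
    return (len(segment_payload) == length + FPDU_HEADER_LEN
            and key == expected_key)


key = 0x0123456789AB
good = (5).to_bytes(2, "big") + key.to_bytes(6, "big") + b"abcde"
print(containment_holds(good, key))       # prints True
print(containment_holds(good[:-1], key))  # prints False
```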
These mechanisms ensure that the receiver will not falsely TUF DOES NOT provide any information that a TUF receiver can use to
misinterpret any piece of a framing PDU sent in several segments as locate ULP control information beyond the ULPDU containment
a complete, valid framing PDU. However if the TCP data stream is property. In particular, a TUF receiver MUST NOT scan TCP segments
subjected to resegmenting by a middle-box, the sender may no longer in an attempt to locate FPDUs that do not begin at the beginning of
control segmentation of received data. In this case the framing a TCP segment. However, even if the ULPDU containment property
protocol must rely on probability to ensure that segments of the does not hold, a TUF receiver may still be able to reliably locate
resegmented data stream will not appear as valid, complete framing and use ULP control information. For example, if a received TCP
PDUs, if they are not. segment contains the next unreceived data in the TCP stream, the
location of ULPDUs in that segment is unambiguous. The behavior
of a TUF receiver acting on ULP control information located with
properties other than the ULPDU containment property is not
specified here.
In the case where the receiver detects a continuous stream of TCP 5. Protocol Characteristics
segments which do not contain complete framing PDUs, the ULP SHOULD
disable use of the framing protocol, or switch to marker mode if
the ULP provides a means of doing this, and the end points so
choose. Such a continuous stream of improperly framed TCP segments
implies the presence of a resegmenting middle-box. Such a
detection process SHOULD NOT mistake a temporary sequence of
improperly framed TCP segments resulting from an EMSS change with
the presence of a resegmenting middle-box.
5.3. Validity Of Framing-aware TCP Segmentation This section discusses some characteristics and behavior which are
implications of the TUF protocol.
A framing-aware TCP normally sends exactly one framing PDU per TCP 5.1. Properties Of TUF-conforming TCP Senders
segment. This may therefore result in more segments being sent
than would occur in a traditional TCP. However, the framing module
is allowed to pack multiple ULP PDUs into a single framing PDU if
ULP packing is enabled, which will give behavior approaching that
of a traditional TCP. Even with ULP packing disabled, the behavior
of a framing-aware TCP effectively corresponds to that of a
traditional TCP sender with the Nagle algorithm disabled (i.e.
TCP_NODELAY), and this is considered acceptable behavior.
Framing-aware TCPs still respect congestion control windows, which The general practice of TCP senders to send as much data as
are maintained as an octet count, not as a segment count. possible within a TCP segment (up to EMSS) implies that an FPDU
whose size is less than or equal to EMSS, and whose first octet
begins a TCP segment will be sent entirely within a single TCP
segment. This ensures the ULPDU containment property for that TCP
segment.
On retransmission, a framing-aware TCP respects the original stream A TUF-conforming TCP sender still obeys all requirements of TCP.
segmentation. This is allowed by RFC1122 [RFC1122], section While the segmentation of a TUF-conforming TCP sender will have
4.2.2.15. distinctive characteristics when viewed from the network wire, the
same segmentation behavior could also result from a stock TCP
sender.
5.4. Receiving In PDU Alignment Mode The one property of a TUF-conforming TCP sender which arguably
departs from traditional expectations is that a TUF-conforming TCP
sender may not produce TCP segments which are as close in size to
EMSS as a stock TCP sender. The need to ensure the ULPDU
containment property may result in TCP segments which are not as
full as if the property did not need to hold. While this is
abstractly true, in practice, several characteristics combine to
minimize this effect. Specifically:
Because each framing PDU contains sufficient information to o Packing ULPDUs into FPDUs gives behavior similar to that of
determine its length, the beginning of the next framing PDU can be stock TCP segmentation, albeit with coarser granularity.
determined. Therefore each successive PDU can be recovered.
Conventional TCP implementations will pass received data to the ULP o ULPs which benefit from data-dependent direct data placement
in order, so framing is easily recovered by the ULP. (candidates for TUF) usually transfer large amounts of data in
bulk. This means that most ULPDUs are data-carrying, and will
be EMSS-sized. Even when control is interleaved with data,
the combination of a small number of control ULPDUs with a
data ULPDU can be packed to fill an EMSS-sized segment.
Special receive implementations which exploit PDU alignment mode, Therefore, a TUF-conforming TCP sender seems likely to behave
typically found in direct placement network interfaces, may allow similarly to a stock TCP sender under most circumstances. However,
the ULP to do direct data placement on TCP segments received out of applications that both send and receive data over the same TCP
order. The receiving end can safely assume that a framing PDU is connection, where there might be dependencies between incoming and
exactly contained within TCP segment payload if the following outgoing data, are often subject to excessive delays attributable
conditions are met. to TCP's Nagle algorithm and/or delayed-ACK algorithm [NagleDAck].
These algorithms generally perform best when TCP always sends full-
EMSS segments. Because TUF can generate sub-EMSS segments as a by-
product of aligning FPDU boundaries with TCP segment boundaries,
TUF might be especially vulnerable to the known problems with the
Nagle and/or delayed-ACK algorithms.
1. Standard TCP processing indicates that this is a valid, in- Further work, including implementation experience with TUF, as well
window segment. as existing and future proposals for improvements to the Nagle
and/or delayed-ACK algorithms, might be necessary to optimize TUF
performance while fully preserving the congestion-avoidance
features of TCP. This work is currently outside the scope of this
document.
2. The payload of the TCP segment, parsed as a framing PDU, has a 5.2. Exception Cases
length field which equals the TCP segment length minus 8, and
a key field which matches the expected key for the framing
protocol connection.
The framing protocol passes the contained ULP PDUs to a ULP parser. The complete operational specification of TUF is contained in the
The ULP parser performs direct placement for the PDUs. The ULP rules for forming FPDUs, and sending those FPDUs in TCP segments.
parser MUST NOT execute the ULP protocol (i.e. none of the ULP However, the operation of TUF will be subject to a variety of
protocol state variables change), until all preceding octets in the transient or exceptional conditions. The behavior of TUF under
TCP stream have also been received. those conditions is discussed below to illustrate specifically how
TUF addresses them.
6. Marker Mode 5.2.1. Resegmenting Intermediaries
The framing protocol in marker mode inserts framing markers in the Resegmenting TCP-layer intermediaries (middleboxes) are one of the
TCP octet stream at a period agreed upon by the framing protocol most formidable obstacles to maintaining the ULPDU containment
sender and receiver. Each framing marker points to the next PDU in property. In the presence of such an intermediary, the
the TCP octet stream. Marker insertion in the TCP octet stream is segmentation chosen by the sender may not be the segmentation at
not synchronized in any way with the ULP. The ULP may use PDUs of the receiver. While such intermediaries may or may not be common
any size up to 2^16-8-(4 * # of markers inserted) (determined by in particular networks, in many cases the presence or absence of
marker interval). Markers will be inserted in the resulting octet such resegmenting behavior is beyond the control or even knowledge
stream, possibly interrupting PDUs, as necessary to maintain the of the end points using TUF. Therefore, TUF must detect such
interval. Although the placement of each marker is not a function resegmentation by design.
of the ULP PDU boundaries, the contents of each marker are.
The format of a framing marker is as follows: A primary reason for the presence of a random key in the FPDU
header is to detect such resegmentation. An alternative to the
random key which has been proposed, is to use ULP-specific
validation criteria to determine the ULPDU containment property.
For example, some ULP PDUs include relatively strong data integrity
checks such as CRCs, and other ULP control information can often be
validated against various ULP-specific criteria.
0 1 2 3 While such ULP-specific validation criteria may involve checking
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 many more bits than the combination of the FPDU's 16-bit length and
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 48-bit key, ULP-specific validation criteria may not actually offer
| Next PDU Offset | Next PDU Offset | a strong guarantee of the ULPDU containment property. For certain
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ data streams, the probability of a false-positive indication of the
ULPDU containment property can be extremely high.
The "Next PDU Offset" contains the offset to the next PDU, in Assume that the intermediary resegments to a granularity of no
octets, from the end of the marker. finer than G octets (e.g. 4). Also assume that the TCP data stream
contains predominantly application data. If the ULP is a storage
protocol, simply transferring a file containing a continuous,
repeated stream of well-formed ULPDUs which are some multiple of G
in size increases the probability of a false-positive indication of
the ULPDU containment property to approximately:
The "Next PDU Offset" occurs twice in the marker to guarantee that 1 / (sizeof(repeated ULPDU)/G)
when a marker is split across TCP segments, a complete copy of Next
PDU Offset occurs in at least one of the two TCP segments.
The framing protocol receiver must remove (or otherwise ignore) the If the well-formed ULPDUs are relatively small (e.g. 32 octets
periodic markers in the received TCP octet stream to reconstruct where G=4 octets), the probability of a false-positive indication
the PDUs from the sender. of the ULPDU containment property is approximately 1/8, for EACH
TCP segment which does not actually begin with a ULPDU. Clearly,
in this case, it would take only a very small number of TCP
segments which do not begin with an actual ULPDU before the `fake'
ULPDU in the application data is interpreted as an actual ULPDU.
The consequences of such a false-positive interpretation could be
dire, for example executing a destructive operation request.
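The estimate above can be worked through numerically. The following is a minimal sketch, not part of TUF itself: the function names are hypothetical, and the multi-segment figure additionally assumes that each misaligned TCP segment is an independent trial, which the text does not state.

```c
#include <assert.h>

/* Approximate per-segment false-positive probability from the text:
 * 1 / (sizeof(repeated ULPDU) / G), which equals G / ulpdu_size. */
double tuf_false_positive_per_segment(unsigned ulpdu_size, unsigned g)
{
    /* The text assumes the ULPDU size is a multiple of G. */
    assert(g > 0 && ulpdu_size >= g && ulpdu_size % g == 0);
    return (double)g / (double)ulpdu_size;
}

/* Probability of at least one false positive among n misaligned
 * segments, ASSUMING the segments are independent: 1 - (1 - p)^n. */
double tuf_false_positive_within(unsigned ulpdu_size, unsigned g, unsigned n)
{
    double p = tuf_false_positive_per_segment(ulpdu_size, g);
    double miss = 1.0;
    unsigned i;

    for (i = 0; i < n; i++)
        miss *= 1.0 - p;
    return 1.0 - miss;
}
```

With 32-octet ULPDUs and G = 4 the per-segment value is 1/8, and under the independence assumption roughly two thirds of all runs of eight misaligned segments would contain at least one false positive.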
The first marker SHALL be sent in the TCP octet stream preceding The 48-bit random key in the FPDU results in a low probability of a
any framed PDUs. This first marker will, necessarily, have a Next false-positive indication of the ULPDU containment property because
PDU Offset of 0. The first marker corresponds to the point in the it is effectively secret with respect to the application data
TCP octet stream when the framing protocol is enabled. stream.
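As an illustration of how a receiver might test a TCP segment for the ULPDU containment property using the 16-bit length and 48-bit key, consider the sketch below. The header layout is an assumption made only for this example: length first, in network byte order, counting the octets that follow the 8-octet header, then the key. The function name is likewise hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TUF_HDR_LEN 8   /* 16-bit length + 48-bit key */
#define TUF_KEY_LEN 6

/* Returns 1 if the segment payload begins with an FPDU header whose key
 * matches the connection key and whose length field accounts for exactly
 * the rest of the payload; returns 0 otherwise.
 * ASSUMED layout: octets 0-1 = length (big-endian, excludes the header),
 * octets 2-7 = key. */
int tuf_segment_contains_fpdu(const uint8_t *payload, size_t payload_len,
                              const uint8_t key[TUF_KEY_LEN])
{
    size_t fpdu_len;

    if (payload_len < TUF_HDR_LEN)
        return 0;               /* too short to hold a header */
    if (memcmp(payload + 2, key, TUF_KEY_LEN) != 0)
        return 0;               /* key mismatch */
    fpdu_len = ((size_t)payload[0] << 8) | (size_t)payload[1];
    return fpdu_len == payload_len - TUF_HDR_LEN;
}
```

Because application data cannot know the per-connection key, a segment that starts in the middle of an FPDU passes this test only if 48 key bits and the length happen to match.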
7. Security Considerations Note that although this analysis may appear to be security-minded,
prompting the image of a sighted third-party adversary that can
`sniff' the 48-bit key, it is actually considering a safety, rather
than a security property. The security properties of TUF are
discussed in Section 6 (`Security Considerations') below.
7.1. Security Protocol Interactions Even though TUF can detect the presence of a resegmenting
intermediary, such an intermediary will almost certainly
substantially reduce the chance of the ULPDU containment property
being satisfied. A TUF implementation which detects a very low
incidence of the ULPDU containment property for a sustained
interval (>> RTT) may assume that a resegmenting intermediary is in
operation and SHOULD discontinue the use of ULP control information
found using the ULPDU containment property. In such cases, the ULP
MAY elect to disable the use of TUF altogether, or simply stop
exploiting the ULPDU containment property.
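One possible shape for such a detector is sketched below. It is only an illustration: the structure, the function names, and the use of a fixed count of segments as a stand-in for "a sustained interval (>> RTT)" are all assumptions, and a real implementation would choose the interval and threshold to suit its environment.

```c
#include <assert.h>

/* Hypothetical bookkeeping for spotting a resegmenting intermediary. */
struct tuf_reseg_detector {
    unsigned window;     /* segments per observation interval (assumed) */
    unsigned min_hits;   /* fewest contained segments tolerated */
    unsigned seen;       /* segments observed in the current interval */
    unsigned hits;       /* of those, how many satisfied containment */
    int      suspended;  /* nonzero once containment use is abandoned */
};

void tuf_detector_init(struct tuf_reseg_detector *d,
                       unsigned window, unsigned min_hits)
{
    assert(window > 0);
    d->window = window;
    d->min_hits = min_hits;
    d->seen = 0;
    d->hits = 0;
    d->suspended = 0;
}

/* Record one received segment. Returns nonzero once the incidence of
 * the containment property has been too low for a whole interval. */
int tuf_detector_observe(struct tuf_reseg_detector *d, int contained)
{
    if (d->suspended)
        return 1;
    d->seen++;
    if (contained)
        d->hits++;
    if (d->seen == d->window) {
        if (d->hits < d->min_hits)
            d->suspended = 1;
        d->seen = 0;
        d->hits = 0;
    }
    return d->suspended;
}
```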
The ULP framing protocol may be layered on top of IPSec, or TLS. A 5.2.2. PMTU Reduction
direct placement network interface which supports connections
secured with IPSec or TLS must directly implement security protocol
processing as well as framing and direct placement support.
7.2. Using IPSec With The Framing Protocol When a PMTU reduction is detected by a TUF-compliant TCP, the TUF-
compliant TCP sender may send FPDUs already committed to the TCP
layer in one of two ways:
Since IPSec is designed to secure arbitrary IP packet streams, o send unsegmented FPDUs in TCP segments of the old EMSS size,
including streams where packets are lost, the framing protocol and rely on IP fragmentation to deliver the segments,
could run cleanly on top of IPSec without any change. o segment FPDUs to fit in TCP segments which respect the new
EMSS size.
Using IPSec end-to-end with the framing protocol in PDU alignment Stock TCPs face a similar choice on PMTU change, and both
mode permits an optimization to the framing protocol. Because alternatives are used in practice.
IPSec validation criteria guarantee that IP packets received are
equivalent to the IP packets sent, it is not possible for an
intermediary to resegment the TCP stream. If IP fragmentation
(rather than resegmenting) is used to send committed data when the
EMSS changes, the framing PDU validation header is not needed. In
this case, a ULP may run directly on top of a framing-aware TCP.
7.3. Using TLS With The Framing Protocol In the case that a TUF-compliant TCP chooses to segment FPDUs, it
SHOULD segment them in such a way that, in the absence of
resegmentation by an intermediary, the segments are guaranteed not
to give a false-positive indication of the ULPDU containment
property. There are various ways to ensure this. For example, no
matter how the FPDU is segmented, the first segment is guaranteed
not to give a false-positive indication of the ULPDU containment
property---the 48-bit key will match, but the length will not. In
the worst possible case, each subsequent TCP segment could be sent
with fewer than 8 octets of data, also guaranteed not to give a
false-positive indication of the ULPDU containment property. More
efficient approaches are possible, but PMTU reduction is a rare
event, and reacting to it is only a transient condition.
Eventually a new MULPDU will be presented to the ULP, and FPDUs
that fit in the new EMSS will result. During the transient
condition, performance will suffer temporarily no matter how FPDUs
are segmented.
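The cost of the worst-case approach described above can be estimated with a short calculation. The sketch below assumes, for illustration only, that the first segment carries a full new EMSS and that every later segment carries 7 octets (the largest payload that is still "fewer than 8 octets"); the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of TCP segments needed to send one committed FPDU so that no
 * segment can give a false-positive containment indication:
 *  - the first segment carries new_emss octets (key matches, length
 *    does not), and
 *  - each later segment carries at most 7 octets, too few to hold an
 *    8-octet FPDU header. */
size_t tuf_worst_case_segments(size_t fpdu_len, size_t new_emss)
{
    size_t rest;

    assert(new_emss > 0);
    if (fpdu_len <= new_emss)
        return 1;                   /* fits; no segmentation needed */
    rest = fpdu_len - new_emss;
    return 1 + (rest + 6) / 7;      /* ceiling of rest / 7 */
}
```

For example, an FPDU of 1460 octets sent after EMSS drops to 1400 needs 1 + ceil(60 / 7) = 10 segments, which is why the text treats this only as a transient fallback.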
Using TLS with the framing protocol is more complicated than using No matter what segmentation is chosen by a TUF-compliant TCP sender
IPSec. The combination of TLS and the framing protocol must still when segmenting an FPDU, if the segments pass through a
provide a modest bound on reassembly buffer size to be useful. resegmenting intermediary, the correctness of the ULPDU containment
property remains strictly a matter of probability.
TLS is a record-oriented protocol. TLS records are PDUs just like 5.2.3. PMTU Increase
those used by ULPs that permit direct placement. As with other
ULPs, the only way to avoid a complete reassembly buffer is to be
able to find TLS PDUs in the presence of lost TCP segments.
Therefore, to permit direct placement of ULPs secured with TLS, TLS
should also be treated as a protocol which uses framing support.
Using the framing protocol with TLS requires modification of a TLS As described in `FPDU Size Selection' above, a TUF-compliant TCP
implementation for the combination to perform effectively. probing for PMTU increase will present an increased MULPDU value to
Essentially, a TLS implementation must become a client of the the ULP. This should eventually lead to an FPDU large enough to
framing protocol. actually perform the PMTU increase probe. The MULPDU value should
not be further adjusted until the probe is actually performed.
This behavior is similar to when a stock TCP would like to perform
a PMTU increase, but less data is available than would fill the
desired segment.
TLS provides a similar interface to TCP for sending protocol data. Also, note that depending on the ULP, the actual distribution of
Protocol data submitted to the TLS send interface may be coalesced FPDU sizes may have a granularity coarser than a single octet. An
with other protocol data in a single TLS PDU, or it may be FPDU with a particular, desired TCP segment size may never be
segmented arbitrarily across more than one TLS PDU. For the generated. Therefore when probing for PMTU increase, a TUF-
framing protocol to properly support direct placement with TLS, compliant TCP must be satisfied with an FPDU that produces a TCP
a framing-aware TLS MUST provide a framing-aware interface to the segment size that is `close' to the desired size.
ULP similar to the one described in Appendix A.
This layering looks like: Finally, note that in cases where PMTU grows and shrinks relatively
frequently, better performance may result from not probing for PMTU
increase at all, or probing very rarely. This is because the
performance disruption resulting from PMTU decrease can be
substantial, and in many cases, implementations of TUF will be in
hardware, so performance may be less sensitive to differences in PMTU.
Framing ULP client 5.2.4. Receive Window < EMSS
|
V
TLS-capable framing module
|
V
Framing-aware TLS
|
V
Framing module
|
V
TCP (possibly framing-aware)
|
V
. . .
Although some framing information may be exposed in the clear when A TUF-compliant TCP sender that is presented with a receive window
running TLS on the framing protocol, this information does not add smaller than EMSS may be required to segment FPDUs. The TCP window
to what is already available to an attacker. Framing only conveys probe is a limiting case of this condition where the advertised
the location of TLS PDUs, which are already available in the clear. receive window is 0, and the amount of data typically sent in
response is a single octet.
Unfortunately, ciphers defined for use with TLS do not offer the In this case, a TUF-compliant TCP sender will segment in accordance
same independence of TLS PDUs that IPSec provides for IP datagrams. with the requirements of TCP, and the rule defined in `TUF-conforming
For one thing, TLS supports the use of stream ciphers, which IPSec TCP Sender Segmentation' above. In addition, as when resegmenting
does not. Stream ciphers typically have dependencies reaching far in response to PMTU decrease, a TUF-compliant TCP sender SHOULD
back in the data stream for deciphering at the current point. segment in such a way that, in the absence of a resegmenting
Therefore it is probably not appropriate to negotiate the use of a intermediary, segments are guaranteed not to give a false-positive
stream cipher when securing the framing protocol. indication of the ULPDU containment property. In situations where
the receive window is smaller than EMSS, data transfer performance
is likely to be limited independently of any segmentation behavior
by the TCP sender. Furthermore, ULP implementations that choose to
use TUF will almost certainly be designed to maintain a receive
window larger than EMSS, so a small receive window should occur
extremely infrequently.
Block ciphers defined for use with TLS have similar properties to 5.2.5. Size of ULPDU + 8 > EMSS
those defined for use with IPSec. Specifically, they all operate
in Cipher Block Chaining (CBC) mode. However, while IPSec provides
a CBC initialization vector for each IP datagram, TLS defines only
a single CBC initialization vector for use in the first block. All
subsequent blocks use the cipher-text of their predecessor. To
decipher the current TLS PDU, the final cipher-text block from the
previous TLS PDU must be available. Typically, block ciphers
defined for use with TLS have an 8-octet block size. This implies
that for ULP direct placement to be possible with TLS, data from a
preceding TCP segment may be needed, where it is not when using the
framing protocol without TLS. Note that if the preceding TCP
segment is missing, all cipher blocks within the current TCP
segment may still be processed except the first one (assuming the
bounds of the TLS PDU are known).
7.3.1. Using TLS In PDU Alignment Mode In cases where EMSS shrinks below the minimum size of a ULPDU that
a ULP wants to send, TUF will create FPDUs that are larger than
EMSS, and a TUF-compliant TCP sender will face the same
alternatives as during PMTU reduction:
To run the framing protocol on TLS in PDU alignment mode, o send unsegmented FPDUs and rely on IP fragmentation to deliver
an integral number of TLS PDUs may be sent in each TCP segment the the segments
same way ULP PDUs are sent in the absence of TLS. A framing-aware
TLS would use the framing-aware TCP. In this case, the role of the
framing PDU header in detecting unexpected modification of TCP
segmentation is subsumed by the strong integrity checks performed
on TLS PDUs. There is no need to encapsulate TLS PDUs in a framing
PDU. In fact, the vulnerability of the framing key to active
attack is eliminated by using TLS validation algorithms instead.
Use of a non-null TLS compression algorithm may interact badly with o segment FPDUs to fit in TCP segments which respect the EMSS
a framing-aware TLS implementation. A TLS compression algorithm is size
allowed to increase content length by up to 1024, which may result
in the compressed TLS PDU no longer fitting within EMSS.
Therefore, only TLS compression algorithms which are known not to
increase content length, or increase content length by a small,
manageable amount, should be selected.
The need to receive the previous TCP segment before completing TLS A ULP which is presented with an MULPDU value that is too small to
processing of the current TCP segment means that using the framing accommodate the PDUs necessary for operation SHOULD simply attempt to use
protocol in PDU alignment mode with TLS will require some high- ULPDUs which are as small as possible.
speed receive packet buffer memory. This defeats one of the
primary advantages of PDU alignment mode. Therefore, while it is
possible to use TLS to secure the framing protocol in PDU alignment
mode, IPSec would be a more appropriate choice for securing PDU
alignment mode connections because it does not require any
reassembly buffer memory.
7.3.2. Using TLS In Marker Mode If the EMSS shrinks to a pathologically small size, then a TUF
implementation SHOULD discontinue the use of ULP control
information found using the ULPDU containment property. In such
cases, the ULP MAY elect to disable the use of TUF altogether, or
simply stop exploiting the ULPDU containment property.
To use TLS on a framing protocol connection in marker mode, the TCP A path MTU which results in an EMSS < 128 + 8 octets is an
stream must actually contain two, independent sets of periodic extremely unlikely occurrence and when it does occur, poor data
markers. Clear-text markers in the TLS PDU stream will permit TLS transfer performance is a likely result, independent of TCP sender
PDUs to be found in the presence of lost TCP segments. Once a segmentation behavior.
portion of the original, clear-text TCP stream is recovered by TLS
processing, markers in the original octet stream are used to find
ULP PDUs and perform direct placement.
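The relationship between EMSS and the largest ULPDU that still fits in one segment can be sketched as follows. The 8-octet header size comes from the 16-bit length plus 48-bit key; treating "EMSS < 128 + 8" as the cut-off for a pathologically small EMSS is an assumption made for this example, as are the names.

```c
#define TUF_HDR_LEN          8u    /* 16-bit length + 48-bit key */
#define TUF_MIN_USEFUL_ULPDU 128u  /* taken from "EMSS < 128 + 8" */

/* Largest ULPDU that fits, with its FPDU header, in one EMSS-sized
 * TCP segment; 0 if not even the header fits with payload. */
unsigned tuf_mulpdu(unsigned emss)
{
    return emss > TUF_HDR_LEN ? emss - TUF_HDR_LEN : 0u;
}

/* Nonzero if EMSS is below the (assumed) threshold at which exploiting
 * the ULPDU containment property stops being worthwhile. */
int tuf_emss_is_pathological(unsigned emss)
{
    return emss < TUF_MIN_USEFUL_ULPDU + TUF_HDR_LEN;
}
```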
7.4. Other Security Considerations 6. Security Considerations
The modification of the sender's TCP segmentation algorithm in PDU
alignment mode does not open any new attacks, since: 1) the
segmentation algorithm is not based on input from the network, 2)
the segmentation algorithm may pack small ULP PDUs into a single
TCP segment so it does not open packet flooding attacks.
If an attacker can send an in-window TCP segment that is accepted, This section discusses both protocol-specific considerations and
on an unsecured framing protocol connection the attacker can the implications of using TUF with existing security mechanisms.
probably force the TCP receiver in to a framing protocol exception
path, degrading service. However, such an attacker can also place
arbitrary data into the stream, so merely forcing the receiver on
to an exception path is not a compelling attack.
8. IANA Considerations 6.1. Protocol-specific Security Considerations
A third-party that can inject spoofed packets into the network
which can be delivered to a TUF receiver could launch a variety of
attacks that exploit TUF-specific behavior. For example, a blind
third-party adversary could inject random packets which appear in
the valid TCP window and do not begin with valid FPDU headers. A
barrage of such packets might cause a TUF receiver to conclude that
a resegmenting intermediary is present and disable the use of TUF
and direct data placement. This would substantially degrade
performance. However, it would probably also have more dire
consequences than performance, such as causing the ULP to interpret
the bogus data as valid. Furthermore, such a third-party could
also degrade performance just as effectively in a TUF-independent
way by injecting spoofed ICMP packets which result in reduction of
the path MTU to an inefficiently small size.
Fundamentally, the vulnerabilities of TUF to active third-party
interference are no more acute than those of TCP without TUF. In both
cases, a communication security mechanism such as IPSec is the only
way to completely prevent such attacks.
6.2. Using IPSec With TUF
Since IPSec is designed to secure arbitrary IP packet streams,
including streams where packets are lost, TUF can run cleanly on
top of IPSec without any change. IPSec packets may be decrypted in
the order they are received, and a TUF receiver may test and
exploit the ULPDU containment property just as if the IP datagram
were unsecured.
6.3. Using TLS With TUF
Using TLS [TLS] with TUF, particularly trying to exploit the ULPDU
containment property to locate ULP control information, is not a
straightforward process. TUF can be directly layered on top of
TLS, but many of the advantages of TUF are lost. This document
does not define a way of using TLS with TUF that could offer better
performance than stock reassembly buffer-based implementations.
That task is left to a different document, if there is sufficient
motivation to address the problems. This section does outline
some of the known complications of trying to do better than stock
reassembly buffer-based implementations using TLS with TUF.
TLS is a record-oriented protocol. TLS records are PDUs with a
similar structure to ULPDUs defined in application ULPs. As with
other ULPs, the only way to avoid a complete reassembly buffer is
to be able to find TLS PDUs in the presence of lost TCP segments.
The ULPDU containment property could be used to do this, which
suggests that TLS itself should be layered on top of TUF. In this
case, the FPDU header will travel in the clear, but this will
probably not present serious vulnerabilities other than denial of
service attacks comparable to what is already possible without TUF.
Once the TLS records are located and processed it still remains to
locate the ULPDUs. The simplest way to do this would be to have
the TLS implementation be TUF-compliant, and ensure the ULPDU
containment property within each TLS record. In this case, the
protocol layering would look like:
ULP client
^
|
| ULPDUs (in octet stream)
|
v
TUF-conforming TLS
^
|
| TLS records (containing ULPDUs)
|
v
TUF
^
|
| FPDUs (each containing a TLS record)
|
v
TUF-conforming TCP
^
|
| TCP Segments (each containing an FPDU)
|
v
. . .
An obvious complication of using TLS with TUF is that ciphers
defined for use with TLS do not offer independence across TLS
records. The most common cipher used with TLS is RC4, which is a
stream cipher. Efficient decryption of an RC4 stream depends upon
the entire preceding data stream. In other words, it is simply not
feasible to decrypt TLS records encrypted with RC4 in any order
other than the TCP stream order. This clearly defeats the purpose
of TUF.
TLS is also defined to work with block ciphers such as 3DES in
Cipher Block Chaining (CBC) mode. In this case, the dependency of
the decryption operation on data in previous TLS records is less
severe. Decrypting the current TLS record requires only ciphertext
from the previous TLS record. While this does not allow complete
independence of processing TLS records, a lost or delayed TCP
segment containing a TLS record only prevents decrypting the
immediately subsequent TLS record, not all TLS records after it.
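The limited dependency under CBC can be made concrete with a small sketch. It models only which records could be decrypted, given which records have arrived; it performs no cryptography. The function name is hypothetical, and it assumes the first record's initialization vector is already known from connection setup.

```c
#include <stddef.h>

/* received[i] is nonzero if TLS record i has arrived.
 * On return, decryptable[i] is 1 if record i can be decrypted now:
 * the record itself is present and so is its predecessor, whose final
 * ciphertext block serves as the CBC initialization vector.
 * Record 0 is ASSUMED to take its IV from connection setup.
 * Returns the number of decryptable records. */
size_t tls_cbc_decryptable(const int *received, int *decryptable, size_t n)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int prev_ok = (i == 0) || received[i - 1];

        decryptable[i] = received[i] && prev_ok;
        count += (size_t)decryptable[i];
    }
    return count;
}
```

With records {1, 1, 0, 1, 1} received, the result is {1, 1, 0, 0, 1}: the loss of record 2 blocks record 3 but not record 4.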
TLS compression presents another complication to using TLS with
TUF. TLS compression algorithms are allowed to increase the
content length by up to 1024 octets. If the content length does
increase, the TLS record may not fit within an EMSS-sized TCP
segment, even if the uncompressed ULPDU does. If the risk of
exceeding an EMSS-sized TCP segment is small, it may be acceptable
to occasionally send FPDUs containing TLS records that span several
TCP segments, or use IP fragmentation. Some TLS compression
algorithms may never increase the content length, or only increase
it by some small, manageable amount.
7. IANA Considerations
If framing is enabled a priori for a ULP by connecting to a well- If framing is enabled a priori for a ULP by connecting to a well-
known port, this well-known port would be registered for the framed known port, this well-known port would be registered for the framed
ULP with IANA. ULP with IANA.
9. References 8. References
[ALF] [BEEP]
D. D. Clark and D. L. Tennenhouse, "Architectural Rose, M., "The Blocks Extensible Exchange Protocol Core", RFC
considerations for a new generation of protocols," in SIGCOMM 3080, March 2001.
Symposium on Communications Architectures and Protocols ,
(Philadelphia, Pennsylvania), pp. 200--208, IEEE, Sept. 1990.
Computer Communications Review, Vol. 20(4), Sept. 1990.
[SOCKS] [HTTP]
Leech, M., and others, "SOCKS Protocol Version 5," RFC 1928, Fielding, R. and others, "Hypertext Transfer Protocol --
April 1996 HTTP/1.1.", RFC 2616, June 1999.
http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-
initwin-00.txt.
[RFC0879] [NagleDAck]
Postel, J., "TCP Maximum Segment Size And Related Topics", RFC Minshall G., Mogul, J., Saito, Y., Verghese, B., "Application
879, November 1983 performance pitfalls and TCP's Nagle algorithm", Workshop on
Internet Server Performance, May 1999.
[RFC1112]
Braden, R., ed., "Requirements for Internet Hosts --
Communications Layers", RFC 1122, October 1989
[PathMTU] [PathMTU]
Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191, Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191,
November 1990 November 1990.
[RFC1750] [RFC1750]
Eastlake, D., Crocker, S., Schiller, J., "Randomness Eastlake, D., Crocker, S., Schiller, J., "Randomness
Recommendations for Security.", RFC 1750, December 1994 Recommendations for Security.", RFC 1750, December 1994.
[RFC2581] [RFC2581]
Allman, M. and others, "TCP Congestion Control," RFC 2581, Allman, M., and others, "TCP Congestion Control," RFC 2581,
April 1999 April 1999.
[SCTP]
Stewart, R.R. and others, "Stream Control Transmission
Protocol," RFC2960, October 2000.
[Stevens] [Stevens]
Stevens, W. Richard, "Unix Network Programming Volume 1," Stevens, W. Richard, "Unix Network Programming Volume 1,"
Prentice Hall, 1998, ISBN 0-13-490012-X Prentice Hall, 1998, ISBN 0-13-490012-X.
[TCP] [TCP]
Postel, J., "Transmission Control Protocol - DARPA Internet Postel, J., "Transmission Control Protocol - DARPA Internet
Program Protocol Specification", RFC 793, September 1981 Program Protocol Specification", RFC 793, September 1981.
[TLS] [TLS]
Dierks, T. and others, "The TLS Protocol, Version 1.0", RFC Dierks, T. and others, "The TLS Protocol, Version 1.0", RFC
2246 2246, January 1999.
[Satran]
Satran, J., "iSCSI - fragments, packets synchronization and
RDMA", http://www.haifa.il.ibm.com/satran/ips/iSCSI-RDMA-
memo.txt, July 2000.
Authors' Addresses Authors' Addresses
Stephen Bailey Stephen Bailey
Sandburst Corporation Sandburst Corporation
600 Federal Street 600 Federal Street
Andover, MA 01810 Andover, MA 01810
USA USA
Phone: +1 978 689 1614 Phone: +1 978 689 1614
Email: steph@sandburst.com Email: steph@sandburst.com
Jeff Chase
Department of Computer Science
Duke University
Durham, NC 27708-0129
USA
Phone: +1 919 660 6559
Email: chase@cs.duke.edu
Jim Pinkerton Jim Pinkerton
Microsoft, Inc. Microsoft, Inc.
1 Microsoft Way 1 Microsoft Way
Redmond, WA 98052 Redmond, WA 98052
USA USA
EMail: jpink@microsoft.com EMail: jpink@microsoft.com
Allyn Romanow
Cisco Systems
170 W Tasman Drive
San Jose, CA 95134
USA
Phone: +1 408 525 8836
Email: allyn@cisco.com
Constantine Sapuntzakis Constantine Sapuntzakis
Cisco Systems Cisco Systems
170 W Tasman Drive 170 W Tasman Drive
San Jose, CA 95134 San Jose, CA 95134
USA USA
Phone: +1 408 525 5497 Phone: +1 408 525 5497
EMail: csapuntz@cisco.com EMail: csapuntz@cisco.com
Matt Wakeley
Agilent Technologies
1101 Creekside Ridge Drive
Suite 100, M/S RH21
Roseville, CA 95661
USA
Phone: +1 916 788 5670
EMail: matt_wakeley@agilent.com
Jim Wendt Jim Wendt
Hewlett Packard Corporation Hewlett Packard Corporation
8000 Foothills Boulevard MS 5668 8000 Foothills Boulevard MS 5668
Roseville, CA 95747-5668 Roseville, CA 95747-5668
USA USA
Phone: +1 916 785 5198 Phone: +1 916 785 5198
EMail: jim_wendt@hp.com EMail: jim_wendt@hp.com
Jim Williams Jim Williams
Emulex Corporation Emulex Corporation
580 Main Street 580 Main Street
Bolton, MA 01740 Bolton, MA 01740
US USA
Phone: +1 978 779 7224 Phone: +1 978 779 7224
EMail: jim.williams@emulex.com EMail: jim.williams@emulex.com
Appendix A. Sockets Support For The Framing Protocol Appendix A. Sample Sockets Support For TUF
The sockets support for the framing module takes the form of a set The sockets support for TUF described below is only a sketch. It
of socket options which may be set or requested to enable the is provided as an aid to understanding TUF. Implementing this
appropriate behavior. interface is not a requirement for a TUF implementation.
A socket may be in one of three modes in the send direction: Other software interfaces are possible. The described interface
draws from the sockets interface for UDP. The described interface
might be natural for applications already designed to support both
TCP and UDP, or that do network input and output in complete PDU
units. For applications that perform octet-at-a-time style input
and output, an alternative interface that draws from the tradition
of the TCP URG pointer interface (e.g. using a MSG_OOB flag to
send()) is equally possible. An implementation may even offer
several different interfaces to TUF.
1. Framing-aware TCP mode. No data is added to the TCP octet That said, the sockets support sketched below might well provide
stream (neither framing PDUs nor markers), but each data the basis for a complete, standard interface to be described
buffer presented in a sending operation is sent atomically as outside this draft.
a single TCP segment. This mode provides direct access to a
framing-aware TCP sender for purposes such as implementing a
framing-aware TLS.
2. Framing protocol PDU alignment sender mode. A framing PDU A.1 Basic Principles
header is added to data presented by an integral number of
sending operations, and the resulting framing PDU is sent
according to the rules of PDU alignment mode.
3. Framing protocol marker sender mode. Markers are inserted at The sockets support for TUF takes the form of a set of socket
fixed intervals which point to the octet past the current PDU options that may be set or requested to enable the appropriate
submitted by a sending operation. behavior.
A socket may be in one of two modes in the receive direction: A socket may be in one of two TUF-related modes in the send
direction:
1. Framing protocol PDU alignment receiver mode. Framing PDUs 1. TUF-compliant TCP sender mode. No data (FPDU headers) is
are expected in each TCP segment. added to the TCP octet stream, but each data buffer presented
in a sending operation is to be sent according to the rules of
TCP and TUF-compliant TCP senders. This mode provides direct
access to a TUF-compliant TCP sender for purposes such as
implementing TUF.
2. Framing protocol marker receiver mode. Markers are expected 2. TUF sender mode. An FPDU header is added to data presented by
at a fixed interval in the TCP stream. an integral number of sending operations, and the FPDU is
passed to a TUF-compliant TCP sender for transmission.
Received TCP segments are processed as defined above. If a socket A socket may be in one TUF-related mode in the receive direction:
receiving operation is used to retrieve received data (as opposed
to direct placement), framing PDU headers or markers are removed
before the data is returned.
A.1 Enabling The Framing Protocol 1. TUF receiver mode. FPDUs are expected in each TCP segment.
/* Pick one sending mode and one receiving mode */ If a socket receiving operation is used to retrieve received data
if (sendMode == ATOMIC) (as opposed to the data being directly placed), FPDU headers are
mode = TCP_FRAMING_SEND_ATOMIC; removed before the data is returned.
else if (sendMode == ALIGN)
mode = TCP_FRAMING_SEND_ALIGN;
else /* sendMode == MARKERS */
mode = TCP_FRAMING_SEND_MARKERS;
if (recvMode == ALIGN) A.2 Enabling TUF
mode |= TCP_FRAMING_RECV_ALIGN; /* Pick a sending mode */
else /* recvMode == MARKERS */ if (sendMode == TUF_TCP)
mode |= TCP_FRAMING_RECV_MARKERS; mode = TUF_SEND_TCP;
else
mode = TUF_SEND;
setsockopt (s, SOL_TCP, TCP_FRAMING_MODE, &mode, mode |= TUF_RECEIVE;
sizeof(mode));
A framing module that does not support a requested mode MUST fail setsockopt (s, SOL_TCP, TUF_MODE, &mode, sizeof(mode));
the setsockopt call. Framing may be enabled on a socket before or
after it is connected, subject to the requirements of Section 2.
A.2 Sending Data Atomically A.3 Sending Data
The standard socket sending operations, including send(), sendto(), The standard socket sending operations, including send(), sendto(),
sendmsg(), writev(), and others are used to send framed data units sendmsg(), writev(), and others are used to send ULPDUs in TUF.
(ULP PDU)s with the framing protocol. The EMSGSIZE error should be The EMSGSIZE error should be returned if the buffer passed to the
returned if the buffer passed to the sending operation does not sending operation would result in an FPDU that does not fit in an
satisfy the size requirements defined in the `ULP Support For EMSS-sized TCP segment, unless oversized ULPDU errors are disabled,
Framing' section above. as described below.
When the path EMSS increases, the TCP MAY return EMSGSIZE once to When the path EMSS increases, the sending operation MAY return
inform the client of the change. EMSGSIZE once to inform the client of the change.
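The size check behind the EMSGSIZE behavior might look like the sketch below. It is a simplification for illustration: the function name is hypothetical, the result is returned as a negative errno value rather than through errno as the real sockets calls would do, and an 8-octet FPDU header with one ULPDU per FPDU is assumed.

```c
#include <errno.h>
#include <stddef.h>

#define TUF_HDR_LEN 8u  /* 16-bit length + 48-bit key */

/* Decide whether a sending operation should accept a ULPDU.
 * Returns 0 if the resulting FPDU fits in an EMSS-sized segment, or if
 * oversized-ULPDU reporting has been disabled; otherwise returns
 * -EMSGSIZE. */
int tuf_check_ulpdu_size(size_t ulpdu_len, size_t emss, int report_oversized)
{
    if (ulpdu_len + TUF_HDR_LEN <= emss)
        return 0;
    return report_oversized ? -EMSGSIZE : 0;
}
```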
A.3 Retrieving The Current EMSS A.4 Retrieving The Current EMSS or MULPDU
socklen_t len = sizeof(emss); socklen_t len = sizeof(emss);
getsockopt (s, SOL_TCP, TCP_SEND_EMSS, &emss, &len); getsockopt (s, SOL_TCP, TUF_MULPDU, &emss, &len);
This call returns the maximum segment size that can be submitted in If the socket is in TUF_SEND_TCP mode, this call returns the TCP
a sending operation without fragmentation. The number returned EMSS. If the socket is in TUF_SEND mode, the call returns the
depends upon the current socket sending mode. If the socket is in maximum ULPDU that can be submitted in a sending operation without
framing protocol PDU alignment mode, the returned EMSS is requiring fragmentation of the associated FPDU.
appropriately adjusted by the size of the framing header. The
number should not count any octets that go towards TCP options. A
framing protocol implementation which does not support PDU
alignment mode, because the underlying TCP sender is not framing-
aware, is not required to implement this getsockopt call.
A.4 Disabling ULP PDU Packing The number should not count any octets that go towards TCP options.
A.5 Disabling ULPDU Packing
flag = 0; flag = 0;
setsockopt (s, SOL_TCP, TCP_FRAMING_PACK_PDUS, &flag, setsockopt (s, SOL_TCP, TUF_PACK_PDUS, &flag, sizeof(flag));
sizeof(flag));
This call disables the framing protocol in PDU alignment mode from This call disables TUF from packing more than one ULPDU into an
packing more than one ULP PDU into a framing PDU. By default, ULP FPDU. By default, ULPDU packing is enabled.
PDU packing is enabled.
A.5 Enabling Emergency Mode A.6 Disabling The Report of Oversized ULPDUs
flag = 1; flag = 0;
setsockopt (s, SOL_TCP, TCP_FRAMING_EMERGENCY, &flag, setsockopt (s, SOL_TCP, TUF_REPORT_OVERSIZED, &flag,
sizeof(flag)); sizeof(flag));
This call enables emergency mode for PDU alignment mode. It may be This call disables sending operations from returning EMSGSIZE in
called at any time on a socket, whether connected or not, and response to oversized ULPDUs. It may be called at any time on a
whether the current EMSS is smaller than 512 octets or not. By socket, whether connected or not. It is used to continue ULP
default emergency mode is disabled. operation when MULPDU is already known to be too small to permit
some ULPDUs to be sent without segmentation. Oversized ULPDU
A.6 Setting The Sending Marker Interval reporting can be enabled again if PMTU is discovered to have
increased.
ivl = 2048;
setsockopt (s, SOL_TCP, TCP_FRAMING_SEND_INTERVAL, &ivl,
sizeof(ivl));
This call sets the period at which markers will be introduced to
the sent TCP octet stream. The sending marker interval may be set
at any time, but it only has effect when sending markers is enabled
for the socket.
A.7 Setting The Receiving Marker Interval
ivl = 2048;
setsockopt (s, SOL_TCP, TCP_FRAMING_RECV_INTERVAL, &ivl,
sizeof(ivl));
This call sets the period at which markers are expected in the
received TCP octet stream. The receiving marker interval may be
set at any time, but it only has effect when receiving markers is
enabled for the socket.
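
The effect of a marker interval is easiest to see as arithmetic on stream offsets. The sketch below is illustrative only. It assumes, which this appendix does not state, that a marker falls at every octet offset that is a positive multiple of the interval, counted from the first octet of the stream, and it ignores the octets occupied by the marker itself. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION (not from the draft text): a marker sits at every stream
 * offset that is a positive multiple of ivl, where offset 0 is the
 * first octet sent on the connection.
 */

/* Offset of the first marker at or after stream offset `off`. */
static uint64_t next_marker_offset(uint64_t off, uint32_t ivl)
{
    assert(ivl > 0);                 /* an interval of 0 is meaningless */
    if (off == 0)
        return ivl;                  /* no marker at offset 0 */
    uint64_t rem = off % ivl;
    return rem == 0 ? off : off + (ivl - rem);
}

/* Number of markers inside the half-open range [off, off + len). */
static uint64_t markers_in_range(uint64_t off, uint64_t len, uint32_t ivl)
{
    uint64_t end = off + len;
    uint64_t first = next_marker_offset(off, ivl);
    if (first >= end)
        return 0;                    /* range ends before the next marker */
    return (end - 1 - first) / ivl + 1;
}
```

With the interval of 2048 used in the calls above and under the stated assumption, the first 8192 octets of a stream would contain three markers, at offsets 2048, 4096, and 6144. A receiver that starts looking at offset 2049 would expect its next marker at offset 4096.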
Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.