Transport Area Working Group                      S. Bailey (Sandburst)
Internet-draft                                          J. Chase (Duke)
Expires: May 2002                              J. Pinkerton (Microsoft)
                                                     A. Romanow (Cisco)
                                                 C. Sapuntzakis (Cisco)
                                                          J. Wendt (HP)
                                                   J. Williams (Emulex)

                     TCP ULP Framing Protocol (TUF)
                   draft-ietf-tsvwg-tcp-ulp-frame-01

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

The TCP ULP Framing (TUF) protocol defines a shim layer protocol between an Upper Layer Protocol (ULP) and TCP. TUF also depends on a specified TCP segmentation convention between TUF endpoints. Together, the shim and segmentation conventions enable a TUF/TCP receiver to recognize ULP data units within a TCP segment independently of other TCP segments. This capability simplifies the design of enhanced network interfaces implementing direct data placement for ULPs using TCP. Direct data placement is a key step to making IP networking competitive with high-end interconnect solutions in data centers and other high-performance application domains.

Table Of Contents

1.      Definitions
2.      Overview
2.1.    Motivation
2.2.    Approach
3.      Rationale For TUF
3.1.    Direct Data Placement
3.2.    Direct Data Placement with TCP
3.2.1.  The Simple Case: ULP-unaware Placement
3.2.2.  The Complex Case: ULP-aware Placement
3.2.3.  The Problem of ULP-aware Placement with TCP
3.2.4.  Finding ULPDUs In Out-of-order Segments
3.2.5.  The TUF Solution
3.2.6.  TUF's ULP Assumptions
4.      The Protocol
4.1.    The Framing Protocol Data Unit (FPDU)
4.1.1.  FPDU Format
4.1.2.  FPDU Size Selection
4.2.    TUF-conforming TCP Sender Segmentation
4.3.    Negotiating TUF
4.4.    TUF Receiver ULPDU Containment Property Testing
5.      Protocol Characteristics
5.1.    Properties Of TUF-conforming TCP Senders
5.2.    Exception Cases
5.2.1.  Resegmenting Intermediaries
5.2.2.  PMTU Reduction
5.2.3.  PMTU Increase
5.2.4.  Receive Window < EMSS
5.2.5.  Size of ULPDU + 8 > EMSS
6.      Security Considerations
6.1.    Protocol-specific Security Considerations
6.2.    Using IPSec With TUF
6.3.    Using TLS With TUF
7.      IANA Considerations
        References
        Authors' Addresses
A.      Sample Sockets Support For TUF
A.1     Basic Principles
A.2     Enabling TUF
A.3     Sending Data
A.4     Retrieving The Current EMSS or MULPDU
A.5     Disabling ULPDU Packing
A.6     Disabling The Report of Oversized ULPDUs
        Full Copyright Statement

1. Definitions

The following terms and abbreviations are used in this document.

data delivery - the delivery of received ULP payloads to the ULP application, i.e., notifying the application of data arrival by completing a receive operation or generating an event.

data placement - the storage of received ULP payloads to host memory, pending delivery to the ULP application.

direct data placement - the storage of received ULP payloads directly to application-specified buffers without intermediate buffering or copying.

EMSS - the effective maximum segment size. EMSS is the TCP maximum segment size (MSS) defined in RFC 793 [TCP] and exchanged during TCP connection establishment, adjusted by the current path maximum transfer unit (MTU) [PathMTU].

FPDU - framing protocol data unit. The protocol data unit defined by TUF.

MULPDU - maximum upper layer protocol data unit size. The size of the largest ULPDU that fits in an EMSS-sized FPDU.

NIC - network interface controller. The device that provides a host's access to a physical network link.

PDU - protocol data unit. A self-contained block of control and data defined by a particular protocol.

RDMA - Remote Direct Memory Access protocol. A data transfer protocol which uses memory access-style transfer mode(s) to provide generic direct data placement capabilities for arbitrary ULPs.

TUF - TCP ULP Framing protocol. The protocol defined in this document.

ULP - upper layer protocol. The client protocol using the services of the transport layer, or TUF.

ULPDU - upper layer protocol data unit.

ULPDU containment property - the property that a TCP segment contains exactly an integral number of ULPDUs.

2. Overview

This section summarizes the motivation for the TCP ULP Framing (TUF) protocol and explains its operation in brief. Section 3 (`Rationale For TUF') develops the rationale for TUF in detail. Section 4 (`The Protocol') defines the protocol itself. Section 5 (`Protocol Characteristics') examines various properties of the protocol's operation. Implementors may wish to refer directly to sections 4 and 5.

2.1. Motivation

The IP protocols are not usually used for high-performance, high speed data transfers due to overhead in TCP processing. Instead, a number of special purpose protocols have been used. The domain of application for such high speed buffer transfer includes storage, video delivery and processing, and various applications of cluster computing, such as scalable database or application service. For reasons discussed below, there is now great industry interest in developing an IP standard for low overhead, high bandwidth data transfer, which would decrease the costs of high speed interconnects and supplant special purpose protocols.

The approach typically used for low overhead transfers is called direct data placement, in which the network interface places data directly in application buffers, avoiding the latency and memory bandwidth costs associated with copying. Direct data placement can in principle be done with either of IP's reliable transports--SCTP or TCP. This document considers what is needed to do direct data placement with TCP.

In order to place data directly in application buffers, the network interface needs to use information in the Upper Layer Protocol Data Units (ULPDUs) contained in the TCP stream. This can be accomplished routinely except when TCP segments arrive out of order. If TCP segments arrive out of order, the location of the ULPDUs in the TCP segment cannot be found. The TUF protocol addresses this problem of finding ULPDU headers in the TCP stream, even when TCP segments arrive out of order.

2.2. Approach

TUF is implemented as a shim layer between a ULP and TCP. The end-to-end data flow is:

   0. Use of TUF is negotiated end-to-end by the ULP.

   1. The ULP delivers a data stream with ULPDUs delimited to TUF.

   2. TUF inserts a header and delivers the shimmed ULPDUs to TCP.

   3. The TUF-aware TCP sender preserves boundaries of shimmed ULPDUs (TUF FPDUs) as much as possible when delivering segments to the IP layer.

   4. The receiving TCP delivers shimmed ULPDUs to the receiving TUF layer.

   5. TUF removes the shim and delivers the ULPDUs to the ULP.

In other words, the layering of TUF is:

                 ULP client
                   ^  |
                   |  |   ULPDUs (in octet stream)
                   |  v
                   TUF
                   ^  |
                   |  |   FPDUs (containing ULPDUs)
                   |  v
            TUF-conforming TCP
                   ^  |
                   |  |   TCP Segments (each containing an FPDU)
                   |  v
                   . . .

Note that while the semantics of this protocol layering must be maintained, the receiving network interface may use the information in the framed ULPDUs to place the data in memory on the host. Whatever the case, the data is only delivered to the ULP when all preceding TCP data has arrived.

3. Rationale For TUF

This document defines the TUF protocol as a shim layer between an Upper Layer Protocol (ULP) and TCP. TUF also depends on a TCP segmentation convention between TUF/TCP endpoints specified in this document. Taken together they provide the capability for a TUF/TCP receiver to recognize ULPDUs by processing each TCP segment independently, without requiring state from previous segments.

The purpose of TUF is to enable practical designs for enhanced network interfaces (NICs) implementing direct data placement for TCP-based ULPs. The purpose of direct data placement is to eliminate the need for a host to copy received data after it arrives in host memory. This copying incurs CPU, memory and bus costs that are substantial and are not masked by advancing hardware technology.

A general and practical solution to the receive copy problem has eluded the IP networking community for almost two decades. There is a long history of research and experimental schemes to reduce or eliminate receiver copying overhead for IP networking in general, and for TCP/IP communication in particular. While these systems have convincingly demonstrated the potential performance benefits of reducing copy costs, all such schemes suffer from one or more of the following limitations: they require a significant restructuring of operating system buffering and/or APIs; they are limited to specific modes of communication (e.g., bulk data transfer) or specific application ULPs; they do not scale on multiprocessor hosts; their benefits depend on specific properties of the network (e.g., large MTUs) or host buffer size and alignment. Moreover, all such schemes require some degree of support from NICs to separate payloads from headers and/or ensure that their placement in host memory meets specific requirements (e.g., for page placement and alignment).

Inherent copying costs for IP communication are one motivation to use alternative non-IP technologies for high-speed networking.
A number of specialized technologies have been developed for high speed data transfers in which network interfaces transfer data from application buffer to application buffer without software touching the data. Some examples include the VAXCluster Interconnect in 1983, Fibre Channel (FC) in 1994, and today InfiniBand (IB) and Virtual Interface Architecture (VIA). These alternatives have eroded the popularity of IP technologies in application domains including network storage, video processing and delivery, and cluster computing for scientific applications and scalable database-related services. Until recently, several factors have limited interest in promoting IP networking as a solution in these application domains. First, the competing network technologies offered significantly higher link speeds than the network hardware available for use with IP. Second, these application domains were a relatively small segment of the network market. Recently, however, Ethernet networks have closed the bandwidth gap and even exceeded the bandwidth of alternatives such as FibreChannel, at much lower cost. At the same time, an increasing number of applications are server-hosted in data centers to enable sharing and access from a growing number of IP-connected client devices and locations. With the growth in importance and number of data centers, high-speed interconnection within the data center is now central to the everyday operation of Internet services. Thus, technology changes have created an opportunity and demand to extend the benefits of IP technologies to high-performance application domains, while simultaneously increasing the importance of those domains. The ubiquity of IP offers economies of scale heavily favoring IP in these domains. 
For example, reliance on specialized non-IP technologies for high-performance domains creates a need to support multiple protocols and redundant network infrastructure in data centers, and it compromises portability and interoperability of data center solutions. Moreover, comprehensive support for network management and security is developing rapidly in the IP space. Use of IP technologies would allow data centers to benefit from these enhancements.

3.1. Direct Data Placement

Direct data placement is a key step toward making IP networking competitive in data centers and other high-performance domains. Direct data placement refers to the ability of a NIC to place data directly from the network into designated application buffers, without intermediate copying.

Direct data placement is attractive relative to other solutions to the receive copy problem. It is the only solution that can be implemented in a way that is compatible with existing operating systems, since the receiving NIC takes over most of the responsibility to avoid receive copying. Also, direct data placement generalizes easily to a range of ULPs. In particular, the establishment of an IETF standard for an IP transport-based direct data placement protocol would allow NICs to directly place data independent of the application ULP using it.

The TUF protocol is necessary to permit easily deployable enhanced NICs supporting direct data placement. Such NICs already exist and their usage is growing rapidly, but their development is impeded by the lack of standards. Direct data placement is unnecessarily difficult and expensive to design and implement for existing TCP-based ULPs; the key objective of TUF is to define transport conventions to simplify the design of these NICs. A related impediment is that in the absence of a general direct data placement protocol these products are limited to specific ULPs such as iSCSI.
TUF, and possibly additional, higher layer protocol definitions outside the scope of this document, would encourage the market by ensuring interoperability of product offerings from different vendors.

This document defines a framing protocol (TUF) and TCP segmentation conventions that enable simple support of direct data placement for a class of TCP-based ULPs. It does not propose a generic direct placement ULP, such as an RDMA protocol, or any facility for direct data placement, but only the foundations for building such a facility on TCP. A key objective of TUF is to do this in a way that is compatible with existing standards and with the spirit of TCP's stream communication model. TUF can simplify support for direct data placement for ULPs such as iSCSI, and it can serve as a basis for a future RDMA proposal.

The key limitation of TUF as a solution to the receive copy problem is that it works only if the ULP standard and the sending and receiving implementations all support it. Impact on the sender and ULPs is minimal, but ULPs must be adapted to allow use of TUF at the ULP/transport boundary. The necessary modifications may be quite small. Use of TUF is a negotiated option between the sender and receiver for each ULP session, preserving interoperability among senders and receivers that do not support TUF.

3.2. Direct Data Placement with TCP

Direct data placement is widely used to accomplish high-performance data transfer in non-IP technologies such as block storage channels (SCSI, Fibre Channel, etc.), and other specialized high performance networks like InfiniBand. This section considers how direct placement can be done with TCP.

The Internet Protocol suite provides two transports that are prime candidates for use with direct data placement -- SCTP and TCP. The framing features of the Stream Control Transmission Protocol [SCTP] make it more directly adaptable for direct data placement for future ULPs using SCTP.
However, the maturity and ubiquity of TCP make it desirable to define a flexible method for direct data placement for TCP-based ULPs as well.

There has been a great deal of `moral confusion' concerning the interaction of direct data placement with TCP's ordering guarantees. These ordering guarantees do not prohibit direct data placement, even if data is placed as it arrives out of order. TCP guarantees data delivery to the application ULP as an ordered, sequential stream [RFC793]. Data is delivered only when TCP has notified the application of its arrival and transferred ownership of the receive data buffer. TCP does not specify how received data is stored prior to its delivery, and it does not preclude placement of data in application buffers out of order, as long as no data is delivered until all preceding data has also been delivered. Out-of-order placement greatly simplifies direct data placement NICs because it streamlines data paths and eliminates the need for a TCP reassembly buffer on the NIC.

An implementation performing direct data placement must still respect all TCP delivery semantics. For example, if a checksum integrity check fails, the data must not be placed in ULP-supplied buffers, because, for example, the TCP ports and the TCP sequence number are not trustworthy.

3.2.1. The Simple Case: ULP-unaware Placement

Direct data placement into a ULP client-supplied buffer designated to hold the next data delivered to the ULP, regardless of the contents of the received data, is one of the simplest possible forms of direct data placement. This form of direct data placement is already fully supported by existing TCP mechanisms.
New NIC products currently, or soon to be available, which claim to offer `full zero copy operation' typically provide only this ULP-unaware form of direct data placement.

While ULP-unaware direct data placement works well for ULPs like FTP where the entire contents of a TCP connection are known to be nothing but a single stream of bulk client data, most widely used ULPs, e.g. HTTP [HTTP], BEEP [BEEP] and storage protocols, multiplex control and data, and possibly even interleave data from different requests on the same TCP connection. The simple ULP-unaware direct data placement is inadequate to avoid data copies for these ULPs.

3.2.2. The Complex Case: ULP-aware Placement

An explicit goal of this proposal is to support out-of-order direct data placement for ULPs that provide additional transport-like features such as control and data multiplexing, layered above TCP (e.g., iSCSI or a generic direct data placement protocol such as RDMA).

In many ULPs, such as storage protocols, control information contained in the ULP uniquely identifies the destination application buffer of each particular piece of data. For example, suppose a client requests a read operation using a network storage ULP, specifying the destination buffer for the requested data. The requesting ULP includes control information in the request (e.g., in the ULPDU header) uniquely identifying that buffer, and the responder includes that information in the read response. For some protocols, the identifier is a unique request ID, allowing the client ULP to identify the buffer indirectly through a table of pending requests. If the storage protocol uses RDMA, the response may specify the buffer directly by means of a region identifier.

A network interface that understands the relevant ULP control information can use it to place the incoming data (e.g., read response payload) directly in the correct buffer. In this case, data placement is guided by ULPDU headers embedded in the TCP data stream. The NIC accesses these headers as hints for placement of the ULP payloads--a form of integrated layer processing for each TCP segment as it arrives. This is compatible with TCP's ordering properties if completion of ULP header processing and delivery of the payload data to the application are strictly in order.

3.2.3. The Problem of ULP-aware Placement with TCP

The problem with performing direct data placement as a function of ULP control information in TCP is that it may be difficult to locate the ULP control information (ULPDU headers) within a TCP segment.

If all TCP segments are received in sequence order, ULP control information can be unambiguously located by the rules that permit any ULP implementation to do so. For example, each ULPDU may contain a length field that implicitly specifies the location of the beginning of the subsequent ULPDU.

If TCP segments are not received in sequence order, without taking additional measures, it may not be possible to unambiguously locate ULP control information needed for direct data placement.
For example, if ULPDU length information is in a TCP segment that is delayed or lost in transmission, and the ULPDU length is the only means of locating the beginning of the subsequent ULPDU, it is impossible to locate ULP control information for ULPDUs in subsequent TCP segments until the lost or delayed TCP segment is received.

ULP control information and the data whose placement depends on it may even be in different TCP segments. If the ULP control information is in a TCP segment that is delayed or lost, it is impossible to directly place the data until the ULP control information is received.

3.2.4. Finding ULPDUs In Out-of-order Segments

Early attempts at ULP-aware direct data placement in TCP took the approach of only directly placing data for TCP segments received in-order. Otherwise, data was copied through a reassembly buffer as in a traditional implementation. Unfortunately, packet loss and the attendant out-of-order reception are a frequent, continuous characteristic of both wide-area and switched local area networks of almost any size, as TCP adjusts to varying congestion conditions. Under these conditions, a large portion of the data transferred ends up being copied, rather than being directly placed.
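
The difficulty described in Section 3.2.3 can be illustrated with a small sketch. It assumes a hypothetical ULPDU layout (a 2-octet big-endian length followed by the payload) and an arbitrary 24-octet segmentation; neither is defined by this document, and the names `frame` and `walk_ulpdus` are invented for the example. With the whole stream in hand, the length fields locate every ULPDU. A receiver holding only the second segment has no way to tell that its first octets are payload rather than a length field.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Hypothetical ULPDU layout (an assumption): 2-octet length, then payload."""
    return struct.pack("!H", len(payload)) + payload

def walk_ulpdus(stream: bytes):
    """Yield each ULPDU payload, starting from a known ULPDU boundary."""
    offset = 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        start = offset + 2
        if start + length > len(stream):
            break  # incomplete ULPDU at the end of the available data
        yield stream[start:start + length]
        offset = start + length

ulpdus = [b"read-response-A", b"B" * 40, b"control"]
stream = b"".join(frame(p) for p in ulpdus)

# Segmentation that ignores ULPDU boundaries (24 octets per segment).
segments = [stream[i:i + 24] for i in range(0, len(stream), 24)]

# In-order reception: every ULPDU is found by following the length fields.
in_order = list(walk_ulpdus(b"".join(segments)))

# Segment 0 missing: segment 1 begins inside the second ULPDU's payload, so
# reading its first two octets as a length field yields a meaningless value.
(misread_length,) = struct.unpack_from("!H", segments[1], 0)
```

Here `in_order` equals the original list of payloads, while `misread_length` is just two payload octets interpreted as a number.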

Another solution to this problem is to build a reassembly buffer into the network interface. Data received out-of-order can be held in the network interface reassembly buffer until all preceding data is received, and then direct placement can be performed on the reassembled data. Within certain implementation assumptions, this is a reasonable approach, but unfortunately there are a number of issues, including very large memory requirements, limited scalability, and increased latency, that make the reassembly approach undesirable.

The size of reassembly buffer needed in the network interface is a direct function of the bandwidth * delay product of all active TCP connections. Reasonable assumptions on the bandwidth * delay product can imply a large amount of reassembly memory.
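
As a rough worked example of the bandwidth * delay product, the sketch below computes the amount of data that can be in flight on one connection. The link speed and round-trip time are illustrative assumptions, not figures taken from this document, and `bdp_bytes` is a name invented for the example.

```python
def bdp_bytes(link_bits_per_second: int, rtt_milliseconds: int) -> int:
    """Bandwidth * delay product in octets, using integer arithmetic."""
    return link_bits_per_second * rtt_milliseconds // (8 * 1000)

# Assumed figures: a 1 Gb/s link and a 100 ms round-trip time.
per_connection = bdp_bytes(1_000_000_000, 100)

# If an interface reserved that much for each of 100 connections (again an
# assumption made only for illustration), the total would be:
total_for_100 = 100 * per_connection
```

Under these assumptions a single connection can have 12,500,000 octets in flight, and reserving that amount for 100 connections would take 1,250,000,000 octets of memory on the interface.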
Furthermore, this large reassembly memory must run at high speed--more than two times the link speed--to maintain full link bandwidth. Finally, performing reassembly in the network interface requires that the bandwidth from the network interface to host memory be not just equal to, but substantially greater than, the maximum bandwidth of the network link, to ensure that the reassembly buffer is drained when reassembly is complete. System bus and interconnect bandwidth are particularly scarce and expensive resources in most systems.

What is needed to permit ULP-aware direct data placement without reassembly buffering is a way to ensure that the ULP control information and the data associated with it are highly likely to be contained completely within a single TCP segment, and a way for a receiver to validate this containment property on TCP segments it receives. If the receiver can determine that a ULPDU starts at the beginning of a TCP segment, the receiver can perform ULP-aware direct placement for that ULPDU, and subsequent ULPDUs contained in that TCP segment.
The property that a ULPDU is completely contained within a TCP segment is called the `ULPDU containment property'.

3.2.5. The TUF Solution

The TUF protocol defines a shim layer above TCP and below the ULP that allows the receiver to validate the ULPDU containment property for each TCP segment received, independently of any other TCP segment. The TUF protocol also defines a segmentation behavior for the TCP sender that ensures the ULPDU containment property holds as often as possible while still respecting the protocol requirements for TCP senders.

The TUF-specified TCP segmentation behavior ensures that the ULPDU containment property is maintained as long as the receiver window size is at least equal to the effective MSS (EMSS), the path MTU (PMTU) does not change, and the TCP stream is not resegmented by an intermediary. In conditions where the TCP receiver window size is smaller than EMSS, or the PMTU changes, the segmentation behavior further ensures that once the relevant condition is restored, the ULPDU containment property will be satisfied again.
For the high-performance applications that this protocol targets, small receiver window sizes and PMTU changes are rare transients. Thus, the specified protocol ensures that ULP control information and its associated data are virtually always together in a single TCP segment.

3.2.6. TUF's Assumptions

A key assumption of TUF is that ULPs running on TUF can adjust ULPDU sizes to fit completely within an EMSS-sized TCP segment. Clearly, if a ULPDU does not fit within an EMSS-sized TCP segment, the ULPDU containment property cannot be satisfied. Most storage protocols (e.g. iSCSI), and other performance-targeted protocols (e.g. RDMA protocols) support this capability. ULPs that cannot adjust ULPDU sizes to fit within an EMSS-sized TCP segment, but still want the performance advantages of direct data placement, can be mapped on top of an intermediate protocol (e.g. an RDMA protocol) that does support this data `chunking'.

TUF does not change the stream delivery semantics that TCP, through the TUF implementation, presents to the ULP.
It merely inserts a shim header that can be used by direct placement network interfaces to verify the ULPDU containment property. The shim header is inserted by the sending TUF implementation and removed by the receiving TUF implementation, leaving a stream to be delivered to the ULP.

4. The Protocol

This section defines the TUF protocol itself. The first two sections are the core of the protocol, defining:

   o  the shim layer PDUs, called FPDUs,

   o  a TCP-conforming segmentation behavior which ensures the ULPDU containment property holds under most conditions.

The remaining sections cover other aspects of the protocol which are primarily implications of the core protocol:

   o  what ULP-specified negotiations to enable TUF must accomplish,

   o  how receivers can process received TCP segments to establish whether the ULPDU containment property holds.

4.1. Framing Protocol Data Unit (FPDU)

TUF sends groups of one or more complete ULPDUs in a framing protocol data unit (FPDU).
4.1.1. FPDU Format

The format of an FPDU is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |              Key              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              Key                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                                                               ~
   ~                            ULPDUs                             ~
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Length: 16 bits (unsigned integer)

      This is the length in octets of the set of framed ULPDUs. It does not include the length of the FPDU header itself.

   Key: 48 bits (unsigned integer)

      This is used by the receiver to validate the ULPDU containment property. It is selected at random by the sender, and initially signaled to the receiver in a ULP-specified way, before the receiver attempts to test the ULPDU containment property. All FPDUs sent on the same connection in the same direction must use the same key value. A good quality random number generator MUST be used to generate the initial key. RFC 1750 discusses relevant characteristics and provides references for good quality random number generation [RFC1750].

The length of an FPDU is 8 + L octets, where L is the length of the set of framed ULPDUs. The 16-bit length field is sufficient to permit a TCP segment with an FPDU to completely fill a maximum-size IPv4 or IPv6 datagram.

4.1.2. FPDU Size Selection

Each FPDU SHOULD contain as many contiguous, complete ULPDUs as will fit within the current EMSS, unless ULPDU packing is disabled. If ULPDU packing is disabled, each FPDU SHALL contain a single ULPDU. ULPDU packing mode may be negotiated, or specified a priori by a ULP.
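The FPDU layout and the packing rule above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and network (big-endian) byte order for the Length and Key fields is an assumption, since the text above does not state a byte order.

```python
import struct

FPDU_HEADER_LEN = 8  # 16-bit Length followed by 48-bit Key


def build_fpdu(key: int, ulpdus: list[bytes]) -> bytes:
    """Frame one group of complete ULPDUs (big-endian fields are an assumption)."""
    payload = b"".join(ulpdus)
    if not 0 <= key < (1 << 48):
        raise ValueError("key must fit in 48 bits")
    if len(payload) > 0xFFFF:
        raise ValueError("framed ULPDUs exceed the 16-bit Length field")
    return struct.pack("!H", len(payload)) + key.to_bytes(6, "big") + payload


def pack_fpdus(key: int, ulpdus: list[bytes], emss: int,
               packing: bool = True) -> list[bytes]:
    """Group contiguous ULPDUs into FPDUs that fit in one EMSS-sized segment.

    A ULPDU that is too large to fit on its own is still framed alone.
    """
    budget = emss - FPDU_HEADER_LEN
    fpdus, group, size = [], [], 0
    for ulpdu in ulpdus:
        # Close the current group if packing is off or the next ULPDU will not fit.
        if group and (not packing or size + len(ulpdu) > budget):
            fpdus.append(build_fpdu(key, group))
            group, size = [], 0
        group.append(ulpdu)
        size += len(ulpdu)
    if group:
        fpdus.append(build_fpdu(key, group))
    return fpdus
```

For example, with a hypothetical EMSS of 100 octets, three 40-octet ULPDUs are framed as one FPDU carrying two ULPDUs followed by one FPDU carrying the third; with packing disabled, each ULPDU is framed on its own.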
Disabling ULPDU packing is analogous to disabling the Nagle algorithm in TCP.

TUF SHALL present to the ULP the size of the largest ULPDU fitting in an EMSS-sized FPDU (MULPDU). MULPDU is EMSS minus the FPDU header size (8 octets). ULPs SHOULD submit as large ULPDUs as possible to TUF, up to MULPDU, subject to limits imposed by specific ULP properties. The ULP MAY also choose to pack several ULPDUs into an EMSS-sized unit before submitting them as one ULPDU to TUF. Depending upon the ULP, ULP packing may improve data transfer efficiency, and is unlikely to have any detrimental effect.

A TUF implementation probing for PMTU increase SHOULD present an increased MULPDU value to the ULP until a large enough FPDU to perform the probe results.

Under exceptional circumstances, the EMSS can become too small to accommodate even a single ULPDU. For example, a ULP may define fixed-sized PDUs that are incompressible, or variable size PDUs with some absolute minimum size, such as the size of a data PDU containing a minimum amount of data. It is possible for the EMSS to shrink to as small as 8 octets [PathMTU]. If the EMSS is too small to accommodate an incompressible ULPDU, the FPDU MUST contain only that ULPDU. ULPs using TUF SHOULD NOT define ULPDUs with a minimum size greater than 128 octets.

4.2. TUF-conforming TCP Sender Segmentation

TCP senders are allowed substantial freedom in the choice of how to segment an outgoing TCP stream. Within the confines of the receiver-advertised receive window and the sender-computed congestion window, any segmentation is permitted. Virtually all TCP implementations do attempt to segment outgoing TCP streams into EMSS-sized segments where possible because it improves performance.
TUF-conforming TCP sender behavior ensures that the ULPDU containment property holds most of the time. To do this, a TUF-conforming TCP sender MUST respect a single additional rule in performing segmentation:

   A TUF-conforming TCP sender MUST segment the outgoing TCP stream such that the first octet of every FPDU is sent at the beginning of a TCP segment.

4.3. Negotiating TUF

Negotiating the use of TUF is the responsibility of the ULP. The use of TUF MAY be negotiated separately for each direction on a connection.

The negotiation procedure MUST ensure that when TUF is enabled or disabled, the remote peer will not transmit its first TCP segment in the new mode until it is certain that the local peer has actually enabled or disabled TUF.

TUF operation is characteristically requested by the receiver and offered by the sender. Before enabling TUF, the relevant parameters:

   1. the sender's 48-bit key

   2. ULPDU packing mode

MUST be established at each peer.
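As an illustration of the two parameters a sender has to establish, the sketch below draws a 48-bit key from the operating system's cryptographic random source. The `TufParams` record and the `new_sender_params` function are hypothetical names introduced here, and using Python's `secrets` module is one possible way to meet the "good quality random number generator" requirement, not a mechanism defined by this document.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TufParams:
    """Hypothetical container for the per-direction TUF parameters."""
    key: int              # sender's 48-bit key, fixed for the connection direction
    ulpdu_packing: bool   # whether several ULPDUs may share one FPDU


def new_sender_params(ulpdu_packing: bool = True) -> TufParams:
    # secrets.randbits draws from the OS cryptographic random source.
    return TufParams(key=secrets.randbits(48), ulpdu_packing=ulpdu_packing)


params = new_sender_params()
print(f"key=0x{params.key:012x} packing={params.ulpdu_packing}")
```

How the resulting values reach the peer remains a matter for the ULP-defined negotiation.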
A natural way to enable the use of TUF is a ULP-defined negotiation exchange of the TUF parameters culminating in enabling TUF, if requested, for each transfer direction. A three-way handshake protocol can be used to ensure that the point at which TUF is enabled is unambiguous and each end has time to perform local state changes.

A connection on which TUF is enabled is likely to be the same connection on which the negotiation occurs, but this is not required. A new connection could also use TUF from its initial establishment, if the TUF parameters and modes are known through some out-of-band mechanism.

Use of TUF could be disabled during a connection using a similar ULP-defined three-way handshake.

Other alternatives to parameter exchange include stipulating some parameters a priori. For example, a ULP could specify that TUF with ULPDU packing enabled is always used in both directions. In this case, only the 48-bit keys need to be exchanged before TUF is enabled. Or, a ULP could determine TUF characteristics on the basis of the TCP port number.

4.4. TUF Receiver ULPDU Containment Property Testing

A TUF receiver that wishes to use ULP control information to perform direct data placement must first verify the ULPDU containment property.
To do this, the receiver MUST establish that the TCP segment contains exactly one FPDU. Abstractly, this can be done by assuming the TCP segment payload begins with an FPDU, and verifying the following properties of that putative FPDU:

   o  The received TCP segment payload length equals the value of the FPDU Length field plus the length of the FPDU header (8 octets).

   o  The 48-bit key equals the value signaled to the receiver when TUF was enabled for the connection.

If these conditions are true, the TUF receiver MAY assume that the ULPDU containment property holds, and use ULP control information to directly place data in the contained ULPDUs.

TUF does not provide any information that a TUF receiver can use to locate ULP control information beyond the ULPDU containment property. In particular, a TUF receiver MUST NOT scan TCP segments in an attempt to locate FPDUs that do not begin at the beginning of a TCP segment.
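The two checks above can be sketched as a single predicate over a TCP segment payload. The function name is hypothetical, and big-endian field encoding is an assumption, as the byte order is not stated in the text.

```python
import struct

FPDU_HEADER_LEN = 8  # 16-bit Length followed by 48-bit Key


def containment_property_holds(payload: bytes, expected_key: int) -> bool:
    """Treat the payload as a putative FPDU and apply both receiver checks."""
    if len(payload) < FPDU_HEADER_LEN:
        return False  # too short to carry even an FPDU header
    (length,) = struct.unpack_from("!H", payload, 0)
    key = int.from_bytes(payload[2:8], "big")
    # Check 1: segment payload length == Length field + 8-octet header.
    # Check 2: key matches the value signaled when TUF was enabled.
    return len(payload) == length + FPDU_HEADER_LEN and key == expected_key
```

Note that the predicate only ever looks at the first 8 octets of the payload; consistent with the rule above, it makes no attempt to scan for an FPDU elsewhere in the segment.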
However, even if the ULPDU containment property does not hold, a TUF receiver may still be able to reliably locate and use ULP control information. For example, if a received TCP segment contains the next unreceived data in the TCP stream, the location of ULPDUs in that segment is unambiguous. The behavior of a TUF receiver acting on ULP control information located with properties other than the ULPDU containment property is not specified here.

5. Protocol Characteristics

This section discusses some characteristics and behavior which are implications of the TUF protocol.

5.1. Properties Of TUF-conforming TCP Senders

The general practice of TCP senders to send as much data as possible within a TCP segment (up to EMSS) implies that an FPDU whose size is less than or equal to EMSS, and whose first octet begins a TCP segment, will be sent entirely within a single TCP segment. This ensures the ULPDU containment property for that TCP segment.

A TUF-conforming TCP sender still obeys all requirements of TCP.
While the segmentation of a TUF-conforming TCP sender will have distinctive characteristics when viewed from the network wire, the same segmentation behavior could also result from a stock TCP sender.

The one property of a TUF-conforming TCP sender which arguably departs from traditional expectations is that a TUF-conforming TCP sender may not produce TCP segments which are as close in size to EMSS as a stock TCP sender. The need to ensure the ULPDU containment property may result in TCP segments which are not as full as if the property did not need to hold.

While this is abstractly true, in practice, several characteristics combine to minimize this effect. Specifically:

   o  Packing ULPDUs into FPDUs gives behavior similar to that of stock TCP segmentation, albeit with coarser granularity.

   o  ULPs which benefit from data-dependent direct data placement (candidates for TUF) usually transfer large amounts of data in bulk. This means that most ULPDUs are data-carrying, and will be EMSS-sized. Even when control is interleaved with data, the combination of a small number of control ULPDUs with a data ULPDU can be packed to fill an EMSS-sized segment.

Therefore, a TUF-conforming TCP sender seems likely to behave similarly to a stock TCP sender under most circumstances.
However, applications that both send and receive data over the same TCP connection, where there might be dependencies between incoming and outgoing data, are often subject to excessive delays attributable to TCP's Nagle algorithm and/or delayed-ACK algorithm [NagleDAck]. These algorithms generally perform best when TCP always sends full-EMSS segments. Because TUF can generate sub-EMSS segments as a by-product of aligning FPDU boundaries with TCP segment boundaries, TUF might be especially vulnerable to the known problems with the Nagle and/or delayed-ACK algorithms. Further work, including implementation experience with TUF, as well as existing and future proposals for improvements to the Nagle and/or delayed-ACK algorithms, might be necessary to optimize TUF performance while fully preserving the congestion-avoidance features of TCP. This work is currently outside the scope of this document.

5.2. Exception Cases

The complete operational specification of TUF is contained in the rules for forming FPDUs, and sending those FPDUs in TCP segments. However, the operation of TUF will be subject to a variety of transient or exceptional conditions.
The behavior of TUF under those conditions is discussed below to illustrate specifically how TUF addresses them.

5.2.1. Resegmenting Intermediaries

Resegmenting TCP-layer intermediaries (middleboxes) are one of the most formidable obstacles to maintaining the ULPDU containment property. In the presence of such an intermediary, the segmentation chosen by the sender may not be the segmentation at the receiver.

While such intermediaries may or may not be common in particular networks, in many cases the presence or absence of such resegmenting behavior is beyond the control or even knowledge of the end points using TUF. Therefore, TUF must detect such resegmentation by design.

A primary reason for the random key in the FPDU header is to detect such resegmentation.
An alternative to the random key, which has been proposed, is to use ULP-specific validation criteria to determine the ULPDU containment property. For example, some ULP PDUs include relatively strong data integrity checks such as CRCs, and other ULP control information can often be validated against various ULP-specific criteria.

While such ULP-specific validation criteria may involve checking many more bits than the combination of the FPDU's 16-bit length and 48-bit key, ULP-specific validation criteria may not actually offer a strong guarantee of the ULPDU containment property. For certain data streams, the probability of a false-positive indication of the ULPDU containment property can be extremely high.

Assume that the intermediary resegments to a granularity of no finer than G octets (e.g. 4). Also assume that the TCP data stream contains predominantly application data.
If the ULP is a storage protocol, simply transferring a file containing a continuous, repeated stream of well-formed ULPDUs which are some multiple of G in size increases the probability of a false-positive indication of the ULPDU containment property to approximately:

   1 / (sizeof(repeated ULPDU) / G)

If the well-formed ULPDUs are relatively small (e.g. 32 octets where G = 4 octets), the probability of a false-positive indication of the ULPDU containment property is approximately 1/8, for EACH TCP segment which does not actually begin with a ULPDU. Clearly, in this case, it would take only a very small number of TCP segments which do not begin with an actual ULPDU before the `fake' ULPDU in the application data is interpreted as an actual ULPDU. The consequences of such a false-positive interpretation could be dire, for example executing a destructive operation request.

The 48-bit random key in the FPDU results in a low probability of a false-positive indication of the ULPDU containment property because it is effectively secret with respect to the application data stream.
Note that although this analysis may appear to be security-minded, prompting the image of a sighted third-party adversary that can `sniff' the 48-bit key, it is actually considering a safety, rather than a security property. The security properties of TUF are discussed in Section 6 (`Security Considerations') below.

Even though TUF can detect the presence of a resegmenting intermediary, such an intermediary will almost certainly substantially reduce the chance of the ULPDU containment property being satisfied. A TUF implementation which detects a very low incidence of the ULPDU containment property for a sustained interval (>> RTT) may assume that a resegmenting intermediary is in operation and SHOULD discontinue the use of ULP control information found using the ULPDU containment property. In such cases, the ULP MAY elect to disable the use of TUF altogether, or simply stop exploiting the ULPDU containment property.

5.2.2. PMTU Reduction

When a PMTU reduction is detected by a TUF-compliant TCP, the TUF-compliant TCP sender may send FPDUs already committed to the TCP layer in one of two ways:

   o  send unsegmented FPDUs in TCP segments of the old EMSS size, and rely on IP fragmentation to deliver the segments,

   o  segment FPDUs to fit in TCP segments which respect the new EMSS size.

Stock TCPs face a similar choice on PMTU change, and both alternatives are used in practice.

In the case that a TUF-compliant TCP chooses to segment FPDUs, it SHOULD segment them in such a way that, in the absence of resegmentation by an intermediary, the segments are guaranteed not to give a false-positive indication of the ULPDU containment property.

There are various ways to ensure this. For example, no matter how the FPDU is segmented, the first segment is guaranteed not to give a false-positive indication of the ULPDU containment property: the 48-bit key will match, but the length will not. In the worst possible case, each subsequent TCP segment could be sent with fewer than 8 octets of data, also guaranteed not to give a false-positive indication of the ULPDU containment property. More efficient approaches are possible, but PMTU reduction is a rare event, and reacting to it is only a transient condition.
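One of the "more efficient approaches" mentioned above might look like the following sketch. It is an illustrative assumption, not a technique specified by this document: the sender, which knows its own key, cuts the FPDU into segments of at most the new EMSS and, whenever a candidate segment would happen to pass the receiver's two checks, shortens it by one octet so that the length check must fail. Function names and big-endian field encoding are likewise assumptions.

```python
import struct

FPDU_HEADER_LEN = 8


def looks_like_fpdu(segment: bytes, key: int) -> bool:
    """Receiver-style check: Length field + 8 == payload length, and key matches."""
    if len(segment) < FPDU_HEADER_LEN:
        return False
    (length,) = struct.unpack_from("!H", segment, 0)
    return (len(segment) == length + FPDU_HEADER_LEN
            and int.from_bytes(segment[2:8], "big") == key)


def segment_fpdu(fpdu: bytes, new_emss: int, key: int) -> list[bytes]:
    """Split an oversize FPDU so that no piece is mistaken for a whole FPDU."""
    if new_emss < 1:
        raise ValueError("new_emss must be positive")
    if len(fpdu) <= new_emss:
        return [fpdu]  # still fits in one segment; nothing to split
    segments, pos = [], 0
    while pos < len(fpdu):
        end = min(pos + new_emss, len(fpdu))
        if looks_like_fpdu(fpdu[pos:end], key):
            # Same leading octets, one octet shorter: the length check now fails.
            end -= 1
        segments.append(fpdu[pos:end])
        pos = end
    return segments
```

The first piece never needs shortening, because its Length field describes the whole FPDU, which is longer than the new EMSS. A later piece is shortened only in the unlikely event that ULP data at that offset imitates a valid header.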
Eventually a new MULPDU will be presented to the ULP, and FPDUs that fit in the new EMSS will result. During the transient condition, performance will suffer temporarily no matter how FPDUs are segmented.

No matter what segmentation is chosen by a TUF-compliant TCP sender when segmenting an FPDU, if the segments pass through a resegmenting intermediary, the correctness of the ULPDU containment property remains strictly a matter of probability.

5.2.3. PMTU Increase

As described in `FPDU Size Selection' above, a TUF-compliant TCP probing for PMTU increase will present an increased MULPDU value to the ULP. This should eventually lead to an FPDU large enough to actually perform the PMTU increase probe. The MULPDU value should not be further adjusted until the probe is actually performed.

This behavior is similar to when a stock TCP would like to perform a PMTU increase, but less data is available than would fill the desired segment.

Also, note that depending on the ULP, the actual distribution of FPDU sizes may have a granularity coarser than a single octet. An FPDU with a particular, desired TCP segment size may never be generated.
Therefore, when probing for PMTU increase, a TUF-compliant TCP must be satisfied with an FPDU that produces a TCP segment size that is `close' to the desired size.

Finally, note that in cases where PMTU grows and shrinks relatively frequently, better performance may result from not probing for PMTU increase at all, or probing very rarely. This is because the performance disruption resulting from PMTU decrease can be substantial, and in many cases, implementations of TUF will be in hardware, so performance may be less sensitive to differences in PMTU.

5.2.4. Receive Window < EMSS

A TUF-compliant TCP sender that is presented with a receive window smaller than EMSS may be required to segment FPDUs. The TCP window probe is a limiting case of this condition where the advertised receive window is 0, and the amount of data typically sent in response is a single octet.

In this case, a TUF-compliant TCP sender will segment in accordance with the requirements of TCP, and the rule defined in `TUF-conforming TCP Sender Segmentation' above. In addition, as when resegmenting in response to PMTU decrease, a TUF-compliant TCP sender SHOULD segment in such a way that, in the absence of a resegmenting intermediary, segments are guaranteed not to give a false-positive indication of the ULPDU containment property.
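The worst-case segmentation rule referred to above (a first segment that carries the FPDU header but is shorter than the FPDU, followed by segments of fewer than 8 octets) can be illustrated with a minimal sketch. The header layout used here is an assumption made only for illustration: a 6-octet key followed by a 16-bit big-endian count of the octets after the header. The text in this part of the document implies an 8-octet header containing a 48-bit key and a length, but does not give the field order or encoding. The function names tuf_looks_contained() and tuf_conservative_split() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout, for illustration only: 6-octet key, then a 16-bit
 * big-endian count of the octets that follow the 8-octet header. */
#define TUF_HDR_LEN 8
#define TUF_KEY_LEN 6

/* Returns 1 if a TCP payload looks like exactly one complete FPDU,
 * i.e. the key matches and the length field accounts for the payload. */
static int tuf_looks_contained(const uint8_t *payload, size_t len,
                               const uint8_t key[TUF_KEY_LEN])
{
    assert(payload != NULL && key != NULL);
    if (len < TUF_HDR_LEN)
        return 0;                         /* cannot hold a header */
    if (memcmp(payload, key, TUF_KEY_LEN) != 0)
        return 0;                         /* key does not match */
    size_t body = ((size_t)payload[6] << 8) | payload[7];
    return body == len - TUF_HDR_LEN;     /* length must match exactly */
}

/* Worst-case split of one FPDU under a size limit (a reduced EMSS or a
 * small receive window). The first segment takes up to `limit` octets;
 * every later segment carries fewer than TUF_HDR_LEN octets, so it can
 * never be mistaken for a complete FPDU. Writes the segment sizes to
 * sizes[] and returns their count, or 0 if max_sizes is too small. */
static size_t tuf_conservative_split(size_t fpdu_len, size_t limit,
                                     size_t *sizes, size_t max_sizes)
{
    assert(sizes != NULL && fpdu_len > 0 && limit > 0);
    size_t tail = limit < TUF_HDR_LEN ? limit : TUF_HDR_LEN - 1;
    size_t n = 0, left = fpdu_len;
    while (left > 0) {
        size_t cap = (n == 0) ? limit : tail;
        size_t chunk = left < cap ? left : cap;
        if (n == max_sizes)
            return 0;
        sizes[n++] = chunk;
        left -= chunk;
    }
    return n;
}
```

Under these assumptions, a 30-octet FPDU split with a limit of 20 octets yields segments of 20, 7 and 3 octets, none of which passes the containment test. As the text notes, more efficient approaches are possible; this sketch only shows that the conservative one is simple to state.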
In situations where the receive window is smaller than EMSS, data transfer performance is likely to be limited independently of any segmentation behavior by the TCP sender. Furthermore, ULP implementations that choose to use TUF will almost certainly be designed to maintain a receiver window larger than EMSS, so a small receiver window should occur extremely infrequently.

5.2.5. Size of ULPDU + 8 > EMSS

In cases where EMSS shrinks below the minimum size of a ULPDU that a ULP wants to send, TUF will create FPDUs that are larger than EMSS, and a TUF-compliant TCP sender will face the same alternatives as during PMTU reduction:

   o  send unsegmented FPDUs and rely on IP fragmentation to deliver the segments,

   o  segment FPDUs to fit in TCP segments which respect the EMSS size.

A ULP which is presented with an MULPDU value that is too small to accommodate PDUs necessary for operation SHOULD simply attempt to use ULPDUs which are as small as possible.

If the EMSS shrinks to a pathologically small size, then a TUF implementation SHOULD discontinue the use of [...] ULP control information may be found using the ULPDU containment property. In such cases, the ULP MAY elect to disable the use of TUF altogether, or simply stop exploiting the ULPDU containment property. A path MTU which results in an EMSS < 128 + 8 octets is an extremely unlikely occurrence and, when it does occur, poor data transfer performance is a likely result, independent of TCP sender segmentation behavior.

6. Security Considerations

This section discusses both protocol-specific considerations and the implications of using TUF with existing security mechanisms.

6.1. Protocol-specific Security Considerations

A third-party that can inject spoofed packets into the network which can be delivered to a TUF receiver could launch a variety of attacks that exploit TUF-specific behavior.

For example, a blind third-party adversary could inject random packets which appear in the valid TCP window and do not begin with valid FPDU headers. A barrage of such packets might cause a TUF receiver to conclude that a resegmenting intermediary is present and disable the use of TUF and direct data placement. This would substantially degrade performance. However, such packets would probably also have consequences more dire than degraded performance, such as causing the ULP to interpret the bogus data as valid. Furthermore, such a third-party could also degrade performance just as effectively in a TUF-independent way by injecting spoofed ICMP packets which result in reduction of the path MTU to an inefficiently small size.

Fundamentally, the vulnerabilities of TUF to active third-party interference are no more acute than those of TCP without TUF. In both cases, a communication security mechanism such as IPSec is the only way to completely prevent such attacks.

6.2. Using IPSec With TUF

Since IPSec is designed to secure arbitrary IP packet streams, including streams where packets are lost, TUF can run cleanly on top of IPSec without any change. IPSec packets may be decrypted in the order they are received, and a TUF receiver may test and exploit the ULPDU containment property just as if the IP datagram were unsecured.

6.3. Using TLS With TUF

Using TLS [TLS] with TUF, particularly trying to exploit the ULPDU containment property to locate ULP control information, is not a straightforward process. TUF can be directly layered on top of TLS, but many of the advantages of TUF are lost.

This document does not define a way of using TLS with TUF that could offer better performance than stock reassembly buffer-based implementations. That task is left to a different document, if there is sufficient motivation to address the problems. This section does outline some of the known complications of trying to do better than stock reassembly buffer-based implementations using TLS with TUF.

TLS is a record-oriented protocol. TLS records are PDUs with a similar structure to ULPDUs defined in application ULPs. As with other ULPs, the only way to avoid a complete reassembly buffer is to be able to find TLS PDUs in the presence of lost TCP segments. The ULPDU containment property could be used to do this, which suggests that TLS itself should be layered on top of TUF.
In this case, the FPDU header will travel in the clear, but this will probably not present serious vulnerabilities other than denial of service attacks comparable to what is already possible without TUF.

Once the TLS records are located and processed, it still remains to locate the ULPDUs. The simplest way to do this would be to have the TLS implementation be TUF-compliant, and ensure the ULPDU containment property within each TLS record. In this case, the protocol layering would look like:

              ULP client
                  ^
                  |
                  | ULPDUs (in octet stream)
                  |
                  v
          TUF-conforming TLS
                  ^
                  |
                  | TLS records (containing ULPDUs)
                  |
                  v
                 TUF
                  ^
                  |
                  | FPDUs (each containing a TLS record)
                  |
                  v
          TUF-conforming TCP
                  ^
                  |
                  | TCP Segments (each containing an FPDU)
                  |
                  v
                . . .

An obvious complication of using TLS with TUF is that ciphers defined for use with TLS do not offer independence across TLS records.

The most common cipher used with TLS is RC4, which is a stream cipher. Efficient decryption of an RC4 stream depends upon the entire preceding data stream. In other words, it is simply not feasible to decrypt TLS records encrypted with RC4 in any order other than the original TCP stream order. This clearly defeats the purpose of TUF.

TLS is also defined to work with block ciphers such as 3DES in Cipher Block Chaining (CBC) mode. In this case, the dependency of the decryption operation on data in previous TLS records is less severe. Decrypting the current TLS record requires only ciphertext from the previous TLS record. While this does not allow complete independence of processing TLS records, a lost or delayed TCP segment containing a TLS record only prevents decrypting the immediately subsequent TLS record, not all TLS records after it.

TLS compression presents another complication to using TLS with TUF. TLS compression algorithms are allowed to increase the content length by up to 1024 octets. If the content length does increase, the TLS record may not fit within an EMSS-sized TCP segment, even if the uncompressed ULPDU does. If the risk of exceeding an EMSS-sized TCP segment is small, it may be acceptable to occasionally send FPDUs containing TLS records that span several TCP segments, or use IP fragmentation. Some TLS compression algorithms may never increase the content length, or only increase it by some small, manageable amount.

7. IANA Considerations

If framing is enabled a priori for a ULP by connecting to a well-known port, this well-known port would be registered for the framed ULP with IANA.

8. References

[ALF] D. D. Clark and D. L. Tennenhouse, "Architectural considerations for a new generation of protocols," in SIGCOMM Symposium on Communications Architectures and Protocols, (Philadelphia, Pennsylvania), pp. 200-208, IEEE, Sept. 1990. Computer Communications Review, Vol. 20(4), Sept. 1990.

[BEEP] Rose, M., "The Blocks Extensible Exchange Protocol Core", RFC 3080, March 2001.

[HTTP] Fielding, R. and others, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-initwin-00.txt.

[NagleDAck] Minshall G., Mogul, J., Saito, Y., Verghese, B., "Application performance pitfalls and TCP's Nagle algorithm", Workshop on Internet Server Performance, May 1999.

[PathMTU] Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191, November 1990.

[RFC1750] Eastlake, D., Crocker, S., Schiller, J., "Randomness Recommendations for Security", RFC 1750, December 1994.

[RFC2581] Allman, M., and others, "TCP Congestion Control", RFC 2581, April 1999.

[SCTP] Stewart, R. and others, "Stream Control Transmission Protocol", RFC 2960, October 2000.

[Stevens] Stevens, W. Richard, "Unix Network Programming Volume 1", Prentice Hall, 1998, ISBN 0-13-490012-X.

[TCP] Postel, J., "Transmission Control Protocol - DARPA Internet Program Protocol Specification", RFC 793, September 1981.

[TLS] Dierks, T. and others, "The TLS Protocol, Version 1.0", RFC 2246, January 1999.
Authors' Addresses

   Stephen Bailey
   Sandburst Corporation
   600 Federal Street
   Andover, MA 01810 USA
   Phone: +1 978 689 1614
   EMail: email@example.com

   Jeff Chase
   Department of Computer Science
   Duke University
   Durham, NC 27708-0129 USA
   Phone: +1 919 660 6559
   EMail: firstname.lastname@example.org

   Jim Pinkerton
   Microsoft, Inc.
   1 Microsoft Way
   Redmond, WA 98052 USA
   EMail: email@example.com

   Allyn Romanow
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95134 USA
   Phone: +1 408 525 8836
   EMail: email@example.com

   Constantine Sapuntzakis
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95134 USA
   Phone: +1 408 525 5497
   EMail: firstname.lastname@example.org

   Jim Wendt
   Hewlett Packard Corporation
   8000 Foothills Boulevard MS 5668
   Roseville, CA 95747-5668 USA
   Phone: +1 916 785 5198
   EMail: email@example.com

   Jim Williams
   Emulex Corporation
   580 Main Street
   Bolton, MA 01740 USA
   Phone: +1 978 779 7224
   EMail: firstname.lastname@example.org

Appendix A. Sample Sockets Support For TUF

The sockets support for TUF described below is only a sketch. It is provided as an aid to understanding TUF. Implementing this interface is not a requirement for a TUF implementation. Other software interfaces are possible.

The described interface draws from the sockets interface for UDP. The described interface might be natural for applications already designed to support both TCP and UDP, or that do network input and output in complete PDU units. For applications that perform octet-at-a-time style input and output, an alternative interface that draws from the tradition of the TCP URG pointer interface (e.g. using a MSG_OOB flag to send()) is equally possible. An implementation may even offer several different interfaces to TUF.
That said, the sockets support sketched below might well provide the basis for a complete, standard interface to be described outside this draft.

A.1 Basic Principles

The sockets support for TUF takes the form of a set of socket options that may be set or requested to enable the appropriate behavior.

A socket may be in one of two TUF-related modes in the send direction:

   1. TUF-compliant TCP sender mode. No data (FPDU headers) is added to the TCP octet stream, but each data buffer presented in a sending operation is to be sent according to the rules of TCP and TUF-compliant TCP senders. This mode provides direct access to a TUF-compliant TCP sender for purposes such as implementing TUF.

   2. TUF sender mode. An FPDU header is added to data presented by an integral number of sending operations, and the resulting FPDU is passed to a TUF-compliant TCP sender for transmission.

A socket may be in one TUF-related mode in the receive direction:

   1. TUF receiver mode. FPDUs are expected in each TCP segment.

If a socket receiving operation is used to retrieve received data (as opposed to the data being directly placed), FPDU headers are removed before the data is returned.
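The difference between the two send modes, and the header removal performed in TUF receiver mode, can be pictured with a small sketch of the work done on behalf of the socket. This is not part of the interface. The header layout is an assumption made for illustration (a 6-octet key followed by a 16-bit big-endian count of the octets after the header), and the names tuf_frame() and tuf_unframe() are hypothetical. In TUF-compliant TCP sender mode, neither step would be applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout, for illustration only: 6-octet key, then a 16-bit
 * big-endian count of the octets that follow the 8-octet header. */
#define TUF_HDR_LEN 8
#define TUF_KEY_LEN 6

/* TUF sender mode: prepend an FPDU header to the data from a sending
 * operation. Returns the FPDU length, or 0 if it does not fit in out[]. */
static size_t tuf_frame(const uint8_t key[TUF_KEY_LEN],
                        const uint8_t *ulpdu, size_t ulpdu_len,
                        uint8_t *out, size_t out_cap)
{
    assert(key != NULL && ulpdu != NULL && out != NULL);
    if (ulpdu_len > 0xFFFF || out_cap < ulpdu_len + TUF_HDR_LEN)
        return 0;
    memcpy(out, key, TUF_KEY_LEN);
    out[6] = (uint8_t)(ulpdu_len >> 8);
    out[7] = (uint8_t)(ulpdu_len & 0xFF);
    memcpy(out + TUF_HDR_LEN, ulpdu, ulpdu_len);
    return ulpdu_len + TUF_HDR_LEN;
}

/* TUF receiver mode: skip the FPDU header before returning data to a
 * receiving operation. Returns a pointer to the data and stores its
 * length, or returns NULL if the buffer is not one complete FPDU. */
static const uint8_t *tuf_unframe(const uint8_t *fpdu, size_t fpdu_len,
                                  size_t *ulpdu_len)
{
    assert(fpdu != NULL && ulpdu_len != NULL);
    if (fpdu_len < TUF_HDR_LEN)
        return NULL;
    size_t body = ((size_t)fpdu[6] << 8) | fpdu[7];
    if (body != fpdu_len - TUF_HDR_LEN)
        return NULL;
    *ulpdu_len = body;
    return fpdu + TUF_HDR_LEN;
}
```

A caller that frames a 5-octet buffer obtains a 13-octet FPDU, and unframing that FPDU returns the original 5 octets. The key check on receive is omitted here to keep the sketch short.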
A.2 Enabling TUF

      /* Pick a sending mode */
      if (sendMode == TUF_TCP)
          mode = TUF_SEND_TCP;
      else
          mode = TUF_SEND;
      mode |= TUF_RECEIVE;
      setsockopt (s, SOL_TCP, TUF_MODE, &mode, sizeof(mode));

A framing module that does not support a requested mode MUST fail the setsockopt call.

Framing may be enabled on a socket before or after it is connected, subject to the requirements of Section 2.

A.3 Sending Data

The standard socket sending operations, including send(), sendto(), sendmsg(), writev(), and others are used to send ULPDUs in TUF.

The EMSGSIZE error should be returned if the buffer passed to the sending operation would result in an FPDU that does not fit in an EMSS-sized TCP segment, unless oversized ULPDU errors are disabled, as described below.

When the path EMSS increases, the sending operation MAY return EMSGSIZE once to inform the client of the change.

A.4 Retrieving The Current EMSS or MULPDU

      /* getsockopt takes a pointer to the option length */
      socklen_t len = sizeof(emss);
      getsockopt (s, SOL_TCP, TUF_MULPDU, &emss, &len);

If the socket is in TUF_SEND_TCP mode, this call returns the TCP EMSS. If the socket is in TUF_SEND mode, the call returns the maximum ULPDU that can be submitted in a sending operation without requiring fragmentation of the associated FPDU. The number should not count any octets that go towards TCP options.

A.5 Disabling ULPDU Packing

      flag = 0;
      setsockopt (s, SOL_TCP, TUF_PACK_PDUS, &flag, sizeof(flag));

This call disables TUF from packing more than one ULPDU into an FPDU. By default, ULPDU packing is enabled.

A.6 Disabling The Report of Oversized ULPDUs

      flag = 0;
      setsockopt (s, SOL_TCP, TUF_REPORT_OVERSIZED, &flag, sizeof(flag));

This call disables sending operations from returning EMSGSIZE in response to oversized ULPDUs. It may be called at any time on a socket, whether connected or not. It is used to continue ULP operation when MULPDU is already known to be too small to permit some ULPDUs to be sent without segmentation. Oversized ULPDU reporting can be enabled again if PMTU is discovered to have increased.

Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.