draft-ietf-tcpm-rfc793bis-04.txt   draft-ietf-tcpm-rfc793bis-05.txt 
Internet Engineering Task Force W. Eddy, Ed. Internet Engineering Task Force W. Eddy, Ed.
Internet-Draft MTI Systems Internet-Draft MTI Systems
Obsoletes: 793, 879, 6093, 6429, 6528, December 8, 2016 Obsoletes: 793, 879, 6093, 6429, 6528, March 28, 2017
6691 (if approved) 6691 (if approved)
Updates: 1122 (if approved) Updates: 1122 (if approved)
Intended status: Standards Track Intended status: Standards Track
Expires: June 11, 2017 Expires: September 29, 2017
Transmission Control Protocol Specification Transmission Control Protocol Specification
draft-ietf-tcpm-rfc793bis-04 draft-ietf-tcpm-rfc793bis-05
Abstract Abstract
This document specifies the Internet's Transmission Control Protocol This document specifies the Internet's Transmission Control Protocol
(TCP). TCP is an important transport layer protocol in the Internet (TCP). TCP is an important transport layer protocol in the Internet
stack, and has continuously evolved over decades of use and growth of stack, and has continuously evolved over decades of use and growth of
the Internet. Over this time, a number of changes have been made to the Internet. Over this time, a number of changes have been made to
TCP as it was specified in RFC 793, though these have only been TCP as it was specified in RFC 793, though these have only been
documented in a piecemeal fashion. This document collects and brings documented in a piecemeal fashion. This document collects and brings
those changes together with the protocol specification from RFC 793. those changes together with the protocol specification from RFC 793.
skipping to change at page 2, line 4 skipping to change at page 2, line 4
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on June 11, 2017. This Internet-Draft will expire on September 29, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 3, line 7 skipping to change at page 3, line 7
3.6. Precedence and Security . . . . . . . . . . . . . . . . . 31 3.6. Precedence and Security . . . . . . . . . . . . . . . . . 31
3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 32 3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 32
3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 33 3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 33
3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 34 3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 34
3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 35 3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 35
3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 35 3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 35
3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 36 3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 36
3.8. Data Communication . . . . . . . . . . . . . . . . . . . 36 3.8. Data Communication . . . . . . . . . . . . . . . . . . . 36
3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 37 3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 37
3.8.2. TCP Connection Failures . . . . . . . . . . . . . . . 37 3.8.2. TCP Congestion Control . . . . . . . . . . . . . . . 37
3.8.3. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 38 3.8.3. TCP Connection Failures . . . . . . . . . . . . . . . 37
3.8.4. The Communication of Urgent Information . . . . . . . 38 3.8.4. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 38
3.8.5. Managing the Window . . . . . . . . . . . . . . . . . 39 3.8.5. The Communication of Urgent Information . . . . . . . 39
3.8.6. Managing the Window . . . . . . . . . . . . . . . . . 40
3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 44 3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 44
3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 44 3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 44
3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 52 3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 52
3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 54 3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 54
3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 78 3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 79
4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 83 4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 84
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 87 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 88
6. Security and Privacy Considerations . . . . . . . . . . . . . 87 6. Security and Privacy Considerations . . . . . . . . . . . . . 89
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 88 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 89
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 88 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.1. Normative References . . . . . . . . . . . . . . . . . . 88 8.1. Normative References . . . . . . . . . . . . . . . . . . 89
8.2. Informative References . . . . . . . . . . . . . . . . . 89 8.2. Informative References . . . . . . . . . . . . . . . . . 90
Appendix A. TCP Requirement Summary . . . . . . . . . . . . . . 90 Appendix A. TCP Requirement Summary . . . . . . . . . . . . . . 92
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 94 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 95
1. Purpose and Scope 1. Purpose and Scope
In 1981, RFC 793 [8] was released, documenting the Transmission In 1981, RFC 793 [10] was released, documenting the Transmission
Control Protocol (TCP), and replacing earlier specifications for TCP Control Protocol (TCP), and replacing earlier specifications for TCP
that had been published in the past. that had been published in the past.
Since then, TCP has been implemented many times, and has been used as Since then, TCP has been implemented many times, and has been used as
a transport protocol for numerous applications on the Internet. a transport protocol for numerous applications on the Internet.
For several decades, RFC 793 plus a number of other documents have For several decades, RFC 793 plus a number of other documents have
combined to serve as the specification for TCP [20]. Over time, a combined to serve as the specification for TCP [23]. Over time, a
number of errata have been identified on RFC 793, as well as number of errata have been identified on RFC 793, as well as
deficiencies in security, performance, and other aspects. A number deficiencies in security, performance, and other aspects. A number
of enhancements have grown and been documented separately. These were of enhancements have grown and been documented separately. These were
never accumulated together into an update to the base specification. never accumulated together into an update to the base specification.
The purpose of this document is to bring together all of the IETF The purpose of this document is to bring together all of the IETF
Standards Track changes that have been made to the basic TCP Standards Track changes that have been made to the basic TCP
functional specification and unify them into an update of the RFC 793 functional specification and unify them into an update of the RFC 793
protocol specification. Some companion documents are referenced for protocol specification. Some companion documents are referenced for
important algorithms that TCP uses (e.g. for congestion control), but important algorithms that TCP uses (e.g. for congestion control), but
skipping to change at page 4, line 36 skipping to change at page 4, line 37
repair losses. repair losses.
This document describes the basic functionality expected in modern This document describes the basic functionality expected in modern
implementations of TCP, and replaces the protocol specification in implementations of TCP, and replaces the protocol specification in
RFC 793. It does not replicate or attempt to update the examples and RFC 793. It does not replicate or attempt to update the examples and
other discussion in RFC 793. Other documents are referenced to other discussion in RFC 793. Other documents are referenced to
provide explanation of the theory of operation, rationale, and provide explanation of the theory of operation, rationale, and
detailed discussion of design decisions. This document only focuses detailed discussion of design decisions. This document only focuses
on the normative behavior of the protocol. on the normative behavior of the protocol.
The "TCP Roadmap" [20] provides a more extensive guide to the RFCs The "TCP Roadmap" [23] provides a more extensive guide to the RFCs
that define TCP and describe various important algorithms. The TCP that define TCP and describe various important algorithms. The TCP
Roadmap contains sections on strongly encouraged enhancements that Roadmap contains sections on strongly encouraged enhancements that
improve performance and other aspects of TCP beyond the basic improve performance and other aspects of TCP beyond the basic
operation specified in this document. As one example, implementing operation specified in this document. As one example, implementing
congestion control (e.g. [13]) is a TCP requirement, but is a complex congestion control (e.g. [16]) is a TCP requirement, but is a complex
topic on its own, and not described in detail in this document, as topic on its own, and not described in detail in this document, as
there are many options and possibilities that do not impact basic there are many options and possibilities that do not impact basic
interoperability. Similarly, most common TCP implementations today interoperability. Similarly, most common TCP implementations today
include the high-performance extensions in [19], but these are not include the high-performance extensions in [22], but these are not
strictly required or discussed in this document. strictly required or discussed in this document.
TEMPORARY EDITOR'S NOTE: This is an early revision in the process of TEMPORARY EDITOR'S NOTE: This is an early revision in the process of
updating RFC 793. Many planned changes are not yet incorporated. updating RFC 793. Many planned changes are not yet incorporated.
***Please do not use this revision as a basis for any work or ***Please do not use this revision as a basis for any work or
reference.*** reference.***
A list of changes from RFC 793 is contained in Section 4. A list of changes from RFC 793 is contained in Section 4.
skipping to change at page 6, line 13 skipping to change at page 6, line 13
TCP Header Format TCP Header Format
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port | | Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number | | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number | | Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |U|A|P|R|S|F| | | Data | |C|E|U|A|P|R|S|F| |
| Offset| Reserved |R|C|S|S|Y|I| Window | | Offset| Rsrvd |W|C|R|C|S|S|Y|I| Window |
| | |G|K|H|T|N|N| | | | |R|E|G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer | | Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding | | Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data | | data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP Header Format TCP Header Format
skipping to change at page 7, line 8 skipping to change at page 7, line 8
If the ACK control bit is set this field contains the value of the If the ACK control bit is set this field contains the value of the
next sequence number the sender of the segment is expecting to next sequence number the sender of the segment is expecting to
receive. Once a connection is established this is always sent. receive. Once a connection is established this is always sent.
Data Offset: 4 bits Data Offset: 4 bits
The number of 32 bit words in the TCP Header. This indicates where The number of 32 bit words in the TCP Header. This indicates where
the data begins. The TCP header (even one including options) is an the data begins. The TCP header (even one including options) is an
integral number of 32-bit words long. integral number of 32-bit words long.
Reserved: 4 bits Rsrvd - Reserved: 4 bits
Reserved for future use. Must be zero. Reserved for future use. Must be zero.
Control Bits: 8 bits (from left to right): Control Bits: 8 bits (from left to right):
CWR: Congestion Window Reduced CWR: Congestion Window Reduced (see [7])
ECE: ECN-Echo ECE: ECN-Echo (see [7])
URG: Urgent Pointer field significant URG: Urgent Pointer field significant
ACK: Acknowledgment field significant ACK: Acknowledgment field significant
PSH: Push Function PSH: Push Function
RST: Reset the connection RST: Reset the connection
SYN: Synchronize sequence numbers SYN: Synchronize sequence numbers
FIN: No more data from sender FIN: No more data from sender
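As an illustrative sketch only (not part of either draft revision), the Data Offset, Reserved, and Control Bits fields described above can be pulled out of octets 12 and 13 of a header as shown below. The function name and the shape of the return value are hypothetical; the bit order follows the left-to-right list in the newer revision.

```python
import struct

# Control bit names, left to right (most significant bit first).
FLAG_NAMES = ("CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN")


def parse_offset_and_flags(header: bytes):
    """Return (data_offset_in_32bit_words, reserved_bits, set_of_flag_names)."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 octets long")
    # Octets 12-13: Data Offset (4 bits), Reserved (4 bits), Control Bits (8 bits).
    (word,) = struct.unpack_from("!H", header, 12)
    data_offset = word >> 12
    reserved = (word >> 8) & 0x0F
    bits = word & 0xFF
    flags = {name for i, name in enumerate(FLAG_NAMES) if bits & (0x80 >> i)}
    return data_offset, reserved, flags
```

The data begins at octet data_offset * 4, and a receiver following the text above would expect the reserved bits to be zero.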
Window: 16 bits Window: 16 bits
The number of data octets beginning with the one indicated in the The number of data octets beginning with the one indicated in the
skipping to change at page 18, line 33 skipping to change at page 18, line 33
ISN = M + F(localip, localport, remoteip, remoteport, secretkey) ISN = M + F(localip, localport, remoteip, remoteport, secretkey)
where M is the 4 microsecond timer, and F() is a pseudorandom where M is the 4 microsecond timer, and F() is a pseudorandom
function (PRF) of the connection's identifying parameters ("localip, function (PRF) of the connection's identifying parameters ("localip,
localport, remoteip, remoteport") and a secret key ("secretkey"). localport, remoteip, remoteport") and a secret key ("secretkey").
F() MUST NOT be computable from the outside, or an attacker could F() MUST NOT be computable from the outside, or an attacker could
still guess at sequence numbers from the ISN used for some other still guess at sequence numbers from the ISN used for some other
connection. The PRF could be implemented as a cryptographic hash of connection. The PRF could be implemented as a cryptographic hash of
the concatenation of the TCP connection parameters and some secret the concatenation of the TCP connection parameters and some secret
data. For discussion of the selection of a specific hash algorithm data. For discussion of the selection of a specific hash algorithm
and management of the secret key data, please see Section 3 of [17]. and management of the secret key data, please see Section 3 of [20].
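A minimal sketch of the ISN formula above follows. It is an assumption-laden illustration, not the specified method: HMAC-SHA-256 truncated to 32 bits stands in for F() (the text only says "a cryptographic hash" of the parameters and secret data), and the names isn_prf, generate_isn, and now_us are hypothetical.

```python
import hashlib
import hmac
import ipaddress
import struct
import time

MASK32 = 0xFFFFFFFF


def isn_prf(local_ip, local_port, remote_ip, remote_port, secret_key):
    """F(): keyed PRF over the connection identifiers, truncated to 32 bits.

    HMAC-SHA-256 is an assumed choice for illustration.
    """
    msg = (
        ipaddress.ip_address(local_ip).packed
        + struct.pack("!H", local_port)
        + ipaddress.ip_address(remote_ip).packed
        + struct.pack("!H", remote_port)
    )
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def generate_isn(local_ip, local_port, remote_ip, remote_port, secret_key,
                 now_us=None):
    """ISN = M + F(...), with all arithmetic modulo 2**32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = (now_us // 4) & MASK32  # M: timer that ticks once every 4 microseconds
    f = isn_prf(local_ip, local_port, remote_ip, remote_port, secret_key)
    return (m + f) & MASK32
```

The secret key would need to come from a cryptographically strong random source and stay private to the host; key selection and management are outside the scope of this sketch.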
For each connection there is a send sequence number and a receive For each connection there is a send sequence number and a receive
sequence number. The initial send sequence number (ISS) is chosen by sequence number. The initial send sequence number (ISS) is chosen by
the data sending TCP, and the initial receive sequence number (IRS) the data sending TCP, and the initial receive sequence number (IRS)
is learned during the connection establishing procedure. is learned during the connection establishing procedure.
For a connection to be established or initialized, the two TCPs must For a connection to be established or initialized, the two TCPs must
synchronize on each other's initial sequence numbers. This is done synchronize on each other's initial sequence numbers. This is done
in an exchange of connection establishing segments carrying a control in an exchange of connection establishing segments carrying a control
bit called "SYN" (for synchronize) and the initial sequence numbers. bit called "SYN" (for synchronize) and the initial sequence numbers.
skipping to change at page 28, line 41 skipping to change at page 28, line 41
3.4.1. Remote Address Validation 3.4.1. Remote Address Validation
TODO - figure out if this section would fit better elsewhere, for TODO - figure out if this section would fit better elsewhere, for
instance in the more detailed description of the OPEN call later on instance in the more detailed description of the OPEN call later on
A TCP implementation MUST reject as an error a local OPEN call for an A TCP implementation MUST reject as an error a local OPEN call for an
invalid remote IP address (e.g., a broadcast or multicast address). invalid remote IP address (e.g., a broadcast or multicast address).
An incoming SYN with an invalid source address must be ignored either An incoming SYN with an invalid source address must be ignored either
by TCP or by the IP layer (see Section 3.2.1.3 of [10]). by TCP or by the IP layer (see Section 3.2.1.3 of [12]).
A TCP implementation MUST silently discard an incoming SYN segment A TCP implementation MUST silently discard an incoming SYN segment
that is addressed to a broadcast or multicast address. that is addressed to a broadcast or multicast address.
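The address checks above might be sketched as follows. This is a hedged illustration: the function name and the local_networks parameter are hypothetical, and detecting directed broadcast addresses assumes the caller can supply the prefixes of its attached IPv4 networks.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_valid_remote_address(addr, local_networks=()):
    """Return False for multicast and (known) broadcast addresses."""
    ip = ipaddress.ip_address(addr)
    if ip.is_multicast:
        return False
    if ip == LIMITED_BROADCAST:
        return False
    # Directed broadcast can only be recognized for prefixes we know about.
    for net in local_networks:
        network = ipaddress.ip_network(net)
        if (ip.version == 4 and network.version == 4
                and network.prefixlen < 31
                and ip == network.broadcast_address):
            return False
    return True
```

An OPEN call would be rejected as an error when this returns False for the remote address, and an incoming SYN addressed to such a destination would be silently discarded.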
3.5. Closing a Connection 3.5. Closing a Connection
CLOSE is an operation meaning "I have no more data to send." The CLOSE is an operation meaning "I have no more data to send." The
notion of closing a full-duplex connection is subject to ambiguous notion of closing a full-duplex connection is subject to ambiguous
interpretation, of course, since it may not be obvious how to treat interpretation, of course, since it may not be obvious how to treat
the receiving side of the connection. We have chosen to treat CLOSE the receiving side of the connection. We have chosen to treat CLOSE
skipping to change at page 32, line 26 skipping to change at page 32, line 26
The term "segmentation" refers to the activity TCP performs when The term "segmentation" refers to the activity TCP performs when
ingesting a stream of bytes from a sending application and ingesting a stream of bytes from a sending application and
packetizing that stream of bytes into TCP segments. Individual TCP packetizing that stream of bytes into TCP segments. Individual TCP
segments often do not correspond one-for-one to individual send (or segments often do not correspond one-for-one to individual send (or
socket write) calls from the application. Applications may perform socket write) calls from the application. Applications may perform
writes at the granularity of messages in the upper layer protocol, writes at the granularity of messages in the upper layer protocol,
but TCP guarantees no boundary coherence between the TCP segments but TCP guarantees no boundary coherence between the TCP segments
sent and received versus user application data read or write buffer sent and received versus user application data read or write buffer
boundaries. In some specific protocols, such as RDMA using DDP and boundaries. In some specific protocols, such as RDMA using DDP and
MPA [12], there are performance optimizations possible when the MPA [14], there are performance optimizations possible when the
relation between TCP segments and application data units can be relation between TCP segments and application data units can be
controlled, and MPA includes a specific mechanism for detecting and controlled, and MPA includes a specific mechanism for detecting and
verifying this relationship between TCP segments and application verifying this relationship between TCP segments and application
message data structures, but this is specific to applications like message data structures, but this is specific to applications like
RDMA. In general, multiple goals influence the sizing of TCP RDMA. In general, multiple goals influence the sizing of TCP
segments created by a TCP implementation. segments created by a TCP implementation.
Goals driving the sending of larger segments include: Goals driving the sending of larger segments include:
o Reducing the number of packets in flight within the network. o Reducing the number of packets in flight within the network.
skipping to change at page 34, line 25 skipping to change at page 34, line 25
o IPoptionsize is the size of any IP options associated with a TCP o IPoptionsize is the size of any IP options associated with a TCP
connection. Note that some options may not be included on all connection. Note that some options may not be included on all
packets, but that for each segment sent, the sender should adjust packets, but that for each segment sent, the sender should adjust
the data length accordingly, within the Eff.snd.MSS. the data length accordingly, within the Eff.snd.MSS.
The MSS value to be sent in an MSS option should be equal to the The MSS value to be sent in an MSS option should be equal to the
effective MTU minus the fixed IP and TCP headers. By ignoring both effective MTU minus the fixed IP and TCP headers. By ignoring both
IP and TCP options when calculating the value for the MSS option, if IP and TCP options when calculating the value for the MSS option, if
there are any IP or TCP options to be sent in a packet, then the there are any IP or TCP options to be sent in a packet, then the
sender must decrease the size of the TCP data accordingly. RFC 6691 sender must decrease the size of the TCP data accordingly. RFC 6691
[18] discusses this in greater detail. [21] discusses this in greater detail.
The MSS value to be sent in an MSS option must be less than or equal The MSS value to be sent in an MSS option must be less than or equal
to: to:
MMS_R - 20 MMS_R - 20
where MMS_R is the maximum size for a transport-layer message that where MMS_R is the maximum size for a transport-layer message that
can be received (and reassembled). TCP obtains MMS_R and MMS_S from can be received (and reassembled). TCP obtains MMS_R and MMS_S from
the IP layer; see the generic call GET_MAXSIZES in Section 3.4 of RFC the IP layer; see the generic call GET_MAXSIZES in Section 3.4 of RFC
1122. 1122.
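The MSS arithmetic above can be illustrated with a short sketch. The function names are hypothetical, and the fixed header sizes (20 octets for IPv4, 40 for IPv6, 20 for TCP) are stated here as assumptions of the example.

```python
IPV4_FIXED_HEADER = 20  # octets, no IP options
IPV6_FIXED_HEADER = 40  # octets, no extension headers
TCP_FIXED_HEADER = 20   # octets, no TCP options


def mss_option_value(effective_mtu, ipv6=False):
    """MSS to advertise: effective MTU minus the fixed IP and TCP headers."""
    ip_header = IPV6_FIXED_HEADER if ipv6 else IPV4_FIXED_HEADER
    return effective_mtu - ip_header - TCP_FIXED_HEADER


def mss_within_limit(mss, mms_r):
    """Check the bound MSS <= MMS_R - 20 given in the text."""
    return mss <= mms_r - TCP_FIXED_HEADER


def max_data_per_segment(peer_mss, ip_option_octets=0, tcp_option_octets=0):
    """Options are not counted in the MSS, so the sender shrinks the data."""
    return peer_mss - ip_option_octets - tcp_option_octets
```

For example, an effective MTU of 1500 gives an MSS option of 1460 for IPv4, and a segment carrying 12 octets of TCP options toward a peer that advertised 1460 can hold 1448 octets of data.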
skipping to change at page 35, line 10 skipping to change at page 35, line 10
on the default effective MTU for sending to be less than or equal to on the default effective MTU for sending to be less than or equal to
576 for destinations not directly connected. For IPv6, this would be 576 for destinations not directly connected. For IPv6, this would be
1280. In all cases, however, implementation of Path MTU Discovery 1280. In all cases, however, implementation of Path MTU Discovery
(PMTUD) and Packetization Layer Path MTU Discovery (PLPMTUD) is (PMTUD) and Packetization Layer Path MTU Discovery (PLPMTUD) is
strongly recommended in order for TCP to improve segmentation strongly recommended in order for TCP to improve segmentation
decisions. decisions.
PMTUD for IPv4 [2] or IPv6 [3] is implemented in conjunction between PMTUD for IPv4 [2] or IPv6 [3] is implemented in conjunction between
TCP, IP, and ICMP protocols. Several adjustments to a TCP TCP, IP, and ICMP protocols. Several adjustments to a TCP
implementation with PMTUD are described in RFC 2923 in order to deal implementation with PMTUD are described in RFC 2923 in order to deal
with problems experienced in practice [6]. PLPMTUD [11] is a with problems experienced in practice [6]. PLPMTUD [13] is a
Standards Track improvement to PMTUD that relaxes the requirement for Standards Track improvement to PMTUD that relaxes the requirement for
ICMP support across a path, and improves performance in cases where ICMP support across a path, and improves performance in cases where
ICMP is not consistently conveyed. The mechanisms in all four of ICMP is not consistently conveyed. The mechanisms in all four of
these RFCs are recommended to be included in TCP implementations. these RFCs are recommended to be included in TCP implementations.
The TCP MSS option specifies an upper bound for the size of packets The TCP MSS option specifies an upper bound for the size of packets
that can be received. Hence, setting the value in the MSS option too that can be received. Hence, setting the value in the MSS option too
small can impact the ability for PMTUD or PLPMTUD to find a larger small can impact the ability for PMTUD or PLPMTUD to find a larger
path MTU. RFC 1191 discusses this implication of many older TCP path MTU. RFC 1191 discusses this implication of many older TCP
implementations setting MSS to 536 for non-local destinations, rather implementations setting MSS to 536 for non-local destinations, rather
than deriving it from the MTUs of connected interfaces as than deriving it from the MTUs of connected interfaces as
recommended. recommended.
3.7.3. Interfaces with Variable MTU Values 3.7.3. Interfaces with Variable MTU Values
The effective MTU can sometimes vary, as when used with variable The effective MTU can sometimes vary, as when used with variable
compression, e.g., RObust Header Compression (ROHC) [14]. It is compression, e.g., RObust Header Compression (ROHC) [17]. It is
tempting for TCP to want to advertise the largest possible MSS, to tempting for TCP to want to advertise the largest possible MSS, to
support the most efficient use of compressed payloads. support the most efficient use of compressed payloads.
Unfortunately, some compression schemes occasionally need to transmit Unfortunately, some compression schemes occasionally need to transmit
full headers (and thus smaller payloads) to resynchronize state at full headers (and thus smaller payloads) to resynchronize state at
their endpoint compressors/decompressors. If the largest MTU is used their endpoint compressors/decompressors. If the largest MTU is used
to calculate the value to advertise in the MSS option, TCP to calculate the value to advertise in the MSS option, TCP
retransmission may interfere with compressor resynchronization. retransmission may interfere with compressor resynchronization.
As a result, when the effective MTU of an interface varies, TCP As a result, when the effective MTU of an interface varies, TCP
SHOULD use the smallest effective MTU of the interface to calculate SHOULD use the smallest effective MTU of the interface to calculate
the value to advertise in the MSS option. the value to advertise in the MSS option.
3.7.4. Nagle Algorithm 3.7.4. Nagle Algorithm
The "Nagle algorithm" was described in RFC 896 [9] and was The "Nagle algorithm" was described in RFC 896 [11] and was
recommended in RFC 1122 [10] for mitigation of an early problem of recommended in RFC 1122 [12] for mitigation of an early problem of
too many small packets being generated. It has been implemented in too many small packets being generated. It has been implemented in
most current TCP code bases, sometimes with minor variations. most current TCP code bases, sometimes with minor variations.
If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the
sending TCP buffers all user data (regardless of the PSH bit), until sending TCP buffers all user data (regardless of the PSH bit), until
the outstanding data has been acknowledged or until the TCP can send the outstanding data has been acknowledged or until the TCP can send
a full-sized segment (Eff.snd.MSS bytes). a full-sized segment (Eff.snd.MSS bytes).
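The rule in the preceding paragraph can be sketched as a single send decision. This is a simplified illustration with hypothetical names; it deliberately ignores the receive window and congestion control, which also limit sending.

```python
def nagle_allows_send(snd_nxt, snd_una, buffered_octets, eff_snd_mss,
                      nagle_enabled=True):
    """Return True if buffered user data may be sent now."""
    if buffered_octets <= 0:
        return False  # nothing to send
    if not nagle_enabled:
        return True   # the application disabled Nagle on this connection
    if buffered_octets >= eff_snd_mss:
        return True   # a full-sized segment can always go out
    # Small segment: send only if no data is outstanding. Testing equality
    # avoids any issue with sequence number wraparound.
    return snd_nxt == snd_una
```

With data in flight and less than Eff.snd.MSS octets buffered, the function returns False and the data waits for an acknowledgment or for more user data to arrive.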
TODO - see if SEND description later should be updated to reflect TODO - see if SEND description later should be updated to reflect
this this
A TCP SHOULD implement the Nagle Algorithm to coalesce short A TCP SHOULD implement the Nagle Algorithm to coalesce short
segments. However, there MUST be a way for an application to disable segments. However, there MUST be a way for an application to disable
the Nagle algorithm on an individual connection. In all cases, the Nagle algorithm on an individual connection. In all cases,
sending data is also subject to the limitation imposed by the Slow sending data is also subject to the limitation imposed by the Slow
Start algorithm [13]. Start algorithm [16].
3.7.5. IPv6 Jumbograms 3.7.5. IPv6 Jumbograms
In order to support TCP over IPv6 jumbograms, implementations need to In order to support TCP over IPv6 jumbograms, implementations need to
be able to send TCP segments larger than the 64KB limit that the MSS be able to send TCP segments larger than the 64KB limit that the MSS
option can convey. RFC 2675 [5] defines that an MSS value of 65,535 option can convey. RFC 2675 [5] defines that an MSS value of 65,535
bytes is to be treated as infinity, and Path MTU Discovery [3] is bytes is to be treated as infinity, and Path MTU Discovery [3] is
used to determine the actual MSS. used to determine the actual MSS.
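As a rough sketch of the jumbogram handling, the two functions below follow the rule in RFC 2675 of subtracting 60 octets (the fixed IPv6 and TCP headers) from the MTU. The function names are hypothetical, and the path MTU is assumed to be supplied by Path MTU Discovery.

```python
MSS_INFINITY = 65535         # MSS option value treated as "infinity"
IPV6_PLUS_TCP_HEADERS = 60   # 40-octet IPv6 header + 20-octet TCP header


def mss_to_advertise(interface_mtu):
    """Cap the advertised MSS at 65,535, the largest value the option holds."""
    return min(interface_mtu - IPV6_PLUS_TCP_HEADERS, MSS_INFINITY)


def actual_send_mss(received_mss, path_mtu):
    """Derive the usable MSS from the peer's option and the discovered PMTU."""
    pmtu_limit = path_mtu - IPV6_PLUS_TCP_HEADERS
    if received_mss == MSS_INFINITY:
        return pmtu_limit  # the peer imposed no limit; the path MTU decides
    return min(received_mss, pmtu_limit)
```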
3.8. Data Communication 3.8. Data Communication
skipping to change at page 37, line 11 skipping to change at page 37, line 11
The CLOSE user call implies a push function, as does the FIN control The CLOSE user call implies a push function, as does the FIN control
flag in an incoming segment. flag in an incoming segment.
3.8.1. Retransmission Timeout 3.8.1. Retransmission Timeout
Because of the variability of the networks that compose an Because of the variability of the networks that compose an
internetwork system and the wide range of uses of TCP connections the internetwork system and the wide range of uses of TCP connections the
retransmission timeout (RTO) must be dynamically determined. retransmission timeout (RTO) must be dynamically determined.
The RTO MUST be computed according to the algorithm in [7], including The RTO MUST be computed according to the algorithm in [8], including
Karn's algorithm for taking RTT samples. Karn's algorithm for taking RTT samples.
RFC 793 contains an early example procedure for computing the RTO. RFC 793 contains an early example procedure for computing the RTO.
This was then replaced by the algorithm described in RFC 1122, and This was then replaced by the algorithm described in RFC 1122, and
subsequently updated in RFC 2988, and then again in RFC 6298. subsequently updated in RFC 2988, and then again in RFC 6298.
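For orientation, the RFC 6298 computation referenced above can be sketched as follows. This is a simplified illustration, not a normative restatement: the class name RtoEstimator, the default clock granularity, and the 60-second upper bound are assumptions made for the example, and Karn's algorithm is reduced to ignoring samples taken from retransmitted segments.

```python
class RtoEstimator:
    """Sketch of the RFC 6298 retransmission timeout rules (units: seconds)."""

    ALPHA = 1 / 8    # gain applied to SRTT
    BETA = 1 / 4     # gain applied to RTTVAR
    K = 4            # variance multiplier
    MIN_RTO = 1.0    # lower bound on the computed RTO
    MAX_RTO = 60.0   # assumed cap; RFC 6298 permits any maximum of at least 60 s

    def __init__(self, clock_granularity: float = 0.001):
        self.g = clock_granularity  # assumed timer granularity G
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0              # initial RTO before any RTT measurement

    def on_rtt_sample(self, r: float, retransmitted: bool = False) -> None:
        # Karn's algorithm: a sample from a retransmitted segment is
        # ambiguous, so it is not used.
        if retransmitted:
            return
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated using the old SRTT, then SRTT is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.MIN_RTO), self.MAX_RTO)

    def on_timeout(self) -> None:
        # Exponential backoff when the retransmission timer expires.
        self.rto = min(self.rto * 2, self.MAX_RTO)
```

With a first sample of 0.5 seconds, SRTT becomes 0.5, RTTVAR becomes 0.25, and the RTO is 0.5 + 4 * 0.25 = 1.5 seconds.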
If a retransmitted packet is identical to the original packet (which If a retransmitted packet is identical to the original packet (which
implies not only that the data boundaries have not changed, but also implies not only that the data boundaries have not changed, but also
that the window and acknowledgment fields of the header have not that the window and acknowledgment fields of the header have not
changed), then the same IP Identification field MAY be used (see changed), then the same IP Identification field MAY be used (see
Section 3.2.1.5 of RFC 1122). Section 3.2.1.5 of RFC 1122).
3.8.2. TCP Connection Failures 3.8.2. TCP Congestion Control
RFC 1122 required implementation of Van Jacobson's congestion control
algorithm combining slow start with congestion avoidance. RFC 2581
provided an IETF Standards Track description of this, along with fast
retransmit and fast recovery. RFC 5681 is the current description of
these algorithms and is the current standard for TCP congestion
control.
A TCP MUST implement RFC 5681.
Explicit Congestion Notification (ECN) was defined in RFC 3168 and is
an IETF Standards Track enhancement that has many benefits [24].
A TCP SHOULD implement ECN as described in RFC 3168.
3.8.3. TCP Connection Failures
Excessive retransmission of the same segment by TCP indicates some Excessive retransmission of the same segment by TCP indicates some
failure of the remote host or the Internet path. This failure may be failure of the remote host or the Internet path. This failure may be
of short or long duration. The following procedure MUST be used to of short or long duration. The following procedure MUST be used to
handle excessive retransmissions of data segments: handle excessive retransmissions of data segments:
(a) There are two thresholds R1 and R2 measuring the amount of (a) There are two thresholds R1 and R2 measuring the amount of
retransmission that has occurred for the same segment. R1 and R2 retransmission that has occurred for the same segment. R1 and R2
might be measured in time units or as a count of retransmissions. might be measured in time units or as a count of retransmissions.
(b) When the number of transmissions of the same segment reaches (b) When the number of transmissions of the same segment reaches
or exceeds threshold R1, pass negative advice (see [10] or exceeds threshold R1, pass negative advice (see [12]
Section 3.3.1.4) to the IP layer, to trigger dead-gateway Section 3.3.1.4) to the IP layer, to trigger dead-gateway
diagnosis. diagnosis.
(c) When the number of transmissions of the same segment reaches a (c) When the number of transmissions of the same segment reaches a
threshold R2 greater than R1, close the connection. threshold R2 greater than R1, close the connection.
(d) An application MUST be able to set the value for R2 for a (d) An application MUST be able to set the value for R2 for a
particular connection. For example, an interactive application particular connection. For example, an interactive application
might set R2 to "infinity," giving the user control over when to might set R2 to "infinity," giving the user control over when to
disconnect. disconnect.
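The threshold procedure in (a) through (d) can be sketched as below, assuming R1 and R2 are expressed as counts of transmissions rather than time. The names RetransmitMonitor and Action and the default threshold values are hypothetical choices for the example.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    NEGATIVE_ADVICE = "negative_advice"  # advise the IP layer (dead-gateway diagnosis)
    CLOSE = "close"                      # give up and close the connection


class RetransmitMonitor:
    """Tracks transmissions of one segment against thresholds R1 < R2."""

    def __init__(self, r1: int = 3, r2: float = 15):
        # Default values are assumptions for illustration only.
        if not r2 > r1:
            raise ValueError("R2 must be greater than R1")
        self.r1 = r1
        self.r2 = r2          # may be float("inf") for an application-chosen "infinity"
        self.count = 0

    def set_r2(self, r2: float) -> None:
        # Item (d): the application can set R2 for this connection.
        if not r2 > self.r1:
            raise ValueError("R2 must be greater than R1")
        self.r2 = r2

    def on_retransmit(self) -> Action:
        self.count += 1
        if self.count >= self.r2:
            return Action.CLOSE
        if self.count >= self.r1:
            return Action.NEGATIVE_ADVICE
        return Action.NONE

    def on_ack(self) -> None:
        # The segment was acknowledged, so the count starts over.
        self.count = 0
```

Setting R2 to float("inf") models the interactive application described in (d): the monitor keeps returning negative advice but never closes the connection on its own.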
skipping to change at page 38, line 24 skipping to change at page 38, line 38
an ICMP Port Unreachable. SYN retransmissions MUST be handled in the an ICMP Port Unreachable. SYN retransmissions MUST be handled in the
general way just described for data retransmissions, including general way just described for data retransmissions, including
notification of the application layer. notification of the application layer.
However, the values of R1 and R2 may be different for SYN and data However, the values of R1 and R2 may be different for SYN and data
segments. In particular, R2 for a SYN segment MUST be set large segments. In particular, R2 for a SYN segment MUST be set large
enough to provide retransmission of the segment for at least 3 enough to provide retransmission of the segment for at least 3
minutes. The application can close the connection (i.e., give up on minutes. The application can close the connection (i.e., give up on
the open attempt) sooner, of course. the open attempt) sooner, of course.
3.8.3. TCP Keep-Alives 3.8.4. TCP Keep-Alives
Implementors MAY include "keep-alives" in their TCP implementations, Implementors MAY include "keep-alives" in their TCP implementations,
although this practice is not universally accepted. If keep-alives although this practice is not universally accepted. If keep-alives
are included, the application MUST be able to turn them on or off for are included, the application MUST be able to turn them on or off for
each TCP connection, and they MUST default to off. each TCP connection, and they MUST default to off.
Keep-alive packets MUST only be sent when no data or acknowledgement Keep-alive packets MUST only be sent when no data or acknowledgement
packets have been received for the connection within an interval. packets have been received for the connection within an interval.
This interval MUST be configurable and MUST default to no less than This interval MUST be configurable and MUST default to no less than
two hours. two hours.
skipping to change at page 38, line 46 skipping to change at page 39, line 12
It is extremely important to remember that ACK segments that contain It is extremely important to remember that ACK segments that contain
no data are not reliably transmitted by TCP. Consequently, if a no data are not reliably transmitted by TCP. Consequently, if a
keep-alive mechanism is implemented it MUST NOT interpret failure to keep-alive mechanism is implemented it MUST NOT interpret failure to
respond to any specific probe as a dead connection. respond to any specific probe as a dead connection.
An implementation SHOULD send a keep-alive segment with no data; An implementation SHOULD send a keep-alive segment with no data;
however, it MAY be configurable to send a keep-alive segment however, it MAY be configurable to send a keep-alive segment
containing one garbage octet, for compatibility with erroneous TCP containing one garbage octet, for compatibility with erroneous TCP
implementations. implementations.
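As an illustration of the application-facing controls described in this section, the following Python sketch turns keep-alives on for one connection through the sockets-API option SO_KEEPALIVE. The helper names are hypothetical. TCP_KEEPIDLE is a platform-specific option (available on Linux, for example) and is applied only when the platform exposes it; that guard is an assumption of the sketch.

```python
import socket

TWO_HOURS = 2 * 60 * 60  # the default idle interval must be no less than this


def enable_keepalive(sock: socket.socket, idle_seconds: int = TWO_HOURS) -> None:
    """Turn keep-alives on for this connection only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Platform-specific: configure the idle interval where supported.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


def keepalive_enabled(sock: socket.socket) -> bool:
    """Report whether keep-alives are enabled on this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

A freshly created socket reports keep-alives as disabled, consistent with the requirement that they default to off.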
3.8.4. The Communication of Urgent Information 3.8.5. The Communication of Urgent Information
As a result of implementation differences and middlebox interactions, As a result of implementation differences and middlebox interactions,
new applications SHOULD NOT employ the TCP urgent mechanism. new applications SHOULD NOT employ the TCP urgent mechanism.
However, TCP implementations MUST still include support for the However, TCP implementations MUST still include support for the
urgent mechanism. Details can be found in RFC 6093 [15]. urgent mechanism. Details can be found in RFC 6093 [18].
The objective of the TCP urgent mechanism is to allow the sending The objective of the TCP urgent mechanism is to allow the sending
user to stimulate the receiving user to accept some urgent data and user to stimulate the receiving user to accept some urgent data and
to permit the receiving TCP to indicate to the receiving user when to permit the receiving TCP to indicate to the receiving user when
all the currently known urgent data has been received by the user. all the currently known urgent data has been received by the user.
This mechanism permits a point in the data stream to be designated as This mechanism permits a point in the data stream to be designated as
the end of urgent information. Whenever this point is in advance of the end of urgent information. Whenever this point is in advance of
the receive sequence number (RCV.NXT) at the receiving TCP, that TCP the receive sequence number (RCV.NXT) at the receiving TCP, that TCP
must tell the user to go into "urgent mode"; when the receive must tell the user to go into "urgent mode"; when the receive
skipping to change at page 39, line 30 skipping to change at page 39, line 44
transmitted. The URG control flag indicates that the urgent field is transmitted. The URG control flag indicates that the urgent field is
meaningful and must be added to the segment sequence number to yield meaningful and must be added to the segment sequence number to yield
the urgent pointer. The absence of this flag indicates that there is the urgent pointer. The absence of this flag indicates that there is
no urgent data outstanding. no urgent data outstanding.
To send an urgent indication the user must also send at least one To send an urgent indication the user must also send at least one
data octet. If the sending user also indicates a push, timely data octet. If the sending user also indicates a push, timely
delivery of the urgent information to the destination process is delivery of the urgent information to the destination process is
enhanced. enhanced.
A TCP MUST support a sequence of urgent data of any length. [10] A TCP MUST support a sequence of urgent data of any length. [12]
A TCP MUST inform the application layer asynchronously whenever it A TCP MUST inform the application layer asynchronously whenever it
receives an Urgent pointer and there was previously no pending urgent receives an Urgent pointer and there was previously no pending urgent
data, or whenever the Urgent pointer advances in the data stream. data, or whenever the Urgent pointer advances in the data stream.
There MUST be a way for the application to learn how much urgent data There MUST be a way for the application to learn how much urgent data
remains to be read from the connection, or at least to determine remains to be read from the connection, or at least to determine
whether or not more urgent data remains to be read. [10] whether or not more urgent data remains to be read. [12]
3.8.5. Managing the Window 3.8.6. Managing the Window
The window sent in each segment indicates the range of sequence The window sent in each segment indicates the range of sequence
numbers the sender of the window (the data receiver) is currently numbers the sender of the window (the data receiver) is currently
prepared to accept. There is an assumption that this is related to prepared to accept. There is an assumption that this is related to
the currently available data buffer space available for this the currently available data buffer space available for this
connection. connection.
The sending TCP packages the data to be transmitted into segments The sending TCP packages the data to be transmitted into segments
which fit the current window, and may repackage segments on the which fit the current window, and may repackage segments on the
retransmission queue. Such repackaging is not required, but may be retransmission queue. Such repackaging is not required, but may be
skipping to change at page 40, line 32 skipping to change at page 40, line 45
The mechanisms provided allow a TCP to advertise a large window and The mechanisms provided allow a TCP to advertise a large window and
to subsequently advertise a much smaller window without having to subsequently advertise a much smaller window without having
accepted that much data. This, so called "shrinking the window," is accepted that much data. This, so called "shrinking the window," is
strongly discouraged. The robustness principle dictates that TCPs strongly discouraged. The robustness principle dictates that TCPs
will not shrink the window themselves, but will be prepared for such will not shrink the window themselves, but will be prepared for such
behavior on the part of other TCPs. behavior on the part of other TCPs.
A TCP receiver SHOULD NOT shrink the window, i.e., move the right A TCP receiver SHOULD NOT shrink the window, i.e., move the right
window edge to the left. However, a sending TCP MUST be robust window edge to the left. However, a sending TCP MUST be robust
against window shrinking, which may cause the "useable window" (see against window shrinking, which may cause the "useable window" (see
Section 3.8.5.2.1) to become negative. Section 3.8.6.2.1) to become negative.
If this happens, the sender SHOULD NOT send new data, but SHOULD If this happens, the sender SHOULD NOT send new data, but SHOULD
retransmit normally the old unacknowledged data between SND.UNA and retransmit normally the old unacknowledged data between SND.UNA and
SND.UNA+SND.WND. The sender MAY also retransmit old data beyond SND.UNA+SND.WND. The sender MAY also retransmit old data beyond
SND.UNA+SND.WND, but SHOULD NOT time out the connection if data SND.UNA+SND.WND, but SHOULD NOT time out the connection if data
beyond the right window edge is not acknowledged. If the window beyond the right window edge is not acknowledged. If the window
shrinks to zero, the TCP MUST probe it in the standard way (described shrinks to zero, the TCP MUST probe it in the standard way (described
below). below).
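The effect of window shrinking on the sender can be illustrated with the usable window, U = SND.UNA + SND.WND - SND.NXT. The sketch below is a simplification that treats sequence numbers as plain integers and ignores 32-bit wraparound; the function names are hypothetical.

```python
def usable_window(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """U = SND.UNA + SND.WND - SND.NXT; negative after the window shrinks."""
    return snd_una + snd_wnd - snd_nxt


def may_send_new_data(snd_una: int, snd_wnd: int, snd_nxt: int) -> bool:
    """New data is held back unless the usable window is positive."""
    return usable_window(snd_una, snd_wnd, snd_nxt) > 0
```

For example, with SND.UNA = 1000 and SND.NXT = 1200, a window of 500 leaves 300 octets usable. If the peer then advertises a window of 100, the usable window becomes -100 and the sender holds back new data while continuing to retransmit the octets between SND.UNA and SND.UNA+SND.WND.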
3.8.5.1. Zero Window Probing 3.8.6.1. Zero Window Probing
The sending TCP must be prepared to accept from the user and send at The sending TCP must be prepared to accept from the user and send at
least one octet of new data even if the send window is zero. The least one octet of new data even if the send window is zero. The
sending TCP must regularly retransmit to the receiving TCP even when sending TCP must regularly retransmit to the receiving TCP even when
the window is zero, in order to "probe" the window. Two minutes is the window is zero, in order to "probe" the window. Two minutes is
recommended for the retransmission interval when the window is zero. recommended for the retransmission interval when the window is zero.
This retransmission is essential to guarantee that when either TCP This retransmission is essential to guarantee that when either TCP
has a zero window the re-opening of the window will be reliably has a zero window the re-opening of the window will be reliably
reported to the other. This is referred to as Zero-Window Probing reported to the other. This is referred to as Zero-Window Probing
(ZWP) in other documents. (ZWP) in other documents.
Probing of zero (offered) windows MUST be supported. Probing of zero (offered) windows MUST be supported.
A TCP MAY keep its offered receive window closed indefinitely. As A TCP MAY keep its offered receive window closed indefinitely. As
long as the receiving TCP continues to send acknowledgments in long as the receiving TCP continues to send acknowledgments in
response to the probe segments, the sending TCP MUST allow the response to the probe segments, the sending TCP MUST allow the
connection to stay open. This enables TCP to function in scenarios connection to stay open. This enables TCP to function in scenarios
such as the "printer ran out of paper" situation described in such as the "printer ran out of paper" situation described in
Section 4.2.2.17 of RFC 1122. The behavior is subject to the Section 4.2.2.17 of RFC 1122. The behavior is subject to the
implementation's resource management concerns, as noted in [16]. implementation's resource management concerns, as noted in [19].
When the receiving TCP has a zero window and a segment arrives it When the receiving TCP has a zero window and a segment arrives it
must still send an acknowledgment showing its next expected sequence must still send an acknowledgment showing its next expected sequence
number and current window (zero). number and current window (zero).
3.8.5.2. Silly Window Syndrome Avoidance 3.8.6.2. Silly Window Syndrome Avoidance
The "Silly Window Syndrome" (SWS) is a stable pattern of small The "Silly Window Syndrome" (SWS) is a stable pattern of small
incremental window movements resulting in extremely poor TCP incremental window movements resulting in extremely poor TCP
performance. Algorithms to avoid SWS are described below for both performance. Algorithms to avoid SWS are described below for both
the sending side and the receiving side. RFC 1122 contains more the sending side and the receiving side. RFC 1122 contains more
detailed discussion of the SWS problem. Note that the Nagle detailed discussion of the SWS problem. Note that the Nagle
algorithm and the sender SWS avoidance algorithm play complementary algorithm and the sender SWS avoidance algorithm play complementary
roles in improving performance. The Nagle algorithm discourages roles in improving performance. The Nagle algorithm discourages
sending tiny segments when the data to be sent increases in small sending tiny segments when the data to be sent increases in small
increments, while the SWS avoidance algorithm discourages small increments, while the SWS avoidance algorithm discourages small
segments resulting from the right window edge advancing in small segments resulting from the right window edge advancing in small
increments. increments.
3.8.5.2.1. Sender's Algorithm - When to Send Data 3.8.6.2.1. Sender's Algorithm - When to Send Data
A TCP MUST include a SWS avoidance algorithm in the sender. A TCP MUST include a SWS avoidance algorithm in the sender.
A TCP SHOULD implement the Nagle Algorithm to coalesce short A TCP SHOULD implement the Nagle Algorithm to coalesce short
segments. However, there MUST be a way for an application to disable segments. However, there MUST be a way for an application to disable
the Nagle algorithm on an individual connection. In all cases, the Nagle algorithm on an individual connection. In all cases,
sending data is also subject to the limitation imposed by the Slow sending data is also subject to the limitation imposed by the Slow
Start algorithm. Start algorithm.
The sender's SWS avoidance algorithm is more difficult than the The sender's SWS avoidance algorithm is more difficult than the
skipping to change at page 42, line 38 skipping to change at page 43, line 8
[SND.NXT = SND.UNA and] [SND.NXT = SND.UNA and]
min(D,U) >= Fs * Max(SND.WND); min(D,U) >= Fs * Max(SND.WND);
(4) or if data is PUSHed and the override timeout occurs. (4) or if data is PUSHed and the override timeout occurs.
Here Fs is a fraction whose recommended value is 1/2. The override Here Fs is a fraction whose recommended value is 1/2. The override
timeout should be in the range 0.1 - 1.0 seconds. It may be timeout should be in the range 0.1 - 1.0 seconds. It may be
convenient to combine this timer with the timer used to probe zero convenient to combine this timer with the timer used to probe zero
windows (Section 3.8.5.1). windows (Section 3.8.6.1).
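The sender-side decision can be sketched as a single predicate. Only conditions (3) and (4) appear in this excerpt; conditions (1) and (2) in the sketch are taken from RFC 1122, Section 4.2.3.4, and should be checked against the full text. D is the amount of queued unsent data and U is the usable window. The function and parameter names are hypothetical, and the early return for an empty queue is an addition of the sketch.

```python
def sws_should_send(
    d: int,                     # D: octets queued but not yet sent
    u: int,                     # U: usable window
    eff_snd_mss: int,           # Eff.snd.MSS
    max_snd_wnd: int,           # Max(SND.WND): largest window seen so far
    pushed: bool,               # data has been PUSHed
    nothing_outstanding: bool,  # SND.NXT == SND.UNA
    override_timeout_expired: bool = False,
    nagle: bool = True,         # bracketed conditions apply only with Nagle enabled
    fs: float = 0.5,            # Fs, recommended value 1/2
) -> bool:
    """Sender SWS avoidance: decide whether to send a segment now."""
    if d <= 0:
        return False  # nothing to send (guard added by this sketch)
    nagle_ok = nothing_outstanding or not nagle
    sendable = min(d, u)
    if sendable >= eff_snd_mss:                   # (1) a full-sized segment fits
        return True
    if nagle_ok and pushed and d <= u:            # (2) all queued data can go now
        return True
    if nagle_ok and sendable >= fs * max_snd_wnd: # (3) a fraction Fs of the max window
        return True
    if pushed and override_timeout_expired:       # (4) override timeout
        return True
    return False
```

Passing nagle=False drops the bracketed [SND.NXT = SND.UNA] test, which corresponds to an application having disabled the Nagle algorithm on the connection.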
3.8.5.2.2. Receiver's Algorithm - When to Send a Window Update 3.8.6.2.2. Receiver's Algorithm - When to Send a Window Update
A TCP MUST include a SWS avoidance algorithm in the receiver. A TCP MUST include a SWS avoidance algorithm in the receiver.
The receiver's SWS avoidance algorithm determines when the right The receiver's SWS avoidance algorithm determines when the right
window edge may be advanced; this is customarily known as "updating window edge may be advanced; this is customarily known as "updating
the window". This algorithm combines with the delayed ACK algorithm the window". This algorithm combines with the delayed ACK algorithm
(see Section 3.8.5.3) to determine when an ACK segment containing the (see Section 3.8.6.3) to determine when an ACK segment containing the
current window will really be sent to the receiver. current window will really be sent to the receiver.
The solution to receiver SWS is to avoid advancing the right window The solution to receiver SWS is to avoid advancing the right window
edge RCV.NXT+RCV.WND in small increments, even if data is received edge RCV.NXT+RCV.WND in small increments, even if data is received
from the network in small segments. from the network in small segments.
Suppose the total receive buffer space is RCV.BUFF. At any given Suppose the total receive buffer space is RCV.BUFF. At any given
moment, RCV.USER octets of this total may be tied up with data that moment, RCV.USER octets of this total may be tied up with data that
has been received and acknowledged but which the user process has not has been received and acknowledged but which the user process has not
yet consumed. When the connection is quiescent, RCV.WND = RCV.BUFF yet consumed. When the connection is quiescent, RCV.WND = RCV.BUFF
skipping to change at page 43, line 45 skipping to change at page 44, line 19
where Fr is a fraction whose recommended value is 1/2, and where Fr is a fraction whose recommended value is 1/2, and
Eff.snd.MSS is the effective send MSS for the connection (see Eff.snd.MSS is the effective send MSS for the connection (see
Section 3.7.1). When the inequality is satisfied, RCV.WND is set to Section 3.7.1). When the inequality is satisfied, RCV.WND is set to
RCV.BUFF-RCV.USER. RCV.BUFF-RCV.USER.
Note that the general effect of this algorithm is to advance RCV.WND Note that the general effect of this algorithm is to advance RCV.WND
in increments of Eff.snd.MSS (for realistic receive buffers: in increments of Eff.snd.MSS (for realistic receive buffers:
Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its
own Eff.snd.MSS, assuming it is the same as the sender's. own Eff.snd.MSS, assuming it is the same as the sender's.
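The receiver-side rule can be sketched as follows. The inequality itself is not visible in this excerpt; the sketch assumes the form given in RFC 1122, Section 4.2.3.3, namely RCV.BUFF - RCV.USER - RCV.WND >= min(Fr * RCV.BUFF, Eff.snd.MSS). The function name is hypothetical.

```python
def updated_rcv_wnd(
    rcv_buff: int,     # RCV.BUFF: total receive buffer space
    rcv_user: int,     # RCV.USER: octets received but not yet consumed by the user
    rcv_wnd: int,      # RCV.WND: window currently advertised
    eff_snd_mss: int,  # Eff.snd.MSS
    fr: float = 0.5,   # Fr, recommended value 1/2
) -> int:
    """Return the window to advertise under receiver SWS avoidance."""
    reclaimable = rcv_buff - rcv_user - rcv_wnd
    if reclaimable >= min(fr * rcv_buff, eff_snd_mss):
        # Enough space has been freed: open the window fully.
        return rcv_buff - rcv_user
    # Otherwise keep the right window edge where it is.
    return rcv_wnd
```

With a 65536-octet buffer and an MSS of 1460, freeing 536 octets leaves the advertised window unchanged, while freeing 2536 octets advances it in one step.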
3.8.5.3. Delayed Acknowledgements - When to Send an ACK Segment 3.8.6.3. Delayed Acknowledgements - When to Send an ACK Segment
A host that is receiving a stream of TCP data segments can increase A host that is receiving a stream of TCP data segments can increase
efficiency in both the Internet and the hosts by sending fewer than efficiency in both the Internet and the hosts by sending fewer than
one ACK (acknowledgment) segment per data segment received; this is one ACK (acknowledgment) segment per data segment received; this is
known as a "delayed ACK". known as a "delayed ACK".
A TCP SHOULD implement a delayed ACK, but an ACK should not be A TCP SHOULD implement a delayed ACK, but an ACK should not be
excessively delayed; in particular, the delay MUST be less than 0.5 excessively delayed; in particular, the delay MUST be less than 0.5
seconds, and in a stream of full-sized segments there SHOULD be an seconds, and in a stream of full-sized segments there SHOULD be an
ACK for at least every second segment. Excessive delays on ACK's can ACK for at least every second segment. Excessive delays on ACK's can
skipping to change at page 47, line 36 skipping to change at page 48, line 7
which case, an automatic OPEN would be done. If the calling which case, an automatic OPEN would be done. If the calling
process is not authorized to use this connection, an error is process is not authorized to use this connection, an error is
returned. returned.
If the PUSH flag is set, the data must be transmitted promptly If the PUSH flag is set, the data must be transmitted promptly
to the receiver, and the PUSH bit will be set in the last TCP to the receiver, and the PUSH bit will be set in the last TCP
segment created from the buffer. If the PUSH flag is not set, segment created from the buffer. If the PUSH flag is not set,
the data may be combined with data from subsequent SENDs for the data may be combined with data from subsequent SENDs for
transmission efficiency. transmission efficiency.
New applications SHOULD NOT set the URGENT flag [15] due to New applications SHOULD NOT set the URGENT flag [18] due to
implementation differences and middlebox issues. implementation differences and middlebox issues.
If the URGENT flag is set, segments sent to the destination TCP If the URGENT flag is set, segments sent to the destination TCP
will have the urgent pointer set. The receiving TCP will will have the urgent pointer set. The receiving TCP will
signal the urgent condition to the receiving process if the signal the urgent condition to the receiving process if the
urgent pointer indicates that data preceding the urgent pointer urgent pointer indicates that data preceding the urgent pointer
has not been consumed by the receiving process. The purpose of has not been consumed by the receiving process. The purpose of
urgent is to stimulate the receiver to process the urgent data urgent is to stimulate the receiver to process the urgent data
and to indicate to the receiver when all the currently known and to indicate to the receiver when all the currently known
urgent data has been received. The number of times the sending urgent data has been received. The number of times the sending
skipping to change at page 53, line 33 skipping to change at page 54, line 7
source route received in a datagram. source route received in a datagram.
When a TCP connection is OPENed passively and a packet arrives with a When a TCP connection is OPENed passively and a packet arrives with a
completed IP Source Route option (containing a return route), TCP completed IP Source Route option (containing a return route), TCP
MUST save the return route and use it for all segments sent on this MUST save the return route and use it for all segments sent on this
connection. If a different source route arrives in a later segment, connection. If a different source route arrives in a later segment,
the later definition SHOULD override the earlier one. the later definition SHOULD override the earlier one.
3.9.2.2. ICMP Messages 3.9.2.2. ICMP Messages
TODO - this section is verbatim from 1122, currently. It should be
revised to match the soft-errors RFC, and other updates (e.g. source
quench deprecation)
TCP MUST act on an ICMP error message passed up from the IP layer, TCP MUST act on an ICMP error message passed up from the IP layer,
directing it to the connection that created the error. The necessary directing it to the connection that created the error. The necessary
demultiplexing information can be found in the IP header contained demultiplexing information can be found in the IP header contained
within the ICMP message. within the ICMP message.
This applies to ICMPv6 in addition to IPv4 ICMP.
[15] contains discussion of specific ICMP and ICMPv6 messages
classified as either "soft" or "hard" errors that may bear different
responses. Treatment for classes of ICMP messages is described
below:
Source Quench Source Quench
TCP MUST react to a Source Quench by slowing transmission on the TCP MUST silently discard any received ICMP Source Quench messages.
connection. The RECOMMENDED procedure is for a Source Quench to See [9] for discussion.
trigger a "slow start," as if a retransmission timeout had
occurred.
Destination Unreachable -- codes 0, 1, 5 Soft Errors
For ICMP these include: Destination Unreachable -- codes 0, 1, 5,
Time Exceeded -- codes 0, 1, and Parameter Problem.
For ICMPv6 these include: Destination Unreachable -- codes 0 and 3,
Time Exceeded -- codes 0, 1, and Parameter Problem -- codes 0, 1, 2
Since these Unreachable messages indicate soft error conditions, Since these Unreachable messages indicate soft error conditions,
TCP MUST NOT abort the connection, and it SHOULD make the TCP MUST NOT abort the connection, and it SHOULD make the
information available to the application. information available to the application.
Destination Unreachable -- codes 2-4 Hard Errors
For ICMP these include Destination Unreachable -- codes 2-4
These are hard error conditions, so TCP SHOULD abort the These are hard error conditions, so TCP SHOULD abort the
connection. connection. [15] notes that some implementations do not abort
connections when an ICMP hard error is received for a connection
Time Exceeded -- codes 0, 1 that is in any of the synchronized states.
This should be handled the same way as Destination Unreachable
codes 0, 1, 5 (see above).
Parameter Problem Note that [15] section 4 describes widespread implementation behavior
This should be handled the same way as Destination Unreachable that treats soft errors as hard errors during connection
codes 0, 1, 5 (see above). establishment.
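The IPv4 ICMP treatment described in the -05 text above can be sketched as a classification function. The type numbers are the standard ICMP values (Destination Unreachable 3, Source Quench 4, Time Exceeded 11, Parameter Problem 12). The names IcmpAction and classify_icmpv4 are hypothetical, ICMPv6 is omitted for brevity, and messages the text does not classify are reported as UNCLASSIFIED rather than assigned a behavior.

```python
from enum import Enum


class IcmpAction(Enum):
    DISCARD = "discard"            # silently discard (Source Quench)
    SOFT_ERROR = "soft_error"      # do not abort; inform the application
    HARD_ERROR = "hard_error"      # connection SHOULD be aborted
    UNCLASSIFIED = "unclassified"  # not covered by the text above


DEST_UNREACHABLE = 3
SOURCE_QUENCH = 4
TIME_EXCEEDED = 11
PARAMETER_PROBLEM = 12


def classify_icmpv4(icmp_type: int, code: int) -> IcmpAction:
    """Map an ICMPv4 error message to the treatment described above."""
    if icmp_type == SOURCE_QUENCH:
        return IcmpAction.DISCARD
    if icmp_type == DEST_UNREACHABLE:
        if code in (0, 1, 5):
            return IcmpAction.SOFT_ERROR
        if code in (2, 3, 4):
            return IcmpAction.HARD_ERROR
    if icmp_type == TIME_EXCEEDED and code in (0, 1):
        return IcmpAction.SOFT_ERROR
    if icmp_type == PARAMETER_PROBLEM:
        return IcmpAction.SOFT_ERROR
    return IcmpAction.UNCLASSIFIED
```

An implementation would first demultiplex the message to a connection using the IP header embedded in the ICMP payload and then apply a mapping of this kind.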
3.10. Event Processing 3.10. Event Processing
The processing depicted in this section is an example of one possible The processing depicted in this section is an example of one possible
implementation. Other implementations may have slightly different implementation. Other implementations may have slightly different
processing sequences, but they should differ from those in this processing sequences, but they should differ from those in this
section only in detail, not in substance. section only in detail, not in substance.
The activity of the TCP can be characterized as responding to events. The activity of the TCP can be characterized as responding to events.
The events that occur can be cast into three categories: user calls, The events that occur can be cast into three categories: user calls,
skipping to change at page 87, line 13 skipping to change at page 88, line 13
clearer subsections (previously headings were embedded in the 793 clearer subsections (previously headings were embedded in the 793
text), and windows management advice from 793 was removed (as text), and windows management advice from 793 was removed (as
reviewed by TCPM working group) in favor of the 1122 additions on reviewed by TCPM working group) in favor of the 1122 additions on
SWS, ZWP, and related topics. SWS, ZWP, and related topics.
The -04 revision includes reference to RFC 6429 on the ZWP condition, The -04 revision includes reference to RFC 6429 on the ZWP condition,
RFC 1122 material on TCP Connection Failures, TCP Keep-Alives, RFC 1122 material on TCP Connection Failures, TCP Keep-Alives,
Acknowledging Queued Segments, and Remote Address Validation. RTO Acknowledging Queued Segments, and Remote Address Validation. RTO
computation is referenced from RFC 6298 rather than RFC 1122. computation is referenced from RFC 6298 rather than RFC 1122.
The -05 revision includes the requirement to implement TCP congestion
control with a recommendation to implement ECN, the RFC 6633 update to
1122, which changed the requirement on responding to source quench
ICMP messages, and discussion of ICMP (and ICMPv6) soft and hard
errors per RFC 5461 (ICMPv6 handling for TCP doesn't seem to be
mentioned elsewhere in standards track).
TODO list of other planned changes (these can be added to or made TODO list of other planned changes (these can be added to or made
more specific, as the document proceeds): more specific, as the document proceeds):
1. incorporate relevant parts of 3168 (ECN) - beyond just indicating 1. mention 5961 state machine option
the names of the 2 bits already done 2. mention 6161 (reducing TIME-WAIT)
2. point to 5461 (soft errors) 3. TOS material does not take DSCP changes into account
3. mention 5961 state machine option 4. there is inconsistency between use of SYN_RCVD and SYNC-RECEIVED
4. mention 6161 (reducing TIME-WAIT)
5. TOS material does not take DSCP changes into account
6. there is inconsistency between use of SYN_RCVD and SYNC-RECEIVED
in diagrams and text in various places in diagrams and text in various places
7. make sure that clarifications in RFC 1011 are captured 5. make sure that clarifications in RFC 1011 are captured
TODO list of other potential changes, if there is TCPM consensus: TODO list of other potential changes, if there is TCPM consensus:
1. see draft-gont-tcpm-tcp-seccomp-prec 1. see draft-gont-tcpm-tcp-seccomp-prec
2. incorporate Fernando's new number-checking fixes (if past the 2. incorporate Fernando's new number-checking fixes (if past the
IESG in time) IESG in time)
3. look at Tony Sabatini suggestion for describing DO field 3. look at Tony Sabatini suggestion for describing DO field
4. clearly specify treatment of reserved bits (see TCPM thread on 4. clearly specify treatment of reserved bits (see TCPM thread on
EDO draft April 25, 2014) EDO draft April 25, 2014)
5. look at possible mention of draft-minshall-nagle (e.g. as in 5. look at possible mention of draft-minshall-nagle (e.g. as in
skipping to change at page 88, line 4 skipping to change at page 89, line 8
This memo includes no request to IANA. Existing IANA registries for This memo includes no request to IANA. Existing IANA registries for
TCP parameters are sufficient. TCP parameters are sufficient.
TODO: check whether entries pointing to 793 and other documents TODO: check whether entries pointing to 793 and other documents
obsoleted by this one should be updated to point to this one instead. obsoleted by this one should be updated to point to this one instead.
6. Security and Privacy Considerations 6. Security and Privacy Considerations
TODO TODO
See RFC 6093 [15] for discussion of security considerations related
See RFC 6093 [18] for discussion of security considerations related
to the urgent pointer field. to the urgent pointer field.
Editor's Note: Scott Brim mentioned that this should include a Editor's Note: Scott Brim mentioned that this should include a
PERPASS/privacy review. PERPASS/privacy review.
7. Acknowledgements 7. Acknowledgements
This document is largely a revision of RFC 793, which Jon Postel was This document is largely a revision of RFC 793, which Jon Postel was
the editor of. Due to his excellent work, it was able to last for the editor of. Due to his excellent work, it was able to last for
three decades before we felt the need to revise it. three decades before we felt the need to revise it.
skipping to change at page 89, line 22 skipping to change at page 90, line 26
<http://www.rfc-editor.org/info/rfc2119>. <http://www.rfc-editor.org/info/rfc2119>.
[5] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", [5] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
RFC 2675, DOI 10.17487/RFC2675, August 1999, RFC 2675, DOI 10.17487/RFC2675, August 1999,
<http://www.rfc-editor.org/info/rfc2675>. <http://www.rfc-editor.org/info/rfc2675>.
[6] Lahey, K., "TCP Problems with Path MTU Discovery", [6] Lahey, K., "TCP Problems with Path MTU Discovery",
RFC 2923, DOI 10.17487/RFC2923, September 2000, RFC 2923, DOI 10.17487/RFC2923, September 2000,
<http://www.rfc-editor.org/info/rfc2923>. <http://www.rfc-editor.org/info/rfc2923>.
[7] Paxson, V., Allman, M., Chu, J., and M. Sargent, [7] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>.
[8] Paxson, V., Allman, M., Chu, J., and M. Sargent,
"Computing TCP's Retransmission Timer", RFC 6298, "Computing TCP's Retransmission Timer", RFC 6298,
DOI 10.17487/RFC6298, June 2011, DOI 10.17487/RFC6298, June 2011,
<http://www.rfc-editor.org/info/rfc6298>. <http://www.rfc-editor.org/info/rfc6298>.
[9] Gont, F., "Deprecation of ICMP Source Quench Messages",
RFC 6633, DOI 10.17487/RFC6633, May 2012,
<http://www.rfc-editor.org/info/rfc6633>.
8.2. Informative References 8.2. Informative References
[8] Postel, J., "Transmission Control Protocol", STD 7, [10] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981, RFC 793, DOI 10.17487/RFC0793, September 1981,
<http://www.rfc-editor.org/info/rfc793>. <http://www.rfc-editor.org/info/rfc793>.
[9] Nagle, J., "Congestion Control in IP/TCP Internetworks", [11] Nagle, J., "Congestion Control in IP/TCP Internetworks",
RFC 896, DOI 10.17487/RFC0896, January 1984, RFC 896, DOI 10.17487/RFC0896, January 1984,
<http://www.rfc-editor.org/info/rfc896>. <http://www.rfc-editor.org/info/rfc896>.
[10] Braden, R., Ed., "Requirements for Internet Hosts - [12] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, Communication Layers", STD 3, RFC 1122,
DOI 10.17487/RFC1122, October 1989, DOI 10.17487/RFC1122, October 1989,
<http://www.rfc-editor.org/info/rfc1122>. <http://www.rfc-editor.org/info/rfc1122>.
[11] Mathis, M. and J. Heffner, "Packetization Layer Path MTU [13] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
<http://www.rfc-editor.org/info/rfc4821>. <http://www.rfc-editor.org/info/rfc4821>.
[12] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. [14] Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
Carrier, "Marker PDU Aligned Framing for TCP Carrier, "Marker PDU Aligned Framing for TCP
Specification", RFC 5044, DOI 10.17487/RFC5044, October Specification", RFC 5044, DOI 10.17487/RFC5044, October
2007, <http://www.rfc-editor.org/info/rfc5044>. 2007, <http://www.rfc-editor.org/info/rfc5044>.
[13] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [15] Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
DOI 10.17487/RFC5461, February 2009,
<http://www.rfc-editor.org/info/rfc5461>.
[16] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<http://www.rfc-editor.org/info/rfc5681>. <http://www.rfc-editor.org/info/rfc5681>.
[14] Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust [17] Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust
Header Compression (ROHC) Framework", RFC 5795, Header Compression (ROHC) Framework", RFC 5795,
DOI 10.17487/RFC5795, March 2010, DOI 10.17487/RFC5795, March 2010,
<http://www.rfc-editor.org/info/rfc5795>. <http://www.rfc-editor.org/info/rfc5795>.
[15] Gont, F. and A. Yourtchenko, "On the Implementation of the [18] Gont, F. and A. Yourtchenko, "On the Implementation of the
TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093, TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093,
January 2011, <http://www.rfc-editor.org/info/rfc6093>. January 2011, <http://www.rfc-editor.org/info/rfc6093>.
[16] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender [19] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender
Clarification for Persist Condition", RFC 6429, Clarification for Persist Condition", RFC 6429,
DOI 10.17487/RFC6429, December 2011, DOI 10.17487/RFC6429, December 2011,
<http://www.rfc-editor.org/info/rfc6429>. <http://www.rfc-editor.org/info/rfc6429>.
[17] Gont, F. and S. Bellovin, "Defending against Sequence [20] Gont, F. and S. Bellovin, "Defending against Sequence
Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
2012, <http://www.rfc-editor.org/info/rfc6528>. 2012, <http://www.rfc-editor.org/info/rfc6528>.
[18] Borman, D., "TCP Options and Maximum Segment Size (MSS)", [21] Borman, D., "TCP Options and Maximum Segment Size (MSS)",
RFC 6691, DOI 10.17487/RFC6691, July 2012, RFC 6691, DOI 10.17487/RFC6691, July 2012,
<http://www.rfc-editor.org/info/rfc6691>. <http://www.rfc-editor.org/info/rfc6691>.
[19] Borman, D., Braden, B., Jacobson, V., and R. [22] Borman, D., Braden, B., Jacobson, V., and R.
Scheffenegger, Ed., "TCP Extensions for High Performance", Scheffenegger, Ed., "TCP Extensions for High Performance",
RFC 7323, DOI 10.17487/RFC7323, September 2014, RFC 7323, DOI 10.17487/RFC7323, September 2014,
<http://www.rfc-editor.org/info/rfc7323>. <http://www.rfc-editor.org/info/rfc7323>.
[20] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. [23] Duke, M., Braden, R., Eddy, W., Blanton, E., and A.
Zimmermann, "A Roadmap for Transmission Control Protocol Zimmermann, "A Roadmap for Transmission Control Protocol
(TCP) Specification Documents", RFC 7414, (TCP) Specification Documents", RFC 7414,
DOI 10.17487/RFC7414, February 2015, DOI 10.17487/RFC7414, February 2015,
<http://www.rfc-editor.org/info/rfc7414>. <http://www.rfc-editor.org/info/rfc7414>.
[24] Fairhurst, G. and M. Welzl, "The Benefits of Using
Explicit Congestion Notification (ECN)", RFC 8087,
DOI 10.17487/RFC8087, March 2017,
<http://www.rfc-editor.org/info/rfc8087>.
Appendix A. TCP Requirement Summary Appendix A. TCP Requirement Summary
This section is adapted from RFC 1122. This section is adapted from RFC 1122.
TODO: this needs to be seriously redone, to use 793bis section TODO: this needs to be seriously redone, to use 793bis section
numbers instead of 1122 ones, the RFC1122 heading should be removed, numbers instead of 1122 ones, the RFC1122 heading should be removed,
and all 1122 requirements need to be reflected in 793bis text. and all 1122 requirements need to be reflected in 793bis text.
TODO: NOTE that PMTUD+PLPMTUD is not included in this table of TODO: NOTE that PMTUD+PLPMTUD is not included in this table of
recommendations. recommendations.
skipping to change at page 93, line 42 skipping to change at page 95, line 17
Source Route: | | | | | | | Source Route: | | | | | | |
ALP can specify |4.2.3.8 |x| | | | |1 ALP can specify |4.2.3.8 |x| | | | |1
Overrides src rt in datagram |4.2.3.8 |x| | | | | Overrides src rt in datagram |4.2.3.8 |x| | | | |
Build return route from src rt |4.2.3.8 |x| | | | | Build return route from src rt |4.2.3.8 |x| | | | |
Later src route overrides |4.2.3.8 | |x| | | | Later src route overrides |4.2.3.8 | |x| | | |
| | | | | | | | | | | | | |
Receiving ICMP Messages from IP |4.2.3.9 |x| | | | | Receiving ICMP Messages from IP |4.2.3.9 |x| | | | |
Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | | Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | |
Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x| Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x|
Dest. Unreach (2-4) => abort conn |4.2.3.9 | |x| | | | Dest. Unreach (2-4) => abort conn |4.2.3.9 | |x| | | |
Source Quench => slow start |4.2.3.9 | |x| | | | Source Quench => silent discard |4.2.3.9 | |x| | | |
Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | | Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | |
Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | | Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | |
| | | | | | | | | | | | | |
Address Validation | | | | | | | Address Validation | | | | | | |
Reject OPEN call to invalid IP address |4.2.3.10|x| | | | | Reject OPEN call to invalid IP address |4.2.3.10|x| | | | |
Reject SYN from invalid IP address |4.2.3.10|x| | | | | Reject SYN from invalid IP address |4.2.3.10|x| | | | |
Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | | Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | |
| | | | | | | | | | | | | |
TCP/ALP Interface Services | | | | | | | TCP/ALP Interface Services | | | | | | |
Error Report mechanism |4.2.4.1 |x| | | | | Error Report mechanism |4.2.4.1 |x| | | | |
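
The "Receiving ICMP Messages from IP" rows above, in their -05 form, can be read as a small dispatch table. The following is a minimal sketch in Python under stated assumptions, not text from the draft: the names `Action` and `classify_icmp` are hypothetical, the numeric ICMPv4 type values are the IANA-assigned ones rather than anything listed in the table, and the requirement-level columns of the table are not modeled. Message types and codes that the rows do not cover return `None` instead of a guessed action.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical labels for the reactions named in the table rows."""
    INFORM_ALP = "inform the application layer, do not abort"
    ABORT = "abort the connection"
    DISCARD = "silently discard"


# IANA-assigned ICMPv4 message types (not stated in the table itself).
ICMP_DEST_UNREACH = 3
ICMP_SOURCE_QUENCH = 4
ICMP_TIME_EXCEEDED = 11
ICMP_PARAM_PROBLEM = 12


def classify_icmp(icmp_type: int, code: int = 0) -> Optional[Action]:
    """Map an ICMP message to the reaction the table rows describe.

    Returns None for anything the rows do not cover.
    """
    if icmp_type == ICMP_DEST_UNREACH:
        if code in (0, 1, 5):
            # Row: Dest. Unreach (0,1,5) => inform ALP (and not abort conn)
            return Action.INFORM_ALP
        if code in (2, 3, 4):
            # Row: Dest. Unreach (2-4) => abort conn
            return Action.ABORT
        return None
    if icmp_type == ICMP_SOURCE_QUENCH:
        # Row changed in -05: "slow start" became "silent discard"
        return Action.DISCARD
    if icmp_type in (ICMP_TIME_EXCEEDED, ICMP_PARAM_PROBLEM):
        # Rows: Time Exceeded / Param Problem => tell ALP, don't abort
        return Action.INFORM_ALP
    return None


if __name__ == "__main__":
    print(classify_icmp(ICMP_SOURCE_QUENCH))
    print(classify_icmp(ICMP_DEST_UNREACH, 3))
```

Keeping the mapping in one function makes the single behavioral change between -04 and -05 in this part of the table, the Source Quench row, a one-line difference.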