draft-ietf-tcpm-rfc793bis-06.txt   draft-ietf-tcpm-rfc793bis-07.txt 
Internet Engineering Task Force W. Eddy, Ed. Internet Engineering Task Force W. Eddy, Ed.
Internet-Draft MTI Systems Internet-Draft MTI Systems
Obsoletes: 793, 879, 6093, 6429, 6528, July 17, 2017 Obsoletes: 793, 879, 6093, 6429, 6528, November 12, 2017
6691 (if approved) 6691 (if approved)
Updates: 5961, 1122 (if approved) Updates: 5961, 1122 (if approved)
Intended status: Standards Track Intended status: Standards Track
Expires: January 18, 2018 Expires: May 16, 2018
Transmission Control Protocol Specification Transmission Control Protocol Specification
draft-ietf-tcpm-rfc793bis-06 draft-ietf-tcpm-rfc793bis-07
Abstract Abstract
This document specifies the Internet's Transmission Control Protocol This document specifies the Internet's Transmission Control Protocol
(TCP). TCP is an important transport layer protocol in the Internet (TCP). TCP is an important transport layer protocol in the Internet
stack, and has continuously evolved over decades of use and growth of stack, and has continuously evolved over decades of use and growth of
the Internet. Over this time, a number of changes have been made to the Internet. Over this time, a number of changes have been made to
TCP as it was specified in RFC 793, though these have only been TCP as it was specified in RFC 793, though these have only been
documented in a piecemeal fashion. This document collects and brings documented in a piecemeal fashion. This document collects and brings
those changes together with the protocol specification from RFC 793. those changes together with the protocol specification from RFC 793.
skipping to change at page 1, line 47 skipping to change at page 1, line 47
document are to be interpreted as described in RFC 2119 [4]. document are to be interpreted as described in RFC 2119 [4].
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 18, 2018. This Internet-Draft will expire on May 16, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this 10, 2008. The person(s) controlling the copyright in some of this
skipping to change at page 3, line 16 skipping to change at page 3, line 16
3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 35 3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 35
3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 36 3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 36
3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 36 3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 36
3.8. Data Communication . . . . . . . . . . . . . . . . . . . 36 3.8. Data Communication . . . . . . . . . . . . . . . . . . . 36
3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 37 3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 37
3.8.2. TCP Congestion Control . . . . . . . . . . . . . . . 37 3.8.2. TCP Congestion Control . . . . . . . . . . . . . . . 37
3.8.3. TCP Connection Failures . . . . . . . . . . . . . . . 38 3.8.3. TCP Connection Failures . . . . . . . . . . . . . . . 38
3.8.4. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 39 3.8.4. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 39
3.8.5. The Communication of Urgent Information . . . . . . . 39 3.8.5. The Communication of Urgent Information . . . . . . . 39
3.8.6. Managing the Window . . . . . . . . . . . . . . . . . 40 3.8.6. Managing the Window . . . . . . . . . . . . . . . . . 40
3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 44 3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 45
3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 45 3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 45
3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 53 3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 53
3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 55 3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 55
3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 80 3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 81
4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 86 4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 87
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 90 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 90
6. Security and Privacy Considerations . . . . . . . . . . . . . 90 6. Security and Privacy Considerations . . . . . . . . . . . . . 91
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 90 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 92
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 91 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 92
8.1. Normative References . . . . . . . . . . . . . . . . . . 91 8.1. Normative References . . . . . . . . . . . . . . . . . . 92
8.2. Informative References . . . . . . . . . . . . . . . . . 92 8.2. Informative References . . . . . . . . . . . . . . . . . 94
Appendix A. Other Implementation Notes . . . . . . . . . . . . . 93 Appendix A. Other Implementation Notes . . . . . . . . . . . . . 96
Appendix B. TCP Requirement Summary . . . . . . . . . . . . . . 94 A.1. IP Security Compartment and Precedence . . . . . . . . . 96
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 97 A.2. Sequence Number Validation . . . . . . . . . . . . . . . 97
A.3. Nagle Modification . . . . . . . . . . . . . . . . . . . 97
A.4. Low Water Mark . . . . . . . . . . . . . . . . . . . . . 97
Appendix B. TCP Requirement Summary . . . . . . . . . . . . . . 98
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 101
1. Purpose and Scope 1. Purpose and Scope
In 1981, RFC 793 [12] was released, documenting the Transmission In 1981, RFC 793 [12] was released, documenting the Transmission
Control Protocol (TCP), and replacing earlier specifications for TCP Control Protocol (TCP), and replacing earlier specifications for TCP
that had been published in the past. that had been published in the past.
Since then, TCP has been implemented many times, and has been used as Since then, TCP has been implemented many times, and has been used as
a transport protocol for numerous applications on the Internet. a transport protocol for numerous applications on the Internet.
For several decades, RFC 793 plus a number of other documents have For several decades, RFC 793 plus a number of other documents have
combined to serve as the specification for TCP [27]. Over time, a combined to serve as the specification for TCP [33]. Over time, a
number of errata have been identified on RFC 793, as well as number of errata have been identified on RFC 793, as well as
deficiencies in security, performance, and other aspects. A number deficiencies in security, performance, and other aspects. A number
of enhancements have grown and been documented separately. These were of enhancements have grown and been documented separately. These were
never accumulated together into an update to the base specification. never accumulated together into an update to the base specification.
The purpose of this document is to bring together all of the IETF The purpose of this document is to bring together all of the IETF
Standards Track changes that have been made to the basic TCP Standards Track changes that have been made to the basic TCP
functional specification and unify them into an update of the RFC 793 functional specification and unify them into an update of the RFC 793
protocol specification. Some companion documents are referenced for protocol specification. Some companion documents are referenced for
important algorithms that TCP uses (e.g. for congestion control), but important algorithms that TCP uses (e.g. for congestion control), but
skipping to change at page 4, line 43 skipping to change at page 4, line 48
repair losses. repair losses.
This document describes the basic functionality expected in modern This document describes the basic functionality expected in modern
implementations of TCP, and replaces the protocol specification in implementations of TCP, and replaces the protocol specification in
RFC 793. It does not replicate or attempt to update the examples and RFC 793. It does not replicate or attempt to update the examples and
other discussion in RFC 793. Other documents are referenced to other discussion in RFC 793. Other documents are referenced to
provide explanation of the theory of operation, rationale, and provide explanation of the theory of operation, rationale, and
detailed discussion of design decisions. This document only focuses detailed discussion of design decisions. This document only focuses
on the normative behavior of the protocol. on the normative behavior of the protocol.
The "TCP Roadmap" [27] provides a more extensive guide to the RFCs The "TCP Roadmap" [33] provides a more extensive guide to the RFCs
that define TCP and describe various important algorithms. The TCP that define TCP and describe various important algorithms. The TCP
Roadmap contains sections on strongly encouraged enhancements that Roadmap contains sections on strongly encouraged enhancements that
improve performance and other aspects of TCP beyond the basic improve performance and other aspects of TCP beyond the basic
operation specified in this document. As one example, implementing operation specified in this document. As one example, implementing
congestion control (e.g. [18]) is a TCP requirement, but is a complex congestion control (e.g. [21]) is a TCP requirement, but is a complex
topic on its own, and not described in detail in this document, as topic on its own, and not described in detail in this document, as
there are many options and possibilities that do not impact basic there are many options and possibilities that do not impact basic
interoperability. Similarly, most common TCP implementations today interoperability. Similarly, most common TCP implementations today
include the high-performance extensions in [26], but these are not include the high-performance extensions in [31], but these are not
strictly required or discussed in this document. strictly required or discussed in this document.
TEMPORARY EDITOR'S NOTE: This is an early revision in the process of TEMPORARY EDITOR'S NOTE: This is an early revision in the process of
updating RFC 793. Many planned changes are not yet incorporated. updating RFC 793. Many planned changes are not yet incorporated.
***Please do not use this revision as a basis for any work or ***Please do not use this revision as a basis for any work or
reference.*** reference.***
A list of changes from RFC 793 is contained in Section 4. A list of changes from RFC 793 is contained in Section 4.
skipping to change at page 9, line 15 skipping to change at page 9, line 15
Case 2: An octet of option-kind, an octet of option-length, and Case 2: An octet of option-kind, an octet of option-length, and
the actual option-data octets. the actual option-data octets.
The option-length counts the two octets of option-kind and option- The option-length counts the two octets of option-kind and option-
length as well as the option-data octets. length as well as the option-data octets.
Note that the list of options may be shorter than the data offset Note that the list of options may be shorter than the data offset
field might imply. The content of the header beyond the End-of- field might imply. The content of the header beyond the End-of-
Option option must be header padding (i.e., zero). Option option must be header padding (i.e., zero).
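The two option formats above can be illustrated with a minimal parser sketch. It assumes the options area has already been sliced out of the header using the data offset field; the function name parse_tcp_options is hypothetical. The parser stops at End-of-Option-List and treats the remaining octets as padding. It rejects any length octet that is smaller than 2 or that would overrun the options area.

```python
from typing import List, Tuple

EOL = 0  # End of Option List (single octet, Case 1)
NOP = 1  # No-Operation (single octet, Case 1)


def parse_tcp_options(opts: bytes) -> List[Tuple[int, bytes]]:
    """Split a TCP options area into (kind, option-data) pairs.

    Case 1 options (EOL, NOP) are a single kind octet. Case 2 options are
    kind, length, data, where length counts the kind and length octets too.
    """
    parsed = []
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == EOL:
            break  # everything after EOL is header padding
        if kind == NOP:
            i += 1  # NOP carries no data; typically used for alignment
            continue
        if i + 1 >= len(opts):
            raise ValueError(f"option kind {kind} truncated before length octet")
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            raise ValueError(f"invalid length {length} for option kind {kind}")
        parsed.append((kind, opts[i + 2:i + length]))
        i += length
    return parsed


# MSS option (kind 2, length 4, value 1460), then NOP, then EOL and padding.
example = bytes([2, 4, 0x05, 0xB4, 1, 0, 0, 0])
print(parse_tcp_options(example))  # prints [(2, b'\x05\xb4')]
```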
The list of all currently defined options is managed by IANA [29], The list of all currently defined options is managed by IANA [35],
and each option is defined in other RFCs, as indicated there. That and each option is defined in other RFCs, as indicated there. That
set includes experimental options that can be extended to support set includes experimental options that can be extended to support
multiple concurrent uses [25]. multiple concurrent uses [30].
A given TCP implementation can support any currently defined A given TCP implementation can support any currently defined
options, but the following options MUST be supported (kind options, but the following options MUST be supported (kind
indicated in octal): indicated in octal):
Kind Length Meaning Kind Length Meaning
---- ------ ------- ---- ------ -------
0 - End of option list. 0 - End of option list.
1 - No-Operation. 1 - No-Operation.
2 4 Maximum Segment Size. 2 4 Maximum Segment Size.
skipping to change at page 18, line 38 skipping to change at page 18, line 38
ISN = M + F(localip, localport, remoteip, remoteport, secretkey) ISN = M + F(localip, localport, remoteip, remoteport, secretkey)
where M is the 4 microsecond timer, and F() is a pseudorandom where M is the 4 microsecond timer, and F() is a pseudorandom
function (PRF) of the connection's identifying parameters ("localip, function (PRF) of the connection's identifying parameters ("localip,
localport, remoteip, remoteport") and a secret key ("secretkey"). localport, remoteip, remoteport") and a secret key ("secretkey").
F() MUST NOT be computable from the outside, or an attacker could F() MUST NOT be computable from the outside, or an attacker could
still guess at sequence numbers from the ISN used for some other still guess at sequence numbers from the ISN used for some other
connection. The PRF could be implemented as a cryptographic hash of connection. The PRF could be implemented as a cryptographic hash of
the concatenation of the TCP connection parameters and some secret the concatenation of the TCP connection parameters and some secret
data. For discussion of the selection of a specific hash algorithm data. For discussion of the selection of a specific hash algorithm
and management of the secret key data, please see Section 3 of [23]. and management of the secret key data, please see Section 3 of [28].
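The ISN formula above can be sketched as follows. This is a minimal illustration under stated assumptions: HMAC-SHA-256 truncated to 32 bits stands in for F(), the byte encoding of the address and port fields is an arbitrary choice, and SECRET_KEY is a hypothetical per-boot secret. The hash choice and key management are actually deferred to Section 3 of the cited reference. The optional m argument lets the timer value be fixed for testing.

```python
import hashlib
import hmac
import ipaddress
import os
import struct
import time

# Hypothetical secret key; key management is out of scope for this sketch.
SECRET_KEY = os.urandom(16)


def timer_4us() -> int:
    """M: a 32-bit counter that advances once every 4 microseconds."""
    return (time.monotonic_ns() // 4000) % 2**32


def prf(localip: str, localport: int, remoteip: str, remoteport: int,
        secretkey: bytes) -> int:
    """F(): assumed here to be HMAC-SHA-256 over the 4-tuple, cut to 32 bits."""
    msg = (ipaddress.ip_address(localip).packed + struct.pack("!H", localport)
           + ipaddress.ip_address(remoteip).packed + struct.pack("!H", remoteport))
    digest = hmac.new(secretkey, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def generate_isn(localip: str, localport: int, remoteip: str, remoteport: int,
                 secretkey: bytes = SECRET_KEY, m=None) -> int:
    """ISN = M + F(localip, localport, remoteip, remoteport, secretkey) mod 2**32."""
    if m is None:
        m = timer_4us()
    return (m + prf(localip, localport, remoteip, remoteport, secretkey)) % 2**32
```

Because F() is constant for a given connection identifier and key, successive ISNs for the same 4-tuple still advance with the clock. An observer who does not know the key cannot compute the offset for a different 4-tuple.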
For each connection there is a send sequence number and a receive For each connection there is a send sequence number and a receive
sequence number. The initial send sequence number (ISS) is chosen by sequence number. The initial send sequence number (ISS) is chosen by
the data sending TCP, and the initial receive sequence number (IRS) the data sending TCP, and the initial receive sequence number (IRS)
is learned during the connection establishing procedure. is learned during the connection establishing procedure.
For a connection to be established or initialized, the two TCPs must For a connection to be established or initialized, the two TCPs must
synchronize on each other's initial sequence numbers. This is done synchronize on each other's initial sequence numbers. This is done
in an exchange of connection establishing segments carrying a control in an exchange of connection establishing segments carrying a control
bit called "SYN" (for synchronize) and the initial sequence numbers. bit called "SYN" (for synchronize) and the initial sequence numbers.
skipping to change at page 31, line 36 skipping to change at page 31, line 36
accept a new SYN from the remote TCP to reopen the connection accept a new SYN from the remote TCP to reopen the connection
directly from TIME-WAIT state, if it: directly from TIME-WAIT state, if it:
(1) assigns its initial sequence number for the new connection to (1) assigns its initial sequence number for the new connection to
be larger than the largest sequence number it used on the previous be larger than the largest sequence number it used on the previous
connection incarnation, and connection incarnation, and
(2) returns to TIME-WAIT state if the SYN turns out to be an old (2) returns to TIME-WAIT state if the SYN turns out to be an old
duplicate. duplicate.
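Condition (1) above compares sequence numbers, and those comparisons must be made in modulo-2**32 sequence space. The following minimal sketch shows one way an implementation might pick the ISS when reopening from TIME-WAIT. When the normally generated ISS is not beyond the previous incarnation's largest sequence number, the sketch bumps it to the value just past that number. This bumping strategy is an assumption, and the function names are hypothetical. Condition (2), returning to TIME-WAIT on an old duplicate SYN, is not modeled.

```python
SEQ_MOD = 2**32


def seq_gt(a: int, b: int) -> bool:
    """True if sequence number a is 'after' b in modulo-2**32 arithmetic."""
    return 0 < (a - b) % SEQ_MOD < 2**31


def iss_for_time_wait_reopen(candidate_iss: int, largest_seq_used: int) -> int:
    """Pick an ISS that is larger than any sequence number used by the
    previous connection incarnation (condition (1))."""
    if seq_gt(candidate_iss, largest_seq_used):
        return candidate_iss
    return (largest_seq_used + 1) % SEQ_MOD
```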
When the TCP Timestamp options are available, an improved algorithm
is described in [26] in order to support higher connection
establishment rates. This algorithm for reducing TIME-WAIT is a Best
Current Practice that SHOULD be implemented, since timestamp options
are commonly used, and using them to reduce TIME-WAIT provides
benefits for busy Internet servers.
3.6. Precedence and Security 3.6. Precedence and Security
TODO - talk to TCPM about what to do about precedence and security TODO - talk to TCPM about what to do about precedence and security
compartment throughout the document ... security compartment material compartment throughout the document ... security compartment material
for IPv4 may be fine nearly as-is, but precedence was a subset of for IPv4 may be fine nearly as-is, but precedence was a subset of
what DSCP includes and it's not clear that running code actually does what DSCP includes and it's not clear that running code actually does
what 793 says about precedence anyway, especially since now as a what 793 says about precedence anyway, especially since now as a
DSCP it doesn't make sense to do greater-than comparisons on, nor to DSCP it doesn't make sense to do greater-than comparisons on, nor to
reset connections if it changes. reset connections if it changes.
skipping to change at page 32, line 34 skipping to change at page 32, line 41
The term "segmentation" refers to the activity TCP performs when The term "segmentation" refers to the activity TCP performs when
ingesting a stream of bytes from a sending application and ingesting a stream of bytes from a sending application and
packetizing that stream of bytes into TCP segments. Individual TCP packetizing that stream of bytes into TCP segments. Individual TCP
segments often do not correspond one-for-one to individual send (or segments often do not correspond one-for-one to individual send (or
socket write) calls from the application. Applications may perform socket write) calls from the application. Applications may perform
writes at the granularity of messages in the upper layer protocol, writes at the granularity of messages in the upper layer protocol,
but TCP does not guarantee any correspondence between the boundaries of but TCP does not guarantee any correspondence between the boundaries of
TCP segments sent and received and the application's read or write buffer TCP segments sent and received and the application's read or write buffer
boundaries. In some specific protocols, such as RDMA using DDP and boundaries. In some specific protocols, such as RDMA using DDP and
MPA [16], there are performance optimizations possible when the MPA [19], there are performance optimizations possible when the
relation between TCP segments and application data units can be relation between TCP segments and application data units can be
controlled, and MPA includes a specific mechanism for detecting and controlled, and MPA includes a specific mechanism for detecting and
verifying this relationship between TCP segments and application verifying this relationship between TCP segments and application
message data structures, but this is specific to applications like message data structures, but this is specific to applications like
RDMA. In general, multiple goals influence the sizing of TCP RDMA. In general, multiple goals influence the sizing of TCP
segments created by a TCP implementation. segments created by a TCP implementation.
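The lack of correspondence between write calls and segments can be illustrated with a deliberately simplified sketch. It concatenates application writes into one byte stream and cuts that stream into MSS-sized pieces. Real segmentation also depends on the Nagle algorithm, the send and congestion windows, and timers, none of which are modeled. The function name segment_stream is hypothetical.

```python
from typing import List


def segment_stream(writes: List[bytes], mss: int) -> List[bytes]:
    """Coalesce application writes into one byte stream and cut it into
    segments of at most `mss` bytes, ignoring the original write boundaries."""
    stream = b"".join(writes)
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


# Two writes of 3 and 5 bytes become two 4-byte segments.
print(segment_stream([b"abc", b"defgh"], mss=4))  # prints [b'abcd', b'efgh']
```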
Goals driving the sending of larger segments include: Goals driving the sending of larger segments include:
o Reducing the number of packets in flight within the network. o Reducing the number of packets in flight within the network.
skipping to change at page 34, line 32 skipping to change at page 34, line 38
o IPoptionsize is the size of any IP options associated with a TCP o IPoptionsize is the size of any IP options associated with a TCP
connection. Note that some options may not be included on all connection. Note that some options may not be included on all
packets, but that for each segment sent, the sender should adjust packets, but that for each segment sent, the sender should adjust
the data length accordingly, within the Eff.snd.MSS. the data length accordingly, within the Eff.snd.MSS.
The MSS value to be sent in an MSS option should be equal to the The MSS value to be sent in an MSS option should be equal to the
effective MTU minus the fixed IP and TCP headers. Because IP and TCP effective MTU minus the fixed IP and TCP headers. Because IP and TCP
options are ignored when calculating the value for the MSS option, a options are ignored when calculating the value for the MSS option, a
sender that includes any IP or TCP options in a packet must decrease sender that includes any IP or TCP options in a packet must decrease
the size of the TCP data accordingly. RFC 6691 the size of the TCP data accordingly. RFC 6691
[24] discusses this in greater detail. [29] discusses this in greater detail.
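The following arithmetic sketch shows how the two quantities relate. The advertised MSS subtracts only the fixed IP and TCP headers. The per-segment data limit uses the Eff.snd.MSS formula from RFC 1122, Section 4.2.2.6 (whose IPoptionsize term is described above), and therefore shrinks when options are present. The constants assume option-free fixed headers of 20 octets (IPv4), 40 octets (IPv6), and 20 octets (TCP). The function names are hypothetical.

```python
IPV4_HEADER = 20  # fixed IPv4 header, no options
IPV6_HEADER = 40  # fixed IPv6 header, no extension headers
TCP_HEADER = 20   # fixed TCP header, no options


def mss_option_value(effective_mtu: int, ipv6: bool = False) -> int:
    """MSS to advertise: effective MTU minus the fixed IP and TCP headers."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return effective_mtu - ip_header - TCP_HEADER


def eff_snd_mss(send_mss: int, mms_s: int, tcp_options_len: int = 0,
                ip_options_len: int = 0) -> int:
    """Eff.snd.MSS = min(SendMSS + 20, MMS_S) - TCPhdrsize - IPoptionsize."""
    tcp_hdr_size = TCP_HEADER + tcp_options_len
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_options_len


# 1500-byte MTU over IPv4: advertise 1460. With 12 bytes of TCP options on
# every segment (e.g., a Timestamps option plus two NOPs), each segment can
# carry at most 1448 data bytes.
print(mss_option_value(1500), eff_snd_mss(1460, mms_s=1480, tcp_options_len=12))
# prints 1460 1448
```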
The MSS value to be sent in an MSS option must be less than or equal The MSS value to be sent in an MSS option must be less than or equal
to: to:
MMS_R - 20 MMS_R - 20
where MMS_R is the maximum size for a transport-layer message that where MMS_R is the maximum size for a transport-layer message that
can be received (and reassembled at the IP layer). TCP obtains MMS_R can be received (and reassembled at the IP layer). TCP obtains MMS_R
and MMS_S from the IP layer; see the generic call GET_MAXSIZES in and MMS_S from the IP layer; see the generic call GET_MAXSIZES in
Section 3.4 of RFC 1122. These are defined in terms of their IP MTU Section 3.4 of RFC 1122. These are defined in terms of their IP MTU
skipping to change at page 35, line 26 skipping to change at page 35, line 32
avoid both on-path (for IPv4) and source fragmentation (IPv4 and avoid both on-path (for IPv4) and source fragmentation (IPv4 and
IPv6). IPv6).
PMTUD for IPv4 [2] or IPv6 [3] is implemented jointly by the PMTUD for IPv4 [2] or IPv6 [3] is implemented jointly by the
TCP, IP, and ICMP protocols. It relies both on avoiding source TCP, IP, and ICMP protocols. It relies both on avoiding source
fragmentation and setting the IPv4 DF (don't fragment) flag, the fragmentation and setting the IPv4 DF (don't fragment) flag, the
latter to inhibit on-path fragmentation. It relies on ICMP errors latter to inhibit on-path fragmentation. It relies on ICMP errors
sent by routers along the path whenever a segment is too large to sent by routers along the path whenever a segment is too large to
traverse a link. Several adjustments to a TCP implementation with traverse a link. Several adjustments to a TCP implementation with
PMTUD are described in RFC 2923 in order to deal with problems PMTUD are described in RFC 2923 in order to deal with problems
experienced in practice [8]. PLPMTUD [15] is a Standards Track experienced in practice [8]. PLPMTUD [16] is a Standards Track
improvement to PMTUD that relaxes the requirement for ICMP support improvement to PMTUD that relaxes the requirement for ICMP support
across a path, and improves performance in cases where ICMP is not across a path, and improves performance in cases where ICMP is not
consistently conveyed, but still tries to avoid source fragmentation. consistently conveyed, but still tries to avoid source fragmentation.
The mechanisms in all four of these RFCs are recommended to be The mechanisms in all four of these RFCs are recommended to be
included in TCP implementations. included in TCP implementations.
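The TCP-side reaction to the ICMP errors described above can be sketched minimally for IPv4 PMTUD. When a "Fragmentation Needed" message reports a next-hop MTU, the path MTU estimate is lowered, but never raised and never set below 68 octets (the floor RFC 1191 imposes). Segment sizes are then derived from the new estimate. The function name is hypothetical. The plateau search RFC 1191 describes for routers that report a next-hop MTU of zero is not modeled, nor is PLPMTUD.

```python
IPV4_MIN_MTU = 68  # RFC 1191: never reduce the PMTU estimate below 68 octets


def update_pmtu(current_pmtu: int, next_hop_mtu: int) -> int:
    """Lower the path MTU estimate on an ICMP 'Fragmentation Needed' message.

    A next_hop_mtu of 0 comes from routers predating RFC 1191; the plateau
    search used in that case is not modeled, so the estimate is left as-is.
    """
    if next_hop_mtu <= 0:
        return current_pmtu
    # A 'too big' report can only lower the estimate, never raise it.
    return max(IPV4_MIN_MTU, min(current_pmtu, next_hop_mtu))


pmtu = update_pmtu(1500, 1400)
print(pmtu, pmtu - 20 - 20)  # prints 1400 1360 (segment size, no options)
```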
The TCP MSS option specifies an upper bound for the size of packets The TCP MSS option specifies an upper bound for the size of packets
that can be received. Hence, setting the value in the MSS option too that can be received. Hence, setting the value in the MSS option too
small can limit the ability of PMTUD or PLPMTUD to find a larger small can limit the ability of PMTUD or PLPMTUD to find a larger
path MTU. RFC 1191 discusses this implication of many older TCP path MTU. RFC 1191 discusses this implication of many older TCP
implementations setting MSS to 536 for non-local destinations, rather implementations setting MSS to 536 for non-local destinations, rather
than deriving it from the MTUs of connected interfaces as than deriving it from the MTUs of connected interfaces as
recommended. recommended.
3.7.3. Interfaces with Variable MTU Values 3.7.3. Interfaces with Variable MTU Values
The effective MTU can sometimes vary, as when used with variable The effective MTU can sometimes vary, as when used with variable
compression, e.g., RObust Header Compression (ROHC) [19]. It is compression, e.g., RObust Header Compression (ROHC) [22]. It is
tempting for TCP to advertise the largest possible MSS, to tempting for TCP to advertise the largest possible MSS, to
support the most efficient use of compressed payloads. support the most efficient use of compressed payloads.
Unfortunately, some compression schemes occasionally need to transmit Unfortunately, some compression schemes occasionally need to transmit
full headers (and thus smaller payloads) to resynchronize state at full headers (and thus smaller payloads) to resynchronize state at
their endpoint compressors/decompressors. If the largest MTU is used their endpoint compressors/decompressors. If the largest MTU is used
to calculate the value to advertise in the MSS option, TCP to calculate the value to advertise in the MSS option, TCP
retransmission may interfere with compressor resynchronization. retransmission may interfere with compressor resynchronization.
As a result, when the effective MTU of an interface varies, TCP As a result, when the effective MTU of an interface varies, TCP
SHOULD use the smallest effective MTU of the interface to calculate SHOULD use the smallest effective MTU of the interface to calculate
the value to advertise in the MSS option. the value to advertise in the MSS option.
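The rule above amounts to taking the minimum over the effective MTUs an interface can exhibit before subtracting the fixed headers. In the sketch below, the list of MTU values is hypothetical. An implementation would obtain these values from the interface or the compression layer, and no particular API for doing so is implied.

```python
from typing import Iterable


def mss_for_variable_mtu(effective_mtus: Iterable[int], ipv6: bool = False) -> int:
    """Advertise an MSS derived from the smallest effective MTU the interface
    can exhibit, so packets carrying full headers still fit."""
    ip_header = 40 if ipv6 else 20
    return min(effective_mtus) - ip_header - 20  # minus fixed IP and TCP headers


# Hypothetical interface whose effective MTU varies between 1460 and 1500.
print(mss_for_variable_mtu([1500, 1480, 1460]))  # prints 1420
```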
skipping to change at page 36, line 28 skipping to change at page 36, line 34
the outstanding data has been acknowledged or until the TCP can send the outstanding data has been acknowledged or until the TCP can send
a full-sized segment (Eff.snd.MSS bytes). a full-sized segment (Eff.snd.MSS bytes).
TODO - see if SEND description later should be updated to reflect TODO - see if SEND description later should be updated to reflect
this this
A TCP SHOULD implement the Nagle Algorithm to coalesce short A TCP SHOULD implement the Nagle Algorithm to coalesce short
segments. However, there MUST be a way for an application to disable segments. However, there MUST be a way for an application to disable
the Nagle algorithm on an individual connection. In all cases, the Nagle algorithm on an individual connection. In all cases,
sending data is also subject to the limitation imposed by the Slow sending data is also subject to the limitation imposed by the Slow
Start algorithm [18]. Start algorithm [21].
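The Nagle decision described above can be expressed as a small predicate. This is a sketch, not a complete send path. Window and congestion-control limits such as Slow Start are assumed to be checked separately by the caller. The nodelay flag is a hypothetical stand-in for the per-connection control that applications must be given.

```python
def nagle_allows_send(queued_bytes: int, unacked_bytes: int, eff_snd_mss: int,
                      nodelay: bool = False) -> bool:
    """Decide whether a new segment may be sent now under the Nagle algorithm.

    Window and congestion-control limits are assumed to be checked separately.
    """
    if queued_bytes == 0:
        return False  # nothing to send
    if nodelay:
        return True  # application disabled Nagle on this connection
    if queued_bytes >= eff_snd_mss:
        return True  # a full-sized segment can be sent
    # Short segment: send only when no previously sent data is unacknowledged.
    return unacked_bytes == 0


print(nagle_allows_send(queued_bytes=100, unacked_bytes=500, eff_snd_mss=1460))
# prints False
```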
3.7.5. IPv6 Jumbograms 3.7.5. IPv6 Jumbograms
In order to support TCP over IPv6 jumbograms, implementations need to In order to support TCP over IPv6 jumbograms, implementations need to
be able to send TCP segments larger than the 64KB limit that the MSS be able to send TCP segments larger than the 64KB limit that the MSS
option can convey. RFC 2675 [7] defines that an MSS value of 65,535 option can convey. RFC 2675 [7] defines that an MSS value of 65,535
bytes is to be treated as infinity, and Path MTU Discovery [3] is bytes is to be treated as infinity, and Path MTU Discovery [3] is
used to determine the actual MSS. used to determine the actual MSS.
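The jumbogram rule can be sketched as follows. An MSS option of 65,535 removes the peer-imposed limit, so only the path MTU bounds the segment size. For simplicity, the sketch subtracts only the fixed 40-octet IPv6 header and 20-octet TCP header. It ignores IPv6 extension headers, including the Jumbo Payload hop-by-hop option that jumbograms carry. The function name send_mss is hypothetical.

```python
MSS_INFINITY = 65535  # RFC 2675: this MSS option value is treated as infinity
IPV6_HEADER = 40      # fixed IPv6 header; extension headers ignored here
TCP_HEADER = 20       # fixed TCP header, no options


def send_mss(mss_option: int, path_mtu: int) -> int:
    """Largest TCP payload to send, given the peer's MSS option and the PMTU."""
    pmtu_limit = path_mtu - IPV6_HEADER - TCP_HEADER
    if mss_option == MSS_INFINITY:
        return pmtu_limit  # only Path MTU Discovery bounds the segment size
    return min(mss_option, pmtu_limit)


print(send_mss(65535, 100000), send_mss(1440, 1500))  # prints 99940 1440
```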
3.8. Data Communication 3.8. Data Communication
skipping to change at page 37, line 51 skipping to change at page 38, line 10
RFC 1122 required implementation of Van Jacobson's congestion control RFC 1122 required implementation of Van Jacobson's congestion control
algorithm combining slow start with congestion avoidance. RFC 2581 algorithm combining slow start with congestion avoidance. RFC 2581
provided IETF Standards Track description of this, along with fast provided IETF Standards Track description of this, along with fast
retransmit and fast recovery. RFC 5681 is the current description of retransmit and fast recovery. RFC 5681 is the current description of
these algorithms and is the current standard for TCP congestion these algorithms and is the current standard for TCP congestion
control. control.
A TCP MUST implement RFC 5681. A TCP MUST implement RFC 5681.
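To make the RFC 5681 requirement more concrete, the following sketch models the congestion window updates for slow start and congestion avoidance. It covers the initial window, per-ACK growth, and the reaction to a retransmission timeout. Fast retransmit, fast recovery, and byte counting subtleties are omitted. The class and method names are hypothetical, and this is not a complete implementation of RFC 5681.

```python
def initial_window(smss: int) -> int:
    """Initial cwnd as a function of SMSS (RFC 5681, Section 3.1)."""
    if smss > 2190:
        return 2 * smss
    if smss > 1095:
        return 3 * smss
    return 4 * smss


class CongestionWindow:
    def __init__(self, smss: int) -> None:
        self.smss = smss
        self.cwnd = initial_window(smss)
        self.ssthresh = 2**31 - 1  # initially "arbitrarily high"

    def on_new_ack(self, bytes_acked: int) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: grow by at most one SMSS per ACK.
            self.cwnd += min(bytes_acked, self.smss)
        else:
            # Congestion avoidance: about one SMSS per RTT; round 0 up to 1.
            self.cwnd += max(1, self.smss * self.smss // self.cwnd)

    def on_retransmission_timeout(self, flight_size: int) -> None:
        self.ssthresh = max(flight_size // 2, 2 * self.smss)
        self.cwnd = self.smss  # loss window: one full-sized segment


w = CongestionWindow(smss=1460)
print(w.cwnd)  # prints 4380
```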
Explicit Congestion Notification (ECN) was defined in RFC 3168 and is Explicit Congestion Notification (ECN) was defined in RFC 3168 and is
an IETF Standards Track enhancement that has many benefits [28]. an IETF Standards Track enhancement that has many benefits [34].
A TCP SHOULD implement ECN as described in RFC 3168. A TCP SHOULD implement ECN as described in RFC 3168.
3.8.3. TCP Connection Failures 3.8.3. TCP Connection Failures
Excessive retransmission of the same segment by TCP indicates some Excessive retransmission of the same segment by TCP indicates some
failure of the remote host or the Internet path. This failure may be failure of the remote host or the Internet path. This failure may be
of short or long duration. The following procedure MUST be used to of short or long duration. The following procedure MUST be used to
handle excessive retransmissions of data segments: handle excessive retransmissions of data segments:
skipping to change at page 39, line 32 skipping to change at page 39, line 40
An implementation SHOULD send a keep-alive segment with no data; An implementation SHOULD send a keep-alive segment with no data;
however, it MAY be configurable to send a keep-alive segment however, it MAY be configurable to send a keep-alive segment
containing one garbage octet, for compatibility with erroneous TCP containing one garbage octet, for compatibility with erroneous TCP
implementations. implementations.
3.8.5. The Communication of Urgent Information 3.8.5. The Communication of Urgent Information
As a result of implementation differences and middlebox interactions, As a result of implementation differences and middlebox interactions,
new applications SHOULD NOT employ the TCP urgent mechanism. new applications SHOULD NOT employ the TCP urgent mechanism.
However, TCP implementations MUST still include support for the However, TCP implementations MUST still include support for the
urgent mechanism. Details can be found in RFC 6093 [21]. urgent mechanism. Details can be found in RFC 6093 [25].
The objective of the TCP urgent mechanism is to allow the sending The objective of the TCP urgent mechanism is to allow the sending
user to stimulate the receiving user to accept some urgent data and user to stimulate the receiving user to accept some urgent data and
to permit the receiving TCP to indicate to the receiving user when to permit the receiving TCP to indicate to the receiving user when
all the currently known urgent data has been received by the user. all the currently known urgent data has been received by the user.
This mechanism permits a point in the data stream to be designated as This mechanism permits a point in the data stream to be designated as
the end of urgent information. Whenever this point is in advance of the end of urgent information. Whenever this point is in advance of
the receive sequence number (RCV.NXT) at the receiving TCP, that TCP the receive sequence number (RCV.NXT) at the receiving TCP, that TCP
must tell the user to go into "urgent mode"; when the receive must tell the user to go into "urgent mode"; when the receive
skipping to change at page 41, line 40 skipping to change at page 41, line 52
(ZWP) in other documents. (ZWP) in other documents.
Probing of zero (offered) windows MUST be supported. Probing of zero (offered) windows MUST be supported.
A TCP MAY keep its offered receive window closed indefinitely. As A TCP MAY keep its offered receive window closed indefinitely. As
long as the receiving TCP continues to send acknowledgments in long as the receiving TCP continues to send acknowledgments in
response to the probe segments, the sending TCP MUST allow the response to the probe segments, the sending TCP MUST allow the
connection to stay open. This enables TCP to function in scenarios connection to stay open. This enables TCP to function in scenarios
such as the "printer ran out of paper" situation described in such as the "printer ran out of paper" situation described in
Section 4.2.2.17 of RFC 1122. The behavior is subject to the Section 4.2.2.17 of RFC 1122. The behavior is subject to the
implementation's resource management concerns, as noted in [22]. implementation's resource management concerns, as noted in [27].
When the receiving TCP has a zero window and a segment arrives, it When the receiving TCP has a zero window and a segment arrives, it
must still send an acknowledgment showing its next expected sequence must still send an acknowledgment showing its next expected sequence
number and current window (zero). number and current window (zero).
3.8.6.2. Silly Window Syndrome Avoidance 3.8.6.2. Silly Window Syndrome Avoidance
The "Silly Window Syndrome" (SWS) is a stable pattern of small The "Silly Window Syndrome" (SWS) is a stable pattern of small
incremental window movements resulting in extremely poor TCP incremental window movements resulting in extremely poor TCP
performance. Algorithms to avoid SWS are described below for both performance. Algorithms to avoid SWS are described below for both
skipping to change at page 48, line 14 skipping to change at page 48, line 18
Send Send
Format: SEND (local connection name, buffer address, byte Format: SEND (local connection name, buffer address, byte
count, PUSH flag, URGENT flag [,timeout]) count, PUSH flag, URGENT flag [,timeout])
This call causes the data contained in the indicated user This call causes the data contained in the indicated user
buffer to be sent on the indicated connection. If the buffer to be sent on the indicated connection. If the
connection has not been opened, the SEND is considered an connection has not been opened, the SEND is considered an
error. Some implementations may allow users to SEND first; in error. Some implementations may allow users to SEND first; in
which case, an automatic OPEN would be done. If the calling which case, an automatic OPEN would be done. For example, this
process is not authorized to use this connection, an error is might be one way for application data to be included in SYN
returned. segments. If the calling process is not authorized to use this
connection, an error is returned.
If the PUSH flag is set, the data must be transmitted promptly If the PUSH flag is set, the data must be transmitted promptly
to the receiver, and the PUSH bit will be set in the last TCP to the receiver, and the PUSH bit will be set in the last TCP
segment created from the buffer. If the PUSH flag is not set, segment created from the buffer. If the PUSH flag is not set,
the data may be combined with data from subsequent SENDs for the data may be combined with data from subsequent SENDs for
transmission efficiency. transmission efficiency.
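As a rough illustration of the PUSH behavior described above, the Python sketch below splits one SEND buffer into segments of at most MSS octets and sets PSH only on the last segment created from that buffer. The `Segment` type, the `segment_send_buffer()` function and its fixed `mss` argument are hypothetical. The sketch does not model combining unpushed data with data from later SENDs, which the text permits but does not require.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical outgoing segment: payload plus the PSH control bit."""
    data: bytes
    psh: bool = False


def segment_send_buffer(buf: bytes, mss: int, push: bool) -> List[Segment]:
    """Split one SEND buffer into segments of at most `mss` octets.

    If the PUSH flag was given, PSH is set only on the last segment
    created from this buffer; otherwise no segment carries PSH.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = [Segment(buf[i:i + mss]) for i in range(0, len(buf), mss)]
    if push and segments:
        segments[-1].psh = True
    return segments


segs = segment_send_buffer(b"x" * 2500, mss=1000, push=True)
print([(len(s.data), s.psh) for s in segs])
# [(1000, False), (1000, False), (500, True)]
```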
New applications SHOULD NOT set the URGENT flag [21] due to New applications SHOULD NOT set the URGENT flag [25] due to
implementation differences and middlebox issues. implementation differences and middlebox issues.
If the URGENT flag is set, segments sent to the destination TCP If the URGENT flag is set, segments sent to the destination TCP
will have the urgent pointer set. The receiving TCP will will have the urgent pointer set. The receiving TCP will
signal the urgent condition to the receiving process if the signal the urgent condition to the receiving process if the
urgent pointer indicates that data preceding the urgent pointer urgent pointer indicates that data preceding the urgent pointer
has not been consumed by the receiving process. The purpose of has not been consumed by the receiving process. The purpose of
urgent is to stimulate the receiver to process the urgent data urgent is to stimulate the receiver to process the urgent data
and to indicate to the receiver when all the currently known and to indicate to the receiver when all the currently known
urgent data has been received. The number of times the sending urgent data has been received. The number of times the sending
skipping to change at page 53, line 32 skipping to change at page 53, line 37
If the lower level protocol is IPv4 it provides arguments for a type If the lower level protocol is IPv4 it provides arguments for a type
of service (used within the Differentiated Services field) and for a of service (used within the Differentiated Services field) and for a
time to live. TCP uses the following settings for these parameters: time to live. TCP uses the following settings for these parameters:
Type of Service = Precedence: given by user, Delay: normal, Type of Service = Precedence: given by user, Delay: normal,
Throughput: normal, Reliability: normal; or binary XXX00000, where Throughput: normal, Reliability: normal; or binary XXX00000, where
XXX are the three bits determining precedence, e.g. 000 means XXX are the three bits determining precedence, e.g. 000 means
routine precedence. TODO - this is pretty much wrong with regard routine precedence. TODO - this is pretty much wrong with regard
to DiffServ, I think we should just say that the user can specify to DiffServ, I think we should just say that the user can specify
diffserv field (superset of DSCP) and leave it at that, but will diffserv field (superset of DSCP) and mostly leave it at that, but
check with TCPM will check with TCPM. It may also be worth noting that 1122
permits DSCP to change during a connection (section 4.2.4.2) but
the API might not allow it, and the application doesn't know about
individual TCP segments anyway, so this could only be done on a
"coarse" granularity at best. David Black noted that RFC 7657 (sec
5.1, 5.3, and 6) discusses this. Summary from Joe Touch is that it
generally SHOULD NOT be changed, but the RFC series currently
seems to be lacking any mention of when it might be appropriate to
change (it's SHOULD NOT and not MUST NOT).
Time to Live (TTL): The TTL value used to send TCP segments MUST Time to Live (TTL): The TTL value used to send TCP segments MUST
be configurable. be configurable.
Note that RFC 793 specified one minute (60 seconds) as a Note that RFC 793 specified one minute (60 seconds) as a
constant for the TTL, because the assumed maximum segment constant for the TTL, because the assumed maximum segment
lifetime was two minutes. This was intended to explicitly ask lifetime was two minutes. This was intended to explicitly ask
that a segment be destroyed if it cannot be delivered by the that a segment be destroyed if it cannot be delivered by the
internet system within one minute. RFC 1122 changed this internet system within one minute. RFC 1122 changed this
specification to require that the TTL be configurable. specification to require that the TTL be configurable.
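A minimal Python sketch of the two parameters above, assuming an IPv4 socket: the legacy TOS octet is built as binary XXX00000 from a 3-bit precedence (Delay, Throughput and Reliability left "normal"), and the TTL is a configurable value rather than a constant. The helper names are hypothetical. Under RFC 2474 the same octet is the DS field, and `precedence << 5` coincides with the Class Selector codepoint for that precedence. `socket.IP_TOS` and `socket.IP_TTL` are standard Python constants, but platform support for setting them varies, and IPv6 sockets use a different option for the traffic class.

```python
import socket


def tos_from_precedence(precedence: int) -> int:
    """Build the legacy TOS octet XXX00000 from a 3-bit precedence value.

    Delay, Throughput and Reliability are all "normal", so the low five
    bits are zero.
    """
    if not 0 <= precedence <= 7:
        raise ValueError("precedence is a 3-bit value (0-7)")
    return precedence << 5


def validate_ttl(ttl: int) -> int:
    """TTL is one octet; RFC 1122 forbids a host sending a TTL of zero."""
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be in the range 1-255")
    return ttl


def apply_ip_settings(sock: socket.socket, precedence: int, ttl: int) -> None:
    """Apply the configured values to an IPv4 socket (platform support varies)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_from_precedence(precedence))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, validate_ttl(ttl))


print(f"{tos_from_precedence(0):08b}")  # 00000000 (routine precedence)
print(f"{tos_from_precedence(5):08b}")  # 10100000
```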
skipping to change at page 54, line 35 skipping to change at page 54, line 50
3.9.2.2. ICMP Messages 3.9.2.2. ICMP Messages
TCP MUST act on an ICMP error message passed up from the IP layer, TCP MUST act on an ICMP error message passed up from the IP layer,
directing it to the connection that created the error. The necessary directing it to the connection that created the error. The necessary
demultiplexing information can be found in the IP header contained demultiplexing information can be found in the IP header contained
within the ICMP message. within the ICMP message.
This applies to ICMPv6 in addition to IPv4 ICMP. This applies to ICMPv6 in addition to IPv4 ICMP.
[17] contains discussion of specific ICMP and ICMPv6 messages [20] contains discussion of specific ICMP and ICMPv6 messages
classified as either "soft" or "hard" errors that may bear different classified as either "soft" or "hard" errors that may bear different
responses. Treatment for classes of ICMP messages is described responses. Treatment for classes of ICMP messages is described
below: below:
Source Quench Source Quench
TCP MUST silently discard any received ICMP Source Quench messages. TCP MUST silently discard any received ICMP Source Quench messages.
See [11] for discussion. See [11] for discussion.
Soft Errors Soft Errors
For ICMP these include: Destination Unreachable -- codes 0, 1, 5, For ICMP these include: Destination Unreachable -- codes 0, 1, 5,
Time Exceeded -- codes 0, 1, and Parameter Problem. Time Exceeded -- codes 0, 1, and Parameter Problem.
For ICMPv6 these include: Destination Unreachable -- codes 0 and 3, For ICMPv6 these include: Destination Unreachable -- codes 0 and 3,
Time Exceeded -- codes 0, 1, and Parameter Problem -- codes 0, 1, 2. Time Exceeded -- codes 0, 1, and Parameter Problem -- codes 0, 1, 2.
Since these Unreachable messages indicate soft error conditions, Since these Unreachable messages indicate soft error conditions,
TCP MUST NOT abort the connection, and it SHOULD make the TCP MUST NOT abort the connection, and it SHOULD make the
information available to the application. information available to the application.
Hard Errors Hard Errors
For ICMP these include Destination Unreachable -- codes 2-4. For ICMP these include Destination Unreachable -- codes 2-4.
These are hard error conditions, so TCP SHOULD abort the These are hard error conditions, so TCP SHOULD abort the
connection. [17] notes that some implementations do not abort connection. [20] notes that some implementations do not abort
connections when an ICMP hard error is received for a connection connections when an ICMP hard error is received for a connection
that is in any of the synchronized states. that is in any of the synchronized states.
Note that [17] section 4 describes widespread implementation behavior Note that [20] section 4 describes widespread implementation behavior
that treats soft errors as hard errors during connection that treats soft errors as hard errors during connection
establishment. establishment.
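The classification above can be expressed as a lookup. This Python sketch maps (IP version, ICMP type, code) to the treatment the text calls for. The numeric message types come from the ICMP (RFC 792) and ICMPv6 (RFC 4443) specifications rather than from this section, and `classify_icmp()` and `IcmpAction` are hypothetical names. Messages the text does not classify, including any ICMPv6 hard errors (none are listed here), fall through to `OTHER`. The sketch also ignores the connection-establishment behavior described in RFC 5461 section 4.

```python
from enum import Enum


class IcmpAction(Enum):
    DISCARD = "discard"  # Source Quench: silently discard
    SOFT = "soft"        # MUST NOT abort; SHOULD inform the application
    HARD = "hard"        # SHOULD abort the connection
    OTHER = "other"      # not classified by this section


# ICMPv4 message types (RFC 792)
V4_DEST_UNREACH, V4_SOURCE_QUENCH, V4_TIME_EXCEEDED, V4_PARAM_PROBLEM = 3, 4, 11, 12
# ICMPv6 message types (RFC 4443)
V6_DEST_UNREACH, V6_TIME_EXCEEDED, V6_PARAM_PROBLEM = 1, 3, 4


def classify_icmp(ip_version: int, icmp_type: int, code: int) -> IcmpAction:
    """Map an ICMP/ICMPv6 error to the treatment described in this section."""
    if ip_version == 4:
        if icmp_type == V4_SOURCE_QUENCH:
            return IcmpAction.DISCARD
        if icmp_type == V4_DEST_UNREACH:
            if code in (0, 1, 5):
                return IcmpAction.SOFT
            if code in (2, 3, 4):
                return IcmpAction.HARD
        if icmp_type == V4_TIME_EXCEEDED and code in (0, 1):
            return IcmpAction.SOFT
        if icmp_type == V4_PARAM_PROBLEM:
            return IcmpAction.SOFT
    elif ip_version == 6:
        if icmp_type == V6_DEST_UNREACH and code in (0, 3):
            return IcmpAction.SOFT
        if icmp_type == V6_TIME_EXCEEDED and code in (0, 1):
            return IcmpAction.SOFT
        if icmp_type == V6_PARAM_PROBLEM and code in (0, 1, 2):
            return IcmpAction.SOFT
    return IcmpAction.OTHER


print(classify_icmp(4, V4_DEST_UNREACH, 3).value)  # port unreachable: hard
```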
3.10. Event Processing 3.10. Event Processing
The processing depicted in this section is an example of one possible The processing depicted in this section is an example of one possible
implementation. Other implementations may have slightly different implementation. Other implementations may have slightly different
processing sequences, but they should differ from those in this processing sequences, but they should differ from those in this
section only in detail, not in substance. section only in detail, not in substance.
skipping to change at page 68, line 17 skipping to change at page 68, line 17
noted that some stacks in the wild that do not send data noted that some stacks in the wild that do not send data
on the SYN are just checking that SEG.ACK == SND.NXT ... on the SYN are just checking that SEG.ACK == SND.NXT ...
think about whether anything should be said about that think about whether anything should be said about that
here) here)
second check the RST bit second check the RST bit
If the RST bit is set If the RST bit is set
A potential blind reset attack is described in RFC 5961 A potential blind reset attack is described in RFC 5961
[20], with the mitigation that a TCP implementation [24], with the mitigation that a TCP implementation
SHOULD first check that the sequence number exactly SHOULD first check that the sequence number exactly
matches RCV.NXT prior to executing the action in the next matches RCV.NXT prior to executing the action in the next
paragraph. paragraph.
If the ACK was acceptable then signal the user "error: If the ACK was acceptable then signal the user "error:
connection reset", drop the segment, enter CLOSED state, connection reset", drop the segment, enter CLOSED state,
delete TCB, and return. Otherwise (no ACK) drop the delete TCB, and return. Otherwise (no ACK) drop the
segment and return. segment and return.
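The RST check for the SYN-SENT state can be sketched as a small decision function. This Python version assumes the earlier ACK check has already run: `ack_acceptable` is its result, with False meaning the segment carried no ACK. The function and enum names are hypothetical, and the text does not say what to do when the exact RCV.NXT match fails, so dropping the segment in that case is an assumption of the sketch.

```python
from enum import Enum


class RstOutcome(Enum):
    DROP = "drop the segment and return"
    RESET = "signal 'error: connection reset', enter CLOSED, delete TCB"


def syn_sent_rst(seg_seq: int, rcv_nxt: int, ack_acceptable: bool,
                 exact_seq_check: bool = True) -> RstOutcome:
    """Handle a segment with the RST bit set that arrives in SYN-SENT.

    `ack_acceptable` is the result of the earlier ACK check; False means
    the segment carried no ACK.
    """
    # RFC 5961 mitigation: act only on an RST whose sequence number exactly
    # matches RCV.NXT. Dropping on a mismatch is an assumption here.
    if exact_seq_check and seg_seq != rcv_nxt:
        return RstOutcome.DROP
    if ack_acceptable:
        return RstOutcome.RESET
    return RstOutcome.DROP  # no ACK: drop the segment and return


print(syn_sent_rst(5000, 5000, ack_acceptable=True).name)  # RESET
print(syn_sent_rst(5001, 5000, ack_acceptable=True).name)  # DROP
```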
third check the security and precedence third check the security and precedence
skipping to change at page 69, line 50 skipping to change at page 69, line 50
and send it. Set the variables: and send it. Set the variables:
SND.WND <- SEG.WND SND.WND <- SEG.WND
SND.WL1 <- SEG.SEQ SND.WL1 <- SEG.SEQ
SND.WL2 <- SEG.ACK SND.WL2 <- SEG.ACK
If there are other controls or text in the segment, queue If there are other controls or text in the segment, queue
them for processing after the ESTABLISHED state has been them for processing after the ESTABLISHED state has been
reached, return. reached, return.
Note that it is legal to send and receive application data
on SYN segments (this is the "text in the segment" mentioned
above). There has been significant misinformation and
misunderstanding of this topic historically. Some firewalls
and security devices consider this suspicious. However, the
capability was used in T/TCP [15] and is used in TCP Fast
Open (TFO) [32], so it is important for implementations and
network devices to permit it.
fifth, if neither of the SYN or RST bits is set then drop the fifth, if neither of the SYN or RST bits is set then drop the
segment and return. segment and return.
Otherwise, Otherwise,
first check sequence number first check sequence number
SYN-RECEIVED STATE SYN-RECEIVED STATE
ESTABLISHED STATE ESTABLISHED STATE
FIN-WAIT-1 STATE FIN-WAIT-1 STATE
skipping to change at page 71, line 8 skipping to change at page 71, line 18
If an incoming segment is not acceptable, an acknowledgment If an incoming segment is not acceptable, an acknowledgment
should be sent in reply (unless the RST bit is set, if so should be sent in reply (unless the RST bit is set, if so
drop the segment and return): drop the segment and return):
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
After sending the acknowledgment, drop the unacceptable After sending the acknowledgment, drop the unacceptable
segment and return. segment and return.
Note that for the TIME-WAIT state, there is an improved
algorithm described in [26] for handling incoming SYN
segments, which uses timestamps rather than relying on
the sequence number check described here. When the improved
algorithm is implemented, the logic above does not apply to
incoming SYN segments that carry timestamp options and are
received on a connection in the TIME-WAIT state.
In the following it is assumed that the segment is the In the following it is assumed that the segment is the
idealized segment that begins at RCV.NXT and does not exceed idealized segment that begins at RCV.NXT and does not exceed
the window. One could tailor actual segments to fit this the window. One could tailor actual segments to fit this
assumption by trimming off any portions that lie outside the assumption by trimming off any portions that lie outside the
window (including SYN and FIN), and only processing further window (including SYN and FIN), and only processing further
if the segment then begins at RCV.NXT. Segments with higher if the segment then begins at RCV.NXT. Segments with higher
beginning sequence numbers should be held for later beginning sequence numbers should be held for later
processing. processing.
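The trimming approach described above can be sketched in Python. Sequence numbers wrap modulo 2**32, so `seq_offset()` computes a signed distance from RCV.NXT before the window bounds are applied. The helper names and the returned labels are hypothetical. Only data octets are modeled: SYN and FIN also occupy sequence space but are left out, and zero-length segments, which the text handles through the acceptability check, are not covered.

```python
from typing import Optional, Tuple

SEQ_SPACE = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_offset(a: int, b: int) -> int:
    """Signed distance a - b in sequence space (valid while |a - b| < 2**31)."""
    d = (a - b) % SEQ_SPACE
    return d - SEQ_SPACE if d >= SEQ_SPACE // 2 else d


def trim_to_window(seg_seq: int, data: bytes, rcv_nxt: int,
                   rcv_wnd: int) -> Optional[Tuple[int, bytes]]:
    """Keep only the octets in [RCV.NXT, RCV.NXT + RCV.WND).

    Returns (new SEG.SEQ, kept octets), or None if nothing remains.
    """
    start = seq_offset(seg_seq, rcv_nxt)  # first octet, relative to RCV.NXT
    end = start + len(data)                # one past the last octet
    lo, hi = max(start, 0), min(end, rcv_wnd)
    if lo >= hi:
        return None
    return (rcv_nxt + lo) % SEQ_SPACE, data[lo - start:hi - start]


def disposition(seg_seq: int, data: bytes, rcv_nxt: int, rcv_wnd: int) -> str:
    trimmed = trim_to_window(seg_seq, data, rcv_nxt, rcv_wnd)
    if trimmed is None:
        return "outside-window"  # no octet in the window: fails acceptability
    if trimmed[0] == rcv_nxt:
        return "process"         # begins at RCV.NXT
    return "hold"                # higher starting sequence number: keep for later


print(trim_to_window(990, b"a" * 10 + b"b" * 10, rcv_nxt=1000, rcv_wnd=100))
# (1000, b'bbbbbbbbbb')
print(disposition(1050, b"c" * 10, rcv_nxt=1000, rcv_wnd=100))  # hold
```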
In general, the processing of received segments MUST be In general, the processing of received segments MUST be
skipping to change at page 73, line 37 skipping to change at page 74, line 7
ESTABLISHED STATE ESTABLISHED STATE
FIN-WAIT-1 STATE FIN-WAIT-1 STATE
FIN-WAIT-2 STATE FIN-WAIT-2 STATE
CLOSE-WAIT STATE CLOSE-WAIT STATE
CLOSING STATE CLOSING STATE
LAST-ACK STATE LAST-ACK STATE
TIME-WAIT STATE TIME-WAIT STATE
If the SYN bit is set in these synchronized states, it If the SYN bit is set in these synchronized states, it
may be either an error where the connection should be may be either a legitimate new connection attempt (e.g.
reset, or the result of an attack attempt, as described in the case of TIME-WAIT), an error where the connection
in RFC 5961 [20]. RFC 5961 provides a mitigation that should be reset, or the result of an attack attempt, as
SHOULD be implemented, though there are alternatives (see described in RFC 5961 [24]. For the TIME-WAIT state, new
connections can be accepted if the timestamp option is
used and meets expectations (per [26]). For all other
cases, RFC 5961 provides a mitigation that SHOULD be
implemented, though there are alternatives (see
Section 6). RFC 5961 recommends that in these Section 6). RFC 5961 recommends that in these
synchronized states, if the SYN bit is set, irrespective synchronized states, if the SYN bit is set, irrespective
of the sequence number, TCP MUST send a "challenge ACK" of the sequence number, TCP MUST send a "challenge ACK"
to the remote peer: to the remote peer:
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
After sending the acknowledgement, TCP MUST drop the After sending the acknowledgement, TCP MUST drop the
unacceptable segment and stop processing further. Note unacceptable segment and stop processing further. Note
that RFC 5961 and Errata ID 4772 contain additional ACK that RFC 5961 and Errata ID 4772 contain additional ACK
skipping to change at page 82, line 27 skipping to change at page 83, line 27
minutes. minutes.
octet octet
An eight bit byte. An eight bit byte.
Options Options
An Option field may contain several options, and each option An Option field may contain several options, and each option
may be several octets in length. The options are used may be several octets in length. The options are used
primarily in testing situations; for example, to carry primarily in testing situations; for example, to carry
timestamps. Both the Internet Protocol and TCP provide for timestamps. Both the Internet Protocol and TCP provide for
options fields. options fields. -- TODO not primarily testing anymore!
packet packet
A package of data with a header which may or may not be A package of data with a header which may or may not be
logically complete. More often a physical packaging than a logically complete. More often a physical packaging than a
logical packaging of data. logical packaging of data.
port port
The portion of a socket that specifies which logical input or The portion of a socket that specifies which logical input or
output channel of a process is associated with the data. output channel of a process is associated with the data.
skipping to change at page 89, line 31 skipping to change at page 90, line 31
The -06 revision includes an appendix on "Other Implementation Notes" The -06 revision includes an appendix on "Other Implementation Notes"
to capture widely-deployed fundamental features that are not to capture widely-deployed fundamental features that are not
contained in the RFC series yet. It also added mention of RFC 6994 contained in the RFC series yet. It also added mention of RFC 6994
and the IANA TCP parameters registry as a reference. It includes and the IANA TCP parameters registry as a reference. It includes
references to RFC 5961 in appropriate places. The references to TOS references to RFC 5961 in appropriate places. The references to TOS
were changed to DiffServ field, based on reflecting RFC 2474 as well were changed to DiffServ field, based on reflecting RFC 2474 as well
as the IPv6 presence of traffic class (carrying DiffServ field) as the IPv6 presence of traffic class (carrying DiffServ field)
rather than TOS. rather than TOS.
TODO list of other planned changes (these can be added to or made The -07 revision includes reference to RFC 6191, updated security
more specific, as the document proceeds): considerations, discussion of additional implementation
considerations, and clarification of data on the SYN.
1. mention 6161 (reducing TIME-WAIT)
2. clarify data on SYN from Michael Welzl
Some other suggested changes that will not be incorporated in this Some other suggested changes that will not be incorporated in this
793 update unless TCPM consensus changes with regard to scope are: 793 update unless TCPM consensus changes with regard to scope are:
1. look at Tony Sabatini suggestion for describing DO field 1. look at Tony Sabatini suggestion for describing DO field
2. clearly specify treatment of reserved bits (see TCPM thread on 2. clearly specify treatment of reserved bits (see TCPM thread on
EDO draft April 25, 2014) -- TODO - an attempt at this is EDO draft April 25, 2014) -- TODO - an attempt at this is
actually in -06, but needs to be confirmed by TCPM explicitly actually in -06, but needs to be confirmed by TCPM explicitly
since there is no RFC reference since there is no RFC reference
3. per discussion with Joe Touch (TAPS list, 6/20/2015), the 3. per discussion with Joe Touch (TAPS list, 6/20/2015), the
skipping to change at page 90, line 15 skipping to change at page 91, line 10
5. IANA Considerations 5. IANA Considerations
This memo includes no request to IANA. Existing IANA registries for This memo includes no request to IANA. Existing IANA registries for
TCP parameters are sufficient. TCP parameters are sufficient.
TODO: check whether entries pointing to 793 and other documents TODO: check whether entries pointing to 793 and other documents
obsoleted by this one should be updated to point to this one instead. obsoleted by this one should be updated to point to this one instead.
6. Security and Privacy Considerations 6. Security and Privacy Considerations
TODO The TCP design includes only rudimentary security features that
improve the robustness and reliability of connections and application
data transfer, but there are no built-in capabilities to support any
form of privacy, authentication, or other typical security functions.
Applications typically utilize lower-layer (e.g. IPsec) and upper-
layer (e.g. TLS) protocols to provide security and privacy for TCP
connections and application data carried in TCP. TCP options are
available as well, to support some security capabilities.
See RFC 6093 [21] for discussion of security considerations related Applications using long-lived TCP flows have been vulnerable to
to the urgent pointer field. attacks that exploit the processing of control flags described in
earlier TCP specifications [18]. TCP-MD5 was a commonly implemented
TCP option to support authentication for some of these connections,
but had flaws and is now deprecated. The TCP Authentication Option
(TCP AO) [23] provides a capability to protect long-lived TCP
connections from attacks, and has superior properties to TCP-MD5. It
does not provide any privacy for application data, nor for the TCP
headers.
Editor's Note: Scott Brim mentioned that this should include a The Experimental "tcpcrypt" extension [38] to TCP provides the ability
PERPASS/privacy review. to cryptographically protect connection data. Metadata aspects of
the TCP flow are still visible, but the application stream is well-
protected. Within the TCP header, only the urgent pointer and FIN
flag are protected through tcpcrypt.
The TCP Roadmap [33] includes notes about several RFCs related to TCP
security. Many of the enhancements provided by these RFCs have been
integrated into the present document, including ISN generation,
mitigating blind in-window attacks, and improving handling of soft
errors and ICMP packets. These are all discussed in greater detail
in the referenced RFCs that originally described the changes needed
to earlier TCP specifications. Additionally, see RFC 6093 [25] for
discussion of security considerations related to the urgent pointer
field, which has been deprecated.
Since TCP is often used for bulk transfer flows, some attacks are
possible that abuse the TCP congestion control logic. An example is the
"ACK-division" attack. Updates that have been made to the TCP
congestion control specifications include mechanisms like Appropriate
Byte Counting (ABC) that act as mitigations to these attacks.
Other attacks are focused on exhausting the resources of a TCP
server. Examples include SYN flooding [17] or wasting resources on
non-progressing connections [27]. Operating systems commonly
implement mitigations for these attacks. Some common defenses also
utilize proxies, stateful firewalls, and other technologies outside
of the end-host TCP implementation.
TODO Editor's Note: Scott Brim mentioned that this should include a
PERPASS/privacy review ... Is this relevant anymore? Is it something
for the chairs or AD to request during WGLC or IETF LC?
7. Acknowledgements 7. Acknowledgements
This document is largely a revision of RFC 793, which Jon Postel was This document is largely a revision of RFC 793, which Jon Postel was
the editor of. Due to his excellent work, it was able to last for the editor of. Due to his excellent work, it was able to last for
three decades before we felt the need to revise it. three decades before we felt the need to revise it.
Andre Oppermann was a contributor and helped to edit the first Andre Oppermann was a contributor and helped to edit the first
revision of this document. revision of this document.
skipping to change at page 91, line 11 skipping to change at page 92, line 49
(listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan, (listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan,
Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta
Yevstifeyev, EungJun Yi, Botong Huang. Yevstifeyev, EungJun Yi, Botong Huang.
8. References 8. References
8.1. Normative References 8.1. Normative References
[1] Postel, J., "Internet Protocol", STD 5, RFC 791, [1] Postel, J., "Internet Protocol", STD 5, RFC 791,
DOI 10.17487/RFC0791, September 1981, DOI 10.17487/RFC0791, September 1981,
<http://www.rfc-editor.org/info/rfc791>. <https://www.rfc-editor.org/info/rfc791>.
[2] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, [2] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
DOI 10.17487/RFC1191, November 1990, DOI 10.17487/RFC1191, November 1990,
<http://www.rfc-editor.org/info/rfc1191>. <https://www.rfc-editor.org/info/rfc1191>.
[3] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery [3] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August
1996, <http://www.rfc-editor.org/info/rfc1981>. 1996, <https://www.rfc-editor.org/info/rfc1981>.
[4] Bradner, S., "Key words for use in RFCs to Indicate [4] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[5] Deering, S. and R. Hinden, "Internet Protocol, Version 6 [5] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
December 1998, <http://www.rfc-editor.org/info/rfc2460>. December 1998, <https://www.rfc-editor.org/info/rfc2460>.
[6] Nichols, K., Blake, S., Baker, F., and D. Black, [6] Nichols, K., Blake, S., Baker, F., and D. Black,
"Definition of the Differentiated Services Field (DS "Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474, Field) in the IPv4 and IPv6 Headers", RFC 2474,
DOI 10.17487/RFC2474, December 1998, DOI 10.17487/RFC2474, December 1998,
<http://www.rfc-editor.org/info/rfc2474>. <https://www.rfc-editor.org/info/rfc2474>.
[7] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", [7] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
RFC 2675, DOI 10.17487/RFC2675, August 1999, RFC 2675, DOI 10.17487/RFC2675, August 1999,
<http://www.rfc-editor.org/info/rfc2675>. <https://www.rfc-editor.org/info/rfc2675>.
[8] Lahey, K., "TCP Problems with Path MTU Discovery", [8] Lahey, K., "TCP Problems with Path MTU Discovery",
RFC 2923, DOI 10.17487/RFC2923, September 2000, RFC 2923, DOI 10.17487/RFC2923, September 2000,
<http://www.rfc-editor.org/info/rfc2923>. <https://www.rfc-editor.org/info/rfc2923>.
[9] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [9] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
[10] Paxson, V., Allman, M., Chu, J., and M. Sargent, [10] Paxson, V., Allman, M., Chu, J., and M. Sargent,
"Computing TCP's Retransmission Timer", RFC 6298, "Computing TCP's Retransmission Timer", RFC 6298,
DOI 10.17487/RFC6298, June 2011, DOI 10.17487/RFC6298, June 2011,
<http://www.rfc-editor.org/info/rfc6298>. <https://www.rfc-editor.org/info/rfc6298>.
[11] Gont, F., "Deprecation of ICMP Source Quench Messages", [11] Gont, F., "Deprecation of ICMP Source Quench Messages",
RFC 6633, DOI 10.17487/RFC6633, May 2012, RFC 6633, DOI 10.17487/RFC6633, May 2012,
<http://www.rfc-editor.org/info/rfc6633>. <https://www.rfc-editor.org/info/rfc6633>.
8.2. Informative References
[12] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>.
[13] Nagle, J., "Congestion Control in IP/TCP Internetworks",
RFC 896, DOI 10.17487/RFC0896, January 1984,
<https://www.rfc-editor.org/info/rfc896>.
[14] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122,
DOI 10.17487/RFC1122, October 1989,
<https://www.rfc-editor.org/info/rfc1122>.
[15] Braden, R., "T/TCP -- TCP Extensions for Transactions
Functional Specification", RFC 1644, DOI 10.17487/RFC1644,
July 1994, <https://www.rfc-editor.org/info/rfc1644>.
[16] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
<https://www.rfc-editor.org/info/rfc4821>.
[17] Eddy, W., "TCP SYN Flooding Attacks and Common
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
<https://www.rfc-editor.org/info/rfc4987>.
[18] Touch, J., "Defending TCP Against Spoofing Attacks",
RFC 4953, DOI 10.17487/RFC4953, July 2007,
<https://www.rfc-editor.org/info/rfc4953>.
[19] Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
Carrier, "Marker PDU Aligned Framing for TCP
Specification", RFC 5044, DOI 10.17487/RFC5044, October
2007, <https://www.rfc-editor.org/info/rfc5044>.
[20] Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
DOI 10.17487/RFC5461, February 2009,
<https://www.rfc-editor.org/info/rfc5461>.
[21] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<https://www.rfc-editor.org/info/rfc5681>.
[22] Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust
Header Compression (ROHC) Framework", RFC 5795,
DOI 10.17487/RFC5795, March 2010,
<https://www.rfc-editor.org/info/rfc5795>.
[23] Touch, J., Mankin, A., and R. Bonica, "The TCP
Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
June 2010, <https://www.rfc-editor.org/info/rfc5925>.
[24] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
Robustness to Blind In-Window Attacks", RFC 5961,
DOI 10.17487/RFC5961, August 2010,
<https://www.rfc-editor.org/info/rfc5961>.
[25] Gont, F. and A. Yourtchenko, "On the Implementation of the
TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093,
January 2011, <https://www.rfc-editor.org/info/rfc6093>.
[26] Gont, F., "Reducing the TIME-WAIT State Using TCP
Timestamps", BCP 159, RFC 6191, DOI 10.17487/RFC6191,
April 2011, <https://www.rfc-editor.org/info/rfc6191>.
[27] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender
Clarification for Persist Condition", RFC 6429,
DOI 10.17487/RFC6429, December 2011,
<https://www.rfc-editor.org/info/rfc6429>.
[28] Gont, F. and S. Bellovin, "Defending against Sequence
Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
2012, <https://www.rfc-editor.org/info/rfc6528>.
[29] Borman, D., "TCP Options and Maximum Segment Size (MSS)",
RFC 6691, DOI 10.17487/RFC6691, July 2012,
<https://www.rfc-editor.org/info/rfc6691>.
[30] Touch, J., "Shared Use of Experimental TCP Options",
RFC 6994, DOI 10.17487/RFC6994, August 2013,
<https://www.rfc-editor.org/info/rfc6994>.
[31] Borman, D., Braden, B., Jacobson, V., and R.
Scheffenegger, Ed., "TCP Extensions for High Performance",
RFC 7323, DOI 10.17487/RFC7323, September 2014,
<https://www.rfc-editor.org/info/rfc7323>.
[32] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<https://www.rfc-editor.org/info/rfc7413>.
[33] Duke, M., Braden, R., Eddy, W., Blanton, E., and A.
Zimmermann, "A Roadmap for Transmission Control Protocol
(TCP) Specification Documents", RFC 7414,
DOI 10.17487/RFC7414, February 2015,
<https://www.rfc-editor.org/info/rfc7414>.
[34] Fairhurst, G. and M. Welzl, "The Benefits of Using
Explicit Congestion Notification (ECN)", RFC 8087,
DOI 10.17487/RFC8087, March 2017,
<https://www.rfc-editor.org/info/rfc8087>.
[35] IANA, "Transmission Control Protocol (TCP) Parameters",
<https://www.iana.org/assignments/tcp-parameters/tcp-parameters.xhtml>,
2017.
[36] Gont, F., "Processing of IP Security/Compartment and
Precedence Information by TCP",
draft-gont-tcpm-tcp-seccomp-prec-00 (work in progress), March 2012.
[37] Gont, F. and D. Borman, "On the Validation of TCP Sequence
Numbers", draft-gont-tcpm-tcp-seq-validation-02 (work in
progress), March 2015.
[38] Bittau, A., Giffin, D., Handley, M., Mazieres, D., Slack,
Q., and E. Smith, "Cryptographic protection of TCP Streams
(tcpcrypt)", draft-ietf-tcpinc-tcpcrypt-09 (work in
progress), November 2017.
[39] Minshall, G., "A Proposed Modification to Nagle's
Algorithm", draft-minshall-nagle-01 (work in progress),
June 1999.
Appendix A. Other Implementation Notes
This section includes additional notes and references on TCP
implementation decisions that are currently not part of the RFC
series or the TCP standard. Implementers may consider these items,
but there is not yet consensus to include them in the standard.
A.1. IP Security Compartment and Precedence
The TCP standard requires checking the IP security compartment and
precedence on incoming TCP segments for consistency within a
connection.
In common Internet usage of TCP, the IP security compartment is not
used, and IP precedence was deprecated when DiffServ was introduced
in 1998 [6].
Resetting connections when incoming segments do not match the
expected security compartment and precedence has been recognized as
a possible attack vector [36]. That document advises amending the TCP
specification so that connections are not aborted because of
non-matching IP security compartment or DiffServ codepoint values.
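The following minimal Python sketch contrasts the two behaviors. It rests on several simplifying assumptions. The names `Labels`, `Action`, and `check_security_precedence` are hypothetical. Compartment and precedence are modeled as plain integers. The check is assumed to run only after the segment has passed sequence number validation. The strict variant sends a reset on any mismatch, mirroring the RFC 793 [12] rule for synchronized states. The relaxed variant ignores the mismatch, in the spirit of [36].

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"  # continue normal segment processing
    RESET = "reset"    # send RST and abort the connection


@dataclass(frozen=True)
class Labels:
    compartment: int  # IP security compartment (unused in common Internet usage)
    precedence: int   # IP precedence value carried by the segment


def check_security_precedence(conn: Labels, seg: Labels, strict: bool) -> Action:
    """Decide what to do with a segment that already passed sequence checks."""
    if conn == seg:
        return Action.ACCEPT
    # Mismatch: the strict rule aborts the connection; the relaxed rule keeps
    # processing, so a mismatching segment cannot tear the connection down.
    return Action.RESET if strict else Action.ACCEPT


conn = Labels(compartment=0, precedence=0)
seg = Labels(compartment=0, precedence=5)
print(check_security_precedence(conn, seg, strict=True).name)   # RESET
print(check_security_precedence(conn, seg, strict=False).name)  # ACCEPT
```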
A.2. Sequence Number Validation
There are cases where the TCP sequence number validation rules can
prevent ACK fields from being processed. This can cause connection
problems, as described in [37], which covers potential problems with
simultaneous open, self-connects, simultaneous close, and
simultaneous window probes. That document also describes potential
changes to the TCP specification that mitigate these problems by
expanding the range of acceptable sequence numbers.
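For reference, the segment acceptability test from RFC 793 [12] is sketched below in Python. The helper names are hypothetical. Sequence numbers are assumed to be already reduced modulo 2**32, and the receive window is assumed to be smaller than 2**31. As in the specification, SEG.LEN counts SYN and FIN. A segment that fails this test is dropped (after an ACK is sent), so its ACK field is not processed normally.

The first usage line shows a one-byte window probe failing the test while the receive window is closed. RFC 793 asks implementations to make special allowance for valid ACKs, URGs, and RSTs in that case.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV.NXT =< seq < RCV.NXT + RCV.WND in modular sequence space."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Four-case acceptability test; seg_len counts SYN and FIN as one each."""
    if seg_len == 0 and rcv_wnd == 0:
        return seg_seq == rcv_nxt
    if seg_len == 0:
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False  # a closed window accepts no sequence space
    seg_last = (seg_seq + seg_len - 1) % SEQ_MOD
    # Acceptable if either the first or the last octet falls in the window.
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(seg_last, rcv_nxt, rcv_wnd)


# A one-byte window probe arriving while the local receive window is closed.
print(segment_acceptable(seg_seq=1000, seg_len=1, rcv_nxt=1000, rcv_wnd=0))  # False
# A segment that straddles the 2**32 wraparound point but overlaps the window.
print(segment_acceptable(2**32 - 10, 20, 2**32 - 5, 100))  # True
```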
These conditions occur rarely in Internet usage of TCP. Common
operating systems implement different alternative mitigations, and
the standard has not yet been updated to codify any of them, but
implementers should consider the problems described in [37].
A.3. Nagle Modification
In common operating systems, both the Nagle algorithm and delayed
acknowledgements are implemented and enabled by default. Many
applications use TCP with a request-response style of communication,
where the combination of the Nagle algorithm and delayed
acknowledgements can add significant latency: a short segment can be
held back waiting for an ACK that the peer is deliberately delaying.
A modification to the Nagle algorithm, described in [39], improves
the situation for these applications.
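The difference between the classic rule and the modification in [39] can be sketched with the simplified Python model below. This is not any operating system's code. The `Sender` class and its fields are hypothetical. Sequence-number wraparound and the send window are ignored, and only the decision of whether a sub-MSS segment may be sent now is modeled. The classic rule (RFC 896 [13]) holds a short segment while any data is unacknowledged. The modified rule holds it only while an earlier short segment is unacknowledged.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Toy model of a TCP sender's small-segment decision (hypothetical)."""
    mss: int
    snd_una: int = 0    # oldest unacknowledged sequence number
    snd_nxt: int = 0    # next sequence number to send
    small_end: int = 0  # sequence number just past the last sub-MSS segment sent

    def classic_nagle_ok(self, pending: int) -> bool:
        # Classic rule: a short segment waits while any sent data is unacknowledged.
        return pending >= self.mss or self.snd_una == self.snd_nxt

    def minshall_ok(self, pending: int) -> bool:
        # Modified rule: a short segment waits only while an earlier short
        # segment is still unacknowledged.
        return pending >= self.mss or self.small_end <= self.snd_una

    def send(self, nbytes: int) -> None:
        self.snd_nxt += nbytes
        if nbytes < self.mss:
            self.small_end = self.snd_nxt

    def ack(self, ack_no: int) -> None:
        self.snd_una = max(self.snd_una, ack_no)


# A 1500-byte request with an MSS of 1460: one full segment, then a 40-byte tail.
s = Sender(mss=1460)
s.send(1460)
print(s.classic_nagle_ok(40), s.minshall_ok(40))  # False True
s.send(40)
print(s.minshall_ok(10))  # False: a second short segment is held back
```

Under the classic rule, the 40-byte tail of the request waits for an ACK that the peer may be delaying. Under the modification, the tail is sent immediately, but any further short segment is still held until that tail is acknowledged.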
This modification is implemented in some common operating systems
and does not affect TCP interoperability. Additionally, many
applications simply disable the Nagle algorithm, which is generally
possible through a socket option (TCP_NODELAY in the common sockets
API). The TCP standard has not been updated to include this Nagle
modification, but implementers may find it worth considering.
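Disabling the Nagle algorithm from an application typically looks like the minimal Python sketch below. It uses the standard `socket` module's `TCP_NODELAY` option on an unconnected socket. The helper name `open_nodelay_socket` is hypothetical. The check compares against zero rather than 1 because the value reported back is platform-dependent.

```python
import socket


def open_nodelay_socket() -> socket.socket:
    """Create an IPv4 TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A nonzero TCP_NODELAY value disables the Nagle algorithm on this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


with open_nodelay_socket() as s:
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
```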
A.4. Low Water Mark
TODO - mention the low watermark function that is in Linux -
suggested by Michael Welzl
SO_SNDLOWAT and SO_RCVLOWAT would be potential enhancements to the
abstract TCP API.
TCP_NOTSENT_LOWAT is the Linux function referred to above. It helps a
sending application avoid building up large amounts of buffered data
(and the corresponding latency). This is useful for applications that
multiplex data from multiple upper-level streams onto a single
connection, especially when the streams mix interactive or real-time
traffic with bulk data transfer.
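An application might request this behavior as in the minimal Python sketch below. The helper name `set_notsent_lowat` and the 16 KiB threshold are illustrative assumptions. Python exposes `socket.TCP_NOTSENT_LOWAT` only on platforms that define it, so the sketch probes for the constant and reports failure instead of guessing a numeric value. The comment on writability describes our understanding of the Linux behavior, not a guarantee of the abstract TCP API.

```python
import socket


def set_notsent_lowat(sock: socket.socket, nbytes: int) -> bool:
    """Try to cap the amount of not-yet-sent data queued for this socket.

    Returns False if the platform does not expose TCP_NOTSENT_LOWAT or
    rejects the option.
    """
    opt = getattr(socket, "TCP_NOTSENT_LOWAT", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, nbytes)
    except OSError:
        return False
    return True


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # Assumed Linux semantics: with the option set, poll()/select() report the
    # socket writable only once unsent data drops below the threshold, so the
    # application keeps its own per-stream queues and can reprioritize them.
    print(set_notsent_lowat(s, 16 * 1024))
```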
Appendix B. TCP Requirement Summary
This section is adapted from RFC 1122.
TODO: this needs to be seriously redone, to use 793bis section
numbers instead of 1122 ones, the RFC1122 heading should be removed,
and all 1122 requirements need to be reflected in 793bis text.
TODO: NOTE that PMTUD+PLPMTUD is not included in this table of
recommendations.
skipping to change at page 95, line 52 skipping to change at page 99, line 49
  OPEN to broadcast/multicast IP Address         |4.2.3.14| | | | |x|
  Silently discard seg to bcast/mcast addr       |4.2.3.14|x| | | | |
                                                 |        | | | | | |
Closing Connections                              |        | | | | | |
  RST can contain data                           |4.2.2.12| |x| | | |
  Inform application of aborted conn             |4.2.2.13|x| | | | |
  Half-duplex close connections                  |4.2.2.13| | |x| | |
  Send RST to indicate data lost                 |4.2.2.13| |x| | | |
  In TIME-WAIT state for 2MSL seconds            |4.2.2.13|x| | | | |
  Accept SYN from TIME-WAIT state                |4.2.2.13| | |x| | |
  Use Timestamps to reduce TIME-WAIT             |  TODO  | | | | | |
                                                 |        | | | | | |
Retransmissions                                  |        | | | | | |
  Jacobson Slow Start algorithm                  |4.2.2.15|x| | | | |
  Jacobson Congestion-Avoidance algorithm        |4.2.2.15|x| | | | |
  Retransmit with same IP ident                  |4.2.2.15| | |x| | |
  Karn's algorithm                               |4.2.3.1 |x| | | | |
  Jacobson's RTO estimation alg.                 |4.2.3.1 |x| | | | |
  Exponential backoff                            |4.2.3.1 |x| | | | |
  SYN RTO calc same as data                      |4.2.3.1 | |x| | | |
  Recommended initial values and bounds          |4.2.3.1 | |x| | | |
                                                 |        | | | | | |
End of changes. 89 change blocks.
109 lines changed or deleted, 293 lines changed or added

This html diff was produced by rfcdiff 1.46. The latest version is available from http://tools.ietf.org/tools/rfcdiff/