draft-ietf-tcpm-rfc793bis-02.txt   draft-ietf-tcpm-rfc793bis-03.txt 
Internet Engineering Task Force W. Eddy, Ed. Internet Engineering Task Force W. Eddy, Ed.
Internet-Draft MTI Systems Internet-Draft MTI Systems
Obsoletes: 793, 879, 6093, 6528, 6691 March 21, 2016 Obsoletes: 793, 879, 6093, 6528, 6691 August 1, 2016
(if approved) (if approved)
Updates: 1122 (if approved) Updates: 1122 (if approved)
Intended status: Standards Track Intended status: Standards Track
Expires: September 22, 2016 Expires: February 2, 2017
Transmission Control Protocol Specification Transmission Control Protocol Specification
draft-ietf-tcpm-rfc793bis-02 draft-ietf-tcpm-rfc793bis-03
Abstract Abstract
This document specifies the Internet's Transmission Control Protocol This document specifies the Internet's Transmission Control Protocol
(TCP). TCP is an important transport layer protocol in the Internet (TCP). TCP is an important transport layer protocol in the Internet
stack, and has continuously evolved over decades of use and growth of stack, and has continuously evolved over decades of use and growth of
the Internet. Over this time, a number of changes have been made to the Internet. Over this time, a number of changes have been made to
TCP as it was specified in RFC 793, though these have only been TCP as it was specified in RFC 793, though these have only been
documented in a piecemeal fashion. This document collects and brings documented in a piecemeal fashion. This document collects and brings
those changes together with the protocol specification from RFC 793. those changes together with the protocol specification from RFC 793.
skipping to change at page 2, line 4 skipping to change at page 2, line 4
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 22, 2016. This Internet-Draft will expire on February 2, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 37 skipping to change at page 2, line 37
the copyright in such materials, this document may not be modified the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . 3 1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . 3
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Functional Specification . . . . . . . . . . . . . . . . . . 4 3. Functional Specification . . . . . . . . . . . . . . . . . . 5
3.1. Header Format . . . . . . . . . . . . . . . . . . . . . . 4 3.1. Header Format . . . . . . . . . . . . . . . . . . . . . . 5
3.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 9 3.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 9
3.3. Sequence Numbers . . . . . . . . . . . . . . . . . . . . 14 3.3. Sequence Numbers . . . . . . . . . . . . . . . . . . . . 14
3.4. Establishing a connection . . . . . . . . . . . . . . . . 20 3.4. Establishing a connection . . . . . . . . . . . . . . . . 20
3.5. Closing a Connection . . . . . . . . . . . . . . . . . . 27 3.5. Closing a Connection . . . . . . . . . . . . . . . . . . 27
3.5.1. Half-Closed Connections . . . . . . . . . . . . . . . 30 3.5.1. Half-Closed Connections . . . . . . . . . . . . . . . 30
3.6. Precedence and Security . . . . . . . . . . . . . . . . . 30 3.6. Precedence and Security . . . . . . . . . . . . . . . . . 30
3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 31 3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 31
3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 32 3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 32
3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 33 3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 33
3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 34 3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 34
3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 34 3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 34
3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 35 3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 35
3.8. Data Communication . . . . . . . . . . . . . . . . . . . 35 3.8. Data Communication . . . . . . . . . . . . . . . . . . . 35
3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 39 3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 36
3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 39 3.8.2. The Communication of Urgent Information . . . . . . . 36
3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 47 3.8.3. Managing the Window . . . . . . . . . . . . . . . . . 37
3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 47 3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 41
3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 71 3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 42
4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 76 3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 50
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 80 3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 52
6. Security and Privacy Considerations . . . . . . . . . . . . . 80 3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 75
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 80 4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 80
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 81 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 84
8.1. Normative References . . . . . . . . . . . . . . . . . . 81 6. Security and Privacy Considerations . . . . . . . . . . . . . 84
8.2. Informative References . . . . . . . . . . . . . . . . . 81 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 84
Appendix A. TCP Requirement Summary . . . . . . . . . . . . . . 82 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 85
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 86 8.1. Normative References . . . . . . . . . . . . . . . . . . 85
8.2. Informative References . . . . . . . . . . . . . . . . . 85
Appendix A. TCP Requirement Summary . . . . . . . . . . . . . . 86
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 90
1. Purpose and Scope 1. Purpose and Scope
In 1981, RFC 793 [6] was released, documenting the Transmission In 1981, RFC 793 [6] was released, documenting the Transmission
Control Protocol (TCP), and replacing earlier specifications for TCP Control Protocol (TCP), and replacing earlier specifications for TCP
that had been published in the past. that had been published in the past.
Since then, TCP has been implemented many times, and has been used as Since then, TCP has been implemented many times, and has been used as
a transport protocol for numerous applications on the Internet. a transport protocol for numerous applications on the Internet.
For several decades, RFC 793 plus a number of other documents have For several decades, RFC 793 plus a number of other documents have
combined to serve as the specification for TCP [16]. Over time, a combined to serve as the specification for TCP [17]. Over time, a
number of errata have been identified on RFC 793, as well as number of errata have been identified on RFC 793, as well as
deficiencies in security, performance, and other aspects. A number deficiencies in security, performance, and other aspects. A number
of enhancements have grown and been documented separately. These were of enhancements have grown and been documented separately. These were
never accumulated together into an update to the base specification. never accumulated together into an update to the base specification.
The purpose of this document is to bring together all of the IETF The purpose of this document is to bring together all of the IETF
Standards Track changes that have been made to the basic TCP Standards Track changes that have been made to the basic TCP
functional specification and unify them into an update of the RFC 793 functional specification and unify them into an update of the RFC 793
protocol specification. Some companion documents are referenced for protocol specification. Some companion documents are referenced for
important algorithms that TCP uses (e.g. for congestion control), but important algorithms that TCP uses (e.g. for congestion control), but
skipping to change at page 4, line 30 skipping to change at page 4, line 34
repair losses. repair losses.
This document describes the basic functionality expected in modern This document describes the basic functionality expected in modern
implementations of TCP, and replaces the protocol specification in implementations of TCP, and replaces the protocol specification in
RFC 793. It does not replicate or attempt to update the examples and RFC 793. It does not replicate or attempt to update the examples and
other discussion in RFC 793. Other documents are referenced to other discussion in RFC 793. Other documents are referenced to
provide explanation of the theory of operation, rationale, and provide explanation of the theory of operation, rationale, and
detailed discussion of design decisions. This document only focuses detailed discussion of design decisions. This document only focuses
on the normative behavior of the protocol. on the normative behavior of the protocol.
The "TCP Roadmap" [17] provides a more extensive guide to the RFCs
that define TCP and describe various important algorithms. The TCP
Roadmap contains sections on strongly encouraged enhancements that
improve performance and other aspects of TCP beyond the basic
operation specified in this document. As one example, implementing
congestion control (e.g. [11]) is a TCP requirement, but is a complex
topic on its own, and not described in detail in this document, as
there are many options and possibilities that do not impact basic
interoperability. Similarly, most common TCP implementations today
include the high-performance extensions in [16], but these are not
strictly required or discussed in this document.
TEMPORARY EDITOR'S NOTE: This is an early revision in the process of TEMPORARY EDITOR'S NOTE: This is an early revision in the process of
updating RFC 793. Many planned changes are not yet incorporated. updating RFC 793. Many planned changes are not yet incorporated.
***Please do not use this revision as a basis for any work or ***Please do not use this revision as a basis for any work or
reference.*** reference.***
A list of changes from RFC 793 is contained in Section 4. A list of changes from RFC 793 is contained in Section 4.
TEMPORARY EDITOR'S NOTE: the current revision of this document does TEMPORARY EDITOR'S NOTE: the current revision of this document does
not yet collect all of the changes that will be in the final version. not yet collect all of the changes that will be in the final version.
skipping to change at page 36, line 5 skipping to change at page 36, line 5
an acknowledgment it advances SND.UNA. The extent to which the an acknowledgment it advances SND.UNA. The extent to which the
values of these variables differ is a measure of the delay in the values of these variables differ is a measure of the delay in the
communication. The amount by which the variables are advanced is the communication. The amount by which the variables are advanced is the
length of the data and SYN or FIN flags in the segment. Note that length of the data and SYN or FIN flags in the segment. Note that
once in the ESTABLISHED state all segments must carry current once in the ESTABLISHED state all segments must carry current
acknowledgment information. acknowledgment information.
The CLOSE user call implies a push function, as does the FIN control The CLOSE user call implies a push function, as does the FIN control
flag in an incoming segment. flag in an incoming segment.
Retransmission Timeout 3.8.1. Retransmission Timeout
NOTE: TODO this needs to be updated in light of 1122 4.2.2.15 and NOTE: TODO this needs to be updated in light of 1122 4.2.2.15 and
errata 573; this will be done as part of RFC 1122 incorporation into errata 573; this will be done as part of RFC 1122 incorporation into
this document. this document.
Because of the variability of the networks that compose an Because of the variability of the networks that compose an
internetwork system and the wide range of uses of TCP connections the internetwork system and the wide range of uses of TCP connections the
retransmission timeout must be dynamically determined. One procedure retransmission timeout must be dynamically determined. One procedure
for determining a retransmission timeout is given here as an for determining a retransmission timeout is given here as an
illustration. illustration.
skipping to change at page 36, line 35 skipping to change at page 36, line 35
and based on this, compute the retransmission timeout (RTO) as: and based on this, compute the retransmission timeout (RTO) as:
RTO = min[UBOUND,max[LBOUND,(BETA*SRTT)]] RTO = min[UBOUND,max[LBOUND,(BETA*SRTT)]]
where UBOUND is an upper bound on the timeout (e.g., 1 minute), where UBOUND is an upper bound on the timeout (e.g., 1 minute),
LBOUND is a lower bound on the timeout (e.g., 1 second), ALPHA is LBOUND is a lower bound on the timeout (e.g., 1 second), ALPHA is
a smoothing factor (e.g., .8 to .9), and BETA is a delay variance a smoothing factor (e.g., .8 to .9), and BETA is a delay variance
factor (e.g., 1.3 to 2.0). factor (e.g., 1.3 to 2.0).
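As an illustration only, the RTO formula above can be sketched in Python. The function name and the default values are assumptions chosen from the example ranges in the text; the smoothed round trip time (SRTT) is taken as an input rather than computed here.

```python
def retransmission_timeout(srtt, beta=2.0, lbound=1.0, ubound=60.0):
    """Return RTO = min[UBOUND, max[LBOUND, BETA*SRTT]], all in seconds.

    The defaults are illustrative picks from the example ranges in the
    text (BETA 1.3 to 2.0, LBOUND 1 second, UBOUND 1 minute).
    """
    return min(ubound, max(lbound, beta * srtt))


# A short SRTT is raised to LBOUND; a long one is capped at UBOUND.
print(retransmission_timeout(0.1))    # clamped up to the lower bound
print(retransmission_timeout(5.0))    # BETA * SRTT lies within the bounds
print(retransmission_timeout(100.0))  # clamped down to the upper bound
```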
The Communication of Urgent Information 3.8.2. The Communication of Urgent Information
As a result of implementation differences and middlebox interactions, As a result of implementation differences and middlebox interactions,
new applications SHOULD NOT employ the TCP urgent mechanism. new applications SHOULD NOT employ the TCP urgent mechanism.
However, TCP implementations MUST still include support for the However, TCP implementations MUST still include support for the
urgent mechanism. Details can be found in RFC 6093 [13]. urgent mechanism. Details can be found in RFC 6093 [13].
The objective of the TCP urgent mechanism is to allow the sending The objective of the TCP urgent mechanism is to allow the sending
user to stimulate the receiving user to accept some urgent data and user to stimulate the receiving user to accept some urgent data and
to permit the receiving TCP to indicate to the receiving user when to permit the receiving TCP to indicate to the receiving user when
all the currently known urgent data has been received by the user. all the currently known urgent data has been received by the user.
skipping to change at page 37, line 27 skipping to change at page 37, line 27
A TCP MUST support a sequence of urgent data of any length. [8] A TCP MUST support a sequence of urgent data of any length. [8]
A TCP MUST inform the application layer asynchronously whenever it A TCP MUST inform the application layer asynchronously whenever it
receives an Urgent pointer and there was previously no pending urgent receives an Urgent pointer and there was previously no pending urgent
data, or whenever the Urgent pointer advances in the data stream. data, or whenever the Urgent pointer advances in the data stream.
There MUST be a way for the application to learn how much urgent data There MUST be a way for the application to learn how much urgent data
remains to be read from the connection, or at least to determine remains to be read from the connection, or at least to determine
whether or not more urgent data remains to be read. [8] whether or not more urgent data remains to be read. [8]
Managing the Window 3.8.3. Managing the Window
The window sent in each segment indicates the range of sequence The window sent in each segment indicates the range of sequence
numbers the sender of the window (the data receiver) is currently numbers the sender of the window (the data receiver) is currently
prepared to accept. There is an assumption that this is related to prepared to accept. There is an assumption that this is related to
the currently available data buffer space available for this the currently available data buffer space available for this
connection. connection.
The sending TCP packages the data to be transmitted into segments
which fit the current window, and may repackage segments on the
retransmission queue. Such repackaging is not required, but may be
helpful.
In a connection with a one-way data flow, the window information will
be carried in acknowledgment segments that all have the same sequence
number so there will be no way to reorder them if they arrive out of
order. This is not a serious problem, but it will allow the window
information to be on occasion temporarily based on old reports from
the data receiver. A refinement to avoid this problem is to act on
the window information from segments that carry the highest
acknowledgment number (that is segments with acknowledgment number
equal or greater than the highest previously received).
Indicating a large window encourages transmissions. If more data Indicating a large window encourages transmissions. If more data
arrives than can be accepted, it will be discarded. This will result arrives than can be accepted, it will be discarded. This will result
in excessive retransmissions, adding unnecessarily to the load on the in excessive retransmissions, adding unnecessarily to the load on the
network and the TCPs. Indicating a small window may restrict the network and the TCPs. Indicating a small window may restrict the
transmission of data to the point of introducing a round trip delay transmission of data to the point of introducing a round trip delay
between each new segment transmitted. between each new segment transmitted.
The mechanisms provided allow a TCP to advertise a large window and The mechanisms provided allow a TCP to advertise a large window and
to subsequently advertise a much smaller window without having to subsequently advertise a much smaller window without having
accepted that much data. This, so called "shrinking the window," is accepted that much data. This, so called "shrinking the window," is
strongly discouraged. The robustness principle dictates that TCPs strongly discouraged. The robustness principle dictates that TCPs
will not shrink the window themselves, but will be prepared for such will not shrink the window themselves, but will be prepared for such
behavior on the part of other TCPs. behavior on the part of other TCPs.
A TCP receiver SHOULD NOT shrink the window, i.e., move the right
window edge to the left. However, a sending TCP MUST be robust
against window shrinking, which may cause the "useable window" (see
Section 3.8.3.2.1) to become negative.
If this happens, the sender SHOULD NOT send new data, but SHOULD
retransmit normally the old unacknowledged data between SND.UNA and
SND.UNA+SND.WND. The sender MAY also retransmit old data beyond
SND.UNA+SND.WND, but SHOULD NOT time out the connection if data
beyond the right window edge is not acknowledged. If the window
shrinks to zero, the TCP MUST probe it in the standard way (described
below).
3.8.3.1. Zero Window Probing
The sending TCP must be prepared to accept from the user and send at The sending TCP must be prepared to accept from the user and send at
least one octet of new data even if the send window is zero. The least one octet of new data even if the send window is zero. The
sending TCP must regularly retransmit to the receiving TCP even when sending TCP must regularly retransmit to the receiving TCP even when
the window is zero. Two minutes is recommended for the the window is zero, in order to "probe" the window. Two minutes is
retransmission interval when the window is zero. This retransmission recommended for the retransmission interval when the window is zero.
is essential to guarantee that when either TCP has a zero window the This retransmission is essential to guarantee that when either TCP
re-opening of the window will be reliably reported to the other. has a zero window the re-opening of the window will be reliably
reported to the other. This is referred to as Zero-Window Probing
(ZWP) in other documents.
Probing of zero (offered) windows MUST be supported.
A TCP MAY keep its offered receive window closed indefinitely. As
long as the receiving TCP continues to send acknowledgments in
response to the probe segments, the sending TCP MUST allow the
connection to stay open.
When the receiving TCP has a zero window and a segment arrives it When the receiving TCP has a zero window and a segment arrives it
must still send an acknowledgment showing its next expected sequence must still send an acknowledgment showing its next expected sequence
number and current window (zero). number and current window (zero).
The sending TCP packages the data to be transmitted into segments 3.8.3.2. Silly Window Syndrome Avoidance
which fit the current window, and may repackage segments on the
retransmission queue. Such repackaging is not required, but may be
helpful.
In a connection with a one-way data flow, the window information will The "Silly Window Syndrome" (SWS) is a stable pattern of small
be carried in acknowledgment segments that all have the same sequence incremental window movements resulting in extremely poor TCP
number so there will be no way to reorder them if they arrive out of performance. Algorithms to avoid SWS are described below for both
order. This is not a serious problem, but it will allow the window the sending side and the receiving side. RFC 1122 contains more
information to be on occasion temporarily based on old reports from detailed discussion of the SWS problem. Note that the Nagle
the data receiver. A refinement to avoid this problem is to act on algorithm and the sender SWS avoidance algorithm play complementary
the window information from segments that carry the highest roles in improving performance. The Nagle algorithm discourages
acknowledgment number (that is segments with acknowledgment number sending tiny segments when the data to be sent increases in small
equal or greater than the highest previously received). increments, while the SWS avoidance algorithm discourages small
segments resulting from the right window edge advancing in small
increments.
The window management procedure has significant influence on the 3.8.3.2.1. Sender's Algorithm - When to Send Data
communication performance. The following comments are suggestions to
implementers.
Window Management Suggestions A TCP MUST include a SWS avoidance algorithm in the sender.
Allocating a very small window causes data to be transmitted in A TCP SHOULD implement the Nagle Algorithm to coalesce short
many small segments when better performance is achieved using segments. However, there MUST be a way for an application to disable
fewer large segments. the Nagle algorithm on an individual connection. In all cases,
sending data is also subject to the limitation imposed by the Slow
Start algorithm.
One suggestion for avoiding small windows is for the receiver to The sender's SWS avoidance algorithm is more difficult than the
defer updating a window until the additional allocation is at receiver's, because the sender does not know (directly) the
least X percent of the maximum allocation possible for the receiver's total buffer space RCV.BUFF. An approach which has been
connection (where X might be 20 to 40). found to work well is for the sender to calculate Max(SND.WND), the
maximum send window it has seen so far on the connection, and to use
this value as an estimate of RCV.BUFF. Unfortunately, this can only
be an estimate; the receiver may at any time reduce the size of
RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a
timeout to force transmission of data, overriding the SWS avoidance
algorithm. In practice, this timeout should seldom occur.
Another suggestion is for the sender to avoid sending small The "useable window" is:
segments by waiting until the window is large enough before
sending data. If the user signals a push function then the data
must be sent even if it is a small segment.
Note that the acknowledgments should not be delayed or unnecessary U = SND.UNA + SND.WND - SND.NXT
retransmissions will result. One strategy would be to send an
acknowledgment when a small segment arrives (without updating the
window information), and then to send another acknowledgment with
new window information when the window is larger.
The segment sent to probe a zero window may also begin a break up i.e., the offered window less the amount of data sent but not
of transmitted data into smaller and smaller segments. If a acknowledged. If D is the amount of data queued in the sending TCP
segment containing a single data octet sent to probe a zero window but not yet sent, then the following set of rules is recommended.
is accepted, it consumes one octet of the window now available.
If the sending TCP simply sends as much as it can whenever the
window is non zero, the transmitted data will be broken into
alternating big and small segments. As time goes on, occasional
pauses in the receiver making window allocation available will
result in breaking the big segments into a small and not quite so
big pair. And after a while the data transmission will be in
mostly small segments.
The suggestion here is that the TCP implementations need to Send data:
actively attempt to combine small window allocations into larger
windows, since the mechanisms for managing the window tend to lead (1) if a maximum-sized segment can be sent, i.e., if:
to many small windows in the simplest minded implementations.
min(D,U) >= Eff.snd.MSS;
(2) or if the data is pushed and all queued data can be sent now,
i.e., if:
[SND.NXT = SND.UNA and] PUSHED and D <= U
(the bracketed condition is imposed by the Nagle algorithm);
(3) or if at least a fraction Fs of the maximum window can be sent,
i.e., if:
[SND.NXT = SND.UNA and]
min(D,U) >= Fs * Max(SND.WND);
(4) or if data is PUSHed and the override timeout occurs.
Here Fs is a fraction whose recommended value is 1/2. The override
timeout should be in the range 0.1 - 1.0 seconds. It may be
convenient to combine this timer with the timer used to probe zero
windows (Section 3.8.3.1).
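The sender's rules (1) through (4) can be sketched as a single decision function. This is a minimal sketch, not a definitive implementation: the function and parameter names are hypothetical, sequence numbers are treated as plain integers with no 32-bit wraparound, and the early return when no data is queued is an added assumption that the rules do not state.

```python
def usable_window(snd_una, snd_wnd, snd_nxt):
    """U = SND.UNA + SND.WND - SND.NXT (may go negative if the window shrinks)."""
    return snd_una + snd_wnd - snd_nxt


def should_send(d, u, eff_snd_mss, max_snd_wnd, snd_nxt, snd_una,
                pushed, override_expired, nagle=True, fs=0.5):
    """Apply the sender SWS avoidance rules to D queued octets and usable window U."""
    if d <= 0:
        return False  # assumption: nothing queued, nothing to decide
    # The bracketed Nagle condition: no sent-but-unacknowledged data.
    # With Nagle disabled the condition is treated as always satisfied.
    nagle_ok = (snd_nxt == snd_una) or not nagle
    if min(d, u) >= eff_snd_mss:                    # rule (1)
        return True
    if nagle_ok and pushed and d <= u:              # rule (2)
        return True
    if nagle_ok and min(d, u) >= fs * max_snd_wnd:  # rule (3)
        return True
    if pushed and override_expired:                 # rule (4)
        return True
    return False


u = usable_window(snd_una=1000, snd_wnd=4000, snd_nxt=3000)
print(u, should_send(d=100, u=u, eff_snd_mss=1460, max_snd_wnd=8000,
                     snd_nxt=3000, snd_una=1000,
                     pushed=True, override_expired=False))
```

In the usage line, 100 pushed octets are held back because earlier data is still unacknowledged and the override timeout has not yet occurred.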
3.8.3.2.2. Receiver's Algorithm - When to Send a Window Update
A TCP MUST include a SWS avoidance algorithm in the receiver.
The receiver's SWS avoidance algorithm determines when the right
window edge may be advanced; this is customarily known as "updating
the window". This algorithm combines with the delayed ACK algorithm
(see Section 3.8.3.3) to determine when an ACK segment containing the
current window will really be sent to the receiver.
The solution to receiver SWS is to avoid advancing the right window
edge RCV.NXT+RCV.WND in small increments, even if data is received
from the network in small segments.
Suppose the total receive buffer space is RCV.BUFF. At any given
moment, RCV.USER octets of this total may be tied up with data that
has been received and acknowledged but which the user process has not
yet consumed. When the connection is quiescent, RCV.WND = RCV.BUFF
and RCV.USER = 0.
Keeping the right window edge fixed as data arrives and is
acknowledged requires that the receiver offer less than its full
buffer space, i.e., the receiver must specify a RCV.WND that keeps
RCV.NXT+RCV.WND constant as RCV.NXT increases. Thus, the total
buffer space RCV.BUFF is generally divided into three parts:
    |<------- RCV.BUFF ---------------->|
         1             2            3
----|---------|------------------|------|----
           RCV.NXT               ^
                              (Fixed)
1 - RCV.USER =  data received but not yet consumed;
2 - RCV.WND =   space advertised to sender;
3 - Reduction = space available but not yet
                advertised.
The suggested SWS avoidance algorithm for the receiver is to keep
RCV.NXT+RCV.WND fixed until the reduction satisfies:
RCV.BUFF - RCV.USER - RCV.WND >=
min( Fr * RCV.BUFF, Eff.snd.MSS )
where Fr is a fraction whose recommended value is 1/2, and
Eff.snd.MSS is the effective send MSS for the connection (see
Section 3.7.1). When the inequality is satisfied, RCV.WND is set to
RCV.BUFF-RCV.USER.
Note that the general effect of this algorithm is to advance RCV.WND
in increments of Eff.snd.MSS (for realistic receive buffers:
Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its
own Eff.snd.MSS, assuming it is the same as the sender's.
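The receiver's rule can be sketched in the same way. This is a minimal sketch with hypothetical names; it assumes the caller has already reduced rcv_wnd as data arrived, so that RCV.NXT+RCV.WND has stayed fixed, and it returns the window to advertise next.

```python
def next_rcv_wnd(rcv_buff, rcv_user, rcv_wnd, eff_snd_mss, fr=0.5):
    """Return the RCV.WND to advertise under receiver SWS avoidance.

    rcv_buff: total receive buffer space (RCV.BUFF)
    rcv_user: octets received and acknowledged but not yet consumed
    rcv_wnd:  window currently advertised, with the right edge held fixed
    """
    reduction = rcv_buff - rcv_user - rcv_wnd  # available but not advertised
    if reduction >= min(fr * rcv_buff, eff_snd_mss):
        return rcv_buff - rcv_user  # advance the right window edge
    return rcv_wnd                  # keep RCV.NXT+RCV.WND where it is


# Only 536 octets have been freed, less than one MSS: no update yet.
print(next_rcv_wnd(rcv_buff=65536, rcv_user=1000, rcv_wnd=64000, eff_snd_mss=1460))
# The user has consumed everything: the full buffer is offered again.
print(next_rcv_wnd(rcv_buff=65536, rcv_user=0, rcv_wnd=64000, eff_snd_mss=1460))
```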
3.8.3.3. Delayed Acknowledgements - When to Send an ACK Segment
A host that is receiving a stream of TCP data segments can increase
efficiency in both the Internet and the hosts by sending fewer than
one ACK (acknowledgment) segment per data segment received; this is
known as a "delayed ACK".
A TCP SHOULD implement a delayed ACK, but an ACK should not be
excessively delayed; in particular, the delay MUST be less than 0.5
seconds, and in a stream of full-sized segments there SHOULD be an
ACK for at least every second segment. Excessive delays on ACK's can
disturb the round-trip timing and packet "clocking" algorithms.
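One way to express the delayed ACK constraints is a small predicate that a receiver could consult on segment arrival and on timer expiry. This is a sketch under stated assumptions: the function name, its parameters, and the 0.2 second default are hypothetical; only the 0.5 second ceiling and the ACK for at least every second full-sized segment come from the text.

```python
def ack_due(now, first_unacked_arrival, full_sized_unacked, max_delay=0.2):
    """Return True if a delayed ACK should be sent now.

    now, first_unacked_arrival: times in seconds (None if nothing is pending)
    full_sized_unacked: full-sized segments received since the last ACK
    max_delay: chosen ACK delay; it has to stay below 0.5 seconds
    """
    if not 0 <= max_delay < 0.5:
        raise ValueError("ACK delay must be less than 0.5 seconds")
    if first_unacked_arrival is None:
        return False  # nothing is waiting to be acknowledged
    if full_sized_unacked >= 2:
        return True   # ACK at least every second full-sized segment
    return now - first_unacked_arrival >= max_delay


print(ack_due(now=1.0, first_unacked_arrival=0.95, full_sized_unacked=1))
print(ack_due(now=1.0, first_unacked_arrival=0.95, full_sized_unacked=2))
```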
3.9. Interfaces 3.9. Interfaces
There are of course two interfaces of concern: the user/TCP interface There are of course two interfaces of concern: the user/TCP interface
and the TCP/lower-level interface. We have a fairly elaborate model and the TCP/lower-level interface. We have a fairly elaborate model
of the user/TCP interface, but the interface to the lower level of the user/TCP interface, but the interface to the lower level
protocol module is left unspecified here, since it will be specified protocol module is left unspecified here, since it will be specified
in detail by the specification of the lower level protocol. For the in detail by the specification of the lower level protocol. For the
case that the lower level is IP we note some of the parameter values case that the lower level is IP we note some of the parameter values
that TCPs might use. that TCPs might use.
skipping to change at page 40, line 50 skipping to change at page 43, line 25
If the active/passive flag is set to passive, then this is a If the active/passive flag is set to passive, then this is a
call to LISTEN for an incoming connection. A passive open may call to LISTEN for an incoming connection. A passive open may
have either a fully specified foreign socket to wait for a have either a fully specified foreign socket to wait for a
particular connection or an unspecified foreign socket to wait particular connection or an unspecified foreign socket to wait
for any call. A fully specified passive call can be made for any call. A fully specified passive call can be made
active by the subsequent execution of a SEND. active by the subsequent execution of a SEND.
A transmission control block (TCB) is created and partially A transmission control block (TCB) is created and partially
filled in with data from the OPEN command parameters. filled in with data from the OPEN command parameters.
Every passive OPEN call either creates a new connection record
in LISTEN state, or it returns an error; it MUST NOT affect any
previously created connection record.
A TCP that supports multiple concurrent users MUST provide an
OPEN call that will functionally allow an application to LISTEN
on a port while a connection block with the same local port is
in SYN-SENT or SYN-RECEIVED state.
On an active OPEN command, the TCP will begin the procedure to On an active OPEN command, the TCP will begin the procedure to
synchronize (i.e., establish) the connection at once. synchronize (i.e., establish) the connection at once.
The timeout, if present, permits the caller to set up a timeout The timeout, if present, permits the caller to set up a timeout
for all data submitted to TCP. If data is not successfully for all data submitted to TCP. If data is not successfully
delivered to the destination within the timeout period, the TCP delivered to the destination within the timeout period, the TCP
will abort the connection. The present global default is five will abort the connection. The present global default is five
minutes. minutes.
The TCP or some component of the operating system will verify The TCP or some component of the operating system will verify
skipping to change at page 42, line 5 skipping to change at page 44, line 38
address. If the parameter is unspecified, a passive OPEN will address. If the parameter is unspecified, a passive OPEN will
await an incoming connection request to any local IP address, await an incoming connection request to any local IP address,
and then bind the local IP address of the connection to the and then bind the local IP address of the connection to the
particular address that is used. particular address that is used.
For an active OPEN call, a specified "local IP address" For an active OPEN call, a specified "local IP address"
parameter will be used for opening the connection. If the parameter will be used for opening the connection. If the
parameter is unspecified, the TCP will choose an appropriate parameter is unspecified, the TCP will choose an appropriate
local IP address (see RFC 1122 section 3.3.4.2). local IP address (see RFC 1122 section 3.3.4.2).
TODO - the previous and next paragraphs are mildly in conflict.
Previous paragraph says that the TCP chooses an address, but
next paragraph says that it asks IP to choose ... need to make
this consistent
If an application on a multihomed host does not specify the
local IP address when actively opening a TCP connection, then
the TCP MUST ask the IP layer to select a local IP address
before sending the (first) SYN. See the function GET_SRCADDR()
in Section 3.4 of RFC 1122.
At all other times, a previous segment has either been sent or
received on this connection, and TCP MUST use the same local
address that was used in those previous segments.
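The address-selection rules for an active OPEN might be sketched as
follows. The connection record layout and the ip_select_source
callback are hypothetical stand-ins (the callback plays the role of
GET_SRCADDR() from RFC 1122); the sketch only shows that the choice
is made once, before the first SYN, and then reused.

```python
def local_address_for_segment(conn, ip_select_source):
    """Return the local IP address to use for the next segment.

    conn             -- dict with 'local_addr' (None if the application
                        left it unspecified) and 'remote_addr'
    ip_select_source -- hypothetical callback into the IP layer that
                        picks a source address for a given destination
    """
    if conn.get("local_addr") is None:
        # No address yet: ask the IP layer before sending the first SYN.
        conn["local_addr"] = ip_select_source(conn["remote_addr"])
    # Every later segment reuses the address already bound.
    return conn["local_addr"]
```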
Send Send
Format: SEND (local connection name, buffer address, byte Format: SEND (local connection name, buffer address, byte
count, PUSH flag, URGENT flag [,timeout]) count, PUSH flag, URGENT flag [,timeout])
This call causes the data contained in the indicated user This call causes the data contained in the indicated user
buffer to be sent on the indicated connection. If the buffer to be sent on the indicated connection. If the
connection has not been opened, the SEND is considered an connection has not been opened, the SEND is considered an
error. Some implementations may allow users to SEND first; in error. Some implementations may allow users to SEND first; in
which case, an automatic OPEN would be done. If the calling which case, an automatic OPEN would be done. If the calling
skipping to change at page 47, line 28 skipping to change at page 50, line 28
If the lower level protocol is IP it provides arguments for a type of If the lower level protocol is IP it provides arguments for a type of
service and for a time to live. TCP uses the following settings for service and for a time to live. TCP uses the following settings for
these parameters: these parameters:
Type of Service = Precedence: given by user, Delay: normal, Type of Service = Precedence: given by user, Delay: normal,
Throughput: normal, Reliability: normal; or binary XXX00000, where Throughput: normal, Reliability: normal; or binary XXX00000, where
XXX are the three bits determining precedence, e.g. 000 means XXX are the three bits determining precedence, e.g. 000 means
routine precedence. routine precedence.
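As a small illustration of the XXX00000 layout described above, the
Type of Service octet can be built from a three-bit precedence value.
The function name is hypothetical, and the sketch follows the RFC 793
encoding only; it does not account for later DSCP changes.

```python
def tos_byte(precedence):
    """Build the Type of Service octet XXX00000 from a precedence value.

    The three precedence bits occupy the high-order positions; the
    Delay, Throughput, and Reliability bits are left at normal (0).
    """
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in three bits")
    return precedence << 5
```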
Time to Live = one minute, or 00111100. Time to Live (TTL): The TTL value used to send TCP segments MUST
be configurable.
Note that the assumed maximum segment lifetime is two minutes. Note that RFC 793 specified one minute (60 seconds) as a
Here we explicitly ask that a segment be destroyed if it cannot constant for the TTL, because the assumed maximum segment
be delivered by the internet system within one minute. lifetime was two minutes. This was intended to explicitly ask
that a segment be destroyed if it cannot be delivered by the
internet system within one minute. RFC 1122 changed this
specification to require that the TTL be configurable.
Any lower level protocol will have to provide the source address,
destination address, and protocol fields, and some way to determine
the "TCP length", both to provide the functional equivalent service
of IP and to be used in the TCP checksum.
When received options are passed up to TCP from the IP layer, TCP
MUST ignore options that it does not understand.
A TCP MAY support the Time Stamp and Record Route options.
3.9.2.1. Source Routing
If the lower level is IP (or other protocol that provides this If the lower level is IP (or other protocol that provides this
feature) and source routing is used, the interface must allow the feature) and source routing is used, the interface must allow the
route information to be communicated. This is especially important route information to be communicated. This is especially important
so that the source and destination addresses used in the TCP checksum so that the source and destination addresses used in the TCP checksum
be the originating source and ultimate destination. It is also be the originating source and ultimate destination. It is also
important to preserve the return route to answer connection requests. important to preserve the return route to answer connection requests.
Any lower level protocol will have to provide the source address, An application MUST be able to specify a source route when it
destination address, and protocol fields, and some way to determine actively opens a TCP connection, and this MUST take precedence over a
the "TCP length", both to provide the functional equivalent service source route received in a datagram.
of IP and to be used in the TCP checksum.
When a TCP connection is OPENed passively and a packet arrives with a
completed IP Source Route option (containing a return route), TCP
MUST save the return route and use it for all segments sent on this
connection. If a different source route arrives in a later segment,
the later definition SHOULD override the earlier one.
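The precedence rules for source routes can be sketched as below. The
class and its attribute names are hypothetical, and a route is
treated as an opaque value that has already been converted into a
return route.

```python
class RouteState:
    """Hypothetical per-connection source route bookkeeping."""

    def __init__(self, app_route=None):
        self.app_route = app_route    # route given by the application on OPEN
        self.received_route = None    # latest return route seen in a datagram

    def on_received_route(self, route):
        # A later received route overrides an earlier one.
        self.received_route = route

    def route_to_use(self):
        # An application-specified route takes precedence over any
        # route received in a datagram.
        if self.app_route is not None:
            return self.app_route
        return self.received_route
```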
3.9.2.2. ICMP Messages
TODO - this section is verbatim from 1122, currently. It should be
revised to match the soft-errors RFC, and other updates (e.g. source
quench deprecation)
TCP MUST act on an ICMP error message passed up from the IP layer,
directing it to the connection that created the error. The necessary
demultiplexing information can be found in the IP header contained
within the ICMP message.
Source Quench
TCP MUST react to a Source Quench by slowing transmission on the
connection. The RECOMMENDED procedure is for a Source Quench to
trigger a "slow start," as if a retransmission timeout had
occurred.
Destination Unreachable -- codes 0, 1, 5
Since these Unreachable messages indicate soft error conditions,
TCP MUST NOT abort the connection, and it SHOULD make the
information available to the application.
Destination Unreachable -- codes 2-4
These are hard error conditions, so TCP SHOULD abort the
connection.
Time Exceeded -- codes 0, 1
This should be handled the same way as Destination Unreachable
codes 0, 1, 5 (see above).
Parameter Problem
This should be handled the same way as Destination Unreachable
codes 0, 1, 5 (see above).
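The ICMP handling rules above amount to a small dispatch table, which
might be sketched as follows. The action names are hypothetical
labels, and returning "ignore" for types or codes not listed in the
text is an assumption of this sketch. It follows the text as written
here, including the Source Quench reaction.

```python
ICMP_DEST_UNREACHABLE = 3
ICMP_SOURCE_QUENCH = 4
ICMP_TIME_EXCEEDED = 11
ICMP_PARAMETER_PROBLEM = 12


def icmp_action(icmp_type, code):
    """Map an ICMP error (type, code) to a TCP reaction label."""
    if icmp_type == ICMP_SOURCE_QUENCH:
        return "slow_start"              # react as if an RTO had occurred
    if icmp_type == ICMP_DEST_UNREACHABLE:
        if code in (0, 1, 5):
            return "report_soft_error"   # MUST NOT abort; inform application
        if code in (2, 3, 4):
            return "abort"               # hard error; SHOULD abort
        return "ignore"                  # assumption: not covered by the text
    if icmp_type == ICMP_TIME_EXCEEDED and code in (0, 1):
        return "report_soft_error"
    if icmp_type == ICMP_PARAMETER_PROBLEM:
        return "report_soft_error"
    return "ignore"                      # assumption: not covered by the text
```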
3.10. Event Processing 3.10. Event Processing
The processing depicted in this section is an example of one possible The processing depicted in this section is an example of one possible
implementation. Other implementations may have slightly different implementation. Other implementations may have slightly different
processing sequences, but they should differ from those in this processing sequences, but they should differ from those in this
section only in detail, not in substance. section only in detail, not in substance.
The activity of the TCP can be characterized as responding to events. The activity of the TCP can be characterized as responding to events.
The events that occur can be cast into three categories: user calls, The events that occur can be cast into three categories: user calls,
skipping to change at page 77, line 37 skipping to change at page 81, line 37
that this mitigates, as well as advice on selecting PRF algorithms that this mitigates, as well as advice on selecting PRF algorithms
and managing secret key data. and managing secret key data.
RFC EDITOR'S NOTE: the content below is for detailed change tracking RFC EDITOR'S NOTE: the content below is for detailed change tracking
and planning, and not to be included with the final revision of the and planning, and not to be included with the final revision of the
document. document.
This document started as draft-eddy-rfc793bis-00, that was merely a This document started as draft-eddy-rfc793bis-00, that was merely a
proposal and rough plan for updating RFC 793. proposal and rough plan for updating RFC 793.
The -01 revision of this document incorporates the content of RFC 793 The -01 revision of this draft-eddy-rfc793bis incorporates the
Section 3 titled "FUNCTIONAL SPECIFICATION". Other content from RFC content of RFC 793 Section 3 titled "FUNCTIONAL SPECIFICATION".
793 has not been incorporated. The -01 revision of this document Other content from RFC 793 has not been incorporated. The -01
makes some minor formatting changes to the RFC 793 content in order revision of this document makes some minor formatting changes to the
to convert the content into XML2RFC format and account for left-out RFC 793 content in order to convert the content into XML2RFC format
parts of RFC 793. For instance, figure numbering differs and some and account for left-out parts of RFC 793. For instance, figure
indentation is not exactly the same. numbering differs and some indentation is not exactly the same.
The -02 revision of draft-eddy-rfc793bis incorporates errata that The -02 revision of draft-eddy-rfc793bis incorporates errata that
have been verified: have been verified:
Errata ID 573: Reported by Bob Braden (note: This errata basically Errata ID 573: Reported by Bob Braden (note: This errata basically
is just a reminder that RFC 1122 updates 793. Some of the is just a reminder that RFC 1122 updates 793. Some of the
associated changes are left pending to a separate revision that associated changes are left pending to a separate revision that
incorporates 1122. Bob's mention of PUSH in 793 section 2.8 was incorporates 1122. Bob's mention of PUSH in 793 section 2.8 was
not applicable here because that section was not part of the not applicable here because that section was not part of the
"functional specification". Also the 1122 text on the "functional specification". Also the 1122 text on the
skipping to change at page 79, line 17 skipping to change at page 83, line 17
option-handling requirements from RFC 1122. option-handling requirements from RFC 1122.
The -00 revision of draft-ietf-tcpm-rfc793bis incorporates several The -00 revision of draft-ietf-tcpm-rfc793bis incorporates several
additional clarifications and updates to the section on segmentation, additional clarifications and updates to the section on segmentation,
many of which are based on feedback from Joe Touch improving from the many of which are based on feedback from Joe Touch improving from the
initial text on this in the previous revision. initial text on this in the previous revision.
The -01 revision incorporates the change to Reserved bits due to ECN, The -01 revision incorporates the change to Reserved bits due to ECN,
as well as many other changes that come from RFC 1122. as well as many other changes that come from RFC 1122.
TODO: Incomplete list of other planned changes - these can be added The -02 revision has small formatting modifications in order to
to and made more specific, as the document proceeds: address xml2rfc warnings about long lines. It was a quick update to
avoid document expiration. TCPM working group discussion in 2015
also indicated that we should not try to add sections on
implementation advice or similar non-normative information.
1. incorporate all other 1122 additions (sections on Data The -03 revision incorporates more content from RFC 1122: Passive
Communication, Retransmission Timeout, Managing the Window, OPEN Calls, Time-To-Live, Multihoming, IP Options, ICMP messages,
Probing Zero Windows, Passive OPEN Calls, Time to Live, Event Data Communications, When to Send Data, When to Send a Window Update,
Processing, Acknowledging Queued Segments, Retransmission Managing the Window, Probing Zero Windows, When to Send an ACK
Timeout Calculation, When to Send an ACK Segment, When to Send a Segment. The section on data communications was re-organized into
Window Update, When to Send Data, TCP Connection Failures, TCP clearer subsections (previously headings were embedded in the 793
Keep-Alives, TCP Multihoming, IP options, ICMP messages, remote text), and windows management advice from 793 was provisionally
address validation) removed (to be reviewed by TCPM working group) in favor of the 1122
2. point to major additional docs like 1323bis and 5681 additions on SWS, ZWP, and related topics.
3. incorporate relevant parts of 3168 (ECN) - beyond just
indicating the names of the 2 bits already done TODO list of other planned changes (these can be added to or made
4. incorporate Fernando's new number-checking fixes (if past the more specific, as the document proceeds):
IESG in time)
5. point to 5461 (soft errors) 1. incorporate all other 1122 additions (sections on Retransmission
6. mention 5961 state machine option Timeout, Event Processing, Acknowledging Queued Segments,
7. mention 6161 (reducing TIME-WAIT) Retransmission Timeout Calculation, TCP Connection Failures, TCP
8. incorporate 6429 (ZWP/persist) Keep-Alives, remote address validation)
9. look at Tony Sabatini suggestion for describing DO field 2. incorporate relevant parts of 3168 (ECN) - beyond just indicating
10. clearly specify treatment of reserved bits (see TCPM thread on the names of the 2 bits already done
EDO draft April 25, 2014) 3. point to 5461 (soft errors)
11. look at possible mention of draft-minshall-nagle (e.g. as in 4. mention 5961 state machine option
Linux) 5. mention 6161 (reducing TIME-WAIT)
12. make sure that clarifications in RFC 1011 are captured 6. incorporate 6429 (ZWP/persist)
13. per TCPM discussion, discussion of checking reserved bits may 7. TOS material does not take DSCP changes into account
need to be altered from 793 8. there is inconsistency between use of SYN_RCVD and SYNC-RECEIVED
14. per discussion with Joe Touch (TAPS list, 6/20/2015), the in diagrams and text in various places
description of the API could be revisited 9. make sure that clarifications in RFC 1011 are captured
15. there is inconsistency between use of SYN_RCVD and SYNC-RECEIVED
in diagrams and text in various places TODO list of other potential changes, if there is TCPM consensus:
16. TOS material does not take DSCP changes into account
17. discuss with working group whether to include anything like 1. incorporate Fernando's new number-checking fixes (if past the
section 4.2.3.12 of 1122 (on "efficiency" ... basically IESG in time)
implementation advice), maybe similar to 2525 in handling for 2. look at Tony Sabatini suggestion for describing DO field
this document. also 4.2.3.11 on "TCP Traffic Patterns" 3. clearly specify treatment of reserved bits (see TCPM thread on
EDO draft April 25, 2014)
4. look at possible mention of draft-minshall-nagle (e.g. as in
Linux)
5. per discussion with Joe Touch (TAPS list, 6/20/2015), the
description of the API could be revisited
5. IANA Considerations 5. IANA Considerations
This memo includes no request to IANA. Existing IANA registries for This memo includes no request to IANA. Existing IANA registries for
TCP parameters are sufficient. TCP parameters are sufficient.
TODO: check whether entries pointing to 793 and other documents TODO: check whether entries pointing to 793 and other documents
obsoleted by this one should be updated to point to this one instead. obsoleted by this one should be updated to point to this one instead.
6. Security and Privacy Considerations 6. Security and Privacy Considerations
skipping to change at page 82, line 26 skipping to change at page 86, line 35
January 2011, <http://www.rfc-editor.org/info/rfc6093>. January 2011, <http://www.rfc-editor.org/info/rfc6093>.
[14] Gont, F. and S. Bellovin, "Defending against Sequence [14] Gont, F. and S. Bellovin, "Defending against Sequence
Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
2012, <http://www.rfc-editor.org/info/rfc6528>. 2012, <http://www.rfc-editor.org/info/rfc6528>.
[15] Borman, D., "TCP Options and Maximum Segment Size (MSS)", [15] Borman, D., "TCP Options and Maximum Segment Size (MSS)",
RFC 6691, DOI 10.17487/RFC6691, July 2012, RFC 6691, DOI 10.17487/RFC6691, July 2012,
<http://www.rfc-editor.org/info/rfc6691>. <http://www.rfc-editor.org/info/rfc6691>.
[16] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. [16] Borman, D., Braden, B., Jacobson, V., and R.
Scheffenegger, Ed., "TCP Extensions for High Performance",
RFC 7323, DOI 10.17487/RFC7323, September 2014,
<http://www.rfc-editor.org/info/rfc7323>.
[17] Duke, M., Braden, R., Eddy, W., Blanton, E., and A.
Zimmermann, "A Roadmap for Transmission Control Protocol Zimmermann, "A Roadmap for Transmission Control Protocol
(TCP) Specification Documents", RFC 7414, (TCP) Specification Documents", RFC 7414,
DOI 10.17487/RFC7414, February 2015, DOI 10.17487/RFC7414, February 2015,
<http://www.rfc-editor.org/info/rfc7414>. <http://www.rfc-editor.org/info/rfc7414>.
Appendix A. TCP Requirement Summary Appendix A. TCP Requirement Summary
This section is adapted from RFC 1122. This section is adapted from RFC 1122.
TODO: this needs to be seriously redone, to use 793bis section TODO: this needs to be seriously redone, to use 793bis section
 End of changes. 33 change blocks. 
128 lines changed or deleted 353 lines changed or added
