draft-ietf-lwig-tcp-constrained-node-networks-08.txt   draft-ietf-lwig-tcp-constrained-node-networks-09.txt 
LWIG Working Group C. Gomez LWIG Working Group C. Gomez
Internet-Draft UPC Internet-Draft UPC
Intended status: Informational J. Crowcroft Intended status: Informational J. Crowcroft
Expires: December 6, 2019 University of Cambridge Expires: May 6, 2020 University of Cambridge
M. Scharf M. Scharf
Hochschule Esslingen Hochschule Esslingen
June 4, 2019 November 3, 2019
TCP Usage Guidance in the Internet of Things (IoT) TCP Usage Guidance in the Internet of Things (IoT)
draft-ietf-lwig-tcp-constrained-node-networks-08 draft-ietf-lwig-tcp-constrained-node-networks-09
Abstract Abstract
This document provides guidance on how to implement and use the This document provides guidance on how to implement and use the
Transmission Control Protocol (TCP) in Constrained-Node Networks Transmission Control Protocol (TCP) in Constrained-Node Networks
(CNNs), which are a characteristic of the Internet of Things (IoT). (CNNs), which are a characteristic of the Internet of Things (IoT).
Such environments require a lightweight TCP implementation and may Such environments require a lightweight TCP implementation and may
not make use of optional functionality. This document explains a not make use of optional functionality. This document explains a
number of known and deployed techniques to simplify a TCP stack as number of known and deployed techniques to simplify a TCP stack as
well as corresponding tradeoffs. The objective is to help embedded well as corresponding tradeoffs. The objective is to help embedded
skipping to change at page 1, line 40 skipping to change at page 1, line 40
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 6, 2019. This Internet-Draft will expire on May 6, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 24 skipping to change at page 2, line 24
2. Conventions used in this document . . . . . . . . . . . . . . 4 2. Conventions used in this document . . . . . . . . . . . . . . 4
3. Characteristics of CNNs relevant for TCP . . . . . . . . . . 4 3. Characteristics of CNNs relevant for TCP . . . . . . . . . . 4
3.1. Network and link properties . . . . . . . . . . . . . . . 4 3.1. Network and link properties . . . . . . . . . . . . . . . 4
3.2. Usage scenarios . . . . . . . . . . . . . . . . . . . . . 5 3.2. Usage scenarios . . . . . . . . . . . . . . . . . . . . . 5
3.3. Communication and traffic patterns . . . . . . . . . . . 6 3.3. Communication and traffic patterns . . . . . . . . . . . 6
4. TCP implementation and configuration in CNNs . . . . . . . . 6 4. TCP implementation and configuration in CNNs . . . . . . . . 6
4.1. Addressing path properties . . . . . . . . . . . . . . . 7 4.1. Addressing path properties . . . . . . . . . . . . . . . 7
4.1.1. Maximum Segment Size (MSS) . . . . . . . . . . . . . 7 4.1.1. Maximum Segment Size (MSS) . . . . . . . . . . . . . 7
4.1.2. Explicit Congestion Notification (ECN) . . . . . . . 8 4.1.2. Explicit Congestion Notification (ECN) . . . . . . . 8
4.1.3. Explicit loss notifications . . . . . . . . . . . . . 9 4.1.3. Explicit loss notifications . . . . . . . . . . . . . 9
4.2. TCP guidance for single-segment stacks . . . . . . . . . 9 4.2. TCP guidance for single-MSS stacks . . . . . . . . . . . 9
4.2.1. Single-segment stacks - benefits and issues . . . . . 9 4.2.1. Single-MSS stacks - benefits and issues . . . . . . . 9
4.2.2. TCP options for single-segment stacks . . . . . . . . 10 4.2.2. TCP options for single-MSS stacks . . . . . . . . . . 10
4.2.3. Delayed Acknowledgments for single-segment stacks . . 10 4.2.3. Delayed Acknowledgments for single-MSS stacks . . . . 10
4.2.4. RTO calculation for single-segment stacks . . . . . . 11 4.2.4. RTO calculation for single-MSS stacks . . . . . . . . 11
4.3. General recommendations for TCP in CNNs . . . . . . . . . 11 4.3. General recommendations for TCP in CNNs . . . . . . . . . 12
4.3.1. Loss recovery and congestion/flow control . . . . . . 12 4.3.1. Loss recovery and congestion/flow control . . . . . . 12
4.3.1.1. Selective Acknowledgments (SACK) . . . . . . . . 12 4.3.1.1. Selective Acknowledgments (SACK) . . . . . . . . 12
4.3.2. Delayed Acknowledgments . . . . . . . . . . . . . . . 13 4.3.2. Delayed Acknowledgments . . . . . . . . . . . . . . . 13
4.3.3. Initial Window . . . . . . . . . . . . . . . . . . . 13 4.3.3. Initial Window . . . . . . . . . . . . . . . . . . . 14
5. TCP usage recommendations in CNNs . . . . . . . . . . . . . . 14 5. TCP usage recommendations in CNNs . . . . . . . . . . . . . . 14
5.1. TCP connection initiation . . . . . . . . . . . . . . . . 14 5.1. TCP connection initiation . . . . . . . . . . . . . . . . 14
5.2. Number of concurrent connections . . . . . . . . . . . . 14 5.2. Number of concurrent connections . . . . . . . . . . . . 14
5.3. TCP connection lifetime . . . . . . . . . . . . . . . . . 15 5.3. TCP connection lifetime . . . . . . . . . . . . . . . . . 15
6. Security Considerations . . . . . . . . . . . . . . . . . . . 17 6. Security Considerations . . . . . . . . . . . . . . . . . . . 17
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17
8. Annex. TCP implementations for constrained devices . . . . . 18 8. Annex. TCP implementations for constrained devices . . . . . 18
8.1. uIP . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 8.1. uIP . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.2. lwIP . . . . . . . . . . . . . . . . . . . . . . . . . . 19 8.2. lwIP . . . . . . . . . . . . . . . . . . . . . . . . . . 19
8.3. RIOT . . . . . . . . . . . . . . . . . . . . . . . . . . 19 8.3. RIOT . . . . . . . . . . . . . . . . . . . . . . . . . . 19
skipping to change at page 3, line 9 skipping to change at page 3, line 9
8.7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 20 8.7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 20
9. Annex. Changes compared to previous versions . . . . . . . . 22 9. Annex. Changes compared to previous versions . . . . . . . . 22
9.1. Changes between -00 and -01 . . . . . . . . . . . . . . . 22 9.1. Changes between -00 and -01 . . . . . . . . . . . . . . . 22
9.2. Changes between -01 and -02 . . . . . . . . . . . . . . . 22 9.2. Changes between -01 and -02 . . . . . . . . . . . . . . . 22
9.3. Changes between -02 and -03 . . . . . . . . . . . . . . . 22 9.3. Changes between -02 and -03 . . . . . . . . . . . . . . . 22
9.4. Changes between -03 and -04 . . . . . . . . . . . . . . . 23 9.4. Changes between -03 and -04 . . . . . . . . . . . . . . . 23
9.5. Changes between -04 and -05 . . . . . . . . . . . . . . . 23 9.5. Changes between -04 and -05 . . . . . . . . . . . . . . . 23
9.6. Changes between -05 and -06 . . . . . . . . . . . . . . . 23 9.6. Changes between -05 and -06 . . . . . . . . . . . . . . . 23
9.7. Changes between -06 and -07 . . . . . . . . . . . . . . . 23 9.7. Changes between -06 and -07 . . . . . . . . . . . . . . . 23
9.8. Changes between -07 and -08 . . . . . . . . . . . . . . . 23 9.8. Changes between -07 and -08 . . . . . . . . . . . . . . . 23
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 9.9. Changes between -08 and -09 . . . . . . . . . . . . . . . 23
10.1. Normative References . . . . . . . . . . . . . . . . . . 23 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 24
10.1. Normative References . . . . . . . . . . . . . . . . . . 24
10.2. Informative References . . . . . . . . . . . . . . . . . 25 10.2. Informative References . . . . . . . . . . . . . . . . . 25
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 29 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 29
1. Introduction 1. Introduction
The Internet Protocol suite is being used for connecting Constrained- The Internet Protocol suite is being used for connecting Constrained-
Node Networks (CNNs) to the Internet, enabling the so-called Internet Node Networks (CNNs) to the Internet, enabling the so-called Internet
of Things (IoT) [RFC7228]. In order to meet the requirements that of Things (IoT) [RFC7228]. In order to meet the requirements that
stem from CNNs, the IETF has produced a suite of new protocols stem from CNNs, the IETF has produced a suite of new protocols
specifically designed for such environments (see e.g. [RFC8352]). specifically designed for such environments (see e.g. [RFC8352]).
skipping to change at page 7, line 12 skipping to change at page 7, line 12
constraints in CNN. The guidance in this section relates to the TCP constraints in CNN. The guidance in this section relates to the TCP
implementation and its configuration. implementation and its configuration.
4.1. Addressing path properties 4.1. Addressing path properties
4.1.1. Maximum Segment Size (MSS) 4.1.1. Maximum Segment Size (MSS)
Assuming that IPv6 is used, and for the sake of lightweight Assuming that IPv6 is used, and for the sake of lightweight
implementation and operation, unless applications require handling implementation and operation, unless applications require handling
large data units (i.e. leading to an IPv6 datagram size greater than large data units (i.e. leading to an IPv6 datagram size greater than
1280 bytes), it may be desirable to limit the MTU to 1280 bytes in 1280 bytes), it may be desirable to limit the IP datagram size to
order to avoid the need to support Path MTU Discovery [RFC8201]. In 1280 bytes in order to avoid the need to support Path MTU Discovery
addition, an MTU of 1280 bytes avoids incurring IPv6-layer [RFC8201]. In addition, an IP datagram size of 1280 bytes avoids
fragmentation. incurring IPv6-layer fragmentation.
An IPv6 datagram size exceeding 1280 bytes can be avoided by setting An IPv6 datagram size exceeding 1280 bytes can be avoided by setting
the TCP MSS not larger than 1220 bytes. This assumes that the remote the TCP MSS not larger than 1220 bytes. This assumes that the remote
sender will use no TCP options, aside from possibly the MSS option, sender will use no TCP options, aside from possibly the MSS option,
which is only used in the initial TCP SYN packet. which is only used in the initial TCP SYN packet.
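The 1220-byte figure follows from subtracting the fixed header sizes from the 1280-byte datagram limit. The sketch below works through that arithmetic. It assumes a 40-byte IPv6 header without extension headers and a 20-byte TCP header without options; the function name and the `tcp_option_allowance` parameter are hypothetical and only serve the illustration.

```python
IPV6_HEADER_LEN = 40  # fixed IPv6 header; no extension headers assumed
TCP_HEADER_LEN = 20   # TCP header without options


def max_mss(datagram_size, ip_header_len, tcp_option_allowance=0):
    """Largest MSS that keeps an IP datagram within datagram_size bytes.

    tcp_option_allowance is a hypothetical safety margin (in bytes) reserved
    for TCP options that the peer might add to data segments.
    """
    mss = datagram_size - ip_header_len - TCP_HEADER_LEN - tcp_option_allowance
    if mss <= 0:
        raise ValueError("datagram size too small for the given headers")
    return mss


print(max_mss(1280, IPV6_HEADER_LEN))      # 1220
print(max_mss(1280, IPV6_HEADER_LEN, 20))  # 1200
```

Reserving a margin of 20 bytes, as in the second call, is one possible choice and not a value mandated by the text.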
In order to accommodate unrequested TCP options that may be used by In order to accommodate unrequested TCP options that may be used by
some TCP implementations, a constrained device may advertise an MSS some TCP implementations, a constrained device may advertise an MSS
smaller than 1220 bytes (e.g. not larger than 1200 bytes). Note smaller than 1220 bytes (e.g. not larger than 1200 bytes). Note
that, in many implementations, TCP options generally consume payload that, in many implementations, TCP options generally consume payload
skipping to change at page 8, line 7 skipping to change at page 8, line 7
On the other hand, there exist technologies also used in the CNN On the other hand, there exist technologies also used in the CNN
space, such as Master Slave / Token Passing (TP) [RFC8163], space, such as Master Slave / Token Passing (TP) [RFC8163],
Narrowband IoT (NB-IoT) [RFC8376] or IEEE 802.11ah Narrowband IoT (NB-IoT) [RFC8376] or IEEE 802.11ah
[I-D.delcarpio-6lo-wlanah], that do not suffer the same degree of [I-D.delcarpio-6lo-wlanah], that do not suffer the same degree of
frame size limitations as the technologies mentioned above. The MTU frame size limitations as the technologies mentioned above. The MTU
for MS/TP is recommended to be 1500 bytes [RFC8163], the MTU in NB- for MS/TP is recommended to be 1500 bytes [RFC8163], the MTU in NB-
IoT is 1600 bytes, and the maximum frame payload size for IEEE IoT is 1600 bytes, and the maximum frame payload size for IEEE
802.11ah is 7991 bytes. 802.11ah is 7991 bytes.
While many IP-based IoT environments use IPv6, IPv4 can also be in
use. In IPv4, the minimum MTU is 576 bytes. In order to avoid
exceeding the IPv4 MTU, the MSS needs to be set to a value not larger
than the IPv4 MTU minus 40 bytes. Similarly to the recommendations
given above for IPv6, a constrained device using IPv4 may advertise
an even smaller MSS in order to accommodate unrequested TCP options.
Finally, note that using larger MSS (to a suitable extent) may be Finally, note that using larger MSS (to a suitable extent) may be
beneficial, especially when transferring large payloads, as it beneficial, especially when transferring large payloads, as it
reduces the number of packets (and packet headers) required for a reduces the number of packets (and packet headers) required for a
given payload. given payload.
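The effect of the MSS on packet count and header overhead can be estimated with a short calculation. The following is a rough sketch, assuming 60 bytes of headers per packet (40-byte IPv6 header plus 20-byte TCP header, no options) and ignoring link-layer overhead and retransmissions. The function names and the payload and MSS values are illustrative assumptions.

```python
def segments_needed(payload_len, mss):
    """Number of TCP segments required to carry payload_len bytes."""
    return (payload_len + mss - 1) // mss  # integer ceiling division


def header_overhead(payload_len, mss, per_packet_headers=60):
    """Total IPv6 + TCP header bytes spent on carrying the payload."""
    return segments_needed(payload_len, mss) * per_packet_headers


# Illustrative 10000-byte payload with a large and a small MSS.
for mss in (1220, 100):
    print(mss, segments_needed(10000, mss), header_overhead(10000, mss))
# 1220 9 540
# 100 100 6000
```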
4.1.2. Explicit Congestion Notification (ECN) 4.1.2. Explicit Congestion Notification (ECN)
Explicit Congestion Notification (ECN) [RFC3168] allows a router Explicit Congestion Notification (ECN) [RFC3168] allows a router
to signal in the IP header of a packet that congestion is arising, to signal in the IP header of a packet that congestion is arising,
for example when a queue size reaches a certain threshold. An ECN- for example when a queue size reaches a certain threshold. An ECN-
skipping to change at page 9, line 29 skipping to change at page 9, line 22
4.1.3. Explicit loss notifications 4.1.3. Explicit loss notifications
There has been a significant body of research on solutions capable of There has been a significant body of research on solutions capable of
explicitly indicating whether a TCP segment loss is due to explicitly indicating whether a TCP segment loss is due to
corruption, in order to avoid activation of congestion control corruption, in order to avoid activation of congestion control
mechanisms [ETEN] [RFC2757]. While such solutions may provide mechanisms [ETEN] [RFC2757]. While such solutions may provide
significant improvement, they have not been widely deployed and significant improvement, they have not been widely deployed and
remain as experimental work. In fact, as of today, the IETF has not remain as experimental work. In fact, as of today, the IETF has not
standardized any such solution. standardized any such solution.
4.2. TCP guidance for single-segment stacks 4.2. TCP guidance for single-MSS stacks
This section discusses TCP stacks that allow transferring only a This section discusses TCP stacks that allow transferring a single
single segment. More general guidance is provided in Section 4.3. MSS. More general guidance is provided in Section 4.3.
4.2.1. Single-segment stacks - benefits and issues 4.2.1. Single-MSS stacks - benefits and issues
A TCP stack can reduce the memory requirements by advertising a TCP A TCP stack can reduce the memory requirements by advertising a TCP
window size of one MSS, and also transmit at most one MSS of window size of one MSS, and also transmit at most one MSS of
unacknowledged data. In that case, both congestion and flow control unacknowledged data. In that case, both congestion and flow control
implementation are quite simple. Such a small receive and send implementation are quite simple. Such a small receive and send
window may be sufficient for simple message exchanges in the CNN window may be sufficient for simple message exchanges in the CNN
space. However, only using a window of one MSS can significantly space. However, only using a window of one MSS can significantly
affect performance. A stop-and-wait operation results in low affect performance. A stop-and-wait operation results in low
throughput for transfers that exceed the length of one MSS, e.g., a throughput for transfers that exceed the length of one MSS, e.g., a
firmware download. Furthermore, a single-segment solution relies firmware download. Furthermore, a single-MSS solution relies solely
solely on timer-based loss recovery, therefore missing the on timer-based loss recovery, therefore missing the performance gain
performance gain of Fast Retransmit and Fast Recovery (which require of Fast Retransmit and Fast Recovery (which require a larger window
a larger window size, see Subsection 4.3.1). size, see Subsection 4.3.1).
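The throughput limit of stop-and-wait operation can be approximated as one MSS per round-trip time. The sketch below illustrates this under simplifying assumptions: no losses, negligible serialization time, and no connection setup cost. The RTT and firmware size are made-up example values, and the function names are hypothetical.

```python
def stop_and_wait_throughput_bps(mss_bytes, rtt_s):
    """Upper bound on throughput when one MSS is sent per RTT."""
    return mss_bytes * 8 / rtt_s


def stop_and_wait_transfer_time_s(total_bytes, mss_bytes, rtt_s):
    """Approximate transfer time: one RTT per segment, no losses."""
    segments = (total_bytes + mss_bytes - 1) // mss_bytes
    return segments * rtt_s


# Example values (assumptions): MSS of 1220 bytes, RTT of 500 ms,
# firmware image of 100000 bytes.
print(stop_and_wait_throughput_bps(1220, 0.5))            # 19520.0
print(stop_and_wait_transfer_time_s(100000, 1220, 0.5))   # 41.0
```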
If CoAP is used over TCP with the default setting for NSTART in If CoAP is used over TCP with the default setting for NSTART in
[RFC7252], a CoAP endpoint is not allowed to send a new message to a [RFC7252], a CoAP endpoint is not allowed to send a new message to a
destination until a response for the previous message sent to that destination until a response for the previous message sent to that
destination has been received. This is equivalent to an application- destination has been received. This is equivalent to an application-
layer window size of 1 data unit. For this use of CoAP, a maximum layer window size of 1 data unit. For this use of CoAP, a maximum
TCP window of one MSS may be sufficient, as long as the CoAP message TCP window of one MSS may be sufficient, as long as the CoAP message
size does not exceed one MSS. size does not exceed one MSS. An exception in CoAP over TCP, though,
is the Capabilities and Settings Message (CSM) that must be sent at
the start of the TCP connection. The first application message
carrying user data is allowed to be sent immediately after the CSM
message. If the sum of the CSM size plus the application message
size exceeds the MSS, a sender using a single-MSS stack will need to
wait for the ACK confirming the CSM before sending the application
message.
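The CSM condition described above reduces to a single size comparison. A minimal sketch follows; the function name and the byte counts are hypothetical, chosen only to show both outcomes.

```python
def must_wait_for_csm_ack(csm_len, message_len, mss):
    """True if a single-MSS sender cannot have both the CSM and the first
    application message outstanding at the same time."""
    return csm_len + message_len > mss


# Hypothetical sizes in bytes.
print(must_wait_for_csm_ack(csm_len=10, message_len=200, mss=1220))   # False
print(must_wait_for_csm_ack(csm_len=10, message_len=1215, mss=1220))  # True
```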
4.2.2. TCP options for single-segment stacks 4.2.2. TCP options for single-MSS stacks
A TCP implementation needs to support, at a minimum, TCP options 2, 1 A TCP implementation needs to support, at a minimum, TCP options 2, 1
and 0. These are, respectively, the Maximum Segment Size (MSS) and 0. These are, respectively, the Maximum Segment Size (MSS)
option, the No-Operation option, and the End Of Option List marker option, the No-Operation option, and the End Of Option List marker
[RFC0793]. None of these are a substantial burden to support. These [RFC0793]. None of these are a substantial burden to support. These
options are sufficient for interoperability with a standard-compliant options are sufficient for interoperability with a standard-compliant
TCP endpoint, albeit many TCP stacks support additional options and TCP endpoint, albeit many TCP stacks support additional options and
can negotiate their use. A TCP implementation is permitted to can negotiate their use. A TCP implementation is permitted to
silently ignore all other TCP options. silently ignore all other TCP options.
A TCP implementation for a constrained device that uses a single- A TCP implementation for a constrained device that uses a single-MSS
segment TCP receive or transmit window size may not benefit from TCP receive or transmit window size may not benefit from supporting
supporting the following TCP options: Window scale [RFC7323], TCP the following TCP options: Window scale [RFC7323], TCP Timestamps
Timestamps [RFC7323], Selective Acknowledgments (SACK) and SACK- [RFC7323], Selective Acknowledgments (SACK) and SACK-Permitted
Permitted [RFC2018]. Also other TCP options may not be required on a [RFC2018]. Also other TCP options may not be required on a
constrained device with a very lightweight implementation. With constrained device with a very lightweight implementation. With
regard to the Window scale option, note that it is only useful if a regard to the Window scale option, note that it is only useful if a
window size greater than 64 kB is needed. window size greater than 64 kB is needed.
Note that a TCP sender can benefit from the TCP Timestamps option Note that a TCP sender can benefit from the TCP Timestamps option
[RFC7323] in detecting spurious RTOs. The latter are quite likely to [RFC7323] in detecting spurious RTOs. The latter are quite likely to
occur in CNN scenarios due to a number of reasons (e.g. route changes occur in CNN scenarios due to a number of reasons (e.g. route changes
in a multihop scenario, link layer retries, etc.). The header in a multihop scenario, link layer retries, etc.). The header
overhead incurred by the Timestamps option (of up to 12 bytes) needs overhead incurred by the Timestamps option (of up to 12 bytes) needs
to be taken into account. to be taken into account.
One potentially relevant TCP option in the context of CNNs is TCP One potentially relevant TCP option in the context of CNNs is TCP
Fast Open (TFO) [RFC7413]. As described in Section 5.3, TFO can be Fast Open (TFO) [RFC7413]. As described in Section 5.3, TFO can be
used to address the problem of traversing middleboxes that perform used to address the problem of traversing middleboxes that perform
early filter state record deletion. early filter state record deletion.
4.2.3. Delayed Acknowledgments for single-segment stacks 4.2.3. Delayed Acknowledgments for single-MSS stacks
TCP Delayed Acknowledgments are meant to reduce the number of ACKs TCP Delayed Acknowledgments are meant to reduce the number of ACKs
sent within a TCP connection, thus reducing network overhead, but sent within a TCP connection, thus reducing network overhead, but
they may increase the time until a sender may receive an ACK. In they may increase the time until a sender may receive an ACK. In
general, usefulness of Delayed ACKs depends heavily on the usage general, usefulness of Delayed ACKs depends heavily on the usage
scenario (see subsection 4.3.2). There can be interactions with scenario (see subsection 4.3.2). There can be interactions with
single-segment stacks. single-MSS stacks.
When traffic is unidirectional, if the sender can send at most one When traffic is unidirectional, if the sender can send at most one
MSS of data or the receiver advertises a receive window not greater MSS of data or the receiver advertises a receive window not greater
than the MSS, Delayed ACKs may unnecessarily contribute delay (up to than the MSS, Delayed ACKs may unnecessarily contribute delay (up to
500 ms) to the RTT [RFC5681], which limits the throughput and can 500 ms) to the RTT [RFC5681], which limits the throughput and can
increase data delivery time. Note that, in some cases, it may not be increase data delivery time. Note that, in some cases, it may not be
possible to disable Delayed ACKs. One known workaround is to split possible to disable Delayed ACKs. One known workaround is to split
the data to be sent into two segments of smaller size. A standard the data to be sent into two segments of smaller size. A standard
compliant TCP receiver will acknowledge the second MSS of data, which compliant TCP receiver may immediately acknowledge the second MSS of
can improve throughput. However, this 'split hack' may not always data, which can improve throughput. However, this 'split hack' may
work since a TCP receiver is required to acknowledge every second not always work since a TCP receiver is required to acknowledge every
full-sized segment, but not two consecutive small segments. second full-sized segment, but not two consecutive small segments.
Furthermore, the overhead of sending two IP packets instead of one is Furthermore, the overhead of sending two IP packets instead of one is
another downside of the 'split hack'. another downside of the 'split hack'.
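The impact of a delayed ACK on a stop-and-wait sender can be estimated by comparing the time per segment with and without the extra ACK delay. The sketch below is a simplification that assumes every ACK is held for the full delay and that there are no losses. The 100 ms RTT is an assumed example value, while the 500 ms delay is the upper bound mentioned above.

```python
def relative_throughput(rtt_s, ack_delay_s):
    """Fraction of the undelayed stop-and-wait throughput that remains
    when each ACK is held back by ack_delay_s seconds."""
    return rtt_s / (rtt_s + ack_delay_s)


# Assumed RTT of 100 ms and the worst-case Delayed ACK timer of 500 ms:
# each segment now takes 600 ms instead of 100 ms to be acknowledged.
print(round(relative_throughput(0.1, 0.5), 3))  # 0.167
```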
Similar issues happen when the sender uses the Nagle algorithm. Similar issues may happen when the sender uses the Nagle algorithm,
Disabling the algorithm will not have impact if the sender can only since the sender may need to wait for an unnecessarily delayed ACK to
handle stop-and-wait operation. send a new segment. Disabling the algorithm will not have impact if
the sender can only handle stop-and-wait operation at the TCP level.
For request-response traffic, when the receiver uses Delayed ACKs, a For request-response traffic, when the receiver uses Delayed ACKs, a
response to a data message can piggyback an ACK, as long as the response to a data message can piggyback an ACK, as long as the
latter is sent before the Delayed ACK timer expires, thus avoiding latter is sent before the Delayed ACK timer expires, thus avoiding
unnecessary pure ACKs. Disabling Delayed ACKs at the sender allows unnecessary ACKs without payload. Disabling Delayed ACKs at the
an immediate ACK for the data segment carrying the response. sender allows an immediate ACK for the data segment carrying the
response.
4.2.4. RTO calculation for single-segment stacks 4.2.4. RTO calculation for single-MSS stacks
The Retransmission Timeout (RTO) calculation is one of the The Retransmission Timeout (RTO) calculation is one of the
fundamental TCP algorithms [RFC6298]. There is a fundamental trade- fundamental TCP algorithms [RFC6298]. There is a fundamental trade-
off: A short, aggressive RTO behavior reduces wait time before off: A short, aggressive RTO behavior reduces wait time before
retransmissions, but it also increases the probability of spurious retransmissions, but it also increases the probability of spurious
timeouts. The latter lead to unnecessary waste of potentially scarce timeouts. The latter lead to unnecessary waste of potentially scarce
resources in CNNs such as energy and bandwidth. In contrast, a resources in CNNs such as energy and bandwidth. In contrast, a
conservative timeout can result in long error recovery times and thus conservative timeout can result in long error recovery times and thus
needlessly delay data delivery. needlessly delay data delivery.
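For reference, the standard RTO computation of [RFC6298] can be sketched as follows. This is an illustrative sketch and not a complete implementation: it omits Karn's algorithm and timer management, the class and method names are hypothetical, and the 100 ms clock granularity is an assumed value. The constants (alpha = 1/8, beta = 1/4, K = 4, initial and minimum RTO of 1 second, upper bound of 60 seconds) are those given in RFC 6298.

```python
class RtoEstimator:
    ALPHA = 1 / 8   # SRTT smoothing factor
    BETA = 1 / 4    # RTTVAR smoothing factor
    K = 4           # RTTVAR multiplier
    MIN_RTO = 1.0   # seconds, lower bound
    MAX_RTO = 60.0  # seconds, optional upper bound (must be >= 60 s)

    def __init__(self, clock_granularity=0.1):
        self.g = clock_granularity  # assumed clock granularity in seconds
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0              # initial RTO before any RTT sample

    def on_rtt_sample(self, r):
        """Update the estimator with a new RTT measurement r (seconds)."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated using the previous SRTT, then SRTT is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(self.MAX_RTO, max(self.MIN_RTO, rto))
        return self.rto

    def on_timeout(self):
        """Exponential backoff after a retransmission timeout."""
        self.rto = min(self.MAX_RTO, self.rto * 2)
        return self.rto


est = RtoEstimator()
print(est.on_rtt_sample(2.0))  # 6.0
print(est.on_rtt_sample(2.0))  # 5.0
print(est.on_timeout())        # 10.0
```

A stable RTT of 2 seconds yields a shrinking RTO as the variance term decays, while a timeout doubles the current value. This illustrates the trade-off between fast recovery and spurious retransmissions.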
If a TCP sender uses a very small window size, and it cannot benefit If a TCP sender uses a very small window size, and it cannot benefit
from Fast Retransmit/Fast Recovery or SACK, the RTO algorithm has a from Fast Retransmit/Fast Recovery or SACK, the RTO algorithm has a
large impact on performance. In that case, RTO algorithm tuning may large impact on performance. In that case, RTO algorithm tuning may
be considered, although careful assessment of possible drawbacks is be considered, although careful assessment of possible drawbacks is
recommended [I-D.ietf-tcpm-rto-consider]. recommended [I-D.ietf-tcpm-rto-consider].
As an example, an adaptive RTO algorithm for CoAP over UDP has been As an example, adaptive RTO algorithms defined for CoAP over UDP have
defined that has been found to perform well in CNN scenarios been found to perform well in CNN scenarios [Commag]
[Commag]. [I-D.jarvinen-core-fasor].
4.3. General recommendations for TCP in CNNs 4.3. General recommendations for TCP in CNNs
This section summarizes some widely used techniques to improve TCP, This section summarizes some widely used techniques to improve TCP,
with a focus on their use in CNNs. The TCP extensions discussed here with a focus on their use in CNNs. The TCP extensions discussed here
are useful in a wide range of network scenarios, including CNNs. are useful in a wide range of network scenarios, including CNNs.
This section is not comprehensive. A comprehensive survey of TCP This section is not comprehensive. A comprehensive survey of TCP
extensions is published in [RFC7414]. extensions is published in [RFC7414].
4.3.1. Loss recovery and congestion/flow control 4.3.1. Loss recovery and congestion/flow control
Devices that have enough memory to allow a larger (i.e. more than 3 Devices that have enough memory to allow a larger (i.e. more than 3
MSS of data) TCP window size can leverage a more efficient loss MSS of data) TCP window size can leverage a more efficient loss
recovery than the timer-based approach used for smaller TCP window recovery than the timer-based approach used for smaller TCP window
size (see Subsection 3.2.1) by using Fast Retransmit and Fast size (see Subsection 3.2.1) by using Fast Retransmit and Fast
Recovery [RFC5681], at the expense of slightly greater complexity and Recovery [RFC5681], at the expense of slightly greater complexity and
TCB size. Assuming that Delayed ACKs are used by the receiver, a TCB size. Assuming that Delayed ACKs are used by the receiver, a
window size of up to 5 MSS is required for Fast Retransmit and Fast window size of up to 5 MSS is required for Fast Retransmit and Fast
Recovery to work efficiently: If in a given TCP transmission of Recovery to work efficiently: If in a given TCP transmission of full-
segments 1, 2, 3, 4, 5, and 6 segment 2 gets lost, and the ACK for sized segments 1, 2, 3, 4, and 5, segment 2 gets lost, and the ACK
segment 1 is held by the Delayed ACK timer, then the sender should for segment 1 is held by the Delayed ACK timer, then the sender
get an ACK for segment 1 when 3 arrives and duplicate ACKs when should get an ACK for segment 1 when 3 arrives and duplicate ACKs
segments 4, 5, and 6 arrive. It will retransmit segment 2 when the when segments 4, 5, and 6 arrive. It will retransmit segment 2 when
third duplicate ACK arrives. In order to have segments 2, 3, 4, 5, the third duplicate ACK arrives. In order to have segments 2, 3, 4,
and 6 sent, the window has to be of at least 5 MSS. With an MSS of 5, and 6 sent, the window has to be of at least 5 MSS. With an MSS
1220 bytes, a buffer of a size of 5 MSS would require 6100 bytes. of 1220 bytes, a buffer of a size of 5 MSS would require 6100 bytes.
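The duplicate ACK sequence in this example can be reproduced with a small simulation. The sketch below is a simplified model: it assumes that segments arrive back-to-back before the Delayed ACK timer fires, that the receiver acknowledges every second in-order segment, and that it sends an immediate ACK for every out-of-order segment. Segment numbers stand in for byte sequence numbers, and all names are hypothetical.

```python
def simulate_dupacks(segments_sent, lost, dupack_threshold=3):
    """Return (acks, trigger): the ACKs generated by the receiver as
    (arriving segment, next expected segment) pairs, and the arriving
    segment whose ACK triggers Fast Retransmit (None if never)."""
    next_expected = segments_sent[0]
    ack_pending = False  # an in-order segment is waiting for a delayed ACK
    acks = []
    for seg in segments_sent:
        if seg == lost:
            continue  # segment dropped by the network
        if seg == next_expected:
            next_expected += 1
            if ack_pending:
                acks.append((seg, next_expected))  # ACK every second segment
                ack_pending = False
            else:
                ack_pending = True                 # hold the ACK
        else:
            acks.append((seg, next_expected))      # out of order: immediate ACK
            ack_pending = False

    # Sender side: count duplicate ACKs for the same cumulative ACK value.
    last_ack, dups, trigger = None, 0, None
    for seg, ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dupack_threshold:
                trigger = seg
                break
        else:
            last_ack, dups = ack, 0
    return acks, trigger


acks, trigger = simulate_dupacks([1, 2, 3, 4, 5, 6], lost=2)
print(acks)     # [(3, 2), (4, 2), (5, 2), (6, 2)]
print(trigger)  # 6
print(5 * 1220) # 6100 bytes of buffer for a 5-MSS window
```

In this model, the ACK sent when segment 3 arrives is the first acknowledgment of segment 1, so the third duplicate ACK only appears once segment 6 has arrived. With segments 1 to 5 only, the threshold is not reached.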
Further TCP improvements such as Limited Transmit [RFC3042] may also The example in the previous paragraph did not use a further TCP
improvement such as Limited Transmit [RFC3042]. The latter may also
be useful for any transfer that has more than one segment in flight. be useful for any transfer that has more than one segment in flight.
Small transfers tend to benefit more from Limited Transmit, because Small transfers tend to benefit more from Limited Transmit, because
they are more likely to not receive enough duplicate ACKs. Assuming they are more likely to not receive enough duplicate ACKs. Assuming
the example in the previous paragraph, Limited Transmit allows the example in the previous paragraph, Limited Transmit allows
sending 5 MSS with a congestion window (cwnd) of 3 segments, plus two sending 5 MSS with a congestion window (cwnd) of 3 segments, plus two
additional segments for each one of the first two duplicate ACKs. additional segments for the first two duplicate ACKs. With Limited
Transmit, even a cwnd of 2 segments allows sending 5 MSS, at the
expense of additional delay contributed by the Delayed ACK timer for
the ACK that confirms segment 1.
When a multiple-segment window is used, the receiver will need to When a multiple-segment window is used, the receiver will need to
manage the reception of possible out-of-order received segments, manage the reception of possible out-of-order received segments,
requiring sufficient buffer space. requiring sufficient buffer space.
4.3.1.1. Selective Acknowledgments (SACK) 4.3.1.1. Selective Acknowledgments (SACK)
If a device with less severe memory and processing constraints can If a device with less severe memory and processing constraints can
afford advertising a TCP window size of several MSS, it makes sense afford advertising a TCP window size of several MSS, it makes sense
to support the SACK option to improve performance. SACK allows a to support the SACK option to improve performance. SACK allows a
data receiver to inform the data sender of non-contiguous data blocks data receiver to inform the data sender of non-contiguous data blocks
received, thus a sender (having previously sent the SACK-Permitted received, thus a sender (having previously sent the SACK-Permitted
option) can avoid performing unnecessary retransmissions, saving option) can avoid performing unnecessary retransmissions, saving
energy and bandwidth, as well as reducing latency. In addition, SACK energy and bandwidth, as well as reducing latency. In addition, SACK
often allows for faster loss recovery when there is more than one often allows for faster loss recovery when there is more than one
lost segment in a window of data, since with SACK recovery requires lost segment in a window of data, since with SACK recovery may
fewer RTTs. SACK is particularly useful for bulk data transfers. A complete with fewer RTTs. SACK is particularly useful for bulk data
receiver supporting SACK will need to keep track of the SACK blocks transfers. A receiver supporting SACK will need to keep track of the
that need to be received. The sender will also need to keep track of SACK blocks that need to be received. The sender will also need to
which data segments need to be resent after learning which data keep track of which data segments need to be resent after learning
blocks are missing at the receiver. SACK adds 8*n+2 bytes to the TCP which data blocks are missing at the receiver. SACK adds 8*n+2 bytes
header, where n denotes the number of data blocks received, up to 4 to the TCP header, where n denotes the number of data blocks
blocks. For a low number of out-of-order segments, the header received, up to 4 blocks. For a low number of out-of-order segments,
overhead penalty of SACK is compensated by avoiding unnecessary the header overhead penalty of SACK is compensated by avoiding
retransmissions. When the sender discovers the data blocks that have unnecessary retransmissions. When the sender discovers the data
already been received, it needs to also store the necessary state to blocks that have already been received, it needs to also store the
avoid unnecessary retransmission of data segments that have already necessary state to avoid unnecessary retransmission of data segments
been received. that have already been received.
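The header overhead stated above can be worked out with a short Python sketch. The function name is hypothetical; the formula (8*n+2 bytes, up to 4 blocks) is the one given in the text, and the 40-byte limit is the maximum space available for TCP options in a TCP header.

```python
MAX_TCP_OPTION_SPACE = 40  # bytes available for options in a TCP header


def sack_option_bytes(n_blocks):
    """Size in bytes of a SACK option that reports n_blocks data blocks."""
    if not 1 <= n_blocks <= 4:
        raise ValueError("a SACK option carries between 1 and 4 blocks")
    return 8 * n_blocks + 2


for n in range(1, 5):
    print(n, sack_option_bytes(n))
```

Even with 4 blocks, the option fits within the TCP option space, but it leaves little room for other options in the same segment.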
4.3.2. Delayed Acknowledgments 4.3.2. Delayed Acknowledgments
For certain traffic patterns, Delayed ACKs may have a detrimental For certain traffic patterns, Delayed ACKs may have a detrimental
effect, as already noted in Section 4.2.3. Advanced TCP stacks may effect, as already noted in Section 4.2.3. Advanced TCP stacks may
use heuristics to determine the maximum delay for an ACK. For CNNs, use heuristics to determine the maximum delay for an ACK. For CNNs,
the recommendation depends on the expected communication patterns. the recommendation depends on the expected communication patterns.
When traffic over a CNN is expected to mostly be unidirectional When traffic over a CNN is expected to mostly be unidirectional
messages with a size typically up to one MSS, and the time between messages with a size typically up to one MSS, and the time between
two consecutive message transmissions is greater than the Delayed ACK two consecutive message transmissions is greater than the Delayed ACK
timeout, it may make sense to use a small timeout or disable Delayed timeout, it may make sense to use a small timeout or disable Delayed
ACKs at the receiver. This avoids incurring additional delay, as ACKs at the receiver. This avoids incurring additional delay, as
well as the energy consumption of the sender (which might e.g. keep well as the energy consumption of the sender (which might e.g. keep
its radio interface in receive mode) during that time. Note that its radio interface in receive mode) during that time. Note that
disabling Delayed ACKs may only be possible if the peer device is disabling Delayed ACKs may only be possible if the peer device is
administered by the same entity managing the constrained device. For administered by the same entity managing the constrained device. For
request-response traffic, enabling Delayed ACKs is recommended, in request-response traffic, enabling Delayed ACKs is recommended at the
order to allow combining a response with the ACK into a single server end, in order to allow combining a response with the ACK into
segment, thus increasing efficiency. In this case, disabling Delayed a single segment, thus increasing efficiency. In addition, if a
ACKs at the sender allows an immediate ACK for the data segment client issues requests infrequently, disabling Delayed ACKs at the
carrying the response. client allows an immediate ACK for the data segment carrying the
response.
In contrast, Delayed ACKs help reduce the number of ACKs in bulk In contrast, Delayed ACKs help reduce the number of ACKs in bulk
transfer type of traffic, e.g. for firmware/software updates or for transfer type of traffic, e.g. for firmware/software updates or for
transferring larger data units containing a batch of sensor readings. transferring larger data units containing a batch of sensor readings.
Note that, in many scenarios, the peer that a constrained device Note that, in many scenarios, the peer that a constrained device
communicates with will be a general purpose system that communicates communicates with will be a general purpose system that communicates
with both constrained and unconstrained devices. Since delayed ACKs with both constrained and unconstrained devices. Since delayed ACKs
are often configured through system-wide parameters, delayed ACK are often configured through system-wide parameters, delayed ACK
behavior at the peer will be the same regardless of the nature of the behavior at the peer will be the same regardless of the nature of the
skipping to change at page 17, line 24 skipping to change at page 17, line 31
has a size of 16-20 bytes. has a size of 16-20 bytes.
For the mechanisms discussed in this document, the corresponding For the mechanisms discussed in this document, the corresponding
considerations apply. For instance, if TFO is used, the security considerations apply. For instance, if TFO is used, the security
considerations of [RFC7413] apply. considerations of [RFC7413] apply.
Constrained devices are expected to support smaller TCP window sizes Constrained devices are expected to support smaller TCP window sizes
than less limited devices. In such conditions, segment than less limited devices. In such conditions, segment
retransmission triggered by RTO expiration is expected to be retransmission triggered by RTO expiration is expected to be
relatively frequent, due to lack of (enough) duplicate ACKs, relatively frequent, due to lack of (enough) duplicate ACKs,
especially when a constrained device uses a single-segment especially when a constrained device uses a single-MSS
implementation. For this reason, constrained devices running TCP may implementation. For this reason, constrained devices running TCP may
appear as particularly appealing victims of the so-called "shrew" appear as particularly appealing victims of the so-called "shrew"
Denial of Service (DoS) attack [shrew], whereby one or more sources Denial of Service (DoS) attack [shrew], whereby one or more sources
generate a packet spike targeted to coincide with consecutive RTO- generate a packet spike targeted to coincide with consecutive RTO-
expiration-triggered retry attempts of a victim node. Note that the expiration-triggered retry attempts of a victim node. Note that the
attack may be performed by Internet-connected devices, including attack may be performed by Internet-connected devices, including
constrained devices in the same CNN as the victim, as well as remote constrained devices in the same CNN as the victim, as well as remote
ones. Mitigation techniques include RTO randomization and attack ones. Mitigation techniques include RTO randomization and attack
blocking by routers able to detect shrew attacks based on their blocking by routers able to detect shrew attacks based on their
traffic pattern. traffic pattern.
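This document does not specify how RTO randomization is performed. The following Python sketch shows one possible approach, assuming that a uniformly distributed extra delay is added to each computed RTO so that retry times are harder to predict. The function name, the `spread` parameter, and its default value are assumptions made for illustration.

```python
import random


def randomized_rto(rto, spread=0.5, rng=random):
    """Return rto stretched by a random factor in [1, 1 + spread].

    rto:    retransmission timeout in seconds, as computed by the stack
    spread: maximum fractional increase (hypothetical tuning value)
    rng:    source of randomness; the random module or a random.Random
    """
    if rto <= 0 or spread < 0:
        raise ValueError("rto must be positive and spread non-negative")
    return rto * (1.0 + rng.uniform(0.0, spread))


# Consecutive timeouts for the same computed RTO no longer coincide.
rng = random.Random(7)
print([round(randomized_rto(1.0, rng=rng), 3) for _ in range(3)])
```

The sketch only ever lengthens the timeout, so the randomized value does not fall below the RTO computed by the stack.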
skipping to change at page 19, line 27 skipping to change at page 19, line 30
been recently added to lwIP. been recently added to lwIP.
8.3. RIOT 8.3. RIOT
The RIOT TCP implementation (called GNRC TCP) has been designed for The RIOT TCP implementation (called GNRC TCP) has been designed for
Class 1 devices [RFC7228]. The main target platforms are 8- and Class 1 devices [RFC7228]. The main target platforms are 8- and
16-bit microcontrollers, with 32-bit platforms also supported. GNRC 16-bit microcontrollers, with 32-bit platforms also supported. GNRC
TCP offers a similar function set as uIP, but it provides and TCP offers a similar function set as uIP, but it provides and
maintains an independent receive buffer for each connection. In maintains an independent receive buffer for each connection. In
contrast to uIP, retransmission is also handled by GNRC TCP. For contrast to uIP, retransmission is also handled by GNRC TCP. For
simplicity, GNRC TCP uses a single-segment implementation. The simplicity, GNRC TCP uses a single-MSS implementation. The
application programmer does not need to know anything about the TCP application programmer does not need to know anything about the TCP
internals, therefore GNRC TCP can be seen as a user-friendly uIP TCP internals, therefore GNRC TCP can be seen as a user-friendly uIP TCP
implementation. implementation.
The MSS is set on connection establishment and cannot be changed The MSS is set on connection establishment and cannot be changed
during connection lifetime. GNRC TCP allows multiple connections in during connection lifetime. GNRC TCP allows multiple connections in
parallel, but each TCB must be allocated somewhere in the system. By parallel, but each TCB must be allocated somewhere in the system. By
default there is only enough memory allocated for a single TCP default there is only enough memory allocated for a single TCP
connection, but it can be increased at compile time if the user needs connection, but it can be increased at compile time if the user needs
multiple parallel connections. multiple parallel connections.
skipping to change at page 20, line 13 skipping to change at page 20, line 16
Instead, it will immediately dispatch new, in-order data to the Instead, it will immediately dispatch new, in-order data to the
application and otherwise drop the segment. A send buffer is application and otherwise drop the segment. A send buffer is
provided by the application. Multiple TCP connections are possible. provided by the application. Multiple TCP connections are possible.
Recently there has been little further work on the stack. Recently there has been little further work on the stack.
8.5. FreeRTOS 8.5. FreeRTOS
FreeRTOS is a real-time operating system kernel for embedded devices FreeRTOS is a real-time operating system kernel for embedded devices
that is supported by 16- and 32-bit microprocessors. Its TCP that is supported by 16- and 32-bit microprocessors. Its TCP
implementation is based on multiple-segment window size, although a implementation is based on multiple-segment window size, although a
'Tiny-TCP' option, which is a single-segment variant, can be enabled. 'Tiny-TCP' option, which is a single-MSS variant, can be enabled.
Delayed ACKs are supported, with a 20-ms Delayed ACK timer as a Delayed ACKs are supported, with a 20-ms Delayed ACK timer as a
technique intended 'to gain performance'. technique intended 'to gain performance'.
8.6. uC/OS 8.6. uC/OS
uC/OS is a real-time operating system kernel for embedded devices, uC/OS is a real-time operating system kernel for embedded devices,
which is maintained by Micrium. uC/OS is intended for 8-, 16- and which is maintained by Micrium. uC/OS is intended for 8-, 16- and
32-bit microprocessors. The uC/OS TCP implementation supports a 32-bit microprocessors. The uC/OS TCP implementation supports a
multiple-segment window size. multiple-segment window size.
skipping to change at page 23, line 45 skipping to change at page 23, line 45
9.7. Changes between -06 and -07 9.7. Changes between -06 and -07
o Addressed comments by Gorry Fairhurst o Addressed comments by Gorry Fairhurst
9.8. Changes between -07 and -08 9.8. Changes between -07 and -08
o Addressed WGLC comments by Ilpo Jarvinen, Markku Kojo and Ingemar o Addressed WGLC comments by Ilpo Jarvinen, Markku Kojo and Ingemar
Johansson throughout the document, including the addition of a new Johansson throughout the document, including the addition of a new
subsection on Initial Window considerations. subsection on Initial Window considerations.
9.9. Changes between -08 and -09
o Addressed second round of comments by Ilpo Jarvinen and Markku
Kojo, based on the previous draft update.
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981, RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>. <https://www.rfc-editor.org/info/rfc793>.
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, Communication Layers", STD 3, RFC 1122,
skipping to change at page 26, line 10 skipping to change at page 26, line 21
[I-D.delcarpio-6lo-wlanah] [I-D.delcarpio-6lo-wlanah]
Vega, L., Robles, I., and R. Morabito, "IPv6 over Vega, L., Robles, I., and R. Morabito, "IPv6 over
802.11ah", draft-delcarpio-6lo-wlanah-01 (work in 802.11ah", draft-delcarpio-6lo-wlanah-01 (work in
progress), October 2015. progress), October 2015.
[I-D.ietf-tcpm-rto-consider] [I-D.ietf-tcpm-rto-consider]
Allman, M., "Retransmission Timeout Requirements", draft- Allman, M., "Retransmission Timeout Requirements", draft-
ietf-tcpm-rto-consider-08 (work in progress), February ietf-tcpm-rto-consider-08 (work in progress), February
2019. 2019.
[I-D.jarvinen-core-fasor]
Jarvinen, I., Kojo, M., Raitahila, I., and Z. Cao, "Fast-
Slow Retransmission Timeout and Congestion Control
Algorithm for CoAP", draft-jarvinen-core-fasor-02 (work in
progress), July 2019.
[IntComp] C. Gomez, A. Arcia-Moret, J. Crowcroft, "TCP in the [IntComp] C. Gomez, A. Arcia-Moret, J. Crowcroft, "TCP in the
Internet of Things: from ostracism to prominence", IEEE Internet of Things: from ostracism to prominence", IEEE
Internet Computing, January-February 2018. Internet Computing, January-February 2018.
[RFC2757] Montenegro, G., Dawkins, S., Kojo, M., Magret, V., and N. [RFC2757] Montenegro, G., Dawkins, S., Kojo, M., Magret, V., and N.
Vaidya, "Long Thin Networks", RFC 2757, Vaidya, "Long Thin Networks", RFC 2757,
DOI 10.17487/RFC2757, January 2000, DOI 10.17487/RFC2757, January 2000,
<https://www.rfc-editor.org/info/rfc2757>. <https://www.rfc-editor.org/info/rfc2757>.
[RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of [RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
 End of changes. 34 change blocks. 
86 lines changed or deleted 104 lines changed or added
