draft-ietf-tcpm-accecn-reqs-04.txt   draft-ietf-tcpm-accecn-reqs-05.txt 
TCP Maintenance and Minor Extensions (tcpm) M. Kuehlewind, Ed. TCP Maintenance and Minor Extensions (tcpm) M. Kuehlewind, Ed.
Internet-Draft University of Stuttgart Internet-Draft University of Stuttgart
Intended status: Informational R. Scheffenegger Intended status: Informational R. Scheffenegger
Expires: April 24, 2014 NetApp, Inc. Expires: August 16, 2014 NetApp, Inc.
October 21, 2013 B. Briscoe
BT
February 12, 2014
Problem Statement and Requirements for a More Accurate ECN Feedback Problem Statement and Requirements for a More Accurate ECN Feedback
draft-ietf-tcpm-accecn-reqs-04 draft-ietf-tcpm-accecn-reqs-05
Abstract Abstract
Explicit Congestion Notification (ECN) is an IP/TCP mechanism where Explicit Congestion Notification (ECN) is an IP/TCP mechanism where
network nodes can mark IP packets instead of dropping them to network nodes can mark IP packets instead of dropping them to
indicate congestion to the end-points. An ECN-capable receiver will indicate congestion to the end-points. An ECN-capable receiver will
feedback this information to the sender. ECN is specified for TCP in feed this information back to the sender. ECN is specified for TCP
such a way that only one feedback signal can be transmitted per in such a way that it can only feed back one congestion signal per
Round-Trip Time (RTT). Recently, new TCP mechanisms like ConEx or Round-Trip Time (RTT). In contrast, ECN for other transport
DCTCP need more accurate ECN feedback information in the case where protocols, such as RTP/UDP and SCTP, is specified with more accurate
more than one marking is received in one RTT. This document ECN feedback. Recent new TCP mechanisms (like ConEx or DCTCP) need
specifies requirements for an update to the TCP protocol to provide more accurate ECN feedback in the case where more than one marking is
more accurate ECN feedback than one signal per RTT. received in one RTT. This document specifies requirements for an
update to the TCP protocol to provide more accurate ECN feedback.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 24, 2014. This Internet-Draft will expire on August 16, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
2. Recap of Classic ECN and ECN Nonce in IP/TCP . . . . . . . . 3 2. Recap of Classic ECN and ECN Nonce in IP/TCP . . . . . . . . 4
3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 6 4. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 7
5. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 9 5. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 10
5.1. Re-use of ECN/NS Header Bits . . . . . . . . . . . . . . 9 5.1. Re-Definition of ECN/NS Header Bits . . . . . . . . . . . 10
5.2. Using Other Header Bits . . . . . . . . . . . . . . . . . 10 5.2. Using Other Header Bits . . . . . . . . . . . . . . . . . 11
5.3. Using a TCP Option . . . . . . . . . . . . . . . . . . . 10 5.3. Using a TCP Option . . . . . . . . . . . . . . . . . . . 12
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
8. Security Considerations . . . . . . . . . . . . . . . . . . . 11 8. Security Considerations . . . . . . . . . . . . . . . . . . . 12
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 13
9.1. Normative References . . . . . . . . . . . . . . . . . . 11 9.1. Normative References . . . . . . . . . . . . . . . . . . 13
9.2. Informative References . . . . . . . . . . . . . . . . . 12 9.2. Informative References . . . . . . . . . . . . . . . . . 13
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 Appendix A. Ambiguity of the More Accurate ECN Feedback in DCTCP 14
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15
1. Introduction 1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP
mechanism where network nodes can mark IP packets instead of dropping mechanism where network nodes can mark IP packets instead of dropping
them to indicate congestion to the end-points. An ECN-capable them to indicate congestion to the end-points. An ECN-capable
receiver will feedback this information to the sender. ECN is receiver will feed this information back to the sender. ECN is
specified for TCP in such a way that only one feedback signal can be specified for TCP in such a way that only one feedback signal can be
transmitted per Round-Trip Time (RTT). This is sufficient for pre- transmitted per Round-Trip Time (RTT). This is sufficient for pre-
existing congestion control mechanisms that perform only one existing TCP congestion control mechanisms that perform only one
reduction in sending rate per RTT, independent of the number of ECN reduction in sending rate per RTT, independent of the number of ECN
congestion marks. But recently proposed/deployed mechanisms like congestion marks. But recently proposed or deployed mechanisms like
Congestion Exposure (ConEx) [RFC6789] or DCTCP [Ali10] need more Congestion Exposure (ConEx) [RFC6789] or Data Center TCP (DCTCP)
fine-grained ECN feedback information to work correctly in the case [Ali10] need more accurate ECN feedback to work correctly in the case
where more than one marking is received in any one RTT. where more than one marking is received in any one RTT.
ECN is also defined for transport protocols other than TCP. ECN feedback
as defined for RTP/UDP [RFC6679] provides a very detailed level of
information, delivering individual counters for all four ECN
codepoints as well as lost and duplicate segments, but at the cost of
high signaling overhead. ECN feedback for SCTP
[I-D.stewart-tsvwg-sctpecn] delivers a counter for the number of CE
marked segments between CWR chunks, but also comes at the cost of
increased overhead.
Today, implementations of DCTCP already exist that alter TCP's ECN
feedback protocol in proprietary ways (DCTCP was released in
Microsoft Windows 8, and implementations exist for Linux and
FreeBSD). The changes DCTCP makes to TCP are not currently the
subject of any IETF standardization activity, and they omit
capability negotiation, relying instead on uniform configuration
across all hosts and network devices with ECN capability. A
primary motivation for this document is to intervene before each
proprietary implementation invents its own non-interoperable
handshake, which could lead to _de facto_ consumption of the few
flags or codepoints that remain available for standardizing
capability negotiation.
This document lists requirements for a robust and interoperable more This document lists requirements for a robust and interoperable more
accurate TCP/ECN feedback protocol that all implementations of new accurate TCP/ECN feedback protocol that all implementations of new
TCP extension like ConEx and/or DCTCP can use. While a new feedback TCP extensions, like ConEx and/or DCTCP, can use. While a new
scheme should still deliver identical performance as classic ECN, feedback scheme should still deliver as much information as classic
this document also clarifies what has to be taken into consideration ECN, this document also clarifies what has to be taken into
in addition. Thus the listed requirements should be addressed in the consideration in addition. Thus the listed requirements should be
specification of a more accurate ECN feedback scheme. Moreover, as a addressed in the specification of a more accurate ECN feedback
large set of proposals already exists, a few high level design scheme. A few solutions have already been proposed. Section 5
choices are sketched and briefly discussed, to demonstrate some of demonstrates how to use the requirements to compare them, by briefly
the benefits and drawbacks of each of these potential schemes. A few sketching their high level design choices and discussing the benefits
solutions have already been proposed, so Section 5 demonstrates how and drawbacks of each.
to use the requirements to compare them, by briefly sketching their
high level design choices and discussing the benefits and drawbacks
of each.
1.1. Requirements Language 1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
We use the following terminology from [RFC3168] and [RFC3540]: We use the following terminology from [RFC3168] and [RFC3540]:
The ECN field in the IP header: The ECN field in the IP header:
CE: the Congestion Experienced codepoint, Not-ECT: the not ECN-Capable Transport codepoint,
ECT(0): the first ECN-capable Transport codepoint, and CE: the Congestion Experienced codepoint,
ECT(1): the second ECN-capable Transport codepoint. ECT(0): the first ECN-Capable Transport codepoint, and
ECT(1): the second ECN-Capable Transport codepoint.
The ECN flags in the TCP header: The ECN flags in the TCP header:
CWR: the Congestion Window Reduced flag, CWR: the Congestion Window Reduced flag,
ECE: the ECN-Echo flag, and ECE: the ECN-Echo flag, and
NS: ECN Nonce Sum. NS: ECN Nonce Sum.
In this document, the ECN feedback scheme as specified in [RFC3168] In this document, the ECN feedback scheme as specified in [RFC3168]
is called the 'classic ECN' and any new proposal the 'more accurate is called 'classic ECN' and any new proposal is called a 'more
ECN feedback' scheme. A 'congestion mark' is defined as an IP packet accurate ECN feedback' scheme. A 'congestion mark' is defined as an
where the CE codepoint is set. A 'congestion episode' refers to one IP packet where the CE codepoint is set. A 'congestion episode'
or more congestion marks belonging to the same overload situation in refers to one or more congestion marks that belong to the same
the network (usually during one RTT). A TCP segment with the overload situation in the network (usually during one RTT). A TCP
acknowledgment flag set is simply called ACK. segment with the acknowledgment flag set is simply called ACK.
2. Recap of Classic ECN and ECN Nonce in IP/TCP 2. Recap of Classic ECN and ECN Nonce in IP/TCP
ECN requires two bits in the IP header. The ECN capability of a ECN requires two bits in the IP header. The ECN capability of a
packet is indicated when either one of the two bits is set. An ECN packet is indicated when either one of the two bits is set. A
sender can set one or the other bit to indicate an ECN-capable
transport (ECT) which results in two signals, ECT(0) and ECT(1). A
network node can set both bits simultaneously when it experiences network node can set both bits simultaneously when it experiences
congestion. When both bits are set the packet is regarded as congestion. This leads to the four codepoints (Not-ECT, ECT(0),
"Congestion Experienced" (CE). ECT(1), and CE) as listed above.
In the TCP header the first two bits in byte 14 are defined as ECN In the TCP header the first two bits in byte 14 are defined as ECN
feedback for each half-connection. A TCP receiver signals the feedback for each half-connection. A TCP receiver signals the
reception of a congestion mark using the ECN-Echo (ECE) flag in the reception of a congestion mark using the ECN-Echo (ECE) flag in the
TCP header. For reliability, the receiver continues to set the ECE TCP header. For reliability, the receiver continues to set the ECE
flag on every ACK. To enable the TCP receiver to determine when to flag on every ACK. To enable the TCP receiver to determine when to
stop setting the ECN-Echo flag, the sender sets the CWR flag upon stop setting the ECN-Echo flag, the sender sets the CWR flag upon
reception of an ECE feedback signal. This always leads to a full RTT reception of an ECE feedback signal. This always leads to a full RTT
of ACKs with ECE set. Thus the receiver cannot signal back any of ACKs with ECE set. Thus the receiver cannot signal back any
additional CE markings arriving within the same RTT. additional CE markings arriving within the same RTT.
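The ECE latch described above can be modelled in a few lines. The following is a minimal Python sketch, not taken from [RFC3168]: the class, method and helper names are hypothetical, and it assumes one ACK per data segment and ignores retransmissions. It shows that one CE mark and three CE marks within the same RTT produce identical ACK streams.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The two-bit ECN field of the IP header ([RFC3168] codepoints)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class ClassicEcnReceiver:
    """Hypothetical model of the classic ECN receiver's ECE latch."""

    def __init__(self) -> None:
        self.ece = False  # value of the ECE flag on the next ACK

    def on_data(self, ip_ecn: ECN, tcp_cwr: bool) -> bool:
        """Process one data segment; return the ECE flag for its ACK."""
        if tcp_cwr:
            # The sender has reacted, so stop echoing the old episode.
            self.ece = False
        if ip_ecn == ECN.CE:
            # Latch ECE on; further CE marks only re-set the same bit.
            self.ece = True
        return self.ece


def ack_stream(codepoints):
    """ECE flags the receiver sends for a sequence of data segments."""
    rx = ClassicEcnReceiver()
    return [rx.on_data(cp, tcp_cwr=False) for cp in codepoints]


one_mark = ack_stream([ECN.ECT_0, ECN.CE, ECN.ECT_0, ECN.ECT_0])
three_marks = ack_stream([ECN.ECT_0, ECN.CE, ECN.CE, ECN.CE])
print(one_mark == three_marks)  # True: the sender cannot tell them apart
```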
The ECN Nonce [RFC3540] is an experimental addition to ECN that the The ECN Nonce [RFC3540] is an experimental addition to ECN that the
TCP sender can use to protect itself against accidental or malicious TCP sender can use to protect itself against accidental or malicious
concealment of marked or dropped packets. This addition defines the concealment of CE-marked (or dropped) packets. This addition defines
last bit of byte 13 in the TCP header as the Nonce Sum (NS) flag. the last bit of byte 13 in the TCP header as the Nonce Sum (NS) flag.
The receiver maintains a nonce sum that counts the occurrence of The receiver maintains a nonce sum that counts the occurrence of
ECT(1) packets, and signals the least significant bit of this sum on ECT(1) packets, and signals the least significant bit of this sum on
the NS flag. the NS flag.
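A minimal Python sketch of the receiver side of the nonce sum described above. The function name and the string encoding of codepoints are hypothetical; a CE mark overwrites the whole ECN field, so the receiver cannot see the original nonce of a CE-marked packet and this sketch does not count it ([RFC3540] handles that case with a resynchronisation procedure that is omitted here).

```python
def ns_flags(codepoints):
    """Return the NS flag the receiver would send after each packet.

    `codepoints` holds the ECN field of each arriving packet as a string:
    "Not-ECT", "ECT(0)", "ECT(1)" or "CE".
    """
    ect1_count = 0
    flags = []
    for cp in codepoints:
        if cp == "ECT(1)":
            ect1_count += 1  # count ECT(1) occurrences
        # A CE mark overwrites the field, so its original nonce is unknown.
        flags.append(ect1_count & 1)  # NS carries the least significant bit
    return flags


print(ns_flags(["ECT(0)", "ECT(1)", "ECT(1)", "ECT(0)", "ECT(1)"]))
# [0, 1, 0, 0, 1]
```

Because the sender chooses ECT(0) or ECT(1) at random, a receiver that hides a CE mark has to guess that packet's contribution to the sum, and guesses wrongly half of the time.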
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | N | C | E | U | A | P | R | S | F | | | | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I |
| | | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 1: The (post-ECN Nonce) definition of the TCP header flags Figure 1: The (post-ECN Nonce) definition of the TCP header flags
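Figure 1 and the byte positions given above translate into the following bit masks. This Python sketch is illustrative only: the function name and the returned dictionary are hypothetical, and "byte 13" and "byte 14" are read as 1-based positions, i.e. offsets 12 and 13 of the TCP header.

```python
def tcp_ecn_flags(tcp_header: bytes) -> dict:
    """Extract the ECN-related flags (plus ACK) from a raw TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    byte13 = tcp_header[12]  # Header Length (4 bits), Reserved (3), NS
    byte14 = tcp_header[13]  # CWR ECE URG ACK PSH RST SYN FIN
    return {
        "NS": bool(byte13 & 0x01),
        "CWR": bool(byte14 & 0x80),
        "ECE": bool(byte14 & 0x40),
        "ACK": bool(byte14 & 0x10),
    }


header = bytearray(20)
header[12] = 0x51  # Header Length = 5 words, NS set
header[13] = 0x50  # ECE and ACK set
print(tcp_ecn_flags(bytes(header)))
# {'NS': True, 'CWR': False, 'ECE': True, 'ACK': True}
```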
However, as ECN is a seperate extension to ECN, even if a sender However, as the ECN Nonce is a separate extension to ECN, even if a
tries to protect itself with the nonce, any receiver wishing to sender tries to protect itself with the ECN Nonce, any receiver
conceal marked or dropped packets only has to pretent not supporting wishing to conceal marked packets only has to pretend not to support
ECN Nonce and simply not provide any nonce feedback. An alternative the ECN Nonce and simply not provide any nonce sum feedback.
for a sender to assure feedback integrity has been proposed where the
sender occasionally inserts a CE mark, reordering or loss itself, and An alternative for a sender to assure feedback integrity has been
checks that the receiver feeds it back faithfully proposed where the sender occasionally inserts a CE mark itself (or
[I-D.moncaster-tcpm-rcv-cheat]. This alternative requires no reordering or loss), and checks that the receiver feeds it back
standardisation and consumes no header bits or codepoints, as well as faithfully [I-D.moncaster-tcpm-rcv-cheat]. This alternative requires
releasing the ECT(1) codepoint in the IP header and the NS flag in no standardization and consumes no header bits or codepoints, as well
as releasing the ECT(1) codepoint in the IP header and the NS flag in
the TCP header for other uses. the TCP header for other uses.
3. Use Cases 3. Use Cases
ConEx is an experimental approach that allows the sender to re-insert ConEx is an experimental approach that allows a sender to relay
the congestion feedback it sees into the forward data path. This is congestion feedback provided by the receiver into the network along
primarily so that any traffic management can be proportionate to the forward data path. ConEx information can be used for traffic
actual congestion caused by traffic, rather than limiting traffic management to limit traffic proportionate to the actual congestion
based on rate or volume in case it might cause congestion [RFC6789]. being caused, rather than limiting traffic based on rate or volume
A ConEx sender uses selective acknowledgements (SACK [RFC2018]) for [RFC6789]. A ConEx sender uses selective acknowledgements (SACK)
fine-grained feedback of loss signals, but currently TCP offers no [RFC2018] for accurate feedback of loss signals, but currently TCP
equivalent fine-grained feedback for ECN. offers no equivalent accurate feedback for ECN.
DCTCP offers very low and predictable queueing delay. DCTCP requires
switches/routers to have ECN enabled and configured with no signal
smoothing, so it is currently only used in private networks, e.g.
internal to data centres. DCTCP was released in Microsoft Windows 8,
and implementations exist for Linux and FreeBSD.
The changes DCTCP makes to TCP are not currently the subject of any DCTCP offers very low and predictable queuing delay. DCTCP changes
IETF standardisation activity. The different DCTCP implementations the reaction to congestion of a TCP sender and additionally requires
alter TCP's ECN feedback protocol [RFC3168] in unspecified switches/routers to have ECN enabled and configured with a low step
proprietary ways, and they either omit capability negotiation, or threshold and no signal smoothing, so it is currently only used in
they use non-interoperable negotiation. A primary motivation for private networks, e.g. internal to data centers. DCTCP was released
this document is to prevent each proprietary implementation from in Microsoft Windows 8, and implementations exist for Linux and
inventing its own handshake, which could lead to _de facto_ FreeBSD. To retrieve sufficient congestion information, the
consumption of the few flags that remain available for standardising different DCTCP implementations use a proprietary ECN feedback
capability negotiation. Also, those variants that use the feedback protocol, but they omit capability negotiation. Moreover, the
protocol proposed in [Ali10] only work if there are no losses at all, feedback protocol proposed in [Ali10] only works if there are no
and otherwise they become confused. losses at all, and otherwise it gets very confused (see Appendix A).
Therefore, if a generic more accurate ECN feedback scheme were
available, it would solve two problems for DCTCP: i) the need for a
consistent variant of DCTCP to be deployed network-wide, and ii) the
inability to cope with ACK loss.
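To illustrate why DCTCP needs the extent of marking rather than a single signal per RTT, the following Python sketch applies the per-window update described in [Ali10]. The sender keeps a moving average alpha of the fraction F of CE-marked bytes, alpha = (1 - g) * alpha + g * F, and on congestion reduces cwnd to cwnd * (1 - alpha / 2). The function name, the gain g = 1/16 and the once-per-window granularity are assumptions of this sketch. It also assumes that marked_bytes is known exactly, which classic ECN feedback cannot provide.

```python
def dctcp_window_update(alpha, cwnd, acked_bytes, marked_bytes, g=1 / 16):
    """Apply one DCTCP-style update at the end of an observation window.

    Returns (new_alpha, new_cwnd). `marked_bytes` is the number of acked
    bytes that carried a CE mark, which requires accurate ECN feedback.
    """
    if acked_bytes <= 0:
        raise ValueError("acked_bytes must be positive")
    fraction = marked_bytes / acked_bytes  # F in [Ali10]
    alpha = (1 - g) * alpha + g * fraction
    if marked_bytes > 0:
        # Reduce in proportion to the extent of congestion, not by half.
        cwnd = cwnd * (1 - alpha / 2)
    return alpha, cwnd


light = dctcp_window_update(0.0, 100_000.0, 100_000, 10_000)
heavy = dctcp_window_update(0.0, 100_000.0, 100_000, 90_000)
```

With classic ECN both windows above would produce the same feedback (ECE set), so the sender could not distinguish light from heavy marking.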
The following scenarios should briefly show where the accurate The following scenarios should briefly show where accurate ECN
feedback is needed or adds value: feedback is needed or adds value:
An RFC5681 TCP sender that supports ConEx: A sender with standardised TCP congestion control that supports
ConEx:
In this case the ConEx mechanism uses the extra information In this case the ConEx mechanism uses the extra information
per RTT to re-echo the precise congestion information, but per RTT to re-echo the precise congestion information, but
the congestion control algorithm still ignores multiple marks the congestion control algorithm still ignores multiple marks
per RTT [RFC5681]. per RTT [RFC5681].
A sender using DCTCP congestion control without ConEx: A sender using DCTCP congestion control without ConEx:
The congestion control algorithm uses the extra info per RTT The congestion control algorithm uses the extra info per RTT
to perform its decrease depending on the number of congestion to perform its decrease depending on the number of congestion
marks. marks.
A sender using DCTCP congestion control and supports ConEx: A sender using DCTCP congestion control and supporting ConEx:
Both the congestion control algorithm and ConEx use the fine- Both the congestion control algorithm and ConEx use the more
grained ECN feedback mechanism. accurate ECN feedback mechanism.
A RFC5681 TCP sender without ConEx: As-yet-unspecified sender mechanisms:
The above are two examples of more general interest in sender
mechanisms that respond to the extent of congestion feedback,
not just its existence. It will greatly simplify incremental
deployment if the sender can unilaterally deploy new
behaviours, and rely on the presence of generic receivers
that have already implemented more accurate feedback.
A RFC5681 TCP sender without ConEx:
No accurate feedback is necessary here. The congestion No accurate feedback is necessary here. The congestion
control algorithm still reacts on only one signal per RTT. control algorithm still reacts to only one signal per RTT.
But it is best to have one generic feedback mechanism, But it is best to feed back all the information the receiver
whether it is used or not. gets, whether the sender uses it or not -- at least as long
as overhead is low or zero.
Using CE for checking integrity: Using CE for checking integrity:
If a more accurate ECN feedback scheme would feed all If a more accurate ECN feedback scheme feeds all occurrences
occurrences of CE marks back, a sender could perform of CE marks back, a sender could perform integrity checking
interigty checking based in the injection of CE marks. by occasionally injecting CE marks itself. Specifically, a
Thereby, a sender will send packets which are sender can send packets which it randomly marks with CE (at
deterministically marked with CE (at a low frequency) and low frequency), then check if feedback is received for these
keep track if feedback is received for these packets. Of packets. The congestion notification feedback for these
course, the congestion notification feedback for these self- self-injected markings would not require a congestion
injected markings, does not cause a congestion control react. control reaction [I-D.moncaster-tcpm-rcv-cheat].
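A minimal Python sketch of the sender-side check described in this last use case, assuming a more accurate scheme that feeds back a count of CE marks per window. The function names are hypothetical, and the check shown (the reported count must cover the self-injected marks) is only a necessary condition, not a complete integrity mechanism.

```python
import random


def choose_injections(seqs, rate, rng):
    """Randomly pick (at low frequency) segments the sender marks CE itself."""
    return {s for s in seqs if rng.random() < rate}


def check_window(injected_count, reported_ce_count):
    """Check one window of feedback; return (plausible, network_marks).

    A receiver that reports fewer CE marks than the sender injected must
    have concealed some. Self-injected marks are subtracted, because they
    do not call for a congestion control reaction.
    """
    if reported_ce_count < injected_count:
        return False, 0
    return True, reported_ce_count - injected_count


rng = random.Random(2014)
injected = choose_injections(range(1000), rate=0.01, rng=rng)
print(check_window(len(injected), len(injected) + 5))  # (True, 5)
print(check_window(3, 0))  # (False, 0): concealment detected
```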
4. Requirements 4. Requirements
The requirements of the accurate ECN feedback protocol, for the use The requirements of the accurate ECN feedback protocol are to have
of e.g. Conex or DCTCP, are to have a fairly accurate (not fairly accurate (not necessarily perfect), timely and protected
necessarily perfect), timely and protected signaling. This leads to signaling. This leads to the following requirements, which MUST be
the following requirements, which should be discussed for any discussed for any proposed more accurate ECN feedback scheme:
proposed more accurate ECN feedback scheme:
Resilience Resilience
The ECN feedback signal is carried within the ACK. TCP ACKs The ECN feedback signal is carried within the ACK. Pure TCP
can get lost. Moreover, delayed ACKs are commonly used with ACKs can get lost without recovery (not just due to
TCP. That means in most cases only every second data packet congestion, but also due to deliberate ACK thinning).
triggers an ACK. In a high congestion situation where most Moreover, delayed ACKs are commonly used with TCP.
of the packets are marked with CE, an accurate feedback Typically, an ACK is triggered after two data segments (or
mechanism must still be able to signal sufficient congestion more, e.g., due to receive segment coalescing, ACK
compression, ACK congestion control [RFC5690] or other
phenomena). In a high congestion situation where most of the
packets are marked with CE, an accurate feedback mechanism
should still be able to signal sufficient congestion
information. Thus the accurate ECN feedback extension has to information. Thus the accurate ECN feedback extension has to
take delayed ACK and ACK loss into account. Also, a more take delayed ACKs and ACK loss into account. Also, a more
accurate feedback protocol would still work if delayed ACKs accurate feedback protocol should still work if delayed ACKs
covered more than two packets. covered more than two packets.
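The difference between echoing a flag and echoing a count matters here. The Python sketch below compares two hypothetical feedback encodings (neither is taken from a specific proposal): a per-ACK flag meaning "CE seen since the previous ACK", and a cumulative counter of CE-marked segments. It assumes an unbounded counter and one ACK per two data segments. With delayed ACKs and two lost ACKs, only the counter still lets the sender recover the true number of marks.

```python
def receiver_acks(ce_marks, ack_every=2):
    """Return (cumulative CE count, CE-since-last-ACK flag) per ACK sent.

    `ce_marks` holds one boolean per arriving data segment.
    """
    acks = []
    ce_total = 0
    ce_since_ack = False
    for i, marked in enumerate(ce_marks, start=1):
        if marked:
            ce_total += 1
            ce_since_ack = True
        if i % ack_every == 0:  # delayed ACK covering `ack_every` segments
            acks.append((ce_total, ce_since_ack))
            ce_since_ack = False
    return acks


def sender_estimates(received_acks):
    """Marks inferred (from the counter, from the flags) by the sender."""
    from_counter = received_acks[-1][0] if received_acks else 0
    from_flags = sum(1 for _, flag in received_acks if flag)
    return from_counter, from_flags


acks = receiver_acks([True, True, False, True, True, False, False, True])
survivors = [acks[0], acks[3]]  # the second and third ACKs are lost
print(sender_estimates(survivors))  # (5, 2): the true number of marks is 5
```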
Timeliness Timeliness
a CE mark can be induced by a network node on the A CE mark can be induced by a network node on the
transmission path and is then echoed by the receiver in the transmission path and is then echoed by the receiver in the
TCP ACK. Thus when this information arrives at the sender, TCP ACK. Thus when this information arrives at the sender,
it is naturally already about one RTT old. With a sufficient it is naturally already about one RTT old. With a sufficient
ACK rate a further delay of a small number of ACKs can be ACK rate a further delay of a small number of packets can be
tolerated. However, this information will become stale with tolerated. However, this information will become stale with
large delays, given the dynamic nature of networks. TCP large delays, given the dynamic nature of networks. TCP
congestion control (which itself partly introduces these congestion control (which itself partly introduces these
dynamics) operates on a time scale of one RTT. Thus, to be dynamics) operates on a time scale of one RTT. Thus, to be
timely, congestion feedback information should be delivered timely, congestion feedback information should be delivered
within about one RTT. within about one RTT.
Integrity Integrity
A more accurate ECN feedback scheme should assure the It should be possible to assure the integrity of the feedback
integrity of the feedback at least as well as the ECN Nonce in a more accurate ECN feedback scheme, at least as well as
or gives strong incentives for the receiver and network nodes the ECN Nonce. Alternatively, it should at least be possible
to give strong incentives for the receiver and network nodes
to cooperate honestly. to cooperate honestly.
Given there are known problems with the ECN nonce (as Given there are known problems with the ECN nonce (as
identified above), this document only requires that the identified above), this document only requires that the
integrity of the more accurate ECN feedback can be assured; integrity of the more accurate ECN feedback can be assured as
it does not require that the ECN Nonce mechanism is employed an inherent part of the new more accurate ECN feedback
to achieve this. Indeed, if integrity could be provided protocol; it does not require that the ECN Nonce mechanism is
else-wise, a more accurate ECN feedback protocol might re-use employed to achieve this. Indeed, if integrity could be
the nonce sum (NS) flag in the TCP header. provided else-wise, a more accurate ECN feedback protocol
might re-purpose the nonce sum (NS) flag in the TCP header.
If the accurate ECN feedback scheme provides sufficient If the more accurate ECN feedback scheme provides sufficient
information, the integrity check could e.g. be performed by information, the integrity check could e.g. be performed by
deterministically setting the CE in the sender and monitoring deterministically setting the CE in the sender and monitoring
the respective feedback (similar to ECT(1) and the ECN Nonce the respective feedback (similar to ECT(1) and the ECN Nonce
sum). If and what kind of enforcements a sender should do, sum). Whether a sender should enforce when it detects wrong
when detecting wrong feedback information, is not part of feedback information, and what kind of enforcement it should
this document. apply, are policy issues that need not be specified as part
of a more accurate ECN feedback scheme.
Accuracy Accuracy
Classic ECN feeds back one congestion notification per RTT, Classic ECN feeds back one congestion notification per RTT,
which is sufficient for classic TCP congestion control which which is sufficient for classic TCP congestion control which
reduces the sending rate at most once per RTT. The more reduces the sending rate at most once per RTT. Thus the more
accurate ECN feedback scheme has to ensure that if a accurate ECN feedback scheme should ensure that, if a
congestion episode occurs, at least one congestion congestion episode occurs, at least one congestion
notification is echoed and received per RTT as classic ECN notification is echoed and received per RTT as classic ECN
would do. Of course, the goal of a more accurate ECN would do. Of course, the goal of a more accurate ECN
extension is to reconstruct the number of CE markings more extension is to reconstruct the number of CE markings more
accurately and in the best case even to reconstruct the exact accurately. In the best case the new scheme should even
number of payload bytes that a CE marked packet was carrying. allow reconstruction of the exact number of payload bytes
However, a sender should not assume to get the exact number that a CE marked packet was carrying. However, it is
of congestion markings or marked bytes in all situations. accepted that it may be too complex for a sender to get the
Moreover, the feedback scheme should preserve the order at exact number of congestion markings or marked bytes in all
which any ECN signal was received. And ideally, it would situations. Ideally, the feedback scheme should preserve the
even be possible for the sender to determine which of the order in which any (of the four) ECN signals were received.
packets (covered by one delayed ACK) were congestion marked, And, ideally, it would even be possible for the sender to
e.g. if the flow consists of packets of different sizes, or determine which of the packets covered by one delayed ACK
to allow for future protocols where the order of the markings were congestion marked, e.g. if the flow consists of packets
may be important. of different sizes, or to allow for future protocols where
the order of the markings may be important.
In fact, the ECN field in the IP header provides four code In the best case, a sender that sees more accurate ECN
points. In the best case, a sender that has the more feedback information would be able to reconstruct the
accurate ECN feedback information, would be able to occurrence of any of the four code points (non-ECT, CE,
reconstruct the occurrence of any of the four code points. ECT(0), ECT(1)). However, assuming the sender marks all data
Assuming the sender marks all data packets will at least as packets as ECN-capable and uses the default setting of
ECN-capable and ETC(0) will be the default setting, feeding ECT(0), solely feeding back the occurrence of CE and ECT(1)
the occurrence of CE and ECT(1) back might be sufficient. might be sufficient. Thus a more accurate ECN feedback
scheme should at least provide information on these two
signals, CE and ECT(1).
If a more accurate ECN scheme can reliably deliver feedback
in most but not all circumstances, ideally the scheme should
at least not introduce bias. In other words, undetected loss
of some ACKs should be as likely to increase as decrease the
sender's estimate of the probability of ECN marking.
Complexity Complexity
The implementation should be as simple as possible and only a Implementation should be as simple as possible and only a
minimum of additional state information should be needed. minimum of additional state information should be needed.
This will enable the more accurate ECN feedback to be used as This will enable more accurate ECN feedback to be used as the
the default feedback mechanism, even if only one ECN feedback default feedback mechanism, even if only one ECN feedback
signal per RTT is needed. Furthermore, the receiver should signal per RTT is needed. Furthermore, the receiver should
not take assumptions about the mechanism that was used to set not make assumptions about the mechanism that was used to set
the markin nor about any interpretation or reaction to the the markings nor about any interpretation or reaction to the
congestion signal. The receiver should only feed the congestion signal. The receiver only needs to faithfully
information back as accurate as possible. reflect congestion information back to the sender.
Overhead Overhead
A more accurate ECN feedback signal should limit the A more accurate ECN feedback signal should limit the
additional network load, because ECN feedback is ultimately additional network load, because ECN feedback is ultimately
not critical information (in the worst case, loss will still not critical information (in the worst case, loss will still
be available as a congestion signal of last resort). As be available as a congestion signal of last resort). As
feedback information has to be provided frequently and in a feedback information has to be provided frequently and in a
timely fashion, potentially all or a large fraction of TCP timely fashion, potentially all or a large fraction of TCP
acknowledgments might carry this information. Ideally, no acknowledgments might carry this information. Ideally, no
additional segments should be exchanged compared to an additional segments should be exchanged compared to an
RFC3168 TCP session, and the overhead in each segment should RFC3168 TCP session, and the overhead in each segment should
be minimised. be minimized.
Backward and forward compatibility Backward and forward compatibility
Given more accurate ECN feedback will involve a change to the Given more accurate ECN feedback will involve a change to the
TCP protocol, it will need to be negotiated between the two TCP protocol, it should be negotiated between the two TCP
TCP endpoints. If either end does not support the more endpoints. If either end does not support the more accurate
accurate feedback, they should both be able to fall-back to feedback, they should both be able to fall-back to classic
classic ECN feedback. ECN feedback.
A more accurate ECN feedback extension should aim to be able A more accurate ECN feedback extension should aim to be able
to traverse most existing middleboxes. Further, a feedback to traverse most existing middleboxes. Further, a feedback
mechanism should provide a method to fall-back to classic mechanism should provide a method to fall-back to classic ECN
RFC3168 signaling if the new signal is suppressed by certain signaling if the new signal is suppressed by certain
middleboxes. middleboxes.
In order to avoid a fork in the TCP protocol specifications, In order to avoid a fork in the TCP protocol specifications,
if experiments with the new ECN feedback protocol are if experiments with the new ECN feedback protocol are
successful, it is intended to eventually update RFC3168 for successful, it is intended to eventually update RFC3168 for
any TCP/ECN sender, not just for ConEx or DCTCP senders. any TCP/ECN sender, not just for ConEx or DCTCP senders.
Therefore, even if only one ECN feedback signal per RTT is Then future senders will be able to unilaterally deploy new
needed, it should be possible to use the more accuarte ECN behaviours that exploit the existence of more accurate ECN
feedback. feedback in receivers (forward compatibility). Conversely,
even if another sender only needs one ECN feedback signal per
RTT, it should be able to use more accurate ECN feedback, and
simply ignore the excess information.
5. Design Approaches 5. Design Approaches
All approaches presented below (and proposed so far) are able to All approaches presented below (and proposed so far) are able to
provide accurate ECN feedback information as long as no ACK loss provide accurate ECN feedback information as long as no ACK loss
occurs and the congestion rate is reasonable. In case of high a high occurs and the congestion rate is reasonable. In case of a high ACK
ACK loss rate or very high congestion (CE marking) rate, the proposed loss rate or very high congestion (CE marking) rate, the proposed
schemes have different resilience characteristics depending on the schemes have different resilience characteristics depending on the
number of used bits for the encoding. While classic ECN provides a number of bits used for the encoding. While classic ECN provides
reliable (inaccurate) feedback of a maximum of one congestion signal reliable (but inaccurate) feedback of a maximum of one congestion
per RTT, the proposed schemes do not implement an explicit signal per RTT, the proposed schemes do not implement an explicit
acknowledgement mechanism. acknowledgement mechanism for the feedback (as e.g. the ECE / CWR
exchange of [RFC3168]).
5.1. Re-use of ECN/NS Header Bits 5.1. Re-Definition of ECN/NS Header Bits
The three ECN/NS header, ECE, CWR and NS are re-used (not only for Schemes in this category can additionally use the NS bit for
additional capability negotiation during the TCP handshake exchange capability negotiation during the TCP handshake exchange. Thus a
but) to signal the current value of an CE counter at the receiver. more accurate ECN feedback could be negotiated without changing the classic
This approach only provides a limited resilience against ACK lost ECN negotiation, thus remaining backwards compatible.
depending of the number of used bits.
Several codings have been proposed so far: Schemes in this category can simply re-define the ECN header flags,
ECE and CWR, to encode the occurrence of a CE marking at the
receiver. This approach provides very limited resilience against
loss of ACKs, particularly pure ACKs (no payload and therefore
delivered unreliably).
o A one bit scheme sends one ECE for each CE received (to increase A couple of schemes have been proposed so far:
the robustness against ACK loss CWR could be used to introduce
redundant information on the next ACK); o A naive one-bit scheme that sends one ECE for each CE received
could use CWR to increase robustness against ACK loss by
introducing redundant information on the next ACK, but this is
still highly vulnerable to ACK loss.
o The scheme defined for DCTCP [Ali10], which toggles the ECE
feedback on an immediate ACK whenever the CE marking changes, and
otherwise feeds back delayed ACKs with the ECE value unchanged.
Appendix A demonstrates that this scheme is still highly ambiguous
to the sender if the ACKs are pure ACKs, and if some may have been
lost.
Alternatively, the receiver uses the three ECN/NS header flags, ECE,
CWR and NS to represent a counter that signals the accumulated number
of CE markings it has received. Resilience against loss is better
than the flag-based schemes, but still not ideal.
A couple of coding schemes have been proposed so far in this
category:
o A 3-bit counter scheme continuously feeds back the three least o A 3-bit counter scheme continuously feeds back the three least
significant bits of a CE counter; significant bits of a CE counter;
o A 3-bit codepoint scheme encodes either a CE counter or an ECT(1) o A scheme that defines a standardized lookup table to map the 8
counter in 8 codepoints. codepoints onto either a CE counter or an ECT(1) counter.
The proposed schemes provide accumulated information on ECN-CE- These proposed schemes provide accumulated information on ECN-CE
marking feedback, similar to the number of acknowledged bytes in the marking feedback, similar to the number of acknowledged bytes in the
TCP header. Due to the limited number of bits the ECN feedback TCP header. Due to the limited number of bits the ECN feedback
information will wrap much more often than the acknowledgement field. information will wrap much more often than the acknowledgement field.
Thus feedback information could be lost due to a relatively small Thus feedback information could be lost due to a relatively small
sequence of pure-ACK losses. Resilience could be increased by sequence of pure-ACK losses. Resilience could be increased by
introducing redundancy, e.g. send each counter increase two or more introducing redundancy, e.g. send each counter increase two or more
times. Of course any of these additional mechanisms will increasee times. Of course any of these additional mechanisms will increase
the complexity. If the congestion rate is larger that the ACK rate the complexity. If the congestion rate is greater than the ACK rate
(multiplied by the number of congestion marks that can be signaled (multiplied by the number of congestion marks that can be signaled
per ACK), the congestion information cannot correctly fed back. Thus per ACK), the congestion information cannot correctly be fed back.
an accurate ECN feedback mechanism needs to be able to also cover the Covering the worst case where every packet is CE marked can
worst case situation where every packet is CE marked. This can
potentially be realized by dynamically adapting the ACK rate and potentially be realized by dynamically adapting the ACK rate and
redundancy, which again increases complexity and perhaps the redundancy. This again increases complexity and perhaps the
signaling overhead as well. Scheme that do not re-use the ECN NS signaling overhead as well. Schemes that do not re-purpose the ECN
bit, can still support ECN Nonce. NS bit could still support the ECN Nonce.
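The wrap-around problem of the counter schemes can be illustrated with a minimal Python sketch. It assumes that the receiver echoes the three least significant bits of a cumulative CE counter on every ACK, and that the sender sums the modulo-8 differences between successive ACKs it actually receives. The function names and the simulation are hypothetical and do not correspond to the wire format of any proposed scheme.

```python
# Hypothetical sketch: 3-bit wrapping CE counter, not a specified wire format.
COUNTER_BITS = 3
MODULUS = 1 << COUNTER_BITS  # 8 distinct values before the counter wraps


def receiver_field(ce_count: int) -> int:
    """Value a receiver would echo: the least significant bits of its CE counter."""
    return ce_count % MODULUS


def sender_delta(prev_field: int, new_field: int) -> int:
    """New CE marks inferred between two received ACKs.

    Correct only if fewer than MODULUS marks arrived in between; otherwise
    whole multiples of MODULUS are silently lost.
    """
    return (new_field - prev_field) % MODULUS


def simulate(ce_counts_on_acks, delivered):
    """Compare the receiver's true CE count with the sender's inference.

    ce_counts_on_acks: receiver's cumulative CE count when each ACK was sent.
    delivered: whether each ACK reached the sender.
    Returns (true count at the last delivered ACK, count inferred by sender).
    """
    inferred = 0
    true_total = 0
    last_field = receiver_field(0)  # both ends start from a CE count of 0
    for count, ok in zip(ce_counts_on_acks, delivered):
        if not ok:
            continue  # a lost pure ACK is never retransmitted
        field = receiver_field(count)
        inferred += sender_delta(last_field, field)
        last_field = field
        true_total = count
    return true_total, inferred


print(simulate([1, 3, 6, 7], [True] * 4))                   # (7, 7)
print(simulate([1, 2, 3, 11], [True, False, False, True]))  # (11, 3)
```

In the second run, two consecutive pure ACKs are lost while the counter advances by 10. The sender sees an apparent advance of only 2, so 8 CE marks (one full wrap) go undetected.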
5.2. Using Other Header Bits 5.2. Using Other Header Bits
As seen in Figure 1, there are currently three unused flag bits in As seen in Figure 1, there are currently three unused flags in the
the TCP header. The proposed 3 bit or codepoint schemes could be TCP header. The proposed 3-bit counter or codepoint schemes could be
extended by one or more bits, to add higher resilience against ACK extended by one or more bits to add higher resilience against ACK
loss. The relative gain would be proportionally higher resilience loss. The relative gain would be exponentially higher resilience
against ACK loss, while the respective drawbacks would remain against ACK loss, while the respective drawbacks would remain
identical. identical.
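The claim that each additional counter bit yields exponentially higher resilience can be checked with a small calculation. The following is a minimal sketch under stated assumptions: a wrapping CE counter as described in Section 5.1, the worst case in which every data packet is CE marked, and a fixed delayed-ACK factor, here called `packets_per_ack` (a hypothetical parameter). It ignores any redundancy that a scheme might add.

```python
def max_consecutive_ack_losses(counter_bits: int, packets_per_ack: int) -> int:
    """Worst-case tolerance of a wrapping CE counter to consecutive ACK loss.

    Assumes every data packet is CE marked and each ACK covers
    packets_per_ack packets. With k lost ACKs between two delivered ones,
    the counter advances by (k + 1) * packets_per_ack, which the sender
    can only decode unambiguously if it is below 2**counter_bits.
    Returns -1 if even a single ACK interval overflows the counter.
    """
    max_unambiguous_advance = (1 << counter_bits) - 1
    return max_unambiguous_advance // packets_per_ack - 1


for bits in (3, 4, 5, 6):
    print(bits, max_consecutive_ack_losses(bits, packets_per_ack=2))
# 3 2
# 4 6
# 5 14
# 6 30
```

With one ACK per two packets, each extra bit roughly doubles the number of consecutive lost ACKs that can be tolerated, while the per-ACK overhead grows by only one bit.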
Alternatively, the receiver could use bits in the Urgent Pointer Alternatively, the receiver could use bits in the Urgent Pointer
field to signal more bits of its congestion signal counter, but only field to signal more bits of its congestion signal counter, but only
whenever it does not set the Urgent Flag. As this is often the case, whenever it does not set the Urgent Flag. As this is often the case,
resilience could be increased without additional header overhead. resilience could be increased without additional header overhead.
Any proposal to use such bits would need to check the likelihood that Any proposal to use such bits would need to check the likelihood that
some middleboxes might discard or 'normalise' the currently unused some middleboxes might discard or 'normalize' the currently unused
flag bits or a non-zero Urgent Pointer when the Urgent Flag is flag bits or a non-zero Urgent Pointer when the Urgent Flag is
cleared. cleared.
5.3. Using a TCP Option 5.3. Using a TCP Option
Alternatively, a new TCP option could be introduced, to help Alternatively, a new TCP option could be introduced, to help maintain
maintaining the accuracy and integrity of the ECN feedback between the accuracy and integrity of ECN feedback between receiver and
receiver and sender. Such an option could provide higher resilience sender. Such an option could provide higher resilience and even more
and even more information. E.g. ECN for RTP/UDP provides explicit information. E.g. ECN for RTP/UDP [RFC6679] explicitly provides the
the number of ECT(0), ECT(1), CE, non-ECT marked and lost packets. number of ECT(0), ECT(1), CE, non-ECT marked and lost packets, and
However, deploying new TCP options has its own challenges. Moreover, SCTP counts the number of ECN marks [I-D.stewart-tsvwg-sctpecn]
to actually achieve a high resilience, this option would need to be between CWR chunks. However, deploying new TCP options has its own
carried by either all or a large number ACKs. Thus this approach challenges. Moreover, to actually achieve high resilience, this
would introduce considerable signaling overhead while ECN feedback is option would need to be carried by most or all ACKs. Thus this
not such a critical information (as in the worst case, loss will approach would introduce considerable signaling overhead even though
still be available to provide a strong congestion feedback signal). ECN feedback is not extremely critical information (in the worst
Anyway, such a TCP option could also be used in addition to a more case, loss will still be available to provide a strong congestion
accurate ECN feedback scheme in the TCP header or in addition to feedback signal). In any case, such a TCP option could be used in
classic ECN, only when available and needed. addition to a more accurate ECN feedback scheme in the TCP header or
in addition to classic ECN, only when needed and when space is
available.
6. Acknowledgements 6. Acknowledgements
Thanks to Bob Briscoe for reviewing and providing valuable additons Thanks to Gorry Fairhurst for ideas on CE-based integrity checking
on DCTCP and ConEx. Moreover, thanks to Gorry Fairhurst as well as and to Mohammad Alizadeh for suggesting the need to avoid bias.
Bob Briscoe for ideas on CE-based interigty checking. Moreover, thanks to Michael Welzl and Michael Scharf for their
feedback.
7. IANA Considerations 7. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
8. Security Considerations 8. Security Considerations
Given ECN feedback is used as input for congestion control, the Given ECN feedback is used as input for congestion control, the
respective algorithm would not react appropriately if ECN feedback respective algorithm would not react appropriately if ECN feedback
were lost and the resilience mechanism to recover it was inadequate. were lost and the resilience mechanism to recover it was inadequate.
skipping to change at page 12, line 8 skipping to change at page 13, line 23
of Explicit Congestion Notification (ECN) to IP", RFC of Explicit Congestion Notification (ECN) to IP", RFC
3168, September 2001. 3168, September 2001.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces", RFC Congestion Notification (ECN) Signaling with Nonces", RFC
3540, June 2003. 3540, June 2003.
9.2. Informative References 9.2. Informative References
[Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, [Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,
P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP: P., Prabhakar, B., Sengupta, S., and M. Sridharan, "Data
Efficient Packet Transport for the Commoditized Data Center TCP (DCTCP)", ACM SIGCOMM CCR 40(4):63-74, October
Center", Jan 2010. 2010, <http://portal.acm.org/citation.cfm?id=1851192>.
[I-D.briscoe-tsvwg-re-ecn-tcp]
Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
"Re-ECN: Adding Accountability for Causing Congestion to
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in
progress), October 2010.
[I-D.kuehlewind-tcpm-accurate-ecn-option]
Kuehlewind, M. and R. Scheffenegger, "Accurate ECN
Feedback Option in TCP", draft-kuehlewind-tcpm-accurate-
ecn-option-01 (work in progress), July 2012.
[I-D.moncaster-tcpm-rcv-cheat] [I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to
Allow Senders to Identify Receiver Non-Compliance", draft- Allow Senders to Identify Receiver Non-Compliance", draft-
moncaster-tcpm-rcv-cheat-02 (work in progress), November moncaster-tcpm-rcv-cheat-02 (work in progress), November
2007. 2007.
[I-D.stewart-tsvwg-sctpecn]
Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream
Control Transmission Protocol (SCTP)", draft-stewart-
tsvwg-sctpecn-05 (work in progress), January 2014.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996. Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, June
2009.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, September 2009.
[RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
Acknowledgement Congestion Control to TCP", RFC 5690, Acknowledgement Congestion Control to TCP", RFC 5690,
February 2010. February 2010.
[RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
and K. Carlberg, "Explicit Congestion Notification (ECN)
for RTP over UDP", RFC 6679, August 2012.
[RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion [RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion
Exposure (ConEx) Concepts and Use Cases", RFC 6789, Exposure (ConEx) Concepts and Use Cases", RFC 6789,
December 2012. December 2012.
Appendix A. Ambiguity of the More Accurate ECN Feedback in DCTCP
As defined in [Ali10], a DCTCP receiver feeds back ECE=0 on delayed
ACKs as long as CE remains 0, and also immediately sends an ACK with
ECE=0 when CE transitions to 1. Similarly, it continually feeds back
ECE=1 on delayed ACKs while CE remains 1 and immediately feeds back
ECE=1 when CE transitions to 0. A sender can unambiguously decode
this scheme if there is never any ACK loss, and the sender assumes
there will never be any ACK loss.
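The receiver behaviour described above can be modelled as a two-state machine. The following is a minimal Python sketch of that description, not the reference DCTCP implementation. It assumes a fixed delayed-ACK factor `ack_every` (a hypothetical parameter) and does not model the delayed-ACK timer, so any trailing unacknowledged packets are simply left pending.

```python
def dctcp_receiver_acks(ce_marks, ack_every=2):
    """Return the ECE values on the ACKs a DCTCP-style receiver would send.

    ce_marks: 0/1 CE values of arriving data packets, in order.
    ack_every: delayed-ACK factor (1 means an ACK for every packet).
    The delayed-ACK timer is not modelled; trailing packets stay unacked.
    """
    acks = []
    ce_state = 0  # current CE state, echoed as ECE
    pending = 0   # packets received but not yet acknowledged
    for ce in ce_marks:
        if ce != ce_state:
            if pending:
                # CE state changed: immediately ACK the earlier packets,
                # echoing the old state for them.
                acks.append(ce_state)
                pending = 0
            ce_state = ce
        pending += 1
        if pending == ack_every:
            acks.append(ce_state)  # ordinary delayed ACK echoing current state
            pending = 0
    return acks


print(dctcp_receiver_acks([0, 0, 0, 1, 1, 0]))               # [0, 0, 1]
print(dctcp_receiver_acks([0, 1, 1, 0], ack_every=1))        # [0, 1, 1, 0]
```

The number of ACKs carrying each ECE value depends on where the CE transitions fall relative to the delayed-ACK boundaries. This is why the sender needs to know the ACK spacing, and needs to know that no ACK was lost, in order to decode the sequence.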
The following two examples show that the feedback sequence becomes
highly ambiguous to the sender, if either of these conditions is
broken. Below, '0' will represent ECE=0, '1' will represent ECE=1
and '.' will represent a gap of one segment between delayed ACKs.
Now imagine that the sender receives the following sequence of
feedback on 3 pure ACKs:
0.0.0
When the receiver sent this sequence it could have been any of the
following four sequences:
a. 0.0.0 (0 x CE)
b. 010.0 (1 x CE)
c. 0.010 (1 x CE)
d. 01010 (2 x CE)
where any of the 1s represent a possible pure ACK carrying ECE
feedback that could have been lost. If the sender guesses (a), it
might be correct, or it might miss 1 or 2 congestion marks over 5
packets. Therefore, when confronted with this simple sequence (that
is not contrived), a sender can guess that congestion might have been
0%, 20% or 40%, but it does not know which.
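The enumeration of cases (a) to (d) can be reproduced mechanically. The following minimal Python sketch applies only the simplifying rule used in this example: each '.' (a single-segment gap) is either a genuine delayed-ACK gap or a lost pure ACK that carried ECE=1, and each character stands for one packet. The function names are hypothetical. The sketch is not a general DCTCP decoder and is not claimed to be correct for longer gaps.

```python
from itertools import product


def candidate_sent_sequences(received: str) -> list[str]:
    """Sequences the receiver might have sent, given what the sender saw.

    Simplifying rule (single-segment gaps only): each '.' is either a real
    gap or a lost pure ACK carrying ECE=1, i.e. one hidden CE mark.
    """
    gaps = [i for i, ch in enumerate(received) if ch == "."]
    candidates = []
    for choice in product(".1", repeat=len(gaps)):
        seq = list(received)
        for pos, ch in zip(gaps, choice):
            seq[pos] = ch
        candidates.append("".join(seq))
    return candidates


def marking_fractions(received: str) -> list[float]:
    """Possible CE marking fractions (one character per packet), ascending."""
    return sorted({c.count("1") / len(c) for c in candidate_sent_sequences(received)})


print(candidate_sent_sequences("0.0.0"))  # ['0.0.0', '0.010', '010.0', '01010']
print(marking_fractions("0.0.0"))         # [0.0, 0.2, 0.4]
```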
Sequences with a longer gap (e.g. 0...0.0) become far more ambiguous.
It helps a little if the sender knows the distance the receiver uses
between delayed ACKs, and it helps a lot if the distance is 1, i.e.
no delayed ACKs, but even then there will still be ambiguity whenever
there are pure ACK losses.
Authors' Addresses Authors' Addresses
Mirja Kuehlewind (editor) Mirja Kuehlewind (editor)
University of Stuttgart University of Stuttgart
Pfaffenwaldring 47 Pfaffenwaldring 47
Stuttgart 70569 Stuttgart 70569
Germany Germany
Email: mirja.kuehlewind@ikr.uni-stuttgart.de Email: mirja.kuehlewind@ikr.uni-stuttgart.de
Richard Scheffenegger Richard Scheffenegger
NetApp, Inc. NetApp, Inc.
Am Euro Platz 2 Am Euro Platz 2
Vienna 1120 Vienna 1120
Austria Austria
Phone: +43 1 3676811 3146 Phone: +43 1 3676811 3146
Email: rs@netapp.com Email: rs@netapp.com
Bob Briscoe
BT
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Phone: +44 1473 645196
Email: bob.briscoe@bt.com
URI: http://bobbriscoe.net/
End of changes. 70 change blocks.
244 lines changed or deleted 352 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/