--- 1/draft-ietf-tcpm-ecnsyn-08.txt 2009-05-13 22:12:08.000000000 +0200 +++ 2/draft-ietf-tcpm-ecnsyn-09.txt 2009-05-13 22:12:08.000000000 +0200 @@ -1,20 +1,20 @@ Internet Engineering Task Force A. Kuzmanovic INTERNET-DRAFT A. Mondal Intended status: Experimental Northwestern University -Expires: 2 October 2009 S. Floyd +Expires: 13 November 2009 S. Floyd ICSI K.K. Ramakrishnan AT&T Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets - draft-ietf-tcpm-ecnsyn-08.txt + draft-ietf-tcpm-ecnsyn-09.txt Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow @@ -35,21 +35,21 @@ and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. - This Internet-Draft will expire on 2 October 2009. + This Internet-Draft will expire on 13 November 2009. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights @@ -57,48 +57,48 @@ Abstract The proposal in this document is experimental. 
While it may be deployed in the current Internet, it does not represent a consensus that this is the best possible mechanism for the use of ECN in TCP SYN/ACK packets. This draft describes an optional, experimental modification to RFC 3168 to allow TCP SYN/ACK packets to be ECN-Capable. For TCP, RFC - 3168 only specifies setting an ECN-Capable codepoint on data packets, - and not on SYN and SYN/ACK packets. However, because of the high - cost to the TCP transfer of having a SYN/ACK packet dropped, with the - resulting retransmit timeout, this document describes the use of ECN - for the SYN/ACK packet itself, when sent in response to a SYN packet - with the two ECN flags set in the TCP header, indicating a + 3168 specifies setting an ECN-Capable codepoint on data packets, but + not on SYN and SYN/ACK packets. However, because of the high cost to + the TCP transfer of having a SYN/ACK packet dropped, with the + resulting retransmission timeout, this document describes the use of + ECN for the SYN/ACK packet itself, when sent in response to a SYN + packet with the two ECN flags set in the TCP header, indicating a willingness to use ECN. Setting the initial TCP SYN/ACK packet as ECN-Capable can be of great benefit to the TCP connection, avoiding - the severe penalty of a retransmit timeout for a connection that has - not yet started placing a load on the network. The TCP responder + the severe penalty of a retransmission timeout for a connection that + has not yet started placing a load on the network. The TCP responder (the sender of the SYN/ACK packet) must reply to a report of an ECN- marked SYN/ACK packet by resending a SYN/ACK packet that is not ECN- Capable. If the resent SYN/ACK packet is acknowledged, then the TCP responder reduces its initial congestion window from two, three, or four segments to one segment, thereby reducing the subsequent load from that connection on the network. 
If instead the SYN/ACK packet is dropped, or for some other reason the TCP responder does not receive an acknowledgement in the specified time, the TCP responder follows TCP standards for a dropped SYN/ACK packet (setting the - retransmit timer). + retransmission timer). Table of Contents 1. Introduction ....................................................6 - 2. Conventions and Terminology .....................................7 + 2. Conventions and Terminology .....................................8 3. Specification ...................................................8 - 3.1. SYN/ACK Packets Dropped in the Network .....................8 - 3.2. SYN/ACK Packets ECN-Marked in the Network ..................9 + 3.1. SYN/ACK Packets Dropped in the Network .....................9 + 3.2. SYN/ACK Packets ECN-Marked in the Network .................10 3.3. Management Interface ......................................12 4. Discussion .....................................................13 4.1. Flooding Attacks ..........................................13 4.2. The TCP SYN Packet ........................................13 4.3. SYN/ACK Packets and Packet Size ...........................14 4.4. Response to ECN-marking of SYN/ACK Packets ................14 5. Related Work ...................................................16 6. Performance Evaluation .........................................17 6.1. The Costs and Benefit of Adding ECN-Capability ............17 6.2. An Evaluation of Different Responses to ECN-Marked SYN/ACK @@ -110,31 +110,47 @@ 9. Acknowledgements ...............................................21 A. Report on Simulations ..........................................21 A.1. Simulations with RED in Packet Mode .......................22 A.2. Simulations with RED in Byte Mode .........................26 B. 
Issues of Incremental Deployment ...............................28 Informative References ............................................31 IANA Considerations ...............................................32 NOTE TO RFC EDITOR: PLEASE DELETE THIS NOTE UPON PUBLICATION. + Changes from draft-ietf-tcpm-ecnsyn-08: + + * Minor editing and bug-fixes. Feedback from Anil Agarwal and + Alfred Hoenes. + + * Changed the specification so that after the first SYN/ACK packet + is ECN-marked, and the responder receives an ECN-Echo, the + responder does not set the CWR flag in the second SYN/ACK packet. + We also specified that on receiving the non-ECN-marked SYN/ACK + packet, the TCP initiator clears the ECN-Echo flag on replying + packets. Feedback from Anil Agarwal. + + * Changed it so that the initiator moves from the "SYN-Sent" state + to the "Established" state when it receives a SYN/ACK packet + that is not ECN-marked. + Changes from draft-ietf-tcpm-ecnsyn-07: * Updated boilerplates. * Changed proposed status from "Proposed Standard" to "Experimental", and modified text in the Introduction to match. The added text in the introduction is based on similar text in the Introduction of RFC 3649. - * Specified that with ECN+/TryOnce, the originator resets the - retransmit timer when it receives an ECN-marked SYN/ACK. + * Specified that with ECN+/TryOnce, the originator restarts the + retransmission timer when it receives an ECN-marked SYN/ACK. Also reran simulations for ECN+/TryOnce, and updated Tables 1-6. * Specified that the originator follows the traditional rules in setting the cumulative ack field for the ACK acking the SYN/ACK. * Minor editing. Changes from draft-ietf-tcpm-ecnsyn-06: @@ -264,112 +280,113 @@ the router detects congestion before buffer overflow, the router can provide a congestion indication either by dropping a packet, or by setting the Congestion Experienced (CE) codepoint in the Explicit Congestion Notification (ECN) field in the IP header [RFC3168]. 
The IETF has standardized the use of the Congestion Experienced (CE) codepoint in the IP header for routers to indicate congestion. For incremental deployment and backwards compatibility, the RFC on ECN [RFC3168] specifies that routers may mark ECN-capable packets that would otherwise have been dropped, using the Congestion Experienced codepoint in the ECN field. The use of ECN allows TCP to react to - congestion while avoiding unnecessary retransmit timeouts. Thus, + congestion while avoiding unnecessary retransmission timeouts. Thus, using ECN has several benefits: 1) For short transfers, a TCP connection's congestion window may be small. For example, if the current window contains only one packet, - and that packet is dropped, TCP will have to wait for a retransmit - timeout to recover, reducing its overall throughput. Similarly, if - the current window contains only a few packets and one of those - packets is dropped, there might not be enough duplicate + and that packet is dropped, TCP will have to wait for a + retransmission timeout to recover, reducing its overall throughput. + Similarly, if the current window contains only a few packets and one + of those packets is dropped, there might not be enough duplicate acknowledgements for a fast retransmission, and the sender of the data packet might have to wait for a delay of several round-trip times using Limited Transmit [RFC3042]. With the use of ECN, short flows are less likely to have packets dropped, sometimes avoiding - unnecessary delays or costly retransmit timeouts. + unnecessary delays or costly retransmission timeouts. 2) While longer flows may not see substantially improved throughput with the use of ECN, they may experience lower loss. This may benefit TCP applications that are latency- and loss-sensitive, because of the avoidance of retransmissions. - RFC 3168 only specifies marking the Congestion Experienced codepoint - on TCP's data packets, and not on SYN and SYN/ACK packets. 
RFC 3168 - specifies the negotiation of the use of ECN between the two TCP end- - points in the TCP SYN and SYN-ACK exchange, using flags in the TCP - header. Erring on the side of being conservative, RFC 3168 does not - specify the use of ECN for the first SYN/ACK packet itself. However, - because of the high cost to the TCP transfer of having a SYN/ACK - packet dropped, with the resulting retransmit timeout, this document - specifies the use of ECN for the SYN/ACK packet itself. This can be - of great benefit to the TCP connection, avoiding the severe penalty - of a retransmit timeout for a connection that has not yet started - placing a load on the network. The sender of the SYN/ACK packet must - respond to a report of an ECN-marked SYN/ACK packet (a router with - the CE codepoint set in the ECN field in the IP header) by sending a - non-ECN-Capable SYN/ACK packet, and by reducing its initial - congestion window from two, three, or four segments to one segment, - reducing the subsequent load from that connection on the network. + RFC 3168 [RFC3168] specifies setting the ECN-Capable codepoint on TCP + data packets, but not on TCP SYN and SYN/ACK packets. RFC 3168 + [RFC3168] specifies the negotiation of the use of ECN between the two + TCP end-points in the TCP SYN and SYN-ACK exchange, using flags in + the TCP header. Erring on the side of being conservative, RFC 3168 + [RFC3168] does not specify the use of ECN for the first SYN/ACK + packet itself. However, because of the high cost to the TCP transfer + of having a SYN/ACK packet dropped, with the resulting retransmission + timeout, this document specifies the use of ECN for the SYN/ACK + packet itself. This can be of great benefit to the TCP connection, + avoiding the severe penalty of a retransmission timeout for a + connection that has not yet started placing a load on the network. 
+ The sender of the SYN/ACK packet must respond to a report of an ECN- + marked SYN/ACK packet (a SYN/ACK packet with the CE codepoint set in + the ECN field in the IP header) by sending a non-ECN-Capable SYN/ACK + packet, and by reducing its initial congestion window from two, + three, or four segments to one segment, reducing the subsequent load + from that connection on the network. The use of ECN for SYN/ACK packets has the following potential benefits: - 1) Avoidance of a retransmit timeout; + 1) Avoidance of a retransmission timeout; 2) Improvement in the throughput of short connections. - This draft specifies a modification to RFC 3168 to allow TCP SYN/ACK - packets to be ECN-Capable. Section 3 contains the specification of - the change, while Section 4 discusses some of the issues, and Section - 5 discusses related work. Section 6 contains an evaluation of the - specified change. + This draft specifies a modification to RFC 3168 [RFC3168] to allow + TCP SYN/ACK packets to be ECN-Capable. Section 3 contains the + specification of the change, while Section 4 discusses some of the + issues, and Section 5 discusses related work. Section 6 contains an + evaluation of the specified change. 2. Conventions and Terminology - We use the following terminology from RFC 3168: + We use the following terminology from RFC 3168 [RFC3168]: The ECN field in the IP header: o CE: the Congestion Experienced codepoint; and o ECT: either one of the two ECN-Capable Transport codepoints. The ECN flags in the TCP header: o CWR: the Congestion Window Reduced flag; and o ECE: the ECN-Echo flag. ECN-setup packets: o ECN-setup SYN packet: a SYN packet with the ECE and CWR flags; o ECN-setup SYN-ACK packet: a SYN-ACK packet with ECE but not CWR. In this document we use the terms "initiator" and "responder" to refer to the sender of the SYN packet and of the SYN-ACK packet, respectively. 3. 
Specification - This section specifies the modification to RFC 3168 to allow TCP - SYN/ACK packets to be ECN-Capable. + This section specifies the modification to RFC 3168 [RFC3168] to + allow TCP SYN/ACK packets to be ECN-Capable. - RFC 3168 in Section 6.1.1. states that "A host MUST NOT set ECT on - SYN or SYN-ACK packets." In this section, we specify that a TCP node - may respond to an initial ECN-setup SYN packet by setting ECT in the - responding ECN-setup SYN/ACK packet, indicating to routers that the - SYN/ACK packet is ECN-Capable. This allows a congested router along - the path to mark the packet instead of dropping the packet as an - indication of congestion. + Section 6.1.1 of RFC 3168 [RFC3168] states that "A host MUST NOT set + ECT on SYN or SYN-ACK packets." In this section, we specify that a + TCP node may respond to an initial ECN-setup SYN packet by setting + ECT in the responding ECN-setup SYN/ACK packet, indicating to routers + that the SYN/ACK packet is ECN-Capable. This allows a congested + router along the path to mark the packet instead of dropping the + packet as an indication of congestion. Assume that TCP node A transmits to TCP node B an ECN-setup SYN packet, indicating willingness to use ECN for this connection. As - specified by RFC 3168, if TCP node B is willing to use ECN, node B - responds with an ECN-setup SYN-ACK packet. + specified by RFC 3168 [RFC3168], if TCP node B is willing to use ECN, + node B responds with an ECN-setup SYN-ACK packet. 3.1. SYN/ACK Packets Dropped in the Network Figure 1 shows an interchange with the SYN/ACK packet dropped by a - congested router. Node B waits for a retransmit timeout, and then - retransmits the SYN/ACK packet. + congested router. Node B waits for a retransmission timeout, and + then retransmits the SYN/ACK packet. 
  ---------------------------------------------------------------
  TCP Node A             Router                  TCP Node B
  (initiator)                                    (responder)
  ----------             ------                  ----------

  ECN-setup SYN packet --->
                                 ECN-setup SYN packet --->

                       <--- ECN-setup SYN/ACK, possibly ECT
@@ -381,21 +398,21 @@
                            <--- ECN-setup SYN/ACK, not ECT
  <--- ECN-setup SYN/ACK
  Data/ACK --->
                                   Data/ACK --->
                            <--- Data (one to four segments)
  ---------------------------------------------------------------

  Figure 1: SYN exchange with the SYN/ACK packet dropped.

  If the SYN/ACK packet is dropped in the network, the responder (node
- B) responds by waiting three seconds for the retransmit timer to
+ B) responds by waiting three seconds for the retransmission timer to
  expire [RFC2988]. If a SYN/ACK packet with the ECT codepoint is
  dropped, the responder should resend the SYN/ACK packet without the
  ECN-Capable codepoint. (Although we are not aware of any middleboxes
  that drop SYN/ACK packets that contain an ECN-Capable codepoint in
  the IP header, we have learned to design our protocols defensively in
  this regard [RFC3360].)

  We note that if syn-cookies were used by the responder (node B) in
  the exchange in Figure 1, the responder wouldn't set a timer upon
  transmission of the SYN/ACK packet [SYN-COOK] [RFC4987]. In this
@@ -421,100 +438,103 @@
  ----------             ------                  ----------

  ECN-setup SYN packet --->
                                 ECN-setup SYN packet --->

                              <--- ECN-setup SYN/ACK, ECT
                                             3-second timer set
                       <--- Sets CE on SYN/ACK
  <--- ECN-setup SYN/ACK, CE

- Data/ACK, ECN-Echo --->
-                                Data/ACK, ECN-Echo --->
+ ACK, ECN-Echo --->
+                                ACK, ECN-Echo --->
                            Window reduced to one segment.
-                      <--- ECN-setup SYN/ACK, CWR, not ECT
- <--- ECN-setup SYN/ACK, CWR
+                      <--- ECN-setup SYN/ACK, not ECT
+ <--- ECN-setup SYN/ACK

- Data/ACK --->
-                                Data/ACK --->
-                      <--- Data (one segment only)
+ Data/ACK, ECT --->
+                                Data/ACK, ECT --->
+                      <--- Data, ECT (one segment only)
  ---------------------------------------------------------------

  Figure 2: SYN exchange with the SYN/ACK packet marked.
  ECN+/TryOnce.
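As a reading aid, the responder side of the ECN+/TryOnce exchange in Figure 2 (in the revised form shown on the "+" lines) can be sketched as a small state machine. This is only an illustrative model, not code from the draft or from any TCP stack: the class `Responder`, its method names, the packet dictionaries, and the default initial window of three segments are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    SYN_RECEIVED = auto()
    ESTABLISHED = auto()


@dataclass
class Responder:
    """Hypothetical model of node B under ECN+/TryOnce (not a real TCP stack)."""
    configured_initial_window: int = 3   # segments; assumed value in the 2..4 range
    state: State = State.SYN_RECEIVED
    initial_window: int = 0
    sent: list = field(default_factory=list)  # log of SYN/ACK packets sent

    def __post_init__(self):
        self.initial_window = self.configured_initial_window
        # First SYN/ACK: ECN-setup (ECE set, CWR clear) and ECN-Capable (ECT).
        self._send_syn_ack(ect=True)

    def _send_syn_ack(self, ect):
        # Every SYN/ACK reuses the same sequence number; in a real stack the
        # retransmission timer would be (re)armed here.
        self.sent.append({"flags": {"SYN", "ACK", "ECE"}, "ect": ect})

    def on_timeout(self):
        """Retransmission timer expired: the SYN/ACK is presumed dropped."""
        if self.state is State.SYN_RECEIVED:
            self._send_syn_ack(ect=False)

    def on_ack(self, ecn_echo):
        """ACK from the initiator; ecn_echo reports a CE-marked SYN/ACK."""
        if self.state is not State.SYN_RECEIVED:
            return
        if ecn_echo:
            # Marked SYN/ACK: shrink the initial window and try once more
            # with a SYN/ACK that is not ECN-Capable and carries no CWR flag.
            self.initial_window = 1
            self._send_syn_ack(ect=False)
        else:
            self.state = State.ESTABLISHED
```

Under these assumptions, calling `on_ack(ecn_echo=True)` followed by `on_ack(ecn_echo=False)` walks through Figure 2, and inserting an `on_timeout()` call between them corresponds to the case where the second SYN/ACK packet is dropped.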
If the initiator (node A) receives a SYN/ACK packet that has been ECN-marked by the congested router, with the CE codepoint set, the - initiator resets the retransmit timer. The initiator responds to the - ECN-marked SYN/ACK packet by setting the ECN-Echo flag in the TCP - header of the responding ACK packet. The initiator uses the standard - rules in setting the cumulative acknowledgement field in the + initiator restarts the retransmission timer. The initiator responds + to the ECN-marked SYN/ACK packet by setting the ECN-Echo flag in the + TCP header of the responding ACK packet. The initiator uses the + standard rules in setting the cumulative acknowledgement field in the responding ACK packet. - However, with ECN+/TryOnce the initiator does not advance from the - "SYN-Sent" to the "SYN-Received" state until it receives a SYN/ACK - packet that is not ECN-marked. As specified in RFC 3168, the - initiator continues to set the ECN-Echo flag in packets until it - receives a packet with the CWR flag set. + The initiator does not advance from the "SYN-Sent" to the + "Established" state until it receives a SYN/ACK packet that is not + ECN-marked. When the responder (node B) receives the ECN-Echo packet reporting the Congestion Experienced indication in the SYN/ACK packet, the responder sets the initial congestion window to one segment, instead of two segments as allowed by [RFC2581], or three or four segments - allowed by [RFC3390]. With ECN+/TryOnce, illustrated in Figure 2, if - the responder (node B) receives an ECN-Echo packet informing it of a - Congestion Experienced indication on its SYN/ACK packet, the - responder sends a SYN/ACK packet that is not ECN-Capable, in addition - to setting the initial window to one segment. + allowed by [RFC3390]. 
As illustrated in Figure 2, if the responder + (node B) receives an ECN-Echo packet informing it of a Congestion + Experienced indication on its SYN/ACK packet, the responder sends a + SYN/ACK packet that is not ECN-Capable, in addition to setting the + initial window to one segment. The responder does not advance the + send sequence number. The responder also sets the retransmission + timer. The responder follows RFC 2988 [RFC2988] in setting the RTO + (retransmission timeout). - We note that the mechanism in this document differs from RFC 3168, - which specifies that "the sending TCP MUST reset the retransmit timer - on receiving the ECN-Echo packet when the congestion window is one." - In contrast, this document describes the response of a TCP host to - receiving an ECN-Echo packet acknowledging the receipt of an ECN- - Capable SYN/ACK packet. + The TCP hosts follow the standard specification for the response to + duplicate SYN/ACK packets (e.g., Section 3.4 of RFC 793 [RFC793]). - RFC 3168 specifies that in response to an ECN-Echo packet, the TCP - responder also sets the CWR flag in the TCP header of the next data - packet sent, to acknowledge its receipt of and reaction to the ECN- - Echo flag. In contrast, this document describes that in response to - an ECN-Echo packet acknowledging the receipt of an ECN-Capable - SYN/ACK packet, the responder sets the CWR flag in the TCP header of - the non-ECN-Capable SYN/ACK packet. + We note that the mechanism in this document differs from RFC 3168 + [RFC3168], which specifies that "the sending TCP MUST restart the + retransmission timer on receiving the ECN-Echo packet when the + congestion window is one." RFC 3168 [RFC3168] does not allow SYN/ACK + packets to be ECN-Capable. RFC 3168 [RFC3168] specifies that in + response to an ECN-Echo packet, the TCP responder also sets the CWR + flag in the TCP header of the next data packet sent, to acknowledge + its receipt of and reaction to the ECN-Echo flag. 
In contrast, in
+ response to an ECN-Echo packet acknowledging the receipt of an ECN-
+ Capable SYN/ACK packet, the TCP responder doesn't set the CWR flag,
+ but simply sends a SYN/ACK packet that is not ECN-Capable. On
+ receiving the non-ECN-Capable SYN/ACK packet, the TCP initiator
+ clears the ECN-Echo flag on replying packets.

  ---------------------------------------------------------------
  TCP Node A             Router                  TCP Node B
  (initiator)                                    (responder)
  ----------             ------                  ----------

  ECN-setup SYN packet --->
                                 ECN-setup SYN packet --->

                              <--- ECN-setup SYN/ACK, ECT
                       <--- Sets CE on SYN/ACK
  <--- ECN-setup SYN/ACK, CE

- Data/ACK, ECN-Echo --->
-                                Data/ACK, ECN-Echo --->
+ ACK, ECN-Echo --->
+                                ACK, ECN-Echo --->
                            Window reduced to one segment.
-                      <--- ECN-setup SYN/ACK, CWR, not ECT
+                      <--- ECN-setup SYN/ACK, not ECT
                                             3-second timer set
                        SYN/ACK dropped              .
                                                     .
                                                     .
                                         3-second timer expires
-                      <--- ECN-setup SYN/ACK, CWR, not ECT
- <--- ECN-setup SYN/ACK, CWR, not ECT
- Data/ACK --->
-                                Data/ACK --->
-                      <--- Data (one segment only)
+                      <--- ECN-setup SYN/ACK, not ECT
+ <--- ECN-setup SYN/ACK, not ECT
+ Data/ACK, ECT --->
+                                Data/ACK, ECT --->
+                      <--- Data, ECT (one segment only)
  ---------------------------------------------------------------

  Figure 3: SYN exchange with the first SYN/ACK packet marked,
  and the second SYN/ACK packet dropped. ECN+/TryOnce.

  In contrast to Figure 2, Figure 3 shows an interchange where the
  first SYN/ACK packet is ECN-marked and the second SYN/ACK packet is
  dropped in the network. As in Figure 2, the TCP responder sets a
  timer when the second SYN/ACK packet is sent. Figure 3 shows that if
  the timer expires before the TCP responder receives an
@@ -535,21 +555,21 @@
  set in the TCP header, this indicates that node A is ECN-capable. If
  node B is also ECN-capable, there are no obstacles to immediately
  setting one of the ECN-Capable codepoints in the IP header in the
  responding TCP SYN/ACK packet.
There can be a great benefit in setting an ECN-capable codepoint in SYN/ACK packets, as is discussed further in [ECN+], and reported briefly in Section 5 below. Congestion is most likely to occur in the server-to-client direction. As a result, setting an ECN-capable codepoint in SYN/ACK packets can reduce the occurrence of three- - second retransmit timeouts resulting from the drop of SYN/ACK + second retransmission timeouts resulting from the drop of SYN/ACK packets. 4.1. Flooding Attacks Setting an ECN-Capable codepoint in the responding TCP SYN/ACK packets does not raise any new or additional security vulnerabilities. For example, provoking servers or hosts to send SYN/ACK packets to a third party in order to perform a "SYN/ACK flood" attack would be highly inefficient. Third parties would immediately drop such packets, since they would know that they didn't @@ -610,37 +630,39 @@ in terms of the drop or mark behavior at routers as a function of packet size [Tools] (Section 10). We note that all of these alternatives listed above are available in the NS simulator (Drop Tail queues are by default in units of packets, while the default for RED queue management has been changed from packet mode to byte mode). 4.4. Response to ECN-marking of SYN/ACK Packets One question is why TCP SYN/ACK packets should be treated differently from other packets in terms of the end node's response to an ECN- - marked packet. Section 5 of RFC 3168 specifies the following: + marked packet. Section 5 of RFC 3168 [RFC3168] specifies the + following: "Upon the receipt by an ECN-Capable transport of a single CE packet, the congestion control algorithms followed at the end-systems MUST be essentially the same as the congestion control response to a *single* dropped packet. For example, for ECN-Capable TCP the source TCP is required to halve its congestion window for any window of data containing either a packet drop or an ECN indication." 
- In particular, Section 6.1.2 of RFC 3168 specifies that when the TCP - congestion window consists of a single packet and that packet is ECN- - marked in the network, then the data sender must reduce the sending - rate below one packet per round-trip time, by waiting for one RTO - before sending another packet. If the RTO was set to the average - round-trip time, this would result in halving the sending rate; - because the RTO is in fact larger than the average round-trip time, - the sending rate is reduced to less than half of its previous value. + In particular, Section 6.1.2 of RFC 3168 [RFC3168] specifies that + when the TCP congestion window consists of a single packet and that + packet is ECN-marked in the network, then the data sender must reduce + the sending rate below one packet per round-trip time, by waiting for + one RTO before sending another packet. If the RTO was set to the + average round-trip time, this would result in halving the sending + rate; because the RTO is in fact larger than the average round-trip + time, the sending rate is reduced to less than half of its previous + value. TCP's congestion control response to the *dropping* of a SYN/ACK packet is to wait a default time before sending another packet. This document argues that ECN gives end-systems a wider range of possible responses to the *marking* of a SYN/ACK packet, and that waiting a default time before sending another packet is not the desired response. On the conservative end, one could assume an effective congestion window of one packet for the SYN/ACK packet, and respond to an ECN- @@ -686,22 +708,22 @@ performance measures were the end-to-end response times for each request/response pair, and the aggregate throughput on the bottleneck link. The end-to-end response time was computed as the time from the moment when the request for the file is sent to the server, until that file is successfully downloaded by the client. 
  The measurements from [ECN+] show that setting an ECN-Capable
  codepoint in the IP packet header in TCP SYN/ACK packets
  systematically improves performance with all evaluated AQM schemes.
  When SYN/ACK packets at a congested router are ECN-marked instead of
- dropped, this can avoid a long initial retransmit timeout, improving
- the response time for the affected flow dramatically.
+ dropped, this can avoid a long initial retransmission timeout,
+ improving the response time for the affected flow dramatically.

  [ECN+] shows that the impact on aggregate throughput can also be
  quite significant, because marking SYN/ACK packets can prevent larger
  flows from suffering long timeouts before being "admitted" into the
  network. In addition, the testbed measurements from [ECN+] show that
  web servers setting the ECN-Capable codepoint in TCP SYN/ACK packets
  could serve more requests.

  As a final step, [ECN+] explores the co-existence of flows that do
  and don't set the ECN-capable codepoint in TCP SYN/ACK packets. The
@@ -723,26 +745,26 @@
  dropped in the network, and for which the ECN-Capability would allow
  the SYN/ACK to be marked rather than dropped.

  The percent of SYN/ACK packets on a link can be quite high. In
  particular, measurements on links dominated by web traffic indicate
  that 15-20% of the packets can be SYN/ACK packets [SCJO01].

  The benefit of adding ECN-capability to SYN/ACK packets depends in
  part on the size of the data transfer. The drop of a SYN/ACK packet
  can increase the download time of a short file by an order of
- magnitude, by requiring a three-second retransmit timeout. For
+ magnitude, by requiring a three-second retransmission timeout. For
  longer-lived flows, the effect of a dropped SYN/ACK packet on file
  download time is less dramatic. However, even for longer-lived
  flows, the addition of ECN-capability to SYN/ACK packets can improve
  the fairness among long-lived flows, as newly-arriving flows would be
- less likely to have to wait for retransmit timeouts.
+ less likely to have to wait for retransmission timeouts. One question that arises is what fraction of connections would see the benefit from making SYN/ACK packets ECN-capable, in a particular scenario. Specifically: (1) What fraction of arriving SYN/ACK packets are dropped at the congested router when the SYN/ACK packets are not ECN-capable? (2) Of those SYN/ACK packets that are dropped, what fraction would have been ECN-marked instead of dropped if the SYN/ACK packets had @@ -795,22 +817,22 @@ However, Section 4 discussed two other possible responses to an ECN- marked SYN/ACK packet. In ECN+, the original proposal from [ECN+], the end node responds to the report of an ECN-marked SYN/ACK packet by setting the initial congestion window to one segment and immediately sending a data packet, if it has one to send. In ECN+/Wait, the end node responds to the report of an ECN-marked SYN/ACK packet by setting the initial congestion window to one segment and waiting an RTT before sending a data packet. Simulations comparing the performance with Standard ECN (without ECN- - marked SYN/ACK packets), ECN+, and ECN+/Wait, and ECN/TryOnce show - little difference, in terms of aggregate congestion, between ECN+ and + marked SYN/ACK packets), ECN+, ECN+/Wait, and ECN/TryOnce show little + difference, in terms of aggregate congestion, between ECN+ and ECN+/Wait. However, for some scenarios with queues that are packet- based rather than byte-based, and with packet drop rates above 25% without ECN+, the use of ECN+ or of ECN+/Wait can more than double the packet drop rates, to greater than 50%. The details are given in Tables 1 and 3 of Appendix A below. ECN+/TryOnce does not increase the packet drop rate in scenarios of high congestion. Therefore, ECN+/TryOnce is superior to ECN+ or to ECN+/Wait, which both significantly increase the packet drop rate in scenarios of high congestion. 
    At the same time, ECN+/TryOnce gives a performance
    improvement similar to that of ECN+ or ECN+/Wait (Tables 2 and 4 of
@@ -877,36 +899,36 @@
    The simulations reported in Appendix A show that even with demanding
    traffic mixes dominated by short flows and high levels of congestion,
    the aggregate packet dropping rates are not significantly different
    with Standard ECN or with ECN+/TryOnce. However, in our simulations,
    we have one scenario where ECN+ or ECN+/Wait results in a
    significantly higher packet drop rate than ECN or ECN+/TryOnce
    (Tables 1 and 3 in Appendix A below).

 8. Conclusions

-   This draft specifies a modification to RFC 3168 to allow TCP nodes to
-   send SYN/ACK packets as being ECN-Capable. Making the SYN/ACK packet
-   ECN-Capable avoids the high cost to a TCP transfer when a SYN/ACK
-   packet is dropped by a congested router, by avoiding the resulting
-   retransmit timeout. This improves the throughput of short
-   connections. This document specifies the ECN+/TryOnce mechanism for
-   ECN-Capability for SYN/ACK packets, where the sender of the SYN/ACK
-   packet responds to an ECN mark by reducing its initial congestion
-   window from two, three, or four segments to one segment, and sending
-   a SYN/ACK packet that is not ECN-Capable. The addition of ECN-
-   capability to SYN/ACK packets is particularly beneficial in the
-   server-to-client direction, where congestion is more likely to occur.
-   In this case, the initial information provided by the ECN marking in
-   the SYN/ACK packet enables the server to appropriately adjust the
-   initial load it places on the network, while avoiding the delay of a
-   retransmit timeout.
+   This draft specifies a modification to RFC 3168 [RFC3168] to allow
+   TCP nodes to send SYN/ACK packets as being ECN-Capable. Making the
+   SYN/ACK packet ECN-Capable avoids the high cost to a TCP transfer
+   when a SYN/ACK packet is dropped by a congested router, by avoiding
+   the resulting retransmission timeout. This improves the throughput
+   of short connections. This document specifies the ECN+/TryOnce
+   mechanism for ECN-Capability for SYN/ACK packets, where the sender of
+   the SYN/ACK packet responds to an ECN mark by reducing its initial
+   congestion window from two, three, or four segments to one segment,
+   and sending a SYN/ACK packet that is not ECN-Capable. The addition
+   of ECN-capability to SYN/ACK packets is particularly beneficial in
+   the server-to-client direction, where congestion is more likely to
+   occur. In this case, the initial information provided by the ECN
+   marking in the SYN/ACK packet enables the server to appropriately
+   adjust the initial load it places on the network, while avoiding the
+   delay of a retransmission timeout.

 9. Acknowledgements

    We thank Anil Agarwal, Mark Allman, Remi Denis-Courmont, Wesley Eddy,
    Lars Eggert, Alfred Hoenes, Janardhan Iyengar, and Pasi Sarolahti for
    feedback on earlier versions of this draft. We thank Adam Langley
    [L08] for contributing a patch for ECN+/TryOnce for the Linux
    development tree.

 A. Report on Simulations

@@ -965,21 +987,21 @@
    each table showing a particular traffic load, the four rows show the
    number of packets dropped, the number of packets ECN-marked, the
    aggregate packet drop rate, and the aggregate throughput, and the
    four columns show the simulations with Standard ECN, ECN+, ECN+/Wait,
    and ECN+/TryOnce.

    These simulations were run with RED set to mark instead of drop
    packets any time that the queue is not full. This is a worst-case
    scenario for ECN+ and its variants. For the default implementation
    of RED in the ns-2 simulator, when the average queue size exceeds a
-   configured threshold. the router drops all arriving packets. For
+   configured threshold, the router drops all arriving packets. For
    scenarios with this RED mechanisms, it is less likely that ECN+ or
    one of its variants would increase the average queue size above the
    configured threshold.

    The usefulness of ECN+: The first thing to observe is that for all of
    the simulations, the use of ECN+ or ECN+/Wait significantly increases
    the number of packets marked. In contrast, the use of ECN+/TryOnce
    significantly increases the number of packets marked in the
    simulations with moderate congestion, and gives a more moderate
    increase in the number of packets marked for the simulations with
@@ -1024,21 +1046,21 @@
    Throughput   92%          92%          92%          94%

    Target Load = 125%:
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      600,628      1,746,768    2,176,530    625,552
    Marked       418,433      1,166,450    1,164,932    439,847
    Loss rate    25.45%       51.73%       56.87%       18.31%
    Throughput   94%          98%          97%          95%

-   Target Load = 1.50%
+   Target Load = 150%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      1,449,945    1,565,0517   1,563,0801   1,351,637
    Marked       669,840      583,378      591,315      684,715
    Loss rate    46.7%        59.0%        59.0%        32.7%
    Throughput   88%          94%          94%          92%

    Table 1: Simulations with an average flow size of 3 Kbytes, a 100
    Mbps link, RED in packet mode, queue in packets.
@@ -1036,82 +1058,80 @@
                 -------      -------      -------      ----------
    Dropped      1,449,945    1,565,0517   1,563,0801   1,351,637
    Marked       669,840      583,378      591,315      684,715
    Loss rate    46.7%        59.0%        59.0%        32.7%
    Throughput   88%          94%          94%          92%

    Table 1: Simulations with an average flow size of 3 Kbytes, a 100
    Mbps link, RED in packet mode, queue in packets.

    Target Load = 95%:
-
    TIME:    10  100  200  300  400  500 1000 2000 3000 4000 5000
    ------------------------------------------------------
    ECN:   0.00 0.07 0.26 0.51 0.82 0.96 0.97 0.97 0.97 1.00 1.00
    ECN+:  0.00 0.07 0.27 0.53 0.85 0.99 1.00 1.00 1.00 1.00 1.00
    Wait:  0.00 0.07 0.26 0.51 0.83 0.97 1.00 1.00 1.00 1.00 1.00
    Once:  0.00 0.07 0.24 0.49 0.83 0.97 1.00 1.00 1.00 1.00 1.00

    Target Load = 110%:
-
    TIME:    10  100  200  300  400  500 1000 2000 3000 4000 5000
    ------------------------------------------------------
    ECN:   0.00 0.05 0.19 0.41 0.67 0.79 0.80 0.80 0.80 0.96 0.96
    ECN+:  0.00 0.07 0.22 0.48 0.81 0.96 1.00 1.00 1.00 1.00 1.00
    Wait:  0.00 0.05 0.18 0.38 0.64 0.77 0.95 1.00 1.00 1.00 1.00
    Once:  0.00 0.06 0.19 0.42 0.70 0.86 0.95 0.96 0.96 0.99 0.99

    Target Load = 125%:
-
    TIME:    10  100  200  300  400  500 1000 2000 3000 4000 5000
    ------------------------------------------------------
    ECN:   0.00 0.04 0.13 0.27 0.46 0.56 0.58 0.59 0.59 0.82 0.82
    ECN+:  0.00 0.06 0.18 0.33 0.58 0.76 0.97 0.99 0.99 1.00 1.00
    Wait:  0.00 0.01 0.06 0.13 0.21 0.27 0.68 0.98 0.99 1.00 1.00
    Once:  0.00 0.05 0.16 0.34 0.58 0.73 0.85 0.87 0.87 0.95 0.96

+   Target Load = 150%:
    TIME:    10  100  200  300  400  500 1000 2000 3000 4000 5000
    ------------------------------------------------------
    ECN:   0.00 0.03 0.08 0.18 0.31 0.39 0.42 0.42 0.43 0.68 0.68
    ECN+:  0.00 0.06 0.18 0.39 0.67 0.81 0.83 0.84 0.84 0.93 0.93
    Wait:  0.00 0.06 0.18 0.39 0.67 0.81 0.83 0.84 0.84 0.93 0.94
    Once:  0.00 0.04 0.13 0.27 0.46 0.59 0.72 0.75 0.75 0.88 0.88

    Table 2: The cumulative distribution function (CDF) for transfer
    times, for simulations with an average flow size of 3 Kbytes, a 100
    Mbps link, RED in packet mode, queue in packets. (The graphs are
    available from "http://www.icir.org/floyd/ecn-syn/".)

-   Target Load = 0.95%
+   Target Load = 95%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      8,448        6,362        7,740        14,107
    Marked       9,891        16,787       17,456       16,132
    Loss rate    5.5%         4.3%         5.0%         5.0%
    Throughput   78%          78%          78%          81%

-   Target Load = 1.10%
+   Target Load = 110%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      31,284       29,773       49,297       45,277
    Marked       28,429       54,729       60,383       34,622
    Loss rate    15.3%        15.2%        21.9%        13.6%
    Throughput   97%          96%          96%          94%

-   Target Load = 1.25%
+   Target Load = 125%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      61,433       176,682      214,096      75,612
    Marked       44,408       119,728      117,301      49,442
    Loss rate    25.4%        51.9%        56.0%        22.3%
    Throughput   97%          98%          98%          96%

-   Target Load = 1.50%
+   Target Load = 150%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      130,007      251,856      326,845      133,603
    Marked       63,066       146,757      147,239      66,444
    Loss rate    42.5%        61.3%        67.3%        31.7%
    Throughput   93%          99%          99%          94%

    Table 3: Simulations with an average flow size of 3 Kbytes, a 10 Mbps
    link, RED in packet mode, queue in packets.
@@ -1200,45 +1216,45 @@
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      484,251      483,847      507,727      600,737
    Marked       865,905      872,254      873,317      818,451
    Loss rate    19.09%       19.13%       19.71%       12.66%
    Throughput   99%          98%          99%          99%

    Table 5: Simulations with an average flow size of 3 Kbytes, a 100
    Mbps link, RED in byte mode, queue in bytes.

-   Target Load = 0.95%
+   Target Load = 95%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      142          77           103          99
    Marked       11,694       11,387       11,604       12,129
    Loss rate    0.1%         0.1%         0.1%         0.1%
    Throughput   78%          78%          78%          78%

-   Target Load = 1.10%
+   Target Load = 110%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      338          210          247          274
    Marked       41,676       40,412       44,173       36,265
    Loss rate    0.2%         0.1%         0.1%         0.1%
    Throughput   94%          94%          94%          96%

-   Target Load = 1.25%
+   Target Load = 125%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      1,559        951          978          1,723
    Marked       74,933       75,499       75,481       59,670
    Loss rate    0.8%         0.5%         0.5%         0.6%
    Throughput   99%          99%          99%          96%

-   Target Load = 1.50%
+   Target Load = 150%
                 ECN          ECN+         ECN+/Wait    ECN+/TryOnce
                 -------      -------      -------      ----------
    Dropped      2,374        1,528        1,515        4,848
    Marked       85,739       86,428       86,144       81,350
    Loss rate    1.2%         0.8%         0.8%         1.4%
    Throughput   99%          98%          98%          98%

    Table 6: Simulations with an average flow size of 3 Kbytes, a 10 Mbps
    link, RED in byte mode, queue in bytes.
@@ -1297,21 +1313,21 @@
    packets immediately after the SYN/ACK exchange. Of course, with
    *severe* congestion, the SYN/ACK packets would likely be dropped
    rather than ECN-marked at the congested router, preventing the TCP
    responder from adding to the congestion by sending its initial window
    of four data packets.

    It is also possible that in some older TCP implementation, the
    initiator would ignore arriving SYN/ACK packets that had the ECT or
    CE codepoint set. This would result in a delay in connection set-up
    for that TCP connection, with the initiator re-sending the SYN packet
-   after a retransmit timeout. We are not aware of any TCP
+   after a retransmission timeout. We are not aware of any TCP
    implementations with this behavior.

    One possibility for coping with problems of backwards compatibility
    would be for TCP initiators to use a TCP flag that means "I
    understand ECN-Capable SYN/ACK packets".
    If this document were to standardize the use of such an "ECN-SYN"
    flag, then the TCP responder would only send a SYN/ACK packet as
    ECN-capable if the incoming SYN packet had the "ECN-SYN" flag set.
    An ECN-SYN flag would prevent the backwards compatibility problems
    described in the paragraphs above.
@@ -1331,22 +1347,22 @@
    implementations. This limits the scope of any backwards
    compatibility problems.

    (2) Limits to the scope of the problem: The backwards compatibility
    problem would not be serious enough to cause congestion collapse;
    with severe congestion, the buffer at the congested router will
    overflow, and the congested router will drop rather than ECN-mark
    arriving SYN packets. Some active queue management mechanisms might
    switch from packet-marking to packet-dropping in times of high
    congestion before buffer overflow, as recommended in Section 19.1 of
-   RFC 3168. This helps to prevent congestion collapse problems with
-   the use of ECN.
+   RFC 3168 [RFC3168]. This helps to prevent congestion collapse
+   problems with the use of ECN.

    (3) Detection of and response to backwards-compatibility problems: A
    TCP responder such as a web server can't differentiate between a
    SYN/ACK packet that is not ECN-marked in the network, and a SYN/ACK
    packet that is ECN-marked, but where the ECN mark is ignored by the
    TCP initiator. However, a TCP responder *can* detect if a SYN/ACK
    packet is sent as ECN-capable and not reported as ECN-marked, but
    data packets are dropped or marked from the initial window of data.
    We will call this scenario "initial-window-congestion". If a web
    server frequently experienced initial-window congestion (without
@@ -1380,20 +1396,23 @@
    Improved Controllers for AQM Routers Supporting TCP Flows, April
    1998.

    [RED] Floyd, S., and Jacobson, V. Random Early Detection gateways for
    Congestion Avoidance . IEEE/ACM Transactions on Networking, V.1 N.4,
    August 1993.

    [REM] S. Athuraliya, V. H. Li, S. H. Low and Q. Yin, REM: Active
    Queue Management, IEEE Network, May 2001.

+   [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
+   September 1981.
+
    [RFC2309] B. Braden et al., Recommendations on Queue Management and
    Congestion Avoidance in the Internet, RFC 2309, April 1998.

    [RFC2581] M. Allman, V. Paxson, and W. Stevens, TCP Congestion
    Control, RFC 2581, April 1999.

    [RFC2988] V. Paxson and M. Allman, Computing TCP's Retransmission
    Timer, RFC 2988, November 2000.

    [RFC3042] M. Allman, H. Balakrishnan, and S. Floyd, Enhancing TCP's