Internet Engineering Task Force                      K. K. Ramakrishnan
INTERNET DRAFT                                       TeraOptic Networks
draft-ietf-tsvwg-ecn-02.txt                                 Sally Floyd
                                                                  ACIRI
                                                               D. Black
                                                                    EMC
                                                         February, 2001
                                                  Expires: August, 2001

       The Addition of Explicit Congestion Notification (ECN) to IP

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document specifies the incorporation of ECN (Explicit Congestion Notification) to TCP and IP, including ECN's use of two bits in the IP header. We begin by describing TCP's use of packet drops as an indication of congestion. Next we explain that with the addition of active queue management (e.g., RED) to the Internet infrastructure, where routers detect congestion before the queue overflows, routers are no longer limited to packet drops as an indication of congestion. Routers can instead set the Congestion Experienced (CE) codepoint in the IP header of packets from ECN-capable transports. We describe when the CE codepoint is to be set in routers, and describe modifications needed to TCP to make it ECN-capable.
Modifications to other transport protocols (e.g., unreliable unicast or multicast, reliable multicast, other reliable unicast transport protocols) could be considered as those protocols are developed and advance through the standards process. We also describe in this document the issues involving the use of ECN within IP tunnels, and within IPsec tunnels in particular.

One of the guiding principles for this document is that all the mechanisms specified here are incrementally deployable.

Table of Contents

   1. Introduction
   2. Conventions and Acronyms
   3. Assumptions and General Principles
   4. Active Queue Management (AQM)
   5. Explicit Congestion Notification in IP
      5.1. ECN as an Indication of Persistent Congestion
      5.2. Dropped or Corrupted Packets
      5.3. Fragmentation
   6. Support from the Transport Protocol
      6.1. TCP
         6.1.1. TCP Initialization
            6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field
         6.1.2. The TCP Sender
         6.1.3. The TCP Receiver
         6.1.4. Congestion on the ACK-path
         6.1.5. Retransmitted TCP packets
         6.1.6. TCP Window Probes
   7. Non-compliance by the End Nodes
   8. Non-compliance in the Network
      8.1. Complications Introduced by Split Paths
   9. Encapsulated Packets
      9.1. IP packets encapsulated in IP
         9.1.1. The Limited-functionality and Full-functionality Options
         9.1.2. Changes to the ECN Field within an IP Tunnel
      9.2. IPsec Tunnels
         9.2.1. Negotiation between Tunnel Endpoints
            9.2.1.1. ECN Tunnel Security Association Database Field
            9.2.1.2. ECN Tunnel Security Association Attribute
            9.2.1.3. Changes to IPsec Tunnel Header Processing
         9.2.2. Changes to the ECN Field within an IPsec Tunnel
         9.2.3. Comments for IPsec Support
      9.3. IP packets encapsulated in non-IP packet headers
   10. Issues Raised by Monitoring and Policing Devices
   11. Evaluations of ECN
      11.1. Related Work Evaluating ECN
      11.2. A Discussion of the ECN nonce
         11.2.1. The Incremental Deployment of ECT(1) in Routers
   12. Summary of changes required in IP and TCP
   13. Conclusions
   14. Acknowledgements
   15. References
   16. Security Considerations
   17. IPv4 Header Checksum Recalculation
   18. Possible Changes to the ECN Field in the Network
      18.1. Possible Changes to the IP Header
         18.1.1. Erasing the Congestion Indication
         18.1.2. Falsely Reporting Congestion
         18.1.3. Disabling ECN-Capability
         18.1.4. Falsely Indicating ECN-Capability
      18.2. Information carried in the Transport Header
      18.3. Split Paths
   19. Implications of Subverting End-to-End Congestion Control
      19.1. Implications for the Network and for Competing Flows
      19.2. Implications for the Subverted Flow
      19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control
   20. The Motivation for the ECT Codepoints
      20.1. The Motivation for an ECT Codepoint
      20.2. The Motivation for two ECT Codepoints
   21. Why use Two Bits in the IP Header?
   22. Historical Definitions for the IPv4 TOS Octet
   23. IANA Considerations

RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare this with draft-ietf-tsvwg-ecn-01, compare the following: "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff" "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-02.troff"

Changes from draft-ietf-tsvwg-ecn-01: Added the ECT(1) codepoint, and changed references about bits to references about codepoints in many places. Also added Section 11.2 on "A Discussion of the ECN nonce", and Section 20.2 on "The Motivation for two ECT Codepoints". Added a paragraph saying that by default, the discussion of setting the CE codepoint applies to all Differentiated Services Per-Hop Behaviors. Added Section 5.3 on fragmentation. Added "A host MUST NOT set ECT on SYN or SYN-ACK packets." to the end of Section 6.1.1, just to be explicit. Corrected some references to "Section 19" to "Section 22". Clarified that ECN is defined identically in IPv4 and in IPv6.

1. Introduction

TCP's congestion control and avoidance algorithms are based on the notion that the network is a black-box [Jacobson88, Jacobson90]. The network's state of congestion or otherwise is determined by end-systems probing for the network state, by gradually increasing the load on the network (by increasing the window of packets that are outstanding in the network) until the network becomes congested and a packet is lost. Treating the network as a "black-box" and treating loss as an indication of congestion in the network is appropriate for pure best-effort data carried by TCP, with little or no sensitivity to delay or loss of individual packets. In addition, TCP's congestion management algorithms have techniques built-in (such as Fast Retransmit and Fast Recovery) to minimize the impact of losses, from a throughput perspective. However, these mechanisms are not intended to help applications that are in fact sensitive to the delay or loss of one or more individual packets.
Interactive traffic such as telnet, web-browsing, and transfer of audio and video data can be sensitive to packet losses (especially when using an unreliable data delivery transport such as UDP) or to the increased latency of the packet caused by the need to retransmit the packet after a loss (with the reliable data delivery semantics provided by TCP).

Since TCP determines the appropriate congestion window to use by gradually increasing the window size until it experiences a dropped packet, this causes the queues at the bottleneck router to build up. With most packet drop policies at the router that are not sensitive to the load placed by each individual flow (e.g., tail-drop on queue overflow), this means that some of the packets of latency-sensitive flows may be dropped. In addition, such drop policies lead to synchronization of loss across multiple flows.

Active queue management mechanisms detect congestion before the queue overflows, and provide an indication of this congestion to the end nodes. Thus, active queue management can reduce unnecessary queueing delay for all traffic sharing that queue. The advantages of active queue management are discussed in RFC 2309 [RFC2309]. Active queue management avoids some of the bad properties of dropping on queue overflow, including the undesirable synchronization of loss across multiple flows. More importantly, active queue management means that transport protocols with mechanisms for congestion control (e.g., TCP) do not have to rely on buffer overflow as the only indication of congestion.

Active queue management mechanisms may use one of several methods for indicating congestion to end-nodes. One is to use packet drops, as is currently done. However, active queue management allows the router to separate policies of queueing or dropping packets from the policies for indicating congestion.
Thus, active queue management allows routers to use the Congestion Experienced (CE) codepoint in a packet header as an indication of congestion, instead of relying solely on packet drops. This has the potential of reducing the impact of loss on latency-sensitive flows.

This document is intended to obsolete RFC 2481, "A Proposal to add Explicit Congestion Notification (ECN) to IP", which defined ECN as an Experimental Protocol for the Internet Community.

RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This document obsoletes three subsequent internet-drafts on ECN, "IPsec Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP with ECN: The Treatment of Retransmitted Data Packets". This document is intended largely to merge the earlier documents all into a single document, for greater clarity, in preparation to becoming a Proposed Standard.

2. Conventions and Acronyms

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [B97].

3. Assumptions and General Principles

In this section, we describe some of the important design principles and assumptions that guided the design choices in this proposal.

* Because ECN is likely to be adopted gradually, accommodating migration is essential. Some routers may still only drop packets to indicate congestion, and some end-systems may not be ECN-capable. The most viable strategy is one that accommodates incremental deployment without having to resort to "islands" of ECN-capable and non-ECN-capable environments.

* New mechanisms for congestion control and avoidance need to co-exist and cooperate with existing mechanisms for congestion control. In particular, new mechanisms have to co-exist with TCP's current methods of adapting to congestion and with routers' current practice of dropping packets in periods of congestion.

* Congestion may persist over different time-scales. The time scales that we are concerned with are congestion events that may last longer than a round-trip time.

* The number of packets in an individual flow (e.g., TCP connection or an exchange using UDP) may range from a small number of packets to quite a large number. We are interested in managing the congestion caused by flows that send enough packets so that they are still active when network feedback reaches them.

* Asymmetric routing is likely to be a normal occurrence in the Internet. The path (sequence of links and routers) followed by data packets may be different from the path followed by the acknowledgment packets in the reverse direction.

* Many routers process the "regular" headers in IP packets more efficiently than they process the header information in IP options. This suggests keeping congestion experienced information in the regular headers of an IP packet.

* It must be recognized that not all end-systems will cooperate in mechanisms for congestion control. However, new mechanisms shouldn't make it easier for TCP applications to disable TCP congestion control. The benefit of lying about participating in new mechanisms such as ECN-capability should be small.

4. Active Queue Management (AQM)

Random Early Detection (RED) is one mechanism for Active Queue Management (AQM) that has been proposed to detect incipient congestion [FJ93], and is currently being deployed in the Internet [RFC2309]. AQM is meant to be a general mechanism using one of several alternatives for congestion indication, but in the absence of ECN, AQM is restricted to using packet drops as a mechanism for congestion indication. AQM drops packets based on the average queue length exceeding a threshold, rather than only when the queue overflows.
However, because AQM may drop packets before the queue actually overflows, AQM is not always forced by memory limitations to discard the packet.

AQM can set a Congestion Experienced (CE) codepoint in the packet header instead of dropping the packet, when such a field is provided in the IP header and understood by the transport protocol. The use of the CE codepoint with ECN allows the receiver(s) to receive the packet, avoiding the potential for excessive delays due to retransmissions after packet losses. We use the term 'CE packet' to denote a packet that has the CE codepoint set.

5. Explicit Congestion Notification in IP

This document specifies that the Internet provide a congestion indication for incipient congestion (as in RED and earlier work [RJ90]) where the notification can sometimes be through marking packets rather than dropping them. This uses an ECN field in the IP header with two bits, making four ECN codepoints, '00' to '11'. The ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the data sender to indicate that the end-points of the transport protocol are ECN-capable; we call them ECT(0) and ECT(1) respectively. The phrase "the ECT codepoint" in this document refers to either of the two ECT codepoints. Routers treat the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a packet-by-packet basis.

The use of both the two codepoints for ECT, ECT(0) and ECT(1), is motivated primarily by the desire to allow mechanisms for the data sender to verify that network elements are not erasing the CE codepoint, and that data receivers are properly reporting to the sender the receipt of packets with the CE codepoint set, as required by the transport protocol. Guidelines for the senders and receivers to differentiate between the ECT(0) and ECT(1) codepoints will be addressed in separate documents, for each transport protocol. In particular, this document does not address mechanisms for TCP end-nodes to differentiate between the ECT(0) and ECT(1) codepoints. Protocols and senders that only require a single ECT codepoint SHOULD use ECT(0).

The not-ECT codepoint '00' indicates a packet that is not using ECN. The CE codepoint '11' is set by a router to indicate congestion to the end nodes. Routers that have a packet arriving at a full queue drop the packet, just as they do in the absence of ECN.

      +-----+-----+
      | ECN FIELD |
      +-----+-----+
        ECT   CE         The ECT and CE bits defined in RFC 2481.
         0     0         Not-ECT
         0     1         ECT(1)
         1     0         ECT(0)
         1     1         CE

      Figure 1: The ECN Field in IP.

The use of two ECT codepoints essentially gives a one-bit ECN nonce in packet headers, and routers necessarily "erase" the nonce when they set the CE codepoint [SCWA99].
For example, routers that erased the CE codepoint would face additional difficulty in reconstructing the original nonce, and thus repeated erasure of the CE codepoint would be more likely to be detected by the end-nodes. The ECN nonce also can address the problem of misbehaving transport receivers lying to the transport sender about whether or not the CE codepoint was set in a packet. The motivations for the use of two ECT codepoints are discussed in more detail in Section 20, along with some discussion of alternate possibilities for the fourth ECT codepoint. Backwards compatibility with earlier ECN implementations that do not understand the ECT(1) codepoint is discussed in Section 11.

In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable Transport (ECT) bit and the CE bit. The ECN field with only the ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the ECT(0) codepoint in this document, and the ECN field with both the ECT and CE bits set in RFC 2481 corresponds to the CE codepoint in this document. The '01' codepoint was left undefined in RFC 2481, and this is the reason for recommending the use of ECT(0) when only a single ECT codepoint is needed.

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |          DS FIELD, DSCP           | ECN FIELD |
      +-----+-----+-----+-----+-----+-----+-----+-----+

        DSCP: differentiated services codepoint
        ECN:  Explicit Congestion Notification

      Figure 2: The Differentiated Services and ECN Fields in IP.

Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6, and the ECN field is defined identically in both cases. The definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic Class octet have been superseded by the six-bit DS (Differentiated Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in [RFC2474] as Currently Unused, and are specified in RFC 2780 as approved for experimental use for ECN. Section 22 gives a brief history of the TOS octet.

Because of the unstable history of the TOS octet, the use of the ECN field as specified in this document cannot be guaranteed to be backwards compatible with those past uses of these two bits that pre-date ECN. The potential dangers of this lack of backwards compatibility are discussed in Section 22.

Upon the receipt by an ECN-Capable transport of a single CE packet, the congestion control algorithms followed at the end-systems MUST be essentially the same as the congestion control response to a *single* dropped packet. For example, for ECN-Capable TCP the source TCP is required to halve its congestion window for any window of data containing either a packet drop or an ECN indication.

One reason for requiring that the congestion-control response to the CE packet be essentially the same as the response to a dropped packet is to accommodate the incremental deployment of ECN in both end-systems and in routers.
Some routerstomay drop ECN-Capable packetsproactively, before(e.g., using thebuffer over- flows. However, this document does not attempt to specify a particu- lar mechanismsame AQM policies foractive queue management, leaving that endeavor, if needed, to other areas of the IETF. While ECN is inextricably tied up with the need to have a reasonable active queue management mecha- nism at the router, the reverse does not hold; active queue manage- ment mechanisms have been developed and deployed independent of ECN, using packet drops as indications ofcongestionin the absence of ECN indetection) while other routers set theIP architecture. 5.1. ECN as an IndicationCE codepoint, for equivalent levels ofPersistent Congestion We emphasize thatcongestion. Similarly, a*single*router might drop a non-ECN-Capable packetwithbut set the CEbit setcodepoint in anIP packet causes the transport layer to respond, in termsECN-Capable packet, for equivalent levels ofcongestion control, as it wouldcongestion. If there were different congestion control responses to apacket drop. The instantaneous queue size is likelyCE codepoint than tosee considerable variations even when the router does not experience persistent congestion. As such, ita packet drop, this could result in unfair treatment for different flows. An additional goal isimportantthattransientthe end-systems should react to congestion ata router, reflected by the instantaneous queue size reaching a threshold much smaller than the capacitymost once per window ofthe queue, not trigger a reactiondata (i.e., atthe transport layer. Therefore,most once per round-trip time), to avoid reacting multiple times to multiple indications of congestion within a round-trip time. For a router, the CEbit should notcodepoint of an ECN-Capable packet SHOULD only be setby a router based on the instantaneous queue size. 
For example, sinceif theATM and Frame Relay mechanisms for congestion indicationrouter would otherwise havetypically been defined withoutdropped the packet as anassociated notionindication ofaverage queue size ascongestion to thebasis for determining that an intermedi- ate node is congested, we believe that they provide a very noisy sig- nal. The TCP-sender reaction specified in this document for ECNend nodes. When the router's buffer isNOTnot yet full and theappropriate reaction for suchrouter is prepared to drop anoisy signalpacket to inform end nodes ofcongestion notification. However, ifincipient congestion, therouters that interfacerouter should first check to see if theATM net- work have a wayECT codepoint is set in that packet's IP header. If so, then instead ofmaintainingdropping theaverage queue atpacket, theinterface, and use it to come to a reliable determination thatrouter MAY instead set theATM subnet is congested, they may useCE codepoint in theECN notification that is defined here. We continueIP header. An environment where all end nodes were ECN-Capable could allow new criteria toencourage experiments in techniques at layer 2 (e.g., in ATM switches or Frame Relay switches)be developed for setting the CE codepoint, and new congestion control mechanisms for end-node reaction totake advantage of ECN. For example, usingCE packets. However, this is ascheme suchresearch issue, and asRED (where packet markingsuch isbased on the average queue length exceeding a threshold), layer 2 devices could provide a reasonably reliable indication of congestion. When all the layer 2 devicesnot addressed in this document. 
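
As a rough illustration of the router behaviour described above, the following Python sketch decodes the two-bit ECN field and chooses between marking and dropping once the active queue management algorithm has decided to signal congestion on a packet. The names ECN, ecn_of, set_ce and on_congestion_signal are hypothetical and do not come from this document. Forwarding an already-marked CE packet unchanged is an assumption of the sketch, and an IPv4 implementation would also have to update the header checksum after changing the field.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The four ECN codepoints, written as (bit 6, bit 7) of the octet."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


# Bits 6 and 7 of the TOS / Traffic Class octet are its two low-order bits.
ECN_MASK = 0b11


def ecn_of(tos: int) -> ECN:
    """Extract the ECN codepoint from a TOS / Traffic Class octet."""
    return ECN(tos & ECN_MASK)


def set_ce(tos: int) -> int:
    """Return the octet with the ECN field set to CE and the DSCP untouched."""
    return (tos & ~ECN_MASK & 0xFF) | ECN.CE


def on_congestion_signal(tos: int, queue_full: bool):
    """Decide what to do with a packet the AQM has chosen for a congestion signal.

    Returns (action, tos), where action is "drop" or "forward".
    """
    if queue_full:
        # No buffer space: the packet is dropped, as in the absence of ECN.
        return ("drop", tos)
    codepoint = ecn_of(tos)
    if codepoint in (ECN.ECT_0, ECN.ECT_1):
        # ECN-capable transport: the router MAY mark instead of dropping.
        # ECT(0) and ECT(1) are treated as equivalent.
        return ("forward", set_ce(tos))
    if codepoint is ECN.CE:
        # Assumption of this sketch: an already-marked packet is left as is.
        return ("forward", tos)
    # Not-ECT: the only way to signal congestion is to drop.
    return ("drop", tos)


if __name__ == "__main__":
    for octet in (0x00, 0x01, 0x02, 0x03):
        print(ecn_of(octet).name, on_congestion_signal(octet, queue_full=False))
```

The sketch takes the MAY in the text as "always mark"; a router remains free to drop ECN-Capable packets instead.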

When a CE packet (i.e., a packet that has the CE codepoint set) is received by a router, the CE codepoint is left unchanged, and the packet is transmitted as usual. When severe congestion has occurred and the router's queue is full, then the router has no choice but to drop some packet when a new packet arrives. We anticipate that such packet losses will become relatively infrequent when a majority of end-systems become ECN-Capable and participate in TCP or other compatible congestion control mechanisms. In an ECN-Capable environment that is adequately provisioned, packet losses should occur primarily during transients or in the presence of non-cooperating sources.

The above discussion of when CE may be set instead of dropping a packet applies by default to all Differentiated Services Per-Hop Behaviors (PHBs) [RFC 2475]. Specifications for PHBs MAY provide more specifics on how a compliant implementation is to choose between setting CE and dropping a packet, but this is NOT REQUIRED.
A router MUST NOT set CE instead of dropping a packet when the drop that would occur is caused by reasons other than congestion or the desire to indicate incipient congestion to end nodes (e.g., a diffserv edge node may be configured to unconditionally drop certain classes of traffic to prevent them from entering its diffserv domain).

We expect that routers will set the CE codepoint in response to incipient congestion as indicated by the average queue size, using the RED algorithms suggested in [FJ93, RFC2309]. To the best of our knowledge, this is the only proposal currently under discussion in the IETF for routers to drop packets proactively, before the buffer overflows. However, this document does not attempt to specify a particular mechanism for active queue management, leaving that endeavor, if needed, to other areas of the IETF.
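
To make "incipient congestion as indicated by the average queue size" concrete, here is a minimal Python sketch of the kind of computation a RED-style queue could perform. It is a simplification and not a mechanism specified by this document: the function names, the weight w, and the thresholds min_th, max_th and max_p are illustrative assumptions, and the count-based spacing of marks described in [FJ93] is omitted.

```python
def update_avg(avg: float, q: int, w: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length.

    avg is the previous average, q the instantaneous queue length, and
    w (assumed value) controls how quickly the average follows q.
    """
    return (1.0 - w) * avg + w * q


def mark_probability(avg: float, min_th: float, max_th: float,
                     max_p: float) -> float:
    """Probability of signalling congestion for the current average.

    Below min_th nothing is signalled; at or above max_th every packet
    is; in between the probability rises linearly from 0 to max_p.
    """
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


if __name__ == "__main__":
    # A short burst of 10 arrivals at queue length 100 barely moves the
    # average, so it does not by itself trigger a congestion signal.
    avg = 0.0
    for _ in range(10):
        avg = update_avg(avg, 100)
    print(avg, mark_probability(avg, min_th=5.0, max_th=15.0, max_p=0.1))
```

Because the decision uses the average rather than the instantaneous queue length, a brief burst produces no signal while a sustained backlog does.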
While ECN is inextricably tied up with the need to have a reasonable active queue management mechanism at the router, the reverse does not hold; active queue management mechanisms have been developed and deployed independent of ECN, using packet drops as indications of congestion in the absence of ECN in the IP architecture.

5.1. ECN as an Indication of Persistent Congestion

We emphasize that a *single* packet with the CE codepoint set in an IP packet causes the transport layer to respond, in terms of congestion control, as it would to a packet drop. The instantaneous queue size is likely to see considerable variations even when the router does not experience persistent congestion. As such, it is important that transient congestion at a router, reflected by the instantaneous queue size reaching a threshold much smaller than the capacity of the queue, not trigger a reaction at the transport layer. Therefore, the CE codepoint should not be set by a router based on the instantaneous queue size.

For example, since the ATM and Frame Relay mechanisms for congestion indication have typically been defined without an associated notion of average queue size as the basis for determining that an intermediate node is congested, we believe that they provide a very noisy signal. The TCP-sender reaction specified in this document for ECN is NOT the appropriate reaction for such a noisy signal of congestion notification.
However, if the routers that interface to the ATM network have a way of maintaining the average queue at the interface, and use it to come to a reliable determination that the ATM subnet is congested, they may use the ECN notification that is defined here.

We continue to encourage experiments in techniques at layer 2 (e.g., in ATM switches or Frame Relay switches) to take advantage of ECN. For example, using a scheme such as RED (where packet marking is based on the average queue length exceeding a threshold), layer 2 devices could provide a reasonably reliable indication of congestion. When all the layer 2 devices in a path set that layer's own Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the FECN bit in Frame Relay) in this reliable manner, then the interface router to the layer 2 network could copy the state of that layer 2 Congestion Experienced codepoint into the CE codepoint in the IP header. We recognize that this is not the current practice, nor is it in current standards. However, encouraging experimentation in this manner may provide the information needed to enable evolution of existing layer 2 mechanisms to provide a more reliable means of congestion indication, when they use a single bit for indicating congestion.

5.2. Dropped or Corrupted Packets

For the proposed use for ECN in this document (that is, for a transport protocol such as TCP for which a dropped data packet is an indication of congestion), end nodes detect dropped data packets, and the congestion response of the end nodes to a dropped data packet is at least as strong as the congestion response to a received CE packet. To ensure the reliable delivery of the congestion indication of the CE codepoint, an ECT codepoint MUST NOT be set in a packet unless the loss of that packet in the network would be detected by the end nodes and interpreted as an indication of congestion.

Transport protocols such as TCP do not necessarily detect all packet drops, such as the drop of a "pure" ACK packet; for example, TCP does not reduce the arrival rate of subsequent ACK packets in response to an earlier dropped ACK packet. Any proposal for extending ECN-Capability to such packets would have to address issues such as the case of an ACK packet that was marked with the CE codepoint but was later dropped in the network. We believe that this aspect is still the subject of research, so this document specifies that at this time, "pure" ACK packets MUST NOT indicate ECN-Capability.

Similarly, if a CE packet is dropped later in the network due to corruption (bit errors), the end nodes should still invoke congestion control, just as TCP would today in response to a dropped data packet. This issue of corrupted CE packets would have to be considered in any proposal for the network to distinguish between packets dropped due to corruption, and packets dropped due to congestion or buffer overflow. In particular, the ubiquitous deployment of ECN would not, in and of itself, be a sufficient development to allow end-nodes to interpret packet drops as indications of corruption rather than congestion.

5.3. Fragmentation

All ECN-capable packets SHOULD have the DF (Don't Fragment) bit set. Reassembly of a fragmented packet MUST NOT lose indications of congestion. In other words, if any fragment of an IP packet to be reassembled has the CE codepoint set, then one of two actions MUST be taken:

* The reassembled packet has the CE codepoint set. This MUST NOT occur if any of the other fragments contributing to this reassembly carries the Not-ECT codepoint.

* The packet is dropped instead of being reassembled.

If both actions are applicable, either MAY be chosen. Reassembly of a fragmented packet MUST NOT change the ECN codepoint when all of the fragments carry the same codepoint.

Situations may arise in which the above specification is insufficiently precise. For example, it does not place requirements on reassembly of fragments that carry a mixture of ECT(0), ECT(1) and/or Not-ECT.
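The reassembly rules stated so far can be sketched as a small function. This is only one possible reading: the name `reassembled_ecn` is hypothetical, the sketch prefers setting CE over dropping when both actions are permitted, and it raises an error for the mixtures that the rules leave unspecified.

```python
# ECN field codepoints in the IP header (two bits).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassembled_ecn(fragment_codepoints):
    """Return the ECN codepoint for the reassembled packet.

    Returns None when the packet is to be dropped instead of reassembled.
    Raises ValueError for mixtures the rules do not cover.
    """
    seen = set(fragment_codepoints)
    if not seen:
        raise ValueError("no fragments given")
    if len(seen) == 1:
        # All fragments agree: the codepoint must not be changed.
        return seen.pop()
    if CE in seen:
        if NOT_ECT in seen:
            # Setting CE is not allowed here, so dropping is the only option.
            return None
        # Either action is allowed; this sketch keeps the packet and sets CE.
        return CE
    # Mixture of ECT(0), ECT(1) and/or Not-ECT without CE: unspecified.
    raise ValueError("reassembly behavior not specified for this mixture")
```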
In situations where more precise reassembly behavior would be required, protocol specifications SHOULD instead specify that DF MUST be set in all packets sent by the protocol.

6. Support from the Transport Protocol

ECN requires support from the transport protocol, in addition to the functionality given by the ECN field in the IP packet header. The transport protocol might require negotiation between the endpoints during setup to determine that all of the endpoints are ECN-capable, so that the sender can set the ECT codepoint in transmitted packets. Second, the transport protocol must be capable of reacting appropriately to the receipt of CE packets. This reaction could be in the form of the data receiver informing the data sender of the received CE packet (e.g., TCP), of the data receiver unsubscribing to a layered multicast group (e.g., RLM [MJV96]), or of some other action that ultimately reduces the arrival rate of that flow on that congested link. CE packets indicate persistent rather than transient congestion (see Section 5.1), and hence reactions to the receipt of CE packets should be those appropriate for persistent congestion.

This document only addresses the addition of ECN-Capability to TCP, leaving issues of ECN in other transport protocols to further research. For TCP, ECN requires three new pieces of functionality: negotiation between the endpoints during connection setup to determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP header so that the data receiver can inform the data sender when a CE packet has been received; and a Congestion Window Reduced (CWR) flag in the TCP header so that the data sender can inform the data receiver that the congestion window has been reduced. The support required from other transport protocols is likely to be different, particularly for unreliable or reliable multicast transport protocols, and will have to be determined as other transport protocols are brought to the IETF for standardization.

6.1. TCP

The following sections describe in detail the proposed use of ECN in TCP. This proposal is described in essentially the same form in [Floyd94]. We assume that the source TCP uses the standard congestion control algorithms of Slow-start, Fast Retransmit and Fast Recovery [RFC 2001].

This proposal specifies two new flags in the Reserved field of the TCP header. The TCP mechanism for negotiating ECN-Capability uses the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved field of the TCP header is designated as the ECN-Echo flag. The location of the 6-bit Reserved field in the TCP header is shown in Figure 3 of RFC 793 [RFC793] (and is reproduced below for completeness). This specification of the ECN Field leaves the Reserved field as a 4-bit field using bits 4-7.

To enable the TCP receiver to determine when to stop setting the ECN-Echo flag, we introduce a second new flag in the TCP header, the CWR flag. The CWR flag is assigned to Bit 8 in the Reserved field of the TCP header.

     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |               |                       | U | A | P | R | S | F |
   | Header Length |        Reserved       | R | C | S | S | Y | I |
   |               |                       | G | K | H | T | N | N |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

   Figure 3: The old definition of bytes 13 and 14 of the TCP header.

     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |               |               | C | E | U | A | P | R | S | F |
   | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
   |               |               | R | E | G | K | H | T | N | N |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

   Figure 4: The new definition of bytes 13 and 14 of the TCP Header.

Thus, ECN uses the ECT and CE flags in the IP header (as shown in Figure 1) for signaling between routers and connection endpoints, and uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure 4) for TCP-endpoint to TCP-endpoint signaling.
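For readers who want to test the two flags in a raw header, a minimal sketch follows. It assumes the bit numbering of the figure above counts from the most significant bit of the two-byte field, which starts at byte offset 12 (counting from zero) of the TCP header; the names `bit_mask` and `ecn_tcp_flags` are hypothetical.

```python
import struct


def bit_mask(n):
    """Mask for bit n of the 16-bit field, with bit 0 as the most significant."""
    return 1 << (15 - n)


CWR_MASK = bit_mask(8)   # 0x0080
ECE_MASK = bit_mask(9)   # 0x0040


def ecn_tcp_flags(tcp_header):
    """Extract the CWR and ECE flags from the first 14 or more header bytes."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return {"CWR": bool(word & CWR_MASK), "ECE": bool(word & ECE_MASK)}
```

For example, a header whose two bytes at offsets 12 and 13 are 0x50 and 0xC2 (header length 5, with CWR, ECE and SYN set) yields both flags as true.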
For a TCP connection, a typical sequence of events in an ECN-based reaction to congestion is as follows:

* An ECT codepoint is set in packets transmitted by the sender to indicate that ECN is supported by the transport entities for these packets.

* An ECN-capable router detects impending congestion and detects that an ECT codepoint is set in the packet it is about to drop. Instead of dropping the packet, the router chooses to set the CE codepoint in the IP header and forwards the packet.

* The receiver receives the packet with the CE codepoint set, and sets the ECN-Echo flag in its next TCP ACK sent to the sender.

* The sender receives the TCP ACK with ECN-Echo set, and reacts to the congestion as if a packet had been dropped.

* The sender sets the CWR flag in the TCP header of the next packet sent to the receiver to acknowledge its receipt of and reaction to the ECN-Echo flag.

The negotiation for using ECN by the TCP transport entities and the use of the ECN-Echo and CWR flags are described in more detail in the sections below.

6.1.1. TCP Initialization

In the TCP connection setup phase, the source and destination TCPs exchange information about their willingness to use ECN. Subsequent to the completion of this negotiation, the TCP sender sets an ECT codepoint in the IP header of data packets to indicate to the network that the transport is capable and willing to participate in ECN for this packet. This indicates to the routers that they may mark this packet with the CE codepoint, if they would like to use that as a method of congestion notification. If the TCP connection does not wish to use ECN notification for a particular packet, the sending TCP sets the ECN codepoint to not-ECT, and the TCP receiver ignores the CE codepoint in the received packet.

For this discussion, we designate the initiating host as Host A and the responding host as Host B.
We call a SYN packet with the ECE and CWR flags set an "ECN-setup SYN packet", and we call a SYN packet with at least one of the ECE and CWR flags not set a "non-ECN-setup SYN packet". Similarly, we call a SYN-ACK packet with only the ECE flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and we call a SYN-ACK packet with any other configuration of the ECE and CWR flags a "non-ECN-setup SYN-ACK packet".

Before a TCP connection can use ECN, Host A sends an ECN-setup SYN packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN packet, the setting of both ECE and CWR in the ECN-setup SYN packet is defined as an indication that the sending TCP is ECN-Capable, rather than as an indication of congestion or of response to congestion. More precisely, an ECN-setup SYN packet indicates that the TCP implementation transmitting the SYN packet will participate in ECN as both a sender and receiver. Specifically, as a receiver, it will respond to incoming data packets that have the CE codepoint set in the IP header by setting ECE in outgoing TCP Acknowledgement (ACK) packets. As a sender, it will respond to incoming packets that have ECE set by reducing the congestion window and setting CWR when appropriate. An ECN-setup SYN packet does not commit the TCP sender to setting the ECT codepoint in any or all of the packets it may transmit. However, the commitment to respond appropriately to incoming packets with the CE codepoint set remains even if the TCP sender in a later transmission, within this TCP connection, sends a SYN packet without ECE and CWR set.

When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an indication that the TCP transmitting the SYN-ACK packet is ECN-Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does not commit the TCP host to setting the ECT codepoint in transmitted packets.
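The four packet categories just defined reduce to two small predicates on the ECE and CWR flags. The function names below are hypothetical, and the last helper covers only the simplest case of a single SYN and a single SYN-ACK.

```python
def is_ecn_setup_syn(ece, cwr):
    """An ECN-setup SYN has both ECE and CWR set."""
    return ece and cwr


def is_ecn_setup_syn_ack(ece, cwr):
    """An ECN-setup SYN-ACK has ECE set and CWR not set."""
    return ece and not cwr


def both_ends_ecn_capable(syn_flags, syn_ack_flags):
    """True if Host A sent an ECN-setup SYN and Host B an ECN-setup SYN-ACK.

    Each argument is an (ece, cwr) pair of booleans.
    """
    return is_ecn_setup_syn(*syn_flags) and is_ecn_setup_syn_ack(*syn_ack_flags)
```

Note that a SYN-ACK carrying both ECE and CWR falls under "any other configuration" and is therefore a non-ECN-setup SYN-ACK.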
The following rules apply to the sending of ECN-setup packets:

* If a host has received an ECN-setup SYN packet, then it MAY send an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup SYN-ACK packet.

* A host MUST NOT set ECT on data packets unless it has sent at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on data packets.

* If a host ever sets the ECT codepoint on a data packet, then that host MUST correctly set/clear the CWR TCP bit on all subsequent packets in the connection.

* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN-ACK packet, then if that host receives TCP data packets with ECT and CE codepoints set in the IP header, then that host MUST process these packets as specified for an ECN-capable connection.

* A host that is not willing to use ECN on a TCP connection SHOULD clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN-ACK packets that it sends to indicate this unwillingness. Receivers MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK packets.

* A host MUST NOT set ECT on SYN or SYN-ACK packets.

6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field

There is the question of why we chose to have the TCP sending the SYN set two ECN-related flags in the Reserved field of the TCP header for the SYN packet, while the responding TCP sending the SYN-ACK sets only one ECN-related flag in the SYN-ACK packet. This asymmetry is necessary for the robust negotiation of ECN-capability with some deployed TCP implementations.
There exists at least one faulty TCP implementation in which TCP receivers set the Reserved field of the TCP header in ACK packets (and hence the SYN-ACK) simply to reflect the Reserved field of the TCP header in the received data packet. Because the TCP SYN packet sets the ECN-Echo and CWR flags to indicate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP correctly interprets a receiver's reflection of its own flags in the Reserved field as an indication that the receiver is not ECN-capable. The sending TCP is not misled by a faulty TCP implementation sending a SYN-ACK packet that simply reflects the Reserved field of the incoming SYN packet.

6.1.2. The TCP Sender

For a TCP connection using ECN, new data packets are transmitted with an ECT codepoint set in the IP header. When only one ECT codepoint is needed by a sender for all packets sent on a TCP connection, ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with the ECN-Echo flag set in the TCP header), then the sender knows that congestion was encountered in the network on the path from the sender to the receiver. The indication of congestion should be treated just as a congestion loss in non-ECN-Capable TCP. That is, the TCP source halves the congestion window "cwnd" and reduces the slow start threshold "ssthresh". The sending TCP SHOULD NOT increase the congestion window in response to the receipt of an ECN-Echo ACK packet.

TCP should not react to congestion indications more than once every window of data (or more loosely, more than once every round-trip time). That is, the TCP sender's congestion window should be reduced only once in response to a series of dropped and/or CE packets from a single window of data. In addition, the TCP source should not decrease the slow-start threshold, ssthresh, if it has been decreased within the last round trip time.
However, if any retransmitted packets are dropped, then this is interpreted by the source TCP as a new instance of congestion.

After the source TCP reduces its congestion window in response to a CE packet, incoming acknowledgements that continue to arrive can "clock out" outgoing packets as allowed by the reduced congestion window. If the congestion window consists of only one MSS (maximum segment size), and the sending TCP receives an ECN-Echo ACK packet, then the sending TCP should in principle still halve its congestion window. However, the value of the congestion window is bounded below by a value of one MSS. If the sending TCP were to continue to send, using a congestion window of 1 MSS, this would result in the transmission of one packet per round-trip time. It is necessary to still reduce the sending rate of the TCP sender even further, on receipt of an ECN-Echo packet when the congestion window is one. We use the retransmit timer as a means of reducing the rate further in this circumstance. Therefore, the sending TCP MUST reset the retransmit timer on receiving the ECN-Echo packet when the congestion window is one. The sending TCP will then be able to send a new packet only when the retransmit timer expires.

When an ECN-Capable TCP sender reduces its congestion window for any reason (because of a retransmit timeout, a Fast Retransmit, or in response to an ECN Notification), the TCP sender sets the CWR flag in the TCP header of the first new data packet sent after the window reduction. If that data packet is dropped in the network, then the sending TCP will have to reduce the congestion window again and retransmit the dropped packet.

We ensure that the "Congestion Window Reduced" information is reliably delivered to the TCP receiver.
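The sender-side reaction described in this section might be sketched as below. The class and field names are hypothetical. Two details are assumptions of the sketch rather than requirements stated here: "once per window" is tracked by remembering the next sequence number to be sent at the time of the last reduction, and ssthresh is simply set to the reduced window.

```python
from dataclasses import dataclass


@dataclass
class EcnSenderSketch:
    mss: int                     # maximum segment size, in bytes
    cwnd: int                    # congestion window, in bytes
    ssthresh: int                # slow start threshold, in bytes
    snd_nxt: int = 0             # next sequence number to be sent
    reduce_mark: int = 0         # snd_nxt at the time of the last reduction
    cwr_pending: bool = False    # CWR still owed on the next new data packet
    wait_for_rto: bool = False   # hold new data until the retransmit timer fires

    def on_ecn_echo_ack(self, ack_seq):
        """React to an ACK carrying ECN-Echo; return True if cwnd was reduced."""
        if ack_seq <= self.reduce_mark:
            # Still acknowledging the window we already reacted to.
            return False
        if self.cwnd <= self.mss:
            # Window cannot shrink below one MSS: reset the retransmit timer
            # and send nothing new until it expires.
            self.wait_for_rto = True
        self.cwnd = max(self.cwnd // 2, self.mss)
        self.ssthresh = self.cwnd          # assumed choice for this sketch
        self.reduce_mark = self.snd_nxt
        self.cwr_pending = True
        return True

    def flags_for_new_data(self):
        """TCP flags for the next new (not retransmitted) data packet."""
        cwr = self.cwr_pending
        self.cwr_pending = False           # CWR goes on the first packet only
        return {"CWR": cwr}
```

A caller would invoke `on_ecn_echo_ack` for each arriving ACK with ECE set and `flags_for_new_data` each time a new data segment is built.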
Reliable delivery of the CWR information comes about from the fact that if the new data packet carrying the CWR flag is dropped, then the TCP sender will have to again reduce its congestion window, and send another new data packet with the CWR flag set. Thus, the CWR bit in the TCP header SHOULD NOT be set on retransmitted packets.

When the TCP data sender is ready to set the CWR bit after reducing the congestion window, it SHOULD set the CWR bit only on the first new data packet that it transmits.

[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] discusses the validation test in the ns simulator, which illustrates a wide range of ECN scenarios. These scenarios include the following: an ECN followed by another ECN, a Fast Retransmit, or a Retransmit Timeout; a Retransmit Timeout or a Fast Retransmit followed by an ECN; and a congestion window of one packet followed by an ECN.

TCP follows existing algorithms for sending data packets in response to incoming ACKs, multiple duplicate acknowledgements, or retransmit timeouts [RFC2581]. TCP also follows the normal procedures for increasing the congestion window when it receives ACK packets without the ECN-Echo bit set [RFC2581].

6.1.3. The TCP Receiver

When TCP receives a CE data packet at the destination end-system, the TCP data receiver sets the ECN-Echo flag in the TCP header of the subsequent ACK packet. If there is any ACK withholding implemented, as in current "delayed-ACK" TCP implementations where the TCP receiver can send an ACK for two arriving data packets, then the ECN-Echo flag in the ACK packet will be set to '1' if the CE codepoint is set in any of the data packets being acknowledged. That is, if any of the received data packets are CE packets, then the returning ACK has the ECN-Echo flag set.

To provide robustness against the possibility of a dropped ACK packet carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in a series of ACK packets sent subsequently.
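The receiver rule of this section (start setting ECN-Echo once a CE data packet arrives, and keep setting it until a packet with the CWR flag arrives) amounts to a one-bit latch. The sketch below is illustrative only. The class name is hypothetical, and the handling of a single packet that carries both CWR and CE (clear first, then set again) is an assumption of the sketch.

```python
class EcnReceiverSketch:
    """One-bit ECN-Echo latch for the data receiver."""

    def __init__(self):
        self.echo = False

    def on_data(self, ce, cwr):
        """Process one arriving in-window data packet.

        ce:  True if the IP header carries the CE codepoint
        cwr: True if the TCP header has the CWR flag set
        """
        if cwr:
            self.echo = False   # sender reports that it reduced its window
        if ce:
            self.echo = True    # new congestion indication to be echoed

    def ack_flags(self):
        """TCP flags for the next ACK, whether immediate or delayed."""
        return {"ECE": self.echo}
```

With delayed ACKs, calling `on_data` once per arriving packet and `ack_flags` once per ACK sets ECN-Echo whenever any of the acknowledged packets carried CE.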
The TCP receiver uses the CWR flag received from the TCP sender to determine when to stop setting the ECN-Echo flag.

After a TCP receiver sends an ACK packet with the ECN-Echo bit set, that TCP receiver continues to set the ECN-Echo flag in all the ACK packets it sends (whether they acknowledge CE data packets or non-CE data packets) until it receives a CWR packet (a packet with the CWR flag set). After the receipt of the CWR packet, acknowledgements for subsequent non-CE data packets do not have the ECN-Echo flag set. If another CE packet is received by the data receiver, the receiver would once again send ACK packets with the ECN-Echo flag set. While the receipt of a CWR packet does not guarantee that the data sender received the ECN-Echo message, this does suggest that the data sender reduced its congestion window at some point *after* it sent the data packet for which the CE codepoint was set.

We have already specified that a TCP sender is not required to reduce its congestion window more than once per window of data. Some care is required if the TCP sender is to avoid unnecessary reductions of the congestion window when a window of data includes both dropped packets and (marked) CE packets. This is illustrated in [Floyd98].

6.1.4. Congestion on the ACK-path

For the current generation of TCP congestion control algorithms, pure acknowledgement packets (i.e., packets that do not contain any accompanying data) should be sent with the not-ECT codepoint. Current TCP receivers have no mechanisms for reducing traffic on the ACK-path in response to congestion notification. Mechanisms for responding to congestion on the ACK-path are areas for current and future research. (One simple possibility would be for the sender to reduce its congestion window when it receives a pure ACK packet with the CE codepoint set).
For current TCP implementations, a single dropped ACK generally has only a very small effect on the TCP's sending rate.

6.1.5. Retransmitted TCP packets

This document specifies that ECN-capable TCP implementations MUST NOT set either ECT codepoint (ECT(0) or ECT(1)) in the IP header for retransmitted data packets, and that the TCP data receiver SHOULD ignore the ECN field on arriving data packets that are outside of the receiver's current window. This is for greater security against denial-of-service attacks, as well as for robustness of the ECN congestion indication with packets that are dropped later in the network.

First, we note that if the TCP sender were to set an ECT codepoint on a retransmitted packet, then if an unnecessarily-retransmitted packet was later dropped in the network, the end nodes would never receive the indication of congestion from the router setting the CE codepoint. Thus, setting an ECT codepoint on retransmitted data packets is not consistent with the robust delivery of the congestion indication even for packets that are later dropped in the network.

In addition, an attacker capable of spoofing the IP source address of the TCP sender could send data packets with arbitrary sequence numbers, with the CE codepoint set in the IP header. On receiving this spoofed data packet, the TCP data receiver would determine that the data does not lie in the current receive window, and return a duplicate acknowledgement. We define an out-of-window packet at the TCP data receiver as a data packet that lies outside the receiver's current window. On receiving an out-of-window packet, the TCP data receiver has to decide whether or not to treat the CE codepoint in the packet header as a valid indication of congestion, and therefore whether to return ECN-Echo indications to the TCP data sender.
If the TCP data receiver ignored the CE codepoint in an out-of-window packet, then the TCP data sender would not receive this possibly-legitimate indication of congestion from the network, resulting in a violation of end-to-end congestion control. On the other hand, if the TCP data receiver honors the CE indication in the out-of-window packet, and reports the indication of congestion to the TCP data sender, then the malicious node that created the spoofed, out-of-window packet has successfully "attacked" the TCP connection by forcing the data sender to unnecessarily reduce (halve) its congestion window. To prevent such a denial-of-service attack, we specify that a legitimate TCP data sender MUST NOT set an ECT codepoint on retransmitted data packets, and that the TCP data receiver SHOULD ignore the CE codepoint on out-of-window packets.

One drawback of not setting ECT(0) or ECT(1) on retransmitted packets is that it denies ECN protection for retransmitted packets. However, for an ECN-capable TCP connection in a fully-ECN-capable environment with mild congestion, packets should rarely be dropped due to congestion in the first place, and so instances of retransmitted packets should rarely arise. If packets are being retransmitted, then there are already packet losses (from corruption or from congestion) that ECN has been unable to prevent.

We note that if the router sets the CE codepoint for an ECN-capable data packet within a TCP connection, then the TCP connection is guaranteed to receive that indication of congestion, or to receive some other indication of congestion within the same window of data, even if this packet is dropped or reordered in the network. We consider two cases, when the packet is later retransmitted, and when the packet is not later retransmitted.
In the first case, if the packet is either dropped or delayed, and at some point retransmitted by the data sender, then the retransmission is a result of a Fast Retransmit or a Retransmit Timeout for either that packet or for some prior packet in the same window of data. In this case, because the data sender already has retransmitted this packet, we know that the data sender has already responded to an indication of congestion for some packet within the same window of data as the original packet. Thus, even if the first transmission of the packet is dropped in the network, or is delayed, if it had the CE codepoint set, and is later ignored by the data receiver as an out-of-window packet, this is not a problem, because the sender has already responded to an indication of congestion for that window of data.

In the second case, if the packet is never retransmitted by the data sender, then this data packet is the only copy of this data received by the data receiver, and therefore arrives at the data receiver as an in-window packet, regardless of how much the packet might be delayed or reordered. In this case, if the CE codepoint is set on the packet within the network, this will be treated by the data receiver as a valid indication of congestion.

6.1.6. TCP Window Probes.

When the TCP data receiver advertises a zero window, the TCP data sender sends window probes to determine if the receiver's window has increased. Window probe packets do not contain any user data except for the sequence number, which is a byte. If a window probe packet is dropped in the network, this loss is not detected by the receiver. Therefore, the TCP data sender MUST NOT set either an ECT codepoint or the CWR bit on window probe packets.

However, because window probes use exact sequence numbers, they cannot be easily spoofed in denial-of-service attacks.
Therefore, if a window probe arrives with the CE codepoint set, then the receiver SHOULD respond to the ECN indications.

7. Non-compliance by the End Nodes

This section discusses concerns about the vulnerability of ECN to non-compliant end-nodes (i.e., end nodes that set the ECT codepoint in transmitted packets but do not respond to received CE packets). We argue that the addition of ECN to the IP architecture will not significantly increase the current vulnerability of the architecture to unresponsive flows.

Even for non-ECN environments, there are serious concerns about the damage that can be done by non-compliant or unresponsive flows (that is, flows that do not respond to congestion control indications by reducing their arrival rate at the congested link). For example, an end-node could "turn off congestion control" by not reducing its congestion window in response to packet drops. This is a concern for the current Internet. It has been argued that routers will have to deploy mechanisms to detect and differentially treat packets from non-compliant flows [RFC2309, FF99]. It has also been suggested that techniques such as end-to-end per-flow scheduling and isolation of one flow from another, differentiated services, or end-to-end reservations could remove some of the more damaging effects of unresponsive flows.

It might seem that dropping packets in itself is an adequate deterrent for non-compliance, and that the use of ECN removes this deterrent. We would argue in response that (1) ECN-capable routers preserve packet-dropping behavior in times of high congestion; and (2) even in times of high congestion, dropping packets in itself is not an adequate deterrent for non-compliance.

First, ECN-Capable routers will only mark packets (as opposed to dropping them) when the packet marking rate is reasonably low.
During periods where the average queue size exceeds an upper threshold, and therefore the potential packet marking rate would be high, ourrecom- mendationrecommendation is that routers drop packets rather then set the CEbitcodepoint in packet headers. During the periods of low or moderate packet marking rates when ECN would be deployed, there would be little deterrent effect onunre- sponsiveunresponsive flows of dropping rather than marking those packets. For example, delay-insensitive flows using reliable delivery might have an incentive to increase rather than to decrease their sending rate in the presence of dropped packets. Similarly, delay-sensitive flows using unreliable delivery might increase their use of FEC in response to an increased packet drop rate, increasing rather than decreasing their sending rate. For the same reasons, we do not believe that packet dropping itself is an effective deterrent for non-compliance even in an environment of high packet drop rates, when all flows are sharing the same packet drop rate. Several methods have been proposed to identify and restrictnon-com- pliantnon- compliant or unresponsive flows. The addition of ECN to the network environment would not in any way increase the difficulty of designing and deploying such mechanisms. If anything, the addition of ECN to the architecture would make the job of identifying unresponsive flows slightly easier. For example, in an ECN-Capable environment routers are not limited to information about packets that are dropped or have the CEbitcodepoint set at that router itself; in such an environment, routers could also take note of arriving CE packets that indicate congestion encountered by that packet earlier in the path. 8. 
Non-compliance in the Network

This section considers the issues when a router is operating, possibly maliciously, to modify either of the bits in the ECN field. By tampering with the bits in the ECN field, an adversary (or a broken router) could do one or more of the following: falsely report congestion, disable ECN-Capability for an individual packet, erase the ECN congestion indication, or falsely indicate ECN-Capability. Section 18 systematically examines the various cases by which the ECN field could be modified. The important criterion considered in determining the consequences of such modifications is whether they are likely to lead to poorer behavior in any dimension (throughput, delay, fairness or functionality) than if a router were to drop a packet.

The first two possible changes, falsely reporting congestion or disabling ECN-Capability for an individual packet, are no worse than if the router were to simply drop the packet. From a congestion control point of view, setting the CE codepoint in the absence of congestion by a non-compliant router would be no worse than a router dropping a packet unnecessarily. By "erasing" an ECT codepoint of a packet that is later dropped in the network, a router's actions could result in an unnecessary packet drop for that packet later in the network.

However, as discussed in Section 18, a router that erases the ECN congestion indication or falsely indicates ECN-Capability could potentially do more damage to the flow than if it had simply dropped the packet. A rogue or broken router that "erased" the CE codepoint in arriving CE packets would prevent that indication of congestion from reaching downstream receivers.
This could result in the failure of congestion control for that flow and a resulting increase in congestion in the network, ultimately resulting in subsequent packets dropped for this flow as the average queue size increased at the congested gateway.

Section 19 considers the potential repercussions of subverting end-to-end congestion control by either falsely indicating ECN-Capability, or by erasing the congestion indication in ECN (the CE-codepoint). We observe in Section 19 that the consequence of subverting ECN-based congestion control may lead to potential unfairness, but this is likely to be no worse than the subversion of either ECN-based or packet-based congestion control by the end nodes.

8.1. Complications Introduced by Split Paths

If a router or other network element has access to all of the packets of a flow, then that router could do no more damage to a flow by altering the ECN field than it could by simply dropping all of the packets from that flow. However, in some cases, a malicious or broken router might have access to only a subset of the packets from a flow. The question is as follows: can this router, by altering the ECN field in this subset of the packets, do more damage to that flow than if it had simply dropped that set of the packets?

This is also discussed in detail in Section 18, which concludes as follows: It is true that the adversary that has access only to a subset of packets in an aggregate might, by subverting ECN-based congestion control, be able to deny the benefits of ECN to the other packets in the aggregate. While this is undesirable, this is not a sufficient concern to result in disabling ECN.

9. Encapsulated Packets

9.1. IP packets encapsulated in IP

The encapsulation of IP packet headers in tunnels is used in many places, including IPsec and IP in IP [RFC2003].
This section considers issues related to interactions between ECN and IP tunnels, and specifies two alternative solutions. This discussion is complemented by RFC 2983's discussion of interactions between Differentiated Services and IP tunnels of various forms [RFC 2983], as Differentiated Services uses the remaining six bits of the IP header octet that is used by ECN (see Figure 2 in Section 5).

Some IP tunnel modes are based on adding a new "outer" IP header that encapsulates the original, or "inner" IP header and its associated packet. In many cases, the new "outer" IP header may be added and removed at intermediate points along a connection, enabling the network to establish a tunnel without requiring endpoint participation. We denote tunnels that specify that the outer header be discarded at tunnel egress as "simple tunnels".

ECN uses the ECN field in the IP header for signaling between routers and connection endpoints. ECN interacts with IP tunnels based on the treatment of the ECN field in the IP header. In simple IP tunnels the octet containing the ECN field is copied or mapped from the inner IP header to the outer IP header at IP tunnel ingress, and the outer header's copy of this field is discarded at IP tunnel egress. If the outer header were to be simply discarded without taking care to deal with the ECN field, and an ECN-capable router were to set the CE (Congestion Experienced) codepoint within a packet in a simple IP tunnel, this indication would be discarded at tunnel egress, losing the indication of congestion.

Thus, the use of ECN over simple IP tunnels would result in routers attempting to use the outer IP header to signal congestion to endpoints, but those congestion warnings never arriving because the outer header is discarded at the tunnel egress point.
This problem was encountered with ECN and IPsec in tunnel mode, and RFC 2481 recommended that ECN not be used with the older simple IPsec tunnels in order to avoid this behavior and its consequences. When ECN becomes widely deployed, then simple tunnels likely to carry ECN-capable traffic will have to be changed.

From a security point of view, the use of ECN in the outer header of an IP tunnel might raise security concerns because an adversary could tamper with the ECN information that propagates beyond the tunnel endpoint. Based on an analysis in Sections 18 and 19 of these concerns and the resultant risks, our overall approach is to make support for ECN an option for IP tunnels, so that an IP tunnel can be specified or configured either to use ECN or not to use ECN in the outer header of the tunnel. Thus, in environments or tunneling protocols where the risks of using ECN are judged to outweigh its benefits, the tunnel can simply not use ECN in the outer header. Then the only indication of congestion experienced at routers within the tunnel would be through packet loss.

The result is that there are two viable options for the behavior of ECN-capable connections over an IP tunnel, especially IPsec tunnels:

* A limited-functionality option in which ECN is preserved in the inner header, but disabled in the outer header. The only mechanism available for signaling congestion occurring within the tunnel in this case is dropped packets.

* A full-functionality option that supports ECN in both the inner and outer headers, and propagates congestion warnings from nodes within the tunnel to endpoints.

Support for these options requires varying amounts of changes to IP header processing at tunnel ingress and egress. A small subset of these changes sufficient to support only the limited-functionality option would be sufficient to eliminate any incompatibility between ECN and IP tunnels.
One goal of this document is to give guidance about the tradeoffs between the limited-functionality and full-functionality options. A full discussion of the potential effects of an adversary's modifications of the ECN field is given in Sections 18 and 19.

9.1.1. The Limited-functionality and Full-functionality Options

The limited-functionality option for ECN encapsulation in IP tunnels is for the not-ECT codepoint to be set in the outside (encapsulating) header regardless of the value of the ECN field in the inside (encapsulated) header. With this option, the ECN field in the inner header is not altered upon de-capsulation. The disadvantage of this approach is that the flow does not have ECN support for that part of the path that is using IP tunneling, even if the encapsulated packet (from the original TCP sender) is ECN-Capable. That is, if the encapsulated packet arrives at a congested router that is ECN-capable, and the router can decide to drop or mark the packet as an indication of congestion to the end nodes, the router will not be permitted to set the CE codepoint in the packet header, but instead will have to drop the packet.

The full-functionality option for ECN encapsulation is to copy the ECN codepoint of the inside header to the outside header on encapsulation if the inside header is not-ECT or ECT, and to set the ECN codepoint of the outside header to ECT(0) if the ECN codepoint of the inside header is CE. On decapsulation, if the CE codepoint is set on the outside header, then the CE codepoint is also set in the inner header. Otherwise, the ECN codepoint on the inner header is left unchanged. That is, for full ECN support the encapsulation and decapsulation processing involves the following: At tunnel ingress, the full-functionality option sets the ECN codepoint in the outer header.
If the ECN codepoint in the inner header is not-ECT or ECT, then it is copied to the ECN codepoint in the outer header. If the ECN codepoint in the inner header is CE, then the ECN codepoint in the outer header is set to ECT(0). Upon decapsulation at the tunnel egress, the full-functionality option sets the CE codepoint in the inner header if the CE codepoint is set in the outer header. Otherwise, no change is made to this field of the inner header.

With the full-functionality option, a flow can take advantage of ECN in those parts of the path that might use IP tunneling. The disadvantage of the full-functionality option from a security perspective is that the IP tunnel cannot protect the flow from certain modifications to the ECN bits in the IP header within the tunnel. The potential dangers from modifications to the ECN bits in the IP header are described in detail in Sections 18 and 19.

(1) An IP tunnel MUST modify the handling of the DS field octet at IP tunnel endpoints by implementing either the limited-functionality or the full-functionality option.

(2) Optionally, an IP tunnel MAY enable the endpoints of an IP tunnel to negotiate the choice between the limited-functionality and the full-functionality option for ECN in the tunnel.

The minimum required to make ECN usable with IP tunnels is the limited-functionality option, which prevents ECN from being enabled in the outer header of an IPsec tunnel. Full support for ECN requires the use of the full-functionality option.
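
As an illustration only, the ingress and egress rules of Section 9.1.1 can be sketched in Python. The names Ecn, TunnelOption, encapsulate and decapsulate are hypothetical and are not defined by this document. The sketch works on symbolic codepoints rather than header bits, and it does not model the RECOMMENDED dropping of inconsistent packets at tunnel egress.

```python
from enum import Enum


class Ecn(Enum):
    """Symbolic ECN codepoints (illustrative names, not wire encodings)."""
    NOT_ECT = "not-ECT"
    ECT0 = "ECT(0)"
    ECT1 = "ECT(1)"
    CE = "CE"


class TunnelOption(Enum):
    """The two tunnel behaviours described in Section 9.1.1."""
    LIMITED = "limited-functionality"
    FULL = "full-functionality"


def encapsulate(inner, option):
    """Return the ECN codepoint to place in the outer header at ingress."""
    if option is TunnelOption.LIMITED:
        # Limited-functionality: the outer header is always not-ECT.
        return Ecn.NOT_ECT
    if inner is Ecn.CE:
        # Full-functionality: an inner CE is carried as ECT(0) outside.
        return Ecn.ECT0
    # Full-functionality: not-ECT and ECT codepoints are copied.
    return inner


def decapsulate(inner, outer, option):
    """Return the ECN codepoint of the inner header after egress."""
    if option is TunnelOption.FULL and outer is Ecn.CE:
        # Congestion marked inside the tunnel is propagated inward.
        return Ecn.CE
    # Otherwise the inner header is left unchanged.
    return inner
```

For example, under these assumptions a packet whose inner header carries ECT(1) enters a full-functionality tunnel with ECT(1) in the outer header, and if a router inside the tunnel sets CE on the outer header, decapsulate() yields CE for the inner header.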
If there are no optional mechanisms for the tunnel endpoints to negotiate a choice between the limited-functionality or full-functionality option, there can be a pre-existing agreement between the tunnel endpoints about whether to support the limited-functionality or the full-functionality ECN option.

In addition, it is RECOMMENDED that packets with the CE codepoint in the outer header be dropped if they arrive at the tunnel egress point for a tunnel that uses the limited-functionality option, or for a tunnel that uses the full-functionality option but for which the not-ECT codepoint is set in the inner header. This is motivated by backwards compatibility and to ensure that no unauthorized modifications of the ECN field take place, and is discussed further in the next Section (9.1.2).

9.1.2. Changes to the ECN Field within an IP Tunnel.

The presence of a copy of the ECN field in the inner header of an IP tunnel mode packet provides an opportunity for detection of unauthorized modifications to the ECN field in the outer header. Comparison of the ECN fields in the inner and outer headers falls into two categories for implementations that conform to this document:

* If the IP tunnel uses the full-functionality option, then the not-ECT codepoint should be set in the outer header if and only if it is also set in the inner header.

* If the tunnel uses the limited-functionality option, then the not-ECT codepoint should be set in the outer header.

Receipt of a packet not satisfying the appropriate condition could be a cause of concern.

Consider the case of an IP tunnel where the tunnel ingress point has not been updated to this document's requirements, while the tunnel egress point has been updated to support ECN.
In this case, the IP tunnel is not explicitly configured to support the full-functionality ECN option. However, the tunnel ingress point is behaving identically to a tunnel ingress point that supports the full-functionality option. If packets from an ECN-capable connection use this tunnel, the ECT codepoint will be set in the outer header at the tunnel ingress point. Congestion within the tunnel may then result in ECN-capable routers setting CE in the outer header. Because the tunnel has not been explicitly configured to support the full-functionality option, the tunnel egress point expects the not-ECT codepoint to be set in the outer header. When an ECN-capable tunnel egress point receives a packet with the ECT or CE codepoint in the outer header, in a tunnel that has not been configured to support the full-functionality option, that packet should be processed, according to whether the CE codepoint was set, as follows. It is RECOMMENDED that on a tunnel that has not been configured to support the full-functionality option, packets should be dropped at the egress point if the CE codepoint is set in the outer header but not in the inner header, and should be forwarded otherwise.

An IP tunnel cannot provide protection against erasure of congestion indications based on changing the ECN codepoint from CE to ECT. The erasure of congestion indications may impact the network and other flows in ways that would not be possible in the absence of ECN.
It is important to note that erasure of congestion indications can only be performed to congestion indications placed by nodes within the tunnel; the copy of the ECN field in the inner header preserves congestion notifications from nodes upstream of the tunnel ingress (unless the inner header is also erased). If erasure of congestion notifications is judged to be a security risk that exceeds the congestion management benefits of ECN, then tunnels could be specified or configured to use the limited-functionality option.

9.2. IPsec Tunnels

IPsec supports secure communication over potentially insecure network components such as intermediate routers. IPsec protocols support two operating modes, transport mode and tunnel mode, that span a wide range of security requirements and operating environments. Transport mode security protocol header(s) are inserted between the IP (IPv4 or IPv6) header and higher layer protocol headers (e.g., TCP), and hence transport mode can only be used for end-to-end security on a connection. IPsec tunnel mode is based on adding a new "outer" IP header that encapsulates the original, or "inner" IP header and its associated packet. Tunnel mode security headers are inserted between these two IP headers. In contrast to transport mode, the new "outer" IP header and tunnel mode security headers can be added and removed at intermediate points along a connection, enabling security gateways to secure vulnerable portions of a connection without requiring endpoint participation in the security protocols. An important aspect of tunnel mode security is that in the original specification, the outer header is discarded at tunnel egress, ensuring that security threats based on modifying the IP header do not propagate beyond that tunnel endpoint. Further discussion of IPsec can be found in [RFC2401].
The IPsec protocol as originally defined in [ESP, AH] required that the inner header's ECN field not be changed by IPsec decapsulation processing at a tunnel egress node; this would have ruled out the possibility of full-functionality mode for ECN. At the same time, this would ensure that an adversary's modifications to the ECN field cannot be used to launch theft- or denial-of-service attacks across an IPsec tunnel endpoint, as any such modifications will be discarded at the tunnel endpoint.

In principle, permitting the use of ECN functionality in the outer header of an IPsec tunnel raises security concerns because an adversary could tamper with the information that propagates beyond the tunnel endpoint. Based on an analysis (included in Sections 18 and 19) of these concerns and the associated risks, our overall approach has been to provide configuration support for IPsec changes to remove the conflict with ECN.

In particular, in tunnel mode the IPsec tunnel MUST support either the limited-functionality or the full-functionality mode outlined in Section 9.1.1.

This makes permission to use ECN functionality in the outer header of an IPsec tunnel a configurable part of the corresponding IPsec Security Association (SA), so that it can be disabled in situations where the risks are judged to outweigh the benefits. The result is that an IPsec security administrator is presented with two alternatives for the behavior of ECN-capable connections within an IPsec tunnel, the limited-functionality alternative and full-functionality alternative described earlier. All IPsec implementations MUST implement either the limited-functionality or the full-functionality alternative in order to eliminate incompatibility between ECN and IPsec tunnels, but implementers MAY choose to implement either alternative.
In addition, this document specifies how the endpoints of an IPsec tunnel could negotiate enabling ECN functionality in the outer headers of that tunnel based on security policy. The ability to negotiate ECN usage between tunnel endpoints would enable a security administrator to disable ECN in situations where she believes the risks (e.g., of lost congestion notifications) outweigh the benefits of ECN.

The IPsec protocol, as defined in [ESP, AH], does not include the IP header's ECN field in any of its cryptographic calculations (in the case of tunnel mode, the outer IP header's ECN field is not included). Hence modification of the ECN field by a network node has no effect on IPsec's end-to-end security, because it cannot cause any IPsec integrity check to fail. As a consequence, IPsec does not provide any defense against an adversary's modification of the ECN field (i.e., a man-in-the-middle attack), as the adversary's modification will also have no effect on IPsec's end-to-end security. In some environments, the ability to modify the ECN field without affecting IPsec integrity checks may constitute a covert channel; if it is necessary to eliminate such a channel or reduce its bandwidth, then the IPsec tunnel should be run in limited-functionality mode.

9.2.1. Negotiation between Tunnel Endpoints

This section describes the detailed changes to enable usage of ECN over IPsec tunnels, including the negotiation of ECN support between tunnel endpoints. This is supported by three changes to IPsec:

* An optional Security Association Database (SAD) field indicating whether tunnel encapsulation and decapsulation processing allows or forbids ECN usage in the outer IP header.

* An optional Security Association Attribute that enables negotiation of this SAD field between the two endpoints of an SA that supports tunnel mode.
* Changes to tunnel mode encapsulation and decapsulation processing to allow or forbid ECN usage in the outer IP header based on the value of the SAD field. When ECN usage is allowed in the outer IP header, the ECT codepoint is set in the outer header for ECN-capable connections and congestion notifications (indicated by the CE codepoint) from such connections are propagated to the inner header at tunnel egress.

If negotiation of ECN usage is implemented, then the SAD field SHOULD also be implemented. On the other hand, negotiation of ECN usage is OPTIONAL in all cases, even for implementations that support the SAD field. The encapsulation and decapsulation processing changes are REQUIRED, but MAY be implemented without the other two changes by assuming that ECN usage is always forbidden. The full-functionality alternative for ECN usage over IPsec tunnels consists of the SAD field and the full version of encapsulation and decapsulation processing changes, with or without the OPTIONAL negotiation support. The limited-functionality alternative consists of a subset of the encapsulation and decapsulation changes that always forbids ECN usage.

These changes are covered further in the following three subsections.

9.2.1.1. ECN Tunnel Security Association Database Field

Full ECN functionality adds a new field to the SAD (see [RFC2401]):

ECN Tunnel: allowed or forbidden.

Indicates whether ECN-capable connections using this SA in tunnel mode are permitted to receive ECN congestion notifications for congestion occurring within the tunnel. The allowed value enables ECN congestion notifications. The forbidden value disables such notifications, causing all congestion to be indicated via dropped packets.

[OPTIONAL. The value of this field SHOULD be assumed to be "forbidden" in implementations that do not support it.]
If this attribute is implemented, then the SA specification in a Security Policy Database (SPD) entry MUST support a corresponding attribute, and this SPD attribute MUST be covered by the SPD administrative interface (currently described in Section 4.4.1 of [RFC2401]).

9.2.1.2. ECN Tunnel Security Association Attribute

A new IPsec Security Association Attribute is defined to enable the support for ECN congestion notifications based on the outer IP header to be negotiated for IPsec tunnels (see [RFC2407]). This attribute is OPTIONAL, although implementations that support it SHOULD also support the SAD field defined in Section 9.2.1.1.

Attribute Type

        class               value           type
  -------------------------------------------------
  ECN Tunnel                 10             Basic

The IPsec SA Attribute value 10 has been allocated by IANA to indicate that the ECN Tunnel SA Attribute is being negotiated; the type of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class Values are used to conduct the negotiation. See [RFC2407, RFC2408, RFC2409] for further information including encoding formats and requirements for negotiating this SA attribute.

Class Values

ECN Tunnel

Specifies whether ECN functionality is allowed to be used with Tunnel Encapsulation Mode. This affects tunnel encapsulation and decapsulation processing - see Section 9.2.1.3.

    RESERVED          0
    Allowed           1
    Forbidden         2

Values 3-61439 are reserved to IANA. Values 61440-65535 are for private use.

If unspecified, the default shall be assumed to be Forbidden.

ECN Tunnel is a new SA attribute, and hence initiators that use it can expect to encounter responders that do not understand it, and therefore reject proposals containing it. For backwards compatibility with such implementations initiators SHOULD always also include a proposal without the ECN Tunnel attribute to enable such a responder to select a transform or proposal that does not contain the ECN Tunnel attribute.
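
As a non-normative illustration, the class values and the Forbidden default listed in Section 9.2.1.2 can be captured in a small lookup. The function name interpret_ecn_tunnel_value and the returned strings are hypothetical. Passing None stands for an SA in which the attribute was not specified, and the 0-65535 bound is taken from the value ranges given in this section.

```python
# Attribute type number allocated by IANA for the ECN Tunnel SA Attribute,
# as stated in the text of Section 9.2.1.2.
ECN_TUNNEL_ATTRIBUTE_TYPE = 10


def interpret_ecn_tunnel_value(value=None):
    """Map an ECN Tunnel class value to a descriptive label (hypothetical)."""
    if value is None:
        # Unspecified: the default is assumed to be Forbidden.
        return "forbidden"
    if not 0 <= value <= 65535:
        raise ValueError("class value outside the ranges listed in the text")
    if value == 0:
        return "reserved"
    if value == 1:
        return "allowed"
    if value == 2:
        return "forbidden"
    if value <= 61439:
        # Values 3-61439 are reserved to IANA.
        return "reserved-to-iana"
    # Values 61440-65535 are for private use.
    return "private-use"
```

How an implementation reacts to reserved or private-use values during negotiation is outside what this sketch covers.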
RFC 2407 currently requires responders to reject all proposals if any proposal contains an unknown attribute; this requirement is expected to be changed to require a responder not to select proposals or transforms containing unknown attributes.

9.2.1.3. Changes to IPsec Tunnel Header Processing

For full ECN support, the encapsulation and decapsulation processing for the IPv4 TOS field and the IPv6 Traffic Class field are changed from that specified in [RFC2401] to the following:

                      <-- How Outer Hdr Relates to Inner Hdr -->
                      Outer Hdr at                 Inner Hdr at
 IPv4                 Encapsulator                 Decapsulator
   Header fields:     --------------------         ------------
     DS Field         copied from inner hdr (5)    no change
     ECN Field        constructed (7)              constructed (8)
 IPv6
   Header fields:
     DS Field         copied from inner hdr (6)    no change
     ECN Field        constructed (7)              constructed (8)

(5)(6) If the packet will immediately enter a domain for which the DSCP value in the outer header is not appropriate, that value MUST be mapped to an appropriate value for the domain [RFC 2474]. Also see [RFC 2475] for further information.

(7) If the value of the ECN Tunnel field in the SAD entry for this SA is "allowed" and the ECN field in the inner header is set to any value other than CE, copy this ECN field to the outer header. If the ECN field in the inner header is set to CE, then set the ECN field in the outer header to ECT(0).

(8) If the value of the ECN tunnel field in the SAD entry for this SA is "allowed" and the ECN field in the inner header is set to ECT(0) or ECT(1) and the ECN field in the outer header is set to CE, then copy the ECN field from the outer header to the inner header. Otherwise, make no change to the ECN field in the inner header.
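
A minimal sketch of notes (7) and (8) at the level of the IPv4 TOS or IPv6 Traffic Class octet follows. It rests on assumptions that this section does not state: the ECN field is taken to be the two low-order bits of the octet, with the encodings not-ECT = 00, ECT(1) = 01, ECT(0) = 10 and CE = 11. The function names are hypothetical, the DSCP remapping of notes (5) and (6) is not modelled, and a "forbidden" SAD value is treated as the limited-functionality case (not-ECT in the outer header, inner header untouched).

```python
# Assumed two-bit encodings of the ECN field (low-order bits of the octet).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
ECN_MASK = 0b11


def outer_octet_at_encapsulator(inner_octet, ecn_tunnel="forbidden"):
    """Build the outer TOS/Traffic Class octet from the inner one."""
    ds_bits = inner_octet & ~ECN_MASK & 0xFF  # DS field copied from inner hdr
    inner_ecn = inner_octet & ECN_MASK
    if ecn_tunnel == "allowed":
        # Note (7): copy unless the inner field is CE, which maps to ECT(0).
        outer_ecn = ECT0 if inner_ecn == CE else inner_ecn
    else:
        # ECN usage forbidden: the outer header carries not-ECT.
        outer_ecn = NOT_ECT
    return ds_bits | outer_ecn


def inner_octet_at_decapsulator(inner_octet, outer_octet, ecn_tunnel="forbidden"):
    """Return the inner octet after decapsulation processing."""
    inner_ecn = inner_octet & ECN_MASK
    outer_ecn = outer_octet & ECN_MASK
    if ecn_tunnel == "allowed" and inner_ecn in (ECT0, ECT1) and outer_ecn == CE:
        # Note (8): propagate CE from the outer header; the DS field is unchanged.
        return (inner_octet & ~ECN_MASK & 0xFF) | CE
    # Otherwise make no change to the inner header.
    return inner_octet
```

Under these assumptions, an inner octet of 0b10111010 (ECT(0)) leaves the encapsulator unchanged when ECN usage is allowed, and becomes 0b10111000 in the outer header when it is forbidden.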
(5) and (6) are identical to match usage in [RFC2401], although they are different in [RFC2401].

The above description applies to implementations that support the ECN Tunnel field in the SAD; such implementations MUST implement this processing instead of the processing of the IPv4 TOS octet and IPv6 Traffic Class octet defined in [RFC2401]. This constitutes the full-functionality alternative for ECN usage with IPsec tunnels.

An implementation that does not support the ECN Tunnel field in the SAD MUST implement this processing by assuming that the value of the ECN Tunnel field of the SAD is "forbidden" for every SA. In this case, the processing of the ECN field reduces to:

(7) Set the ECN field to not-ECT in the outer header.

(8) Make no change to the ECN field in the inner header.

This constitutes the limited functionality alternative for ECN usage with IPsec tunnels.

For backwards compatibility, packets with the CE codepoint set in the outer header SHOULD be dropped if they arrive on an SA that is using the limited-functionality option, or that is using the full-functionality option with the not-ECT codepoint set in the inner header.

9.2.2. Changes to the ECN Field within an IPsec Tunnel.

If the ECN Field is changed inappropriately within an IPsec tunnel, and this change is detected at the tunnel egress, then the receipt of a packet not satisfying the appropriate condition for its SA is an auditable event. An implementation MAY create audit records with per-SA counts of incorrect packets over some time period rather than creating an audit record for each erroneous packet. Any such audit record SHOULD contain the headers from at least one erroneous packet, but need not contain the headers from every packet represented by the entry.

9.2.3.
Comments for IPsec Support

Substantial comments were received on two areas of this document during review by the IPsec working group. This section describes these comments and explains why the proposed changes were not incorporated.

The first comment indicated that per-node configuration is easier to implement than per-SA configuration. After serious thought and despite some initial encouragement of per-node configuration, it no longer seems to be a good idea. The concern is that as ECN-awareness is progressively deployed in IPsec, many ECN-aware IPsec implementations will find themselves communicating with a mixture of ECN-aware and ECN-unaware IPsec tunnel endpoints. In such an environment with per-node configuration, the only reasonable thing to do is forbid ECN usage for all IPsec tunnels, which is not the desired outcome.

In the second area, several reviewers noted that SA negotiation is complex, and adding to it is non-trivial. One reviewer suggested using ICMP after tunnel setup as a possible alternative. The addition to SA negotiation in this document is OPTIONAL and will remain so; implementers are free to ignore it. The authors believe that the assurance it provides can be useful in a number of situations. In practice, if this is not implemented, it can be deleted at a subsequent stage in the standards process. Extending ICMP to negotiate ECN after tunnel setup is more complex than extending SA attribute negotiation. Some tunnels do not permit traffic to be addressed to the tunnel egress endpoint, hence the ICMP packet would have to be addressed to somewhere else, scanned for by the egress endpoint, and discarded there or at its actual destination. In addition, ICMP delivery is unreliable, and hence there is a possibility of an ICMP packet being dropped, entailing the invention of yet another ack/retransmit mechanism.
It seems better simply to specify an OPTIONAL extension to the existing SA negotiation mechanism.

9.3. IP packets encapsulated in non-IP packet headers.

A different set of issues are raised, relative to ECN, when IP packets are encapsulated in tunnels with non-IP packet headers. This occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. For these protocols, there is no conflict with ECN; it is just that ECN cannot be used within the tunnel unless an ECN codepoint can be specified for the header of the encapsulating protocol. Earlier work considered a preliminary proposal for incorporating ECN into MPLS, and proposals for incorporating ECN into GRE, L2TP, or PPTP will be considered as the need arises.

10. Issues Raised by Monitoring and Policing Devices

One possibility is that monitoring and policing devices (or more informally, "penalty boxes") will be installed in the network to monitor whether best-effort flows are appropriately responding to congestion, and to preferentially drop packets from flows determined not to be using adequate end-to-end congestion control procedures.

We recommend that any "penalty box" that detects a flow or an aggregate of flows that is not responding to end-to-end congestion control first change from marking to dropping packets from that flow, before taking any additional action to restrict the bandwidth available to that flow. Thus, initially, the router may drop packets in which the router would otherwise have set the CE codepoint. This could include dropping those arriving packets for that flow that are ECN-Capable and that already have the CE codepoint set. In this way, any congestion indications seen by that router for that flow will be guaranteed to also be seen by the end nodes, even in the presence of malicious or broken routers elsewhere in the path.
If we assume that the first action taken at any "penalty box" for an ECN-capable flow will be to drop packets instead of marking them, then there is no way that an adversary that subverts ECN-based end-to-end congestion control can cause a flow to be characterized as being non-cooperative and placed into a more severe action within the "penalty box".

The monitoring and policing devices that are actually deployed could fall short of the `ideal' monitoring device described above, in that the monitoring is applied not to a single flow, but to an aggregate of flows (e.g., those sharing a single IPsec tunnel). In this case, the switch from marking to dropping would apply to all of the flows in that aggregate, denying the benefits of ECN to the other flows in the aggregate also. At the highest level of aggregation, another form of the disabling of ECN happens even in the absence of monitoring and policing devices, when ECN-Capable RED queues switch from marking to dropping packets as an indication of congestion when the average queue size has exceeded some threshold.

11. Evaluations of ECN

11.1. Related Work Evaluating ECN

This section discusses some of the
However, at this pointrelated work evaluating thepotential danger of misbehaving routers does not seem of sufficient concern to warrant this addi- tional complicationuse ofadding anECN. The ECNnonceWeb Page [ECN] has pointers toprotect against the erasure of the CE bit. Additional research is also neededother papers, as well as tobetter understand the valueimplementations ofsuch a nonceECN. [Floyd94] considers the advantages andappropriate means of gener- ating sequencesdrawbacks ofnonce values that an adversary will find suffi- ciently difficult to reconstruct. Anadding ECNnonce would also addressto theproblemTCP/IP architecture. As shown in the simulation-based comparisons, one advantage ofmisbehaving transport receivers lyingECN is tothe transport sender about whetheravoid unnecessary packet drops for short ornotdelay-sensitive TCP connections. A second advantage of ECN is in avoiding some unnecessary retransmit timeouts in TCP. This paper discusses in detail theCE bit was setintegration of ECN into TCP's congestion control mechanisms. The possible disadvantages of ECN discussed ina packet. However, another possibility is forthedata sender to test forpaper are that amisbehaving receiver directly, by occasion- ally sendingnon-compliant TCP connection could falsely advertise itself as ECN-capable, and that adataTCP ACK packetwith ECT and CE set, to see if the receiver reports receivingcarrying an ECN-Echo message could itself be dropped in theCE bit. Of course, ifnetwork. The first of thesepackets encountered congestiontwo issues is discussed in thenetwork,appendix of this document, and therouter would make no change insecond is addressed by thepackets, becauseaddition of theCE bit would already be set. Thus, for packets sent withCWR flag in theECTTCP header. Experimental evaluations of ECN include [RFC2884,K98]. 
The conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately better throughput than non-ECN TCP; that ECN TCP flows are fair towards non-ECN TCP flows; and that ECN TCP is robust with two-way traffic (with congestion in both directions) andCE bits set, the TCP end- nodes could not determine if some router intended to setwith multiple congested gateways. Experiments with many short web transfers show that, while most of theCE bit in these packets. For this reason, sending packetsshort connections have similar transfer times with or without ECN, a small percentage of theECT and CE bits wouldshort connections haveto be doneverysparingly. In addition,long transfer times for theTCP sender would havenon-ECN experiments as compared toremember which packets were sent withthe ECN experiments. 11.2. A Discussion of the ECN nonce. The use of two ECT codepoints, ECT(0) andCE bits set, so that it doesn't react to them as if there was congestionECT(1), can provide a one- bit ECN nonce inthe network. We believe that further researchpacket headers [SCWA99]. The primary motivation for this isneeded on possible transport-basedthe desire to allow mechanisms forverifying thatthetransport receiver doesdata sender to verify that network elements are notlieerasing the CE codepoint, and that data receivers are properly reporting to thetransportsenderaboutthe receipt ofcongestion indications. 11. Evaluations of ECNpackets with the CE codepoint set, as required by the transport protocol. This section discussessome of the related work evaluating the useissues ofECN. Thebackwards compatibility with IP ECNWeb Page [ECN] has pointers to other papers, as well as toimplementationsof ECN. [Floyd94] considersin routers conformant with RFC 2481, in which only one ECT codepoint was defined. We do not believe that theadvantages and drawbacksincremental deployment ofaddingECNtoimplementations that understand theTCP/IP architecture. 
As shown inECT(1) codepoint will cause significant operational problems. This is particularly likely to be thesimulation-based comparisons, one advantagecase when the deployment ofECN isthe ECT(1) codepoint begins with routers, before the ECT(1) codepoint starts toavoid unnecessary packet drops for short or delay-sensitive TCP connections. A second advantagebe used by end-nodes. 11.2.1. The Incremental Deployment ofECN is in avoiding some unnecessary retransmit timeouts in TCP. This paper discussesECT(1) indetail the integration ofRouters. ECNinto TCP's congestion con- trol mechanisms. The possible disadvantageshas been an Experimental standard since January 1999, and there are already implementations of ECNdiscussedinthe paper arerouters thata non-compliantdo not understand the ECT(1) codepoint. When the use of the ECT(1) codepoint is standardized for TCPconnectionor for other transport protocols, this couldfalsely advertise itself as ECN-capable, andmean that aTCP ACK packet carrying an ECN-Echo message could itself be dropped in the network. The first of these two issuesdata sender isdiscussed inusing theappendix ofECT(1) codepoint, but that thisdocument, and the secondcodepoint isaddressednot understood by a congested router on theaddition of the CWR flag inpath. If allowed by theTCP header. Experimental evaluations of ECN include [RFC2884,K98]. The conclu- sionstransport protocol, a data sender would be free not to make use of[K98] and [RFC2884] are that ECN TCP gets moderately better throughput than non-ECN TCP; that ECN TCP flows are fair towards non- ECN TCP flows;ECT(1) at all, andthat ECN TCP is robustto send all ECN-capable packets withtwo-way traffic (with congestion in both directions)the codepoint ECT(0). However, if an ECN-capable sender is using ECT(1), andwith multiplethe congestedgateways. 
Experiments with many short web transfers show that, while most ofrouter on theshort connections have similar transfer times with or without ECN, a small percentagepath did not understand the ECT(1) codepoint, then the router would end up marking some of theshort connections have very long transfer times forECT(0) packets, and dropping some of thenon-ECN experimentsECT(1) packets, ascomparedindications of congestion. Since TCP is required to react to both marked and dropped packets, this behavior of dropping packets that could have been marked poses no significant threat to the network, and is consistent with the overall approach to ECNexperiments.that allows routers to determine when and whether to mark packets as they see fit (see Section 5). 12. Summary of changes required in IP and TCP This document specified two bits in the IPheader, the ECN-Capable Transport (ECT) bit and the Congestion Experienced (CE) bit,header to be used for ECN. TheECT bit set to "0"not-ECT codepoint indicates that the transport protocol will ignore the CEbit.codepoint. This is the default value for theECT bit.ECN codepoint. The ECTbit set to "1" indicatescodepoints indicate that the transportproto- colprotocol is willing and able to participate in ECN. Thedefault value for the CE bit is "0". Therouter sets the CEbit to "1"codepoint to indicate congestion to the end nodes. The CEbitcodepoint in a packet header MUST NOT be reset by arouter from "1" to "0". When viewed in terms of code points, this document has defined three code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1). The code point of (ECT=0, CE=1) is not defined in this document. One possi- bility would be for this code point to be used, some time in the future, for some other function for non-ECN-capable packets. A sec- ond possibility would be for this code point to be used as an ECN nonce, as described earlier in the document. 
A third possibility would be for the code point (ECT=0, CE=1) to be used to indicate that the packet is ECN-capable for an alternate semantics for the Conges- tion Experienced indication. However, at this time the code point (ECT=0, CE=1) remains undefined.router. TCP requires three changes for ECN, a setup phase and two new flags in the TCP header. The ECN-Echo flag is used by the data receiver to inform the data sender of a received CE packet. The CongestionWin- dowWindow Reduced (CWR) flag is used by the data sender to inform the data receiver that the congestion window has been reduced. When ECN (Explicit Congestion Notification [RFC2481]) is used, it is required that congestion indications generated within an IP tunnel not be lost at the tunnel egress. We specified a minor modification to the IP protocol's handling of the ECN field during encapsulation and de-capsulation to allow flows that will undergo IP tunneling to use ECN. Two options for ECN in tunnels were specified: 1) A limited-functionality option that does not use ECN inside the IP tunnel, byturningsetting theECT bitECN field in the outer headeroff,to not-ECT, and not altering the inner header at the time of decapsulation. 2) The full-functionality option, whichcopiessets theECT bitECN field in the outer header to either not-ECT or to one of the ECT codepoints, depending on the ECN field in the innerheader to the encapsulatingheader. At decapsulation, if theECT bitCE codepoint is set in theinnerouter header, and theCE bit on the outerinner header isORed with the CE bitset to one of theinner header to updateECT codepoints, then the CEbit ofcodepoint is copied to thepacket.inner header. All IP tunnels MUST implement one of the two alternative approaches described above. 
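The two tunnel options summarized above can be sketched as a pair of small functions. This is a minimal illustration, not a normative implementation: the function names and the `full_functionality` flag are hypothetical, the bit values follow the '10' / '01' / '11' encodings used elsewhere in this document for ECT(0), ECT(1), and CE, and the choice to map an inner CE codepoint to ECT(0) in the outer header is an assumption of this sketch (the summary only says "one of the ECT codepoints").

```python
# ECN field values (the two ECN bits of the IP header).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def encapsulate(inner_ecn: int, full_functionality: bool) -> int:
    """Return the ECN field to place in the outer header at tunnel ingress."""
    if not full_functionality:
        # Limited-functionality option: ECN is not used inside the tunnel.
        return NOT_ECT
    if inner_ecn == CE:
        # Assumption of this sketch: an already-marked packet travels
        # through the tunnel as ECT(0), so the outer header is never CE
        # at ingress.
        return ECT_0
    # not-ECT, ECT(0), and ECT(1) are copied to the outer header.
    return inner_ecn


def decapsulate(inner_ecn: int, outer_ecn: int, full_functionality: bool) -> int:
    """Return the ECN field of the inner header after tunnel egress."""
    if full_functionality and outer_ecn == CE and inner_ecn in (ECT_0, ECT_1):
        # Congestion experienced inside the tunnel is propagated inward.
        return CE
    # Otherwise the inner header is left unaltered. (The case of an outer
    # CE with an inner not-ECT is not modeled by this sketch.)
    return inner_ecn
```

With these definitions, a packet sent as ECT(1) and marked inside a full-functionality tunnel leaves the tunnel with CE set, while the same packet in a limited-functionality tunnel leaves with its inner ECT(1) codepoint unchanged.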
For IPsec tunnels, this document also defines an optional IPsec Security Association (SA) attribute that enables negotiation of ECN usage within IPsec tunnels and an optional field in the Security Association Database to indicate whether ECN is permitted in tunnel mode on a SA. The required changes to IPsec tunnels for ECN usage modify RFC 2401 [RFC2401], which defines the IPsec architecture and specifies some aspects of its implementation. The new IPsec SA attribute is in addition to those already defined in Section 4.5 of [RFC2407].

This document is intended to obsolete RFC 2481, "A Proposal to add Explicit Congestion Notification (ECN) to IP", which defined ECN as an Experimental Protocol for the Internet Community. The rest of this section describes the relationship between this document and its predecessor.

RFC 2481 included a brief discussion of the use of ECN with encapsulated packets, and noted that for the IPsec specifications at the time (January 1999), flows could not safely use ECN if they were to traverse IPsec tunnels. RFC 2481 also described the changes that could be made to IPsec tunnel specifications to make them compatible with ECN.

This document also incorporates work that was done after RFC 2481. First was to describe the changes to IPsec tunnels in detail, and extensively discuss the security implications of ECN (now included as Sections 18 and 19 of this document). Second was to extend the discussion of IPsec tunnels to include all IP tunnels. Because older IP tunnels are not compatible with a flow's use of ECN, the deployment of ECN in the Internet will create strong pressure for older IP tunnels to be updated to an ECN-compatible version, using either the limited-functionality or the full-functionality option.

This document does not address the issue of including ECN in non-IP tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary document about adding ECN support to MPLS was not advanced.
A third new piece of work after RFC 2481 was to describe the ECN procedure with retransmitted data packets, that an ECT codepoint should not be set on retransmitted data packets. The motivation for this additional specification is to eliminate a possible avenue for denial-of-service attacks on an existing TCP connection. Some prior deployments of ECN-capable TCP might not conform to the (new) requirement not to set an ECT codepoint on retransmitted packets; we do not believe this will cause significant problems in practice.

This document also expands slightly on the specification of the use of SYN packets for the negotiation of ECN. While some prior deployments of ECN-capable TCP might not conform to the requirements specified in this document, we do not believe that this will lead to any performance or compatibility problems for TCP connections with a combination of TCP implementations at the endpoints.

This document also includes the specification of the ECT(1) codepoint, which may be used by TCP as part of the implementation of an ECN nonce.

13. Conclusions

Given the current effort to implement AQM, we believe this is the right time to deploy congestion avoidance mechanisms that do not depend on packet drops alone. With the increased deployment of applications and transports sensitive to the delay and loss of a single packet (e.g., realtime traffic, short web transfers), depending on packet loss as a normal congestion notification mechanism appears to be insufficient (or at the very least, non-optimal).

We examined the consequence of modifications of the ECN field within the network, analyzing all the opportunities for an adversary to change the ECN field. In many cases, the change to the ECN field is no worse than dropping a packet. However, we noted that some changes have the more serious consequence of subverting end-to-end congestion control.
However, we point out that even then the potential damage is limited, and is similar to the threat posed by end-systems intentionally failing to cooperate with end-to-end congestion control.

14. Acknowledgements

Many people have made contributions to this work and this document, including many that we have not managed to directly acknowledge in this document. In addition, we would like to thank Kenjiro Cho for the proposal for the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the proposal of the CWR bit, Steve Blake for material on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for discussions of security issues. We also thank the Internet End-to-End Research Group for ongoing discussions of these issues.

Email discussions with a number of people, including Alexey Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed the issues raised by non-conformant equipment in the Internet that does not respond to TCP SYN packets with the ECE and CWR flags set. We thank Mark Handley, Jitendra Padhye, and others for discussions on the TCP initialization procedures.

The discussion of ECN and IP tunnel considerations draws heavily on related discussions and documents from the Differentiated Services Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen for proposing modifications to RFC 2407 that improve the usability of negotiating the ECN Tunnel SA attribute.

We thank David Wetherall, David Ely, and Neil Spring for the proposal for the ECN nonce. We also thank Stefan Savage for discussions on this issue. We thank Bob Briscoe and Jon Crowcroft for raising the issue of fragmentation in IP, on alternate semantics for the fourth ECN codepoint, and several other topics.
We thank Richard Wendland for feedback on several issues in the draft.

15. References

[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, November 1998.

[B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". Reference for informational purposes only.

[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", RFC 2406, November 1998.

[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 N.4, August 1993, p. 397-413.

[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.

[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-ecn. Reference for informational purposes only.

[FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, August 1999.

[FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", SIGCOMM '97, September 1997.

[GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing Encapsulation (GRE), RFC 1701, October 1994.

[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. ACM SIGCOMM '88, pp. 314-329.

[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance Algorithm", Message to end2end-interest mailing list, April 1990. URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) benefits for TCP", Master's thesis, UCLA, 1998, URL "http://www.cs.ucla.edu/~hari/software/ecn/ ecn_report.ps.gz".

[L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. Palter, Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999.

[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

[MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, Requirements for Traffic Engineering Over MPLS, RFC 2702, September 1999.

[PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, July 1999.

[RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the Internet Checksum", RFC 1141, January 1990.

[RFC1349] Almquist, P., "Type of Service in the Internet Protocol Suite", RFC 1349, July 1992.

[RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC 1455, May 1993.

[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic Routing Encapsulation (GRE), RFC 1701, October 1994.

[RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic Routing Encapsulation over IPv4 networks, RFC 1702, October 1994.

[RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October 1996.

[RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997.

[RFC2309] Braden, B., et al., "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

[RFC2401] S. Kent and R. Atkinson, Security Architecture for the Internet Protocol, RFC 2401, November 1998.

[RFC2407] D. Piper, The Internet IP Security Domain of Interpretation for ISAKMP, RFC 2407, November 1998.

[RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner, Internet Security Association and Key Management Protocol (ISAKMP), RFC 2408, November 1998.

[RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), RFC 2409, November 1998.

[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, An Architecture for Differentiated Services, RFC 2475, December 1998.

[RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit Congestion Notification (ECN) to IP, RFC 2481, January 1999.

[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, July 2000.

[RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983, October 2000.

[RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For Values In the Internet Protocol and Related Headers", RFC 2780, March 2000.

[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks", ACM Transactions on Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM Computer Communications Review, October 1999.

16. Security Considerations

Security considerations have been discussed in Sections 7, 8, 18, and 19.

17. IPv4 Header Checksum Recalculation

IPv4 header checksum recalculation is an issue with some high-end router architectures using an output-buffered switch, since most if not all of the header manipulation is performed on the input side of the switch, while the ECN decision would need to be made local to the output buffer. This is not an issue for IPv6, since there is no IPv6 header checksum. The IPv4 TOS octet is the last byte of a 16-bit half-word.

RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 checksum after the TTL field is decremented.
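As a hedged illustration of the kind of incremental update this section is concerned with, the sketch below adjusts an IPv4 header checksum when the ECN field changes to CE, and compares the result against a full recomputation. The function names and the sample header bytes are hypothetical. The update is written for a two's complement machine with an end-around borrow; the ECT(1) case (a change of 2 rather than 1 in the 16-bit word) is this sketch's own generalization of the ECT(0) case.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Full recomputation: one's complement of the one's complement sum
    of the 16-bit words of a header whose checksum field is zeroed."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        # Fold carries back into the low 16 bits (end-around carry).
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_after_ce_mark(hc: int, old_ecn: int = 0b10) -> int:
    """Incrementally update checksum hc when the ECN field changes from
    old_ecn (ECT(0) = '10' or ECT(1) = '01') to CE ('11')."""
    if old_ecn not in (0b10, 0b01):
        raise ValueError("only ECT(0) or ECT(1) packets can be marked")
    delta = 0b11 - old_ecn      # the 16-bit word grows by 1 or by 2
    new_hc = hc - delta
    if new_hc < 0:
        new_hc += 0xFFFF        # end-around borrow
    return new_hc


def sample_header(tos: int) -> bytes:
    """Illustrative 20-byte IPv4 header with a zeroed checksum field."""
    return bytes([0x45, tos, 0x00, 0x54, 0x1C, 0x46, 0x40, 0x00,
                  0x40, 0x06, 0x00, 0x00,
                  192, 0, 2, 1, 198, 51, 100, 2])


if __name__ == "__main__":
    before = ipv4_header_checksum(sample_header(0x02))   # ECT(0)
    after = ipv4_header_checksum(sample_header(0x03))    # CE
    print(checksum_after_ce_mark(before) == after)
```

Because the TOS octet is the low-order byte of its 16-bit word, setting the CE codepoint only adds a small constant to that word, which is why the checksum can be adjusted without touching the rest of the header.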
The incremental updating of the IPv4 checksum after the CE codepoint was set would work as follows: Let HC be the original header checksum for an ECT(0) packet, and let HC' be the new header checksum after the CE bit has been set. That is, the ECN field has changed from '10' to '11'. Then for header checksums calculated with one's complement subtraction, HC' would be recalculated as follows:

        HC' = { HC - 1     HC > 1
              { 0x0000     HC = 1

For header checksums calculated on two's complement machines, HC' would be recalculated as follows after the CE bit was set:

        HC' = { HC - 1     HC > 0
              { 0xFFFE     HC = 0

A similar incremental updating of the IPv4 checksum can be carried out when the ECN field is changed from ECT(1) to CE, that is, from '01' to '11'.

18. Possible Changes to the ECN Field in the Network

This section discusses in detail possible changes to the ECN field in the network, such as falsely reporting congestion, disabling ECN-Capability for an individual packet, erasing the ECN congestion indication, or falsely indicating ECN-Capability.

18.1. Possible Changes to the IP Header

18.1.1. Erasing the Congestion Indication

First, we consider the changes that a router could make that would result in effectively erasing the congestion indication after it had been set by a router upstream. The convention followed is: ECN codepoint of received packet -> ECN codepoint of packet transmitted.

Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint effectively erases the congestion indication. However, with the use of two ECT codepoints, a router erasing the CE codepoint has no way to know whether the original ECT codepoint was ECT(0) or ECT(1). Thus, it is possible for the transport protocol to deploy mechanisms to detect such erasures of the CE codepoint.

The consequence of the erasure of the CE codepoint for the upstream router is that there is a potential for congestion to build for a time, because the congestion indication does not reach the source. However, the packet would be received and acknowledged.

The potential effect of erasing the congestion indication is complex, and is discussed in depth in Section 19 below. Note that the effect of erasing the congestion indication is different from dropping a packet in the network. When a data packet is dropped, the drop is detected by the TCP sender, and interpreted as an indication of congestion. Similarly, if a sufficient number of consecutive acknowledgement packets are dropped, causing the cumulative acknowledgement field not to be advanced at the sender, the sender is limited by the congestion window from sending additional packets, and ultimately the retransmit timer expires.

In contrast, a systematic erasure of the CE bit by a downstream router can have the effect of causing a queue buildup at an upstream router, including the possible loss of packets due to buffer overflow. There is a potential of unfairness in that another flow that goes through the congested router could react to the CE bit set while the flow that has the CE bit erased could see better performance. The limitations on this potential unfairness are discussed in more detail in Section 19 below.

The last of the three changes is to replace the CE codepoint with the not-ECT codepoint, thus erasing the congestion indication and disabling ECN-Capability at the same time.

The `erasure' of the congestion indication is only effective if the packet does not end up being marked or dropped again by a downstream router. If the CE codepoint is replaced by an ECT codepoint, the packet remains ECN-Capable, and could be either marked or dropped by a downstream router as an indication of congestion. If the CE codepoint is replaced by the not-ECT codepoint, the packet is no longer ECN-capable, and can therefore be dropped but not marked by a downstream router as an indication of congestion.

18.1.2. Falsely Reporting Congestion

This change is to set the CE codepoint when an ECT codepoint was already set, even though there was no congestion. This change does not affect the treatment of that packet along the rest of the path. In particular, a router does not examine the CE codepoint in deciding whether to drop or mark an arriving packet.

However, this could result in the application unnecessarily invoking end-to-end congestion control, and reducing its arrival rate. By itself, this is no worse (for the application or for the network) than if the tampering router had actually dropped the packet.

18.1.3. Disabling ECN-Capability

This change is to turn off the ECT codepoint of a packet.
This means that if the packet later encounters congestion (e.g., by arriving to a RED queue with a moderate average queue size), it will be dropped instead of being marked. By itself, this is no worse (for the application) than if the tampering router had actually dropped the packet. The saving grace in this particular case is that there is no congested router upstream expecting a reaction from setting the CE bit.

18.1.4. Falsely Indicating ECN-Capability

This change would incorrectly label a packet as ECN-Capable, by replacing the not-ECT codepoint with either an ECT codepoint or the CE codepoint. The packet may have been sent either by an ECN-Capable transport or a transport that is not ECN-Capable.

If the packet later encounters moderate congestion at an ECN-Capable router, the router could set the CE codepoint instead of dropping the packet. If the transport protocol in fact is not ECN-Capable, then the transport will never receive this indication of congestion, and will not reduce its sending rate in response. The potential consequences of falsely indicating ECN-capability are discussed further in Section 19 below.

If the packet never later encounters congestion at an ECN-Capable router, then the first of these two changes (to an ECT codepoint) would have no effect, other than possibly interfering with the use of the ECN nonce by the transport protocol. The last change (to the CE codepoint), however, would have the effect of giving false reports of congestion to a monitoring device along the path. If the transport protocol is ECN-Capable, then this change could also have an effect at the transport level, by combining falsely indicating ECN-Capability with falsely reporting congestion. For an ECN-capable transport, this would cause the transport to unnecessarily react to congestion. In this particular case, the router that is incorrectly changing the ECN field could have dropped the packet.
Thus for this case of an ECN-capable transport, the consequence of this change to the ECN field is no worse than dropping the packet.

18.2. Information carried in the Transport Header

For TCP, an ECN-capable TCP receiver informs its TCP peer that it is ECN-capable at the TCP level, conveying this information in the TCP header at the time the connection is set up. This document does not consider potential dangers introduced by changes in the transport header within the network. In the case of IPsec tunnels, the IPsec tunnel protects the transport header.

Another issue concerns TCP packets with a spoofed IP source address carrying invalid ECN information in the transport header. For completeness, we examine here some possible ways that a node spoofing the IP source address of another node could use the two ECN flags in the TCP header to launch a denial-of-service attack. However, these attacks would require an ability for the attacker to use valid TCP sequence numbers, and any attacker with this ability and with the ability to spoof IP source addresses could damage the TCP connection without using the ECN flags. Therefore, ECN does not add any new vulnerabilities in this respect.

An acknowledgement packet with a spoofed IP source address of the TCP data receiver could include the ECE bit set. If accepted by the TCP data sender as a valid packet, this spoofed acknowledgement packet could result in the TCP data sender unnecessarily halving its congestion window. However, to be accepted by the data sender, such a spoofed acknowledgement packet would have to have the correct 32-bit sequence number as well as a valid acknowledgement number.
An attacker that could successfully send such a spoofed acknowledgement packet could also send a spoofed RST packet, or do other equally damaging operations to the TCP connection.

Packets with a spoofed IP source address of the TCP data sender could include the CWR bit set. Again, to be accepted, such a packet would have to have a valid sequence number. In addition, such a spoofed packet would have a limited performance impact. Spoofing a data packet with the CWR bit set could result in the TCP data receiver sending fewer ECE packets than it would otherwise, if the data receiver was sending ECE packets when it received the spoofed CWR packet.

18.3. Split Paths

In some cases, a malicious or broken router might have access to only a subset of the packets from a flow. The question is as follows: can this router, by altering the ECN field in this subset of the packets, do more damage to that flow than if it had simply dropped that set of packets?

We will classify the packets in the flow as A packets and B packets, and assume that the adversary only has access to A packets. Assume that the adversary is subverting end-to-end congestion control along the path traveled by A packets only, by either falsely indicating ECN-Capability upstream of the point where congestion occurs, or erasing the congestion indication downstream. Consider also that there exists a monitoring device that sees both the A and B packets, and will "punish" both the A and B packets if the total flow is determined not to be properly responding to indications of congestion. Another key characteristic that we believe is likely to be true is that the monitoring device, before `punishing' the A&B flow, will first drop packets instead of setting the CE codepoint, and will drop arriving packets of that flow that already have the CE codepoint set.
If the end nodes are in fact using end-to-end congestion control, they will see all of the indications of congestion seen by the monitoring device, and will begin to respond to these indications of congestion. Thus, the monitoring device is successful in providing the indications to the flow at an early stage.

It is true that the adversary that has access only to the A packets might, by subverting ECN-based congestion control, be able to deny the benefits of ECN to the other packets in the A&B aggregate. While this is unfortunate, this is not a reason to disable ECN within an IPsec tunnel.

A variant of falsely reporting congestion occurs when there are two adversaries along a path, where the first adversary falsely reports congestion, and the second adversary `erases' those reports. (Unlike packet drops, ECN congestion reports can be `reversed' later in the network by a malicious or broken router. However, the use of the ECN nonce could help the transport to detect this behavior.) While this would be transparent to the end node, it is possible that a monitoring device between the first and second adversaries would see the false indications of congestion. Keep in mind our recommendation in this document, that before `punishing' a flow for not responding appropriately to congestion, the router will first switch to dropping rather than marking as an indication of congestion, for that flow. When this includes dropping arriving packets from that flow that have the CE codepoint set, this ensures that these indications of congestion are being seen by the end nodes. Thus, there is no additional harm that we are able to postulate as a result of multiple conflicting adversaries.

19. Implications of Subverting End-to-End Congestion Control

This section focuses on the potential repercussions of subverting end-to-end congestion control by either falsely indicating ECN-Capability, or by erasing the congestion indication in ECN (the CE codepoint). Subverting end-to-end congestion control by either of these two methods can have consequences both for the application and for the network. We discuss these separately below.

The first method to subvert end-to-end congestion control, that of falsely indicating ECN-Capability, effectively subverts end-to-end congestion control only if the packet later encounters congestion that results in the setting of the CE codepoint. In this case, the transport protocol (which may not be ECN-capable) does not receive the indication of congestion from these downstream congested routers.

The second method to subvert end-to-end congestion control, `erasing' the CE codepoint in a packet, effectively subverts end-to-end congestion control only when the CE codepoint in the packet was set earlier by a congested router. In this case, the transport protocol does not receive the indication of congestion from the upstream congested routers.

Either of these two methods of subverting end-to-end congestion control can potentially introduce more damage to the network (and possibly to the flow itself) than if the adversary had simply dropped packets from that flow. However, as we discuss later in this section and in Section 7, this potential damage is limited.

19.1. Implications for the Network and for Competing Flows

The CE codepoint of the ECN field is only used by routers as an indication of congestion during periods of *moderate* congestion. ECN-capable routers should drop rather than mark packets during heavy congestion even if the router's queue is not yet full.
For example, for routers using active queue management based on RED, the router should drop rather than mark packets that arrive while the average queue sizes exceed the RED queue's maximum threshold.

One consequence for the network of subverting end-to-end congestion control is that flows that do not receive the congestion indications from the network might increase their sending rate until they drive the network into heavier congestion. Then, the congested router could begin to drop rather than mark arriving packets. For flows that are not isolated by some form of per-flow scheduling or other per-flow mechanisms, but are instead aggregated with other flows in a single queue in an undifferentiated fashion, this packet-dropping at the congested router would apply to all flows that share that queue. Thus, the consequences would be to increase the level of congestion in the network.

In some cases, the increase in the level of congestion will lead to a substantial buffer buildup at the congested queue that will be sufficient to drive the congested queue from the packet-marking to the packet-dropping regime. This transition could occur either because of buffer overflow, or because of the active queue management policy described above that drops packets when the average queue is above RED's maximum threshold. At this point, all flows, including the subverted flow, will begin to see packet drops instead of packet marks, and a malicious or broken router will no longer be able to `erase' these indications of congestion in the network. If the end nodes are deploying appropriate end-to-end congestion control, then the subverted flow will reduce its arrival rate in response to congestion. When the level of congestion is sufficiently reduced, the congested queue can return from the packet-dropping regime to the packet-marking regime. The steady-state pattern could be one of the congested queue oscillating between these two regimes.

In other cases, the consequences of subverting end-to-end congestion control will not be severe enough to drive the congested link into sufficiently-heavy congestion that packets are dropped instead of being marked. In this case, the implications for competing flows in the network will be a slightly-increased rate of packet marking or dropping, and a corresponding decrease in the bandwidth available to those flows.
This can be a stable state if the arrival rate of the subverted flow is sufficiently small, relative to the link bandwidth, that the average queue size at the congested router remains under control. In particular, the subverted flow could have a limited bandwidth demand on the link at this router, while still getting more than its "fair" share of the link. This limited demand could be due to a limited demand from the data source; a limitation from the TCP advertised window; a lower-bandwidth access pipe; or other factors. Thus the subversion of ECN-based congestion control can still lead to unfairness, which we believe is appropriate to note here.

The threat to the network posed by the subversion of ECN-based congestion control in the network is essentially the same as the threat posed by an end-system that intentionally fails to cooperate with end-to-end congestion control. The deployment of mechanisms in routers to address this threat is an open research question, and is discussed further in Section 10.

Let us take the example described in Section 18.1.1, where the CE codepoint that was set in a packet is erased: {'11' -> '10' or '11' -> '01'}. The consequence for the congested upstream router that set the CE codepoint is that this congestion indication does not reach the end nodes for that flow. The source (even one which is completely cooperative and not malicious) is thus allowed to continue to increase its sending rate (if it is a TCP flow, by increasing its congestion window). The flow potentially achieves better throughput than the other flows that also share the congested router, especially if there are no policing mechanisms or per-flow queueing mechanisms at that router. Consider the behavior of the other flows, especially if they are cooperative: that is, the flows that do not experience subverted end-to-end congestion control. They are likely to reduce their load (e.g., by reducing their window size) on the congested router, thus benefiting our subverted flow. This results in unfairness. As we discussed above, this unfairness could either be transient (because the congested queue is driven into the packet-marking regime), oscillatory (because the congested queue oscillates between the packet marking and the packet dropping regime), or more moderate but a persistent stable state (because the congested queue is never driven to the packet dropping regime).

The results would be similar if the subverted flow was intentionally avoiding end-to-end congestion control. One difference is that a flow that is intentionally avoiding end-to-end congestion control at the end nodes can avoid end-to-end congestion control even when the congested queue is in packet-dropping mode, by refusing to reduce its sending rate in response to packet drops in the network. Thus the problems for the network from the subversion of ECN-based congestion control are less severe than the problems caused by the intentional avoidance of end-to-end congestion control in the end nodes. It is also the case that it is considerably more difficult to control the behavior of the end nodes than it is to control the behavior of the infrastructure itself. This is not to say that the problems for the network posed by the network's subversion of ECN-based congestion control are small; just that they are dwarfed by the problems for the network posed by the subversion of either ECN-based or other currently known packet-based congestion control mechanisms by the end nodes.

19.2. Implications for the Subverted Flow

When a source indicates that it is ECN-capable, there is an expectation that the routers in the network that are capable of participating in ECN will use the CE codepoint for indication of congestion. There is the potential benefit of using ECN in reducing the amount of packet loss (in addition to the reduced queueing delays because of active queue management policies). When the packet flows through a tunnel where the nodes that the tunneled packets traverse are untrusted in some way, the expectation is that IPsec will protect the flow from subversion that results in undesirable consequences.

In many cases, a subverted flow will benefit from the subversion of end-to-end congestion control for that flow in the network, by receiving more bandwidth than it would have otherwise, relative to competing non-subverted flows. If the congested queue reaches the packet-dropping stage, then the subversion of end-to-end congestion control might or might not be of overall benefit to the subverted flow, depending on that flow's relative tradeoffs between throughput, loss, and delay.

One form of subverting end-to-end congestion control is to falsely indicate ECN-capability by setting the ECT codepoint. This has the consequence of downstream congested routers setting the CE codepoint in vain. However, as described in Section 9.1.2, if an ECT codepoint is changed in an IP tunnel, this can be detected at the egress point of the tunnel, as long as the inner header was not changed within the tunnel.

The second form of subverting end-to-end congestion control is to erase the congestion indication by erasing the CE codepoint. In this case, it is the upstream congested routers that set the CE codepoint in vain.
If an ECT codepoint is erased within an IP tunnel, then this can be detected at the egress point of the tunnel, as long as the inner header was not changed within the tunnel. If the CE codepoint is set upstream of the IP tunnel, then any erasure of the outer header's CE codepoint within the tunnel will have no effect because the inner header preserves the set value of the CE codepoint. However, if the CE codepoint is set within the tunnel, and erased either within or downstream of the tunnel, this is not necessarily detected at the egress point of the tunnel.

With this subversion of end-to-end congestion control, an end-system transport does not respond to the congestion indication. Along with the increased unfairness for the non-subverted flows described in the previous section, the congested router's queue could continue to build, resulting in packet loss at the congested router - which is a means for indicating congestion to the transport in any case. In the interim, the flow might experience higher queueing delays, possibly along with an increased bandwidth relative to other non-subverted flows. But transports do not inherently make assumptions of consistently experiencing carefully managed queueing in the path.
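
The tunnel cases discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a specification of tunnel processing: it assumes that the tunnel ingress gave the outer header the same ECN-capability as the inner header, and that the egress propagates an outer CE codepoint to an ECN-capable inner header, in the spirit of the full-functionality option. The function name `decapsulate` and the use of two-character strings for the codepoints are hypothetical.

```python
# ECN field values written as two-character strings ('11' is CE).
NOT_ECT, ECT_1, ECT_0, CE = "00", "01", "10", "11"


def decapsulate(inner, outer):
    """Return (codepoint delivered to the end node, tampering_detected).

    Assumption: the ingress made the outer header ECN-capable exactly when
    the inner header was ECN-capable, so a capability mismatch at the egress
    means the outer ECN field was altered inside the tunnel.
    """
    inner_capable = inner != NOT_ECT
    outer_capable = outer != NOT_ECT
    tampering_detected = inner_capable != outer_capable

    # Assumption: an outer CE is copied inward only for an ECN-capable inner
    # header; otherwise the inner header is delivered unchanged.
    if outer == CE and inner_capable:
        return CE, tampering_detected
    return inner, tampering_detected


# CE set upstream of the tunnel: erasing the outer CE has no effect,
# because the inner header still carries CE.
print(decapsulate(inner=CE, outer=ECT_0))
# ECT erased within the tunnel: detectable at the egress.
print(decapsulate(inner=ECT_0, outer=NOT_ECT))
# CE set and then erased within the tunnel: not detected at the egress.
print(decapsulate(inner=ECT_0, outer=ECT_0))
```

The third call is the case the text singles out: the outer header looks exactly as it did at the ingress, so the egress has nothing to compare against.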

We believe that these forms of subverting end-to-end congestion control are no worse for the subverted flow than if the adversary had simply dropped the packets of that flow itself.

19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control

We have shown that, in many cases, a malicious or broken router that is able to change the bits in the ECN field can do no more damage than if it had simply dropped the packet in question. However, this is not true in all cases, in particular in the cases where the broken router subverted end-to-end congestion control by either falsely indicating ECN-Capability or by erasing the ECN congestion indication (in the CE codepoint). While there are many ways that a router can harm a flow by dropping packets, a router cannot subvert end-to-end congestion control by dropping packets. As an example, a router cannot subvert TCP congestion control by dropping data packets, acknowledgement packets, or control packets.

Even though packet-dropping cannot be used to subvert end-to-end congestion control, there *are* non-ECN-based methods for subverting end-to-end congestion control that a broken or malicious router could use.
For example, a broken router could duplicate data packets, thus effectively negating the effects of end-to-end congestion control along some portion of the path. (For a router that duplicated packets within an IPsec tunnel, the security administrator can cause the duplicate packets to be discarded by configuring anti-replay protection for the tunnel.) This duplication of packets within the network would have similar implications for the network and for the subverted flow as those described in Sections 18.1.1 and 18.1.4 above.

20. The Motivation for the ECT Codepoints.

20.1. The Motivation for an ECT Codepoint.

The need for an ECT codepoint is motivated by the fact that ECN will be deployed incrementally in an Internet where some transport protocols and routers understand ECN and some do not. With an ECT codepoint, the router can drop packets from flows that are not ECN-capable, but can *instead* set the CE codepoint in packets that *are* ECN-capable. Because an ECT codepoint allows an end node to have the CE codepoint set in a packet *instead* of having the packet dropped, an end node might have some incentive to deploy ECN.

If there was no ECT codepoint, then the router would have to set the CE codepoint for packets from both ECN-capable and non-ECN-capable flows. In this case, there would be no incentive for end-nodes to deploy ECN, and no viable path of incremental deployment from a non-ECN world to an ECN-capable world. Consider the first stages of such an incremental deployment, where a subset of the flows are ECN-capable. At the onset of congestion, when the packet dropping/marking rate would be low, routers would only set CE codepoints, rather than dropping packets. However, only those flows that are ECN-capable would understand and respond to CE packets. The result is that the ECN-capable flows would back off, and the non-ECN-capable flows would be unaware of the ECN signals and would continue to open their congestion windows.

In this case, there are two possible outcomes: (1) the ECN-capable flows back off, the non-ECN-capable flows get all of the bandwidth, and congestion remains mild, or (2) the ECN-capable flows back off, the non-ECN-capable flows don't, and congestion increases until the router transitions from setting the CE codepoint to dropping packets. While this second outcome evens out the fairness, the ECN-capable flows would still receive little benefit from being ECN-capable, because the increased congestion would drive the router to packet-dropping behavior.

A flow that advertised itself as ECN-Capable but does not respond to CE codepoints is functionally equivalent to a flow that turns off congestion control, as discussed earlier in this document.

Thus, in a world when a subset of the flows are ECN-capable, but where ECN-capable flows have no mechanism for indicating that fact to the routers, there would be less effective and less fair congestion control in the Internet, resulting in a strong incentive for end nodes not to deploy ECN.

20.2. The Motivation for two ECT Codepoints.

The primary motivation for the two ECT codepoints is to provide a one-bit ECN nonce. The ECN nonce allows the development of mechanisms for the sender to probabilistically verify that network elements are not erasing the CE codepoint, and that data receivers are properly reporting to the sender the receipt of packets with the CE codepoint set.

Another possibility for senders to detect misbehaving network elements or receivers would be for the data sender to occasionally send a data packet with the CE codepoint set, to see if the receiver reports receiving the CE codepoint. Of course, if these packets encountered congestion in the network, the router might make no change in the packets, because the CE codepoint would already be set. Thus, for packets sent with the CE codepoint set, the TCP end-nodes could not determine if some router intended to set the CE codepoint in these packets. For this reason, sending packets with the CE codepoint would have to be done sparingly, and would be a less effective check against misbehaving network elements and receivers than would be the ECN nonce.

The assignment of the fourth ECN codepoint to ECT(1) precludes the use of this codepoint for other purposes. For clarity, we briefly list those possible purposes here.

One possibility might have been for the data sender to use the fourth ECN codepoint to indicate an alternate semantics for ECN.
However, this seems to us more appropriate to be signalled using a differentiated services codepoint in the DS field.

A second possible use for the fourth ECN codepoint would have been to give the router two separate codepoints for the indication of congestion, CE(0) and CE(1), for mild and severe congestion respectively. While this could be useful in some cases, this certainly does not seem a compelling requirement at this point. If there was judged to be a compelling need for this, the complications of incremental deployment would most likely necessitate more than just one codepoint for this function.

A third use that has been informally proposed for the ECN codepoint is for use in some forms of multicast congestion control, based on randomized procedures for duplicating marked packets at routers. Some proposed multicast packet duplication procedures are based on a new ECN codepoint that (1) conveys the fact that congestion occurred upstream of the duplication point that marked the packet with this codepoint and (2) can detect congestion downstream of that duplication point. ECT(1) can serve this purpose because it is both distinct from ECT(0) and is replaced by CE when ECN marking occurs in response to congestion or incipient congestion. Explanation of how this enhanced version of ECN would be used by multicast congestion control is beyond the scope of this document, as are ECN-aware multicast packet duplication procedures and the processing of the ECN field at multicast receivers in all cases (i.e., irrespective of the multicast packet duplication procedure(s) used).

The specification of IP tunnel modifications for ECN in this document assumes that the only change made to the outer IP header's ECN field between tunnel endpoints is to set the CE codepoint to indicate congestion. This is not consistent with some of the proposed uses of ECT(1) by the multicast duplication procedures in the previous paragraph, and such procedures SHOULD NOT be deployed within tunnels configured for full ECN functionality. Limited ECN functionality may be used instead, although in practice many tunnel protocols (including IPsec) will not work correctly if multicast traffic duplication occurs within the tunnel.

21. Why use Two Bits in the IP Header?

Given the need for an ECT indication in the IP header, there still remains the question of whether the ECT (ECN-Capable Transport) and CE (Congestion Experienced) codepoints should have been overloaded on a single bit. This overloaded-one-bit alternative, explored in [Floyd94], would have involved a single bit with two values. One value, "ECT and not CE", would represent an ECN-Capable Transport, and the other value, "CE or not ECT", would represent either Congestion Experienced or a non-ECN-Capable transport.
One difference between the one-bit and two-bit implementations concerns packets that traverse multiple congested routers. Consider a CE packet that arrives at a second congested router, and is selected by the active queue management at that router for either marking or dropping. In the one-bit implementation, the second congested router has no choice but to drop the CE packet, because it cannot distinguish between a CE packet and a non-ECT packet. In the two-bit implementation, the second congested router has the choice of either dropping the CE packet, or of leaving it alone with the CE codepoint set.

Another difference between the one-bit and two-bit implementations comes from the fact that with the one-bit implementation, receivers in a single flow cannot distinguish between CE and non-ECT packets. Thus, in the one-bit implementation an ECN-capable data sender would have to unambiguously indicate to the receiver or receivers whether each packet had been sent as ECN-Capable or as non-ECN-Capable. One possibility would be for the sender to indicate in the transport header whether the packet was sent as ECN-Capable. A second possibility that would involve a functional limitation for the one-bit implementation would be for the sender to unambiguously indicate that it was going to send *all* of its packets as ECN-Capable or as non-ECN-Capable. For a multicast transport protocol, this unambiguous indication would have to be apparent to receivers joining an on-going multicast session.

Another concern that was described earlier (and recommended in this document) is that transports (particularly TCP) should not mark pure ACK packets or retransmitted packets as being ECN-Capable. A pure ACK packet from a non-ECN-capable transport could be dropped, without necessarily having an impact on the transport from a congestion control perspective (because subsequent ACKs are cumulative).
An ECN-capable transport reacting to the CE codepoint in a pure ACK packet by reducing the window would be at a disadvantage in comparison to a non-ECN-capable transport. For this reason (and for reasons described earlier in relation to retransmitted packets), it is desirable to have the ECT codepoint set on a per-packet basis.

Another advantage of the two-bit approach is that it is somewhat more robust. The most critical issue, discussed in Section 8, is that the default indication should be that of a non-ECN-Capable transport. In a two-bit implementation, this requirement for the default value simply means that the non-ECT codepoint should be the default. In the one-bit implementation, this means that the single overloaded bit should by default be in the "CE or not ECT" position. This is less clear and straightforward, and possibly more open to incorrect implementations either in the end nodes or in the routers.

In summary, while the one-bit implementation could be a possible implementation, it has the following significant limitations relative to the two-bit implementation. First, the one-bit implementation has more limited functionality for the treatment of CE packets at a second congested router. Second, the one-bit implementation requires either that extra information be carried in the transport header of packets from ECN-Capable flows (to convey the functionality of the second bit elsewhere, namely in the transport header), or that senders in ECN-Capable flows accept the limitation that receivers must be able to determine a priori which packets are ECN-Capable and which are not ECN-Capable. Third, the one-bit implementation is possibly more open to errors from faulty implementations that choose the wrong default value for the ECN bit. We believe that the use of the extra bit in the IP header for the ECT-bit is extremely valuable to overcome these limitations.
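The first limitation in the summary above, the treatment of a packet at a second congested router, can be sketched as follows. This is an illustration under stated assumptions, not a router implementation: the function names, the drop_ce policy flag, and the returned action strings are hypothetical, the numeric codepoint values are assumed to follow the ECN field definition given earlier in this document, and the one-bit value assignment (1 for "ECT and not CE") is likewise assumed. Both functions describe a packet that active queue management has already selected for marking or dropping.

```python
# Two-bit ECN field codepoints (values assumed from the ECN field definition).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def two_bit_action(codepoint: int, drop_ce: bool = False) -> str:
    """Action at a congested router when the full two-bit field is visible."""
    if codepoint == NOT_ECT:
        return "drop"                       # transport cannot react to CE
    if codepoint == CE:
        # Already marked upstream: the router may drop it or leave it alone.
        return "drop" if drop_ce else "forward unchanged"
    return "set CE"                         # ECT(0) or ECT(1)


def one_bit_action(bit: int) -> str:
    """Action at a congested router that sees only the overloaded bit.

    Assumes 1 means "ECT and not CE" and 0 means "CE or not ECT".
    """
    if bit == 1:
        return "mark"                       # flip to "CE or not ECT"
    # Cannot tell a CE packet from a non-ECT packet, so dropping is
    # the only safe indication of congestion.
    return "drop"
```

With the two-bit field, a packet that already carries CE can pass a second congested router without loss, whereas the one-bit router must drop it.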
22. Historical Definitions for the IPv4 TOS Octet

RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP header. In RFC 791, bits 6 and 7 of the ToS octet are listed as "Reserved for Future Use", and are shown set to zero. The first two fields of the ToS octet were defined as the Precedence and Type of Service (TOS) fields.

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
      +-----+-----+-----+-----+-----+-----+-----+-----+

RFC 1122 included bits 6 and 7 in the TOS field, though it did not discuss any specific use for those two bits:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |             TOS             |  RFC 1122
      +-----+-----+-----+-----+-----+-----+-----+-----+

The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |          TOS          | MBZ |  RFC 1349
      +-----+-----+-----+-----+-----+-----+-----+-----+

Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary Cost". In addition to the Precedence and Type of Service (TOS) fields, the last field, MBZ (for "must be zero") was defined as currently unused. RFC 1349 stated that "The originator of a datagram sets [the MBZ] field to zero (unless participating in an Internet protocol experiment which makes use of that bit)."

RFC 1455 [RFC1455] defined an experimental standard that used all four bits in the TOS field to request a guaranteed level of link security.

RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an experimental use of the two-bit CU field.
RFC 2780 updated the definition of the DS Field to only encompass the first six bits of this octet rather than all eight bits; these first six bits are defined as the Differentiated Services CodePoint (DSCP):

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |               DSCP                |    CU     |  RFCs 2474,
      +-----+-----+-----+-----+-----+-----+-----+-----+    2780

Because of this unstable history, the definition of the ECN field in this document cannot be guaranteed to be backwards compatible with all past uses of these two bits.

Prior to RFC 2474, routers were not permitted to modify bits in either the DSCP or ECN field of packets forwarded through them, and hence routers that comply only with RFCs prior to 2474 should have no effect on ECN. For end nodes, bit 7 (the second ECN bit) must be transmitted as zero for any implementation compliant only with RFCs prior to 2474. Such nodes may transmit bit 6 (the first ECN bit) as one for the "Minimize Monetary Cost" provision of RFC 1349 or the experiment authorized by RFC 1455; neither this aspect of RFC 1349 nor the experiment in RFC 1455 was widely implemented or used. The damage that could be done by a broken, non-conformant router would include "erasing" the CE codepoint for an ECN-capable packet that arrived at the router with the CE codepoint set, or setting the CE codepoint even in the absence of congestion. This has been discussed in the section on "Non-compliance in the Network".

The damage that could be done in an ECN-capable environment by a non-ECN-capable end-node transmitting packets with the ECT codepoint set has been discussed in the section on "Non-compliance by the End Nodes".

23. IANA Considerations

The codepoints for the ECN Field of the IP header and the bits for CWR and ECE in the TCP header are specified by the Standards Action of this RFC, as is required by RFC 2780.

IANA allocated the IPSEC Security Association Attribute value 10 for the ECN Tunnel use described in Section 9.2.1.2 above at the request of David Black in November 1999. If this draft is approved for publication as an RFC, IANA should change the Reference for this allocation from David Black's request to this RFC based on its RFC number.

AUTHORS' ADDRESSES

K. K. Ramakrishnan
TeraOptic Networks, Inc.
Phone: +1 (408) 666-8650
Email: kk@teraoptic.com

Sally Floyd
ACIRI
Phone: +1 (510) 666-2989
Email: floyd@aciri.org
URL: http://www.aciri.org/floyd/

David L. Black
EMC Corporation
42 South St.
Hopkinton, MA 01748
Phone: +1 (508) 435-1000 x75140
Email: black_david@emc.com

This draft was created in February 2001. It expires August 2001.