Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        TeraOptic Networks
draft-ietf-tsvwg-ecn-01.txt
draft-ietf-tsvwg-ecn-02.txt                                  Sally Floyd
                                                                   ACIRI
                                                                D. Black
                                                                     EMC
                                                           January,
                                                          February, 2001
                                                   Expires: July, August, 2001

      The Addition of Explicit Congestion Notification (ECN) to IP

                          Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet- Drafts Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document specifies the incorporation of ECN (Explicit Congestion
   Notification) to TCP and IP, including ECN's use of two bits in the
   IP header.  We begin by describing TCP's use of packet drops as an
   indication of congestion.  Next we explain that with the addition of
   active queue management (e.g., RED) to the Internet infrastructure,
   where routers detect congestion before the queue overflows, routers
   are no longer limited to packet drops as an indication of congestion.
   Routers can instead set the Congestion Experienced (CE) bit codepoint in
   the IP header of packets from ECN-capable transports.  We describe
   when the CE bit codepoint is to be set in routers, and describe
   modifications needed to TCP to make it ECN-capable.  Modifications to
   other transport protocols (e.g., unreliable unicast or multicast,
   reliable multicast, other reliable unicast transport protocols) could
   be considered as those protocols are developed and advance through
   the standards process.

   We also describe in this document the issues involving the use of ECN
   within IP tunnels, and within IPsec tunnels in particular.

   One of the guiding principles for this document is that all the
   mechanisms specified here are incrementally deployable.

Table of Contents
     1.  Introduction
     2.  Conventions and Acronyms
     3.  Assumptions and General Principles
     4.  Active Queue Management (AQM)
     5.  Explicit Congestion Notification in IP
     5.1.  ECN as an Indication of Persistent Congestion
     5.2.  Dropped or Corrupted Packets
     5.3.  Fragmentation
     6.  Support from the Transport Protocol
     6.1.  TCP
     6.1.1.
     6.1.1  TCP Initialization
     6.1.1.1.  Robust TCP Initialization with an Echoed Reserve Field
     6.1.2.  The TCP Sender
     6.1.3.  The TCP Receiver
     6.1.4.  Congestion on the ACK-path
     6.1.5.  Retransmitted TCP packets
     6.1.6.  TCP Window Probes.
     7.  Non-compliance by the End Nodes
     8.  Non-compliance in the Network
     8.1.  Complications Introduced by Split Paths
     9.  Encapsulated Packets
     9.1.  IP packets encapsulated in IP
     9.1.1.  The Limited-functionality and Full-functionality Options
     9.1.2.  Changes to the ECN Field within an IP Tunnel.
     9.2.  IPsec Tunnels
     9.2.1.  Negotiation between Tunnel Endpoints
     9.2.1.1.  ECN Tunnel Security Association Database Field
     9.2.1.2.  ECN Tunnel Security Association Attribute
     9.2.1.3.  Changes to IPsec Tunnel Header Processing
     9.2.2.  Changes to the ECN Field within an IPsec Tunnel.
     9.2.3.  Comments for IPsec Support
     9.3.  IP packets encapsulated in non-IP packet headers.
     10.  Issues Raised by Monitoring and Policing Devices
     11.  Evaluations of ECN
     11.1.  Related Work Evaluating ECN
     11.2.  A Discussion of the ECN nonce.
     11.2.1.  The Incremental Deployment of ECT(1) in Routers.
     12.  Summary of changes required in IP and TCP
     13.  Conclusions
     14.  Acknowledgements
     15.  References
     16.  Security Considerations
     17.  IPv4 Header Checksum Recalculation
     18.  Possible Changes to the ECN Field in the Network
     18.1.  Possible Changes to the IP Header
     18.1.1.  Erasing the Congestion Indication
     18.1.2.  Falsely Reporting Congestion
     18.1.3.  Disabling ECN-Capability
     18.1.4.  Falsely Indicating ECN-Capability
     18.1.5.  Changes with No Functional Effect
     18.2.  Information carried in the Transport Header
     18.3.  Split Paths
     19.  Implications of Subverting End-to-End Congestion Control
     19.1.  Implications for the Network and for Competing Flows
     19.2.  Implications for the Subverted Flow
     19.3.  Non-ECN-Based Methods of Subverting End-to-end Congestion Control
     20.  The Motivation for the ECT bit. Codepoints.
     20.1.  The Motivation for an ECT Codepoint.
     20.2.  The Motivation for two ECT Codepoints.
     21.  Why use Two Bits in the IP Header?
     22.  Historical Definitions for the IPv4 TOS Octet
     23.  IANA Considerations

RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare
this with draft-ietf-tsvwg-ecn-00, draft-ietf-tsvwg-ecn-01, compare the following:
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-00.troff"
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff"
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-02.troff"
Changes from draft-ietf-tsvwg-ecn-00:
* Deleted Section 6.1.1.2. on "Robust TCP Initialization with no
response to draft-ietf-tsvwg-ecn-01:
  Added the SYN", ECT(1) codepoint, and modified the paragraph in the Conclusions
referring changed references about bits to this.
* Added
references about codepoints in many places.  Also added Section 23 11.2 on IANA Considerations.
* Added two paragraphs to
"A Discussion of the ECN nonce", and Section 18.2 20.2 on denial-of-service attacks.
* "The Motivation for
two ECT Codepoints".
  Added some text about the ECN nonce being a research issue.
* Moved two paragraphs about paragraph saying that by default, the discussion of setting
the CWR bit from Section 6.1.3 CE codepoint applies to all Differentiated Services Per-Hop
Behaviors.
  Added Section 6.1.2.
* Various small changes:
  Adding several small clarifying sentences in Section 12, 22.
  Small clarification 5.3 on fragmentation.
  Added "A host MUST NOT set ECT on SYN or SYN-ACK packets." to text in Section 19.2.
  Deleted a few unnecessary sentences in the end
of Section 9.
  Updated some references 6.1.1, just to Section X.
  Added more be explicit.
  Corrected some references to RFC 2780.
  Deleted references "Section 19" to internet-drafts. "Section 22".
  Clarified terminology for "non-ECN-setup SYN packet", including the
following:  "Receivers MUST correctly handle all forms of the non-ECN-
setup SYN that ECN is defined identically in IPv4 and SYN-ACK packets." in IPv6.

1.  Introduction

   TCP's congestion control and avoidance algorithms are based on the
   notion that the network is a black-box [Jacobson88, Jacobson90].  The
   network's state of congestion or otherwise is determined by end-sys-
   tems end-
   systems probing for the network state, by gradually increasing the
   load on the network (by increasing the window of packets that are out-
   standing
   outstanding in the network) until the network becomes congested and a
   packet is lost.  Treating the network as a "black-box" and treating
   loss as an indication of congestion in the network is appropriate for
   pure best-effort data carried by TCP, with little or no sensitivity
   to delay or loss of individual packets.  In addition, TCP's conges-
   tion
   congestion management algorithms have techniques built-in (such as
   Fast Retransmit and Fast Recovery) to minimize the impact of losses,
   from a throughput perspective.  However, these mechanisms are not
   intended to help applications that are in fact sensitive to the delay
   or loss of one or more individual packets.  Interactive traffic such
   as tel-
   net, telnet, web-browsing, and transfer of audio and video data can be sensi-
   tive
   sensitive to packet losses (especially when using an unreliable data
   delivery transport such as UDP) or to the increased latency of the
   packet caused by the need to retransmit the packet after a loss (with
   the reliable data delivery semantics provided by TCP).

   Since TCP determines the appropriate congestion window to use by
   gradually increasing the window size until it experiences a dropped
   packet, this causes the queues at the bottleneck router to build up.
   With most packet drop policies at the router that are not sensitive
   to the load placed by each individual flow (e.g., tail-drop on queue
   overflow), this means that some of the packets of latency-sensitive
   flows may be dropped. In addition, such drop policies lead to syn-
   chronization
   synchronization of loss across multiple flows.

   Active queue management mechanisms detect congestion before the queue
   overflows, and provide an indication of this congestion to the end
   nodes.  Thus, active queue management can reduce unnecessary queueing
   delay for all traffic sharing that queue.  The advantages of active
   queue management are discussed in RFC 2309 [RFC2309].  Active queue
   management avoids some of the bad properties of dropping on queue
   overflow, including the undesirable synchronization of loss across
   multiple flows.  More importantly, active queue management means that
   transport protocols with mechanisms for congestion control (e.g.,
   TCP) do not have to rely on buffer overflow as the only indication of
   congestion.

   Active queue management mechanisms may use one of several methods for
   indicating congestion to end-nodes. One is to use packet drops, as is
   currently done. However, active queue management allows the router to
   separate policies of queueing or dropping packets from the policies
   for indicating congestion. Thus, active queue management allows
   routers to use the Congestion Experienced (CE) bit codepoint in a packet
   header as an indication of congestion, instead of relying solely on
   packet drops. This has the potential of reducing the impact of loss
   on latency-sensitive flows.

   This document is intended to obsolete RFC 2481, "A Proposal to add
   Explicit Congestion Notification (ECN) to IP", which defined ECN as
   an Experimental Protocol for the Internet Community.

   RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This
   document obsoletes three subsequent internet-drafts on ECN, "IPsec
   Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP
   with ECN: The Treatment of Retransmitted Data Packets".  This
   document is intended largely to merge the earlier documents all into
   a single document, for greater clarity, in preparation to becoming a
   Proposed Standard.

2.  Conventions and Acronyms

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [B97].

3.  Assumptions and General Principles

   In this section, we describe some of the important design principles
   and assumptions that guided the design choices in this proposal.

   * Because ECN is likely to be adopted gradually, accommodating migra-
   tion
   migration is essential. Some routers may still only drop packets to indi-
   cate
   indicate congestion, and some end-systems may not be ECN-capable. The
   most viable strategy is one that accommodates incremental deployment
   without having to resort to "islands" of ECN-capable and non-ECN-
   capable environments.
   * New mechanisms for congestion control and avoidance need to co-
   exist and cooperate with existing mechanisms for congestion control.
   In particular, new mechanisms have to co-exist with TCP's current
   methods of adapting to congestion and with routers' current practice
   of dropping packets in periods of congestion.
   * Congestion may persist over different time-scales. The time scales
   that we are concerned with are congestion events that may last longer
   than a round-trip time.
   * The number of packets in an individual flow (e.g., TCP connection
   or an exchange using UDP) may range from a small number of packets to
   quite a large number. We are interested in managing the congestion
   caused by flows that send enough packets so that they are still
   active when network feedback reaches them.
   * Asymmetric routing is likely to be a normal occurrence in the
   Internet. The path (sequence of links and routers) followed by data
   packets may be different from the path followed by the acknowledgment
   packets in the reverse direction.
   * Many routers process the "regular" headers in IP packets more effi-
   ciently
   efficiently than they process the header information in IP options.
   This suggests keeping congestion experienced information in the
   regular headers of an IP packet.
   * It must be recognized that not all end-systems will cooperate in
   mechanisms for congestion control. However, new mechanisms shouldn't
   make it easier for TCP applications to disable TCP congestion con-
   trol.
   control.  The benefit of lying about participating in new mechanisms
   such as ECN-capability should be small.

4.  Active Queue Management (AQM)

   Random Early Detection (RED) is one mechanism for Active Queue Man-
   agement
   Management (AQM) that has been proposed to detect incipient
   congestion [FJ93], and is currently being deployed in the Internet
   [RFC2309].  AQM is meant to be a general mechanism using one of
   several alterna-
   tives alternatives for congestion indication, but in the absence of
   ECN, AQM is restricted to using packet drops as a mechanism for
   congestion indi-
   cation. indication.  AQM drops packets based on the average queue
   length exceed-
   ing exceeding a threshold, rather than only when the queue
   overflows.  However, because AQM may drop packets before the queue
   actually overflows, AQM is not always forced by memory limitations to
   discard the packet.

   AQM can set a Congestion Experienced (CE) bit codepoint in the packet
   header instead of dropping the packet, when such a bit field is provided
   in the IP header and understood by the transport protocol.  The use
   of the CE
   bit codepoint with ECN allows the receiver(s) to receive the
   packet, avoiding the potential for excessive delays due to
   retransmissions after packet losses.  We use the term 'CE packet' to
   denote a packet that has the CE bit codepoint set.

5.  Explicit Congestion Notification in IP

   This document specifies that the Internet provide a congestion indi-
   cation
   indication for incipient congestion (as in RED and earlier work
   [RJ90]) where the notification can sometimes be through marking
   packets rather than dropping them.  This uses an ECN field in the IP
   header with two bits. bits, making four ECN codepoints, '00' to '11'.  The
   ECN-Capable Transport (ECT) bit is codepoints '10' and '01' are set by the
   data sender to indicate that the end-points of the transport protocol
   are ECN-capable. ECN-capable; we call them ECT(0) and ECT(1) respectively.  The CE bit is set by the router to indicate conges-
   tion
   phrase "the ECT codepoint" in this documents refers to either of the end nodes.
   two ECT codepoints.  Routers that have a packet arriving at a full
   queue drop treat the packet, just ECT(0) and ECT(1) codepoints
   as they do in equivalent.  Senders are free to use either the absence of ECN.

   Bits 6 and 7 in the IPv4 TOS octet are designated as ECT(0) or the ECN field.
   Bit 6 is designated as
   ECT(1) codepoint to indicate ECT, on a packet-by-packet basis.

   The use of both the ECT bit, two codepoints for ECT, ECT(0) and bit 7 ECT(1), is designated as
   motivated primarily by the CE
   bit.  The IPv4 TOS octet corresponds desire to the Traffic Class octet in
   IPv6.  The definitions allow mechanisms for the IPv4 TOS octet [RFC791] data
   sender to verify that network elements are not erasing the CE
   codepoint, and that data receivers are properly reporting to the IPv6
   Traffic Class octet have been superseded
   sender the receipt of packets with the CE codepoint set, as required
   by the six-bit DS (Differen-
   tiated Services) Field [RFC2474, RFC2780].  Bits 6 transport protocol.  Guidelines for the senders and 7 are listed
   in [RFC2474] as Currently Unused, receivers
   to differentiate between the ECT(0) and are specified ECT(1) codepoints will be
   addressed in RFC 2780 as
   approved separate documents, for experimental use each transport protocol.  In
   particular, this document does not address mechanisms for TCP end-
   nodes to differentiate between the ECT(0) and ECT(1) codepoints.

   Protocols and senders that only require a single ECT codepoint SHOULD
   use ECT(0).

   The not-ECT codepoint '00' indicates a packet that is not using ECN.  Section 19 gives
   The CE codepoint '11' is set by a brief his-
   tory of router to indicate congestion to
   the TOS octet.

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |             DS FIELD end nodes.  Routers that have a packet arriving at a full queue
   drop the packet, just as they do in the absence of ECN.

         +-----+-----+
         | ECN FIELD |
         |                                   |           |
         |               DSCP                |
         +-----+-----+
           ECT |   CE  |
         +-----+-----+-----+-----+-----+-----+-----+-----+

           DSCP: differentiated services codepoint
           ECN:  Explicit Congestion Notification

          Figure 1:         The Differentiated Services ECT and ECN Fields in IP.

   Because CE bits defined in RFC 2481.
            0     0         Not-ECT
            0     1         ECT(1)
            1     0         ECT(0)
            1     1         CE

          Figure 1: The ECN Field in IP.

   The use of two ECT codepoints essentially gives a one-bit ECN nonce
   in packet headers, and routers necessarily "erase" the unstable history of nonce when
   they set the TOS octet, CE codepoint [SCWA99].  For example, routers that erased
   the use CE codepoint would face additional difficulty in reconstructing
   the original nonce, and thus repeated erasure of the ECN
   field as specified in this document cannot CE codepoint
   would be guaranteed more likely to be back-
   wards compatible with all past uses of these two bits.  The potential
   dangers of this lack of backwards compatibility are discussed in Sec-
   tion 19.

   Upon the receipt detected by an ECN-Capable transport of a single CE packet, the congestion control algorithms followed at end-nodes.  The ECN nonce
   also can address the end-systems MUST be
   essentially problem of misbehaving transport receivers lying
   to the same as transport sender about whether or not the congestion control response to CE codepoint was set
   in a *single*
   dropped packet.  For example,  The motivations for ECN-Capable TCP the source TCP use of two ECT codepoints is
   required to halve its congestion window for any window
   discussed in more detail in Section 20, along with some discussion of data con-
   taining either a packet drop or an ECN indication.

   One reason
   alternate possibilities for requiring that the congestion-control response to the
   CE packet be essentially the same as fourth ECT codepoint.  Backwards
   compatibility with earlier ECN implementations that do not understand
   the response to a dropped packet ECT(1) codepoint is to accommodate discussed in Section 11.

   In RFC 2481 [RFC2481], the incremental deployment of ECN in both end-sys-
   tems and in routers.  Some routers may drop ECN-Capable packets
   (e.g., using field was divided into the same AQM policies for congestion detection) while
   other routers set ECN-Capable
   Transport (ECT) bit and the CE bit, for equivalent levels of congestion.
   Similarly, a router might drop a non-ECN-Capable packet but set bit.  The ECN field with only the
   CE ECN-
   Capable Transport (ECT) bit set in an ECN-Capable packet, for equivalent levels of congestion.
   If there were different congestion control responses RFC 2481 corresponds to a the ECT(0)
   codepoint in this document, and the ECN field with both the ECT and
   CE bit
   indication than in RFC 2481 corresponds to a packet drop, the CE codepoint in this could result document.
   The '01' codepoint was left undefined in unfair treat-
   ment for different flows.

   An additional goal RFC 2481, and this is that the end-systems should react to congestion
   at most once per window of data (i.e., at most once per round-trip
   time), to avoid reacting multiple times to multiple indications of
   congestion within a round-trip time.

   For a router,
   reason for recommending the CE bit use of an ECN-Capable packet should ECT(0) when only be set
   if the router would otherwise have dropped a single ECT
   codepoint is needed.

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |          DS FIELD, DSCP           | ECN FIELD |
         +-----+-----+-----+-----+-----+-----+-----+-----+

           DSCP: differentiated services codepoint
           ECN:  Explicit Congestion Notification

          Figure 2: The Differentiated Services and ECN Fields in IP.

   Bits 6 and 7 in the packet IPv4 TOS octet are designated as an indica-
   tion of congestion to the end nodes. When ECN field.
   The IPv4 TOS octet corresponds to the router's buffer is not
   yet full Traffic Class octet in IPv6,
   and the router ECN field is prepared to drop a packet to inform end
   nodes of incipient congestion, defined identically in both cases.  The
   definitions for the router should first check to see
   if IPv4 TOS octet [RFC791] and the ECT bit is set IPv6 Traffic
   Class octet have been superseded by the six-bit DS (Differentiated
   Services) Field [RFC2474, RFC2780].  Bits 6 and 7 are listed in that packet's IP header.  If so, then
   instead
   [RFC2474] as Currently Unused, and are specified in RFC 2780 as
   approved for experimental use for ECN.  Section 22 gives a brief
   history of dropping the packet, TOS octet.

   Because of the router MAY instead set unstable history of the CE bit
   in TOS octet, the IP header.

   An environment where all end nodes were ECN-Capable could allow new
   criteria to be developed for setting use of the CE bit, and new congestion
   control mechanisms for end-node reaction to CE packets.  However,
   this is a research issue, and ECN
   field as such is not addressed specified in this docu-
   ment.

   When a CE packet (i.e., a packet document cannot be guaranteed to be
   backwards compatible with those past uses of these two bits that has pre-
   date ECN.  The potential dangers of this lack of backwards
   compatibility are discussed in Section 22.

   Upon the CE bit set) is received receipt by an ECN-Capable transport of a router, the single CE bit is left unchanged, and packet,
   the packet is trans-
   mitted as usual. When severe congestion has occurred and control algorithms followed at the router's
   queue is full, then end-systems MUST be
   essentially the router has no choice but same as the congestion control response to drop some packet
   when a new packet arrives.  We anticipate that such packet losses
   will become relatively infrequent when a majority of end-systems
   become *single*
   dropped packet.  For example, for ECN-Capable and participate in TCP or other compatible conges-
   tion control mechanisms. In an ECN-Capable environment that the source TCP is ade-
   quately-provisioned network,
   required to halve its congestion window for any window of data
   containing either a packet losses should occur primarily
   during transients drop or in the presence of non-cooperating sources.

   We expect an ECN indication.

   One reason for requiring that routers will set the CE bit in congestion-control response to incipient
   congestion as indicated by the average queue size, using
   CE packet be essentially the RED
   algorithms suggested in [FJ93, RFC2309].  To same as the best of our knowl-
   edge, this response to a dropped packet
   is to accommodate the only proposal currently under discussion incremental deployment of ECN in the
   IETF for both end-
   systems and in routers.  Some routers to may drop ECN-Capable packets proactively, before
   (e.g., using the buffer over-
   flows.  However, this document does not attempt to specify a particu-
   lar mechanism same AQM policies for active queue management, leaving that endeavor, if
   needed, to other areas of the IETF.  While ECN is inextricably tied
   up with the need to have a reasonable active queue management mecha-
   nism at the router, the reverse does not hold; active queue manage-
   ment mechanisms have been developed and deployed independent of ECN,
   using packet drops as indications of congestion in the absence of ECN
   in detection) while
   other routers set the IP architecture.

5.1.  ECN as an Indication CE codepoint, for equivalent levels of Persistent Congestion

   We emphasize that
   congestion.  Similarly, a *single* router might drop a non-ECN-Capable packet with
   but set the CE bit set codepoint in an IP
   packet causes the transport layer to respond, in terms ECN-Capable packet, for equivalent
   levels of congestion
   control, as it would congestion.  If there were different congestion control
   responses to a packet drop.  The instantaneous queue size
   is likely CE codepoint than to see considerable variations even when the router does
   not experience persistent congestion.  As such, it a packet drop, this could result
   in unfair treatment for different flows.

   An additional goal is important that
   transient the end-systems should react to congestion
   at a router, reflected by the instantaneous
   queue size reaching a threshold much smaller than the capacity most once per window of the
   queue, not trigger a reaction data (i.e., at the transport layer.  Therefore, most once per round-trip
   time), to avoid reacting multiple times to multiple indications of
   congestion within a round-trip time.

   For a router, the CE bit should not codepoint of an ECN-Capable packet SHOULD only
   be set by a router based on the instantaneous queue
   size.

   For example, since if the ATM and Frame Relay mechanisms for congestion
   indication router would otherwise have typically been defined without dropped the packet as an associated notion
   indication of average queue size as congestion to the basis for determining that an intermedi-
   ate node is congested, we believe that they provide a very noisy sig-
   nal. The TCP-sender reaction specified in this document for ECN end nodes. When the router's buffer
   is
   NOT not yet full and the appropriate reaction for such router is prepared to drop a noisy signal packet to inform
   end nodes of congestion
   notification.  However, if incipient congestion, the routers that interface router should first check to
   see if the ATM net-
   work have a way ECT codepoint is set in that packet's IP header.  If so,
   then instead of maintaining dropping the average queue at packet, the interface,
   and use it to come to a reliable determination that router MAY instead set the ATM subnet is
   congested, they may use
   CE codepoint in the ECN notification that is defined here.

   We continue IP header.

   An environment where all end nodes were ECN-Capable could allow new
   criteria to encourage experiments in techniques at layer 2 (e.g.,
   in ATM switches or Frame Relay switches) be developed for setting the CE codepoint, and new
   congestion control mechanisms for end-node reaction to take advantage of ECN.
   For example, using CE packets.
   However, this is a scheme such research issue, and as RED (where packet marking such is
   based on the average queue length exceeding a threshold), layer 2
   devices could provide a reasonably reliable indication of congestion.
   When all the layer 2 devices not addressed in
   this document.

   When a path set CE packet (i.e., a packet that layer's own Conges-
   tion Experienced bit (e.g., the EFCI bit for ATM, the FECN bit in
   Frame Relay) in this reliable manner, then the interface router to
   the layer 2 network could copy has the state of that layer 2 Congestion
   Experienced bit into CE codepoint set) is
   received by a router, the CE bit in codepoint is left unchanged, and the IP header.  We recognize that
   this
   packet is not transmitted as usual. When severe congestion has occurred
   and the current practice, nor router's queue is it in current standards.
   However, encouraging experimentation in this manner may provide full, then the
   information needed to enable evolution of existing layer 2 mechanisms router has no choice but to provide a more reliable means of congestion indication,
   drop some packet when they
   use a single bit for indicating congestion.

5.2.  Dropped or Corrupted Packets

   For the proposed use for ECN in this document (that is, for a trans-
   port protocol new packet arrives.  We anticipate that such as TCP for which a dropped data
   packet is an indi-
   cation losses will become relatively infrequent when a majority of congestion), end nodes detect dropped data packets,
   end-systems become ECN-Capable and the participate in TCP or other
   compatible congestion response of control mechanisms. In an ECN-Capable
   environment that is adequately-provisioned, packet losses should
   occur primarily during transients or in the end nodes to presence of non-
   cooperating sources.

   The above discussion of when CE may be set instead of dropping a dropped data
   packet is at
   least as strong as the congestion response applies by default to all Differentiated Services Per-Hop
   Behaviors (PHBs) [RFC 2475].  Specifications for PHBs MAY provide
   more specifics on how a received CE packet.
   To ensure the reliable delivery of the congestion indication of the compliant implementation is to choose between
   setting CE bit, the ECT bit and dropping a packet, but this is NOT REQUIRED.  A router
   MUST NOT be set in CE instead of dropping a packet unless when the loss of drop that packet in the network would be detected
   occur is caused by reasons other than congestion or the desire to
   indicate incipient congestion to end nodes and
   interpreted as an indication of congestion.

   Transport protocols such as TCP do not necessarily detect all packet
   drops, such as the (e.g., a diffserv edge
   node may be configured to unconditionally drop certain classes of a "pure" ACK packet; for example, TCP does
   not reduce
   traffic to prevent them from entering its diffserv domain).

   We expect that routers will set the arrival rate of subsequent ACK packets CE codepoint in response to
   an earlier dropped ACK packet.  Any proposal for extending ECN-Capa-
   bility to such packets would have to address issues such
   incipient congestion as indicated by the case
   of an ACK packet that was marked with average queue size, using
   the CE bit but was later
   dropped RED algorithms suggested in [FJ93, RFC2309].  To the network. We believe that this aspect is still the sub-
   ject best of research, so this document specifies that at our
   knowledge, this time,
   "pure" ACK packets MUST NOT indicate ECN-Capability.

   Similarly, if a CE packet is dropped later the only proposal currently under discussion in
   the network due IETF for routers to cor-
   ruption (bit errors), drop packets proactively, before the end nodes should still invoke congestion
   control, just as TCP would today in response buffer
   overflows.  However, this document does not attempt to specify a dropped data
   packet. This issue of corrupted CE packets would have to be consid-
   ered in any proposal
   particular mechanism for the network to distinguish between packets
   dropped due to corruption, and packets dropped due active queue management, leaving that
   endeavor, if needed, to congestion or
   buffer overflow.  In particular, the ubiquitous deployment other areas of the IETF.  While ECN
   would not, in is
   inextricably tied up with the need to have a reasonable active queue
   management mechanism at the router, the reverse does not hold; active
   queue management mechanisms have been developed and deployed
   independent of itself, be a sufficient development to allow
   end-nodes to interpret ECN, using packet drops as indications of corruption
   rather than congestion.

6.  Support from the Transport Protocol

   ECN requires support from the transport protocol, congestion
   in addition to the
   functionality given by the absence of ECN field in the IP architecture.

5.1.  ECN as an Indication of Persistent Congestion

   We emphasize that a *single* packet header. The
   transport protocol might require negotiation between with the endpoints
   during setup CE codepoint set in an
   IP packet causes the transport layer to determine that all respond, in terms of
   congestion control, as it would to a packet drop.  The instantaneous
   queue size is likely to see considerable variations even when the endpoints are ECN-capable,
   so
   router does not experience persistent congestion.  As such, it is
   important that transient congestion at a router, reflected by the sender can set
   instantaneous queue size reaching a threshold much smaller than the ECT bit in transmitted packets.  Sec-
   ond,
   capacity of the queue, not trigger a reaction at the transport protocol must layer.
   Therefore, the CE codepoint should not be capable of reacting appropriately
   to set by a router based on
   the receipt instantaneous queue size.

   For example, since the ATM and Frame Relay mechanisms for congestion
   indication have typically been defined without an associated notion
   of CE packets.  This average queue size as the basis for determining that an
   intermediate node is congested, we believe that they provide a very
   noisy signal. The TCP-sender reaction could be specified in this document for
   ECN is NOT the form appropriate reaction for such a noisy signal of
   congestion notification.  However, if the data receiver informing routers that interface to
   the data sender ATM network have a way of maintaining the received CE packet
   (e.g., TCP), of average queue at the data receiver unsubscribing
   interface, and use it to come to a layered multi-
   cast group (e.g., RLM [MJV96]), or of some other action reliable determination that ulti-
   mately reduces the arrival rate of that flow on that congested link.

   This document only addresses
   ATM subnet is congested, they may use the addition of ECN Capability notification that is
   defined here.

   We continue to TCP,
   leaving issues of ECN encourage experiments in other transport protocols techniques at layer 2 (e.g.,
   in ATM switches or Frame Relay switches) to further
   research.  For TCP, ECN requires three new pieces take advantage of functionality:
   negotiation between the endpoints during connection setup to deter-
   mine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP
   header so that the data receiver can inform the data sender when ECN.
   For example, using a CE scheme such as RED (where packet has been received; and a Congestion Window Reduced (CWR) flag
   in the TCP header so that marking is
   based on the data sender can inform average queue length exceeding a threshold), layer 2
   devices could provide a reasonably reliable indication of congestion.
   When all the data
   receiver layer 2 devices in a path set that layer's own
   Congestion Experienced codepoint (e.g., the congestion window has been reduced. The support
   required from other transport protocols is likely to be different,
   particularly EFCI bit for unreliable or ATM, the
   FECN bit in Frame Relay) in this reliable multicast transport proto-
   cols, and will have to be determined as other transport protocols are
   brought manner, then the interface
   router to the IETF for standardization.

6.1.  TCP

   The following sections describe in detail layer 2 network could copy the proposed use state of ECN in
   TCP.  This proposal is described in essentially that layer 2
   Congestion Experienced codepoint into the same form CE codepoint in
   [Floyd94]. the IP
   header.  We assume recognize that this is not the source TCP uses the standard congestion
   control algorithms of Slow-start, Fast Retransmit and Fast Recovery
   [RFC 2001].

   This proposal specifies two new flags current practice, nor is
   it in current standards. However, encouraging experimentation in this
   manner may provide the Reserved field information needed to enable evolution of the
   TCP header.  The TCP mechanism
   existing layer 2 mechanisms to provide a more reliable means of
   congestion indication, when they use a single bit for negotiating ECN-Capability uses indicating
   congestion.

5.2.  Dropped or Corrupted Packets

   For the ECN-Echo (ECE) flag proposed use for ECN in the this document (that is, for a
   transport protocol such as TCP header.  Bit 9 in for which a dropped data packet is an
   indication of congestion), end nodes detect dropped data packets, and
   the Reserved
   field congestion response of the TCP header end nodes to a dropped data packet is designated
   at least as strong as the ECN-Echo flag.  The
   location congestion response to a received CE
   packet.  To ensure the reliable delivery of the 6-bit Reserved field in congestion indication
   of the TCP header is shown CE codepoint, an ECT codepoint MUST NOT be set in
   Figure 3 of RFC 793 [RFC793] (and is reproduced below for complete-
   ness).  This specification a packet
   unless the loss of that packet in the ECN Field leaves network would be detected by
   the Reserved field end nodes and interpreted as an indication of congestion.

   Transport protocols such as a 4-bit field using bits 4-7.

   To enable the TCP receiver to determine when to stop setting do not necessarily detect all packet
   drops, such as the ECN-
   Echo flag, we introduce drop of a second new flag in the "pure" ACK packet; for example, TCP header, does
   not reduce the CWR
   flag.  The CWR flag is assigned arrival rate of subsequent ACK packets in response to
   an earlier dropped ACK packet.  Any proposal for extending ECN-
   Capability to such packets would have to address issues such as the
   case of an ACK packet that was marked with the CE codepoint but was
   later dropped in the network. We believe that this aspect is still
   the subject of research, so this document specifies that at this
   time, "pure" ACK packets MUST NOT indicate ECN-Capability.

   Similarly, if a CE packet is dropped later in the network due to
   corruption (bit errors), the end nodes should still invoke congestion
   control, just as TCP would today in response to a dropped data
   packet. This issue of corrupted CE packets would have to be
   considered in any proposal for the network to distinguish between
   packets dropped due to corruption, and packets dropped due to
   congestion or buffer overflow.  In particular, the ubiquitous
   deployment of ECN would not, in and of itself, be a sufficient
   development to allow end-nodes to interpret packet drops as
   indications of corruption rather than congestion.

5.3.  Fragmentation

   All ECN-capable packets SHOULD have the DF (Don't Fragment) bit set.
   Reassembly of a fragmented packet MUST NOT lose indications of
   congestion.  In other words, if any fragment of an IP packet to be
   reassembled has the CE codepoint set, then one of two actions MUST be
   taken:
      * The reassembled packet has the CE codepoint set.  This MUST NOT
      occur if any of the other fragments contributing to this
      reassembly carries the Not-ECT codepoint.
      * The packet is dropped instead of being reassmembled.
   If both actions are applicable, either MAY be chosen.  Reassembly of
   a fragmented packet MUST NOT change the ECN codepoint when all of the
   fragments carry the same codepoint.

   Situations may arise in which the above specification is
   insufficiently precise.  For example, it does not place requirements
   on reassembly of fragments that carry a mixture of ECT(0), ECT(1)
   and/or Not-ECT.  In situations where more precise reassembly behavior
   would be required, protocol specifications SHOULD instead specify
   that DF MUST be set in all packets sent by the protocol.

6.  Support from the Transport Protocol

   ECN requires support from the transport protocol, in addition to the
   functionality given by the ECN field in the IP packet header. The
   transport protocol might require negotiation between the endpoints
   during setup to determine that all of the endpoints are ECN-capable,
   so that the sender can set the ECT codepoint in transmitted packets.
   Second, the transport protocol must be capable of reacting
   appropriately to the receipt of CE packets.  This reaction could be
   in the form of the data receiver informing the data sender of the
   received CE packet (e.g., TCP), of the data receiver unsubscribing to
   a layered multicast group (e.g., RLM [MJV96]), or of some other
   action that ultimately reduces the arrival rate of that flow on that
   congested link.  CE packets indicate persistent rather than transient
   congestion (see Section 5.1), and hence reactions to the receipt of
   CE packets should be those appropriate for persistent congestion.

   This document only addresses the addition of ECN Capability to TCP,
   leaving issues of ECN in other transport protocols to further
   research.  For TCP, ECN requires three new pieces of functionality:
   negotiation between the endpoints during connection setup to
   determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the
   TCP header so that the data receiver can inform the data sender when
   a CE packet has been received; and a Congestion Window Reduced (CWR)
   flag in the TCP header so that the data sender can inform the data
   receiver that the congestion window has been reduced. The support
   required from other transport protocols is likely to be different,
   particularly for unreliable or reliable multicast transport
   protocols, and will have to be determined as other transport
   protocols are brought to the IETF for standardization.

6.1.  TCP

   The following sections describe in detail the proposed use of ECN in
   TCP.  This proposal is described in essentially the same form in
   [Floyd94]. We assume that the source TCP uses the standard congestion
   control algorithms of Slow-start, Fast Retransmit and Fast Recovery
   [RFC 2001].

   This proposal specifies two new flags in the Reserved field of the
   TCP header.  The TCP mechanism for negotiating ECN-Capability uses
   the ECN-Echo (ECE) flag in the TCP header.  Bit 9 in the Reserved
   field of the TCP header is designated as the ECN-Echo flag.  The
   location of the 6-bit Reserved field in the TCP header is shown in
   Figure 4 of RFC 793 [RFC793] (and is reproduced below for
   completeness).  This specification of the ECN Field leaves the
   Reserved field as a 4-bit field using bits 4-7.

   To enable the TCP receiver to determine when to stop setting the ECN-
   Echo flag, we introduce a second new flag in the TCP header, the CWR
   flag.  The CWR flag is assigned to Bit 8 in the Reserved field of the
   TCP header.

         0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
       |               |                       | U | A | P | R | S | F |
       | Header Length |        Reserved       | R | C | S | S | Y | I |
       |               |                       | G | K | H | T | N | N |
       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

        Figure 2: 3: The old definition of bytes 13 and 14 of the TCP
   header.

         0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
       |               |               | C | E | U | A | P | R | S | F |
       | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
       |               |               | R | E | G | K | H | T | N | N |
       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

        Figure 3: 4: The new definition of bytes 13 and 14 of the TCP
   Header.

   Thus, ECN uses the ECT and CE flags in the IP header (as shown in
   Figure 1) for signaling between routers and connection endpoints, and
   uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
   3)
   4) for TCP-endpoint to TCP-endpoint signaling.  For a TCP connection,
   a typical sequence of events in an ECN-based reaction to congestion
   is as follows:
      * The An ECT bit codepoint is set in packets transmitted by the sender to indi-
      cate
      indicate that ECN is supported by the transport entities for these
      packets.
      * An ECN-capable router detects impending congestion and detects
      that the an ECT bit codepoint is set in the packet it is about to drop.
      Instead of dropping the packet, the router chooses to set the CE
      bit
      codepoint in the IP header and forwards the packet.
      * The receiver receives the packet with the CE bit codepoint set, and
      sets the ECN-Echo flag in its next TCP ACK sent to the sender.
      * The sender receives the TCP ACK with ECN-Echo set, and reacts to
      the congestion as if a packet had been dropped.
      * The sender sets the CWR flag in the TCP header of the next
      packet sent to the receiver to acknowledge its receipt of and
      reaction to the ECN-Echo flag.

   The negotiation for using ECN by the TCP transport entities and the
   use of the ECN-Echo and CWR flags is described in more detail in the
   sections below.

6.1.1  TCP Initialization

   In the TCP connection setup phase, the source and destination TCPs
   exchange information about their willingness to use ECN.  Subsequent
   to the completion of this negotiation, the TCP sender sets the an ECT
   bit
   codepoint in the IP header of data packets to indicate to the network
   that the transport is capable and willing to participate in ECN for
   this packet. This indicates to the routers that they may mark this
   packet with the CE bit, codepoint, if they would like to use that as a
   method of con-
   gestion congestion notification. If the TCP connection does not
   wish to use ECN notification for a particular packet, the sending TCP
   sets the ECT
   bit equal ECN codepoint to 0 (i.e., not set), not-ECT, and the TCP receiver ignores the
   CE
   bit codepoint in the received packet.

   For this discussion, we designate the initiating host as Host A and
   the responding host as Host B.  We call a SYN packet with the ECE and
   CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
   with at least one of the ECE and CWR flags not set a "non-ECN-setup
   SYN packet".  Similarly, we call a SYN-ACK packet with only the ECE
   flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and
   we call a SYN-ACK packet with any other configuration of the ECE and
   CWR flags a "non-ECN-setup SYN-ACK packet".

   Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
   packet, and Host B sends an ECN-setup SYN-ACK packet.  For a SYN
   packet, the setting of both ECE and CWR in the ECN-setup SYN packet
   is defined as an indication that the sending TCP is ECN-Capable,
   rather than as an indication of congestion or of response to conges-
   tion.
   congestion. More precisely, an ECN-setup SYN packet indicates that
   the TCP implementation transmitting the SYN packet will participate
   in ECN as both a sender and receiver.  Specifically, as a receiver,
   it will respond to incoming data packets that have the CE bit codepoint
   set in the IP header by setting ECE in outgoing TCP Acknowledgement
   (ACK) packets.  As a sender, it will respond to incoming packets that
   have ECE set by reducing the congestion window and setting CWR when
   appropriate.  An ECN-setup SYN packet does not commit the TCP sender
   to setting the ECT bit codepoint in any or all of the packets it may
   transmit.  However, the commitment to respond appropriately to
   incoming packets with the CE
   bit codepoint set remains even if the TCP
   sender in a later transmission, within this TCP connection, sends a
   SYN packet without ECE and CWR set.

   When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag
   but not the CWR flag.  An ECN-setup SYN-ACK packet is defined as an
   indication that the TCP transmitting the SYN-ACK packet is ECN-Capa-
   ble. ECN-
   Capable.  As with the SYN packet, an ECN-setup SYN-ACK packet does
   not commit the TCP host to setting the ECT bit codepoint in transmitted
   packets.

   The following rules apply to the sending of ECN-setup packets:

   * If a host has received an ECN-setup SYN packet, then it MAY send an
   ECN-setup SYN-ACK packet.  Otherwise, it MUST NOT send an ECN-setup
   SYN-ACK packet.
   * A host MUST NOT set ECT on data packets unless it has sent at least
   one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at
   least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no
   non-ECN-setup SYN or non-ECN-setup SYN-ACK packet.  If a host has
   received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK
   packet, then it SHOULD NOT set ECT on data packets.
   * If a host ever sets the ECT bit codepoint on a data packet, then that
   host MUST correctly set/clear the CWR TCP bit on all subsequent
   packets in the connection.
   * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK
   packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN-
   ACK packet, then if that host receives TCP data packets with ECT and
   CE bits codepoints set in the IP header, then that host MUST process these pack-
   ets
   packets as specified for an ECN-capable connection.
   * A host that is not willing to use ECN on a TCP connection SHOULD
   clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN-ACK SYN-
   ACK packets that it sends to indicate this unwillingness.  Receivers
   MUST correctly han-
   dle handle all forms of the non-ECN-setup SYN and SYN-ACK
   packets.
   * A host MUST NOT set ECT on SYN or SYN-ACK packets.

6.1.1.1.  Robust TCP Initialization with an Echoed Reserve Field

   There is the question of why we chose to have the TCP sending the SYN
   set two ECN-related flags in the Reserved field of the TCP header for
   the SYN packet, while the responding TCP sending the SYN-ACK sets
   only one ECN-related flag in the SYN-ACK packet.  This asymmetry is
   necessary for the robust negotiation of ECN-capability with some
   deployed TCP implementations.  There exists at least one faulty TCP
   implementation in which TCP receivers set the Reserved field of the
   TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
   the Reserved field of the TCP header in the received data packet.
   Because the TCP SYN packet sets the ECN-Echo and CWR flags to indi-
   cate
   indicate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo ECN-
   Echo flag, the sending TCP correctly interprets a receiver's
   reflection of its own flags in the Reserved field as an indication
   that the receiver is not ECN-capable.  The sending TCP is not mislead
   by a faulty TCP implementation sending a SYN-ACK packet that simply
   reflects the Reserved field of the incoming SYN packet.

6.1.2.  The TCP Sender

   For a TCP connection using ECN, new data packets are transmitted with
   the
   an ECT bit codepoint set in the IP header (set to header.  When only one ECT codepoint
   is needed by a sender for all packets sent on a "1"). TCP connection,
   ECT(0) SHOULD be used.  If the sender receives an ECN-Echo (ECE) ACK
   packet (that is, an ACK packet with the ECN-Echo flag set in the TCP
   header), then the sender knows that congestion was encountered in the
   network on the path from the sender to the receiver.  The indication
   of congestion should be treated just as a congestion loss in non-ECN-Capable non-ECN-
   Capable TCP. That is, the TCP source halves the congestion window
   "cwnd" and reduces the slow start threshold "ssthresh".  The sending
   TCP SHOULD NOT increase the con-
   gestion congestion window in response to the
   receipt of an ECN-Echo ACK packet.

   TCP should not react to congestion indications more than once every
   window of data (or more loosely, more than once every round-trip
   time). That is, the TCP sender's congestion window should be reduced
   only once in response to a series of dropped and/or CE packets from a
   single window of data.  In addition, the TCP source should not
   decrease the slow-start threshold, ssthresh, if it has been decreased
   within the last round trip time.  However, if any retransmitted pack-
   ets
   packets are dropped, then this is interpreted by the source TCP as a
   new instance of congestion.

   After the source TCP reduces its congestion window in response to a
   CE packet, incoming acknowledgements that continue to arrive can
   "clock out" outgoing packets as allowed by the reduced congestion
   window.  If the congestion window consists of only one MSS (maximum
   segment size), and the sending TCP receives an ECN-Echo ACK packet,
   then the sending TCP should in principle still reduce its congestion
   window in half. However, the value of the congestion window is
   bounded below by a value of one MSS.  If the sending TCP were to con-
   tinue
   continue to send, using a congestion window of 1 MSS, this results in
   the transmission of one packet per round-trip time.  It is necessary
   to still reduce the sending rate of the TCP sender even further, on
   receipt of an ECN-Echo packet when the congestion window is one.  We
   use the retransmit timer as a means of reducing the rate further in
   this circumstance.  Therefore, the sending TCP MUST reset the
   retransmit timer on receiving the ECN-Echo packet when the congestion
   window is one.  The sending TCP will then be able to send a new
   packet only when the retransmit timer expires.

   When an ECN-Capable TCP sender reduces its congestion window for any
   reason (because of a retransmit timeout, a Fast Retransmit, or in
   response to an ECN Notification), the TCP sender sets the CWR flag in
   the TCP header of the first new data packet sent after the window
   reduction.  If that data packet is dropped in the network, then the
   sending TCP will have to reduce the congestion window again and
   retransmit the dropped packet.

   We ensure that the "Congestion Window Reduced" information is reli-
   ably
   reliably delivered to the TCP receiver.  This comes about from the
   fact that if the new data packet carrying the CWR flag is dropped,
   then the TCP sender will have to again reduce its congestion window,
   and send another new data packet with the CWR flag set.  Thus, the
   CWR bit in the TCP header SHOULD NOT be set on retransmitted packets.
   When the TCP data sender is ready to set the CWR bit after reducing
   the congestion window, it SHOULD set the CWR bit only on the first
   new data packet that it transmits.

   [Floyd94] discusses TCP's response to ECN in more detail.  [Floyd98]
   discusses the validation test in the ns simulator, which illustrates
   a wide range of ECN scenarios. These scenarios include the following:
   an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
   Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
   ECN; and a congestion window of one packet followed by an ECN.

   TCP follows existing algorithms for sending data packets in response
   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
   timeouts [RFC2581].  TCP also follows the normal procedures for
   increasing the congestion window when it receives ACK packets without
   the ECN-Echo bit set [RFC2581].

6.1.3.  The TCP Receiver

   When TCP receives a CE data packet at the destination end-system, the
   TCP data receiver sets the ECN-Echo flag in the TCP header of the
   subsequent ACK packet.  If there is any ACK withholding implemented,
   as in current "delayed-ACK" TCP implementations where the TCP
   receiver can send an ACK for two arriving data packets, then the ECN-
   Echo flag in the ACK packet will be set to the OR of '1' if the CE bits of
   all codepoint is
   set in any of the data packets being acknowledged.  That is, if any
   of the received data packets are CE packets, then the returning ACK
   has the ECN-Echo flag set.

   To provide robustness against the possibility of a dropped ACK packet
   carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
   a series of ACK packets sent subsequently.  The TCP receiver uses the
   CWR flag received from the TCP sender to determine when to stop set-
   ting
   setting the ECN-Echo flag.

   After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
   that TCP receiver continues to set the ECN-Echo flag in all the ACK
   packets it sends (whether they acknowledge CE data packets or non-CE
   data packets) until it receives a CWR packet (a packet with the CWR
   flag set).  After the receipt of the CWR packet, acknowledgements for
   subsequent non-CE data packets do not have the ECN-Echo flag set. If
   another CE packet is received by the data receiver, the receiver
   would once again send ACK packets with the ECN-Echo flag set.  While
   the receipt of a CWR packet does not guarantee that the data sender
   received the ECN-Echo message, this does suggest that the data sender
   reduced its congestion window at some point *after* it sent the data
   packet for which the CE bit codepoint was set.

   We have already specified that a TCP sender is not required to reduce
   its congestion window more than once per window of data.  Some care
   is required if the TCP sender is to avoid unnecessary reductions of
   the congestion window when a window of data includes both dropped
   packets and (marked) CE packets.  This is illustrated in [Floyd98].

6.1.4.  Congestion on the ACK-path

   For the current generation of TCP congestion control algorithms, pure
   acknowledgement packets (e.g., packets that do not contain any accom-
   panying
   accompanying data) should be sent with the ECT bit off. not-ECT codepoint.
   Current TCP receivers have no mechanisms for reducing traffic on the
   ACK-path in response to congestion notification.  Mechanisms for
   responding to congestion on the ACK-path are areas for current and
   future research.  (One simple possibility would be for the sender to
   reduce its conges-
   tion congestion window when it receives a pure ACK packet with
   the CE bit codepoint set). For current TCP implementations, a single
   dropped ACK generally has only a very small effect on the TCP's
   sending rate.

6.1.5.  Retransmitted TCP packets

   This document specifies that for ECN-capable TCP implementations, the implementations MUST NOT set
   either ECT bit (ECN-Capable Transport) codepoint (ECT(0) or ECT(1)) in the IP header MUST NOT be set on for
   retransmitted data packets, and that the TCP data receiver SHOULD
   ignore the ECN field on arriving data packets that are outside of the
   receiver's current window.  This is for greater security against
   denial-of-service attacks, as well as for robustness of the ECN con-
   gestion
   congestion indication with packets that are dropped later in the net-
   work.
   network.

   First, we note that if the TCP sender were to set the an ECT bit codepoint on
   a retransmitted packet, then if an unnecessarily-retransmitted packet
   was later dropped in the network, the end nodes would never receive
   the indication of congestion from the router setting the CE bit.
   codepoint.  Thus, setting the an ECT bit codepoint on retransmitted data
   packets is not con-
   sistent consistent with the robust delivery of the congestion
   indication even for packets that are later dropped in the network.

   In addition, an attacker capable of spoofing the IP source address of
   the TCP sender could send data packets with arbitrary sequence num-
   bers,
   numbers, with both the ECT and CE bits codepoint set in the IP header.  On receiv-
   ing receiving
   this spoofed data packet, the TCP data receiver would determine that
   the data does not lie in the current receive window, and return a
   duplicate acknowledgement.  We define an out-of-window packet at the
   TCP data receiver as a data packet that lies outside the receiver's
   current window.  On receiving an out-of-window packet, the TCP data
   receiver has to decide whether or not to treat the CE bit codepoint in
   the packet header as a valid indication of congestion, and therefore
   whether to return ECN-Echo indications to the TCP data sender.  If
   the TCP data receiver ignored the CE bit codepoint in an out-of-window
   packet, then the TCP data sender would not receive this possibly-legitimate possibly-
   legitimate indication of congestion from the network, resulting in a
   violation of end-to-end congestion control.  On the other hand, if
   the TCP data receiver honors the CE indication in the out-of-window
   packet, and reports the indication of congestion to the TCP data
   sender, then the malicious node that created the spoofed, out-of-window out-of-
   window packet has successfully "attacked" the TCP connection by
   forcing the data sender to unnecessarily reduce (halve) its
   congestion window.  To prevent such a denial-of-service attack, we
   specify that a legitimate TCP data sender MUST NOT set the an ECT bit
   codepoint on retransmitted data packets, and that the TCP data
   receiver SHOULD ignore the CE bit codepoint on out-of-
   window out-of-window packets.

   One drawback of not setting ECT ECT(0) or ECT(1) on retransmitted packets
   is that it denies ECN protection for retransmitted packets.  However,
   for an ECN-capable TCP connection in a fully-ECN-capable environment
   with mild conges-
   tion, congestion, packets should rarely be dropped due to
   congestion in the first place, and so instances of retransmitted
   packets should rarely arise.  If packets are being retransmitted,
   then there are already packet losses (from corruption or from
   congestion) that ECN has been unable to prevent.

   We note that if the router sets the CE bit codepoint for an ECN-capable
   data packet within a TCP connection, then the TCP connection is
   guaranteed to receive that indication of congestion, or to receive
   some other indication of congestion within the same window of data,
   even if this packet is dropped or reordered in the network.  We
   consider two cases, when the packet is later retransmitted, and when
   the packet is not later retransmitted.

   In the first case, if the packet is either dropped or delayed, and at
   some point retransmitted by the data sender, then the retransmission
   is a result of a Fast Retransmit or a Retransmit Timeout for either
   that packet or for some prior packet in the same window of data.  In
   this case, because the data sender already has retransmitted this
   packet, we know that the data sender has already responded to an
   indication of congestion for some packet within the same window of
   data as the original packet.  Thus, even if the first transmission of
   the packet is dropped in the network, or is delayed, if it had the CE
   bit
   codepoint set, and is later ignored by the data receiver as an out-of-win-
   dow out-
   of-window packet, this is not a problem, because the sender has
   already responded to an indication of congestion for that window of
   data.

   In the second case, if the packet is never retransmitted by the data
   sender, then this data packet is the only copy of this data received
   by the data receiver, and therefore arrives at the data receiver as
   an in-window packet, regardless of how much the packet might be
   delayed or reordered.  In this case, if the CE bit codepoint is set on
   the packet within the network, this will be treated by the data
   receiver as a valid indication of congestion.

6.1.6.  TCP Window Probes.

   When the TCP data receiver advertises a zero window, the TCP data
   sender sends window probes to determine if the receiver's window has
   increased.  Window probe packets do not contain any user data except
   for the sequence number, which is a byte.  If a window probe packet
   is dropped in the network, this loss is not detected by the receiver.
   Therefore, the TCP data sender MUST NOT set either the an ECT codepoint
   or the CWR
   bits bit on window probe packets.

   However, because window probes use exact sequence numbers, they can-
   not
   cannot be easily spoofed in denial-of-service attacks.  Therefore, if
   a window probe arrives with ECT and the CE codepoint set, then the receiver
   SHOULD respond to the ECN indications.

7.  Non-compliance by the End Nodes

   This section discusses concerns about the vulnerability of ECN to
   non-compliant end-nodes (i.e., end nodes that set the ECT bit codepoint
   in transmitted packets but do not respond to received CE packets).
   We argue that the addition of ECN to the IP architecture will not sig-
   nificantly
   significantly increase the current vulnerability of the architecture
   to unresponsive flows.

   Even for non-ECN environments, there are serious concerns about the
   damage that can be done by non-compliant or unresponsive flows (that
   is, flows that do not respond to congestion control indications by
   reducing their arrival rate at the congested link).  For example, an
   end-node could "turn off congestion control" by not reducing its con-
   gestion
   congestion window in response to packet drops. This is a concern for
   the current Internet.  It has been argued that routers will have to
   deploy mechanisms to detect and differentially treat packets from
   non-compliant flows [RFC2309,FF99].  It has also been suggested that
   techniques such as end-to-end per-flow scheduling and isolation of
   one flow from another, differentiated services, or end-to-end reser-
   vations
   reservations could remove some of the more damaging effects of unrespon-
   sive
   unresponsive flows.

   It might seem that dropping packets in itself is an adequate deter-
   rent
   deterrent for non-compliance, and that the use of ECN removes this deter-
   rent.
   deterrent.  We would argue in response that (1) ECN-capable routers pre-
   serve
   preserve packet-dropping behavior in times of high congestion; and
   (2) even in times of high congestion, dropping packets in itself is
   not an adequate deterrent for non-compliance.

   First, ECN-Capable routers will only mark packets (as opposed to
   dropping them) when the packet marking rate is reasonably low. During
   periods where the average queue size exceeds an upper threshold, and
   therefore the potential packet marking rate would be high, our recom-
   mendation
   recommendation is that routers drop packets rather then set the CE bit
   codepoint in packet headers.

   During the periods of low or moderate packet marking rates when ECN
   would be deployed, there would be little deterrent effect on unre-
   sponsive
   unresponsive flows of dropping rather than marking those packets. For
   example, delay-insensitive flows using reliable delivery might have
   an incentive to increase rather than to decrease their sending rate
   in the presence of dropped packets.  Similarly, delay-sensitive flows
   using unreliable delivery might increase their use of FEC in response
   to an increased packet drop rate, increasing rather than decreasing
   their sending rate.  For the same reasons, we do not believe that
   packet dropping itself is an effective deterrent for non-compliance
   even in an environment of high packet drop rates, when all flows are
   sharing the same packet drop rate.

   Several methods have been proposed to identify and restrict non-com-
   pliant non-
   compliant or unresponsive flows. The addition of ECN to the network
   environment would not in any way increase the difficulty of designing
   and deploying such mechanisms. If anything, the addition of ECN to
   the architecture would make the job of identifying unresponsive flows
   slightly easier.  For example, in an ECN-Capable environment routers
   are not limited to information about packets that are dropped or have
   the CE bit codepoint set at that router itself; in such an environment,
   routers could also take note of arriving CE packets that indicate
   congestion encountered by that packet earlier in the path.

8.  Non-compliance in the Network

   This section considers the issues when a router is operating, possi-
   bly
   possibly maliciously, to modify either of the bits in the ECN field.  In
   this section we represent the ECN field in the IP header by the tuple
   (ECT bit, CE bit).

   By tampering with the bits in the ECN field, an adversary (or a bro-
   ken
   broken router) could do one or more of the following: falsely report
   congestion, disable ECN-Capability for an individual packet, erase
   the ECN congestion indication, or falsely indicate ECN-Capability.
   Section 18 systematically examines the various cases by which the ECN
   field could be modified.  The important criterion considered in
   determining the consequences of such modifications is whether it is
   likely to lead to poorer behavior in any dimension (throughput,
   delay, fairness or functionality) than if a router were to drop a
   packet.

   The first two possible changes, falsely reporting congestion or dis-
   abling
   disabling ECN-Capability for an individual packet, are no worse than
   if the router were to simply drop the packet.  From a congestion
   control point of view, setting the CE bit codepoint in the absence of
   congestion by a non-compliant router would be no worse than a router
   dropping a packet unnecessarily. By "erasing" the an ECT bit codepoint of a
   packet that is later dropped in the network, a router's actions could
   result in an unnecessary packet drop for that packet later in the
   network.

   However, as discussed in Section 18, a router that erases the ECN
   congestion indication or falsely indicates ECN-Capability could
   potentially do more damage to the flow that if it has simply dropped
   the packet.  A rogue or broken router that "erased" the CE bit codepoint
   in arriving CE packets would prevent that indication of congestion
   from reaching downstream receivers.  This could result in the failure
   of congestion control for that flow and a resulting increase in conges-
   tion
   congestion in the network, ultimately resulting in subsequent packets
   dropped for this flow as the average queue size increased at the con-
   gested
   congested gateway.

   Section 19 considers the potential repercussions of subverting end-
   to-end congestion control by either falsely indicating ECN-Capabil-
   ity, ECN-
   Capability, or by erasing the congestion indication in ECN (the CE-bit). CE-
   codepoint).  We observe in Section 19 that the consequence of
   subverting ECN-based congestion control may lead to potential
   unfairness, but this is likely to be no worse than the subversion of
   either ECN-based or packet-based congestion control by the end nodes.

8.1.  Complications Introduced by Split Paths

   If a router or other network element has access to all of the packets
   of a flow, then that router could do no more damage to a flow by
   altering the ECN field than it could by simply dropping all of the
   packets from that flow.  However, in some cases, a malicious or bro-
   ken
   broken router might have access to only a subset of the packets from
   a flow.  The question is as follows:  can this router, by altering
   the ECN field in this subset of the packets, do more damage to that
   flow than if it has simply dropped that set of the packets?

   This is also discussed in detail in Section 18, which conclude as
   follows:  It is true that the adversary that has access only to a
   subset of packets in an aggregate might, by subverting ECN-based con-
   gestion
   congestion control, be able to deny the benefits of ECN to the other
   packets in the aggregate.  While this is undesirable, this is not a
   sufficient concern to result in disabling ECN.

9.  Encapsulated Packets

9.1.  IP packets encapsulated in IP

   The encapsulation of IP packet headers in tunnels is used in many
   places, including IPsec and IP in IP [RFC2003].  This section consid-
   ers
   considers issues related to interactions between ECN and IP tunnels,
   and specifies two alternative solutions.  This discussion is
   complemented by RFC 2983's discussion of interactions between
   Differentiated Ser-
   vices Services and IP tunnels of various forms [RFC 2983],
   as Differentiated Services uses the remaining six bits of the IP
   header octet that is used by ECN (see Figure 1 2 in Section 5).

   Some IP tunnel modes are based on adding a new "outer" IP header that
   encapsulates the original, or "inner" IP header and its associated
   packet.  In many cases, the new "outer" IP header may be added and
   removed at intermediate points along a connection, enabling the net-
   work
   network to establish a tunnel without requiring endpoint
   participation.  We denote tunnels that specify that the outer header
   be discarded at tunnel egress as "simple tunnels".

   ECN uses the ECT and CE flags ECN field in the IP header for signaling between routers
   and connection endpoints.  ECN interacts with IP tunnels based on the
   treatment of these flags the ECN field in the IP header.  In simple IP tunnels
   the octet containing these flags the ECN field is copied or mapped from the inner
   IP header to the outer IP header at IP tunnel ingress, and the outer
   header's copy of this field is discarded at IP tunnel egress.  If the
   outer header were to be simply discarded without taking care to deal
   with the ECN related flags, field, and an ECN-capable router were to set the CE
   (Congestion Experienced) bit codepoint within a packet in a simple IP
   tunnel, this indication would be discarded at tunnel egress, los-
   ing losing
   the indication of congestion.

   Thus, the use of ECN over simple IP tunnels would result in routers
   attempting to use the outer IP header to signal congestion to end-
   points,
   endpoints, but those congestion warnings never arriving because the
   outer header is discarded at the tunnel egress point.  This problem
   was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec-
   ommended
   recommended that ECN not be used with the older simple IPsec tunnels
   in order to avoid this behavior and its consequences.  When ECN
   becomes widely deployed, then simple tunnels likely to carry ECN-capable ECN-
   capable traffic will have to be changed.

   From a security point of view, the use of ECN in the outer header of
   an IP tunnel might raise security concerns because an adversary could
   tamper with the ECN information that propagates beyond the tunnel
   endpoint.  Based on an analysis in Sections 18 and 19 of these con-
   cerns
   concerns and the resultant risks, our overall approach is to make sup-
   port
   support for ECN an option for IP tunnels, so that an IP tunnel can be
   specified or configured either to use ECN or not to use ECN in the
   outer header of the tunnel.  Thus, in environments or tunneling pro-
   tocols
   protocols where the risks of using ECN are judged to outweigh its bene-
   fits,
   benefits, the tunnel can simply not use ECN in the outer header.
   Then the only indication of congestion experienced at routers within
   the tunnel would be through packet loss.

   The result is that there are two viable options for the behavior of
   ECN-capable connections over an IP tunnel, especially IPsec tunnels:
      * A limited-functionality option in which ECN is preserved in the
      inner header, but disabled in the outer header.  The only mecha-
      nism
      mechanism available for signaling congestion occurring within the tun-
      nel
      tunnel in this case is dropped packets.
      * A full-functionality option that supports ECN in both the inner
      and outer headers, and propagates congestion warnings from nodes
      within the tunnel to endpoints.

   Support for these options requires varying amounts of changes to IP
   header processing at tunnel ingress and egress.  A small subset of
   these changes sufficient to support only the limited-functionality
   option would be sufficient to eliminate any incompatibility between
   ECN and IP tunnels.

   One goal of this document is to give guidance about the tradeoffs
   between the limited-functionality and full-functionality options.  A
   full discussion of the potential effects of an adversary's modifica-
   tions
   modifications of the CE and ECT bits ECN field is given in Sections 18 and 19.

9.1.1.  The Limited-functionality and Full-functionality Options

   The limited-functionality option for ECN encapsulation in IP tunnels
   is for the ECT bit non-ECT codepoint to be set in the outside (encapsulating)
   header to be off
   (i.e., set to 0), regardless of the value of the ECT bit ECN field in the inside
   (encapsulated) header.  With this option, the ECN field in the inner
   header is not altered upon de-capsulation.  The disadvantage of this
   approach is that the flow does not have ECN support for that part of
   the path that is using IP tunneling, even if the encapsulated packet
   (from the original TCP sender) is ECN-Capable.  That is, if the
   encapsulated packet arrives at a congested router that is ECN-
   capable, and the router can decide to drop or mark the packet as an
   indication of congestion to the end nodes, the router will not be
   permitted to set the CE bit codepoint in the packet header, but instead
   will have to drop the packet.

   The full-functionality option for ECN encapsulation is to copy the
   ECT bit
   ECN codepoint of the inside header to the outside header on encapsulation,
   encapsulation if the inside header is not-ECT or ECT, and to OR set the CE bit from
   ECN codepoint of the outer outside header with to ECT(0) if the CE bit ECN codepoint of
   the inside header is CE.  On decapsulation, if the CE codepoint is
   set on decapsulation. the outside header, then the CE codepoint is also set in the
   inner header.  Otherwise, the ECN codepoint on the inner header is
   left unchanged.  That is, for full ECN support the encapsulation and
   decapsulation processing involves the following:  At tunnel ingress,
   the full-functionality option copies sets the value of
   ECT (bit 6) ECN codepoint in the outer
   header.  If the ECN codepoint in the inner header is not-ECT or ECT,
   then it is copied to the ECN codepoint in the outer header.  CE (bit 7)  If the
   ECN codepoint in the inner header is
   set to 0 CE, then the ECN codepoint in
   the outer header. header is set to ECT(0).  Upon decapsulation at the tunnel
   egress, the full-functionality option sets CE to 1 in the inner
   header if the value of ECT (bit 6) sets the CE codepoint in the
   inner header is 1, and if the
   value of CE (bit 7) codepoint is set in the outer header is 1. header.
   Otherwise, no change is made to this field of the inner header.

   With the full-functionality option, a flow can take advantage of ECN
   in those parts of the path that might use IP tunneling.  The disad-
   vantage
   disadvantage of the full-functionality option from a security
   perspective is that the IP tunnel cannot protect the flow from
   certain modifica-
   tions modifications to the ECN bits in the IP header within the
   tunnel.  The poten-
   tial potential dangers from modifications to the ECN bits in
   the IP header are described in detail in Sections 18 and 19.

      (1) An IP tunnel MUST modify the handling of the DS field octet at
      IP tunnel endpoints by implementing either the limited-functional-
      ity limited-
      functionality or the full-functionality option.
      (2) Optionally, an IP tunnel MAY enable the endpoints of an IP
      tunnel to negotiate the choice between the limited-functionality
      and the full-functionality option for ECN in the tunnel.

   The minimum required to make ECN usable with IP tunnels is the lim-
   ited-functionality
   limited-functionality option, which prevents ECN from being enabled
   in the outer header of an IPsec tunnel.  Full support for ECN
   requires the use of the full-functionality option.  If there are no
   optional mechanisms for the tunnel endpoints to negotiate a choice
   between the limited-functionality or full-functionality option, there
   can be a pre-existing agreement between the tunnel endpoints about
   whether to support the limited-functionality or the full-functionality full-
   functionality ECN option.

   In addition, it is RECOMMENDED that packets with ECT and the CE both set
   to 1 codepoint in
   the outer header be dropped if they arrive at the tunnel egress point
   for a tunnel that uses the limited-functionality option, or for a
   tunnel that uses the full-functionality option but for which the not-
   ECT bit codepoint is set in the inner header is set to zero. header.  This is motivated by
   backwards compatibility and to ensure that no unauthorized modifica-
   tions
   modifications of the ECN field take place, and is discussed further
   in the next Section (9.1.2).

9.1.2.  Changes to the ECN Field within an IP Tunnel.

   The presence of a copy of the ECN field in the inner header of an IP
   tunnel mode packet provides an opportunity for detection of unautho-
   rized
   unauthorized modifications to the ECT bit ECN field in the outer header.
   Comparison of the ECT bits fields in the inner and outer headers falls
   into two cate-
   gories categories for implementations that conform to this
   document:
      * If the IP tunnel uses the full-functionality option, then the
      values of
      not-ECT codepoint should be set in the ECT bits outer header if and only if
      it is also set in the inner and outer headers should be
      identical. header.
      * If the tunnel uses the limited-functionality option, then the
      ECT bit
      not-ECT codepoint should be set in the outer header should be 0. header.

   Receipt of a packet not satisfying the appropriate condition could be
   a cause of concern.

   Consider the case of an IP tunnel where the tunnel ingress point has
   not been updated to this document's requirements, while the tunnel
   egress point has been updated to support ECN.  In this case, the IP
   tunnel is not explicitly configured to support the full-functionality
   ECN option. However, the tunnel ingress point is behaving identically
   to a tunnel ingress point that supports the full-functionality
   option.  If packets from an ECN-capable connection use this tunnel,
   the ECT codepoint will be set to 1 in the outer header at the tunnel
   ingress point.  Congestion within the tunnel may then result in ECN-capable ECN-
   capable routers setting CE in the outer header.  Because the tunnel
   has not been explicitly configured to support the full-functionality
   option, the tunnel egress point expects the ECT bit not-ECT codepoint to be
   set in the outer header to be 0. header.  When an ECN-capable tunnel egress point
   receives a packet with the ECT bit or CE codepoint in the outer header set to 1, header,
   in a tunnel that has not been configured to support the full-functionality full-
   functionality option, that packet should be processed, according to
   whether the CE bit codepoint was set, as follows.  It is RECOMMENDED that such packets, with the ECT bit in the outer
   header set to 1
   on a tunnel that has not been configured to support the full-functionality full-
   functionality option, packets should be dropped at the egress point
   if the CE codepoint is set to 1 in the outer header but 0 not in the inner
   header, and for-
   warded should be forwarded otherwise.

   An IP tunnel cannot provide protection against erasure of congestion
   indications based on resetting the value of changing the ECN codepoint from CE bit in packets for
   which ECT is set in the outer header. to ECT.  The
   erasure of congestion indications may impact the network and other
   flows in ways that would not be possible in the absence of ECN.  It
   is important to note that erasure of congestion indications can only
   be performed to congestion indications placed by nodes within the
   tunnel; the copy of the CE bit ECN field in the inner header preserves
   congestion notifications from nodes upstream of the tunnel ingress. ingress
   (unless the inner header is also erased).  If erasure of congestion notifica-
   tions
   notifications is judged to be a security risk that exceeds the
   congestion management benefits of ECN, then tunnels could be
   specified or con-
   figured configured to use the limited-functionality option.

9.2.  IPsec Tunnels

   IPsec supports secure communication over potentially insecure network
   components such as intermediate routers.  IPsec protocols support two
   operating modes, transport mode and tunnel mode, that span a wide
   range of security requirements and operating environments.  Transport
   mode security protocol header(s) are inserted between the IP (IPv4 or
   IPv6) header and higher layer protocol headers (e.g., TCP), and hence
   transport mode can only be used for end-to-end security on a connec-
   tion.
   connection.  IPsec tunnel mode is based on adding a new "outer" IP
   header that encapsulates the original, or "inner" IP header and its associ-
   ated
   associated packet.  Tunnel mode security headers are inserted between
   these two IP headers.  In contrast to transport mode, the new "outer"
   IP header and tunnel mode security headers can be added and removed
   at intermediate points along a connection, enabling security gateways
   to secure vulnerable portions of a connection without requiring
   endpoint participation in the security protocols.  An important
   aspect of tun-
   nel tunnel mode security is that in the original specification,
   the outer header is discarded at tunnel egress, ensuring that
   security threats based on modifying the IP header do not propagate
   beyond that tunnel endpoint.  Further discussion of IPsec can be
   found in [RFC2401].

   The IPsec protocol as originally defined in [ESP, AH] required that
   the inner header's ECN field not be changed by IPsec decapsulation
   processing at a tunnel egress node; this would have ruled out the
   possibility of full-functionality mode for ECN.  At the same time,
   this would ensure that an adversary's modifications to the ECN field
   cannot be used to launch theft- or denial-of-service attacks across
   an IPsec tunnel endpoint, as any such modifications will be discarded
   at the tunnel endpoint.

   In principle, permitting the use of ECN functionality in the outer
   header of an IPsec tunnel raises security concerns because an adver-
   sary
   adversary could tamper with the information that propagates beyond
   the tunnel endpoint.  Based on an analysis (included in Sections 18
   and 19) of these concerns and the associated risks, our overall
   approach has been to provide configuration support for IPsec changes
   to remove the conflict with ECN.

   In particular, in tunnel mode the IPsec tunnel MUST support either
   the limited-functionality or the full-functionality mode outlined in
   Section 9.1.1.

   This makes permission to use ECN functionality in the outer header of
   an IPsec tunnel a configurable part of the corresponding IPsec Secu-
   rity
   Security Association (SA), so that it can be disabled in situations
   where the risks are judged to outweigh the benefits.  The result is
   that an IPsec security administrator is presented with two
   alternatives for the behavior of ECN-capable connections within an
   IPsec tunnel, the limited-functionality alternative and full-functionality full-
   functionality alternative described earlier.  All IPsec
   implementations MUST implement either the limited-functionality or
   the full-functionality alternative in order to eliminate
   incompatibility between ECN and IPsec tunnels, but implementers MAY
   choose to implement either alternative.

   In addition, this document specifies how the endpoints of an IPsec
   tunnel could negotiate enabling ECN functionality in the outer head-
   ers
   headers of that tunnel based on security policy.  The ability to negoti-
   ate
   negotiate ECN usage between tunnel endpoints would enable a security admin-
   istrator
   administrator to disable ECN in situations where she believes the
   risks (e.g., of lost congestion notifications) outweigh the benefits
   of ECN.

   The IPsec protocol, as defined in [ESP, AH], does not include the IP
   header's ECN field in any of its cryptographic calculations (in the
   case of tunnel mode, the outer IP header's ECN field is not
   included).  Hence modification of the ECN field by a network node has
   no effect on IPsec's end-to-end security, because it cannot cause any
   IPsec integrity check to fail.  As a consequence, IPsec does not pro-
   vide
   provide any defense against an adversary's modification of the ECN
   field (i.e., a man-in-the-middle attack), as the adversary's
   modification will also have no effect on IPsec's end-to-end security.
   In some environments, the ability to modify the ECN field without
   affecting IPsec integrity checks may constitute a covert channel; if
   it is nec-
   essary necessary to eliminate such a channel or reduce its bandwidth,
   then the IPsec tunnel should be run in limited-functionality mode.

9.2.1.  Negotiation between Tunnel Endpoints

   This section describes the detailed changes to enable usage of ECN
   over IPsec tunnels, including the negotiation of ECN support between
   tunnel endpoints.  This is supported by three changes to IPsec:
      * An optional Security Association Database (SAD) field indicating
      whether tunnel encapsulation and decapsulation processing allows
      or forbids ECN usage in the outer IP header.
      * An optional Security Association Attribute that enables negotia-
      tion
      negotiation of this SAD field between the two endpoints of an SA
      that supports tunnel mode.
      * Changes to tunnel mode encapsulation and decapsulation process-
      ing
      processing to allow or forbid ECN usage in the outer IP header
      based on the value of the SAD field.  When ECN usage is allowed in
      the outer IP header, the ECT codepoint is set in the outer header
      for ECN-capable connections and congestion notifications
      (indicated by the CE bit) codepoint) from such connections are
      propagated to the inner header at tunnel egress.

   If negotiation of ECN usage is implemented, then the SAD field SHOULD
   also be implemented.  On the other hand, negotiation of ECN usage is
   OPTIONAL in all cases, even for implementations that support the SAD
   field.  The encapsulation and decapsulation processing changes are
   REQUIRED, but MAY be implemented without the other two changes by
   assuming that ECN usage is always forbidden.  The full-functionality
   alternative for ECN usage over IPsec tunnels consists of the SAD
   field and the full version of encapsulation and decapsulation pro-
   cessing
   processing changes, with or without the OPTIONAL negotiation support.
   The limited-functionality alternative consists of a subset of the
   encapsulation and decapsulation changes that always forbids ECN
   usage.

   These changes are covered further in the following three subsections.

9.2.1.1.  ECN Tunnel Security Association Database Field

   Full ECN functionality adds a new field to the SAD (see [RFC2401]):

      ECN Tunnel: allowed or forbidden.

      Indicates whether ECN-capable connections using this SA in tunnel
      mode are permitted to receive ECN congestion notifications for
      congestion occurring within the tunnel.  The allowed value enables
      ECN congestion notifications.  The forbidden value disables such
      notifications, causing all congestion to be indicated via dropped
      packets.

      [OPTIONAL.  The value of this field SHOULD be assumed to be
      "forbidden" in implementations that do not support it.]

   If this attribute is implemented, then the SA specification in a
   Security Policy Database (SPD) entry MUST support a corresponding
   attribute, and this SPD attribute MUST be covered by the SPD adminis-
   trative
   administrative interface (currently described in Section 4.4.1 of
   [RFC2401]).

9.2.1.2.  ECN Tunnel Security Association Attribute

   A new IPsec Security Association Attribute is defined to enable the
   support for ECN congestion notifications based on the outer IP header
   to be negotiated for IPsec tunnels (see [RFC2407]).  This attribute
   is OPTIONAL, although implementations that support it SHOULD also
   support the SAD field defined in Section 9.2.1.1.

   Attribute Type

           class               value           type
     -------------------------------------------------
     ECN Tunnel                 10             Basic

   The IPsec SA Attribute value 10 has been allocated by IANA to indi-
   cate
   indicate that the ECN Tunnel SA Attribute is being negotiated; the
   type of this attribute is Basic (see Section 4.5 of [RFC2407]).  The
   Class Values are used to conduct the negotiation.  See [RFC2407,
   RFC2408, RFC2409] for further information including encoding formats
   and requirements for negotiating this SA attribute.

   Class Values

     ECN Tunnel

       Specifies whether ECN functionality is allowed to
       be used with Tunnel Encapsulation Mode.
       This affects tunnel encapsulation and decapsulation processing -
       see Section 9.2.1.3.

       RESERVED          0
       Allowed           1
       Forbidden         2
       Values 3-61439 are reserved to IANA.  Values 61440-65535 are for
       private use.

       If unspecified, the default shall be assumed to be Forbidden.

   ECN Tunnel is a new SA attribute, and hence initiators that use it
   can expect to encounter responders that do not understand it, and
   therefore reject proposals containing it.  For backwards compatibil-
   ity
   compatibility with such implementations initiators SHOULD always also
   include a proposal without the ECN Tunnel attribute to enable such a
   responder to select a transform or proposal that does not contain the
   ECN Tun-
   nel Tunnel attribute.  RFC 2407 currently requires responders to
   reject all proposals if any proposal contains an unknown attribute;
   this requirement is expected to be changed to require a responder not
   to select proposals or transforms containing unknown attributes.

9.2.1.3.  Changes to IPsec Tunnel Header Processing

   For full ECN support, the encapsulation and decapsulation processing
   for the IPv4 TOS field and the IPv6 Traffic Class field are changed
   from that specified in [RFC2401] to the following:

                           <-- How Outer Hdr Relates to Inner Hdr -->
                           Outer Hdr at                 Inner Hdr at
      IPv4                 Encapsulator                 Decapsulator
        Header fields:     --------------------         ------------
          DS Field         copied from inner hdr (5)    no change
          ECN Field        constructed (7)              constructed (8)

      IPv6
        Header fields:
          DS Field         copied from inner hdr (6)    no change
          ECN Field        constructed (7)              constructed (8)

      (5)(6) If the packet will immediately enter a domain for which the
      DSCP value in the outer header is not appropriate, that value MUST
      be mapped to an appropriate value for the domain [RFC 2474].  Also
      see [RFC 2475] for further information.

      (7) If the value of the ECN Tunnel field in the SAD entry for this
      SA is "allowed" and the value of ECT (bit 0) is 1 ECN field in the inner
      header, header is set ECT to 1 in the outer header, else set ECT
      any value other than CE, copy this ECN field to 0 in the outer header.  Set CE (bit 1)
      If the ECN field in the inner header is set to 0 CE, then set the
      ECN field in the outer header. header to ECT(0).

      (8) If the value of the ECN tunnel field in the SAD entry for this
      SA is "allowed" and the value of ECT (bit 0) and the ECN field in the inner header is 1, then set to
      ECT(0) or ECT(1) and the CE bit (bit 1) ECN field in the inner outer header is set to
      CE, then copy the logi-
      cal OR of the CE bit in ECN field from the inner outer header with the CE bit in to the
      outer header, else inner
      header.  Otherwise, make no change to the ECN field. field in the inner
      header.

      (5) and (6) are identical to match usage in [RFC2401], although
      they are different in [RFC2401].

   The above description applies to implementations that support the ECN
   Tunnel field in the SAD; such implementations MUST implement this
   processing instead of the processing of the IPv4 TOS octet and IPv6
   Traffic Class octet defined in [RFC2401].  This constitutes the full-
   functionality alternative for ECN usage with IPsec tunnels.

   An implementation that does not support the ECN Tunnel field in the
   SAD MUST implement this processing by assuming that the value of the
   ECN Tunnel field of the SAD is "forbidden" for every SA.  In this
   case, the processing of the ECN field reduces to:

      (7) Set the ECN field (ECT and CE bits) to zero not-ECT in the outer header.
      (8) Make no change to the ECN field in the inner header.

   This constitutes the limited functionality alternative for ECN usage
   with IPsec tunnels.

   For backwards compatibility, packets with ECT and the CE both codepoint set to 1 in the
   outer header SHOULD be dropped if they arrive on an SA that is using
   the limited-functionality option, or that is using the full-
   functionality option (i.e., and has set the ECT flag in the outer
   header to 1) for a packet with the ECT flag not-ECN codepoint set to 0 in the inner
   header.

9.2.2.  Changes to the ECN Field within an IPsec Tunnel.

   If the ECN Field is changed inappropriately within an IPsec tunnel,
   and this change is detected at the tunnel egress, then the receipt of
   a packet not satisfying the appropriate condition for its SA is an
   auditable event.  An implementation MAY create audit records with
   per-SA counts of incorrect packets over some time period rather than
   creating an audit record for each erroneous packet.  Any such audit
   record SHOULD contain the headers from at least one erroneous packet,
   but need not contain the headers from every packet represented by the
   entry.

9.2.3.  Comments for IPsec Support

   Substantial comments were received on two areas of this document dur-
   ing
   during review by the IPsec working group.  This section describes
   these comments and explains why the proposed changes were not
   incorporated.

   The first comment indicated that per-node configuration is easier to
   implement than per-SA configuration.  After serious thought and
   despite some initial encouragement of per-node configuration, it no
   longer seems to be a good idea. The concern is that as ECN-awareness
   is progressively deployed in IPsec, many ECN-aware IPsec implementa-
   tions
   implementations will find themselves communicating with a mixture of
   ECN-aware and ECN-unaware IPsec tunnel endpoints.  In such an
   environment with per-node configuration, the only reasonable thing to
   do is forbid ECN usage for all IPsec tunnels, which is not the
   desired outcome.

   In the second area, several reviewers noted that SA negotiation is
   complex, and adding to it is non-trivial.  One reviewer suggested
   using ICMP after tunnel setup as a possible alternative.  The addi-
   tion
   addition to SA negotiation in this document is OPTIONAL and will
   remain so; implementers are free to ignore it.  The authors believe
   that the assurance it provides can be useful in a number of
   situations.  In practice, if this is not implemented, it can be
   deleted at a subse-
   quent subsequent stage in the standards process.  Extending
   ICMP to negotiate ECN after tunnel setup is more complex than
   extending SA attribute negotiation.  Some tunnels do not permit
   traffic to be addressed to the tunnel egress endpoint, hence the ICMP
   packet would have to be addressed to somewhere else, scanned for by
   the egress endpoint, and discarded there or at its actual
   destination.  In addition, ICMP delivery is unreliable, and hence
   there is a possibility of an ICMP packet being dropped, entailing the
   invention of yet another ack/retransmit mechanism.  It seems better
   simply to specify an OPTIONAL extension to the existing SA
   negotiation mechanism.

9.3.  IP packets encapsulated in non-IP packet headers.

   A different set of issues are raised, relative to ECN, when IP pack-
   ets
   packets are encapsulated in tunnels with non-IP packet headers.  This
   occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP].
   For these protocols, there is no conflict with ECN; it is just that
   ECN cannot be used within the tunnel unless an ECN codepoint can be
   specified for the header of the encapsulating protocol.  Earlier work
   considered a preliminary proposal for incorporating ECN into MPLS,
   and proposals for incorporating ECN into GRE, L2TP, or PPTP will be
   considered as the need arises.

10.  Issues Raised by Monitoring and Policing Devices

   One possibility is that monitoring and policing devices (or more
   informally, "penalty boxes") will be installed in the network to mon-
   itor
   monitor whether best-effort flows are appropriately responding to con-
   gestion,
   congestion, and to preferentially drop packets from flows determined
   not to be using adequate end-to-end congestion control procedures.

   We recommend that any "penalty box" that detects a flow or an aggre-
   gate
   aggregate of flows that is not responding to end-to-end congestion
   control first change from marking to dropping packets from that flow,
   before taking any additional action to restrict the bandwidth
   available to that flow.  Thus, initially, the router may drop packets
   in which the router would otherwise would have set the CE bit. codepoint.
   This could include dropping those arriving packets for that flow that
   are ECN-Capable and that already have the CE bit codepoint set.  In this
   way, any congestion indications seen by that router for that flow
   will be guaranteed to also be seen by the end nodes, even in the
   presence of malicious or broken routers elsewhere in the path.  If we
   assume that the first action taken at any "penalty box" for an ECN-capable ECN-
   capable flow will be to drop packets instead of marking them, then
   there is no way that an adversary that subverts ECN-based end-to-end
   congestion control can cause a flow to be characterized as being non-cooperative non-
   cooperative and placed into a more severe action within the "penalty
   box".

   The monitoring and policing devices that are actually deployed could
   fall short of the `ideal' monitoring device described above, in that
   the monitoring is applied not to a single flow, but to an aggregate
   of flows (e.g., those sharing a single IPsec tunnel).  In this case,
   the switch from marking to dropping would apply to all of the flows
   in that aggregate, denying the benefits of ECN to the other flows in
   the aggregate also.  At the highest level of aggregation, another
   form of the disabling of ECN happens even in the absence of monitor-
   ing
   monitoring and policing devices, when ECN-Capable RED queues switch
   from marking to dropping packets as an indication of congestion when
   the average queue size has exceeded some threshold.

   If there were serious operational problems with routers inappropri-
   ately erasing the CE bit in packet headers, this could be addressed
   to some extent by including a one-bit ECN nonce in packet headers.
   Routers would erase the nonce when they set the CE bit [SCWA99].
   Routers that erased the CE bit would face additional difficulty in
   reconstructing the original nonce, and thus repeated erasure

11.  Evaluations of the
   CE bit would be more likely to be detected by the end-nodes.  (This
   could in fact be done without adding any extra bits for ECN in the IP
   header, by using the

11.1.  Related Work Evaluating ECN codepoints (ECT=1, CE=0) and (ECT=0, CE=1)
   as the two values for the nonce, and by defining the codepoint
   (ECT=0, CE=1) to mean exactly the same as

   This section discusses some of the codepoint (ECT=1,
   CE=0).)  However, at this point related work evaluating the potential danger of misbehaving
   routers does not seem of sufficient concern to warrant this addi-
   tional complication use of adding an
   ECN.  The ECN nonce Web Page [ECN] has pointers to protect against the
   erasure of the CE bit.  Additional research is also needed other papers, as well as
   to better
   understand the value implementations of such a nonce ECN.

   [Floyd94] considers the advantages and appropriate means of gener-
   ating sequences drawbacks of nonce values that an adversary will find suffi-
   ciently difficult to reconstruct.

   An adding ECN nonce would also address to the problem
   TCP/IP architecture.  As shown in the simulation-based comparisons,
   one advantage of misbehaving transport
   receivers lying ECN is to the transport sender about whether avoid unnecessary packet drops for short
   or not delay-sensitive TCP connections.  A second advantage of ECN is in
   avoiding some unnecessary retransmit timeouts in TCP.  This paper
   discusses in detail the CE
   bit was set integration of ECN into TCP's congestion
   control mechanisms.  The possible disadvantages of ECN discussed in a packet.  However, another possibility is for
   the
   data sender to test for paper are that a misbehaving receiver directly, by occasion-
   ally sending non-compliant TCP connection could falsely
   advertise itself as ECN-capable, and that a data TCP ACK packet with ECT and CE set, to see if the
   receiver reports receiving carrying
   an ECN-Echo message could itself be dropped in the CE bit.  Of course, if network.  The
   first of these packets
   encountered congestion two issues is discussed in the network, appendix of this
   document, and the router would make no
   change in second is addressed by the packets, because addition of the CE bit would already be set.
   Thus, for packets sent with CWR flag
   in the ECT TCP header.

   Experimental evaluations of ECN include [RFC2884,K98].  The
   conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately
   better throughput than non-ECN TCP; that ECN TCP flows are fair
   towards non-ECN TCP flows; and that ECN TCP is robust with two-way
   traffic (with congestion in both directions) and CE bits set, the TCP end-
   nodes could not determine if some router intended to set with multiple
   congested gateways.  Experiments with many short web transfers show
   that, while most of the CE bit
   in these packets.  For this reason, sending packets short connections have similar transfer times
   with or without ECN, a small percentage of the ECT and
   CE bits would short connections have to be done
   very sparingly.  In addition, long transfer times for the TCP
   sender would have non-ECN experiments as compared to remember which packets were sent with
   the ECN experiments.

11.2.  A Discussion of the ECN nonce.

   The use of two ECT codepoints, ECT(0) and CE bits set, so that it doesn't react to them as if there was
   congestion ECT(1), can provide a one-
   bit ECN nonce in the network.  We believe that further research packet headers [SCWA99].  The primary motivation for
   this is
   needed on possible transport-based the desire to allow mechanisms for verifying that the
   transport receiver does data sender to verify
   that network elements are not lie erasing the CE codepoint, and that data
   receivers are properly reporting to the transport sender about the receipt of congestion indications.

11.  Evaluations of ECN packets
   with the CE codepoint set, as required by the transport protocol.
   This section discusses some of the related work evaluating the use issues of
   ECN.  The backwards compatibility with IP ECN Web Page [ECN] has pointers to other papers, as well as
   to
   implementations of ECN.

   [Floyd94] considers in routers conformant with RFC 2481, in which only
   one ECT codepoint was defined.  We do not believe that the advantages and drawbacks
   incremental deployment of adding ECN to implementations that understand the
   TCP/IP architecture.  As shown in
   ECT(1) codepoint will cause significant operational problems.  This
   is particularly likely to be the simulation-based comparisons,
   one advantage case when the deployment of ECN is the
   ECT(1) codepoint begins with routers, before the ECT(1) codepoint
   starts to avoid unnecessary packet drops for short
   or delay-sensitive TCP connections.  A second advantage be used by end-nodes.

11.2.1.  The Incremental Deployment of ECN is in
   avoiding some unnecessary retransmit timeouts in TCP.  This paper
   discusses ECT(1) in detail the integration of Routers.

   ECN into TCP's congestion con-
   trol mechanisms.  The possible disadvantages has been an Experimental standard since January 1999, and there
   are already implementations of ECN discussed in the
   paper are routers that a non-compliant do not understand
   the ECT(1) codepoint.  When the use of the ECT(1) codepoint is
   standardized for TCP connection or for other transport protocols, this could falsely advertise
   itself as ECN-capable, and
   mean that a TCP ACK packet carrying an ECN-Echo
   message could itself be dropped in the network.  The first of these
   two issues data sender is discussed in using the appendix of ECT(1) codepoint, but that this document, and the
   second
   codepoint is addressed not understood by a congested router on the addition of the CWR flag in path.

   If allowed by the TCP
   header.

   Experimental evaluations of ECN include [RFC2884,K98].  The conclu-
   sions transport protocol, a data sender would be free not
   to make use of [K98] and [RFC2884] are that ECN TCP gets moderately better
   throughput than non-ECN TCP; that ECN TCP flows are fair towards non-
   ECN TCP flows; ECT(1) at all, and that ECN TCP is robust to send all ECN-capable packets
   with two-way traffic (with
   congestion in both directions) the codepoint ECT(0).  However, if an ECN-capable sender is
   using ECT(1), and with multiple the congested gateways.
   Experiments with many short web transfers show that, while most of router on the short connections have similar transfer times with or without
   ECN, a small percentage path did not understand
   the ECT(1) codepoint, then the router would end up marking some of
   the short connections have very long
   transfer times for ECT(0) packets, and dropping some of the non-ECN experiments ECT(1) packets, as compared
   indications of congestion.  Since TCP is required to react to both
   marked and dropped packets, this behavior of dropping packets that
   could have been marked poses no significant threat to the network,
   and is consistent with the overall approach to ECN
   experiments. that allows
   routers to determine when and whether to mark packets as they see fit
   (see Section 5).

12.  Summary of changes required in IP and TCP

   This document specified two bits in the IP header, the ECN-Capable
   Transport (ECT) bit and the Congestion Experienced (CE) bit, header to be used for ECN.
   The ECT bit set to "0" not-ECT codepoint indicates that the transport protocol will
   ignore the CE bit. codepoint.  This is the default value for the
   ECT bit. ECN
   codepoint.  The ECT bit set to "1" indicates codepoints indicate that the transport proto-
   col protocol
   is willing and able to participate in ECN.

   The default value for the CE bit is "0".  The router sets the CE bit
   to "1" codepoint to indicate congestion to the end
   nodes.  The CE bit codepoint in a packet header MUST NOT be reset by a router from "1" to "0".

   When viewed in terms of code points, this document has defined three
   code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but
   not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1).  The code
   point of (ECT=0, CE=1) is not defined in this document.  One possi-
   bility would be for this code point to be used, some time in the
   future, for some other function for non-ECN-capable packets.  A sec-
   ond possibility would be for this code point to be used as an ECN
   nonce, as described earlier in the document.  A third possibility
   would be for the code point (ECT=0, CE=1) to be used to indicate that
   the packet is ECN-capable for an alternate semantics for the Conges-
   tion Experienced indication.  However, at this time the code point
   (ECT=0, CE=1) remains undefined.
   router.

   TCP requires three changes for ECN, a setup phase and two new flags
   in the TCP header. The ECN-Echo flag is used by the data receiver to
   inform the data sender of a received CE packet.  The Congestion Win-
   dow
   Window Reduced (CWR) flag is used by the data sender to inform the
   data receiver that the congestion window has been reduced.

   When ECN (Explicit Congestion Notification [RFC2481]) is used, it is
   required that congestion indications generated within an IP tunnel
   not be lost at the tunnel egress.  We specified a minor modification
   to the IP protocol's handling of the ECN field during encapsulation
   and de-capsulation to allow flows that will undergo IP tunneling to
   use ECN.

   Two options for ECN in tunnels were specified:
   1) A limited-functionality option that does not use ECN inside the IP
   tunnel, by turning setting the ECT bit ECN field in the outer header off, to not-ECT, and
   not altering the inner header at the time of decapsulation.
   2) The full-functionality option, which copies sets the ECT bit ECN field in the
   outer header to either not-ECT or to one of the ECT codepoints,
   depending on the ECN field in the inner header to the encapsulating header.  At decapsulation, if
   the
   ECT bit CE codepoint is set in the inner outer header, and the CE bit on the outer inner header is
   ORed with the CE bit
   set to one of the inner header to update ECT codepoints, then the CE bit of codepoint is copied to
   the
   packet. inner header.

   All IP tunnels MUST implement one of the two alternative approaches
   described above.  For IPsec tunnels, this document also defines an
   optional IPsec Security Association (SA) attribute that enables
   negotiation of ECN usage within IPsec tunnels and an optional field
   in the Security Association Database to indicate whether ECN is per-
   mitted
   permitted in tunnel mode on a SA.  The required changes to IPsec
   tunnels for ECN usage modify RFC 2401 [RFC2401], which defines the
   IPsec architecture and specifies some aspects of its implementation.
   The new IPsec SA attribute is in addition to those already defined in
   Section 4.5 of [RFC2407].

   This document is intended to obsolete RFC 2481, "A Proposal to add
   Explicit Congestion Notification (ECN) to IP", which defined ECN as
   an Experimental Protocol for the Internet Community.  The rest of
   this section describes the relationship between this document and its
   predecessor.

   RFC 2481 included a brief discussion of the use of ECN with encapsu-
   lated
   encapsulated packets, and noted that for the IPsec specifications at
   the time (January 1999), flows could not safely use ECN if they were
   to traverse IPsec tunnels.  RFC 2481 also described the changes that
   could be made to IPsec tunnel specifications to made them compatible
   with ECN.

   This document also incorporates work that was done after RFC 2481,
   First was to describe the changes to IPsec tunnels in detail, and
   extensively discuss the security implications of ECN (now included as
   Sections 18 and 19 of this document).  Second was to extend the dis-
   cussion
   discussion of IPsec tunnels to include all IP tunnels.  Because older
   IP tunnels are not compatible with a flow's use of ECN, the
   deployment of ECN in the Internet will create strong pressure for
   older IP tun-
   nels tunnels to be updated to an ECN-compatible version, using
   either the limited-functionality or the full-functionality option.

   This document does not address the issue of including ECN in non-IP
   tunnels such as MPLS, GRE, L2TP, or PPTP.  An earlier preliminary
   document about adding ECN support to MPLS was not advanced.

   A third new piece of work after RFC2481 was to describe the ECN pro-
   cedure
   procedure with retransmitted data packets, that the an ECT bit codepoint
   should not be set on retransmitted data packets.  The motivation for
   this addi-
   tional additional specification is to eliminate a possible avenue for denial-of-
   service
   denial-of-service attacks on an existing TCP connection.  Some prior deploy-
   ments
   deployments of ECN-capable TCP might not conform to the (new)
   requirement not to set the an ECT bit codepoint on retransmitted packets; we
   do not believe this will cause significant problems in practice.

   This document also expands slightly on the specification of the use
   of SYN packets for the negotiation of ECN.  While some prior deploy-
   ments
   deployments of ECN-capable TCP might not conform to the requirements speci-
   fied
   specified in this document, we do not believe that this will lead to
   any performance or compatibility problems for TCP connections with a com-
   bination
   combination of TCP implementations at the endpoints.

   This document also includes the specification of the ECT(1)
   codepoint, which may be used by TCP as part of the implementation of
   an ECN nonce.

13.  Conclusions

   Given the current effort to implement AQM, we believe this is the
   right time to deploy congestion avoidance mechanisms that do not
   depend on packet drops alone.  With the increased deployment of
   applications and transports sensitive to the delay and loss of a sin-
   gle
   single packet (e.g., realtime traffic, short web transfers),
   depending on packet loss as a normal congestion notification
   mechanism appears to be insufficient (or at the very least, non-optimal). non-
   optimal).

   We examined the consequence of modifications of the ECN field within
   the network, analyzing all the opportunities for an adversary to
   change the ECN field.  In many cases, the change to the ECN field is
   no worse than dropping a packet. However, we noted that some changes
   have the more serious consequence of subverting end-to-end congestion
   control.  However, we point out that even then the potential damage
   is limited, and is similar to the threat posed by end-systems inten-
   tionally
   intentionally failing to cooperate with end-to-end congestion
   control.

14.  Acknowledgements

   Many people have made contributions to this work and this document,
   including many that we have not managed to directly acknowledge in
   this document.  In addition, we would like to thank Kenjiro Cho for
   the proposal for the TCP mechanism for negotiating ECN-Capability,
   Kevin Fall for the proposal of the CWR bit, Steve Blake for material
   on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus-
   sions
   discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian
   Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern
   Paxson for dis-
   cussions discussions of security issues.  We also thank the
   Internet End-to-End Research Group for ongoing discussions of these
   issues.

   Email discussions with a number of people, including Alexey
   Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed
   the issues raised by non-conformant equipment in the Internet that
   does not respond to TCP SYN packets with the ECE and CWR flags set.
   We thank Mark Handley, Jitentra Padhye, and others for discussions on
   the TCP initialization procedures.

   The discussion of ECN and IP tunnel considerations draws heavily on
   related discussions and documents from the Differentiated Services
   Working Group.  We thank Tabassum Bint Haque from Dhaka, Bangladesh,
   for feedback on IP tunnels.  We thank Derrell Piper and Kero Tivinen
   for proposing modifications to RFC 2407 that improve the usability of
   negotiating the ECN Tunnel SA attribute.

   We thank David Wetherall, David Ely, and Neil Spring for the proposal
   for the ECN nonce.  We also thank Stefan Savage for discussions on
   this issue.  We thank Bob Briscoe and Jon Crowcroft for raising the
   issue of fragmentation in IP, on alternate semantics for the fourth
   ECN codepoint, and several other topics. We thank Richard Wendland
   for feedback on several issues in the draft.

15.  References

   [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402,
   November 1998.

   [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

   [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html".
   Reference for informational purposes only.

   [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload",
   RFC 2406, November 1998.

   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
   N.4, August 1993, p.  397-413.

   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.

   [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
   URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
   ecn.  Reference for informational purposes only.

   [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con-
   gestion
   Congestion Control in the Internet", IEEE/ACM Transactions on Network-
   ing,
   Networking, August 1999.

   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
   SIGCOMM '97, September 1997.

   [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing
   Encapsulation (GRE), RFC 1701, October 1994.

   [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
   ACM SIGCOMM '88, pp. 314-329.

   [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance Algo-
   rithm",
   Algorithm", Message to end2end-interest mailing list, April 1990. URL
   "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

   [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
   benefits for TCP", Master's thesis, UCLA, 1998, URL
   "http://www.cs.ucla.edu/~hari/software/ecn/ ecn_report.ps.gz".

   [L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B.
   Palter Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999.

   [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver- driven "Receiver-driven
   Layered Multicast", SIGCOMM '96, August 1996, pp.  117-130.

   [MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus,
   Requirements for Traffic Engineering Over MPLS, RFC 2702, September
   1999.

   [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W.
   and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637,
   July 1999.

   [RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
   1981.

   [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
   September 1981.

   [RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the
   Internet Checksum", RFC 1141, January 1990.

   [RFC1349] Almquist, P., "Type of Service in the Internet Protocol
   Suite", RFC 1349, July 1992.

   [RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC
   1455, May 1993.

   [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic
   Routing Encapsulation (GRE), RFC 1701, October 1994.

   [RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic
   Routing Encapsulation over IPv4 networks, RFC 1702, October 1994.

   [RFC2003]  Perkins, C., IP Encapsulation within IP, RFC 2003, October
   1996.

   [RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Require-
   ment
   Requirement Levels, RFC 2119, March 1997.

   [RFC2309] Braden, B., et al., "Recommendations on Queue Management
   and Congestion Avoidance in the Internet", RFC 2309, April 1998.

   [RFC2401] S. Kent and R. Atkinson, Security Architecture for the
   Internet Protocol, RFC 2401, November 1998.

   [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation
   for ISAKMP, RFC 2407, November 1998.

   [RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner,
   Internet Security Association and Key Management Protocol (ISAKMP),
   RFC 2409, November 1998.

   [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE),
   RFC 2409, November 1998.

   [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition
   of the Differentiated Services Field (DS Field) in the IPv4 and IPv6
   Headers", RFC 2474, December 1998.

   [RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W.
   Weiss, An Architecture for Differentiated Services, RFC 2475, Decem-
   ber
   December 1998.

   [RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit
   Congestion Notification (ECN) to IP, RFC 2481, January 1999.

   [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control",
   RFC 2581, April 1999.

   [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation
   of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884,
   July 2000.

   [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC2983,
   October 2000.

   [RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For
   Values In the Internet Protocol and Related Headers", RFC 2780, March
   2000.

   [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
   Congestion Avoidance in Computer Networks", ACM Transactions on Com-
   puter
   Computer Systems, Vol.8, No.2, pp.  158-181, May 1990.

   [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom
   Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM
   Computer Communications Review, October 1999.

16.  Security Considerations

   Security considerations have been discussed in Sections 7, 8, 18, and
   19.

17.  IPv4 Header Checksum Recalculation

   IPv4 header checksum recalculation is an issue with some high-end
   router architectures using an output-buffered switch, since most if
   not all of the header manipulation is performed on the input side of
   the switch, while the ECN decision would need to be made local to the
   output buffer. This is not an issue for IPv6, since there is no IPv6
   header checksum. The IPv4 TOS octet is the last byte of a 16-bit
   half-word.

   RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
   checksum after the TTL field is decremented.  The incremental updat-
   ing
   updating of the IPv4 checksum after the CE bit codepoint was set would
   work as fol-
   lows: follows: Let HC be the original header checksum, checksum for an ECT(0)
   packet, and let HC' be the new header checksum after the CE bit checksum
   has been set.  That is, the ECN field has changed from '10' to '11'.
   Then for header checksums calculated with one's complement
   subtraction, HC' would be recalculated as follows:

        HC' = { HC - 1     HC > 1
              { 0x0000     HC = 1

For header checksums calculated on two's complement machines, HC' would
be recalculated as follows after the CE bit was set:

        HC' = { HC - 1     HC > 0
              { 0xFFFE     HC = 0

A similar incremental updating of the IPv4 checksum can be carried out
when the ECN field is changed from ECT(1) to CE, that is, from '01' to
'11'.

18.  Possible Changes to the ECN Field in the Network

   This section discusses in detail possible changes to the ECN field in
   the network, such as falsely reporting congestion, disabling ECN-
   Capability for an individual packet, erasing the ECN congestion indi-
   cation,
   indication, or falsely indicating ECN-Capability.  We represent the ECN
   bits in the IP header by the tuple (ECT bit, CE bit).

18.1.  Possible Changes to the IP Header

18.1.1.  Erasing the Congestion Indication

   First, we consider the changes that a router could make that would
   result in effectively erasing the congestion indication after it had
   been set by a router upstream.  The convention followed is:
   (ECT, CE)
   ECN codepoint of received packet -> (ECT, CE) ECN codepoint of packet
   transmitted.

   (1, 1) -> (1, 0): erase only

   Replacing the CE bit that was set.
   (1, 1) -> (0, 0): erase both codepoint with the ECT bit and ECT(0) or ECT(1) codepoint
   effectively erases the CE bit.
   (1, 1) -> (0, 1): erase congestion indication.  However, with the use
   of two ECT bit

   The first change turns off codepoints, a router erasing the CE bit after codepoint has no way
   to know whether the original ECT codepoint was ECT(0) or ECT(1).
   Thus, it has been set by some
   upstream router along is possible for the path. transport protocol to deploy mechanisms
   to detect such erasures of the CE codepoint.

   The consequence of the erasure of the CE codepoint for the upstream
   router is that there is a potential for congestion to build for a
   time, because the congestion indication does not reach the source.
   However, the packet would be received and acknowledged.

   The potential effect of erasing the congestion indication is complex,
   and is discussed in depth in Section 19 below.  Note that the effect
   of erasing the congestion indication is different from dropping a
   packet in the network.  When a data packet is dropped, the drop is
   detected by the TCP sender, and interpreted as an indication of con-
   gestion.
   congestion.  Similarly, if a sufficient number of consecutive acknowl-
   edgement
   acknowledgement packets are dropped, causing the cumulative
   acknowledgement field not to be advanced at the sender, the sender is
   limited by the congestion window from sending additional packets, and
   ultimately the retransmit timer expires.

   In contrast, a systematic erasure of the CE bit by a downstream
   router can have the effect of causing a queue buildup at an upstream
   router, including the possible loss of packets due to buffer over-
   flow.
   overflow.  There is a potential of unfairness in that another flow
   that goes through the congested router could react to the CE bit set
   while the flow that has the CE bit erased could see better
   performance.  The limitations on this potential unfairness are
   discussed in more detail in Section 19 below.

   The second change last of the three changes is to turn off both the ECT and replace the CE bits, codepoint with the
   not-ECT codepoint.  thus erasing the congestion indication and
   disabling ECN-Capability at the same time.

   The third change turns off only the ECT bit, disabling
   ECN-Capability.

   Within an IP tunnel using the full-functionality option, the third
   change would not erase the congestion indication, but would only dis-
   able ECN-Capability for that packet within the rest of the tunnel.
   However, when performed outside of an IP tunnel, the third change
   would also effectively erase the congestion indication, because an
   ECN field of (0, 1) is undefined.

   The `erasure' of the congestion indication is only effective if the
   packet does not end up being marked or dropped again by a downstream
   router.  With  If the first change, CE codepoint is replaced by an ECT codepoint, the
   packet remains ECN-Capable, and could be either marked or dropped by
   a downstream router as an indi-
   cation indication of congestion.  With  If the second and third changes, CE
   codepoint is replaced by the not-ECT codepoint, the packet is no
   longer ECN-capable, and can therefore be dropped but not marked by a
   downstream router as an indication of congestion.

18.1.2.  Falsely Reporting Congestion

   (1, 0) -> (1, 1)

   This change is to set the CE bit codepoint when the an ECT bit codepoint was
   already set, even though there was no congestion.  This change does
   not affect the treatment of that packet along the rest of the path.
   In particular, a router does not examine the CE bit codepoint in deciding
   whether to drop or mark an arriving packet.

   However, this could result in the application unnecessarily invoking
   end-to-end congestion control, and reducing its arrival rate.  By
   itself, this is no worse (for the application or for the network)
   than if the tampering router had actually dropped the packet.

18.1.3.  Disabling ECN-Capability

   (1, 0) -> (0, *)

   This change is to turn off the ECT bit of a packet that does not have
   the CE bit set.  (Section 18.1.1 discussed the case of turning off
   the ECT bit codepoint of a packet that does have the CE bit set.) packet.  This means
   that if the packet later encounters congestion (e.g., by arriving to
   a RED queue with a moderate average queue size), it will be dropped
   instead of being marked.  By itself, this is no worse (for the appli-
   cation)
   application) than if the tampering router had actually dropped the
   packet.  The saving grace in this particular case is that there is no con-
   gested
   congested router upstream expecting a reaction from setting the CE
   bit.

18.1.4.  Falsely Indicating ECN-Capability
   This change would incorrectly label a packet as ECN-Capable. The
   packet may have been sent either by an ECN-Capable transport or a
   transport that is not ECN-Capable.

   (0, *) -> (1, 0);
   (0, *) -> (1, 1);

   If the packet later encounters moderate congestion at an ECN-Capable
   router, the router could set the CE bit codepoint instead of dropping the
   packet.  If the transport protocol in fact is not ECN-Capable, then
   the transport will never receive this indication of congestion, and
   will not reduce its sending rate in response.  The potential conse-
   quences
   consequences of falsely indicating ECN-capability are discussed
   further in Section 19 below.

   If the packet never later encounters congestion at an ECN-Capable
   router, then the first of these two changes would have no effect. effect,
   other than possibly interfering with the use of the ECN nonce by the
   transport protocol.  The second last change, however, would have the effect
   of giving false reports of congestion to a monitoring device along
   the path.  If the transport protocol is ECN-Capable, then the second of these two
   changes (when, for example, (0,0) was changed to (1,1)) this change
   could also have an effect at the transport level, by combining
   falsely indicat-
   ing indicating ECN-Capability with falsely reporting congestion.
   For an ECN-
   capable ECN-capable transport, this would cause the transport to
   unnecessarily react to congestion.  In this particular case, the
   router that is incorrectly changing the ECN field could have dropped
   the packet. Thus for this case of an ECN-capable transport, the
   consequence of this change to the ECN field is no worse than dropping
   the packet.

18.1.5.  Changes with No Functional Effect

   (0, *) -> (0, *)

   The CE bit is ignored in a packet that does not have the ECT bit set.
   Thus, this change would have no effect, in terms of ECN.

18.2.  Information carried in the Transport Header

   For TCP, an ECN-capable TCP receiver informs its TCP peer that it is
   ECN-capable at the TCP level, conveying this information in the TCP
   header at the time the connection is setup.  This document does not
   consider potential dangers introduced by changes in the transport
   header within the network.  In the case of IPsec tunnels, the IPsec
   tunnel protects the transport header.

   Another issue concerns TCP packets with a spoofed IP source address
   carrying invalid ECN information in the transport header.  For com-
   pleteness,
   completeness, we examine here some possible ways that a node spoofing
   the IP source address of another node could use the two ECN flags in
   the TCP header to launch a denial-of-service attack. However, these
   attacks would require an ability for the attacker to use valid TCP
   sequence numbers, and any attacker with this ability and with the
   ability to spoof IP source addresses could damage the TCP connection
   without using the ECN flags.  Therefore, ECN does not add any new
   vulnerabilities in this respect.

   An acknowledgement packet with a spoofed IP source address of the TCP
   data receiver could include the ECE bit set.  If accepted by the TCP
   data sender as a valid packet, this spoofed acknowledgement packet
   could result in the TCP data sender unnecessarily halving its conges-
   tion
   congestion window.  However, to be accepted by the data sender, such
   a spoofed acknowledgement packet would have to have the correct
   32-bit sequence number as well as a valid acknowledgement number.  An
   attacker that could successfully send such a spoofed acknowledgement
   packet could also send a spoofed RST packet, or do other equally dam-
   aging
   damaging operations to the TCP connection.

   Packets with a spoofed IP source address of the TCP data sender could
   include the CWR bit set.  Again, to be accepted, such a packet would
   have to have a valid sequence number.  In addition, such a spoofed
   packet would have a limited performance impact.  Spoofing a data
   packet with the CWR bit set could result in the TCP data receiver
   sending fewer ECE packets than it would otherwise, if the data
   receiver was sending ECE packets when it received the spoofed CWR
   packet.

18.3.  Split Paths

   In some cases, a malicious or broken router might have access to only
   a subset of the packets from a flow.  The question is as follows:
   can this router, by altering the ECN field in this subset of the
   packets, do more damage to that flow than if it had simply dropped
   that set of packets?

   We will classify the packets in the flow as A packets and B packets,
   and assume that the adversary only has access to A packets.  Assume
   that the adversary is subverting end-to-end congestion control along
   the path traveled by A packets only, by either falsely indicating
   ECN-Capability upstream of the point where congestion occurs, or
   erasing the congestion indication downstream.  Consider also that
   there exists a monitoring device that sees both the A and B packets,
   and will "punish" both the A and B packets if the total flow is
   determined not to be properly responding to indications of conges-
   tion.
   congestion.  Another key characteristic that we believe is likely to
   be true is that the monitoring device, before `punishing' the A&B
   flow, will first drop packets instead of setting the CE bit, codepoint,
   and will drop arriving packets of that flow that already have the ECT and CE bits
   codepoint set.  If the end nodes are in fact using end-to-end
   congestion con-
   trol, control, they will see all of the indications of
   congestion seen by the monitoring device, and will begin to respond
   to these indications of congestion. Thus, the monitoring device is
   successful in providing the indications to the flow at an early
   stage.

   It is true that the adversary that has access only to the A packets
   might, by subverting ECN-based congestion control, be able to deny
   the benefits of ECN to the other packets in the A&B aggregate.  While
   this is unfortunate, this is not a reason to disable ECN within an
   IPsec tunnel.

   A variant of falsely reporting congestion occurs when there are two
   adversaries along a path, where the first adversary falsely reports
   congestion, and the second adversary `erases' those reports. (Unlike
   packet drops, ECN congestion reports can be `reversed' later in the
   network by a malicious or broken router.) router.  However, the use of the ECN
   nonce could help the transport to detect this behavior.)  While this
   would be trans-
   parent transparent to the end node, it is possible that a
   monitoring device between the first and second adversaries would see
   the false indica-
   tions indications of congestion.  Keep in mind our recommendation
   in this docu-
   ment, document, that before `punishing' a flow for not responding
   appropriately to congestion, the router will first switch to dropping
   rather than marking as an indication of congestion, for that flow.
   When this includes dropping arriving packets from that flow that have
   the CE
   bit codepoint set, this ensures that these indications of
   congestion are being seen by the end nodes.  Thus, there is no
   additional harm that we are able to postulate as a result of multiple
   conflicting adversaries.

19.  Implications of Subverting End-to-End Congestion Control

   This section focuses on the potential repercussions of subverting
   end-to-end congestion control by either falsely indicating ECN-Capa-
   bility, ECN-
   Capability, or by erasing the congestion indication in ECN (the CE-bit). CE
   codepoint).  Subverting end-to-end congestion control by either of
   these two meth-
   ods methods can have consequences both for the application and
   for the net-
   work. network.  We discuss these separately below.

   The first method to subvert end-to-end congestion control, that of
   falsely indicating ECN-Capability, effectively subverts
   falsely indicating ECN-Capability, effectively subverts end-to-end
   congestion control only if the packet later encounters congestion
   that results in the setting of the CE codepoint.  In this case, the
   transport protocol (which may not be ECN-capable) does not receive
   the indication of congestion from these downstream congested routers.

   The second method to subvert end-to-end congestion control, `erasing'
   the CE codepoint in a packet, effectively subverts end-to-end
   congestion control only when the CE codepoint in the packet was set
   earlier by a congested router.  In this case, the transport protocol
   does not receive the indication of congestion from the upstream
   congested routers.

   Either of these two methods of subverting end-to-end congestion
   control can potentially introduce more damage to the network (and
   possibly to the flow itself) than if the adversary had simply dropped
   packets from that flow.  However, as we discuss later in this section
   and in Section 7, this potential damage is limited.

19.1.  Implications for the Network and for Competing Flows

   The CE codepoint of the ECN field is only used by routers as an
   indication of congestion during periods of *moderate* congestion.
   ECN-capable routers should drop rather than mark packets during heavy
   congestion even if the router's queue is not yet full.  For example,
   for routers using active queue management based on RED, the router
   should drop rather than mark packets that arrive while the average
   queue sizes exceed the RED queue's maximum threshold.

   One consequence for the network of subverting end-to-end congestion
   control only if is that flows that do not receive the packet later encounters congestion indications
   from the network might increase their sending rate until they drive
   the network into heavier congestion.  Then, the congested router
   could begin to drop rather than mark arriving packets.  For flows
   that results are not isolated by some form of per-flow scheduling or other
   per-flow mechanisms, but are instead aggregated with other flows in a
   single queue in an undifferentiated fashion, this packet-dropping at
   the setting congested router would apply to all flows that share that queue.
   Thus, the consequences would be to increase the level of congestion
   in the CE bit. network.

   In this case, some cases, the trans-
   port protocol (which may not be ECN-capable) does not receive increase in the
   indication level of congestion from these downstream will lead to a
   substantial buffer buildup at the congested routers.

   The second method queue that will be
   sufficient to subvert end-to-end congestion control, `erasing' drive the (set) CE bit in a packet, effectively subverts end-to-end conges-
   tion control only congested queue from the packet-marking to
   the packet-dropping regime.  This transition could occur either
   because of buffer overflow, or because of the active queue management
   policy described above that drops packets when the CE bit in average queue is
   above RED's maximum threshold.  At this point, all flows, including
   the subverted flow, will begin to see packet was set earlier by drops instead of packet
   marks, and a
   congested router.  In this case, the transport protocol does not
   receive the indication malicious or broken router will no longer be able to
   `erase' these indications of congestion from in the upstream congested
   routers.

   Either of these two methods of subverting network.  If the end
   nodes are deploying appropriate end-to-end congestion con-
   trol can potentially introduce more damage to the network (and possi-
   bly to control, then
   the subverted flow itself) than if the adversary had simply dropped
   packets from that flow.  However, as we discuss later in this section
   and will reduce its arrival rate in Section 7, this potential damage is limited.

19.1.  Implications for the Network and for Competing Flows

   The CE bit of response to
   congestion.  When the ECN field is only used by routers as an indication
   of congestion during periods level of *moderate* congestion.  ECN-capable
   routers should drop rather than mark packets during heavy congestion
   even if the router's queue is not yet full.  For example, for routers
   using active sufficiently reduced,
   the congested queue management based on RED, can return from the router should drop
   rather than mark packets that arrive while packet-dropping regime to the average queue sizes
   exceed
   packet-marking regime.  The steady-state pattern could be one of the RED queue's maximum threshold.

   One consequence for
   congested queue oscillating between these two regimes.

   In other cases, the network consequences of subverting end-to-end congestion
   control is that flows that do will not receive the congestion indications
   from the network might increase their sending rate until they be severe enough to drive the network into heavier congestion.  Then, the congested router
   could begin to drop rather than mark arriving packets.  For flows link into
   sufficiently-heavy congestion that packets are not isolated by some form dropped instead of
   being marked.  In this case, the implications for competing flows in
   the network will be a slightly-increased rate of per-flow scheduling packet marking or other
   per-flow mechanisms, but are instead aggregated with other flows
   dropping, and a corresponding decrease in the bandwidth available to
   those flows.  This can be a
   single stable state if the arrival rate of the
   subverted flow is sufficiently small, relative to the link bandwidth,
   that the average queue in an undifferentiated fashion, this packet-dropping size at the congested router would apply to all flows that remains under
   control.  In particular, the subverted flow could have a limited
   bandwidth demand on the link at this router, while still getting more
   than its "fair" share that queue.
   Thus, of the consequences would link.  This limited demand could be due
   to increase the level of congestion
   in a limited demand from the network.

   In some cases, data source; a limitation from the increase in TCP
   advertised window; a lower-bandwidth access pipe; or other factors.
   Thus the level subversion of ECN-based congestion will control can still lead to a
   substantial buffer buildup at the congested queue that will be suffi-
   cient
   unfairness, which we believe is appropriate to drive the congested queue from the packet-marking note here.

   The threat to the
   packet-dropping regime.  This transition could occur either because
   of buffer overflow, or because of network posed by the active queue management policy
   described above that drops packets when subversion of ECN-based
   congestion control in the average queue network is above
   RED's maximum threshold.  At this point, all flows, including essentially the
   subverted flow, will begin same as the
   threat posed by an end-system that intentionally fails to see packet drops instead cooperate
   with end-to-end congestion control.  The deployment of packet
   marks, mechanisms in
   routers to address this threat is an open research question, and is
   discussed further in Section 10.

   Let us take the example described in Section 18.1.1, where the CE
   codepoint that was set in a malicious packet is erased: {'11' -> '10' or broken '11'
   -> '01'}.  The consequence for the congested upstream router will no longer be able to
   `erase' these indications of congestion in that set
   the network.  If CE codepoint is that this congestion indication does not reach
   the end nodes are deploying appropriate end-to-end congestion control, then
   the subverted flow will reduce for that flow. The source (even one which is completely
   cooperative and not malicious) is thus allowed to continue to
   increase its arrival sending rate in response to con-
   gestion.  When the level of congestion (if it is sufficiently reduced, a TCP flow, by increasing its
   congestion window).  The flow potentially achieves better throughput
   than the
   congested queue can return from other flows that also share the packet-dropping regime to congested router, especially
   if there are no policing mechanisms or per-flow queueing mechanisms
   at that router.  Consider the
   packet-marking regime.  The steady-state pattern could be one behavior of the
   congested queue oscillating between these two regimes.

   In other cases, flows, especially
   if they are cooperative: that is, the consequences of subverting flows that do not experience
   subverted end-to-end congestion
   control will not be severe enough control.  They are likely to drive reduce
   their load (e.g., by reducing their window size) on the congested link into
   sufficiently-heavy congestion that packets are dropped instead of
   being marked.  In
   router, thus benefiting our subverted flow. This results in
   unfairness.  As we discussed above, this case, unfairness could either be
   transient (because the congested queue is driven into the packet-
   marking regime), oscillatory (because the implications for competing flows in congested queue oscillates
   between the network will be a slightly-increased rate of packet marking or
   dropping, and a corresponding decrease in the bandwidth available to
   those flows.  This can be packet dropping regime), or more
   moderate but a persistent stable state if the arrival rate of (because the
   subverted flow congested queue
   is sufficiently small, relative never driven to the link bandwidth,
   that the average queue size at the congested router remains under
   control.  In particular, packet dropping regime).

   The results would be similar if the subverted flow could have was intentionally
   avoiding end-to-end congestion control.  One difference is that a limited
   bandwidth demand on the link
   flow that is intentionally avoiding end-to-end congestion control at this router, while still getting more
   than its "fair" share of the link.  This limited demand could be due
   to a limited demand from the data source; a limitation from the TCP
   advertised window; a lower-bandwidth access pipe; or other factors.
   Thus
   the subversion of ECN-based end nodes can avoid end-to-end congestion control can still lead to
   unfairness, which we believe even when the
   congested queue is appropriate in packet-dropping mode, by refusing to note here.

   The threat reduce its
   sending rate in response to packet drops in the network.  Thus the
   problems for the network posed by from the subversion of ECN-based con-
   gestion control in the network is essentially the same as the threat
   posed by an end-system that intentionally fails to cooperate with
   end-to-end congestion control.  The deployment of mechanisms in
   routers to address this threat is an open research question, and is
   discussed further in Section 10.

   Let us take the example described in Section 18.1.1, where
   control are less severe than the CE bit
   that was set problems caused by the intentional
   avoidance of end-to-end congestion control in a packet the end nodes.  It is erased: {(1, 1) -> (1, 0)}.  The conse-
   quence for
   also the congested upstream router case that set the CE bit it is that
   this congestion indication does not reach considerably more difficult to control the
   behavior of the end nodes for that
   flow. The source (even one which is completely cooperative and not
   malicious) than it is thus allowed to continue to increase its sending rate
   (if it control the behavior of the
   infrastructure itself.  This is a TCP flow, not to say that the problems for the
   network posed by increasing its congestion window).  The flow
   potentially achieves better throughput than the other flows network's subversion of ECN-based congestion
   control are small; just that also
   share the congested router, especially if there they are no policing mech-
   anisms dwarfed by the problems for the
   network posed by the subversion of either ECN-based or per-flow queueing other
   currently known packet-based congestion control mechanisms at that router.  Consider by the
   behavior of end
   nodes.

19.2.  Implications for the other flows, especially if they are cooperative: Subverted Flow

   When a source indicates that it is ECN-capable, there is an
   expectation that
   is, the flows routers in the network that do not experience subverted end-to-end congestion
   control.  They are likely to reduce their load (e.g., by reducing
   their window size) on the congested router, thus benefiting our sub-
   verted flow. This results capable of
   participating in unfairness.  As we discussed above, this
   unfairness could either be transient (because ECN will use the congested queue CE codepoint for indication of
   congestion. There is
   driven into the packet-marking regime), oscillatory (because potential benefit of using ECN in reducing
   the amount of packet loss (in addition to the con-
   gested reduced queueing delays
   because of active queue oscillates between the packet marking and management policies).  When the packet
   dropping regime), or more moderate but flows
   through a persistent stable state
   (because tunnel where the congested queue is never driven to nodes that the packet dropping
   regime).

   The results would be similar if tunneled packets traverse
   are untrusted in some way, the subverted flow was intentionally
   avoiding end-to-end congestion control.  One difference expectation is that a IPsec will protect
   the flow from subversion that is intentionally avoiding end-to-end congestion control at results in undesirable consequences.

   In many cases, a subverted flow will benefit from the end nodes can avoid subversion of
   end-to-end congestion control even when the
   congested queue is for that flow in packet-dropping mode, the network, by refusing to reduce its
   sending rate in response
   receiving more bandwidth than it would have otherwise, relative to packet drops in the network.  Thus
   competing non-subverted flows.  If the
   problems for congested queue reaches the network from
   packet-dropping stage, then the subversion of ECN-based end-to-end congestion
   control are less severe than the problems caused by might or might not be of overall benefit to the intentional
   avoidance subverted
   flow, depending on that flow's relative tradeoffs between throughput,
   loss, and delay.

   One form of subverting end-to-end congestion control in the end nodes.  It is
   also the case that it is considerably more difficult to control falsely
   indicate ECN-capability by setting the
   behavior ECT codepoint.  This has the
   consequence of downstream congested routers setting the end nodes than it CE codepoint
   in vain.  However, as described in Section 9.1.2, if an ECT codepoint
   is to control changed in an IP tunnel, this can be detected at the behavior egress point
   of the
   infrastructure itself.  This is not to say that the problems for tunnel, as long as the
   network posed by inner header was not changed within the network's subversion
   tunnel.

   The second form of ECN-based subverting end-to-end congestion control are small; just that they are dwarfed by the problems for the
   network posed by is to
   erase the subversion of either ECN-based or other cur-
   rently known packet-based congestion control mechanisms indication by erasing the end
   nodes.

19.2.  Implications for the Subverted Flow

   When a source indicates that CE codepoint.  In this
   case, it is ECN-capable, there is an expecta-
   tion that the upstream congested routers that set the CE codepoint
   in vain.

   If an ECT codepoint is erased within an IP tunnel, then this can be
   detected at the network that are capable egress point of participat-
   ing in ECN will use the tunnel, as long as the inner
   header was not changed within the tunnel.  If the CE bit for indication of congestion. There codepoint is
   the potential benefit set
   upstream of using ECN in reducing the amount IP tunnel, then any erasure of packet
   loss (in addition to the reduced queueing delays outer header's CE
   codepoint within the tunnel will have no effect because the inner
   header preserves the set value of active
   queue management policies).  When the packet flows through a tunnel
   where CE codepoint.  However, if the nodes that
   CE codepoint is set within the tunneled packets traverse are untrusted in
   some way, tunnel, and erased either within or
   downstream of the expectation tunnel, this is that IPsec will protect not necessarily detected at the flow from
   subversion that results in undesirable consequences.

   In many cases, a subverted flow will benefit from
   egress point of the tunnel.

   With this subversion of end-to-end congestion control control, an end-system
   transport does not respond to the congestion indication.  Along with
   the increased unfairness for that flow in the network, by
   receiving more bandwidth than it would have otherwise, relative to
   competing non-subverted flows.  If flows described in the
   previous section, the congested router's queue reaches the
   packet-dropping stage, then could continue to
   build, resulting in packet loss at the subversion of end-to-end congested router - which is a
   means for indicating congestion
   control might or to the transport in any case.  In the
   interim, the flow might experience higher queueing delays, possibly
   along with an increased bandwidth relative to other non-subverted
   flows.  But transports do not be inherently make assumptions of overall benefit to
   consistently experiencing carefully managed queueing in the subverted
   flow, depending on path.  We
   believe that flow's relative tradeoffs between throughput,
   loss, and delay.

   One form these forms of subverting end-to-end congestion control is to falsely
   indicate ECN-capability by setting
   are no worse for the ECT bit.  This has subverted flow than if the conse-
   quence of downstream congested routers setting adversary had simply
   dropped the CE bit packets of that flow itself.

19.3.  Non-ECN-Based Methods of Subverting End-to-end Congestion Control

   We have shown that, in vain.
   However, as described many cases, a malicious or broken router that
   is able to change the bits in Section 9.1.2, the ECN field can do no more damage
   than if it had simply dropped the ECT bit is changed packet in
   an IP tunnel, question.  However, this can be detected at the egress point of the tunnel,
   as long as the inner header was
   is not changed within the tunnel.

   The second form of subverting true in all cases, in particular in the cases where the broken
   router subverted end-to-end congestion control is to
   erase the congestion indication, either by erasing the CE bit
   directly, either falsely
   indicating ECN-Capability or by erasing the ECT bit when ECN congestion indication
   (in the CE bit is already set.
   In this case, it is the upstream congested routers codepoint).  While there are many ways that set the CE
   bit in vain.

   If the ECT bit is erased within an IP tunnel, then this a router can
   harm a flow by dropping packets, a router cannot subvert end-to-end
   congestion control by dropping packets.  As an example, a router
   cannot subvert TCP congestion control by dropping data packets,
   acknowledgement packets, or control packets.

   Even though packet-dropping cannot be
   detected at the egress point of the tunnel, as long as the inner
   header was not changed within the tunnel.  If the CE bit is set
   upstream of used to subvert end-to-end
   congestion control, there *are* non-ECN-based methods for subverting
   end-to-end congestion control that a broken or malicious router could
   use.  For example, a broken router could duplicate data packets, thus
   effectively negating the IP tunnel, then any erasure effects of the outer header's CE
   bit within the tunnel will have no effect because the inner header
   preserves the set value end-to-end congestion control
   along some portion of the CE bit.  However, if the CE bit is set path.  (For a router that duplicated
   packets within the an IPsec tunnel, and erased either within or downstream of the tun-
   nel, this is not necessarily detected at security administrator can cause
   the egress point of duplicate packets to be discarded by configuring anti-replay
   protection for the tun-
   nel.

   With this subversion tunnel.)  This duplication of end-to-end congestion control, an end-system
   transport does not respond to packets within the congestion indication.  Along with
   network would have similar implications for the increased unfairness network and for the non-subverted flows
   subverted flow as those described in Sections 18.1.1 and 18.1.4
   above.

20.  The Motivation for the
   previous section, ECT Codepoints.

20.1.  The Motivation for an ECT Codepoint.

   The need for an ECT codepoint is motivated by the congested router's queue could continue to
   build, resulting fact that ECN will
   be deployed incrementally in packet loss at an Internet where some transport
   protocols and routers understand ECN and some do not. With an ECT
   codepoint, the congested router - which is a
   means for indicating congestion to can drop packets from flows that are not ECN-
   capable, but can *instead* set the transport CE codepoint in any case.  In the
   interim, the flow might experience higher queueing delays, possibly
   along with packets that *are*
   ECN-capable. Because an increased bandwidth relative ECT codepoint allows an end node to other non-subverted
   flows.  But transports do not inherently make assumptions of consis-
   tently experiencing carefully managed queueing in have the path.  We
   believe that these forms
   CE codepoint set in a packet *instead* of subverting end-to-end congestion control
   are having the packet dropped,
   an end node might have some incentive to deploy ECN.

   If there was no worse for ECT codepoint, then the subverted flow than if router would have to set the adversary had simply
   dropped
   CE codepoint for packets from both ECN-capable and non-ECN-capable
   flows.  In this case, there would be no incentive for end-nodes to
   deploy ECN, and no viable path of incremental deployment from a non-
   ECN world to an ECN-capable world.  Consider the packets of that flow itself.

19.3.  Non-ECN-Based Methods first stages of Subverting End-to-end Congestion Control

   We have shown that, in many cases, such
   an incremental deployment, where a malicious or broken router that
   is able to change subset of the bits in flows are ECN-
   capable.  At the ECN field can do no more damage
   than if it had simply dropped onset of congestion, when the packet in question.
   dropping/marking rate would be low, routers would only set CE
   codepoints, rather than dropping packets.  However, this only those flows
   that are ECN-capable would understand and respond to CE packets. The
   result is not true in all cases, in particular in that the cases where ECN-capable flows would back off, and the broken
   router subverted end-to-end congestion control by either falsely
   indicating ECN-Capability or by erasing non-ECN-
   capable flows would be unaware of the ECN signals and would continue
   to open their congestion indication
   (in the CE-bit).  While windows.

   In this case, there are many ways that a router can harm a
   flow by dropping packets, a router cannot subvert end-to-end conges-
   tion control by dropping packets.  As an example, a router cannot
   subvert TCP congestion control by dropping data packets, acknowledge-
   ment packets, or control packets.

   Even though packet-dropping cannot be used to subvert end-to-end con-
   gestion control, there *are* non-ECN-based methods for subverting
   end-to-end two possible outcomes: (1) the ECN-capable
   flows back off, the non-ECN-capable flows get all of the bandwidth,
   and congestion control that a broken remains mild, or malicious router could
   use.  For example, a broken router could duplicate data packets, thus
   effectively negating (2) the effects of end-to-end ECN-capable flows back off,
   the non-ECN-capable flows don't, and congestion control
   along some portion of increases until the path.  (For a
   router that duplicated pack-
   ets within an IPsec tunnel, the security administrator can cause transitions from setting the
   duplicate packets CE codepoint to be discarded by configuring anti-replay protec-
   tion for dropping packets.
   While this second outcome evens out the tunnel.)  This duplication of packets within fairness, the network ECN-capable
   flows would have similar implications for still receive little benefit from being ECN-capable,
   because the network and for increased congestion would drive the subverted router to packet-
   dropping behavior.

   A flow that advertised itself as those described ECN-Capable but does not respond to
   CE codepoints is functionally equivalent to a flow that turns off
   congestion control, as discussed earlier in Sections 18.1.1 this document.

   Thus, in a world when a subset of the flows are ECN-capable, but
   where ECN-capable flows have no mechanism for indicating that fact to
   the routers, there would be less effective and 18.1.4 above.

20. less fair congestion
   control in the Internet, resulting in a strong incentive for end
   nodes not to deploy ECN.

20.2.  The Motivation for the two ECT bit. Codepoints.

   The need primary motivation for the two ECT bit codepoints is motivated by the fact that to provide a
   one-bit ECN will be
   deployed incrementally in an Internet where some transport protocols
   and routers understand nonce.  The ECN and some do not. With nonce allows the ECT bit, development of
   mechanisms for the
   router can drop packets from flows sender to probabilistically verify that network
   elements are not ECN-capable, but can
   *instead* set erasing the CE codepoint, and that data receivers
   are properly reporting to the sender the receipt of packets with the
   CE codepoint set.

   Another possibility for senders to detect misbehaving network
   elements or receivers would be for the CE bit in packets that *are* ECN-capable. Because data sender to occasionally
   send a data packet with the ECT bit allows an end node CE codepoint set, to have see if the receiver
   reports receiving the CE bit set codepoint.  Of course, if these packets
   encountered congestion in a packet
   *instead* of having the packet dropped, an end node network, the router might have some
   incentive to deploy ECN.

   If there was make no ECT indication, then
   change in the router packets, because the CE codepoint would have already be set.
   Thus, for packets sent with the CE codepoint set, the TCP end-nodes
   could not determine if some router intended to set the CE bit for packets from both ECN-capable and non-ECN-capable flows.
   In codepoint
   in these packets.  For this case, there reason, sending packets with the CE
   codepoint would be no incentive for end-nodes have to deploy
   ECN, be done sparingly, and no viable path of incremental deployment from would be a non-ECN
   world to an ECN-capable world.  Consider less
   effective check against misbehaving network elements and receivers
   than would be the first stages of such an
   incremental deployment, where a subset ECN nonce.

   The assignment of the flows are ECN-capable.
   At fourth ECN codepoint to ECT(1) precludes the onset
   use of congestion, when this codepoint for other purposes.  For clarity, we briefly
   list those possible purposes here.

   One possibility might have been for the packet dropping/marking rate
   would be low, routers would only set CE bits, rather than dropping
   packets. data sender to use the fourth
   ECN codepoint to indicate an alternate semantics for ECN.  However, only those flows that are ECN-capable
   this seems to us more appropriate to be signalled using a
   differentiated services codepoint in the DS field.

   A second possible use for the fourth ECN codepoint would under-
   stand and respond have been to CE packets. The result is that
   give the ECN-capable
   flows would back off, and router two separate codepoints for the non-ECN-capable flows would be unaware indication of the ECN signals
   congestion, CE(0) and would continue to open their CE(1), for mild and severe congestion win-
   dows.

   In
   respectively.  While this case, could be useful in some cases, this
   certainly does not seem a compelling requirement at this point.  If
   there are two possible outcomes: (1) was judged to be a compelling need for this, the ECN-capable
   flows back off, complications
   of incremental deployment would most likely necessitate more that
   just one codepoint for this function.

   A third use that has been informally proposed for the non-ECN-capable flows get all ECN codepoint
   is for use in some forms of multicast congestion control, based on
   randomized procedures for duplicating marked packets at routers.
   Some proposed multicast packet duplication procedures are based on a
   new ECN codepoint that (1) conveys the bandwidth,
   and fact that congestion remains mild, or (2) occurred
   upstream of the ECN-capable flows back off, duplication point that marked the non-ECN-capable flows don't, packet with this
   codepoint and (2) can detect congestion increases until the
   router transitions downstream of that
   duplication point.  ECT(1) can serve this purpose because it is both
   distinct from setting the ECT(0) and is replaced by CE bit when ECN marking occurs in
   response to dropping packets.
   While this second outcome evens out the fairness, the ECN-capable
   flows would still receive little benefit from being ECN-capable,
   because the increased congestion or incipient congestion.  Explanation of how
   this enhanced version of ECN would drive the router to packet-
   dropping behavior.

   A flow that advertised itself as ECN-Capable but does not respond to
   CE bits be used by multicast congestion
   control is functionally equivalent to a flow that turns off conges-
   tion control, as discussed earlier in beyond the scope of this document.

   Thus, document, as are ECN-aware
   multicast packet duplication procedures and the processing of the ECN
   field at multicast receivers in a world when a subset all cases (i.e., irrespective of the flows are ECN-capable, but
   where ECN-capable flows have no mechanism
   multicast packet duplication procedure(s) used).

   The specification of IP tunnel modifications for indicating ECN in this document
   assumes that fact the only change made to the routers, there would be less effective and less fair congestion
   control in outer IP header's ECN field
   between tunnel endpoints is to set the Internet, resulting CE codepoint to indicate
   congestion.  This is not consistent with some of the proposed uses of
   ECT(1) by the multicast duplication procedures in a strong incentive the previous
   paragraph, and such procedures SHOULD NOT be deployed within tunnels
   configured for end
   nodes full ECN functionality.  Limited ECN functionality may
   be used instead, although in practice many tunnel protocols
   (including IPsec) will not to deploy ECN. work correctly if multicast traffic
   duplication occurs within the tunnel

21.  Why use Two Bits in the IP Header?

   Given the need for an ECT indication in the IP header, there still
   remains the question of whether the ECT (ECN-Capable Transport) and
   CE (Congestion Experienced) indications codepoints should have been overloaded on
   a single bit.  This overloaded-one-bit alternative, explored in
   [Floyd94], would have involved a single bit with two values.  One
   value, "ECT and not CE", would represent an ECN-Capable Transport,
   and the other value, "CE or not ECT", would represent either Conges-
   tion
   Congestion Experienced or a non-ECN-Capable transport.

   One difference between the one-bit and two-bit implementations con-
   cerns
   concerns packets that traverse multiple congested routers.  Consider
   a CE packet that arrives at a second congested router, and is
   selected by the active queue management at that router for either
   marking or dropping.  In the one-bit implementation, the second
   congested router has no choice but to drop the CE packet, because it
   cannot distin-
   guish distinguish between a CE packet and a non-ECT packet.  In the
   two-bit implementation, the second congested router has the choice of
   either dropping the CE packet, or of leaving it alone with the CE bit
   codepoint set.

   Another difference between the one-bit and two-bit implementations
   comes from the fact that with the one-bit implementation, receivers
   in a single flow cannot distinguish between CE and non-ECT packets.
   Thus, in the one-bit implementation an ECN-capable data sender would
   have to unambiguously indicate to the receiver or receivers whether
   each packet had been sent as ECN-Capable or as non-ECN-Capable.  One
   possibility would be for the sender to indicate in the transport
   header whether the packet was sent as ECN-Capable.  A second possi-
   bility
   possibility that would involve a functional limitation for the one-
   bit implementation would be for the sender to unambiguously indicate
   that it was going to send *all* of its packets as ECN-Capable or as non-
   ECN-Capable.
   non-ECN-Capable.  For a multicast transport protocol, this
   unambiguous indication would have to be apparent to receivers joining
   an on-going multicast session.

   Another concern that was described earlier (and recommended in this
   document) is that transports (particularly TCP) should not mark pure
   ACK packets or retransmitted packets as being ECN-Capable.  A pure
   ACK packet from a non-ECN-capable transport could be dropped, without
   necessarily having an impact on the transport from a congestion con-
   trol
   control perspective (because subsequent ACKs are cumulative).  An ECN-
   capable
   ECN-capable transport reacting to the CE bit set codepoint in a pure ACK
   packet by reducing the window would be at a disadvantage in
   comparison to a non-ECN-capable transport. For this reason (and for
   reasons described earlier in relation to retransmitted packets), it
   is desirable to have the ECN-Capable bit indication ECT codepoint set on a per-packet basis.

   Another advantage of the two-bit approach is that it is somewhat more
   robust.  The most critical issue, discussed in Section 8, is that the
   default indication should be that of a non-ECN-Capable transport.  In
   a two-bit implementation, this requirement for the default value sim-
   ply
   simply means that the ECT bit non-ECT codepoint should be `OFF' by the default.  In
   the one-
   bit one-bit implementation, this means that the single overloaded bit
   should by default be in the "CE or not ECT" position.  This is less
   clear and straightforward, and possibly more open to incorrect implementa-
   tions
   implementations either in the end nodes or in the routers.

   In summary, while the one-bit implementation could be a possible
   implementation, it has the following significant limitations relative
   to the two-bit implementation.  First, the one-bit implementation has
   more limited functionality for the treatment of CE packets at a sec-
   ond
   second congested router.  Second, the one-bit implementation requires
   either that extra information be carried in the transport header of
   packets from ECN-Capable flows (to convey the functionality of the
   second bit elsewhere, namely in the transport header), or that
   senders in ECN-Capable flows accept the limitation that receivers
   must be able to determine a priori which packets are ECN-Capable and
   which are not ECN-Capable. Third, the one-bit implementation is pos-
   sibly
   possibly more open to errors from faulty implementations that choose
   the wrong default value for the ECN bit.  We believe that the use of
   the extra bit in the IP header for the ECT-bit is extremely valuable
   to overcome these limitations.

22.  Historical Definitions for the IPv4 TOS Octet

   RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
   header.  In RFC 791, bits 6 and 7 of the ToS octet are listed as
   "Reserved for Future Use", and are shown set to zero.  The first two
   fields of the ToS octet were defined as the Precedence and Type of
   Service (TOS) fields.

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
         +-----+-----+-----+-----+-----+-----+-----+-----+

   RFC 1122 included bits 6 and 7 in the TOS field, though it did not
   discuss any specific use for those two bits:

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |   PRECEDENCE    |       TOS                   |  RFC 1122
         +-----+-----+-----+-----+-----+-----+-----+-----+

   The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |   PRECEDENCE    |       TOS             | MBZ |  RFC 1349
         +-----+-----+-----+-----+-----+-----+-----+-----+

   Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
   Cost".  In addition to the Precedence and Type of Service (TOS)
   fields, the last field, MBZ (for "must be zero") was defined as cur-
   rently
   currently unused.  RFC 1349 stated that "The originator of a datagram
   sets [the MBZ] field to zero (unless participating in an Internet
   protocol experiment which makes use of that bit)."

   RFC 1455 [RFC 1455] defined an experimental standard that used all
   four bits in the TOS field to request a guaranteed level of link
   security.

   RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Dif-
   ferentiated
   Differentiated Services Field (DS Field) in the IPv4 and IPv6
   Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed
   as Cur-
   rently Currently Unused (CU).  RFC 2780 [RFC2780] specified ECN as an experi-
   mental
   experimental use of the two-bit CU field.  RFC 2780 updated the
   definition of the DS Field to only encompass the first six bits of
   this octet rather than all eight bits; these first six bits are
   defined as the Differentiated Services CodePoint (DSCP):

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |               DSCP                |    CU     |  RFCs 2474,
   2780
         +-----+-----+-----+-----+-----+-----+-----+-----+

   Because of this unstable history, the definition of the ECN field in
   this document cannot be guaranteed to be backwards compatible with
   all past uses of these two bits.

   Prior to RFC 2474, routers were not permitted to modify bits in
   either the DSCP or ECN field of packets forwarded through them, and
   hence routers that comply only with RFCs prior to 2474 should have no
   effect on ECN.  For end nodes, bit 7 (the second ECN CE bit) must be trans-
   mitted
   transmitted as zero for any implementation compliant only with RFCs
   prior to 2474.  Such nodes may transmit bit 6 (the first ECN ECT bit) as
   one for the "Minimize Monetary Cost" provision of RFC 1349 or the
   experiment authorized by RFC 1455; neither this aspect of RFC 1349
   nor the experiment in RFC 1455 were widely implemented or used.  The
   damage that could be done by a broken, non-conformant router would be to
   "erase"
   include "erasing" the CE bit codepoint for an ECN- capable ECN-capable packet that
   arrived at the router with the CE bit codepoint set, or set setting the CE bit
   codepoint even in the absence of congestion.  This has been discussed
   in the section on "Non-compli-
   ance "Non-compliance in the Network".

   The damage that could be done in an ECN-capable environment by a non-
   ECN-capable end-node transmitting packets with the ECT bit codepoint set
   has been discussed in the section on "Non-compliance by the End
   Nodes".

23.  IANA Considerations

   The bits codepoints for ECT and CE in the ECN Field of the IP header and the bits for
   CWR and ECE in the TCP header are specified by the Standards Action
   of this RFC, as is required by RFC 2780.  We would note that
   this RFC does not define the codepoint of (ECT=0, CE=1) for the ECT
   and CE bits.

   IANA allocated the IPSEC Security Association Attribute value 10 for
   the ECN Tunnel use described in Section 9.2.1.2 above at the request
   of David Black in November 1999.  If this draft is approved for pub-
   lication
   publication as an RFC, IANA should change the Reference for this alloca-
   tion
   allocation from David Black's request to this RFC based on its RFC
   number.

   AUTHORS' ADDRESSES

      K. K. Ramakrishnan
      TeraOptic Networks, Inc.
      Phone: +1 (408) 666-8650
      Email: kk@teraoptic.com

      Sally Floyd
      Phone: +1 (510) 666-2989
      ACIRI
      Email: floyd@aciri.org
      URL: http://www.aciri.org/floyd/

      David L. Black
      EMC Corporation
      42 South St.
      Hopkinton, MA  01748
      Phone:  +1 (508) 435-1000 x75140
      Email: black_david@emc.com

      This draft was created in January February 2001.
      It expires July August 2001.