Network Working Group                                    K. Ramakrishnan
Request for Comments: 3168                            TeraOptic Networks
Updates: 2474, 2401, 793                                        S. Floyd
Obsoletes: 2481                                                    ACIRI
Category: Standards Track                                       D. Black
                                                                     EMC
                                                          September 2001

      The Addition of Explicit Congestion Notification (ECN) to IP

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

This memo specifies the incorporation of ECN (Explicit Congestion
Notification) to TCP and IP, including ECN's use of two bits in the
IP header.

Table of Contents

1. Introduction.................................................. 3
2. Conventions and Acronyms...................................... 5
3. Assumptions and General Principles............................ 5
4. Active Queue Management (AQM)................................. 6
5. Explicit Congestion Notification in IP........................ 6
5.1. ECN as an Indication of Persistent Congestion............... 10
5.2. Dropped or Corrupted Packets................................ 11
5.3. Fragmentation............................................... 11
6. Support from the Transport Protocol........................... 12
6.1. TCP......................................................... 13
6.1.1 TCP Initialization......................................... 14
6.1.1.1. Middlebox Issues........................................ 16
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field. 17
6.1.2. The TCP Sender............................................ 18
6.1.3. The TCP Receiver.......................................... 19
6.1.4. Congestion on the ACK-path................................ 20
6.1.5. Retransmitted TCP packets................................. 20
6.1.6. TCP Window Probes......................................... 22
7. Non-compliance by the End Nodes............................... 22
8. Non-compliance in the Network................................. 24
8.1. Complications Introduced by Split Paths..................... 25
9. Encapsulated Packets.......................................... 25
9.1. IP packets encapsulated in IP............................... 25
9.1.1. The Limited-functionality and Full-functionality Options.. 27
9.1.2. Changes to the ECN Field within an IP Tunnel.............. 28
9.2. IPsec Tunnels............................................... 29
9.2.1. Negotiation between Tunnel Endpoints...................... 31
9.2.1.1. ECN Tunnel Security Association Database Field.......... 32
9.2.1.2. ECN Tunnel Security Association Attribute............... 32
9.2.1.3. Changes to IPsec Tunnel Header Processing............... 33
9.2.2. Changes to the ECN Field within an IPsec Tunnel........... 35
9.2.3. Comments for IPsec Support................................ 35
9.3. IP packets encapsulated in non-IP Packet Headers............ 36
10. Issues Raised by Monitoring and Policing Devices............. 36
11. Evaluations of ECN........................................... 37
11.1. Related Work Evaluating ECN................................ 37
11.2. A Discussion of the ECN nonce.............................. 37
11.2.1. The Incremental Deployment of ECT(1) in Routers.......... 38
12. Summary of changes required in IP and TCP.................... 38
13. Conclusions.................................................. 40
14. Acknowledgements............................................. 41
15. References................................................... 41
16. Security Considerations...................................... 45
17. IPv4 Header Checksum Recalculation........................... 45
18. Possible Changes to the ECN Field in the Network............. 45
18.1. Possible Changes to the IP Header.......................... 46
18.1.1. Erasing the Congestion Indication........................ 46
18.1.2. Falsely Reporting Congestion............................. 47
18.1.3. Disabling ECN-Capability................................. 47
18.1.4. Falsely Indicating ECN-Capability........................ 47
18.2. Information carried in the Transport Header................ 48
18.3. Split Paths................................................ 49
19. Implications of Subverting End-to-End Congestion Control..... 50
19.1. Implications for the Network and for Competing Flows....... 50
19.2. Implications for the Subverted Flow........................ 53
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion
Control.................................................... 54
20. The Motivation for the ECT Codepoints........................ 54
20.1. The Motivation for an ECT Codepoint........................ 54
20.2. The Motivation for two ECT Codepoints...................... 55
21. Why use Two Bits in the IP Header?........................... 57
22. Historical Definitions for the IPv4 TOS Octet................ 58
23. IANA Considerations.......................................... 60
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet................. 60
23.2. TCP Header Flags........................................... 61
23.3. IPSEC Security Association Attributes....................... 62
24. Authors' Addresses........................................... 62
25. Full Copyright Statement..................................... 63
1. Introduction
We begin by describing TCP's use of packet drops as an indication of
congestion. Next we explain that with the addition of active queue
management (e.g., RED) to the Internet infrastructure, where routers
detect congestion before the queue overflows, routers are no longer
limited to packet drops as an indication of congestion. Routers can
instead set the Congestion Experienced (CE) codepoint in the IP
header of packets from ECN-capable transports. We describe when the
CE codepoint is to be set in routers, and describe modifications
needed to TCP to make it ECN-capable. Modifications to other
transport protocols (e.g., unreliable unicast or multicast, reliable
multicast, other reliable unicast transport protocols) could be
considered as those protocols are developed and advance through the
standards process. We also describe in this document the issues
involving the use of ECN within IP tunnels, and within IPsec tunnels
in particular.

One of the guiding principles for this document is that, to the
extent possible, the mechanisms specified here be incrementally
deployable. One challenge to the principle of incremental deployment
has been the prior existence of some IP tunnels that were not
compatible with the use of ECN. As ECN becomes deployed,
non-compatible IP tunnels will have to be upgraded to conform to this
document.

This document obsoletes RFC 2481, "A Proposal to add Explicit
Congestion Notification (ECN) to IP", which defined ECN as an
Experimental Protocol for the Internet Community. This document also
updates RFC 2474, "Definition of the Differentiated Services Field
(DS Field) in the IPv4 and IPv6 Headers", by defining the ECN field
in the IP header; RFC 2401, "Security Architecture for the Internet
Protocol", by changing the handling of the IPv4 TOS Byte and IPv6
Traffic Class Octet in tunnel mode header construction to be
compatible with the use of ECN; and RFC 793, "Transmission Control
Protocol", by defining two new flags in the TCP header.

TCP's congestion control and avoidance algorithms are based on the
notion that the network is a black-box [Jacobson88, Jacobson90]. The
network's state of congestion or otherwise is determined by
end-systems probing for the network state, by gradually increasing
the load on the network (by increasing the window of packets that are
outstanding in the network) until the network becomes congested and a
packet is lost. Treating the network as a "black-box" and treating
loss as an indication of congestion in the network is appropriate for
pure best-effort data carried by TCP, with little or no sensitivity

[...]

gradually increasing the window size until it experiences a dropped
packet, this causes the queues at the bottleneck router to build up.
With most packet drop policies at the router that are not sensitive
to the load placed by each individual flow (e.g., tail-drop on queue
overflow), this means that some of the packets of latency-sensitive
flows may be dropped. In addition, such drop policies lead to
synchronization of loss across multiple flows.

Active queue management mechanisms detect congestion before the queue
overflows, and provide an indication of this congestion to the end
nodes. Thus, active queue management can reduce unnecessary queuing
delay for all traffic sharing that queue. The advantages of active
queue management are discussed in RFC 2309 [RFC2309]. Active queue
management avoids some of the bad properties of dropping on queue
overflow, including the undesirable synchronization of loss across
multiple flows. More importantly, active queue management means that
transport protocols with mechanisms for congestion control (e.g.,
TCP) do not have to rely on buffer overflow as the only indication of
congestion.

Active queue management mechanisms may use one of several methods for
indicating congestion to end-nodes. One is to use packet drops, as is
currently done. However, active queue management allows the router to
separate policies of queuing or dropping packets from the policies
for indicating congestion. Thus, active queue management allows
routers to use the Congestion Experienced (CE) codepoint in a packet
header as an indication of congestion, instead of relying solely on
packet drops. This has the potential of reducing the impact of loss
on latency-sensitive flows.
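
As a rough illustration of separating the policy for detecting
congestion from the policy for signaling it, the Python sketch below
loosely follows a simplified form of RED [FJ93]: detection uses an
exponentially weighted average of the queue length, and signaling
marks ECN-capable packets with CE while dropping all others. The class
name and all thresholds, weights, and limits are hypothetical values
chosen for illustration; neither this document nor RFC 2309 specifies
them, and full RED has refinements this sketch omits.

```python
import random

# Hypothetical parameters, for illustration only.
MIN_TH = 5        # average queue (packets) where early signaling starts
MAX_TH = 15       # average queue at or above which every packet is signaled
MAX_P = 0.1       # signaling probability as the average approaches MAX_TH
WEIGHT = 0.002    # EWMA weight for the average queue length
LIMIT = 30        # physical queue capacity in packets

# Two-bit ECN field codepoints.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class RedQueue:
    def __init__(self, rng=None):
        self.queue = []
        self.avg = 0.0
        self.rng = rng or random.Random()

    def _congestion_detected(self) -> bool:
        # Detection policy: based on the *average* queue, so short
        # bursts do not trigger a congestion signal.
        if self.avg < MIN_TH:
            return False
        if self.avg >= MAX_TH:
            return True
        p = MAX_P * (self.avg - MIN_TH) / (MAX_TH - MIN_TH)
        return self.rng.random() < p

    def enqueue(self, ecn: int) -> str:
        """Return 'drop', 'mark', or 'enqueue' for a packet with ECN field `ecn`."""
        self.avg = (1 - WEIGHT) * self.avg + WEIGHT * len(self.queue)
        if len(self.queue) >= LIMIT:
            return "drop"            # full queue: drop, with or without ECN
        if self._congestion_detected():
            # Signaling policy, independent of detection: ECN-capable
            # packets are marked with CE instead of being dropped.
            if ecn in (ECT_0, ECT_1, CE):
                self.queue.append(CE)
                return "mark"
            return "drop"
        self.queue.append(ecn)
        return "enqueue"
```

A caller would invoke `enqueue()` once per arriving packet; dequeuing
and idle-period handling are left out to keep the two policies easy
to see.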

There exist some middleboxes (firewalls, load balancers, or intrusion
detection systems) in the Internet that either drop a TCP SYN packet
configured to negotiate ECN, or respond with a RST. This document
specifies procedures that TCP implementations may use to provide
robust connectivity even in the presence of such equipment.

2. Conventions and Acronyms

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].

3. Assumptions and General Principles

In this section, we describe some of the important design principles
and assumptions that guided the design choices in this proposal.

* Because ECN is likely to be adopted gradually, accommodating
  migration is essential. Some routers may still only drop packets
  to indicate congestion, and some end-systems may not be
  ECN-capable. The most viable strategy is one that accommodates
  incremental deployment without having to resort to "islands" of
  ECN-capable and non-ECN-capable environments.

* New mechanisms for congestion control and avoidance need to
  co-exist and cooperate with existing mechanisms for congestion
  control. In particular, new mechanisms have to co-exist with
  TCP's current methods of adapting to congestion and with
  routers' current practice of dropping packets in periods of
  congestion.

* Congestion may persist over different time-scales. The time
  scales that we are concerned with are congestion events that may
  last longer than a round-trip time.

* The number of packets in an individual flow (e.g., TCP
  connection or an exchange using UDP) may range from a small
  number of packets to quite a large number. We are interested in
  managing the congestion caused by flows that send enough packets
  so that they are still active when network feedback reaches
  them.

* Asymmetric routing is likely to be a normal occurrence in the
  Internet. The path (sequence of links and routers) followed by
  data packets may be different from the path followed by the
  acknowledgment packets in the reverse direction.

* Many routers process the "regular" headers in IP packets more
  efficiently than they process the header information in IP
  options. This suggests keeping congestion experienced
  information in the regular headers of an IP packet.

* It must be recognized that not all end-systems will cooperate in
  mechanisms for congestion control. However, new mechanisms
  shouldn't make it easier for TCP applications to disable TCP
  congestion control. The benefit of lying about participating in
  new mechanisms such as ECN-capability should be small.

4. Active Queue Management (AQM)

Random Early Detection (RED) is one mechanism for Active Queue
Management (AQM) that has been proposed to detect incipient
congestion [FJ93], and is currently being deployed in the Internet
[RFC2309]. AQM is meant to be a general mechanism using one of
several alternatives for congestion indication, but in the absence of
ECN, AQM is restricted to using packet drops as a mechanism for
congestion indication. AQM drops packets based on the average queue

[...]

particular, this document does not address mechanisms for TCP
end-nodes to differentiate between the ECT(0) and ECT(1) codepoints.
Protocols and senders that only require a single ECT codepoint SHOULD
use ECT(0).

The not-ECT codepoint '00' indicates a packet that is not using ECN.
The CE codepoint '11' is set by a router to indicate congestion to
the end nodes. Routers that have a packet arriving at a full queue
drop the packet, just as they do in the absence of ECN.

      +-----+-----+
      | ECN FIELD |
      +-----+-----+
        ECT   CE         [Obsolete] RFC 2481 names for the ECN bits.
         0     0         Not-ECT
         0     1         ECT(1)
         1     0         ECT(0)
         1     1         CE

        Figure 1: The ECN Field in IP.
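
The codepoints in Figure 1 and the router behavior described above
can be sketched as follows. This is a minimal Python illustration,
assuming the ECN field has already been extracted from the IP header
as a two-bit integer. `signal_congestion` is a hypothetical helper
that a router would call only after it has already decided to
indicate congestion for a given packet.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The four codepoints of the two-bit ECN field (Figure 1)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def is_ecn_capable(field: int) -> bool:
    """ECT(0), ECT(1), and CE all come from an ECN-capable transport."""
    return ECN(field) != ECN.NOT_ECT


def signal_congestion(field: int, queue_full: bool):
    """Return the ECN field to forward the packet with, or None to drop it.

    Hypothetical helper modeling only the rules stated in this section.
    """
    if queue_full:
        return None          # full queue: drop, just as without ECN
    if is_ecn_capable(field):
        return ECN.CE        # overwrites ECT(0) or ECT(1) with '11'
    return None              # Not-ECT: the only signal left is a drop
```

Note that setting CE overwrites whichever ECT codepoint the sender
chose, which is why marking necessarily erases any information
carried in the choice between ECT(0) and ECT(1).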

The use of two ECT codepoints essentially gives a one-bit ECN nonce
in packet headers, and routers necessarily "erase" the nonce when
they set the CE codepoint [SCWA99]. Consequently, routers that erased
the CE codepoint would face additional difficulty in reconstructing
the original nonce, and thus repeated erasure of the CE codepoint
would be more likely to be detected by the end-nodes. The ECN nonce
can also address the problem of misbehaving transport receivers lying
to the transport sender about whether or not the CE codepoint was set
in a packet. The motivations for the use of two ECT codepoints are
discussed in more detail in Section 20, along with some discussion of
alternate possibilities for the fourth ECN codepoint (that is, the
codepoint '01'). Backwards compatibility with earlier ECN
implementations that do not understand the ECT(1) codepoint is
discussed in Section 11.

In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable
Transport (ECT) bit and the CE bit. The ECN field with only the
ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the
ECT(0) codepoint in this document, and the ECN field with both the
ECT and CE bits set in RFC 2481 corresponds to the CE codepoint in
this document. The '01' codepoint was left undefined in RFC 2481, and
this is the reason for recommending the use of ECT(0) when only a
single ECT codepoint is needed.
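
The correspondence between the RFC 2481 bits and the codepoints of
this document can be checked mechanically. In the Python sketch below,
the old ECT bit is the high-order bit of the field (bit 6) and the old
CE bit is the low-order bit (bit 7), as in Figure 1. The function
`legacy_sees_ect` is hypothetical: it assumes an older implementation
that tests only the RFC 2481 ECT bit, and illustrates why such an
implementation would not recognize ECT(1) as ECN-capable.

```python
def from_rfc2481_bits(ect: int, ce: int) -> int:
    """Combine the RFC 2481 ECT and CE bits into a two-bit ECN field."""
    return (ect << 1) | ce


def legacy_sees_ect(field: int) -> bool:
    """Hypothetical RFC 2481-era check that examines only the old ECT bit."""
    return bool(field & 0b10)


NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

# Print every RFC 2481 bit combination with its codepoint in this document.
for ect in (0, 1):
    for ce in (0, 1):
        field = from_rfc2481_bits(ect, ce)
        print(f"ECT={ect} CE={ce} -> {field:02b} {NAMES[field]:8} "
              f"legacy ECT bit set: {legacy_sees_ect(field)}")
```

Running the loop shows that the ECT=0, CE=1 combination maps to '01',
ECT(1), and is the only ECN-capable codepoint whose legacy ECT bit is
clear.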

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |          DS FIELD, DSCP           | ECN FIELD |
      +-----+-----+-----+-----+-----+-----+-----+-----+

        DSCP: differentiated services codepoint
        ECN:  Explicit Congestion Notification

        Figure 2: The Differentiated Services and ECN Fields in IP.

Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6,
and the ECN field is defined identically in both cases. The
definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic
Class octet have been superseded by the six-bit DS (Differentiated
Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in
[RFC2474] as Currently Unused, and are specified in RFC 2780 as
approved for experimental use for ECN. Section 22 gives a brief
history of the TOS octet.
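
Because bit 0 is the most significant bit in Figure 2, bits 6 and 7
are the two low-order bits of the octet, so the ECN field can be read
and rewritten with simple masks while leaving the DSCP untouched. The
Python sketch below assumes the TOS or Traffic Class octet is
available as an integer from 0 to 255; the function and constant
names are illustrative only.

```python
DSCP_SHIFT = 2     # the DSCP occupies the six high-order bits (0-5)
ECN_MASK = 0b11    # the ECN field occupies bits 6 and 7 (low-order)


def split_tos(tos: int):
    """Split a TOS / Traffic Class octet into (DSCP, ECN field)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("octet out of range")
    return tos >> DSCP_SHIFT, tos & ECN_MASK


def with_ecn(tos: int, ecn: int) -> int:
    """Return `tos` with its ECN field replaced; the DSCP is preserved."""
    return (tos & ~ECN_MASK & 0xFF) | (ecn & ECN_MASK)
```

For example, an octet carrying DSCP 46 and Not-ECT is 0xB8;
`with_ecn(0xB8, 0b10)` yields 0xBA (ECT(0)), and `split_tos(0xBA)`
returns `(46, 2)`.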

Because of the unstable history of the TOS octet, the use of the ECN
field as specified in this document cannot be guaranteed to be
backwards compatible with those past uses of these two bits that
pre-date ECN. The potential dangers of this lack of backwards
compatibility are discussed in Section 22.

Upon the receipt by an ECN-Capable transport of a single CE packet,
the congestion control algorithms followed at the end-systems MUST be
essentially the same as the congestion control response to a *single*
dropped packet. For example, for ECN-Capable TCP the source TCP is
required to halve its congestion window for any window of data
containing either a packet drop or an ECN indication.
One reason for requiring that the congestion-control response to the One reason for requiring that the congestion-control response to the
skipping to change at page 13, line 7 skipping to change at page 12, line 4
development to allow end-nodes to interpret packet drops as development to allow end-nodes to interpret packet drops as
indications of corruption rather than congestion. indications of corruption rather than congestion.
5.3. Fragmentation 5.3. Fragmentation
ECN-capable packets MAY have the DF (Don't Fragment) bit set. ECN-capable packets MAY have the DF (Don't Fragment) bit set.
Reassembly of a fragmented packet MUST NOT lose indications of Reassembly of a fragmented packet MUST NOT lose indications of
congestion. In other words, if any fragment of an IP packet to be congestion. In other words, if any fragment of an IP packet to be
reassembled has the CE codepoint set, then one of two actions MUST be reassembled has the CE codepoint set, then one of two actions MUST be
taken: taken:
* Set the CE codepoint on the reassembled packet. However, this * Set the CE codepoint on the reassembled packet. However, this
MUST NOT occur if any of the other fragments contributing to this MUST NOT occur if any of the other fragments contributing to
reassembly carries the Not-ECT codepoint. this reassembly carries the Not-ECT codepoint.
* The packet is dropped, instead of being reassembled, for any * The packet is dropped, instead of being reassembled, for any
other reason. other reason.
If both actions are applicable, either MAY be chosen. Reassembly of If both actions are applicable, either MAY be chosen. Reassembly of
a fragmented packet MUST NOT change the ECN codepoint when all of the a fragmented packet MUST NOT change the ECN codepoint when all of the
fragments carry the same codepoint. fragments carry the same codepoint.
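The reassembly rules above can be sketched as a function over the fragments' ECN codepoints. This is an illustration under stated assumptions, not a definitive implementation. When CE is present and no fragment is Not-ECT, the text allows either action, and this sketch chooses to set CE. For mixtures without CE, the text places no requirement, so the sketch raises an error and leaves that policy to the (hypothetical) caller. The function name is hypothetical.

```python
from typing import Optional

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassembled_ecn(fragment_codepoints: list[int]) -> Optional[int]:
    """Return the ECN codepoint for the reassembled packet, or None to drop it."""
    seen = set(fragment_codepoints)
    if not seen:
        raise ValueError("no fragments")
    if len(seen) == 1:
        # All fragments agree: reassembly MUST NOT change the codepoint.
        return seen.pop()
    if CE in seen:
        if NOT_ECT in seen:
            # CE MUST NOT be set when a Not-ECT fragment contributed,
            # so drop the packet rather than lose the congestion indication.
            return None
        # Dropping would also be permitted here; this sketch marks instead.
        return CE
    # Mixed ECT(0)/ECT(1)/Not-ECT without CE: no requirement is specified.
    raise ValueError("mixed non-CE codepoints: behaviour unspecified")
```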
We would note that because RFC 2481 did not specify reassembly We would note that because RFC 2481 did not specify reassembly
behavior, older ECN implementations conformant with that Experimental behavior, older ECN implementations conformant with that Experimental
RFC do not necessarily perform reassembly correctly, in terms of RFC do not necessarily perform reassembly correctly, in terms of
preserving the CE codepoint in a fragment. The sender could avoid preserving the CE codepoint in a fragment. The sender could avoid
the consequences of this behavior by setting the DF bit in ECN- the consequences of this behavior by setting the DF bit in ECN-
Capable packets. Capable packets.
Situations may arise in which the above reassembly specification is Situations may arise in which the above reassembly specification is
insufficiently precise. For example, if there is a malicious or insufficiently precise. For example, if there is a malicious or
broken entity in the path at or after the fragmentation point, packet broken entity in the path at or after the fragmentation point, packet
fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT
codepoints. The reassembly specification above does not place codepoints. The reassembly specification above does not place
requirements on reassembly of fragments in this case. In situations requirements on reassembly of fragments in this case. In situations
where more precise reassembly behavior would be required, protocol where more precise reassembly behavior would be required, protocol
specifications SHOULD instead specify that DF MUST be set in all ECN- specifications SHOULD instead specify that DF MUST be set in all
capable packets sent by the protocol. ECN-capable packets sent by the protocol.
6. Support from the Transport Protocol 6. Support from the Transport Protocol
ECN requires support from the transport protocol, in addition to the ECN requires support from the transport protocol, in addition to the
functionality given by the ECN field in the IP packet header. The functionality given by the ECN field in the IP packet header. The
transport protocol might require negotiation between the endpoints transport protocol might require negotiation between the endpoints
during setup to determine that all of the endpoints are ECN-capable, during setup to determine that all of the endpoints are ECN-capable,
so that the sender can set the ECT codepoint in transmitted packets. so that the sender can set the ECT codepoint in transmitted packets.
Second, the transport protocol must be capable of reacting Second, the transport protocol must be capable of reacting
appropriately to the receipt of CE packets. This reaction could be appropriately to the receipt of CE packets. This reaction could be
skipping to change at page 14, line 36 skipping to change at page 13, line 36
This proposal specifies two new flags in the Reserved field of the This proposal specifies two new flags in the Reserved field of the
TCP header. The TCP mechanism for negotiating ECN-Capability uses TCP header. The TCP mechanism for negotiating ECN-Capability uses
the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved
field of the TCP header is designated as the ECN-Echo flag. The field of the TCP header is designated as the ECN-Echo flag. The
location of the 6-bit Reserved field in the TCP header is shown in location of the 6-bit Reserved field in the TCP header is shown in
Figure 4 of RFC 793 [RFC793] (and is reproduced below for Figure 4 of RFC 793 [RFC793] (and is reproduced below for
completeness). This specification of the ECN Field leaves the completeness). This specification of the ECN Field leaves the
Reserved field as a 4-bit field using bits 4-7. Reserved field as a 4-bit field using bits 4-7.
To enable the TCP receiver to determine when to stop setting the ECN- To enable the TCP receiver to determine when to stop setting the
Echo flag, we introduce a second new flag in the TCP header, the CWR ECN-Echo flag, we introduce a second new flag in the TCP header, the
flag. The CWR flag is assigned to Bit 8 in the Reserved field of the CWR flag. The CWR flag is assigned to Bit 8 in the Reserved field of
TCP header. the TCP header.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | U | A | P | R | S | F | | | | U | A | P | R | S | F |
| Header Length | Reserved | R | C | S | S | Y | I | | Header Length | Reserved | R | C | S | S | Y | I |
| | | G | K | H | T | N | N | | | | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 3: The old definition of bytes 13 and 14 of the TCP Figure 3: The old definition of bytes 13 and 14 of the TCP
header. header.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | C | E | U | A | P | R | S | F | | | | C | E | U | A | P | R | S | F |
| Header Length | Reserved | W | C | R | C | S | S | Y | I | | Header Length | Reserved | W | C | R | C | S | S | Y | I |
| | | R | E | G | K | H | T | N | N | | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 4: The new definition of bytes 13 and 14 of the TCP Figure 4: The new definition of bytes 13 and 14 of the TCP
Header. Header.
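To make Figure 4 concrete, the following Python sketch decodes bytes 13 and 14 of a TCP header, which are zero-based offsets 12 and 13. As in the figure, bit 0 is taken as the most significant bit of the 16-bit word, so CWR (bit 8) is mask 0x80 and ECE (bit 9) is mask 0x40 in the second byte. The function name is hypothetical. In the usage below, the header carries SYN, ECE and CWR, which is what RFC 3168 calls an ECN-setup SYN.

```python
import struct

# Masks within the 16-bit word formed by TCP header bytes 13-14 (Figure 4).
FLAG_BITS = {
    "CWR": 1 << (15 - 8),
    "ECE": 1 << (15 - 9),
    "URG": 1 << (15 - 10),
    "ACK": 1 << (15 - 11),
    "PSH": 1 << (15 - 12),
    "RST": 1 << (15 - 13),
    "SYN": 1 << (15 - 14),
    "FIN": 1 << (15 - 15),
}


def parse_offset_and_flags(tcp_header: bytes) -> tuple[int, int, set[str]]:
    """Return (header length in 32-bit words, reserved bits, names of set flags)."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)  # bytes 13-14, 1-indexed
    header_len = word >> 12          # bits 0-3
    reserved = (word >> 8) & 0x0F    # bits 4-7, still reserved
    flags = {name for name, mask in FLAG_BITS.items() if word & mask}
    return header_len, reserved, flags


# A 20-byte header whose flags byte is 0xC2: CWR + ECE + SYN.
header = bytes(12) + bytes([0x50, 0xC2]) + bytes(6)
print(parse_offset_and_flags(header))
```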
Thus, ECN uses the ECT and CE flags in the IP header (as shown in Thus, ECN uses the ECT and CE flags in the IP header (as shown in
Figure 1) for signaling between routers and connection endpoints, and Figure 1) for signaling between routers and connection endpoints, and
uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection, 4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection,
a typical sequence of events in an ECN-based reaction to congestion a typical sequence of events in an ECN-based reaction to congestion
is as follows: is as follows:
* An ECT codepoint is set in packets transmitted by the sender to * An ECT codepoint is set in packets transmitted by the sender to
indicate that ECN is supported by the transport entities for these indicate that ECN is supported by the transport entities for
packets. these packets.
* An ECN-capable router detects impending congestion and detects * An ECN-capable router detects impending congestion and detects
that an ECT codepoint is set in the packet it is about to drop. that an ECT codepoint is set in the packet it is about to drop.
Instead of dropping the packet, the router chooses to set the CE Instead of dropping the packet, the router chooses to set the CE
codepoint in the IP header and forwards the packet. codepoint in the IP header and forwards the packet.
* The receiver receives the packet with the CE codepoint set, and * The receiver receives the packet with the CE codepoint set, and
sets the ECN-Echo flag in its next TCP ACK sent to the sender. sets the ECN-Echo flag in its next TCP ACK sent to the sender.
* The sender receives the TCP ACK with ECN-Echo set, and reacts to * The sender receives the TCP ACK with ECN-Echo set, and reacts to
the congestion as if a packet had been dropped. the congestion as if a packet had been dropped.
* The sender sets the CWR flag in the TCP header of the next * The sender sets the CWR flag in the TCP header of the next
packet sent to the receiver to acknowledge its receipt of and packet sent to the receiver to acknowledge its receipt of and
reaction to the ECN-Echo flag. reaction to the ECN-Echo flag.
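The router step in the sequence above can be sketched as a per-packet decision. The boolean `would_drop` is a hypothetical stand-in for the router's active queue management verdict, for example an early-drop decision. The sketch does not model a full queue, where a router has no choice but to drop. Following RFC 3168, a packet that already carries CE is forwarded unchanged.

```python
from enum import IntEnum


class Ecn(IntEnum):
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def router_decision(ecn: Ecn, would_drop: bool) -> tuple[str, Ecn]:
    """Return ("forward", codepoint) or ("drop", codepoint) for one packet."""
    if not would_drop:
        return "forward", ecn
    if ecn is not Ecn.NOT_ECT:
        # ECT(0)/ECT(1): set CE instead of dropping; CE stays CE.
        return "forward", Ecn.CE
    # Not-ECT packets can only be signalled by dropping them.
    return "drop", ecn
```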
The negotiation for using ECN by the TCP transport entities and the The negotiation for using ECN by the TCP transport entities and the
use of the ECN-Echo and CWR flags is described in more detail in the use of the ECN-Echo and CWR flags is described in more detail in the
sections below. sections below.
6.1.1 TCP Initialization 6.1.1 TCP Initialization
In the TCP connection setup phase, the source and destination TCPs In the TCP connection setup phase, the source and destination TCPs
exchange information about their willingness to use ECN. Subsequent exchange information about their willingness to use ECN. Subsequent
to the completion of this negotiation, the TCP sender sets an ECT to the completion of this negotiation, the TCP sender sets an ECT
skipping to change at page 16, line 44 skipping to change at page 15, line 49
but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an
indication that the TCP transmitting the SYN-ACK packet is ECN- indication that the TCP transmitting the SYN-ACK packet is ECN-
Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does
not commit the TCP host to setting the ECT codepoint in transmitted not commit the TCP host to setting the ECT codepoint in transmitted
packets. packets.
The following rules apply to the sending of ECN-setup packets within The following rules apply to the sending of ECN-setup packets within
a TCP connection, where a TCP connection is defined by the standard a TCP connection, where a TCP connection is defined by the standard
rules for TCP connection establishment and termination. rules for TCP connection establishment and termination.
* If a host has received an ECN-setup SYN packet, then it MAY send an * If a host has received an ECN-setup SYN packet, then it MAY send
ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an
SYN-ACK packet. ECN-setup SYN-ACK packet.
* A host MUST NOT set ECT on data packets unless it has sent at least
one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at
least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no
non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has
received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK
packet, then it SHOULD NOT set ECT on data packets.
* If a host ever sets the ECT codepoint on a data packet, then that * A host MUST NOT set ECT on data packets unless it has sent at
host MUST correctly set/clear the CWR TCP bit on all subsequent least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has
packets in the connection. received at least one ECN-setup SYN or ECN-setup SYN-ACK packet,
* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK
packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN- packet. If a host has received at least one non-ECN-setup SYN
ACK packet, then if that host receives TCP data packets with ECT and or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on
CE codepoints set in the IP header, then that host MUST process these data packets.
packets as specified for an ECN-capable connection.
* A host that is not willing to use ECN on a TCP connection SHOULD * If a host ever sets the ECT codepoint on a data packet, then
clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN- that host MUST correctly set/clear the CWR TCP bit on all
ACK packets that it sends to indicate this unwillingness. Receivers subsequent packets in the connection.
MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK
packets. * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-
* A host MUST NOT set ECT on SYN or SYN-ACK packets. ACK packet, and has received no non-ECN-setup SYN or non-ECN-
setup SYN-ACK packet, then if that host receives TCP data
packets with ECT and CE codepoints set in the IP header, then
that host MUST process these packets as specified for an ECN-
capable connection.
* A host that is not willing to use ECN on a TCP connection SHOULD
clear both the ECE and CWR flags in all non-ECN-setup SYN and/or
SYN-ACK packets that it sends to indicate this unwillingness.
Receivers MUST correctly handle all forms of the non-ECN-setup
SYN and SYN-ACK packets.
* A host MUST NOT set ECT on SYN or SYN-ACK packets.
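The setup rules above reduce to four booleans per connection. The following minimal Python sketch tracks them. It uses RFC 3168's definitions: an ECN-setup SYN has ECE and CWR set, and an ECN-setup SYN-ACK has ECE set but not CWR. The sketch treats the SHOULD NOT for a received non-ECN-setup packet as a hard rule, which is a simplifying assumption. The class and method names are hypothetical.

```python
from dataclasses import dataclass

CWR, ECE, ACK, SYN = 0x80, 0x40, 0x10, 0x02  # masks within TCP header byte 14


def is_ecn_setup(flags: int) -> bool:
    """ECN-setup SYN: SYN+ECE+CWR. ECN-setup SYN-ACK: SYN+ACK+ECE without CWR."""
    bits = flags & (SYN | ACK | ECE | CWR)
    return bits in (SYN | ECE | CWR, SYN | ACK | ECE)


@dataclass
class EcnNegotiation:
    sent_setup: bool = False
    received_setup: bool = False
    sent_non_setup: bool = False
    received_non_setup: bool = False

    def on_syn(self, flags: int, sent: bool) -> None:
        """Record a SYN or SYN-ACK that this host sent (sent=True) or received."""
        if not flags & SYN:
            return  # only SYN and SYN-ACK packets take part in the negotiation
        setup = is_ecn_setup(flags)
        if sent:
            self.sent_setup |= setup
            self.sent_non_setup |= not setup
        else:
            self.received_setup |= setup
            self.received_non_setup |= not setup

    def may_set_ect(self) -> bool:
        """Whether ECT may be set on data packets (never on SYN or SYN-ACK)."""
        return (self.sent_setup and self.received_setup
                and not self.sent_non_setup        # MUST NOT
                and not self.received_non_setup)   # SHOULD NOT, enforced here
```

Note that a SYN-ACK with both ECE and CWR set is not an ECN-setup SYN-ACK under these definitions, so it counts as a non-ECN-setup packet.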
A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and
transitions to CLOSED state after a timeout. Many TCP transitions to CLOSED state after a timeout. Many TCP
implementations create a new TCP connection if they receive an in- implementations create a new TCP connection if they receive an in-
window SYN packet during TIME-WAIT state. When a TCP host enters window SYN packet during TIME-WAIT state. When a TCP host enters
TIME-WAIT or CLOSED state, it should ignore any previous state about TIME-WAIT or CLOSED state, it should ignore any previous state about
the negotiation of ECN for that connection. the negotiation of ECN for that connection.
6.1.1.1. Middlebox Issues 6.1.1.1. Middlebox Issues
ECN introduces the use of the ECN-Echo and CWR flags in the TCP ECN introduces the use of the ECN-Echo and CWR flags in the TCP
header (as shown in Figure 3) for initialization. There exist some header (as shown in Figure 3) for initialization. There exist some
faulty firewalls, load balancers, and intrusion detection systems in faulty firewalls, load balancers, and intrusion detection systems in
the Internet that either drop an ECN-setup SYN packet or respond with the Internet that either drop an ECN-setup SYN packet or respond with
a RST, in the belief that such a packet (with these bits set) is a a RST, in the belief that such a packet (with these bits set) is a
signature for a port-scanning tool that could be used in a denial-of- signature for a port-scanning tool that could be used in a denial-
service attack. Some of the offending equipment has been identified, of-service attack. Some of the offending equipment has been
and a web page [FIXES] contains a list of non-compliant products and identified, and a web page [FIXES] contains a list of non-compliant
the fixes posted by the vendors, where these are available. The TBIT products and the fixes posted by the vendors, where these are
web page [TBIT] lists some of the web servers affected by this faulty available. The TBIT web page [TBIT] lists some of the web servers
equipment. We mention this in this document as a warning to the affected by this faulty equipment. We mention this in this document
community of this problem. as a warning to the community of this problem.
To provide robust connectivity even in the presence of such faulty To provide robust connectivity even in the presence of such faulty
equipment, a host that receives a RST in response to the transmission equipment, a host that receives a RST in response to the transmission
of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared. of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared.
This could result in a TCP connection being established without using This could result in a TCP connection being established without using
ECN. ECN.
A host that receives no reply to an ECN-setup SYN within the normal A host that receives no reply to an ECN-setup SYN within the normal
SYN retransmission timeout interval MAY resend the SYN and any SYN retransmission timeout interval MAY resend the SYN and any
subsequent SYN retransmissions with CWR and ECE cleared. To overcome subsequent SYN retransmissions with CWR and ECE cleared. To overcome
normal packet loss that results in the original SYN being lost, the normal packet loss that results in the original SYN being lost, the
originating host may retransmit one or more ECN-setup SYN packets originating host may retransmit one or more ECN-setup SYN packets
before giving up and retransmitting the SYN with the CWR and ECE bits before giving up and retransmitting the SYN with the CWR and ECE bits
cleared. cleared.
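The optional fallback described above can be sketched as a choice of flags for each SYN transmission. The parameter `ecn_setup_tries` is a hypothetical tuning knob: how many ECN-setup SYNs to send before retrying with CWR and ECE cleared. In this sketch, a RST received in response to an ECN-setup SYN triggers the fallback immediately.

```python
SYN, ECE, CWR = 0x02, 0x40, 0x80  # masks within TCP header byte 14


def syn_flags(attempt: int, ecn_setup_tries: int, got_rst: bool) -> int:
    """Flags for the attempt-th SYN transmission (0 = first transmission)."""
    if got_rst or attempt >= ecn_setup_tries:
        return SYN               # plain SYN: CWR and ECE cleared
    return SYN | ECE | CWR       # ECN-setup SYN
```

Once a fallback SYN has been sent, the host has sent a non-ECN-setup SYN. Under the setup rules it therefore may not set ECT on that connection, even if a delayed ECN-setup SYN-ACK later arrives.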
We note that in this case, the following example scenario is possi- We note that in this case, the following example scenario is
ble: possible:
(1) Host A: Sends an ECN-setup SYN. (1) Host A: Sends an ECN-setup SYN.
(2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed. (2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed.
(3) Host A: Sends a non-ECN-setup SYN. (3) Host A: Sends a non-ECN-setup SYN.
(4) Host B: Sends a non-ECN-setup SYN/ACK. (4) Host B: Sends a non-ECN-setup SYN/ACK.
We note that in this case, following the procedures above, neither We note that in this case, following the procedures above, neither
Host A nor Host B may set the ECT bit on data packets. Further, an Host A nor Host B may set the ECT bit on data packets. Further, an
important consequence of the rules for ECN setup and usage in Section important consequence of the rules for ECN setup and usage in Section
6.1.1 is that a host is forbidden from using the reception of ECT 6.1.1 is that a host is forbidden from using the reception of ECT
skipping to change at page 19, line 4 skipping to change at page 18, line 14
6.1.2. The TCP Sender 6.1.2. The TCP Sender
For a TCP connection using ECN, new data packets are transmitted with For a TCP connection using ECN, new data packets are transmitted with
an ECT codepoint set in the IP header. When only one ECT codepoint an ECT codepoint set in the IP header. When only one ECT codepoint
is needed by a sender for all packets sent on a TCP connection, is needed by a sender for all packets sent on a TCP connection,
ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK
packet (that is, an ACK packet with the ECN-Echo flag set in the TCP packet (that is, an ACK packet with the ECN-Echo flag set in the TCP
header), then the sender knows that congestion was encountered in the header), then the sender knows that congestion was encountered in the
network on the path from the sender to the receiver. The indication network on the path from the sender to the receiver. The indication
of congestion should be treated just as a congestion loss in non-ECN- of congestion should be treated just as a congestion loss in non-
Capable TCP. That is, the TCP source halves the congestion window ECN-Capable TCP. That is, the TCP source halves the congestion window
"cwnd" and reduces the slow start threshold "ssthresh". The sending "cwnd" and reduces the slow start threshold "ssthresh". The sending
TCP SHOULD NOT increase the congestion window in response to the TCP SHOULD NOT increase the congestion window in response to the
receipt of an ECN-Echo ACK packet. receipt of an ECN-Echo ACK packet.
TCP should not react to congestion indications more than once every TCP should not react to congestion indications more than once every
window of data (or more loosely, more than once every round-trip window of data (or more loosely, more than once every round-trip
time). That is, the TCP sender's congestion window should be reduced time). That is, the TCP sender's congestion window should be reduced
only once in response to a series of dropped and/or CE packets from a only once in response to a series of dropped and/or CE packets from a
single window of data. In addition, the TCP source should not single window of data. In addition, the TCP source should not
decrease the slow-start threshold, ssthresh, if it has been decreased decrease the slow-start threshold, ssthresh, if it has been decreased
within the last round trip time. However, if any retransmitted within the last round trip time. However, if any retransmitted
packets are dropped, then this is interpreted by the source TCP as a packets are dropped, then this is interpreted by the source TCP as a
new instance of congestion. new instance of congestion.
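A minimal Python sketch of the sender behavior described above follows, with all quantities in bytes. It assumes the following (all hypothetical): a `recover` marker holds `snd_nxt` at the last reduction, and ECN-Echo ACKs that do not acknowledge beyond it are treated as belonging to the same window. It also sets `ssthresh` from cwnd rather than from FlightSize, for simplicity. Finally, it does not model the further rate reduction required when cwnd is already one MSS. The CWR flag goes on the first new data segment after a reduction.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    mss: int
    cwnd: int
    ssthresh: int
    snd_nxt: int = 0          # next new sequence number to send
    recover: int = -1         # snd_nxt at the last window reduction
    cwr_pending: bool = False

    def on_ack(self, ack: int, ece: bool) -> None:
        if ece:
            # React at most once per window of data.
            if ack > self.recover:
                self.ssthresh = max(self.cwnd // 2, 2 * self.mss)
                self.cwnd = max(self.cwnd // 2, self.mss)  # bounded below by 1 MSS
                self.recover = self.snd_nxt
                self.cwr_pending = True
            return  # never grow cwnd in response to an ECN-Echo ACK
        if self.cwnd < self.ssthresh:
            self.cwnd += self.mss                                  # slow start
        else:
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)  # congestion avoidance

    def send_new_data(self, length: int) -> bool:
        """Advance snd_nxt; return True if this segment must carry CWR."""
        self.snd_nxt += length
        cwr, self.cwr_pending = self.cwr_pending, False
        return cwr
```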
After the source TCP reduces its congestion window in response to a After the source TCP reduces its congestion window in response to a
CE packet, incoming acknowledgements that continue to arrive can CE packet, incoming acknowledgments that continue to arrive can
"clock out" outgoing packets as allowed by the reduced congestion "clock out" outgoing packets as allowed by the reduced congestion
window. If the congestion window consists of only one MSS (maximum window. If the congestion window consists of only one MSS (maximum
segment size), and the sending TCP receives an ECN-Echo ACK packet, segment size), and the sending TCP receives an ECN-Echo ACK packet,
then the sending TCP should in principle still reduce its congestion then the sending TCP should in principle still reduce its congestion
window in half. However, the value of the congestion window is window in half. However, the value of the congestion window is
bounded below by a value of one MSS. If the sending TCP were to bounded below by a value of one MSS. If the sending TCP were to
continue to send, using a congestion window of 1 MSS, this results in continue to send, using a congestion window of 1 MSS, this results in
the transmission of one packet per round-trip time. It is necessary the transmission of one packet per round-trip time. It is necessary
to still reduce the sending rate of the TCP sender even further, on to still reduce the sending rate of the TCP sender even further, on
receipt of an ECN-Echo packet when the congestion window is one. We receipt of an ECN-Echo packet when the congestion window is one. We
skipping to change at page 20, line 17 skipping to change at page 19, line 26
new data packet that it transmits. new data packet that it transmits.
[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98]
discusses the validation test in the ns simulator, which illustrates discusses the validation test in the ns simulator, which illustrates
a wide range of ECN scenarios. These scenarios include the following: a wide range of ECN scenarios. These scenarios include the following:
an ECN followed by another ECN, a Fast Retransmit, or a Retransmit an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
Timeout; a Retransmit Timeout or a Fast Retransmit followed by an Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
ECN; and a congestion window of one packet followed by an ECN. ECN; and a congestion window of one packet followed by an ECN.
TCP follows existing algorithms for sending data packets in response TCP follows existing algorithms for sending data packets in response
to incoming ACKs, multiple duplicate acknowledgements, or retransmit to incoming ACKs, multiple duplicate acknowledgments, or retransmit
timeouts [RFC2581]. TCP also follows the normal procedures for timeouts [RFC2581]. TCP also follows the normal procedures for
increasing the congestion window when it receives ACK packets without increasing the congestion window when it receives ACK packets without
the ECN-Echo bit set [RFC2581]. the ECN-Echo bit set [RFC2581].
6.1.3. The TCP Receiver 6.1.3. The TCP Receiver
When TCP receives a CE data packet at the destination end-system, the When TCP receives a CE data packet at the destination end-system, the
TCP data receiver sets the ECN-Echo flag in the TCP header of the TCP data receiver sets the ECN-Echo flag in the TCP header of the
subsequent ACK packet. If there is any ACK withholding implemented, subsequent ACK packet. If there is any ACK withholding implemented,
as in current "delayed-ACK" TCP implementations where the TCP as in current "delayed-ACK" TCP implementations where the TCP
receiver can send an ACK for two arriving data packets, then the ECN- receiver can send an ACK for two arriving data packets, then the
Echo flag in the ACK packet will be set to '1' if the CE codepoint is ECN-Echo flag in the ACK packet will be set to '1' if the CE
set in any of the data packets being acknowledged. That is, if any codepoint is set in any of the data packets being acknowledged. That
of the received data packets are CE packets, then the returning ACK is, if any of the received data packets are CE packets, then the
has the ECN-Echo flag set. returning ACK has the ECN-Echo flag set.
To provide robustness against the possibility of a dropped ACK packet To provide robustness against the possibility of a dropped ACK packet
carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
a series of ACK packets sent subsequently. The TCP receiver uses the a series of ACK packets sent subsequently. The TCP receiver uses the
CWR flag received from the TCP sender to determine when to stop CWR flag received from the TCP sender to determine when to stop
setting the ECN-Echo flag. setting the ECN-Echo flag.
After a TCP receiver sends an ACK packet with the ECN-Echo bit set, After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
that TCP receiver continues to set the ECN-Echo flag in all the ACK that TCP receiver continues to set the ECN-Echo flag in all the ACK
packets it sends (whether they acknowledge CE data packets or non-CE packets it sends (whether they acknowledge CE data packets or non-CE
data packets) until it receives a CWR packet (a packet with the CWR data packets) until it receives a CWR packet (a packet with the CWR
flag set). After the receipt of the CWR packet, acknowledgements for flag set). After the receipt of the CWR packet, acknowledgments for
subsequent non-CE data packets do not have the ECN-Echo flag set. If subsequent non-CE data packets do not have the ECN-Echo flag set. If
another CE packet is received by the data receiver, the receiver another CE packet is received by the data receiver, the receiver
would once again send ACK packets with the ECN-Echo flag set. While would once again send ACK packets with the ECN-Echo flag set. While
the receipt of a CWR packet does not guarantee that the data sender the receipt of a CWR packet does not guarantee that the data sender
received the ECN-Echo message, this does suggest that the data sender received the ECN-Echo message, this does suggest that the data sender
reduced its congestion window at some point *after* it sent the data reduced its congestion window at some point *after* it sent the data
packet for which the CE codepoint was set. packet for which the CE codepoint was set.
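The receiver behavior above can be sketched as two flags. One records "keep echoing until CWR arrives". The other, `ce_unacked`, ensures that a delayed ACK covering any CE packet carries ECN-Echo, even if a CWR packet arrived before that ACK was sent. The second flag is an assumption of this sketch, not text from the specification. A packet carrying both CWR and CE restarts the echo. The class and method names are hypothetical.

```python
class EcnReceiver:
    def __init__(self) -> None:
        self.echoing = False     # keep setting ECE until a CWR packet arrives
        self.ce_unacked = False  # a CE packet is covered by the next ACK

    def on_data(self, ce: bool, cwr: bool) -> None:
        if cwr:
            self.echoing = False  # the sender has signalled its window reduction
        if ce:
            self.echoing = True   # new congestion (re)starts the echo
            self.ce_unacked = True

    def build_ack(self) -> bool:
        """Return the ECN-Echo flag for the next (possibly delayed) ACK."""
        ece = self.echoing or self.ce_unacked
        self.ce_unacked = False
        return ece
```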
We have already specified that a TCP sender is not required to reduce We have already specified that a TCP sender is not required to reduce
its congestion window more than once per window of data. Some care its congestion window more than once per window of data. Some care
skipping to change at page 24, line 44 skipping to change at page 24, line 11
the CE codepoint set at that router itself; in such an environment, the CE codepoint set at that router itself; in such an environment,
routers could also take note of arriving CE packets that indicate routers could also take note of arriving CE packets that indicate
congestion encountered by that packet earlier in the path. congestion encountered by that packet earlier in the path.
8. Non-compliance in the Network 8. Non-compliance in the Network
This section considers the issues when a router is operating, This section considers the issues when a router is operating,
possibly maliciously, to modify either of the bits in the ECN field. possibly maliciously, to modify either of the bits in the ECN field.
We note that in IPv4, the IP header is protected from bit errors by a We note that in IPv4, the IP header is protected from bit errors by a
header checksum; this is not the case in IPv6. Thus for IPv6 the header checksum; this is not the case in IPv6. Thus for IPv6 the
ECN field can be accidentially modified by bit errors on links or in ECN field can be accidentally modified by bit errors on links or in
routers without being detected by an IP header checksum. routers without being detected by an IP header checksum.
By tampering with the bits in the ECN field, an adversary (or a By tampering with the bits in the ECN field, an adversary (or a
broken router) could do one or more of the following: falsely report broken router) could do one or more of the following: falsely report
congestion, disable ECN-Capability for an individual packet, erase congestion, disable ECN-Capability for an individual packet, erase
the ECN congestion indication, or falsely indicate ECN-Capability. the ECN congestion indication, or falsely indicate ECN-Capability.
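The four consequences listed above can be related to specific changes of the ECN field with a small classifier. This is a sketch only. A change from ECT to CE is indistinguishable on the wire from legitimate marking, so "falsely report congestion" applies only when the modifying router was not actually congested. A change between ECT(0) and ECT(1) maps to none of the four effects. The function name is hypothetical.

```python
from enum import IntEnum


class Ecn(IntEnum):
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def tampering_effects(sent: Ecn, received: Ecn) -> set[str]:
    """Map a change of the ECN field in transit to the effects listed above."""
    effects = set()
    was_capable = sent is not Ecn.NOT_ECT      # ECT(0), ECT(1) or CE
    is_capable = received is not Ecn.NOT_ECT
    if was_capable and not is_capable:
        effects.add("disable ECN-Capability")
    if not was_capable and is_capable:
        effects.add("falsely indicate ECN-Capability")
    if sent is Ecn.CE and received is not Ecn.CE:
        effects.add("erase congestion indication")
    if sent is not Ecn.CE and received is Ecn.CE:
        effects.add("falsely report congestion")
    return effects
```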
Section 18 systematically examines the various cases by which the ECN Section 18 systematically examines the various cases by which the ECN
field could be modified. The important criterion considered in field could be modified. The important criterion considered in
determining the consequences of such modifications is whether it is determining the consequences of such modifications is whether it is
likely to lead to poorer behavior in any dimension (throughput, likely to lead to poorer behavior in any dimension (throughput,
skipping to change at page 27, line 24 skipping to change at page 26, line 37
support for ECN an option for IP tunnels, so that an IP tunnel can be support for ECN an option for IP tunnels, so that an IP tunnel can be
specified or configured either to use ECN or not to use ECN in the specified or configured either to use ECN or not to use ECN in the
outer header of the tunnel. Thus, in environments or tunneling outer header of the tunnel. Thus, in environments or tunneling
protocols where the risks of using ECN are judged to outweigh its protocols where the risks of using ECN are judged to outweigh its
benefits, the tunnel can simply not use ECN in the outer header. benefits, the tunnel can simply not use ECN in the outer header.
Then the only indication of congestion experienced at routers within Then the only indication of congestion experienced at routers within
the tunnel would be through packet loss. the tunnel would be through packet loss.
The result is that there are two viable options for the behavior of The result is that there are two viable options for the behavior of
ECN-capable connections over an IP tunnel, including IPsec tunnels: ECN-capable connections over an IP tunnel, including IPsec tunnels:
* A limited-functionality option in which ECN is preserved in the * A limited-functionality option in which ECN is preserved in the
inner header, but disabled in the outer header. The only inner header, but disabled in the outer header. The only
mechanism available for signaling congestion occurring within the mechanism available for signaling congestion occurring within
tunnel in this case is dropped packets. the tunnel in this case is dropped packets.
* A full-functionality option that supports ECN in both the inner * A full-functionality option that supports ECN in both the inner
and outer headers, and propagates congestion warnings from nodes and outer headers, and propagates congestion warnings from nodes
within the tunnel to endpoints. within the tunnel to endpoints.
Support for these options requires varying amounts of changes to IP Support for these options requires varying amounts of changes to IP
header processing at tunnel ingress and egress. A small subset of header processing at tunnel ingress and egress. A small subset of
these changes sufficient to support only the limited-functionality these changes sufficient to support only the limited-functionality
option would be sufficient to eliminate any incompatibility between option would be sufficient to eliminate any incompatibility between
ECN and IP tunnels. ECN and IP tunnels.
One goal of this document is to give guidance about the tradeoffs One goal of this document is to give guidance about the tradeoffs
between the limited-functionality and full-functionality options. A between the limited-functionality and full-functionality options. A
full discussion of the potential effects of an adversary's full discussion of the potential effects of an adversary's
skipping to change at page 29, line 12 skipping to change at page 28, line 29
pre-existing agreement between the tunnel endpoints about whether to pre-existing agreement between the tunnel endpoints about whether to
support the limited-functionality or the full-functionality ECN support the limited-functionality or the full-functionality ECN
option. option.
All IP tunnels MUST implement the limited-functionality option, and All IP tunnels MUST implement the limited-functionality option, and
SHOULD support the full-functionality option. SHOULD support the full-functionality option.
In addition, it is RECOMMENDED that packets with the CE codepoint in In addition, it is RECOMMENDED that packets with the CE codepoint in
the outer header be dropped if they arrive at the tunnel egress point the outer header be dropped if they arrive at the tunnel egress point
for a tunnel that uses the limited-functionality option, or for a for a tunnel that uses the limited-functionality option, or for a
tunnel that uses the full-functionality option but for which the not- tunnel that uses the full-functionality option but for which the
ECT codepoint is set in the inner header. This is motivated by not-ECT codepoint is set in the inner header. This is motivated by
backwards compatibility and to ensure that no unauthorized backwards compatibility and to ensure that no unauthorized
modifications of the ECN field take place, and is discussed further modifications of the ECN field take place, and is discussed further
in the next Section (9.1.2). in the next Section (9.1.2).
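The RECOMMENDED egress check can be sketched as a single predicate. The option labels and function name below are hypothetical. The logic follows the paragraph above:

* On a limited-functionality tunnel, a packet whose outer header is CE is dropped.
* On a full-functionality tunnel, such a packet is dropped only when the inner header is not-ECT, since CE cannot be carried into a not-ECT inner header.

```python
from enum import IntEnum


class ECN(IntEnum):
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


LIMITED = "limited"  # hypothetical labels for the two tunnel options
FULL = "full"


def egress_should_drop(option: str, outer: ECN, inner: ECN) -> bool:
    """RECOMMENDED egress check for packets whose outer header carries CE."""
    if outer != ECN.CE:
        return False
    if option == LIMITED:
        # The limited option keeps the outer header not-ECT, so CE there is unexpected.
        return True
    if option == FULL:
        # A not-ECT inner header cannot carry CE; drop so the signal is seen as loss.
        return inner == ECN.NOT_ECT
    raise ValueError(f"unknown tunnel option: {option!r}")
```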
9.1.2. Changes to the ECN Field within an IP Tunnel. 9.1.2. Changes to the ECN Field within an IP Tunnel.
The presence of a copy of the ECN field in the inner header of an IP The presence of a copy of the ECN field in the inner header of an IP
tunnel mode packet provides an opportunity for detection of tunnel mode packet provides an opportunity for detection of
unauthorized modifications to the ECN field in the outer header. unauthorized modifications to the ECN field in the outer header.
Comparison of the ECN fields in the inner and outer headers falls Comparison of the ECN fields in the inner and outer headers falls
into two categories for implementations that conform to this into two categories for implementations that conform to this
document: document:
* If the IP tunnel uses the full-functionality option, then the * If the IP tunnel uses the full-functionality option, then the
not-ECT codepoint should be set in the outer header if and only if not-ECT codepoint should be set in the outer header if and only
it is also set in the inner header. if it is also set in the inner header.
* If the tunnel uses the limited-functionality option, then the * If the tunnel uses the limited-functionality option, then the
not-ECT codepoint should be set in the outer header. not-ECT codepoint should be set in the outer header.
Receipt of a packet not satisfying the appropriate condition could be Receipt of a packet not satisfying the appropriate condition could be
a cause of concern. a cause of concern.
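The two consistency conditions above can be checked at the egress as in this sketch. The names are hypothetical. What to do with a packet that fails the check is left open, as it is in the text.

```python
from enum import IntEnum


class ECN(IntEnum):
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


LIMITED = "limited"  # hypothetical labels for the two tunnel options
FULL = "full"


def outer_field_consistent(option: str, outer: ECN, inner: ECN) -> bool:
    """True if the outer ECN field is what a conforming ingress would have produced."""
    if option == FULL:
        # not-ECT outside if and only if not-ECT inside
        return (outer == ECN.NOT_ECT) == (inner == ECN.NOT_ECT)
    if option == LIMITED:
        # the limited option always sets not-ECT in the outer header
        return outer == ECN.NOT_ECT
    raise ValueError(f"unknown tunnel option: {option!r}")
```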
Consider the case of an IP tunnel where the tunnel ingress point has Consider the case of an IP tunnel where the tunnel ingress point has
not been updated to this document's requirements, while the tunnel not been updated to this document's requirements, while the tunnel
egress point has been updated to support ECN. In this case, the IP egress point has been updated to support ECN. In this case, the IP
tunnel is not explicitly configured to support the full-functionality tunnel is not explicitly configured to support the full-functionality
ECN option. However, the tunnel ingress point is behaving identically ECN option. However, the tunnel ingress point is behaving identically
to a tunnel ingress point that supports the full-functionality to a tunnel ingress point that supports the full-functionality
skipping to change at page 32, line 10 skipping to change at page 31, line 24
In some environments, the ability to modify the ECN field without In some environments, the ability to modify the ECN field without
affecting IPsec integrity checks may constitute a covert channel; if affecting IPsec integrity checks may constitute a covert channel; if
it is necessary to eliminate such a channel or reduce its bandwidth, it is necessary to eliminate such a channel or reduce its bandwidth,
then the IPsec tunnel should be run in limited-functionality mode. then the IPsec tunnel should be run in limited-functionality mode.
9.2.1. Negotiation between Tunnel Endpoints 9.2.1. Negotiation between Tunnel Endpoints
This section describes the detailed changes to enable usage of ECN This section describes the detailed changes to enable usage of ECN
over IPsec tunnels, including the negotiation of ECN support between over IPsec tunnels, including the negotiation of ECN support between
tunnel endpoints. This is supported by three changes to IPsec: tunnel endpoints. This is supported by three changes to IPsec:
* An optional Security Association Database (SAD) field indicating * An optional Security Association Database (SAD) field indicating
whether tunnel encapsulation and decapsulation processing allows whether tunnel encapsulation and decapsulation processing allows
or forbids ECN usage in the outer IP header. or forbids ECN usage in the outer IP header.
* An optional Security Association Attribute that enables * An optional Security Association Attribute that enables
negotiation of this SAD field between the two endpoints of an SA negotiation of this SAD field between the two endpoints of an SA
that supports tunnel mode. that supports tunnel mode.
* Changes to tunnel mode encapsulation and decapsulation * Changes to tunnel mode encapsulation and decapsulation
processing to allow or forbid ECN usage in the outer IP header processing to allow or forbid ECN usage in the outer IP header
based on the value of the SAD field. When ECN usage is allowed in based on the value of the SAD field. When ECN usage is allowed
the outer IP header, the ECT codepoint is set in the outer header in the outer IP header, the ECT codepoint is set in the outer
for ECN-capable connections and congestion notifications header for ECN-capable connections and congestion notifications
(indicated by the CE codepoint) from such connections are (indicated by the CE codepoint) from such connections are
propagated to the inner header at tunnel egress. propagated to the inner header at tunnel egress.
If negotiation of ECN usage is implemented, then the SAD field SHOULD If negotiation of ECN usage is implemented, then the SAD field SHOULD
also be implemented. On the other hand, negotiation of ECN usage is also be implemented. On the other hand, negotiation of ECN usage is
OPTIONAL in all cases, even for implementations that support the SAD OPTIONAL in all cases, even for implementations that support the SAD
field. The encapsulation and decapsulation processing changes are field. The encapsulation and decapsulation processing changes are
REQUIRED, but MAY be implemented without the other two changes by REQUIRED, but MAY be implemented without the other two changes by
assuming that ECN usage is always forbidden. The full-functionality assuming that ECN usage is always forbidden. The full-functionality
alternative for ECN usage over IPsec tunnels consists of the SAD alternative for ECN usage over IPsec tunnels consists of the SAD
field and the full version of encapsulation and decapsulation field and the full version of encapsulation and decapsulation
processing changes, with or without the OPTIONAL negotiation support. processing changes, with or without the OPTIONAL negotiation support.
skipping to change at page 33, line 35 skipping to change at page 33, line 7
The IPsec SA Attribute value 10 has been allocated by IANA to The IPsec SA Attribute value 10 has been allocated by IANA to
indicate that the ECN Tunnel SA Attribute is being negotiated; the indicate that the ECN Tunnel SA Attribute is being negotiated; the
type of this attribute is Basic (see Section 4.5 of [RFC2407]). The type of this attribute is Basic (see Section 4.5 of [RFC2407]). The
Class Values are used to conduct the negotiation. See [RFC2407, Class Values are used to conduct the negotiation. See [RFC2407,
RFC2408, RFC2409] for further information including encoding formats RFC2408, RFC2409] for further information including encoding formats
and requirements for negotiating this SA attribute. and requirements for negotiating this SA attribute.
Class Values Class Values
ECN Tunnel ECN Tunnel
Specifies whether ECN functionality is allowed to Specifies whether ECN functionality is allowed to be used with Tunnel
be used with Tunnel Encapsulation Mode. Encapsulation Mode. This affects tunnel encapsulation and
This affects tunnel encapsulation and decapsulation processing - decapsulation processing - see Section 9.2.1.3.
see Section 9.2.1.3.
RESERVED 0 RESERVED 0
Allowed 1 Allowed 1
Forbidden 2 Forbidden 2
Values 3-61439 are reserved to IANA. Values 61440-65535 are for Values 3-61439 are reserved to IANA. Values 61440-65535 are for
private use. private use.
If unspecified, the default shall be assumed to be Forbidden. If unspecified, the default shall be assumed to be Forbidden.
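The following sketch shows one way an implementation might interpret the ECN Tunnel class values listed above. The following details come from the text:

* the attribute value 10;
* the three defined class values;
* the IANA-reserved and private-use ranges;
* the Forbidden default.

The 16-bit bound reflects the two-octet value that ISAKMP Basic attributes carry [RFC2408]. Rejecting Reserved, IANA-reserved and private-use values is an assumption of this sketch, not a requirement stated here.

```python
from enum import IntEnum
from typing import Optional

ECN_TUNNEL_SA_ATTRIBUTE = 10  # IANA-allocated IPsec SA Attribute value (Basic type)


class EcnTunnel(IntEnum):
    RESERVED = 0
    ALLOWED = 1
    FORBIDDEN = 2


def describe_class_value(value: int) -> str:
    if not 0 <= value <= 0xFFFF:
        raise ValueError("Basic attribute values are 16 bits")
    if value <= 2:
        return EcnTunnel(value).name
    if value <= 61439:
        return "reserved to IANA"
    return "private use"


def effective_setting(value: Optional[int]) -> EcnTunnel:
    """ECN Tunnel setting for an SA; an absent attribute means Forbidden."""
    if value is None:
        return EcnTunnel.FORBIDDEN
    if value in (EcnTunnel.ALLOWED, EcnTunnel.FORBIDDEN):
        return EcnTunnel(value)
    # Assumption of this sketch: any other value is treated as unusable.
    raise ValueError(f"unsupported ECN Tunnel value {value} ({describe_class_value(value)})")
```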
ECN Tunnel is a new SA attribute, and hence initiators that use it ECN Tunnel is a new SA attribute, and hence initiators that use it
can expect to encounter responders that do not understand it, and can expect to encounter responders that do not understand it, and
therefore reject proposals containing it. For backwards therefore reject proposals containing it. For backwards
compatibility with such implementations initiators SHOULD always also compatibility with such implementations initiators SHOULD always also
include a proposal without the ECN Tunnel attribute to enable such a include a proposal without the ECN Tunnel attribute to enable such a
responder to select a transform or proposal that does not contain the responder to select a transform or proposal that does not contain the
ECN Tunnel attribute. RFC 2407 currently requires responders to ECN Tunnel attribute. RFC 2407 currently requires responders to
reject all proposals if any proposal contains an unknown attribute; reject all proposals if any proposal contains an unknown attribute;
this requirement is expected to be changed to require a responder not this requirement is expected to be changed to require a responder not
to select proposals or transforms containing unknown attributes. to select proposals or transforms containing unknown attributes.
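The backwards-compatibility rule can be illustrated with a simplified model of proposal selection. `Proposal`, `initiator_proposals` and `responder_select` are hypothetical. Real ISAKMP proposals carry many more attributes; in this model, a responder's "known" set tracks only the ECN Tunnel attribute. The responder follows the expected revised behavior: it does not select a proposal that contains an unknown attribute, instead of rejecting every proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

ECN_TUNNEL_SA_ATTRIBUTE = 10  # IANA-allocated attribute value
ALLOWED = 1                   # ECN Tunnel class value "Allowed"


@dataclass
class Proposal:
    """Hypothetical, simplified stand-in for an SA proposal or transform."""
    transform: str
    attributes: Dict[int, int] = field(default_factory=dict)


def initiator_proposals(transform: str, want_ecn: bool) -> List[Proposal]:
    proposals = []
    if want_ecn:
        proposals.append(Proposal(transform, {ECN_TUNNEL_SA_ATTRIBUTE: ALLOWED}))
    # SHOULD: also offer a variant without ECN Tunnel for responders that lack it.
    proposals.append(Proposal(transform))
    return proposals


def responder_select(proposals: List[Proposal], known: Set[int]) -> Optional[Proposal]:
    """Pick the first proposal whose attributes are all known to this responder."""
    for p in proposals:
        if set(p.attributes) <= known:
            return p
    return None


offered = initiator_proposals("ESP-tunnel", want_ecn=True)
legacy = responder_select(offered, known=set())
updated = responder_select(offered, known={ECN_TUNNEL_SA_ATTRIBUTE})
print(legacy.attributes, updated.attributes)  # {} {10: 1}
```

The initiator lists the ECN Tunnel proposal first. An updated responder therefore selects it, while a legacy responder falls through to the plain proposal.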
9.2.1.3. Changes to IPsec Tunnel Header Processing 9.2.1.3. Changes to IPsec Tunnel Header Processing
For full ECN support, the encapsulation and decapsulation processing For full ECN support, the encapsulation and decapsulation processing
for the IPv4 TOS field and the IPv6 Traffic Class field are changed for the IPv4 TOS field and the IPv6 Traffic Class field are changed
from that specified in [RFC2401] to the following: from that specified in [RFC2401] to the following:
<-- How Outer Hdr Relates to Inner Hdr --> <-- How Outer Hdr Relates to Inner Hdr -->
Outer Hdr at Inner Hdr at Outer Hdr at Inner Hdr at
IPv4 Encapsulator Decapsulator IPv4 Encapsulator Decapsulator
Header fields: -------------------- ------------ Header fields: -------------------- ------------
DS Field copied from inner hdr (5) no change DS Field copied from inner hdr (5) no change
ECN Field constructed (7) constructed (8) ECN Field constructed (7) constructed (8)
IPv6 IPv6
Header fields: Header fields:
DS Field copied from inner hdr (6) no change DS Field copied from inner hdr (6) no change
ECN Field constructed (7) constructed (8) ECN Field constructed (7) constructed (8)
(5)(6) If the packet will immediately enter a domain for which the (5)(6) If the packet will immediately enter a domain for which the
DSCP value in the outer header is not appropriate, that value MUST DSCP value in the outer header is not appropriate, that value MUST
be mapped to an appropriate value for the domain [RFC 2474]. Also be mapped to an appropriate value for the domain [RFC 2474]. Also
see [RFC 2475] for further information. see [RFC 2475] for further information.
(7) If the value of the ECN Tunnel field in the SAD entry for this (7) If the value of the ECN Tunnel field in the SAD entry for this
SA is "allowed" and the ECN field in the inner header is set to SA is "allowed" and the ECN field in the inner header is set to
any value other than CE, copy this ECN field to the outer header. any value other than CE, copy this ECN field to the outer header.
If the ECN field in the inner header is set to CE, then set the If the ECN field in the inner header is set to CE, then set the
skipping to change at page 35, line 8 skipping to change at page 34, line 29
CE, then copy the ECN field from the outer header to the inner CE, then copy the ECN field from the outer header to the inner
header. Otherwise, make no change to the ECN field in the inner header. Otherwise, make no change to the ECN field in the inner
header. header.
(5) and (6) are identical to match usage in [RFC2401], although (5) and (6) are identical to match usage in [RFC2401], although
they are different in [RFC2401]. they are different in [RFC2401].
The above description applies to implementations that support the ECN The above description applies to implementations that support the ECN
Tunnel field in the SAD; such implementations MUST implement this Tunnel field in the SAD; such implementations MUST implement this
processing instead of the processing of the IPv4 TOS octet and IPv6 processing instead of the processing of the IPv4 TOS octet and IPv6
Traffic Class octet defined in [RFC2401]. This constitutes the full- Traffic Class octet defined in [RFC2401]. This constitutes the
functionality alternative for ECN usage with IPsec tunnels. full-functionality alternative for ECN usage with IPsec tunnels.
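The encapsulation (7) and decapsulation (8) rules can be sketched on the TOS / Traffic Class octet. The upper six bits of that octet are the DS field and the lower two bits are the ECN field. The function names are hypothetical, and DSCP remapping under (5)/(6) is omitted.

Parts of (7) and (8) are cut off in this diff, so two branches are assumptions taken from RFC 3168 that should be checked against the full text:

* an inner CE is written to the outer header as ECT(0);
* CE is copied inward only when the inner header is ECT(0) or ECT(1).

With `ecn_allowed=False`, the functions set the outer field to not-ECT and leave the inner header unchanged.

```python
from enum import IntEnum


class ECN(IntEnum):
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


ECN_MASK = 0x03  # ECN field: low two bits of the IPv4 TOS / IPv6 Traffic Class octet


def encapsulate(inner_octet: int, ecn_allowed: bool) -> int:
    """Build the outer TOS / Traffic Class octet at the encapsulator."""
    ds_field = inner_octet & ~ECN_MASK & 0xFF  # (5)/(6): DS field copied, no remapping
    inner_ecn = inner_octet & ECN_MASK
    if not ecn_allowed:
        outer_ecn = ECN.NOT_ECT                # "Forbidden": outer header is not-ECT
    elif inner_ecn == ECN.CE:
        outer_ecn = ECN.ECT0                   # assumption taken from RFC 3168
    else:
        outer_ecn = inner_ecn                  # (7): copy any non-CE value
    return ds_field | outer_ecn


def decapsulate(inner_octet: int, outer_octet: int, ecn_allowed: bool) -> int:
    """Return the (possibly updated) inner octet at the decapsulator."""
    inner_ecn = inner_octet & ECN_MASK
    outer_ecn = outer_octet & ECN_MASK
    if ecn_allowed and inner_ecn in (ECN.ECT0, ECN.ECT1) and outer_ecn == ECN.CE:
        return (inner_octet & ~ECN_MASK & 0xFF) | ECN.CE  # (8): propagate CE inward
    return inner_octet                                     # (8): otherwise no change


inner = (46 << 2) | ECN.ECT0    # hypothetical inner octet: DSCP 46, ECT(0)
outer = encapsulate(inner, ecn_allowed=True)
outer |= ECN.CE                 # a router inside the tunnel marks congestion
print(hex(decapsulate(inner, outer, ecn_allowed=True)))  # 0xbb
```

Rule (8) only copies CE into an inner header that is ECT(0) or ECT(1). An inner CE that was set before the tunnel therefore survives decapsulation unchanged.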
An implementation that does not support the ECN Tunnel field in the An implementation that does not support the ECN Tunnel field in the
SAD MUST implement this processing by assuming that the value of the SAD MUST implement this processing by assuming that the value of the
ECN Tunnel field of the SAD is "forbidden" for every SA. In this ECN Tunnel field of the SAD is "forbidden" for every SA. In this
case, the processing of the ECN field reduces to: case, the processing of the ECN field reduces to:
(7) Set the ECN field to not-ECT in the outer header. (7) Set the ECN field to not-ECT in the outer header.
(8) Make no change to the ECN field in the inner header. (8) Make no change to the ECN field in the inner header.
This constitutes the limited functionality alternative for ECN usage This constitutes the limited functionality alternative for ECN usage
skipping to change at page 37, line 14 skipping to change at page 36, line 39
available to that flow. Thus, initially, the router may drop packets available to that flow. Thus, initially, the router may drop packets
in which the router would otherwise have set the CE codepoint. in which the router would otherwise have set the CE codepoint.
This could include dropping those arriving packets for that flow that This could include dropping those arriving packets for that flow that
are ECN-Capable and that already have the CE codepoint set. In this are ECN-Capable and that already have the CE codepoint set. In this
way, any congestion indications seen by that router for that flow way, any congestion indications seen by that router for that flow
will be guaranteed to also be seen by the end nodes, even in the will be guaranteed to also be seen by the end nodes, even in the
presence of malicious or broken routers elsewhere in the path. If we presence of malicious or broken routers elsewhere in the path. If we
assume that the first action taken at any "penalty box" for an ECN- assume that the first action taken at any "penalty box" for an ECN-
capable flow will be to drop packets instead of marking them, then capable flow will be to drop packets instead of marking them, then
there is no way that an adversary that subverts ECN-based end-to-end there is no way that an adversary that subverts ECN-based end-to-end
congestion control can cause a flow to be characterized as being non- congestion control can cause a flow to be characterized as being
cooperative and placed into a more severe action within the "penalty non-cooperative and placed into a more severe action within the
box". "penalty box".
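The following is a minimal sketch of the per-packet decision for the 'ideal' monitoring device described above. It assumes the device already knows whether the flow is in a penalty box; how it decides that is out of scope. The function and its parameters are hypothetical. `drop_arriving_ce` models the text's "could include" for packets that arrive already marked CE. Forwarding an already-CE packet outside the penalty box is also an assumption of this sketch.

```python
from enum import IntEnum


class ECN(IntEnum):
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def router_action(ecn: ECN, congested: bool, in_penalty_box: bool,
                  drop_arriving_ce: bool = True) -> str:
    """Return 'forward', 'mark' or 'drop' for one arriving packet of a flow."""
    if in_penalty_box:
        if congested:
            return "drop"        # drop where the router would otherwise have set CE
        if ecn == ECN.CE and drop_arriving_ce:
            return "drop"        # optionally also drop packets already marked upstream
        return "forward"
    if not congested:
        return "forward"
    if ecn in (ECN.ECT0, ECN.ECT1):
        return "mark"            # normal ECN behavior: set CE instead of dropping
    if ecn == ECN.CE:
        return "forward"         # already carries a congestion indication
    return "drop"                # not-ECT packets are dropped as usual
```

The end nodes see a dropped packet as loss. No router further along the path can hide that congestion signal by clearing a CE mark.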
The monitoring and policing devices that are actually deployed could The monitoring and policing devices that are actually deployed could
fall short of the `ideal' monitoring device described above, in that fall short of the `ideal' monitoring device described above, in that
the monitoring is applied not to a single flow, but to an aggregate the monitoring is applied not to a single flow, but to an aggregate
of flows (e.g., those sharing a single IPsec tunnel). In this case, of flows (e.g., those sharing a single IPsec tunnel). In this case,
the switch from marking to dropping would apply to all of the flows the switch from marking to dropping would apply to all of the flows
in that aggregate, denying the benefits of ECN to the other flows in in that aggregate, denying the benefits of ECN to the other flows in
the aggregate also. At the highest level of aggregation, another the aggregate also. At the highest level of aggregation, another
form of the disabling of ECN happens even in the absence of form of the disabling of ECN happens even in the absence of
monitoring and policing devices, when ECN-Capable RED queues switch monitoring and policing devices, when ECN-Capable RED queues switch
skipping to change at page 39, line 51 skipping to change at page 39, line 34
For IPsec tunnels, this document also defines an optional IPsec For IPsec tunnels, this document also defines an optional IPsec
Security Association (SA) attribute that enables negotiation of ECN Security Association (SA) attribute that enables negotiation of ECN
usage within IPsec tunnels and an optional field in the Security usage within IPsec tunnels and an optional field in the Security
Association Database to indicate whether ECN is permitted in tunnel Association Database to indicate whether ECN is permitted in tunnel
mode on a SA. The required changes to IPsec tunnels for ECN usage mode on a SA. The required changes to IPsec tunnels for ECN usage
modify RFC 2401 [RFC2401], which defines the IPsec architecture and modify RFC 2401 [RFC2401], which defines the IPsec architecture and
specifies some aspects of its implementation. The new IPsec SA specifies some aspects of its implementation. The new IPsec SA
attribute is in addition to those already defined in Section 4.5 of attribute is in addition to those already defined in Section 4.5 of
[RFC2407]. [RFC2407].
This document is intended to obsolete RFC 2481, "A Proposal to add This document obsoletes RFC 2481, "A Proposal to add Explicit
Explicit Congestion Notification (ECN) to IP", which defined ECN as Congestion Notification (ECN) to IP", which defined ECN as an
an Experimental Protocol for the Internet Community. The rest of Experimental Protocol for the Internet Community. The rest of this
this section describes the relationship between this document and its section describes the relationship between this document and its
predecessor. predecessor.
RFC 2481 included a brief discussion of the use of ECN with RFC 2481 included a brief discussion of the use of ECN with
encapsulated packets, and noted that for the IPsec specifications at encapsulated packets, and noted that for the IPsec specifications at
the time (January 1999), flows could not safely use ECN if they were the time (January 1999), flows could not safely use ECN if they were
to traverse IPsec tunnels. RFC 2481 also described the changes that to traverse IPsec tunnels. RFC 2481 also described the changes that
could be made to IPsec tunnel specifications to make them compatible could be made to IPsec tunnel specifications to make them compatible
with ECN. with ECN.
This document also incorporates work that was done after RFC 2481. This document also incorporates work that was done after RFC 2481.
skipping to change at page 42, line 9 skipping to change at page 41, line 37
related discussions and documents from the Differentiated Services related discussions and documents from the Differentiated Services
Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh,
for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen
for proposing modifications to RFC 2407 that improve the usability of for proposing modifications to RFC 2407 that improve the usability of
negotiating the ECN Tunnel SA attribute. negotiating the ECN Tunnel SA attribute.
We thank David Wetherall, David Ely, and Neil Spring for the proposal We thank David Wetherall, David Ely, and Neil Spring for the proposal
for the ECN nonce. We also thank Stefan Savage for discussions on for the ECN nonce. We also thank Stefan Savage for discussions on
this issue. We thank Bob Briscoe and Jon Crowcroft for raising the this issue. We thank Bob Briscoe and Jon Crowcroft for raising the
issue of fragmentation in IP, on alternate semantics for the fourth issue of fragmentation in IP, on alternate semantics for the fourth
ECN codepoint, and several other topics. We thank Richard Wendland ECN codepoint, and several other topics. We thank Richard Wendland
for feedback on several issues in the draft. for feedback on several issues in the document.
We also thank the IESG, and in particular the Transport Area We also thank the IESG, and in particular the Transport Area
Directors over the years, for their feedback and their work towards Directors over the years, for their feedback and their work towards
the standardization of ECN. the standardization of ECN.
15. References 15. References
[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, [AH] Kent, S. and R. Atkinson, "IP Authentication Header",
November 1998. RFC 2402, November 1998.
[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". [ECN] "The ECN Web Page", URL
Reference for informational purposes only. "http://www.aciri.org/floyd/ecn.html". Reference for
informational purposes only.
[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security
RFC 2406, November 1998. Payload", RFC 2406, November 1998.
[FIXES] ECN-under-Linux Unofficial Vendor Support Page, URL [FIXES] ECN-under-Linux Unofficial Vendor Support Page, URL
"http://gtf.org/garzik/ecn/". Reference for informational purposes "http://gtf.org/garzik/ecn/". Reference for
only. informational purposes only.
[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection
for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 gateways for Congestion Avoidance", IEEE/ACM
N.4, August 1993, p. 397-413. Transactions on Networking, V.1 N.4, August 1993, p.
397-413.
[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification",
Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. ACM Computer Communication Review, V. 24 N. 5, October
1994, p. 10-23.
[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", [Floyd98] Floyd, S., "The ECN Validation Test in the NS
URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- Simulator", URL "http://www-mash.cs.berkeley.edu/ns/",
ecn. Reference for informational purposes only. test tcl/test/test-all-ecn. Reference for
informational purposes only.
[FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-
Congestion Control in the Internet", IEEE/ACM Transactions on End Congestion Control in the Internet", IEEE/ACM
Networking, August 1999. Transactions on Networking, August 1999.
[FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", [FRED] Lin, D., and Morris, R., "Dynamics of Random Early
SIGCOMM '97, September 1997. Detection", SIGCOMM '97, September 1997.
[GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing [GRE] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
Encapsulation (GRE), RFC 1701, October 1994. Routing Encapsulation (GRE)", RFC 1701, October 1994.
[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
ACM SIGCOMM '88, pp. 314-329. ACM SIGCOMM '88, pp. 314-329.
[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
Algorithm", Message to end2end-interest mailing list, April 1990. URL Algorithm", Message to end2end-interest mailing list,
"ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". April 1990. URL
"ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".
[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) [K98] Krishnan, H., "Analyzing Explicit Congestion
benefits for TCP", Master's thesis, UCLA, 1998. Citation for Notification (ECN) benefits for TCP", Master's thesis,
acknowledgement purposes only. UCLA, 1998. Citation for acknowledgement purposes only.
[L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. [L2TP] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
Palter, Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999. G. and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
RFC 2661, August 1999.
[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-
Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. driven Layered Multicast", SIGCOMM '96, August 1996, pp.
117-130.
[MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, [MPLS] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J.
Requirements for Traffic Engineering Over MPLS, RFC 2702, September McManus, "Requirements for Traffic Engineering Over MPLS",
1999. RFC 2702, September 1999.
[PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,
and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, W. and G. Zorn, "Point-to-Point Tunneling Protocol
July 1999. (PPTP)", RFC 2637, July 1999.
[RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September [RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791,
1981. September 1981.
[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC
September 1981. 793, September 1981.
[RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the [RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of
Internet Checksum", RFC 1141, January 1990. the Internet Checksum", RFC 1141, January 1990.
[RFC1349] Almquist, P., "Type of Service in the Internet Protocol [RFC1349] Almquist, P., "Type of Service in the Internet Protocol
Suite", RFC 1349, July 1992. Suite", RFC 1349, July 1992.
[RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC [RFC1455] Eastlake, D., "Physical Link Security Type of Service",
1455, May 1993. RFC 1455, May 1993.
[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic [RFC1701] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
Routing Encapsulation (GRE), RFC 1701, October 1994. Routing Encapsulation (GRE)", RFC 1701, October 1994.
[RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic [RFC1702] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
Routing Encapsulation over IPv4 networks, RFC 1702, October 1994. Routing Encapsulation over IPv4 networks", RFC 1702,
October 1994.
[RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
1996. October 1996.
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2309] Braden, B., et al., "Recommendations on Queue Management [RFC2309] Braden, B., et al., "Recommendations on Queue Management
and Congestion Avoidance in the Internet", RFC 2309, April 1998. and Congestion Avoidance in the Internet", RFC 2309,
April 1998.
[RFC2401] S. Kent and R. Atkinson, Security Architecture for the [RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the
Internet Protocol, RFC 2401, November 1998. Internet Protocol", RFC 2401, November 1998.
[RFC2407] D. Piper, The Internet IP Security Domain of Interpretation [RFC2407] Piper, D., "The Internet IP Security Domain of
for ISAKMP, RFC 2407, November 1998. Interpretation for ISAKMP", RFC 2407, November 1998.
[RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner, [RFC2408] Maughan, D., Schertler, M., Schneider, M. and J. Turner,
Internet Security Association and Key Management Protocol (ISAKMP), "Internet Security Association and Key Management
RFC 2408, November 1998. Protocol (ISAKMP)", RFC 2408, November 1998.
[RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), [RFC2409] Harkins, D. and D. Carrel, "The Internet Key Exchange
RFC 2409, November 1998. (IKE)", RFC 2409, November 1998.
[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black,
of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 "Definition of the Differentiated Services Field (DS
Headers", RFC 2474, December 1998. Field) in the IPv4 and IPv6 Headers", RFC 2474, December
1998.
[RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.
Weiss, An Architecture for Differentiated Services, RFC 2475, and W. Weiss, "An Architecture for Differentiated
December 1998. Services", RFC 2475, December 1998.
[RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit [RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add
Congestion Notification (ECN) to IP, RFC 2481, January 1999. Explicit Congestion Notification (ECN) to IP", RFC 2481,
January 1999.
[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
RFC 2581, April 1999. Control", RFC 2581, April 1999.
[RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation [RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, Explicit Congestion Notification (ECN) in IP Networks",
July 2000. RFC 2884, July 2000.
[RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983, [RFC2983] Black, D., "Differentiated Services and Tunnels",
October 2000. RFC 2983, October 2000.
[RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For [RFC2780] Bradner, S. and V. Paxson, "IANA Allocation Guidelines
Values In the Internet Protocol and Related Headers", RFC 2780, March For Values In the Internet Protocol and Related
2000. Headers", BCP 37, RFC 2780, March 2000.
[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback
Congestion Avoidance in Computer Networks", ACM Transactions on Scheme for Congestion Avoidance in Computer Networks",
Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. ACM Transactions on Computer Systems, Vol.8, No.2, pp.
158-181, May 1990.
[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom
Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM Anderson, "TCP Congestion Control with a Misbehaving
Computer Communications Review, October 1999. Receiver", ACM Computer Communications Review, October
1999.
[TBIT] Jitendra Padhye and Sally Floyd, "Identifying the TCP Behavior [TBIT] Jitendra Padhye and Sally Floyd, "Identifying the TCP
of Web Servers", ICSI TR-01-002, February 2001. URL Behavior of Web Servers", ICSI TR-01-002, February 2001.
"http://www.aciri.org/tbit/". URL "http://www.aciri.org/tbit/".
16. Security Considerations 16. Security Considerations
Security considerations have been discussed in Sections 7, 8, 18, and Security considerations have been discussed in Sections 7, 8, 18, and
19. 19.
17. IPv4 Header Checksum Recalculation 17. IPv4 Header Checksum Recalculation
IPv4 header checksum recalculation is an issue with some high-end IPv4 header checksum recalculation is an issue with some high-end
router architectures using an output-buffered switch, since most if router architectures using an output-buffered switch, since most if
not all of the header manipulation is performed on the input side of not all of the header manipulation is performed on the input side of
the switch, while the ECN decision would need to be made local to the the switch, while the ECN decision would need to be made local to the
output buffer. This is not an issue for IPv6, since there is no IPv6 output buffer. This is not an issue for IPv6, since there is no IPv6
header checksum. The IPv4 TOS octet is the last byte of a 16-bit header checksum. The IPv4 TOS octet is the last byte of a 16-bit
half-word. half-word.
RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
checksum after the TTL field is decremented. The incremental checksum after the TTL field is decremented. The incremental
updating of the IPv4 checksum after the CE codepoint was set would updating of the IPv4 checksum after the CE codepoint was set would
work as follows: Let HC be the original header checksum for an ECT(0) work as follows: Let HC be the original header checksum for an ECT(0)
packet, and let HC' be the new header checksum after the CE bit packet, and let HC' be the new header checksum after the CE bit has
has been set. That is, the ECN field has changed from '10' to '11'. been set. That is, the ECN field has changed from '10' to '11'.
Then for header checksums calculated with one's complement Then for header checksums calculated with one's complement
subtraction, HC' would be recalculated as follows: subtraction, HC' would be recalculated as follows:
HC' = { HC - 1 HC > 1 HC' = { HC - 1 HC > 1
{ 0x0000 HC = 1 { 0x0000 HC = 1
For header checksums calculated on two's complement machines, HC' would For header checksums calculated on two's complement machines, HC'
be recalculated as follows after the CE bit was set: would be recalculated as follows after the CE bit was set:
HC' = { HC - 1 HC > 0 HC' = { HC - 1 HC > 0
{ 0xFFFE HC = 0 { 0xFFFE HC = 0
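The two's-complement rule above can be checked against a full recomputation of the checksum. The following Python sketch assumes a hypothetical 20-byte IPv4 header with no options (all field values are illustrative) and hypothetical helper names (fold16, ipv4_checksum, set_ecn, incremental_ect0_to_ce). It applies the rule for an ECT(0) to CE change and compares the result with a checksum recomputed over the modified header.

```python
import struct

ECT0, CE = 0b10, 0b11  # ECN field values (bits 6-7 of the TOS octet)


def fold16(total: int) -> int:
    """Fold carries out of bit 15 back into the low 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """Full recomputation; the checksum field (bytes 10-11) is treated as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    words = struct.unpack("!%dH" % (len(zeroed) // 2), zeroed)
    return ~fold16(sum(words)) & 0xFFFF


def set_ecn(header: bytes, ecn: int) -> bytes:
    """Return a copy of the header with the two ECN bits of the TOS octet replaced."""
    tos = (header[1] & 0xFC) | ecn
    return header[:1] + bytes([tos]) + header[2:]


def incremental_ect0_to_ce(hc: int) -> int:
    """Two's-complement rule from the text for an ECN field change '10' -> '11'."""
    return hc - 1 if hc > 0 else 0xFFFE


# Hypothetical header: version/IHL 0x45, TOS carrying ECT(0), protocol 6 (TCP).
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, ECT0, 60, 0x1C46, 0x4000, 64, 6, 0,
                  bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
hc = ipv4_checksum(hdr)
hc_new = incremental_ect0_to_ce(hc)
print(hex(hc), hex(hc_new), hc_new == ipv4_checksum(set_ecn(hdr, CE)))
```

Setting CE on an ECT(0) packet adds exactly 1 to the first 16-bit word (version/IHL and TOS), so the new one's complement sum is the old sum plus one with end-around carry. That is why the rule reduces to HC - 1 everywhere except HC = 0, where the carry wraps around and yields 0xFFFE.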
A similar incremental updating of the IPv4 checksum can be carried out A similar incremental updating of the IPv4 checksum can be carried
when the ECN field is changed from ECT(1) to CE, that is, from '01' to out when the ECN field is changed from ECT(1) to CE, that is, from
'11'. '01' to '11'.
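For the ECT(1) to CE case, the ECN field change adds 2 rather than 1 to the 16-bit word holding the TOS octet, so the HC - 1 rule does not apply as written. One general way to perform the update, taken here from RFC 1624 (which revisits the RFC 1141 method and is not cited by this document), is HC' = ~(~HC + ~m + m'), where m and m' are the old and new 16-bit words and + is one's complement addition. The Python sketch below assumes that formula, an illustrative header, and hypothetical helper names.

```python
import struct

ECT1, CE = 0b01, 0b11  # ECN field values (bits 6-7 of the TOS octet)


def fold16(total: int) -> int:
    """End-around carry: add any bits above bit 15 back into the low 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """Full recomputation with the checksum field (bytes 10-11) treated as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    words = struct.unpack("!%dH" % (len(zeroed) // 2), zeroed)
    return ~fold16(sum(words)) & 0xFFFF


def incremental_update(hc: int, old_word: int, new_word: int) -> int:
    """RFC 1624 Eqn. 3: HC' = ~(~HC + ~m + m'), in one's complement arithmetic."""
    return ~fold16((~hc & 0xFFFF) + (~old_word & 0xFFFF) + new_word) & 0xFFFF


# Hypothetical 20-byte header whose TOS octet carries ECT(1).
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, ECT1, 60, 0x1C46, 0x4000, 64, 17, 0,
                  bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
old_word = struct.unpack("!H", hdr[0:2])[0]  # version/IHL + TOS
new_word = (old_word & 0xFFFC) | CE          # ECN field '01' -> '11'
ce_hdr = struct.pack("!H", new_word) + hdr[2:]

hc_new = incremental_update(ipv4_checksum(hdr), old_word, new_word)
print(hc_new == ipv4_checksum(ce_hdr))
```

The same incremental_update call also covers the ECT(0) to CE case (old and new words differing by 1), so a router could use one routine for both transitions.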
18. Possible Changes to the ECN Field in the Network 18. Possible Changes to the ECN Field in the Network
This section discusses in detail possible changes to the ECN field in This section discusses in detail possible changes to the ECN field in
the network, such as falsely reporting congestion, disabling ECN- the network, such as falsely reporting congestion, disabling ECN-
Capability for an individual packet, erasing the ECN congestion Capability for an individual packet, erasing the ECN congestion
indication, or falsely indicating ECN-Capability. indication, or falsely indicating ECN-Capability.
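The categories above can be expressed as changes from the ECN codepoint of a received packet to the ECN codepoint of the transmitted packet. The Python sketch below labels each possible change. The function name and the labels are hypothetical; treating '11' -> '00' as an erasure and listing ECT(0)/ECT(1) swaps separately is an illustrative reading rather than a rule from this document.

```python
from enum import IntEnum


class ECN(IntEnum):
    """ECN field codepoints (bits 6-7 of the IPv4 TOS / IPv6 Traffic Class octet)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


ECT = {ECN.ECT0, ECN.ECT1}


def classify_change(received: ECN, transmitted: ECN) -> str:
    """Label a received -> transmitted ECN field change made by a router."""
    if received == transmitted:
        return "unchanged"
    if received == ECN.CE:
        # Whatever replaces CE, the congestion indication no longer reaches the ends.
        return "erases congestion indication"
    if received in ECT and transmitted == ECN.CE:
        # Legitimate only if this router is actually congested.
        return "reports congestion (false if no congestion)"
    if received in ECT and transmitted == ECN.NOT_ECT:
        return "disables ECN-Capability"
    if received == ECN.NOT_ECT:
        return "falsely indicates ECN-Capability"
    return "swaps ECT(0) and ECT(1)"


print(classify_change(ECN.CE, ECN.ECT0))  # erases congestion indication
```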
18.1. Possible Changes to the IP Header 18.1. Possible Changes to the IP Header
18.1.1. Erasing the Congestion Indication 18.1.1. Erasing the Congestion Indication
First, we consider the changes that a router could make that would First, we consider the changes that a router could make that would
result in effectively erasing the congestion indication after it had result in effectively erasing the congestion indication after it had
been set by a router upstream. The convention followed is: been set by a router upstream. The convention followed is: ECN
ECN codepoint of received packet -> ECN codepoint of packet codepoint of received packet -> ECN codepoint of packet transmitted.
transmitted.
Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint
effectively erases the congestion indication. However, with the use effectively erases the congestion indication. However, with the use
of two ECT codepoints, a router erasing the CE codepoint has no way of two ECT codepoints, a router erasing the CE codepoint has no way
to know whether the original ECT codepoint was ECT(0) or ECT(1). to know whether the original ECT codepoint was ECT(0) or ECT(1).
Thus, it is possible for the transport protocol to deploy mechanisms Thus, it is possible for the transport protocol to deploy mechanisms
to detect such erasures of the CE codepoint. to detect such erasures of the CE codepoint.
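The point that an erasing router cannot tell which ECT codepoint the sender used can be illustrated with a toy simulation. It assumes, hypothetically, that the sender picks ECT(0) or ECT(1) at random for each packet, that the erasing router guesses uniformly between them, and that the receiver reliably reports the codepoint it saw. None of these mechanisms is specified here, and the sketch is much simpler than the ECN nonce mentioned elsewhere in this document.

```python
import random

ECT1, ECT0, CE = 0b01, 0b10, 0b11


def erasing_router(codepoint: int, rng: random.Random) -> int:
    """Hypothetical faulty router: rewrites CE to a guessed ECT codepoint."""
    if codepoint == CE:
        return rng.choice((ECT0, ECT1))
    return codepoint


def erasure_detected(sent_ect: int, arrived: int) -> bool:
    """Sender-side check, assuming the receiver reports the codepoint it saw."""
    return arrived in (ECT0, ECT1) and arrived != sent_ect


def detection_rate(trials: int, seed: int = 1) -> float:
    """Fraction of erased CE marks that the sender notices."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        sent = rng.choice((ECT0, ECT1))  # sender picks either ECT codepoint
        # Assume an upstream router has set CE and a downstream one erases it.
        arrived = erasing_router(CE, rng)
        if erasure_detected(sent, arrived):
            detected += 1
    return detected / trials


print(detection_rate(10_000))
```

Under these assumptions each erased mark is noticed with probability one half, so the printed rate comes out near 0.5 and repeated erasures on a flow are detected quickly.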
The consequence of the erasure of the CE codepoint for the upstream The consequence of the erasure of the CE codepoint for the upstream
router is that there is a potential for congestion to build for a router is that there is a potential for congestion to build for a
skipping to change at page 49, line 5 skipping to change at page 48, line 41
sequence numbers, and any attacker with this ability and with the sequence numbers, and any attacker with this ability and with the
ability to spoof IP source addresses could damage the TCP connection ability to spoof IP source addresses could damage the TCP connection
without using the ECN flags. Therefore, ECN does not add any new without using the ECN flags. Therefore, ECN does not add any new
vulnerabilities in this respect. vulnerabilities in this respect.
An acknowledgement packet with a spoofed IP source address of the TCP An acknowledgement packet with a spoofed IP source address of the TCP
data receiver could include the ECE bit set. If accepted by the TCP data receiver could include the ECE bit set. If accepted by the TCP
data sender as a valid packet, this spoofed acknowledgement packet data sender as a valid packet, this spoofed acknowledgement packet
could result in the TCP data sender unnecessarily halving its could result in the TCP data sender unnecessarily halving its
congestion window. However, to be accepted by the data sender, such congestion window. However, to be accepted by the data sender, such
a spoofed acknowledgement packet would have to have the correct a spoofed acknowledgement packet would have to have the correct 32-
32-bit sequence number as well as a valid acknowledgement number. An bit sequence number as well as a valid acknowledgement number. An
attacker that could successfully send such a spoofed acknowledgement attacker that could successfully send such a spoofed acknowledgement
packet could also send a spoofed RST packet, or do other equally packet could also send a spoofed RST packet, or do other equally
damaging operations to the TCP connection. damaging operations to the TCP connection.
Packets with a spoofed IP source address of the TCP data sender could Packets with a spoofed IP source address of the TCP data sender could
include the CWR bit set. Again, to be accepted, such a packet would include the CWR bit set. Again, to be accepted, such a packet would
have to have a valid sequence number. In addition, such a spoofed have to have a valid sequence number. In addition, such a spoofed
packet would have a limited performance impact. Spoofing a data packet would have a limited performance impact. Spoofing a data
packet with the CWR bit set could result in the TCP data receiver packet with the CWR bit set could result in the TCP data receiver
sending fewer ECE packets than it would otherwise, if the data sending fewer ECE packets than it would otherwise, if the data
skipping to change at page 51, line 35 skipping to change at page 51, line 25
in the network. in the network.
In some cases, the increase in the level of congestion will lead to a In some cases, the increase in the level of congestion will lead to a
substantial buffer buildup at the congested queue that will be substantial buffer buildup at the congested queue that will be
sufficient to drive the congested queue from the packet-marking to sufficient to drive the congested queue from the packet-marking to
the packet-dropping regime. This transition could occur either the packet-dropping regime. This transition could occur either
because of buffer overflow, or because of the active queue management because of buffer overflow, or because of the active queue management
policy described above that drops packets when the average queue is policy described above that drops packets when the average queue is
above RED's maximum threshold. At this point, all flows, including above RED's maximum threshold. At this point, all flows, including
the subverted flow, will begin to see packet drops instead of packet the subverted flow, will begin to see packet drops instead of packet
marks, and a malicious or broken router will no longer be able to marks, and a malicious or broken router will no longer be able to
`erase' these indications of congestion in the network. If the end `erase' these indications of congestion in the network. If the end
nodes are deploying appropriate end-to-end congestion control, then nodes are deploying appropriate end-to-end congestion control, then
the subverted flow will reduce its arrival rate in response to the subverted flow will reduce its arrival rate in response to
congestion. When the level of congestion is sufficiently reduced, congestion. When the level of congestion is sufficiently reduced,
the congested queue can return from the packet-dropping regime to the the congested queue can return from the packet-dropping regime to the
packet-marking regime. The steady-state pattern could be one of the packet-marking regime. The steady-state pattern could be one of the
congested queue oscillating between these two regimes. congested queue oscillating between these two regimes.
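The packet-marking and packet-dropping regimes can be sketched as a RED-style decision in the spirit of [RFC2309]. This is a simplified Python illustration, not RED as specified: the average queue size is passed in rather than computed as a moving average, RED's count-based probability adjustment is omitted, and the threshold values and the function name red_action are arbitrary assumptions.

```python
import random


def red_action(avg_queue: float, ecn_field: int, rng: random.Random,
               min_th: float = 5.0, max_th: float = 15.0,
               max_p: float = 0.1) -> str:
    """Return 'enqueue', 'mark' or 'drop' for one arriving packet.

    ecn_field is the two-bit ECN field: 0b00 Not-ECT, 0b01 ECT(1),
    0b10 ECT(0), 0b11 CE. Threshold values are illustrative only.
    """
    if avg_queue < min_th:
        return "enqueue"  # no congestion signal
    if avg_queue >= max_th:
        return "drop"     # packet-dropping regime, even for ECN-capable packets
    # Signal probability grows linearly from 0 at min_th to max_p at max_th.
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    if rng.random() >= p:
        return "enqueue"
    # Packet-marking regime: set (or keep) CE on ECN-capable packets;
    # packets from non-ECN-capable flows are dropped instead.
    return "mark" if ecn_field != 0b00 else "drop"


rng = random.Random(7)
actions = [red_action(10.0, 0b10, rng) for _ in range(1000)]
print(actions.count("mark"), actions.count("drop"))
```

Above max_th every packet is dropped, which is why a router that erases CE marks cannot hide congestion once the queue is pushed into that regime.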
In other cases, the consequences of subverting end-to-end congestion In other cases, the consequences of subverting end-to-end congestion
control will not be severe enough to drive the congested link into control will not be severe enough to drive the congested link into
sufficiently-heavy congestion that packets are dropped instead of sufficiently-heavy congestion that packets are dropped instead of
skipping to change at page 52, line 28 skipping to change at page 52, line 21
Let us take the example described in Section 18.1.1, where the CE Let us take the example described in Section 18.1.1, where the CE
codepoint that was set in a packet is erased: {'11' -> '10' or '11' codepoint that was set in a packet is erased: {'11' -> '10' or '11'
-> '01'}. The consequence for the congested upstream router that set -> '01'}. The consequence for the congested upstream router that set
the CE codepoint is that this congestion indication does not reach the CE codepoint is that this congestion indication does not reach
the end nodes for that flow. The source (even one which is completely the end nodes for that flow. The source (even one which is completely
cooperative and not malicious) is thus allowed to continue to cooperative and not malicious) is thus allowed to continue to
increase its sending rate (if it is a TCP flow, by increasing its increase its sending rate (if it is a TCP flow, by increasing its
congestion window). The flow potentially achieves better throughput congestion window). The flow potentially achieves better throughput
than the other flows that also share the congested router, especially than the other flows that also share the congested router, especially
if there are no policing mechanisms or per-flow queueing mechanisms if there are no policing mechanisms or per-flow queuing mechanisms at
at that router. Consider the behavior of the other flows, especially that router. Consider the behavior of the other flows, especially if
if they are cooperative: that is, the flows that do not experience they are cooperative: that is, the flows that do not experience
subverted end-to-end congestion control. They are likely to reduce subverted end-to-end congestion control. They are likely to reduce
their load (e.g., by reducing their window size) on the congested their load (e.g., by reducing their window size) on the congested
router, thus benefiting our subverted flow. This results in router, thus benefiting our subverted flow. This results in
unfairness. As we discussed above, this unfairness could either be unfairness. As we discussed above, this unfairness could either be
transient (because the congested queue is driven into the packet- transient (because the congested queue is driven into the packet-
marking regime), oscillatory (because the congested queue oscillates marking regime), oscillatory (because the congested queue oscillates
between the packet marking and the packet dropping regime), or more between the packet marking and the packet dropping regime), or more
moderate but a persistent stable state (because the congested queue moderate but a persistent stable state (because the congested queue
is never driven to the packet dropping regime). is never driven to the packet dropping regime).
skipping to change at page 53, line 17 skipping to change at page 53, line 11
network posed by the subversion of either ECN-based or other network posed by the subversion of either ECN-based or other
currently known packet-based congestion control mechanisms by the end currently known packet-based congestion control mechanisms by the end
nodes. nodes.
19.2. Implications for the Subverted Flow 19.2. Implications for the Subverted Flow
When a source indicates that it is ECN-capable, there is an When a source indicates that it is ECN-capable, there is an
expectation that the routers in the network that are capable of expectation that the routers in the network that are capable of
participating in ECN will use the CE codepoint for indication of participating in ECN will use the CE codepoint for indication of
congestion. There is the potential benefit of using ECN in reducing congestion. There is the potential benefit of using ECN in reducing
the amount of packet loss (in addition to the reduced queueing delays the amount of packet loss (in addition to the reduced queuing delays
because of active queue management policies). When the packet flows because of active queue management policies). When the packet flows
through an IPsec tunnel where the nodes that the tunneled packets through an IPsec tunnel where the nodes that the tunneled packets
traverse are untrusted in some way, the expectation is that IPsec traverse are untrusted in some way, the expectation is that IPsec
will protect the flow from subversion that results in undesirable will protect the flow from subversion that results in undesirable
consequences. consequences.
In many cases, a subverted flow will benefit from the subversion of In many cases, a subverted flow will benefit from the subversion of
end-to-end congestion control for that flow in the network, by end-to-end congestion control for that flow in the network, by
receiving more bandwidth than it would have otherwise, relative to receiving more bandwidth than it would have otherwise, relative to
competing non-subverted flows. If the congested queue reaches the competing non-subverted flows. If the congested queue reaches the
skipping to change at page 54, line 14 skipping to change at page 54, line 7
CE codepoint is set within the tunnel, and erased either within or CE codepoint is set within the tunnel, and erased either within or
downstream of the tunnel, this is not necessarily detected at the downstream of the tunnel, this is not necessarily detected at the
egress point of the tunnel. egress point of the tunnel.
With this subversion of end-to-end congestion control, an end-system With this subversion of end-to-end congestion control, an end-system
transport does not respond to the congestion indication. Along with transport does not respond to the congestion indication. Along with
the increased unfairness for the non-subverted flows described in the the increased unfairness for the non-subverted flows described in the
previous section, the congested router's queue could continue to previous section, the congested router's queue could continue to
build, resulting in packet loss at the congested router - which is a build, resulting in packet loss at the congested router - which is a
means for indicating congestion to the transport in any case. In the means for indicating congestion to the transport in any case. In the
interim, the flow might experience higher queueing delays, possibly interim, the flow might experience higher queuing delays, possibly
along with an increased bandwidth relative to other non-subverted along with an increased bandwidth relative to other non-subverted
flows. But transports do not inherently make assumptions of flows. But transports do not inherently make assumptions of
consistently experiencing carefully managed queueing in the path. We consistently experiencing carefully managed queuing in the path. We
believe that these forms of subverting end-to-end congestion control believe that these forms of subverting end-to-end congestion control
are no worse for the subverted flow than if the adversary had simply are no worse for the subverted flow than if the adversary had simply
dropped the packets of that flow itself. dropped the packets of that flow itself.
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control
We have shown that, in many cases, a malicious or broken router that We have shown that, in many cases, a malicious or broken router that
is able to change the bits in the ECN field can do no more damage is able to change the bits in the ECN field can do no more damage
than if it had simply dropped the packet in question. However, this than if it had simply dropped the packet in question. However, this
is not true in all cases, in particular in the cases where the broken is not true in all cases, in particular in the cases where the broken
skipping to change at page 55, line 28 skipping to change at page 55, line 18
If there were no ECT codepoint, then the router would have to set the If there were no ECT codepoint, then the router would have to set the
CE codepoint for packets from both ECN-capable and non-ECN-capable CE codepoint for packets from both ECN-capable and non-ECN-capable
flows. In this case, there would be no incentive for end-nodes to flows. In this case, there would be no incentive for end-nodes to
deploy ECN, and no viable path of incremental deployment from a non- deploy ECN, and no viable path of incremental deployment from a non-
ECN world to an ECN-capable world. Consider the first stages of such ECN world to an ECN-capable world. Consider the first stages of such
an incremental deployment, where a subset of the flows are ECN- an incremental deployment, where a subset of the flows are ECN-
capable. At the onset of congestion, when the packet capable. At the onset of congestion, when the packet
dropping/marking rate would be low, routers would only set CE dropping/marking rate would be low, routers would only set CE
codepoints, rather than dropping packets. However, only those flows codepoints, rather than dropping packets. However, only those flows
that are ECN-capable would understand and respond to CE packets. The that are ECN-capable would understand and respond to CE packets. The
result is that the ECN-capable flows would back off, and the non-ECN- result is that the ECN-capable flows would back off, and the non-
capable flows would be unaware of the ECN signals and would continue ECN-capable flows would be unaware of the ECN signals and would
to open their congestion windows. continue to open their congestion windows.
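The incremental-deployment argument can be made concrete with a small Python sketch contrasting a router that can see an ECT codepoint with one that cannot. The function names are hypothetical, and the sketch models only the router's choice in the marking regime and whether each kind of flow reacts to it.

```python
NOT_ECT, ECT0 = 0b00, 0b10


def router_action(ecn_field: int, ect_defined: bool) -> str:
    """What a mildly congested router does to one packet (marking regime)."""
    if not ect_defined:
        # No way to tell ECN-capable packets apart, so CE goes on every packet.
        return "set CE"
    # With an ECT codepoint, CE is only set on packets that declared ECN-Capability.
    return "set CE" if ecn_field != NOT_ECT else "drop"


def backs_off(ecn_capable: bool, action: str) -> bool:
    """Every flow reacts to a drop; only ECN-capable flows react to CE."""
    return action == "drop" or ecn_capable


for ect_defined in (False, True):
    for ecn_capable in (True, False):
        # Without an ECT codepoint a sender has nothing to set, so the field stays '00'.
        field = ECT0 if (ect_defined and ecn_capable) else NOT_ECT
        action = router_action(field, ect_defined)
        print(ect_defined, ecn_capable, action, backs_off(ecn_capable, action))
```

Only the combination of no ECT codepoint and a non-ECN-capable flow leaves a flow that ignores the congestion signal, which is the source of the unfairness described above.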
In this case, there are two possible outcomes: (1) the ECN-capable In this case, there are two possible outcomes: (1) the ECN-capable
flows back off, the non-ECN-capable flows get all of the bandwidth, flows back off, the non-ECN-capable flows get all of the bandwidth,
and congestion remains mild, or (2) the ECN-capable flows back off, and congestion remains mild, or (2) the ECN-capable flows back off,
the non-ECN-capable flows don't, and congestion increases until the the non-ECN-capable flows don't, and congestion increases until the
router transitions from setting the CE codepoint to dropping packets. router transitions from setting the CE codepoint to dropping packets.
While this second outcome evens out the fairness, the ECN-capable While this second outcome evens out the fairness, the ECN-capable
flows would still receive little benefit from being ECN-capable, flows would still receive little benefit from being ECN-capable,
because the increased congestion would drive the router to packet- because the increased congestion would drive the router to packet-
dropping behavior. dropping behavior.
skipping to change at page 56, line 33 skipping to change at page 56, line 24
codepoint would have to be done sparingly, and would be a less codepoint would have to be done sparingly, and would be a less
effective check against misbehaving network elements and receivers effective check against misbehaving network elements and receivers
than would be the ECN nonce. than would be the ECN nonce.
The assignment of the fourth ECN codepoint to ECT(1) precludes the The assignment of the fourth ECN codepoint to ECT(1) precludes the
use of this codepoint for some other purposes. For clarity, we use of this codepoint for some other purposes. For clarity, we
briefly list other possible purposes here. briefly list other possible purposes here.
One possibility might have been for the data sender to use the fourth One possibility might have been for the data sender to use the fourth
ECN codepoint to indicate an alternate semantics for ECN. However, ECN codepoint to indicate an alternate semantics for ECN. However,
this seems to us more appropriate to be signalled using a this seems to us more appropriate to be signaled using a
differentiated services codepoint in the DS field. differentiated services codepoint in the DS field.
A second possible use for the fourth ECN codepoint would have been to A second possible use for the fourth ECN codepoint would have been to
give the router two separate codepoints for the indication of give the router two separate codepoints for the indication of
congestion, CE(0) and CE(1), for mild and severe congestion congestion, CE(0) and CE(1), for mild and severe congestion
respectively. While this could be useful in some cases, this respectively. While this could be useful in some cases, this
certainly does not seem a compelling requirement at this point. If certainly does not seem a compelling requirement at this point. If
there were judged to be a compelling need for this, the complications there were judged to be a compelling need for this, the complications
of incremental deployment would most likely necessitate more than of incremental deployment would most likely necessitate more than
just one codepoint for this function. just one codepoint for this function.
skipping to change at page 59, line 13 skipping to change at page 59, line 5
to overcome these limitations. to overcome these limitations.
22. Historical Definitions for the IPv4 TOS Octet 22. Historical Definitions for the IPv4 TOS Octet
RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
header. In RFC 791, bits 6 and 7 of the ToS octet are listed as header. In RFC 791, bits 6 and 7 of the ToS octet are listed as
"Reserved for Future Use", and are shown set to zero. The first two "Reserved for Future Use", and are shown set to zero. The first two
fields of the ToS octet were defined as the Precedence and Type of fields of the ToS octet were defined as the Precedence and Type of
Service (TOS) fields. Service (TOS) fields.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
| PRECEDENCE | TOS | 0 | 0 | RFC 791 | PRECEDENCE | TOS | 0 | 0 | RFC 791
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
RFC 1122 included bits 6 and 7 in the TOS field, though it did not RFC 1122 included bits 6 and 7 in the TOS field, though it did not
discuss any specific use for those two bits: discuss any specific use for those two bits:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
| PRECEDENCE | TOS | RFC 1122 | PRECEDENCE | TOS | RFC 1122
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
| PRECEDENCE | TOS | MBZ | RFC 1349 | PRECEDENCE | TOS | MBZ | RFC 1349
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
Cost". In addition to the Precedence and Type of Service (TOS) Cost". In addition to the Precedence and Type of Service (TOS)
fields, the last field, MBZ (for "must be zero") was defined as fields, the last field, MBZ (for "must be zero") was defined as
currently unused. RFC 1349 stated that "The originator of a datagram currently unused. RFC 1349 stated that "The originator of a datagram
sets [the MBZ] field to zero (unless participating in an Internet sets [the MBZ] field to zero (unless participating in an Internet
protocol experiment which makes use of that bit)." protocol experiment which makes use of that bit)."
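The historical layouts differ only in how the same octet is split into fields. The Python sketch below (hypothetical function names, with bit 0 as the most significant bit as in the diagrams) decodes one octet under the RFC 791 and RFC 1349 layouts and under the current layout of bits 0-5 as the DS field codepoint and bits 6-7 as the ECN field. The value 0x02 shows why bit 6 matters: RFC 1349 reads it as the TOS value 0001, "Minimize Monetary Cost", while the current layout reads it as ECN field '10', ECT(0).

```python
def rfc791_fields(tos: int) -> dict:
    """Bits 0-2 Precedence, bits 3-5 TOS, bits 6-7 reserved (bit 0 is the MSB)."""
    return {"precedence": tos >> 5, "tos": (tos >> 2) & 0b111, "reserved": tos & 0b11}


def rfc1349_fields(tos: int) -> dict:
    """Bits 0-2 Precedence, bits 3-6 TOS, bit 7 MBZ."""
    return {"precedence": tos >> 5, "tos": (tos >> 1) & 0b1111, "mbz": tos & 0b1}


def rfc3168_fields(tos: int) -> dict:
    """Bits 0-5 DS field codepoint (RFC 2474), bits 6-7 ECN field."""
    return {"dscp": tos >> 2, "ecn": tos & 0b11}


octet = 0b0000_0010           # ECN field '10', i.e. ECT(0)
print(rfc1349_fields(octet))  # {'precedence': 0, 'tos': 1, 'mbz': 0}
print(rfc3168_fields(octet))  # {'dscp': 0, 'ecn': 2}
```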
RFC 1455 [RFC1455] defined an experimental standard that used all RFC 1455 [RFC1455] defined an experimental standard that used all
four bits in the TOS field to request a guaranteed level of link four bits in the TOS field to request a guaranteed level of link
skipping to change at page 60, line 29 skipping to change at page 60, line 20
prior to RFC 2474. Such nodes may transmit bit 6 (the first ECN bit) as prior to RFC 2474. Such nodes may transmit bit 6 (the first ECN bit) as
one for the "Minimize Monetary Cost" provision of RFC 1349 or the one for the "Minimize Monetary Cost" provision of RFC 1349 or the
experiment authorized by RFC 1455; neither this aspect of RFC 1349 experiment authorized by RFC 1455; neither this aspect of RFC 1349
nor the experiment in RFC 1455 was widely implemented or used. The nor the experiment in RFC 1455 was widely implemented or used. The
damage that could be done by a broken, non-conformant router would damage that could be done by a broken, non-conformant router would
include "erasing" the CE codepoint for an ECN-capable packet that include "erasing" the CE codepoint for an ECN-capable packet that
arrived at the router with the CE codepoint set, or setting the CE arrived at the router with the CE codepoint set, or setting the CE
codepoint even in the absence of congestion. This has been discussed codepoint even in the absence of congestion. This has been discussed
in the section on "Non-compliance in the Network". in the section on "Non-compliance in the Network".
The damage that could be done in an ECN-capable environment by a non- The damage that could be done in an ECN-capable environment by a
ECN-capable end-node transmitting packets with the ECT codepoint set non-ECN-capable end-node transmitting packets with the ECT codepoint
has been discussed in the section on "Non-compliance by the End set has been discussed in the section on "Non-compliance by the End
Nodes". Nodes".
23. IANA Considerations 23. IANA Considerations
This section contains the namespaces that have either been created in This section contains the namespaces that have either been created in
this specification, or the values assigned in existing namespaces this specification, or the values assigned in existing namespaces
managed by IANA. managed by IANA.
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet 23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet
The codepoints for the ECN Field of the IP header are specified by The codepoints for the ECN Field of the IP header are specified by
the Standards Action of this RFC, as is required by RFC 2780. the Standards Action of this RFC, as is required by RFC 2780.
When this draft is published as an RFC, IANA should create a new When this document is published as an RFC, IANA should create a new
registry, "IPv4 TOS Byte and IPv6 Traffic Class Octet", with the registry, "IPv4 TOS Byte and IPv6 Traffic Class Octet", with the
namespace as follows: namespace as follows:
IPv4 TOS Byte and IPv6 Traffic Class Octet IPv4 TOS Byte and IPv6 Traffic Class Octet
Description: The registrations are identical for IPv4 and IPv6. Description: The registrations are identical for IPv4 and IPv6.
Bits 0-5: see Differentiated Services Field Codepoints Registry Bits 0-5: see Differentiated Services Field Codepoints Registry
(http://www.iana.org/assignments/dscp-registry) (http://www.iana.org/assignments/dscp-registry)
Bits 6-7, ECN Field: Bits 6-7, ECN Field:
Binary Keyword References Binary Keyword References
------ ------- ---------- ------ ------- ----------
00 Not-ECT (Not ECN-Capable Transport) [RFC xxx] 00 Not-ECT (Not ECN-Capable Transport) [RFC 3168]
01 ECT(1) (ECN-Capable Transport(1)) [RFC xxx] 01 ECT(1) (ECN-Capable Transport(1)) [RFC 3168]
10 ECT(0) (ECN-Capable Transport(0)) [RFC xxx] 10 ECT(0) (ECN-Capable Transport(0)) [RFC 3168]
11 CE (Congestion Experienced) [RFC xxx] 11 CE (Congestion Experienced) [RFC 3168]
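The ECN field codepoints registered above can be illustrated with a short Python sketch. It is not part of either document. The constant names, the helpers `ecn_field`, `dscp` and `mark_ce`, and the choice to return `None` for a Not-ECT packet (which a congested router would drop rather than mark) are all illustrative assumptions. The RFC numbers bits from the most significant end, so bits 6-7 are the two low-order bits of the octet.

```python
from typing import Optional

# ECN field: bits 6-7 of the IPv4 TOS byte / IPv6 Traffic Class octet.
# Bit 0 is the most significant bit, so the field is the two low-order bits.
ECN_MASK = 0x03
NOT_ECT = 0b00  # Not ECN-Capable Transport
ECT_1 = 0b01    # ECN-Capable Transport(1)
ECT_0 = 0b10    # ECN-Capable Transport(0)
CE = 0b11       # Congestion Experienced

KEYWORDS = {NOT_ECT: "Not-ECT", ECT_1: "ECT(1)", ECT_0: "ECT(0)", CE: "CE"}


def ecn_field(tos: int) -> int:
    """Return the 2-bit ECN field of a TOS / Traffic Class octet."""
    return tos & ECN_MASK


def dscp(tos: int) -> int:
    """Return bits 0-5, which belong to the Differentiated Services field."""
    return (tos >> 2) & 0x3F


def mark_ce(tos: int) -> Optional[int]:
    """Hypothetical router helper: set CE on an ECN-capable packet.

    Returns None for Not-ECT packets, which would be dropped instead of
    marked. The DSCP bits are left untouched.
    """
    if ecn_field(tos) == NOT_ECT:
        return None
    return tos | CE
```

For example, a TOS byte of 0xBA carries DSCP 46 with ECT(0), and `mark_ce(0xBA)` yields 0xBB (CE) while leaving the DSCP unchanged.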
23.2. TCP Header Flags 23.2. TCP Header Flags
The codepoints for the CWR and ECE flags in the TCP header are The codepoints for the CWR and ECE flags in the TCP header are
specified by the Standards Action of this RFC, as is required by RFC specified by the Standards Action of this RFC, as is required by RFC
2780. 2780.
When this draft is published as an RFC, IANA should create a new When this document is published as an RFC, IANA should create a new
registry, "TCP Header Flags", with the namespace as follows: registry, "TCP Header Flags", with the namespace as follows:
TCP Header Flags TCP Header Flags
The Transmission Control Protocol (TCP) included a 6-bit Reserved The Transmission Control Protocol (TCP) included a 6-bit Reserved
field defined in RFC 793, reserved for future use, in bytes field defined in RFC 793, reserved for future use, in bytes 13 and 14
13 and 14 of the TCP header, as illustrated below. The other six of the TCP header, as illustrated below. The other six Control bits
Control bits are defined separately by RFC 793. are defined separately by RFC 793.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | U | A | P | R | S | F | | | | U | A | P | R | S | F |
| Header Length | Reserved | R | C | S | S | Y | I | | Header Length | Reserved | R | C | S | S | Y | I |
| | | G | K | H | T | N | N | | | | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
RFC xxx defines two of the six bits from the Reserved field to be RFC 3168 defines two of the six bits from the Reserved field to be
used for ECN, as follows: used for ECN, as follows:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | C | E | U | A | P | R | S | F | | | | C | E | U | A | P | R | S | F |
| Header Length | Reserved | W | C | R | C | S | S | Y | I | | Header Length | Reserved | W | C | R | C | S | S | Y | I |
| | | R | E | G | K | H | T | N | N | | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
TCP Header Flags TCP Header Flags
Bit Name Reference Bit Name Reference
--- ---- --------- --- ---- ---------
8 CWR (Congestion Window Reduced) [RFC xxx] 8 CWR (Congestion Window Reduced) [RFC 3168]
9 ECE (ECN-Echo) [RFC xxx] 9 ECE (ECN-Echo) [RFC 3168]
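The flag layout in the figures above can be sketched in Python as follows. This is an illustration only and appears in neither document. The helper names are hypothetical. The two ECN-setup checks assume the SYN/SYN-ACK negotiation that RFC 3168 specifies elsewhere: an ECN-setup SYN sets both ECE and CWR, and an ECN-setup SYN-ACK sets ECE but not CWR. Bytes 13 and 14 of the header correspond to zero-based offsets 12 and 13.

```python
import struct
from typing import Tuple

# Masks within byte 14 of the TCP header (zero-based offset 13). Bit 8 of
# the 16-bit layout in the figure is the most significant bit of that byte.
CWR = 0x80  # bit 8, Congestion Window Reduced
ECE = 0x40  # bit 9, ECN-Echo
URG = 0x20
ACK = 0x10
PSH = 0x08
RST = 0x04
SYN = 0x02
FIN = 0x01


def parse_length_and_flags(header: bytes) -> Tuple[int, int]:
    """Return (header length in 32-bit words, flags byte) from a TCP header."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (word,) = struct.unpack_from("!H", header, 12)  # bytes 13-14, big-endian
    header_length = word >> 12  # bits 0-3
    flags = word & 0xFF         # bits 8-15: CWR, ECE, URG ... FIN
    return header_length, flags


def is_ecn_setup_syn(flags: int) -> bool:
    """SYN without ACK, with both ECE and CWR set."""
    return (flags & (SYN | ACK | ECE | CWR)) == (SYN | ECE | CWR)


def is_ecn_setup_syn_ack(flags: int) -> bool:
    """SYN-ACK with ECE set and CWR clear."""
    return (flags & (SYN | ACK | ECE | CWR)) == (SYN | ACK | ECE)
```

A minimal 20-byte header with 0x50 at offset 12 and SYN|ECE|CWR (0xC2) at offset 13 parses as a 5-word header carrying an ECN-setup SYN.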
23.3. IPSEC Security Association Attributes 23.3. IPSEC Security Association Attributes
IANA allocated the IPSEC Security Association Attribute value 10 for IANA allocated the IPSEC Security Association Attribute value 10 for
the ECN Tunnel use described in Section 9.2.1.2 above at the request the ECN Tunnel use described in Section 9.2.1.2 above at the request
of David Black in November 1999. When this draft is published as an of David Black in November 1999. The IANA has changed the Reference
RFC, IANA should change the Reference for this allocation from David for this allocation from David Black's request to this RFC.
Black's request to this RFC based on its RFC number.
AUTHORS' ADDRESSES 24. Authors' Addresses
K. K. Ramakrishnan K. K. Ramakrishnan
TeraOptic Networks, Inc. TeraOptic Networks, Inc.
Phone: +1 (408) 666-8650 Phone: +1 (408) 666-8650
Email: kk@teraoptic.com EMail: kk@teraoptic.com
Sally Floyd Sally Floyd
Phone: +1 (510) 666-2989
ACIRI ACIRI
Email: floyd@aciri.org
Phone: +1 (510) 666-2989
EMail: floyd@aciri.org
URL: http://www.aciri.org/floyd/ URL: http://www.aciri.org/floyd/
David L. Black David L. Black
EMC Corporation EMC Corporation
42 South St. 42 South St.
Hopkinton, MA 01748 Hopkinton, MA 01748
Phone: +1 (508) 435-1000 x75140 Phone: +1 (508) 435-1000 x75140
Email: black_david@emc.com EMail: black_david@emc.com
This draft was created in June 2001. 25. Full Copyright Statement
It expires December 2001.
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
 End of changes. 146 change blocks. 
525 lines changed or deleted 524 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/