draft-ietf-tsvwg-ecn-00.txt   draft-ietf-tsvwg-ecn-01.txt 
Internet Engineering Task Force                     K. K. Ramakrishnan
INTERNET DRAFT                                      TeraOptic Networks
draft-ietf-tsvwg-ecn-00.txt / draft-ietf-tsvwg-ecn-01.txt   Sally Floyd
                                                                 ACIRI
                                                              D. Black
                                                                   EMC
                                        November, 2000 / January, 2001
Expires: May, 2001 / July, 2001
The Addition of Explicit Congestion Notification (ECN) to IP
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 39 skipping to change at page 1, line 39
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document specifies the incorporation of ECN (Explicit Congestion
Notification) into TCP and IP, including ECN's use of two bits in the
IP header. We begin by describing TCP's use of packet drops as an
indication of congestion. Next we explain that with the addition of
active queue management (e.g., RED) to the Internet infrastructure,
where routers detect congestion before the queue overflows, routers
are no longer limited to packet drops as an indication of congestion.
Routers can instead set the Congestion Experienced (CE) bit in the IP
header of packets from ECN-capable transports. We describe when the
CE bit is to be set in routers, and describe modifications needed to
TCP to make it ECN-capable. Modifications to other transport
protocols (e.g., unreliable unicast or multicast, reliable multicast,
other reliable unicast transport protocols) could be considered as
those protocols are developed and advance through the standards
process.
We also describe in this document the issues involving the use of ECN
within IP tunnels, and within IPsec tunnels in particular.
One of the guiding principles for this document is that all the
mechanisms specified here are incrementally deployable.
Table of Contents
1. Introduction
2. Conventions and Acronyms
3. Assumptions and General Principles
4. Active Queue Management (AQM)
5. Explicit Congestion Notification in IP
5.1. ECN as an Indication of Persistent Congestion
5.2. Dropped or Corrupted Packets
6. Support from the Transport Protocol
6.1. TCP
6.1.1. TCP Initialization
6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field
6.1.1.2. Robust TCP Initialization with no response to the SYN
6.1.2. The TCP Sender
6.1.3. The TCP Receiver
6.1.4. Congestion on the ACK-path
6.1.5. Retransmitted TCP packets
6.1.6. TCP Window Probes.
7. Non-compliance by the End Nodes
8. Non-compliance in the Network
8.1. Complications Introduced by Split Paths
9. Encapsulated Packets
9.1. IP packets encapsulated in IP
9.1.1. The Limited-functionality and Full-functionality Options
9.1.2. Changes to the ECN Field within an IP Tunnel.
9.2. IPsec Tunnels
9.2.1. Negotiation between Tunnel Endpoints
9.2.1.1. ECN Tunnel Security Association Database Field
9.2.1.2. ECN Tunnel Security Association Attribute
9.2.1.3. Changes to IPsec Tunnel Header Processing
9.2.2. Changes to the ECN Field within an IPsec Tunnel.
9.2.3. Comments for IPsec Support
9.3. IP packets encapsulated in non-IP packet headers.
10. Issues Raised by Monitoring and Policing Devices
skipping to change at page 4, line 9 skipping to change at page 4, line 8
18.1.2. Falsely Reporting Congestion
18.1.3. Disabling ECN-Capability
18.1.4. Falsely Indicating ECN-Capability
18.1.5. Changes with No Functional Effect
18.2. Information carried in the Transport Header
18.3. Split Paths
19. Implications of Subverting End-to-End Congestion Control
19.1. Implications for the Network and for Competing Flows
19.2. Implications for the Subverted Flow
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control
20. The Motivation for the ECT bit.
21. Why use Two Bits in the IP Header?
22. Historical Definitions for the IPv4 TOS Octet
23. IANA Considerations
RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare
this with draft-ietf-tsvwg-ecn-00, compare the following:
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-00.troff"
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff"
Changes from draft-ietf-tsvwg-ecn-00:
* Deleted Section 6.1.1.2. on "Robust TCP Initialization with no
response to the SYN", and modified the paragraph in the Conclusions
referring to this.
* Added Section 23 on IANA Considerations.
* Added two paragraphs to Section 18.2 on denial-of-service attacks.
* Added some text about the ECN nonce being a research issue.
* Moved two paragraphs about setting the CWR bit from Section 6.1.3 to
Section 6.1.2.
* Various small changes:
Added several small clarifying sentences in Sections 12 and 22.
Small clarification to text in Section 19.2.
Deleted a few unnecessary sentences in Section 9.
Updated some references to Section X.
Added more references to RFC 2780.
Deleted references to internet-drafts.
Clarified terminology for "non-ECN-setup SYN packet", including the
following: "Receivers MUST correctly handle all forms of the non-ECN-
setup SYN and SYN-ACK packets."
1. Introduction
TCP's congestion control and avoidance algorithms are based on the
notion that the network is a black-box [Jacobson88, Jacobson90]. The
network's state of congestion or otherwise is determined by end-systems
probing for the network state, by gradually increasing the load
on the network (by increasing the window of packets that are
outstanding in the network) until the network becomes congested and a
packet is lost. Treating the network as a "black-box" and treating
skipping to change at page 5, line 20 skipping to change at page 5, line 44
Active queue management mechanisms may use one of several methods for
indicating congestion to end-nodes. One is to use packet drops, as is
currently done. However, active queue management allows the router to
separate policies of queueing or dropping packets from the policies
for indicating congestion. Thus, active queue management allows
routers to use the Congestion Experienced (CE) bit in a packet header
as an indication of congestion, instead of relying solely on packet
drops. This has the potential of reducing the impact of loss on
latency-sensitive flows.
This document is intended to obsolete RFC 2481, "A Proposal to add
Explicit Congestion Notification (ECN) to IP", which defined ECN as
an Experimental Protocol for the Internet Community.
RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This
document obsoletes three subsequent internet-drafts on ECN, "IPsec
Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP
with ECN: The Treatment of Retransmitted Data Packets". This
document is largely intended to merge all of the earlier documents
into a single document, for greater clarity, in preparation for
becoming a Proposed Standard.
2. Conventions and Acronyms
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [B97].
3. Assumptions and General Principles
In this section, we describe some of the important design principles
and assumptions that guided the design choices in this proposal.
skipping to change at page 7, line 5 skipping to change at page 7, line 42
with two bits. The ECN-Capable Transport (ECT) bit is set by the
data sender to indicate that the end-points of the transport protocol
are ECN-capable. The CE bit is set by the router to indicate
congestion to the end nodes. Routers that have a packet arriving at a
full queue drop the packet, just as they do in the absence of ECN.
Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE
bit. The IPv4 TOS octet corresponds to the Traffic Class octet in
IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6
Traffic Class octet have been superseded by the six-bit DS
(Differentiated Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are
listed in [RFC2474] as Currently Unused, and are specified in RFC 2780
as approved for experimental use for ECN. Section 22 gives a brief
history of the TOS octet.
   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|             DS FIELD              | ECN FIELD |
|                                   |           |
|               DSCP                | ECT | CE  |
+-----+-----+-----+-----+-----+-----+-----+-----+
DSCP: differentiated services codepoint
ECN: Explicit Congestion Notification
Figure 1: The Differentiated Services and ECN Fields in IP.
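The following is a minimal Python sketch of the layout in Figure 1. It splits a TOS (or IPv6 Traffic Class) octet into its DSCP and ECN bits, using the ECT/CE bit definitions given in this section. It assumes the bit numbering of Figure 1, where bit 0 is the most significant bit. Under that numbering, bit 6 (ECT) has the value 0x02 and bit 7 (CE) has the value 0x01. The names `parse_tos`, `set_ce`, and `TosFields` are hypothetical and serve only as illustration.

```python
from typing import NamedTuple

# Hypothetical masks for the ECN field as laid out in Figure 1
# (bit 0 is the most significant bit of the octet).
ECT_MASK = 0x02  # bit 6: ECN-Capable Transport
CE_MASK = 0x01   # bit 7: Congestion Experienced


class TosFields(NamedTuple):
    dscp: int  # bits 0-5, the six-bit DS codepoint
    ect: bool  # bit 6
    ce: bool   # bit 7


def parse_tos(octet: int) -> TosFields:
    """Split an IPv4 TOS / IPv6 Traffic Class octet into DSCP, ECT, and CE."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return TosFields(
        dscp=octet >> 2,             # top six bits
        ect=bool(octet & ECT_MASK),  # bit 6
        ce=bool(octet & CE_MASK),    # bit 7
    )


def set_ce(octet: int) -> int:
    """Return the octet with the CE bit set; DSCP and ECT are unchanged."""
    return octet | CE_MASK


# Example: DSCP 46 with the ECT bit set.
tos = (46 << 2) | ECT_MASK
print(parse_tos(tos))          # TosFields(dscp=46, ect=True, ce=False)
print(parse_tos(set_ce(tos)))  # TosFields(dscp=46, ect=True, ce=True)
```

Setting CE touches only the low-order bit, so a router can mark a packet without disturbing the DSCP.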
Because of the unstable history of the TOS octet, the use of the ECN
field as specified in this document cannot be guaranteed to be
backwards compatible with all past uses of these two bits. The
potential dangers of this lack of backwards compatibility are
discussed in Section 22.
Upon the receipt by an ECN-Capable transport of a single CE packet,
the congestion control algorithms followed at the end-systems MUST be
essentially the same as the congestion control response to a *single*
skipping to change at page 8, line 24 skipping to change at page 9, line 19
control mechanisms for end-node reaction to CE packets. However,
this is a research issue, and as such is not addressed in this
document.
When a CE packet (i.e., a packet that has the CE bit set) is received
by a router, the CE bit is left unchanged, and the packet is
transmitted as usual. When severe congestion has occurred and the
router's queue is full, then the router has no choice but to drop some
packet when a new packet arrives. We anticipate that such packet
losses will become relatively infrequent when a majority of
end-systems become ECN-Capable and participate in TCP or other
compatible congestion control mechanisms. In an adequately
provisioned ECN-Capable network, packet losses should occur primarily
during transients or in the presence of non-cooperating sources.
We expect that routers will set the CE bit in response to incipient
congestion as indicated by the average queue size, using the RED
algorithms suggested in [FJ93, RFC2309]. To the best of our
knowledge, this is the only proposal currently under discussion in
the IETF for routers to drop packets proactively, before the buffer
overflows. However, this document does not attempt to specify a
particular mechanism for active queue management, leaving that
endeavor, if needed, to other areas of the IETF. While ECN is
inextricably tied up with the need to have a reasonable active queue
management mechanism at the router, the reverse does not hold; active
queue management mechanisms have been developed and deployed
independently of ECN, using packet drops as indications of congestion
in the absence of ECN in the IP architecture.
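The router behavior described above can be sketched in Python under these assumptions, loosely following RED [FJ93]:

- The marking decision uses an exponentially weighted average of the queue size, not the instantaneous size.
- An ECT packet chosen for a congestion indication gets the CE bit set instead of being dropped.
- A non-ECT packet chosen for a congestion indication is dropped.
- A packet arriving at a full queue is always dropped.
- A packet that already has CE set is forwarded with CE unchanged.

The parameter names and values (`weight`, `min_th`, `max_th`, `max_p`, `capacity`) are illustrative assumptions, not values from this document. The sketch omits RED's count-based spacing of marks. It also marks, rather than drops, ECT packets when the average exceeds `max_th`; some RED variants drop in that region.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Packet:
    ect: bool         # sender set the ECT bit
    ce: bool = False  # Congestion Experienced


@dataclass
class RedEcnQueue:
    # Illustrative parameters only; this document does not specify them.
    capacity: int = 100    # physical queue limit, in packets
    weight: float = 0.002  # EWMA weight for the average queue size
    min_th: float = 5.0
    max_th: float = 15.0
    max_p: float = 0.1
    avg: float = 0.0
    queue: list = field(default_factory=list)
    rng: random.Random = field(default_factory=random.Random)

    def enqueue(self, pkt: Packet) -> str:
        """Return "drop", "mark", or "enqueue" for the arriving packet."""
        # Use the *average* queue size so that transient bursts in the
        # instantaneous queue do not trigger congestion indications.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)

        if len(self.queue) >= self.capacity:
            return "drop"  # full queue: drop, just as without ECN

        if self.avg < self.min_th:
            congested = False
        elif self.avg >= self.max_th:
            congested = True
        else:
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            congested = self.rng.random() < p

        if congested:
            if not pkt.ect:
                return "drop"  # not ECN-capable: indicate congestion by dropping
            pkt.ce = True      # ECN-capable: set CE instead of dropping
            self.queue.append(pkt)
            return "mark"

        # A packet whose CE bit was set upstream is forwarded with CE unchanged.
        self.queue.append(pkt)
        return "enqueue"
```

For instance, with `weight=0.0` the average stays frozen at its initial value. `RedEcnQueue(weight=0.0, avg=20.0)` therefore marks every ECT packet and drops every non-ECT packet, because the average sits above `max_th`.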
5.1. ECN as an Indication of Persistent Congestion
We emphasize that a *single* packet with the CE bit set in its IP
header causes the transport layer to respond, in terms of congestion
control, as it would to a packet drop. The instantaneous queue size
is likely to see considerable variations even when the router does
not experience persistent congestion. As such, it is important that
transient congestion at a router, reflected by the instantaneous
queue size reaching a threshold much smaller than the capacity of the
queue, not trigger a reaction at the transport layer. Therefore, the
CE bit should not be set by a router based on the instantaneous queue
skipping to change at page 12, line 36 skipping to change at page 13, line 32
packet. This indicates to the routers that they may mark this packet
with the CE bit, if they would like to use that as a method of
congestion notification. If the TCP connection does not wish to use
ECN notification for a particular packet, the sending TCP sets the
ECT bit equal to 0 (i.e., not set), and the TCP receiver ignores the
CE bit in the received packet.
For this discussion, we designate the initiating host as Host A and
the responding host as Host B. We call a SYN packet with the ECE and
CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
with at least one of the ECE and CWR flags not set a "non-ECN-setup
SYN packet". Similarly, we call a SYN-ACK packet with only the ECE
flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and
we call a SYN-ACK packet with any other configuration of the ECE and
CWR flags a "non-ECN-setup SYN-ACK packet".
Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN
packet, the setting of both ECE and CWR in the ECN-setup SYN packet
is defined as an indication that the sending TCP is ECN-Capable,
rather than as an indication of congestion or of response to
congestion. More precisely, an ECN-setup SYN packet indicates that the
TCP implementation transmitting the SYN packet will participate in ECN
as both a sender and receiver. Specifically, as a receiver, it will
respond to incoming data packets that have the CE bit set in the IP
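The definitions of ECN-setup and non-ECN-setup packets reduce to two predicates on the ECE and CWR flags. The Python sketch below models the flags as booleans and makes no assumption about their bit positions in the TCP header. The function names are hypothetical.

```python
def is_ecn_setup_syn(ece: bool, cwr: bool) -> bool:
    """A SYN is ECN-setup only if both ECE and CWR are set."""
    return ece and cwr


def is_ecn_setup_syn_ack(ece: bool, cwr: bool) -> bool:
    """A SYN-ACK is ECN-setup only if ECE is set and CWR is not."""
    return ece and not cwr


# Print the classification for every combination of the two flags.
for ece in (False, True):
    for cwr in (False, True):
        syn = "ECN-setup" if is_ecn_setup_syn(ece, cwr) else "non-ECN-setup"
        syn_ack = "ECN-setup" if is_ecn_setup_syn_ack(ece, cwr) else "non-ECN-setup"
        print(f"ECE={int(ece)} CWR={int(cwr)}: SYN {syn}, SYN-ACK {syn_ack}")
```

Each predicate accepts exactly one of the four flag combinations, and the accepted combination differs between SYN and SYN-ACK.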
skipping to change at page 13, line 38 skipping to change at page 14, line 32
non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has
received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK
packet, then it SHOULD NOT set ECT on data packets.
* If a host ever sets the ECT bit on a data packet, then that host
MUST correctly set/clear the CWR TCP bit on all subsequent packets in
the connection.
* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK
packet, has received no non-ECN-setup SYN or non-ECN-setup SYN-ACK
packet, and receives TCP data packets with the ECT and CE bits set in
the IP header, then that host MUST process these packets as specified
for an ECN-capable connection.
* A host that is not willing to use ECN on a TCP connection SHOULD
clear both the ECE and CWR flags in all non-ECN-setup SYN and/or
SYN-ACK packets that it sends to indicate this unwillingness.
Receivers MUST correctly handle all forms of the non-ECN-setup SYN
and SYN-ACK packets.
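The per-host bookkeeping implied by these rules can be sketched in Python as follows. This is a simplified model, not an implementation. The requirement that a host must have both sent and received an ECN-setup packet before setting ECT is an assumption drawn from the negotiation described above, in which Host A sends an ECN-setup SYN and Host B answers with an ECN-setup SYN-ACK. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EcnNegotiation:
    """Hypothetical per-connection record of SYN / SYN-ACK packets seen."""
    sent_ecn_setup: bool = False
    received_ecn_setup: bool = False
    received_non_ecn_setup: bool = False

    def on_send_setup(self, ecn_setup: bool) -> None:
        # Record an outgoing SYN or SYN-ACK.
        if ecn_setup:
            self.sent_ecn_setup = True

    def on_receive_setup(self, ecn_setup: bool) -> None:
        # Record an incoming SYN or SYN-ACK.
        if ecn_setup:
            self.received_ecn_setup = True
        else:
            self.received_non_ecn_setup = True

    def may_set_ect(self) -> bool:
        # Assumption: an ECN-setup packet must have been both sent and
        # received; any non-ECN-setup packet received rules out ECT.
        return (self.sent_ecn_setup and self.received_ecn_setup
                and not self.received_non_ecn_setup)

    def must_process_ce(self) -> bool:
        # Sent at least one ECN-setup packet and received no non-ECN-setup
        # packet: treat incoming ECT+CE data as an ECN-capable connection.
        return self.sent_ecn_setup and not self.received_non_ecn_setup
```

For example, suppose a host sends an ECN-setup SYN and later a non-ECN-setup SYN, and receives only a non-ECN-setup SYN-ACK. That host ends with `may_set_ect()` false, because it received a non-ECN-setup packet.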
6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field
There is the question of why we chose to have the TCP sending the SYN
set two ECN-related flags in the Reserved field of the TCP header for
the SYN packet, while the responding TCP sending the SYN-ACK sets
only one ECN-related flag in the SYN-ACK packet. This asymmetry is
necessary for the robust negotiation of ECN-capability with some
deployed TCP implementations. There exists at least one faulty TCP
implementation in which TCP receivers set the Reserved field of the
TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
the Reserved field of the TCP header in the received data packet.
Because the TCP SYN packet sets the ECN-Echo and CWR flags to
indicate ECN-capability, while the SYN-ACK packet sets only the
ECN-Echo flag, the sending TCP correctly interprets a receiver's
reflection of its own flags in the Reserved field as an indication
that the receiver is not ECN-capable. The sending TCP is not misled
by a faulty TCP implementation sending a SYN-ACK packet that simply
reflects the Reserved field of the incoming SYN packet.
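The Python sketch below illustrates why the asymmetry protects the sender. A hypothetical faulty receiver copies the SYN's ECE and CWR flags into its SYN-ACK. The asymmetric rule (ECE set, CWR clear) rejects the reflection. A counterfactual symmetric rule, in which the SYN-ACK would carry the same pattern as the SYN, would accept it. All function names are illustrative.

```python
from typing import Tuple

# (ece, cwr) flag pair carried in the TCP header's Reserved field.
Flags = Tuple[bool, bool]


def faulty_reflecting_receiver(syn: Flags) -> Flags:
    """Hypothetical faulty TCP: copies the SYN's Reserved bits into the SYN-ACK."""
    return syn


def ecn_capable_receiver(syn: Flags) -> Flags:
    """ECN-capable TCP: answers an ECN-setup SYN with ECE set and CWR clear."""
    ece, cwr = syn
    return (True, False) if (ece and cwr) else (False, False)


def sender_accepts(syn_ack: Flags) -> bool:
    """Asymmetric rule: only ECE set with CWR clear counts as ECN-setup."""
    ece, cwr = syn_ack
    return ece and not cwr


def symmetric_rule(syn_ack: Flags) -> bool:
    """Counterfactual rule: the SYN-ACK reuses the SYN's ECE+CWR pattern."""
    ece, cwr = syn_ack
    return ece and cwr


ecn_setup_syn: Flags = (True, True)
reflected = faulty_reflecting_receiver(ecn_setup_syn)
print(sender_accepts(reflected))                            # False
print(sender_accepts(ecn_capable_receiver(ecn_setup_syn)))  # True
print(symmetric_rule(reflected))                            # True (misleading)
```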
6.1.1.2. Robust TCP Initialization with no response to the SYN
ECN introduces the use of the ECN-Echo and CWR flags in the TCP
header (as shown in Figure 3) for initialization. There exists some
faulty equipment in the Internet that either ignores an ECN-setup SYN
packet or responds with a RST, in the belief that such a packet (with
these bits set) is a signature for a port-scanning tool that could be
used in a denial-of-service attack. To provide robust connectivity
even in the presence of such faulty equipment, a host that receives a
RST in response to the transmission of an ECN-setup SYN packet MAY
resend a SYN with CWR and ECE cleared. This could result in a TCP
connection being established without using ECN. Similarly, a host
that receives no reply to an ECN-setup SYN within the normal SYN
retransmission timeout interval MAY resend the SYN and any subsequent
SYN retransmissions with CWR and ECE cleared. To overcome normal
packet loss that results in the original SYN being lost, the
originating host may retransmit one or more ECN-setup SYN packets
before giving up and retransmitting the SYN with the CWR and ECE bits
cleared.
We note that in this case, the following example scenario is
possible:
(1) Host A: Sends an ECN-setup SYN.
(2) Host B: Sends an ECN-setup SYN-ACK; the packet is dropped or
delayed.
(3) Host A: Sends a non-ECN-setup SYN.
(4) Host B: Sends a non-ECN-setup SYN-ACK.
Following the procedures above, neither Host A nor Host B may then set
the ECT bit on data packets, because each host has received at least
one non-ECN-setup packet. We further note that a host NEVER uses the
reception of ECT data packets as an implicit signal that the other
host is ECN-capable.
6.1.2. The TCP Sender
For a TCP connection using ECN, new data packets are transmitted with
the ECT bit set in the IP header (set to a "1"). If the sender
receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with
the ECN-Echo flag set in the TCP header), then the sender knows that
congestion was encountered in the network on the path from the sender
to the receiver. The indication of congestion should be treated just
as a congestion loss in non-ECN-Capable TCP. That is, the TCP source
halves the congestion window "cwnd" and reduces the slow start
threshold "ssthresh". The sending TCP SHOULD NOT increase the
congestion window in response to the receipt of an ECN-Echo ACK
packet.
TCP should not react to congestion indications more than once every
window of data (or more loosely, more than once every round-trip
time). That is, the TCP sender's congestion window should be reduced
only once in response to a series of dropped and/or CE packets from a
single window of data. In addition, the TCP source should not
decrease the slow-start threshold, ssthresh, if it has been decreased
skipping to change at page 15, line 37 skipping to change at page 15, line 50
tinue to send, using a congestion window of 1 MSS, this results in tinue to send, using a congestion window of 1 MSS, this results in
the transmission of one packet per round-trip time. It is necessary the transmission of one packet per round-trip time. It is necessary
to still reduce the sending rate of the TCP sender even further, on to still reduce the sending rate of the TCP sender even further, on
receipt of an ECN-Echo packet when the congestion window is one. We receipt of an ECN-Echo packet when the congestion window is one. We
use the retransmit timer as a means of reducing the rate further in use the retransmit timer as a means of reducing the rate further in
this circumstance. Therefore, the sending TCP MUST reset the this circumstance. Therefore, the sending TCP MUST reset the
retransmit timer on receiving the ECN-Echo packet when the congestion retransmit timer on receiving the ECN-Echo packet when the congestion
window is one. The sending TCP will then be able to send a new window is one. The sending TCP will then be able to send a new
packet only when the retransmit timer expires. packet only when the retransmit timer expires.
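The sender-side reaction described above can be sketched as follows. This is a minimal, non-normative illustration that rests on several assumptions. Window sizes are counted in whole segments. cwnd and ssthresh are both set to half the current window, with cwnd floored at one segment. "Once per window of data" is tracked with a hypothetical `recover` sequence number that is recorded at each reduction. The class and field names are illustrative and are not taken from any TCP implementation.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    """Hypothetical, simplified ECN-capable TCP sender (units: segments)."""
    cwnd: int
    ssthresh: int
    snd_nxt: int                 # next new sequence number to be sent
    recover: int = -1            # snd_nxt at the time of the last reduction
    rto_restarted: bool = False  # set when the retransmit timer is reset

    def on_ecn_echo(self, ack: int) -> bool:
        """Handle an ACK with ECN-Echo set; return True if the sender reacted."""
        # React at most once per window of data: an ECN-Echo that only
        # acknowledges data sent before the last reduction is ignored.
        if ack <= self.recover:
            return False
        if self.cwnd == 1:
            # The window cannot shrink further, so reset the retransmit
            # timer; a new packet may be sent only when it expires.
            self.rto_restarted = True
        # Halve the window and reduce ssthresh; cwnd is never increased here.
        self.ssthresh = max(self.cwnd // 2, 1)
        self.cwnd = self.ssthresh
        self.recover = self.snd_nxt
        return True
```

For example, a sender with cwnd=10 that receives an ECN-Echo ACK drops to cwnd=5. A second ECN-Echo that acknowledges only data from the same window leaves cwnd unchanged.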
When an ECN-Capable TCP sender reduces its congestion window for any
reason (because of a retransmit timeout, a Fast Retransmit, or in
response to an ECN Notification), the TCP sender sets the CWR flag in
the TCP header of the first new data packet sent after the window
reduction. If that data packet is dropped in the network, then the
sending TCP will have to reduce the congestion window again and
retransmit the dropped packet.
We ensure that the "Congestion Window Reduced" information is reli-
ably delivered to the TCP receiver. This comes about from the fact
that if the new data packet carrying the CWR flag is dropped, then
the TCP sender will have to again reduce its congestion window, and
send another new data packet with the CWR flag set. Thus, the CWR
bit in the TCP header SHOULD NOT be set on retransmitted packets.
When the TCP data sender is ready to set the CWR bit after reducing
the congestion window, it SHOULD set the CWR bit only on the first
new data packet that it transmits.
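The following is a minimal sketch of the CWR rule above. It assumes a hypothetical `CwrMarker` helper that the sender consults for each outgoing data packet. Any window reduction arms the marker. Retransmissions never carry CWR, and only the first new data packet sent afterwards does. If that packet is lost, the resulting window reduction arms the marker again, and this is what makes delivery of the CWR indication reliable.

```python
from dataclasses import dataclass


@dataclass
class CwrMarker:
    """Hypothetical sender helper deciding where the CWR flag goes."""
    pending: bool = False

    def window_reduced(self) -> None:
        # Called for any reduction: retransmit timeout, Fast Retransmit,
        # or a response to ECN-Echo.
        self.pending = True

    def cwr_flag(self, is_retransmission: bool) -> bool:
        """Return whether an outgoing data packet should carry CWR."""
        if is_retransmission or not self.pending:
            return False  # CWR is never set on retransmitted packets
        self.pending = False  # only the first new data packet carries CWR
        return True
```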
[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98]
discusses the validation test in the ns simulator, which illustrates discusses the validation test in the ns simulator, which illustrates
a wide range of ECN scenarios. These scenarios include the following: a wide range of ECN scenarios. These scenarios include the following:
an ECN followed by another ECN, a Fast Retransmit, or a Retransmit an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
Timeout; a Retransmit Timeout or a Fast Retransmit followed by an Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
ECN; and a congestion window of one packet followed by an ECN. ECN; and a congestion window of one packet followed by an ECN.
TCP follows existing algorithms for sending data packets in response TCP follows existing algorithms for sending data packets in response
to incoming ACKs, multiple duplicate acknowledgements, or retransmit to incoming ACKs, multiple duplicate acknowledgements, or retransmit
timeouts [RFC2581]. TCP also follows the normal procedures for timeouts [RFC2581]. TCP also follows the normal procedures for
skipping to change at page 16, line 23 skipping to change at page 16, line 51
all of the data packets being acknowledged. That is, if any of the all of the data packets being acknowledged. That is, if any of the
received data packets are CE packets, then the returning ACK has the received data packets are CE packets, then the returning ACK has the
ECN-Echo flag set. ECN-Echo flag set.
To provide robustness against the possibility of a dropped ACK packet To provide robustness against the possibility of a dropped ACK packet
carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
a series of ACK packets sent subsequently. The TCP receiver uses the a series of ACK packets sent subsequently. The TCP receiver uses the
CWR flag received from the TCP sender to determine when to stop set- CWR flag received from the TCP sender to determine when to stop set-
ting the ECN-Echo flag. ting the ECN-Echo flag.
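The receiver behavior described here can be sketched as follows. An ACK that covers any CE packet carries ECN-Echo, and the flag is repeated on later ACKs until a CWR packet arrives. The sketch assumes that each ACK covers a batch of data packets and that each packet is represented by a hypothetical `(ce, cwr)` pair of booleans. The class name is illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class EcnReceiver:
    """Hypothetical receiver-side ECN-Echo state."""
    echo: bool = False  # keep setting ECN-Echo until a CWR packet arrives

    def ack_for(self, packets: Iterable[Tuple[bool, bool]]) -> bool:
        """Process the (ce, cwr) data packets covered by one ACK.

        Returns whether that ACK carries the ECN-Echo flag.
        """
        any_ce = False
        for ce, cwr in packets:
            if cwr:
                self.echo = False  # sender has reduced its window
            if ce:
                self.echo = True   # a new congestion indication
                any_ce = True
        # An ACK covering any CE packet carries ECN-Echo, and the flag
        # stays set on later ACKs until a CWR packet is received.
        return self.echo or any_ce
```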
When an ECN-Capable TCP sender reduces its congestion window for any
reason (because of a retransmit timeout, a Fast Retransmit, or in
response to an ECN Notification), the TCP sender sets the CWR flag in
the TCP header of the first new data packet sent after the window
reduction. If that data packet is dropped in the network, then the
sending TCP will have to reduce the congestion window again and
retransmit the dropped packet.
We ensure that the "Congestion Window Reduced" information is reli-
ably delivered to the TCP receiver. This comes about from the fact
that if the new data packet carrying the CWR flag is dropped, then
the TCP sender will have to again reduce its congestion window, and
send another new data packet with the CWR flag set. Thus, the CWR
bit in the TCP header SHOULD NOT be set on retransmitted packets.
When the TCP data sender is ready to set the CWR bit after reducing
the congestion window, it SHOULD set the CWR bit only on the first
new data packet that it transmits.
After a TCP receiver sends an ACK packet with the ECN-Echo bit set, After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
that TCP receiver continues to set the ECN-Echo flag in all the ACK that TCP receiver continues to set the ECN-Echo flag in all the ACK
packets it sends (whether they acknowledge CE data packets or non-CE packets it sends (whether they acknowledge CE data packets or non-CE
data packets) until it receives a CWR packet (a packet with the CWR data packets) until it receives a CWR packet (a packet with the CWR
flag set). After the receipt of the CWR packet, acknowledgements for flag set). After the receipt of the CWR packet, acknowledgements for
subsequent non-CE data packets do not have the ECN-Echo flag set. If subsequent non-CE data packets do not have the ECN-Echo flag set. If
another CE packet is received by the data receiver, the receiver another CE packet is received by the data receiver, the receiver
would once again send ACK packets with the ECN-Echo flag set. While would once again send ACK packets with the ECN-Echo flag set. While
the receipt of a CWR packet does not guarantee that the data sender the receipt of a CWR packet does not guarantee that the data sender
received the ECN-Echo message, this does suggest that the data sender received the ECN-Echo message, this does suggest that the data sender
skipping to change at page 20, line 48 skipping to change at page 21, line 16
This section considers the issues when a router is operating, possi- This section considers the issues when a router is operating, possi-
bly maliciously, to modify either of the bits in the ECN field. In bly maliciously, to modify either of the bits in the ECN field. In
this section we represent the ECN field in the IP header by the tuple this section we represent the ECN field in the IP header by the tuple
(ECT bit, CE bit). (ECT bit, CE bit).
By tampering with the bits in the ECN field, an adversary (or a bro- By tampering with the bits in the ECN field, an adversary (or a bro-
ken router) could do one or more of the following: falsely report ken router) could do one or more of the following: falsely report
congestion, disable ECN-Capability for an individual packet, erase congestion, disable ECN-Capability for an individual packet, erase
the ECN congestion indication, or falsely indicate ECN-Capability. the ECN congestion indication, or falsely indicate ECN-Capability.
Appendix X systematically examines the various cases by which the ECN Section 18 systematically examines the various cases by which the ECN
field could be modified. The important criterion considered in field could be modified. The important criterion considered in
determining the consequences of such modifications is whether it is determining the consequences of such modifications is whether it is
likely to lead to poorer behavior in any dimension (throughput, likely to lead to poorer behavior in any dimension (throughput,
delay, fairness or functionality) than if a router were to drop a delay, fairness or functionality) than if a router were to drop a
packet. packet.
The first two possible changes, falsely reporting congestion or dis- The first two possible changes, falsely reporting congestion or dis-
abling ECN-Capability for an individual packet, are no worse than if abling ECN-Capability for an individual packet, are no worse than if
the router were to simply drop the packet. From a congestion control the router were to simply drop the packet. From a congestion control
point of view, setting the CE bit in the absence of congestion by a point of view, setting the CE bit in the absence of congestion by a
non-compliant router would be no worse than a router dropping a non-compliant router would be no worse than a router dropping a
packet unnecessarily. By "erasing" the ECT bit of a packet that is packet unnecessarily. By "erasing" the ECT bit of a packet that is
later dropped in the network, a router's actions could result in an later dropped in the network, a router's actions could result in an
unnecessary packet drop for that packet later in the network. unnecessary packet drop for that packet later in the network.
However, as discussed in Section X in the Appendix, a router that However, as discussed in Section 18, a router that erases the ECN
erases the ECN congestion indication or falsely indicates congestion indication or falsely indicates ECN-Capability could
ECN-Capability could potentially do more damage to the flow than if it had potentially do more damage to the flow than if it had simply dropped
simply dropped the packet. A rogue or broken router that "erased" the packet. A rogue or broken router that "erased" the CE bit in
the CE bit in arriving CE packets would prevent that indication of arriving CE packets would prevent that indication of congestion from
congestion from reaching downstream receivers. This could result in reaching downstream receivers. This could result in the failure of
the failure of congestion control for that flow and a resulting congestion control for that flow and a resulting increase in conges-
increase in congestion in the network, ultimately resulting in subse- tion in the network, ultimately resulting in subsequent packets
quent packets dropped for this flow as the average queue size dropped for this flow as the average queue size increased at the con-
increased at the congested gateway. gested gateway.
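To make the (ECT bit, CE bit) notation concrete, the sketch below maps a before/after pair of tuples to the four kinds of tampering listed above. For each change, it also reports whether the text judges that change to be no worse than dropping the packet. The sketch assumes that the modifying router is not itself congested, so a newly set CE bit counts as a false report. The function name is hypothetical.

```python
from typing import List, Tuple

Bits = Tuple[int, int]  # (ECT bit, CE bit)


def classify_change(before: Bits, after: Bits) -> List[Tuple[str, bool]]:
    """Return (kind of modification, no worse than a drop?) pairs."""
    (ect0, ce0), (ect1, ce1) = before, after
    changes = []
    if ce0 == 0 and ce1 == 1:
        changes.append(("falsely report congestion", True))
    if ect0 == 1 and ect1 == 0:
        changes.append(("disable ECN-Capability", True))
    if ce0 == 1 and ce1 == 0:
        changes.append(("erase congestion indication", False))
    if ect0 == 0 and ect1 == 1:
        changes.append(("falsely indicate ECN-Capability", False))
    return changes
```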
Appendix X considers the potential repercussions of subverting end- Section 19 considers the potential repercussions of subverting end-
to-end congestion control by either falsely indicating ECN-Capabil- to-end congestion control by either falsely indicating ECN-Capabil-
ity, or by erasing the congestion indication in ECN (the CE-bit). We ity, or by erasing the congestion indication in ECN (the CE-bit). We
observe in the Appendix that the consequence of subverting ECN-based observe in Section 19 that the consequence of subverting ECN-based
congestion control may lead to potential unfairness, but this is congestion control may lead to potential unfairness, but this is
likely to be no worse than the subversion of either ECN-based or likely to be no worse than the subversion of either ECN-based or
packet-based congestion control by the end nodes. packet-based congestion control by the end nodes.
8.1. Complications Introduced by Split Paths 8.1. Complications Introduced by Split Paths
If a router or other network element has access to all of the packets If a router or other network element has access to all of the packets
of a flow, then that router could do no more damage to a flow by of a flow, then that router could do no more damage to a flow by
altering the ECN field than it could by simply dropping all of the altering the ECN field than it could by simply dropping all of the
packets from that flow. However, in some cases, a malicious or bro- packets from that flow. However, in some cases, a malicious or bro-
ken router might have access to only a subset of the packets from a ken router might have access to only a subset of the packets from a
flow. The question is as follows: can this router, by altering the flow. The question is as follows: can this router, by altering the
ECN field in this subset of the packets, do more damage to that flow ECN field in this subset of the packets, do more damage to that flow
than if it had simply dropped that set of packets? than if it had simply dropped that set of packets?
This is also discussed in detail in the Appendix, which concludes as This is also discussed in detail in Section 18, which concludes as
follows: It is true that the adversary that has access only to a follows: It is true that the adversary that has access only to a
subset of packets in an aggregate might, by subverting ECN-based con- subset of packets in an aggregate might, by subverting ECN-based con-
gestion control, be able to deny the benefits of ECN to the other gestion control, be able to deny the benefits of ECN to the other
packets in the aggregate. While this is undesirable, this is not a packets in the aggregate. While this is undesirable, this is not a
sufficient concern to result in disabling ECN within an IP tunnel. sufficient concern to result in disabling ECN.
9. Encapsulated Packets 9. Encapsulated Packets
9.1. IP packets encapsulated in IP 9.1. IP packets encapsulated in IP
The encapsulation of IP packet headers in tunnels is used in many The encapsulation of IP packet headers in tunnels is used in many
places, including IPsec and IP in IP [RFC2003]. Currently, the ECN places, including IPsec and IP in IP [RFC2003]. This section consid-
specification does not accommodate the constraints imposed by some of
these pre-existing specifications for tunnels. This document consid-
ers issues related to interactions between ECN and IP tunnels, and ers issues related to interactions between ECN and IP tunnels, and
specifies two alternative solutions. specifies two alternative solutions. This discussion is complemented
by RFC 2983's discussion of interactions between Differentiated
Services and IP tunnels of various forms [RFC2983], as Differentiated
Services uses the remaining six bits of the IP header octet that is
used by ECN (see Figure 1 in Section 5).
Some IP tunnel modes are based on adding a new "outer" IP header that Some IP tunnel modes are based on adding a new "outer" IP header that
encapsulates the original, or "inner" IP header and its associated encapsulates the original, or "inner" IP header and its associated
packet. In many cases, the new "outer" IP header may be added and packet. In many cases, the new "outer" IP header may be added and
removed at intermediate points along a connection, enabling the net- removed at intermediate points along a connection, enabling the net-
work to establish a tunnel without requiring endpoint participation. work to establish a tunnel without requiring endpoint participation.
We denote tunnels that specify that the outer header be discarded at We denote tunnels that specify that the outer header be discarded at
tunnel egress as "simple tunnels". tunnel egress as "simple tunnels".
ECN uses the ECT and CE flags in the IP header for signaling between ECN uses the ECT and CE flags in the IP header for signaling between
routers and connection endpoints. ECN interacts with IP tunnels routers and connection endpoints. ECN interacts with IP tunnels
because of the ECT and CE flags in the DS field octet in the IP based on the treatment of these flags in the IP header. In simple IP
header [RFC2474] (also referred to as the IPv4 TOS octet or IPv6 tunnels the octet containing these flags is copied or mapped from the
Traffic Class octet). [RFC2983] discusses interactions of Differen- inner IP header to the outer IP header at IP tunnel ingress, and the
tiated Services with IP tunnels of various forms. In simple IP tun- outer header's copy of this field is discarded at IP tunnel egress.
nels the DS field octet is copied or mapped from the inner IP header If the outer header were to be simply discarded without taking care
to the outer IP header at IP tunnel ingress, and the outer header's to deal with the ECN related flags, and an ECN-capable router were to
copy of this field is discarded at IP tunnel egress. If the outer set the CE (Congestion Experienced) bit within a packet in a simple
header were to be simply discarded without taking care to deal with IP tunnel, this indication would be discarded at tunnel egress, los-
the ECN related flags, and an ECN-capable router were to set the CE ing the indication of congestion.
(Congestion Experienced) bit within a packet in a simple IP tunnel,
this indication would be discarded at tunnel egress, losing the indi-
cation of congestion.
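The loss of the congestion indication in a simple IP tunnel can be illustrated by modeling the ECN flags as bits of the IP header octet. The sketch assumes that the octet is copied (not mapped) at ingress. It also assumes that ECT and CE are bits 6 and 7, with bits numbered 0 to 7 from the most significant end, which gives masks 0x02 and 0x01. The function names are hypothetical.

```python
ECT = 0x02  # bit 6 of the octet (bits numbered 0..7 from the MSB)
CE = 0x01   # bit 7


def simple_ingress(inner: int) -> int:
    """Outer header receives a copy of the inner octet."""
    return inner


def ecn_router(outer: int) -> int:
    """A congested ECN-capable router marks ECN-Capable packets with CE."""
    return outer | CE if outer & ECT else outer


def simple_egress(inner: int, outer: int) -> int:
    """Outer header is discarded; the inner octet is delivered unchanged."""
    return inner


inner = ECT                       # ECN-Capable packet from the sender
outer = ecn_router(simple_ingress(inner))
delivered = simple_egress(inner, outer)
print(bool(outer & CE), bool(delivered & CE))  # True False
```

The router's CE mark is present in the outer header, but it never reaches the receiver.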
Thus, the use of ECN over simple IP tunnels would result in routers Thus, the use of ECN over simple IP tunnels would result in routers
attempting to use the outer IP header to signal congestion to end- attempting to use the outer IP header to signal congestion to end-
points, but those congestion warnings never arriving because the points, but those congestion warnings never arriving because the
outer header is discarded at the tunnel egress point. This problem outer header is discarded at the tunnel egress point. This problem
was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec- was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec-
ommended that ECN not be used with the older simple IPsec tunnels in ommended that ECN not be used with the older simple IPsec tunnels in
order to avoid this behavior and its consequences. When ECN becomes order to avoid this behavior and its consequences. When ECN becomes
widely deployed, then simple tunnels likely to carry ECN-capable widely deployed, then simple tunnels likely to carry ECN-capable
traffic will have to be changed. traffic will have to be changed.
From a security point of view, the use of ECN in the outer header of From a security point of view, the use of ECN in the outer header of
an IP tunnel might raise security concerns because an adversary could an IP tunnel might raise security concerns because an adversary could
tamper with the ECN information that propagates beyond the tunnel tamper with the ECN information that propagates beyond the tunnel
endpoint. Based on an analysis in the Appendix of these concerns and endpoint. Based on an analysis in Sections 18 and 19 of these con-
the resultant risks, our overall approach is to make support for ECN cerns and the resultant risks, our overall approach is to make sup-
an option for IP tunnels, so that an IP tunnel can be specified or port for ECN an option for IP tunnels, so that an IP tunnel can be
configured either to use ECN or not to use ECN in the outer header of specified or configured either to use ECN or not to use ECN in the
the tunnel. Thus, in environments or tunneling protocols where the outer header of the tunnel. Thus, in environments or tunneling pro-
risks of using ECN are judged to outweigh its benefits, the tunnel tocols where the risks of using ECN are judged to outweigh its bene-
can simply not use ECN in the outer header. Then the only indication fits, the tunnel can simply not use ECN in the outer header. Then
of congestion experienced at routers within the tunnel would be the only indication of congestion experienced at routers within the
through packet loss. tunnel would be through packet loss.
The result is that there are two viable options for the behavior of The result is that there are two viable options for the behavior of
ECN-capable connections over an IP tunnel, especially IPSec tunnels: ECN-capable connections over an IP tunnel, especially IPsec tunnels:
* A limited-functionality option in which ECN is preserved in the * A limited-functionality option in which ECN is preserved in the
inner header, but disabled in the outer header. The only mecha- inner header, but disabled in the outer header. The only mecha-
nism available for signaling congestion occurring within the tun- nism available for signaling congestion occurring within the tun-
nel in this case is dropped packets. nel in this case is dropped packets.
* A full-functionality option that supports ECN in both the inner * A full-functionality option that supports ECN in both the inner
and outer headers, and propagates congestion warnings from nodes and outer headers, and propagates congestion warnings from nodes
within the tunnel to endpoints. within the tunnel to endpoints.
Support for these options requires varying amounts of changes to IP Support for these options requires varying amounts of changes to IP
header processing at tunnel ingress and egress. A small subset of header processing at tunnel ingress and egress. A small subset of
these changes sufficient to support only the limited-functionality these changes sufficient to support only the limited-functionality
option would be sufficient to eliminate any incompatibility between option would be sufficient to eliminate any incompatibility between
ECN and IP tunnels. ECN and IP tunnels.
One goal of this document is to give guidance about the tradeoffs One goal of this document is to give guidance about the tradeoffs
between the limited-functionality and full-functionality options. A between the limited-functionality and full-functionality options. A
full discussion of the potential effects of an adversary's modifica- full discussion of the potential effects of an adversary's modifica-
tions of the CE and ECT bits is given in the Appendix. tions of the CE and ECT bits is given in Sections 18 and 19.
9.1.1. The limited-functionality and full-functionality options within 9.1.1. The Limited-functionality and Full-functionality Options
IP Tunnels
The limited-functionality option for ECN encapsulation in IP tunnels The limited-functionality option for ECN encapsulation in IP tunnels
is for the ECT bit in the outside (encapsulating) header to be off is for the ECT bit in the outside (encapsulating) header to be off
(i.e., set to 0), regardless of the value of the ECT bit in the (i.e., set to 0), regardless of the value of the ECT bit in the
inside (encapsulated) header. With this option, the ECN field in the inside (encapsulated) header. With this option, the ECN field in the
inner header is not altered upon de-capsulation. The disadvantage of inner header is not altered upon de-capsulation. The disadvantage of
this approach is that the flow does not have ECN support for that this approach is that the flow does not have ECN support for that
part of the path that is using IP tunneling, even if the encapsulated part of the path that is using IP tunneling, even if the encapsulated
packet (from the original TCP sender) is ECN-Capable. That is, if packet (from the original TCP sender) is ECN-Capable. That is, if
the encapsulated packet arrives at a congested router that is ECN- the encapsulated packet arrives at a congested router that is ECN-
capable, and the router can decide to drop or mark the packet as an capable, and the router can decide to drop or mark the packet as an
indication of congestion to the end nodes, the router will not be indication of congestion to the end nodes, the router will not be
permitted to set the CE bit in the packet header, but instead will permitted to set the CE bit in the packet header, but instead will
have to drop the packet. have to drop the packet.
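The following is a sketch of the limited-functionality option. It uses the draft's bit layout, with ECT as bit 6 (mask 0x02) and CE as bit 7 (mask 0x01). The text only requires ECT to be off in the outer header. This sketch also clears CE there, which is an assumption. The router model, which must drop rather than mark when ECT is off, is likewise illustrative.

```python
from typing import Optional

ECT, CE = 0x02, 0x01  # ECT is bit 6, CE is bit 7 (bit 0 is the MSB)


def limited_ingress(inner: int) -> int:
    """Copy the octet to the outer header with ECN turned off there.

    Clearing CE as well as ECT is an assumption of this sketch.
    """
    return inner & ~(ECT | CE)


def congested_router(outer: int) -> Optional[int]:
    """Mark ECN-Capable packets; return None for a dropped packet."""
    if outer & ECT:
        return outer | CE
    return None


def limited_egress(inner: int, outer: int) -> int:
    """The inner ECN field is not altered on decapsulation."""
    return inner
```

Consider an ECN-Capable inner packet whose octet is 0xB8 | ECT. Its outer octet becomes 0xB8, so a congested router inside the tunnel has to drop the packet.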
The IP full-functionality option for ECN encapsulation is to copy the The full-functionality option for ECN encapsulation is to copy the
ECT bit of the inside header to the outside header on encapsulation, ECT bit of the inside header to the outside header on encapsulation,
and to OR the CE bit from the outer header with the CE bit of the and to OR the CE bit from the outer header with the CE bit of the
inside header on decapsulation. That is, for full ECN support the inside header on decapsulation. That is, for full ECN support the
encapsulation and decapsulation processing for the DS field octet encapsulation and decapsulation processing involves the following:
involves the following: At tunnel ingress, the full-functionality At tunnel ingress, the full-functionality option copies the value of
option copies the value of ECT (bit 6) in the inner header to the ECT (bit 6) in the inner header to the outer header. CE (bit 7) is
outer header. CE (bit 7) is set to 0 in the outer header. Upon set to 0 in the outer header. Upon decapsulation at the tunnel
decapsulation at the tunnel egress, the full-functionality option egress, the full-functionality option sets CE to 1 in the inner
sets CE to 1 in the inner header if the value of ECT (bit 6) in the header if the value of ECT (bit 6) in the inner header is 1, and the
inner header is 1, and the value of CE (bit 7) in the outer header is value of CE (bit 7) in the outer header is 1. Otherwise, no change
1. Otherwise, no change is made to this field of the inner header. is made to this field of the inner header.
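The full-functionality encapsulation and decapsulation rules above can be sketched as bit operations on the octet. As above, ECT is bit 6 (mask 0x02) and CE is bit 7 (mask 0x01). The sketch assumes that the rest of the octet is copied unchanged at ingress; the draft allows it to be either copied or mapped. The function names are hypothetical.

```python
ECT, CE = 0x02, 0x01  # ECT is bit 6, CE is bit 7 (bit 0 is the MSB)


def full_ingress(inner: int) -> int:
    """Outer octet: ECT copied from the inner header, CE set to 0."""
    return inner & ~CE


def full_egress(inner: int, outer: int) -> int:
    """Set CE in the inner header only if inner ECT and outer CE are both 1."""
    if inner & ECT and outer & CE:
        return inner | CE
    return inner


outer = full_ingress(ECT) | CE        # marked by a congested router in the tunnel
print(hex(full_egress(ECT, outer)))   # 0x3: the congestion indication survives
print(hex(full_egress(0x00, outer)))  # 0x0: inner not ECN-Capable, unchanged
```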
With the full-functionality option, a flow can take advantage of ECN With the full-functionality option, a flow can take advantage of ECN
in those parts of the path that might use IP tunneling. The disad- in those parts of the path that might use IP tunneling. The disad-
vantage of the full-functionality option from a security perspective vantage of the full-functionality option from a security perspective
is that the IP tunnel cannot protect the flow from certain modifica- is that the IP tunnel cannot protect the flow from certain modifica-
tions to the ECN bits in the IP header within the tunnel. The poten- tions to the ECN bits in the IP header within the tunnel. The poten-
tial dangers from modifications to the ECN bits in the IP header are tial dangers from modifications to the ECN bits in the IP header are
described in detail in the Appendix. described in detail in Sections 18 and 19.
(1) An IP tunnel MUST modify the handling of the DS field octet at (1) An IP tunnel MUST modify the handling of the DS field octet at
IP tunnel endpoints by implementing either the limited-functional- IP tunnel endpoints by implementing either the limited-functional-
ity or the full-functionality option. ity or the full-functionality option.
(2) Optionally, an IP tunnel MAY enable the endpoints of an IP (2) Optionally, an IP tunnel MAY enable the endpoints of an IP
tunnel to negotiate the choice between the limited-functionality tunnel to negotiate the choice between the limited-functionality
and the full-functionality option for ECN in the tunnel. and the full-functionality option for ECN in the tunnel.
The minimum required to make ECN usable with IP tunnels is the lim- The minimum required to make ECN usable with IP tunnels is the lim-
ited-functionality option, which prevents ECN from being enabled in ited-functionality option, which prevents ECN from being enabled in
skipping to change at page 24, line 48 skipping to change at page 25, line 17
support the limited-functionality or the full-functionality ECN support the limited-functionality or the full-functionality ECN
option. option.
In addition, it is RECOMMENDED that packets with ECT and CE both set In addition, it is RECOMMENDED that packets with ECT and CE both set
to 1 in the outer header be dropped if they arrive at the tunnel to 1 in the outer header be dropped if they arrive at the tunnel
egress point for a tunnel that uses the limited-functionality option, egress point for a tunnel that uses the limited-functionality option,
or for a tunnel that uses the full-functionality option but for which or for a tunnel that uses the full-functionality option but for which
the ECT bit in the inner header is set to zero. This is motivated by the ECT bit in the inner header is set to zero. This is motivated by
backwards compatibility and to ensure that no unauthorized modifica- backwards compatibility and to ensure that no unauthorized modifica-
tions of the ECN field take place, and is discussed further in the tions of the ECN field take place, and is discussed further in the
Appendix. next Section (9.1.2).
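The RECOMMENDED egress drop check can be written as a small predicate. The `option` string ("limited" or "full") and the function name are hypothetical. The bit masks follow the draft's ECT (bit 6) and CE (bit 7).

```python
ECT, CE = 0x02, 0x01  # ECT is bit 6, CE is bit 7 (bit 0 is the MSB)


def egress_should_drop(option: str, inner: int, outer: int) -> bool:
    """Apply the RECOMMENDED drop rule at tunnel egress."""
    if (outer & (ECT | CE)) != (ECT | CE):
        return False  # rule applies only when outer ECT and CE are both 1
    if option == "limited":
        return True
    if option == "full":
        return (inner & ECT) == 0  # drop if the inner header is not ECN-Capable
    raise ValueError(f"unknown tunnel option: {option!r}")
```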
9.1.2. Changes to the ECN Field within an IP Tunnel 9.1.2. Changes to the ECN Field within an IP Tunnel
The presence of a copy of the ECN field in the inner header of an IP The presence of a copy of the ECN field in the inner header of an IP
tunnel mode packet provides an opportunity for detection of unautho- tunnel mode packet provides an opportunity for detection of unautho-
rized modifications to the ECT bit in the outer header. Comparison rized modifications to the ECT bit in the outer header. Comparison
of the ECT bits in the inner and outer headers falls into two cate- of the ECT bits in the inner and outer headers falls into two cate-
gories for implementations that conform to this document: gories for implementations that conform to this document:
* If the IP tunnel uses the full-functionality option, then the * If the IP tunnel uses the full-functionality option, then the
values of the ECT bits in the inner and outer headers should be values of the ECT bits in the inner and outer headers should be
skipping to change at page 26, line 41 skipping to change at page 27, line 8
processing at a tunnel egress node; this would have ruled out the processing at a tunnel egress node; this would have ruled out the
possibility of full-functionality mode for ECN. At the same time, possibility of full-functionality mode for ECN. At the same time,
this would ensure that an adversary's modifications to the ECN field this would ensure that an adversary's modifications to the ECN field
cannot be used to launch theft- or denial-of-service attacks across cannot be used to launch theft- or denial-of-service attacks across
an IPsec tunnel endpoint, as any such modifications will be discarded an IPsec tunnel endpoint, as any such modifications will be discarded
at the tunnel endpoint. at the tunnel endpoint.
In principle, permitting the use of ECN functionality in the outer In principle, permitting the use of ECN functionality in the outer
header of an IPsec tunnel raises security concerns because an adver- header of an IPsec tunnel raises security concerns because an adver-
sary could tamper with the information that propagates beyond the sary could tamper with the information that propagates beyond the
tunnel endpoint. Based on an analysis (included in the Appendix) of tunnel endpoint. Based on an analysis (included in Sections 18 and
these concerns and the associated risks, our overall approach has 19) of these concerns and the associated risks, our overall approach
been to provide configuration support for IPsec changes to remove the has been to provide configuration support for IPsec changes to remove
conflict with ECN. the conflict with ECN.
In particular, in tunnel mode the IPsec tunnel MUST support either In particular, in tunnel mode the IPsec tunnel MUST support either
the limited-functionality or the full-functionality mode outlined in the limited-functionality or the full-functionality mode outlined in
Section X. Section 9.1.1.
This makes permission to use ECN functionality in the outer header of This makes permission to use ECN functionality in the outer header of
an IPsec tunnel a configurable part of the corresponding IPsec an IPsec tunnel a configurable part of the corresponding IPsec Secu-
Security Association (SA), so that it can be disabled in situations rity Association (SA), so that it can be disabled in situations where
where the risks are judged to outweigh the benefits. The result is the risks are judged to outweigh the benefits. The result is that an
that an IPsec security administrator is presented with two alterna- IPsec security administrator is presented with two alternatives for
tives for the behavior of ECN-capable connections within an IPsec the behavior of ECN-capable connections within an IPsec tunnel, the
tunnel, the limited-functionality alternative and full-functionality limited-functionality alternative and full-functionality alternative
alternative described earlier. All IPsec implementations MUST imple- described earlier. All IPsec implementations MUST implement either
ment either the limited-functionality or the full-functionality the limited-functionality or the full-functionality alternative in
alternative in order to eliminate incompatibility between ECN and order to eliminate incompatibility between ECN and IPsec tunnels, but
IPsec tunnels, but implementers MAY choose to implement either alter- implementers MAY choose to implement either alternative.
native.
In addition, this document specifies how the endpoints of an IPsec In addition, this document specifies how the endpoints of an IPsec
tunnel could negotiate enabling ECN functionality in the outer head- tunnel could negotiate enabling ECN functionality in the outer head-
ers of that tunnel based on security policy. The ability to negoti- ers of that tunnel based on security policy. The ability to negoti-
ate ECN usage between tunnel endpoints would enable a security admin- ate ECN usage between tunnel endpoints would enable a security admin-
istrator to disable ECN in situations where she believes the risks istrator to disable ECN in situations where she believes the risks
(e.g., of lost congestion notifications) outweigh the benefits of (e.g., of lost congestion notifications) outweigh the benefits of
ECN. ECN.
The IPsec protocol, as defined in [ESP, AH], does not include the IP The IPsec protocol, as defined in [ESP, AH], does not include the IP
skipping to change at page 28, line 36 skipping to change at page 28, line 52
ECN Tunnel: allowed or forbidden. ECN Tunnel: allowed or forbidden.
Indicates whether ECN-capable connections using this SA in tunnel Indicates whether ECN-capable connections using this SA in tunnel
mode are permitted to receive ECN congestion notifications for mode are permitted to receive ECN congestion notifications for
congestion occurring within the tunnel. The allowed value enables congestion occurring within the tunnel. The allowed value enables
ECN congestion notifications. The forbidden value disables such ECN congestion notifications. The forbidden value disables such
notifications, causing all congestion to be indicated via dropped notifications, causing all congestion to be indicated via dropped
packets. packets.
[OPTIONAL. The value of this field SHOULD be assumed to be "for- [OPTIONAL. The value of this field SHOULD be assumed to be
bidden" in implementations that do not support it.] "forbidden" in implementations that do not support it.]
If this attribute is implemented, then the SA specification in a If this attribute is implemented, then the SA specification in a
Security Policy Database (SPD) entry MUST support a corresponding Security Policy Database (SPD) entry MUST support a corresponding
attribute, and this SPD attribute MUST be covered by the SPD adminis- attribute, and this SPD attribute MUST be covered by the SPD adminis-
trative interface (currently described in Section 4.4.1 of trative interface (currently described in Section 4.4.1 of
[RFC2401]). [RFC2401]).
9.2.1.2. ECN Tunnel Security Association Attribute 9.2.1.2. ECN Tunnel Security Association Attribute
A new IPsec Security Association Attribute is defined to enable the A new IPsec Security Association Attribute is defined to enable the
support for ECN congestion notifications based on the outer IP header support for ECN congestion notifications based on the outer IP header
to be negotiated for IPsec tunnels (see [RFC2407]). This attribute to be negotiated for IPsec tunnels (see [RFC2407]). This attribute
is OPTIONAL, although implementations that support it SHOULD also is OPTIONAL, although implementations that support it SHOULD also
support the SAD field defined in Section 3.1. support the SAD field defined in Section 9.2.1.1.
Attribute Type Attribute Type
class value type class value type
------------------------------------------------- -------------------------------------------------
ECN Tunnel 10 Basic ECN Tunnel 10 Basic
The IPsec SA Attribute value 10 has been allocated by IANA to indi- The IPsec SA Attribute value 10 has been allocated by IANA to indi-
cate that the ECN Tunnel SA Attribute is being negotiated; the type cate that the ECN Tunnel SA Attribute is being negotiated; the type
of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class
skipping to change at page 29, line 25 skipping to change at page 29, line 40
RFC2409] for further information including encoding formats and RFC2409] for further information including encoding formats and
requirements for negotiating this SA attribute. requirements for negotiating this SA attribute.
Class Values Class Values
ECN Tunnel ECN Tunnel
Specifies whether ECN functionality is allowed to Specifies whether ECN functionality is allowed to
be used with Tunnel Encapsulation Mode. be used with Tunnel Encapsulation Mode.
This affects tunnel encapsulation and decapsulation processing - This affects tunnel encapsulation and decapsulation processing -
see Section 3.3. see Section 9.2.1.3.
RESERVED 0 RESERVED 0
Allowed 1 Allowed 1
Forbidden 2 Forbidden 2
Values 3-61439 are reserved to IANA. Values 61440-65535 are for Values 3-61439 are reserved to IANA. Values 61440-65535 are for
private use. private use.
If unspecified, the default shall be assumed to be Forbidden. If unspecified, the default shall be assumed to be Forbidden.
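The class-value table above amounts to a small decoding rule. The following Python sketch shows one way to apply it. It is illustrative only: the function `decode_ecn_tunnel` and its string results are hypothetical names, not part of any IKE implementation's API. The sketch maps a received 16-bit Basic attribute value onto the categories listed above and treats an absent attribute as Forbidden.

```python
from typing import Optional

ECN_TUNNEL_ATTRIBUTE_TYPE = 10  # IPsec SA Attribute value allocated by IANA
RESERVED, ALLOWED, FORBIDDEN = 0, 1, 2


def decode_ecn_tunnel(value: Optional[int]) -> str:
    """Classify a received ECN Tunnel class value (hypothetical helper).

    None stands for a proposal that does not carry the attribute,
    in which case the default is Forbidden.
    """
    if value is None:
        return "forbidden"
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a Basic attribute value must fit in 16 bits")
    if value == ALLOWED:
        return "allowed"
    if value == FORBIDDEN:
        return "forbidden"
    if value == RESERVED:
        return "reserved"
    if value <= 61439:
        return "reserved to IANA"   # values 3-61439
    return "private use"            # values 61440-65535
```

An implementation could then enable ECN processing for the tunnel only when the result is "allowed". Any other result leaves the SA behaving as Forbidden.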
skipping to change at page 29, line 49 skipping to change at page 30, line 16
ity with such implementations initiators SHOULD always also include a ity with such implementations initiators SHOULD always also include a
proposal without the ECN Tunnel attribute to enable such a responder proposal without the ECN Tunnel attribute to enable such a responder
to select a transform or proposal that does not contain the ECN Tun- to select a transform or proposal that does not contain the ECN Tun-
nel attribute. RFC 2407 currently requires responders to reject all nel attribute. RFC 2407 currently requires responders to reject all
proposals if any proposal contains an unknown attribute; this proposals if any proposal contains an unknown attribute; this
requirement is expected to be changed to require a responder not to requirement is expected to be changed to require a responder not to
select proposals or transforms containing unknown attributes. select proposals or transforms containing unknown attributes.
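The interoperability advice above can be sketched as follows. This is a simplified model, not IKE code. Proposals are reduced to dictionaries from attribute type to Basic value, and the helper names are hypothetical. Attribute type 4 (Encapsulation Mode, value 1 = Tunnel) is taken from Section 4.5 of [RFC2407]. The responder models the expected revised behavior, in which a responder skips proposals with unknown attributes instead of rejecting everything.

```python
from typing import Dict, List, Optional, Set

Proposal = Dict[int, int]  # attribute type -> Basic attribute value

ENCAPSULATION_MODE, TUNNEL = 4, 1  # from RFC 2407, Section 4.5
ECN_TUNNEL, ALLOWED = 10, 1


def initiator_proposals(base: Proposal) -> List[Proposal]:
    """Offer ECN Tunnel = Allowed first, then the same proposal without it."""
    with_ecn = {**base, ECN_TUNNEL: ALLOWED}
    without_ecn = {k: v for k, v in base.items() if k != ECN_TUNNEL}
    return [with_ecn, without_ecn]


def responder_select(proposals: List[Proposal],
                     known: Set[int]) -> Optional[Proposal]:
    """Select the first proposal that contains only understood attributes."""
    for proposal in proposals:
        if set(proposal) <= known:
            return proposal
    return None
```

An ECN-unaware responder, which knows attribute 4 but not attribute 10, falls through to the second proposal. An ECN-aware responder can accept the first.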
9.2.1.3. Changes to IPsec Tunnel Header Processing 9.2.1.3. Changes to IPsec Tunnel Header Processing
Subsequent to the publication of [RFC2401], the TOS octet of IPv4
and the Traffic Class octet of IPv6 have been superseded by the
six-bit DS Field [RFC2474, RFC2780] and a two-bit "currently unused"
(CU) field [RFC2780], and this document replaces the CU field with
the ECN Field.
For full ECN support, the encapsulation and decapsulation processing For full ECN support, the encapsulation and decapsulation processing
for the IPv4 TOS field and the IPv6 Traffic Class field are changed for the IPv4 TOS field and the IPv6 Traffic Class field are changed
from that specified in [RFC2401] to the following: from that specified in [RFC2401] to the following:
<-- How Outer Hdr Relates to Inner Hdr --> <-- How Outer Hdr Relates to Inner Hdr -->
Outer Hdr at Inner Hdr at Outer Hdr at Inner Hdr at
IPv4 Encapsulator Decapsulator IPv4 Encapsulator Decapsulator
Header fields: -------------------- ------------ Header fields: -------------------- ------------
DS Field copied from inner hdr (5) no change DS Field copied from inner hdr (5) no change
ECN Field constructed (7) constructed (8) ECN Field constructed (7) constructed (8)
skipping to change at page 30, line 43 skipping to change at page 31, line 5
SA is "allowed" and the value of ECT (bit 0) in the inner header SA is "allowed" and the value of ECT (bit 0) in the inner header
is 1, then set the CE bit (bit 1) in the inner header to the logi- is 1, then set the CE bit (bit 1) in the inner header to the logi-
cal OR of the CE bit in the inner header with the CE bit in the cal OR of the CE bit in the inner header with the CE bit in the
outer header, else make no change to the ECN field. outer header, else make no change to the ECN field.
(5) and (6) are identical to match usage in [RFC2401], although (5) and (6) are identical to match usage in [RFC2401], although
they are different in [RFC2401]. they are different in [RFC2401].
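Decapsulation step (8) can be expressed as a short function. The sketch below is a hypothetical model, not taken from any IPsec stack. `EcnBits` holds the two bits as this document numbers them (ECT is bit 0, CE is bit 1), and `ecn_tunnel` carries the SAD field value.

```python
from dataclasses import dataclass


@dataclass
class EcnBits:
    """The ECN field as used in this document: ECT is bit 0, CE is bit 1."""
    ect: int
    ce: int


def decapsulate_ecn(inner: EcnBits, outer: EcnBits, ecn_tunnel: str) -> EcnBits:
    """Step (8) at the decapsulator (hypothetical helper).

    If the SA's ECN Tunnel field is "allowed" and the inner header is
    ECN-capable, the inner CE bit becomes (inner CE OR outer CE).
    Otherwise the inner ECN field is returned unchanged.
    """
    if ecn_tunnel == "allowed" and inner.ect == 1:
        return EcnBits(ect=inner.ect, ce=inner.ce | outer.ce)
    return EcnBits(ect=inner.ect, ce=inner.ce)
```

The OR preserves a CE mark set inside the tunnel without ever clearing a CE mark that the inner header already carried.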
The above description applies to implementations that support the ECN The above description applies to implementations that support the ECN
Tunnel field in the SAD; such implementations MUST implement this Tunnel field in the SAD; such implementations MUST implement this
processing of the DS field instead of the processing of the IPv4 TOS processing instead of the processing of the IPv4 TOS octet and IPv6
octet and IPv6 Traffic Class octet defined in [RFC2401]. This con- Traffic Class octet defined in [RFC2401]. This constitutes the full-
stitutes the full-functionality alternative for ECN usage with IPsec functionality alternative for ECN usage with IPsec tunnels.
tunnels.
An implementation that does not support the ECN Tunnel field in the An implementation that does not support the ECN Tunnel field in the
SAD MUST implement processing of the DS Field by assuming that the SAD MUST implement this processing by assuming that the value of the
value of the ECN Tunnel field of the SAD is "forbidden" for every SA. ECN Tunnel field of the SAD is "forbidden" for every SA. In this
In this case, the processing of the ECN field reduces to: case, the processing of the ECN field reduces to:
(7) Set the ECN field (ECT and CE bits) to zero in the outer (7) Set the ECN field (ECT and CE bits) to zero in the outer
header. header.
(8) Make no change to the ECN field in the inner header. (8) Make no change to the ECN field in the inner header.
This constitutes the limited functionality alternative for ECN usage This constitutes the limited functionality alternative for ECN usage
with IPsec tunnels. with IPsec tunnels.
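The limited-functionality alternative reduces to two trivial rules. The sketch below adds a third, hypothetical helper that models a congested router inside the tunnel, to show why congestion then surfaces only as drops. ECN fields are written as (ECT, CE) tuples, and all names are illustrative.

```python
from typing import Tuple

Ecn = Tuple[int, int]  # (ECT, CE)


def encapsulate_limited(inner: Ecn) -> Ecn:
    """Step (7), limited functionality: the outer ECN field is always zero."""
    return (0, 0)


def decapsulate_limited(inner: Ecn, outer: Ecn) -> Ecn:
    """Step (8), limited functionality: the inner ECN field is not changed."""
    return inner


def congested_router_action(outer: Ecn) -> str:
    """A router may mark only ECN-capable packets; others must be dropped."""
    return "mark" if outer[0] == 1 else "drop"
```

Because the outer header never carries ECT, a router on the tunnel path cannot mark the packet. The inner header reaches the egress exactly as it was sent.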
For backwards compatibility, packets with ECT and CE both set to 1 in For backwards compatibility, packets with ECT and CE both set to 1 in
the outer header SHOULD be dropped if they arrive on an SA that is the outer header SHOULD be dropped if they arrive on an SA that is
skipping to change at page 31, line 40 skipping to change at page 31, line 49
9.2.3. Comments for IPsec Support 9.2.3. Comments for IPsec Support
Substantial comments were received on two areas of this document dur- Substantial comments were received on two areas of this document dur-
ing review by the IPsec working group. This section describes these ing review by the IPsec working group. This section describes these
comments and explains why the proposed changes were not incorporated. comments and explains why the proposed changes were not incorporated.
The first comment indicated that per-node configuration is easier to The first comment indicated that per-node configuration is easier to
implement than per-SA configuration. After serious thought and implement than per-SA configuration. After serious thought and
despite some initial encouragement of per-node configuration, it no despite some initial encouragement of per-node configuration, it no
longer seems to be a good idea. The concern is that as IPsec is pro- longer seems to be a good idea. The concern is that as ECN-awareness
gressively deployed, many ECN-aware IPsec implementations will find is progressively deployed in IPsec, many ECN-aware IPsec implementa-
themselves communicating with a mixture of ECN-aware and ECN-unaware tions will find themselves communicating with a mixture of ECN-aware
IPsec tunnel endpoints. In such an environment with per-node config- and ECN-unaware IPsec tunnel endpoints. In such an environment with
uration, the only reasonable thing to do is forbid ECN usage for all per-node configuration, the only reasonable thing to do is forbid ECN
IPsec tunnels, which is not the desired outcome. usage for all IPsec tunnels, which is not the desired outcome.
In the second area, several reviewers noted that SA negotiation is In the second area, several reviewers noted that SA negotiation is
complex, and adding to it is non-trivial. One reviewer suggested complex, and adding to it is non-trivial. One reviewer suggested
using ICMP after tunnel setup as a possible alternative. The addi- using ICMP after tunnel setup as a possible alternative. The addi-
tion to SA negotiation in the document is OPTIONAL and will remain tion to SA negotiation in this document is OPTIONAL and will remain
so; implementers are free to ignore it. The authors believe that the so; implementers are free to ignore it. The authors believe that the
assurance it provides can be useful in a number of situations. In assurance it provides can be useful in a number of situations. In
practice, if this is not implemented, it can be deleted at a subse- practice, if this is not implemented, it can be deleted at a subse-
quent stage in the standards process. Extending ICMP to negotiate quent stage in the standards process. Extending ICMP to negotiate
ECN after tunnel setup is more complex than extending SA attribute ECN after tunnel setup is more complex than extending SA attribute
negotiation. Some tunnels do not permit traffic to be addressed to negotiation. Some tunnels do not permit traffic to be addressed to
the tunnel egress endpoint, hence the ICMP packet would have to be the tunnel egress endpoint, hence the ICMP packet would have to be
addressed to somewhere else, scanned for by the egress endpoint, and addressed to somewhere else, scanned for by the egress endpoint, and
discarded there or at its actual destination. In addition, ICMP discarded there or at its actual destination. In addition, ICMP
delivery is unreliable, and hence there is a possibility of an ICMP delivery is unreliable, and hence there is a possibility of an ICMP
skipping to change at page 32, line 23 skipping to change at page 32, line 33
ack/retransmit mechanism. It seems better simply to specify an ack/retransmit mechanism. It seems better simply to specify an
OPTIONAL extension to the existing SA negotiation mechanism. OPTIONAL extension to the existing SA negotiation mechanism.
9.3. IP packets encapsulated in non-IP packet headers. 9.3. IP packets encapsulated in non-IP packet headers.
A different set of issues are raised, relative to ECN, when IP pack- A different set of issues are raised, relative to ECN, when IP pack-
ets are encapsulated in tunnels with non-IP packet headers. This ets are encapsulated in tunnels with non-IP packet headers. This
occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP].
For these protocols, there is no conflict with ECN; it is just that For these protocols, there is no conflict with ECN; it is just that
ECN cannot be used within the tunnel unless an ECN codepoint can be ECN cannot be used within the tunnel unless an ECN codepoint can be
specified for the header of the encapsulating protocol. [RFD99] con- specified for the header of the encapsulating protocol. Earlier work
sidered a preliminary proposal for incorporating ECN into MPLS, and considered a preliminary proposal for incorporating ECN into MPLS,
proposals for incorporating ECN into GRE, L2TP, or PPTP will be con- and proposals for incorporating ECN into GRE, L2TP, or PPTP will be
sidered as the need arises. considered as the need arises.
10. Issues Raised by Monitoring and Policing Devices 10. Issues Raised by Monitoring and Policing Devices
One possibility is that monitoring and policing devices (or more One possibility is that monitoring and policing devices (or more
informally, "penalty boxes") will be installed in the network to mon- informally, "penalty boxes") will be installed in the network to mon-
itor whether best-effort flows are appropriately responding to con- itor whether best-effort flows are appropriately responding to con-
gestion, and to preferentially drop packets from flows determined not gestion, and to preferentially drop packets from flows determined not
to be using adequate end-to-end congestion control procedures. This to be using adequate end-to-end congestion control procedures.
is discussed in more detail in the Appendix.
We recommend that any "penalty box" that detects a flow or an aggre- We recommend that any "penalty box" that detects a flow or an aggre-
gate of flows that is not responding to end-to-end congestion control gate of flows that is not responding to end-to-end congestion control
first change from marking to dropping packets from that flow, before first change from marking to dropping packets from that flow, before
taking any additional action to restrict the bandwidth available to taking any additional action to restrict the bandwidth available to
that flow. Thus, initially, the router may drop packets in which the that flow. Thus, initially, the router may drop packets in which the
router would otherwise have set the CE bit. This could include router would otherwise have set the CE bit. This could include
dropping those arriving packets for that flow that are ECN-Capable dropping those arriving packets for that flow that are ECN-Capable
and that already have the CE bit set. In this way, any congestion and that already have the CE bit set. In this way, any congestion
indications seen by that router for that flow will be guaranteed to indications seen by that router for that flow will be guaranteed to
also be seen by the end nodes, even in the presence of malicious or also be seen by the end nodes, even in the presence of malicious or
broken routers elsewhere in the path. If we assume that the first broken routers elsewhere in the path. If we assume that the first
action taken at any "penalty box" for an ECN-capable flow will be to action taken at any "penalty box" for an ECN-capable flow will be to
drop packets instead of marking them, then there is no way that an drop packets instead of marking them, then there is no way that an
adversary that subverts ECN-based end-to-end congestion control can adversary that subverts ECN-based end-to-end congestion control can
cause a flow to be characterized as being non-cooperative and placed cause a flow to be characterized as being non-cooperative and placed
into a more severe action within the "penalty box". into a more severe action within the "penalty box".
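The recommended first sanction, dropping instead of marking, can be sketched as a per-packet decision. `PenaltyBox` is a hypothetical model. How a flow comes to be judged unresponsive is outside the scope of this sketch, which simply takes that set as given.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Ecn = Tuple[int, int]  # (ECT, CE)


@dataclass
class PenaltyBox:
    """Hypothetical policer: unresponsive flows lose ECN marking first."""
    unresponsive: Set[str] = field(default_factory=set)

    def on_congestion(self, flow_id: str, ecn: Ecn) -> Tuple[str, Ecn]:
        """Decide what a congested router does with one arriving packet."""
        ect, _ce = ecn
        if flow_id in self.unresponsive:
            # First sanction: drop where the router would otherwise have
            # marked, including ECN-capable packets that already carry CE.
            return ("drop", ecn)
        if ect == 1:
            return ("mark", (1, 1))  # set CE; CE is never reset to 0
        return ("drop", ecn)         # non-ECN-capable packets are dropped
```

Since a drop is visible to the end nodes whatever happens to the CE bit later on the path, a flow's treatment here cannot be escalated merely because some other router erased its marks.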
The monitoring and policing devices that are actually deployed could The monitoring and policing devices that are actually deployed could
fall short of the `ideal' monitoring device described above, in that fall short of the `ideal' monitoring device described above, in that
the monitoring is applied not to a single flow or to a single IPsec the monitoring is applied not to a single flow, but to an aggregate
tunnel, but to an aggregate of flows. In this case, the switch from of flows (e.g., those sharing a single IPsec tunnel). In this case,
marking to dropping would apply to all of the flows in that aggre- the switch from marking to dropping would apply to all of the flows
gate, denying the benefits of ECN to the other flows in the aggregate in that aggregate, denying the benefits of ECN to the other flows in
also. At the highest level of aggregation, another form of the dis- the aggregate also. At the highest level of aggregation, another
abling of ECN happens even in the absence of monitoring and policing form of the disabling of ECN happens even in the absence of monitor-
devices, when ECN-Capable RED queues switch from marking to dropping ing and policing devices, when ECN-Capable RED queues switch from
packets as an indication of congestion when the average queue size marking to dropping packets as an indication of congestion when the
has exceeded some threshold. average queue size has exceeded some threshold.
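The switch from marking to dropping in an ECN-Capable RED queue can be illustrated with a simplified RED decision. This is a sketch under stated assumptions. It omits the count-based spacing of drops in the original RED algorithm. The parameter names and defaults are illustrative. Using the maximum threshold `max_th` as the point where marking gives way to dropping is one possible choice, not a requirement of this document.

```python
def update_average(avg: float, queue_len: int, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the instantaneous queue size."""
    return (1.0 - weight) * avg + weight * queue_len


def red_ecn_action(avg: float, ect: bool, u: float,
                   min_th: float = 5.0, max_th: float = 15.0,
                   max_p: float = 0.1) -> str:
    """Return "enqueue", "mark", or "drop" for one arriving packet.

    u is a uniform random draw in [0, 1), passed in so that the
    decision is reproducible.
    """
    if avg < min_th:
        return "enqueue"
    if avg >= max_th:
        # Above this threshold even ECN-capable packets are dropped.
        return "drop"
    p = max_p * (avg - min_th) / (max_th - min_th)
    if u < p:
        return "mark" if ect else "drop"
    return "enqueue"
```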
If there were serious operational problems with routers inappropri- If there were serious operational problems with routers inappropri-
ately erasing the CE bit in packet headers, one potential fix would ately erasing the CE bit in packet headers, this could be addressed
be to include a one-bit ECN nonce in packet headers, and for routers to some extent by including a one-bit ECN nonce in packet headers.
to erase the nonce when they set the CE bit [SCWA99]. Routers that Routers would erase the nonce when they set the CE bit [SCWA99].
erased the CE bit would be unable to consistently reconstruct the Routers that erased the CE bit would face additional difficulty in
original nonce, and thus repeated erasure of the CE bit would be reconstructing the original nonce, and thus repeated erasure of the
detected by the end-nodes. (This could in fact be done without CE bit would be more likely to be detected by the end-nodes. (This
adding any extra bits for ECN in the IP header, by using the ECN could in fact be done without adding any extra bits for ECN in the IP
codepoints (ECT=1, CE=0) and (ECT=0, CE=1) as the two values for the header, by using the ECN codepoints (ECT=1, CE=0) and (ECT=0, CE=1)
nonce, and by defining the codepoint (ECT=0, CE=1) to mean exactly as the two values for the nonce, and by defining the codepoint
the same as the codepoint (ECT=1, CE=0).) However, at this point the (ECT=0, CE=1) to mean exactly the same as the codepoint (ECT=1,
potential danger of misbehaving routers does not seem of sufficient CE=0).) However, at this point the potential danger of misbehaving
concern to warrant this additional complication of adding an ECN routers does not seem of sufficient concern to warrant this addi-
nonce to protect against the erasure of the CE bit. tional complication of adding an ECN nonce to protect against the
erasure of the CE bit. Additional research is also needed to better
understand the value of such a nonce and appropriate means of gener-
ating sequences of nonce values that an adversary will find suffi-
ciently difficult to reconstruct.
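The erasure-detection argument for a nonce can be made concrete with a toy model. This sketch is entirely hypothetical, since this document does not specify a nonce mechanism. It assumes the parenthetical encoding above, where (ECT=1, CE=0) and (ECT=0, CE=1) carry nonce bits 0 and 1. It also assumes that the receiver reports, per packet, either CE or the nonce bit it received; that feedback channel is itself an assumption. Generating nonces with Python's `secrets.randbits` stands in for the open question of how to produce sequences that are hard to reconstruct.

```python
import secrets
from typing import Dict, List, Optional, Set, Tuple

Ecn = Tuple[int, int]  # (ECT, CE)

# Hypothetical encoding: both codepoints mean "ECN-capable, not CE".
NONCE_TO_CODEPOINT: Dict[int, Ecn] = {0: (1, 0), 1: (0, 1)}
CODEPOINT_TO_NONCE: Dict[Ecn, int] = {v: k for k, v in NONCE_TO_CODEPOINT.items()}
CE: Ecn = (1, 1)


def random_nonces(n: int) -> List[int]:
    """One unpredictable nonce bit per packet."""
    return [secrets.randbits(1) for _ in range(n)]


def send(nonces: List[int]) -> List[Ecn]:
    return [NONCE_TO_CODEPOINT[b] for b in nonces]


def mark(pkts: List[Ecn], congested: Set[int]) -> List[Ecn]:
    """A conforming router sets CE, which erases the nonce bit."""
    return [CE if i in congested else p for i, p in enumerate(pkts)]


def erase_ce(pkts: List[Ecn], guess: int) -> List[Ecn]:
    """A misbehaving router clears CE and has to guess the lost nonce bit."""
    return [NONCE_TO_CODEPOINT[guess] if p == CE else p for p in pkts]


def receiver_report(pkts: List[Ecn]) -> List[Optional[int]]:
    """Per packet: None for CE, otherwise the nonce bit that arrived."""
    return [None if p == CE else CODEPOINT_TO_NONCE[p] for p in pkts]


def sender_verifies(nonces: List[int], report: List[Optional[int]]) -> bool:
    """True if every reported nonce bit matches the bit that was sent."""
    return all(r is None or r == n for n, r in zip(nonces, report))
```

Each erased mark forces a one-bit guess, so repeated erasure is caught with increasing probability. In the test case, two erased marks carry different nonce bits, and no single guess can satisfy both.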
An ECN nonce would also address the problem of misbehaving transport An ECN nonce would also address the problem of misbehaving transport
receivers lying to the transport sender about whether or not the CE receivers lying to the transport sender about whether or not the CE
bit was set in a packet. However, another possibility is for the bit was set in a packet. However, another possibility is for the
data sender to test for a misbehaving receiver directly, by occasion- data sender to test for a misbehaving receiver directly, by occasion-
ally sending a data packet with ECT and CE set, to see if the ally sending a data packet with ECT and CE set, to see if the
receiver reports receiving the CE bit. Of course, if these packets receiver reports receiving the CE bit. Of course, if these packets
encountered congestion in the network, the TCP sender would not encountered congestion in the network, the router would make no
receive this indication of congestion, so setting the ECT and CE bits change in the packets, because the CE bit would already be set.
at the sender would have to be done very sparingly. In addition, the Thus, for packets sent with the ECT and CE bits set, the TCP end-
TCP sender would have to remember which packets were sent with the nodes could not determine if some router intended to set the CE bit
ECT and CE bits set, so that it doesn't react to them as if there was in these packets. For this reason, sending packets with the ECT and
CE bits set would have to be done very sparingly. In addition, the TCP
sender would have to remember which packets were sent with the ECT
and CE bits set, so that it doesn't react to them as if there was
congestion in the network. We believe that further research is congestion in the network. We believe that further research is
needed on possible transport-based mechanisms for verifying that the needed on possible transport-based mechanisms for verifying that the
transport receiver does not lie to the transport sender about the transport receiver does not lie to the transport sender about the
receipt of congestion indications. receipt of congestion indications.
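The sender-side test for a lying receiver described above might be sketched as follows. `ReceiverProbe` and its parameters are hypothetical. The default probe fraction is an arbitrary stand-in for "very sparingly". Matching each ECN-Echo to a single sequence number is a simplification, because real TCP echoes congestion on subsequent ACKs until it sees CWR.

```python
import random
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class ReceiverProbe:
    """Hypothetical sender-side check for a receiver that hides CE marks."""
    probe_fraction: float = 0.001   # assumption: "very sparingly"
    rng: random.Random = field(default_factory=random.Random)
    probes: Set[int] = field(default_factory=set)
    missed_echoes: int = 0

    def ecn_bits_for(self, seq: int) -> Tuple[int, int]:
        """Choose (ECT, CE) for the data packet with sequence number seq."""
        if self.rng.random() < self.probe_fraction:
            self.probes.add(seq)    # remember it so we don't react to its echo
            return (1, 1)           # probe: ECT and CE set by the sender
        return (1, 0)               # ordinary ECN-capable packet

    def on_ack(self, seq: int, ecn_echo: bool) -> bool:
        """Return True if the sender should respond to congestion."""
        if seq in self.probes:
            self.probes.discard(seq)
            if not ecn_echo:
                self.missed_echoes += 1  # receiver failed to report CE
            return False  # a probe's echo says nothing about the network
        return ecn_echo
```

As the text notes, a router cannot add a congestion mark to a probe, so any real congestion that the probe encounters goes unreported. This is the reason to keep `probe_fraction` small.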
11. Evaluations of ECN 11. Evaluations of ECN
This section discusses some of the related work evaluating the use of This section discusses some of the related work evaluating the use of
ECN. The ECN Web Page [ECN] has pointers to other papers, as well as ECN. The ECN Web Page [ECN] has pointers to other papers, as well as
to implementations of ECN. to implementations of ECN.
skipping to change at page 34, line 52 skipping to change at page 35, line 16
ECT bit. The ECT bit set to "1" indicates that the transport proto- ECT bit. The ECT bit set to "1" indicates that the transport proto-
col is willing and able to participate in ECN. col is willing and able to participate in ECN.
The default value for the CE bit is "0". The router sets the CE bit The default value for the CE bit is "0". The router sets the CE bit
to "1" to indicate congestion to the end nodes. The CE bit in a to "1" to indicate congestion to the end nodes. The CE bit in a
packet header MUST NOT be reset by a router from "1" to "0". packet header MUST NOT be reset by a router from "1" to "0".
When viewed in terms of code points, this document has defined three When viewed in terms of code points, this document has defined three
code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but
not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1). The code not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1). The code
point of (ECT=0, CE=1) is not defined in this document. One point of (ECT=0, CE=1) is not defined in this document. One possi-
possibility would be for this code point to be used, some time in the bility would be for this code point to be used, some time in the
future, for some other function for non-ECN-capable packets. A sec- future, for some other function for non-ECN-capable packets. A sec-
ond possibility would be for this code point to be used as an ECN ond possibility would be for this code point to be used as an ECN
nonce, as described earlier in the paper. A third possibility would nonce, as described earlier in the document. A third possibility
be for the code point (ECT=0, CE=1) to be used to indicate that the would be for the code point (ECT=0, CE=1) to be used to indicate that
packet is ECN-capable for an alternate semantics for the Congestion the packet is ECN-capable for an alternate semantics for the Conges-
Experienced indication. However, at this time the code point (ECT=0, tion Experienced indication. However, at this time the code point
CE=1) remains undefined. (ECT=0, CE=1) remains undefined.
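The three defined code points, and the rule that a router never resets CE from "1" to "0", can be captured in a few lines. The names below are illustrative and do not come from any particular stack.

```python
from enum import Enum
from typing import Tuple

Ecn = Tuple[int, int]  # (ECT, CE)


class EcnCodepoint(Enum):
    """ECN field code points defined in this document, as (ECT, CE)."""
    NOT_ECT = (0, 0)
    ECT_NOT_CE = (1, 0)
    ECT_AND_CE = (1, 1)


def classify(ect: int, ce: int) -> str:
    """Name the code point; (ECT=0, CE=1) is left undefined here."""
    try:
        return EcnCodepoint((ect, ce)).name
    except ValueError:
        return "UNDEFINED"


def router_transition_ok(before: Ecn, after: Ecn) -> bool:
    """A router may leave the field alone or mark an ECT packet with CE."""
    if before == after:
        return True
    return before == (1, 0) and after == (1, 1)
```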
TCP requires three changes for ECN: a setup phase and two new flags TCP requires three changes for ECN: a setup phase and two new flags
in the TCP header. The ECN-Echo flag is used by the data receiver to in the TCP header. The ECN-Echo flag is used by the data receiver to
inform the data sender of a received CE packet. The Congestion Win- inform the data sender of a received CE packet. The Congestion Win-
dow Reduced (CWR) flag is used by the data sender to inform the data dow Reduced (CWR) flag is used by the data sender to inform the data
receiver that the congestion window has been reduced. receiver that the congestion window has been reduced.
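The interplay of the ECN-Echo and CWR flags can be sketched as two small state machines. This is a simplified model, not a TCP implementation. The setup phase is omitted, the window is counted in segments, and the sender halves it once per echo, as it would for a dropped packet. Real TCP also limits the response to once per window of data, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class EcnReceiver:
    """Data receiver: keep echoing congestion until the sender sends CWR."""
    echo_pending: bool = False

    def on_data(self, ce: bool, cwr: bool) -> bool:
        """Process one data packet; return the ECN-Echo flag for the ACK."""
        if cwr:
            self.echo_pending = False
        if ce:
            self.echo_pending = True
        return self.echo_pending


@dataclass
class EcnSender:
    """Data sender: reduce the window once per echo, then signal CWR."""
    cwnd: int = 20
    send_cwr: bool = False

    def on_ack(self, ecn_echo: bool) -> None:
        if ecn_echo and not self.send_cwr:
            self.cwnd = max(1, self.cwnd // 2)  # respond as to a drop
            self.send_cwr = True

    def next_data_cwr(self) -> bool:
        """Return the CWR flag for the next new data packet and clear it."""
        cwr, self.send_cwr = self.send_cwr, False
        return cwr
```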
When ECN (Explicit Congestion Notification [RFC2481]) is used, it is When ECN (Explicit Congestion Notification [RFC2481]) is used, it is
required that congestion indications generated within an IP tunnel required that congestion indications generated within an IP tunnel
not be lost at the tunnel egress. We specified a minor modification not be lost at the tunnel egress. We specified a minor modification
skipping to change at page 35, line 38 skipping to change at page 35, line 51
tunnel, by turning the ECT bit in the outer header off, and not tunnel, by turning the ECT bit in the outer header off, and not
altering the inner header at the time of decapsulation. altering the inner header at the time of decapsulation.
2) The full-functionality option, which copies the ECT bit of the 2) The full-functionality option, which copies the ECT bit of the
inner header to the encapsulating header. At decapsulation, if the inner header to the encapsulating header. At decapsulation, if the
ECT bit is set in the inner header, the CE bit on the outer header is ECT bit is set in the inner header, the CE bit on the outer header is
ORed with the CE bit of the inner header to update the CE bit of the ORed with the CE bit of the inner header to update the CE bit of the
packet. packet.
All IP tunnels MUST implement one of the two alternative approaches All IP tunnels MUST implement one of the two alternative approaches
described above. For IPsec tunnels, this document also defines an described above. For IPsec tunnels, this document also defines an
optional IPsec SA attribute that enables negotiation of ECN usage optional IPsec Security Association (SA) attribute that enables
within IPsec tunnels and an optional field in the Security Associa- negotiation of ECN usage within IPsec tunnels and an optional field
tion Database to indicate whether ECN is permitted in tunnel mode on in the Security Association Database to indicate whether ECN is per-
a SA. mitted in tunnel mode on a SA. The required changes to IPsec tunnels
for ECN usage modify RFC 2401 [RFC2401], which defines the IPsec
architecture and specifies some aspects of its implementation. The
new IPsec SA attribute is in addition to those already defined in
Section 4.5 of [RFC2407].
This document is intended to obsolete RFC 2481, "A Proposal to add This document is intended to obsolete RFC 2481, "A Proposal to add
Explicit Congestion Notification (ECN) to IP", which defined ECN as Explicit Congestion Notification (ECN) to IP", which defined ECN as
an Experimental Protocol for the Internet Community, as well as to an Experimental Protocol for the Internet Community. The rest of
obsolete three subsequent internet-drafts on ECN, "IPsec Interactions this section describes the relationship between this document and its
with ECN", "ECN Interactions with IP Tunnels", and "TCP with ECN: The predecessor.
Treatment of Retransmitted Data Packets". This document is intended
largely to merge the earlier documents all into a single document,
for greater clarity, in preparation to becoming a Proposed Standard.
The rest of this section describes the relationship between this
document and its predecessors.
RFC 2481 included a brief discussion of the use of ECN with encapsu- RFC 2481 included a brief discussion of the use of ECN with encapsu-
lated packets, and noted that for the IPsec specifications at the lated packets, and noted that for the IPsec specifications at the
time (January 1999), flows could not safely use ECN if they were to time (January 1999), flows could not safely use ECN if they were to
traverse IPsec tunnels. RFC 2481 also described the changes that traverse IPsec tunnels. RFC 2481 also described the changes that
could be made to IPsec tunnel specifications to make them compatible could be made to IPsec tunnel specifications to make them compatible
with ECN. "IPsec Interactions with ECN" outlined these changes to with ECN.
IPsec tunnels in detail, and included an extensive discussion of the
security implications of ECN (now included as Sections 18 and 19 of This document also incorporates work that was done after RFC 2481.
this document). The draft of "ECN Interactions with IP Tunnels" The first was to describe the changes to IPsec tunnels in detail, and
extended the discussion of IPsec tunnels to include all IP tunnels. extensively discuss the security implications of ECN (now included as
Because older IP tunnels are not compatible with a flow's use of ECN, Sections 18 and 19 of this document). The second was to extend the
the deployment of ECN in the Internet will create strong pressure for discussion of IPsec tunnels to include all IP tunnels. Because older IP
older IP tunnels to be updated to an ECN-compatible version, using tunnels are not compatible with a flow's use of ECN, the deployment
either the limited-functionality or the full-functionality option. of ECN in the Internet will create strong pressure for older IP tun-
nels to be updated to an ECN-compatible version, using either the
limited-functionality or the full-functionality option.
This document does not address the issue of including ECN in non-IP This document does not address the issue of including ECN in non-IP
tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary
document about adding ECN support to MPLS has since expired. document about adding ECN support to MPLS was not advanced.
This document expands on one area not addressed in RFC 2481, the use A third new piece of work after RFC 2481 was to describe the ECN pro-
of ECN with retransmitted data packets. That is, this document cedure for retransmitted data packets, namely that the ECT bit should not
includes the material from "TCP with ECN: The Treatment of Retrans- be set on retransmitted data packets. The motivation for this addi-
mitted Data Packets" specifying that the ECT bit should not be set on tional specification is to eliminate a possible avenue for denial-of-
retransmitted data packets. The motivation for this additional spec- service attacks on an existing TCP connection. Some prior deploy-
ification is to eliminate a possible avenue for denial-of-service ments of ECN-capable TCP might not conform to the (new) requirement
attacks on an existing TCP connection. Some prior deployments of not to set the ECT bit on retransmitted packets; we do not believe
ECN-capable TCP might not conform to the (new) requirement not to set this will cause significant problems in practice.
the ECT bit on retransmitted packets; we do not believe this will
cause significant problems in practice.
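A minimal sketch of the retransmission rule above, in Python: a sender decides, per data segment, whether its IP header should carry ECT. The `Segment` class, its fields, and the `ecn_negotiated` flag are hypothetical names for illustration, not definitions from this draft; pure ACKs are simply left unmarked in this sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int                 # first sequence number carried (hypothetical field)
    length: int              # number of data octets
    is_retransmission: bool  # True if this data was already sent once


def should_set_ect(seg: Segment, ecn_negotiated: bool) -> bool:
    """Return True if this segment's IP header should carry the ECT bit."""
    if not ecn_negotiated:
        # ECN was not agreed during connection setup: never claim ECT.
        return False
    if seg.length == 0:
        # This sketch only marks segments that carry data.
        return False
    # Retransmitted data packets must not set ECT, closing off a
    # possible denial-of-service avenue on an existing connection.
    return not seg.is_retransmission
```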
This document also expands on the specification of the use of SYN This document also expands slightly on the specification of the use
packets for the negotiation of ECN, and specifies some optional of SYN packets for the negotiation of ECN. While some prior deploy-
behavior for this. In particular, the document allows a TCP host to ments of ECN-capable TCP might not conform to the requirements speci-
send a non-ECN-setup SYN packet after sending a failed ECN-setup SYN fied in this document, we do not believe that this will lead to any
packet, and precisely specifies the required behavior when both ECN- performance or compatibility problems for TCP connections with a com-
setup SYN packets and non-ECN-setup SYN packets are sent in the same bination of TCP implementations at the endpoints.
connection. While some prior deployments of ECN-capable TCP might
not conform to the requirements specified in this document, we do not
believe that this will lead to any performance or compatibility prob-
lems for TCP connections with a combination of TCP implementations at
the endpoints.
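The SYN negotiation can be sketched as follows. The flag values are the standard bit positions in the TCP flags octet; an ECN-setup SYN carries both ECE and CWR (the flags named in the acknowledgements of this draft), and the sketch assumes an ECN-setup SYN-ACK carries ECE with CWR clear. The `next_syn_flags` fallback is a hypothetical illustration of the optional behavior of sending a non-ECN-setup SYN after a failed ECN-setup SYN, not the draft's exact procedure.

```python
# TCP flag bits in the flags octet: FIN=0x01 ... CWR=0x80.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))


def is_ecn_setup_syn(flags: int) -> bool:
    """A SYN (no ACK) with both ECE and CWR set."""
    return (flags & (SYN | ACK)) == SYN and (flags & (ECE | CWR)) == (ECE | CWR)


def is_ecn_setup_syn_ack(flags: int) -> bool:
    """A SYN-ACK with ECE set and CWR clear."""
    return (flags & (SYN | ACK)) == (SYN | ACK) and (flags & (ECE | CWR)) == ECE


def next_syn_flags(prev_ecn_syn_failed: bool) -> int:
    """Hypothetical policy: fall back to a plain SYN after a failed ECN-setup SYN."""
    return SYN if prev_ecn_syn_failed else SYN | ECE | CWR
```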
13. Conclusions 13. Conclusions
Given the current effort to implement AQM, we believe this is the Given the current effort to implement AQM, we believe this is the
right time to deploy congestion avoidance mechanisms that do not right time to deploy congestion avoidance mechanisms that do not
depend on packet drops alone. With the increased deployment of depend on packet drops alone. With the increased deployment of
applications and transports sensitive to the delay and loss of a sin- applications and transports sensitive to the delay and loss of a sin-
gle packet (e.g., realtime traffic, short web transfers), depending gle packet (e.g., realtime traffic, short web transfers), depending
on packet loss as a normal congestion notification mechanism appears on packet loss as a normal congestion notification mechanism appears
to be insufficient (or at the very least, non-optimal). to be insufficient (or at the very least, non-optimal).
skipping to change at page 37, line 36 skipping to change at page 37, line 43
on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus- on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus-
sions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter, sions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter,
Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for dis- Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for dis-
cussions of security issues. We also thank the Internet End-to-End cussions of security issues. We also thank the Internet End-to-End
Research Group for ongoing discussions of these issues. Research Group for ongoing discussions of these issues.
Email discussions with a number of people, including Alexey Email discussions with a number of people, including Alexey
Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed
the issues raised by non-conformant equipment in the Internet that the issues raised by non-conformant equipment in the Internet that
does not respond to TCP SYN packets with the ECE and CWR flags set. does not respond to TCP SYN packets with the ECE and CWR flags set.
We thank Mark Handley, Jitendra Padhye, and others for contributions We thank Mark Handley, Jitendra Padhye, and others for discussions on
to the TCP initialization procedures. the TCP initialization procedures.
The discussion of ECN and IP tunnel considerations draws heavily on The discussion of ECN and IP tunnel considerations draws heavily on
related discussions and documents from the Differentiated Services related discussions and documents from the Differentiated Services
Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh,
for feedback on IP tunnels. We thank Derrell Piper and Tero Kivinen for feedback on IP tunnels. We thank Derrell Piper and Tero Kivinen
for proposing modifications to RFC 2407 that improve the usability of for proposing modifications to RFC 2407 that improve the usability of
negotiating the ECN Tunnel SA attribute. negotiating the ECN Tunnel SA attribute.
15. References 15. References
[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402,
November 1998. November 1998.
[B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997. Levels", BCP 14, RFC 2119, March 1997.
[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html".
Reference for informational purposes only.
[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload",
RFC 2406, November 1998. RFC 2406, November 1998.
[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
N.4, August 1993, p. 397-413. N.4, August 1993, p. 397-413.
[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
ecn. ecn. Reference for informational purposes only.
[FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con- [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con-
gestion Control in the Internet", IEEE/ACM Transactions on Network- gestion Control in the Internet", IEEE/ACM Transactions on Network-
ing, August 1999. ing, August 1999.
[FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
SIGCOMM '97, September 1997. SIGCOMM '97, September 1997.
[GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, "Generic Routing [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, "Generic Routing
Encapsulation (GRE)", RFC 1701, October 1994. Encapsulation (GRE)", RFC 1701, October 1994.
skipping to change at page 40, line 24 skipping to change at page 40, line 31
[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control",
RFC 2581, April 1999. RFC 2581, April 1999.
[RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation
of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884,
July 2000. July 2000.
[RFC2983] D. Black, "Differentiated Services and Tunnels", RFC2983, [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC2983,
October 2000. October 2000.
[RFC2780] S. Bradner and V. Paxson, IANA Allocation Guidelines For [RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For
Values In the Internet Protocol and Related Headers, RFC 2780, March Values In the Internet Protocol and Related Headers", RFC 2780, March
2000. 2000.
[RFD99] Ramakrishnan, Floyd, S., and Davie, B., A Proposal to Incor-
porate ECN in MPLS, work in progress, June 1999. URL
"http://www.aciri.org/floyd/papers/draft-ietf-mpls-ecn-00.txt".
[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
Congestion Avoidance in Computer Networks", ACM Transactions on Com- Congestion Avoidance in Computer Networks", ACM Transactions on Com-
puter Systems, Vol.8, No.2, pp. 158-181, May 1990. puter Systems, Vol.8, No.2, pp. 158-181, May 1990.
[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom
Anderson, "TCP Congestion Control with a Misbehaving Receiver", ACM Anderson, "TCP Congestion Control with a Misbehaving Receiver", ACM
Computer Communications Review, October 1999. Computer Communications Review, October 1999.
16. Security Considerations 16. Security Considerations
Security considerations have been discussed in Sections 7 and 8. Security considerations have been discussed in Sections 7, 8, 18, and
19.
17. IPv4 Header Checksum Recalculation 17. IPv4 Header Checksum Recalculation
IPv4 header checksum recalculation is an issue with some high-end IPv4 header checksum recalculation is an issue with some high-end
router architectures using an output-buffered switch, since most if router architectures using an output-buffered switch, since most if
not all of the header manipulation is performed on the input side of not all of the header manipulation is performed on the input side of
the switch, while the ECN decision would need to be made local to the the switch, while the ECN decision would need to be made local to the
output buffer. This is not an issue for IPv6, since there is no IPv6 output buffer. This is not an issue for IPv6, since there is no IPv6
header checksum. The IPv4 TOS octet is the last byte of a 16-bit header checksum. The IPv4 TOS octet is the last byte of a 16-bit
half-word. half-word.
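Because the TOS octet is the low-order byte of the header's first 16-bit word (the high-order byte holds the version and header length), a router that sets the CE bit can patch the checksum incrementally instead of recomputing it. The sketch below assumes the incremental-update formula of RFC 1624, HC' = ~(~HC + ~m + m'), and this draft's placement of CE at bit 7 (the least significant bit) of the TOS octet; the function names are hypothetical.

```python
def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit values with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ipv4_checksum(header: bytes) -> int:
    """Full checksum over a header whose checksum field is zeroed."""
    total = 0
    for i in range(0, len(header), 2):
        total = ones_complement_add(total, (header[i] << 8) | header[i + 1])
    return ~total & 0xFFFF


def update_checksum(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 incremental update: HC' = ~(~HC + ~m + m')."""
    s = ones_complement_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_complement_add(s, new_word)
    return ~s & 0xFFFF


def set_ce(header: bytearray) -> None:
    """Set CE (bit 7 of the TOS octet) and patch the checksum in place."""
    old_word = (header[0] << 8) | header[1]  # version/IHL byte + TOS octet
    header[1] |= 0x01
    new_word = (header[0] << 8) | header[1]
    old_csum = (header[10] << 8) | header[11]
    new_csum = update_checksum(old_csum, old_word, new_word)
    header[10], header[11] = new_csum >> 8, new_csum & 0xFF
```

Only the one 16-bit word containing the TOS octet enters the update, so the output-side logic never needs the rest of the header; the result should agree with a full recomputation by `ipv4_checksum`.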
skipping to change at page 44, line 18 skipping to change at page 44, line 25
18.1.5. Changes with No Functional Effect 18.1.5. Changes with No Functional Effect
(0, *) -> (0, *) (0, *) -> (0, *)
The CE bit is ignored in a packet that does not have the ECT bit set. The CE bit is ignored in a packet that does not have the ECT bit set.
Thus, this change would have no effect, in terms of ECN. Thus, this change would have no effect, in terms of ECN.
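As an illustration of this rule, the two-bit field can be decoded as below, with ECT and CE at bits 6 and 7 of the octet as in this draft. The (ECT=0, CE=1) combination, which this draft does not define, falls under the (0, *) case and is treated as not ECN-capable. `EcnState` and `classify` are hypothetical names.

```python
from enum import Enum

ECT_BIT = 0b10  # bit 6 of the octet (RFC numbering, bit 0 = most significant)
CE_BIT = 0b01   # bit 7 of the octet


class EcnState(Enum):
    NOT_ECT = "not ECN-capable"
    ECT = "ECN-capable, no congestion"
    CE = "congestion experienced"


def classify(octet: int) -> EcnState:
    """Interpret the (ECT, CE) bits of the TOS/DS octet."""
    if not (octet & ECT_BIT):
        # (0, *): CE is ignored without ECT, so (0, 1) counts as (0, 0).
        return EcnState.NOT_ECT
    return EcnState.CE if (octet & CE_BIT) else EcnState.ECT
```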
18.2. Information carried in the Transport Header 18.2. Information carried in the Transport Header
For TCP, an ECN-capable TCP receiver informs its TCP peer that it is For TCP, an ECN-capable TCP receiver informs its TCP peer that it is
ECN-capable at the TCP level, using information in the TCP header at ECN-capable at the TCP level, conveying this information in the TCP
the time the connection is setup. This document does not consider header at the time the connection is setup. This document does not
potential dangers introduced by changes in the transport header consider potential dangers introduced by changes in the transport
within the network. In the case of IPsec tunnels, the IPsec tunnel header within the network. In the case of IPsec tunnels, the IPsec
protects the transport header. tunnel protects the transport header.
Another issue concerns TCP packets with a spoofed IP source address
carrying invalid ECN information in the transport header. For com-
pleteness, we examine here some possible ways that a node spoofing
the IP source address of another node could use the two ECN flags in
the TCP header to launch a denial-of-service attack. However, these
attacks would require an ability for the attacker to use valid TCP
sequence numbers, and any attacker with this ability and with the
ability to spoof IP source addresses could damage the TCP connection
without using the ECN flags. Therefore, ECN does not add any new
vulnerabilities in this respect.
An acknowledgement packet with a spoofed IP source address of the TCP
data receiver could include the ECE bit set. If accepted by the TCP
data sender as a valid packet, this spoofed acknowledgement packet
could result in the TCP data sender unnecessarily halving its conges-
tion window. However, to be accepted by the data sender, such a
spoofed acknowledgement packet would have to have the correct 32-bit
sequence number as well as a valid acknowledgement number. An
attacker that could successfully send such a spoofed acknowledgement
packet could also send a spoofed RST packet, or do other equally dam-
aging operations to the TCP connection.
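The protection described here comes from the sender's ordinary acknowledgement checks. A simplified sketch, assuming the RFC 793 test SND.UNA < SEG.ACK =< SND.NXT evaluated in 32-bit modular arithmetic: an ACK that fails the test is discarded, so its ECE bit never reaches the congestion-window logic. The function names, the byte-based `cwnd`, and the `mss` floor are hypothetical; the sketch omits the sequence-number check, duplicate-ACK processing, and any limit on how often the sender reacts to ECE.

```python
MOD = 1 << 32  # size of the TCP sequence-number space


def seq_lt(a: int, b: int) -> bool:
    """a < b under modular 32-bit sequence-number comparison."""
    return a != b and (b - a) % MOD < (1 << 31)


def seq_le(a: int, b: int) -> bool:
    return a == b or seq_lt(a, b)


def ack_acceptable(snd_una: int, snd_nxt: int, seg_ack: int) -> bool:
    """RFC 793 test: SND.UNA < SEG.ACK =< SND.NXT."""
    return seq_lt(snd_una, seg_ack) and seq_le(seg_ack, snd_nxt)


def on_ack(cwnd: int, snd_una: int, snd_nxt: int, seg_ack: int,
           ece: bool, mss: int = 1460) -> int:
    """Return the congestion window after processing one ACK."""
    if not ack_acceptable(snd_una, snd_nxt, seg_ack):
        return cwnd  # unacceptable (e.g. spoofed) ACK: its ECE is ignored
    if ece:
        return max(cwnd // 2, mss)  # treat ECE as a congestion signal
    return cwnd
```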
Packets with a spoofed IP source address of the TCP data sender could
include the CWR bit set. Again, to be accepted, such a packet would
have to have a valid sequence number. In addition, such a spoofed
packet would have a limited performance impact. Spoofing a data
packet with the CWR bit set could result in the TCP data receiver
sending fewer ECE packets than it would otherwise, if the data
receiver was sending ECE packets when it received the spoofed CWR
packet.
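The limited effect of a spoofed CWR follows from how the data receiver decides when to stop echoing congestion. A simplified sketch, assuming the receiver sets ECE on every ACK after a CE packet until it receives a data packet with CWR set: a spoofed CWR can only end the echo early. The `EcnReceiver` class and its method are hypothetical names.

```python
class EcnReceiver:
    """Tracks whether outgoing ACKs should carry ECE (simplified sketch)."""

    def __init__(self) -> None:
        self.ece_pending = False

    def on_data(self, ce: bool, cwr: bool) -> bool:
        """Process one data packet; return True if the ACK should set ECE."""
        if cwr:
            # The sender reports a window reduction: stop echoing.
            self.ece_pending = False
        if ce:
            # New congestion indication: (re)start echoing.
            self.ece_pending = True
        return self.ece_pending
```

Because a CE mark on the same or any later packet restarts the echo, a spoofed CWR at most shortens one run of ECE acknowledgements.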
18.3. Split Paths 18.3. Split Paths
In some cases, a malicious or broken router might have access to only In some cases, a malicious or broken router might have access to only
a subset of the packets from a flow. The question is as follows: a subset of the packets from a flow. The question is as follows:
can this router, by altering the ECN field in this subset of the can this router, by altering the ECN field in this subset of the
packets, do more damage to that flow than if it had simply dropped packets, do more damage to that flow than if it had simply dropped
that set of packets? that set of packets?
We will classify the packets in the flow as A packets and B packets, We will classify the packets in the flow as A packets and B packets,
skipping to change at page 48, line 35 skipping to change at page 49, line 29
receiving more bandwidth than it would have otherwise, relative to receiving more bandwidth than it would have otherwise, relative to
competing non-subverted flows. If the congested queue reaches the competing non-subverted flows. If the congested queue reaches the
packet-dropping stage, then the subversion of end-to-end congestion packet-dropping stage, then the subversion of end-to-end congestion
control might or might not be of overall benefit to the subverted control might or might not be of overall benefit to the subverted
flow, depending on that flow's relative tradeoffs between throughput, flow, depending on that flow's relative tradeoffs between throughput,
loss, and delay. loss, and delay.
One form of subverting end-to-end congestion control is to falsely One form of subverting end-to-end congestion control is to falsely
indicate ECN-capability by setting the ECT bit. This has the conse- indicate ECN-capability by setting the ECT bit. This has the conse-
quence of downstream congested routers setting the CE bit in vain. quence of downstream congested routers setting the CE bit in vain.
However, as we describe in the section below, if the ECT bit is However, as described in Section 9.1.2, if the ECT bit is changed in
changed in the IPsec tunnel, this can be detected at the egress point an IP tunnel, this can be detected at the egress point of the tunnel,
of the tunnel. as long as the inner header was not changed within the tunnel.
The second form of subverting end-to-end congestion control is to The second form of subverting end-to-end congestion control is to
erase the congestion indication, either by erasing the CE bit erase the congestion indication, either by erasing the CE bit
directly, or by erasing the ECT bit when the CE bit is already set. directly, or by erasing the ECT bit when the CE bit is already set.
In this case, it is the upstream congested routers that set the CE In this case, it is the upstream congested routers that set the CE
bit in vain. bit in vain.
If the ECT bit is erased within an IP tunnel, then this can be If the ECT bit is erased within an IP tunnel, then this can be
detected at the egress point of the tunnel. If the CE bit is set detected at the egress point of the tunnel, as long as the inner
header was not changed within the tunnel. If the CE bit is set
upstream of the IP tunnel, then any erasure of the outer header's CE upstream of the IP tunnel, then any erasure of the outer header's CE
bit within the tunnel will have no effect because the inner header bit within the tunnel will have no effect because the inner header
preserves the set value of the CE bit. However, if the CE bit is set preserves the set value of the CE bit. However, if the CE bit is set
within the tunnel, and erased either within or downstream of the tun- within the tunnel, and erased either within or downstream of the tun-
nel, this is not necessarily detected at the egress point of the nel, this is not necessarily detected at the egress point of the tun-
tunnel. nel.
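The egress check can be sketched as below for two-bit (ECT, CE) fields. It assumes the full-functionality option, in which the encapsulator copies ECT into the outer header and the decapsulator copies CE from the outer header into an ECN-capable inner header. It treats an inner header with ECT set under an outer header with ECT clear as evidence that ECT was erased inside the tunnel. What the egress does with the `mismatch` flag is not shown; the function and flag are hypothetical, and the check relies on the inner header being unchanged within the tunnel.

```python
from typing import Tuple

ECT, CE = 0b10, 0b01  # (ECT, CE) as bits 6 and 7 of the octet


def decapsulate(outer: int, inner: int) -> Tuple[int, bool]:
    """Return (resulting inner ECN bits, ECT erasure suspected)."""
    inner_ect = bool(inner & ECT)
    outer_ect = bool(outer & ECT)
    # Assuming the inner header was not altered in the tunnel, inner ECT
    # without outer ECT means the outer ECT bit was cleared in transit.
    mismatch = inner_ect and not outer_ect
    if inner_ect and outer_ect and (outer & CE):
        inner |= CE  # congestion was experienced inside the tunnel
    return inner, mismatch
```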
With this subversion of end-to-end congestion control, an end-system With this subversion of end-to-end congestion control, an end-system
transport does not respond to the congestion indication. Along with transport does not respond to the congestion indication. Along with
the increased unfairness for the non-subverted flows described in the the increased unfairness for the non-subverted flows described in the
previous section, the congested router's queue could continue to previous section, the congested router's queue could continue to
build, resulting in packet loss at the congested router - which is a build, resulting in packet loss at the congested router - which is a
means for indicating congestion to the transport in any case. In the means for indicating congestion to the transport in any case. In the
interim, the flow might experience higher queueing delays, possibly interim, the flow might experience higher queueing delays, possibly
along with an increased bandwidth relative to other non-subverted along with an increased bandwidth relative to other non-subverted
flows. But transports do not inherently make assumptions of consis- flows. But transports do not inherently make assumptions of consis-
skipping to change at page 49, line 46 skipping to change at page 50, line 40
end-to-end congestion control that a broken or malicious router could end-to-end congestion control that a broken or malicious router could
use. For example, a broken router could duplicate data packets, thus use. For example, a broken router could duplicate data packets, thus
effectively negating the effects of end-to-end congestion control effectively negating the effects of end-to-end congestion control
along some portion of the path. (For a router that duplicated pack- along some portion of the path. (For a router that duplicated pack-
ets within an IPsec tunnel, the security administrator can cause the ets within an IPsec tunnel, the security administrator can cause the
duplicate packets to be discarded by configuring anti-replay protec- duplicate packets to be discarded by configuring anti-replay protec-
tion for the tunnel.) This duplication of packets within the network tion for the tunnel.) This duplication of packets within the network
would have similar implications for the network and for the subverted would have similar implications for the network and for the subverted
flow as those described in Sections 18.1.1 and 18.1.4 above. flow as those described in Sections 18.1.1 and 18.1.4 above.
20. The motivation for the ECT bit. 20. The Motivation for the ECT bit.
The need for the ECT bit is motivated by the fact that ECN will be The need for the ECT bit is motivated by the fact that ECN will be
deployed incrementally in an Internet where some transport protocols deployed incrementally in an Internet where some transport protocols
and routers understand ECN and some do not. With the ECT bit, the and routers understand ECN and some do not. With the ECT bit, the
router can drop packets from flows that are not ECN-capable, but can router can drop packets from flows that are not ECN-capable, but can
*instead* set the CE bit in packets that *are* ECN-capable. Because *instead* set the CE bit in packets that *are* ECN-capable. Because
the ECT bit allows an end node to have the CE bit set in a packet the ECT bit allows an end node to have the CE bit set in a packet
*instead* of having the packet dropped, an end node might have some *instead* of having the packet dropped, an end node might have some
incentive to deploy ECN. incentive to deploy ECN.
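The router-side decision enabled by the ECT bit can be sketched in a few lines. The sketch assumes active queue management (for example RED) has already selected the packet as a congestion signal; `on_congestion` is a hypothetical name, and ECT and CE are taken as bits 6 and 7 of the octet, as in this draft.

```python
from typing import Tuple

ECT, CE = 0b10, 0b01  # (ECT, CE) as bits 6 and 7 of the octet


def on_congestion(ecn_bits: int) -> Tuple[str, int]:
    """Packet chosen by AQM as a congestion signal: return (action, new bits)."""
    if ecn_bits & ECT:
        # ECN-capable transport: signal congestion without losing the packet.
        return "mark", ecn_bits | CE
    # Not ECN-capable: dropping is the only available signal.
    return "drop", ecn_bits
```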
skipping to change at page 50, line 43 skipping to change at page 51, line 37
A flow that advertised itself as ECN-Capable but does not respond to A flow that advertised itself as ECN-Capable but does not respond to
CE bits is functionally equivalent to a flow that turns off conges- CE bits is functionally equivalent to a flow that turns off conges-
tion control, as discussed earlier in this document. tion control, as discussed earlier in this document.
Thus, in a world where a subset of the flows are ECN-capable, but Thus, in a world where a subset of the flows are ECN-capable, but
where ECN-capable flows have no mechanism for indicating that fact to where ECN-capable flows have no mechanism for indicating that fact to
the routers, there would be less effective and less fair congestion the routers, there would be less effective and less fair congestion
control in the Internet, resulting in a strong incentive for end control in the Internet, resulting in a strong incentive for end
nodes not to deploy ECN. nodes not to deploy ECN.
21. Why use two bits in the IP header? 21. Why use Two Bits in the IP Header?
Given the need for an ECT indication in the IP header, there still Given the need for an ECT indication in the IP header, there still
remains the question of whether the ECT (ECN-Capable Transport) and remains the question of whether the ECT (ECN-Capable Transport) and
CE (Congestion Experienced) indications should have been overloaded CE (Congestion Experienced) indications should have been overloaded
on a single bit. This overloaded-one-bit alternative, explored in on a single bit. This overloaded-one-bit alternative, explored in
[Floyd94], would have involved a single bit with two values. One [Floyd94], would have involved a single bit with two values. One
value, "ECT and not CE", would represent an ECN-Capable Transport, value, "ECT and not CE", would represent an ECN-Capable Transport,
and the other value, "CE or not ECT", would represent either and the other value, "CE or not ECT", would represent either Conges-
Congestion Experienced or a non-ECN-Capable transport. tion Experienced or a non-ECN-Capable transport.
One difference between the one-bit and two-bit implementations con- One difference between the one-bit and two-bit implementations con-
cerns packets that traverse multiple congested routers. Consider a cerns packets that traverse multiple congested routers. Consider a
CE packet that arrives at a second congested router, and is selected CE packet that arrives at a second congested router, and is selected
by the active queue management at that router for either marking or by the active queue management at that router for either marking or
dropping. In the one-bit implementation, the second congested router dropping. In the one-bit implementation, the second congested router
has no choice but to drop the CE packet, because it cannot distin- has no choice but to drop the CE packet, because it cannot distin-
guish between a CE packet and a non-ECT packet. In the two-bit guish between a CE packet and a non-ECT packet. In the two-bit
implementation, the second congested router has the choice of either implementation, the second congested router has the choice of either
dropping the CE packet, or of leaving it alone with the CE bit set. dropping the CE packet, or of leaving it alone with the CE bit set.
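The difference at a second congested router can be made concrete with a small sketch. In the one-bit encoding, the value 1 stands for "ECT and not CE" and 0 for "CE or not ECT"; in the two-bit encoding, ECT and CE are separate bits. Both functions assume the packet has already been selected for marking or dropping, and the `drop_ce` policy knob is hypothetical.

```python
from typing import Tuple

ECT, CE = 0b10, 0b01  # two-bit encoding: bits 6 and 7 of the octet


def one_bit_router(bit: int) -> Tuple[str, int]:
    """1 = "ECT and not CE"; 0 = "CE or not ECT" (indistinguishable)."""
    if bit == 1:
        return "mark", 0
    # Might be an already-marked CE packet, but the router cannot tell.
    return "drop", bit


def two_bit_router(bits: int, drop_ce: bool = False) -> Tuple[str, int]:
    if bits == ECT | CE:
        # Already marked upstream: drop it, or leave it alone with CE set.
        return ("drop", bits) if drop_ce else ("forward", bits)
    if bits & ECT:
        return "mark", bits | CE
    return "drop", bits
```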
skipping to change at page 52, line 21 skipping to change at page 53, line 15
packets from ECN-Capable flows (to convey the functionality of the packets from ECN-Capable flows (to convey the functionality of the
second bit elsewhere, namely in the transport header), or that second bit elsewhere, namely in the transport header), or that
senders in ECN-Capable flows accept the limitation that receivers senders in ECN-Capable flows accept the limitation that receivers
must be able to determine a priori which packets are ECN-Capable and must be able to determine a priori which packets are ECN-Capable and
which are not ECN-Capable. Third, the one-bit implementation is pos- which are not ECN-Capable. Third, the one-bit implementation is pos-
sibly more open to errors from faulty implementations that choose the sibly more open to errors from faulty implementations that choose the
wrong default value for the ECN bit. We believe that the use of the wrong default value for the ECN bit. We believe that the use of the
extra bit in the IP header for the ECT-bit is extremely valuable to extra bit in the IP header for the ECT-bit is extremely valuable to
overcome these limitations. overcome these limitations.
22. Historical definitions for the IPv4 TOS octet 22. Historical Definitions for the IPv4 TOS Octet
RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
header. In RFC 791, bits 6 and 7 of the ToS octet are listed as header. In RFC 791, bits 6 and 7 of the ToS octet are listed as
"Reserved for Future Use", and are shown set to zero. The first two "Reserved for Future Use", and are shown set to zero. The first two
fields of the ToS octet were defined as the Precedence and Type of fields of the ToS octet were defined as the Precedence and Type of
Service (TOS) fields. Service (TOS) fields.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
| PRECEDENCE | TOS | 0 | 0 | RFC 791 | PRECEDENCE | TOS | 0 | 0 | RFC 791
skipping to change at page 52, line 51 skipping to change at page 53, line 45
The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
| PRECEDENCE | TOS | MBZ | RFC 1349 | PRECEDENCE | TOS | MBZ | RFC 1349
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
Cost". In addition to the Precedence and Type of Service (TOS) Cost". In addition to the Precedence and Type of Service (TOS)
fields, the last field, MBZ (for "must be zero") was defined as fields, the last field, MBZ (for "must be zero") was defined as cur-
currently unused. RFC 1349 stated that "The originator of a datagram rently unused. RFC 1349 stated that "The originator of a datagram
sets [the MBZ] field to zero (unless participating in an Internet sets [the MBZ] field to zero (unless participating in an Internet
protocol experiment which makes use of that bit)." protocol experiment which makes use of that bit)."
RFC 1455 [RFC1455] defined an experimental standard that used all RFC 1455 [RFC1455] defined an experimental standard that used all
four bits in the TOS field to request a guaranteed level of link four bits in the TOS field to request a guaranteed level of link
security. security.
RFC 1349 is obsoleted by "Definition of the Differentiated Services RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Dif-
Field (DS Field) in the IPv4 and IPv6 Headers" [RFC2474], in which ferentiated Services Field (DS Field) in the IPv4 and IPv6 Headers"
bits 6 and 7 of the DS field are listed as Currently Unused (CU). [RFC2474] in which bits 6 and 7 of the DS field are listed as Cur-
The first six bits of the DS field are defined as the Differentiated rently Unused (CU). RFC 2780 [RFC2780] specified ECN as an experi-
Services CodePoint (DSCP): mental use of the two-bit CU field. RFC 2780 updated the definition
of the DS Field to only encompass the first six bits of this octet
rather than all eight bits; these first six bits are defined as the
Differentiated Services CodePoint (DSCP):
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
| DSCP | CU | RFC 2474 | DSCP | CU | RFCs 2474,
2780
+-----+-----+-----+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+-----+-----+
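The successive layouts above differ only in how the same eight bits are grouped. A small sketch, assuming RFC bit numbering (bit 0 is the most significant), splits the octet into the six-bit DSCP and the two-bit CU field. It also shows that the RFC 1349 "Minimize Monetary Cost" value (TOS field 0001 in bits 3 through 6) sets bit 6, the position this draft uses for ECT. The function and constant names are hypothetical.

```python
from typing import Tuple

# RFC 1349 TOS value 0001 ("minimize monetary cost") placed in bits 3-6.
MINIMIZE_MONETARY_COST = 0b0001 << 1


def rfc_bit(octet: int, n: int) -> int:
    """Bit n of an octet in RFC numbering, where bit 0 is the most significant."""
    return (octet >> (7 - n)) & 1


def split_ds_octet(octet: int) -> Tuple[int, int]:
    """Split into (DSCP, CU) per RFCs 2474/2780: bits 0-5 and bits 6-7."""
    return octet >> 2, octet & 0b11
```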
Because of this unstable history, the definition of the ECN field in Because of this unstable history, the definition of the ECN field in
this document cannot be guaranteed to be backwards compatible with this document cannot be guaranteed to be backwards compatible with
all past uses of these two bits. The damage that could be done by a all past uses of these two bits.
non-ECN-capable router would be to "erase" the CE bit for an ECN-
capable packet that arrived at the router with the CE bit set, or set Prior to RFC 2474, routers were not permitted to modify bits in
the CE bit even in the absence of congestion. This has been dis- either the DSCP or ECN field of packets forwarded through them, and
cussed in the section on "Non-compliance in the Network". hence routers that comply only with RFCs prior to 2474 should have no
effect on ECN. For end nodes, bit 7 (the ECN CE bit) must be trans-
mitted as zero for any implementation compliant only with RFCs prior
to 2474. Such nodes may transmit bit 6 (the ECN ECT bit) as one for
the "Minimize Monetary Cost" provision of RFC 1349 or the experiment
authorized by RFC 1455; neither this aspect of RFC 1349 nor the
experiment in RFC 1455 was widely implemented or used. The damage
that could be done by a broken, non-conformant router would be to
"erase" the CE bit for an ECN-capable packet that arrived at the
router with the CE bit set, or set the CE bit even in the absence of
congestion. This has been discussed in the section on "Non-compli-
ance in the Network".
The damage that could be done in an ECN-capable environment by a non- The damage that could be done in an ECN-capable environment by a non-
ECN-capable end-node transmitting packets with the ECT bit set has ECN-capable end-node transmitting packets with the ECT bit set has
been discussed in the section on "Non-compliance by the End Nodes". been discussed in the section on "Non-compliance by the End Nodes".
23. IANA Considerations
The bits for ECT and CE in the ECN field of the IP header and the
bits for CWR and ECE in the TCP header are specified by the Standards
Action of this RFC, as required by RFC 2780. Note that this RFC does
not define the codepoint (ECT=0, CE=1) for the ECT and CE bits.
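The interpretation of the ECT and CE bits can be illustrated with a minimal Python sketch. It assumes IETF bit numbering, where bit 0 is the most significant bit of the octet. Under that numbering, the DSCP occupies bits 0-5, ECT is bit 6 (mask 0x02), and CE is bit 7 (mask 0x01).

The sketch also assumes a simplified router reaction: a router sets CE only on packets that already carry ECT, and would drop a non-ECN-capable packet instead. The names `DSCP_SHIFT`, `ECT_MASK`, `CE_MASK`, `decode_tos`, and `mark_congestion` are hypothetical and used only for illustration.

```python
from typing import Optional, Tuple

# Hypothetical constants, assuming IETF bit numbering (bit 0 = most
# significant bit of the TOS / Traffic Class octet).
DSCP_SHIFT = 2   # DSCP occupies bits 0-5 (the six high-order bits)
ECT_MASK = 0x02  # bit 6: ECN-Capable Transport (ECT)
CE_MASK = 0x01   # bit 7: Congestion Experienced (CE)


def decode_tos(octet: int) -> Tuple[int, str]:
    """Split one TOS/Traffic Class octet into (dscp, ecn_state)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    dscp = octet >> DSCP_SHIFT
    ect = bool(octet & ECT_MASK)
    ce = bool(octet & CE_MASK)
    if ect and ce:
        state = "ECN-capable, congestion experienced"
    elif ect:
        state = "ECN-capable"
    elif ce:
        # (ECT=0, CE=1) is left undefined by this document.
        state = "undefined"
    else:
        state = "not ECN-capable"
    return dscp, state


def mark_congestion(octet: int) -> Optional[int]:
    """Hypothetical, simplified router reaction to incipient congestion:
    set CE on an ECN-capable packet; return None to signal that a
    non-ECN-capable packet would be dropped instead."""
    if octet & ECT_MASK:
        return octet | CE_MASK
    return None
```

For example, `decode_tos(0xBA)` returns `(46, "ECN-capable")`, and `mark_congestion(0xBA)` returns `0xBB`. Masking the two bits independently mirrors this document's treatment of ECT and CE as separate bits rather than as a single two-bit codepoint.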
IANA allocated the IPSEC Security Association Attribute value 10 for
the ECN Tunnel use described in Section 9.2.1.2 above, at the request
of David Black in November 1999. If this draft is approved for
publication as an RFC, IANA should change the reference for this
allocation from David Black's request to this RFC, cited by its RFC
number.
AUTHORS' ADDRESSES AUTHORS' ADDRESSES
K. K. Ramakrishnan K. K. Ramakrishnan
TeraOptic Networks, Inc. TeraOptic Networks, Inc.
Phone: +1 (408) 666-8650 Phone: +1 (408) 666-8650
Email: kk@teraoptic.com Email: kk@teraoptic.com
Sally Floyd Sally Floyd
Phone: +1 (510) 666-2989 Phone: +1 (510) 666-2989
ACIRI ACIRI
skipping to change at page 54, line 4 skipping to change at page 55, line 27
Sally Floyd Sally Floyd
Phone: +1 (510) 666-2989 Phone: +1 (510) 666-2989
ACIRI ACIRI
Email: floyd@aciri.org Email: floyd@aciri.org
URL: http://www.aciri.org/floyd/ URL: http://www.aciri.org/floyd/
David L. Black David L. Black
EMC Corporation EMC Corporation
42 South St. 42 South St.
Hopkinton, MA 01748 Hopkinton, MA 01748
Phone: +1 (508) 435-1000 x75140 Phone: +1 (508) 435-1000 x75140
Email: black_david@emc.com Email: black_david@emc.com
This draft was created in November 2000. This draft was created in January 2001.
It expires May 2001. It expires July 2001.
 End of changes. 
