draft-ietf-tsvwg-ecn-03.txt | draft-ietf-tsvwg-ecn-04.txt | |||
---|---|---|---|---|
Internet Engineering Task Force K. K. Ramakrishnan | Internet Engineering Task Force K. K. Ramakrishnan | |||
INTERNET DRAFT TeraOptic Networks | INTERNET DRAFT TeraOptic Networks | |||
draft-ietf-tsvwg-ecn-03.txt Sally Floyd | draft-ietf-tsvwg-ecn-04.txt Sally Floyd | |||
ACIRI | ACIRI | |||
D. Black | D. Black | |||
EMC | EMC | |||
March, 2001 | June, 2001 | |||
Expires: September, 2001 | Expires: December, 2001 | |||
The Addition of Explicit Congestion Notification (ECN) to IP | The Addition of Explicit Congestion Notification (ECN) to IP | |||
Status of this Memo | Status of this Memo | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
skipping to change at page 2, line 8 | skipping to change at page 2, line 8 | |||
active queue management (e.g., RED) to the Internet infrastructure, | active queue management (e.g., RED) to the Internet infrastructure, | |||
where routers detect congestion before the queue overflows, routers | where routers detect congestion before the queue overflows, routers | |||
are no longer limited to packet drops as an indication of congestion. | are no longer limited to packet drops as an indication of congestion. | |||
Routers can instead set the Congestion Experienced (CE) codepoint in | Routers can instead set the Congestion Experienced (CE) codepoint in | |||
the IP header of packets from ECN-capable transports. We describe | the IP header of packets from ECN-capable transports. We describe | |||
when the CE codepoint is to be set in routers, and describe | when the CE codepoint is to be set in routers, and describe | |||
modifications needed to TCP to make it ECN-capable. Modifications to | modifications needed to TCP to make it ECN-capable. Modifications to | |||
other transport protocols (e.g., unreliable unicast or multicast, | other transport protocols (e.g., unreliable unicast or multicast, | |||
reliable multicast, other reliable unicast transport protocols) could | reliable multicast, other reliable unicast transport protocols) could | |||
be considered as those protocols are developed and advance through | be considered as those protocols are developed and advance through | |||
the standards process. | the standards process. We also describe in this document the issues | |||
involving the use of ECN within IP tunnels, and within IPsec tunnels | ||||
in particular. | ||||
We also describe in this document the issues involving the use of ECN | One of the guiding principles for this document is that, to the | |||
within IP tunnels, and within IPsec tunnels in particular. | extent possible, the mechanisms specified here be incrementally | |||
deployable. One challenge to the principle of incremental deployment | ||||
has been the prior existence of some IP tunnels that were not | ||||
compatible with the use of ECN. As ECN becomes deployed, non- | ||||
compatible IP tunnels will have to be upgraded to conform to this | ||||
document. | ||||
One of the guiding principles for this document is that all the | This document is intended to obsolete RFC 2481, "A Proposal to add | |||
mechanisms specified here are incrementally deployable. | Explicit Congestion Notification (ECN) to IP", which defined ECN as | |||
an Experimental Protocol for the Internet Community. | ||||
RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This | ||||
document also obsoletes three subsequent internet-drafts on ECN, | ||||
"IPsec Interactions with ECN", "ECN Interactions with IP Tunnels", | ||||
and "TCP with ECN: The Treatment of Retransmitted Data Packets". | ||||
This document also updates RFC 2401 on "Security Architecture for the | ||||
Internet Protocol". | ||||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
2. Conventions and Acronyms | 2. Conventions and Acronyms | |||
3. Assumptions and General Principles | 3. Assumptions and General Principles | |||
4. Active Queue Management (AQM) | 4. Active Queue Management (AQM) | |||
5. Explicit Congestion Notification in IP | 5. Explicit Congestion Notification in IP | |||
5.1. ECN as an Indication of Persistent Congestion | 5.1. ECN as an Indication of Persistent Congestion | |||
5.2. Dropped or Corrupted Packets | 5.2. Dropped or Corrupted Packets | |||
5.3. Fragmentation | 5.3. Fragmentation | |||
6. Support from the Transport Protocol | 6. Support from the Transport Protocol | |||
6.1. TCP | 6.1. TCP | |||
6.1.1 TCP Initialization | 6.1.1 TCP Initialization | |||
6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | 6.1.1.1. Middlebox Issues | |||
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field | ||||
6.1.2. The TCP Sender | 6.1.2. The TCP Sender | |||
6.1.3. The TCP Receiver | 6.1.3. The TCP Receiver | |||
6.1.4. Congestion on the ACK-path | 6.1.4. Congestion on the ACK-path | |||
6.1.5. Retransmitted TCP packets | 6.1.5. Retransmitted TCP packets | |||
6.1.6. TCP Window Probes. | 6.1.6. TCP Window Probes. | |||
7. Non-compliance by the End Nodes | 7. Non-compliance by the End Nodes | |||
8. Non-compliance in the Network | 8. Non-compliance in the Network | |||
8.1. Complications Introduced by Split Paths | 8.1. Complications Introduced by Split Paths | |||
9. Encapsulated Packets | 9. Encapsulated Packets | |||
9.1. IP packets encapsulated in IP | 9.1. IP packets encapsulated in IP | |||
9.1.1. The Limited-functionality and Full-functionality Options | 9.1.1. The Limited-functionality and Full-functionality Options | |||
9.1.2. Changes to the ECN Field within an IP Tunnel. | 9.1.2. Changes to the ECN Field within an IP Tunnel. | |||
9.2. IPsec Tunnels | 9.2. IPsec Tunnels | |||
9.2.1. Negotiation between Tunnel Endpoints | 9.2.1. Negotiation between Tunnel Endpoints | |||
9.2.1.1. ECN Tunnel Security Association Database Field | 9.2.1.1. ECN Tunnel Security Association Database Field | |||
9.2.1.2. ECN Tunnel Security Association Attribute | 9.2.1.2. ECN Tunnel Security Association Attribute | |||
9.2.1.3. Changes to IPsec Tunnel Header Processing | 9.2.1.3. Changes to IPsec Tunnel Header Processing | |||
9.2.2. Changes to the ECN Field within an IPsec Tunnel. | 9.2.2. Changes to the ECN Field within an IPsec Tunnel. | |||
9.2.3. Comments for IPsec Support | 9.2.3. Comments for IPsec Support | |||
9.3. IP packets encapsulated in non-IP packet headers. | 9.3. IP packets encapsulated in non-IP Packet Headers. | |||
10. Issues Raised by Monitoring and Policing Devices | 10. Issues Raised by Monitoring and Policing Devices | |||
11. Evaluations of ECN | 11. Evaluations of ECN | |||
11.1. Related Work Evaluating ECN | 11.1. Related Work Evaluating ECN | |||
11.2. A Discussion of the ECN nonce. | 11.2. A Discussion of the ECN nonce. | |||
11.2.1. The Incremental Deployment of ECT(1) in Routers. | 11.2.1. The Incremental Deployment of ECT(1) in Routers. | |||
12. Summary of changes required in IP and TCP | 12. Summary of changes required in IP and TCP | |||
13. Conclusions | 13. Conclusions | |||
14. Acknowledgements | 14. Acknowledgements | |||
15. References | 15. References | |||
16. Security Considerations | 16. Security Considerations | |||
skipping to change at page 4, line 17 | skipping to change at page 4, line 18 | |||
19. Implications of Subverting End-to-End Congestion Control | 19. Implications of Subverting End-to-End Congestion Control | |||
19.1. Implications for the Network and for Competing Flows | 19.1. Implications for the Network and for Competing Flows | |||
19.2. Implications for the Subverted Flow | 19.2. Implications for the Subverted Flow | |||
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | |||
20. The Motivation for the ECT Codepoints. | 20. The Motivation for the ECT Codepoints. | |||
20.1. The Motivation for an ECT Codepoint. | 20.1. The Motivation for an ECT Codepoint. | |||
20.2. The Motivation for two ECT Codepoints. | 20.2. The Motivation for two ECT Codepoints. | |||
21. Why use Two Bits in the IP Header? | 21. Why use Two Bits in the IP Header? | |||
22. Historical Definitions for the IPv4 TOS Octet | 22. Historical Definitions for the IPv4 TOS Octet | |||
23. IANA Considerations | 23. IANA Considerations | |||
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet | ||||
23.2. TCP Header Flags | ||||
23.3. IPSEC Security Association Attributes | ||||
RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare | RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - | |||
this with draft-ietf-tsvwg-ecn-02, compare the following: | To compare this with draft-ietf-tsvwg-ecn-03, compare the following: | |||
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-02.troff" | ||||
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-03.troff" | "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-03.troff" | |||
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-04.troff" | ||||
Changes from draft-ietf-tsvwg-ecn-03: | ||||
An expanded section on IANA Considerations. | ||||
Added back the section on "Middlebox Issues" about devices that either | ||||
drop an ECN-setup SYN packet or respond with a RST. | ||||
Clarified MUSTs and SHOULDs for limited-functionality and full- | ||||
functionality modes of tunnels. | ||||
Changed "should" to "MUST" for the sentence about not using ECT with | ||||
pure ACK TCP packets. | ||||
Specified that ECN status is ignored in TCP once TIME-WAIT state is | ||||
entered. | ||||
Moved notes to the RFC editor about obsoleted documents to the | ||||
beginning of this document. | ||||
Some minor rephrasing for clarity. | ||||
Changes from draft-ietf-tsvwg-ecn-02: | Changes from draft-ietf-tsvwg-ecn-02: | |||
Revised Section 5.3 on fragmentation. | Revised Section 5.3 on fragmentation. | |||
Changes from draft-ietf-tsvwg-ecn-01: | Changes from draft-ietf-tsvwg-ecn-01: | |||
Added the ECT(1) codepoint, and changed references about bits to | Added the ECT(1) codepoint, and changed references about bits to | |||
references about codepoints in many places. Also added Section 11.2 on | references about codepoints in many places. Also added Section 11.2 on | |||
"A Discussion of the ECN nonce", and Section 20.2 on "The Motivation for | "A Discussion of the ECN nonce", and Section 20.2 on "The Motivation for | |||
two ECT Codepoints". | two ECT Codepoints". | |||
Added a paragraph saying that by default, the discussion of setting | Added a paragraph saying that by default, the discussion of setting | |||
the CE codepoint applies to all Differentiated Services Per-Hop | the CE codepoint applies to all Differentiated Services Per-Hop | |||
Behaviors. | Behaviors. | |||
skipping to change at page 5, line 45 | skipping to change at page 6, line 15 | |||
Active queue management mechanisms may use one of several methods for | Active queue management mechanisms may use one of several methods for | |||
indicating congestion to end-nodes. One is to use packet drops, as is | indicating congestion to end-nodes. One is to use packet drops, as is | |||
currently done. However, active queue management allows the router to | currently done. However, active queue management allows the router to | |||
separate policies of queueing or dropping packets from the policies | separate policies of queueing or dropping packets from the policies | |||
for indicating congestion. Thus, active queue management allows | for indicating congestion. Thus, active queue management allows | |||
routers to use the Congestion Experienced (CE) codepoint in a packet | routers to use the Congestion Experienced (CE) codepoint in a packet | |||
header as an indication of congestion, instead of relying solely on | header as an indication of congestion, instead of relying solely on | |||
packet drops. This has the potential of reducing the impact of loss | packet drops. This has the potential of reducing the impact of loss | |||
on latency-sensitive flows. | on latency-sensitive flows. | |||
There exist some middleboxes (firewalls, load balancers, or intrusion | ||||
detection systems) in the Internet that either drop a TCP SYN packet | ||||
configured to negotiate ECN, or respond with a RST. This document | ||||
specifies procedures that TCP implementations may use to provide | ||||
robust connectivity even in the presence of such equipment. | ||||
This document is intended to obsolete RFC 2481, "A Proposal to add | This document is intended to obsolete RFC 2481, "A Proposal to add | |||
Explicit Congestion Notification (ECN) to IP", which defined ECN as | Explicit Congestion Notification (ECN) to IP", which defined ECN as | |||
an Experimental Protocol for the Internet Community. | an Experimental Protocol for the Internet Community. | |||
RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This | ||||
document obsoletes three subsequent internet-drafts on ECN, "IPsec | ||||
Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP | ||||
with ECN: The Treatment of Retransmitted Data Packets". This | ||||
document is intended largely to merge the earlier documents all into | ||||
a single document, for greater clarity, in preparation to becoming a | ||||
Proposed Standard. | ||||
2. Conventions and Acronyms | 2. Conventions and Acronyms | |||
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | |||
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | |||
document, are to be interpreted as described in [B97]. | document, are to be interpreted as described in [RFC2119]. | |||
3. Assumptions and General Principles | 3. Assumptions and General Principles | |||
In this section, we describe some of the important design principles | In this section, we describe some of the important design principles | |||
and assumptions that guided the design choices in this proposal. | and assumptions that guided the design choices in this proposal. | |||
* Because ECN is likely to be adopted gradually, accommodating | * Because ECN is likely to be adopted gradually, accommodating | |||
migration is essential. Some routers may still only drop packets to | migration is essential. Some routers may still only drop packets to | |||
indicate congestion, and some end-systems may not be ECN-capable. The | indicate congestion, and some end-systems may not be ECN-capable. The | |||
most viable strategy is one that accommodates incremental deployment | most viable strategy is one that accommodates incremental deployment | |||
skipping to change at page 8, line 16 | skipping to change at page 8, line 30 | |||
use ECT(0). | use ECT(0). | |||
The not-ECT codepoint '00' indicates a packet that is not using ECN. | The not-ECT codepoint '00' indicates a packet that is not using ECN. | |||
The CE codepoint '11' is set by a router to indicate congestion to | The CE codepoint '11' is set by a router to indicate congestion to | |||
the end nodes. Routers that have a packet arriving at a full queue | the end nodes. Routers that have a packet arriving at a full queue | |||
drop the packet, just as they do in the absence of ECN. | drop the packet, just as they do in the absence of ECN. | |||
+-----+-----+ | +-----+-----+ | |||
| ECN FIELD | | | ECN FIELD | | |||
+-----+-----+ | +-----+-----+ | |||
ECT CE The ECT and CE bits defined in RFC 2481. | ECT CE [Obsolete] RFC 2481 names for the ECN bits. | |||
0 0 Not-ECT | 0 0 Not-ECT | |||
0 1 ECT(1) | 0 1 ECT(1) | |||
1 0 ECT(0) | 1 0 ECT(0) | |||
1 1 CE | 1 1 CE | |||
Figure 1: The ECN Field in IP. | Figure 1: The ECN Field in IP. | |||
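
The codepoints in Figure 1 can be illustrated with a minimal sketch. It assumes that the ECN field occupies the two low-order bits of the IPv4 TOS octet (or IPv6 Traffic Class octet), which this excerpt does not show. The names `EcnCodepoint`, `ecn_of`, and `mark_ce` are hypothetical and are not defined by the draft.

```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """The four ECN codepoints from Figure 1."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


# Assumption: the ECN field is the two low-order bits of the octet.
ECN_MASK = 0b11


def ecn_of(tos_octet: int) -> EcnCodepoint:
    """Extract the ECN codepoint from a TOS / Traffic Class octet."""
    return EcnCodepoint(tos_octet & ECN_MASK)


def mark_ce(tos_octet: int) -> int:
    """Set CE on an ECN-capable packet; leave a Not-ECT packet unchanged.

    A router with a Not-ECT packet can only signal congestion by dropping
    it, which is outside the scope of this helper.
    """
    if ecn_of(tos_octet) is EcnCodepoint.NOT_ECT:
        return tos_octet
    return tos_octet | EcnCodepoint.CE
```

Setting CE overwrites either ECT codepoint with '11', which is the sense in which a router "erases" the one-bit nonce carried by ECT(0) versus ECT(1).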
The use of two ECT codepoints essentially gives a one-bit ECN nonce | The use of two ECT codepoints essentially gives a one-bit ECN nonce | |||
in packet headers, and routers necessarily "erase" the nonce when | in packet headers, and routers necessarily "erase" the nonce when | |||
they set the CE codepoint [SCWA99]. For example, routers that erased | they set the CE codepoint [SCWA99]. For example, routers that erased | |||
the CE codepoint would face additional difficulty in reconstructing | the CE codepoint would face additional difficulty in reconstructing | |||
the original nonce, and thus repeated erasure of the CE codepoint | the original nonce, and thus repeated erasure of the CE codepoint | |||
would be more likely to be detected by the end-nodes. The ECN nonce | would be more likely to be detected by the end-nodes. The ECN nonce | |||
also can address the problem of misbehaving transport receivers lying | also can address the problem of misbehaving transport receivers lying | |||
to the transport sender about whether or not the CE codepoint was set | to the transport sender about whether or not the CE codepoint was set | |||
in a packet. The motivations for the use of two ECT codepoints is | in a packet. The motivations for the use of two ECT codepoints is | |||
discussed in more detail in Section 20, along with some discussion of | discussed in more detail in Section 20, along with some discussion of | |||
alternate possibilities for the fourth ECT codepoint. Backwards | alternate possibilities for the fourth ECT codepoint (that is, the | |||
compatibility with earlier ECN implementations that do not understand | codepoint '01'). Backwards compatibility with earlier ECN | |||
the ECT(1) codepoint is discussed in Section 11. | implementations that do not understand the ECT(1) codepoint is | |||
discussed in Section 11. | ||||
In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable | In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable | |||
Transport (ECT) bit and the CE bit. The ECN field with only the ECN- | Transport (ECT) bit and the CE bit. The ECN field with only the ECN- | |||
Capable Transport (ECT) bit set in RFC 2481 corresponds to the ECT(0) | Capable Transport (ECT) bit set in RFC 2481 corresponds to the ECT(0) | |||
codepoint in this document, and the ECN field with both the ECT and | codepoint in this document, and the ECN field with both the ECT and | |||
CE bit in RFC 2481 corresponds to the CE codepoint in this document. | CE bit in RFC 2481 corresponds to the CE codepoint in this document. | |||
The '01' codepoint was left undefined in RFC 2481, and this is the | The '01' codepoint was left undefined in RFC 2481, and this is the | |||
reason for recommending the use of ECT(0) when only a single ECT | reason for recommending the use of ECT(0) when only a single ECT | |||
codepoint is needed. | codepoint is needed. | |||
skipping to change at page 12, line 48 | skipping to change at page 13, line 10 | |||
5.3. Fragmentation | 5.3. Fragmentation | |||
ECN-capable packets MAY have the DF (Don't Fragment) bit set. | ECN-capable packets MAY have the DF (Don't Fragment) bit set. | |||
Reassembly of a fragmented packet MUST NOT lose indications of | Reassembly of a fragmented packet MUST NOT lose indications of | |||
congestion. In other words, if any fragment of an IP packet to be | congestion. In other words, if any fragment of an IP packet to be | |||
reassembled has the CE codepoint set, then one of two actions MUST be | reassembled has the CE codepoint set, then one of two actions MUST be | |||
taken: | taken: | |||
* Set the CE codepoint on the reassembled packet. However, this | * Set the CE codepoint on the reassembled packet. However, this | |||
MUST NOT occur if any of the other fragments contributing to this | MUST NOT occur if any of the other fragments contributing to this | |||
reassembly carries the Not-ECT codepoint. | reassembly carries the Not-ECT codepoint. | |||
* The packet is dropped, instead of being reassmembled, for any | * The packet is dropped, instead of being reassembled, for any | |||
other reason. | other reason. | |||
If both actions are applicable, either MAY be chosen. Reassembly of | If both actions are applicable, either MAY be chosen. Reassembly of | |||
a fragmented packet MUST NOT change the ECN codepoint when all of the | a fragmented packet MUST NOT change the ECN codepoint when all of the | |||
fragments carry the same codepoint. | fragments carry the same codepoint. | |||
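
The reassembly rules above can be sketched as a small decision function. The function name `reassembled_ecn` and the use of `None` to mean "drop the packet" are illustrative assumptions. The handling of mixed ECT(0)/ECT(1) fragments with no CE is not covered by the quoted text, so that branch is a placeholder only.

```python
from typing import Optional, Sequence

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassembled_ecn(fragment_ecns: Sequence[int]) -> Optional[int]:
    """Return the ECN codepoint for the reassembled packet, or None to drop it."""
    if not fragment_ecns:
        raise ValueError("at least one fragment is required")
    distinct = set(fragment_ecns)
    if len(distinct) == 1:
        # All fragments agree: the codepoint MUST NOT be changed.
        return fragment_ecns[0]
    if CE in distinct:
        if NOT_ECT in distinct:
            # CE MUST NOT be set when another fragment is Not-ECT, so the
            # only way to keep the congestion indication is to drop.
            return None
        # Either action is allowed here; this sketch chooses to set CE.
        return CE
    # Assumption: mixed codepoints without CE are not addressed by the text
    # above; this sketch simply keeps the first fragment's codepoint.
    return fragment_ecns[0]
```

Where both actions are permitted, the sketch prefers setting CE over dropping, since that preserves the congestion indication without losing data.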
We would note that because RFC 2481 did not specify reassembly | We would note that because RFC 2481 did not specify reassembly | |||
behavior, older ECN implementations conformant with that Experimental | behavior, older ECN implementations conformant with that Experimental | |||
RFC do not necessarily perform reassembly correctly, in terms of | RFC do not necessarily perform reassembly correctly, in terms of | |||
preserving the CE codepoint in a fragment. The sender could avoid | preserving the CE codepoint in a fragment. The sender could avoid | |||
the consequences of this behavior by setting the DF bit in ECN- | the consequences of this behavior by setting the DF bit in ECN- | |||
skipping to change at page 14, line 5 | skipping to change at page 14, line 16 | |||
determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the | determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the | |||
TCP header so that the data receiver can inform the data sender when | TCP header so that the data receiver can inform the data sender when | |||
a CE packet has been received; and a Congestion Window Reduced (CWR) | a CE packet has been received; and a Congestion Window Reduced (CWR) | |||
flag in the TCP header so that the data sender can inform the data | flag in the TCP header so that the data sender can inform the data | |||
receiver that the congestion window has been reduced. The support | receiver that the congestion window has been reduced. The support | |||
required from other transport protocols is likely to be different, | required from other transport protocols is likely to be different, | |||
particularly for unreliable or reliable multicast transport | particularly for unreliable or reliable multicast transport | |||
protocols, and will have to be determined as other transport | protocols, and will have to be determined as other transport | |||
protocols are brought to the IETF for standardization. | protocols are brought to the IETF for standardization. | |||
In a mild abuse of terminology, in this document we refer to `TCP | ||||
packets' instead of `TCP segments'. | ||||
6.1. TCP | 6.1. TCP | |||
The following sections describe in detail the proposed use of ECN in | The following sections describe in detail the proposed use of ECN in | |||
TCP. This proposal is described in essentially the same form in | TCP. This proposal is described in essentially the same form in | |||
[Floyd94]. We assume that the source TCP uses the standard congestion | [Floyd94]. We assume that the source TCP uses the standard congestion | |||
control algorithms of Slow-start, Fast Retransmit and Fast Recovery | control algorithms of Slow-start, Fast Retransmit and Fast Recovery | |||
[RFC 2001]. | [RFC2581]. | |||
This proposal specifies two new flags in the Reserved field of the | This proposal specifies two new flags in the Reserved field of the | |||
TCP header. The TCP mechanism for negotiating ECN-Capability uses | TCP header. The TCP mechanism for negotiating ECN-Capability uses | |||
the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved | the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved | |||
field of the TCP header is designated as the ECN-Echo flag. The | field of the TCP header is designated as the ECN-Echo flag. The | |||
location of the 6-bit Reserved field in the TCP header is shown in | location of the 6-bit Reserved field in the TCP header is shown in | |||
Figure 4 of RFC 793 [RFC793] (and is reproduced below for | Figure 4 of RFC 793 [RFC793] (and is reproduced below for | |||
completeness). This specification of the ECN Field leaves the | completeness). This specification of the ECN Field leaves the | |||
Reserved field as a 4-bit field using bits 4-7. | Reserved field as a 4-bit field using bits 4-7. | |||
skipping to change at page 16, line 24 | skipping to change at page 16, line 40 | |||
sender in a later transmission, within this TCP connection, sends a | sender in a later transmission, within this TCP connection, sends a | |||
SYN packet without ECE and CWR set. | SYN packet without ECE and CWR set. | |||
When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag | When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag | |||
but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an | but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an | |||
indication that the TCP transmitting the SYN-ACK packet is ECN- | indication that the TCP transmitting the SYN-ACK packet is ECN- | |||
Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does | Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does | |||
not commit the TCP host to setting the ECT codepoint in transmitted | not commit the TCP host to setting the ECT codepoint in transmitted | |||
packets. | packets. | |||
The following rules apply to the sending of ECN-setup packets: | The following rules apply to the sending of ECN-setup packets within | |||
a TCP connection, where a TCP connection is defined by the standard | ||||
rules for TCP connection establishment and termination. | ||||
* If a host has received an ECN-setup SYN packet, then it MAY send an | * If a host has received an ECN-setup SYN packet, then it MAY send an | |||
ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup | ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup | |||
SYN-ACK packet. | SYN-ACK packet. | |||
* A host MUST NOT set ECT on data packets unless it has sent at least | * A host MUST NOT set ECT on data packets unless it has sent at least | |||
one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at | one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at | |||
least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no | least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no | |||
non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has | non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has | |||
received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK | received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK | |||
packet, then it SHOULD NOT set ECT on data packets. | packet, then it SHOULD NOT set ECT on data packets. | |||
skipping to change at page 17, line 5 | skipping to change at page 17, line 20 | |||
ACK packet, then if that host receives TCP data packets with ECT and | ACK packet, then if that host receives TCP data packets with ECT and | |||
CE codepoints set in the IP header, then that host MUST process these | CE codepoints set in the IP header, then that host MUST process these | |||
packets as specified for an ECN-capable connection. | packets as specified for an ECN-capable connection. | |||
* A host that is not willing to use ECN on a TCP connection SHOULD | * A host that is not willing to use ECN on a TCP connection SHOULD | |||
clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN- | clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN- | |||
ACK packets that it sends to indicate this unwillingness. Receivers | ACK packets that it sends to indicate this unwillingness. Receivers | |||
MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK | MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK | |||
packets. | packets. | |||
* A host MUST NOT set ECT on SYN or SYN-ACK packets. | * A host MUST NOT set ECT on SYN or SYN-ACK packets. | |||
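
The rules above for when a host may set ECT on data packets can be sketched as per-connection bookkeeping. The flag masks assume the conventional TCP flags octet with ECE at 0x40 and CWR at 0x80. The class `EcnNegotiation` and its method names are hypothetical. The sketch treats the SHOULD NOT for a received non-ECN-setup packet as a hard stop, which is stricter than the text requires.

```python
from dataclasses import dataclass

# Assumed masks within the TCP flags octet.
SYN, ACK, ECE, CWR = 0x02, 0x10, 0x40, 0x80


def is_ecn_setup_syn(flags: int) -> bool:
    """A SYN (without ACK) that has both ECE and CWR set."""
    return bool(flags & SYN) and not flags & ACK and flags & (ECE | CWR) == (ECE | CWR)


def is_ecn_setup_syn_ack(flags: int) -> bool:
    """A SYN-ACK that has ECE set but not CWR."""
    return flags & (SYN | ACK) == (SYN | ACK) and bool(flags & ECE) and not flags & CWR


@dataclass
class EcnNegotiation:
    sent_setup: bool = False
    sent_non_setup: bool = False
    received_setup: bool = False
    received_non_setup: bool = False

    def on_handshake_packet(self, flags: int, sent: bool) -> None:
        """Record a SYN or SYN-ACK that this host sent or received."""
        if not flags & SYN:
            return  # only SYN and SYN-ACK packets take part in ECN setup
        setup = is_ecn_setup_syn_ack(flags) if flags & ACK else is_ecn_setup_syn(flags)
        if sent and setup:
            self.sent_setup = True
        elif sent:
            self.sent_non_setup = True
        elif setup:
            self.received_setup = True
        else:
            self.received_non_setup = True

    def may_set_ect(self) -> bool:
        """True only if ECN setup succeeded in both directions."""
        return (self.sent_setup and self.received_setup
                and not self.sent_non_setup and not self.received_non_setup)
```

For example, a client that sends `SYN | ECE | CWR` and receives `SYN | ACK | ECE` ends with `may_set_ect()` returning true, whereas any non-ECN-setup SYN or SYN-ACK in either direction turns it off for the connection.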
6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and | |||
transitions to CLOSED state after a timeout. Many TCP | ||||
implementations create a new TCP connection if they receive an in- | ||||
window SYN packet during TIME-WAIT state. When a TCP host enters | ||||
TIME-WAIT or CLOSED state, it should ignore any previous state about | ||||
the negotiation of ECN for that connection. | ||||
6.1.1.1. Middlebox Issues | ||||
ECN introduces the use of the ECN-Echo and CWR flags in the TCP | ||||
header (as shown in Figure 3) for initialization. There exist some | ||||
faulty firewalls, load balancers, and intrusion detection systems in | ||||
the Internet that either drop an ECN-setup SYN packet or respond with | ||||
a RST, in the belief that such a packet (with these bits set) is a | ||||
signature for a port-scanning tool that could be used in a denial-of- | ||||
service attack. Some of the offending equipment has been identified, | ||||
and a web page [FIXES] contains a list of non-compliant products and | ||||
the fixes posted by the vendors, where these are available. The TBIT | ||||
web page [TBIT] lists some of the web servers affected by this faulty | ||||
equipment. We mention this in this document as a warning to the | ||||
community of this problem. | ||||
To provide robust connectivity even in the presence of such faulty | ||||
equipment, a host that receives a RST in response to the transmission | ||||
of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared. | ||||
This could result in a TCP connection being established without using | ||||
ECN. | ||||
A host that receives no reply to an ECN-setup SYN within the normal | ||||
SYN retransmission timeout interval MAY resend the SYN and any | ||||
subsequent SYN retransmissions with CWR and ECE cleared. To overcome | ||||
normal packet loss that results in the original SYN being lost, the | ||||
originating host may retransmit one or more ECN-setup SYN packets | ||||
before giving up and retransmitting the SYN with the CWR and ECE bits | ||||
cleared. | ||||
We note that in this case, the following example scenario is | ||||
possible: | ||||
(1) Host A: Sends an ECN-setup SYN. | ||||
(2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed. | ||||
(3) Host A: Sends a non-ECN-setup SYN. | ||||
(4) Host B: Sends a non-ECN-setup SYN/ACK. | ||||
We note that in this case, following the procedures above, neither | ||||
Host A nor Host B may set the ECT bit on data packets. Further, an | ||||
important consequence of the rules for ECN setup and usage in Section | ||||
6.1.1 is that a host is forbidden from using the reception of ECT | ||||
data packets as an implicit signal that the other host is ECN- | ||||
capable. | ||||
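
The optional fallback described in this section, for an ECN-setup SYN that is answered with a RST or not answered at all, can be sketched as follows. The function names and the `ecn_retries` and `max_attempts` parameters are hypothetical, and the flag masks assume ECE at 0x40 and CWR at 0x80 in the TCP flags octet. The draft does not prescribe how many ECN-setup SYNs to retransmit before falling back.

```python
from typing import Iterator, Optional

# Assumed masks within the TCP flags octet.
SYN, ECE, CWR = 0x02, 0x40, 0x80


def syn_attempts(ecn_retries: int, max_attempts: int) -> Iterator[int]:
    """Yield the flags for each successive SYN when no reply arrives.

    The original SYN and the first `ecn_retries` retransmissions are
    ECN-setup SYNs; later retransmissions clear CWR and ECE.
    """
    for attempt in range(max_attempts):
        if attempt <= ecn_retries:
            yield SYN | ECE | CWR
        else:
            yield SYN


def next_syn_after_rst(previous_flags: int) -> Optional[int]:
    """After a RST answers an ECN-setup SYN, retry with CWR and ECE cleared.

    Returns None when the SYN that drew the RST was not an ECN-setup SYN,
    since the RST is then an ordinary refusal.
    """
    if (previous_flags & (ECE | CWR)) == (ECE | CWR):
        return previous_flags & ~(ECE | CWR)
    return None
```

Retransmitting at least one ECN-setup SYN before falling back keeps ordinary packet loss from disabling ECN for the connection unnecessarily.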
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field | ||||
There is the question of why we chose to have the TCP sending the SYN | There is the question of why we chose to have the TCP sending the SYN | |||
set two ECN-related flags in the Reserved field of the TCP header for | set two ECN-related flags in the Reserved field of the TCP header for | |||
the SYN packet, while the responding TCP sending the SYN-ACK sets | the SYN packet, while the responding TCP sending the SYN-ACK sets | |||
only one ECN-related flag in the SYN-ACK packet. This asymmetry is | only one ECN-related flag in the SYN-ACK packet. This asymmetry is | |||
necessary for the robust negotiation of ECN-capability with some | necessary for the robust negotiation of ECN-capability with some | |||
deployed TCP implementations. There exists at least one faulty TCP | deployed TCP implementations. There exists at least one faulty TCP | |||
implementation in which TCP receivers set the Reserved field of the | implementation in which TCP receivers set the Reserved field of the | |||
TCP header in ACK packets (and hence the SYN-ACK) simply to reflect | TCP header in ACK packets (and hence the SYN-ACK) simply to reflect | |||
the Reserved field of the TCP header in the received data packet. | the Reserved field of the TCP header in the received data packet. | |||
skipping to change at page 19, line 46 | skipping to change at page 21, line 15 | |||
We have already specified that a TCP sender is not required to reduce | We have already specified that a TCP sender is not required to reduce | |||
its congestion window more than once per window of data. Some care | its congestion window more than once per window of data. Some care | |||
is required if the TCP sender is to avoid unnecessary reductions of | is required if the TCP sender is to avoid unnecessary reductions of | |||
the congestion window when a window of data includes both dropped | the congestion window when a window of data includes both dropped | |||
packets and (marked) CE packets. This is illustrated in [Floyd98]. | packets and (marked) CE packets. This is illustrated in [Floyd98]. | |||
6.1.4. Congestion on the ACK-path | 6.1.4. Congestion on the ACK-path | |||
For the current generation of TCP congestion control algorithms, pure | For the current generation of TCP congestion control algorithms, pure | |||
acknowledgement packets (e.g., packets that do not contain any | acknowledgement packets (e.g., packets that do not contain any | |||
accompanying data) should be sent with the not-ECT codepoint. | accompanying data) MUST be sent with the not-ECT codepoint. Current | |||
Current TCP receivers have no mechanisms for reducing traffic on the | TCP receivers have no mechanisms for reducing traffic on the ACK-path | |||
ACK-path in response to congestion notification. Mechanisms for | in response to congestion notification. Mechanisms for responding to | |||
responding to congestion on the ACK-path are areas for current and | congestion on the ACK-path are areas for current and future research. | |||
future research. (One simple possibility would be for the sender to | (One simple possibility would be for the sender to reduce its | |||
reduce its congestion window when it receives a pure ACK packet with | congestion window when it receives a pure ACK packet with the CE | |||
the CE codepoint set). For current TCP implementations, a single | codepoint set). For current TCP implementations, a single dropped ACK | |||
dropped ACK generally has only a very small effect on the TCP's | generally has only a very small effect on the TCP's sending rate. | |||
sending rate. | ||||
6.1.5. Retransmitted TCP packets | 6.1.5. Retransmitted TCP packets | |||
This document specifies ECN-capable TCP implementations MUST NOT set | This document specifies ECN-capable TCP implementations MUST NOT set | |||
either ECT codepoint (ECT(0) or ECT(1)) in the IP header for | either ECT codepoint (ECT(0) or ECT(1)) in the IP header for | |||
retransmitted data packets, and that the TCP data receiver SHOULD | retransmitted data packets, and that the TCP data receiver SHOULD | |||
ignore the ECN field on arriving data packets that are outside of the | ignore the ECN field on arriving data packets that are outside of the | |||
receiver's current window. This is for greater security against | receiver's current window. This is for greater security against | |||
denial-of-service attacks, as well as for robustness of the ECN | denial-of-service attacks, as well as for robustness of the ECN | |||
congestion indication with packets that are dropped later in the | congestion indication with packets that are dropped later in the | |||
skipping to change at page 23, line 26 | skipping to change at page 24, line 42 | |||
slightly easier. For example, in an ECN-Capable environment routers | slightly easier. For example, in an ECN-Capable environment routers | |||
are not limited to information about packets that are dropped or have | are not limited to information about packets that are dropped or have | |||
the CE codepoint set at that router itself; in such an environment, | the CE codepoint set at that router itself; in such an environment, | |||
routers could also take note of arriving CE packets that indicate | routers could also take note of arriving CE packets that indicate | |||
congestion encountered by that packet earlier in the path. | congestion encountered by that packet earlier in the path. | |||
8. Non-compliance in the Network | 8. Non-compliance in the Network | |||
This section considers the issues when a router is operating, | This section considers the issues when a router is operating, | |||
possibly maliciously, to modify either of the bits in the ECN field. | possibly maliciously, to modify either of the bits in the ECN field. | |||
We note that in IPv4, the IP header is protected from bit errors by a | ||||
header checksum; this is not the case in IPv6. Thus for IPv6 the | ||||
ECN field can be accidentally modified by bit errors on links or in | ||||
routers without being detected by an IP header checksum. | ||||
By tampering with the bits in the ECN field, an adversary (or a | By tampering with the bits in the ECN field, an adversary (or a | |||
broken router) could do one or more of the following: falsely report | broken router) could do one or more of the following: falsely report | |||
congestion, disable ECN-Capability for an individual packet, erase | congestion, disable ECN-Capability for an individual packet, erase | |||
the ECN congestion indication, or falsely indicate ECN-Capability. | the ECN congestion indication, or falsely indicate ECN-Capability. | |||
Section 18 systematically examines the various cases by which the ECN | Section 18 systematically examines the various cases by which the ECN | |||
field could be modified. The important criterion considered in | field could be modified. The important criterion considered in | |||
determining the consequences of such modifications is whether it is | determining the consequences of such modifications is whether it is | |||
likely to lead to poorer behavior in any dimension (throughput, | likely to lead to poorer behavior in any dimension (throughput, | |||
delay, fairness or functionality) than if a router were to drop a | delay, fairness or functionality) than if a router were to drop a | |||
skipping to change at page 24, line 30 | skipping to change at page 25, line 50 | |||
If a router or other network element has access to all of the packets | If a router or other network element has access to all of the packets | |||
of a flow, then that router could do no more damage to a flow by | of a flow, then that router could do no more damage to a flow by | |||
altering the ECN field than it could by simply dropping all of the | altering the ECN field than it could by simply dropping all of the | |||
packets from that flow. However, in some cases, a malicious or | packets from that flow. However, in some cases, a malicious or | |||
broken router might have access to only a subset of the packets from | broken router might have access to only a subset of the packets from | |||
a flow. The question is as follows: can this router, by altering | a flow. The question is as follows: can this router, by altering | |||
the ECN field in this subset of the packets, do more damage to that | the ECN field in this subset of the packets, do more damage to that | |||
flow than if it has simply dropped that set of the packets? | flow than if it has simply dropped that set of the packets? | |||
This is also discussed in detail in Section 18, which conclude as | This is also discussed in detail in Section 18, which concludes as | |||
follows: It is true that the adversary that has access only to a | follows: It is true that the adversary that has access only to a | |||
subset of packets in an aggregate might, by subverting ECN-based | subset of packets in an aggregate might, by subverting ECN-based | |||
congestion control, be able to deny the benefits of ECN to the other | congestion control, be able to deny the benefits of ECN to the other | |||
packets in the aggregate. While this is undesirable, this is not a | packets in the aggregate. While this is undesirable, this is not a | |||
sufficient concern to result in disabling ECN. | sufficient concern to result in disabling ECN. | |||
9. Encapsulated Packets | 9. Encapsulated Packets | |||
9.1. IP packets encapsulated in IP | 9.1. IP packets encapsulated in IP | |||
skipping to change at page 25, line 31 | skipping to change at page 26, line 50 | |||
the indication of congestion. | the indication of congestion. | |||
Thus, the use of ECN over simple IP tunnels would result in routers | Thus, the use of ECN over simple IP tunnels would result in routers | |||
attempting to use the outer IP header to signal congestion to | attempting to use the outer IP header to signal congestion to | |||
endpoints, but those congestion warnings never arriving because the | endpoints, but those congestion warnings never arriving because the | |||
outer header is discarded at the tunnel egress point. This problem | outer header is discarded at the tunnel egress point. This problem | |||
was encountered with ECN and IPsec in tunnel mode, and RFC 2481 | was encountered with ECN and IPsec in tunnel mode, and RFC 2481 | |||
recommended that ECN not be used with the older simple IPsec tunnels | recommended that ECN not be used with the older simple IPsec tunnels | |||
in order to avoid this behavior and its consequences. When ECN | in order to avoid this behavior and its consequences. When ECN | |||
becomes widely deployed, then simple tunnels likely to carry ECN- | becomes widely deployed, then simple tunnels likely to carry ECN- | |||
capable traffic will have to be changed. | capable traffic will have to be changed. If ECN-capable traffic is | |||
carried by a simple tunnel through a congested, ECN-capable router, | ||||
this could result in subsequent packets being dropped for this flow | ||||
as the average queue size increases at the congested router, as | ||||
discussed in Section 8 above. | ||||
From a security point of view, the use of ECN in the outer header of | From a security point of view, the use of ECN in the outer header of | |||
an IP tunnel might raise security concerns because an adversary could | an IP tunnel might raise security concerns because an adversary could | |||
tamper with the ECN information that propagates beyond the tunnel | tamper with the ECN information that propagates beyond the tunnel | |||
endpoint. Based on an analysis in Sections 18 and 19 of these | endpoint. Based on an analysis in Sections 18 and 19 of these | |||
concerns and the resultant risks, our overall approach is to make | concerns and the resultant risks, our overall approach is to make | |||
support for ECN an option for IP tunnels, so that an IP tunnel can be | support for ECN an option for IP tunnels, so that an IP tunnel can be | |||
specified or configured either to use ECN or not to use ECN in the | specified or configured either to use ECN or not to use ECN in the | |||
outer header of the tunnel. Thus, in environments or tunneling | outer header of the tunnel. Thus, in environments or tunneling | |||
protocols where the risks of using ECN are judged to outweigh its | protocols where the risks of using ECN are judged to outweigh its | |||
benefits, the tunnel can simply not use ECN in the outer header. | benefits, the tunnel can simply not use ECN in the outer header. | |||
Then the only indication of congestion experienced at routers within | Then the only indication of congestion experienced at routers within | |||
the tunnel would be through packet loss. | the tunnel would be through packet loss. | |||
The result is that there are two viable options for the behavior of | The result is that there are two viable options for the behavior of | |||
ECN-capable connections over an IP tunnel, especially IPsec tunnels: | ECN-capable connections over an IP tunnel, including IPsec tunnels: | |||
* A limited-functionality option in which ECN is preserved in the | * A limited-functionality option in which ECN is preserved in the | |||
inner header, but disabled in the outer header. The only | inner header, but disabled in the outer header. The only | |||
mechanism available for signaling congestion occurring within the | mechanism available for signaling congestion occurring within the | |||
tunnel in this case is dropped packets. | tunnel in this case is dropped packets. | |||
* A full-functionality option that supports ECN in both the inner | * A full-functionality option that supports ECN in both the inner | |||
and outer headers, and propagates congestion warnings from nodes | and outer headers, and propagates congestion warnings from nodes | |||
within the tunnel to endpoints. | within the tunnel to endpoints. | |||
Support for these options requires varying amounts of changes to IP | Support for these options requires varying amounts of changes to IP | |||
header processing at tunnel ingress and egress. A small subset of | header processing at tunnel ingress and egress. A small subset of | |||
these changes sufficient to support only the limited-functionality | these changes sufficient to support only the limited-functionality | |||
option would be sufficient to eliminate any incompatibility between | option would be sufficient to eliminate any incompatibility between | |||
ECN and IP tunnels. | ECN and IP tunnels. | |||
skipping to change at page 26, line 23 | skipping to change at page 27, line 46 | |||
ECN and IP tunnels. | ECN and IP tunnels. | |||
One goal of this document is to give guidance about the tradeoffs | One goal of this document is to give guidance about the tradeoffs | |||
between the limited-functionality and full-functionality options. A | between the limited-functionality and full-functionality options. A | |||
full discussion of the potential effects of an adversary's | full discussion of the potential effects of an adversary's | |||
modifications of the ECN field is given in Sections 18 and 19. | modifications of the ECN field is given in Sections 18 and 19. | |||
9.1.1. The Limited-functionality and Full-functionality Options | 9.1.1. The Limited-functionality and Full-functionality Options | |||
The limited-functionality option for ECN encapsulation in IP tunnels | The limited-functionality option for ECN encapsulation in IP tunnels | |||
is for the non-ECT codepoint to be set in the outside (encapsulating) | is for the not-ECT codepoint to be set in the outside (encapsulating) | |||
header regardless of the value of the ECN field in the inside | header regardless of the value of the ECN field in the inside | |||
(encapsulated) header. With this option, the ECN field in the inner | (encapsulated) header. With this option, the ECN field in the inner | |||
header is not altered upon de-capsulation. The disadvantage of this | header is not altered upon de-capsulation. The disadvantage of this | |||
approach is that the flow does not have ECN support for that part of | approach is that the flow does not have ECN support for that part of | |||
the path that is using IP tunneling, even if the encapsulated packet | the path that is using IP tunneling, even if the encapsulated packet | |||
(from the original TCP sender) is ECN-Capable. That is, if the | (from the original TCP sender) is ECN-Capable. That is, if the | |||
encapsulated packet arrives at a congested router that is ECN- | encapsulated packet arrives at a congested router that is ECN- | |||
capable, and the router can decide to drop or mark the packet as an | capable, and the router can decide to drop or mark the packet as an | |||
indication of congestion to the end nodes, the router will not be | indication of congestion to the end nodes, the router will not be | |||
permitted to set the CE codepoint in the packet header, but instead | permitted to set the CE codepoint in the packet header, but instead | |||
skipping to change at page 27, line 22 | skipping to change at page 28, line 46 | |||
(1) An IP tunnel MUST modify the handling of the DS field octet at | (1) An IP tunnel MUST modify the handling of the DS field octet at | |||
IP tunnel endpoints by implementing either the limited- | IP tunnel endpoints by implementing either the limited- | |||
functionality or the full-functionality option. | functionality or the full-functionality option. | |||
(2) Optionally, an IP tunnel MAY enable the endpoints of an IP | (2) Optionally, an IP tunnel MAY enable the endpoints of an IP | |||
tunnel to negotiate the choice between the limited-functionality | tunnel to negotiate the choice between the limited-functionality | |||
and the full-functionality option for ECN in the tunnel. | and the full-functionality option for ECN in the tunnel. | |||
The minimum required to make ECN usable with IP tunnels is the | The minimum required to make ECN usable with IP tunnels is the | |||
limited-functionality option, which prevents ECN from being enabled | limited-functionality option, which prevents ECN from being enabled | |||
in the outer header of an IPsec tunnel. Full support for ECN | in the outer header of the tunnel. Full support for ECN requires the | |||
requires the use of the full-functionality option. If there are no | use of the full-functionality option. If there are no optional | |||
optional mechanisms for the tunnel endpoints to negotiate a choice | mechanisms for the tunnel endpoints to negotiate a choice between the | |||
between the limited-functionality or full-functionality option, there | limited-functionality or full-functionality option, there can be a | |||
can be a pre-existing agreement between the tunnel endpoints about | pre-existing agreement between the tunnel endpoints about whether to | |||
whether to support the limited-functionality or the full- | support the limited-functionality or the full-functionality ECN | |||
functionality ECN option. | option. | |||
All IP tunnels MUST implement the limited-functionality option, and | ||||
SHOULD support the full-functionality option. | ||||
In addition, it is RECOMMENDED that packets with the CE codepoint in | In addition, it is RECOMMENDED that packets with the CE codepoint in | |||
the outer header be dropped if they arrive at the tunnel egress point | the outer header be dropped if they arrive at the tunnel egress point | |||
for a tunnel that uses the limited-functionality option, or for a | for a tunnel that uses the limited-functionality option, or for a | |||
tunnel that uses the full-functionality option but for which the not- | tunnel that uses the full-functionality option but for which the not- | |||
ECT codepoint is set in the inner header. This is motivated by | ECT codepoint is set in the inner header. This is motivated by | |||
backwards compatibility and to ensure that no unauthorized | backwards compatibility and to ensure that no unauthorized | |||
modifications of the ECN field take place, and is discussed further | modifications of the ECN field take place, and is discussed further | |||
in the next Section (9.1.2). | in the next Section (9.1.2). | |||
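The RECOMMENDED egress check above can be expressed as a small predicate. This is only a sketch: the function name `egress_should_drop` and its parameters are hypothetical, and the two-bit codepoint values are the usual ECN field encoding (not-ECT 00, ECT(1) 01, ECT(0) 10, CE 11).

```python
# Two-bit ECN field codepoints.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def egress_should_drop(outer_ecn: int, inner_ecn: int, full_functionality: bool) -> bool:
    """Return True if the tunnel egress is recommended to drop the packet.

    A CE mark in the outer header is unexpected when the tunnel uses the
    limited-functionality option (the outer header was sent as not-ECT), or
    when the tunnel uses the full-functionality option but the inner header
    is not-ECT.
    """
    if outer_ecn != CE:
        return False
    if not full_functionality:
        return True
    return inner_ecn == NOT_ECT
```

In both drop cases the ingress would have sent the outer header as not-ECT, so a CE codepoint at egress indicates either a non-conforming node or tampering inside the tunnel.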
skipping to change at page 29, line 34 | skipping to change at page 31, line 14 | |||
at the tunnel endpoint. | at the tunnel endpoint. | |||
In principle, permitting the use of ECN functionality in the outer | In principle, permitting the use of ECN functionality in the outer | |||
header of an IPsec tunnel raises security concerns because an | header of an IPsec tunnel raises security concerns because an | |||
adversary could tamper with the information that propagates beyond | adversary could tamper with the information that propagates beyond | |||
the tunnel endpoint. Based on an analysis (included in Sections 18 | the tunnel endpoint. Based on an analysis (included in Sections 18 | |||
and 19) of these concerns and the associated risks, our overall | and 19) of these concerns and the associated risks, our overall | |||
approach has been to provide configuration support for IPsec changes | approach has been to provide configuration support for IPsec changes | |||
to remove the conflict with ECN. | to remove the conflict with ECN. | |||
In particular, in tunnel mode the IPsec tunnel MUST support either | In particular, in tunnel mode the IPsec tunnel MUST support the | |||
the limited-functionality or the full-functionality mode outlined in | limited-functionality option outlined in Section 9.1.1, and SHOULD | |||
Section 9.1.1. | support the full-functionality option outlined in Section 9.1.1. | |||
This makes permission to use ECN functionality in the outer header of | This makes permission to use ECN functionality in the outer header of | |||
an IPsec tunnel a configurable part of the corresponding IPsec | an IPsec tunnel a configurable part of the corresponding IPsec | |||
Security Association (SA), so that it can be disabled in situations | Security Association (SA), so that it can be disabled in situations | |||
where the risks are judged to outweigh the benefits. The result is | where the risks are judged to outweigh the benefits. The result is | |||
that an IPsec security administrator is presented with two | that an IPsec security administrator is presented with two | |||
alternatives for the behavior of ECN-capable connections within an | alternatives for the behavior of ECN-capable connections within an | |||
IPsec tunnel, the limited-functionality alternative and full- | IPsec tunnel, the limited-functionality alternative and full- | |||
functionality alternative described earlier. All IPsec | functionality alternative described earlier. | |||
implementations MUST implement either the limited-functionality or | ||||
the full-functionality alternative in order to eliminate | ||||
incompatibility between ECN and IPsec tunnels, but implementers MAY | ||||
choose to implement either alternative. | ||||
In addition, this document specifies how the endpoints of an IPsec | In addition, this document specifies how the endpoints of an IPsec | |||
tunnel could negotiate enabling ECN functionality in the outer | tunnel could negotiate enabling ECN functionality in the outer | |||
headers of that tunnel based on security policy. The ability to | headers of that tunnel based on security policy. The ability to | |||
negotiate ECN usage between tunnel endpoints would enable a security | negotiate ECN usage between tunnel endpoints would enable a security | |||
administrator to disable ECN in situations where she believes the | administrator to disable ECN in situations where she believes the | |||
risks (e.g., of lost congestion notifications) outweigh the benefits | risks (e.g., of lost congestion notifications) outweigh the benefits | |||
of ECN. | of ECN. | |||
The IPsec protocol, as defined in [ESP, AH], does not include the IP | The IPsec protocol, as defined in [ESP, AH], does not include the IP | |||
skipping to change at page 35, line 5 | skipping to change at page 36, line 28 | |||
extending SA attribute negotiation. Some tunnels do not permit | extending SA attribute negotiation. Some tunnels do not permit | |||
traffic to be addressed to the tunnel egress endpoint, hence the ICMP | traffic to be addressed to the tunnel egress endpoint, hence the ICMP | |||
packet would have to be addressed to somewhere else, scanned for by | packet would have to be addressed to somewhere else, scanned for by | |||
the egress endpoint, and discarded there or at its actual | the egress endpoint, and discarded there or at its actual | |||
destination. In addition, ICMP delivery is unreliable, and hence | destination. In addition, ICMP delivery is unreliable, and hence | |||
there is a possibility of an ICMP packet being dropped, entailing the | there is a possibility of an ICMP packet being dropped, entailing the | |||
invention of yet another ack/retransmit mechanism. It seems better | invention of yet another ack/retransmit mechanism. It seems better | |||
simply to specify an OPTIONAL extension to the existing SA | simply to specify an OPTIONAL extension to the existing SA | |||
negotiation mechanism. | negotiation mechanism. | |||
9.3. IP packets encapsulated in non-IP packet headers. | 9.3. IP packets encapsulated in non-IP Packet Headers. | |||
A different set of issues are raised, relative to ECN, when IP | A different set of issues are raised, relative to ECN, when IP | |||
packets are encapsulated in tunnels with non-IP packet headers. This | packets are encapsulated in tunnels with non-IP packet headers. This | |||
occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. | occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. | |||
For these protocols, there is no conflict with ECN; it is just that | For these protocols, there is no conflict with ECN; it is just that | |||
ECN cannot be used within the tunnel unless an ECN codepoint can be | ECN cannot be used within the tunnel unless an ECN codepoint can be | |||
specified for the header of the encapsulating protocol. Earlier work | specified for the header of the encapsulating protocol. Earlier work | |||
considered a preliminary proposal for incorporating ECN into MPLS, | considered a preliminary proposal for incorporating ECN into MPLS, | |||
and proposals for incorporating ECN into GRE, L2TP, or PPTP will be | and proposals for incorporating ECN into GRE, L2TP, or PPTP will be | |||
considered as the need arises. | considered as the need arises. | |||
skipping to change at page 37, line 48 | skipping to change at page 39, line 24 | |||
The router sets the CE codepoint to indicate congestion to the end | The router sets the CE codepoint to indicate congestion to the end | |||
nodes. The CE codepoint in a packet header MUST NOT be reset by a | nodes. The CE codepoint in a packet header MUST NOT be reset by a | |||
router. | router. | |||
TCP requires three changes for ECN, a setup phase and two new flags | TCP requires three changes for ECN, a setup phase and two new flags | |||
in the TCP header. The ECN-Echo flag is used by the data receiver to | in the TCP header. The ECN-Echo flag is used by the data receiver to | |||
inform the data sender of a received CE packet. The Congestion | inform the data sender of a received CE packet. The Congestion | |||
Window Reduced (CWR) flag is used by the data sender to inform the | Window Reduced (CWR) flag is used by the data sender to inform the | |||
data receiver that the congestion window has been reduced. | data receiver that the congestion window has been reduced. | |||
When ECN (Explicit Congestion Notification [RFC2481]) is used, it is | When ECN (Explicit Congestion Notification) is used, it is required | |||
required that congestion indications generated within an IP tunnel | that congestion indications generated within an IP tunnel not be lost | |||
not be lost at the tunnel egress. We specified a minor modification | at the tunnel egress. We specified a minor modification to the IP | |||
to the IP protocol's handling of the ECN field during encapsulation | protocol's handling of the ECN field during encapsulation and de- | |||
and de-capsulation to allow flows that will undergo IP tunneling to | capsulation to allow flows that will undergo IP tunneling to use ECN. | |||
use ECN. | ||||
Two options for ECN in tunnels were specified: | Two options for ECN in tunnels were specified: | |||
1) A limited-functionality option that does not use ECN inside the IP | 1) A limited-functionality option that does not use ECN inside the IP | |||
tunnel, by setting the ECN field in the outer header to not-ECT, and | tunnel, by setting the ECN field in the outer header to not-ECT, and | |||
not altering the inner header at the time of decapsulation. | not altering the inner header at the time of decapsulation. | |||
2) The full-functionality option, which sets the ECN field in the | 2) The full-functionality option, which sets the ECN field in the | |||
outer header to either not-ECT or to one of the ECT codepoints, | outer header to either not-ECT or to one of the ECT codepoints, | |||
depending on the ECN field in the inner header. At decapsulation, if | depending on the ECN field in the inner header. At decapsulation, if | |||
the CE codepoint is set in the outer header, and the inner header is | the CE codepoint is set in the outer header, and the inner header is | |||
set to one of the ECT codepoints, then the CE codepoint is copied to | set to one of the ECT codepoints, then the CE codepoint is copied to | |||
the inner header. | the inner header. | |||
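The two options can be sketched as a pair of functions over the ECN field. This is an illustrative sketch only: the names `Ecn`, `encapsulate`, and `decapsulate` are hypothetical. The summary above does not say which ECT codepoint the outer header receives when the inner header already carries CE; the sketch assumes ECT(0) in that case.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """Two-bit ECN field codepoints."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def encapsulate(inner: Ecn, full_functionality: bool) -> Ecn:
    """ECN codepoint to place in the outer header at tunnel ingress."""
    if not full_functionality or inner == Ecn.NOT_ECT:
        return Ecn.NOT_ECT      # limited option, or non-ECN-capable packet
    if inner == Ecn.CE:
        return Ecn.ECT_0        # assumption: which ECT codepoint is not stated above
    return inner                # copy ECT(0) or ECT(1) to the outer header


def decapsulate(outer: Ecn, inner: Ecn, full_functionality: bool) -> Ecn:
    """ECN codepoint of the inner header after tunnel egress processing."""
    if full_functionality and outer == Ecn.CE and inner in (Ecn.ECT_0, Ecn.ECT_1):
        return Ecn.CE           # propagate congestion marked inside the tunnel
    return inner                # otherwise the inner header is left unaltered
```

With the limited-functionality option the inner header passes through unchanged, so congestion inside the tunnel can only be signaled by packet drops.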
All IP tunnels MUST implement one of the two alternative approaches | For IPsec tunnels, this document also defines an optional IPsec | |||
described above. For IPsec tunnels, this document also defines an | Security Association (SA) attribute that enables negotiation of ECN | |||
optional IPsec Security Association (SA) attribute that enables | usage within IPsec tunnels and an optional field in the Security | |||
negotiation of ECN usage within IPsec tunnels and an optional field | Association Database to indicate whether ECN is permitted in tunnel | |||
in the Security Association Database to indicate whether ECN is | mode on a SA. The required changes to IPsec tunnels for ECN usage | |||
permitted in tunnel mode on a SA. The required changes to IPsec | modify RFC 2401 [RFC2401], which defines the IPsec architecture and | |||
tunnels for ECN usage modify RFC 2401 [RFC2401], which defines the | specifies some aspects of its implementation. The new IPsec SA | |||
IPsec architecture and specifies some aspects of its implementation. | attribute is in addition to those already defined in Section 4.5 of | |||
The new IPsec SA attribute is in addition to those already defined in | [RFC2407]. | |||
Section 4.5 of [RFC2407]. | ||||
This document is intended to obsolete RFC 2481, "A Proposal to add | This document is intended to obsolete RFC 2481, "A Proposal to add | |||
Explicit Congestion Notification (ECN) to IP", which defined ECN as | Explicit Congestion Notification (ECN) to IP", which defined ECN as | |||
an Experimental Protocol for the Internet Community. The rest of | an Experimental Protocol for the Internet Community. The rest of | |||
this section describes the relationship between this document and its | this section describes the relationship between this document and its | |||
predecessor. | predecessor. | |||
RFC 2481 included a brief discussion of the use of ECN with | RFC 2481 included a brief discussion of the use of ECN with | |||
encapsulated packets, and noted that for the IPsec specifications at | encapsulated packets, and noted that for the IPsec specifications at | |||
the time (January 1999), flows could not safely use ECN if they were | the time (January 1999), flows could not safely use ECN if they were | |||
to traverse IPsec tunnels. RFC 2481 also described the changes that | to traverse IPsec tunnels. RFC 2481 also described the changes that | |||
could be made to IPsec tunnel specifications to make them compatible | could be made to IPsec tunnel specifications to make them compatible | |||
with ECN. | with ECN. | |||
This document also incorporates work that was done after RFC 2481, | This document also incorporates work that was done after RFC 2481. | |||
First was to describe the changes to IPsec tunnels in detail, and | First was to describe the changes to IPsec tunnels in detail, and | |||
extensively discuss the security implications of ECN (now included as | extensively discuss the security implications of ECN (now included as | |||
Sections 18 and 19 of this document). Second was to extend the | Sections 18 and 19 of this document). Second was to extend the | |||
discussion of IPsec tunnels to include all IP tunnels. Because older | discussion of IPsec tunnels to include all IP tunnels. Because older | |||
IP tunnels are not compatible with a flow's use of ECN, the | IP tunnels are not compatible with a flow's use of ECN, the | |||
deployment of ECN in the Internet will create strong pressure for | deployment of ECN in the Internet will create strong pressure for | |||
older IP tunnels to be updated to an ECN-compatible version, using | older IP tunnels to be updated to an ECN-compatible version, using | |||
either the limited-functionality or the full-functionality option. | either the limited-functionality or the full-functionality option. | |||
This document does not address the issue of including ECN in non-IP | This document does not address the issue of including ECN in non-IP | |||
skipping to change at page 40, line 19 | skipping to change at page 41, line 40 | |||
this document. In addition, we would like to thank Kenjiro Cho for | this document. In addition, we would like to thank Kenjiro Cho for | |||
the proposal for the TCP mechanism for negotiating ECN-Capability, | the proposal for the TCP mechanism for negotiating ECN-Capability, | |||
Kevin Fall for the proposal of the CWR bit, Steve Blake for material | Kevin Fall for the proposal of the CWR bit, Steve Blake for material | |||
on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for | on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for | |||
discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian | discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian | |||
Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern | Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern | |||
Paxson for discussions of security issues. We also thank the | Paxson for discussions of security issues. We also thank the | |||
Internet End-to-End Research Group for ongoing discussions of these | Internet End-to-End Research Group for ongoing discussions of these | |||
issues. | issues. | |||
Email discussions with a number of people, including Alexey | Email discussions with a number of people, including Dax Kelson, | |||
Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed | Alexey Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have | |||
the issues raised by non-conformant equipment in the Internet that | addressed the issues raised by non-conformant equipment in the | |||
does not respond to TCP SYN packets with the ECE and CWR flags set. | Internet that does not respond to TCP SYN packets with the ECE and | |||
We thank Mark Handley, Jitendra Padhye, and others for discussions on | CWR flags set. We thank Mark Handley, Jitendra Padhye, and others | |||
the TCP initialization procedures. | for discussions on the TCP initialization procedures. | |||
The discussion of ECN and IP tunnel considerations draws heavily on | The discussion of ECN and IP tunnel considerations draws heavily on | |||
related discussions and documents from the Differentiated Services | related discussions and documents from the Differentiated Services | |||
Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | |||
for feedback on IP tunnels. We thank Derrell Piper and Tero Kivinen | for feedback on IP tunnels. We thank Derrell Piper and Tero Kivinen | |||
for proposing modifications to RFC 2407 that improve the usability of | for proposing modifications to RFC 2407 that improve the usability of | |||
negotiating the ECN Tunnel SA attribute. | negotiating the ECN Tunnel SA attribute. | |||
We thank David Wetherall, David Ely, and Neil Spring for the proposal | We thank David Wetherall, David Ely, and Neil Spring for the proposal | |||
for the ECN nonce. We also thank Stefan Savage for discussions on | for the ECN nonce. We also thank Stefan Savage for discussions on | |||
this issue. We thank Bob Briscoe and Jon Crowcroft for raising the | this issue. We thank Bob Briscoe and Jon Crowcroft for raising the | |||
issue of fragmentation in IP, on alternate semantics for the fourth | issue of fragmentation in IP, on alternate semantics for the fourth | |||
ECN codepoint, and several other topics. We thank Richard Wendland | ECN codepoint, and several other topics. We thank Richard Wendland | |||
for feedback on several issues in the draft. | for feedback on several issues in the draft. | |||
We also thank the IESG, and in particular the Transport Area | ||||
Directors over the years, for their feedback and their work towards | ||||
the standardization of ECN. | ||||
15. References | 15. References | |||
[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, | [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, | |||
November 1998. | November 1998. | |||
[B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement | ||||
Levels", BCP 14, RFC 2119, March 1997. | ||||
[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". | [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". | |||
Reference for informational purposes only. | Reference for informational purposes only. | |||
[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", | [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", | |||
RFC 2406, November 1998. | RFC 2406, November 1998. | |||
[FIXES] ECN-under-Linux Unofficial Vendor Support Page, URL | ||||
"http://gtf.org/garzik/ecn/". Reference for informational purposes | ||||
only. | ||||
[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways | [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways | |||
for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 | for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 | |||
N.4, August 1993, p. 397-413. | N.4, August 1993, p. 397-413. | |||
[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM | [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM | |||
Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. | Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. | |||
[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", | [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", | |||
URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- | URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- | |||
ecn. Reference for informational purposes only. | ecn. Reference for informational purposes only. | |||
skipping to change at page 41, line 27 | skipping to change at page 43, line 4 | |||
Congestion Control in the Internet", IEEE/ACM Transactions on | Congestion Control in the Internet", IEEE/ACM Transactions on | |||
Networking, August 1999. | Networking, August 1999. | |||
[FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", | [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", | |||
SIGCOMM '97, September 1997. | SIGCOMM '97, September 1997. | |||
[GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing | [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing | |||
Encapsulation (GRE), RFC 1701, October 1994. | Encapsulation (GRE), RFC 1701, October 1994. | |||
[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. | [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. | |||
ACM SIGCOMM '88, pp. 314-329. | ACM SIGCOMM '88, pp. 314-329. | |||
[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance | [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance | |||
Algorithm", Message to end2end-interest mailing list, April 1990. URL | Algorithm", Message to end2end-interest mailing list, April 1990. URL | |||
"ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". | "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". | |||
[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) | [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) | |||
benefits for TCP", Master's thesis, UCLA, 1998, URL | benefits for TCP", Master's thesis, UCLA, 1998. Citation for | |||
"http://www.cs.ucla.edu/~hari/software/ecn/ ecn_report.ps.gz". | acknowledgement purposes only. | |||
[L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. | [L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. | |||
Palter Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999. | Palter, Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999. | |||
[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven | [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven | |||
Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. | Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. | |||
[MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, | [MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, | |||
Requirements for Traffic Engineering Over MPLS, RFC 2702, September | Requirements for Traffic Engineering Over MPLS, RFC 2702, September | |||
1999. | 1999. | |||
[PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. | [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. | |||
and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, | and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, | |||
skipping to change at page 43, line 30 | skipping to change at page 45, line 9 | |||
2000. | 2000. | |||
[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for | [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for | |||
Congestion Avoidance in Computer Networks", ACM Transactions on | Congestion Avoidance in Computer Networks", ACM Transactions on | |||
Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. | Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. | |||
[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | |||
Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM | Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM | |||
Computer Communications Review, October 1999. | Computer Communications Review, October 1999. | |||
[TBIT] Jitendra Padhye and Sally Floyd, "Identifying the TCP Behavior | ||||
of Web Servers", ICSI TR-01-002, February 2001. URL | ||||
"http://www.aciri.org/tbit/". | ||||
16. Security Considerations | 16. Security Considerations | |||
Security considerations have been discussed in Sections 7, 8, 18, and | Security considerations have been discussed in Sections 7, 8, 18, and | |||
19. | 19. | |||
17. IPv4 Header Checksum Recalculation | 17. IPv4 Header Checksum Recalculation | |||
IPv4 header checksum recalculation is an issue with some high-end | IPv4 header checksum recalculation is an issue with some high-end | |||
router architectures using an output-buffered switch, since most if | router architectures using an output-buffered switch, since most if | |||
not all of the header manipulation is performed on the input side of | not all of the header manipulation is performed on the input side of | |||
skipping to change at page 45, line 22 | skipping to change at page 47, line 7 | |||
In contrast, a systematic erasure of the CE bit by a downstream | In contrast, a systematic erasure of the CE bit by a downstream | |||
router can have the effect of causing a queue buildup at an upstream | router can have the effect of causing a queue buildup at an upstream | |||
router, including the possible loss of packets due to buffer | router, including the possible loss of packets due to buffer | |||
overflow. There is a potential of unfairness in that another flow | overflow. There is a potential of unfairness in that another flow | |||
that goes through the congested router could react to the CE bit set | that goes through the congested router could react to the CE bit set | |||
while the flow that has the CE bit erased could see better | while the flow that has the CE bit erased could see better | |||
performance. The limitations on this potential unfairness are | performance. The limitations on this potential unfairness are | |||
discussed in more detail in Section 19 below. | discussed in more detail in Section 19 below. | |||
The last of the three changes is to replace the CE codepoint with the | The last of the three changes is to replace the CE codepoint with the | |||
not-ECT codepoint. thus erasing the congestion indication and | not-ECT codepoint, thus erasing the congestion indication and | |||
disabling ECN-Capability at the same time. | disabling ECN-Capability at the same time. | |||
The `erasure' of the congestion indication is only effective if the | The `erasure' of the congestion indication is only effective if the | |||
packet does not end up being marked or dropped again by a downstream | packet does not end up being marked or dropped again by a downstream | |||
router. If the CE codepoint is replaced by an ECT codepoint, the | router. If the CE codepoint is replaced by an ECT codepoint, the | |||
packet remains ECN-Capable, and could be either marked or dropped by | packet remains ECN-Capable, and could be either marked or dropped by | |||
a downstream router as an indication of congestion. If the CE | a downstream router as an indication of congestion. If the CE | |||
codepoint is replaced by the not-ECT codepoint, the packet is no | codepoint is replaced by the not-ECT codepoint, the packet is no | |||
longer ECN-capable, and can therefore be dropped but not marked by a | longer ECN-capable, and can therefore be dropped but not marked by a | |||
downstream router as an indication of congestion. | downstream router as an indication of congestion. | |||
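
The difference between the two kinds of erasure described above can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the specification: the names Ecn, erase_ce, and downstream_options are hypothetical, and the binary values used for ECT(0) and CE (10 and 11) are assumed from the published ECN codepoint assignment rather than taken from the text above.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """The four codepoints of the two-bit ECN field (names are hypothetical)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10   # value assumed from the published ECN assignment
    CE = 0b11      # value assumed from the published ECN assignment


def erase_ce(codepoint: Ecn, replacement: Ecn) -> Ecn:
    """Model a router that overwrites a CE mark with another codepoint."""
    if codepoint != Ecn.CE:
        raise ValueError("only a CE-marked packet can have its mark erased")
    if replacement == Ecn.CE:
        raise ValueError("replacing CE with CE erases nothing")
    return replacement


def downstream_options(codepoint: Ecn) -> set:
    """Ways a congested downstream router can signal congestion on a
    packet whose CE mark was erased upstream."""
    if codepoint in (Ecn.ECT_0, Ecn.ECT_1):
        # Still ECN-capable: the packet can be marked again or dropped.
        return {"mark", "drop"}
    if codepoint == Ecn.NOT_ECT:
        # No longer ECN-capable: the packet can be dropped but not marked.
        return {"drop"}
    raise ValueError("packet still carries CE; nothing was erased")
```

For example, downstream_options(erase_ce(Ecn.CE, Ecn.ECT_0)) contains both "mark" and "drop", while the same call with Ecn.NOT_ECT as the replacement leaves only "drop".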
skipping to change at page 46, line 46 | skipping to change at page 48, line 34 | |||
the packet. Thus for this case of an ECN-capable transport, the | the packet. Thus for this case of an ECN-capable transport, the | |||
consequence of this change to the ECN field is no worse than dropping | consequence of this change to the ECN field is no worse than dropping | |||
the packet. | the packet. | |||
18.2. Information carried in the Transport Header | 18.2. Information carried in the Transport Header | |||
For TCP, an ECN-capable TCP receiver informs its TCP peer that it is | For TCP, an ECN-capable TCP receiver informs its TCP peer that it is | |||
ECN-capable at the TCP level, conveying this information in the TCP | ECN-capable at the TCP level, conveying this information in the TCP | |||
header at the time the connection is set up. This document does not | header at the time the connection is set up. This document does not | |||
consider potential dangers introduced by changes in the transport | consider potential dangers introduced by changes in the transport | |||
header within the network. In the case of IPsec tunnels, the IPsec | header within the network. We note that when IPsec is used, the | |||
tunnel protects the transport header. | transport header is protected both in tunnel and transport modes | |||
[ESP, AH]. | ||||
Another issue concerns TCP packets with a spoofed IP source address | Another issue concerns TCP packets with a spoofed IP source address | |||
carrying invalid ECN information in the transport header. For | carrying invalid ECN information in the transport header. For | |||
completeness, we examine here some possible ways that a node spoofing | completeness, we examine here some possible ways that a node spoofing | |||
the IP source address of another node could use the two ECN flags in | the IP source address of another node could use the two ECN flags in | |||
the TCP header to launch a denial-of-service attack. However, these | the TCP header to launch a denial-of-service attack. However, these | |||
attacks would require an ability for the attacker to use valid TCP | attacks would require an ability for the attacker to use valid TCP | |||
sequence numbers, and any attacker with this ability and with the | sequence numbers, and any attacker with this ability and with the | |||
ability to spoof IP source addresses could damage the TCP connection | ability to spoof IP source addresses could damage the TCP connection | |||
without using the ECN flags. Therefore, ECN does not add any new | without using the ECN flags. Therefore, ECN does not add any new | |||
skipping to change at page 48, line 16 | skipping to change at page 49, line 51 | |||
codepoint set. If the end nodes are in fact using end-to-end | codepoint set. If the end nodes are in fact using end-to-end | |||
congestion control, they will see all of the indications of | congestion control, they will see all of the indications of | |||
congestion seen by the monitoring device, and will begin to respond | congestion seen by the monitoring device, and will begin to respond | |||
to these indications of congestion. Thus, the monitoring device is | to these indications of congestion. Thus, the monitoring device is | |||
successful in providing the indications to the flow at an early | successful in providing the indications to the flow at an early | |||
stage. | stage. | |||
It is true that the adversary that has access only to the A packets | It is true that the adversary that has access only to the A packets | |||
might, by subverting ECN-based congestion control, be able to deny | might, by subverting ECN-based congestion control, be able to deny | |||
the benefits of ECN to the other packets in the A&B aggregate. While | the benefits of ECN to the other packets in the A&B aggregate. While | |||
this is unfortunate, this is not a reason to disable ECN within an | this is unfortunate, this is not a reason to disable ECN. | |||
IPsec tunnel. | ||||
A variant of falsely reporting congestion occurs when there are two | A variant of falsely reporting congestion occurs when there are two | |||
adversaries along a path, where the first adversary falsely reports | adversaries along a path, where the first adversary falsely reports | |||
congestion, and the second adversary `erases' those reports. (Unlike | congestion, and the second adversary `erases' those reports. (Unlike | |||
packet drops, ECN congestion reports can be `reversed' later in the | packet drops, ECN congestion reports can be `reversed' later in the | |||
network by a malicious or broken router. However, the use of the ECN | network by a malicious or broken router. However, the use of the ECN | |||
nonce could help the transport to detect this behavior.) While this | nonce could help the transport to detect this behavior.) While this | |||
would be transparent to the end node, it is possible that a | would be transparent to the end node, it is possible that a | |||
monitoring device between the first and second adversaries would see | monitoring device between the first and second adversaries would see | |||
the false indications of congestion. Keep in mind our recommendation | the false indications of congestion. Keep in mind our recommendation | |||
skipping to change at page 51, line 32 | skipping to change at page 53, line 19 | |||
nodes. | nodes. | |||
19.2. Implications for the Subverted Flow | 19.2. Implications for the Subverted Flow | |||
When a source indicates that it is ECN-capable, there is an | When a source indicates that it is ECN-capable, there is an | |||
expectation that the routers in the network that are capable of | expectation that the routers in the network that are capable of | |||
participating in ECN will use the CE codepoint for indication of | participating in ECN will use the CE codepoint for indication of | |||
congestion. There is the potential benefit of using ECN in reducing | congestion. There is the potential benefit of using ECN in reducing | |||
the amount of packet loss (in addition to the reduced queueing delays | the amount of packet loss (in addition to the reduced queueing delays | |||
because of active queue management policies). When the packet flows | because of active queue management policies). When the packet flows | |||
through a tunnel where the nodes that the tunneled packets traverse | through an IPsec tunnel where the nodes that the tunneled packets | |||
are untrusted in some way, the expectation is that IPsec will protect | traverse are untrusted in some way, the expectation is that IPsec | |||
the flow from subversion that results in undesirable consequences. | will protect the flow from subversion that results in undesirable | |||
consequences. | ||||
In many cases, a subverted flow will benefit from the subversion of | In many cases, a subverted flow will benefit from the subversion of | |||
end-to-end congestion control for that flow in the network, by | end-to-end congestion control for that flow in the network, by | |||
receiving more bandwidth than it would have otherwise, relative to | receiving more bandwidth than it would have otherwise, relative to | |||
competing non-subverted flows. If the congested queue reaches the | competing non-subverted flows. If the congested queue reaches the | |||
packet-dropping stage, then the subversion of end-to-end congestion | packet-dropping stage, then the subversion of end-to-end congestion | |||
control might or might not be of overall benefit to the subverted | control might or might not be of overall benefit to the subverted | |||
flow, depending on that flow's relative tradeoffs between throughput, | flow, depending on that flow's relative tradeoffs between throughput, | |||
loss, and delay. | loss, and delay. | |||
skipping to change at page 54, line 34 | skipping to change at page 56, line 28 | |||
encountered congestion in the network, the router might make no | encountered congestion in the network, the router might make no | |||
change in the packets, because the CE codepoint would already be set. | change in the packets, because the CE codepoint would already be set. | |||
Thus, for packets sent with the CE codepoint set, the TCP end-nodes | Thus, for packets sent with the CE codepoint set, the TCP end-nodes | |||
could not determine if some router intended to set the CE codepoint | could not determine if some router intended to set the CE codepoint | |||
in these packets. For this reason, sending packets with the CE | in these packets. For this reason, sending packets with the CE | |||
codepoint would have to be done sparingly, and would be a less | codepoint would have to be done sparingly, and would be a less | |||
effective check against misbehaving network elements and receivers | effective check against misbehaving network elements and receivers | |||
than would be the ECN nonce. | than would be the ECN nonce. | |||
The assignment of the fourth ECN codepoint to ECT(1) precludes the | The assignment of the fourth ECN codepoint to ECT(1) precludes the | |||
use of this codepoint for other purposes. For clarity, we briefly | use of this codepoint for some other purposes. For clarity, we | |||
list those possible purposes here. | briefly list other possible purposes here. | |||
One possibility might have been for the data sender to use the fourth | One possibility might have been for the data sender to use the fourth | |||
ECN codepoint to indicate an alternate semantics for ECN. However, | ECN codepoint to indicate an alternate semantics for ECN. However, | |||
this seems to us more appropriate to be signalled using a | this seems to us more appropriate to be signalled using a | |||
differentiated services codepoint in the DS field. | differentiated services codepoint in the DS field. | |||
A second possible use for the fourth ECN codepoint would have been to | A second possible use for the fourth ECN codepoint would have been to | |||
give the router two separate codepoints for the indication of | give the router two separate codepoints for the indication of | |||
congestion, CE(0) and CE(1), for mild and severe congestion | congestion, CE(0) and CE(1), for mild and severe congestion | |||
respectively. While this could be useful in some cases, this | respectively. While this could be useful in some cases, this | |||
skipping to change at page 55, line 23 | skipping to change at page 57, line 17 | |||
control is beyond the scope of this document, as are ECN-aware | control is beyond the scope of this document, as are ECN-aware | |||
multicast packet duplication procedures and the processing of the ECN | multicast packet duplication procedures and the processing of the ECN | |||
field at multicast receivers in all cases (i.e., irrespective of the | field at multicast receivers in all cases (i.e., irrespective of the | |||
multicast packet duplication procedure(s) used). | multicast packet duplication procedure(s) used). | |||
The specification of IP tunnel modifications for ECN in this document | The specification of IP tunnel modifications for ECN in this document | |||
assumes that the only change made to the outer IP header's ECN field | assumes that the only change made to the outer IP header's ECN field | |||
between tunnel endpoints is to set the CE codepoint to indicate | between tunnel endpoints is to set the CE codepoint to indicate | |||
congestion. This is not consistent with some of the proposed uses of | congestion. This is not consistent with some of the proposed uses of | |||
ECT(1) by the multicast duplication procedures in the previous | ECT(1) by the multicast duplication procedures in the previous | |||
paragraph, and such procedures SHOULD NOT be deployed within tunnels | paragraph, and such procedures SHOULD NOT be deployed unless this | |||
configured for full ECN functionality. Limited ECN functionality may | inconsistency between multicast duplication procedures and IP tunnels | |||
be used instead, although in practice many tunnel protocols | with full ECN functionality is resolved. Limited ECN functionality | |||
may be used instead, although in practice many tunnel protocols | ||||
(including IPsec) will not work correctly if multicast traffic | (including IPsec) will not work correctly if multicast traffic | |||
duplication occurs within the tunnel. | duplication occurs within the tunnel. | |||
21. Why use Two Bits in the IP Header? | 21. Why use Two Bits in the IP Header? | |||
Given the need for an ECT indication in the IP header, there still | Given the need for an ECT indication in the IP header, there still | |||
remains the question of whether the ECT (ECN-Capable Transport) and | remains the question of whether the ECT (ECN-Capable Transport) and | |||
CE (Congestion Experienced) codepoints should have been overloaded on | CE (Congestion Experienced) codepoints should have been overloaded on | |||
a single bit. This overloaded-one-bit alternative, explored in | a single bit. This overloaded-one-bit alternative, explored in | |||
[Floyd94], would have involved a single bit with two values. One | [Floyd94], would have involved a single bit with two values. One | |||
skipping to change at page 56, line 34 | skipping to change at page 58, line 29 | |||
ECN-capable transport reacting to the CE codepoint in a pure ACK | ECN-capable transport reacting to the CE codepoint in a pure ACK | |||
packet by reducing the window would be at a disadvantage in | packet by reducing the window would be at a disadvantage in | |||
comparison to a non-ECN-capable transport. For this reason (and for | comparison to a non-ECN-capable transport. For this reason (and for | |||
reasons described earlier in relation to retransmitted packets), it | reasons described earlier in relation to retransmitted packets), it | |||
is desirable to have the ECT codepoint set on a per-packet basis. | is desirable to have the ECT codepoint set on a per-packet basis. | |||
Another advantage of the two-bit approach is that it is somewhat more | Another advantage of the two-bit approach is that it is somewhat more | |||
robust. The most critical issue, discussed in Section 8, is that the | robust. The most critical issue, discussed in Section 8, is that the | |||
default indication should be that of a non-ECN-Capable transport. In | default indication should be that of a non-ECN-Capable transport. In | |||
a two-bit implementation, this requirement for the default value | a two-bit implementation, this requirement for the default value | |||
simply means that the non-ECT codepoint should be the default. In | simply means that the not-ECT codepoint should be the default. In | |||
the one-bit implementation, this means that the single overloaded bit | the one-bit implementation, this means that the single overloaded bit | |||
should by default be in the "CE or not ECT" position. This is less | should by default be in the "CE or not ECT" position. This is less | |||
clear and straightforward, and possibly more open to incorrect | clear and straightforward, and possibly more open to incorrect | |||
implementations either in the end nodes or in the routers. | implementations either in the end nodes or in the routers. | |||
In summary, while the one-bit implementation could be a possible | In summary, while the one-bit implementation could be a possible | |||
implementation, it has the following significant limitations relative | implementation, it has the following significant limitations relative | |||
to the two-bit implementation. First, the one-bit implementation has | to the two-bit implementation. First, the one-bit implementation has | |||
more limited functionality for the treatment of CE packets at a | more limited functionality for the treatment of CE packets at a | |||
second congested router. Second, the one-bit implementation requires | second congested router. Second, the one-bit implementation requires | |||
skipping to change at page 58, line 12 | skipping to change at page 60, line 8 | |||
Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed | Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed | |||
as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an | as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an | |||
experimental use of the two-bit CU field. RFC 2780 updated the | experimental use of the two-bit CU field. RFC 2780 updated the | |||
definition of the DS Field to only encompass the first six bits of | definition of the DS Field to only encompass the first six bits of | |||
this octet rather than all eight bits; these first six bits are | this octet rather than all eight bits; these first six bits are | |||
defined as the Differentiated Services CodePoint (DSCP): | defined as the Differentiated Services CodePoint (DSCP): | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| DSCP | CU | RFCs 2474, | | DSCP | CU | RFCs 2474, | |||
2780 | +-----+-----+-----+-----+-----+-----+-----+-----+ 2780 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | ||||
Because of this unstable history, the definition of the ECN field in | Because of this unstable history, the definition of the ECN field in | |||
this document cannot be guaranteed to be backwards compatible with | this document cannot be guaranteed to be backwards compatible with | |||
all past uses of these two bits. | all past uses of these two bits. | |||
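
As a rough illustration of the octet layout in the diagram above, the following Python sketch splits the octet into the six-bit DSCP (bits 0-5, the most significant bits) and the remaining two-bit field (bits 6-7). The function names split_tos_octet and set_two_bit_field are hypothetical, and the sketch assumes the octet has already been read out of the IP header as an integer in the range 0-255.

```python
def split_tos_octet(octet: int) -> tuple:
    """Return (dscp, two_bit_field) for an IPv4 TOS / IPv6 Traffic Class octet.

    Bit 0 in the diagram is the most significant bit, so the DSCP is the
    upper six bits and the two-bit field is the lower two bits.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    dscp = octet >> 2          # bits 0-5
    two_bit_field = octet & 0b11   # bits 6-7
    return dscp, two_bit_field


def set_two_bit_field(octet: int, value: int) -> int:
    """Return the octet with bits 6-7 replaced by value, DSCP unchanged."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    if not 0 <= value <= 0b11:
        raise ValueError("value must fit in two bits")
    return (octet & 0xFC) | value
```

With these definitions, split_tos_octet(0xB8) yields a DSCP of 46 with the two-bit field clear, and set_two_bit_field(0xB8, 0b11) yields 0xBB without disturbing the DSCP. Note that a real IPv4 implementation would also have to update the header checksum after such a change, which this sketch does not attempt.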
Prior to RFC 2474, routers were not permitted to modify bits in | Prior to RFC 2474, routers were not permitted to modify bits in | |||
either the DSCP or ECN field of packets forwarded through them, and | either the DSCP or ECN field of packets forwarded through them, and | |||
hence routers that comply only with RFCs prior to 2474 should have no | hence routers that comply only with RFCs prior to 2474 should have no | |||
effect on ECN. For end nodes, bit 7 (the second ECN bit) must be | effect on ECN. For end nodes, bit 7 (the second ECN bit) must be | |||
transmitted as zero for any implementation compliant only with RFCs | transmitted as zero for any implementation compliant only with RFCs | |||
skipping to change at page 58, line 41 | skipping to change at page 60, line 36 | |||
codepoint even in the absence of congestion. This has been discussed | codepoint even in the absence of congestion. This has been discussed | |||
in the section on "Non-compliance in the Network". | in the section on "Non-compliance in the Network". | |||
The damage that could be done in an ECN-capable environment by a non- | The damage that could be done in an ECN-capable environment by a non- | |||
ECN-capable end-node transmitting packets with the ECT codepoint set | ECN-capable end-node transmitting packets with the ECT codepoint set | |||
has been discussed in the section on "Non-compliance by the End | has been discussed in the section on "Non-compliance by the End | |||
Nodes". | Nodes". | |||
23. IANA Considerations | 23. IANA Considerations | |||
The codepoints for the ECN Field of the IP header and the bits for | This section contains the namespaces that have either been created in | |||
CWR and ECE in the TCP header are specified by the Standards Action | this specification, or the values assigned in existing namespaces | |||
of this RFC, as is required by RFC 2780. | managed by IANA. | |||
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet | ||||
The codepoints for the ECN Field of the IP header are specified by | ||||
the Standards Action of this RFC, as is required by RFC 2780. | ||||
When this draft is published as an RFC, IANA should create a new | ||||
registry, "IPv4 TOS Byte and IPv6 Traffic Class Octet", with the | ||||
namespace as follows: | ||||
IPv4 TOS Byte and IPv6 Traffic Class Octet | ||||
Description: The registrations are identical for IPv4 and IPv6. | ||||
Bits 0-5: see Differentiated Services Field Codepoints Registry | ||||
(http://www.iana.org/assignments/dscp-registry) | ||||
Bits 6-7, ECN Field: | ||||
Binary  Keyword                                References | ||||
------  -------                                ---------- | ||||
00      Not-ECT (Not ECN-Capable Transport)    [RFC xxx] | ||||
01      ECT(1) (ECN-Capable Transport(1))      [RFC xxx] | ||||
10      ECT(0) (ECN-Capable Transport(0))      [RFC xxx] | ||||
11      CE (Congestion Experienced)            [RFC xxx] | ||||
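
As an illustration of the registry layout above, the following is a minimal sketch, not part of the draft, of how an implementation might split the IPv4 TOS byte or IPv6 Traffic Class octet into its DSCP (bits 0-5) and ECN field (bits 6-7). It assumes the draft's bit numbering, in which bit 0 is the most significant bit of the octet. The names ECN_CODEPOINTS and split_tos_octet are hypothetical and chosen only for this example.

```python
from typing import Tuple

# Codepoint values taken from the "Bits 6-7, ECN Field" table above.
# The dictionary name is hypothetical, used only for this sketch.
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def split_tos_octet(octet: int) -> Tuple[int, str]:
    """Return (DSCP value, ECN keyword) for one TOS / Traffic Class octet.

    Assumes bit 0 is the most significant bit, so bits 0-5 are the
    upper six bits and bits 6-7 are the lower two bits.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must fit in 8 bits")
    dscp = octet >> 2      # bits 0-5
    ecn = octet & 0b11     # bits 6-7
    return dscp, ECN_CODEPOINTS[ecn]


if __name__ == "__main__":
    for value in (0x00, 0x01, 0x02, 0x03):
        print(f"{value:#04x} -> {split_tos_octet(value)}")
```

Keeping the keyword lookup in a table rather than in conditionals mirrors the registry itself, so a change to the assignments would touch only the table.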
23.2. TCP Header Flags | ||||
The codepoints for the CWR and ECE flags in the TCP header are | ||||
specified by the Standards Action of this RFC, as is required by RFC | ||||
2780. | ||||
When this draft is published as an RFC, IANA should create a new | ||||
registry, "TCP Header Flags", with the namespace as follows: | ||||
TCP Header Flags | ||||
The Transmission Control Protocol (TCP) included a 6-bit Reserved | ||||
field defined in RFC 793, reserved for future use, in bytes | ||||
13 and 14 of the TCP header, as illustrated below. The other six | ||||
Control bits are defined separately by RFC 793. | ||||
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
|               |                       | U | A | P | R | S | F | | ||||
| Header Length |        Reserved       | R | C | S | S | Y | I | | ||||
|               |                       | G | K | H | T | N | N | | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
RFC xxx defines two of the six bits from the Reserved field to be | ||||
used for ECN, as follows: | ||||
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
|               |               | C | E | U | A | P | R | S | F | | ||||
| Header Length |    Reserved   | W | C | R | C | S | S | Y | I | | ||||
|               |               | R | E | G | K | H | T | N | N | | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
TCP Header Flags | ||||
Bit  Name                              Reference | ||||
---  ----                              --------- | ||||
8    CWR (Congestion Window Reduced)   [RFC xxx] | ||||
9    ECE (ECN-Echo)                    [RFC xxx] | ||||
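
To make the bit assignments above concrete, the following is a minimal sketch, not part of the draft, of how the CWR and ECE flags might be read from bytes 13 and 14 of a TCP header. It assumes the numbering used in the diagrams, where bit 0 is the most significant bit of the 16-bit word, and that the header is supplied as raw bytes in network order. The names bit_mask and ecn_flags are hypothetical.

```python
CWR_BIT = 8  # from the TCP Header Flags table above
ECE_BIT = 9


def bit_mask(bit: int) -> int:
    """Mask for one bit of the 16-bit word, with bit 0 most significant."""
    if not 0 <= bit <= 15:
        raise ValueError("bit must be in the range 0-15")
    return 1 << (15 - bit)


def ecn_flags(tcp_header: bytes) -> dict:
    """Return the CWR and ECE flags from a raw TCP header.

    Bytes 13 and 14 of the header (1-based) are offsets 12 and 13 in
    the byte string; together they form the 16-bit word shown above.
    """
    if len(tcp_header) < 14:
        raise ValueError("need at least the first 14 bytes of the header")
    word = int.from_bytes(tcp_header[12:14], "big")
    return {
        "CWR": bool(word & bit_mask(CWR_BIT)),
        "ECE": bool(word & bit_mask(ECE_BIT)),
    }


if __name__ == "__main__":
    # Hypothetical header: data offset 5, flags CWR + ECE + SYN.
    header = bytes(12) + bytes([0x50, 0xC2]) + bytes(6)
    print(ecn_flags(header))
```

Deriving the masks from the registry's bit numbers, rather than hard-coding 0x80 and 0x40, keeps the code visibly tied to the table.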
23.3. IPSEC Security Association Attributes | ||||
IANA allocated the IPSEC Security Association Attribute value 10 for | IANA allocated the IPSEC Security Association Attribute value 10 for | |||
the ECN Tunnel use described in Section 9.2.1.2 above at the request | the ECN Tunnel use described in Section 9.2.1.2 above at the request | |||
of David Black in November 1999. If this draft is approved for | of David Black in November 1999. When this draft is published as an | |||
publication as an RFC, IANA should change the Reference for this | RFC, IANA should change the Reference for this allocation from David | |||
allocation from David Black's request to this RFC based on its RFC | Black's request to this RFC based on its RFC number. | |||
number. | ||||
AUTHORS' ADDRESSES | AUTHORS' ADDRESSES | |||
K. K. Ramakrishnan | K. K. Ramakrishnan | |||
TeraOptic Networks, Inc. | TeraOptic Networks, Inc. | |||
Phone: +1 (408) 666-8650 | Phone: +1 (408) 666-8650 | |||
Email: kk@teraoptic.com | Email: kk@teraoptic.com | |||
Sally Floyd | Sally Floyd | |||
Phone: +1 (510) 666-2989 | Phone: +1 (510) 666-2989 | |||
ACIRI | ACIRI | |||
Email: floyd@aciri.org | Email: floyd@aciri.org | |||
URL: http://www.aciri.org/floyd/ | URL: http://www.aciri.org/floyd/ | |||
David L. Black | David L. Black | |||
EMC Corporation | EMC Corporation | |||
42 South St. | 42 South St. | |||
Hopkinton, MA 01748 | Hopkinton, MA 01748 | |||
Phone: +1 (508) 435-1000 x75140 | Phone: +1 (508) 435-1000 x75140 | |||
Email: black_david@emc.com | Email: black_david@emc.com | |||
This draft was created in March 2001. | This draft was created in June 2001. | |||
It expires September 2001. | It expires December 2001. | |||