draft-ietf-tsvwg-ecn-04.txt | rfc3168.txt | |||
---|---|---|---|---|
Internet Engineering Task Force K. K. Ramakrishnan | Network Working Group K. Ramakrishnan | |||
INTERNET DRAFT TeraOptic Networks | Request for Comments: 3168 TeraOptic Networks | |||
draft-ietf-tsvwg-ecn-04.txt Sally Floyd | Updates: 2474, 2401, 793 S. Floyd | |||
ACIRI | Obsoletes: 2481 ACIRI | |||
D. Black | Category: Standards Track D. Black | |||
EMC | EMC | |||
June, 2001 | September 2001 | |||
Expires: December, 2001 | ||||
The Addition of Explicit Congestion Notification (ECN) to IP | The Addition of Explicit Congestion Notification (ECN) to IP | |||
Status of this Memo | Status of this Memo | |||
This document is an Internet-Draft and is in full conformance with | ||||
all provisions of Section 10 of RFC2026. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF), its areas, and its working groups. Note that | ||||
other groups may also distribute working documents as Internet- | ||||
Drafts. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document specifies an Internet standards track protocol for the | |||
and may be updated, replaced, or obsoleted by other documents at any | Internet community, and requests discussion and suggestions for | |||
time. It is inappropriate to use Internet-Drafts as reference | improvements. Please refer to the current edition of the "Internet | |||
material or to cite them other than as "work in progress." | Official Protocol Standards" (STD 1) for the standardization state | |||
and status of this protocol. Distribution of this memo is unlimited. | ||||
The list of current Internet-Drafts can be accessed at | Copyright Notice | |||
http://www.ietf.org/ietf/1id-abstracts.txt | ||||
The list of Internet-Draft Shadow Directories can be accessed at | Copyright (C) The Internet Society (2001). All Rights Reserved. | |||
http://www.ietf.org/shadow.html. | ||||
Abstract | Abstract | |||
This document specifies the incorporation of ECN (Explicit Congestion | This memo specifies the incorporation of ECN (Explicit Congestion | |||
Notification) to TCP and IP, including ECN's use of two bits in the | Notification) to TCP and IP, including ECN's use of two bits in the | |||
IP header. We begin by describing TCP's use of packet drops as an | IP header. | |||
indication of congestion. Next we explain that with the addition of | ||||
active queue management (e.g., RED) to the Internet infrastructure, | Table of Contents | |||
where routers detect congestion before the queue overflows, routers | ||||
are no longer limited to packet drops as an indication of congestion. | 1. Introduction.................................................. 3 | |||
Routers can instead set the Congestion Experienced (CE) codepoint in | 2. Conventions and Acronyms...................................... 5 | |||
the IP header of packets from ECN-capable transports. We describe | 3. Assumptions and General Principles............................ 5 | |||
when the CE codepoint is to be set in routers, and describe | 4. Active Queue Management (AQM)................................. 6 | |||
modifications needed to TCP to make it ECN-capable. Modifications to | 5. Explicit Congestion Notification in IP........................ 6 | |||
other transport protocols (e.g., unreliable unicast or multicast, | 5.1. ECN as an Indication of Persistent Congestion............... 10 | |||
reliable multicast, other reliable unicast transport protocols) could | 5.2. Dropped or Corrupted Packets................................ 11 | |||
be considered as those protocols are developed and advance through | 5.3. Fragmentation............................................... 11 | |||
the standards process. We also describe in this document the issues | 6. Support from the Transport Protocol........................... 12 | |||
| | 6.1. TCP......................................................... 13 | |||
| | 6.1.1 TCP Initialization......................................... 14 | |||
| | 6.1.1.1. Middlebox Issues........................................ 16 | |||
| | 6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field. 17 | |||
| | 6.1.2. The TCP Sender............................................ 18 | |||
| | 6.1.3. The TCP Receiver.......................................... 19 | |||
| | 6.1.4. Congestion on the ACK-path................................ 20 | |||
| | 6.1.5. Retransmitted TCP packets................................. 20 | |||
| | 6.1.6. TCP Window Probes......................................... 22 | |||
| | 7. Non-compliance by the End Nodes............................... 22 | |||
| | 8. Non-compliance in the Network................................. 24 | |||
| | 8.1. Complications Introduced by Split Paths..................... 25 | |||
| | 9. Encapsulated Packets.......................................... 25 | |||
| | 9.1. IP packets encapsulated in IP............................... 25 | |||
| | 9.1.1. The Limited-functionality and Full-functionality Options.. 27 | |||
| | 9.1.2. Changes to the ECN Field within an IP Tunnel.............. 28 | |||
| | 9.2. IPsec Tunnels............................................... 29 | |||
| | 9.2.1. Negotiation between Tunnel Endpoints...................... 31 | |||
| | 9.2.1.1. ECN Tunnel Security Association Database Field.......... 32 | |||
| | 9.2.1.2. ECN Tunnel Security Association Attribute............... 32 | |||
| | 9.2.1.3. Changes to IPsec Tunnel Header Processing............... 33 | |||
| | 9.2.2. Changes to the ECN Field within an IPsec Tunnel........... 35 | |||
| | 9.2.3. Comments for IPsec Support................................ 35 | |||
| | 9.3. IP packets encapsulated in non-IP Packet Headers............ 36 | |||
| | 10. Issues Raised by Monitoring and Policing Devices............. 36 | |||
| | 11. Evaluations of ECN........................................... 37 | |||
| | 11.1. Related Work Evaluating ECN................................ 37 | |||
| | 11.2. A Discussion of the ECN nonce.............................. 37 | |||
| | 11.2.1. The Incremental Deployment of ECT(1) in Routers.......... 38 | |||
| | 12. Summary of changes required in IP and TCP.................... 38 | |||
| | 13. Conclusions.................................................. 40 | |||
| | 14. Acknowledgements............................................. 41 | |||
| | 15. References................................................... 41 | |||
| | 16. Security Considerations...................................... 45 | |||
| | 17. IPv4 Header Checksum Recalculation........................... 45 | |||
| | 18. Possible Changes to the ECN Field in the Network............. 45 | |||
| | 18.1. Possible Changes to the IP Header.......................... 46 | |||
| | 18.1.1. Erasing the Congestion Indication........................ 46 | |||
| | 18.1.2. Falsely Reporting Congestion............................. 47 | |||
| | 18.1.3. Disabling ECN-Capability................................. 47 | |||
| | 18.1.4. Falsely Indicating ECN-Capability........................ 47 | |||
| | 18.2. Information carried in the Transport Header................ 48 | |||
| | 18.3. Split Paths................................................ 49 | |||
| | 19. Implications of Subverting End-to-End Congestion Control..... 50 | |||
| | 19.1. Implications for the Network and for Competing Flows....... 50 | |||
| | 19.2. Implications for the Subverted Flow........................ 53 | |||
| | 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion | |||
| | Control.................................................... 54 | |||
| | 20. The Motivation for the ECT Codepoints........................ 54 | |||
| | 20.1. The Motivation for an ECT Codepoint........................ 54 | |||
| | 20.2. The Motivation for two ECT Codepoints...................... 55 | |||
| | 21. Why use Two Bits in the IP Header?........................... 57 | |||
| | 22. Historical Definitions for the IPv4 TOS Octet................ 58 | |||
| | 23. IANA Considerations.......................................... 60 | |||
| | 23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet................. 60 | |||
| | 23.2. TCP Header Flags........................................... 61 | |||
| | 23.3. IPSEC Security Association Attributes....................... 62 | |||
| | 24. Authors' Addresses........................................... 62 | |||
| | 25. Full Copyright Statement..................................... 63 | |||
| | 1. Introduction | |||
| | We begin by describing TCP's use of packet drops as an indication of | |||
| | congestion. Next we explain that with the addition of active queue | |||
| | management (e.g., RED) to the Internet infrastructure, where routers | |||
| | detect congestion before the queue overflows, routers are no longer | |||
| | limited to packet drops as an indication of congestion. Routers can | |||
| | instead set the Congestion Experienced (CE) codepoint in the IP | |||
| | header of packets from ECN-capable transports. We describe when the | |||
| | CE codepoint is to be set in routers, and describe modifications | |||
| | needed to TCP to make it ECN-capable. Modifications to other | |||
| | transport protocols (e.g., unreliable unicast or multicast, reliable | |||
| | multicast, other reliable unicast transport protocols) could be | |||
| | considered as those protocols are developed and advance through the | |||
| | standards process. We also describe in this document the issues | |||
involving the use of ECN within IP tunnels, and within IPsec tunnels | involving the use of ECN within IP tunnels, and within IPsec tunnels | |||
in particular. | in particular. | |||
One of the guiding principles for this document is that, to the | One of the guiding principles for this document is that, to the | |||
extent possible, the mechanisms specified here be incrementally | extent possible, the mechanisms specified here be incrementally | |||
deployable. One challenge to the principle of incremental deployment | deployable. One challenge to the principle of incremental deployment | |||
has been the prior existence of some IP tunnels that were not | has been the prior existence of some IP tunnels that were not | |||
compatible with the use of ECN. As ECN becomes deployed, non- | compatible with the use of ECN. As ECN becomes deployed, non- | |||
compatible IP tunnels will have to be upgraded to conform to this | compatible IP tunnels will have to be upgraded to conform to this | |||
document. | document. | |||
This document is intended to obsolete RFC 2481, "A Proposal to add | This document obsoletes RFC 2481, "A Proposal to add Explicit | |||
Explicit Congestion Notification (ECN) to IP", which defined ECN as | Congestion Notification (ECN) to IP", which defined ECN as an | |||
an Experimental Protocol for the Internet Community. | Experimental Protocol for the Internet Community. This document also | |||
| | updates RFC 2474, "Definition of the Differentiated Services Field | |||
RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This | (DS Field) in the IPv4 and IPv6 Headers", in defining the ECN field | |||
document also obsoletes three subsequent internet-drafts on ECN, | in the IP header, RFC 2401, "Security Architecture for the Internet | |||
"IPsec Interactions with ECN", "ECN Interactions with IP Tunnels", | Protocol" to change the handling of IPv4 TOS Byte and IPv6 Traffic | |||
and "TCP with ECN: The Treatment of Retransmitted Data Packets". | Class Octet in tunnel mode header construction to be compatible with | |||
This document also updates RFC 2401 on "Security Architecture for the | the use of ECN, and RFC 793, "Transmission Control Protocol", in | |||
Internet Protocol". | defining two new flags in the TCP header. | |||
Table of Contents | ||||
1. Introduction | ||||
2. Conventions and Acronyms | ||||
3. Assumptions and General Principles | ||||
4. Active Queue Management (AQM) | ||||
5. Explicit Congestion Notification in IP | ||||
5.1. ECN as an Indication of Persistent Congestion | ||||
5.2. Dropped or Corrupted Packets | ||||
5.3. Fragmentation | ||||
6. Support from the Transport Protocol | ||||
6.1. TCP | ||||
6.1.1 TCP Initialization | ||||
6.1.1.1. Middlebox Issues | ||||
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field | ||||
6.1.2. The TCP Sender | ||||
6.1.3. The TCP Receiver | ||||
6.1.4. Congestion on the ACK-path | ||||
6.1.5. Retransmitted TCP packets | ||||
6.1.6. TCP Window Probes. | ||||
7. Non-compliance by the End Nodes | ||||
8. Non-compliance in the Network | ||||
8.1. Complications Introduced by Split Paths | ||||
9. Encapsulated Packets | ||||
9.1. IP packets encapsulated in IP | ||||
9.1.1. The Limited-functionality and Full-functionality Options | ||||
9.1.2. Changes to the ECN Field within an IP Tunnel. | ||||
9.2. IPsec Tunnels | ||||
9.2.1. Negotiation between Tunnel Endpoints | ||||
9.2.1.1. ECN Tunnel Security Association Database Field | ||||
9.2.1.2. ECN Tunnel Security Association Attribute | ||||
9.2.1.3. Changes to IPsec Tunnel Header Processing | ||||
9.2.2. Changes to the ECN Field within an IPsec Tunnel. | ||||
9.2.3. Comments for IPsec Support | ||||
9.3. IP packets encapsulated in non-IP Packet Headers. | ||||
10. Issues Raised by Monitoring and Policing Devices | ||||
11. Evaluations of ECN | ||||
11.1. Related Work Evaluating ECN | ||||
11.2. A Discussion of the ECN nonce. | ||||
11.2.1. The Incremental Deployment of ECT(1) in Routers. | ||||
12. Summary of changes required in IP and TCP | ||||
13. Conclusions | ||||
14. Acknowledgements | ||||
15. References | ||||
16. Security Considerations | ||||
17. IPv4 Header Checksum Recalculation | ||||
18. Possible Changes to the ECN Field in the Network | ||||
18.1. Possible Changes to the IP Header | ||||
18.1.1. Erasing the Congestion Indication | ||||
18.1.2. Falsely Reporting Congestion | ||||
18.1.3. Disabling ECN-Capability | ||||
18.1.4. Falsely Indicating ECN-Capability | ||||
18.2. Information carried in the Transport Header | ||||
18.3. Split Paths | ||||
19. Implications of Subverting End-to-End Congestion Control | ||||
19.1. Implications for the Network and for Competing Flows | ||||
19.2. Implications for the Subverted Flow | ||||
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | ||||
20. The Motivation for the ECT Codepoints. | ||||
20.1. The Motivation for an ECT Codepoint. | ||||
20.2. The Motivation for two ECT Codepoints. | ||||
21. Why use Two Bits in the IP Header? | ||||
22. Historical Definitions for the IPv4 TOS Octet | ||||
23. IANA Considerations | ||||
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet | ||||
23.2. TCP Header Flags | ||||
23.3. IPSEC Security Association Attributes | ||||
RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - | ||||
To compare this with draft-ietf-tsvwg-ecn-03, compare the following: | ||||
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-03.troff" | ||||
"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-04.troff" | ||||
Changes from draft-ietf-tsvwg-ecn-03: | ||||
An expanded section on IANA Considerations. | ||||
Added back the section on "Middlebox Issues" about devices that either | ||||
drop an ECN-setup SYN packet or respond with a RST. | ||||
Clarified MUSTs and SHOULDs for limited-functionality and full- | ||||
functionality modes of tunnels. | ||||
Changed "should" to "MUST" for the sentence about not using ECT with | ||||
pure ACK TCP packets. | ||||
Specified that ECN status is ignored in TCP once TIME-WAIT state is | ||||
entered. | ||||
Moved notes to the RFC editor about obsoleted documents to the | ||||
beginning of this document. | ||||
Some minor rephrasing for clarity. | ||||
Changes from draft-ietf-tsvwg-ecn-02: | ||||
Revised Section 5.3 on fragmentation. | ||||
Changes from draft-ietf-tsvwg-ecn-01: | ||||
Added the ECT(1) codepoint, and changed references about bits to | ||||
references about codepoints in many places. Also added Section 11.2 on | ||||
"A Discussion of the ECN nonce", and Section 20.2 on "The Motivation for | ||||
two ECT Codepoints". | ||||
Added a paragraph saying that by default, the discussion of setting | ||||
the CE codepoint applies to all Differentiated Services Per-Hop | ||||
Behaviors. | ||||
Added Section 5.3 on fragmentation. | ||||
Added "A host MUST NOT set ECT on SYN or SYN-ACK packets." to the end | ||||
of Section 6.1.1, just to be explicit. | ||||
Corrected some references to "Section 19" to "Section 22". | ||||
Clarified that ECN is defined identically in IPv4 and in IPv6. | ||||
1. Introduction | ||||
TCP's congestion control and avoidance algorithms are based on the | TCP's congestion control and avoidance algorithms are based on the | |||
notion that the network is a black-box [Jacobson88, Jacobson90]. The | notion that the network is a black-box [Jacobson88, Jacobson90]. The | |||
network's state of congestion or otherwise is determined by end- | network's state of congestion or otherwise is determined by end- | |||
systems probing for the network state, by gradually increasing the | systems probing for the network state, by gradually increasing the | |||
load on the network (by increasing the window of packets that are | load on the network (by increasing the window of packets that are | |||
outstanding in the network) until the network becomes congested and a | outstanding in the network) until the network becomes congested and a | |||
packet is lost. Treating the network as a "black-box" and treating | packet is lost. Treating the network as a "black-box" and treating | |||
loss as an indication of congestion in the network is appropriate for | loss as an indication of congestion in the network is appropriate for | |||
pure best-effort data carried by TCP, with little or no sensitivity | pure best-effort data carried by TCP, with little or no sensitivity | |||
skipping to change at page 5, line 43 | skipping to change at page 4, line 29 | |||
gradually increasing the window size until it experiences a dropped | gradually increasing the window size until it experiences a dropped | |||
packet, this causes the queues at the bottleneck router to build up. | packet, this causes the queues at the bottleneck router to build up. | |||
With most packet drop policies at the router that are not sensitive | With most packet drop policies at the router that are not sensitive | |||
to the load placed by each individual flow (e.g., tail-drop on queue | to the load placed by each individual flow (e.g., tail-drop on queue | |||
overflow), this means that some of the packets of latency-sensitive | overflow), this means that some of the packets of latency-sensitive | |||
flows may be dropped. In addition, such drop policies lead to | flows may be dropped. In addition, such drop policies lead to | |||
synchronization of loss across multiple flows. | synchronization of loss across multiple flows. | |||
Active queue management mechanisms detect congestion before the queue | Active queue management mechanisms detect congestion before the queue | |||
overflows, and provide an indication of this congestion to the end | overflows, and provide an indication of this congestion to the end | |||
nodes. Thus, active queue management can reduce unnecessary queueing | nodes. Thus, active queue management can reduce unnecessary queuing | |||
delay for all traffic sharing that queue. The advantages of active | delay for all traffic sharing that queue. The advantages of active | |||
queue management are discussed in RFC 2309 [RFC2309]. Active queue | queue management are discussed in RFC 2309 [RFC2309]. Active queue | |||
management avoids some of the bad properties of dropping on queue | management avoids some of the bad properties of dropping on queue | |||
overflow, including the undesirable synchronization of loss across | overflow, including the undesirable synchronization of loss across | |||
multiple flows. More importantly, active queue management means that | multiple flows. More importantly, active queue management means that | |||
transport protocols with mechanisms for congestion control (e.g., | transport protocols with mechanisms for congestion control (e.g., | |||
TCP) do not have to rely on buffer overflow as the only indication of | TCP) do not have to rely on buffer overflow as the only indication of | |||
congestion. | congestion. | |||
Active queue management mechanisms may use one of several methods for | Active queue management mechanisms may use one of several methods for | |||
indicating congestion to end-nodes. One is to use packet drops, as is | indicating congestion to end-nodes. One is to use packet drops, as is | |||
currently done. However, active queue management allows the router to | currently done. However, active queue management allows the router to | |||
separate policies of queueing or dropping packets from the policies | separate policies of queuing or dropping packets from the policies | |||
for indicating congestion. Thus, active queue management allows | for indicating congestion. Thus, active queue management allows | |||
routers to use the Congestion Experienced (CE) codepoint in a packet | routers to use the Congestion Experienced (CE) codepoint in a packet | |||
header as an indication of congestion, instead of relying solely on | header as an indication of congestion, instead of relying solely on | |||
packet drops. This has the potential of reducing the impact of loss | packet drops. This has the potential of reducing the impact of loss | |||
on latency-sensitive flows. | on latency-sensitive flows. | |||
There exist some middleboxes (firewalls, load balancers, or intrusion | There exist some middleboxes (firewalls, load balancers, or intrusion | |||
detection systems) in the Internet that either drop a TCP SYN packet | detection systems) in the Internet that either drop a TCP SYN packet | |||
configured to negotiate ECN, or respond with a RST. This document | configured to negotiate ECN, or respond with a RST. This document | |||
specifies procedures that TCP implementations may use to provide | specifies procedures that TCP implementations may use to provide | |||
robust connectivity even in the presence of such equipment. | robust connectivity even in the presence of such equipment. | |||
This document is intended to obsolete RFC 2481, "A Proposal to add | ||||
Explicit Congestion Notification (ECN) to IP", which defined ECN as | ||||
an Experimental Protocol for the Internet Community. | ||||
2. Conventions and Acronyms | 2. Conventions and Acronyms | |||
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | |||
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | |||
document, are to be interpreted as described in [RFC2119]. | document, are to be interpreted as described in [RFC2119]. | |||
3. Assumptions and General Principles | 3. Assumptions and General Principles | |||
In this section, we describe some of the important design principles | In this section, we describe some of the important design principles | |||
and assumptions that guided the design choices in this proposal. | and assumptions that guided the design choices in this proposal. | |||
* Because ECN is likely to be adopted gradually, accommodating | * Because ECN is likely to be adopted gradually, accommodating | |||
migration is essential. Some routers may still only drop packets to | migration is essential. Some routers may still only drop packets | |||
indicate congestion, and some end-systems may not be ECN-capable. The | to indicate congestion, and some end-systems may not be ECN- | |||
most viable strategy is one that accommodates incremental deployment | capable. The most viable strategy is one that accommodates | |||
without having to resort to "islands" of ECN-capable and non-ECN- | incremental deployment without having to resort to "islands" of | |||
capable environments. | ECN-capable and non-ECN-capable environments. | |||
* New mechanisms for congestion control and avoidance need to co- | ||||
exist and cooperate with existing mechanisms for congestion control. | * New mechanisms for congestion control and avoidance need to co- | |||
In particular, new mechanisms have to co-exist with TCP's current | exist and cooperate with existing mechanisms for congestion | |||
methods of adapting to congestion and with routers' current practice | control. In particular, new mechanisms have to co-exist with | |||
of dropping packets in periods of congestion. | TCP's current methods of adapting to congestion and with | |||
* Congestion may persist over different time-scales. The time scales | routers' current practice of dropping packets in periods of | |||
that we are concerned with are congestion events that may last longer | congestion. | |||
than a round-trip time. | ||||
* The number of packets in an individual flow (e.g., TCP connection | * Congestion may persist over different time-scales. The time | |||
or an exchange using UDP) may range from a small number of packets to | scales that we are concerned with are congestion events that may | |||
quite a large number. We are interested in managing the congestion | last longer than a round-trip time. | |||
caused by flows that send enough packets so that they are still | ||||
active when network feedback reaches them. | * The number of packets in an individual flow (e.g., TCP | |||
* Asymmetric routing is likely to be a normal occurrence in the | connection or an exchange using UDP) may range from a small | |||
Internet. The path (sequence of links and routers) followed by data | number of packets to quite a large number. We are interested in | |||
packets may be different from the path followed by the acknowledgment | managing the congestion caused by flows that send enough packets | |||
packets in the reverse direction. | so that they are still active when network feedback reaches | |||
* Many routers process the "regular" headers in IP packets more | them. | |||
efficiently than they process the header information in IP options. | ||||
This suggests keeping congestion experienced information in the | * Asymmetric routing is likely to be a normal occurrence in the | |||
regular headers of an IP packet. | Internet. The path (sequence of links and routers) followed by | |||
* It must be recognized that not all end-systems will cooperate in | data packets may be different from the path followed by the | |||
mechanisms for congestion control. However, new mechanisms shouldn't | acknowledgment packets in the reverse direction. | |||
make it easier for TCP applications to disable TCP congestion | ||||
control. The benefit of lying about participating in new mechanisms | * Many routers process the "regular" headers in IP packets more | |||
such as ECN-capability should be small. | efficiently than they process the header information in IP | |||
| | options. This suggests keeping congestion experienced | |||
| | information in the regular headers of an IP packet. | |||
| | * It must be recognized that not all end-systems will cooperate in | |||
| | mechanisms for congestion control. However, new mechanisms | |||
| | shouldn't make it easier for TCP applications to disable TCP | |||
| | congestion control. The benefit of lying about participating in | |||
| | new mechanisms such as ECN-capability should be small. | |||
4. Active Queue Management (AQM) | 4. Active Queue Management (AQM) | |||
Random Early Detection (RED) is one mechanism for Active Queue | Random Early Detection (RED) is one mechanism for Active Queue | |||
Management (AQM) that has been proposed to detect incipient | Management (AQM) that has been proposed to detect incipient | |||
congestion [FJ93], and is currently being deployed in the Internet | congestion [FJ93], and is currently being deployed in the Internet | |||
[RFC2309]. AQM is meant to be a general mechanism using one of | [RFC2309]. AQM is meant to be a general mechanism using one of | |||
several alternatives for congestion indication, but in the absence of | several alternatives for congestion indication, but in the absence of | |||
ECN, AQM is restricted to using packet drops as a mechanism for | ECN, AQM is restricted to using packet drops as a mechanism for | |||
congestion indication. AQM drops packets based on the average queue | congestion indication. AQM drops packets based on the average queue | |||
skipping to change at page 8, line 27 | skipping to change at page 7, line 23 | |||
particular, this document does not address mechanisms for TCP end- | particular, this document does not address mechanisms for TCP end- | |||
nodes to differentiate between the ECT(0) and ECT(1) codepoints. | nodes to differentiate between the ECT(0) and ECT(1) codepoints. | |||
Protocols and senders that only require a single ECT codepoint SHOULD | Protocols and senders that only require a single ECT codepoint SHOULD | |||
use ECT(0). | use ECT(0). | |||
The not-ECT codepoint '00' indicates a packet that is not using ECN. | The not-ECT codepoint '00' indicates a packet that is not using ECN. | |||
The CE codepoint '11' is set by a router to indicate congestion to | The CE codepoint '11' is set by a router to indicate congestion to | |||
the end nodes. Routers that have a packet arriving at a full queue | the end nodes. Routers that have a packet arriving at a full queue | |||
drop the packet, just as they do in the absence of ECN. | drop the packet, just as they do in the absence of ECN. | |||
+-----+-----+ | +-----+-----+ | |||
| ECN FIELD | | | ECN FIELD | | |||
+-----+-----+ | +-----+-----+ | |||
ECT CE [Obsolete] RFC 2481 names for the ECN bits. | ECT CE [Obsolete] RFC 2481 names for the ECN bits. | |||
0 0 Not-ECT | 0 0 Not-ECT | |||
0 1 ECT(1) | 0 1 ECT(1) | |||
1 0 ECT(0) | 1 0 ECT(0) | |||
1 1 CE | 1 1 CE | |||
Figure 1: The ECN Field in IP. | Figure 1: The ECN Field in IP. | |||
The use of two ECT codepoints essentially gives a one-bit ECN nonce | The use of two ECT codepoints essentially gives a one-bit ECN nonce | |||
in packet headers, and routers necessarily "erase" the nonce when | in packet headers, and routers necessarily "erase" the nonce when | |||
they set the CE codepoint [SCWA99]. For example, routers that erased | they set the CE codepoint [SCWA99]. For example, routers that erased | |||
the CE codepoint would face additional difficulty in reconstructing | the CE codepoint would face additional difficulty in reconstructing | |||
the original nonce, and thus repeated erasure of the CE codepoint | the original nonce, and thus repeated erasure of the CE codepoint | |||
would be more likely to be detected by the end-nodes. The ECN nonce | would be more likely to be detected by the end-nodes. The ECN nonce | |||
also can address the problem of misbehaving transport receivers lying | also can address the problem of misbehaving transport receivers lying | |||
to the transport sender about whether or not the CE codepoint was set | to the transport sender about whether or not the CE codepoint was set | |||
in a packet. The motivations for the use of two ECT codepoints is | in a packet. The motivations for the use of two ECT codepoints is | |||
discussed in more detail in Section 20, along with some discussion of | discussed in more detail in Section 20, along with some discussion of | |||
alternate possibilities for the fourth ECT codepoint (that is, the | alternate possibilities for the fourth ECT codepoint (that is, the | |||
codepoint '01'). Backwards compatibility with earlier ECN | codepoint '01'). Backwards compatibility with earlier ECN | |||
implementations that do not understand the ECT(1) codepoint is | implementations that do not understand the ECT(1) codepoint is | |||
discussed in Section 11. | discussed in Section 11. | |||
In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable | In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable | |||
Transport (ECT) bit and the CE bit. The ECN field with only the ECN- | Transport (ECT) bit and the CE bit. The ECN field with only the | |||
Capable Transport (ECT) bit set in RFC 2481 corresponds to the ECT(0) | ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the | |||
codepoint in this document, and the ECN field with both the ECT and | ECT(0) codepoint in this document, and the ECN field with both the | |||
CE bit in RFC 2481 corresponds to the CE codepoint in this document. | ECT and CE bit in RFC 2481 corresponds to the CE codepoint in this | |||
The '01' codepoint was left undefined in RFC 2481, and this is the | document. The '01' codepoint was left undefined in RFC 2481, and | |||
reason for recommending the use of ECT(0) when only a single ECT | this is the reason for recommending the use of ECT(0) when only a | |||
codepoint is needed. | single ECT codepoint is needed. | |||
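The correspondence between the RFC 2481 bits and the codepoints of this document can be checked mechanically. This Python sketch treats the ECN field as a two-bit integer whose high-order bit is bit 6 (the RFC 2481 ECT bit) and whose low-order bit is bit 7 (the RFC 2481 CE bit). The constant and function names are illustrative only.

```python
# Two-bit ECN codepoints as defined in this document (high-order bit first).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11

NAMES = {NOT_ECT: "Not-ECT", ECT_1: "ECT(1)", ECT_0: "ECT(0)", CE: "CE"}


def from_rfc2481_bits(ect_bit: int, ce_bit: int) -> int:
    """Map an RFC 2481 (ECT bit, CE bit) pair onto a codepoint of this document.

    RFC 2481 used bit 6 as the ECT bit and bit 7 as the CE bit, so the pair
    is simply read as a two-bit number.
    """
    if ect_bit not in (0, 1) or ce_bit not in (0, 1):
        raise ValueError("bits must be 0 or 1")
    return (ect_bit << 1) | ce_bit


# ECT alone (RFC 2481's ECN-Capable marking) becomes ECT(0); '01' was undefined.
for ect, ce in [(0, 0), (1, 0), (1, 1), (0, 1)]:
    print((ect, ce), "->", NAMES[from_rfc2481_bits(ect, ce)])
```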
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| DS FIELD, DSCP | ECN FIELD | | | DS FIELD, DSCP | ECN FIELD | | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
DSCP: differentiated services codepoint | DSCP: differentiated services codepoint | |||
ECN: Explicit Congestion Notification | ECN: Explicit Congestion Notification | |||
Figure 2: The Differentiated Services and ECN Fields in IP. | Figure 2: The Differentiated Services and ECN Fields in IP. | |||
Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. | Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. | |||
The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6, | The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6, | |||
and the ECN field is defined identically in both cases. The | and the ECN field is defined identically in both cases. The | |||
definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic | definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic | |||
Class octet have been superseded by the six-bit DS (Differentiated | Class octet have been superseded by the six-bit DS (Differentiated | |||
Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in | Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in | |||
[RFC2474] as Currently Unused, and are specified in RFC 2780 as | [RFC2474] as Currently Unused, and are specified in RFC 2780 as | |||
approved for experimental use for ECN. Section 22 gives a brief | approved for experimental use for ECN. Section 22 gives a brief | |||
history of the TOS octet. | history of the TOS octet. | |||
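Figure 2 numbers bits from the most significant end of the octet, so bits 6 and 7 are the two least significant bits. A minimal Python sketch, assuming the IPv4 TOS or IPv6 Traffic Class octet has already been read out of the header as an integer (the function names and the DSCP value 46 are arbitrary examples):

```python
DSCP_SHIFT = 2    # the six-bit DSCP occupies bits 0-5, the high-order bits
ECN_MASK = 0b11   # bits 6 and 7 are the two low-order bits of the octet


def split_tos(octet: int) -> tuple[int, int]:
    """Split an IPv4 TOS / IPv6 Traffic Class octet into (DSCP, ECN)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet")
    return octet >> DSCP_SHIFT, octet & ECN_MASK


def with_ecn(octet: int, ecn: int) -> int:
    """Return the octet with its ECN field replaced and the DSCP left intact."""
    if not 0 <= ecn <= ECN_MASK:
        raise ValueError("the ECN field is two bits")
    return (octet & ~ECN_MASK & 0xFF) | ecn


# DSCP 46 with ECT(0) ('10') in the ECN field.
octet = (46 << DSCP_SHIFT) | 0b10
print(hex(octet), split_tos(octet))  # prints 0xba (46, 2)
```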
Because of the unstable history of the TOS octet, the use of the ECN | Because of the unstable history of the TOS octet, the use of the ECN | |||
field as specified in this document cannot be guaranteed to be | field as specified in this document cannot be guaranteed to be | |||
backwards compatible with those past uses of these two bits that pre- | backwards compatible with those past uses of these two bits that | |||
date ECN. The potential dangers of this lack of backwards | pre-date ECN. The potential dangers of this lack of backwards | |||
compatibility are discussed in Section 22. | compatibility are discussed in Section 22. | |||
Upon the receipt by an ECN-Capable transport of a single CE packet, | Upon the receipt by an ECN-Capable transport of a single CE packet, | |||
the congestion control algorithms followed at the end-systems MUST be | the congestion control algorithms followed at the end-systems MUST be | |||
essentially the same as the congestion control response to a *single* | essentially the same as the congestion control response to a *single* | |||
dropped packet. For example, for ECN-Capable TCP the source TCP is | dropped packet. For example, for ECN-Capable TCP the source TCP is | |||
required to halve its congestion window for any window of data | required to halve its congestion window for any window of data | |||
containing either a packet drop or an ECN indication. | containing either a packet drop or an ECN indication. | |||
One reason for requiring that the congestion-control response to the | One reason for requiring that the congestion-control response to the | |||
skipping to change at page 13, line 7 | skipping to change at page 12, line 4 | |||
development to allow end-nodes to interpret packet drops as | development to allow end-nodes to interpret packet drops as | |||
indications of corruption rather than congestion. | indications of corruption rather than congestion. | |||
5.3. Fragmentation | 5.3. Fragmentation | |||
ECN-capable packets MAY have the DF (Don't Fragment) bit set. | ECN-capable packets MAY have the DF (Don't Fragment) bit set. | |||
Reassembly of a fragmented packet MUST NOT lose indications of | Reassembly of a fragmented packet MUST NOT lose indications of | |||
congestion. In other words, if any fragment of an IP packet to be | congestion. In other words, if any fragment of an IP packet to be | |||
reassembled has the CE codepoint set, then one of two actions MUST be | reassembled has the CE codepoint set, then one of two actions MUST be | |||
taken: | taken: | |||
* Set the CE codepoint on the reassembled packet. However, this | * Set the CE codepoint on the reassembled packet. However, this | |||
MUST NOT occur if any of the other fragments contributing to this | MUST NOT occur if any of the other fragments contributing to | |||
reassembly carries the Not-ECT codepoint. | this reassembly carries the Not-ECT codepoint. | |||
* The packet is dropped, instead of being reassembled, for any | * The packet is dropped, instead of being reassembled, for any | |||
other reason. | other reason. | |||
If both actions are applicable, either MAY be chosen. Reassembly of | If both actions are applicable, either MAY be chosen. Reassembly of | |||
a fragmented packet MUST NOT change the ECN codepoint when all of the | a fragmented packet MUST NOT change the ECN codepoint when all of the | |||
fragments carry the same codepoint. | fragments carry the same codepoint. | |||
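The reassembly rules can be sketched as a function over the ECN codepoints of all fragments of a packet. In this hypothetical sketch, returning None stands for dropping the packet instead of reassembling it. Where the rules allow either action the sketch sets CE; for mixtures that contain no CE, which the rules do not constrain, dropping is this sketch's own policy rather than a requirement.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassembled_ecn(fragment_codepoints):
    """ECN codepoint for the reassembled packet, or None if it is to be dropped."""
    seen = set(fragment_codepoints)
    if not seen:
        raise ValueError("no fragments")
    if len(seen) == 1:
        # All fragments agree: reassembly MUST NOT change the codepoint.
        return next(iter(seen))
    if CE in seen:
        if NOT_ECT in seen:
            # CE MUST NOT be set when a Not-ECT fragment contributed, so the
            # congestion indication can only be preserved by dropping.
            return None
        # Setting CE preserves the indication (dropping would also conform).
        return CE
    # Mixed codepoints without CE: unconstrained; this sketch drops.
    return None
```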
We would note that because RFC 2481 did not specify reassembly | We would note that because RFC 2481 did not specify reassembly | |||
behavior, older ECN implementations conformant with that Experimental | behavior, older ECN implementations conformant with that Experimental | |||
RFC do not necessarily perform reassembly correctly, in terms of | RFC do not necessarily perform reassembly correctly, in terms of | |||
preserving the CE codepoint in a fragment. The sender could avoid | preserving the CE codepoint in a fragment. The sender could avoid | |||
the consequences of this behavior by setting the DF bit in ECN- | the consequences of this behavior by setting the DF bit in ECN- | |||
Capable packets. | Capable packets. | |||
Situations may arise in which the above reassembly specification is | Situations may arise in which the above reassembly specification is | |||
insufficiently precise. For example, if there is a malicious or | insufficiently precise. For example, if there is a malicious or | |||
broken entity in the path at or after the fragmentation point, packet | broken entity in the path at or after the fragmentation point, packet | |||
fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT | fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT | |||
codepoints. The reassembly specification above does not place | codepoints. The reassembly specification above does not place | |||
requirements on reassembly of fragments in this case. In situations | requirements on reassembly of fragments in this case. In situations | |||
where more precise reassembly behavior would be required, protocol | where more precise reassembly behavior would be required, protocol | |||
specifications SHOULD instead specify that DF MUST be set in all ECN- | specifications SHOULD instead specify that DF MUST be set in all | |||
capable packets sent by the protocol. | ECN-capable packets sent by the protocol. | |||
6. Support from the Transport Protocol | 6. Support from the Transport Protocol | |||
ECN requires support from the transport protocol, in addition to the | ECN requires support from the transport protocol, in addition to the | |||
functionality given by the ECN field in the IP packet header. The | functionality given by the ECN field in the IP packet header. The | |||
transport protocol might require negotiation between the endpoints | transport protocol might require negotiation between the endpoints | |||
during setup to determine that all of the endpoints are ECN-capable, | during setup to determine that all of the endpoints are ECN-capable, | |||
so that the sender can set the ECT codepoint in transmitted packets. | so that the sender can set the ECT codepoint in transmitted packets. | |||
Second, the transport protocol must be capable of reacting | Second, the transport protocol must be capable of reacting | |||
appropriately to the receipt of CE packets. This reaction could be | appropriately to the receipt of CE packets. This reaction could be | |||
skipping to change at page 14, line 36 | skipping to change at page 13, line 36 | |||
This proposal specifies two new flags in the Reserved field of the | This proposal specifies two new flags in the Reserved field of the | |||
TCP header. The TCP mechanism for negotiating ECN-Capability uses | TCP header. The TCP mechanism for negotiating ECN-Capability uses | |||
the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved | the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved | |||
field of the TCP header is designated as the ECN-Echo flag. The | field of the TCP header is designated as the ECN-Echo flag. The | |||
location of the 6-bit Reserved field in the TCP header is shown in | location of the 6-bit Reserved field in the TCP header is shown in | |||
Figure 4 of RFC 793 [RFC793] (and is reproduced below for | Figure 4 of RFC 793 [RFC793] (and is reproduced below for | |||
completeness). This specification of the ECN Field leaves the | completeness). This specification of the ECN Field leaves the | |||
Reserved field as a 4-bit field using bits 4-7. | Reserved field as a 4-bit field using bits 4-7. | |||
To enable the TCP receiver to determine when to stop setting the ECN- | To enable the TCP receiver to determine when to stop setting the | |||
Echo flag, we introduce a second new flag in the TCP header, the CWR | ECN-Echo flag, we introduce a second new flag in the TCP header, the | |||
flag. The CWR flag is assigned to Bit 8 in the Reserved field of the | CWR flag. The CWR flag is assigned to Bit 8 in the Reserved field of | |||
TCP header. | the TCP header. | |||
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | U | A | P | R | S | F | | | | | U | A | P | R | S | F | | |||
| Header Length | Reserved | R | C | S | S | Y | I | | | Header Length | Reserved | R | C | S | S | Y | I | | |||
| | | G | K | H | T | N | N | | | | | G | K | H | T | N | N | | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
Figure 3: The old definition of bytes 13 and 14 of the TCP | Figure 3: The old definition of bytes 13 and 14 of the TCP | |||
header. | header. | |||
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | C | E | U | A | P | R | S | F | | | | | C | E | U | A | P | R | S | F | | |||
| Header Length | Reserved | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | W | C | R | C | S | S | Y | I | | |||
| | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N | | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
Figure 4: The new definition of bytes 13 and 14 of the TCP | Figure 4: The new definition of bytes 13 and 14 of the TCP | |||
Header. | Header. | |||
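Figure 4 numbers bits from the most significant bit of the 16-bit word formed by bytes 13 and 14, so CWR (bit 8) and ECE (bit 9) are the two high-order bits of byte 14. The Python sketch below decodes and encodes that word, assuming it is handled in network byte order; the dictionary and function names are illustrative.

```python
import struct

# Bit positions as in Figure 4: bit 0 is the most significant bit of the
# 16-bit word made of bytes 13 and 14 of the TCP header.
FLAG_BITS = {"CWR": 8, "ECE": 9, "URG": 10, "ACK": 11,
             "PSH": 12, "RST": 13, "SYN": 14, "FIN": 15}


def decode_flags(two_bytes: bytes) -> tuple[int, set[str]]:
    """Return (header length in 32-bit words, set of flag names)."""
    (word,) = struct.unpack("!H", two_bytes)  # network byte order
    header_length = word >> 12                # bits 0-3
    flags = {name for name, bit in FLAG_BITS.items() if word & (1 << (15 - bit))}
    return header_length, flags


def encode_flags(header_length: int, flags: set[str]) -> bytes:
    """Inverse of decode_flags; the 4-bit Reserved field (bits 4-7) stays zero."""
    word = (header_length & 0xF) << 12
    for name in flags:
        word |= 1 << (15 - FLAG_BITS[name])
    return struct.pack("!H", word)


# A SYN with both ECE and CWR set, and a 20-byte (5-word) header.
print(encode_flags(5, {"SYN", "ECE", "CWR"}).hex())  # prints 50c2
```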
Thus, ECN uses the ECT and CE flags in the IP header (as shown in | Thus, ECN uses the ECT and CE flags in the IP header (as shown in | |||
Figure 1) for signaling between routers and connection endpoints, and | Figure 1) for signaling between routers and connection endpoints, and | |||
uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure | uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure | |||
4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection, | 4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection, | |||
a typical sequence of events in an ECN-based reaction to congestion | a typical sequence of events in an ECN-based reaction to congestion | |||
is as follows: | is as follows: | |||
* An ECT codepoint is set in packets transmitted by the sender to | * An ECT codepoint is set in packets transmitted by the sender to | |||
indicate that ECN is supported by the transport entities for these | indicate that ECN is supported by the transport entities for | |||
packets. | these packets. | |||
* An ECN-capable router detects impending congestion and detects | * An ECN-capable router detects impending congestion and detects | |||
that an ECT codepoint is set in the packet it is about to drop. | that an ECT codepoint is set in the packet it is about to drop. | |||
Instead of dropping the packet, the router chooses to set the CE | Instead of dropping the packet, the router chooses to set the CE | |||
codepoint in the IP header and forwards the packet. | codepoint in the IP header and forwards the packet. | |||
* The receiver receives the packet with the CE codepoint set, and | * The receiver receives the packet with the CE codepoint set, and | |||
sets the ECN-Echo flag in its next TCP ACK sent to the sender. | sets the ECN-Echo flag in its next TCP ACK sent to the sender. | |||
* The sender receives the TCP ACK with ECN-Echo set, and reacts to | * The sender receives the TCP ACK with ECN-Echo set, and reacts to | |||
the congestion as if a packet had been dropped. | the congestion as if a packet had been dropped. | |||
* The sender sets the CWR flag in the TCP header of the next | * The sender sets the CWR flag in the TCP header of the next | |||
packet sent to the receiver to acknowledge its receipt of and | packet sent to the receiver to acknowledge its receipt of and | |||
reaction to the ECN-Echo flag. | reaction to the ECN-Echo flag. | |||
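The router step in this sequence can be sketched as follows. The sketch assumes the router's active queue management has already decided that the packet would be dropped, and models only the choice between marking and dropping. Forwarding a packet that already carries CE unchanged is an assumption of this sketch, not something stated here.

```python
from typing import Optional

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def congestion_action(ecn: int) -> tuple[str, Optional[int]]:
    """Router's choice for a packet its queue management selected for dropping.

    Returns ("forward", new_codepoint) or ("drop", None).
    """
    if ecn in (ECT_0, ECT_1):
        # ECN-Capable transport: set CE instead of dropping, then forward.
        return "forward", CE
    if ecn == CE:
        # Already marked upstream; forwarded unchanged (assumption of this sketch).
        return "forward", CE
    if ecn == NOT_ECT:
        # The endpoints would not react to CE, so the packet is dropped.
        return "drop", None
    raise ValueError("the ECN field is two bits")
```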
The negotiation for using ECN by the TCP transport entities and the | The negotiation for using ECN by the TCP transport entities and the | |||
use of the ECN-Echo and CWR flags is described in more detail in the | use of the ECN-Echo and CWR flags is described in more detail in the | |||
sections below. | sections below. | |||
6.1.1 TCP Initialization | 6.1.1 TCP Initialization | |||
In the TCP connection setup phase, the source and destination TCPs | In the TCP connection setup phase, the source and destination TCPs | |||
exchange information about their willingness to use ECN. Subsequent | exchange information about their willingness to use ECN. Subsequent | |||
to the completion of this negotiation, the TCP sender sets an ECT | to the completion of this negotiation, the TCP sender sets an ECT | |||
skipping to change at page 16, line 44 | skipping to change at page 15, line 49 | |||
but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an | but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an | |||
indication that the TCP transmitting the SYN-ACK packet is ECN- | indication that the TCP transmitting the SYN-ACK packet is ECN- | |||
Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does | Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does | |||
not commit the TCP host to setting the ECT codepoint in transmitted | not commit the TCP host to setting the ECT codepoint in transmitted | |||
packets. | packets. | |||
The following rules apply to the sending of ECN-setup packets within | The following rules apply to the sending of ECN-setup packets within | |||
a TCP connection, where a TCP connection is defined by the standard | a TCP connection, where a TCP connection is defined by the standard | |||
rules for TCP connection establishment and termination. | rules for TCP connection establishment and termination. | |||
* If a host has received an ECN-setup SYN packet, then it MAY send an | * If a host has received an ECN-setup SYN packet, then it MAY send | |||
ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup | an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an | |||
SYN-ACK packet. | ECN-setup SYN-ACK packet. | |||
* A host MUST NOT set ECT on data packets unless it has sent at least | * A host MUST NOT set ECT on data packets unless it has sent at | |||
one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at | least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has | |||
least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no | received at least one ECN-setup SYN or ECN-setup SYN-ACK packet, | |||
non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has | and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK | |||
received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK | packet. If a host has received at least one non-ECN-setup SYN | |||
packet, then it SHOULD NOT set ECT on data packets. | or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on | |||
| | data packets. | |||
* If a host ever sets the ECT codepoint on a data packet, then that | * If a host ever sets the ECT codepoint on a data packet, then | |||
host MUST correctly set/clear the CWR TCP bit on all subsequent | that host MUST correctly set/clear the CWR TCP bit on all | |||
packets in the connection. | subsequent packets in the connection. | |||
* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK | * If a host has sent at least one ECN-setup SYN or ECN-setup SYN- | |||
packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN- | ACK packet, and has received no non-ECN-setup SYN or non-ECN- | |||
ACK packet, then if that host receives TCP data packets with ECT and | setup SYN-ACK packet, then if that host receives TCP data | |||
CE codepoints set in the IP header, then that host MUST process these | packets with ECT and CE codepoints set in the IP header, then | |||
packets as specified for an ECN-capable connection. | that host MUST process these packets as specified for an ECN- | |||
| | capable connection. | |||
* A host that is not willing to use ECN on a TCP connection SHOULD | * A host that is not willing to use ECN on a TCP connection SHOULD | |||
clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN- | clear both the ECE and CWR flags in all non-ECN-setup SYN and/or | |||
ACK packets that it sends to indicate this unwillingness. Receivers | SYN-ACK packets that it sends to indicate this unwillingness. | |||
MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK | Receivers MUST correctly handle all forms of the non-ECN-setup | |||
packets. | SYN and SYN-ACK packets. | |||
* A host MUST NOT set ECT on SYN or SYN-ACK packets. | * A host MUST NOT set ECT on SYN or SYN-ACK packets. | |||
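The rules above amount to a little per-connection bookkeeping. This Python sketch assumes the definitions of the setup packets used in RFC 3168: an ECN-setup SYN has both ECE and CWR set, and an ECN-setup SYN-ACK has ECE set and CWR clear. The class and method names are hypothetical, and may_set_ect() returns False in both the MUST NOT and the SHOULD NOT cases.

```python
from dataclasses import dataclass


def is_ecn_setup(ack: bool, ece: bool, cwr: bool) -> bool:
    """Classify a SYN (ack=False) or SYN-ACK (ack=True) by its ECE/CWR flags."""
    if ack:
        return ece and not cwr  # ECN-setup SYN-ACK: ECE set, CWR clear
    return ece and cwr          # ECN-setup SYN: ECE and CWR both set


@dataclass
class EcnNegotiation:
    """Per-connection record of the SYN and SYN-ACK packets seen so far."""
    sent_setup: bool = False
    received_setup: bool = False
    received_setup_syn: bool = False
    sent_non_setup: bool = False
    received_non_setup: bool = False

    def record(self, direction: str, ack: bool, ece: bool, cwr: bool) -> None:
        """Record a SYN or SYN-ACK that this host has 'sent' or 'received'."""
        setup = is_ecn_setup(ack, ece, cwr)
        if direction == "sent":
            if setup:
                self.sent_setup = True
            else:
                self.sent_non_setup = True
        elif direction == "received":
            if setup:
                self.received_setup = True
                if not ack:
                    self.received_setup_syn = True
            else:
                self.received_non_setup = True
        else:
            raise ValueError("direction must be 'sent' or 'received'")

    def may_send_setup_syn_ack(self) -> bool:
        # Only a host that received an ECN-setup SYN may reply with an ECN-setup SYN-ACK.
        return self.received_setup_syn

    def may_set_ect(self) -> bool:
        # MUST NOT without a setup packet in each direction and no non-setup
        # packet sent; SHOULD NOT after receiving a non-setup packet.
        return (self.sent_setup and self.received_setup
                and not self.sent_non_setup and not self.received_non_setup)

    def must_process_ce(self) -> bool:
        # Incoming ECT/CE data MUST then be handled as on an ECN-capable connection.
        return self.sent_setup and not self.received_non_setup


# Host A in the four-packet scenario of Section 6.1.1.1.
host_a = EcnNegotiation()
host_a.record("sent", ack=False, ece=True, cwr=True)       # ECN-setup SYN
host_a.record("sent", ack=False, ece=False, cwr=False)     # non-ECN-setup SYN
host_a.record("received", ack=True, ece=False, cwr=False)  # non-ECN-setup SYN-ACK
print(host_a.may_set_ect())  # False
```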
A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and | A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and | |||
transitions to CLOSED state after a timeout. Many TCP | transitions to CLOSED state after a timeout. Many TCP | |||
implementations create a new TCP connection if they receive an in- | implementations create a new TCP connection if they receive an in- | |||
window SYN packet during TIME-WAIT state. When a TCP host enters | window SYN packet during TIME-WAIT state. When a TCP host enters | |||
TIME-WAIT or CLOSED state, it should ignore any previous state about | TIME-WAIT or CLOSED state, it should ignore any previous state about | |||
the negotiation of ECN for that connection. | the negotiation of ECN for that connection. | |||
6.1.1.1. Middlebox Issues | 6.1.1.1. Middlebox Issues | |||
ECN introduces the use of the ECN-Echo and CWR flags in the TCP | ECN introduces the use of the ECN-Echo and CWR flags in the TCP | |||
header (as shown in Figure 4) for initialization. There exist some | header (as shown in Figure 4) for initialization. There exist some | |||
faulty firewalls, load balancers, and intrusion detection systems in | faulty firewalls, load balancers, and intrusion detection systems in | |||
the Internet that either drop an ECN-setup SYN packet or respond with | the Internet that either drop an ECN-setup SYN packet or respond with | |||
a RST, in the belief that such a packet (with these bits set) is a | a RST, in the belief that such a packet (with these bits set) is a | |||
signature for a port-scanning tool that could be used in a denial-of- | signature for a port-scanning tool that could be used in a denial- | |||
service attack. Some of the offending equipment has been identified, | of-service attack. Some of the offending equipment has been | |||
and a web page [FIXES] contains a list of non-compliant products and | identified, and a web page [FIXES] contains a list of non-compliant | |||
the fixes posted by the vendors, where these are available. The TBIT | products and the fixes posted by the vendors, where these are | |||
web page [TBIT] lists some of the web servers affected by this faulty | available. The TBIT web page [TBIT] lists some of the web servers | |||
equipment. We mention this in this document as a warning to the | affected by this faulty equipment. We mention this in this document | |||
community of this problem. | as a warning to the community of this problem. | |||
To provide robust connectivity even in the presence of such faulty | To provide robust connectivity even in the presence of such faulty | |||
equipment, a host that receives a RST in response to the transmission | equipment, a host that receives a RST in response to the transmission | |||
of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared. | of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared. | |||
This could result in a TCP connection being established without using | This could result in a TCP connection being established without using | |||
ECN. | ECN. | |||
A host that receives no reply to an ECN-setup SYN within the normal | A host that receives no reply to an ECN-setup SYN within the normal | |||
SYN retransmission timeout interval MAY resend the SYN and any | SYN retransmission timeout interval MAY resend the SYN and any | |||
subsequent SYN retransmissions with CWR and ECE cleared. To overcome | subsequent SYN retransmissions with CWR and ECE cleared. To overcome | |||
normal packet loss that results in the original SYN being lost, the | normal packet loss that results in the original SYN being lost, the | |||
originating host may retransmit one or more ECN-setup SYN packets | originating host may retransmit one or more ECN-setup SYN packets | |||
before giving up and retransmitting the SYN with the CWR and ECE bits | before giving up and retransmitting the SYN with the CWR and ECE bits | |||
cleared. | cleared. | |||
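The fallback just described leaves the number of ECN-setup SYN attempts to the implementation. The sketch below assumes a hypothetical local parameter, ecn_setup_tries, counting the original SYN plus any ECN-setup retransmissions; its default of 2 is arbitrary. It returns the (ECE, CWR) pair to place on each SYN transmission.

```python
def syn_ecn_flags(attempt: int, rst_after_ecn_setup: bool,
                  ecn_setup_tries: int = 2) -> tuple[bool, bool]:
    """Return (ECE, CWR) for SYN transmission number `attempt` (0 = first SYN).

    ecn_setup_tries is a hypothetical local policy knob: how many SYNs
    (original plus retransmissions) still request ECN before falling back.
    """
    if attempt < 0 or ecn_setup_tries < 1:
        raise ValueError("attempt must be >= 0 and ecn_setup_tries >= 1")
    if rst_after_ecn_setup:
        # A RST answered an ECN-setup SYN: retry as a plain SYN.
        return False, False
    if attempt < ecn_setup_tries:
        # Keep asking for ECN so that ordinary loss of the first SYN
        # does not by itself disable ECN for the connection.
        return True, True
    # No reply to the ECN-setup SYNs: retransmit with ECE and CWR cleared.
    return False, False


for n in range(4):
    print(n, syn_ecn_flags(n, rst_after_ecn_setup=False))
```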
We note that in this case, the following example scenario is | We note that in this case, the following example scenario is | |||
possible: | possible: | |||
(1) Host A: Sends an ECN-setup SYN. | (1) Host A: Sends an ECN-setup SYN. | |||
(2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed. | (2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed. | |||
(3) Host A: Sends a non-ECN-setup SYN. | (3) Host A: Sends a non-ECN-setup SYN. | |||
(4) Host B: Sends a non-ECN-setup SYN/ACK. | (4) Host B: Sends a non-ECN-setup SYN/ACK. | |||
We note that in this case, following the procedures above, neither | We note that in this case, following the procedures above, neither | |||
Host A nor Host B may set the ECT bit on data packets. Further, an | Host A nor Host B may set the ECT bit on data packets. Further, an | |||
important consequence of the rules for ECN setup and usage in Section | important consequence of the rules for ECN setup and usage in Section | |||
6.1.1 is that a host is forbidden from using the reception of ECT | 6.1.1 is that a host is forbidden from using the reception of ECT | |||
skipping to change at page 19, line 4 | skipping to change at page 18, line 14 | |||
6.1.2. The TCP Sender | 6.1.2. The TCP Sender | |||
For a TCP connection using ECN, new data packets are transmitted with | For a TCP connection using ECN, new data packets are transmitted with | |||
an ECT codepoint set in the IP header. When only one ECT codepoint | an ECT codepoint set in the IP header. When only one ECT codepoint | |||
is needed by a sender for all packets sent on a TCP connection, | is needed by a sender for all packets sent on a TCP connection, | |||
ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK | ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK | |||
packet (that is, an ACK packet with the ECN-Echo flag set in the TCP | packet (that is, an ACK packet with the ECN-Echo flag set in the TCP | |||
header), then the sender knows that congestion was encountered in the | header), then the sender knows that congestion was encountered in the | |||
network on the path from the sender to the receiver. The indication | network on the path from the sender to the receiver. The indication | |||
of congestion should be treated just as a congestion loss in non-ECN- | of congestion should be treated just as a congestion loss in non- | |||
Capable TCP. That is, the TCP source halves the congestion window | ECN-Capable TCP. That is, the TCP source halves the congestion window | |||
"cwnd" and reduces the slow start threshold "ssthresh". The sending | "cwnd" and reduces the slow start threshold "ssthresh". The sending | |||
TCP SHOULD NOT increase the congestion window in response to the | TCP SHOULD NOT increase the congestion window in response to the | |||
receipt of an ECN-Echo ACK packet. | receipt of an ECN-Echo ACK packet. | |||
TCP should not react to congestion indications more than once every | TCP should not react to congestion indications more than once every | |||
window of data (or more loosely, more than once every round-trip | window of data (or more loosely, more than once every round-trip | |||
time). That is, the TCP sender's congestion window should be reduced | time). That is, the TCP sender's congestion window should be reduced | |||
only once in response to a series of dropped and/or CE packets from a | only once in response to a series of dropped and/or CE packets from a | |||
single window of data. In addition, the TCP source should not | single window of data. In addition, the TCP source should not | |||
decrease the slow-start threshold, ssthresh, if it has been decreased | decrease the slow-start threshold, ssthresh, if it has been decreased | |||
within the last round trip time. However, if any retransmitted | within the last round trip time. However, if any retransmitted | |||
packets are dropped, then this is interpreted by the source TCP as a | packets are dropped, then this is interpreted by the source TCP as a | |||
new instance of congestion. | new instance of congestion. | |||
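A minimal Python sketch of the sender behaviour above, under several assumptions: sequence numbers are byte counts; the hypothetical variable recover records snd_nxt at the last reduction, so that only an ECN-Echo ACK for data sent after it causes another reduction; ssthresh is floored at two segments in the style of [RFC2581], using cwnd in place of the flight size; and growth on ordinary ACKs uses the [RFC2581] slow-start and congestion-avoidance increments. Retransmission losses and the special handling of a one-MSS window are not modelled.

```python
class EcnSender:
    """Hypothetical sketch of a TCP sender's reaction to ECN-Echo ACKs."""

    def __init__(self, mss: int, cwnd: int, ssthresh: int) -> None:
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.snd_nxt = 0          # next byte sequence number to be sent
        self.recover = -1         # snd_nxt at the last reduction (hypothetical name)
        self.cwr_pending = False  # CWR still owed on the next new data packet

    def on_ack(self, ack: int, ece: bool) -> None:
        """Process a cumulative ACK number, with or without the ECE flag."""
        if ece:
            # React at most once per window of data: only an ACK covering data
            # sent after the last reduction triggers another one.
            if ack > self.recover:
                self.cwnd = max(self.cwnd // 2, self.mss)     # halve, floor of 1 MSS
                self.ssthresh = max(self.cwnd, 2 * self.mss)  # RFC 2581-style floor
                self.recover = self.snd_nxt
                self.cwr_pending = True
            return  # cwnd is not increased in response to an ECE ACK
        if self.cwnd < self.ssthresh:
            self.cwnd += self.mss                                  # slow start
        else:
            self.cwnd += max(self.mss * self.mss // self.cwnd, 1)  # congestion avoidance

    def next_new_data_flags(self) -> dict:
        """ECT goes on new data; CWR on the first new data packet after a reduction."""
        cwr, self.cwr_pending = self.cwr_pending, False
        return {"ect": True, "cwr": cwr}
```

For example, with a 1000-byte MSS and a 10000-byte window, a burst of ECN-Echo ACKs for one window halves cwnd to 5000 bytes once, and only the first new data packet afterwards carries CWR.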
After the source TCP reduces its congestion window in response to a | After the source TCP reduces its congestion window in response to a | |||
CE packet, incoming acknowledgements that continue to arrive can | CE packet, incoming acknowledgments that continue to arrive can | |||
"clock out" outgoing packets as allowed by the reduced congestion | "clock out" outgoing packets as allowed by the reduced congestion | |||
window. If the congestion window consists of only one MSS (maximum | window. If the congestion window consists of only one MSS (maximum | |||
segment size), and the sending TCP receives an ECN-Echo ACK packet, | segment size), and the sending TCP receives an ECN-Echo ACK packet, | |||
then the sending TCP should in principle still reduce its congestion | then the sending TCP should in principle still reduce its congestion | |||
window in half. However, the value of the congestion window is | window in half. However, the value of the congestion window is | |||
bounded below by a value of one MSS. If the sending TCP were to | bounded below by a value of one MSS. If the sending TCP were to | |||
continue to send, using a congestion window of 1 MSS, this results in | continue to send, using a congestion window of 1 MSS, this results in | |||
the transmission of one packet per round-trip time. It is necessary | the transmission of one packet per round-trip time. It is necessary | |||
to still reduce the sending rate of the TCP sender even further, on | to still reduce the sending rate of the TCP sender even further, on | |||
receipt of an ECN-Echo packet when the congestion window is one. We | receipt of an ECN-Echo packet when the congestion window is one. We | |||
skipping to change at page 20, line 17 | skipping to change at page 19, line 26 | |||
new data packet that it transmits. | new data packet that it transmits. | |||
[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] | [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] | |||
discusses the validation test in the ns simulator, which illustrates | discusses the validation test in the ns simulator, which illustrates | |||
a wide range of ECN scenarios. These scenarios include the following: | a wide range of ECN scenarios. These scenarios include the following: | |||
an ECN followed by another ECN, a Fast Retransmit, or a Retransmit | an ECN followed by another ECN, a Fast Retransmit, or a Retransmit | |||
Timeout; a Retransmit Timeout or a Fast Retransmit followed by an | Timeout; a Retransmit Timeout or a Fast Retransmit followed by an | |||
ECN; and a congestion window of one packet followed by an ECN. | ECN; and a congestion window of one packet followed by an ECN. | |||
TCP follows existing algorithms for sending data packets in response | TCP follows existing algorithms for sending data packets in response | |||
to incoming ACKs, multiple duplicate acknowledgements, or retransmit | to incoming ACKs, multiple duplicate acknowledgments, or retransmit | |||
timeouts [RFC2581]. TCP also follows the normal procedures for | timeouts [RFC2581]. TCP also follows the normal procedures for | |||
increasing the congestion window when it receives ACK packets without | increasing the congestion window when it receives ACK packets without | |||
the ECN-Echo bit set [RFC2581]. | the ECN-Echo bit set [RFC2581]. | |||
6.1.3. The TCP Receiver | 6.1.3. The TCP Receiver | |||
When TCP receives a CE data packet at the destination end-system, the | When TCP receives a CE data packet at the destination end-system, the | |||
TCP data receiver sets the ECN-Echo flag in the TCP header of the | TCP data receiver sets the ECN-Echo flag in the TCP header of the | |||
subsequent ACK packet. If there is any ACK withholding implemented, | subsequent ACK packet. If there is any ACK withholding implemented, | |||
as in current "delayed-ACK" TCP implementations where the TCP | as in current "delayed-ACK" TCP implementations where the TCP | |||
receiver can send an ACK for two arriving data packets, then the ECN- | receiver can send an ACK for two arriving data packets, then the | |||
Echo flag in the ACK packet will be set to '1' if the CE codepoint is | ECN-Echo flag in the ACK packet will be set to '1' if the CE | |||
set in any of the data packets being acknowledged. That is, if any | codepoint is set in any of the data packets being acknowledged. That | |||
of the received data packets are CE packets, then the returning ACK | is, if any of the received data packets are CE packets, then the | |||
has the ECN-Echo flag set. | returning ACK has the ECN-Echo flag set. | |||
To provide robustness against the possibility of a dropped ACK packet | To provide robustness against the possibility of a dropped ACK packet | |||
carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in | carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in | |||
a series of ACK packets sent subsequently. The TCP receiver uses the | a series of ACK packets sent subsequently. The TCP receiver uses the | |||
CWR flag received from the TCP sender to determine when to stop | CWR flag received from the TCP sender to determine when to stop | |||
setting the ECN-Echo flag. | setting the ECN-Echo flag. | |||
After a TCP receiver sends an ACK packet with the ECN-Echo bit set, | After a TCP receiver sends an ACK packet with the ECN-Echo bit set, | |||
that TCP receiver continues to set the ECN-Echo flag in all the ACK | that TCP receiver continues to set the ECN-Echo flag in all the ACK | |||
packets it sends (whether they acknowledge CE data packets or non-CE | packets it sends (whether they acknowledge CE data packets or non-CE | |||
data packets) until it receives a CWR packet (a packet with the CWR | data packets) until it receives a CWR packet (a packet with the CWR | |||
flag set). After the receipt of the CWR packet, acknowledgements for | flag set). After the receipt of the CWR packet, acknowledgments for | |||
subsequent non-CE data packets do not have the ECN-Echo flag set. If | subsequent non-CE data packets do not have the ECN-Echo flag set. If | |||
another CE packet is received by the data receiver, the receiver | another CE packet is received by the data receiver, the receiver | |||
would once again send ACK packets with the ECN-Echo flag set. While | would once again send ACK packets with the ECN-Echo flag set. While | |||
the receipt of a CWR packet does not guarantee that the data sender | the receipt of a CWR packet does not guarantee that the data sender | |||
received the ECN-Echo message, this does suggest that the data sender | received the ECN-Echo message, this does suggest that the data sender | |||
reduced its congestion window at some point *after* it sent the data | reduced its congestion window at some point *after* it sent the data | |||
packet for which the CE codepoint was set. | packet for which the CE codepoint was set. | |||
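The receiver rules above can be sketched in Python. A delayed ACK ORs the CE bit across the segments it covers, and ECN-Echo is then latched until a CWR arrives. This is a minimal, non-normative sketch. The `Segment` and `EcnEchoReceiver` names are hypothetical, and only the CE and CWR bits are modelled. Processing a CWR before a CE carried by the same segment is an assumption; the text above does not specify that ordering.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical view of an arriving data segment (only two bits modelled)."""
    ce: bool   # IP header carried the CE codepoint
    cwr: bool  # TCP header had the CWR flag set


class EcnEchoReceiver:
    """Sketch of the receiver rules: latch ECN-Echo on CE, clear it on CWR."""

    def __init__(self) -> None:
        self.echo_pending = False  # keep setting ECN-Echo until a CWR arrives

    def ack(self, segments: Iterable[Segment]) -> bool:
        """Process the segments covered by one (possibly delayed) ACK and
        return whether that ACK carries the ECN-Echo flag."""
        any_ce = False
        for seg in segments:
            if seg.cwr:
                # Sender signalled a window reduction: stop echoing.
                self.echo_pending = False
            if seg.ce:
                # Assumption: CE on the same segment as CWR re-arms the latch.
                self.echo_pending = True
                any_ce = True
        # Set ECN-Echo if any acknowledged segment was CE, or if an earlier
        # CE has not yet been answered by a CWR.
        return any_ce or self.echo_pending
```

After an ACK that covers a CE segment, every later ACK keeps ECN-Echo set until a segment with CWR is acknowledged. Once that happens, ACKs for non-CE segments no longer carry it.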
We have already specified that a TCP sender is not required to reduce | We have already specified that a TCP sender is not required to reduce | |||
its congestion window more than once per window of data. Some care | its congestion window more than once per window of data. Some care | |||
skipping to change at page 24, line 44 | skipping to change at page 24, line 11 | |||
the CE codepoint set at that router itself; in such an environment, | the CE codepoint set at that router itself; in such an environment, | |||
routers could also take note of arriving CE packets that indicate | routers could also take note of arriving CE packets that indicate | |||
congestion encountered by that packet earlier in the path. | congestion encountered by that packet earlier in the path. | |||
8. Non-compliance in the Network | 8. Non-compliance in the Network | |||
This section considers the issues when a router is operating, | This section considers the issues when a router is operating, | |||
possibly maliciously, to modify either of the bits in the ECN field. | possibly maliciously, to modify either of the bits in the ECN field. | |||
We note that in IPv4, the IP header is protected from bit errors by a | We note that in IPv4, the IP header is protected from bit errors by a | |||
header checksum; this is not the case in IPv6. Thus for IPv6 the | header checksum; this is not the case in IPv6. Thus for IPv6 the | |||
ECN field can be accidentially modified by bit errors on links or in | ECN field can be accidentally modified by bit errors on links or in | |||
routers without being detected by an IP header checksum. | routers without being detected by an IP header checksum. | |||
By tampering with the bits in the ECN field, an adversary (or a | By tampering with the bits in the ECN field, an adversary (or a | |||
broken router) could do one or more of the following: falsely report | broken router) could do one or more of the following: falsely report | |||
congestion, disable ECN-Capability for an individual packet, erase | congestion, disable ECN-Capability for an individual packet, erase | |||
the ECN congestion indication, or falsely indicate ECN-Capability. | the ECN congestion indication, or falsely indicate ECN-Capability. | |||
Section 18 systematically examines the various cases by which the ECN | Section 18 systematically examines the various cases by which the ECN | |||
field could be modified. The important criterion considered in | field could be modified. The important criterion considered in | |||
determining the consequences of such modifications is whether it is | determining the consequences of such modifications is whether it is | |||
likely to lead to poorer behavior in any dimension (throughput, | likely to lead to poorer behavior in any dimension (throughput, | |||
skipping to change at page 27, line 24 | skipping to change at page 26, line 37 | |||
support for ECN an option for IP tunnels, so that an IP tunnel can be | support for ECN an option for IP tunnels, so that an IP tunnel can be | |||
specified or configured either to use ECN or not to use ECN in the | specified or configured either to use ECN or not to use ECN in the | |||
outer header of the tunnel. Thus, in environments or tunneling | outer header of the tunnel. Thus, in environments or tunneling | |||
protocols where the risks of using ECN are judged to outweigh its | protocols where the risks of using ECN are judged to outweigh its | |||
benefits, the tunnel can simply not use ECN in the outer header. | benefits, the tunnel can simply not use ECN in the outer header. | |||
Then the only indication of congestion experienced at routers within | Then the only indication of congestion experienced at routers within | |||
the tunnel would be through packet loss. | the tunnel would be through packet loss. | |||
The result is that there are two viable options for the behavior of | The result is that there are two viable options for the behavior of | |||
ECN-capable connections over an IP tunnel, including IPsec tunnels: | ECN-capable connections over an IP tunnel, including IPsec tunnels: | |||
* A limited-functionality option in which ECN is preserved in the | * A limited-functionality option in which ECN is preserved in the | |||
inner header, but disabled in the outer header. The only | inner header, but disabled in the outer header. The only | |||
mechanism available for signaling congestion occurring within the | mechanism available for signaling congestion occurring within | |||
tunnel in this case is dropped packets. | the tunnel in this case is dropped packets. | |||
* A full-functionality option that supports ECN in both the inner | * A full-functionality option that supports ECN in both the inner | |||
and outer headers, and propagates congestion warnings from nodes | and outer headers, and propagates congestion warnings from nodes | |||
within the tunnel to endpoints. | within the tunnel to endpoints. | |||
Support for these options requires varying amounts of changes to IP | Support for these options requires varying amounts of changes to IP | |||
header processing at tunnel ingress and egress. A small subset of | header processing at tunnel ingress and egress. A small subset of | |||
these changes sufficient to support only the limited-functionality | these changes sufficient to support only the limited-functionality | |||
option would be sufficient to eliminate any incompatibility between | option would be sufficient to eliminate any incompatibility between | |||
ECN and IP tunnels. | ECN and IP tunnels. | |||
One goal of this document is to give guidance about the tradeoffs | One goal of this document is to give guidance about the tradeoffs | |||
between the limited-functionality and full-functionality options. A | between the limited-functionality and full-functionality options. A | |||
full discussion of the potential effects of an adversary's | full discussion of the potential effects of an adversary's | |||
skipping to change at page 29, line 12 | skipping to change at page 28, line 29 | |||
pre-existing agreement between the tunnel endpoints about whether to | pre-existing agreement between the tunnel endpoints about whether to | |||
support the limited-functionality or the full-functionality ECN | support the limited-functionality or the full-functionality ECN | |||
option. | option. | |||
All IP tunnels MUST implement the limited-functionality option, and | All IP tunnels MUST implement the limited-functionality option, and | |||
SHOULD support the full-functionality option. | SHOULD support the full-functionality option. | |||
In addition, it is RECOMMENDED that packets with the CE codepoint in | In addition, it is RECOMMENDED that packets with the CE codepoint in | |||
the outer header be dropped if they arrive at the tunnel egress point | the outer header be dropped if they arrive at the tunnel egress point | |||
for a tunnel that uses the limited-functionality option, or for a | for a tunnel that uses the limited-functionality option, or for a | |||
tunnel that uses the full-functionality option but for which the not- | tunnel that uses the full-functionality option but for which the | |||
ECT codepoint is set in the inner header. This is motivated by | not-ECT codepoint is set in the inner header. This is motivated by | |||
backwards compatibility and to ensure that no unauthorized | backwards compatibility and to ensure that no unauthorized | |||
modifications of the ECN field take place, and is discussed further | modifications of the ECN field take place, and is discussed further | |||
in the next Section (9.1.2). | in the next Section (9.1.2). | |||
9.1.2. Changes to the ECN Field within an IP Tunnel. | 9.1.2. Changes to the ECN Field within an IP Tunnel. | |||
The presence of a copy of the ECN field in the inner header of an IP | The presence of a copy of the ECN field in the inner header of an IP | |||
tunnel mode packet provides an opportunity for detection of | tunnel mode packet provides an opportunity for detection of | |||
unauthorized modifications to the ECN field in the outer header. | unauthorized modifications to the ECN field in the outer header. | |||
Comparison of the ECT fields in the inner and outer headers falls | Comparison of the ECT fields in the inner and outer headers falls | |||
into two categories for implementations that conform to this | into two categories for implementations that conform to this | |||
document: | document: | |||
* If the IP tunnel uses the full-functionality option, then the | * If the IP tunnel uses the full-functionality option, then the | |||
not-ECT codepoint should be set in the outer header if and only if | not-ECT codepoint should be set in the outer header if and only | |||
it is also set in the inner header. | if it is also set in the inner header. | |||
* If the tunnel uses the limited-functionality option, then the | * If the tunnel uses the limited-functionality option, then the | |||
not-ECT codepoint should be set in the outer header. | not-ECT codepoint should be set in the outer header. | |||
Receipt of a packet not satisfying the appropriate condition could be | Receipt of a packet not satisfying the appropriate condition could be | |||
a cause of concern. | a cause of concern. | |||
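The two egress checks described above can be sketched as follows. The first is the RECOMMENDED drop of packets with CE in the outer header. The second is the test of whether the outer and inner not-ECT codepoints are consistent. The sketch assumes the RFC 3168 codepoint values: not-ECT 00, ECT(1) 01, ECT(0) 10, CE 11. The function and type names are hypothetical. As in the text, it leaves open what an egress does with an inconsistent packet beyond the drop rule.

```python
from enum import Enum, IntEnum


class Ecn(IntEnum):
    """ECN field codepoints (two low-order bits), per RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class TunnelMode(Enum):
    LIMITED = "limited-functionality"
    FULL = "full-functionality"


def should_drop_at_egress(mode: TunnelMode, outer: Ecn, inner: Ecn) -> bool:
    """RECOMMENDED rule: drop outer-CE packets whose mark cannot be honoured."""
    if outer is not Ecn.CE:
        return False
    if mode is TunnelMode.LIMITED:
        return True  # the outer header should never have been ECN-capable
    return inner is Ecn.NOT_ECT  # CE cannot be copied into a not-ECT inner header


def ecn_fields_consistent(mode: TunnelMode, outer: Ecn, inner: Ecn) -> bool:
    """Expected relationship between the outer and inner ECN fields at egress."""
    if mode is TunnelMode.FULL:
        # not-ECT in the outer header if and only if not-ECT in the inner header
        return (outer is Ecn.NOT_ECT) == (inner is Ecn.NOT_ECT)
    return outer is Ecn.NOT_ECT  # limited: the outer header is always not-ECT
```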
Consider the case of an IP tunnel where the tunnel ingress point has | Consider the case of an IP tunnel where the tunnel ingress point has | |||
not been updated to this document's requirements, while the tunnel | not been updated to this document's requirements, while the tunnel | |||
egress point has been updated to support ECN. In this case, the IP | egress point has been updated to support ECN. In this case, the IP | |||
tunnel is not explicitly configured to support the full-functionality | tunnel is not explicitly configured to support the full-functionality | |||
ECN option. However, the tunnel ingress point is behaving identically | ECN option. However, the tunnel ingress point is behaving identically | |||
to a tunnel ingress point that supports the full-functionality | to a tunnel ingress point that supports the full-functionality | |||
skipping to change at page 32, line 10 | skipping to change at page 31, line 24 | |||
In some environments, the ability to modify the ECN field without | In some environments, the ability to modify the ECN field without | |||
affecting IPsec integrity checks may constitute a covert channel; if | affecting IPsec integrity checks may constitute a covert channel; if | |||
it is necessary to eliminate such a channel or reduce its bandwidth, | it is necessary to eliminate such a channel or reduce its bandwidth, | |||
then the IPsec tunnel should be run in limited-functionality mode. | then the IPsec tunnel should be run in limited-functionality mode. | |||
9.2.1. Negotiation between Tunnel Endpoints | 9.2.1. Negotiation between Tunnel Endpoints | |||
This section describes the detailed changes to enable usage of ECN | This section describes the detailed changes to enable usage of ECN | |||
over IPsec tunnels, including the negotiation of ECN support between | over IPsec tunnels, including the negotiation of ECN support between | |||
tunnel endpoints. This is supported by three changes to IPsec: | tunnel endpoints. This is supported by three changes to IPsec: | |||
* An optional Security Association Database (SAD) field indicating | * An optional Security Association Database (SAD) field indicating | |||
whether tunnel encapsulation and decapsulation processing allows | whether tunnel encapsulation and decapsulation processing allows | |||
or forbids ECN usage in the outer IP header. | or forbids ECN usage in the outer IP header. | |||
* An optional Security Association Attribute that enables | * An optional Security Association Attribute that enables | |||
negotiation of this SAD field between the two endpoints of an SA | negotiation of this SAD field between the two endpoints of an SA | |||
that supports tunnel mode. | that supports tunnel mode. | |||
* Changes to tunnel mode encapsulation and decapsulation | * Changes to tunnel mode encapsulation and decapsulation | |||
processing to allow or forbid ECN usage in the outer IP header | processing to allow or forbid ECN usage in the outer IP header | |||
based on the value of the SAD field. When ECN usage is allowed in | based on the value of the SAD field. When ECN usage is allowed | |||
the outer IP header, the ECT codepoint is set in the outer header | in the outer IP header, the ECT codepoint is set in the outer | |||
for ECN-capable connections and congestion notifications | header for ECN-capable connections and congestion notifications | |||
(indicated by the CE codepoint) from such connections are | (indicated by the CE codepoint) from such connections are | |||
propagated to the inner header at tunnel egress. | propagated to the inner header at tunnel egress. | |||
If negotiation of ECN usage is implemented, then the SAD field SHOULD | If negotiation of ECN usage is implemented, then the SAD field SHOULD | |||
also be implemented. On the other hand, negotiation of ECN usage is | also be implemented. On the other hand, negotiation of ECN usage is | |||
OPTIONAL in all cases, even for implementations that support the SAD | OPTIONAL in all cases, even for implementations that support the SAD | |||
field. The encapsulation and decapsulation processing changes are | field. The encapsulation and decapsulation processing changes are | |||
REQUIRED, but MAY be implemented without the other two changes by | REQUIRED, but MAY be implemented without the other two changes by | |||
assuming that ECN usage is always forbidden. The full-functionality | assuming that ECN usage is always forbidden. The full-functionality | |||
alternative for ECN usage over IPsec tunnels consists of the SAD | alternative for ECN usage over IPsec tunnels consists of the SAD | |||
field and the full version of encapsulation and decapsulation | field and the full version of encapsulation and decapsulation | |||
processing changes, with or without the OPTIONAL negotiation support. | processing changes, with or without the OPTIONAL negotiation support. | |||
skipping to change at page 33, line 35 | skipping to change at page 33, line 7 | |||
The IPsec SA Attribute value 10 has been allocated by IANA to | The IPsec SA Attribute value 10 has been allocated by IANA to | |||
indicate that the ECN Tunnel SA Attribute is being negotiated; the | indicate that the ECN Tunnel SA Attribute is being negotiated; the | |||
type of this attribute is Basic (see Section 4.5 of [RFC2407]). The | type of this attribute is Basic (see Section 4.5 of [RFC2407]). The | |||
Class Values are used to conduct the negotiation. See [RFC2407, | Class Values are used to conduct the negotiation. See [RFC2407, | |||
RFC2408, RFC2409] for further information including encoding formats | RFC2408, RFC2409] for further information including encoding formats | |||
and requirements for negotiating this SA attribute. | and requirements for negotiating this SA attribute. | |||
Class Values | Class Values | |||
ECN Tunnel | ECN Tunnel | |||
Specifies whether ECN functionality is allowed to | Specifies whether ECN functionality is allowed to be used with Tunnel | |||
be used with Tunnel Encapsulation Mode. | Encapsulation Mode. This affects tunnel encapsulation and | |||
This affects tunnel encapsulation and decapsulation processing - | decapsulation processing - see Section 9.2.1.3. | |||
see Section 9.2.1.3. | ||||
RESERVED 0 | RESERVED 0 | |||
Allowed 1 | Allowed 1 | |||
Forbidden 2 | Forbidden 2 | |||
Values 3-61439 are reserved to IANA. Values 61440-65535 are for | Values 3-61439 are reserved to IANA. Values 61440-65535 are for | |||
private use. | private use. | |||
If unspecified, the default shall be assumed to be Forbidden. | If unspecified, the default shall be assumed to be Forbidden. | |||
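To illustrate the Class Values above, the following sketch maps a received ECN Tunnel attribute value to the setting kept in the SAD. It defaults to Forbidden when the attribute is absent. Rejecting reserved and private-use values with an exception is an assumption of this sketch; the text above only states how those ranges are allocated. The function name is hypothetical.

```python
from enum import IntEnum
from typing import Optional


class EcnTunnel(IntEnum):
    """Class Values of the ECN Tunnel IPsec SA attribute."""
    RESERVED = 0
    ALLOWED = 1
    FORBIDDEN = 2


def effective_ecn_tunnel(value: Optional[int]) -> EcnTunnel:
    """Map a received attribute value (None if absent) to the SAD setting."""
    if value is None:
        return EcnTunnel.FORBIDDEN  # default when unspecified
    if value in (EcnTunnel.ALLOWED, EcnTunnel.FORBIDDEN):
        return EcnTunnel(value)
    if value == EcnTunnel.RESERVED or 3 <= value <= 61439:
        raise ValueError(f"ECN Tunnel value {value} is reserved")
    if 61440 <= value <= 65535:
        raise ValueError(f"ECN Tunnel value {value} is for private use")
    raise ValueError(f"ECN Tunnel value {value} is not a 16-bit value")
```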
ECN Tunnel is a new SA attribute, and hence initiators that use it | ECN Tunnel is a new SA attribute, and hence initiators that use it | |||
can expect to encounter responders that do not understand it, and | can expect to encounter responders that do not understand it, and | |||
therefore reject proposals containing it. For backwards | therefore reject proposals containing it. For backwards | |||
compatibility with such implementations initiators SHOULD always also | compatibility with such implementations initiators SHOULD always also | |||
include a proposal without the ECN Tunnel attribute to enable such a | include a proposal without the ECN Tunnel attribute to enable such a | |||
responder to select a transform or proposal that does not contain the | responder to select a transform or proposal that does not contain the | |||
ECN Tunnel attribute. RFC 2407 currently requires responders to | ECN Tunnel attribute. RFC 2407 currently requires responders to | |||
reject all proposals if any proposal contains an unknown attribute; | reject all proposals if any proposal contains an unknown attribute; | |||
this requirement is expected to be changed to require a responder not | this requirement is expected to be changed to require a responder not | |||
to select proposals or transforms containing unknown attributes. | to select proposals or transforms containing unknown attributes. | |||
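In practice, the backwards-compatibility advice above means offering each transform twice. The sketch below models a proposal as a hypothetical mapping from SA attribute type to Basic attribute value. Attribute type 10 and Class Value 1 (Allowed) come from the text above. The `Proposal` type and the function name are illustrative only and are not part of any IKE library.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ECN_TUNNEL_ATTR = 10  # IPsec SA attribute value allocated by IANA
ECN_ALLOWED = 1       # Class Value "Allowed"


@dataclass
class Proposal:
    """Hypothetical, simplified transform: attribute type -> Basic value."""
    attributes: Dict[int, int] = field(default_factory=dict)


def proposals_with_ecn_fallback(base: Proposal) -> List[Proposal]:
    """Offer the transform with ECN Tunnel = Allowed, plus the same transform
    without the attribute for responders that do not understand it."""
    with_ecn = Proposal({**base.attributes, ECN_TUNNEL_ATTR: ECN_ALLOWED})
    without_ecn = Proposal({k: v for k, v in base.attributes.items()
                            if k != ECN_TUNNEL_ATTR})
    return [with_ecn, without_ecn]
```

The fallback only helps with responders that skip proposals containing unknown attributes, rather than rejecting them. That is the change to RFC 2407 anticipated above.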
9.2.1.3. Changes to IPsec Tunnel Header Processing | 9.2.1.3. Changes to IPsec Tunnel Header Processing | |||
For full ECN support, the encapsulation and decapsulation processing | For full ECN support, the encapsulation and decapsulation processing | |||
for the IPv4 TOS field and the IPv6 Traffic Class field are changed | for the IPv4 TOS field and the IPv6 Traffic Class field are changed | |||
from that specified in [RFC2401] to the following: | from that specified in [RFC2401] to the following: | |||
<-- How Outer Hdr Relates to Inner Hdr --> | <-- How Outer Hdr Relates to Inner Hdr --> | |||
Outer Hdr at Inner Hdr at | Outer Hdr at Inner Hdr at | |||
IPv4 Encapsulator Decapsulator | IPv4 Encapsulator Decapsulator | |||
Header fields: -------------------- ------------ | Header fields: -------------------- ------------ | |||
DS Field copied from inner hdr (5) no change | DS Field copied from inner hdr (5) no change | |||
ECN Field constructed (7) constructed (8) | ECN Field constructed (7) constructed (8) | |||
IPv6 | IPv6 | |||
Header fields: | Header fields: | |||
DS Field copied from inner hdr (6) no change | DS Field copied from inner hdr (6) no change | |||
ECN Field constructed (7) constructed (8) | ECN Field constructed (7) constructed (8) | |||
(5)(6) If the packet will immediately enter a domain for which the | (5)(6) If the packet will immediately enter a domain for which the | |||
DSCP value in the outer header is not appropriate, that value MUST | DSCP value in the outer header is not appropriate, that value MUST | |||
be mapped to an appropriate value for the domain [RFC 2474]. Also | be mapped to an appropriate value for the domain [RFC 2474]. Also | |||
see [RFC 2475] for further information. | see [RFC 2475] for further information. | |||
(7) If the value of the ECN Tunnel field in the SAD entry for this | (7) If the value of the ECN Tunnel field in the SAD entry for this | |||
SA is "allowed" and the ECN field in the inner header is set to | SA is "allowed" and the ECN field in the inner header is set to | |||
any value other than CE, copy this ECN field to the outer header. | any value other than CE, copy this ECN field to the outer header. | |||
If the ECN field in the inner header is set to CE, then set the | If the ECN field in the inner header is set to CE, then set the | |||
skipping to change at page 35, line 8 | skipping to change at page 34, line 29 | |||
CE, then copy the ECN field from the outer header to the inner | CE, then copy the ECN field from the outer header to the inner | |||
header. Otherwise, make no change to the ECN field in the inner | header. Otherwise, make no change to the ECN field in the inner | |||
header. | header. | |||
(5) and (6) are identical to match usage in [RFC2401], although | (5) and (6) are identical to match usage in [RFC2401], although | |||
they are different in [RFC2401]. | they are different in [RFC2401]. | |||
The above description applies to implementations that support the ECN | The above description applies to implementations that support the ECN | |||
Tunnel field in the SAD; such implementations MUST implement this | Tunnel field in the SAD; such implementations MUST implement this | |||
processing instead of the processing of the IPv4 TOS octet and IPv6 | processing instead of the processing of the IPv4 TOS octet and IPv6 | |||
Traffic Class octet defined in [RFC2401]. This constitutes the full- | Traffic Class octet defined in [RFC2401]. This constitutes the | |||
functionality alternative for ECN usage with IPsec tunnels. | full-functionality alternative for ECN usage with IPsec tunnels. | |||
An implementation that does not support the ECN Tunnel field in the | An implementation that does not support the ECN Tunnel field in the | |||
SAD MUST implement this processing by assuming that the value of the | SAD MUST implement this processing by assuming that the value of the | |||
ECN Tunnel field of the SAD is "forbidden" for every SA. In this | ECN Tunnel field of the SAD is "forbidden" for every SA. In this | |||
case, the processing of the ECN field reduces to: | case, the processing of the ECN field reduces to: | |||
(7) Set the ECN field to not-ECT in the outer header. | (7) Set the ECN field to not-ECT in the outer header. | |||
(8) Make no change to the ECN field in the inner header. | (8) Make no change to the ECN field in the inner header. | |||
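The encapsulation and decapsulation rules can be sketched on a single IPv4 TOS or IPv6 Traffic Class octet. The upper six bits of that octet carry the DSCP and the lower two bits carry the ECN field. This is a non-normative sketch, and the function names are hypothetical. It does not model the DSCP remapping that (5)(6) requires at a domain boundary. Parts of rules (7) and (8) fall in the portion of the comparison not shown here, and these are taken from RFC 3168 as published. Under those parts, the outer header is set to ECT(0) when the inner header carries CE. CE is propagated only into an inner header that is ECT(0) or ECT(1). Passing `ecn_allowed=False` gives the limited-functionality behaviour.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """ECN field codepoints (two low-order bits), per RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


ECN_MASK = 0b11  # ECN field within the TOS / Traffic Class octet


def encapsulate_tos(inner_tos: int, ecn_allowed: bool) -> int:
    """Build the outer TOS/Traffic Class octet at the encapsulator."""
    dscp_bits = inner_tos & ~ECN_MASK & 0xFF  # (5)(6): copy DS field
    inner_ecn = Ecn(inner_tos & ECN_MASK)
    if not ecn_allowed:
        outer_ecn = Ecn.NOT_ECT               # (7) when ECN Tunnel is forbidden
    elif inner_ecn is Ecn.CE:
        outer_ecn = Ecn.ECT_0                 # (7): inner CE -> outer ECT(0)
    else:
        outer_ecn = inner_ecn                 # (7): copy any non-CE value
    return dscp_bits | outer_ecn


def decapsulate_tos(outer_tos: int, inner_tos: int, ecn_allowed: bool) -> int:
    """Return the inner TOS/Traffic Class octet after decapsulation."""
    inner_ecn = Ecn(inner_tos & ECN_MASK)
    outer_ecn = Ecn(outer_tos & ECN_MASK)
    if (ecn_allowed and inner_ecn in (Ecn.ECT_0, Ecn.ECT_1)
            and outer_ecn is Ecn.CE):
        return (inner_tos & ~ECN_MASK & 0xFF) | Ecn.CE  # (8): propagate CE
    return inner_tos                                     # (8): no change
```

For example, an inner octet of 0xBB (DSCP 46 with CE) is encapsulated with an outer octet of 0xBA (DSCP 46 with ECT(0)). Decapsulating an outer CE over an inner 0xBA restores 0xBB.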
This constitutes the limited functionality alternative for ECN usage | This constitutes the limited functionality alternative for ECN usage | |||
skipping to change at page 37, line 14 | skipping to change at page 36, line 39 | |||
available to that flow. Thus, initially, the router may drop packets | available to that flow. Thus, initially, the router may drop packets | |||
in which the router would otherwise have set the CE codepoint. | in which the router would otherwise have set the CE codepoint. | |||
This could include dropping those arriving packets for that flow that | This could include dropping those arriving packets for that flow that | |||
are ECN-Capable and that already have the CE codepoint set. In this | are ECN-Capable and that already have the CE codepoint set. In this | |||
way, any congestion indications seen by that router for that flow | way, any congestion indications seen by that router for that flow | |||
will be guaranteed to also be seen by the end nodes, even in the | will be guaranteed to also be seen by the end nodes, even in the | |||
presence of malicious or broken routers elsewhere in the path. If we | presence of malicious or broken routers elsewhere in the path. If we | |||
assume that the first action taken at any "penalty box" for an ECN- | assume that the first action taken at any "penalty box" for an ECN- | |||
capable flow will be to drop packets instead of marking them, then | capable flow will be to drop packets instead of marking them, then | |||
there is no way that an adversary that subverts ECN-based end-to-end | there is no way that an adversary that subverts ECN-based end-to-end | |||
congestion control can cause a flow to be characterized as being non- | congestion control can cause a flow to be characterized as being | |||
cooperative and placed into a more severe action within the "penalty | non-cooperative and placed into a more severe action within the | |||
box". | "penalty box". | |||
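The penalty-box policy above can be sketched as a per-packet decision. The function name and its boolean inputs are assumptions of this sketch. So is the plain mark-if-ECT-else-drop behaviour outside the penalty box. In particular, the text only says that a router may drop arriving CE packets for a penalized flow.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    MARK_CE = "mark"
    DROP = "drop"


def congestion_action(ect: bool, arrived_ce: bool, signal_congestion: bool,
                      penalized: bool) -> Action:
    """Decide what a router does with one packet of a flow."""
    if penalized:
        # Drop wherever the router would otherwise mark, and also drop packets
        # that already carry CE, so congestion indications seen here are
        # guaranteed to be seen by the end nodes as well.
        if signal_congestion or arrived_ce:
            return Action.DROP
        return Action.FORWARD
    if signal_congestion:
        return Action.MARK_CE if ect else Action.DROP
    return Action.FORWARD
```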
The monitoring and policing devices that are actually deployed could | The monitoring and policing devices that are actually deployed could | |||
fall short of the `ideal' monitoring device described above, in that | fall short of the `ideal' monitoring device described above, in that | |||
the monitoring is applied not to a single flow, but to an aggregate | the monitoring is applied not to a single flow, but to an aggregate | |||
of flows (e.g., those sharing a single IPsec tunnel). In this case, | of flows (e.g., those sharing a single IPsec tunnel). In this case, | |||
the switch from marking to dropping would apply to all of the flows | the switch from marking to dropping would apply to all of the flows | |||
in that aggregate, denying the benefits of ECN to the other flows in | in that aggregate, denying the benefits of ECN to the other flows in | |||
the aggregate also. At the highest level of aggregation, another | the aggregate also. At the highest level of aggregation, another | |||
form of the disabling of ECN happens even in the absence of | form of the disabling of ECN happens even in the absence of | |||
monitoring and policing devices, when ECN-Capable RED queues switch | monitoring and policing devices, when ECN-Capable RED queues switch | |||
skipping to change at page 39, line 51 | skipping to change at page 39, line 34 | |||
For IPsec tunnels, this document also defines an optional IPsec | For IPsec tunnels, this document also defines an optional IPsec | |||
Security Association (SA) attribute that enables negotiation of ECN | Security Association (SA) attribute that enables negotiation of ECN | |||
usage within IPsec tunnels and an optional field in the Security | usage within IPsec tunnels and an optional field in the Security | |||
Association Database to indicate whether ECN is permitted in tunnel | Association Database to indicate whether ECN is permitted in tunnel | |||
mode on a SA. The required changes to IPsec tunnels for ECN usage | mode on a SA. The required changes to IPsec tunnels for ECN usage | |||
modify RFC 2401 [RFC2401], which defines the IPsec architecture and | modify RFC 2401 [RFC2401], which defines the IPsec architecture and | |||
specifies some aspects of its implementation. The new IPsec SA | specifies some aspects of its implementation. The new IPsec SA | |||
attribute is in addition to those already defined in Section 4.5 of | attribute is in addition to those already defined in Section 4.5 of | |||
[RFC2407]. | [RFC2407]. | |||
This document is intended to obsolete RFC 2481, "A Proposal to add | This document obsoletes RFC 2481, "A Proposal to add Explicit | |||
Explicit Congestion Notification (ECN) to IP", which defined ECN as | Congestion Notification (ECN) to IP", which defined ECN as an | |||
an Experimental Protocol for the Internet Community. The rest of | Experimental Protocol for the Internet Community. The rest of this | |||
this section describes the relationship between this document and its | section describes the relationship between this document and its | |||
predecessor. | predecessor. | |||
RFC 2481 included a brief discussion of the use of ECN with | RFC 2481 included a brief discussion of the use of ECN with | |||
encapsulated packets, and noted that for the IPsec specifications at | encapsulated packets, and noted that for the IPsec specifications at | |||
the time (January 1999), flows could not safely use ECN if they were | the time (January 1999), flows could not safely use ECN if they were | |||
to traverse IPsec tunnels. RFC 2481 also described the changes that | to traverse IPsec tunnels. RFC 2481 also described the changes that | |||
could be made to IPsec tunnel specifications to make them compatible | could be made to IPsec tunnel specifications to make them compatible | |||
with ECN. | with ECN. | |||
This document also incorporates work that was done after RFC 2481. | This document also incorporates work that was done after RFC 2481. | |||
skipping to change at page 42, line 9 | skipping to change at page 41, line 37 | |||
related discussions and documents from the Differentiated Services | related discussions and documents from the Differentiated Services | |||
Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | |||
for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen | for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen | |||
for proposing modifications to RFC 2407 that improve the usability of | for proposing modifications to RFC 2407 that improve the usability of | |||
negotiating the ECN Tunnel SA attribute. | negotiating the ECN Tunnel SA attribute. | |||
We thank David Wetherall, David Ely, and Neil Spring for the proposal | We thank David Wetherall, David Ely, and Neil Spring for the proposal | |||
for the ECN nonce. We also thank Stefan Savage for discussions on | for the ECN nonce. We also thank Stefan Savage for discussions on | |||
this issue. We thank Bob Briscoe and Jon Crowcroft for raising the | this issue. We thank Bob Briscoe and Jon Crowcroft for raising the | |||
issue of fragmentation in IP, on alternate semantics for the fourth | issue of fragmentation in IP, on alternate semantics for the fourth | |||
ECN codepoint, and several other topics. We thank Richard Wendland | ECN codepoint, and several other topics. We thank Richard Wendland | |||
for feedback on several issues in the draft. | for feedback on several issues in the document. | |||
We also thank the IESG, and in particular the Transport Area | We also thank the IESG, and in particular the Transport Area | |||
Directors over the years, for their feedback and their work towards | Directors over the years, for their feedback and their work towards | |||
the standardization of ECN. | the standardization of ECN. | |||
15. References | 15. References | |||
[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, | [AH] Kent, S. and R. Atkinson, "IP Authentication Header", | |||
November 1998. | RFC 2402, November 1998. | |||
[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". | [ECN] "The ECN Web Page", URL | |||
Reference for informational purposes only. | "http://www.aciri.org/floyd/ecn.html". Reference for | |||
informational purposes only. | ||||
[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", | [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security | |||
RFC 2406, November 1998. | Payload", RFC 2406, November 1998. | |||
[FIXES] ECN-under-Linux Unofficial Vendor Support Page, URL | [FIXES] ECN-under-Linux Unofficial Vendor Support Page, URL | |||
"http://gtf.org/garzik/ecn/". Reference for informational purposes | "http://gtf.org/garzik/ecn/". Reference for | |||
only. | informational purposes only. | |||
[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways | [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection | |||
for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 | gateways for Congestion Avoidance", IEEE/ACM | |||
N.4, August 1993, p. 397-413. | Transactions on Networking, V.1 N.4, August 1993, p. | |||
397-413. | ||||
[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM | [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", | |||
Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. | ACM Computer Communication Review, V. 24 N. 5, October | |||
1994, p. 10-23. | ||||
[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", | [Floyd98] Floyd, S., "The ECN Validation Test in the NS | |||
URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-ecn. | Simulator", URL "http://www-mash.cs.berkeley.edu/ns/", | |||
Reference for informational purposes only. | test tcl/test/test-all-ecn. Reference for | |||
informational purposes only. | ||||
[FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End | [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to- | |||
Congestion Control in the Internet", IEEE/ACM Transactions on | End Congestion Control in the Internet", IEEE/ACM | |||
Networking, August 1999. | Transactions on Networking, August 1999. | |||
[FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", | [FRED] Lin, D., and Morris, R., "Dynamics of Random Early | |||
SIGCOMM '97, September 1997. | Detection", SIGCOMM '97, September 1997. | |||
[GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing | [GRE] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic | |||
Encapsulation (GRE), RFC 1701, October 1994. | Routing Encapsulation (GRE)", RFC 1701, October 1994. | |||
[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. | [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. | |||
ACM SIGCOMM '88, pp. 314-329. | ACM SIGCOMM '88, pp. 314-329. | |||
[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance | [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance | |||
Algorithm", Message to end2end-interest mailing list, April 1990. URL | Algorithm", Message to end2end-interest mailing list, | |||
"ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". | April 1990. URL | |||
"ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". | ||||
[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) | [K98] Krishnan, H., "Analyzing Explicit Congestion | |||
benefits for TCP", Master's thesis, UCLA, 1998. Citation for | Notification (ECN) benefits for TCP", Master's thesis, | |||
acknowledgement purposes only. | UCLA, 1998. Citation for acknowledgement purposes only. | |||
[L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. | [L2TP] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, | |||
Palter, Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999. | G. and B. Palter, "Layer Two Tunneling Protocol "L2TP"", | |||
RFC 2661, August 1999. | ||||
[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven | [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver- | |||
Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. | driven Layered Multicast", SIGCOMM '96, August 1996, pp. | |||
117-130. | ||||
[MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, | [MPLS] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J. | |||
Requirements for Traffic Engineering Over MPLS, RFC 2702, September | McManus, "Requirements for Traffic Engineering Over MPLS", | |||
1999. | RFC 2702, September 1999. | |||
[PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. | [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, | |||
and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, | W. and G. Zorn, "Point-to-Point Tunneling Protocol | |||
July 1999. | (PPTP)", RFC 2637, July 1999. | |||
[RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September | [RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, | |||
1981. | September 1981. | |||
[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, | [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC | |||
September 1981. | 793, September 1981. | |||
[RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the | [RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of | |||
Internet Checksum", RFC 1141, January 1990. | the Internet Checksum", RFC 1141, January 1990. | |||
[RFC1349] Almquist, P., "Type of Service in the Internet Protocol | [RFC1349] Almquist, P., "Type of Service in the Internet Protocol | |||
Suite", RFC 1349, July 1992. | Suite", RFC 1349, July 1992. | |||
[RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC | [RFC1455] Eastlake, D., "Physical Link Security Type of Service", | |||
1455, May 1993. | RFC 1455, May 1993. | |||
[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic | [RFC1701] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic | |||
Routing Encapsulation (GRE), RFC 1701, October 1994. | Routing Encapsulation (GRE)", RFC 1701, October 1994. | |||
[RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic | [RFC1702] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic | |||
Routing Encapsulation over IPv4 networks, RFC 1702, October 1994. | Routing Encapsulation over IPv4 networks", RFC 1702, | |||
October 1994. | ||||
[RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October | [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, | |||
1996. | October 1996. | |||
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2309] Braden, B., et al., "Recommendations on Queue Management | [RFC2309] Braden, B., et al., "Recommendations on Queue Management | |||
and Congestion Avoidance in the Internet", RFC 2309, April 1998. | and Congestion Avoidance in the Internet", RFC 2309, | |||
April 1998. | ||||
[RFC2401] S. Kent and R. Atkinson, Security Architecture for the | [RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the | |||
Internet Protocol, RFC 2401, November 1998. | Internet Protocol", RFC 2401, November 1998. | |||
[RFC2407] D. Piper, The Internet IP Security Domain of Interpretation | [RFC2407] Piper, D., "The Internet IP Security Domain of | |||
for ISAKMP, RFC 2407, November 1998. | Interpretation for ISAKMP", RFC 2407, November 1998. | |||
[RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner, | [RFC2408] Maughan, D., Schertler, M., Schneider, M. and J. Turner, | |||
Internet Security Association and Key Management Protocol (ISAKMP), | "Internet Security Association and Key Management | |||
RFC 2408, November 1998. | Protocol (ISAKMP)", RFC 2408, November 1998. | |||
[RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), | [RFC2409] Harkins, D. and D. Carrel, "The Internet Key Exchange | |||
RFC 2409, November 1998. | (IKE)", RFC 2409, November 1998. | |||
[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition | [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, | |||
of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 | "Definition of the Differentiated Services Field (DS | |||
Headers", RFC 2474, December 1998. | Field) in the IPv4 and IPv6 Headers", RFC 2474, December | |||
1998. | ||||
[RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. | [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. | |||
Weiss, An Architecture for Differentiated Services, RFC 2475, | and W. Weiss, "An Architecture for Differentiated | |||
December 1998. | Services", RFC 2475, December 1998. | |||
[RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit | [RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add | |||
Congestion Notification (ECN) to IP, RFC 2481, January 1999. | Explicit Congestion Notification (ECN) to IP", RFC 2481, | |||
January 1999. | ||||
[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", | [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion | |||
RFC 2581, April 1999. | Control", RFC 2581, April 1999. | |||
[RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation | [RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of | |||
of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, | Explicit Congestion Notification (ECN) in IP Networks", | |||
July 2000. | RFC 2884, July 2000. | |||
[RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983, | [RFC2983] Black, D., "Differentiated Services and Tunnels", | |||
October 2000. | RFC 2983, October 2000. | |||
[RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For | [RFC2780] Bradner, S. and V. Paxson, "IANA Allocation Guidelines | |||
Values In the Internet Protocol and Related Headers", RFC 2780, March | For Values In the Internet Protocol and Related | |||
2000. | Headers", BCP 37, RFC 2780, March 2000. | |||
[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for | [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback | |||
Congestion Avoidance in Computer Networks", ACM Transactions on | Scheme for Congestion Avoidance in Computer Networks", | |||
Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. | ACM Transactions on Computer Systems, Vol.8, No.2, pp. | |||
158-181, May 1990. | ||||
[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | |||
Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM | Anderson, TCP Congestion Control with a Misbehaving | |||
Computer Communications Review, October 1999. | Receiver, ACM Computer Communications Review, October | |||
1999. | ||||
[TBIT] Jitendra Padhye and Sally Floyd, "Identifying the TCP Behavior | [TBIT] Jitendra Padhye and Sally Floyd, "Identifying the TCP | |||
of Web Servers", ICSI TR-01-002, February 2001. URL | Behavior of Web Servers", ICSI TR-01-002, February 2001. | |||
"http://www.aciri.org/tbit/". | URL "http://www.aciri.org/tbit/". | |||
16. Security Considerations | 16. Security Considerations | |||
Security considerations have been discussed in Sections 7, 8, 18, and | Security considerations have been discussed in Sections 7, 8, 18, and | |||
19. | 19. | |||
17. IPv4 Header Checksum Recalculation | 17. IPv4 Header Checksum Recalculation | |||
IPv4 header checksum recalculation is an issue with some high-end | IPv4 header checksum recalculation is an issue with some high-end | |||
router architectures using an output-buffered switch, since most if | router architectures using an output-buffered switch, since most if | |||
not all of the header manipulation is performed on the input side of | not all of the header manipulation is performed on the input side of | |||
the switch, while the ECN decision would need to be made local to the | the switch, while the ECN decision would need to be made local to the | |||
output buffer. This is not an issue for IPv6, since there is no IPv6 | output buffer. This is not an issue for IPv6, since there is no IPv6 | |||
header checksum. The IPv4 TOS octet is the last byte of a 16-bit | header checksum. The IPv4 TOS octet is the last byte of a 16-bit | |||
half-word. | half-word. | |||
RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 | RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 | |||
checksum after the TTL field is decremented. The incremental | checksum after the TTL field is decremented. The incremental | |||
updating of the IPv4 checksum after the CE codepoint was set would | updating of the IPv4 checksum after the CE codepoint was set would | |||
work as follows: Let HC be the original header checksum for an ECT(0) | work as follows: Let HC be the original header checksum for an ECT(0) | |||
packet, and let HC' be the new header checksum after the CE checksum | packet, and let HC' be the new header checksum after the CE bit has | |||
has been set. That is, the ECN field has changed from '10' to '11'. | been set. That is, the ECN field has changed from '10' to '11'. | |||
Then for header checksums calculated with one's complement | Then for header checksums calculated with one's complement | |||
subtraction, HC' would be recalculated as follows: | subtraction, HC' would be recalculated as follows: | |||
HC' = { HC - 1 HC > 1 | HC' = { HC - 1 HC > 1 | |||
{ 0x0000 HC = 1 | { 0x0000 HC = 1 | |||
For header checksums calculated on two's complement machines, HC' would | For header checksums calculated on two's complement machines, HC' | |||
be recalculated as follows after the CE bit was set: | would be recalculated as follows after the CE bit was set: | |||
HC' = { HC - 1 HC > 0 | HC' = { HC - 1 HC > 0 | |||
{ 0xFFFE HC = 0 | { 0xFFFE HC = 0 | |||
A similar incremental updating of the IPv4 checksum can be carried out | A similar incremental updating of the IPv4 checksum can be carried | |||
when the ECN field is changed from ECT(1) to CE, that is, from '01' to | out when the ECN field is changed from ECT(1) to CE, that is, from | |||
'11'. | '01' to '11'. | |||
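The checksum arithmetic above can be checked mechanically. The following Python sketch is illustrative only: the sample header bytes and all function names are hypothetical, and the general update function uses the form HC' = ~(~HC + ~m + m') from RFC 1624 (a later refinement of RFC 1141, not cited by this document), where m and m' are the old and new values of the 16-bit word that holds the TOS octet. It compares the two's complement rule for an ECT(0) to CE change, and the general update, against a full recomputation of the header checksum.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """One's complement sum of big-endian 16-bit words, folding carries back in."""
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def ipv4_checksum(header: bytes) -> int:
    """Full recomputation, treating the checksum field (bytes 10-11) as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum16(zeroed) & 0xFFFF

def set_ce(header: bytes) -> bytes:
    """Copy of the header with the ECN field (low two bits of TOS) set to CE '11'.
    The stored checksum field is deliberately left stale."""
    h = bytearray(header)
    h[1] |= 0b11
    return bytes(h)

def ce_update_twos_complement(hc: int) -> int:
    """Rule from the text for ECT(0) '10' -> CE '11' on a two's complement machine."""
    return hc - 1 if hc > 0 else 0xFFFE

def incremental_update(hc: int, old_word: int, new_word: int) -> int:
    """General incremental update HC' = ~(~HC + ~m + m') (RFC 1624 form)."""
    total = (~hc & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Hypothetical 20-byte header with TOS 0x02, i.e. ECN field ECT(0) '10'.
hdr = bytes.fromhex("450200541c46400040060000c0a80001c0a800c7")
hc = ipv4_checksum(hdr)
marked = set_ce(hdr)
old_w = struct.unpack("!H", hdr[:2])[0]
new_w = struct.unpack("!H", marked[:2])[0]
assert ce_update_twos_complement(hc) == ipv4_checksum(marked)
assert incremental_update(hc, old_w, new_w) == ipv4_checksum(marked)
```

Because the TOS octet is the low byte of its 16-bit word, ECT(0) to CE adds 1 to that word and ECT(1) to CE adds 2; the general form covers both cases, while the special-case rule covers only the first.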
18. Possible Changes to the ECN Field in the Network | 18. Possible Changes to the ECN Field in the Network | |||
This section discusses in detail possible changes to the ECN field in | This section discusses in detail possible changes to the ECN field in | |||
the network, such as falsely reporting congestion, disabling ECN- | the network, such as falsely reporting congestion, disabling ECN- | |||
Capability for an individual packet, erasing the ECN congestion | Capability for an individual packet, erasing the ECN congestion | |||
indication, or falsely indicating ECN-Capability. | indication, or falsely indicating ECN-Capability. | |||
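The kinds of change listed above can be told apart from the pair (received codepoint, transmitted codepoint) alone. A minimal Python sketch, assuming the codepoint values used in this document (Not-ECT '00', ECT(1) '01', ECT(0) '10', CE '11'); the function name and the effect labels are hypothetical. Note that a router which legitimately sets CE because it is itself congested produces the same transition as one that falsely reports congestion, so the sketch cannot tell them apart.

```python
NOT_ECT, ECT1, ECT0, CE = "00", "01", "10", "11"
ECT = {ECT0, ECT1}

def classify_change(received: str, transmitted: str) -> set:
    """Effects of rewriting the ECN field, using the convention
    'codepoint of received packet -> codepoint of packet transmitted'."""
    effects = set()
    if received == transmitted:
        return effects
    if received == CE and transmitted != CE:
        effects.add("erases congestion indication")
    if received in ECT and transmitted == CE:
        # False report unless this router is itself congested.
        effects.add("reports congestion")
    if received in ECT | {CE} and transmitted == NOT_ECT:
        effects.add("disables ECN-Capability")
    if received == NOT_ECT and transmitted in ECT | {CE}:
        effects.add("falsely indicates ECN-Capability")
    if received in ECT and transmitted in ECT:
        effects.add("changes ECT codepoint only")
    return effects
```

For example, `classify_change("11", "10")` returns only the erasure effect, while `classify_change("11", "00")` both erases the congestion indication and disables ECN-Capability.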
18.1. Possible Changes to the IP Header | 18.1. Possible Changes to the IP Header | |||
18.1.1. Erasing the Congestion Indication | 18.1.1. Erasing the Congestion Indication | |||
First, we consider the changes that a router could make that would | First, we consider the changes that a router could make that would | |||
result in effectively erasing the congestion indication after it had | result in effectively erasing the congestion indication after it had | |||
been set by a router upstream. The convention followed is: | been set by a router upstream. The convention followed is: ECN | |||
ECN codepoint of received packet -> ECN codepoint of packet | codepoint of received packet -> ECN codepoint of packet transmitted. | |||
transmitted. | ||||
Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint | Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint | |||
effectively erases the congestion indication. However, with the use | effectively erases the congestion indication. However, with the use | |||
of two ECT codepoints, a router erasing the CE codepoint has no way | of two ECT codepoints, a router erasing the CE codepoint has no way | |||
to know whether the original ECT codepoint was ECT(0) or ECT(1). | to know whether the original ECT codepoint was ECT(0) or ECT(1). | |||
Thus, it is possible for the transport protocol to deploy mechanisms | Thus, it is possible for the transport protocol to deploy mechanisms | |||
to detect such erasures of the CE codepoint. | to detect such erasures of the CE codepoint. | |||
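Because a router that erases CE has to guess which ECT codepoint to write back, a sender that chooses between ECT(0) and ECT(1) unpredictably can catch a wrong guess. The Python sketch below is a simplified illustration in the spirit of the ECN nonce mentioned in the acknowledgements; it is not that mechanism (which returns a cumulative nonce sum rather than echoing individual codepoints), it assumes a receiver that honestly echoes every codepoint it sees, and all names are hypothetical.

```python
import random

ECT1, ECT0, CE = 0b01, 0b10, 0b11

def choose_ect(rng: random.Random, n: int) -> list:
    """Sender: pick ECT(0) or ECT(1) at random for each packet and remember it."""
    return [rng.choice((ECT0, ECT1)) for _ in range(n)]

def traverse(sent: list, congested: set, erase: bool, rng: random.Random) -> list:
    """Network: a congested router sets CE; a misbehaving router downstream
    may erase it, guessing an ECT codepoint since it cannot know the original."""
    out = []
    for i, cp in enumerate(sent):
        if i in congested:
            cp = CE
            if erase:
                cp = rng.choice((ECT0, ECT1))
        out.append(cp)
    return out

def detect_erasures(sent: list, echoed: list) -> list:
    """Indices where a non-CE echo differs from what was sent: proof of tampering."""
    return [i for i, (s, r) in enumerate(zip(sent, echoed)) if r != CE and r != s]

rng = random.Random(2001)
sent = choose_ect(rng, 1000)
congested = set(range(0, 1000, 2))
suspect = detect_erasures(sent, traverse(sent, congested, True, rng))
```

Each erased packet is caught only when the guess is wrong, so detection is probabilistic (about one erasure in two per packet here), but repeated erasures are very unlikely to go unnoticed.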
The consequence of the erasure of the CE codepoint for the upstream | The consequence of the erasure of the CE codepoint for the upstream | |||
router is that there is a potential for congestion to build for a | router is that there is a potential for congestion to build for a | |||
skipping to change at page 49, line 5 | skipping to change at page 48, line 41 | |||
sequence numbers, and any attacker with this ability and with the | sequence numbers, and any attacker with this ability and with the | |||
ability to spoof IP source addresses could damage the TCP connection | ability to spoof IP source addresses could damage the TCP connection | |||
without using the ECN flags. Therefore, ECN does not add any new | without using the ECN flags. Therefore, ECN does not add any new | |||
vulnerabilities in this respect. | vulnerabilities in this respect. | |||
An acknowledgement packet with a spoofed IP source address of the TCP | An acknowledgement packet with a spoofed IP source address of the TCP | |||
data receiver could include the ECE bit set. If accepted by the TCP | data receiver could include the ECE bit set. If accepted by the TCP | |||
data sender as a valid packet, this spoofed acknowledgement packet | data sender as a valid packet, this spoofed acknowledgement packet | |||
could result in the TCP data sender unnecessarily halving its | could result in the TCP data sender unnecessarily halving its | |||
congestion window. However, to be accepted by the data sender, such | congestion window. However, to be accepted by the data sender, such | |||
a spoofed acknowledgement packet would have to have the correct | a spoofed acknowledgement packet would have to have the correct 32- | |||
32-bit sequence number as well as a valid acknowledgement number. An | bit sequence number as well as a valid acknowledgement number. An | |||
attacker that could successfully send such a spoofed acknowledgement | attacker that could successfully send such a spoofed acknowledgement | |||
packet could also send a spoofed RST packet, or do other equally | packet could also send a spoofed RST packet, or do other equally | |||
damaging operations to the TCP connection. | damaging operations to the TCP connection. | |||
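The protection described here rests on TCP's ordinary acceptance checks. A simplified Python sketch of those checks for a pure ACK arriving at the data sender follows. It assumes RFC 793's acceptability test for a zero-length segment with a non-zero receive window, treats an acknowledgement number as valid when SND.UNA <= SEG.ACK <= SND.NXT, and ignores RST, SYN and window-update processing; the function names are hypothetical.

```python
SEQ_MOD = 1 << 32

def seq_leq(a: int, b: int) -> bool:
    """a <= b in 32-bit sequence-number space (valid while b - a < 2**31)."""
    return (b - a) % SEQ_MOD < (1 << 31)

def seq_lt(a: int, b: int) -> bool:
    return a != b and seq_leq(a, b)

def pure_ack_acceptable(seg_seq: int, seg_ack: int, rcv_nxt: int, rcv_wnd: int,
                        snd_una: int, snd_nxt: int) -> bool:
    """Simplified test a data sender applies before acting on a pure ACK,
    and therefore before reacting to its ECE bit. Assumes rcv_wnd > 0."""
    in_window = seq_leq(rcv_nxt, seg_seq) and seq_lt(seg_seq, (rcv_nxt + rcv_wnd) % SEQ_MOD)
    valid_ack = seq_leq(snd_una, seg_ack) and seq_leq(seg_ack, snd_nxt)
    return in_window and valid_ack
```

For an off-path attacker guessing both numbers uniformly at random, the chance that a single spoofed ACK passes is roughly rcv_wnd/2^32 times (SND.NXT - SND.UNA + 1)/2^32, the same barrier that protects against a spoofed RST.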
Packets with a spoofed IP source address of the TCP data sender could | Packets with a spoofed IP source address of the TCP data sender could | |||
include the CWR bit set. Again, to be accepted, such a packet would | include the CWR bit set. Again, to be accepted, such a packet would | |||
have to have a valid sequence number. In addition, such a spoofed | have to have a valid sequence number. In addition, such a spoofed | |||
packet would have a limited performance impact. Spoofing a data | packet would have a limited performance impact. Spoofing a data | |||
packet with the CWR bit set could result in the TCP data receiver | packet with the CWR bit set could result in the TCP data receiver | |||
sending fewer ECE packets than it would otherwise, if the data | sending fewer ECE packets than it would otherwise, if the data | |||
skipping to change at page 51, line 35 | skipping to change at page 51, line 25 | |||
in the network. | in the network. | |||
In some cases, the increase in the level of congestion will lead to a | In some cases, the increase in the level of congestion will lead to a | |||
substantial buffer buildup at the congested queue that will be | substantial buffer buildup at the congested queue that will be | |||
sufficient to drive the congested queue from the packet-marking to | sufficient to drive the congested queue from the packet-marking to | |||
the packet-dropping regime. This transition could occur either | the packet-dropping regime. This transition could occur either | |||
because of buffer overflow, or because of the active queue management | because of buffer overflow, or because of the active queue management | |||
policy described above that drops packets when the average queue is | policy described above that drops packets when the average queue is | |||
above RED's maximum threshold. At this point, all flows, including | above RED's maximum threshold. At this point, all flows, including | |||
the subverted flow, will begin to see packet drops instead of packet | the subverted flow, will begin to see packet drops instead of packet | |||
marks, and a malicious or broken router will no longer be able to | marks, and a malicious or broken router will no longer be able to | |||
`erase' these indications of congestion in the network. If the end | `erase' these indications of congestion in the network. If the end | |||
nodes are deploying appropriate end-to-end congestion control, then | nodes are deploying appropriate end-to-end congestion control, then | |||
the subverted flow will reduce its arrival rate in response to | the subverted flow will reduce its arrival rate in response to | |||
congestion. When the level of congestion is sufficiently reduced, | congestion. When the level of congestion is sufficiently reduced, | |||
the congested queue can return from the packet-dropping regime to the | the congested queue can return from the packet-dropping regime to the | |||
packet-marking regime. The steady-state pattern could be one of the | packet-marking regime. The steady-state pattern could be one of the | |||
congested queue oscillating between these two regimes. | congested queue oscillating between these two regimes. | |||
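The two regimes can be sketched as a function of the average queue size. This follows the basic RED algorithm of [FJ93] with a linear probability between the thresholds; the real algorithm's count-based adjustment and averaging filter are omitted, buffer overflow (which drops regardless of the average) is not modeled, and the function name and parameter values are assumptions for illustration.

```python
def red_regime(avg_queue: float, min_th: float, max_th: float, max_p: float):
    """Return (regime, p). In the packet-marking regime, p is the probability
    that an arriving packet is marked (if ECN-capable) or dropped (otherwise)."""
    if avg_queue < min_th:
        return "no congestion", 0.0
    if avg_queue >= max_th:
        # Above the maximum threshold every arriving packet is dropped,
        # so a router that erases CE can no longer hide the congestion.
        return "packet-dropping", 1.0
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return "packet-marking", p

regime, p = red_regime(avg_queue=10, min_th=5, max_th=15, max_p=0.1)
```

The oscillation described above corresponds to the average queue repeatedly crossing max_th: subverted flows push it over, end-to-end responses to the resulting drops pull it back under.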
In other cases, the consequences of subverting end-to-end congestion | In other cases, the consequences of subverting end-to-end congestion | |||
control will not be severe enough to drive the congested link into | control will not be severe enough to drive the congested link into | |||
sufficiently-heavy congestion that packets are dropped instead of | sufficiently-heavy congestion that packets are dropped instead of | |||
skipping to change at page 52, line 28 | skipping to change at page 52, line 21 | |||
Let us take the example described in Section 18.1.1, where the CE | Let us take the example described in Section 18.1.1, where the CE | |||
codepoint that was set in a packet is erased: {'11' -> '10' or '11' | codepoint that was set in a packet is erased: {'11' -> '10' or '11' | |||
-> '01'}. The consequence for the congested upstream router that set | -> '01'}. The consequence for the congested upstream router that set | |||
the CE codepoint is that this congestion indication does not reach | the CE codepoint is that this congestion indication does not reach | |||
the end nodes for that flow. The source (even one which is completely | the end nodes for that flow. The source (even one which is completely | |||
cooperative and not malicious) is thus allowed to continue to | cooperative and not malicious) is thus allowed to continue to | |||
increase its sending rate (if it is a TCP flow, by increasing its | increase its sending rate (if it is a TCP flow, by increasing its | |||
congestion window). The flow potentially achieves better throughput | congestion window). The flow potentially achieves better throughput | |||
than the other flows that also share the congested router, especially | than the other flows that also share the congested router, especially | |||
if there are no policing mechanisms or per-flow queueing mechanisms | if there are no policing mechanisms or per-flow queuing mechanisms at | |||
at that router. Consider the behavior of the other flows, especially | that router. Consider the behavior of the other flows, especially if | |||
if they are cooperative: that is, the flows that do not experience | they are cooperative: that is, the flows that do not experience | |||
subverted end-to-end congestion control. They are likely to reduce | subverted end-to-end congestion control. They are likely to reduce | |||
their load (e.g., by reducing their window size) on the congested | their load (e.g., by reducing their window size) on the congested | |||
router, thus benefiting our subverted flow. This results in | router, thus benefiting our subverted flow. This results in | |||
unfairness. As we discussed above, this unfairness could either be | unfairness. As we discussed above, this unfairness could either be | |||
transient (because the congested queue is driven into the packet- | transient (because the congested queue is driven into the packet- | |||
marking regime), oscillatory (because the congested queue oscillates | marking regime), oscillatory (because the congested queue oscillates | |||
between the packet marking and the packet dropping regime), or more | between the packet marking and the packet dropping regime), or more | |||
moderate but a persistent stable state (because the congested queue | moderate but a persistent stable state (because the congested queue | |||
is never driven to the packet dropping regime). | is never driven to the packet dropping regime). | |||
skipping to change at page 53, line 17 | skipping to change at page 53, line 11 | |||
network posed by the subversion of either ECN-based or other | network posed by the subversion of either ECN-based or other | |||
currently known packet-based congestion control mechanisms by the end | currently known packet-based congestion control mechanisms by the end | |||
nodes. | nodes. | |||
19.2. Implications for the Subverted Flow | 19.2. Implications for the Subverted Flow | |||
When a source indicates that it is ECN-capable, there is an | When a source indicates that it is ECN-capable, there is an | |||
expectation that the routers in the network that are capable of | expectation that the routers in the network that are capable of | |||
participating in ECN will use the CE codepoint for indication of | participating in ECN will use the CE codepoint for indication of | |||
congestion. There is the potential benefit of using ECN in reducing | congestion. There is the potential benefit of using ECN in reducing | |||
the amount of packet loss (in addition to the reduced queueing delays | the amount of packet loss (in addition to the reduced queuing delays | |||
because of active queue management policies). When the packet flows | because of active queue management policies). When the packet flows | |||
through an IPsec tunnel where the nodes that the tunneled packets | through an IPsec tunnel where the nodes that the tunneled packets | |||
traverse are untrusted in some way, the expectation is that IPsec | traverse are untrusted in some way, the expectation is that IPsec | |||
will protect the flow from subversion that results in undesirable | will protect the flow from subversion that results in undesirable | |||
consequences. | consequences. | |||
In many cases, a subverted flow will benefit from the subversion of | In many cases, a subverted flow will benefit from the subversion of | |||
end-to-end congestion control for that flow in the network, by | end-to-end congestion control for that flow in the network, by | |||
receiving more bandwidth than it would have otherwise, relative to | receiving more bandwidth than it would have otherwise, relative to | |||
competing non-subverted flows. If the congested queue reaches the | competing non-subverted flows. If the congested queue reaches the | |||
skipping to change at page 54, line 14 | skipping to change at page 54, line 7 | |||
CE codepoint is set within the tunnel, and erased either within or | CE codepoint is set within the tunnel, and erased either within or | |||
downstream of the tunnel, this is not necessarily detected at the | downstream of the tunnel, this is not necessarily detected at the | |||
egress point of the tunnel. | egress point of the tunnel. | |||
With this subversion of end-to-end congestion control, an end-system | With this subversion of end-to-end congestion control, an end-system | |||
transport does not respond to the congestion indication. Along with | transport does not respond to the congestion indication. Along with | |||
the increased unfairness for the non-subverted flows described in the | the increased unfairness for the non-subverted flows described in the | |||
previous section, the congested router's queue could continue to | previous section, the congested router's queue could continue to | |||
build, resulting in packet loss at the congested router - which is a | build, resulting in packet loss at the congested router - which is a | |||
means for indicating congestion to the transport in any case. In the | means for indicating congestion to the transport in any case. In the | |||
interim, the flow might experience higher queueing delays, possibly | interim, the flow might experience higher queuing delays, possibly | |||
along with an increased bandwidth relative to other non-subverted | along with an increased bandwidth relative to other non-subverted | |||
flows. But transports do not inherently make assumptions of | flows. But transports do not inherently make assumptions of | |||
consistently experiencing carefully managed queueing in the path. We | consistently experiencing carefully managed queuing in the path. We | |||
believe that these forms of subverting end-to-end congestion control | believe that these forms of subverting end-to-end congestion control | |||
are no worse for the subverted flow than if the adversary had simply | are no worse for the subverted flow than if the adversary had simply | |||
dropped the packets of that flow itself. | dropped the packets of that flow itself. | |||
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | |||
We have shown that, in many cases, a malicious or broken router that | We have shown that, in many cases, a malicious or broken router that | |||
is able to change the bits in the ECN field can do no more damage | is able to change the bits in the ECN field can do no more damage | |||
than if it had simply dropped the packet in question. However, this | than if it had simply dropped the packet in question. However, this | |||
is not true in all cases, in particular in the cases where the broken | is not true in all cases, in particular in the cases where the broken | |||
skipping to change at page 55, line 28 | skipping to change at page 55, line 18 | |||
If there was no ECT codepoint, then the router would have to set the | If there was no ECT codepoint, then the router would have to set the | |||
CE codepoint for packets from both ECN-capable and non-ECN-capable | CE codepoint for packets from both ECN-capable and non-ECN-capable | |||
flows. In this case, there would be no incentive for end-nodes to | flows. In this case, there would be no incentive for end-nodes to | |||
deploy ECN, and no viable path of incremental deployment from a non- | deploy ECN, and no viable path of incremental deployment from a non- | |||
ECN world to an ECN-capable world. Consider the first stages of such | ECN world to an ECN-capable world. Consider the first stages of such | |||
an incremental deployment, where a subset of the flows are ECN- | an incremental deployment, where a subset of the flows are ECN- | |||
capable. At the onset of congestion, when the packet | capable. At the onset of congestion, when the packet | |||
dropping/marking rate would be low, routers would only set CE | dropping/marking rate would be low, routers would only set CE | |||
codepoints, rather than dropping packets. However, only those flows | codepoints, rather than dropping packets. However, only those flows | |||
that are ECN-capable would understand and respond to CE packets. The | that are ECN-capable would understand and respond to CE packets. The | |||
result is that the ECN-capable flows would back off, and the non-ECN- | result is that the ECN-capable flows would back off, and the non- | |||
capable flows would be unaware of the ECN signals and would continue | ECN-capable flows would be unaware of the ECN signals and would | |||
to open their congestion windows. | continue to open their congestion windows. | |||
In this case, there are two possible outcomes: (1) the ECN-capable | In this case, there are two possible outcomes: (1) the ECN-capable | |||
flows back off, the non-ECN-capable flows get all of the bandwidth, | flows back off, the non-ECN-capable flows get all of the bandwidth, | |||
and congestion remains mild, or (2) the ECN-capable flows back off, | and congestion remains mild, or (2) the ECN-capable flows back off, | |||
the non-ECN-capable flows don't, and congestion increases until the | the non-ECN-capable flows don't, and congestion increases until the | |||
router transitions from setting the CE codepoint to dropping packets. | router transitions from setting the CE codepoint to dropping packets. | |||
While this second outcome evens out the fairness, the ECN-capable | While this second outcome evens out the fairness, the ECN-capable | |||
flows would still receive little benefit from being ECN-capable, | flows would still receive little benefit from being ECN-capable, | |||
because the increased congestion would drive the router to packet- | because the increased congestion would drive the router to packet- | |||
dropping behavior. | dropping behavior. | |||
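The ECT codepoint lets a router decide, packet by packet, whether to set CE or to drop. The following is a minimal Python sketch of that decision. It is not an implementation taken from this specification. The helper name `congestion_action` is hypothetical, and the sketch assumes the router's active queue management has already selected the packet for a congestion indication. It leaves out the overload case, in which a router drops even ECN-capable packets.

```python
# Hypothetical sketch: what a router does once its AQM has chosen a packet
# for congestion indication. The ECN field is the two low-order bits of the
# IPv4 TOS byte / IPv6 Traffic Class octet (bits 6-7 in RFC numbering).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def congestion_action(tos: int) -> tuple[str, int]:
    """Return (action, new_tos) for a packet selected for a congestion signal."""
    ecn = tos & 0x03
    if ecn == NOT_ECT:
        # The transport cannot understand CE, so a drop is the only signal.
        return ("drop", tos)
    # ECT(0), ECT(1), or already CE: set (or keep) CE instead of dropping,
    # leaving the DSCP bits (0-5) unchanged.
    return ("mark", (tos & ~0x03) | CE)


print(congestion_action(0xB9))  # ('mark', 187): DSCP 46 with ECT(1) becomes CE
print(congestion_action(0xB8))  # ('drop', 184): DSCP 46 with Not-ECT
```

Without an ECT codepoint, the `ecn == NOT_ECT` branch could not be evaluated. That is the incremental-deployment problem described above.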
skipping to change at page 56, line 33 | skipping to change at page 56, line 24 | |||
codepoint would have to be done sparingly, and would be a less | codepoint would have to be done sparingly, and would be a less | |||
effective check against misbehaving network elements and receivers | effective check against misbehaving network elements and receivers | |||
than would be the ECN nonce. | than would be the ECN nonce. | |||
The assignment of the fourth ECN codepoint to ECT(1) precludes the | The assignment of the fourth ECN codepoint to ECT(1) precludes the | |||
use of this codepoint for some other purposes. For clarity, we | use of this codepoint for some other purposes. For clarity, we | |||
briefly list other possible purposes here. | briefly list other possible purposes here. | |||
One possibility might have been for the data sender to use the fourth | One possibility might have been for the data sender to use the fourth | |||
ECN codepoint to indicate an alternate semantics for ECN. However, | ECN codepoint to indicate an alternate semantics for ECN. However, | |||
this seems to us more appropriate to be signalled using a | this seems to us more appropriate to be signaled using a | |||
differentiated services codepoint in the DS field. | differentiated services codepoint in the DS field. | |||
A second possible use for the fourth ECN codepoint would have been to | A second possible use for the fourth ECN codepoint would have been to | |||
give the router two separate codepoints for the indication of | give the router two separate codepoints for the indication of | |||
congestion, CE(0) and CE(1), for mild and severe congestion | congestion, CE(0) and CE(1), for mild and severe congestion | |||
respectively. While this could be useful in some cases, this | respectively. While this could be useful in some cases, this | |||
certainly does not seem a compelling requirement at this point. If | certainly does not seem a compelling requirement at this point. If | |||
there was judged to be a compelling need for this, the complications | there was judged to be a compelling need for this, the complications | |||
of incremental deployment would most likely necessitate more than | of incremental deployment would most likely necessitate more than | |||
just one codepoint for this function. | just one codepoint for this function. | |||
skipping to change at page 59, line 13 | skipping to change at page 59, line 5 | |||
to overcome these limitations. | to overcome these limitations. | |||
22. Historical Definitions for the IPv4 TOS Octet | 22. Historical Definitions for the IPv4 TOS Octet | |||
RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP | RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP | |||
header. In RFC 791, bits 6 and 7 of the ToS octet are listed as | header. In RFC 791, bits 6 and 7 of the ToS octet are listed as | |||
"Reserved for Future Use", and are shown set to zero. The first two | "Reserved for Future Use", and are shown set to zero. The first two | |||
fields of the ToS octet were defined as the Precedence and Type of | fields of the ToS octet were defined as the Precedence and Type of | |||
Service (TOS) fields. | Service (TOS) fields. | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| PRECEDENCE | TOS | 0 | 0 | RFC 791 | | PRECEDENCE | TOS | 0 | 0 | RFC 791 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
RFC 1122 included bits 6 and 7 in the TOS field, though it did not | RFC 1122 included bits 6 and 7 in the TOS field, though it did not | |||
discuss any specific use for those two bits: | discuss any specific use for those two bits: | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| PRECEDENCE | TOS | RFC 1122 | | PRECEDENCE | TOS | RFC 1122 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: | The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| PRECEDENCE | TOS | MBZ | RFC 1349 | | PRECEDENCE | TOS | MBZ | RFC 1349 | |||
+-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary | Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary | |||
Cost". In addition to the Precedence and Type of Service (TOS) | Cost". In addition to the Precedence and Type of Service (TOS) | |||
fields, the last field, MBZ (for "must be zero") was defined as | fields, the last field, MBZ (for "must be zero") was defined as | |||
currently unused. RFC 1349 stated that "The originator of a datagram | currently unused. RFC 1349 stated that "The originator of a datagram | |||
sets [the MBZ] field to zero (unless participating in an Internet | sets [the MBZ] field to zero (unless participating in an Internet | |||
protocol experiment which makes use of that bit)." | protocol experiment which makes use of that bit)." | |||
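The following Python helpers sketch how a single octet reads under the RFC 791 and RFC 1349 layouts shown above. The helper names are hypothetical and are not defined by any RFC. Fields are extracted using RFC bit numbering, in which bit 0 is the most significant bit of the octet.

```python
# Sketch: split an IPv4 TOS octet according to two historical layouts.
# RFC bit numbering: bit 0 is the MSB, so bits 6-7 are the two LSBs.


def bits(octet: int, first: int, last: int) -> int:
    """Extract RFC-numbered bits first..last (inclusive) of an 8-bit octet."""
    width = last - first + 1
    shift = 7 - last
    return (octet >> shift) & ((1 << width) - 1)


def rfc791_fields(tos: int) -> dict:
    # PRECEDENCE (bits 0-2), TOS (bits 3-5), reserved bits 6-7.
    return {"precedence": bits(tos, 0, 2), "tos": bits(tos, 3, 5),
            "reserved": bits(tos, 6, 7)}


def rfc1349_fields(tos: int) -> dict:
    # PRECEDENCE (bits 0-2), 4-bit TOS (bits 3-6), MBZ (bit 7).
    return {"precedence": bits(tos, 0, 2), "tos": bits(tos, 3, 6),
            "mbz": bits(tos, 7, 7),
            # Bit 6 is "Minimize Monetary Cost" in RFC 1349.
            "minimize_monetary_cost": bits(tos, 6, 6)}


print(rfc791_fields(0x10))   # 3-bit TOS field reads 4 (binary 100)
print(rfc1349_fields(0x10))  # 4-bit TOS field reads 8 (binary 1000)
```

The octet 0x10 sets only bit 3. RFC 791 reads its 3-bit TOS field as binary 100, and RFC 1349 reads its wider 4-bit TOS field as binary 1000.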
RFC 1455 [RFC 1455] defined an experimental standard that used all | RFC 1455 [RFC 1455] defined an experimental standard that used all | |||
four bits in the TOS field to request a guaranteed level of link | four bits in the TOS field to request a guaranteed level of link | |||
skipping to change at page 60, line 29 | skipping to change at page 60, line 20 | |||
prior to 2474. Such nodes may transmit bit 6 (the first ECN bit) as | prior to 2474. Such nodes may transmit bit 6 (the first ECN bit) as | |||
one for the "Minimize Monetary Cost" provision of RFC 1349 or the | one for the "Minimize Monetary Cost" provision of RFC 1349 or the | |||
experiment authorized by RFC 1455; neither this aspect of RFC 1349 | experiment authorized by RFC 1455; neither this aspect of RFC 1349 | |||
nor the experiment in RFC 1455 were widely implemented or used. The | nor the experiment in RFC 1455 were widely implemented or used. The | |||
damage that could be done by a broken, non-conformant router would | damage that could be done by a broken, non-conformant router would | |||
include "erasing" the CE codepoint for an ECN-capable packet that | include "erasing" the CE codepoint for an ECN-capable packet that | |||
arrived at the router with the CE codepoint set, or setting the CE | arrived at the router with the CE codepoint set, or setting the CE | |||
codepoint even in the absence of congestion. This has been discussed | codepoint even in the absence of congestion. This has been discussed | |||
in the section on "Non-compliance in the Network". | in the section on "Non-compliance in the Network". | |||
The damage that could be done in an ECN-capable environment by a non- | The damage that could be done in an ECN-capable environment by a | |||
ECN-capable end-node transmitting packets with the ECT codepoint set | non-ECN-capable end-node transmitting packets with the ECT codepoint | |||
has been discussed in the section on "Non-compliance by the End | set has been discussed in the section on "Non-compliance by the End | |||
Nodes". | Nodes". | |||
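This short Python sketch shows why a legacy RFC 1349 sender that sets "Minimize Monetary Cost" looks to an ECN-capable router as if it were sending ECT(0). When MBZ (bit 7) is left at zero, the ECN field (bits 6-7) reads as binary 10. The name `as_rfc3168` is illustrative only.

```python
# Illustrative sketch: read a TOS octet with the RFC 2474 / RFC 3168 split
# (DSCP in bits 0-5, ECN field in bits 6-7).
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

# "Minimize Monetary Cost" is RFC bit 6 of the TOS octet (bit 0 = MSB),
# i.e. mask 0x02 when the octet is read as an integer.
MMC_MASK = 0x02


def as_rfc3168(tos: int) -> tuple[int, str]:
    """Return (DSCP, ECN codepoint name) for a TOS / Traffic Class octet."""
    return tos >> 2, ECN_NAMES[tos & 0x03]


# A legacy sender with all other TOS bits clear and MBZ = 0.
legacy_tos = MMC_MASK
print(as_rfc3168(legacy_tos))  # (0, 'ECT(0)')
```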
23. IANA Considerations | 23. IANA Considerations | |||
This section contains the namespaces that have either been created in | This section contains the namespaces that have either been created in | |||
this specification, or the values assigned in existing namespaces | this specification, or the values assigned in existing namespaces | |||
managed by IANA. | managed by IANA. | |||
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet | 23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet | |||
The codepoints for the ECN Field of the IP header are specified by | The codepoints for the ECN Field of the IP header are specified by | |||
the Standards Action of this RFC, as is required by RFC 2780. | the Standards Action of this RFC, as is required by RFC 2780. | |||
When this draft is published as an RFC, IANA should create a new | When this document is published as an RFC, IANA should create a new | |||
registry, "IPv4 TOS Byte and IPv6 Traffic Class Octet", with the | registry, "IPv4 TOS Byte and IPv6 Traffic Class Octet", with the | |||
namespace as follows: | namespace as follows: | |||
IPv4 TOS Byte and IPv6 Traffic Class Octet | IPv4 TOS Byte and IPv6 Traffic Class Octet | |||
Description: The registrations are identical for IPv4 and IPv6. | Description: The registrations are identical for IPv4 and IPv6. | |||
Bits 0-5: see Differentiated Services Field Codepoints Registry | Bits 0-5: see Differentiated Services Field Codepoints Registry | |||
(http://www.iana.org/assignments/dscp-registry) | (http://www.iana.org/assignments/dscp-registry) | |||
Bits 6-7, ECN Field: | Bits 6-7, ECN Field: | |||
Binary Keyword References | Binary Keyword References | |||
------ ------- ---------- | ------ ------- ---------- | |||
00 Not-ECT (Not ECN-Capable Transport) [RFC xxx] | 00 Not-ECT (Not ECN-Capable Transport) [RFC 3168] | |||
01 ECT(1) (ECN-Capable Transport(1)) [RFC xxx] | 01 ECT(1) (ECN-Capable Transport(1)) [RFC 3168] | |||
10 ECT(0) (ECN-Capable Transport(0)) [RFC xxx] | 10 ECT(0) (ECN-Capable Transport(0)) [RFC 3168] | |||
11 CE (Congestion Experienced) [RFC xxx] | 11 CE (Congestion Experienced) [RFC 3168] | |||
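The registrations are identical for IPv4 and IPv6, so a single decoder can handle both. The following is a minimal Python sketch that assumes `header` holds raw bytes starting at the first byte of the IP header. The byte offsets come from the fixed IPv4 and IPv6 header layouts rather than from this document, and the function names are hypothetical.

```python
# Sketch: decode the ECN field (bits 6-7) and DSCP (bits 0-5) from the
# IPv4 TOS byte or IPv6 Traffic Class octet in a raw IP header.
from enum import Enum


class ECN(Enum):
    NOT_ECT = 0b00  # Not ECN-Capable Transport
    ECT_1 = 0b01    # ECN-Capable Transport(1)
    ECT_0 = 0b10    # ECN-Capable Transport(0)
    CE = 0b11       # Congestion Experienced


def traffic_octet(header: bytes) -> int:
    """Return the IPv4 TOS byte or the IPv6 Traffic Class octet."""
    version = header[0] >> 4
    if version == 4:
        return header[1]  # TOS is the second byte of the IPv4 header
    if version == 6:
        # Traffic Class follows the 4-bit version and straddles bytes 0-1.
        return ((header[0] & 0x0F) << 4) | (header[1] >> 4)
    raise ValueError(f"unsupported IP version {version}")


def ecn_of(header: bytes) -> ECN:
    return ECN(traffic_octet(header) & 0x03)


def dscp_of(header: bytes) -> int:
    return traffic_octet(header) >> 2


print(ecn_of(bytes([0x45, 0x03])))              # ECN.CE (IPv4)
print(ecn_of(bytes([0x60, 0x20, 0x00, 0x00])))  # ECN.ECT_0 (IPv6)
```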
23.2. TCP Header Flags | 23.2. TCP Header Flags | |||
The codepoints for the CWR and ECE flags in the TCP header are | The codepoints for the CWR and ECE flags in the TCP header are | |||
specified by the Standards Action of this RFC, as is required by RFC | specified by the Standards Action of this RFC, as is required by RFC | |||
2780. | 2780. | |||
When this draft is published as an RFC, IANA should create a new | When this document is published as an RFC, IANA should create a new | |||
registry, "TCP Header Flags", with the namespace as follows: | registry, "TCP Header Flags", with the namespace as follows: | |||
TCP Header Flags | TCP Header Flags | |||
The Transmission Control Protocol (TCP) included a 6-bit Reserved | ||||
field defined in RFC 793, reserved for future use, in bytes | ||||
13 and 14 of the TCP header, as illustrated below. The other six | ||||
Control bits are defined separately by RFC 793. | ||||
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | The Transmission Control Protocol (TCP) included a 6-bit Reserved | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | field defined in RFC 793, reserved for future use, in bytes 13 and 14 | |||
| | | U | A | P | R | S | F | | of the TCP header, as illustrated below. The other six Control bits | |||
| Header Length | Reserved | R | C | S | S | Y | I | | are defined separately by RFC 793. | |||
| | | G | K | H | T | N | N | | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
RFC xxx defines two of the six bits from the Reserved field to be | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
used for ECN, as follows: | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | U | A | P | R | S | F | | ||||
| Header Length | Reserved | R | C | S | S | Y | I | | ||||
| | | G | K | H | T | N | N | | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | RFC 3168 defines two of the six bits from the Reserved field to be | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | used for ECN, as follows: | |||
| | | C | E | U | A | P | R | S | F | | ||||
| Header Length | Reserved | W | C | R | C | S | S | Y | I | | ||||
| | | R | E | G | K | H | T | N | N | | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
TCP Header Flags | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
| | | C | E | U | A | P | R | S | F | | ||||
| Header Length | Reserved | W | C | R | C | S | S | Y | I | | ||||
| | | R | E | G | K | H | T | N | N | | ||||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | ||||
TCP Header Flags | ||||
Bit Name Reference | Bit Name Reference | |||
--- ---- --------- | --- ---- --------- | |||
8 CWR (Congestion Window Reduced) [RFC xxx] | 8 CWR (Congestion Window Reduced) [RFC 3168] | |||
9 ECE (ECN-Echo) [RFC xxx] | 9 ECE (ECN-Echo) [RFC 3168] | |||
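The following is a minimal Python sketch of reading these flags from a raw TCP header. It assumes `header` starts at the first byte of the TCP header. "Bytes 13 and 14" in the text above are 1-based, so flag bits 8-15 sit at zero-based offset 13. The names `tcp_flags` and `header_length_bytes` are hypothetical.

```python
# Sketch: decode TCP control flags, including CWR and ECE, which RFC 3168
# takes from the former Reserved field. Masks apply to the byte at offset 13.
TCP_FLAGS = [
    ("CWR", 0x80),  # bit 8: Congestion Window Reduced
    ("ECE", 0x40),  # bit 9: ECN-Echo
    ("URG", 0x20),
    ("ACK", 0x10),
    ("PSH", 0x08),
    ("RST", 0x04),
    ("SYN", 0x02),
    ("FIN", 0x01),
]


def tcp_flags(header: bytes) -> set[str]:
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    flags_byte = header[13]
    return {name for name, mask in TCP_FLAGS if flags_byte & mask}


def header_length_bytes(header: bytes) -> int:
    # Bits 0-3 (high nibble of offset 12) give the length in 32-bit words.
    return (header[12] >> 4) * 4


syn = bytearray(20)
syn[12] = 0x50  # header length: 5 words = 20 bytes
syn[13] = 0xC2  # CWR | ECE | SYN
print(sorted(tcp_flags(bytes(syn))))  # ['CWR', 'ECE', 'SYN']
```

The example header is an ECN-setup SYN as RFC 3168's TCP negotiation defines it: a SYN with both ECE and CWR set. The corresponding ECN-setup SYN-ACK sets ECE but not CWR.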
23.3. IPSEC Security Association Attributes | 23.3. IPSEC Security Association Attributes | |||
IANA allocated the IPSEC Security Association Attribute value 10 for | IANA allocated the IPSEC Security Association Attribute value 10 for | |||
the ECN Tunnel use described in Section 9.2.1.2 above at the request | the ECN Tunnel use described in Section 9.2.1.2 above at the request | |||
of David Black in November 1999. When this draft is published as an | of David Black in November 1999. The IANA has changed the Reference | |||
RFC, IANA should change the Reference for this allocation from David | for this allocation from David Black's request to this RFC. | |||
Black's request to this RFC based on its RFC number. | ||||
AUTHORS' ADDRESSES | 24. Authors' Addresses | |||
K. K. Ramakrishnan | K. K. Ramakrishnan | |||
TeraOptic Networks, Inc. | TeraOptic Networks, Inc. | |||
Phone: +1 (408) 666-8650 | Phone: +1 (408) 666-8650 | |||
Email: kk@teraoptic.com | EMail: kk@teraoptic.com | |||
Sally Floyd | Sally Floyd | |||
Phone: +1 (510) 666-2989 | ||||
ACIRI | ACIRI | |||
Email: floyd@aciri.org | ||||
Phone: +1 (510) 666-2989 | ||||
EMail: floyd@aciri.org | ||||
URL: http://www.aciri.org/floyd/ | URL: http://www.aciri.org/floyd/ | |||
David L. Black | David L. Black | |||
EMC Corporation | EMC Corporation | |||
42 South St. | 42 South St. | |||
Hopkinton, MA 01748 | Hopkinton, MA 01748 | |||
Phone: +1 (508) 435-1000 x75140 | Phone: +1 (508) 435-1000 x75140 | |||
Email: black_david@emc.com | EMail: black_david@emc.com | |||
This draft was created in June 2001. | 25. Full Copyright Statement | |||
It expires December 2001. | ||||
Copyright (C) The Internet Society (2001). All Rights Reserved. | ||||
This document and translations of it may be copied and furnished to | ||||
others, and derivative works that comment on or otherwise explain it | ||||
or assist in its implementation may be prepared, copied, published | ||||
and distributed, in whole or in part, without restriction of any | ||||
kind, provided that the above copyright notice and this paragraph are | ||||
included on all such copies and derivative works. However, this | ||||
document itself may not be modified in any way, such as by removing | ||||
the copyright notice or references to the Internet Society or other | ||||
Internet organizations, except as needed for the purpose of | ||||
developing Internet standards in which case the procedures for | ||||
copyrights defined in the Internet Standards process must be | ||||
followed, or as required to translate it into languages other than | ||||
English. | ||||
The limited permissions granted above are perpetual and will not be | ||||
revoked by the Internet Society or its successors or assigns. | ||||
This document and the information contained herein is provided on an | ||||
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING | ||||
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING | ||||
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION | ||||
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF | ||||
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Acknowledgement | ||||
Funding for the RFC Editor function is currently provided by the | ||||
Internet Society. | ||||
End of changes. 146 change blocks. | ||||
525 lines changed or deleted | 524 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |