
Internet Engineering Task Force                Hadi Salim, J
Internet Draft                                 Nandy, B
                                               Seddigh, N
                                               Computing Technology Labs,
                                               Nortel
                                               June 1998
                  <draft-salim-jhsbnns-ecn-00.txt>

   A proposal for Backward ECN for the Internet Protocol (IPv4/IPv6)


Status of this Memo
   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check
   the "1id-abstracts.txt" listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
   (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
   (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
   (US West Coast).


Abstract

This memo proposes an alternative to the current ECN mechanism proposed
in the Internet-Draft [draft-kksjf].  A Backward ECN (BECN) mechanism
is proposed which uses an existing IP signalling mechanism, the
Internet Control Message Protocol (ICMP) [RFC 792] Source Quench
message.  The use of ICMP Source Quench (ISQ) allows a basic ECN
mechanism for IP which does not require any negotiation between end
systems.  Congestion notification is kept at the network (IP) level.
The congestion state can be reflected up to the transport layer (e.g.
TCP or UDP) for appropriate action.  The ISQ based approach reduces the
reaction time to congestion in the network.  In addition, the ISQ
message can include information on the severity of the congestion,
allowing the end host to react accordingly so as to make maximal use of
the resources while maintaining network equilibrium.







Hadi et al                Expires December 1998                 [Page 1]


Internet Draft   Backward ECN for the Internet Protocol        June 1998


1.0 Introduction

IP currently has no widely adhered-to mechanism for notifying its
transport protocols of network congestion.  ISQs have been used in the
past for congestion notification; TCP now implements its own congestion
control algorithm and infers network congestion itself: TCP-Reno and
its variants use packet losses as the indicator, whereas TCP-Vegas uses
delay/throughput.  UDP applications are usually unresponsive, and the
protocols running over UDP (e.g., RTP) use their own congestion control
methods, if they use any at all.  The initial suggestions for adding
Explicit Congestion Notification to IP are outlined in [Floyd94] and
later in the IETF draft [draft-kksjf].

1.1 Current ECN Proposal [draft-kksjf]

Bits 10 and 11 in the IPv6 header are proposed for the ECT (ECN Capable
Transport) indicator and the CE (Congestion Experienced) indicator
respectively.  Bits 6 and 7 of the IPv4 header TOS field are likewise
proposed as the ECT and CE place holders.  The TCP header is modified
to add a flag, the ECN Echo, with which the receiver notifies the
sender that it is contributing to congestion.  The flag's bit-space is
borrowed from the reserved field in the TCP header.  This bit is also
referred to interchangeably as the ECE bit in this text.

The ECT bit is set by the sender end system if both end systems are
ECN capable.  This is confirmed by negotiation during the TCP
connection setup phase.  Packets encountering congestion on their way
from the sender end system to the receiver end system are marked (CE
bit) by a router, with a probability proportional to their bandwidth
usage, following the procedure used in RED [RFC2309] routers.  When the
receiver end system receives a packet with the CE and ECT bits set, it
informs the sender end system that it is contributing to congestion by
setting the ECE bit in the ACK packet.  The sender end system reacts by
halving the congestion window upon receiving the ACK packet.  The
sender end system reacts to ECE messages only once per in-flight window
of data.

1.2 Limitations of the Current ECN Proposal [draft-kksjf]

1) The [draft-kksjf] proposal's congestion notification is coupled to
the transport layer (TCP) via the use of header information (ECE bit).
Extending this proposal to other transport protocols will require
changes to each of their respective headers.

2) The proposed [draft-kksjf] scheme requires the congestion
notification to incur a round trip time (RTT) before the sender can
react.  In a path with a high delay-bandwidth product this is
problematic for two reasons: i) where the delay-bandwidth product is
dominated by high bandwidth (as in high-speed networks), a large
amount of traffic passes through the intermediate routers, increasing
the congestion level, before the sender is notified; ii) where the
delay-bandwidth product is dominated by high latency/RTT (as in
satellite networks), the reaction takes too long to address the
congestion.  In both cases, the efficient use of the available
bandwidth is affected.

3) Because of the binary nature of the feedback, the reaction is limited
to halving the window size even if the congestion level is very low.
Network resources could be more effectively utilized if the feedback
were indicative of the congestion level at the overloaded point in the
network.
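
As a rough illustration of limitation 2, the sketch below estimates how
much data a constant-rate sender injects before a congestion signal
reaches it.  The function name and all numbers are illustrative
assumptions, not measurements: the forward scheme is modelled as one
full end-to-end RTT, and a notification returned directly by the
congested router as the round trip between the sender and that router.

```python
def bytes_sent_before_reaction(rate_bps: float, delay_s: float) -> float:
    """Bytes a sender at a constant rate injects during a signalling delay."""
    return rate_bps * delay_s / 8.0


# Hypothetical high-latency path (assumed values):
RATE_BPS = 10e6          # sender rate, 10 Mbit/s
END_TO_END_RTT_S = 0.500 # sender <-> receiver round trip
ROUTER_RTT_S = 0.020     # sender <-> congested router round trip

forward_ecn = bytes_sent_before_reaction(RATE_BPS, END_TO_END_RTT_S)
backward_ecn = bytes_sent_before_reaction(RATE_BPS, ROUTER_RTT_S)
print("forward notification :", forward_ecn, "bytes in flight")
print("backward notification:", backward_ecn, "bytes in flight")
```

With these assumed numbers the sender injects roughly 625 kilobytes
before a forward notification arrives, against roughly 25 kilobytes for
a notification sent straight back from the router.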

In this document we introduce Backward ECN (BECN), which is a binary
feedback mechanism, and then an incremental improvement to BECN which
provides multi-level feedback and which we refer to as Multilevel ECN
(MECN).

Section 2 gives an introduction to our solution and how it addresses
the above limitations: it justifies the use of ISQ and outlines
Backward ECN (BECN) and then multi-level BECN (MECN).  Section 3 goes
into the details of BECN and suggests a role for the router and the end
system.  Section 4 goes into the details of MECN and suggests a role
for the router and the end system.  Section 5 addresses the situation
of multiple congested routers with our scheme.  Section 6 is on
security issues.


2.0 Network Level Signalling for ECN

We argue that ECN is a network level functionality and should be
decoupled from the transport protocols. A mechanism should be provided
for the end IP layer to inform its transport protocols of congestion
problems without using their header bit(s).  The benefit is that all IP
transport protocols (including any new ones that might be added in the
future) are notified in the same manner about network congestion.
In this document we only deal with TCP and in particular TCP mechanisms
which use packet drops as indicators of congestion such as TCP-Reno and
its variants.

It is assumed that the participating routers are capable of RED or some
other active queue management mechanism.  In such a router, a packet
has a probability of being dropped, and this probability depends on the
average queue size.  For a packet with the ECT bit set in the IP
header, when the average queue size lies between the minimum and
maximum thresholds the router, with a given probability, sets the CE
bit in the header and forwards the packet instead of dropping it, as
described in [draft-kksjf].
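
The dependence of the marking probability on the average queue size can
be sketched as follows.  This is a minimal illustration of the linear
ramp described in [red-paper], assuming the simplified form without
RED's per-packet count correction; the function and parameter names are
our own.  Above the maximum threshold the probability is taken to be
one.

```python
def red_marking_probability(avg_queue: float, min_th: float,
                            max_th: float, max_p: float) -> float:
    """Simplified RED probability Pb for a given average queue size."""
    if avg_queue < min_th:
        return 0.0                      # no congestion: never mark or drop
    if avg_queue > max_th:
        return 1.0                      # beyond the maximum threshold
    # Linear ramp from 0 at min_th up to max_p at max_th.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Assumed thresholds of 5 and 15 packets with Maxp = 0.1.
for avg in (2, 5, 10, 15, 20):
    print(avg, red_marking_probability(avg, 5, 15, 0.1))
```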

We leverage ICMP's Source Quench message whose design intent is to
provide feedback to a source end system about network congestion.  Both
the CE and ECT bits defined in [draft-kksjf] are maintained.  During the
de-multiplexing of the IP message, the values of both CE and ECT are
passed to the transport layer.

We start by introducing a traditional ISQ scheme which comprises a
binary feedback mechanism and a binary reaction at the source end
system that is modified relative to the current requirements for an
end host's reaction to ISQ [RFC1122].

Definition: The term binary congestion feedback is used to define
gathered knowledge of network congestion being passed back to an end
node, explicit or otherwise, ignoring the levels of congestion.  The
data only says that the network is congested.

We then introduce a multilevel congestion feedback mechanism based on
the various incipient congestion levels detected at the RED router. The
sender end system in that scenario has the luxury of having more varied
reactions based on the congestion level that is fed back.  This results
in more effective use of the network resources and better performance.

Definition: The term multilevel congestion feedback is used to define
gathered knowledge of network congestion being passed back to an end
node with explicit level indicators of how severely the network is
congested.

We propose the multilevel congestion feedback and reaction as an
incremental improvement over the binary congestion feedback and reaction
mechanism.  In sections 3 and 4 we suggest some simple algorithms for
both the binary and multilevel solutions.

2.1  Backward ECN (BECN)

This section briefly describes the binary feedback-reaction mechanism.

ICMP Source Quench messages (ISQ) are generated by the intermediate
congested RED router and sent back to the source as an indication of
incipient congestion whenever that router decides to mark the CE bit.
ISQs are usually not generated for a packet that has already been marked
previously by another router regardless of whether that packet is
contributing to some congestion; however, when the router queue level
mandates that the packet be dropped then an ISQ is sent back to the
source regardless of whether the packet was marked previously or not.





The source reacts at the transport protocol level by lowering its data
throughput into the network.  In TCP, upon identifying the flow causing
the congestion, the sender reacts by halving both the congestion window
and the slow start threshold value for that flow.  The sender does not
react to an ISQ message more than once per window.  This is similar to
the algorithm defined in [draft-kksjf].


2.2 Multilevel BECN (MECN)

This section briefly describes the multilevel congestion feedback-
reaction.

Multi-level ICMP Source Quench messages (ISQ) are generated by the RED
router and sent back to the source as an indication of incipient
congestion whenever the CE bit is marked by the intermediate congested
router.  The levels are based on the RED probability, and therefore on
the average queue size, at the time the packet arrives at the
congested router.  The congestion level sent back is the marking
probability scaled by a multiplicative factor and is stored in the
32-bit unused field of the ISQ.  As an example, the multiplicative
value selected is 100.  The upper limit of 100 is returned when the
probability of dropping the packet is equal to one (i.e., the average
queue size is above the maximum threshold).  ISQs are not generated for
a packet that has already been marked; however, as in the case of BECN,
when the router queue level mandates that a packet be dropped, an ISQ
is sent back to the source regardless of whether the packet was
previously marked or not.  In that case the value sent is the maximum,
i.e., 100 in the above example.


2.3 The argument to justify the use of ISQ

ISQ messages, generated by a router to an end system, in the past have
been considered inefficient due to the following reasons:

1) Gateway CPU abuse while processing these extra messages and 2)
Bandwidth consumption on the reverse path.  It is suggested [RFC1812]
that the routers, if implementing ISQs, should rate limit their
generation because they consume too much bandwidth in the reverse path.

We argue that CPU time is no longer a constrained resource today and
that the benefits provided by ECN outweigh the small added performance
hit.  Moreover, it has been shown [red-paper] that when using RED (with
cooperating end systems) fewer packet drops happen at the router than
with the traditional drop-tail algorithms on which the disapproval of
ISQ was based.  This implies that the amount of processing needed at
the router is reduced.  It has been quantitatively shown in simulations
[kcho-97] that only about 1-5% of the packets are marked or dropped in
a RED gateway under incipient congestion.  We argue that the faster
reaction provided by ISQ would alleviate the congestion sooner,
resulting in even further reductions.

Using a RED gateway provides us with an advantage.  A connection is
notified (by an ISQ in this case) of congestion at a rate proportional
to the connection's share of the bandwidth at the congested gateway.
Generation of ISQ messages is limited to the period from when incipient
congestion is detected until the source end system adjusts.  In fact,
given that our scheme addresses congested routers sequentially along a
downstream path, we argue that the back path, even if it is the same as
the forward path, is probably not congested, since it covers the path
only up to the first point of congestion.  Section 5 gives more
details.

In essence RED addresses both the backward path congestion problem, if
the back path is the same one as the forward path, as well as the router
processing concerns.

3.0  Suggested BECN algorithm

This is a binary feedback-reaction mechanism.  The ISQs sent by the
router to the source host act as an indication of incipient congestion.
The source reacts at the transport level by lowering its congestion
window.  The algorithm supplied here is the same as the one used in the
ECN proposal [draft-kksjf].

3.1  Role of the Router

If the incoming message causes the average queue size to go above the
maximum threshold then:
   drop the packet,
   if the ECT bit is set in the IP header then:
      send an ISQ back to the source.
else if the incoming message causes the average queue size to go
between the minimum and maximum thresholds then:
   if the RED probability chooses this packet then:
      if the ECT bit is set and the packet is not already marked then:
         mark the packet (CE bit) and send an ISQ back to the source.
      else if the ECT bit is not set then:
         drop the packet.
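
The router rules above can be sketched in Python as follows.  This is a
minimal illustration under stated assumptions, not a router
implementation: the Packet class, the function name and the injectable
random source are hypothetical, and the RED selection is reduced to a
single comparison against the linear probability ramp.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    ect: bool = False   # ECN Capable Transport bit
    ce: bool = False    # Congestion Experienced bit


def becn_router_action(pkt, avg_queue, min_th, max_th, max_p,
                       rng=random.random):
    """Return (forward, send_isq) for one arriving packet."""
    if avg_queue > max_th:
        # Forced drop; only ECN-capable flows are sent an ISQ.
        return False, pkt.ect
    if avg_queue >= min_th:
        pb = max_p * (avg_queue - min_th) / (max_th - min_th)
        if rng() < pb:                   # RED chooses this packet
            if not pkt.ect:
                return False, False      # not ECN capable: drop
            if not pkt.ce:
                pkt.ce = True            # mark and notify the source
                return True, True
            # Already marked upstream: forward without a second ISQ.
    return True, False


# Example with assumed thresholds; rng forced so the packet is chosen.
p = Packet(ect=True)
print(becn_router_action(p, 10, 5, 15, 0.1, rng=lambda: 0.0), p.ce)
```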


3.2  Role of the Source End System

    If an ISQ message is received then the sender knows that there is
network congestion. The flow causing the congestion is identified from
the ICMP data. The TCP source reacts by halving both the congestion
window and the slow start threshold value for that flow.





    The sender does not react to ISQ more than once per window. Upon
receipt of an ISQ  packet at time t, it notes the packets that are
outstanding at that time (sent but not yet acked) and waits until a time
u when they have all been acknowledged before reacting to a new ISQ
message.
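
A minimal sketch of this once-per-window rule is given below.  The
class, its field names and the lower bounds are assumptions made for
illustration; windows are counted in segments and sequence numbers are
simplified to a monotonically increasing counter.

```python
MIN_CWND = 1       # assumed lower bound on the congestion window
MIN_SSTHRESH = 2   # assumed lower bound on the slow start threshold


class BecnSender:
    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.snd_una = 0      # oldest unacknowledged sequence number
        self.snd_nxt = 0      # next sequence number to be sent
        self.recover = None   # end of the data outstanding at time t

    def on_send(self, length):
        self.snd_nxt += length

    def on_ack(self, ack):
        self.snd_una = max(self.snd_una, ack)
        if self.recover is not None and self.snd_una >= self.recover:
            self.recover = None   # time u: all of that data is acked

    def on_isq(self):
        """React to an ISQ; return True if the windows were reduced."""
        if self.recover is not None:
            return False          # already reacted within this window
        self.cwnd = max(self.cwnd // 2, MIN_CWND)
        self.ssthresh = max(self.ssthresh // 2, MIN_SSTHRESH)
        if self.snd_nxt > self.snd_una:
            self.recover = self.snd_nxt
        return True


s = BecnSender(cwnd=16, ssthresh=32)
s.on_send(10)
print(s.on_isq(), s.cwnd)   # first ISQ halves the window
print(s.on_isq(), s.cwnd)   # second ISQ in the same window is ignored
```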


4.0  Suggested MECN algorithm

This is an evolution of BECN. The router now sends levels of congestion
notification and the source end system reacts differently depending on
the severity of the congestion. The level of notification is stored in
the 32-bit unused field in the ISQ.
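
One possible encoding of the level in the ISQ is sketched below.  The
ICMP Source Quench layout (type 4, code 0, checksum, 32-bit unused
field, then the original IP header plus 64 bits of the original
datagram) follows [RFC 792]; carrying the level as an unsigned 32-bit
integer in network byte order is our assumption, as are the function
names.

```python
import struct

ICMP_SOURCE_QUENCH = 4


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_isq(level: int, original: bytes) -> bytes:
    """original: IP header + first 8 bytes of the offending datagram."""
    header = struct.pack("!BBHI", ICMP_SOURCE_QUENCH, 0, 0, level)
    csum = internet_checksum(header + original)
    return struct.pack("!BBHI", ICMP_SOURCE_QUENCH, 0, csum, level) + original


def parse_isq_level(message: bytes) -> int:
    icmp_type, code, _csum, level = struct.unpack("!BBHI", message[:8])
    if icmp_type != ICMP_SOURCE_QUENCH or code != 0:
        raise ValueError("not a Source Quench message")
    return level


msg = build_isq(42, bytes(28))   # placeholder 20-byte header + 8 bytes
print(len(msg), parse_isq_level(msg))
```

A BECN-only receiver that ignores the unused field would still see a
well-formed Source Quench message under this encoding.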

4.1  Role of the Router

4.1.1 How the congestion level weight is computed

Pb refers to the computed RED packet marking probability.  Pb is a
function of the computed average queue size.  As the average queue size
varies from the minimum to the maximum threshold, Pb varies between 0
and the maximum value set for it, Maxp.  Note that we take Pb to be one
when the average queue size is above the maximum threshold; in that
particular case, the maximum weight is sent to the source system.  For
simplicity's sake we choose a multiplicative factor of 100, so that the
weight reads as a percentage congestion level.  Above the maximum
threshold we send a value of 100 in the feedback message, indicating
100% incipient congestion.  Between the thresholds we multiply Pb by a
factor such that the weight reflects 99% congestion when Pb reaches its
maximum value, and we add 1 to account for the fact that Pb is zero at
the minimum threshold.  The equation used to compute the weight to send
between the minimum and maximum thresholds is:

level = Pb*(98/Maxp) + 1

At the maximum threshold the weight sent is 99 and at the minimum
threshold the weight sent is 1.  For efficiency, 98/Maxp could be
computed at RED initialization.
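
The weight computation can be checked with a few lines of Python.  This
is a sketch only: the function name is hypothetical, and rounding to
the nearest integer is our assumption, since the text asks for an
integer level without saying how it is obtained.

```python
LEVEL_DROPPED = 100   # weight sent when the packet is dropped


def congestion_level(pb: float, max_p: float) -> int:
    """Weight between the thresholds: level = Pb*(98/Maxp) + 1."""
    if not 0.0 <= pb <= max_p:
        raise ValueError("Pb must lie between 0 and Maxp")
    scale = 98.0 / max_p          # could be precomputed at RED init
    return int(round(pb * scale + 1))


# Assumed Maxp of 0.1: minimum threshold, midpoint, maximum threshold.
for pb in (0.0, 0.05, 0.1):
    print(pb, congestion_level(pb, 0.1))
```

With Maxp = 0.1, a marking probability of 0.05 gives 0.05 * 980 + 1 =
50, the midpoint of the 1 to 99 range.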

4.1.2  The Router functionality

If the incoming message causes the average queue size to go above the
maximum threshold then:
   drop the packet,
   if the ECT bit is set in the IP header then:
      send an ISQ back to the source with a weight of 100.







If the incoming message causes the average queue size to go between
the minimum and maximum thresholds then:

   if the RED probability picks this packet then:
      if the ECT bit is set and the CE bit is not already marked then:
         mark the packet and send an ISQ of integer level
            1+(Pb*98/Maxp) back to the source.
      else if the ECT bit is not set in the IP header then:
         drop the packet.
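
Putting the two cases of this section together, the MECN router
decision might be sketched as below.  The Decision tuple, the function
name and the injectable random source are hypothetical, the RED
selection is simplified to one comparison against Pb, and a packet that
is already marked is assumed to be forwarded without a further ISQ, as
described in section 2.2.

```python
import random
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    forward: bool              # False means the packet is dropped
    mark_ce: bool              # True means the CE bit must be set
    isq_level: Optional[int]   # None means no ISQ is generated


def mecn_router_decision(ect, ce, avg_queue, min_th, max_th, max_p,
                         rng=random.random):
    if avg_queue > max_th:
        return Decision(False, False, 100 if ect else None)
    if avg_queue >= min_th:
        pb = max_p * (avg_queue - min_th) / (max_th - min_th)
        if rng() < pb:                        # RED picks this packet
            if not ect:
                return Decision(False, False, None)
            if not ce:
                level = int(round(pb * (98.0 / max_p) + 1))
                return Decision(True, True, level)
    return Decision(True, False, None)


# Assumed thresholds 5 and 15, Maxp 0.1, packet forced to be picked.
print(mecn_router_decision(True, False, 10, 5, 15, 0.1, rng=lambda: 0.0))
```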


4.2  Role of the End System

The end system can now react to a range of congestion level
notifications.

We show here a simple algorithm that could be incrementally improved.
We react to each ISQ received, under the assumption that the effects of
burstiness and spurious congestion are accounted for by the RED
algorithm at the router.  Since a weight of 100 indicates that the
packet was dropped, we use this information to retransmit that packet
rather than wait for the TCP retransmission timeout (RTO).  Note that
the packet sequence number can be deduced from the 8 bytes of the TCP
header passed back in the ISQ message (an ISQ always carries the
original IP header plus the first 8 bytes of the original datagram's
data).  The slow start, congestion avoidance and fast
retransmit/recovery mechanisms are maintained.
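
Recovering the sequence number from the quoted datagram could look like
the following sketch.  It assumes an IPv4 packet quoted after the
8-byte ICMP header; the function name is hypothetical.  The first 8
bytes of a TCP header hold the source port, the destination port and
the 32-bit sequence number.

```python
import struct

IPPROTO_TCP = 6


def tcp_info_from_isq(message: bytes):
    """Return (src_port, dst_port, seq) of the datagram quoted in an ISQ."""
    quoted = message[8:]                 # skip the 8-byte ICMP header
    if len(quoted) < 20:
        raise ValueError("quoted IP header is truncated")
    ihl = (quoted[0] & 0x0F) * 4         # IPv4 header length in bytes
    if ihl < 20 or len(quoted) < ihl + 8:
        raise ValueError("quoted datagram is truncated")
    if quoted[9] != IPPROTO_TCP:
        raise ValueError("quoted datagram is not TCP")
    return struct.unpack("!HHI", quoted[ihl:ihl + 8])


# Build a placeholder ISQ: zeroed ICMP header, minimal IPv4 header
# marked as TCP, then ports 1234 -> 80 and sequence number 1000.
ip = bytearray(20)
ip[0] = 0x45
ip[9] = IPPROTO_TCP
sample = bytes(8) + bytes(ip) + struct.pack("!HHI", 1234, 80, 1000)
print(tcp_info_from_isq(sample))
```

The ports, together with the addresses in the quoted IP header,
identify the flow whose window should be adjusted.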

4.2.1  The Source end system functionality

If an ISQ message is received then the sender knows that there is
network congestion. The flow causing the congestion is identified from
the ICMP data and the congestion level is extracted.

If the congestion level == 100 then:
    extract the TCP sequence number from the ISQ.
    retransmit the packet.
    cut the congestion window and threshold value by 1/2.

else (we are between max and min threshold at the router) then:
    if congestion level >= 50 then:
        cut the congestion window and threshold value by 1/2.
    else (anything below 50%) then:
        congestion window is linearly decremented by 1.

Note: a) The usual rules about the lower bounds of the threshold and
congestion window values apply when decrementing.

b) The MECN method outlined above will interact with the existing
congestion control mechanisms in TCP.  The overall effect is still to
slow down the system throughput if the congestion levels warrant it.
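
The source reaction of this section can be sketched as a small pure
function.  The function name, the use of segments as the unit and the
default lower bounds are assumptions for illustration; the draft leaves
the lower bounds to the usual TCP rules.

```python
def mecn_react(level, cwnd, ssthresh, min_cwnd=1, min_ssthresh=2):
    """Return (cwnd, ssthresh, retransmit) after an ISQ of this level."""
    if not 1 <= level <= 100:
        raise ValueError("congestion level must be in 1..100")
    if level >= 50:
        # Level 100 means the packet was dropped: also retransmit it.
        return (max(cwnd // 2, min_cwnd),
                max(ssthresh // 2, min_ssthresh),
                level == 100)
    # Mild congestion: back off linearly, threshold unchanged.
    return max(cwnd - 1, min_cwnd), ssthresh, False


for lvl in (100, 75, 20):
    print(lvl, mecn_react(lvl, cwnd=16, ssthresh=32))
```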

5.0  Multiple congested routers

Multiple congested routers on the path between the sender and the
receiver have their concerns addressed one at a time, in a domino
effect.  If any of the downstream routers is congested to the extent of
dropping a packet, then that router's congestion concerns are addressed
immediately.  Once a packet is marked by a congested router, no further
ISQ message is generated for it on its way to the destination.  The
exception to the rule is when, further along the path after the
marking, some other intermediate router decides to drop this packet.
In that case it transmits an ISQ of level 100, upon which the end
system must invoke the congestion reaction immediately.  Therefore any
router which is congested to the level of dropping packets will
participate in the congestion control.  Routers which are closer to the
source are favored in the sense that their incipient congestion levels
are reacted to first.  If the flow is long enough, the router closest
to the source has its congestion concerns serviced first, the next
downstream router next, and so forth, with the router closest to the
destination being the last one responded to.  The bias is more evident
when a router further downstream (other than the one that marked the
packet) would have sent a higher notification level had it had the
opportunity, i.e., had the packet not already been marked and given a
lesser weight by a previous router.  We feel that this bias is not of
great significance, given that any downstream router dropping a packet
will contribute to the congestion reaction at the source.
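
The bias toward upstream routers can be seen in a small sketch that
follows one ECT packet along a path.  The representation is an
assumption made for illustration: each router is described only by the
level it would like to signal for this packet (None for no congestion,
1 to 99 for a mark, 100 for a drop).

```python
def isqs_for_packet(intended_levels):
    """Return the (hop, level) ISQs generated for one ECT packet."""
    isqs = []
    marked = False
    for hop, level in enumerate(intended_levels):
        if level is None:
            continue                   # router is not congested
        if level >= 100:
            isqs.append((hop, 100))    # a drop always notifies the source
            break                      # the packet goes no further
        if not marked:
            marked = True              # first marking router sends the ISQ
            isqs.append((hop, level))
        # Later routers see the CE bit and stay silent.
    return isqs


print(isqs_for_packet([None, 20, 80, None]))
print(isqs_for_packet([20, 100, 50]))
```

In the first path the second router's level of 20 is reported and the
third router's higher level of 80 is not, which is the bias described
above; in the second path the downstream drop still reaches the source.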

6.0 Security issues

ISQ messages can be spoofed.  This can be used for a Denial of Service
attack on a source end system.  Building in authentication is probably
too heavyweight.  This is a problem faced by IP in general and so we
have not attempted to address it.

7.0 References

[draft-kksjf] Ramakrishnan, K.K. and Floyd, S. A proposal to add
Explicit Congestion Notification (ECN) to IPv6 and to TCP, IETF Draft
draft-kksjf-ECN-00.txt, November 1997.

[Floyd94] Floyd, S. TCP and Explicit Congestion Notification, ACM
Computer Communications Review, V.24N, October 1994.

[red-paper] Floyd, S. and Jacobson, V. Random Early Detection Gateways
for Congestion Avoidance, IEEE/ACM Transactions on Networking, August
1993.






[kcho-97] Cho, K.J. ALTQ/RED Performance,
http://www.csl.csl.sony.co.jp/person/kjc/red/perf.html

[RFC 792] Postel, J. Internet Control Message Protocol (September
1981).

[RFC1122] Braden, R. (Editor) Requirements for Internet Hosts --
Communication Layers (October 1989).

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S.,
Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C.,
Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and Zhang,
L.  Recommendations on Queue Management and Congestion Avoidance in the
Internet (April 1998).

[RFC1812] Baker, F. Requirements for IP Version 4 Routers (June 1995).


8.0 Acknowledgements

The authors are much indebted to Alan Chapman.  Without his insight and
multiple edits the ideas embedded in here would have been much more
difficult to present.

9.0 Authors' Addresses

Jamal Hadi Salim,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada

Phone: 613-763-6395
Email: hadi@nortel.com

Biswajit Nandy,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada

Phone: 613-765-3709
Email: bnandy@nortel.com

Nabil Seddigh,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada

Phone: 613-763-6396
Email: nseddigh@nortel.com












































