draft-ietf-tsvwg-tunnel-congestion-feedback-04.txt   draft-ietf-tsvwg-tunnel-congestion-feedback-05.txt 
Internet Engineering Task Force X. Wei Internet Engineering Task Force X. Wei
INTERNET-DRAFT Huawei Technologies INTERNET-DRAFT Huawei Technologies
Intended Status: Informational L.Zhu Intended Status: Informational L.Zhu
Expires: July 29, 2017 Huawei Technologies Expires: November 24, 2017 Huawei Technologies
L.Deng L.Deng
China Mobile China Mobile
January 25, 2017 May 23, 2017
Tunnel Congestion Feedback Tunnel Congestion Feedback
draft-ietf-tsvwg-tunnel-congestion-feedback-04 draft-ietf-tsvwg-tunnel-congestion-feedback-05
Abstract Abstract
This document describes a method to measure congestion on a tunnel This document describes a method to measure congestion on a tunnel
segment based on recommendations from RFC 6040, "Tunneling of segment based on recommendations from RFC 6040, "Tunneling of
Explicit Congestion Notification", and to use IPFIX to communicate Explicit Congestion Notification", and to use IPFIX to communicate
the congestion measurements from the tunnel's egress to a controller the congestion measurements from the tunnel's egress to a controller
which can respond by modifying the traffic control policies at the which can respond by modifying the traffic control policies at the
tunnel's ingress. tunnel's ingress.
skipping to change at page 2, line 21 skipping to change at page 2, line 21
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions And Terminologies . . . . . . . . . . . . . . . . . 3 2. Conventions And Terminologies . . . . . . . . . . . . . . . . . 3
3. Congestion Information Feedback Models . . . . . . . . . . . . 3 3. Congestion Information Feedback Models . . . . . . . . . . . . 4
4. Congestion Level Measurement . . . . . . . . . . . . . . . . . 4 4. Congestion Level Measurement . . . . . . . . . . . . . . . . . 5
5. Congestion Information Delivery . . . . . . . . . . . . . . . . 6 5. Congestion Information Delivery . . . . . . . . . . . . . . . . 7
5.1 IPFIX Extensions . . . . . . . . . . . . . . . . . . . . . . 7 5.1 IPFIX Extensions . . . . . . . . . . . . . . . . . . . . . . 8
5.1.1 tunnelEcnCeCePacketTotalCount . . . . . . . . . . . . . 8 5.1.1 tunnelEcnCeCeByteTotalCount . . . . . . . . . . . . . . 8
5.1.2 tunnelEcnEct0NectPacketTotalCount . . . . . . . . . . . 8 5.1.2 tunnelEcnEct0NectByteTotalCount . . . . . . . . . . . . 8
5.1.3 tunnelEcnEct1NectPacketTotalCount . . . . . . . . . . . 8 5.1.3 tunnelEcnEct1NectByteTotalCount . . . . . . . . . . . . 9
5.1.4 tunnelEcnCeNectPacketTotalCount . . . . . . . . . . . . 9 5.1.4 tunnelEcnCeNectByteTotalCount . . . . . . . . . . . . . 9
5.1.5 tunnelEcnCeEct0PacketTotalCount . . . . . . . . . . . . 9 5.1.5 tunnelEcnCeEct0ByteTotalCount . . . . . . . . . . . . . 9
5.1.6 tunnelEcnCeEct1PacketTotalCount . . . . . . . . . . . . 9 5.1.6 tunnelEcnCeEct1ByteTotalCount . . . . . . . . . . . . . 10
5.1.7 tunnelEcnEct0Ect0PacketTotalCount . . . . . . . . . . . 10 5.1.7 tunnelEcnEct0Ect0ByteTotalCount . . . . . . . . . . . . 10
5.1.8 tunnelEcnEct1Ect1PacketTotalCount . . . . . . . . . . . 10 5.1.8 tunnelEcnEct1Ect1PacketTotalCount . . . . . . . . . . . 10
6. Congestion Management . . . . . . . . . . . . . . . . . . . . . 10 5.1.9 tunnelEcnCEMarkedRatio . . . . . . . . . . . . . . . . . 11
6. Congestion Management . . . . . . . . . . . . . . . . . . . . . 11
6.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7. Security Considerations . . . . . . . . . . . . . . . . . . . . 14 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 14
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 14 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 15
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 16 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.1 Normative References . . . . . . . . . . . . . . . . . . . 16 9.1 Normative References . . . . . . . . . . . . . . . . . . . 17
9.2 Informative References . . . . . . . . . . . . . . . . . . 17 9.2 Informative References . . . . . . . . . . . . . . . . . . 18
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18
1. Introduction 1. Introduction
In IP networks, persistent congestion[RFC2914] lowers transport In IP networks, persistent congestion[RFC2914] lowers transport
throughput, leading to waste of network resource. Appropriate throughput, leading to waste of network resource. Appropriate
congestion control mechanisms are therefore critical to prevent the congestion control mechanisms are therefore critical to prevent the
network from falling into the persistent congestion state. Currently, network from falling into the persistent congestion state. Currently,
transport protocols such as TCP[RFC793], SCTP[RFC4960], transport protocols such as TCP[RFC793], SCTP[RFC4960],
DCCP[RFC4340], have their built-in congestion control mechanisms, and DCCP[RFC4340], have their built-in congestion control mechanisms, and
even for certain single transport protocol like TCP there can be a even for certain single transport protocol like TCP there can be a
couple of different congestion control mechanisms to choose from. All couple of different congestion control mechanisms to choose from. All
these congestion control mechanisms are implemented on host side, and these congestion control mechanisms are implemented on host side, and
there are reasons that only host side congestion control is not there are reasons that only host side congestion control is not
sufficient for the whole network to keep away from persistent sufficient for the whole network to keep away from persistent
congestion. For example, (1) some protocol's congestion control congestion. For example, (1) some protocol's congestion control
scheme may have internal design flaws; (2) improper software scheme may have internal design flaws; (2) improper software
implementation of protocol; (3) some transport protocols, e.g. implementation of protocol; (3) some transport protocols, e.g.
RTP[RFC3550] do not even provide congestion control at all. RTP[RFC3550] do not even provide congestion control at all; (4) a
heavy load from a much larger than expected number of responsive
flows could also lead to persistent congestion.
Tunnels are widely deployed in various networks including public Tunnels are widely deployed in various networks including public
Internet, data center network, and enterprise network etc. A tunnel Internet, data center network, and enterprise network etc. A tunnel
consists of ingress, egress and a set of intermediate routers. For consists of ingress, egress and a set of intermediate routers. For
the tunnel scenario, a tunnel-based mechanism is introduced for the tunnel scenario, a tunnel-based mechanism is introduced for
network traffic control to keep the network from persistent network traffic control to keep the network from persistent
congestion. Here, tunnel ingress will implement congestion congestion. Here, tunnel ingress will implement congestion
management function to control the traffic entering the tunnel. management function to control the traffic entering the tunnel.
This document provides a mechanism of feeding back inner tunnel This document provides a mechanism of feeding back inner tunnel
congestion level to the ingress. Using this mechanism the egress can congestion level to the ingress. Using this mechanism the egress can
feed the tunnel congestion level information it collects back to the feed the tunnel congestion level information it collects back to the
ingress. After receiving this information the ingress will be able to ingress. After receiving this information the ingress will be able to
perform congestion management according to network management policy. perform congestion management according to network management policy.
The following subjects are out of scope of current document: it gives
no advice on how to select which tunnel endpoints should be used in
order to manage traffic over a network criss-crossed by multiple
tunnels; if a congested node is part of multiple tunnels, and it
causes congestion feedback to multiple traffic management functions
at the ingresses of all the tunnels, the draft gives no advice on how
all the traffic management functions should respond.
2. Conventions And Terminologies 2. Conventions And Terminologies
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
DP: Decision Point, a logical entity that makes congestion DP: Decision Point, a logical entity that makes congestion
management decision based on the received congestion feedback management decision based on the received congestion feedback
information. information.
AP: Action Point, a logical entity that implements congestion EP: Enforcement Point, a logical entity that implements congestion
management action according to the decision made by Decision Point. management action according to the decision made by Decision Point.
ECT: ECN-Capable Transport code point defined in RFC3168. ECT: ECN-Capable Transport code point defined in RFC3168.
3. Congestion Information Feedback Models 3. Congestion Information Feedback Models
The feedback model mainly consists of tunnel egress and tunnel The feedback model mainly consists of tunnel egress and tunnel
ingress. The tunnel egress composes of meter function and exporter ingress. The tunnel egress composes of meter function and exporter
function; tunnel ingress composes AP (Action Point) function, function; tunnel ingress composes EP (Enforcement Point) function,
collector function and DP (Decision Point) function. collector function and DP (Decision Point) function.
The Meter function collects network congestion level information, and The Meter function collects network congestion level information, and
conveys the information to Exporter which feeds back the information conveys the information to Exporter which feeds back the information
to the collector function. to the collector function.
The feedback message contains CE-marked packet ratio, the traffic
volumes of all kinds of ECN marking packets.
The collector collects congestion level information from exporter, The collector collects congestion level information from exporter,
after that congestion management Decision Point (DP) function will after that congestion management Decision Point (DP) function will
make congestion management decision based on the information from make congestion management decision based on the information from
collector. collector.
The action point controls the traffic entering tunnel, and it The Enforcement Point controls the traffic entering tunnel, and it
implements traffic control decision of DP. implements traffic control decision of DP.
Feedback Feedback
+-----------------------------------+ +-----------------------------------+
| | | |
| | | |
| V | V
+--------------+ +-------------+ +--------------+ +-------------+
| +--------+ | | +---------+ | | +--------+ | | +---------+ |
| |Exporter| | | |Collector| | | |Exporter| | | |Collector| |
| +---|----+ | | +---|-----+ | | +---|----+ | | +---|-----+ |
| +--|--+ | | +|-+ | | +--|--+ | | +|-+ |
| |Meter| | | |DP| | | |Meter| | traffic | |DP| |
| +-----+ | | +--+ | | +-----+ |<==================| +--+ |
| | | +--+ | | | | +--+ |
| | | |AP| | | | | |EP| |
| | | +--+ | | | | +--+ |
|Egress | | Ingress | |Egress | | Ingress |
+--------------+ +-------------+ +--------------+ +-------------+
Figure 1: Feedback Model. Figure 1: Feedback Model.
4. Congestion Level Measurement 4. Congestion Level Measurement
This section describes how to measure congestion level in a
tunnel.
The congestion level measurement is based on ECN (Explicit The congestion level measurement is based on ECN (Explicit
Congestion Notification) [RFC3168] and packet drop. If the routers Congestion Notification) [RFC3168] and packet drop. The network
support ECN, after router's queue length is over a predefined congestion level could be indicated through the ratio of CE-marked
threshold, the routers will mark the ECN-capable packets as packet and the volumes of packet drop, the relationship between
Congestion Experienced (CE) or drop not-ECT packets with the these two kinds of indicator is complementary. If the congestion
probability proportional to queue length; if the queue overflows level in tunnel is not high enough, the packets would be marked as
all packets will be dropped. If the routers do not support ECN, CE instead of being dropped, and then it is easy to calculate
after router's queue length is over a predefined threshold, the congestion level according to the ratio of CE-marked packets. If
routers will drop both the ECN-capable packets and the not-ECT the congestion level is so high that ECT packet will be dropped,
packets with the probability proportional to the queue length. then the packet loss ratio could be calculated by comparing total
packets entering ingress and total packets arriving at egress over
the same span of packets, if packet loss is detected, it could be
assumed that severe congestion has occurred in the tunnel.
The network congestion level could be indicated through the ratio Egress calculates CE-marked packet ratio by counting different
of CE-marked packet and the ratio of packet drop, the relationship kinds of ECN-marked packet, the CE-marked packet ratio will be
between these two kinds of indicator is complementary. If the used as an indication of tunnel load level. It's assumed that
congestion level in tunnel is not high enough, the packets would routers in the tunnel will not drop packets biased towards certain
be marked as CE instead of being dropped, and then it is easy to ECN codepoint, so calculating of CE-marked packet ratio is not
calculate congestion level according to the ratio of CE-marked affected by packet drop.
packets. If the congestion level is so high that ECT packet will
be dropped, then the packet loss ratio could be calculated by The calculation of volumes of packet drop is by comparing the
comparing total packets entering ingress and total packets traffic volumes between ingress and egress.
arriving at egress over the same span of packets, if packet loss
is detected, it could be assumed that severe congestion has
occurred in the tunnel. Because loss is only ever a sign of
serious congestion, so it doesn't need to measure loss ratio
accurately.
Faked ECN-capable transport (ECT) is used at ingress to defer Faked ECN-capable transport (ECT) is used at ingress to defer
packet loss to egress. The basic idea of faked ECT is that, when packet loss to egress. The basic idea of faked ECT is that, when
encapsulating packets, ingress first marks tunnel outer header encapsulating packets, ingress first marks tunnel outer header
according to RFC6040, and then remarks outer header of Not-ECT according to RFC6040, and then remarks outer header of Not-ECT
packet as ECT, there will be three kinds of combination of outer packet as ECT, there will be three kinds of combination of outer
header ECN field and inner header ECN field: CE|CE, ECT|N-ECT, header ECN field and inner header ECN field: CE|CE, ECT|N-ECT,
ECT|ECT (in the form of outer ECN| inner ECN); when decapsulating ECT|ECT (in the form of outer ECN| inner ECN); when decapsulating
packets at egress, RFC6040 defined decapsulation behavior is used, packets at egress, RFC6040 defined decapsulation behavior is used,
and according to RFC6040, the packets marked as CE|N-ECT will be and according to RFC6040, the packets marked as CE|N-ECT will be
dropped by egress. dropped by egress. Faked-ECT is used to shift some drops to the
egress in order to calculate CE-marked packet ratio more precisely
by egress.
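The faked-ECT behavior described above can be sketched as follows. This is a minimal illustration, not part of the draft: the names Ecn, ingress_outer_ecn and egress_decapsulate are hypothetical, the choice of ECT(0) as the faked codepoint is an assumption, and the decapsulation logic is a simplified reading of the RFC 6040 rules that omits its logging and alarm recommendations.

```python
from enum import Enum
from typing import Optional


class Ecn(Enum):
    """Two-bit ECN field values from RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def ingress_outer_ecn(inner: Ecn) -> Ecn:
    """Outer ECN field set by the ingress when faked ECT is in use."""
    outer = inner              # RFC 6040 normal mode: copy inner to outer
    if outer is Ecn.NOT_ECT:   # faked ECT: re-mark Not-ECT as ECT
        outer = Ecn.ECT0       # assumption: ECT(0) is the codepoint chosen
    return outer


def egress_decapsulate(outer: Ecn, inner: Ecn) -> Optional[Ecn]:
    """ECN field forwarded by the egress, or None if the packet is dropped."""
    if outer is Ecn.CE:
        # CE|N-ECT: the inner transport cannot react to a mark, so drop.
        return None if inner is Ecn.NOT_ECT else Ecn.CE
    if outer is Ecn.ECT1 and inner is Ecn.ECT0:
        return Ecn.ECT1
    return inner


# A Not-ECT packet leaves the ingress as ECT|N-ECT. If a router in the
# tunnel marks it, it arrives as CE|N-ECT and is dropped at the egress.
outer = ingress_outer_ecn(Ecn.NOT_ECT)
forwarded_unmarked = egress_decapsulate(outer, Ecn.NOT_ECT)
forwarded_marked = egress_decapsulate(Ecn.CE, Ecn.NOT_ECT)
```

Because the drop of a marked Not-ECT packet happens at the egress rather than at a router inside the tunnel, the egress still sees the packet and can count it as CE|N-ECT before discarding it.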
To calculate congestion level, for the same span of packets, the To calculate congestion level, for the same span of packets, the
number of each kind of ECN marking packet at ingress and egress ratio of CE-marked packets will be calculated by egress, and the
will be compared to get the volume of CE-marked packet in the total bytes count of packets at ingress and egress will be
tunnel; and the total number of packets at ingress and egress will compared to detect the traffic volume loss in tunnel.
be compared to detect the packet loss.
The basic procedure of congestion level measurement is as follows: The basic procedure of packets loss measurement is as follows:
+-------+ +------+ +-------+ +------+
|Ingress| |Egress| |Ingress| |Egress|
+-------+ +------+ +-------+ +------+
| | | |
+----------------+ | +----------------+ |
|cumulative count| | |cumulative count| |
+----------------+ | +----------------+ |
| | | |
| <node id-i, ECN counts> | | <node id-i, ECN counts> |
|------------------------>| |------------------------>|
|<node id-e, ECN counts> | |<node id-e, ECN counts> |
|<------------------------| |<------------------------|
| | | |
| | | |
Figure 2: Procedure of Congestion Level Measurement Figure 2: Procedure of Packet Loss Measurement
Ingress encapsulates packets and marks outer header according to Ingress encapsulates packets and marks outer header according to
faked ECT as described above. Ingress cumulatively counts packets for faked ECT as described above. Ingress cumulatively counts packet
three types of ECN combination (CE|CE, ECT|N-ECT, ECT|ECT) and then bytes for three types of ECN combination (CE|CE, ECT|N-ECT, ECT|ECT)
the ingress regularly sends cumulative packet counts message of each and then the ingress regularly sends cumulative bytes counts message
type of ECN combination to the egress. When each message arrives, the of each type of ECN combination to the egress.
egress cumulatively counts packets coming from the ingress and adds
its own packet counts of each type of ECN combination (CE|CE, ECT|N-
ECT, CE|N-ECT, CE|ECT, ECT|ECT) to the message and returns the whole
message to the ingress.
The counting of packets can be at the granularity of the all traffic When each message arrives at egress, (1)egress calculates the ratio
of CE-marked packet; (2)the egress cumulatively counts packet bytes
coming from the ingress and adds its own bytes counts of each type of
ECN combination (CE|CE, ECT|N-ECT, CE|N-ECT, CE|ECT, ECT|ECT) to the
message for ingress to calculate packet loss. Egress feeds back CE-
marked packet ratio and bytes counts information to the ingress for
evaluating congestion level in the tunnel.
The counting of bytes can be at the granularity of the all traffic
from the ingress to the egress to learn about the overall congestion from the ingress to the egress to learn about the overall congestion
status of the path between the ingress and the egress. The counting status of the path between the ingress and the egress. The counting
can also be at the granularity of individual customer's traffic or a can also be at the granularity of individual customer's traffic or a
specific set of flows to learn about their congestion contribution. specific set of flows to learn about their congestion contribution.
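As a rough illustration of the two indicators, the sketch below derives a CE-marked ratio and a lost-byte volume from byte counters. The text does not give an exact formula, so the definitions here are assumptions: the ratio is taken as bytes marked CE inside the tunnel (CE|N-ECT and CE|ECT) over all bytes that could have been marked there, and CE|CE bytes are excluded because they were already marked before entering the tunnel. The names EgressBytes, ce_marked_ratio and lost_bytes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EgressBytes:
    """Byte counts at the egress per outer|inner ECN combination."""
    ce_ce: int = 0
    ect_nect: int = 0
    ce_nect: int = 0   # assumed to be counted before the egress drops them
    ce_ect: int = 0
    ect_ect: int = 0

    def total(self) -> int:
        return (self.ce_ce + self.ect_nect + self.ce_nect
                + self.ce_ect + self.ect_ect)


def ce_marked_ratio(e: EgressBytes) -> float:
    """Share of markable bytes that were CE-marked inside the tunnel."""
    marked = e.ce_nect + e.ce_ect
    markable = marked + e.ect_nect + e.ect_ect
    return marked / markable if markable else 0.0


def lost_bytes(ingress_total: int, egress_total: int) -> int:
    """Traffic volume lost in the tunnel over the same span of packets."""
    return max(ingress_total - egress_total, 0)


egress = EgressBytes(ce_ce=100, ect_nect=600, ce_nect=100,
                     ce_ect=100, ect_ect=200)
ratio = ce_marked_ratio(egress)            # 200 marked / 1000 markable
loss = lost_bytes(1500, egress.total())    # ingress sent 1500 bytes
```

Any non-zero loss can be treated as a sign of severe congestion, while the ratio gives a graded signal at lower load.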
5. Congestion Information Delivery 5. Congestion Information Delivery
As described above, the tunnel ingress needs to convey a message As described above, the tunnel ingress needs to convey a message
containing cumulative packet counts of each type of ECN combination containing cumulative bytes counts of packets of each type of ECN
to tunnel egress, and the tunnel egress also needs to feed back the combination to tunnel egress, and the tunnel egress also needs to
message of cumulative packet counts of each type of ECN combination feed back the message of cumulative bytes counts of packets of each
to the ingress. This section describes how the messages should be type of ECN combination and CE-marked packet ratio to the ingress.
conveyed. This section describes how the messages should be conveyed.
The message travels along the same path with network data traffic, The message travels along the same path with network data traffic,
referred as in-band signal. Because the message is transmitted in referred as in-band signal. Because the message is transmitted in
band, so the message packet may get lost in case of network band, so the message packet may get lost in case of network
congestion. To cope with the situation that the message packet gets congestion. To cope with the situation that the message packet gets
lost, the packet counts values are sent as cumulative counters. Then lost, the bytes counts values are sent as cumulative counters. Then
if a message is lost the next message will recover the missing if a message is lost the next message will recover the missing
information. Even though the missing information could be recovered, information. Even though the missing information could be recovered,
the message should be transmitted in a much higher priority than the message should be transmitted in a much higher priority than
users' traffic flows. users' traffic flows.
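The reason cumulative counters tolerate message loss can be shown with a short sketch. It is illustrative only: CumulativeReceiver is a hypothetical helper, and it assumes counters never reset between messages (a Metering Process re-initialization would need separate handling).

```python
from typing import Dict, Optional

Counts = Dict[str, int]


class CumulativeReceiver:
    """Turns successive cumulative counter messages into per-interval deltas."""

    def __init__(self) -> None:
        self._last: Optional[Counts] = None

    def update(self, cumulative: Counts) -> Counts:
        """Return the bytes counted since the last message that arrived."""
        previous = self._last or {}
        delta = {key: value - previous.get(key, 0)
                 for key, value in cumulative.items()}
        self._last = dict(cumulative)
        return delta


receiver = CumulativeReceiver()
first = receiver.update({"CE|CE": 10, "ECT|ECT": 100})
# A second message carrying {"CE|CE": 20, "ECT|ECT": 250} is lost in transit.
third = receiver.update({"CE|CE": 35, "ECT|ECT": 400})
```

The delta computed from the third message covers both the lost interval and the current one, so no counted bytes go missing; only the time resolution of the measurement is reduced.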
IPFIX [RFC7011] is selected as information feedback protocol. IPFIX IPFIX [RFC7011] is selected as a candidate information feedback
uses preferably SCTP as transport. SCTP allows partially reliable protocol. IPFIX uses preferably SCTP as transport. SCTP allows
delivery [RFC3758], which ensures the feedback message will not be partially reliable delivery [RFC3758], which ensures the feedback
blocked in case of packet loss due to network congestion. message will not be blocked in case of packet loss due to network
congestion.
Ingress can do congestion management at different granularity which Ingress can do congestion management at different granularity which
means both the overall aggregated inner tunnel congestion level and means both the overall aggregated inner tunnel congestion level and
congestion level contributed by certain traffic(s) could be measured congestion level contributed by certain traffic(s) could be measured
for different congestion management purpose. For example, if the for different congestion management purpose. For example, if the
ingress only wants to limit congestion volume caused by certain ingress only wants to limit congestion volume caused by certain
traffic(s), e.g. UDP-based traffic, then congestion volume for the traffic(s), e.g. UDP-based traffic, then congestion volume for the
traffic will be fed back; or if the ingress does overall congestion traffic will be fed back; or if the ingress does overall congestion
management, the aggregated congestion volume will be fed back. management, the aggregated congestion volume will be fed back.
When sending message from ingress to egress, the ingress acts as When sending message from ingress to egress, the ingress acts as
IPFIX exporter and egress acts as IPFIX collector; When feedback IPFIX exporter and egress acts as IPFIX collector; When feedback
congestion level information from egress to ingress, then the egress congestion level information from egress to ingress, then the egress
acts as IPFIX exporter and ingress acts as IPFIX collector. acts as IPFIX exporter and ingress acts as IPFIX collector.
The combination of congestion level measurement and congestion The combination of congestion level measurement and congestion
information delivery procedure should be as follows: information delivery procedure should be as follows:
# The ingress determines IPFIX template record to be used. The # The ingress determines IPFIX template record to be used. The
template record can be preconfigured or determined at runtime, the template record can be pre-configured or determined at runtime, the
content of template record will be determined according to the content of template record will be determined according to the
granularity of congestion management, if the ingress wants to limit granularity of congestion management, if the ingress wants to limit
congestion volume contributed by specific traffic flow then the congestion volume contributed by specific traffic flow then the
elements such as source IP address, destination IP address, flow id elements such as source IP address, destination IP address, flow id
and CE-marked packet volume of the flow etc will be included in the and CE-marked packet volume of the flow etc will be included in the
template record. template record.
# Meter on ingress measures traffic volume according to template # Meter on ingress measures traffic volume according to template
record chosen and then the measurement records are sent to egress in record chosen and then the measurement records are sent to egress in
band. band.
# Meter on egress measures congestion level information according to # Meter on egress measures congestion level information according to
template record, the content of template record should be the same template record, the content of template record should be the same
as template record of ingress. as template record of ingress.
# Exporter of egress sends measurement record together with the # Exporter of egress sends measurement record together with the
measurement record of ingress back to the ingress. measurement record of ingress back to the ingress.
5.1 IPFIX Extensions 5.1 IPFIX Extensions
This sub-section defines a list of new IPFIX Information Elements This sub-section defines a list of new IPFIX Information Elements
according to RFC7013 [RFC7013]. according to RFC7013 [RFC7013].
5.1.1 tunnelEcnCeCePacketTotalCount 5.1.1 tunnelEcnCeCeByteTotalCount
Description: The total number of incoming packets with CE|CE ECN Description: The total number of bytes of incoming packets with CE|CE
marking combination for this Flow at the Observation Point since the ECN marking combination at the Observation Point since the Metering
Metering Process (re-)initialization for this Observation Point. Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD1 ElementId: TBD1
Status: current Status: current
Units: packets Units: bytes
5.1.2 tunnelEcnEct0NectPacketTotalCount 5.1.2 tunnelEcnEct0NectByteTotalCount
Description: The total number of incoming packets with ECT(0)|N-ECT Description: The total number of bytes of incoming packets with
ECN marking combination for this Flow at the Observation Point since ECT(0)|N-ECT ECN marking combination at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD2 ElementId: TBD2
Status: current Status: current
Units: packets Units: bytes
5.1.3 tunnelEcnEct1NectPacketTotalCount 5.1.3 tunnelEcnEct1NectByteTotalCount
Description: The total number of incoming packets with ECT(1)|N-ECT Description: The total number of bytes of incoming packets with
ECN marking combination for this Flow at the Observation Point since ECT(1)|N-ECT ECN marking combination at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD3 ElementId: TBD3
Status: current Status: current
Units: packets Units: bytes
5.1.4 tunnelEcnCeNectPacketTotalCount 5.1.4 tunnelEcnCeNectByteTotalCount
Description: The total number of incoming packets with CE|N-ECT ECN Description: The total number of bytes of incoming packets with CE|N-
marking combination for this Flow at the Observation Point since the ECT ECN marking combination at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD4 ElementId: TBD4
Status: current Status: current
Units: packets Units: bytes
5.1.5 tunnelEcnCeEct0PacketTotalCount 5.1.5 tunnelEcnCeEct0ByteTotalCount
Description: The total number of incoming packets with CE|ECT(0) ECN Description: The total number of bytes of incoming packets with
marking combination for this Flow at the Observation Point since the CE|ECT(0) ECN marking combination at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD5 ElementId: TBD5
Status: current Status: current
Units: packets Units: bytes
5.1.6 tunnelEcnCeEct1PacketTotalCount 5.1.6 tunnelEcnCeEct1ByteTotalCount
Description: The total number of incoming packets with CE|ECT(1) ECN Description: The total number of bytes of incoming packets with
marking combination for this Flow at the Observation Point since the CE|ECT(1) ECN marking combination at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD6 ElementId: TBD6
Status: current Status: current
Units: packets Units: bytes
5.1.7 tunnelEcnEct0Ect0PacketTotalCount 5.1.7 tunnelEcnEct0Ect0ByteTotalCount
Description: The total number of incoming packets with ECT(0)|ECT(0) Description: The total number of bytes of incoming packets with
ECN marking combination for this Flow at the Observation Point since ECT(0)|ECT(0) ECN marking combination at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD7 ElementId: TBD7
Status: current Status: current
Units: packets Units: bytes
5.1.8 tunnelEcnEct1Ect1PacketTotalCount 5.1.8 tunnelEcnEct1Ect1PacketTotalCount
Description: The total number of incoming packets with ECT(1)|ECT(1) Description: The total number of bytes of incoming packets with
ECN marking combination for this Flow at the Observation Point since ECT(1)|ECT(1) ECN marking combination at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD8 ElementId: TBD8
Status: current Status: current
Units: packets Units: bytes
5.1.9 tunnelEcnCEMarkedRatio
Description: The ratio of CE-marked packets at the Observation Point.
Abstract Data Type: float32
ElementId: TBD9
Status: current
6. Congestion Management 6. Congestion Management
After tunnel ingress receives congestion level information, then After tunnel ingress receives congestion level information, then
congestion management actions could be taken based on the congestion management actions could be taken based on the
information, e.g. if the congestion level is higher than a predefined information, e.g. if the congestion level is higher than a predefined
threshold, then action could be taken to reduce the congestion level. threshold, then action could be taken to reduce the congestion level.
The design of network side congestion management SHOULD take host The design of network side congestion management SHOULD take host
side e2e congestion control mechanism into consideration, which means side e2e congestion control mechanism into consideration, which means
the congestion management needs to avoid the impacts on e2e the congestion management needs to avoid the impacts on e2e
congestion control. For instance, congestion management action must congestion control. For instance, congestion management action must
be delayed by more than a worst-case global RTT (e.g. 100ms), be delayed by more than a worst-case global RTT (e.g. 100ms),
otherwise tunnel traffic management will not give normal e2e otherwise tunnel traffic management will not give normal e2e
congestion control enough time to do its job, and the system could go congestion control enough time to do its job, and the system could go
unstable. unstable.
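The delay requirement above can be illustrated with a minimal sketch. It assumes the ingress samples a congestion level periodically and acts only once the level has stayed above the threshold for longer than a hold time set to the worst-case RTT. The class name, the threshold value and the reading of the delay as a persistence timer are illustrative assumptions and are not defined by this document.

```python
class DelayedCongestionManager:
    """Hypothetical helper: acts only on congestion that persists."""

    def __init__(self, threshold, hold_time=0.1):
        # threshold: congestion level above which action is considered
        # hold_time: seconds to wait, e.g. a worst-case global RTT (100 ms)
        self.threshold = threshold
        self.hold_time = hold_time
        self._above_since = None  # time at which the level first exceeded the threshold

    def should_act(self, level, now):
        """Return True only if level has exceeded the threshold for more than hold_time."""
        if level <= self.threshold:
            # Congestion cleared (possibly by e2e congestion control): reset the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now
        return (now - self._above_since) > self.hold_time
```

With this shape, a short burst that e2e congestion control resolves within the hold time never triggers tunnel-level action.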
The detailed description of congestion management is out of scope of The detailed description of congestion management is out of scope of
this document. As an example, congestion management such as a circuit this document. As an example, congestion management such as a circuit
breaker [CB] could be applied. A circuit breaker is an automatic breaker [RFC8084] could be applied. A circuit breaker is an automatic
mechanism to estimate congestion, and to terminate flow(s) when mechanism to estimate congestion, and to terminate flow(s) when
persistent congestion is detected to prevent network congestion persistent congestion is detected to prevent network congestion
collapse. collapse.
6.1 Example 6.1 Example
This subsection provides an example of how the solution described in This subsection provides an example of how the solution described in
this document could work. this document could work.
First of all, IPFIX template records are exchanged between ingress First of all, IPFIX template records are exchanged between ingress
and egress to negotiate the format of data record, the example here and egress to negotiate the format of data record, the example here
is to measure the congestion level for the overall tunnel (caused by is to measure the congestion level for the overall tunnel (caused by
all the traffic in tunnel). After the negotiation is finished, all the traffic in tunnel). After the negotiation is finished,
ingress sends in-band message to egress, the message contains the ingress sends in-band message to egress, the message contains the
number of each kind of ECN-marked packets (i.e. CE|CE, ECT|N-ECT and number of each kind of ECN-marked packets (i.e. CE|CE, ECT|N-ECT and
ECT|ECT) received until the sending of message. ECT|ECT) received until the sending of message.
After egress receives the message, the egress counts number of After egress receives the message, the egress calculates CE-marked
different kinds of ECN-marking packets received until receiving the packet ratio and counts number of different kinds of ECN-marking
message, then the egress sends a feedback message containing the packets received until receiving the message, then the egress sends a
counts together with the information in ingress's message to ingress. feedback message containing the counts together with the information
in ingress's message to ingress.
Figure 3 to Figure 6 below show the example procedure between ingress Figure 3 to Figure 6 below show the example procedure between ingress
and egress. and egress.
+---------------------------------+----------------------+ +---------------------------------+----------------------+
|Set ID=2 Length=40 | |Set ID=2 Length=40 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|Template ID=256 Field Count =8 | |Template ID=256 Field Count =8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnCeCePacketTotalCount Field Length=8 | |tunnelEcnCeCeByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnEctNectPacketTotalCount Field Length=8 | |tunnelEcnEctNectByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnEctEctPacketTotalCount Field Length=8 | |tunnelEcnEctEctByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnCeCePacketTotalCount Field Length=8 | |tunnelEcnCeCeByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnEctNectPacketTotalCount Field Length=8 | |tunnelEcnEctNectByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnEctEctPacketTotalCount Field Length=8 | |tunnelEcnEctEctByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnCeNectPacketTotalCount Field Length=8 | |tunnelEcnCeNectByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnCeEctPacketTotalCount | Field Length=8 | |tunnelEcnCeEctByteTotalCount | Field Length=8 |
+---------------------------------+----------------------+
|tunnelEcnCEMarkedRatio | Field Length=4 |
+---------------------------------+----------------------+ +---------------------------------+----------------------+
Figure 3: Template Record Sent From Egress to Ingress Figure 3: Template Record Sent From Egress to Ingress
+---------------------------------+----------------------+ +---------------------------------+----------------------+
|Set ID=2 Length=28 | |Set ID=2 Length=28 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|Template ID=257 Field Count =3 | |Template ID=257 Field Count =3 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnCeCePacketTotalCount Field Length=8 | |tunnelEcnCeCeByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnEctNectPacketTotalCount Field Length=8 | |tunnelEcnEctNectByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
|tunnelEcnEctEctPacketTotalCount Field Length=8 | |tunnelEcnEctEctByteTotalCount Field Length=8 |
|---------------------------------|----------------------| |---------------------------------|----------------------|
Figure 4: Template Record Sent From Ingress to Egress Figure 4: Template Record Sent From Ingress to Egress
+-------+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-------+ +-------+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-------+
| | |M| |P| |P| |P| |M| |P| |P| | | | | |M| |P| |P| |P| |M| |P| |P| | |
| | +-+ +-+ +-+ +-+ +-+ +-+ +-+ | | | | +-+ +-+ +-+ +-+ +-+ +-+ +-+ | |
| |<---------------------------------------| | | |<---------------------------------------| |
| | | | | | | |
| | | | | | | |
|egress | +-+ +-+ |ingress| |egress | +-+ +-+ |ingress|
skipping to change at page 13, line 28 skipping to change at page 14, line 4
+-+ +-+
|M| : Message Packet |M| : Message Packet
+-+ +-+
+-+ +-+
|P| : User Packet |P| : User Packet
+-+ +-+
Figure 5 Traffic flow Between Ingress and Egress Figure 5 Traffic flow Between Ingress and Egress
Set ID=257, Length=28 Set ID=257, Length=28
+------+ A1 +------+ +------+ A1 +------+
| | B1 | | | | B1 | |
| | C1 | | | | C1 | |
| | <----------------------------- | | | | <----------------------------- | |
| | | | | | | |
| | | | | | | |
| | SetID=256, Length=68 | | | | SetID=256, Length=72 | |
| | A1 | | | | A1 | |
| | B1 | | | | B1 | |
|egress| C1 ingress| |egress| C1 ingress|
| | A2 | | | | A2 | |
| | B2 | | | | B2 | |
| | C2 | | | | C2 | |
| | D | | | | D | |
| | E | | | | E | |
| | R | |
| | ----------------------------> | | | | ----------------------------> | |
| | | | | | | |
+------+ +------+ +------+ +------+
Figure 6: Message Between Ingress and Egress Figure 6: Message Between Ingress and Egress
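The Set lengths shown in Figure 6 can be checked with a short sketch. It assumes the RFC 7011 encoding: a 4-octet Set Header (Set ID and Length, 16 bits each) followed by fixed-length fields in network byte order, with unsigned64 counters taking 8 octets and the float32 ratio taking 4. The function name and the sample values are hypothetical, and the enclosing IPFIX Message Header is omitted.

```python
import struct

def encode_data_set(set_id, counters, ratio=None):
    """Encode one IPFIX Data Set holding a single Data Record.

    counters: unsigned64 totals, in template order
    ratio: optional float32 CE-marked ratio appended after the counters
    """
    body = struct.pack(f"!{len(counters)}Q", *counters)
    if ratio is not None:
        body += struct.pack("!f", ratio)
    length = 4 + len(body)  # the Set Header itself counts towards Length
    return struct.pack("!HH", set_id, length) + body

# Ingress-to-egress message: three counters (A1, B1, C1).
ingress_msg = encode_data_set(257, [100, 2000, 5000])
# Egress-to-ingress message: eight counters, with and without the ratio R.
feedback_old = encode_data_set(256, [100, 2000, 5000, 300, 1900, 4500, 50, 150])
feedback_new = encode_data_set(256, [100, 2000, 5000, 300, 1900, 4500, 50, 150], ratio=0.05)
```

The three encoded Sets are 28, 68 and 72 octets long, matching the Length values in the figure for the two revisions.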
The following provides an example of how tunnel congestion level The following provides an example of how tunnel congestion level
could be calculated: could be calculated:
Congestion Level could be divided into two categories: (1) slight Congestion Level could be divided into two categories: (1) slight
congestion (no packets dropped); (2) serious congestion (packet congestion (no packets dropped); (2) serious congestion (packet
dropping happens). dropping happens).
For slight congestion, the congestion level is indicated as the For slight congestion, the congestion level is indicated as the ratio
number of CE-marked packets: of CE-marked packets:
ce_marked = (A2 + D + E) - A1; ce_marked = R;
For serious congestion, the congestion level is indicated as the For serious congestion, the congestion level is indicated as the
number of lost packets: volume of lost traffic:
total_ingress = (A1 + B1 + C1) total_ingress = (A1 + B1 + C1)
total_egress = (A2 + B2 + C2 + D + E) total_egress = (A2 + B2 + C2 + D + E)
packet_loss = (total_ingress - total_egress) volume_loss = (total_ingress - total_egress)
7. Security Considerations 7. Security Considerations
This document describes the tunnel congestion calculation and This document describes the tunnel congestion calculation and
feedback. feedback.
The tunnel endpoints are assumed to be deployed in the same The tunnel endpoints are assumed to be deployed in the same
administrative domain, so the ingress and egress will trust each administrative domain, so the ingress and egress will trust each
other. The signaling traffic between ingress and egress will be other. The signaling traffic between ingress and egress will be
protected using the security mechanisms provided by IPFIX (see protected using the security mechanisms provided by IPFIX (see
Section 11 of RFC7011). Section 11 of RFC7011).
From the consideration of privacy point of view, in case of fine From the consideration of privacy point of view, in case of fine
skipping to change at page 15, line 6 skipping to change at page 15, line 30
This document defines a set of new IPFIX Information Elements This document defines a set of new IPFIX Information Elements
(IE), which need to be registered at IANA IPFIX Information Element (IE), which need to be registered at IANA IPFIX Information Element
Registry. Registry.
ElementID: TBD1 ElementID: TBD1
Name:tunnelEcnCeCePacketTotalCount Name:tunnelEcnCeCePacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|CE ECN Description:The total number of bytes of incoming packets with CE|CE
marking combination for this Flow at the Observation Point since the ECN marking combination at the Observation Point since the Metering
Metering Process (re-)initialization for this Observation Point. Process (re-)initialization for this Observation Point.
Units: packets Units: octets
ElementID: TBD2 ElementID: TBD2
Name:tunnelEcnEct0NectPacketTotalCount Name:tunnelEcnEct0NectPacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with ECT(0)|N-ECT Description:The total number of bytes of incoming packets with
ECN marking combination for this Flow at the Observation Point since ECT(0)|N-ECT ECN marking combination at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Units: packets Units: octets
ElementID: TBD3 ElementID: TBD3
Name: tunnelEcnEct1NectPacketTotalCount Name: tunnelEcnEct1NectPacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with ECT(1)|N-ECT Description:The total number of bytes of incoming packets with
ECN marking combination for this Flow at the Observation Point since ECT(1)|N-ECT ECN marking combination at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Units: packets Units: octets
ElementID: TBD4 ElementID: TBD4
Name:tunnelEcnCeNectPacketTotalCount Name:tunnelEcnCeNectPacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|N-ECT ECN Description:The total number of bytes of incoming packets with CE|N-
marking combination for this Flow at the Observation Point since the ECT ECN marking combination at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Units: packets Units: octets
ElementID: TBD5 ElementID: TBD5
Name:tunnelEcnCeEct0PacketTotalCount Name:tunnelEcnCeEct0PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|ECT(0) ECN Description:The total number of bytes of incoming packets with
marking combination for this Flow at the Observation Point since the CE|ECT(0) ECN marking combination at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Units: packets Units: octets
ElementID: TBD6 ElementID: TBD6
Name:tunnelEcnCeEct1PacketTotalCount Name:tunnelEcnCeEct1PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|ECT(1) ECN Description:The total number of bytes of incoming packets with
marking combination for this Flow at the Observation Point since the CE|ECT(1) ECN marking combination at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Units: packets Units: octets
ElementID: TBD7 ElementID: TBD7
Name:tunnelEcnEct0Ect0PacketTotalCount Name:tunnelEcnEct0Ect0PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with ECT(0)|ECT(0) Description:The total number of bytes of incoming packets with
ECN marking combination for this Flow at the Observation Point since ECT(0)|ECT(0) ECN marking combination at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Units: packets Units: octets
ElementID: TBD8 ElementID: TBD8
Name:tunnelEcnEct1Ect1PacketTotalCount Name:tunnelEcnEct1Ect1PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with Description:The total number of bytes of incoming packets with
ECT(1)|ECT(1) ECN marking combination for this Flow at the Observation ECT(1)|ECT(1) ECN marking combination at the Observation Point since
Point since the Metering Process (re-)initialization for this the Metering Process (re-)initialization for this Observation Point.
Observation Point. Units: octets
Units: packets
ElementID: TBD9
Name: tunnelEcnCEMarkedRatio
Data Type: float32
Status: current
Description: The ratio of CE-marked packets at the Observation Point.
[TO BE REMOVED: This registration should take place at the following [TO BE REMOVED: This registration should take place at the following
location: http://www.iana.org/assignments/ipfix/ipfix.xhtml#ipfix- location: http://www.iana.org/assignments/ipfix/ipfix.xhtml#ipfix-
information-elements] information-elements]
9. References 9. References
9.1 Normative References 9.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
skipping to change at page 17, line 44 skipping to change at page 18, line 24
Reviewers of IP Flow Information Export (IPFIX) Reviewers of IP Flow Information Export (IPFIX)
Information Elements", BCP 184, RFC 7013, September 2013, Information Elements", BCP 184, RFC 7013, September 2013,
<http://www.rfc-editor.org/info/rfc7013>. <http://www.rfc-editor.org/info/rfc7013>.
[CONEX] Matt Mathis, Bob Briscoe. "Congestion Exposure (ConEx) [CONEX] Matt Mathis, Bob Briscoe. "Congestion Exposure (ConEx)
Concepts, Abstract Mechanism and Requirements", RFC7713, Concepts, Abstract Mechanism and Requirements", RFC7713,
December 2015 December 2015
9.2 Informative References 9.2 Informative References
[CB] G. Fairhurst. "Network Transport Circuit Breakers", draft-ietf- [RFC8084] G. Fairhurst. "Network Transport Circuit Breakers", BCP 208,
tsvwg-circuit-breaker-01, April 02, 2015 RFC 8084, March 2017
10. Acknowledgements 10. Acknowledgements
Thanks to Bob Briscoe for his insightful suggestions on the basic Thanks to Bob Briscoe for his insightful suggestions on the basic
mechanisms of congestion information collection and many other useful mechanisms of congestion information collection and many other useful
comments. Thanks to David Black for his useful technical suggestions. comments. Thanks to David Black for his useful technical suggestions.
Also, thanks to Anthony Chan, Jake Holland, John Kaippallimalil and Also, thanks to Anthony Chan, Jake Holland, John Kaippallimalil and
Vincent Roca for their careful reviews. Vincent Roca for their careful reviews.
Authors' Addresses Authors' Addresses
Xinpeng Wei Xinpeng Wei
Beiqing Rd. Z-park No.156, Haidian District, Beiqing Rd. Z-park No.156, Haidian District,
Beijing, 100095, P. R. China Beijing, 100095, P. R. China
E-mail: weixinpeng@huawei.com E-mail: weixinpeng@huawei.com
 End of changes. 91 change blocks. 
169 lines changed or deleted 202 lines changed or added
