draft-ietf-tcpm-proportional-rate-reduction-01.txt | draft-ietf-tcpm-proportional-rate-reduction-02.txt | |||
---|---|---|---|---|
TCP Maintenance Working Group M. Mathis | TCP Maintenance Working Group M. Mathis | |||
Internet-Draft N. Dukkipati | Internet-Draft N. Dukkipati | |||
Intended status: Experimental Y. Cheng | Intended status: Experimental Y. Cheng | |||
Expires: August 27, 2012 Google, Inc | Expires: January 14, 2013 Google, Inc | |||
February 24, 2012 | July 13, 2012 | |||
Proportional Rate Reduction for TCP | Proportional Rate Reduction for TCP | |||
draft-ietf-tcpm-proportional-rate-reduction-01.txt | draft-ietf-tcpm-proportional-rate-reduction-02.txt | |||
Abstract | Abstract | |||
This document describes an experimental algorithm, Proportional Rate | This document describes an experimental algorithm, Proportional Rate | |||
Reduction (PPR) to improve the accuracy of the amount of data sent by | Reduction (PPR) to improve the accuracy of the amount of data sent by | |||
TCP during loss recovery. Standard Congestion Control requires that | TCP during loss recovery. Standard Congestion Control requires that | |||
TCP and other protocols reduce their congestion window in response to | TCP and other protocols reduce their congestion window in response to | |||
losses. This window reduction naturally occurs in the same round | losses. This window reduction naturally occurs in the same round | |||
trip as the data retransmissions to repair the losses, and is | trip as the data retransmissions to repair the losses, and is | |||
implemented by choosing not to transmit any data in response to some | implemented by choosing not to transmit any data in response to some | |||
ACKs arriving from the receiver. Two widely deployed algorithms are | ACKs arriving from the receiver. Two widely deployed algorithms are | |||
used to implement this window reduction: Fast Recovery and Rate | used to implement this window reduction: Fast Recovery and Rate | |||
Halving. Both algorithms are needlessly fragile under a number of | Halving. Both algorithms are needlessly fragile under a number of | |||
conditions, particularly when there is a burst of losses that such | conditions, particularly when there is a burst of losses such that | |||
that the number of ACKs returning to the sender is small. | the number of ACKs returning to the sender is small. Proportional | |||
Proportional Rate Reduction minimizes these excess window reductions | Rate Reduction minimizes these excess window reductions such that at | |||
such that at the end of recovery the actual window size will be as | the end of recovery the actual window size will be as close as | |||
close as possible to ssthresh, the window size determined by the | possible to ssthresh, the window size determined by the congestion | |||
congestion control algorithm. It is patterned after Rate Halving, | control algorithm. It is patterned after Rate Halving, but using the | |||
but using the fraction that is appropriate for target window chosen | fraction that is appropriate for target window chosen by the | |||
by the congestion control algorithm. | congestion control algorithm. | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on August 27, 2012. | This Internet-Draft will expire on January 14, 2013. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 33 | skipping to change at page 2, line 33 | |||
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
3.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 3.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
4. Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 4. Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
5. Measurements . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 5. Measurements . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
6. Conclusion and Recommendations . . . . . . . . . . . . . . . . 12 | 6. Conclusion and Recommendations . . . . . . . . . . . . . . . . 12 | |||
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 | 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . . 13 | ||||
10.2. Informative References . . . . . . . . . . . . . . . . . . 14 | ||||
Appendix A. Packet Conservation Bound . . . . . . . . . . . . . . 14 | Appendix A. Packet Conservation Bound . . . . . . . . . . . . . . 14 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
1. Introduction | 1. Introduction | |||
This document describes an experimental algorithm, Proportional Rate | This document describes an experimental algorithm, Proportional Rate | |||
Reduction (PPR) to improve the accuracy of the amount of data sent by | Reduction (PPR) to improve the accuracy of the amount of data sent by | |||
TCP during loss recovery. | TCP during loss recovery. | |||
Standard Congestion Control [RFC5681] requires that TCP (and other | Standard Congestion Control [RFC5681] requires that TCP (and other | |||
protocols) reduce their congestion window in response to losses. | protocols) reduce their congestion window in response to losses. | |||
Fast Recovery, described in the same document, is the reference | Fast Recovery, described in the same document, is the reference | |||
algorithm for making this adjustment. It's stated goal is to recover | algorithm for making this adjustment. Its stated goal is to recover | |||
TCP's self clock by relying on returning ACKs during recovery to | TCP's self clock by relying on returning ACKs during recovery to | |||
clock more data into the network. Fast Recovery adjusts the window | clock more data into the network. Fast Recovery adjusts the window | |||
by waiting for one half RTT of ACKs to pass before sending any data. | by waiting for one half RTT of ACKs to pass before sending any data. | |||
It is fragile because it can not compensate for the implicit window | It is fragile because it can not compensate for the implicit window | |||
reduction caused by the losses themselves, and is exposed to | reduction caused by the losses themselves, and is exposed to | |||
timeouts. For example if half of the data or ACKs are lost, Fast | timeouts. For example if half of the data or ACKs are lost, Fast | |||
Recovery's expected behavior would be wait for half window of ACKs to | Recovery's expected behavior would be to wait for a half of a window | |||
pass and then not receive any ACKs for the recovery and suffer a | of (remaining) ACKs to pass. It would then not receive any of the | |||
timeout. | ACKs needed for recovery and suffer a timeout. | |||
The rate-halving algorithm improves this situation by sending data on | The rate-halving algorithm improves this situation by sending data on | |||
alternate ACKs during recovery, such that after one RTT the window | alternate ACKs during recovery, such that after one RTT the window | |||
has been halved. Rate-having is implemented in Linux after only | has been halved. Rate-halving is implemented in Linux after only | |||
being informally published [RHweb], including an uncompleted | being informally published [RHweb], including an uncompleted | |||
Internet-Draft [RHID]. Rate-halving does not adequately compensate | Internet-Draft [RHID]. Rate-halving also does not adequately | |||
for the implicit window reduction caused by the losses and assumes a | compensate for the implicit window reduction caused by the losses and | |||
50% window reduction, which was completely standard at the time it | assumes a 50% window reduction, which was completely standard at the | |||
was written, but not appropriate for modern congestion control | time it was written, but not appropriate for modern congestion | |||
algorithms such as Cubic [CUBIC], which can reduce the window by less | control algorithms such as Cubic [CUBIC], which reduce the window by | |||
than 50%. As a consequence rate-halving often allows the window to | less than 50%. As a consequence rate-halving often allows the window | |||
fall further than necessary, reducing performance and increasing the | to fall further than necessary, reducing performance and increasing | |||
risk of timeouts if there are additional losses. | the risk of timeouts if there are additional losses. | |||
Proportional Rate Reduction (PPR) avoids these excess window | Proportional Rate Reduction (PPR) avoids these excess window | |||
reductions such that at the end of recovery the actual window size | reductions such that at the end of recovery the actual window size | |||
will be as close as possible to, ssthresh, the window size determined | will be as close as possible to ssthresh, the window size determined | |||
by the congestion control algorithm. It is patterned after Rate | by the congestion control algorithm. It is patterned after Rate | |||
Halving, but using the fraction that is appropriate for target window | Halving, but using the fraction that is appropriate for the target | |||
chosen by the congestion control algorithm. During PRR one of two | window chosen by the congestion control algorithm. During PRR one of | |||
additional reduction bound algorithms limits the total window | two additional reduction bound algorithms limits the total window | |||
reduction due to all mechanisms, including application stalls and the | reduction due to all mechanisms, including application stalls and the | |||
losses themselves. | losses themselves. | |||
We describe two slightly different reduction bound algorithms: | We describe two slightly different reduction bound algorithms: | |||
conservative reduction bound (CRB), which is strictly packet | conservative reduction bound (CRB), which is strictly packet | |||
conserving; and a slow start reduction bound (SSRB), which is more | conserving; and a slow start reduction bound (SSRB), which is more | |||
aggressive than CRB by at most one segment per ACK. PRR-CRB meets | aggressive than CRB by at most one segment per ACK. PRR-CRB meets | |||
the strong conservative bound described in Appendix A, however in | the strong conservative bound described in Appendix A, however in | |||
real networks it does not perform as well as the algorithms described | real networks it does not perform as well as the algorithms described | |||
in RFC 3517, which prove to be non-conservative in a significant | in RFC 3517, which prove to be non-conservative in a significant | |||
skipping to change at page 4, line 31 | skipping to change at page 4, line 31 | |||
We evaluated these and other algorithms in a large scale measurement | We evaluated these and other algorithms in a large scale measurement | |||
study, summarized below. The most important results from that study | study, summarized below. The most important results from that study | |||
are presented in an companion paper [IMC11]. PRR+SSRB outperforms | are presented in an companion paper [IMC11]. PRR+SSRB outperforms | |||
both RFC 3517 and Linux Rate Halving under authentic network traffic, | both RFC 3517 and Linux Rate Halving under authentic network traffic, | |||
even though it is less aggressive than RFC 3517. | even though it is less aggressive than RFC 3517. | |||
The algorithms are described as modifications to RFC 5681 [RFC5681], | The algorithms are described as modifications to RFC 5681 [RFC5681], | |||
TCP Congestion Control, using concepts drawn from the pipe algorithm | TCP Congestion Control, using concepts drawn from the pipe algorithm | |||
[RFC3517]. They are most accurate and more easily implemented with | [RFC3517]. They are most accurate and more easily implemented with | |||
SACK [RFC2018], but they do not require SACK. | SACK [RFC2018], but do not require SACK. | |||
2. Definitions | 2. Definitions | |||
The following terms, parameters and state variables are used as they | The following terms, parameters and state variables are used as they | |||
are defined in earlier documents: | are defined in earlier documents: | |||
RFC 3517: covered (as in "covered sequence numbers") | RFC 3517: covered (as in "covered sequence numbers") | |||
RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size | RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size | |||
(SMSS) | (SMSS) | |||
skipping to change at page 5, line 35 | skipping to change at page 5, line 35 | |||
decision of which data to send (e.g. retransmit missing data or send | decision of which data to send (e.g. retransmit missing data or send | |||
more new data) is out of scope for this document. | more new data) is out of scope for this document. | |||
3. Algorithms | 3. Algorithms | |||
At the beginning of recovery initialize PRR state. This assumes a | At the beginning of recovery initialize PRR state. This assumes a | |||
modern congestion control algorithm, CongCtrlAlg(), that might set | modern congestion control algorithm, CongCtrlAlg(), that might set | |||
ssthresh to something other than FlightSize/2: | ssthresh to something other than FlightSize/2: | |||
ssthresh = CongCtrlAlg() // Target cwnd after recovery | ssthresh = CongCtrlAlg() // Target cwnd after recovery | |||
prr_delivered = 0 // Total bytes delivered during recov | prr_delivered = 0 // Total bytes delivered during recovery | |||
prr_out = 0 // Total bytes sent during recovery | prr_out = 0 // Total bytes sent during recovery | |||
RecoverFS = snd.nxt-snd.una // FlightSize at the start of recov | RecoverFS = snd.nxt-snd.una // FlightSize at the start of recovery | |||
On every ACK during recovery compute: | On every ACK during recovery compute: | |||
DeliveredData = delta(snd.una) + delta(SACKd) | DeliveredData = change_in(snd.una) + change_in(SACKd) | |||
prr_delivered += DeliveredData | prr_delivered += DeliveredData | |||
pipe = (RFC 3517 pipe algorithm) | pipe = (RFC 3517 pipe algorithm) | |||
if (pipe > ssthresh) { | if (pipe > ssthresh) { | |||
// Proportional Rate Reduction | // Proportional Rate Reduction | |||
sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out | sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out | |||
} else { | } else { | |||
// Two version of the reduction bound | // Two version of the reduction bound | |||
if (conservative) { // PRR+CRB | if (conservative) { // PRR+CRB | |||
limit = prr_delivered - prr_out | limit = prr_delivered - prr_out | |||
} else { // PRR+SSRB | } else { // PRR+SSRB | |||
skipping to change at page 6, line 30 | skipping to change at page 6, line 30 | |||
} | } | |||
On any data transmission or retransmission: | On any data transmission or retransmission: | |||
prr_out += (data sent) // strictly less than or equal to sndcnt | prr_out += (data sent) // strictly less than or equal to sndcnt | |||
3.1. Examples | 3.1. Examples | |||
We illustrate these algorithms by showing their different behaviors | We illustrate these algorithms by showing their different behaviors | |||
for two scenarios: TCP experiencing either a single loss or a burst | for two scenarios: TCP experiencing either a single loss or a burst | |||
of 15 consecutive losses. In all cases we assume bulk data, standard | of 15 consecutive losses. In all cases we assume bulk data (no | |||
AIMD congestion control and cwnd = FlightSize = pipe = 20 segments, | application pauses), standard AIMD congestion control and cwnd = | |||
so ssthresh will be set to 10 at the beginning of recovery. We also | FlightSize = pipe = 20 segments, so ssthresh will be set to 10 at the | |||
assume standard Fast Retransmit and Limited Transmit, so we send two | beginning of recovery. We also assume standard Fast Retransmit and | |||
new segments followed by one retransmit on the first 3 duplicate ACKs | Limited Transmit, so TCP will send two new segments followed by one | |||
after the losses. | retransmit in response to the first 3 duplicate ACKs following the | |||
losses. | ||||
Each of the diagrams below shows the per ACK response to the first | Each of the diagrams below shows the per ACK response to the first | |||
round trip for the various recovery algorithms when the zeroth | round trip for the various recovery algorithms when the zeroth | |||
segment is lost. The top line indicates the transmitted segment | segment is lost. The top line indicates the transmitted segment | |||
number triggering the ACKs, with an X for the lost segment. "cwnd" | number triggering the ACKs, with an X for the lost segment. "cwnd" | |||
and "pipe" indicate the values of these algorithms after processing | and "pipe" indicate the values of these algorithms after processing | |||
each returning ACK. "Sent" indicates how much 'N'ew or | each returning ACK. "Sent" indicates how much 'N'ew or | |||
'R'etransmitted data would be sent. Note that the algorithms for | 'R'etransmitted data would be sent. Note that the algorithms for | |||
deciding which data to send are out of scope of this document. | deciding which data to send are out of scope of this document. | |||
When there is a single loss, PRR with either of the reduction bound | When there is a single loss, PRR with either of the reduction bound | |||
algorithms has the same behavior. We show "RB", a flag indicating | algorithms has the same behavior. We show "RB", a flag indicating | |||
which reduction bound subexpression ultimately determined the value | which reduction bound subexpression ultimately determined the value | |||
of sndcnt. When there is minimal losses "limit" (both algorithms) | of sndcnt. When there is minimal losses "limit" (both algorithms) | |||
will always be larger than ssthresh - pipe, so the sndcnt will be | will always be larger than ssthresh - pipe, so the sndcnt will be | |||
ssthresh - pipe indicated by "s" in the "RB" row. Since PRR does not | ssthresh - pipe indicated by "s" in the "RB" row. | |||
use cwnd during recovery it is not shown in the example. | ||||
RFC 3517 | RFC 3517 | |||
ack# X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | ack# X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | |||
cwnd: 20 20 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 | cwnd: 20 20 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 | |||
pipe: 19 19 18 18 17 16 15 14 13 12 11 10 10 10 10 10 10 10 10 | pipe: 19 19 18 18 17 16 15 14 13 12 11 10 10 10 10 10 10 10 10 | |||
sent: N N R N N N N N N N N | sent: N N R N N N N N N N N | |||
Rate Halving (Linux) | Rate Halving (Linux) | |||
ack# X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | ack# X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | |||
cwnd: 20 20 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 11 | cwnd: 20 20 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 11 | |||
pipe: 19 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 11 10 | pipe: 19 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 11 10 | |||
sent: N N R N N N N N N N N | sent: N N R N N N N N N N N | |||
PRR | PRR | |||
ack# X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | ack# X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | |||
pipe: 19 19 18 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 10 | pipe: 19 19 18 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 10 | |||
sent: N N R N N N N N N N N | sent: N N R N N N N N N N N | |||
RB: s s | RB: s s | |||
Cwnd is not shown because PRR does not use it. | ||||
Note that all three algorithms send same total amount of data. RFC | Key for RB | |||
3517 experiences a "half-window of silence", while the Rate Halving | s: sndcnt = ssthresh - pipe // from ssthresh | |||
and PRR spread the voluntary window reduction across an entire RTT. | b: sndcnt = prr_delivered - prr_out + SMSS // from banked | |||
d: sndcnt = DeliveredData + SMSS // from DeliveredData | ||||
(Sometimes more than one applies) | ||||
Note that all three algorithms send the same total amount of data. | ||||
RFC 3517 experiences a "half-window of silence", while the Rate | ||||
Halving and PRR spread the voluntary window reduction across an | ||||
entire RTT. | ||||
Next we consider the same initial conditions when the first 15 | Next we consider the same initial conditions when the first 15 | |||
packets (0-14) are lost. During the remainder of the lossy RTT, only | packets (0-14) are lost. During the remainder of the lossy RTT, only | |||
5 ACKs are returned to the sender. We examine each of these | 5 ACKs are returned to the sender. We examine each of these | |||
algorithms in succession. | algorithms in succession. | |||
RFC 3517 | RFC 3517 | |||
ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | |||
cwnd: 20 20 11 11 11 | cwnd: 20 20 11 11 11 | |||
pipe: 19 19 4 10 10 | pipe: 19 19 4 10 10 | |||
skipping to change at page 8, line 21 | skipping to change at page 8, line 21 | |||
Rate Halving (Linux) | Rate Halving (Linux) | |||
ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | |||
cwnd: 20 20 5 5 5 | cwnd: 20 20 5 5 5 | |||
pipe: 19 19 4 4 4 | pipe: 19 19 4 4 4 | |||
sent: N N R R R | sent: N N R R R | |||
PRR-CRB | PRR-CRB | |||
ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | |||
pipe: 19 19 4 4 4 | pipe: 19 19 4 4 4 | |||
sent: N N R R R | sent: N N R R R | |||
RB: f f f | RB: b b b | |||
PRR-SSRB | PRR-SSRB | |||
ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | ack# X X X X X X X X X X X X X X X 15 16 17 18 19 | |||
pipe: 19 19 4 5 6 | pipe: 19 19 4 5 6 | |||
sent: N N 2R 2R 2R | sent: N N 2R 2R 2R | |||
RB: d d d | RB: bd d d | |||
In this specific situation, RFC 3517 is very non-conservative, | In this specific situation, RFC 3517 is very non-conservative, | |||
because once fast retransmit is triggered (on the ACK for segment 17) | because once fast retransmit is triggered (on the ACK for segment 17) | |||
TCP immediately retransmits sufficient data to bring pipe up to cwnd. | TCP immediately retransmits sufficient data to bring pipe up to cwnd. | |||
Our measurement data (see Section 5) indicates that RFC 3517 | Our measurement data (see Section 5) indicates that RFC 3517 | |||
significantly outperforms Rate Halving, PRR-CRB and some other | significantly outperforms Rate Halving, PRR-CRB and some other | |||
similarly conservative algorithms that we tested, suggesting that it | similarly conservative algorithms that we tested, suggesting that it | |||
is significantly common for the actual losses to exceed the window | is significantly common for the actual losses to exceed the window | |||
reduction determined by the congestion control algorithm. | reduction determined by the congestion control algorithm. | |||
skipping to change at page 9, line 5 | skipping to change at page 9, line 5 | |||
the conservative reduction bound [RHweb]. In this situation the five | the conservative reduction bound [RHweb]. In this situation the five | |||
ACKs trigger exactly one transmission each (2 new data, 3 old data), | ACKs trigger exactly one transmission each (2 new data, 3 old data), | |||
and cwnd is set to 5. At a window size of 5, it takes three round | and cwnd is set to 5. At a window size of 5, it takes three round | |||
trips to retransmit all 15 lost segments. Rate Halving does not | trips to retransmit all 15 lost segments. Rate Halving does not | |||
raise the window at all during recovery, so when recovery finally | raise the window at all during recovery, so when recovery finally | |||
completes, TCP will slowstart cwnd from 5 up to 10. In this example, | completes, TCP will slowstart cwnd from 5 up to 10. In this example, | |||
TCP operates at half of the window chosen by the congestion control | TCP operates at half of the window chosen by the congestion control | |||
for more than three RTTs, increasing the elapsed time and exposing it | for more than three RTTs, increasing the elapsed time and exposing it | |||
to timeouts in the event that there are additional losses. | to timeouts in the event that there are additional losses. | |||
PRR-CRB implements conservative reduction bound. Since the total | PRR-CRB implements a conservative reduction bound. Since the total | |||
losses bring pipe below ssthresh, data is sent such that the total | losses bring pipe below ssthresh, data is sent such that the total | |||
data transmitted, prr_out, follows the total data delivered to the | data transmitted, prr_out, follows the total data delivered to the | |||
receiver as reported by returning ACKs. Transmission is controlled | receiver as reported by returning ACKs. Transmission is controlled | |||
by the sending limit, which was set to prr_delivered - prr_out. This | by the sending limit, which was set to prr_delivered - prr_out. This | |||
is indicated by the RB:f tagging in the figure. In this case PRR-CRB | is indicated by the RB:b tagging in the figure. In this case PRR-CRB | |||
is exposed to exactly the same problems as Rate Halving, the excess | is exposed to exactly the same problems as Rate Halving, the excess | |||
window reduction causes it to take excessively long to recover the | window reduction causes it to take excessively long to recover the | |||
losses and exposes it to additional timeouts. | losses and exposes it to additional timeouts. | |||
PRR-SSRB increases the window by exactly 1 segment per ACK until pipe | PRR-SSRB increases the window by exactly 1 segment per ACK until pipe | |||
rises to sshthresh during recovery. This is accomplished by setting | rises to ssthresh during recovery. This is accomplished by setting | |||
limit to one greater than the data reported to have been delivered to | limit to one greater than the data reported to have been delivered to | |||
the receiver on this ACK, implementing slowstart during recovery, and | the receiver on this ACK, implementing slowstart during recovery, and | |||
indicated by RB:d tagging in the figure. Although increasing the | indicated by RB:d tagging in the figure. Although increasing the | |||
window during recovery seems to be ill advised, it is important to | window during recovery seems to be ill advised, it is important to | |||
remember that this actually less aggressive than permitted by RFC | remember that this is actually less aggressive than permitted by RFC | |||
5681, which sends the same quantity of additional data as a single | 5681, which sends the same quantity of additional data as a single | |||
burst in response to the ACK that triggered Fast Retransmit | burst in response to the ACK that triggered Fast Retransmit | |||
For less extreme events, where the total losses are smaller than the | For less extreme events, where the total losses are smaller than the | |||
difference between Flight Size and ssthresh, PRR-CRB and PRR-SSRB | difference between Flight Size and ssthresh, PRR-CRB and PRR-SSRB | |||
have identical behaviours. | have identical behaviours. | |||
4. Properties | 4. Properties | |||
The following properties are common to both PRR-CRB and PRR-SSRB | The following properties are common to both PRR-CRB and PRR-SSRB | |||
skipping to change at page 11, line 31 | skipping to change at page 11, line 31 | |||
5. Measurements | 5. Measurements | |||
In a companion IMC11 paper [IMC11] we describe some measurements | In a companion IMC11 paper [IMC11] we describe some measurements | |||
comparing the various strategies for reducing the window during | comparing the various strategies for reducing the window during | |||
recovery. The results are summarized here. | recovery. The results are summarized here. | |||
The various window reduction algorithms and extensive instrumentation | The various window reduction algorithms and extensive instrumentation | |||
were all implemented in Linux 2.6. We used the uniform set of | were all implemented in Linux 2.6. We used the uniform set of | |||
algorithms present in the base Linux implementation, including CUBIC | algorithms present in the base Linux implementation, including CUBIC | |||
[CUBIC], limited transmit [RFC3742], threshold transmit from [FACK] | [CUBIC], limited transmit [RFC3042], threshold transmit from [FACK] | |||
and lost retransmission detection algorithms. We confirmed that the | and lost retransmission detection algorithms. We confirmed that the | |||
behaviors of Rate Halving (the Linux default), RFC 3517 and PRR were | behaviors of Rate Halving (the Linux default), RFC 3517 and PRR were | |||
authentic to their respective specifications and that performance and | authentic to their respective specifications and that performance and | |||
features were comparable to the kernels in production use. The | features were comparable to the kernels in production use. The | |||
different window reduction algorithms were all present in the same | different window reduction algorithms were all present in the same | |||
kernel and could be selected with a sysctl, such that we had an | kernel and could be selected with a sysctl, such that we had an | |||
absolutely uniform baseline for comparing them. | absolutely uniform baseline for comparing them. | |||
Our experiments included an additional algorithm, PRR with an | Our experiments included an additional algorithm, PRR with an | |||
unlimited bound (PRR-UB), which sends ssthresh-pipe bursts when pipe | unlimited bound (PRR-UB), which sends ssthresh-pipe bursts when pipe | |||
skipping to change at page 12, line 41 | skipping to change at page 12, line 41 | |||
Although the packet conserving bound is very appealing for a number | Although the packet conserving bound is very appealing for a number | |||
of reasons, our measurements demonstrate that it is less aggressive | of reasons, our measurements demonstrate that it is less aggressive | |||
and does not perform as well as RFC3517, which permits significant | and does not perform as well as RFC3517, which permits significant | |||
bursts of data when there are large bursts of losses. PRR-SSRB is a | bursts of data when there are large bursts of losses. PRR-SSRB is a | |||
compromise that permits TCP to send one extra segment per ACK as | compromise that permits TCP to send one extra segment per ACK as | |||
relative to the packet conserving bound. From the perspective of the | relative to the packet conserving bound. From the perspective of the | |||
packet conserving bound, PRR-SSRB does indeed open the window during | packet conserving bound, PRR-SSRB does indeed open the window during | |||
recovery, however it is significantly less aggressive than RFC3517 in | recovery, however it is significantly less aggressive than RFC3517 in | |||
the presence of burst losses. Even so, it often out performs | the presence of burst losses. Even so, it often out performs | |||
RFC3517, because it avoids some of the self inflicted losses caused | RFC3517, because it avoids some of the self inflicted losses caused | |||
by bursts from RFC3517. | by bursts. | |||
At this time we see no reason not to test and deploy PRR-SSRB on a | At this time we see no reason not to test and deploy PRR-SSRB on a | |||
large scale. Implementers worried about any potential impact of | large scale. Implementers worried about any potential impact of | |||
raising the window during recovery may want to optionally support | raising the window during recovery may want to optionally support | |||
PRR-CRB (which is actually simpler to implement) for comparison | PRR-CRB (which is actually simpler to implement) for comparison | |||
studies. | studies. | |||
One final comment about terminology: we expect that common usage will | One final comment about terminology: we expect that common usage will | |||
drop "slow start reduction bound" from the algorithm name. This | drop "slow start reduction bound" from the algorithm name. This | |||
document needed to be pedantic about having distinct names for | document needed to be pedantic about having distinct names for | |||
skipping to change at page 13, line 16 | skipping to change at page 13, line 16 | |||
7. Acknowledgements | 7. Acknowledgements | |||
This draft is based in part on previous incomplete work by Matt | This draft is based in part on previous incomplete work by Matt | |||
Mathis, Jeff Semke and Jamshid Mahdavi [RHID] and influenced by | Mathis, Jeff Semke and Jamshid Mahdavi [RHID] and influenced by | |||
several discussion with John Heffner. | several discussion with John Heffner. | |||
Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the | Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the | |||
experiments. | experiments. | |||
Ilpo Jarvinen for reviewing the code. | Ilpo Jarvinen reviewed the code. | |||
8. Security Considerations | 8. Security Considerations | |||
Proportional Rate Reduction does not change the risk profile for TCP. | Proportional Rate Reduction does not change the risk profile for TCP. | |||
Implementers that change PRR from counting bytes to segments have to | Implementers that change PRR from counting bytes to segments have to | |||
be cautious about the effects of ACK splitting attacks [Savage99], | be cautious about the effects of ACK splitting attacks [Savage99], | |||
where the receiver acknowledges partial segments for the purpose of | where the receiver acknowledges partial segments for the purpose of | |||
confusing the sender's congestion accounting. | confusing the sender's congestion accounting. | |||
9. IANA Considerations | 9. IANA Considerations | |||
This document makes no request of IANA. | This document makes no request of IANA. | |||
Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section may be removed on publication as an | |||
RFC. | RFC. | |||
10. References | 10. References | |||
10.1. Normative References | ||||
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
Selective Acknowledgment Options", RFC 2018, October 1996. | Selective Acknowledgment Options", RFC 2018, October 1996. | |||
[RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A | [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A | |||
Conservative Selective Acknowledgment (SACK)-based Loss | Conservative Selective Acknowledgment (SACK)-based Loss | |||
Recovery Algorithm for TCP", RFC 3517, April 2003. | Recovery Algorithm for TCP", RFC 3517, April 2003. | |||
[RFC3742] Floyd, S., "Limited Slow-Start for TCP with Large | ||||
Congestion Windows", RFC 3742, March 2004. | ||||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
10.2. Informative References | ||||
[RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing | ||||
TCP's Loss Recovery Using Limited Transmit", RFC 3042, | ||||
January 2001. | ||||
[IMC11] Dukkipati, N., Mathis, M., and Y. Cheng, "Proportional | [IMC11] Dukkipati, N., Mathis, M., and Y. Cheng, "Proportional | |||
Rate Reduction for TCP", ACM Internet Measurement | Rate Reduction for TCP", ACM Internet Measurement | |||
Conference IMC11, December 2011. | Conference IMC11, December 2011. | |||
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: | [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: | |||
Refining TCP Congestion Control", ACM SIGCOMM SIGCOMM96, | Refining TCP Congestion Control", ACM SIGCOMM SIGCOMM96, | |||
August 1996. | August 1996. | |||
[RHID] Mathis, M., Semke, J., Mahdavi, J., and K. Lahey, "The | [RHID] Mathis, M., Semke, J., Mahdavi, J., and K. Lahey, "The | |||
Rate-Halving Algorithm for TCP Congestion Control", draft- | Rate-Halving Algorithm for TCP Congestion Control", | |||
ratehalving (work in progress), June 1999. | draft-mathis-tcp-ratehalving (work in progress), | |||
June 1999. | ||||
[RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding | [RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding | |||
Parameters", Web publication , December 1997. | Parameters", Web publication , December 1997. | |||
[CUBIC] Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-speed | [CUBIC] Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-speed | |||
TCP variant", PFLDnet 2005, Feb 2005. | TCP variant", PFLDnet 2005, Feb 2005. | |||
[Savage99] | [Savage99] | |||
Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | |||
"TCP congestion control with a misbehaving receiver", | "TCP congestion control with a misbehaving receiver", | |||
skipping to change at page 15, line 7 | skipping to change at page 15, line 13 | |||
fluctuation due to differences in packet arrival and exit times. Any | fluctuation due to differences in packet arrival and exit times. Any | |||
less aggressive algorithm will result in a declining queue at the | less aggressive algorithm will result in a declining queue at the | |||
bottleneck. Any more aggressive algorithm will result in an | bottleneck. Any more aggressive algorithm will result in an | |||
increasing queue or additional losses if it is a full drop tail | increasing queue or additional losses if it is a full drop tail | |||
queue. | queue. | |||
We demonstrate this property with a little thought experiment: | We demonstrate this property with a little thought experiment: | |||
Imagine a network path that has insignificant delays in both | Imagine a network path that has insignificant delays in both | |||
directions, except for the processing time and queue at a single | directions, except for the processing time and queue at a single | |||
bottleneck in the forward path. By insignificant delay, I mean when | bottleneck in the forward path. By insignificant delay, we mean when | |||
a packet is "served" at the head of the bottleneck queue, the | a packet is "served" at the head of the bottleneck queue, the | |||
following events happen in much less than one bottleneck packet time: | following events happen in much less than one bottleneck packet time: | |||
the packet arrives at the receiver; the receiver sends an ACK; which | the packet arrives at the receiver; the receiver sends an ACK; which | |||
arrives at the sender; the sender processes the ACK and sends some | arrives at the sender; the sender processes the ACK and sends some | |||
data; the data is queued at the bottleneck. | data; the data is queued at the bottleneck. | |||
If sndcnt is set to DeliveredData and nothing else is inhibiting | If sndcnt is set to DeliveredData and nothing else is inhibiting | |||
sending data, then clearly the data arriving at the bottleneck queue | sending data, then clearly the data arriving at the bottleneck queue | |||
will exactly replace the data that was served at the head of the | will exactly replace the data that was served at the head of the | |||
queue, so the queue will have a constant length. If queue is drop | queue, so the queue will have a constant length. If queue is drop | |||
tail and full then the queue will stay exactly full. Losses or | tail and full then the queue will stay exactly full. Losses or | |||
reordering on the ACK path only cause wider fluctuations in the queue | reordering on the ACK path only cause wider fluctuations in the queue | |||
size, but do not raise the peak size, independent of whether the data | size, but do not raise the peak size, independent of whether the data | |||
is in order or out-of-order (including loss recovery from an earlier | is in order or out-of-order (including loss recovery from an earlier | |||
RTT). Any more aggressive algorithm which sends additional data will | RTT). Any more aggressive algorithm which sends additional data will | |||
cause a queue overflow and loss. Any less aggressive algorithm will | overflow the drop tail queue and cause loss. Any less aggressive | |||
under fill the queue. Therefore setting sndcnt to DeliveredData is | algorithm will under fill the queue. Therefore setting sndcnt to | |||
the most aggressive algorithm that does not cause forced losses in | DeliveredData is the most aggressive algorithm that does not cause | |||
this simple network. Relaxing the assumptions (e.g. making delays | forced losses in this simple network. Relaxing the assumptions (e.g. | |||
more authentic and adding more flows, delayed ACKs, etc) may | making delays more authentic and adding more flows, delayed ACKs, | |||
increases the fine grained fluctuations in queue size but does not | etc) may increases the fine grained fluctuations in queue size but | |||
change its basic behavior. | does not change its basic behavior. | |||
Note that the congestion control algorithm implements a broader | Note that the congestion control algorithm implements a broader | |||
notion of optimal that includes appropriately sharing of the network. | notion of optimal that includes appropriately sharing the network. | |||
Typical congestion control algorithms are likely to reduce the data | Typical congestion control algorithms are likely to reduce the data | |||
sent relative to the packet conserving bound implemented by PRR | sent relative to the packet conserving bound implemented by PRR | |||
bringing TCP's actual window down to ssthresh. | bringing TCP's actual window down to ssthresh. | |||
Authors' Addresses | Authors' Addresses | |||
Matt Mathis | Matt Mathis | |||
Google, Inc | Google, Inc | |||
1600 Amphitheater Parkway | 1600 Amphitheater Parkway | |||
Mountain View, California 93117 | Mountain View, California 93117 | |||
End of changes. 35 change blocks. | ||||
67 lines changed or deleted | 83 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |