draft-ietf-tcpm-proportional-rate-reduction-00.txt | draft-ietf-tcpm-proportional-rate-reduction-01.txt | |||
---|---|---|---|---|
TCP Maintenance Working Group M. Mathis | TCP Maintenance Working Group M. Mathis | |||
Internet-Draft N. Dukkipati | Internet-Draft N. Dukkipati | |||
Intended status: Experimental Y. Cheng | Intended status: Experimental Y. Cheng | |||
Expires: April 26, 2012 Google, Inc | Expires: August 27, 2012 Google, Inc | |||
October 24, 2011 | February 24, 2012 | |||
Proportional Rate Reduction for TCP | Proportional Rate Reduction for TCP | |||
draft-ietf-tcpm-proportional-rate-reduction-00.txt | draft-ietf-tcpm-proportional-rate-reduction-01.txt | |||
Abstract | Abstract | |||
This document describes an experimental algorithm, Proportional Rate | This document describes an experimental algorithm, Proportional Rate | |||
Reduction (PPR) to improve the accuracy of the amount of data sent by | Reduction (PPR) to improve the accuracy of the amount of data sent by | |||
TCP during loss recovery. Standard Congestion Control requires that | TCP during loss recovery. Standard Congestion Control requires that | |||
TCP and other protocols reduce their congestion window in response to | TCP and other protocols reduce their congestion window in response to | |||
losses. This window reduction naturally occurs in the same round | losses. This window reduction naturally occurs in the same round | |||
trip as the data retransmissions to repair the losses, and is | trip as the data retransmissions to repair the losses, and is | |||
implemented by choosing not to transmit any data in response to some | implemented by choosing not to transmit any data in response to some | |||
ACKs arriving from the receiver. Two widely deployed algorithms are | ACKs arriving from the receiver. Two widely deployed algorithms are | |||
used to implement this window reduction: Fast Recovery and Rate | used to implement this window reduction: Fast Recovery and Rate | |||
Halving. Both algorithms are needlessly fragile under a number of | Halving. Both algorithms are needlessly fragile under a number of | |||
conditions, particularly when there is a burst of losses that such | conditions, particularly when there is a burst of losses that such | |||
that the number of ACKs returning to the sender is small. | that the number of ACKs returning to the sender is small. | |||
Proportional Rate Reduction avoids these excess window reductions | Proportional Rate Reduction minimizes these excess window reductions | |||
such that at the end of recovery the actual window size will be as | such that at the end of recovery the actual window size will be as | |||
close as possible to ssthresh, the window size determined by the | close as possible to ssthresh, the window size determined by the | |||
congestion control algorithm. It is patterned after Rate Halving, | congestion control algorithm. It is patterned after Rate Halving, | |||
but using the fraction that is appropriate for target window chosen | but using the fraction that is appropriate for target window chosen | |||
by the congestion control algorithm. | by the congestion control algorithm. | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
skipping to change at page 1, line 48 | skipping to change at page 1, line 48 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 26, 2012. | This Internet-Draft will expire on August 27, 2012. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2011 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
skipping to change at page 3, line 28 | skipping to change at page 3, line 28 | |||
It is fragile because it can not compensate for the implicit window | It is fragile because it can not compensate for the implicit window | |||
reduction caused by the losses themselves, and is exposed to | reduction caused by the losses themselves, and is exposed to | |||
timeouts. For example if half of the data or ACKs are lost, Fast | timeouts. For example if half of the data or ACKs are lost, Fast | |||
Recovery's expected behavior would be wait for half window of ACKs to | Recovery's expected behavior would be wait for half window of ACKs to | |||
pass and then not receive any ACKs for the recovery and suffer a | pass and then not receive any ACKs for the recovery and suffer a | |||
timeout. | timeout. | |||
The rate-halving algorithm improves this situation by sending data on | The rate-halving algorithm improves this situation by sending data on | |||
alternate ACKs during recovery, such that after one RTT the window | alternate ACKs during recovery, such that after one RTT the window | |||
has been halved. Rate-having is implemented in Linux after only | has been halved. Rate-having is implemented in Linux after only | |||
being informally published [RHweb], including from an uncompleted | being informally published [RHweb], including an uncompleted | |||
Internet-Draft [RHID]. Rate-halving does not adequately compensate | Internet-Draft [RHID]. Rate-halving does not adequately compensate | |||
for the implicit window reduction caused by the losses and also | for the implicit window reduction caused by the losses and assumes a | |||
assumes a 50% window reduction, which was completely standard at the | 50% window reduction, which was completely standard at the time it | |||
time it was written, but not appropriate for modern congestion | was written, but not appropriate for modern congestion control | |||
control algorithms such as Cubic [CUBIC], which can reduce the window | algorithms such as Cubic [CUBIC], which can reduce the window by less | |||
by less than 50%. As a consequence rate-halving often allows the | than 50%. As a consequence rate-halving often allows the window to | |||
window to fall further than necessary, reducing performance and | fall further than necessary, reducing performance and increasing the | |||
increasing the risk of timeouts if there are additional losses. | risk of timeouts if there are additional losses. | |||
Proportional Rate Reduction (PPR) avoids these excess window | Proportional Rate Reduction (PPR) avoids these excess window | |||
reductions such that at the end of recovery the actual window size | reductions such that at the end of recovery the actual window size | |||
will be as close as possible to, ssthresh, the window size determined | will be as close as possible to, ssthresh, the window size determined | |||
by the congestion control algorithm. It is patterned after Rate | by the congestion control algorithm. It is patterned after Rate | |||
Halving, but using the fraction that is appropriate for target window | Halving, but using the fraction that is appropriate for target window | |||
chosen by the congestion control algorithm. During PRR one of two | chosen by the congestion control algorithm. During PRR one of two | |||
additional reduction bound algorithms limits the total window | additional reduction bound algorithms limits the total window | |||
reduction due to all mechanisms, including application stalls and the | reduction due to all mechanisms, including application stalls and the | |||
losses themselves. | losses themselves. | |||
We describe two slightly different reduction bound algorithms: | We describe two slightly different reduction bound algorithms: | |||
conservative reduction bound (CRB), which is strictly packet | conservative reduction bound (CRB), which is strictly packet | |||
conserving; and a slow start reduction bound (SSRB), which is more | conserving; and a slow start reduction bound (SSRB), which is more | |||
aggressive than CRB by at most one segment per ACK. PRR-CRB meets | aggressive than CRB by at most one segment per ACK. PRR-CRB meets | |||
the strong conservative bound described in Appendix A, however in | the strong conservative bound described in Appendix A, however in | |||
real networks it does not perform as well as the algorithms described | real networks it does not perform as well as the algorithms described | |||
in RFC 3517, which prove to be non-conservative in a statistically | in RFC 3517, which prove to be non-conservative in a significant | |||
significant number of cases. SSRB offers a compromise by allowing | number of cases. SSRB offers a compromise by allowing TCP to send | |||
TCP to send one additional segment per ACK relative to CRB in some | one additional segment per ACK relative to CRB in some situations. | |||
situations. Although SSRB is less aggressive than RFC 3517 | Although SSRB is less aggressive than RFC 3517 (transmitting fewer | |||
(transmitting fewer segments or taking more time to transmit them) it | segments or taking more time to transmit them) it outperforms it, due | |||
outperforms it, due to the lower probability of additional losses | to the lower probability of additional losses during recovery. | |||
during recovery. | ||||
PRR and both reduction bounds are based on common design principles, | PRR and both reduction bounds are based on common design principles, | |||
derived from Van Jacobson's packet conservation principle: segments | derived from Van Jacobson's packet conservation principle: segments | |||
delivered to the receiver are used as the clock to trigger sending | delivered to the receiver are used as the clock to trigger sending | |||
the same number of segments back into the network. As much as | the same number of segments back into the network. As much as | |||
possible Proportional Rate Reduction and the reduction bound rely on | possible Proportional Rate Reduction and the reduction bound rely on | |||
this self clock process, and are only slightly affected by the | this self clock process, and are only slightly affected by the | |||
accuracy of other estimators, such as pipe [RFC3517] and cwnd. This | accuracy of other estimators, such as pipe [RFC3517] and cwnd. This | |||
is what gives the algorithms their precision in the presence of | is what gives the algorithms their precision in the presence of | |||
events that cause uncertainty in other estimators. | events that cause uncertainty in other estimators. | |||
skipping to change at page 4, line 45 | skipping to change at page 4, line 44 | |||
The following terms, parameters and state variables are used as they | The following terms, parameters and state variables are used as they | |||
are defined in earlier documents: | are defined in earlier documents: | |||
RFC 3517: covered (as in "covered sequence numbers") | RFC 3517: covered (as in "covered sequence numbers") | |||
RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size | RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size | |||
(SMSS) | (SMSS) | |||
Voluntary window reductions: choosing not to send data in response to | Voluntary window reductions: choosing not to send data in response to | |||
some ACKs, for the purpose of reducing the sending window size or | some ACKs, for the purpose of reducing the sending window size and | |||
data rate. | data rate. | |||
We define some additional variables: | We define some additional variables: | |||
SACKd: The total number of bytes that the scoreboard indicates have | SACKd: The total number of bytes that the scoreboard indicates have | |||
been delivered to the receiver. This can be computed by scanning the | been delivered to the receiver. This can be computed by scanning the | |||
scoreboard and counting the total number of bytes covered by all sack | scoreboard and counting the total number of bytes covered by all sack | |||
blocks. If SACK is not in use, SACKd is not defined. | blocks. If SACK is not in use, SACKd is not defined. | |||
DeliveredData: The total number of bytes that the current ACK | DeliveredData: The total number of bytes that the current ACK | |||
skipping to change at page 6, line 16 | skipping to change at page 6, line 16 | |||
prr_delivered += DeliveredData | prr_delivered += DeliveredData | |||
pipe = (RFC 3517 pipe algorithm) | pipe = (RFC 3517 pipe algorithm) | |||
if (pipe > ssthresh) { | if (pipe > ssthresh) { | |||
// Proportional Rate Reduction | // Proportional Rate Reduction | |||
sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out | sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out | |||
} else { | } else { | |||
// Two version of the reduction bound | // Two version of the reduction bound | |||
if (conservative) { // PRR+CRB | if (conservative) { // PRR+CRB | |||
limit = prr_delivered - prr_out | limit = prr_delivered - prr_out | |||
} else { // PRR+SSRB | } else { // PRR+SSRB | |||
limit = MAX(prr_delivered - prr_out, DeliveredData) + 1 | limit = MAX(prr_delivered - prr_out, DeliveredData) + MSS | |||
} | } | |||
// Attempt to catch up, as permitted by limit | // Attempt to catch up, as permitted by limit | |||
sndcnt = MIN(ssthresh - pipe, limit) | sndcnt = MIN(ssthresh - pipe, limit) | |||
} | } | |||
On any data transmission or retransmission: | On any data transmission or retransmission: | |||
prr_out += (data sent) // strictly less than or equal to sndcnt | prr_out += (data sent) // strictly less than or equal to sndcnt | |||
The following examples will make these algorithms clearer. | ||||
3.1. Examples | 3.1. Examples | |||
We illustrate these algorithms by showing their different behaviors | We illustrate these algorithms by showing their different behaviors | |||
for two scenarios: TCP experiencing either a single loss or a burst | for two scenarios: TCP experiencing either a single loss or a burst | |||
of 15 consecutive losses. In all cases we assume bulk data, standard | of 15 consecutive losses. In all cases we assume bulk data, standard | |||
AIMD congestion control (the ssthresh is set to FlightSize/2) and | AIMD congestion control and cwnd = FlightSize = pipe = 20 segments, | |||
cwnd = FlightSize = pipe = 20 segments, so ssthresh will be set to 10 | so ssthresh will be set to 10 at the beginning of recovery. We also | |||
at the beginning of recovery. We also assume standard Fast | assume standard Fast Retransmit and Limited Transmit, so we send two | |||
Retransmit and Limited Transmit, so we send two new segments followed | new segments followed by one retransmit on the first 3 duplicate ACKs | |||
by one retransmit on the first 3 duplicate ACKs after the losses. | after the losses. | |||
Each of the diagrams below shows the per ACK response to the first | Each of the diagrams below shows the per ACK response to the first | |||
round trip for the various recovery algorithms when the zeroth | round trip for the various recovery algorithms when the zeroth | |||
segment is lost. The top line indicates the transmitted segment | segment is lost. The top line indicates the transmitted segment | |||
number triggering the ACKs, with an X for the lost segment. "cwnd" | number triggering the ACKs, with an X for the lost segment. "cwnd" | |||
and "pipe" indicate the values of these algorithms after processing | and "pipe" indicate the values of these algorithms after processing | |||
each returning ACK. "Sent" indicates how much 'N'ew or | each returning ACK. "Sent" indicates how much 'N'ew or | |||
'R'etransmitted data would be sent. Note that the algorithms for | 'R'etransmitted data would be sent. Note that the algorithms for | |||
deciding which data to send are out of scope of this document. | deciding which data to send are out of scope of this document. | |||
skipping to change at page 9, line 22 | skipping to change at page 9, line 22 | |||
window reduction causes it to take excessively long to recover the | window reduction causes it to take excessively long to recover the | |||
losses and exposes it to additional timeouts. | losses and exposes it to additional timeouts. | |||
PRR-SSRB increases the window by exactly 1 segment per ACK until pipe | PRR-SSRB increases the window by exactly 1 segment per ACK until pipe | |||
rises to sshthresh during recovery. This is accomplished by setting | rises to sshthresh during recovery. This is accomplished by setting | |||
limit to one greater than the data reported to have been delivered to | limit to one greater than the data reported to have been delivered to | |||
the receiver on this ACK, implementing slowstart during recovery, and | the receiver on this ACK, implementing slowstart during recovery, and | |||
indicated by RB:d tagging in the figure. Although increasing the | indicated by RB:d tagging in the figure. Although increasing the | |||
window during recovery seems to be ill advised, it is important to | window during recovery seems to be ill advised, it is important to | |||
remember that this actually less aggressive than permitted by RFC | remember that this actually less aggressive than permitted by RFC | |||
5681, which sends the same quantity of extra data as a single burst | 5681, which sends the same quantity of additional data as a single | |||
in response to the ACK that triggered Fast Retransmit | burst in response to the ACK that triggered Fast Retransmit | |||
Under less extreme conditions, when the total losses are smaller than | For less extreme events, where the total losses are smaller than the | |||
the difference between Flight Size and ssthresh, PRR-CRB and PRR-SSRB | difference between Flight Size and ssthresh, PRR-CRB and PRR-SSRB | |||
have identical behaviours. | have identical behaviours. | |||
4. Properties | 4. Properties | |||
The following properties are common to both PRR-CRB and PRR-SSRB: | The following properties are common to both PRR-CRB and PRR-SSRB | |||
except as noted: | ||||
Proportional Rate Reduction maintains TCPs ACK clocking across most | ||||
recovery events, including burst losses. RFC 3517 can send large | ||||
unclocked bursts following burst losses. | ||||
Normally Proportional Rate Reduction will spread voluntary window | Normally Proportional Rate Reduction will spread voluntary window | |||
reductions out evenly across a full RTT. This has the potential to | reductions out evenly across a full RTT. This has the potential to | |||
generally reduce the burstiness of Internet traffic, and could be | generally reduce the burstiness of Internet traffic, and could be | |||
considered to be a type of soft pacing. Theoretically any pacing | considered to be a type of soft pacing. Hypothetically, any pacing | |||
increases the probability that different flows are interleaved, | increases the probability that different flows are interleaved, | |||
reducing the opportunity for ACK compression and other phenomena that | reducing the opportunity for ACK compression and other phenomena that | |||
increase traffic burstiness. However these effects have not been | increase traffic burstiness. However these effects have not been | |||
quantified. | quantified. | |||
If there are minimal losses, Proportional Rate Reduction will | If there are minimal losses, Proportional Rate Reduction will | |||
converge to exactly the target window chosen by the congestion | converge to exactly the target window chosen by the congestion | |||
control algorithm. Note that as TCP approaches the end of recovery | control algorithm. Note that as TCP approaches the end of recovery | |||
prr_delivered will approach RecoverFS and sndcnt will be computed | prr_delivered will approach RecoverFS and sndcnt will be computed | |||
such that prr_out approaches ssthresh. | such that prr_out approaches ssthresh. | |||
skipping to change at page 10, line 11 | skipping to change at page 10, line 16 | |||
recovery cause later voluntary reductions to be skipped. For small | recovery cause later voluntary reductions to be skipped. For small | |||
numbers of losses the window size ends at exactly the window chosen | numbers of losses the window size ends at exactly the window chosen | |||
by the congestion control algorithm. | by the congestion control algorithm. | |||
For burst losses, earlier voluntary window reductions can be undone | For burst losses, earlier voluntary window reductions can be undone | |||
by sending extra segments in response to ACKs arriving later during | by sending extra segments in response to ACKs arriving later during | |||
recovery. Note that as long as some voluntary window reductions are | recovery. Note that as long as some voluntary window reductions are | |||
not undone, the final value for pipe will be the same as ssthresh, | not undone, the final value for pipe will be the same as ssthresh, | |||
the target cwnd value chosen by the congestion control algorithm. | the target cwnd value chosen by the congestion control algorithm. | |||
Proportional Rate Reduction with either reduction round improves the | Proportional Rate Reduction with either reduction bound improves the | |||
situation when there are application stalls (e.g. when the sending | situation when there are application stalls (e.g. when the sending | |||
application does not queue data for transmission quickly enough or | application does not queue data for transmission quickly enough or | |||
the receiver stops advancing rwnd). When there is an application | the receiver stops advancing rwnd). When there is an application | |||
stall early during recovery prr_out will fall behind the sum of the | stall early during recovery prr_out will fall behind the sum of the | |||
transmissions permitted by sndcnt. The missed opportunities to send | transmissions permitted by sndcnt. The missed opportunities to send | |||
due to stalls are treated like banked voluntary window reductions: | due to stalls are treated like banked voluntary window reductions: | |||
specifically they cause prr_delivered-prr_out to be significantly | specifically they cause prr_delivered-prr_out to be significantly | |||
positive. If the application catches up while TCP is still in | positive. If the application catches up while TCP is still in | |||
recovery, TCP will send a partial window burst to catch up to exactly | recovery, TCP will send a partial window burst to catch up to exactly | |||
where it would have been, had the application never stalled. | where it would have been, had the application never stalled. | |||
skipping to change at page 10, line 50 | skipping to change at page 11, line 6 | |||
Since short term errors in pipe are smoothed out across multiple ACKs | Since short term errors in pipe are smoothed out across multiple ACKs | |||
and both Proportional Rate Reduction and the reduction bound converge | and both Proportional Rate Reduction and the reduction bound converge | |||
to the same final window, errors in the pipe estimator have less | to the same final window, errors in the pipe estimator have less | |||
impact on the final outcome. | impact on the final outcome. | |||
Under all conditions and sequences of events during recovery, PRR-CRB | Under all conditions and sequences of events during recovery, PRR-CRB | |||
strictly bounds the data transmitted to be equal to or less than the | strictly bounds the data transmitted to be equal to or less than the | |||
amount of data delivered to the receiver. We claim that this packet | amount of data delivered to the receiver. We claim that this packet | |||
conservation bound is the most aggressive algorithm that does not | conservation bound is the most aggressive algorithm that does not | |||
lead to additional forced losses in some environments. It has the | lead to additional forced losses in some environments. It has the | |||
property that if there is a standing queue at a bottleneck that is | property that if there is a standing queue at a bottleneck with no | |||
carrying no other traffic, the queue will maintain exactly constant | cross traffic, the queue will maintain exactly constant length for | |||
length for the duration of the recovery (except for +1/-1 fluctuation | the duration of the recovery, except for +1/-1 fluctuation due to | |||
due to differences in packet arrival and exit times) . See | differences in packet arrival and exit times. See Appendix A for a | |||
Appendix A for a detailed discussion of this property. | detailed discussion of this property. | |||
Although the packet Packet Conserving Bound in very appealing for a | Although the packet Packet Conserving Bound in very appealing for a | |||
number of reasons, our measurements summarized in Section 5 | number of reasons, our measurements summarized in Section 5 | |||
demonstrate that it is less aggressive and does not perform as well | demonstrate that it is less aggressive and does not perform as well | |||
as RFC3517, which permits large bursts of data when there are bursts | as RFC3517, which permits large bursts of data when there are bursts | |||
of losses. PRR-SSRB is a compromise that permits TCP to send one | of losses. PRR-SSRB is a compromise that permits TCP to send one | |||
extra segment per ACK as compared to the packet conserving bound. | extra segment per ACK as compared to the packet conserving bound. | |||
From the perspective of the packet conserving bound, PRR-SSRB does | From the perspective of the packet conserving bound, PRR-SSRB does | |||
indeed open the window during recovery, however it is significantly | indeed open the window during recovery, however it is significantly | |||
less aggressive than RFC3517 in the presence of burst losses. | less aggressive than RFC3517 in the presence of burst losses. | |||
5. Measurements | 5. Measurements | |||
In a companion IMC11 paper [IMC11] we describe some measurements | In a companion IMC11 paper [IMC11] we describe some measurements | |||
comparing the various strategies for reducing the window during | comparing the various strategies for reducing the window during | |||
recovery. The results are summarized here. | recovery. The results are summarized here. | |||
The various window reduction algorithms and extensive instrumentation | The various window reduction algorithms and extensive instrumentation | |||
were all implemented in Linux 2.6. We used the uniform set of | were all implemented in Linux 2.6. We used the uniform set of | |||
algorithms present in the base Linux implementation, including CUBIC | algorithms present in the base Linux implementation, including CUBIC | |||
[CUBIC], limited transmit [LT], threshold transmit from [FACK] and | [CUBIC], limited transmit [RFC3742], threshold transmit from [FACK] | |||
lost retransmission detection algorithms. We confirmed that the | and lost retransmission detection algorithms. We confirmed that the | |||
behaviors of Rate Halving (the Linux default), RFC 3517 and PRR were | behaviors of Rate Halving (the Linux default), RFC 3517 and PRR were | |||
authentic to their respective specifications and that performance and | authentic to their respective specifications and that performance and | |||
features were comparable to the kernels in production use. The | features were comparable to the kernels in production use. The | |||
different window reduction algorithms were all present in the same | different window reduction algorithms were all present in the same | |||
kernel and could be selected with a sysctl, such that we had an | kernel and could be selected with a sysctl, such that we had an | |||
absolutely uniform baseline for comparing them. | absolutely uniform baseline for comparing them. | |||
Our experiments included an additional algorithm, PRR with an | Our experiments included an additional algorithm, PRR with an | |||
unlimited bound (PRR-UB), which sends ssthresh-pipe bursts when pipe | unlimited bound (PRR-UB), which sends ssthresh-pipe bursts when pipe | |||
falls below ssthresh. This behavior parallels RFC 3517. | falls below ssthresh. This behavior parallels RFC 3517. | |||
skipping to change at page 12, line 39 | skipping to change at page 12, line 44 | |||
bursts of data when there are large bursts of losses. PRR-SSRB is a | bursts of data when there are large bursts of losses. PRR-SSRB is a | |||
compromise that permits TCP to send one extra segment per ACK as | compromise that permits TCP to send one extra segment per ACK as | |||
relative to the packet conserving bound. From the perspective of the | relative to the packet conserving bound. From the perspective of the | |||
packet conserving bound, PRR-SSRB does indeed open the window during | packet conserving bound, PRR-SSRB does indeed open the window during | |||
recovery, however it is significantly less aggressive than RFC3517 in | recovery, however it is significantly less aggressive than RFC3517 in | |||
the presence of burst losses. Even so, it often out performs | the presence of burst losses. Even so, it often out performs | |||
RFC3517, because it avoids some of the self inflicted losses caused | RFC3517, because it avoids some of the self inflicted losses caused | |||
by bursts from RFC3517. | by bursts from RFC3517. | |||
At this time we see no reason not to test and deploy PRR-SSRB on a | At this time we see no reason not to test and deploy PRR-SSRB on a | |||
large scale, indeed it already is. Implementers worried about any | large scale. Implementers worried about any potential impact of | |||
potential impact of raising the window during recovery may want to | raising the window during recovery may want to optionally support | |||
optionally support PRR-CRB (which is actually simpler to implement) | PRR-CRB (which is actually simpler to implement) for comparison | |||
for comparison studies. | studies. | |||
One final comment about terminology: we expect that common usage will | One final comment about terminology: we expect that common usage will | |||
drop "slow start reduction bound" from the algorithm name. This | drop "slow start reduction bound" from the algorithm name. This | |||
document needs to be pedantic about having distinct name for | document needed to be pedantic about having distinct names for | |||
proportional rate reduction and every variant of the reduction bound. | proportional rate reduction and every variant of the reduction bound. | |||
However, once paired they become one. | However, once paired they become one. | |||
7. Acknowledgements | 7. Acknowledgements | |||
This draft is based in part on previous incomplete work by Matt | This draft is based in part on previous incomplete work by Matt | |||
Mathis, Jeff Semke and Jamshid Mahdavi [RHID] and influenced by | Mathis, Jeff Semke and Jamshid Mahdavi [RHID] and influenced by | |||
several discussion with John Heffner. | several discussion with John Heffner. | |||
Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the | Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the | |||
experiments. | experiments. | |||
Ilpo Jarvinen for reviewing the code. | ||||
8. Security Considerations | 8. Security Considerations | |||
Proportional Rate Reduction does not change the risk profile for TCP. | Proportional Rate Reduction does not change the risk profile for TCP. | |||
Implementers that change PRR from counting bytes to segments have to | Implementers that change PRR from counting bytes to segments have to | |||
be cautious about the effects of ACK splitting attacks[SPLIT], where | be cautious about the effects of ACK splitting attacks [Savage99], | |||
the receiver acknowledges partial segments for the purpose of | where the receiver acknowledges partial segments for the purpose of | |||
confusing the sender's congestion accounting. | confusing the sender's congestion accounting. | |||
9. IANA Considerations | 9. IANA Considerations | |||
This document makes no request of IANA. | This document makes no request of IANA. | |||
Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section may be removed on publication as an | |||
RFC. | RFC. | |||
10. References | 10. References | |||
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
Selective Acknowledgment Options", RFC 2018, October 1996. | Selective Acknowledgment Options", RFC 2018, October 1996. | |||
[RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A | [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A | |||
Conservative Selective Acknowledgment (SACK)-based Loss | Conservative Selective Acknowledgment (SACK)-based Loss | |||
Recovery Algorithm for TCP", RFC 3517, April 2003. | Recovery Algorithm for TCP", RFC 3517, April 2003. | |||
[RFC3742] Floyd, S., "Limited Slow-Start for TCP with Large | ||||
Congestion Windows", RFC 3742, March 2004. | ||||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
[IMC11] Dukkipati, N., Mathis, M., and Y. Cheng, "Proportional | [IMC11] Dukkipati, N., Mathis, M., and Y. Cheng, "Proportional | |||
Rate Reduction for TCP", ACM Internet Measurement | Rate Reduction for TCP", ACM Internet Measurement | |||
Conference IMC11, December 2011. | Conference IMC11, December 2011. | |||
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: | [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: | |||
Refining TCP Congestion Control", ACM SIGCOMM SIGCOMM96, | Refining TCP Congestion Control", ACM SIGCOMM SIGCOMM96, | |||
August 1996. | August 1996. | |||
skipping to change at page 14, line 15 | skipping to change at page 14, line 23 | |||
[RHID] Mathis, M., Semke, J., Mahdavi, J., and K. Lahey, "The | [RHID] Mathis, M., Semke, J., Mahdavi, J., and K. Lahey, "The | |||
Rate-Halving Algorithm for TCP Congestion Control", draft- | Rate-Halving Algorithm for TCP Congestion Control", draft- | |||
ratehalving (work in progress), June 1999. | ratehalving (work in progress), June 1999. | |||
[RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding | [RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding | |||
Parameters", Web publication , December 1997. | Parameters", Web publication , December 1997. | |||
[CUBIC] Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-speed | [CUBIC] Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-speed | |||
TCP variant", PFLDnet 2005, Feb 2005. | TCP variant", PFLDnet 2005, Feb 2005. | |||
[Savage99] | ||||
Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | ||||
"TCP congestion control with a misbehaving receiver", | ||||
SIGCOMM Comput. Commun. Rev. 29(5), October 1999. | ||||
Appendix A. Packet Conservation Bound | Appendix A. Packet Conservation Bound | |||
PRR-CRB meets a conservative, philosophically pure and aesthetically | PRR-CRB meets a conservative, philosophically pure and aesthetically | |||
appealing notion of correct, described here. However, in real | appealing notion of correct, described here. However, in real | |||
networks it does not perform as well as the algorithms described in | networks it does not perform as well as the algorithms described in | |||
RFC 3517, which prove to be non-conservative in a statistically | RFC 3517, which proves to be non-conservative in a significant number | |||
significant number of cases. | of cases. | |||
Under all conditions and sequences of events during recovery, PRR-CRB | Under all conditions and sequences of events during recovery, PRR-CRB | |||
strictly bounds the data transmitted to be equal to or less than the | strictly bounds the data transmitted to be equal to or less than the | |||
amount of data delivered to the receiver. We claim that this packet | amount of data delivered to the receiver. We claim that this packet | |||
conservation bound is the most aggressive algorithm that does not | conservation bound is the most aggressive algorithm that does not | |||
lead to additional forced losses in some environments. It has the | lead to additional forced losses in some environments. It has the | |||
property that if there is a standing queue at a bottleneck that is | property that if there is a standing queue at a bottleneck that is | |||
carrying no other traffic, the queue will maintain exactly constant | carrying no other traffic, the queue will maintain exactly constant | |||
length for the entire recovery duration (except for +1/-1 fluctuation | length for the entire duration of the recovery, except for +1/-1 | |||
due to differences in packet arrival and exit times) . Any less | fluctuation due to differences in packet arrival and exit times. Any | |||
aggressive algorithm will result in a declining queue at the | less aggressive algorithm will result in a declining queue at the | |||
bottleneck. Any more aggressive algorithm will result in an | bottleneck. Any more aggressive algorithm will result in an | |||
increasing queue or additional losses at the bottleneck. | increasing queue or additional losses if it is a full drop tail | |||
queue. | ||||
We demonstrate this property with a little thought experiment: | We demonstrate this property with a little thought experiment: | |||
Imagine a network path that has insignificant delays in both | Imagine a network path that has insignificant delays in both | |||
directions, except the processing time and queue at a single | directions, except for the processing time and queue at a single | |||
bottleneck in the forward path. By insignificant delay, I mean when | bottleneck in the forward path. By insignificant delay, I mean when | |||
a packet is "served" at the head of the bottleneck queue, the | a packet is "served" at the head of the bottleneck queue, the | |||
following events happen in much less than one bottleneck packet time: | following events happen in much less than one bottleneck packet time: | |||
the packet arrives at the receiver; the receiver sends an ACK; which | the packet arrives at the receiver; the receiver sends an ACK; which | |||
arrives at the sender; the sender processes the ACK and sends some | arrives at the sender; the sender processes the ACK and sends some | |||
data; the data is queued at the bottleneck. | data; the data is queued at the bottleneck. | |||
If sndcnt is set to DeliveredData and nothing else is inhibiting | If sndcnt is set to DeliveredData and nothing else is inhibiting | |||
sending data, then clearly the data arriving at the bottleneck queue | sending data, then clearly the data arriving at the bottleneck queue | |||
will exactly replace the data that was served at the head of the | will exactly replace the data that was served at the head of the | |||
skipping to change at page 15, line 14 | skipping to change at page 15, line 29 | |||
reordering on the ACK path only cause wider fluctuations in the queue | reordering on the ACK path only cause wider fluctuations in the queue | |||
size, but do not raise the peak size, independent of whether the data | size, but do not raise the peak size, independent of whether the data | |||
is in order or out-of-order (including loss recovery from an earlier | is in order or out-of-order (including loss recovery from an earlier | |||
RTT). Any more aggressive algorithm which sends additional data will | RTT). Any more aggressive algorithm which sends additional data will | |||
cause a queue overflow and loss. Any less aggressive algorithm will | cause a queue overflow and loss. Any less aggressive algorithm will | |||
under fill the queue. Therefore setting sndcnt to DeliveredData is | under fill the queue. Therefore setting sndcnt to DeliveredData is | |||
the most aggressive algorithm that does not cause forced losses in | the most aggressive algorithm that does not cause forced losses in | |||
this simple network. Relaxing the assumptions (e.g. making delays | this simple network. Relaxing the assumptions (e.g. making delays | |||
more authentic and adding more flows, delayed ACKs, etc) may | more authentic and adding more flows, delayed ACKs, etc) may | |||
increases the fine grained fluctuations in queue size but does not | increases the fine grained fluctuations in queue size but does not | |||
change it's basic behavior. | change its basic behavior. | |||
Note that the congestion control algorithm implements a broader | Note that the congestion control algorithm implements a broader | |||
notion of optimal that includes appropriately sharing of the network. | notion of optimal that includes appropriately sharing of the network. | |||
Typical congestion control algorithms are likely to reduce the data | Typical congestion control algorithms are likely to reduce the data | |||
sent relative to the packet conserving bound implemented by PRR | sent relative to the packet conserving bound implemented by PRR | |||
bringing TCP's actual window down to ssthresh. | bringing TCP's actual window down to ssthresh. | |||
Authors' Addresses | Authors' Addresses | |||
Matt Mathis | Matt Mathis | |||
End of changes. 30 change blocks. | ||||
59 lines changed or deleted | 72 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |