draft-bensley-tcpm-dctcp-02.txt   draft-bensley-tcpm-dctcp-03.txt 
Network Working Group S. Bensley Network Working Group S. Bensley
Internet-Draft Microsoft Internet-Draft Microsoft
Intended status: Informational L. Eggert Intended status: Informational L. Eggert
Expires: July 25, 2015 NetApp Expires: October 15, 2015 NetApp
D. Thaler D. Thaler
Microsoft Microsoft
January 21, 2015 April 13, 2015
Microsoft's Datacenter TCP (DCTCP): Microsoft's Datacenter TCP (DCTCP):
TCP Congestion Control for Datacenters TCP Congestion Control for Datacenters
draft-bensley-tcpm-dctcp-02 draft-bensley-tcpm-dctcp-03
Abstract Abstract
This memo describes Datacenter TCP (DCTCP), an improvement to TCP This memo describes Datacenter TCP (DCTCP), an improvement to TCP
congestion control for datacenter traffic, as implemented in Windows congestion control for datacenter traffic, as implemented in Windows
Server 2012. DCTCP enhances Explicit Congestion Notification (ECN) Server 2012. DCTCP enhances Explicit Congestion Notification (ECN)
processing to estimate the fraction of bytes that encounter processing to estimate the fraction of bytes that encounter
congestion, rather than simply detecting that some congestion has congestion, rather than simply detecting that some congestion has
occurred. DCTCP then scales the TCP congestion window based on this occurred. DCTCP then scales the TCP congestion window based on this
estimate. This method achieves high burst tolerance, low latency, estimate. This method achieves high burst tolerance, low latency,
skipping to change at page 1, line 41 skipping to change at page 1, line 41
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 25, 2015. This Internet-Draft will expire on October 15, 2015.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 4, line 22 skipping to change at page 4, line 22
o The sender reacts to the congestion indication by reducing the TCP o The sender reacts to the congestion indication by reducing the TCP
congestion window (cwnd). congestion window (cwnd).
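The sender's reaction can be illustrated with a short sketch. It assumes the update rules from [DCTCP10]: the sender keeps a smoothed estimate alpha = (1 - g) * alpha + g * F, where F is the fraction of acknowledged bytes that were echoed as congested in one observation window, and it scales the window as cwnd = cwnd * (1 - alpha / 2). The gain g = 1/16, the initial alpha of 1, and the class and method names are assumptions for illustration. The sketch is also simplified: it applies the reduction at most once per observation window, at the end of that window, and it does not model slow start, loss recovery, or the CWR handshake.

```python
# Simplified sketch of a DCTCP-style sender reaction, following the update
# rules in [DCTCP10]. G, the initial alpha, and all names are hypothetical.

G = 1.0 / 16  # assumed EWMA gain for the congestion estimate


class DctcpSender:
    def __init__(self, cwnd_bytes: float) -> None:
        self.cwnd = cwnd_bytes
        self.alpha = 1.0        # assumed conservative initial estimate
        self.bytes_acked = 0    # bytes acknowledged in this window
        self.bytes_marked = 0   # of those, bytes echoed as congested

    def on_ack(self, acked_bytes: int, ece: bool) -> None:
        """Count newly acknowledged bytes and those echoed as congested."""
        self.bytes_acked += acked_bytes
        if ece:
            self.bytes_marked += acked_bytes

    def end_of_window(self) -> None:
        """Once per observation window: update alpha, then scale cwnd."""
        if self.bytes_acked == 0:
            return
        frac = self.bytes_marked / self.bytes_acked
        self.alpha = (1 - G) * self.alpha + G * frac
        if self.bytes_marked > 0:
            # Cut in proportion to the estimated extent of congestion,
            # rather than halving as a standard ECN sender would.
            self.cwnd = self.cwnd * (1 - self.alpha / 2)
        self.bytes_acked = 0
        self.bytes_marked = 0
```

For example, starting from alpha = 0 and cwnd = 100000 bytes, a window in which half of the acknowledged bytes carried ECE moves alpha to 0.03125 and shrinks cwnd to 98437.5 bytes, a much gentler cut than halving.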
3.1. Marking Congestion on the Switch 3.1. Marking Congestion on the Switch
The switch indicates congestion to the end nodes by setting the CE The switch indicates congestion to the end nodes by setting the CE
codepoint in the IP header as specified in Section 5 of [RFC3168]. codepoint in the IP header as specified in Section 5 of [RFC3168].
For example, the switch may be configured with a congestion For example, the switch may be configured with a congestion
threshold. When a packet arrives at the switch and its queue length threshold. When a packet arrives at the switch and its queue length
is greater than the congestion threshold, the switch sets the CE is greater than the congestion threshold, the switch sets the CE
codepoint in the packet. However, the actual algorithm for marking codepoint in the packet. Specifically, Section 3.4 of [DCTCP10]
congestion is an implementation detail of the switch and will suggests threshold marking with a threshold K > (RTT * C)/7, where C
generally not be known to the sender and receiver. is the link rate in packets per second. However, the actual
algorithm for marking congestion is an implementation detail of the
switch and will generally not be known to the sender and receiver.
Therefore, sender and receiver MUST NOT assume that a particular
marking algorithm is implemented by the switching fabric.
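Threshold marking of the kind described above can be sketched as follows. The sketch picks the smallest integer K that satisfies the guideline K > (RTT * C)/7 and marks an ECN-capable packet CE when the queue length exceeds K. The ECN field values are those of [RFC3168]; the function names and the example RTT and link rate are hypothetical. As the text notes, a real switch may use a different algorithm entirely, and it might drop rather than pass a non-ECN-capable packet.

```python
# Sketch of threshold-based CE marking at a switch queue. The threshold
# follows the K > (RTT * C) / 7 guideline cited from [DCTCP10]; all names
# and example values are hypothetical.

import math

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 ECN field


def min_marking_threshold(rtt_s: float, link_pps: float) -> int:
    """Smallest integer K (in packets) strictly greater than RTT*C/7."""
    return math.floor(rtt_s * link_pps / 7) + 1


def on_enqueue(queue_len: int, ecn: int, k: int) -> int:
    """Return the (possibly updated) ECN field of an arriving packet."""
    if ecn in (ECT0, ECT1) and queue_len > k:
        return CE
    # Not-ECT packets are passed unchanged here for simplicity.
    return ecn
```

For instance, a 10 Gbit/s link carrying 1500-byte packets moves about 833,333 packets per second; with a 100 microsecond RTT, RTT * C is about 83 packets, so min_marking_threshold(100e-6, 10e9 / (1500 * 8)) yields K = 12.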
3.2. Echoing Congestion Information on the Receiver 3.2. Echoing Congestion Information on the Receiver
According to Section 6.1.3 of [RFC3168], the receiver sets the ECE According to Section 6.1.3 of [RFC3168], the receiver sets the ECE
flag if any of the packets being acknowledged had the CE codepoint flag if any of the packets being acknowledged had the CE codepoint
set. The receiver then continues to set the ECE flag until it set. The receiver then continues to set the ECE flag until it
receives a packet with the Congestion Window Reduced (CWR) flag set. receives a packet with the Congestion Window Reduced (CWR) flag set.
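The [RFC3168] receiver behavior summarized above amounts to a latch, which the following sketch models. It is a simplification under stated assumptions: the class name is hypothetical, every data segment is acknowledged individually (no delayed ACKs), and a segment that carries both CWR and CE re-arms the latch. The usage below shows why this echo is too coarse for DCTCP: after a single CE mark, every subsequent ACK carries ECE, so the sender cannot tell how many bytes were actually marked.

```python
# Sketch of the RFC 3168 receiver ECE latch. Names are hypothetical and
# delayed ACKs are not modeled.

class Rfc3168Receiver:
    def __init__(self) -> None:
        self.ece_latched = False

    def on_segment(self, ce: bool, cwr: bool) -> bool:
        """Process one data segment; return the ECE flag for its ACK."""
        if cwr:
            # Sender signalled that it reduced cwnd: stop echoing ECE.
            self.ece_latched = False
        if ce:
            # Congestion seen: echo ECE until a CWR segment arrives.
            self.ece_latched = True
        return self.ece_latched


r = Rfc3168Receiver()
# Only the first of four segments is CE-marked, yet all four ACKs carry ECE.
flags = [r.on_segment(ce, False) for ce in (True, False, False, False)]
```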
However, the DCTCP algorithm requires more detailed congestion However, the DCTCP algorithm requires more detailed congestion
information. In particular, the sender must be able to determine the information. In particular, the sender must be able to determine the
number of sent bytes that encountered congestion. Thus, the scheme number of sent bytes that encountered congestion. Thus, the scheme
skipping to change at page 9, line 5 skipping to change at page 9, line 5
This document has no actions for IANA. This document has no actions for IANA.
9. Acknowledgements 9. Acknowledgements
The DCTCP algorithm was originally proposed and analyzed in [DCTCP10] The DCTCP algorithm was originally proposed and analyzed in [DCTCP10]
by Mohammad Alizadeh, Albert Greenberg, Dave Maltz, Jitu Padhye, by Mohammad Alizadeh, Albert Greenberg, Dave Maltz, Jitu Padhye,
Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari
Sridharan. Sridharan.
Lars Eggert has received funding from the European Union's Horizon
2020 research and innovation program 2014-2018 under grant agreement
No. 644866. This document reflects only the authors' views and the
European Commission is not responsible for any use that may be made
of the information it contains.
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
793, September 1981. 793, September 1981.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996. Selective Acknowledgment Options", RFC 2018, October 1996.
skipping to change at page 9, line 43 skipping to change at page 9, line 49
[ODCTCP] Kato, M., "Improving Transmission Performance with One- [ODCTCP] Kato, M., "Improving Transmission Performance with One-
Sided Datacenter TCP", M.S. Thesis, Keio University, 2014, Sided Datacenter TCP", M.S. Thesis, Keio University, 2014,
<http://eggert.org/students/kato-thesis.pdf>. <http://eggert.org/students/kato-thesis.pdf>.
[ADCTCP] Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis [ADCTCP] Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis
of DCTCP: Stability, Convergence, and Fairness", June of DCTCP: Stability, Convergence, and Fairness", June
2011, 2011,
<http://simula.stanford.edu/~alizade/Site/DCTCP_files/ <http://simula.stanford.edu/~alizade/Site/DCTCP_files/
dctcp_analysis-full.pdf>. dctcp_analysis-full.pdf>.
[LINUX] Borkmann, D., "Linux DCTCP patch", 2014, [LINUX] Borkmann, D. and F. Westphal, "Linux DCTCP patch", 2014,
<https://git.kernel.org/cgit/linux/kernel/git/davem/net- <https://git.kernel.org/cgit/linux/kernel/git/davem/net-
next.git/ next.git/
commit/?id=e3118e8359bb7c59555aca60c725106e6d78c5ce>. commit/?id=e3118e8359bb7c59555aca60c725106e6d78c5ce>.
[FREEBSD] Kato, M., "DCTCP (Data Center TCP) implementation", 2015, [FREEBSD] Kato, M. and H. Panchasara, "DCTCP (Data Center TCP)
implementation", 2015,
<https://github.com/freebsd/freebsd/ <https://github.com/freebsd/freebsd/
commit/8ad879445281027858a7fa706d13e458095b595f>. commit/8ad879445281027858a7fa706d13e458095b595f>.
Authors' Addresses Authors' Addresses
Stephen Bensley Stephen Bensley
Microsoft Microsoft
One Microsoft Way One Microsoft Way
Redmond, WA 98052 Redmond, WA 98052
USA USA
End of changes. 8 change blocks.
9 lines changed or deleted 20 lines changed or added
