--- 1/draft-ietf-tcpm-fastopen-08.txt 2014-07-01 15:14:27.724029347 -0700 +++ 2/draft-ietf-tcpm-fastopen-09.txt 2014-07-01 15:14:27.776030607 -0700 @@ -1,17 +1,17 @@ Internet Draft Y. Cheng -draft-ietf-tcpm-fastopen-08.txt J. Chu +draft-ietf-tcpm-fastopen-09.txt J. Chu Intended status: Experimental S. Radhakrishnan -Expiration date: August, 2014 A. Jain +Expiration date: January, 2015 A. Jain Google, Inc. - March 11, 2014 + June 30, 2014 TCP Fast Open Abstract This document describes an experimental TCP mechanism TCP Fast Open (TFO). TFO allows data to be carried in the SYN and SYN-ACK packets and consumed by the receiving end during the initial connection handshake, thus saving up to one full round trip time (RTT) compared to the standard TCP, which requires a three-way handshake (3WHS) to @@ -65,84 +65,85 @@ 2. Data In SYN . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 Relaxing TCP Semantics on Duplicated SYNs . . . . . . . . . 4 2.2. SYNs with Spoofed IP Addresses . . . . . . . . . . . . . . 4 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . . 5 4. Protocol Details . . . . . . . . . . . . . . . . . . . . . . . 7 4.1. Fast Open Cookie . . . . . . . . . . . . . . . . . . . . . 7 4.1.1. TCP Options . . . . . . . . . . . . . . . . . . . . . . 7 4.1.2. Server Cookie Handling . . . . . . . . . . . . . . . . 8 4.1.3. Client Cookie Handling . . . . . . . . . . . . . . . . 9 4.1.3.1 Client Caching Negative Responses . . . . . . . . . 9 - 4.2. Fast Open Protocol . . . . . . . . . . . . . . . . . . . . 10 - 4.2.1. Fast Open Cookie Request . . . . . . . . . . . . . . . 10 - 4.2.2. TCP Fast Open . . . . . . . . . . . . . . . . . . . . . 11 - 5. Security Considerations . . . . . . . . . . . . . . . . . . . . 13 + 4.2. Fast Open Protocol . . . . . . . . . . . . . . . . . . . . 11 + 4.2.1. Fast Open Cookie Request . . . . . . . . . . . . . . . 11 + 4.2.2. TCP Fast Open . . . . . . . . . . . . . . . . . . . . . 12 + 5. Security Considerations . . . . . 
. . . . . . . . . . . . . . . 14 5.1. Resource Exhaustion Attack by SYN Flood with Valid - Cookies . . . . . . . . . . . . . . . . . . . . . . . . . . 13 - 5.1.1 Attacks from behind Shared Public IPs (NATs) . . . . . . 14 - 5.2. Amplified Reflection Attack to Random Host . . . . . . . . 15 - 6. TFO's Applicability . . . . . . . . . . . . . . . . . . . . . . 16 - 6.1 Duplicate Data in SYNs . . . . . . . . . . . . . . . . . . . 16 - 6.2 Potential Performance Improvement . . . . . . . . . . . . . 16 - 6.3. Example: Web Clients and Servers . . . . . . . . . . . . . 17 - 6.3.1. HTTP Request Replay . . . . . . . . . . . . . . . . . . 17 - 6.3.2. Speculative Connections by the Applications . . . . . . 17 - 6.3.3. HTTP over TLS (HTTPS) . . . . . . . . . . . . . . . . . 17 - 6.3.4. Comparison with HTTP Persistent Connections . . . . . . 17 - 7. Open Areas for Experimentation . . . . . . . . . . . . . . . . 18 - 7.1. Performance impact due to middle-boxes and NAT . . . . . . 18 - 7.2. Cookie-less Fast Open . . . . . . . . . . . . . . . . . . . 19 - 7.3 Impact on congestion control . . . . . . . . . . . . . . . . 19 - 8. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 20 - 8.1. T/TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 - 8.2. Common Defenses Against SYN Flood Attacks . . . . . . . . . 20 - 8.3. TCP Cookie Transaction (TCPCT) . . . . . . . . . . . . . . 20 + Cookies . . . . . . . . . . . . . . . . . . . . . . . . . . 14 + 5.1.1 Attacks from behind Shared Public IPs (NATs) . . . . . . 15 + 5.2. Amplified Reflection Attack to Random Host . . . . . . . . 16 + 6. TFO's Applicability . . . . . . . . . . . . . . . . . . . . . . 17 + 6.1 Duplicate Data in SYNs . . . . . . . . . . . . . . . . . . . 17 + 6.2 Potential Performance Improvement . . . . . . . . . . . . . 17 + 6.3. Example: Web Clients and Servers . . . . . . . . . . . . . 18 + 6.3.1. HTTP Request Replay . . . . . . . . . . . . . . . . . . 18 + 6.3.2. 
Speculative Connections by the Applications . . . . . . 18 + 6.3.3. HTTP over TLS (HTTPS) . . . . . . . . . . . . . . . . . 18 + 6.3.4. Comparison with HTTP Persistent Connections . . . . . . 18 + 7. Open Areas for Experimentation . . . . . . . . . . . . . . . . 19 + 7.1. Performance impact due to middle-boxes and NAT . . . . . . 19 + 7.2. Cookie-less Fast Open . . . . . . . . . . . . . . . . . . . 20 + 7.3 Impact on congestion control . . . . . . . . . . . . . . . . 20 + 8. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 21 + 8.1. T/TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 + 8.2. Common Defenses Against SYN Flood Attacks . . . . . . . . . 21 + 8.3. TCP Cookie Transaction (TCPCT) . . . . . . . . . . . . . . 21 - 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 20 - 10. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 21 - 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 - 11.1. Normative References . . . . . . . . . . . . . . . . . . . 21 - 11.2. Informative References . . . . . . . . . . . . . . . . . . 21 - Appendix A. Example Socket API Changes to support TFO . . . . . . 23 - A.1 Active Open . . . . . . . . . . . . . . . . . . . . . . . . 23 - A.2 Passive Open . . . . . . . . . . . . . . . . . . . . . . . . 23 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24 + 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 21 + 10. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 23 + 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 + 11.1. Normative References . . . . . . . . . . . . . . . . . . . 23 + 11.2. Informative References . . . . . . . . . . . . . . . . . . 23 + Appendix A. Example Socket API Changes to support TFO . . . . . . 25 + A.1 Active Open . . . . . . . . . . . . . . . . . . . . . . . . 25 + A.2 Passive Open . . . . . . . . . . . . . . . . . . . . . . . . 25 + Authors' Addresses . . . . . . . . . . . 
. . . . . . . . . . . . . 26 1. Introduction TCP Fast Open (TFO) is an experimental update to TCP that enables data to be exchanged safely during TCP's connection handshake. This document describes a design that enables applications to save a round trip while avoiding severe security ramifications. At the core of TFO is a security cookie used by the server side to authenticate a client initiating a TFO connection. This document covers the details of exchanging data during TCP's initial handshake, the protocol for TFO cookies, potential new security vulnerabilities and their mitigation, and the new socket API. TFO is motivated by the performance needs of today's Web applications. Current TCP only permits data exchange after the 3-way handshake (3WHS)[RFC793], which adds one RTT to network latency. For short Web transfers this additional RTT is a significant portion of overall network latency, even when HTTP persistent connection is - widely used. For example, the Chrome browser keeps TCP connections - idle for up to 5 minutes but 35% of Chrome HTTP requests are made on - new TCP connections [RCCJR11]. For such Web and Web-like applications - placing data in the SYN can yield significant latency improvements. - Next we describe how we resolve the challenges that arise upon doing - so. + widely used. For example, the Chrome browser [Chrome] keeps TCP + connections idle for up to 5 minutes but 35% of HTTP requests are + made on new TCP connections [RCCJR11]. For such Web and Web-like + applications placing data in the SYN can yield significant latency + improvements. Next we describe how we resolve the challenges that + arise upon doing so. 1.1 Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. + TFO refers to TCP Fast Open. 
Client refers to the TCP's active open side and server refers to the TCP's passive open side. 2. Data In SYN Standard TCP already allows data to be carried in SYN packets ([RFC793], section 3.4) but forbids the receiver from delivering it to the application until 3WHS is completed. This is because TCP's initial handshake serves to capture old or duplicate SYNs. @@ -294,39 +295,41 @@ server in subsequent SYN packets. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Kind | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | ~ Cookie ~ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Kind 1 byte: constant TBD (assigned by IANA) + Kind 1 byte: constant-TBD (to be assigned by IANA) Length 1 byte: range 6 to 18 (bytes); limited by remaining space in the options field. The number MUST be even. Cookie 4 to 16 bytes (Length - 2) Options with invalid Length values or without SYN flag set MUST be ignored. The minimum Cookie size is 4 bytes. Although the diagram shows a cookie aligned on 32-bit boundaries, alignment is not required. Fast Open Cookie Request Option The client uses this option in the SYN packet to request a cookie from a TFO-enabled server +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Kind | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Kind 1 byte: same as the Fast Open Cookie option + Kind 1 byte: constant-TBD (same value as the Fast Open + Cookie option) + Length 1 byte: constant 2. This distinguishes the option from the Fast Open cookie option. Options with invalid Length values, without SYN flag set, or with ACK flag set MUST be ignored. 4.1.2. Server Cookie Handling The server is in charge of cookie generation and authentication. 
The cookie SHOULD be a message authentication code tag with the following properties: @@ -373,38 +376,40 @@ If only one valid cookie is allowed per-IP and the server can regenerate the cookie independently, the best validation process is to simply regenerate a valid cookie and compare it against the incoming cookie. In that case if the incoming cookie fails the check, a valid cookie is readily available to be sent to the client. 4.1.3. Client Cookie Handling The client MUST cache cookies from servers for later Fast Open - connections. For a multi-homed client, the cookies are both client - and server IP dependent. Beside the cookie we RECOMMEND that the - client caches the MSS to the server to enhance performance. + connections. For a multi-homed client, the cookies are dependent on + the client and server IP addresses. Hence the client should cache at + most one (most recently received) cookie per client and server IP + address pair. - The MSS advertised by the server is stored in the cache to determine - the maximum amount of data that can be supported in the SYN packet. - This information is needed because data is sent before the server - announces its MSS in the SYN-ACK packet. Without this information, - the data size in the SYN packet is limited to the default MSS of 536 - bytes for IPv4 [RFC1122] and 1240 bytes for IPv6 [RFC2460]. In - particular it's known an IPv4 receiver advertised MSS less than 536 - bytes would result in transmission of an unexpected large segment. If - the cache MSS is larger than the typical 1460 bytes, the extra large - data in SYN segment may have issues that offsets the performance - benefit of Fast Open. For examples, the super-size SYN may trigger IP - fragmentation and may confuse firewall or middle-boxes, causing SYN - retransmission and other side-effects. Therefore the client MAY limit - the cached MSS to 1460 bytes. + Besides the cookie, we RECOMMEND that the client cache the MSS to the + server to enhance performance. 
The MSS advertised by the server is + stored in the cache to determine the maximum amount of data that can + be supported in the SYN packet. This information is needed because + data is sent before the server announces its MSS in the SYN-ACK + packet. Without this information, the data size in the SYN packet is + limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1240 + bytes for IPv6 [RFC2460]. In particular it's known an IPv4 receiver + advertised MSS less than 536 bytes would result in transmission of an + unexpected large segment. If the cached MSS is larger than the typical + 1460 bytes, the extra large data in SYN segment may have issues that + offset the performance benefit of Fast Open. For example, the + super-size SYN may trigger IP fragmentation and may confuse firewalls + or middle-boxes, causing SYN retransmission and other side-effects. + Therefore the client MAY limit the cached MSS to 1460 bytes. 4.1.3.1 Client Caching Negative Responses The client MUST cache negative responses from the server in order to avoid potential connection failures. Negative responses include server not acknowledging the data in SYN, ICMP error messages, and most importantly no response (SYN/ACK) from the server at all, i.e., connection timeout. The last case is likely due to incompatible middle-boxes or firewalls blocking the connection completely after they see data in SYN. If the client does not react to these negative @@ -416,21 +421,22 @@ at least temporarily. Since TFO is enabled on a per-service port basis but cookies are independent of service ports, the client's cache should include remote port numbers too. 4.2. Fast Open Protocol One predominant requirement of TFO is to be fully compatible with existing TCP implementations, both on the client and the server sides. - The server keeps two variables per listening port: + The server keeps two variables per listening socket (IP address & + port): FastOpenEnabled: default is off. 
It MUST be turned on explicitly by the application. When this flag is off, the server does not perform any TFO-related operations and MUST ignore all cookie options. PendingFastOpenRequests: tracks the number of TFO connections in SYN-RCVD state. If this variable goes over a preset system limit, the server MUST disable TFO for all new connection requests until PendingFastOpenRequests drops below the system limit. This variable is used to defend against some vulnerabilities discussed in the "Security @@ -463,22 +469,22 @@ The network or servers may drop the SYN or SYN-ACK packets with the new cookie options, which will cause SYN or SYN-ACK timeouts. We RECOMMEND that both the client and the server retransmit SYN and SYN-ACK without the cookie options on timeouts. This ensures the connections of cookie requests will go through and lowers the latency penalty (of dropped SYN/SYN-ACK packets). The obvious downside for maximum compatibility is that any regular SYN drop will fail the cookie (although one can argue the delay in the data transmission till after 3WHS is justified if the SYN drop is due to network - congestion). Next section describes a heuristic to detect such drops - when the client receives the SYN-ACK. + congestion). The next section describes a heuristic to detect such + drops when the client receives the SYN-ACK. We also RECOMMEND that the client record the set of servers that failed to respond to cookie requests and only attempt another cookie request after a certain period. An alternate proposal is to request a TFO cookie in the FIN instead, since FIN-drop by incompatible middle-boxes does not affect latency. However paths that block SYN cookies may be more likely to drop a later SYN packet with data, and many applications close a connection with RST instead anyway. @@ -551,22 +557,22 @@ client in SYN-SENT state. The protocol remains the same because [RFC793] already supports both data in SYN and simultaneous open. 
But the client's socket may have data available to read before it's connected. This document does not cover the corresponding API change. Client: Receiving SYN-ACK The client SHOULD perform the following steps upon receiving the SYN-ACK: - 1. Update the cookie cache if the SYN-ACK has a Fast Open Cookie - Option or MSS option or both. + 1. If the SYN-ACK has a Fast Open Cookie Option or MSS option or both, + update the corresponding cookie and MSS information in the cookie cache. 2. Send an ACK packet. Set acknowledgment number to RCV.NXT and include the data after SND.UNA if data is available. 3. Advance to the ESTABLISHED state. Note there is no latency penalty if the server does not acknowledge the data in the original SYN packet. The client SHOULD retransmit any unacknowledged data in the first ACK packet in step 2. The data exchange will start after the handshake like a regular TCP @@ -721,21 +727,23 @@ in the SYN but not subsequent data exchanges. Empirically, [JIDKT07] showed that packet duplication on a Tier-1 network is rare. Since the replay only happens specifically when the SYN data packet is duplicated and also the duplicate arrives after the receiver has cleared the original SYN's connection state, the replay is thought to be uncommon in practice. Nevertheless, a client that cannot handle receiving the same SYN data more than once MUST NOT enable TFO to send data in a SYN. Similarly a server that cannot accept receiving the same SYN data more than once MUST NOT enable TFO - to receive data in a SYN. + to receive data in a SYN. Further investigation is needed to judge + the probability of receiving duplicated SYN or SYN-ACK with + data in non-Tier 1 networks. 6.2 Potential Performance Improvement TFO is designed for latency-conscious applications that are sensitive to TCP's initial connection setup delay. 
To benefit from TFO, the first application data unit (e.g., an HTTP request) needs to be no more than TCP's maximum segment size (minus options used in SYN). Otherwise the remote server can only process the client's application data unit once the rest of it is delivered after the initial handshake, diminishing TFO's benefit. @@ -798,27 +806,27 @@ timeout idle connections within 30 minutes. End hosts, unaware of silent middle-box timeouts, suffer multi-minute TCP timeouts upon using those long-idle connections. To circumvent this problem, some applications send frequent TCP keep- alive probes. However, this technique drains power on mobile devices [MQXMZ11]. In fact, power has become such a prominent issue in modern LTE devices that mobile browsers close HTTP connections within seconds or even immediately [SOUDERS11]. - [RCCJR11] studied Chrome browser performance based on 28 days of - global statistics. The Chrome browser keeps idle HTTP persistent - connections for 5 to 10 minutes. However the average number of the - transactions per connection is only 3.3 and TCP 3WHS accounts for up - to 25% of the HTTP transaction network latency. The authors estimated - that TFO improves page load time by 10% to 40% on selected popular - Web sites. + [RCCJR11] studied Chrome browser [Chrome] performance based on 28 + days of global statistics. The Chrome browser keeps idle HTTP + persistent connections for 5 to 10 minutes. However the average + number of the transactions per connection is only 3.3 and TCP 3WHS + accounts for up to 25% of the HTTP transaction network latency. The + authors estimated that TFO improves page load time by 10% to 40% on + selected popular Web sites. 7. Open Areas for Experimentation We now outline some areas that need experimentation in the Internet and under different network scenarios. 
These experiments should help the community evaluate Fast Open benefits and risks towards further standardization and implementation of Fast Open and its related protocols. 7.1. Performance impact due to middle-boxes and NAT @@ -918,26 +926,28 @@ TCPCT [RFC6013] eliminates server state during initial handshake and defends against spoofing DoS attacks. Like TFO, TCPCT allows SYN and SYN-ACK packets to carry data. But the server can only send up to MSS bytes of data during the handshake instead of the initial congestion window unlike TFO. Therefore the latency of applications such as Web may be worse than with TFO. 9. IANA Considerations - The Fast Open Cookie Option and Fast Open Cookie Request Option - define no new namespace. The options require IANA to allocate one - value from the TCP option Kind namespace. Early implementation before - the IANA allocation SHOULD follow [RFC6994] and use experimental - option 254 and magic number 0xF989 (16 bits), then migrate to the new - option after the allocation accordingly. + IANA is requested to allocate one value from the TCP Option Kind + Numbers: The constant-TBD in Section 4.1.1 has to be replaced + with the newly assigned value. The length of the new TCP option Kind + is variable and the Meaning should be set to "TCP Fast Open Cookie". + Early implementations before the IANA allocation SHOULD follow + [RFC6994] and use experimental option 254 and magic number 0xF989 (16 + bits), then migrate to the new option after the allocation + accordingly. 10. Acknowledgement We thank Bob Briscoe, Michael Scharf, Gorry Fairhurst, Rick Jones, Roberto Peon, William Chan, Adam Langley, Neal Cardwell, Eric Dumazet, and Matt Mathis for their feedback. We especially thank Barath Raghavan for his contribution to the security design of Fast Open and proofreading this draft numerous times. 11. 
References @@ -1028,20 +1038,22 @@ [BELSHE11] Belshe, M., "The era of browser preconnect.", http://www.belshe.com/2011/02/10/the-era-of-browser-preconnect/ [JIDKT07] Jaiswal, S., Iannaccone, G., Diot, C., Kurose, J., Towsley, D., "Measurement and classification of out-of-sequence packets in a tier-1 IP backbone.", IEEE/ACM Transactions on Networking (TON), 15(1), 54-66. + [Chrome] Chrome Browser. https://www.google.com/intl/en-US/chrome/browser/ + Appendix A. Example Socket API Changes to support TFO A.1 Active Open The active open side involves changing or replacing the connect() call, which does not take a user data buffer argument. We recommend replacing the connect() call to minimize API changes, and hence application changes, to reduce the deployment hurdle. One solution implemented in Linux 3.7 is introducing a new flag