--- 1/draft-ietf-hybi-thewebsocketprotocol-15.txt 2011-09-27 13:14:03.930132380 +0200
+++ 2/draft-ietf-hybi-thewebsocketprotocol-16.txt 2011-09-27 13:14:04.042133962 +0200
@@ -1,19 +1,19 @@
 HyBi Working Group                                              I. Fette
 Internet-Draft                                              Google, Inc.
 Intended status: Standards Track                             A. Melnikov
-Expires: March 20, 2012                                        Isode Ltd
-                                                      September 17, 2011
+Expires: March 30, 2012                                        Isode Ltd
+                                                      September 27, 2011

                          The WebSocket protocol
-                draft-ietf-hybi-thewebsocketprotocol-15
+                draft-ietf-hybi-thewebsocketprotocol-16

 Abstract

    The WebSocket protocol enables two-way communication between a client
    running untrusted code running in a controlled environment to a remote
    host that has opted-in to communications from that code. The
    security model used for this is the Origin-based security model
    commonly used by Web browsers. The protocol consists of an opening
    handshake followed by basic message framing, layered over TCP. The
    goal of this technology is to provide a mechanism for browser-based
@@ -31,21 +31,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF). Note that other groups may also distribute
    working documents as Internet-Drafts. The list of current Internet-
    Drafts is at http://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time. It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on March 20, 2012.
+   This Internet-Draft will expire on March 30, 2012.

 Copyright Notice

    Copyright (c) 2011 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
@@ -652,22 +652,21 @@
 4.
Opening Handshake

 4.1. Client Requirements

    To _Establish a WebSocket Connection_, a client opens a connection
    and sends a handshake as defined in this section. A connection is
    defined to initially be in a CONNECTING state. A client will need to
    supply a /host/, /port/, /resource name/, and a /secure/ flag, which
    are the components of a WebSocket URI as discussed in Section 3,
    along with a list of /protocols/ and /extensions/ to be used.
-   Additionally, if the client is a web browser, an /origin/ MUST be
-   supplied.
+   Additionally, if the client is a web browser, it supplies /origin/.

    Clients running in controlled environments, e.g. browsers on mobile
    handsets tied to specific carriers, MAY offload the management of the
    connection to another agent on the network. In such a situation, the
    client for the purposes of this specification is considered to
    include both the handset software and any such agents.

    When the client is to _Establish a WebSocket Connection_ given a set
    of (/host/, /port/, /resource name/, and /secure/ flag), along with a
    list of /protocols/ and /extensions/ to be used, and an /origin/ in
@@ -697,23 +696,24 @@
         If the client cannot determine the IP address of the remote host
         (for example because all communication is being done through a
         proxy server that performs DNS queries itself), then the client
         MUST assume for the purposes of this step that each host name
         refers to a distinct remote host, and instead the client SHOULD
         limit the total number of simultaneous pending connections to a
         reasonably low number (e.g., the client might allow simultaneous
         pending connections to a.example.com and b.example.com, but if
         thirty simultaneous connections to a single host are requested,
-        that may not be allowed). In a Web browser context, the client
-        SHOULD consider the number of tabs the user has open in setting a
-        limit to the number of simultaneous pending connections.
+        that may not be allowed).
For example in a Web browser context,
+        the client needs to consider the number of tabs the user has open
+        in setting a limit to the number of simultaneous pending
+        connections.

         NOTE: This makes it harder for a script to perform a denial of
         service attack by just opening a large number of WebSocket
         connections to a remote host. A server can further reduce the
         load on itself when attacked by pausing before closing the
         connection, as that will reduce the rate at which the client
         reconnects.

         NOTE: There is no limit to the number of established WebSocket
         connections a client can have with a single remote host. Servers
@@ -1109,21 +1109,20 @@
    of this concatenated value to obtain a 20-byte value, and
    base64-encoding (see Section 4 of [RFC4648]) this 20-byte hash.

    The ABNF [RFC2616] of this header field is defined as follows:

      Sec-WebSocket-Accept = base64-value-non-empty
      base64-value-non-empty = (1*base64-data [ base64-padding ]) |
                               base64-padding
-     base64-value = *base64-data [ base64-padding ]
      base64-data = 4base64-character
      base64-padding = (2base64-character "==") |
                       (3base64-character "=")
      base64-character = ALPHA | DIGIT | "+" | "/"

    NOTE: As an example, if the value of the "Sec-WebSocket-Key" header
    field in the client's handshake were "dGhlIHNhbXBsZSBub25jZQ==", the
    server would append the string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
    to form the string "dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-
@@ -1253,47 +1252,50 @@
         Sec-WebSocket-Version: 13

 5. Data Framing

 5.1. Overview

    In the WebSocket protocol, data is transmitted using a sequence of
    frames. To avoid confusing network intermediaries (such as
    intercepting proxies) and for security reasons that are further
    discussed in Section 10.3, a client MUST mask all frames that it
-   sends to the server (see Section 5.3 for further details). The
-   server MUST close the connection upon receiving a frame that is not
-   masked.
In this case, a server MAY send a close frame with a status
-   code of 1002 (protocol error) as defined in Section 7.4.1. A server
-   MUST NOT mask any frames that it sends to the client. A client MUST
-   close a connection if it detects a masked frame. In this case, it
-   MAY use the status code 1002 (protocol error) as defined in
-   Section 7.4.1. (These rules might be relaxed in a future
-   specification.)
+   sends to the server (see Section 5.3 for further details). (Note
+   that masking is done whether or not the WebSocket protocol is running
+   over TLS.) The server MUST close the connection upon receiving a
+   frame that is not masked. In this case, a server MAY send a close
+   frame with a status code of 1002 (protocol error) as defined in
+   Section 7.4.1. A server MUST NOT mask any frames that it sends to
+   the client. A client MUST close a connection if it detects a masked
+   frame. In this case, it MAY use the status code 1002 (protocol
+   error) as defined in Section 7.4.1. (These rules might be relaxed in
+   a future specification.)

    The base framing protocol defines a frame type with an opcode, a
    payload length, and designated locations for extension and
    application data, which together define the _payload_ data. Certain
    bits and opcodes are reserved for future expansion of the protocol.

    A data frame MAY be transmitted by either the client or the server at
    any time after opening handshake completion and before that endpoint
    has sent a close frame (Section 5.5.1).

 5.2. Base Framing Protocol

    This wire format for the data transfer part is described by the ABNF
-   [RFC5234] given in detail in this section. A high level overview of
-   the framing is given in the following figure. In a case of conflict
-   between the figure below and the ABNF specified later in this
-   section, the ABNF version should be considered to be more
-   authoritative.
+   [RFC5234] given in detail in this section.
(Note that unlike in
+   other sections of this document the ABNF in this section is operating
+   on groups of bits. When encoded on the wire the most significant bit
+   is the leftmost in the ABNF). A high level overview of the framing
+   is given in the following figure. In a case of conflict between the
+   figure below and the ABNF specified later in this section, the ABNF
+   version should be considered to be more authoritative.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/63)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
@@ -1470,45 +1472,44 @@
    A masked frame MUST have the field frame-masked set to 1, as defined
    in Section 5.2.

    The masking key is contained completely within the frame, as defined
    in Section 5.2 as frame-masking-key. It is used to mask the payload
    data defined in the same section as frame-payload-data, which
    includes extension and application data.

    The masking key is a 32-bit value chosen at random by the client.
-   The masking key MUST be derived from a strong source of entropy, and
-   the masking key for a given frame MUST NOT make it simple for a
-   server to predict the masking key for a subsequent frame. RFC 4086
-   [RFC4086] discusses what entails a suitable source of entropy for
-   security-sensitive applications.
+   When preparing a masked frame, the client MUST pick a fresh masking
+   key from the set of allowed 32-bit values. The masking key needs to
+   be unpredictable, thus the masking key MUST be derived from a strong
+   source of entropy, and the masking key for a given frame MUST NOT
+   make it simple for a server/proxy to predict the masking key for a
+   subsequent frame.
The unpredictability of the masking key is
+   essential to prevent the author of malicious applications from
+   selecting the bytes that appear on the wire. RFC 4086 [RFC4086]
+   discusses what entails a suitable source of entropy for security-
+   sensitive applications.

    The masking does not affect the length of the payload data. To
    convert masked data into unmasked data, or vice versa, the following
    algorithm is applied. The same algorithm applies regardless of the
    direction of the translation - e.g. the same steps are applied to
    mask the data as to unmask the data.

    Octet i of the transformed data ("transformed-octet-i") is the XOR of
    octet i of the original data ("original-octet-i") with octet at index
    i modulo 4 of the masking key ("masking-key-octet-j"):

      j = i MOD 4
      transformed-octet-i = original-octet-i XOR masking-key-octet-j

-   When preparing a masked frame, the client MUST pick a fresh masking
-   key from the set of allowed 32-bit values. The masking key must be
-   unpredictable. The unpredictability of the masking key is essential
-   to prevent the author of malicious applications from selecting the
-   bytes that appear on the wire.
-
    The payload length, indicated in the framing as frame-payload-length,
    does NOT include the length of the masking key. It is the length of
    the payload data, e.g. the number of bytes following the masking key.

 5.4. Fragmentation

    The primary purpose of fragmentation is to allow sending a message
    that is of unknown size when the message is started without having to
    buffer that message. If messages couldn't be fragmented, then an
    endpoint would have to buffer the entire message so its length could
@@ -1580,21 +1581,24 @@
    o  As control frames cannot be fragmented, an intermediary MUST NOT
       attempt to change the fragmentation of a control frame.

    o  An intermediary MUST NOT change the fragmentation of a message if
       any reserved bit values are used and the meaning of these values
       is not known to the intermediary.
    o  An intermediary MUST NOT change the fragmentation of any message
       in the context of a connection where extensions have been
       negotiated and the intermediary is not aware of the semantics of
-      the negotiated extensions.
+      the negotiated extensions. Similarly, an intermediary that didn't
+      see the WebSocket handshake (and wasn't notified about its
+      content) that resulted in a WebSocket connection MUST NOT change
+      the fragmentation of any message of such connection.

    o  As a consequence of these rules, all fragments of a message are of
       the same type, as set by the first fragment's opcode. Since
       Control frames cannot be fragmented, the type for all fragments in
       a message MUST be either text or binary, or one of the reserved
       opcodes.

    _Note: if control frames could not be interjected, the latency of a
    ping, for example, would be very long if behind a large message.
    Hence, the requirement of handling control frames in the middle of a
@@ -1640,26 +1644,28 @@
    users.

    Close frames sent from client to server must be masked as per
    Section 5.3.

    The application MUST NOT send any more data frames after sending a
    close frame.

    If an endpoint receives a Close frame and that endpoint did not
    previously send a Close frame, the endpoint MUST send a Close frame
-   in response. It SHOULD do so as soon as is practical. An endpoint
-   MAY delay sending a close frame until its current message is sent
-   (for instance, if the majority of a fragmented message is already
-   sent, an endpoint MAY send the remaining fragments before sending a
-   Close frame). However, there is no guarantee that the endpoint which
-   has already sent a Close frame will continue to process data.
+   in response. (When sending a Close frame in response the endpoint
+   typically echos the status code it received.) It SHOULD do so as
+   soon as practical.
An endpoint MAY delay sending a close frame until
+   its current message is sent (for instance, if the majority of a
+   fragmented message is already sent, an endpoint MAY send the
+   remaining fragments before sending a Close frame). However, there is
+   no guarantee that the endpoint which has already sent a Close frame
+   will continue to process data.

    After both sending and receiving a close message, an endpoint
    considers the WebSocket connection closed, and MUST close the
    underlying TCP connection. The server MUST close the underlying TCP
    connection immediately; the client SHOULD wait for the server to
    close the connection but MAY close the connection at any time after
    sending and receiving a close message, e.g. if it has not received a
    TCP close from the server in a reasonable time period.

    If a client and server both send a Close message at the same time,
@@ -2273,21 +2279,21 @@
    EXAMPLE: For example, if the server uses input as part of SQL
    queries, all input text should be escaped before being passed to the
    SQL server, lest the server be susceptible to SQL injection.

 10.2. Origin Considerations

    Servers that are not intended to process input from any Web page but
    only for certain sites SHOULD verify the "Origin" field is an origin
    they expect. If the origin indicated is unacceptable to the server,
    then it SHOULD respond to the WebSocket handshake with a reply
-   containing HTTP 403 Unauthorized status code.
+   containing HTTP 403 Forbidden status code.

    The "Origin" header field protects from the attack cases when the
    untrusted party is typically the author of a JavaScript application
    that is executing in the context of the trusted client. The client
    itself can contact the server and via the mechanism of the "Origin"
    header field, determine whether to extend those communication
    privileges to the JavaScript application.
    The intent is not to prevent non-browsers from establishing
    connections, but rather to ensure that trusted browsers under the
    control of potentially malicious JavaScript cannot fake a WebSocket
    handshake.
@@ -2362,29 +2368,29 @@
    to forge a request. As such, it was not deemed necessary to mask
    data in both directions (the data from the server to the client is
    not masked).

    Despite the protection provided by masking, non-compliant HTTP
    proxies will still be vulnerable to poisoning attacks of this type by
    clients and servers that do not apply masking.

 10.4. Implementation-Specific Limits

-   Implementations MUST impose implementation-specific limits on
-   otherwise unconstrained inputs, e.g. to prevent denial of service
-   attacks, to guard against running out of memory, or to work around
-   platform-specific limitations. In particular, a malicious endpoint
-   can try to exhaust its peer's memory by sending either a single big
-   frame (e.g. of size 2**60), or by sending a long stream of small
-   frames which are a part of a fragmented message. In order to protect
-   from this implementations SHOULD impose limit on frame sizes and the
-   total message size after reassembly from multiple frames.
+   Implementations which have implementation- and/or platform-specific
+   limitations regarding the frame size or total message size after
+   reassembly from multiple frames MUST protect themselves against
+   exceeding those limits. (For example, a malicious endpoint can try
+   to exhaust its peer's memory or mount a denial of service attack by
+   sending either a single big frame (e.g. of size 2**60), or by sending
+   a long stream of small frames which are a part of a fragmented
+   message.) Such an implementation SHOULD impose limit on frame sizes
+   and the total message size after reassembly from multiple frames.

 10.5. WebSocket client authentication

    This protocol doesn't prescribe any particular way that servers can
    authenticate clients during the WebSocket handshake.
The WebSocket
    server can use any client authentication mechanism available to a
    generic HTTP server, such as Cookies, HTTP Authentication, or TLS
    authentication.

 10.6. Connection confidentiality and integrity
@@ -3067,23 +3073,23 @@
    Thank you to the following people who participated in discussions on
    the HYBI WG mailing list and contributed ideas and/or provided
    detailed reviews (the list is likely to be incomplete): Greg Wilkins,
    John Tamplin, Willy Tarreau, Maciej Stachowiak, Jamie Lokier, Scott
    Ferguson, Bjoern Hoehrmann, Julian Reschke, Dave Cridland, Andy
    Green, Eric Rescorla, Inaki Baz Castillo, Martin Thomson, Roberto
    Peon, Patrick McManus, Zhong Yu, Bruce Atherton, Takeshi Yoshino,
    Martin J. Duerst, James Graham, Simon Pieters, Roy T. Fielding,
    Mykyta Yevstifeyev, Len Holgate, Paul Colomiets, Piotr Kulaga, Brian
    Raymor, Jan Koehler, Joonas Lehtolahti, Sylvain Hellegouarch, Stephen
-   Farrell, Sean Turner, Pete Resnick, Peter Thorson, Joe Mason. Note
-   that people listed above didn't necessarily endorse the end result of
-   this work.
+   Farrell, Sean Turner, Pete Resnick, Peter Thorson, Joe Mason, John
+   Fallows, Alexander Philippou. Note that people listed above didn't
+   necessarily endorse the end result of this work.

 14. References

 14.1. Normative References

    [ANSI.X3-4.1986]
               American National Standards Institute, "Coded Character
               Set - 7-bit American Standard Code for Information
               Interchange", ANSI X3.4, 1986.
@@ -3163,21 +3169,21 @@
    [RFC6202]  Loreto, S., Saint-Andre, P., Salsano, S., and G. Wilkins,
               "Known Issues and Best Practices for the Use of Long
               Polling and Streaming in Bidirectional HTTP", RFC 6202,
               April 2011.

    [RFC4270]  Hoffman, P. and B. Schneier, "Attacks on Cryptographic
               Hashes in Internet Protocols", RFC 4270, November 2005.

    [W3C.REC-wsc-ui-20100812]
-              Saldhana, A. and T. Roessler, "Web Security Context: User
+              Roessler, T. and A.
Saldhana, "Web Security Context: User
               Interface Guidelines", World Wide Web Consortium
               Recommendation REC-wsc-ui-20100812, August 2010, .

    [TALKING]  Huang, L-S., Chen, E., Barth, A., and E. Rescorla,
               "Talking to Yourself for Fun and Profit", 2010, .

    [XMLHttpRequest]