--- 1/draft-ietf-hybi-thewebsocketprotocol-08.txt 2011-06-14 02:15:36.000000000 +0200
+++ 2/draft-ietf-hybi-thewebsocketprotocol-09.txt 2011-06-14 02:15:36.000000000 +0200
@@ -1,18 +1,18 @@
 HyBi Working Group I. Fette
 Internet-Draft Google, Inc.
-Intended status: Standards Track June 7, 2011
-Expires: December 9, 2011
+Intended status: Standards Track June 13, 2011
+Expires: December 15, 2011
 The WebSocket protocol
- draft-ietf-hybi-thewebsocketprotocol-08
+ draft-ietf-hybi-thewebsocketprotocol-09
 Abstract
 The WebSocket protocol enables two-way communication between a client
 running untrusted code running in a controlled environment to a remote
 host that has opted-in to communications from that code. The security
 model used for this is the Origin-based security model commonly used
 by Web browsers. The protocol consists of an opening handshake
 followed by basic message framing, layered over TCP. (In theory, any
 transport protocol could be used so long as it provides
@@ -33,21 +33,21 @@
 Internet-Drafts are working documents of the Internet Engineering
 Task Force (IETF). Note that other groups may also distribute
 working documents as Internet-Drafts. The list of current
 Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
 Internet-Drafts are draft documents valid for a maximum of six months
 and may be updated, replaced, or obsoleted by other documents at any
 time. It is inappropriate to use Internet-Drafts as reference
 material or to cite them other than as "work in progress."
- This Internet-Draft will expire on December 9, 2011.
+ This Internet-Draft will expire on December 15, 2011.
 Copyright Notice
 Copyright (c) 2011 IETF Trust and the persons identified as the
 document authors. All rights reserved.
 This document is subject to BCP 78 and the IETF Trust's Legal
 Provisions Relating to IETF Documents
 (http://trustee.ietf.org/license-info) in effect on the date of
 publication of this document. Please review these documents
@@ -72,21 +72,21 @@
 2. Conformance Requirements . . . . . . . . . . . . . . . . . . . 13
 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 13
 3. WebSocket URIs . . . . . . . . . . . . . . . . . . . . . . . . 15
 4. Data Framing . . . . . . . . . . . . . . . . . . . . . . . . . 16
 4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 16
 4.2. Base Framing Protocol . . . . . . . . . . . . . . . . . . 16
 4.3. Client-to-Server Masking . . . . . . . . . . . . . . . . 20
 4.4. Fragmentation . . . . . . . . . . . . . . . . . . . . . . 21
 4.5. Control Frames . . . . . . . . . . . . . . . . . . . . . 22
 4.5.1. Close . . . . . . . . . . . . . . . . . . . . . . . . 23
- 4.5.2. Ping . . . . . . . . . . . . . . . . . . . . . . . . . 23
+ 4.5.2. Ping . . . . . . . . . . . . . . . . . . . . . . . . . 24
 4.5.3. Pong . . . . . . . . . . . . . . . . . . . . . . . . . 24
 4.6. Data Frames . . . . . . . . . . . . . . . . . . . . . . . 24
 4.7. Examples . . . . . . . . . . . . . . . . . . . . . . . . 25
 4.8. Extensibility . . . . . . . . . . . . . . . . . . . . . . 25
 5. Opening Handshake . . . . . . . . . . . . . . . . . . . . . . 27
 5.1. Client Requirements . . . . . . . . . . . . . . . . . . . 27
 5.2. Server-side Requirements . . . . . . . . . . . . . . . . 32
 5.2.1. Reading the Client's Opening Handshake . . . . . . . . 32
 5.2.2. Sending the Server's Opening Handshake . . . . . . . . 33
 6. Sending and Receiving Data . . . . . . . . . . . . . . . . . . 37
@@ -511,24 +511,24 @@
 is specified, the server needs to include the same field and one of
 the selected subprotocol values in its response for the connection to
 be established.
 These subprotocol names should be registered as per Section 11.10. To
 avoid potential collisions, it is recommended to use names that
 contain the domain name of the subprotocol's originator. For example,
 if Example Corporation were to create a Chat subprotocol to be
 implemented by many servers around the Web, they could name it
 "chat.example.com".
 If the Example Organization called their
- competing subprotocol "example.org's chat protocol", then the two
- subprotocols could be implemented by servers simultaneously, with the
- server dynamically selecting which subprotocol to use based on the
- value sent by the client.
+ competing subprotocol "chat.example.org", then the two subprotocols
+ could be implemented by servers simultaneously, with the server
+ dynamically selecting which subprotocol to use based on the value
+ sent by the client.
 Subprotocols can be versioned in backwards-incompatible ways by
 changing the subprotocol name, e.g. going from "bookings.example.net"
 to "v2.bookings.example.net". These subprotocols would be considered
 completely separate by WebSocket clients. Backwards-compatible
 versioning can be implemented by reusing the same subprotocol string
 but carefully designing the actual subprotocol to support this kind
 of extensibility.
 2. Conformance Requirements
@@ -595,39 +595,42 @@
 transmission arbitrarily, e.g. buffering data so as to send fewer IP
 packets.
 3. WebSocket URIs
 This specification defines two URI schemes, using the ABNF syntax
 defined in RFC 5234 [RFC5234], and terminology and ABNF productions
 defined by the URI specification RFC 3986 [RFC3986].
 ws-URI = "ws:" "//" host [ ":" port ] path [ "?" query ]
- wss-URI = "ws:" "//" host [ ":" port ] path [ "?" query ]
+ wss-URI = "wss:" "//" host [ ":" port ] path [ "?" query ]
 host =
- port =
+ port =
 path =
 query =
 The port component is OPTIONAL; the default for "ws" is port 80,
 while the default for "wss" is port 443.
 The URI is called "secure" if the scheme component matches "wss"
 case-insensitively.
 The "resource-name" can be constructed by concatenating
- "/" if the path component is empty
- the path component
- "?" if the query component is non-empty
- the query component
+ o "/" if the path component is empty
+
+ o the path component
+
+ o "?" if the query component is non-empty
+
+ o the query component
 Fragment identifiers are meaningless in the context of WebSocket
 URIs, and MUST NOT be used on these URIs. The character "#" in URIs
 MUST be escaped as %23 if used as part of the query component.
 4. Data Framing
 4.1. Overview
 In the WebSocket protocol, data is transmitted using a sequence of
@@ -672,21 +675,22 @@
 FIN: 1 bit
 Indicates that this is the final fragment in a message. The first
 fragment MAY also be the final fragment.
 RSV1, RSV2, RSV3: 1 bit each
 MUST be 0 unless an extension is negotiated which defines meanings
 for non-zero values. If a nonzero value is received and none of
 the negotiated extensions defines the meaning of such a nonzero
- value, the receiving endpoint MUST ignore that value.
+ value, the receiving endpoint MUST _Fail the WebSocket
+ Connection_.
 Opcode: 4 bits
 Defines the interpretation of the payload data. If an unknown
 opcode is received, the receiving endpoint MUST ignore that frame.
 The following values are defined.
 * %x0 denotes a continuation frame
 * %x1 denotes a text frame
@@ -884,40 +888,39 @@
 payloads of the fragments in order, however in the presence of
 extensions this may not hold true as the extension defines the
 interpretation of the extension data present. For instance,
 extension data may only be present at the beginning of the first
 fragment and apply to subsequent fragments, or there may be extension
 data present in each of the fragments that applies only to that
 particular fragment. Setting aside the issue of extensions, the
 following example demonstrates how fragmentation works.
- o EXAMPLE: For a text message sent as three fragments, the first
+ EXAMPLE: For a text message sent as three fragments, the first
 fragment would have an opcode of 0x1 and a FIN bit clear, the second
 fragment would have an opcode of 0x0 and a FIN bit clear, and the
 third fragment would have an opcode of 0x0 and a FIN bit that is set.
 o Control frames MAY be injected in the middle of a fragmented
 message. Control frames themselves MUST NOT be fragmented.
 o Message fragments MUST be delivered to the recipient in the order
 sent by the sender.
+ o The fragments of one message MUST NOT be interleaved between the
+ fragments of another message unless an extension has been
+ negotiated that can interpret the interleaving.
+
 o An endpoint MUST be capable of handling control frames in the
 middle of a fragmented message.
- o _Note: if control frames could not be interjected, the latency of
- a ping, for example, would be very long if behind a large message.
- Hence, the requirement of handling control frames in the middle of
- a fragmented message._
-
 o A sender MAY create fragments of any size for non-control
 messages.
 o Clients and servers MUST support receiving both fragmented and
 unfragmented messages.
 o As control frames cannot be fragmented, an intermediary MUST NOT
 attempt to change the fragmentation of a control frame.
 o An intermediary MUST NOT change the fragmentation of a message if
@@ -928,20 +931,25 @@
 in the context of a connection where extensions have been
 negotiated and the intermediary is not aware of the semantics of
 the negotiated extensions.
 o As a consequence of these rules, all fragments of a message are of
 the same type, as set by the first fragment's opcode. Since
 Control frames cannot be fragmented, the type for all fragments in
 a message MUST be either text or binary, or one of the reserved
 opcodes.
+ _Note: if control frames could not be interjected, the latency of a
+ ping, for example, would be very long if behind a large message.
+ Hence, the requirement of handling control frames in the middle of a
+ fragmented message._
+
 4.5. Control Frames
 Control frames are identified by opcodes where the most significant
 bit of the opcode is 1. Currently defined opcodes for control frames
 include 0x8 (Close), 0x9 (Ping), and 0xA (Pong).
 Opcodes 0xB-0xF are reserved for further control frames yet to be
 defined.
 Control frames are used to communicate state about the WebSocket.
 Control frames can be interjected in the middle of a fragmented
 message.
@@ -959,20 +967,23 @@
 endpoint having received a frame that does not conform to the format
 expected by the other endpoint.
 If there is a body, the first two bytes of the body MUST be a 2-byte
 unsigned integer (in network byte order) representing a status code
 with value /code/ defined in Section 7.4. Following the 2-byte
 integer the body MAY contain UTF-8 encoded data with value /reason/,
 the interpretation of which is not defined by this specification.
 This data is not necessarily human readable, but may be useful for
 debugging or passing information relevant to the script that opened
 the connection.
+ Close frames sent from client to server must be masked as per
+ Section 4.3.
+
 The application MUST NOT send any more data frames after sending a
 close frame.
 If an endpoint receives a Close frame and that endpoint did not
 previously send a Close frame, the endpoint MUST send a Close frame
 in response. It SHOULD do so as soon as is practical. An endpoint
 MAY delay sending a close frame until its current message is sent
 (for instance, if the majority of a fragmented message is already
 sent, an endpoint MAY send the remaining fragments before sending a
 Close frame). However, there is no guarantee that the endpoint which
@@ -1284,21 +1296,28 @@
 be lower-case. The value MUST NOT contain letters in the range
 U+0041 to U+005A (i.e. LATIN CAPITAL LETTER A to LATIN CAPITAL
 LETTER Z) [I-D.ietf-websec-origin]. The ABNF is as defined in
 Section 6.1 of [I-D.ietf-websec-origin].
 As an example, if code is running on www.example.com attempting
 to establish a connection to ww2.example.com, the value of the
 header would be "http://www.example.com".
 9. The request MUST include a header with the name "Sec-WebSocket-
- Version". The value of this header MUST be 8.
+ Version". The value of this header MUST be 8. _Note: Although a
+ draft -09 was published, as -09 was comprised of editorial
+ changes and not changes to the wire protocol, 9 was not used as
+ a valid value for Sec-WebSocket-Version. This value was
+ reserved in the IANA registry but was not and will not be used.
+
+ If subsequent changes to the wire protocol are necessary, 9 will
+ be skipped to prevent confusion with the draft 9 protocol._
 10. The request MAY include a header with the name
 "Sec-WebSocket-Protocol". If present, this value indicates the
 subprotocol(s) the client wishes to speak, ordered by preference.
 The elements that comprise this value MUST be non-empty strings
 with characters in the range U+0021 to U+007E not including
 separator characters as defined in [RFC2616], and MUST all be
 unique strings. The ABNF for the value of this header is 1#token,
 where the definitions of constructs and rules are as given in
 [RFC2616].
@@ -1494,24 +1513,30 @@
 the values from the "Sec-WebSocket-Extensions" field. The
 absence of such a field is equivalent to the null value. The
 empty string is not the same as the null value for these
 purposes. Extensions not listed by the client MUST NOT be
 listed. The method by which these values should be selected
 and interpreted is discussed in Section 9.1.
 3. If the server chooses to accept the incoming connection, it MUST
 reply with a valid HTTP response indicating the following.
- 1. A 101 response code. Such a response could look like
- "HTTP/1.1 101 Switching Protocols"
+ 1. A Status-Line with a 101 response code as per RFC 2616
+ [RFC2616]. Such a response could look like "HTTP/1.1 101
+ Switching Protocols"
- 2. A "Sec-WebSocket-Accept" header. The value of this header is
+ 2. An "Upgrade" header with value "websocket" as per RFC 2616
+ [RFC2616].
+
+ 3. A "Connection" header with value "Upgrade"
+
+ 4. A "Sec-WebSocket-Accept" header. The value of this header is
 constructed by concatenating /key/, defined above in
 Paragraph 2 of Section 5.2.2, with the string
 "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", taking the SHA-1 hash of
 this concatenated value to obtain a 20-byte value, and
 base64-encoding this 20-byte hash.
 The ABNF of this header is defined as follows:
 accept-value = base64-value
 base64-value = *base64-data [ base64-padding ]
@@ -1524,24 +1549,24 @@
 "dGhlIHNhbXBsZSBub25jZQ==", the server would append the
 string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to form the
 string
 "dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11".
 The server would then take the SHA-1 hash of this string,
 giving the value 0xb3 0x7a 0x4f 0x2c 0xc0 0x62 0x4f 0x16 0x90
 0xf6 0x46 0x06 0xcf 0x38 0x59 0x45 0xb2 0xbe 0xc4 0xea. This
 value is then base64-encoded, to give the value
 "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", which would be returned in
 the "Sec-WebSocket-Accept" header.
- 3. Optionally, a "Sec-WebSocket-Protocol" header, with a value
+ 5. Optionally, a "Sec-WebSocket-Protocol" header, with a value
 /subprotocol/ as defined in Paragraph 2 of Section 5.2.2.
- 4. Optionally, a "Sec-WebSocket-Extensions" header, with a value
+ 6. Optionally, a "Sec-WebSocket-Extensions" header, with a value
 /extensions/ as defined in Paragraph 2 of Section 5.2.2. If
 multiple extensions are to be used, they must all be listed
 in a single Sec-WebSocket-Extensions header. This header
 MUST NOT be repeated.
 This completes the server's handshake. If the server finishes these
 steps without aborting the WebSocket handshake, the server considers
 the WebSocket connection to be established and that the WebSocket
 connection is in the OPEN state. At this point, the server may begin
 sending (and receiving) data.
@@ -1599,20 +1624,23 @@
 frames is concatenated to form the /data/.
 When the last fragment is received as indicated by the FIN bit
 (frame-fin), it is said that _A WebSocket Message Has Been Received_
 with data /data/ (comprised of the concatenation of the _Application
 Data_ of the fragments) and type /type/ (noted from the first frame
 of the fragmented message). Subsequent data frames MUST be
 interpreted as belonging to a new WebSocket Message.
 Extensions (Section 9) MAY change the semantics of how data is read,
 specifically including what comprises a message boundary.
+ Extensions, in addition to adding "Extension data" before the
+ "Application data" in a payload, MAY also modify the "Application
+ data" (such as by compressing it).
 Data frames received by a server from a client MUST be unmasked as
 described in Section 4.3.
 7. Closing the connection
 7.1. Definitions
 7.1.1. Close the WebSocket Connection
@@ -2006,25 +2034,29 @@
 "Sec-WebSocket-Origin" field, without bothering to check the client's
 value.
 If at any time a server is faced with data that it does not
 understand, or that violates some criteria by which the server
 determines safety of input, or when the server sees an opening
 handshake that does not correspond to the values the server is
 expecting (e.g. incorrect path or origin), the server SHOULD just
 disconnect. It is always safe to disconnect.
- The biggest security risk when sending text data using this protocol
- is sending data using the wrong encoding. If an attacker can trick
- the server into sending data encoded as ISO-8859-1 verbatim (for
- instance), rather than encoded as UTF-8, then the attacker could
- inject arbitrary frames into the data stream.
+ A common class of security problems arise when sending text data
+ using using the wrong encoding. This protocol specifies that
+ messages with a Text data type (as opposed to Binary or other types)
+ contain UTF-8 encoded data. Although the length is still indicated
+ and applications implementing this protocol should use the length to
+ determine where the frame actually ends, sending data in an improper
+ encoding may still break assumptions applications built on top of
+ this protocol may make, leading from anything to misinterpretation of
+ data to loss of data to potential security bugs.
 In addition to endpoints being the target of attacks via WebSockets,
 other parts of web infrastructure, such as proxies, may be the
 subject of an attack. In particular, an intermediary may interpret a
 WebSocket frame from a client as a request, and a frame from the
 server as a response to that request. For instance, an attacker
 could get a browser to establish a connection to its server, get the
 browser to send a frame that looks to an intermediary like a GET
 request for a common piece of JavaScript on another domain, and send
 back a frame that is interpreted as a cacheable response to that
@@ -2462,20 +2494,22 @@
   | 4              + draft-ietf-hybi-thewebsocketprotocol-04 |
  -+----------------+-----------------------------------------+-
   | 5              + draft-ietf-hybi-thewebsocketprotocol-05 |
  -+----------------+-----------------------------------------+-
   | 6              + draft-ietf-hybi-thewebsocketprotocol-06 |
  -+----------------+-----------------------------------------+-
   | 7              + draft-ietf-hybi-thewebsocketprotocol-07 |
  -+----------------+-----------------------------------------+-
   | 8              + draft-ietf-hybi-thewebsocketprotocol-08 |
  -+----------------+-----------------------------------------+-
+  | 9              + draft-ietf-hybi-thewebsocketprotocol-09 |
+ -+----------------+-----------------------------------------+-
 11.13. WebSocket Close Code Number Registry
 This specification requests the creation of a new IANA registry for
 WebSocket Connection Close Code Numbers in accordance with the
 principles set out in RFC 5226 [RFC5226].
 As part of this registry IANA will maintain the following
 information:
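
Reviewer note (not part of either draft): the "Sec-WebSocket-Accept" derivation quoted in the server handshake hunk above (concatenate /key/ with the fixed GUID string, take the SHA-1 hash, base64-encode the 20-byte result) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation; the names compute_accept and WS_GUID are hypothetical and introduced here only for the example. The sample key and expected result are the ones given in the draft text.

```python
import base64
import hashlib

# GUID string quoted in the handshake text of the draft.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(key: str) -> str:
    """Hypothetical helper: derive the Sec-WebSocket-Accept value from /key/."""
    # Concatenate the client's key with the GUID and hash the result.
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()  # 20 bytes
    # Base64-encode the raw 20-byte hash, not its hex representation.
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from the draft's worked example.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
    # prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Hashing the concatenated string as ASCII is safe here because both the base64 key and the GUID are ASCII-only; a real server would take the key from the client's opening handshake after validating it.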