draft-ietf-websec-mime-sniff-02.txt   draft-ietf-websec-mime-sniff-03.txt 
None A. Barth None A. Barth
Internet-Draft I. Hickson Internet-Draft I. Hickson
Expires: August 8, 2011 Google, Inc. Expires: November 8, 2011 Google, Inc.
February 4, 2011 May 7, 2011
Media Type Sniffing Media Type Sniffing
draft-ietf-websec-mime-sniff-02 draft-ietf-websec-mime-sniff-03
Abstract Abstract
Many web servers supply incorrect Content-Type header fields with Many web servers supply incorrect Content-Type header fields with
their HTTP responses. In order to be compatible with these servers, their HTTP responses. In order to be compatible with these servers,
user agents consider the content of HTTP responses as well as the user agents consider the content of HTTP responses as well as the
Content-Type header fields when determining the effective media type Content-Type header fields when determining the effective media type
of the response. This document describes an algorithm for of the response. This document describes an algorithm for
determining the effective media type of HTTP responses that balances determining the effective media type of HTTP responses that balances
security and compatibility considerations. security and compatibility considerations.
skipping to change at page 1, line 44 skipping to change at page 1, line 44
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 8, 2011. This Internet-Draft will expire on November 8, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the BSD License. described in the BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. Conventions . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Web Pages . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3. Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4. Text or Binary . . . . . . . . . . . . . . . . . . . . . . . . 8 4. Web Pages . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5. Unknown Type . . . . . . . . . . . . . . . . . . . . . . . . . 10 5. Text or Binary . . . . . . . . . . . . . . . . . . . . . . . . 9
5.1. Signature for H.264 . . . . . . . . . . . . . . . . . . . 15 6. Unknown Type . . . . . . . . . . . . . . . . . . . . . . . . . 11
6. Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.1. Signature for MP4 . . . . . . . . . . . . . . . . . . . . 16
7. Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7. Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
8. Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 8. Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
9. Feed or HTML . . . . . . . . . . . . . . . . . . . . . . . . . 19 9. Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 10. Feed or HTML . . . . . . . . . . . . . . . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23
11.1. Normative References . . . . . . . . . . . . . . . . . . . 23
11.2. Informative References . . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
1. Introduction 1. Introduction
The HTTP Content-Type header field indicates the media type of an The HTTP Content-Type header field indicates the media type of an
HTTP response. However, many HTTP servers supply a Content-Type that HTTP response. However, many HTTP servers supply a Content-Type that
does not match the actual contents of the response. Historically, does not match the actual contents of the response. Historically,
web browsers have tolerated these servers by examining the content of web browsers have tolerated these servers by examining the content of
HTTP responses in addition to the Content-Type header field to HTTP responses in addition to the Content-Type header field to
determine the effective media type of the response. determine the effective media type of the response.
skipping to change at page 5, line 5 skipping to change at page 5, line 5
therein), an attacker might be able to steal the user's therein), an attacker might be able to steal the user's
authentication credentials and mount other cross-site scripting authentication credentials and mount other cross-site scripting
attacks. attacks.
Conformance requirements phrased as algorithms or specific steps MAY Conformance requirements phrased as algorithms or specific steps MAY
be implemented in any manner, so long as the end result is be implemented in any manner, so long as the end result is
equivalent. (In particular, the algorithms defined in this equivalent. (In particular, the algorithms defined in this
specification are intended to be easy to follow, and not intended to specification are intended to be easy to follow, and not intended to
be performant.) be performant.)
2. Metadata 2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Requirements phrased in the imperative as part of algorithms (such as
"strip any leading space characters" or "return false and abort these
steps") are to be interpreted with the meaning of the key word
("MUST", "SHOULD", "MAY", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps can
be implemented in any manner, so long as the end result is
equivalent. In particular, the algorithms defined in this
specification are intended to be easy to understand and are not
intended to be performant.
3. Metadata
The explicit media type metadata information associated with a sequence The explicit media type metadata information associated with a sequence
of octets depends on the protocol that was used to fetch the octets. of octets depends on the protocol that was used to fetch the octets.
For octets received via HTTP, the Content-Type HTTP header field, if For octets received via HTTP, the Content-Type HTTP header field, if
present, indicates the media type. Let the official-type be the present, indicates the media type. Let the official-type be the
media type indicated by the HTTP Content-Type header field, if media type indicated by the HTTP Content-Type header field, if
present. If the Content-Type header field is absent or if its value present. If the Content-Type header field is absent or if its value
cannot be interpreted as a media type (e.g. because its value doesn't cannot be interpreted as a media type (e.g. because its value doesn't
contain a U+002F SOLIDUS ('/') character), then there is no official- contain a U+002F SOLIDUS ('/') character), then there is no official-
type. type. (Such messages are invalid according to [RFC2616].)
Note: If an HTTP response contains multiple Content-Type header Note: If an HTTP response contains multiple Content-Type header
fields, the user agent MUST use the textually last Content-Type fields, the user agent MUST use the textually last Content-Type
header field to determine the official-type. For example, if the last header field to determine the official-type. For example, if the last
Content-Type header field contains the value "foo", then there is Content-Type header field contains the value "foo", then there is
no official media type because "foo" cannot be interpreted as a no official media type because "foo" cannot be interpreted as a
media type (even if the HTTP response contains another Content- media type (even if the HTTP response contains another Content-
Type header field that could be interpreted as a media type). Type header field that could be interpreted as a media type).
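The rule above can be illustrated with a minimal sketch. It assumes the response headers are available as an ordered list of (name, value) pairs, and it treats a value as a media type only if the part before any ';' contains a '/'. The function name official_type and that simplified validity check are assumptions made for illustration; a real parser would be stricter.

```python
from typing import Iterable, Optional, Tuple


def official_type(headers: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the official-type, or None when there is none.

    Only the textually last Content-Type header field is considered,
    even if an earlier one would have been usable.
    """
    last_value = None
    for name, value in headers:
        # Header field names are case-insensitive.
        if name.strip().lower() == "content-type":
            last_value = value
    if last_value is None:
        return None  # header field absent

    # Simplification (assumption): drop parameters such as charset and
    # require a U+002F SOLIDUS in what remains.
    media_type = last_value.split(";", 1)[0].strip()
    if "/" not in media_type:
        return None  # cannot be interpreted as a media type
    return media_type
```

With the headers "Content-Type: text/html" followed by "Content-Type: foo", this sketch reports no official-type, matching the example in the note.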
For octets fetched from the file system, user agents should use For octets fetched from the file system, user agents should use
skipping to change at page 5, line 38 skipping to change at page 6, line 38
type mappings) to determine the official-type. type mappings) to determine the official-type.
Note: It is essential that file extensions are not used for Note: It is essential that file extensions are not used for
determining the media type for octets fetched over HTTP because, determining the media type for octets fetched over HTTP because,
in some cases, file extensions can be supplied by malicious in some cases, file extensions can be supplied by malicious
parties. For example, most PHP installations let the attacker parties. For example, most PHP installations let the attacker
append arbitrary path information to URLs (e.g., append arbitrary path information to URLs (e.g.,
http://example.com/foo.php/bar.html) and thereby determine the http://example.com/foo.php/bar.html) and thereby determine the
file extension. file extension.
For octets fetched over some other protocols, e.g. FTP, there is no For octets fetched over some other protocols, e.g. FTP [RFC0959],
type information. there is no type information.
Note: Comparisons between media types, as defined by MIME Note: Comparisons between media types, as defined by MIME
specifications, are done in an ASCII case-insensitive manner. specifications, are done in an ASCII case-insensitive manner.
[RFC2046] [RFC2046]
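As a rough illustration of an ASCII case-insensitive comparison, the sketch below folds only the letters A-Z before comparing, so non-ASCII characters are never treated as equal to ASCII letters. The helper names ascii_lower and media_types_equal are hypothetical.

```python
# Map only U+0041..U+005A (A-Z) to their lowercase forms.
_ASCII_LOWER = {c: c + 32 for c in range(ord("A"), ord("Z") + 1)}


def ascii_lower(s: str) -> str:
    """Lowercase ASCII letters only; leave every other character alone."""
    return s.translate(_ASCII_LOWER)


def media_types_equal(a: str, b: str) -> bool:
    """Compare two media type strings ASCII case-insensitively."""
    return ascii_lower(a) == ascii_lower(b)
```

A general-purpose lowercase routine would also fold characters such as U+212A KELVIN SIGN to "k", which is why the sketch avoids one.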
3. Web Pages 4. Web Pages
The user agent MUST use the following algorithm to determine the The user agent MUST use the following algorithm to determine the
sniffed-type of a sequence of octets: sniffed-type of a sequence of octets:
1. If the user agent is configured to strictly obey the official- 1. If the user agent is configured to strictly obey the official-
type, then let the sniffed-type be the official-type and abort type, then let the sniffed-type be the official-type and abort
these steps. these steps.
2. If the octets were fetched via HTTP and there is an HTTP Content- 2. If the octets were fetched via HTTP and there is an HTTP Content-
Type header field and the value of the last such header field has Type header field and the value of the last such header field has
skipping to change at page 8, line 5 skipping to change at page 9, line 5
6. If the official-type is an image type supported by the user agent 6. If the official-type is an image type supported by the user agent
(e.g., "image/png", "image/gif", "image/jpeg", etc), then jump to (e.g., "image/png", "image/gif", "image/jpeg", etc), then jump to
the "images" section below. the "images" section below.
7. If the official-type is "text/html", then jump to the "feed or 7. If the official-type is "text/html", then jump to the "feed or
HTML" section below. HTML" section below.
8. Let the sniffed-type be the official type. 8. Let the sniffed-type be the official type.
4. Text or Binary 5. Text or Binary
This section defines the *rules for distinguishing if a resource is This section defines the *rules for distinguishing if a resource is
text or binary*. text or binary*.
1. The user agent MAY wait for 512 or more octets be to arrive. 1. The user agent MAY wait for 512 or more octets to arrive.
Note: Waiting for 512 octets to arrive causes the text- Note: Waiting for 512 octets to arrive causes the text-
or-binary algorithm to be deterministic for a given sequence or-binary algorithm to be deterministic for a given sequence
of octets. However, in some cases, the user agent might need of octets. However, in some cases, the user agent might need
to wait an arbitrary length of time for these octets to to wait an arbitrary length of time for these octets to
arrive. User agents SHOULD wait for 512 octets to arrive, arrive. User agents SHOULD wait for 512 octets to arrive,
when feasible. when feasible.
2. Let n be the smaller of either 512 or the number of octets that 2. Let n be the smaller of either 512 or the number of octets that
have already arrived. have already arrived.
skipping to change at page 10, line 5 skipping to change at page 11, line 5
given in the corresponding cell in the "sniffed type" column on given in the corresponding cell in the "sniffed type" column on
that row and abort these steps. that row and abort these steps.
WARNING! It is critical that this step not ever return a WARNING! It is critical that this step not ever return a
scriptable type (e.g., text/html), because otherwise that scriptable type (e.g., text/html), because otherwise that
would allow a privilege escalation attack. would allow a privilege escalation attack.
6. Otherwise, let the sniffed-type be "application/octet-stream" and 6. Otherwise, let the sniffed-type be "application/octet-stream" and
abort these steps. abort these steps.
5. Unknown Type 6. Unknown Type
1. The user agent MAY wait for 512 or more octets to arrive for the 1. The user agent MAY wait for 512 or more octets to arrive for the
same reason as in the "text or binary" section above. same reason as in the "text or binary" section above.
2. Let n be the smaller of either 512 or the number of octets that 2. Let n be the smaller of either 512 or the number of octets that
have already arrived. have already arrived.
3. For each row in the table below: 3. For each row in the table below:
* If the row has no "WS" octets: * If the row has no "WS" octets:
skipping to change at page 11, line 32 skipping to change at page 12, line 32
Otherwise, increment only the index-pattern to the Otherwise, increment only the index-pattern to the
next octet in the mask and pattern. next octet in the mask and pattern.
- Otherwise, if the index-pattern-th octet of the pattern - Otherwise, if the index-pattern-th octet of the pattern
is a "_>" octet: is a "_>" octet:
"_>" means "space-or-bracket", and allows HTML tag "_>" means "space-or-bracket", and allows HTML tag
names to terminate with either a space or a greater names to terminate with either a space or a greater
than sign. than sign.
If index-stream-th octet of the stream different If index-stream-th octet of the stream is different
than 0x20 (ASCII space) or 0x3E (ASCII ">"), then than 0x20 (ASCII space) or 0x3E (ASCII ">"), then
skip this row. skip this row.
Otherwise, increment index-pattern to the next octet Otherwise, increment index-pattern to the next octet
in the mask and pattern and index-stream to the next in the mask and pattern and index-stream to the next
octet in the octet stream. octet in the octet stream.
5. If index-pattern does not point beyond the end of the mask 5. If index-pattern does not point beyond the end of the mask
and pattern octet strings, then jump back to the LOOP step and pattern octet strings, then jump back to the LOOP step
in this algorithm. in this algorithm.
6. Otherwise, let the sniffed-type be the type given in the 6. Otherwise, let the sniffed-type be the type given in the
cell of the third column in that row and abort these cell of the third column in that row and abort these
steps. steps.
4. If the first n octets match the signature for H264 (as defined in 4. If the first n octets match the signature for MP4 (as defined in
Section 5.1), then let the sniffed-type be video/H264 and abort Section 6.1), then let the sniffed-type be video/mp4 and abort
these steps. these steps.
5. If none of the first n octets are binary data (as defined in the 5. If none of the first n octets are binary data (as defined in the
"text or binary" section), then let the sniffed-type be "text/ "text or binary" section), then let the sniffed-type be "text/
plain" and abort these steps. plain" and abort these steps.
6. Otherwise, let the sniffed-type be "application/octet-stream" and 6. Otherwise, let the sniffed-type be "application/octet-stream" and
abort these steps. abort these steps.
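The final fallback in the steps above (choosing n, then deciding between "text/plain" and "application/octet-stream") might be sketched as follows. The definition of binary data lives in the "text or binary" section, so the sketch takes it as a caller-supplied predicate; the names fallback_type and is_binary_octet are hypothetical, and the table lookup and the video signature check are not modeled.

```python
from typing import Callable


def fallback_type(octets: bytes,
                  is_binary_octet: Callable[[int], bool]) -> str:
    """Apply the last two steps once no signature has matched."""
    # n is the smaller of 512 and the number of octets that have arrived.
    n = min(512, len(octets))

    # If any of the first n octets is binary data, the content is
    # treated as opaque; otherwise it is treated as plain text.
    if any(is_binary_octet(b) for b in octets[:n]):
        return "application/octet-stream"
    return "text/plain"
```

Octets beyond the first 512 never influence the result, which keeps the outcome deterministic once 512 octets have arrived.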
The table used by the above algorithm is: The table used by the above algorithm is:
skipping to change at page 15, line 15 skipping to change at page 16, line 15
e.g., a server uses the above table to determine that content is not e.g., a server uses the above table to determine that content is not
HTML and thus safe from cross-site scripting attacks, but then a user HTML and thus safe from cross-site scripting attacks, but then a user
agent detects it as HTML anyway and allows script to execute). In agent detects it as HTML anyway and allows script to execute). In
extending this table, user agents SHOULD NOT introduce any privilege extending this table, user agents SHOULD NOT introduce any privilege
escalation vulnerabilities. escalation vulnerabilities.
Note: The column marked "security" is used by the algorithm in the Note: The column marked "security" is used by the algorithm in the
"text or binary" section, to avoid sniffing text/plain content as a "text or binary" section, to avoid sniffing text/plain content as a
type that can be used for a privilege escalation attack. type that can be used for a privilege escalation attack.
5.1. Signature for H.264 6.1. Signature for MP4
This section defines whether a sequence of n octets *matches the This section defines whether a sequence of n octets *matches the
signature for H.264*. signature for MP4*.
If n is less than 4, then the sequence does not match the signature If n is less than 4, then the sequence does not match the signature
for H264 and abort these steps. for MP4 and abort these steps.
Let box-size be the value of the first four octets, interpreted as a Let box-size be the value of the first four octets, interpreted as a
32 bit unsigned, big-endian integer. 32 bit unsigned, big-endian integer.
If n is less than box-size or if box-size is not evenly divisible by If n is less than box-size or if box-size is not evenly divisible by
4, then the sequence does not match the signature for H264 and abort 4, then the sequence does not match the signature for MP4 and abort
these steps. these steps.
If octets 5 through 8 (inclusive) of the sequence are not 0x66 0x74 If octets 5 through 8 (inclusive) of the sequence are not 0x66 0x74
0x79 0x70 (the ASCII string "ftyp"), then the sequence does not match 0x79 0x70 (the ASCII string "ftyp"), then the sequence does not match
the signature for H264 and abort these steps. the signature for MP4 and abort these steps.
For each i from 2 to box-size/4 - 1 (inclusive): For each i from 2 to box-size/4 - 1 (inclusive):
1. If i is equal to 3, continue to the next i, if any. (These 1. If i is equal to 3, continue to the next i, if any. (These
octets correspond to the minor version number.) octets correspond to the minor version number.)
2. If octets 4*i through 4*i + 2 (inclusive) of the sequence are 2. If octets 4*i through 4*i + 2 (inclusive) of the sequence are
0x6D 0x70 0x34 (the ASCII string "mp4"), then the sequence *does* 0x6D 0x70 0x34 (the ASCII string "mp4"), then the sequence *does*
match the signature for H264 and abort these steps. match the signature for MP4 and abort these steps.
The sequence does not match the signature for H264. The sequence does not match the signature for MP4.
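The signature check above can be written compactly. This is a sketch under stated assumptions: offsets are zero-based, box-size is read with big-endian byte order (the order ISO base media files use for box sizes), and each brand is compared on the three octets starting at offset 4*i. The function name matches_mp4_signature is hypothetical.

```python
import struct


def matches_mp4_signature(data: bytes) -> bool:
    """Return True if the octets look like an MP4 'ftyp' box."""
    n = len(data)
    if n < 4:
        return False

    # Assumption: box-size is a 32-bit unsigned big-endian integer.
    (box_size,) = struct.unpack(">I", data[:4])
    if n < box_size or box_size % 4 != 0:
        return False

    # The four octets after the size must spell "ftyp".
    if data[4:8] != b"ftyp":
        return False

    # Word 2 is the major brand, word 3 the minor version (skipped),
    # and words 4 onward are compatible brands.
    for i in range(2, box_size // 4):
        if i == 3:
            continue
        if data[4 * i:4 * i + 3] == b"mp4":
            return True
    return False
```

For example, a 24-octet box with major brand "mp42" matches, while the same layout with brands "isom" and "avc1" does not.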
6. Image 7. Image
This section defines the *rules for sniffing images specifically*. This section defines the *rules for sniffing images specifically*.
If the official-type is "image/svg+xml", then let the sniffed-type be If the official-type is "image/svg+xml", then let the sniffed-type be
the official-type (an XML type) and abort these steps. the official-type (an XML type) and abort these steps.
If the first octets match one of the signatures in Section 5 for one If the first octets match one of the signatures in Section 6 for one
of the following media types, then let the sniffed-type be the of the following media types, then let the sniffed-type be the
corresponding media type and abort these steps: corresponding media type and abort these steps:
o image/gif o image/gif
o image/png o image/png
o image/jpeg o image/jpeg
o image/bmp o image/bmp
o image/vnd.microsoft.icon o image/vnd.microsoft.icon
o image/webp o image/webp
Otherwise, let the sniffed-type be the official-type and abort these Otherwise, let the sniffed-type be the official-type and abort these
steps. steps.
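The image rules can be sketched in a few lines. The signature list here is an illustrative subset built from widely known magic numbers (PNG, GIF, JPEG); it is an assumption for the example and is not copied from the signature table this section refers to. The names ILLUSTRATIVE_SIGNATURES and sniff_image are hypothetical.

```python
# Assumption: a small illustrative subset of prefix signatures.
ILLUSTRATIVE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_image(official_type: str, octets: bytes) -> str:
    """Return the sniffed-type for a resource labeled as an image."""
    # SVG is an XML type and is never re-sniffed.
    if official_type.lower() == "image/svg+xml":
        return official_type

    # A matching signature overrides the label.
    for magic, media_type in ILLUSTRATIVE_SIGNATURES:
        if octets.startswith(magic):
            return media_type

    # Otherwise the official-type stands.
    return official_type
```

A GIF served as "image/png" is therefore reported as "image/gif", while unrecognized octets keep the label the server supplied.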
7. Video 8. Video
This section defines the *rules for sniffing videos specifically*. This section defines the *rules for sniffing videos specifically*.
If the first octets match one of the signatures in Section 5 for one If the first octets match one of the signatures in Section 6 for one
of the following media types, then let the sniffed-type be the of the following media types, then let the sniffed-type be the
corresponding media type and abort these steps: corresponding media type and abort these steps:
o video/H264 o video/mp4
o video/webm o video/webm
o application/ogg
Otherwise, let the sniffed-type be the official-type and abort these Otherwise, let the sniffed-type be the official-type and abort these
steps. steps.
8. Fonts 9. Fonts
This section defines the *rules for sniffing fonts specifically*. This section defines the *rules for sniffing fonts specifically*.
TODO TODO
Otherwise, let the sniffed-type be the official-type and abort these Otherwise, let the sniffed-type be the official-type and abort these
steps. steps.
9. Feed or HTML 10. Feed or HTML
1. The user agent MAY wait for 512 or more octets to arrive for the 1. The user agent MAY wait for 512 or more octets to arrive for the
same reason as in the "text or binary" section above. same reason as in the "text or binary" section above.
2. Let s be the stream of octets, and let s[i] represent the octet 2. Let s be the stream of octets, and let s[i] represent the octet
in s with position i, treating s as zero-indexed (so the first in s with position i, treating s as zero-indexed (so the first
octet is at i=0). octet is at i=0).
3. If at any point this algorithm requires the user agent to 3. If at any point this algorithm requires the user agent to
determine the value of an octet in s which has not yet arrived, determine the value of an octet in s which has not yet arrived,
skipping to change at page 22, line 5 skipping to change at page 23, line 5
continue to step 19 of this algorithm. continue to step 19 of this algorithm.
18. Jump back to step 13 of this algorithm. 18. Jump back to step 13 of this algorithm.
19. Let the sniffed-type be "text/html" and abort these steps. 19. Let the sniffed-type be "text/html" and abort these steps.
For efficiency reasons, implementations might wish to implement this For efficiency reasons, implementations might wish to implement this
algorithm and the algorithm for detecting the character encoding of algorithm and the algorithm for detecting the character encoding of
HTML documents in parallel. HTML documents in parallel.
10. References 11. References
11.1. Normative References
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
November 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
11.2. Informative References
[RFC0959] Postel, J. and J. Reynolds, "File Transfer Protocol",
STD 9, RFC 959, October 1985.
[BarthCaballeroSong2009] [BarthCaballeroSong2009]
Barth, A., Caballero, J., and D. Song, "Secure Content Barth, A., Caballero, J., and D. Song, "Secure Content
Sniffing for Web Browsers, or How to Stop Papers from Sniffing for Web Browsers, or How to Stop Papers from
Reviewing Themselves", 2009, <http://www.adambarth.com/ Reviewing Themselves", 2009, <http://www.adambarth.com/
papers/2009/barth-caballero-song.pdf>. papers/2009/barth-caballero-song.pdf>.
Authors' Addresses Authors' Addresses
Adam Barth Adam Barth