< draft-reschke-rfc5987bis-00.txt   draft-reschke-rfc5987bis-01.txt >
Network Working Group J. Reschke Network Working Group J. Reschke
Internet-Draft greenbytes Internet-Draft greenbytes
Obsoletes: 5987 (if approved) April 15, 2011 Obsoletes: 5987 (if approved) September 8, 2011
Intended status: Standards Track Intended status: Standards Track
Expires: October 17, 2011 Expires: March 11, 2012
Character Set and Language Encoding for Indicating Character Encoding and Language for HTTP Header Field
Hypertext Transfer Protocol (HTTP) Header Field Parameters Parameters
draft-reschke-rfc5987bis-00 draft-reschke-rfc5987bis-01
Abstract Abstract
By default, message header field parameters in Hypertext Transfer By default, message header field parameters in Hypertext Transfer
Protocol (HTTP) messages cannot carry characters outside the ISO- Protocol (HTTP) messages cannot carry characters outside the ISO-
8859-1 character set. RFC 2231 defines an encoding mechanism for use 8859-1 character set. RFC 2231 defines an encoding mechanism for use
in Multipurpose Internet Mail Extensions (MIME) headers. This in Multipurpose Internet Mail Extensions (MIME) headers. This
document specifies an encoding suitable for use in HTTP header fields document specifies an encoding suitable for use in HTTP header fields
that is compatible with a profile of the encoding defined in RFC that is compatible with a profile of the encoding defined in RFC
2231. 2231.
skipping to change at page 2, line 10 skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 17, 2011. This Internet-Draft will expire on March 11, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 18 skipping to change at page 3, line 18
2. Notational Conventions . . . . . . . . . . . . . . . . . . . . 4 2. Notational Conventions . . . . . . . . . . . . . . . . . . . . 4
3. Comparison to RFC 2231 and Definition of the Encoding . . . . 4 3. Comparison to RFC 2231 and Definition of the Encoding . . . . 4
3.1. Parameter Continuations . . . . . . . . . . . . . . . . . 5 3.1. Parameter Continuations . . . . . . . . . . . . . . . . . 5
3.2. Parameter Value Character Set and Language Information . . 5 3.2. Parameter Value Character Set and Language Information . . 5
3.2.1. Definition . . . . . . . . . . . . . . . . . . . . . . 5 3.2.1. Definition . . . . . . . . . . . . . . . . . . . . . . 5
3.2.2. Examples . . . . . . . . . . . . . . . . . . . . . . . 7 3.2.2. Examples . . . . . . . . . . . . . . . . . . . . . . . 7
3.3. Language Specification in Encoded Words . . . . . . . . . 8 3.3. Language Specification in Encoded Words . . . . . . . . . 8
4. Guidelines for Usage in HTTP Header Field Definitions . . . . 8 4. Guidelines for Usage in HTTP Header Field Definitions . . . . 8
4.1. When to Use the Extension . . . . . . . . . . . . . . . . 9 4.1. When to Use the Extension . . . . . . . . . . . . . . . . 9
4.2. Error Handling . . . . . . . . . . . . . . . . . . . . . . 9 4.2. Error Handling . . . . . . . . . . . . . . . . . . . . . . 9
5. Security Considerations . . . . . . . . . . . . . . . . . . . 9 5. Security Considerations . . . . . . . . . . . . . . . . . . . 10
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7.1. Normative References . . . . . . . . . . . . . . . . . . . 10 7.1. Normative References . . . . . . . . . . . . . . . . . . . 10
7.2. Informative References . . . . . . . . . . . . . . . . . . 11 7.2. Informative References . . . . . . . . . . . . . . . . . . 11
Appendix A. Changes from RFC 5987 . . . . . . . . . . . . . . . . 11 Appendix A. Changes from RFC 5987 . . . . . . . . . . . . . . . . 12
Appendix B. Change Log (to be removed by RFC Editor before Appendix B. Change Log (to be removed by RFC Editor before
publication) . . . . . . . . . . . . . . . . . . . . 12 publication) . . . . . . . . . . . . . . . . . . . . 12
B.1. Since RFC5987 . . . . . . . . . . . . . . . . . . . . . . 12 B.1. Since RFC5987 . . . . . . . . . . . . . . . . . . . . . . 12
B.2. Since draft-reschke-rfc5987bis-00 . . . . . . . . . . . . 12
Appendix C. Resolved issues (to be removed by RFC Editor Appendix C. Resolved issues (to be removed by RFC Editor
before publication) . . . . . . . . . . . . . . . . . 12 before publication) . . . . . . . . . . . . . . . . . 12
C.1. obs5987 . . . . . . . . . . . . . . . . . . . . . . . . . 12 C.1. iso-8859-1 . . . . . . . . . . . . . . . . . . . . . . . . 12
C.2. title . . . . . . . . . . . . . . . . . . . . . . . . . . 13
C.3. historic5987 . . . . . . . . . . . . . . . . . . . . . . . 13
Appendix D. Open issues (to be removed by RFC Editor prior to Appendix D. Open issues (to be removed by RFC Editor prior to
publication) . . . . . . . . . . . . . . . . . . . . 12 publication) . . . . . . . . . . . . . . . . . . . . 13
D.1. edit . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 D.1. edit . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
D.2. impls . . . . . . . . . . . . . . . . . . . . . . . . . . 12 D.2. impls . . . . . . . . . . . . . . . . . . . . . . . . . . 13
D.3. iso-8859-1 . . . . . . . . . . . . . . . . . . . . . . . . 12
1. Introduction 1. Introduction
By default, message header field parameters in HTTP ([RFC2616]) By default, message header field parameters in HTTP ([RFC2616])
messages cannot carry characters outside the ISO-8859-1 character set messages cannot carry characters outside the ISO-8859-1 character set
([ISO-8859-1]). RFC 2231 ([RFC2231]) defines an encoding mechanism ([ISO-8859-1]). RFC 2231 ([RFC2231]) defines an encoding mechanism
for use in MIME headers. This document specifies an encoding for use in MIME headers. This document specifies an encoding
suitable for use in HTTP header fields that is compatible with a suitable for use in HTTP header fields that is compatible with a
profile of the encoding defined in RFC 2231. profile of the encoding defined in RFC 2231.
This document obsoletes [RFC5987]; the changes are summarized in This document obsoletes [RFC5987] and moves it to "historic" status;
Appendix A. the changes are summarized in Appendix A.
Note: in the remainder of this document, RFC 2231 is only Note: in the remainder of this document, RFC 2231 is only
referenced for the purpose of explaining the choice of features referenced for the purpose of explaining the choice of features
that were adopted; they are therefore purely informative. that were adopted; they are therefore purely informative.
Note: this encoding does not apply to message payloads transmitted Note: this encoding does not apply to message payloads transmitted
over HTTP, such as when using the media type "multipart/form-data" over HTTP, such as when using the media type "multipart/form-data"
([RFC2388]). ([RFC2388]).
2. Notational Conventions 2. Notational Conventions
skipping to change at page 5, line 26 skipping to change at page 5, line 26
3.2. Parameter Value Character Set and Language Information 3.2. Parameter Value Character Set and Language Information
Section 4 of [RFC2231] specifies how to embed language information Section 4 of [RFC2231] specifies how to embed language information
into parameter values, and also how to encode non-ASCII characters, into parameter values, and also how to encode non-ASCII characters,
dealing with restrictions both in MIME and HTTP header parameters. dealing with restrictions both in MIME and HTTP header parameters.
However, RFC 2231 does not specify a mandatory-to-implement character However, RFC 2231 does not specify a mandatory-to-implement character
set, making it hard for senders to decide which character set to use. set, making it hard for senders to decide which character set to use.
Thus, recipients implementing this specification MUST support the Thus, recipients implementing this specification MUST support the
character sets "ISO-8859-1" [ISO-8859-1] and "UTF-8" [RFC3629]. "UTF-8" character set [RFC3629].
Furthermore, RFC 2231 allows the character set information to be left Furthermore, RFC 2231 allows the character set information to be left
out. The encoding defined by this specification does not allow that. out. The encoding defined by this specification does not allow that.
3.2.1. Definition 3.2.1. Definition
The syntax for parameters is defined in Section 3.6 of [RFC2616] The syntax for parameters is defined in Section 3.6 of [RFC2616]
(with RFC 2616 implied LWS translated to RFC 5234 LWSP): (with RFC 2616 implied LWS translated to RFC 5234 LWSP):
parameter = attribute LWSP "=" LWSP value parameter = attribute LWSP "=" LWSP value
skipping to change at page 6, line 17 skipping to change at page 6, line 17
reg-parameter = parmname LWSP "=" LWSP value reg-parameter = parmname LWSP "=" LWSP value
ext-parameter = parmname "*" LWSP "=" LWSP ext-value ext-parameter = parmname "*" LWSP "=" LWSP ext-value
parmname = 1*attr-char parmname = 1*attr-char
ext-value = charset "'" [ language ] "'" value-chars ext-value = charset "'" [ language ] "'" value-chars
; like RFC 2231's <extended-initial-value> ; like RFC 2231's <extended-initial-value>
; (see [RFC2231], Section 7) ; (see [RFC2231], Section 7)
charset = "UTF-8" / "ISO-8859-1" / mime-charset charset = "UTF-8" / mime-charset
mime-charset = 1*mime-charsetc mime-charset = 1*mime-charsetc
mime-charsetc = ALPHA / DIGIT mime-charsetc = ALPHA / DIGIT
/ "!" / "#" / "$" / "%" / "&" / "!" / "#" / "$" / "%" / "&"
/ "+" / "-" / "^" / "_" / "`" / "+" / "-" / "^" / "_" / "`"
/ "{" / "}" / "~" / "{" / "}" / "~"
; as <mime-charset> in Section 2.3 of [RFC2978] ; as <mime-charset> in Section 2.3 of [RFC2978]
; except that the single quote is not included ; except that the single quote is not included
; SHOULD be registered in the IANA charset registry ; SHOULD be registered in the IANA charset registry
skipping to change at page 7, line 12 skipping to change at page 7, line 12
single quote characters. Note that both character set names and single quote characters. Note that both character set names and
language tags are restricted to the US-ASCII character set, and are language tags are restricted to the US-ASCII character set, and are
matched case-insensitively (see [RFC2978], Section 2.3 and [RFC5646], matched case-insensitively (see [RFC2978], Section 2.3 and [RFC5646],
Section 2.1.1). Section 2.1.1).
Inside the value part, characters not contained in attr-char are Inside the value part, characters not contained in attr-char are
encoded into an octet sequence using the specified character set. encoded into an octet sequence using the specified character set.
That octet sequence is then percent-encoded as specified in Section That octet sequence is then percent-encoded as specified in Section
2.1 of [RFC3986]. 2.1 of [RFC3986].
Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1" Producers MUST use the "UTF-8" ([RFC3629]) character set. Extension
([ISO-8859-1]) character set. Extension character sets (mime- character sets (mime-charset) are reserved for future use.
charset) are reserved for future use.
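The producer rule above (characters outside attr-char are encoded to octets with the declared character set, then percent-encoded) can be sketched in Python as follows. This is a minimal illustration, not the draft's reference code. It assumes the attr-char set from RFC 5987 (ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"), because that production is not shown in this excerpt. Following the -01 text, it always emits "UTF-8". The helper names encode_ext_value and ext_parameter are hypothetical.

```python
import string

# Assumed attr-char set, taken from RFC 5987 (not shown in this excerpt):
# ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
ATTR_CHAR = frozenset(string.ascii_letters + string.digits + "!#$&+-.^_`|~")


def encode_ext_value(text: str, language: str = "") -> str:
    """Hypothetical helper: build charset "'" [ language ] "'" value-chars (UTF-8 only)."""
    out = []
    for ch in text:
        if ch in ATTR_CHAR:
            # attr-char characters are emitted unchanged.
            out.append(ch)
        else:
            # Encode the character to UTF-8 octets, then percent-encode each octet.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return f"UTF-8'{language}'" + "".join(out)


def ext_parameter(name: str, text: str, language: str = "") -> str:
    """Hypothetical helper: format parmname "*" "=" ext-value."""
    return f"{name}*={encode_ext_value(text, language)}"


print(ext_parameter("title", "£ and € rates"))
# prints: title*=UTF-8''%C2%A3%20and%20%E2%82%AC%20rates
```

The output matches the pound/euro example in Section 3.2.2 except that it uses uppercase hex digits. RFC 3986 treats the hex digits in percent escapes as case-insensitive, so both spellings are equivalent.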
Note: recipients should be prepared to handle encoding errors, Note: recipients should be prepared to handle encoding errors,
such as malformed or incomplete percent escape sequences, or non- such as malformed or incomplete percent escape sequences, or non-
decodable octet sequences, in a robust manner. This specification decodable octet sequences, in a robust manner. This specification
does not mandate any specific behavior, for instance, the does not mandate any specific behavior, for instance, the
following strategies are all acceptable: following strategies are all acceptable:
* ignoring the parameter, * ignoring the parameter,
* stripping a non-decodable octet sequence, * stripping a non-decodable octet sequence,
skipping to change at page 7, line 42 skipping to change at page 7, line 41
Section 5.1 of [RFC2045]) in that curly braces ("{" and "}") are Section 5.1 of [RFC2045]) in that curly braces ("{" and "}") are
excluded. Thus, these two characters are excluded from the attr- excluded. Thus, these two characters are excluded from the attr-
char production as well. char production as well.
Note: the <mime-charset> ABNF defined here differs from the one in Note: the <mime-charset> ABNF defined here differs from the one in
Section 2.3 of [RFC2978] in that it does not allow the single Section 2.3 of [RFC2978] in that it does not allow the single
quote character (see also RFC Errata ID 1912 [Err1912]). In quote character (see also RFC Errata ID 1912 [Err1912]). In
practice, no character set names using that character have been practice, no character set names using that character have been
registered at the time of this writing. registered at the time of this writing.
Note: [RFC5987] did require support for ISO-8859-1, too; for
compatibility with legacy code, recipients are encouraged to
support this encoding as well.
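On the recipient side, the rules above combine several checks. The charset is mandatory, charset names match case-insensitively, UTF-8 must be supported, and ISO-8859-1 may be accepted for legacy RFC 5987 producers. A recipient also has to cope with malformed escapes and non-decodable octets. The Python sketch below illustrates one possible reading of these rules. decode_ext_value is a hypothetical name, and the attr-char set is again assumed from RFC 5987. Raising ValueError lets the caller apply the "ignoring the parameter" strategy; the other listed strategies would be equally acceptable.

```python
import string

# Assumed attr-char set, taken from RFC 5987 (not shown in this excerpt).
ATTR_CHAR = frozenset(string.ascii_letters + string.digits + "!#$&+-.^_`|~")
HEXDIGITS = frozenset(string.hexdigits)

# UTF-8 is required by the -01 text; ISO-8859-1 is accepted here only for
# compatibility with RFC 5987 producers (an assumption of this sketch).
SUPPORTED = {"utf-8": "utf-8", "iso-8859-1": "iso-8859-1"}


def decode_ext_value(ext_value: str) -> tuple[str, str, str]:
    """Return (charset, language, text); raise ValueError on malformed input."""
    parts = ext_value.split("'", 2)
    if len(parts) != 3 or not parts[0]:
        # The charset may not be left out and must be followed by two single quotes.
        raise ValueError("missing charset or single-quote delimiters")
    charset, language, value_chars = parts
    codec = SUPPORTED.get(charset.lower())  # charset names match case-insensitively
    if codec is None:
        raise ValueError(f"unsupported charset: {charset}")
    octets = bytearray()
    i = 0
    while i < len(value_chars):
        ch = value_chars[i]
        if ch == "%":
            hex_pair = value_chars[i + 1:i + 3]
            if len(hex_pair) != 2 or not set(hex_pair) <= HEXDIGITS:
                raise ValueError("malformed or incomplete percent escape")
            octets.append(int(hex_pair, 16))
            i += 3
        elif ch in ATTR_CHAR:
            octets.append(ord(ch))
            i += 1
        else:
            raise ValueError(f"character not allowed in value-chars: {ch!r}")
    # Strict decoding turns non-decodable octet sequences into
    # UnicodeDecodeError, which is a subclass of ValueError.
    return charset, language, octets.decode(codec)
```

The language tag is returned exactly as sent. A caller comparing tags would still need to match them case-insensitively, as the text above requires.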
3.2.2. Examples 3.2.2. Examples
Non-extended notation, using "token": Non-extended notation, using "token":
foo: bar; title=Economy foo: bar; title=Economy
Non-extended notation, using "quoted-string": Non-extended notation, using "quoted-string":
foo: bar; title="US-$ rates" foo: bar; title="US-$ rates"
Extended notation, using the Unicode character U+00A3 (POUND SIGN): Extended notation, using the Unicode character U+00A3 (POUND SIGN):
foo: bar; title*=iso-8859-1'en'%A3%20rates foo: bar; title*=utf-8'en'%C2%A3%20rates
Note: the Unicode pound sign character U+00A3 was encoded into the Note: the Unicode pound sign character U+00A3 was encoded into the
single octet A3 using the ISO-8859-1 character encoding, then octet sequence C2 A3 using the UTF-8 character encoding, then
percent-encoded. Also, note that the space character was encoded as percent-encoded. Also, note that the space character was encoded as
%20, as it is not contained in attr-char. %20, as it is not contained in attr-char.
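The change between the two versions of this example is that the -00 text encodes the pound sign with ISO-8859-1, while the -01 text uses UTF-8. Python's standard urllib.parse.quote can reproduce both encodings. Its always-safe set (letters, digits, "_.-~") plus the extra characters passed in safe equals attr-char, assuming the RFC 5987 definition of that production. title_ext is a hypothetical helper for this illustration only.

```python
from urllib.parse import quote

# Added to quote()'s always-safe set (letters, digits, "_.-~"), these make the
# unescaped set equal to the assumed RFC 5987 attr-char production.
ATTR_CHAR_EXTRA = "!#$&+^`|"


def title_ext(text: str, charset: str, language: str = "en") -> str:
    """Hypothetical helper: format a title* parameter in the given charset."""
    return f"title*={charset}'{language}'" + quote(text, safe=ATTR_CHAR_EXTRA, encoding=charset)


print(title_ext("£ rates", "iso-8859-1"))  # title*=iso-8859-1'en'%A3%20rates
print(title_ext("£ rates", "utf-8"))       # title*=utf-8'en'%C2%A3%20rates
```

Both lines match the corresponding examples in the two columns above. Note that quote() encodes a space as %20, not "+", which is what value-chars requires.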
Extended notation, using the Unicode characters U+00A3 (POUND SIGN) Extended notation, using the Unicode characters U+00A3 (POUND SIGN)
and U+20AC (EURO SIGN): and U+20AC (EURO SIGN):
foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates
Note: the Unicode pound sign character U+00A3 was encoded into the Note: the Unicode pound sign character U+00A3 was encoded into the
octet sequence C2 A3 using the UTF-8 character encoding, then octet sequence C2 A3 using the UTF-8 character encoding, then
skipping to change at page 10, line 23 skipping to change at page 10, line 33
parameter, and such use might allow spoofing attacks, where different parameter, and such use might allow spoofing attacks, where different
language versions of the same parameter are not equivalent. Whether language versions of the same parameter are not equivalent. Whether
this attack is useful as an attack depends on the parameter this attack is useful as an attack depends on the parameter
specified. specified.
6. Acknowledgements 6. Acknowledgements
Thanks to Martin Duerst and Frank Ellermann for help figuring out Thanks to Martin Duerst and Frank Ellermann for help figuring out
ABNF details, to Graham Klyne and Alexey Melnikov for general review, ABNF details, to Graham Klyne and Alexey Melnikov for general review,
to Chris Newman for pointing out an RFC 2231 incompatibility, and to to Chris Newman for pointing out an RFC 2231 incompatibility, and to
Benjamin Carlyle and Roar Lauritzsen for implementer's feedback. Benjamin Carlyle, Roar Lauritzsen, and Eric Lawrence for
implementer's feedback.
7. References 7. References
7.1. Normative References 7.1. Normative References
[ISO-8859-1] International Organization for Standardization,
"Information technology -- 8-bit single-byte coded
graphic character sets -- Part 1: Latin alphabet No.
1", ISO/IEC 8859-1:1998, 1998.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC2978] Freed, N. and J. Postel, "IANA Charset Registration [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration
Procedures", BCP 19, RFC 2978, October 2000. Procedures", BCP 19, RFC 2978, October 2000.
skipping to change at page 11, line 20 skipping to change at page 11, line 26
[USASCII] American National Standards Institute, "Coded Character [USASCII] American National Standards Institute, "Coded Character
Set -- 7-bit American Standard Code for Information Set -- 7-bit American Standard Code for Information
Interchange", ANSI X3.4, 1986. Interchange", ANSI X3.4, 1986.
7.2. Informative References 7.2. Informative References
[Err1912] RFC Errata, "Errata ID 1912, RFC 2978", [Err1912] RFC Errata, "Errata ID 1912, RFC 2978",
<http://www.rfc-editor.org>. <http://www.rfc-editor.org>.
[ISO-8859-1] International Organization for Standardization,
"Information technology -- 8-bit single-byte coded
graphic character sets -- Part 1: Latin alphabet No.
1", ISO/IEC 8859-1:1998, 1998.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet
Mail Extensions (MIME) Part One: Format of Internet Mail Extensions (MIME) Part One: Format of Internet
Message Bodies", RFC 2045, November 1996. Message Bodies", RFC 2045, November 1996.
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
Extensions) Part Three: Message Header Extensions for Extensions) Part Three: Message Header Extensions for
Non-ASCII Text", RFC 2047, November 1996. Non-ASCII Text", RFC 2047, November 1996.
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and
Encoded Word Extensions: Character Sets, Languages, and Encoded Word Extensions: Character Sets, Languages, and
skipping to change at page 12, line 5 skipping to change at page 12, line 15
URIs URIs
[1] <mailto:ietf-http-wg@w3.org> [1] <mailto:ietf-http-wg@w3.org>
[2] <mailto:ietf-http-wg-request@w3.org?subject=subscribe> [2] <mailto:ietf-http-wg-request@w3.org?subject=subscribe>
Appendix A. Changes from RFC 5987 Appendix A. Changes from RFC 5987
This section summarizes the changes compared to [RFC5987]: This section summarizes the changes compared to [RFC5987]:
[[anchor8: None yet.]] o The document title was changed to "Indicating Character Encoding
and Language for HTTP Header Field Parameters".
o The requirement to support the "ISO-8859-1" encoding was removed.
Appendix B. Change Log (to be removed by RFC Editor before publication) Appendix B. Change Log (to be removed by RFC Editor before publication)
B.1. Since RFC5987 B.1. Since RFC5987
Only editorial changes for the purpose of starting the revision Only editorial changes for the purpose of starting the revision
process (obs5987). process (obs5987).
B.2. Since draft-reschke-rfc5987bis-00
Resolved issues "iso-8859-1" and "title" (title simplified). Added
and resolved issue "historic5987".
Appendix C. Resolved issues (to be removed by RFC Editor before Appendix C. Resolved issues (to be removed by RFC Editor before
publication) publication)
Issues that were either rejected or resolved in this version of this Issues that were either rejected or resolved in this version of this
document. document.
C.1. obs5987 C.1. iso-8859-1
Type: change Type: change
julian.reschke@greenbytes.de (2011-04-15): Obsolete RFC 5987, julian.reschke@greenbytes.de (2011-04-15): Remove requirement to
summarize differences. support ISO-8859-1? It doesn't really help, and it is not
implemented in IE9.
Resolution (2011-09-07): Removed requirement; adjusted examples;
explain that RFC 5987 required this so recipients may want to support
it anyway.
C.2. title
Type: edit
duerst@it.aoyama.ac.jp (2011-04-17): Proposed title: "Indicating
Character Encoding and Language for HTTP Header Field Parameters"
Resolution (2011-09-07): Done.
C.3. historic5987
In Section 1:
Type: change
julian.reschke@greenbytes.de (2011-09-08): Point out that RFC 5987
should be moved to "historic".
Resolution (2011-09-08): Done.
Appendix D. Open issues (to be removed by RFC Editor prior to Appendix D. Open issues (to be removed by RFC Editor prior to
publication) publication)
D.1. edit D.1. edit
Type: edit Type: edit
julian.reschke@greenbytes.de (2011-04-15): Umbrella issue for julian.reschke@greenbytes.de (2011-04-15): Umbrella issue for
editorial fixes/enhancements. editorial fixes/enhancements.
D.2. impls D.2. impls
Type: change Type: change
julian.reschke@greenbytes.de (2011-04-15): Add implementation report. julian.reschke@greenbytes.de (2011-04-15): Add implementation report.
D.3. iso-8859-1
Type: change
julian.reschke@greenbytes.de (2011-04-15): Remove requirement to
support ISO-8859-1? It doesn't really help, and it is not
implemented in IE9.
Author's Address Author's Address
Julian F. Reschke Julian F. Reschke
greenbytes GmbH greenbytes GmbH
Hafenweg 16 Hafenweg 16
Muenster, NW 48155 Muenster, NW 48155
Germany Germany
EMail: julian.reschke@greenbytes.de EMail: julian.reschke@greenbytes.de
URI: http://greenbytes.de/tech/webdav/ URI: http://greenbytes.de/tech/webdav/
End of changes. 24 change blocks.
40 lines changed or deleted 71 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/