--- 1/draft-ietf-appsawg-xml-mediatypes-03.txt 2013-11-04 05:15:13.615343194 -0800 +++ 2/draft-ietf-appsawg-xml-mediatypes-04.txt 2013-11-04 05:15:13.671344643 -0800 @@ -1,20 +1,20 @@ Network Working Group H. S. Thompson Internet-Draft University of Edinburgh Obsoletes: 3023 (if approved) C. Lilley Updates: 6839 (if approved) W3C -Intended status: Standards Track October 16, 2013 -Expires: April 19, 2014 +Intended status: Standards Track November 04, 2013 +Expires: May 08, 2014 XML Media Types - draft-ietf-appsawg-xml-mediatypes-03 + draft-ietf-appsawg-xml-mediatypes-04 Abstract This specification standardizes three media types -- application/xml, application/xml-external-parsed-entity, and application/xml-dtd -- for use in exchanging network entities that are related to the Extensible Markup Language (XML) while defining text/xml and text/ xml-external-parsed-entity as aliases for the respective application/ types. This specification also standardizes the '+xml' suffix for naming media types outside of these five types when those media types @@ -28,21 +28,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on April 19, 2014. + This Internet-Draft will expire on May 08, 2014. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. 
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -56,49 +56,49 @@ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Notational Conventions . . . . . . . . . . . . . . . . . . . 3 3. XML Media Types . . . . . . . . . . . . . . . . . . . . . . . 4 3.1. Application/xml Registration . . . . . . . . . . . . . . 5 3.2. Text/xml Registration . . . . . . . . . . . . . . . . . . 7 3.3. Application/xml-external-parsed-entity Registration . . . 7 3.4. Text/xml-external-parsed-entity Registration . . . . . . 8 3.5. Application/xml-dtd Registration . . . . . . . . . . . . 8 3.6. Charset considerations . . . . . . . . . . . . . . . . . 9 - 4. The Byte Order Mark (BOM) and Charset Conversions . . . . . . 10 - 5. Fragment Identifiers . . . . . . . . . . . . . . . . . . . . 11 + 4. The Byte Order Mark (BOM) and Charset Conversions . . . . . . 11 + 5. Fragment Identifiers . . . . . . . . . . . . . . . . . . . . 12 6. The Base URI . . . . . . . . . . . . . . . . . . . . . . . . 12 7. XML Versions . . . . . . . . . . . . . . . . . . . . . . . . 13 - 8. A Naming Convention for XML-Based Media Types . . . . . . . . 13 - 8.1. Referencing . . . . . . . . . . . . . . . . . . . . . . . 14 + 8. A Naming Convention for XML-Based Media Types . . . . . . . . 14 + 8.1. Referencing . . . . . . . . . . . . . . . . . . . . . . . 15 8.2. +xml Structured Syntax Suffix Registration . . . . . . . 15 - 9. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 16 + 9. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 17 9.1. UTF-8 Charset . . . . . . . . . . . . . . . . . . . . . . 17 - 9.2. UTF-16 Charset . . . . . . . . . . . . . . . . . . . . . 17 - 9.3. Omitted Charset and 8-bit MIME entity . . . . . . . . . . 17 + 9.2. UTF-16 Charset . . . . . . . . . . . . . . . . . . . . . 18 + 9.3. 
Omitted Charset and 8-bit MIME entity . . . . . . . . . . 18 9.4. Omitted Charset and 16-bit MIME entity . . . . . . . . . 18 9.5. Omitted Charset, no Internal Encoding Declaration and - UTF-8 Entity . . . . . . . . . . . . . . . . . . . . . . 18 + UTF-8 Entity . . . . . . . . . . . . . . . . . . . . . . 19 9.6. UTF-16BE Charset . . . . . . . . . . . . . . . . . . . . 19 - 9.7. Non-UTF Charset . . . . . . . . . . . . . . . . . . . . . 19 - 9.8. Omitted Charset with Internal Encoding Declaration . . . 19 + 9.7. Non-UTF Charset . . . . . . . . . . . . . . . . . . . . . 20 + 9.8. Omitted Charset with Internal Encoding Declaration . . . 20 9.9. INCONSISTENT EXAMPLE: Conflicting Charset and Internal Encoding Declaration . . . . . . . . . . . . . . . . . . 20 - 9.10. INCONSISTENT EXAMPLE: Conflicting Charset and BOM . . . . 20 + 9.10. INCONSISTENT EXAMPLE: Conflicting Charset and BOM . . . . 21 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 11. Security Considerations . . . . . . . . . . . . . . . . . . . 21 - 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 22 - 12.1. Normative References . . . . . . . . . . . . . . . . . . 22 - 12.2. Informative References . . . . . . . . . . . . . . . . . 24 + 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 + 12.1. Normative References . . . . . . . . . . . . . . . . . . 23 + 12.2. Informative References . . . . . . . . . . . . . . . . . 25 Appendix A. Why Use the '+xml' Suffix for XML-Based MIME Types? 26 Appendix B. Changes from RFC 3023 . . . . . . . . . . . . . . . 26 - Appendix C. Acknowledgements . . . . . . . . . . . . . . . . . . 26 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 + Appendix C. Acknowledgements . . . . . . . . . . . . . . . . . . 27 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 1. 
Introduction The World Wide Web Consortium has issued the Extensible Markup Language (XML) 1.0 [XML] and Extensible Markup Language (XML) 1.1 [XML1.1] specifications. To enable the exchange of XML network entities, this specification standardizes three media types -- application/xml, application/xml-external-parsed-entity, and application/xml-dtd and two aliases -- text/xml and text/xml- external-parsed-entity, as well as a naming convention for @@ -208,21 +208,21 @@ default character sets for the text/xml... types has been resolved by [HTTPbis] changing [RFC2616] by removing the ISO-8859-1 default and not defining any default at all, as well as [RFC6657] updating [RFC2046] to remove the US-ASCII default. See Section 3.6 for the now-unified approach to the charset parameter which results. XML provides a general framework for defining sequences of structured data. It is often appropriate to define new media types that use XML but define a specific application of XML, due to domain-specific display, editing, security considerations or runtime information. - Furthermore, such media types may allow UTF-8 or UTF-16 only and + Furthermore, such media types may allow only UTF-8 and/or UTF-16 and prohibit other character sets. This specification does not prohibit such media types and in fact expects them to proliferate. However, developers of such media types are RECOMMENDED to use this specification as a basis for their registration. See Section 8 for more detailed recommendations on using the '+xml' suffix for registration of such media types. An XML document labeled as application/xml or text/xml, or with a '+xml' media type, might contain namespace declarations, stylesheet- linking processing instructions (PIs), schema information, or other @@ -237,44 +237,44 @@ Type name: application Subtype name: xml Required parameters: none Optional parameters: charset See Section 3.6. 
Encoding considerations: Depending on the charset encoding used, XML - MIME entities may consist of 7bit, 8bit or binary data [RFC6838]. + MIME entities can consist of 7bit, 8bit or binary data [RFC6838]. For 7-bit transports, 7bit data, for example data with charset encoding US-ASCII, does not require content-transfer-encoding, but 8bit or binary data, for example data with charset encoding UTF-8 or UTF-16, MUST be content-transfer-encoded in quoted-printable or base64. For 8-bit clean transport (e.g. 8BITMIME [RFC6152], ESMTP or NNTP [RFC3977]), 7bit or 8bit data, for example data with charset encoding UTF-8 or US-ASCII, does not require content- transfer-encoding, but binary data, for example data with a charset encoding from the UTF-16 family, MUST be content-transfer- encoded in base64. For binary clean transports (e.g. HTTP [RFC2616]), no content-transfer-encoding is necessary (or even possible, in the case of HTTP) for 7bit, 8bit or binary data. Security considerations: See Section 11. Interoperability considerations: XML has proven to be interoperable across both generic and task-specific applications and for import - and export from multiple XML authoring and editing tools. For - maximum interoperability, validating processors are recommended. - Although non-validating processors may be more efficient, they are - not required to handle all features of XML. For further - information, see sub-section 2.9 "Standalone Document Declaration" - and section 5 "Conformance" of [XML] . + and export from multiple XML authoring and editing tools. + Validating processors provide maximum interoperability. Although + non-validating processors may be more efficient, they are not + required to handle all features of XML. For further information, + see sub-section 2.9 "Standalone Document Declaration" and section + 5 "Conformance" of [XML] . Published specification: Extensible Markup Language (XML) 1.0 (Fifth Edition) [XML] or subsequent editions or versions thereof. 
Applications that use this media type: XML is device-, platform-, and vendor-neutral and is supported by a wide range of generic XML tools (editors, parsers, Web agents, ...), generic and task- specific applications. Additional information: @@ -296,21 +296,21 @@ Base URI: See Section 6 Person and email address for further information: See Authors' Addresses section Intended usage: COMMON Author: See Authors' Addresses section Change controller: The XML specification is a work product of the - World Wide Web Consortium's XML Working Group + World Wide Web Consortium's XML Core Working Group 3.2. Text/xml Registration text/xml is an alias for application/xml, as defined in Section 3.1 above. 3.3. Application/xml-external-parsed-entity Registration Type name: application @@ -356,21 +356,21 @@ Base URI: See Section 6 Person and email address for further information: See Authors' Addresses section. Intended usage: COMMON Author: See Authors' Addresses section. Change controller: The XML specification is a work product of the - World Wide Web Consortium's XML Working Group + World Wide Web Consortium's XML Core Working Group 3.4. Text/xml-external-parsed-entity Registration text/xml-external-parsed-entity is an alias for application/xml- external-parsed-entity, as defined in Section 3.3 above. 3.5. Application/xml-dtd Registration Type name: application @@ -407,64 +407,98 @@ Macintosh File Type Code(s): "TEXT" Person and email address for further information: See Authors' Addresses section. Intended usage: COMMON Author: See Authors' Addresses section. Change controller: The XML specification is a work product of the - World Wide Web Consortium's XML Working Group + World Wide Web Consortium's XML Core Working Group 3.6.
Charset considerations - When a charset parameter is specified for an XML MIME entity which - contains in-band encoding information, that is, either a BOM - (Section 4) or an XML encoding declaration or both, the normative - component of the [XML] specification leaves the question open as to - which should be taken to be authoritative in the case of conflict. - In its (non-normative) Appendix F it defers to this specification: + As many as three distinct sources of information about character + encoding may be present for an XML MIME entity: a charset parameter, + a Byte Order Mark (BOM -- see Section 4 below) and an XML encoding + declaration (see Section 4.3.3 of [XML]). Ensuring consistency among + these sources requires coordination between entity authors and MIME + agents (that is, processes which package, transfer, deliver and/or + receive MIME entities). Some MIME agents will be what we will call + "XML-aware", that is, capable of processing XML MIME entities and + detecting the XML encoding declaration (or its absence). Others will + not be XML-aware, and thus cannot know anything about the XML + encoding declaration. Some MIME agents, such as proxies and + transcoders, both consume and produce MIME entities. + + XML-aware MIME producers SHOULD supply a charset parameter and/or an + appropriate BOM with non-UTF-8-encoded XML MIME entities which lack + an encoding declaration, and SHOULD remove or correct an encoding + declaration which is known to be incorrect (for example, as a result + of transcoding). + + XML-unaware MIME producers MUST NOT supply a charset parameter with + an XML MIME entity unless the entity's character encoding is reliably + known. + + XML MIME producers are RECOMMENDED to provide means for XML MIME + entity authors to control the supply of charset parameters for their + entities, for example by enabling user-level configuration of + filename-to-Content-Type-header mappings on a file-by-file or suffix + basis. 
+ + For XML MIME consumers, the question of priority arises in cases when + the available character encoding information is not consistent. + Again, we must distinguish between XML-aware and XML-unaware + processors. + + When a charset parameter is specified for an XML MIME entity, then + regardless of whether or not the entity contains in-band encoding + information, that is, either a BOM (Section 4) or an XML encoding + declaration or both, or none, the normative component of the [XML] + specification leaves the question open as to how to determine the + encoding with which to attempt to process the entity. In particular, + in the case where there is in-band information and it conflicts with + the charset parameter, the [XML] specification does not specify which + should be taken to be authoritative. In its (non-normative) + Appendix F it defers to this specification: [T]he preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. In particular, please refer to [IETF RFC 3023] or its successor - All processors SHOULD treat a BOM (Section 4) as authoritative if it - is present in an XML MIME entity. In the absence of a BOM - (Section 4), all processors SHOULD treat the charset parameter as - authoritative. Section 4.3.3 of the [XML] specification does _not_ - make it an error for the charset parameter and the XML encoding - declaration to be inconsistent. - - XML-aware processors SHOULD supply a charset parameter and/or an - appropriate BOM with non-UTF-8-encoded XML MIME entities which lack - an encoding declaration, or whose encoding declaration is known to be - incorrect (for example, as a result of transcoding). + Accordingly, to conform with deployed processors and content and to + avoid conflicting with this or other normative specifications, this + specification sets the priority as follows: - The charset parameter MUST NOT be used unless the charset is reliably - known.
This information will be used by all processors to determine - authoritatively the charset of the XML MIME entity in the absence of - a BOM. + All consumers SHOULD treat a BOM (Section 4) as authoritative if it + is present in an XML MIME entity. In the absence of a BOM + (Section 4), all consumers SHOULD treat the charset parameter as + authoritative if it is present. For XML-aware consumers, note that + Section 4.3.3 of the [XML] specification does _not_ make it an error + for the charset parameter and the XML encoding declaration (or the + UTF-8 default in the absence of encoding declaration and BOM) to be + inconsistent, although such processors might choose to issue a + warning in this case. - "utf-8" [RFC3629] and "utf-16" [RFC2781] are the recommended values, - representing the UTF-8 and UTF-16 character sets, respectively. - These character sets are preferred since they are supported by all - conforming processors of [XML]. + When MIME producers conform to the requirements on them stated above, + such inconsistencies will not arise---this statement of priorities + only has practical impact in the case of non-conforming XML MIME + entities. - If an entity of one of the types defined above is received where the - charset parameter is omitted, no information is being provided about - the charset by the MIME Content-Type header. Conforming XML - processors MUST follow the requirements in section 4.3.3 of [XML] - that directly address this contingency. MIME processors that are not - XML processors SHOULD NOT assume a default charset if the charset - parameter is omitted from such an entity. + If an XML MIME entity is received where the charset parameter is + omitted, no information is being provided about the charset by the + MIME Content-Type header. XML-aware processors MUST follow the + requirements in section 4.3.3 of [XML] that directly address this + case. XML-unaware MIME processors SHOULD NOT assume a default + charset in this case. 4. 
The Byte Order Mark (BOM) and Charset Conversions Section 4.3.3 of [XML] specifies that XML MIME entities in the charset "utf-16" MUST begin with a byte order mark (BOM), which is a hexadecimal octet sequence 0xFE 0xFF (or 0xFF 0xFE, depending on endian). The XML Recommendation further states that the BOM is an encoding signature, and is not part of either the markup or the character data of the XML document. @@ -502,25 +536,26 @@ fragment identifiers is devolved by [RFC3986] to the appropriate media type registration. The syntax and semantics of fragment identifiers for the XML media types defined in this specification are based on the [XPointerFramework] W3C Recommendation. It allows simple names, and more complex constructions based on named schemes. When the syntax of a fragment identifier part of any URI or IRI with a retrieved media type governed by this specification conforms to the syntax specified in [XPointerFramework], conforming applications MUST - interpret such fragment identifiers as designating that part of the - retrieved representation specified by [XPointerFramework] and - whatever other specifications define any XPointer schemes used. - Conforming applications MUST support the 'element' scheme as defined - in [XPointerElement], but need not support other schemes. + interpret such fragment identifiers as designating whatever is + specified by the [XPointerFramework] together with any other + specifications governing the XPointer schemes used in those + identifiers which the applications support. Conforming applications + MUST support the 'element' scheme as defined in [XPointerElement], + but need not support other schemes. If an XPointer error is reported in the attempt to process the part, this specification does not define an interpretation for the part. A registry of XPointer schemes [XPtrReg] is maintained at the W3C. Document authors SHOULD NOT use unregistered schemes. 
Scheme authors SHOULD register their schemes ([XPtrRegPolicy] describes requirements and procedures for doing so). See Section 8.1 for additional requirements which apply when an XML- @@ -556,21 +590,21 @@ specified in an external DTD subset or external parameter entity. Since conforming XML processors need not always read and process external entities, the effect of such an external default is uncertain and therefore its use is NOT RECOMMENDED. 7. XML Versions application/xml, application/xml-external-parsed-entity, and application/xml-dtd, text/xml and text/xml-external-parsed-entity are to be used with [XML]. In all examples herein where version="1.0" is - shown, it is understood that version="1.1" may also be used, + shown, it is understood that version="1.1" might also appear, providing the content does indeed conform to [XML1.1]. The normative requirement of this specification upon XML documents and processors is to follow the requirements of [XML], section 4.3.3. Except for minor clarifications, that section is substantially identical from the first edition to the current (5th) edition of XML 1.0, and for XML 1.1 1st or 2nd edition [XML1.1]. Therefore, references herein to [XML] may be interpreted as referencing any existing version or edition of XML, or any subsequent edition or version which makes no incompatible changes to that section. @@ -714,39 +749,39 @@ and also Section 3.6, for guidelines on the use of the 'charset' parameter. Security considerations: See Section 11. Contact: See Authors' Addresses section. Author: See Authors' Addresses section. Change controller: The XML specification is a work product of the - World Wide Web Consortium's XML Working Group. + World Wide Web Consortium's XML Core Working Group. 9. Examples The examples below give the charset portion, if any, of the value of the MIME Content-type header and the XML declaration or Text declaration (which includes the encoding declaration) inside the XML MIME entity.
For UTF-16 examples, the Byte Order Mark character appropriately UTF-16-encoded is denoted as "{BOM}", and the XML or Text declaration is assumed to come at the beginning of the XML MIME entity, immediately following the encoded BOM. Note that other MIME headers may be present, and the XML MIME entity may contain other data in addition to the XML declaration; the examples focus on the Content-type header and the encoding declaration for clarity. All the examples below apply to all five media types declared above in Section 3, as well as to any media types declared using the '+xml' convention (with the exception of the examples involving the charset - parameter for any such media types which to not enable its use). See + parameter for any such media types which do not enable its use). See the XML MIME entities table (Section 3, Paragraph 2) for discussion of which types are appropriate for which varieties of XML MIME entities. This section is non-normative. In particular, note that all [RFC2119] language herein reproduces or summarizes the consequences of normative statements already made above, and has no independent normative force, and accordingly does not appear in uppercase. 9.1. UTF-8 Charset @@ -1011,23 +1046,24 @@ remotely, making it possible for a text display operation to directly perform some unwanted action. As such, the ability to program keys SHOULD be blocked either by filtering or by disabling the ability to program keys entirely. Note that it is also possible to construct XML documents that make use of what XML terms "[XML-]entity references" to construct repeated expansions of text. Recursive expansions are prohibited by [XML] and XML processors are required to detect them. However, even non- recursive expansions may cause problems with the finite computing - resources of computers, if they are performed many times. 
(XML- - entity A consists of 100 copies of XML-entity B, which in turn - consists of 100 copies of XML-entity C, and so on) + resources of computers, if they are performed many times. For + example, consider the case where XML-entity A consists of 100 copies + of XML-entity B, which in turn consists of 100 copies of XML-entity + C, and so on. 12. References 12.1. Normative References [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996. [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail @@ -1177,21 +1213,21 @@ [RFC3023] contains a detailed discussion of the (at the time) novel use of a suffix, a practice which has since become widespread. Interested parties are referred to [RFC3023], Appendix A. Appendix B. Changes from RFC 3023 There are numerous and significant differences between this specification and [RFC3023], which it obsoletes. This appendix summarizes the major differences only. - First, XPointer ([XPointerFramework] and [XPointerElement] has been + First, XPointer ([XPointerFramework] and [XPointerElement]) has been added as fragment identifier syntax for "application/xml", and the XPointer Registry ([XPtrReg]) mentioned. Second, [XMLBase] has been added as a mechanism for specifying base URIs. Third, the language regarding character sets was updated to correspond to the W3C TAG finding Internet Media Type registration, consistency of use [TAGMIME]. Fourth, many references are updated, and the existence of and relevance of the spec. to XML 1.1 acknowledged. Finally, a number of justifications and contextualizations which were appropriate when XML was new have been removed, including the whole of the original Appendix A.